RiAiR: A Framework for Sensitive RDF Protection
Keywords: RDF protection, Sensitive information, Semantic Web, Disclosure source
The Semantic Web and the Linked Open Data (LOD) initiatives promote the integration and combination of RDF data on the Web. In some cases, data need to be analyzed and protected before publication in order to avoid the disclosure of sensitive information. However, existing RDF protection techniques do not ensure that sensitive information cannot be discovered: since all RDF resources are linked in the Semantic Web, combining different datasets could produce or disclose unexpected sensitive information. In this context, we propose a framework, called RiAiR, which reduces the complexity of the RDF structure in order to minimize the expert user's involvement in classifying RDF data into identifiers, quasi-identifiers, etc. An intersection process then suggests disclosure sources that can compromise the data. Moreover, a generalization method decreases the connections among resources while remaining compatible with the main Semantic Web objectives of data integration and combination. The results show the viability and high performance of the framework in a scenario where heterogeneous, linked datasets are present.
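The abstract does not detail how the intersection and generalization steps are computed. As a rough illustration only, the following Python sketch shows the general idea under strong simplifying assumptions: triples are plain `(subject, predicate, object)` string tuples, both datasets share one vocabulary, and a predicate is flagged as a candidate disclosure source when one of its `(predicate, object)` pairs also occurs in an external dataset. The function names `candidate_disclosure_sources` and `generalize_numeric` and the `ex:`/`ext:` sample data are hypothetical and are not RiAiR's actual API or algorithm.

```python
from collections import defaultdict

# Simplified RDF triple: (subject, predicate, object), all plain strings (assumption).
Triple = tuple[str, str, str]


def candidate_disclosure_sources(published: list[Triple], external: list[Triple]) -> dict[str, int]:
    """Hypothetical intersection step: count, per predicate of `published`, how many
    (predicate, object) pairs also occur in `external`. Shared pairs can act as join
    keys linking published resources to external ones."""
    external_pairs = {(p, o) for _, p, o in external}
    counts: dict[str, int] = defaultdict(int)
    for _, p, o in published:
        if (p, o) in external_pairs:
            counts[p] += 1
    return dict(counts)


def generalize_numeric(triples: list[Triple], predicate: str, width: int) -> list[Triple]:
    """Hypothetical generalization step: replace integer literals of `predicate`
    with a range label such as '30-39', so exact matches against other datasets fail."""
    result = []
    for s, p, o in triples:
        if p == predicate and o.isdigit():
            low = int(o) // width * width
            o = f"{low}-{low + width - 1}"
        result.append((s, p, o))
    return result


# Illustrative (made-up) data: a dataset to publish and an external linked dataset.
published = [
    ("ex:p1", "foaf:age", "34"),
    ("ex:p1", "ex:zip", "75001"),
    ("ex:p1", "ex:diagnosis", "flu"),
    ("ex:p2", "foaf:age", "41"),
    ("ex:p2", "ex:zip", "75002"),
]
external = [
    ("ext:alice", "foaf:age", "34"),
    ("ext:alice", "ex:zip", "75001"),
    ("ext:alice", "foaf:name", "Alice"),
]

print(candidate_disclosure_sources(published, external))
generalized = generalize_numeric(published, "foaf:age", 10)
print(candidate_disclosure_sources(generalized, external))
```

In this toy run, `foaf:age` and `ex:zip` are both flagged before generalization; after ages are bucketed into ranges of width 10, only `ex:zip` still links the two datasets, which mirrors the stated goal of decreasing connections among resources. Real RDF data would additionally require handling typed literals, IRIs, and vocabulary alignment, which this sketch ignores.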