A New Geometric Data Perturbation Method for Data Anonymization Based on Random Number Generators

Authors

Merve Kanmaz, Muhammed Ali Aydın, Ahmet Sertbaş

DOI:

https://doi.org/10.13052/jwe1540-9589.20613

Keywords:

privacy-preserving data mining, data anonymization, data perturbation, geometric perturbation, random number generators

Abstract

With the rapid development of technology and its involvement in all areas of our lives, the volume and value of data have become a significant field of study. Valuing data to this extent has had consequences for people’s personal information, and data anonymization is the most important of these issues for the security of personal data. Much work has been done in this area, and research is ongoing. In this study, we propose a method called RSUGP for the anonymization of sensitive attributes. Instead of the Gaussian noise or random noise conventionally used in geometric data perturbation, RSUGP uses a new noise model based on random number generators. We tested RSUGP on six different datasets with four different classification methods, measuring classification accuracy and attack resistance, and we present the results. Experiments show that the proposed method outperformed the other two methods in classification accuracy, attack resistance, and runtime.
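For readers unfamiliar with geometric data perturbation, the general transform (in Chen and Liu's formulation) maps each record x to y = R x + t + δ: a random orthogonal rotation R, a translation vector t shared by all records, and a noise term δ. The sketch below illustrates only that general transform, in plain Python with a pluggable noise function. It is not RSUGP: the abstract does not describe the paper's RNG-based noise model, so `uniform_noise` is a hypothetical stand-in, and the names `random_rotation`, `geometric_perturb`, and the translation range [0, 1) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
# A noise function draws one scalar noise value from the supplied generator.
NoiseFn = Callable[[random.Random], float]


def random_rotation(d: int, rng: random.Random) -> Matrix:
    """Random d x d orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    basis: Matrix = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # redraw in the (very unlikely) degenerate case
            basis.append([vi / norm for vi in v])
    return basis  # orthonormal rows, so R times R^T is the identity


def gaussian_noise(sigma: float) -> NoiseFn:
    """Conventional GDP noise: i.i.d. N(0, sigma^2)."""
    return lambda rng: rng.gauss(0.0, sigma)


def uniform_noise(half_width: float) -> NoiseFn:
    """HYPOTHETICAL RNG-based stand-in; NOT the RSUGP noise model."""
    return lambda rng: rng.uniform(-half_width, half_width)


def geometric_perturb(
    records: List[List[float]], noise: NoiseFn, seed: Optional[int] = None
) -> Tuple[Matrix, Matrix, List[float]]:
    """Apply y = R x + t + delta to every record x (one row per record)."""
    rng = random.Random(seed)
    d = len(records[0])
    rot = random_rotation(d, rng)
    # One translation vector shared by all records (the range is an assumption).
    t = [rng.uniform(0.0, 1.0) for _ in range(d)]
    perturbed = [
        [sum(rot[i][j] * x[j] for j in range(d)) + t[i] + noise(rng) for i in range(d)]
        for x in records
    ]
    return perturbed, rot, t


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0]]
    out, _, _ = geometric_perturb(data, gaussian_noise(0.0), seed=7)
    # With zero noise, the two distances agree up to floating-point rounding.
    print(math.dist(data[0], data[1]), math.dist(out[0], out[1]))
```

Because R is orthogonal and t cancels when two records are subtracted, Euclidean distances between records are preserved exactly when the noise is zero, which is why distance-based classifiers keep their accuracy. The noise term is intended to blur those exact distances so an attacker cannot rely on them; swapping the noise function is the point where a model such as the paper's RNG-based one would plug in.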

Author Biographies

Merve Kanmaz, Computer Programming Department, Istanbul University-Cerrahpasa, Istanbul, Turkey

Merve Kanmaz was born in İstanbul, Turkey. She received the B.S. and M.S. degrees in computer engineering from İstanbul University, İstanbul, in 2011 and 2016, respectively. She is currently pursuing the Ph.D. degree in computer engineering with İstanbul University-Cerrahpaşa, İstanbul. Since 2016, she has been working as a Lecturer at the Computer Programming Department, İstanbul University-Cerrahpaşa since 2014. Her research interests include data anonymization, information security, and big data. She has published two journal articles and presented five papers at international conferences.

Muhammed Ali Aydın, Computer Engineering Department, Istanbul University-Cerrahpasa, Istanbul, Turkey

Muhammed Ali Aydın received the B.S. degree from İstanbul University, İstanbul, Turkey, in 2001, the M.Sc. degree from Istanbul Technical University, İstanbul, in 2005, and the Ph.D. degree from İstanbul University, in 2009, all in computer engineering. He was a Postdoctoral Research Associate with the Department of RST, Telecom SudParis, Paris, France, from 2010 to 2011. He has been working as an Associate Professor at the Computer Engineering Department, İstanbul University-Cerrahpaşa, since 2009. He has also been the Vice Dean of the Engineering Faculty and the Head of the Cyber Security Department since 2016. He has received ten research projects from local industries in Turkey and the İstanbul University-Cerrahpaşa Research Foundation. He has authored 20 journal articles and presented 70 papers at international conferences. His research interests include cryptography, network security, information security, and optical networks.

Ahmet Sertbaş, Computer Engineering Department, Istanbul University-Cerrahpasa, Istanbul, Turkey

Ahmet Sertbaş was born in İstanbul, Turkey. He received the B.S. and M.S. degrees in electronic engineering from Istanbul Technical University, İstanbul, in 1986 and 1990, respectively, and the Ph.D. degree in electric-electronic engineering from İstanbul University, İstanbul, in 1997. Since 2000, he has served as an Assistant Professor, an Associate Professor, and a Professor with the Computer Engineering Department, İstanbul University, and since 2018 he has been a Professor with the Computer Engineering Department, İstanbul University-Cerrahpaşa. His research interests include image processing, artificial intelligence, computer arithmetic, and hardware security. He has published 19 articles in SCI/SCIE-indexed journals, as well as many articles in journals not indexed in SCI/SCIE and many international conference papers.

Published

2021-10-18

Section

Articles