Self-sovereign and Secure Data Sharing Through Docker Containers for Machine Learning on Remote Node
DOI:
https://doi.org/10.13052/jwe1540-9589.2352
Keywords:
Self-sovereignty, trusted execution environment, data sharing, containers, Web3.0
Abstract
Collecting personal data from various sources and using it for machine learning (ML) is prevalent. However, there are increasing concerns about the monopolization and potential breach of private data by greedy and malicious organizations. As an alternative, interest in Web 3.0 systems is on the rise. These systems aim to guarantee the self-sovereignty of personal data in a decentralized setting, where users can share data with others directly for fair compensation. Nevertheless, malicious remote users can still violate the integrity and confidentiality of personal data. Therefore, this paper proposes a novel method of preventing unwanted leakage and counterfeiting of private data lent to the premises of remote users. The method builds on the decentralized nature of Web 3.0 and leverages existing personal storage, so that the burden of collecting secure data is relieved. Data owners create a lightweight Docker container to encapsulate their private data sources. They generate another container, deployed on the remote premises, that takes in and executes any ML algorithms the remote users create. The two containers form a distributed trusted execution environment (TEE), and data are read between them through a secure channel. Since the TEE is strictly controlled by the data owner, no malicious ML application can leak or breach the private information. This paper explains the engineering details of how this new method is realized.
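The abstract does not say how the channel between the two containers protects the data, so the following is only a minimal sketch of one ingredient such a channel might include: an integrity tag that lets the execution-side container detect counterfeited records. The function names `seal` and `open_sealed`, the record fields, and the pre-shared key are all hypothetical and are not taken from the paper. The sketch covers integrity only; confidentiality would additionally require encryption, for example by running the channel over TLS.

```python
import hashlib
import hmac
import json
import secrets


def seal(key: bytes, record: dict) -> dict:
    """Data-source side (hypothetical): serialize a record and attach an HMAC-SHA256 tag."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode("utf-8"), "tag": tag}


def open_sealed(key: bytes, message: dict) -> dict:
    """Execution side (hypothetical): recompute the tag and reject altered payloads."""
    payload = message["payload"].encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("integrity check failed")
    return json.loads(payload)


# Assumption: both containers were provisioned with the same key by the data owner.
shared_key = secrets.token_bytes(32)
message = seal(shared_key, {"user": "alice", "heart_rate": 72})
record = open_sealed(shared_key, message)
```

In this sketch, a message whose payload is modified in transit fails verification and raises `ValueError`, so the ML application never sees the altered record.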