Federated Latent Dirichlet Allocation for User Preference Mining

Authors

  • Xing Wu, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China
  • Yushun Fan, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China
  • Jia Zhang, Department of Computer Science, Southern Methodist University, Dallas, TX, USA
  • Zhenfeng Gao, Sangfor Technologies Inc., Shenzhen, China

DOI:

https://doi.org/10.13052/jwe1540-9589.2244

Keywords:

Web service composition, user preference mining, federated learning, LDA, homomorphic encryption, blockchain

Abstract

In Web services computing, there is growing demand to mine user preferences from user requirements when creating Web service compositions, in order to meet comprehensive and ever-evolving user needs. Machine learning methods such as latent Dirichlet allocation (LDA) have been applied to user preference mining. However, training a high-quality LDA model typically requires large amounts of data. With the spread of government regulations and laws and growing public awareness of privacy protection, the traditional practice of collecting user data on a central server is no longer applicable. It is therefore necessary to design a privacy-preserving method that trains an LDA model without collecting data at scale or leaking it. In this paper, we present novel federated LDA techniques to learn user preferences in the Web service ecosystem. On the basis of a user-level distributed LDA algorithm, we establish two federated LDA models that handle two-layer training scenarios: a centralized synchronous federated LDA (CSFed-LDA) for synchronous scenarios and a decentralized asynchronous federated LDA (DAFed-LDA) for asynchronous ones. In CSFed-LDA, an importance-based partially homomorphic encryption (IPHE) technique is developed to protect privacy efficiently. In DAFed-LDA, blockchain technology is incorporated and a multi-channel-based authority control scheme (MCACS) is designed to enhance data security. Extensive experiments on a real-world dataset from ProgrammableWeb.com demonstrate the model performance, security assurance and training speed of our approach.
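
The abstract does not state the aggregation rule, so the following is only a minimal sketch of how a synchronous server could merge topic-word counts reported by clients, in the style of approximate distributed LDA (AD-LDA). It is an assumption that CSFed-LDA aggregates in a comparable way; the function name and the toy values are hypothetical, and no encryption is shown.

```python
from typing import List

Counts = List[List[int]]  # topic-by-word count matrix


def merge_topic_word_counts(global_counts: Counts,
                            local_counts: List[Counts]) -> Counts:
    """Merge per-client topic-word counts into a new global matrix.

    Assumed protocol: every client starts the round from `global_counts`,
    resamples topic assignments for its own documents, and reports its
    updated copy. The server adds each client's change (local - global)
    to the shared matrix.
    """
    merged = [row[:] for row in global_counts]  # copy; input is not mutated
    for client in local_counts:
        for k, row in enumerate(client):
            for w, value in enumerate(row):
                merged[k][w] += value - global_counts[k][w]
    return merged


# Hypothetical toy round: 2 topics, 3 vocabulary words, 2 clients.
global_counts = [[4, 1, 0], [0, 2, 3]]
client_a = [[5, 1, 0], [0, 1, 3]]
client_b = [[4, 0, 0], [0, 3, 3]]
new_global = merge_topic_word_counts(global_counts, [client_a, client_b])
print(new_global)
```

In a privacy-preserving variant, the per-client differences would be the values to protect, for example with an additively homomorphic scheme, before the server sums them.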

Author Biographies

Xing Wu, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China

Xing Wu received his BS degree in control theory and application from Tsinghua University, China, in 2017. He is currently working toward a Ph.D. degree in the Department of Automation, Tsinghua University. His research interests include services computing, service recommendation, federated learning and blockchain.

Yushun Fan, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China

Yushun Fan received his Ph.D. degree in control theory and application from Tsinghua University, China, in 1990. He is currently a professor with the Department of Automation, Director of the System Integration Institute, and Director of the Networking Manufacturing Laboratory, Tsinghua University. From September 1993 to 1995, he was a visiting scientist, supported by Alexander von Humboldt Stiftung, with the Fraunhofer Institute for Production System and Design Technology (FHG/IPK), Germany. He has authored 10 books and published more than 300 research papers in journals and conferences. His research interests include enterprise modeling methods and optimization analysis, business process re-engineering, workflow management, system integration, object-oriented technologies and flexible software systems, Petri net modeling and analysis, and workshop management and control.

Jia Zhang, Department of Computer Science, Southern Methodist University, Dallas, TX, USA

Jia Zhang received her PhD degree in computer science from the University of Illinois at Chicago. She is currently the Cruse C. and Marjorie F. Calahan Centennial Chair in Engineering and Professor in the Department of Computer Science at Southern Methodist University. Her research interests emphasize the application of machine learning and information retrieval methods to tackle data science infrastructure problems, with a recent focus on scientific workflows, provenance mining, software discovery, knowledge graphs, and their interdisciplinary applications. Dr. Zhang has co-authored one textbook, “Services Computing”, and has published over 170 refereed journal papers, book chapters, and conference papers. Dr. Zhang has served as an associate editor of the IEEE TSC since 2008. She served as Program Committee Chair for IEEE SCC (2020), ICWS (2019), CLOUD (2018), and BigData Congress (2017). She is a senior member of the IEEE.

Zhenfeng Gao, Sangfor Technologies Inc., Shenzhen, China

Zhenfeng Gao received his PhD degree in control theory and application in 2018 from Tsinghua University, China. He is currently working as a postdoctoral researcher at the Graduate School at Shenzhen, Tsinghua University, as well as at the postdoctoral research center of Sangfor Technologies Inc. His research interests include services computing, service recommendation, big data and blockchain technology.

Published

2023-10-25

How to Cite

Wu, X., Fan, Y., Zhang, J., & Gao, Z. (2023). Federated Latent Dirichlet Allocation for User Preference Mining. Journal of Web Engineering, 22(04), 639–678. https://doi.org/10.13052/jwe1540-9589.2244

Section

Articles