FedBully: A Cross-Device Federated Approach for Privacy Enabled Cyber Bullying Detection using Sentence Encoders

Authors

  • Nisha P. Shetty, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104
  • Balachandra Muniyal, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104
  • Aman Priyanshu, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104
  • Vedant Rishi Das, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104

DOI:

https://doi.org/10.13052/jcsm2245-1439.1242

Keywords:

Federated Learning, Convolutional Neural Network, Secure Aggregation, Natural Language Processing, Cyberbullying

Abstract

Cyberbullying has become one of the most pressing concerns for online platforms, putting individuals at risk and raising serious public concern. Recent studies have shown a significant correlation between cyberbullying and declining mental health. Automated detection can help address the problem; however, the sensitivity of client data makes collection difficult, and access to such data may be restricted. This paper presents FedBully, a federated approach to cyberbullying detection that uses sentence encoders for feature extraction and applies secure aggregation to protect client privacy in a cross-device learning system. Optimal hyper-parameters were identified through comprehensive experiments, and a network that is inexpensive in both computation and communication is proposed. Using only dense networks to fine-tune sentence embeddings, experiments achieve up to 93% classification AUC (Area Under the Curve) on IID (Independent and Identically Distributed) datasets and 91% AUC on non-IID datasets. The analysis also shows that data independence strongly affects network performance, with AUC decreasing by a mean of 5.1% from IID to non-IID data. An extensive study of client network size and secure aggregation protocols further supports the robustness and practicality of the proposed model. The proposed approach offers an efficient and practical way to train a cross-device cyberbullying detector while preserving client privacy.
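
The abstract does not spell out the secure aggregation protocol, so the following is only a minimal sketch of one common construction, pairwise additive masking, and not the authors' implementation. Every name in it (`pairwise_mask`, `mask_update`, `secure_average`), the toy update values, and the hard-coded pair seeds are assumptions made for illustration; in a real deployment the seeds would come from a key agreement between clients rather than a fixed formula.

```python
import random

def pairwise_mask(seed, dim):
    """Pseudo-random mask derived from a seed shared by two clients."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def mask_update(client_id, update, pair_seeds):
    """Hide a client's update: add the mask shared with each higher-id
    peer and subtract the mask shared with each lower-id peer."""
    masked = list(update)
    for (i, j), seed in pair_seeds.items():
        if client_id not in (i, j):
            continue
        sign = 1.0 if client_id == i else -1.0  # keys satisfy i < j
        mask = pairwise_mask(seed, len(update))
        masked = [m + sign * r for m, r in zip(masked, mask)]
    return masked

def secure_average(masked_updates):
    """Server side: average the masked vectors. Each pairwise mask is
    added once and subtracted once, so it cancels in the sum."""
    n = len(masked_updates)
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) / n for k in range(dim)]

# Toy round: three clients, a 4-parameter update each (made-up values).
updates = {
    0: [0.10, -0.20, 0.30, 0.00],
    1: [0.05, 0.15, -0.10, 0.20],
    2: [-0.15, 0.05, 0.10, 0.10],
}
ids = sorted(updates)
# Hypothetical shared seeds, one per unordered pair of clients.
pair_seeds = {(i, j): 1000 * i + j for i in ids for j in ids if i < j}

masked = [mask_update(c, updates[c], pair_seeds) for c in ids]
aggregate = secure_average(masked)
plain = [sum(updates[c][k] for c in ids) / len(ids) for k in range(4)]
```

In this sketch the server only ever sees the entries of `masked`, yet `aggregate` matches the plain federated average `plain` up to floating-point rounding. Practical protocols typically work over integers modulo a fixed value and handle clients that drop out mid-round, both of which are omitted here.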

Author Biographies

Nisha P. Shetty, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104

Nisha P. Shetty received her bachelor’s and master’s degrees from Visvesvaraya Technological University. She is currently pursuing her doctorate at Manipal Institute of Technology, Manipal, where she works in the area of social network security.

Balachandra Muniyal, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104

Balachandra Muniyal’s research areas include Network Security, Algorithms, and Operating Systems. He has more than 30 publications in national and international conferences and journals. He is currently a Professor in the Dept. of Information & Communication Technology, Manipal Institute of Technology, Manipal, and has around 25 years of teaching experience at various institutes.

Aman Priyanshu, Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104

Aman Priyanshu is a final-year undergraduate at the Manipal Institute of Technology. His research interests include Privacy Preserving Machine Learning, Explainable AI, Fairness, and AI for Social Good.

Vedant Rishi Das, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India-576104

Vedant Rishi Das is currently pursuing his bachelor’s degree in Computer Science and Engineering at Manipal Institute of Technology, Manipal. His areas of interest are Natural Language Processing and Machine Learning.

Published

2023-06-30

How to Cite

Shetty NP, Muniyal B, Priyanshu A, Das VR. FedBully: A Cross-Device Federated Approach for Privacy Enabled Cyber Bullying Detection using Sentence Encoders. JCSANDM [Internet]. 2023 Jun. 30 [cited 2024 Jul. 22];12(04):465-96. Available from: https://journals.riverpublishers.com/index.php/JCSANDM/article/view/16209
