Cyberbullying Detection in Social Networks: Artificial Intelligence Approach
Keywords: Cyberbullying, machine learning, detection, algorithms, Twitter, cybercrime, social media
Over the past decade, digital communication has grown to a massive global scale. Unfortunately, cyberbullying has become prevalent, with perpetrators hiding behind the relative anonymity of the internet. This work reviews prominent classification algorithms and proposes an ensemble model for identifying cases of cyberbullying in Twitter datasets. The algorithms evaluated are Naive Bayes, K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, Linear Support Vector Classifier (Linear SVC), Adaptive Boosting (AdaBoost), Stochastic Gradient Descent and Bagging classifiers. The classifiers were compared experimentally on four metrics: accuracy, precision, recall and F1 score, and the results report each algorithm's performance on every metric. Random Forest proved to be the best-performing classifier, with medians of 0.77, 0.73 and 0.94 across the datasets, while Linear SVC was the least effective, with medians of 0.59, 0.42 and 0.86. The ensemble model improved on the results of its constituent classifiers, with medians of 0.77, 0.66 and 0.94.
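To make the four evaluation metrics and the ensemble idea concrete, the following is a minimal Python sketch using only the standard library. It assumes a binary labelling (1 = cyberbullying tweet, 0 = not) and assumes the ensemble combines its constituent classifiers by hard majority voting; the abstract does not state the combination rule, so `majority_vote`, the helper names and the toy predictions below are hypothetical and are not the paper's data, method or results.

```python
from collections import Counter

# Hypothetical label convention (an assumption, not from the paper):
# 1 = cyberbullying, 0 = not cyberbullying.


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def majority_vote(predictions):
    """Hypothetical hard-voting ensemble.

    `predictions` is a list of per-classifier label lists. For each sample the
    most common label wins; on a tie, Counter.most_common keeps the label
    encountered first, so an odd number of classifiers avoids ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy, made-up example: true labels and three imperfect classifiers.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
clf_a = [1, 0, 1, 0, 0, 1, 1, 0]
clf_b = [1, 1, 1, 1, 0, 0, 0, 0]
clf_c = [0, 0, 1, 1, 1, 0, 1, 0]

ensemble = majority_vote([clf_a, clf_b, clf_c])
for name, pred in [("A", clf_a), ("B", clf_b), ("C", clf_c),
                   ("ensemble", ensemble)]:
    print(name, evaluate(y_true, pred))
```

In this contrived case each classifier scores 0.75 on all four metrics, but because their errors fall on different tweets, the majority vote recovers every label. This illustrates why an ensemble can improve on its constituents; it does not guarantee improvement on real data.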