Hate Speech Detection in Social Media (Twitter) Using Neural Network
DOI:
https://doi.org/10.13052/jmm1550-4646.1936
Keywords:
Hate speech, Twitter, toxic, cyberbullying, Convolutional Neural Network
Abstract
Hate speech has recently become a real threat on social media, and almost all social media users are exposed to it in different ways. Hate speech is not limited to one group or society. It affects many people and can be classified as abusive, offensive, sexist, or racist, or as targeting political affiliation, religion, nationality, skin color, disability, gender, ethnicity, sexual orientation, immigrants, and others. Many researchers and authorities are trying to find new procedures to detect hate speech on social media, especially on Facebook and Twitter, and many methods, models, and algorithms are used for this purpose. One of the most valuable models for detecting hate speech is the Convolutional Neural Network (CNN). This review aims to categorize academic studies on hate speech detection on Twitter using CNN-based models and to summarize the results of each model, in order to expand the understanding of the current state of hate speech detection on Twitter. For this purpose, we carried out a broad, automated search using Boolean and snowballing search methods to find academic works in this area. Studies and papers were identified, and the following information was extracted and aggregated from each article: authors, publication year, journal or conference name, proposed model/method, aim of the study, outcome, and quality of the study. According to the findings, CNN and CNN-based models are standard models for hate speech detection. The findings also show that other new models have a great impact on hate speech detection and that there is good progress in this field. However, several problems remain with hate speech detection models. Most models cannot detect hate speech automatically. The methods do not suit all languages: they work with only one language, and most are best suited to English and are less suitable when used with datasets in other languages. In addition, the models suffer from confusion in speech classification. Finally, most models do not consider user-to-user speech on social media.