AMR-CNN: Abstract Meaning Representation with Convolution Neural Network for Toxic Content Detection

Authors

  • Ermal Elbasani Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea https://orcid.org/0000-0001-7051-3299
  • Jeong-Dong Kim Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea, Genome-based BioIT Convergence Institute, Sun Moon University, Asan, 31460 South Korea https://orcid.org/0000-0002-5113-221X

DOI:

https://doi.org/10.13052/jwe1540-9589.2135

Keywords:

Toxic content detection, Text analysis, Abstract meaning representation, Convolution neural network, Natural Language Processing.

Abstract

Recognizing offensive, abusive, and profane multimedia content on the web has been a challenge in maintaining a web environment that supports users' freedom of speech. As profanity filtering has been developed and applied to text, audio, and video on platforms such as social media, entertainment, and education, the number of methods for tricking web-based applications has also increased, creating a new problem to be solved. In contrast to commonly developed toxic content detection systems, which rely on lexicon- and keyword-based detection, this work takes a different approach based on the meaning of the sentence. Meaning representation is a way to grasp the meaning of linguistic input. This work proposes a data-driven approach that uses Abstract Meaning Representation to extract the meaning of online text content and feeds it into a convolutional neural network to detect the level of profanity. The proposed model is evaluated on two datasets: the Offensive Language Identification Dataset, and the Offensive Hate dataset merged with the Twitter Sentiment Analysis dataset. The results indicate that the proposed model performs effectively and achieves satisfactory accuracy in recognizing the level of toxicity of online text content.

Author Biographies

Ermal Elbasani, Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea

Ermal Elbasani received his bachelor's and master's degrees in Electronic Engineering from the Polytechnic University of Tirana, Albania, in 2011 and 2013, respectively. As of 2021, he is pursuing a Ph.D. in Computer and Electronics Engineering at Sun Moon University, South Korea. His research interests include healthcare and wellness monitoring, graph and deep learning, and biological sequential data analysis.

Jeong-Dong Kim, Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea, Genome-based BioIT Convergence Institute, Sun Moon University, Asan, 31460 South Korea

Jeong-Dong Kim received his bachelor's degree in Computer Engineering from Sun Moon University in 2005. He received his M.S. and Ph.D. degrees in Computer Science from Korea University, Korea, in 2008 and 2012, respectively. He is an associate professor in the Department of Computer Science and Engineering, Sun Moon University, Asan, Korea. His research interests include big data analysis based on deep learning, healthcare, software and data engineering, and bioinformatics.

Published

2022-02-22

Issue

Section

SPECIAL ISSUE ON Future Multimedia Contents and Technology on Web in the 5G Era