A Study of Profanity Effect in Sentiment Analysis on Natural Language Processing Using ANN

Authors

  • Cheong-Ghil Kim Dept. of Computer Science, Namseoul University, Korea
  • Young-Jun Hwang Dept. of Computer Science, Namseoul University, Korea
  • Chayapol Kamyod Computer and Communication Engineering for Capacity Building Research Center, School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand

DOI:

https://doi.org/10.13052/jwe1540-9589.2139

Keywords:

Deep Learning, Sentiment Analysis, Opinion Mining, Natural Language Processing, Stop words

Abstract

The development of wireless communication technology and mobile devices has ushered in an era in which text data overflows on social media and the Web. In particular, social media has become a major repository of people's sentiments, stored as opinions and views on specific issues in the form of unstructured information. Consequently, the importance of sentiment analysis, especially with machine learning, is increasing for both personal life and corporate management. Data reliability is an essential component of data classification: the accuracy of sentiment classification can depend heavily on the reliability of the data, and noise data can likewise influence the classification. Noise data includes stopwords, which carry no meaning, but data that does not fit the purpose of the analysis can also be regarded as noise. This paper analyzes the impact of profanity data on deep learning-based sentiment classification. For this purpose, we used movie review data from the Web and simulated the change in performance before and after removing the profanity data. The accuracies of models trained on the data before and after removal were compared to determine whether profanity is noise data that lowers accuracy in sentiment analysis. The simulation results show that accuracy dropped by about 2% when profanity was treated as noise data in the sentiment classification of review data.
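The preprocessing step the abstract describes, filtering profanity out of review text before training, can be sketched as follows. This is a minimal illustration only: the profanity list and the English example review are hypothetical placeholders (the study itself uses Korean movie reviews from the Naver Sentiment Movie Corpus), and the tokenizer is a bare whitespace/punctuation splitter.

```python
import re

# Hypothetical profanity list for illustration; the paper's actual
# filtering targets profanity in Korean movie-review text.
PROFANITY = {"damn", "hell"}

def tokenize(review: str) -> list[str]:
    # Minimal word tokenizer for the sketch.
    return re.findall(r"[\w']+", review)

def remove_profanity(tokens: list[str]) -> list[str]:
    # Drop tokens on the profanity list, analogous to
    # stopword-style removal of noise data.
    return [t for t in tokens if t.lower() not in PROFANITY]

review = "This movie was damn good"
print(remove_profanity(tokenize(review)))  # ['This', 'movie', 'was', 'good']
```

Two training sets are then produced, one filtered and one unfiltered, and a model is trained on each so their test accuracies can be compared.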


Author Biographies

Cheong-Ghil Kim, Dept. of Computer Science, Namseoul University, Korea

Cheong-Ghil Kim received his B.S. in Computer Science from the University of Redlands, CA, USA, in 1987. He received his M.S. and Ph.D. degrees in Computer Science from Yonsei University, Korea, in 2003 and 2006, respectively. Currently, he is a professor in the Department of Computer Science, Namseoul University, Korea. His research areas include multimedia embedded systems, mobile AR, and 3D contents. He is a member of IEEE.

Young-Jun Hwang, Dept. of Computer Science, Namseoul University, Korea

Young-Jun Hwang is an undergraduate student majoring in Computer Science at Namseoul University, Korea. He completed the 9th Best of the Best course organized by KITRI. His research interests are cyber security and machine learning.

Chayapol Kamyod, Computer and Communication Engineering for Capacity Building Research Center, School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand

Chayapol Kamyod received his Ph.D. in Wireless Communication from the Center for TeleInFrastruktur (CTIF) at Aalborg University (AAU), Denmark. He received an M.Eng. in Electrical Engineering from The City College of New York, New York, USA. In addition, he received a B.Eng. in Telecommunication Engineering and an M.Sc. in Laser Technology and Photonics from Suranaree University of Technology, Nakhon Ratchasima, Thailand. He is currently a lecturer in the Computer Engineering program at the School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand. His research interests include the resilience and reliability of computer networks and systems, wireless sensor networks, embedded technology, and IoT applications.

References

Ashima Yadav and Dinesh Kumar Vishwakarma. Sentiment analysis using deep learning architectures: a review, Artificial Intelligence Review 53:4335–4385, 2020.

K. Xu, G. Qi, J. Huang, T. Wu, and X. Fu. Detecting Bursts in Sentiment-Aware Topics from Social Media, Knowledge-Based Systems, Vol. 141, pp. 44–54, DOI: 10.1016/j.knosys.2017.11.007, February 2018.

Umar Ishfaq and Khalid Iqbal. Identifying the Influential bloggers: A modular approach based on Sentiment Analysis, Journal of Web Engineering, 16(5 & 6): pp. 505–523, 2017.

Ruirong Xue, Subin Huang, Xiangfeng Luo, Dandan Jiang, and Yan Peng. Semantic Emotion-Topic Model in Social Media Environment, Journal of Web Engineering, 17(1 & 2): pp. 073–092, 2018.

L. Zhang and B. Liu. Sentiment Analysis and Opinion Mining, In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_907, 2017.

Julia Hirschberg and Christopher D. Manning. Advances in natural language processing, Science, 349(6245), pp. 261–266, DOI: 10.1126/science.aaa8685, 17 July 2015.

Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent Trends in Deep Learning Based Natural Language Processing, IEEE Computational Intelligence Magazine, 13(3): pp. 55–75, DOI: 10.1109/MCI.2018.2840738, August 2018.

Geetika Gautam and Divakar Yadav. Sentiment Analysis of Twitter Data Using Machine Learning Approaches and Semantic Analysis, Proc. of 2014 Seventh International Conference on Contemporary Computing (IC3), 7–9 August 2014.

Erik Cambria, Björn Schuller, Yunqing Xia, and Catherine Havasi. New Avenues in Opinion Mining and Sentiment Analysis, IEEE Intelligent Systems, 28(2): 15–21, 2013.

P. Chandrasekar and K. Qian. The Impact of Data Preprocessing on the Performance of a Naive Bayes Classifier, 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), pp. 618–619, doi: 10.1109/COMPSAC.2016.205, 2016.

P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman. Natural language processing: an introduction, Journal of the American Medical Informatics Association, Vol. 18, pp. 544–551, 2011.

Sumit Chopra, Michael Auli, and Alexander M. Rush. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks, Proc. of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.

Daniel W. Otter, Julian R. Medina, and Jugal K. Kalita. A Survey of the Usages of Deep Learning for Natural Language Processing, IEEE Transactions on Neural Networks and Learning Systems, 32(2), February 2021.

D. Rumelhart, G. Hinton, and R. Williams, Learning internal representations by error propagation, UCSD, La Jolla, CA, USA, Tech. Rep. ICS-8506, 1985.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput., 9(8): pp. 1735–1780, 1997.

K. Cho, B. V. Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv:1406.1078. [Online]. Available: http://arxiv.org/abs/1406.1078, 2014.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need, in Proc. NIPS, pp. 6000–6010, 2017.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv:1810.04805. [Online]. Available: http://arxiv.org/abs/1810.04805, 2018.

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving Language Understanding by Generative PreTraining, [Online]. Available: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.

Bing Liu, Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language Technologies, 5(1): pp. 1–167, May 2012.

D. P. Hong, H. Jeong, S. Park, E. Han, H. Kim, and I. Yun. Study on the Methodology for Extracting Information from SNS Using a Sentiment Analysis, J. Korea Inst. 16(6): pp. 141–155 December 2017.

H. Y. Park and K. J. Kim. Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model, Journal of Intelligence and Information Systems, Vol. 25, pp. 141–154, 2019.

W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent Neural Network Regularization, International Conference on Learning Representations, 2015.

E. J. Lee. Basic and applied research of CNN and RNN, Broadcasting and Media Magazine, 22(1): pp. 87–95, 2017.

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5(2), March 1994.

Naver Sentiment Movie Corpus, https://github.com/e9t/nsmc, accessed 2021-09-21.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research, Vol. 15, pp. 1929–1958, 2014.

Google Colab, https://colab.research.google.com/


Published

2022-03-22

How to Cite

Kim, C.-G., Hwang, Y.-J., & Kamyod, C. (2022). A Study of Profanity Effect in Sentiment Analysis on Natural Language Processing Using ANN. Journal of Web Engineering, 21(03), 751–766. https://doi.org/10.13052/jwe1540-9589.2139

Issue

Section

SPECIAL ISSUE ON Future Multimedia Contents and Technology on Web in the 5G Era