Refining Word Embeddings with Sentiment Information for Sentiment Analysis

Authors

  • Mohammed Kasri, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco
  • Marouane Birjali, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco
  • Mohamed Nabil, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco
  • Abderrahim Beni-Hssane, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco
  • Anas El-Ansari, Mohammed First University, MASI Laboratory, Nador, Morocco
  • Mohamed El Fissaoui, Mohammed First University, MASI Laboratory, Nador, Morocco

DOI:

https://doi.org/10.13052/jicts2245-800X.1031

Keywords:

Sentiment embeddings, Sentiment analysis, Word embeddings, Sentiment lexicon, Deep learning

Abstract

Solving Natural Language Processing problems with deep learning models generally requires pre-trained distributed word representations. However, distributed representations usually rely on contextual information alone, which prevents them from capturing all the important characteristics of a word. Sentiment analysis suffers from this limitation because sentiment information is ignored while word embeddings are learned. As a result, two words with similar vectors may have opposite sentiment orientations, which can degrade the performance of sentiment analysis. The present paper introduces a novel model called Continuous Sentiment Contextualized Vectors (CSCV) to address this problem. The proposed model learns a word's sentiment embedding from its surrounding context words. It uses the Continuous Bag-of-Words (CBOW) model to handle the context and sentiment lexicons to identify sentiment. Existing pre-trained vectors are then combined with the obtained sentiment vectors using Principal Component Analysis (PCA) to enhance their quality. The experiments show that: (1) CSCV vectors can be used to enhance any pre-trained word vectors; (2) the resulting vectors strongly alleviate the problem of similar words with opposite polarities; (3) the performance of sentiment classification is improved by applying this approach.
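
The combination step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each word's pre-trained vector is concatenated with its sentiment vector and that the joint matrix is then projected onto its leading principal components; the abstract does not spell out the exact procedure, so this is one plausible reading rather than the authors' implementation. The function name `combine_with_pca`, the toy vectors, and the output dimension are all hypothetical.

```python
import numpy as np

def combine_with_pca(pretrained, sentiment, out_dim):
    """Concatenate two embedding matrices word by word, then project the
    result onto its top `out_dim` principal components (assumed scheme)."""
    if pretrained.shape[0] != sentiment.shape[0]:
        raise ValueError("both matrices need one row per vocabulary word")
    joint = np.hstack([pretrained, sentiment])
    centered = joint - joint.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

# Toy data, made up for illustration: 4 words with 3-d "pre-trained"
# vectors and 2-d "sentiment" vectors.
vocab = ["good", "bad", "great", "awful"]
pretrained = np.array([[0.9, 0.1, 0.3],
                       [0.8, 0.2, 0.3],
                       [0.9, 0.0, 0.4],
                       [0.7, 0.3, 0.2]])
sentiment = np.array([[1.0, 0.0],
                      [-1.0, 0.1],
                      [0.9, 0.1],
                      [-0.9, 0.0]])

refined = combine_with_pca(pretrained, sentiment, out_dim=3)
print(refined.shape)  # (4, 3)
```

In this toy setup the sentiment columns carry most of the variance, so the first principal component separates "good" from "bad" even though their made-up pre-trained vectors are nearly identical. Real embeddings would need the two vector sets scaled comparably before PCA, which this sketch does not attempt.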

Author Biographies

Mohammed Kasri, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco

Mohammed Kasri is a PhD student. He received a bachelor’s degree in computer science from Ibn Tofail University in 2011 and a master’s degree in computer science from Chouaib Doukkali University in 2014. His fields of interest include Sentiment Analysis, Machine Learning, Big Data and Programming Languages.

Marouane Birjali, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco

Marouane Birjali received his Ph.D. degree in computer science from the Faculty of Sciences, Chouaïb Doukkali University, El Jadida, in 2019. He is currently a researcher in the same faculty and also works as an IT engineer. His research interests include Big Data, AI and Sentiment Analysis.

Mohamed Nabil, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco

Mohamed Nabil received a B.Sc. degree in Computer Sciences from Hassan 1st University, Faculty of Sciences and Techniques of Settat, Morocco, in 2001, and an M.Sc. degree in engineering decision from the same faculty in 2008. He was a professor of Computer Sciences in high school from 2002 to 2019 and has been an Assistant Professor at the Faculty of Sciences of El Jadida, Morocco, since 2019. He is a member of LaROSERI at the Faculty of Sciences of El Jadida. His current research interests are Vehicular Ad hoc Networks (Security and QoS), Simulation Network Performance, Network Protocols and Analysis of Quality of Service in Next Generation Networks, Natural Language Processing, and Game Theory.

Abderrahim Beni-Hssane, Computer Science Department, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco

Abderrahim Beni-Hssane received his Ph.D. degree in computer science from Mohamed V University, Rabat, Morocco, in 1997. Since September 1994, he has been a Researcher and a Professor at the Science Faculty, Chouaib Doukkali University, El Jadida, Morocco. His research interests include Performance evaluation in wireless networks, Cryptography, Sentiment Analysis, Cloud Computing, and Big Data.

Anas El-Ansari, Mohammed First University, MASI Laboratory, Nador, Morocco

Anas El-Ansari received his Ph.D. degree in computer science from the Faculty of Sciences, Chouaïb Doukkali University, El Jadida. He is currently a researcher and professor at the Polydisciplinary Faculty of Nador, Mohammed First University, Morocco. His research interests include Recommender Systems, Cryptography, Privacy, Sentiment Analysis and the Semantic Web.

Mohamed El Fissaoui, Mohammed First University, MASI Laboratory, Nador, Morocco

Mohamed El Fissaoui is a researcher and professor at the High School of Technology of Nador, Mohammed First University, Oujda, Morocco. He received a master’s degree in computer science and a Ph.D. degree in Computer Sciences from the Faculty of Sciences, Chouaïb Doukkali University, El Jadida, Morocco. His current interests include developing specification and design techniques for use within intelligent networks, data mining, big data, information retrieval, mobile agents, and VANETs. He is also a member of the MASI laboratory at FPN, Nador.

Published

2022-08-10

How to Cite

Kasri, M., Birjali, M., Nabil, M., Beni-Hssane, A., El-Ansari, A., & El Fissaoui, M. (2022). Refining Word Embeddings with Sentiment Information for Sentiment Analysis. Journal of ICT Standardization, 10(03), 353–382. https://doi.org/10.13052/jicts2245-800X.1031

Section

Intelligent Systems for Smart Applications
