Machine Learning and Semantic Orientation Ensemble Methods for Egyptian Telecom Tweets Sentiment Analysis

Authors

  • Amira Shoukry Department of Computer Science and Engineering, The American University in Cairo (AUC), Cairo, Egypt
  • Ahmed Rafea Department of Computer Science and Engineering, The American University in Cairo (AUC), Cairo, Egypt

DOI:

https://doi.org/10.13052/jwe1540-9589.1924

Keywords:

Arabic sentiment analysis, lexicon-based sentiment analysis, Egyptian dialect, Arabic opinion mining, ensemble learning

Abstract

The vast amount of data currently available online has attracted many parties to analyze the sentiments expressed in these data and extract valuable knowledge. Many approaches have been proposed to classify posted content using a single classifier. However, it has been shown that ensemble learning, which combines multiple classifiers, can enhance classification performance. The aim of this study is to improve sentiment classification for Egyptian Arabic by combining different classification algorithms. First, we investigated the benefit of combining multiple semantic orientation (SO) classifiers built from different subsets of the SATALex Egyptian lexicon. Second, we investigated the benefit of combining three classification algorithms adopted as base classifiers: Naïve Bayes, Maximum Entropy, and Support Vector Machines. The experimental results show that combining classifiers can effectively improve the accuracy of sentiment classification on the Egyptian dataset. However, building these ensembles requires more processing time than the individual classifiers. The time needed depends on the number of classifiers and on the method used to combine them: the more classifiers used, the more time is needed.
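The idea of combining several base classifiers can be illustrated with a minimal sketch. The abstract does not state which combination method the study uses, so majority voting is assumed here as one common choice. The helper names (`make_lexicon_classifier`, `majority_vote`, `ensemble_predict`), the toy English word lists, and the three-way label set are hypothetical and are not taken from the paper or from SATALex.

```python
from collections import Counter
from typing import Callable, Sequence, Set

# A classifier is modeled as any function mapping a tweet to a label.
Classifier = Callable[[str], str]


def make_lexicon_classifier(positive: Set[str], negative: Set[str]) -> Classifier:
    """Build a toy semantic-orientation classifier from one lexicon subset.

    Assumption: the score is (#positive words) - (#negative words), and a
    score of zero is labeled "neutral". The paper may score differently.
    """
    def classify(tweet: str) -> str:
        tokens = tweet.lower().split()
        score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"
    return classify


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("majority_vote needs at least one label")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def ensemble_predict(classifiers: Sequence[Classifier], tweet: str) -> str:
    """Run every base classifier on the tweet and combine by majority vote."""
    return majority_vote([clf(tweet) for clf in classifiers])


# Three toy classifiers, each using a different (hypothetical) lexicon subset.
clf_a = make_lexicon_classifier({"good", "fast"}, {"slow"})
clf_b = make_lexicon_classifier({"good"}, {"slow", "bad"})
clf_c = make_lexicon_classifier({"great"}, {"bad"})

print(ensemble_predict([clf_a, clf_b, clf_c], "good but slow and bad"))
```

Each call to `ensemble_predict` runs every base classifier once, so the prediction cost grows with the number of classifiers, which is consistent with the time trade-off the abstract reports.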

Author Biographies

Amira Shoukry, Department of Computer Science and Engineering, The American University in Cairo (AUC), Cairo, Egypt

Amira Shoukry attended the American University in Cairo (AUC), Egypt, where she received her B.Sc. degree in Computer Engineering in 2010 and her M.Sc. degree in Computer Science in 2013. She was named to the AUC Dean’s List of Honors in spring 2012. Her two main publications are “Preprocessing Egyptian Dialect Tweets for Sentiment Mining” and “Sentence-level Arabic Sentiment Analysis”. Amira has held various senior testing and software quality engineering positions at IBM Technologies since 2013. As a software testing professional, she has acquired solid experience in quality control for web, desktop, and mobile applications. She currently works as a test automation manager at IBM, leading some of its major projects. Her current research interests are data and knowledge mining and pattern recognition.

Ahmed Rafea, Department of Computer Science and Engineering, The American University in Cairo (AUC), Cairo, Egypt

Ahmed Rafea received his PhD from Paul Sabatier University in Toulouse, France. He is a Computer Science Professor and Ex-Chair of the Computer Science and Engineering Department at the American University in Cairo. He served as the Chair of the Computer Science Department and Vice Dean at the Faculty of Computers and Information, Cairo University. He also served as a Visiting Professor at San Diego State University and National University in the United States. Dr. Rafea has led many projects aiming at using Artificial Intelligence and Expert Systems Technologies for the development of the Agriculture sector in Egypt. Dr. Rafea was the principal investigator of several projects for developing Intelligent Systems, Machine Translation, and Social Media Mining in collaboration with European and American Universities. Dr. Rafea’s research interests are Data, Text and Web Mining, Natural Language Processing and Machine Translation, Knowledge Engineering and Knowledge Based System Development. Dr. Rafea has authored over 200 scientific papers in International and National Journals, Conference Proceedings and Book chapters.

References

Abbasi, A., Chen, H. and Salem, A., “Sentiment Analysis in Multiple Languages: Feature selection for opinion classification in Web forums,” ACM Transactions on Information Systems (TOIS), v. 26, no. 3, pp. 12, 2008.

Augustyniak, Łukasz; Szymański, Piotr; Kajdanowicz, Tomasz; Tuligłowicz, Włodzimierz; Alhajj, Reda; Szymanski, Boleslaw and Kazienko, Przemysław. (2014). “Simpler is better? Lexicon-based ensemble sentiment classification beats supervised methods”. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), Beijing, China, 17–20 August 2014; pp. 924–929.

Augustyniak, Łukasz; Szymański, Piotr; Kajdanowicz, Tomasz; and Kazienko, Przemysław. (2016). “Fast and accurate - improving lexicon-based sentiment classification with an Ensemble Methods”. Conference: 8th Asian Conference on Intelligent Information and Database Systems, 14–16.

Catal, Cagatay; and Nangir, Mehmet. (2017). “A sentiment classification model based on multiple classifiers”. Applied Soft Computing, 50 (2017), pp. 135–14.

Deng, Lingjia, and Janyce Wiebe. “Sentiment Propagation via Implicature Constraints.” Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014, doi:10.3115/v1/e14-1040.

Hamilton, William L., et al. “Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora.” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, doi:10.18653/v1/d16-1057.

Hovy, Dirk. “Demographic Factors Improve Classification Performance.” Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, doi:10.3115/v1/p15-1073.

Medhat, Walaa; Hassan Yousef, Ahmed; and Mohamed, Hoda. (2014). “Sentiment Analysis Algorithms and Applications: A Survey”. Ain Shams Engineering Journal. 5. 10.1016/j.asej.2014.04.011.

Ohana, Bruno; Tierney, Brendan; and Delany, Sarah Jane. (2011). “Domain independent sentiment classification with many lexicons.” In 4th International Symposium on Mining and Web at 25th International Conference on Advanced Information Networking and Applications (AINA), pages 632–637. IEEE Computer Society. doi:10.1109/WAINA.2011.103

Oussous, Ahmed; Lahcen, Ayoub Ait; Belfkih, Samir. (2018). “Improving Sentiment Analysis of Moroccan Tweets Using Ensemble Learning”. In: Tabii Y., Lazaar M., Al Achhab M., Enneya N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham.

Shoukry, Amira; and Rafea, Ahmed. (2012). “Preprocessing Egyptian Dialect Tweets for Sentiment Mining”. In Proceedings of the Fourth Workshop on Computational Approaches to Arabic Script-Based Languages, pp. 47–56, San Diego, California, USA.

Shoukry, Amira; and Rafea, Ahmed (2019). “SATALex: Telecom Domain-specific Sentiment Lexicons for Egyptian and Gulf Arabic Dialects”. In Proceedings of the 15th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, 169–176, 2019, Vienna, Austria.

Yang, Yi and Jacob Eisenstein. “Putting Things in Context: Community-specific Embedding Projections for Sentiment Analysis.” CoRR abs/1511.06052 (2015).

Zhou, Zhi-Hua. (2012). “Ensemble Methods: Foundations and Algorithms.” CRC Press, Boca Raton (2012).

Published

2020-06-03

How to Cite

Shoukry, A., & Rafea, A. (2020). Machine Learning and Semantic Orientation Ensemble Methods for Egyptian Telecom Tweets Sentiment Analysis. Journal of Web Engineering, 19(2), 195–214. https://doi.org/10.13052/jwe1540-9589.1924

Section

SPECIAL ISSUE: Advanced Practices in Web Engineering 2020