SENTIMENT CLASSIFICATION OF ARABIC TWEETS: A SUPERVISED APPROACH
Keywords:
Sentiment Analysis, opinion mining, Arabic, Twitter, Machine Learning, Supervised Approach
Abstract
Social media platforms have proven to be a powerful channel for sharing opinions. Mining and analyzing these opinions therefore plays an important role in decision-making and product benchmarking. However, manually processing the huge amount of content that these web-based applications host is an arduous task. This has led to the emergence of a new field of research known as Sentiment Analysis. Our objective in this work is to investigate sentiment classification of Arabic tweets using machine learning. Three classifiers, namely Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor, were evaluated on an in-house dataset using different features. The comparison revealed that the Support Vector Machine outperforms the other classifiers, achieving an accuracy of 78%.
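To make the evaluation setup concrete, the following is a minimal sketch of comparing classifiers by accuracy on a held-out set. It is not the implementation used in this work: the data are a handful of invented English sentences standing in for labelled tweets, the features are plain bag-of-words counts, and only two of the three classifiers (Naïve Bayes and K-Nearest Neighbor) are shown, written in pure Python for self-containment. All function names are hypothetical.

```python
import math
from collections import Counter

# Toy, invented data standing in for labelled tweets (assumption: not the
# in-house dataset described in the abstract).
TRAIN = [
    ("i love this phone", "pos"),
    ("great service and great food", "pos"),
    ("what a wonderful day", "pos"),
    ("i hate this phone", "neg"),
    ("terrible service and bad food", "neg"),
    ("what an awful day", "neg"),
]
TEST = [
    ("i love the food", "pos"),
    ("wonderful service", "pos"),
    ("awful phone", "neg"),
    ("i hate bad food", "neg"),
]


def tokenize(text):
    """Whitespace tokenizer; real Arabic tweets would need proper preprocessing."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_naive_bayes(model, text):
    """Multinomial Naive Bayes with add-one smoothing, scored in log space."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_knn(examples, text, k=3):
    """Majority vote among the k training examples most similar to the query."""
    query = Counter(tokenize(text))
    ranked = sorted(examples, key=lambda ex: cosine(query, Counter(tokenize(ex[0]))), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test examples whose predicted label matches the gold label."""
    return sum(1 for text, label in test if predict(text) == label) / len(test)


nb_model = train_naive_bayes(TRAIN)
nb_acc = accuracy(lambda t: predict_naive_bayes(nb_model, t), TEST)
knn_acc = accuracy(lambda t: predict_knn(TRAIN, t), TEST)
print(f"Naive Bayes accuracy: {nb_acc:.2f}")
print(f"k-NN accuracy:        {knn_acc:.2f}")
```

The accuracies printed for this toy data say nothing about the 78% reported above; the sketch only shows the shape of the comparison, in which each classifier is trained on the same labelled examples and scored on the same held-out set.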