• NAAIMA BOUDAD, ENSIAS, Mohammed V University, Rabat
• RDOUAN FAIZI, ENSIAS, Mohammed V University, Rabat
• RACHID OULAD HAJ THAMI, ENSIAS, Mohammed V University, Rabat
• RADDOUANE CHIHEB, ENSIAS, Mohammed V University, Rabat


Sentiment Analysis, Opinion Mining, Arabic, Twitter, Machine Learning, Supervised Approach


Social media platforms have proven to be a powerful medium for sharing opinions. Mining and analyzing these opinions therefore plays an important role in decision-making and product benchmarking. However, manually processing the huge amount of content that these web-based applications host is an arduous task, which has led to the emergence of a field of research known as Sentiment Analysis. In this work, we investigate sentiment classification of Arabic tweets using machine learning. Three classifiers, namely Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor, were evaluated on an in-house dataset using different features. A comparison of these classifiers shows that Support Vector Machine outperforms the other two, achieving an accuracy of 78%.
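
As a rough illustration of the kind of classifier comparison summarized above, the following minimal sketch trains a multinomial Naïve Bayes model and a K-Nearest Neighbor classifier on bag-of-words features and reports their accuracy. Everything in it is an assumption made for illustration: the toy English sentences, the labels, the helper names (`train_nb`, `predict_nb`, `predict_knn`, `accuracy`), the cosine similarity measure, and `k=3` are hypothetical and are not the authors' dataset, features, or settings. The Support Vector Machine is left out because it needs a numerical optimizer; in practice all three classifiers would typically come from a machine learning library.

```python
import math
from collections import Counter

# Hypothetical toy data (tokens, label); NOT the dataset used in the paper.
TRAIN = [
    ("great phone love it".split(), "pos"),
    ("love this great service".split(), "pos"),
    ("happy with the fast delivery".split(), "pos"),
    ("terrible phone hate it".split(), "neg"),
    ("hate this slow service".split(), "neg"),
    ("angry about the late delivery".split(), "neg"),
]
TEST = [
    ("love the fast service".split(), "pos"),
    ("hate the late phone".split(), "neg"),
]


def train_nb(data):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    doc_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in doc_counts}
    for tokens, label in data:
        word_counts[label].update(tokens)
    vocab = {t for tokens, _ in data for t in tokens}
    return doc_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (Laplace-smoothed)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_knn(data, tokens, k=3):
    """Majority vote among the k training examples most similar to the query."""
    query = Counter(tokens)
    ranked = sorted(data, key=lambda ex: cosine(query, Counter(ex[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test examples whose predicted label matches the gold label."""
    correct = sum(predict(tokens) == label for tokens, label in test)
    return correct / len(test)


nb_model = train_nb(TRAIN)
nb_acc = accuracy(lambda toks: predict_nb(nb_model, toks), TEST)
knn_acc = accuracy(lambda toks: predict_knn(TRAIN, toks), TEST)
print(f"Naive Bayes accuracy: {nb_acc:.2f}")
print(f"k-NN accuracy:        {knn_acc:.2f}")
```

On real tweets the same loop would run over a held-out split of a labeled corpus, with Arabic-specific preprocessing and feature choices (for example unigrams versus n-grams) varied to see how each classifier responds.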




