Sentiment Analysis of Online Reviews: A Machine Learning Based Approach with TF-IDF Vectorization

Authors

  • Khalid Alemerien, Information Technology Department, Tafila Technical University, Tafila, Jordan
  • Aram Al-Ghareeb, Deanship of Scientific Research and Graduate Studies, Tafila Technical University, Tafila, Jordan
  • Malek Zakarya Alksasbeh, Faculty of Information Technology, Al-Hussein Bin Talal University, Jordan

DOI:

https://doi.org/10.13052/jmm1550-4646.2055

Keywords:

Machine learning (ML), natural language processing (NLP), sentiment analysis, online review, tourism, support vector machine (SVM)

Abstract

Online reviews now wield considerable influence over consumer decision-making. Surveys show that 84% of people trust online reviews as much as recommendations from personal connections. Online reviews of services or destinations can significantly benefit the tourism industry. Therefore, the primary aim of this study is to leverage Machine Learning (ML) and Natural Language Processing (NLP) for sentiment analysis of hotel reviews in Jordan, in order to assist both hotel owners and tourists. We propose an ML-based approach that uses Support Vector Machine (SVM) and TF-IDF to classify hotel reviews as positive or negative. Our experiments were performed on our own real-world dataset, “JOHotelRating”, which was gathered in the Jordanian context. In the feature extraction stage, we used the Term Frequency-Inverse Document Frequency (TF-IDF) method. In the classification phase, we evaluated several algorithms: Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Decision Tree (DT), and Random Forest (RF). SVM with TF-IDF features emerged as the best performer, achieving 97% accuracy in sentiment classification. The proposed approach offers hotel owners a time-saving way to identify positive and negative reviews, allowing them to understand trends and enhance the overall customer experience. For tourists, the study addresses the challenge of comprehending numerous reviews by providing sentiment analysis, ultimately helping them make better-informed decisions when selecting a hotel in Jordan.
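The abstract does not give the implementation, hyperparameters, or data behind the TF-IDF and SVM pipeline, so the following is only a minimal sketch under stated assumptions. It uses the Python standard library alone, a smoothed IDF variant (one common choice, not necessarily the authors'), a linear SVM trained by subgradient descent on the hinge loss, and six invented toy reviews standing in for the JOHotelRating dataset. All function names and parameter values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy reviews (+1 = positive, -1 = negative); NOT the JOHotelRating data.
REVIEWS = [
    ("the room was clean and the staff were friendly", 1),
    ("great location and excellent breakfast", 1),
    ("friendly staff and a clean comfortable room", 1),
    ("the room was dirty and the staff were rude", -1),
    ("terrible breakfast and a noisy room", -1),
    ("rude staff and a dirty bathroom", -1),
]

def tokenize(text):
    """Lowercase whitespace tokenizer; real preprocessing would do more."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def score(x, w, b):
    """Signed distance-like score of a sparse vector under the linear model."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss (assumed settings)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * score(x, w, b)
            for t in w:  # gradient step on the L2 penalty (weight decay)
                w[t] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active: move towards this example
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * label * v
                b += lr * label
    return w, b

texts = [text for text, _ in REVIEWS]
labels = [label for _, label in REVIEWS]
idf = fit_idf(texts)
X = [tfidf(text, idf) for text in texts]
w, b = train_linear_svm(X, labels)

def predict(text):
    """Return +1 (positive) or -1 (negative) for a new review."""
    return 1 if score(tfidf(text, idf), w, b) >= 0 else -1

print(predict("clean room and friendly staff"))
print(predict("dirty room and rude staff"))
```

In practice a library implementation of TF-IDF and SVM would likely replace these hand-written functions, and the other classifiers named in the abstract (MNB, BNB, DT, RF) could be trained on the same vectors for comparison. The toy data here is far too small to say anything about the reported accuracy.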

Author Biographies

Khalid Alemerien, Information Technology Department, Tafila Technical University, Tafila, Jordan

Khalid Alemerien is an associate professor of Software Engineering/Computer Science in the Information Technology Department of the College of ICT, Tafila Technical University (TTU), Jordan. Dr. Alemerien received his master's and Ph.D. degrees in Software Engineering/Computer Science from North Dakota State University, USA, in 2013 and 2014, respectively. His research interests focus on Software Engineering, Usable Security and Privacy, Machine Learning and Deep Learning, Informatics, E-learning, and Privacy and Security in IoT-based Systems. Dr. Alemerien has published numerous research papers in prestigious journals and conference proceedings.

Aram Al-Ghareeb, Deanship of Scientific Research and Graduate Studies, Tafila Technical University, Tafila, Jordan

Aram Al-Ghareeb holds a bachelor's degree in computer science. She is currently pursuing a master's degree in Cyber Physical Systems (CPSs) at Tafila Technical University, Jordan. Her research interests focus on CPSs, IoT-based Systems, Machine Learning, and Computer Education.

Malek Zakarya Alksasbeh, Faculty of Information Technology, Al-Hussein Bin Talal University, Jordan

Malek Zakarya Alksasbeh is currently a Full Professor in the Faculty of Information Technology at Al-Hussein Bin Talal University, Ma’an, Jordan. He received his B.S. degree in Computer Science from Mu’tah University, Jordan, in 2005, and his M.S. and Ph.D. degrees in Information Technology from Universiti Utara Malaysia (UUM), Malaysia, in 2008 and 2012, respectively. His research interests are in the areas of Smart Systems, Information Retrieval, Deep Learning, and Instructional Technology.

Published

2024-12-20

How to Cite

Alemerien, K., Al-Ghareeb, A., & Alksasbeh, M. Z. (2024). Sentiment Analysis of Online Reviews: A Machine Learning Based Approach with TF-IDF Vectorization. Journal of Mobile Multimedia, 20(05), 1089–1116. https://doi.org/10.13052/jmm1550-4646.2055

Section

Articles