Sentiment Analysis of Online Reviews: A Machine Learning Based Approach with TF-IDF Vectorization
DOI:
https://doi.org/10.13052/jmm1550-4646.2055
Keywords:
Machine learning (ML), natural language processing (NLP), sentiment analysis, online review, tourism, support vector machine (SVM)
Abstract
Online reviews wield considerable influence over consumer decision-making. Surveys show that 84% of people consider online reviews as trustworthy as recommendations from personal connections. Online reviews of services or destinations can therefore significantly benefit the tourism industry. The primary aim of this study is to leverage Machine Learning (ML) and Natural Language Processing (NLP) for sentiment analysis of hotel reviews in Jordan, in order to assist both hotel owners and tourists. We propose an ML-based approach that uses Support Vector Machine (SVM) and Term Frequency-Inverse Document Frequency (TF-IDF) to classify hotel reviews as positive or negative. Our experiments were performed on our own real-world dataset, “JOHotelRating”, which was gathered in the Jordanian context. In the feature extraction stage, we used the TF-IDF method. In the classification stage, we compared several algorithms: Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Decision Tree (DT), and Random Forest (RF). SVM with TF-IDF feature extraction emerged as the best performer, achieving 97% accuracy in sentiment classification. The proposed approach offers hotel owners a time-saving way to identify positive and negative reviews, allowing them to understand trends and enhance the overall customer experience. For tourists, the study addresses the challenge of digesting large numbers of reviews by providing sentiment analysis, ultimately helping them make better-informed decisions when selecting a hotel in Jordan.
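To illustrate the feature extraction stage described above, the following is a minimal sketch of TF-IDF weighting in plain Python. It is not the authors' implementation: the tokenizer, the toy reviews, and the function names (`tokenize`, `fit_idf`, `tfidf_vector`) are hypothetical, and the smoothed IDF form with L2 normalisation is an assumption chosen because it matches the default behaviour of scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Return {term: idf} using the smoothed form ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """Return an L2-normalised {term: weight} vector; unseen terms are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical toy reviews, not taken from the JOHotelRating dataset.
reviews = [
    "the room was clean and the staff were friendly",
    "the room was dirty and the staff were rude",
    "friendly staff and a clean quiet room",
]

idf = fit_idf(reviews)
vectors = [tfidf_vector(r, idf) for r in reviews]

# Terms that occur in every review ("room") get the minimum IDF of 1.0,
# while rarer, more discriminative terms ("dirty") are weighted higher.
print(round(idf["room"], 3), round(idf["dirty"], 3))
```

In a full pipeline, sparse vectors of this kind would be passed to a classifier such as a linear SVM; the point of the weighting is that words shared by most reviews contribute little, while words concentrated in a few reviews dominate the representation.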



