Hikmat Ullah Khan, COMSATS Institute of Information Technology, Wah, Pakistan


Opinion Mining, Web Forum, Supervised Learning, Mixed-Sentiment, Feature analysis


Sentiment detection has become an active research domain in recent years owing to the increased availability of public views and opinions in social web forums. Earlier works detect sentiment arousal and valence using a lexicon or dictionary. This paper aims to classify post content in social web forums by identifying mixed-sentiment views, that is, posts in which users' views carry both positive and negative emotions. Identifying mixed-sentiment content has several potential applications, such as monitoring public views, making product-related business decisions, and predicting users' behavior. I propose a non-lexical feature set and compare it with the conventional lexicon-based sentiment feature set. Four state-of-the-art classification algorithms, applied to a large dataset from a public forum, verify that the proposed non-lexical features help to find mixed-sentiment content in online forums. The main contribution is the proposal and validation of features that do not need a lexicon. In addition, a comprehensive analysis of the dataset has been carried out using power-law analysis. The features have been ranked according to their significance in the classification model for identifying mixed-sentiment content in social web forums.
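To make the notion of a mixed-sentiment post concrete, the following is a minimal sketch of a lexicon-based check of the kind the conventional baseline relies on. It is an illustration only: the word lists `POSITIVE` and `NEGATIVE`, the helper names `lexicon_counts` and `is_mixed_sentiment`, and the sample posts are all assumptions made for this example, not the lexicon, features, or data used in the paper.

```python
import re

# Toy lexicon for illustration only (assumed); not the lexicon used in the paper.
POSITIVE = {"good", "great", "love", "helpful", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "broken", "terrible"}


def lexicon_counts(post):
    """Return (positive_hits, negative_hits) for one forum post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos, neg


def is_mixed_sentiment(post):
    """Treat a post as mixed when it has at least one positive and one negative hit."""
    pos, neg = lexicon_counts(post)
    return pos > 0 and neg > 0


# Hypothetical sample posts.
posts = [
    "I love the screen but the battery is terrible.",
    "Great phone, excellent camera.",
    "Where can I download the manual?",
]
labels = [is_mixed_sentiment(p) for p in posts]
```

A check like this depends entirely on lexicon coverage, which is the limitation that motivates comparing it against features that need no lexicon.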



