Contextualized Satire Detection in Short Texts Using Deep Learning Techniques

Authors

  • Ashraf Kamal, PayPal, Chennai–600119, India
  • Muhammad Abulaish, Department of Computer Science, South Asian University, New Delhi–110068, India
  • Jahiruddin, Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi–110025, India

DOI:

https://doi.org/10.13052/jwe1540-9589.2312

Keywords:

Information retrieval, online social media, figurative language detection, satire detection, deep learning

Abstract

Satire is prominent in user-generated content on online platforms, appearing in satirical news, customer reviews, blogs, articles, and short messages that are typically informal. Because satire is also used to disseminate false information on the Internet, its computational detection has become a well-recognized problem. Existing work focuses primarily on formal document- or sentence-level text, whereas informal short texts have received less attention. This paper presents a new model called BiLSTM self-attention (BiSAT) for detecting satire in informal short texts. Its components include an input layer, an embedding layer, a self-attention layer, and two bi-directional long short-term memory (BiLSTM) layers, which together learn the contextual information that signals satire in a text. The input layer converts the text into an input vector, which the embedding layer maps to the corresponding numeric vector. The output of the embedding layer is passed to the first BiLSTM layer, which extracts contextual sequence information in both the forward and backward directions. Between the first and second BiLSTM layers, a self-attention layer draws attention to the important satirical information captured by the hidden layer of the first BiLSTM. The BiSAT model also takes a classic feature engineering approach, employing a 13-dimensional auxiliary feature vector composed of features from four categories: sentiment, punctuation, hyperbole, and affective. The proposed BiSAT model is empirically evaluated on two benchmark datasets and a newly created dataset called Satire-280. It outperforms existing approaches and baseline methods by a significant margin. The Satire-280 dataset and code can be downloaded from the GitHub repository: https://github.com/Ashraf-Kamal/Satire-Detection.
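
The attention step between the two BiLSTM layers, and the use of a hand-crafted auxiliary vector, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. It assumes a scaled dot-product form of self-attention, which may differ from the variant used in the paper. The hidden states are supplied as fixed numbers rather than produced by a trained BiLSTM. The helper names (`self_attention`, `punctuation_features`) and the three punctuation counts are hypothetical stand-ins for the 13 features, which the abstract does not enumerate. Where the auxiliary vector joins the network is also an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(hidden_states):
    """Scaled dot-product self-attention (an assumed variant).

    Each output vector is a weighted average of all hidden states; the
    weights are the softmax of scaled dot products with the query state.
    """
    dim = len(hidden_states[0])
    scale = math.sqrt(dim)
    attended = []
    for query in hidden_states:
        weights = softmax([dot(query, key) / scale for key in hidden_states])
        context = [
            sum(w * state[i] for w, state in zip(weights, hidden_states))
            for i in range(dim)
        ]
        attended.append(context)
    return attended


def punctuation_features(text):
    """Hypothetical punctuation counts; not the paper's actual feature set."""
    return [float(text.count("!")), float(text.count("?")), float(text.count("..."))]


# Stand-in for the first BiLSTM's hidden states: one vector per token.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(hidden)

# Assumption: mean-pool the attended states, then append the auxiliary features.
pooled = [sum(col) / len(attended) for col in zip(*attended)]
combined = pooled + punctuation_features("Oh great, another Monday!!! Really?")
```

In a full model, the attended sequence would feed the second BiLSTM layer rather than being pooled directly. The pooling here only keeps the sketch short enough to show how a learned representation and a hand-crafted feature vector can be joined.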

Author Biographies

Ashraf Kamal, PayPal, Chennai–600119, India

Ashraf Kamal received his Ph.D. degree in Computer Science from Jamia Millia Islamia (A Central University), New Delhi, India in 2021. He is currently a Machine Learning Engineer at PayPal, Chennai, India. He qualified for the UGC-NET in 2014. His research interests include text mining, machine learning, and information retrieval. He received the Visvesvaraya Ph.D. Fellowship from the Ministry of Electronics and Information Technology, Government of India to pursue his Ph.D. work. He has published over 10 research papers in reputed journals and conference proceedings, including two in IEEE/ACM Transactions.

Muhammad Abulaish, Department of Computer Science, South Asian University, New Delhi–110068, India

Muhammad Abulaish (Senior Member, IEEE) received his Ph.D. degree in Computer Science from the Indian Institute of Technology (IIT) Delhi in 2007. He is a Full Professor in the Department of Computer Science, South Asian University, New Delhi, India. His research interests include data analytics and mining, social computing, machine learning, and data-driven cyber security. He has published over 139 research articles in international journals, books, and conference proceedings, including seven in IEEE/ACM Transactions. He is an Associate Editor for the Social Network Analysis and Mining journal. He served as a Senior Program Committee member for CIKM’22. He frequently serves on the Program Committees of prestigious international conferences such as SDM, CIKM, IJCAI-ECAI, PAKDD, Web Intelligence, and BIOKDD. He has also served as Publicity Co-chair for WI’19 and WI’20, and as Workshop Co-chair for ASONAM’20. He is a member of the editorial board of, and a reviewer for, numerous reputable journals. He holds senior memberships with IEEE, ACM, and CSI, and is a lifetime member of ISTE, IETE, and ISCA.

Jahiruddin, Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi–110025, India

Jahiruddin received his Ph.D. degree in Computer Science from Jamia Millia Islamia (A Central University), New Delhi, India in 2012. He is currently a Full Professor at the Department of Computer Science, Jamia Millia Islamia. His research interests include text mining, computational biology, and social network analysis. He has published over 20 research papers in various reputed journals and conference proceedings.

Published

2024-03-27

How to Cite

Kamal, A., Abulaish, M., & Jahiruddin. (2024). Contextualized Satire Detection in Short Texts Using Deep Learning Techniques. Journal of Web Engineering, 23(01), 27–52. https://doi.org/10.13052/jwe1540-9589.2312
