Joint Representations of Texts and Labels with Compositional Loss for Short Text Classification


  • Ming Hao, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  • Weijing Wang, Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  • Fang Zhou, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China



Ambiguous text, deep language models, label embedding, text classification, triplet loss


Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts are ambiguous and hard to classify, especially short texts in multi-class classification, whose context length is limited. Mainstream methods improve the distinction of ambiguous text by adding context information. However, these methods rely only on the text representation and ignore the fact that categories overlap and are not completely independent of each other. In this paper, we establish a new general method to solve the problem of ambiguous text classification by introducing a label embedding to represent each category, which makes the differences between categories measurable. Further, a new compositional loss function is proposed to train the model, which pulls the text representation closer to the ground-truth label and pushes it farther away from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. Experiments show that our method can effectively improve the classification accuracy of ambiguous texts. In addition, by combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
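
The three ingredients named in the abstract (label embeddings, a compositional loss, and a similarity constraint on the output layer) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: cosine distance as the metric, a hinge-style triplet term summed over the non-gold labels, cross-entropy plus a weighted triplet term as the combined loss, and the function names and the `margin`, `weight`, and `alpha` parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def triplet_term(text_vec, label_vecs, gold, margin=0.5):
    """Hinge penalty (assumed form): the text should be closer to its
    gold label embedding than to every other label by at least `margin`."""
    pos_dist = 1.0 - cosine(text_vec, label_vecs[gold])
    penalty = 0.0
    for k, label_vec in enumerate(label_vecs):
        if k == gold:
            continue
        neg_dist = 1.0 - cosine(text_vec, label_vec)
        penalty += max(0.0, margin + pos_dist - neg_dist)
    return penalty


def compositional_loss(logits, text_vec, label_vecs, gold,
                       weight=1.0, margin=0.5):
    """Cross-entropy plus a weighted triplet term (assumed combination)."""
    cross_entropy = -math.log(softmax(logits)[gold])
    return cross_entropy + weight * triplet_term(
        text_vec, label_vecs, gold, margin)


def constrained_probs(logits, text_vec, label_vecs, alpha=1.0):
    """Add text-label similarities to the logits before the softmax
    (one assumed way to apply the constraint at the output layer)."""
    sims = [cosine(text_vec, label_vec) for label_vec in label_vecs]
    return softmax([z + alpha * s for z, s in zip(logits, sims)])
```

In this sketch, a text whose classifier logits narrowly favour the wrong class can still be assigned to the right one when its representation lies close to the correct label embedding, which is the kind of correction the abstract describes for ambiguous texts.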



Author Biographies

Ming Hao, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China

Ming Hao is a Ph.D. student at the University of Science and Technology Beijing, China. He attended the Taiyuan University of Technology, China, where he received his B.Sc. in Software Engineering in 2013, and began to study for an M.Sc. in Software Engineering at the University of Science and Technology Beijing, China, in 2014. From 2015 to 2019, Ming worked for the Institute of Automation, Chinese Academy of Sciences, engaged in algorithm research in the field of natural language processing, and from 2019 to 2020, he was a visiting scholar with the Department of Computer Science, University of Illinois at Urbana-Champaign, USA. His research interests include machine learning, natural language processing, and reinforcement learning.

Weijing Wang, Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Weijing Wang has been a Ph.D. student at the University of Illinois at Urbana-Champaign since fall 2019. She attended the Hubei University of Chinese Medicine and received her B.S. there in 2019. She is currently completing a doctorate in Bioengineering at the University of Illinois at Urbana-Champaign, where she also received her Master of Engineering degree. Her Ph.D. work centers on smartphone-based point-of-care devices and micro- and nano-based research.

Fang Zhou, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China

Fang Zhou received the B.Sc., M.Sc., and Ph.D. degrees in computer science from the University of Science and Technology Beijing, China, in 1995, 2002, and 2012, respectively. From 2015 to 2016, she was a Visiting Researcher with the Department of Computer and Information Sciences, Temple University, USA. She is currently an Associate Professor with the Department of Computer Science and Technology, University of Science and Technology Beijing. Her research interests include machine learning, information retrieval, and computer vision.





How to Cite

Hao, M., Wang, W., & Zhou, F. (2021). Joint Representations of Texts and Labels with Compositional Loss for Short Text Classification. Journal of Web Engineering, 20(3), 669–688.


