Natural Language Processing: Classification of Web Texts Combined with Deep Learning
DOI:
https://doi.org/10.13052/jicts2245-800X.1312
Keywords:
Natural language processing, deep learning, web text, text classification
Abstract
As the volume of web text grows, classifying it has become an important task. This paper first analyzes word vector representation methods and selects bidirectional encoder representations from transformers (BERT) to extract word vectors. A bidirectional gated recurrent unit (BiGRU), a convolutional neural network (CNN), and an attention mechanism are then combined to capture the contextual and local features of the text. Experiments were carried out on the THUCNews dataset. In a comparison of word-to-vector (Word2vec), GloVe, and BERT, BERT gave the best classification results. Across the different text categories, the BERT-BGCA method reached an average accuracy of 0.9521 and an average F1 value of 0.9436, outperforming other deep learning methods such as TextCNN. The results suggest that the BERT-BGCA method is effective for classifying web texts and can be applied in practice.
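The abstract reports an average accuracy and an average F1 value across text categories but does not say how the averages are formed. As a minimal sketch, assuming the F1 value is a macro average (per-category F1 scores averaged with equal weight) and accuracy is the share of correctly labelled texts, the two figures could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with illustrative category names (not results from the paper).
y_true = ["sports", "sports", "finance", "finance", "games", "games"]
y_pred = ["sports", "finance", "finance", "finance", "games", "sports"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4), round(accuracy(y_true, y_pred), 4))
```

A macro average weights every category equally regardless of how many texts it contains; if the paper instead used a micro or weighted average, the reported figures would be computed differently.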