Hybrid Top Features Extraction Model for Detecting X Rumor Events Using an Ensemble Method

Authors

  • Taukir Alam, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
  • Wei Chung Shia, Molecular Medicine Laboratory, Department of Research, Changhua Christian Hospital, Changhua 500, Taiwan; School of Big Data and Artificial Intelligence, Fujian Polytechnic Normal University, Fuqing 350300, China
  • Fang Rong Hsu, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
  • Taimoor Hassan, Institute of Translational Medicine & New Drug Development, China Medical University, Taichung, 404333, Taiwan
  • Pei-Chun Lin, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
  • Eric Odle, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
  • Junzo Watada, Waseda University, Tokyo, 169-8050, Japan

DOI:

https://doi.org/10.13052/jwe1540-9589.2414

Keywords:

Deep learning, ensemble, machine learning, RFC, RU, SMOTE, rumor detection, natural language processing (NLP)

Abstract

This paper describes a novel hybrid ensemble algorithm (HEA) that combines ensemble learning, class imbalance handling, and feature extraction. To address class imbalance in the dataset, the proposed approach integrates SMOTE oversampling and random undersampling (RU) with feature extraction. First, Pearson correlation analysis is used to detect highly correlated features in the dataset. This analysis aids in selecting the most relevant features, namely those that are either substantially related to the target variable or strongly associated with other features. The method seeks to improve classification performance by focusing on these correlated features. Next, the SMOTE and RU algorithms are used to balance the majority and minority classes. SMOTE (synthetic minority oversampling technique) generates synthetic instances of the minority class by interpolating between existing instances, which improves minority class representation. RU, on the other hand, removes instances from the majority class at random to obtain a balanced distribution. The key features identified by a random forest classifier (RFC) model are then fed into an ensemble of decision tree (DT), k-nearest neighbor (KNN), adaptive boosting (AdaBoost), and convolutional neural network (CNN) models. This ensemble combines the predictions of multiple models, exploiting their particular strengths and capturing varied patterns in the data. DT, KNN, AdaBoost, and CNN are popular machine learning algorithms, notable for their capacity to handle many types of data and capture complicated relationships. The evaluation results show that the proposed HEA approach is effective, with a maximum precision, recall, F-score, and accuracy of 90%. These encouraging results indicate that the methodology is applicable to a variety of classification problems.
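The three preprocessing ideas named in the abstract (Pearson correlation, SMOTE-style interpolation, and random undersampling) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the neighbour count `k`, and the toy data are assumptions made purely for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep n_keep majority samples chosen uniformly at random."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


# Toy data (hypothetical): 3 minority points and 9 majority points.
rng = random.Random(42)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[rng.uniform(3, 5), rng.uniform(3, 5)] for _ in range(9)]

# Meet in the middle: oversample the minority and undersample the majority.
target = 6
balanced_minority = minority + smote_like(minority, target - len(minority), rng=rng)
balanced_majority = random_undersample(majority, target, rng=rng)
print(len(balanced_minority), len(balanced_majority))
```

In practice, a feature would be retained when the absolute value of `pearson(feature_column, labels)` exceeds some threshold; the abstract does not state the threshold used, so none is assumed here.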

Author Biographies

Taukir Alam, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan

Taukir Alam received his M.Sc. in electrical engineering from National Kaohsiung Normal University (Taiwan). He has worked as a senior engineer with Delta Taiwan and is currently a Ph.D. candidate with the Faculty of Information Engineering and Computer Science, Feng Chia University (Taiwan). His research interests include machine learning, image processing, and GenAI, where he explores the possibilities of generative models such as variational autoencoders (VAEs) and generative adversarial networks (GANs). His work in this field attempts to push the limits of data augmentation, simulation for AI system training, and creative content development.

Wei Chung Shia, Molecular Medicine Laboratory, Department of Research, Changhua Christian Hospital, Changhua 500, Taiwan; School of Big Data and Artificial Intelligence, Fujian Polytechnic Normal University, Fuqing 350300, China

Wei Chung Shia, Ph.D., is the principal investigator of the Molecular Medicine Laboratory at Changhua Christian Hospital, Taiwan. He holds a Ph.D. in Information Engineering and Computer Science from Feng Chia University, Taiwan. He specializes mainly in breast cancer and related translational research, including molecular biology, genomics, and clinical research. His recent research interests are in medical AI, including machine-learning and deep-learning approaches to medical image analysis (breast ultrasound imaging and mammography) for predicting benign versus malignant status, prognosis, and chemotherapy response.

Fang Rong Hsu, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan

Fang Rong Hsu received his Ph.D. degree in Computer Science from the National Chiao-Tung University, Hsinchu, Taiwan in 1992. He is a professor in, and former chairperson of, the Department of Information Engineering and Computer Science at Feng Chia University, Taiwan. He was Chairperson and a Professor of the Department of Bioinformatics, Asia University (2003–2004), Chairperson and a Professor of the Department of Information Technology, Asia University (2002–2003), and an associate professor and a professor at Providence University (1994–2002). His current research interests include machine learning, bioinformatics, cloud computing, man-machine interaction, information security, and graph algorithms.

Taimoor Hassan, Institute of Translational Medicine & New Drug Development, China Medical University, Taichung, 404333, Taiwan

Taimoor Hassan received his bachelor’s degree in medical sciences (Operation Theater Technology) from the University of Health Sciences, Pakistan. He is currently enrolled as a graduate student at the Institute of Translational Medicine & New Drug Development, China Medical University (Taiwan). His research interests include bioinformatics, structural computational biology, protein engineering, antibody engineering, medical AI, machine learning, cancer biology, cancer therapeutics, translational medicine, and new drug development.

Pei-Chun Lin, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan

Pei-Chun Lin received her Ph.D. degree from the Graduate School of Information, Production and Systems (IPS), Waseda University, Japan. During her Ph.D. work, Dr. Lin constructed a series of statistical models pertaining to fuzzy data. These models were then applied to decision-making systems with promising results. After obtaining her doctorate, Dr. Lin worked as a researcher at IPS in association with Waseda University while continuing to specialize in the application of fuzzy statistical models. One of her key accomplishments during this period was the union of fuzzy statistical modeling and artificial intelligence. In addition to research, Dr. Lin serves as editor and reviewer for multiple top journals and is often invited as a keynote speaker. Her research interests include soft computing, artificial intelligence computing, robotics computing, statistical modeling, cloud computing, and big data analysis. Currently, Dr. Lin is an Associate Professor in the Department of Information Engineering and Computer Science at Feng Chia University in Taichung City, Taiwan.

Eric Odle, Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan

Eric Odle holds a B.Sc. in biology from Saint Louis University (Missouri, USA), a B.A. in Japanese from the University of Alaska Anchorage (Alaska, USA), an M.A. in applied linguistics from Yuan Ze University (Taoyuan, Taiwan), and an M.Sc. in biology from National Taiwan Normal University (Taipei, Taiwan). His master’s thesis in linguistics focused on appraisal-based quality analysis of English–Japanese medical translation, while his master’s thesis in biology focused on AI-assisted behavioral tracking of mutant zebrafish undergoing drug treatment. In addition to rumor detection, Eric is passionate about artificial intelligence applications in Japanese text analysis, machine translation, second language education, music genre classification, sport biomechanics, and bioinformatics.

Junzo Watada, Waseda University, Tokyo, 169-8050, Japan

Junzo Watada received his B.Sc. and M.Sc. degrees in electrical engineering from Osaka City University (Osaka, Japan) as well as a Ph.D. from Osaka Prefecture University (Sakai, Japan). Dr. Watada served as Professor of Management Engineering, Knowledge Engineering, and Soft Computing with the Waseda University Graduate School of Information, Production, and Systems (Kitakyushu, Japan) until March of 2016. Now, he serves as Research Professor with the Zhejiang Gongshang University Research Institute of Quantitative Economics, Full Professor with the University Technology Petronas Department of Computer and Information Sciences (Malaysia), and Professor Emeritus at Waseda University (Japan). His research interests include big data analytics, soft computing, tracking systems, knowledge engineering, and management engineering. Moreover, Dr. Watada is a Life Fellow of both the Japan Society for Fuzzy Theory and Intelligent Informatics and the Biomedical Fuzzy Systems Association. He has served as President of the International Society of Management Engineers since 2003 and as President of the Forum of Inter-disciplinary Mathematics in India since 2019. His awards include the Henri Coanda Medal Award from Inventico (Romania) in 2002 and the GH Asachi Medal from the Universitatea Tehnica GH Asachi, IASI (Romania) in 2006. Additionally, Dr. Watada serves as principal editor, co-chief editor, and associate editor for various international journals, including ICIC Express Letters, Information Sciences, Journal of Systems and Control Engineering (Proc. IMechE), International Journal of Innovative Computing, Information and Control, and Fuzzy Optimization and Decision Making.

Published

2025-03-10

How to Cite

Alam, T., Shia, W. C., Hsu, F. R., Hassan, T., Lin, P.-C., Odle, E., & Watada, J. (2025). Hybrid Top Features Extraction Model for Detecting X Rumor Events Using an Ensemble Method. Journal of Web Engineering, 24(01), 79–106. https://doi.org/10.13052/jwe1540-9589.2414

Section

Advanced Practice in Web Engineering in Asia