Metaheuristic Aided Improved LSTM for Multi-document Summarization: A Hybrid Optimization Model

Authors

  • Sunilkumar Ketineni Department of School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, India-522237
  • Sheela J Department of School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, India-522237

DOI:

https://doi.org/10.13052/jwe1540-9589.2246

Keywords:

Multi-document summarization, LSTM, Score generation, BMICO, Optimization

Abstract

Multi-document summarization (MDS) is an automated process that extracts information from multiple texts written about the same subject. Here, we present a generic, extractive MDS approach with four stages: preprocessing, feature extraction, score generation, and summarization. In the first stage, the input text undergoes preprocessing steps such as lemmatization, stemming, and tokenization. Features are then extracted, including improved semantic similarity-based features, term frequency-inverse document frequency (TF-IDF) based features, and thematic-based features. Next, an improved LSTM model summarizes the documents based on scores computed under objectives such as content coverage and redundancy reduction. The Blue Monkey Integrated Coot Optimization (BMICO) algorithm is proposed in this paper to tune the weights of the LSTM model so that summarization is precise. Finally, the effectiveness of the proposed BMICO-based approach is evaluated and verified.
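The two scoring objectives named above, content coverage and redundancy reduction, can be illustrated with a minimal sketch. This is not the paper's improved LSTM or the BMICO optimizer: it swaps in a simple greedy selection over TF-IDF sentence vectors, and every name in it (tokenize, tfidf_vectors, cosine, summarize, redundancy_weight) is hypothetical. Lemmatization and stemming are omitted, and coverage is approximated as cosine similarity to the centroid of all sentence vectors.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and keep alphabetic tokens (a stand-in for full preprocessing)."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (term -> weight dict) per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def summarize(sentences, k=2, redundancy_weight=0.7):
    """Greedily pick k sentences that cover the centroid while avoiding overlap."""
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # adds the weights term by term
    coverage = [cosine(vec, centroid) for vec in vectors]

    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Redundancy: highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            score = coverage[i] - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    docs = [
        "The storm flooded the coastal town overnight.",
        "The storm flooded the coastal town overnight.",
        "Rescue teams evacuated hundreds of residents by boat.",
    ]
    for sentence in summarize(docs, k=2):
        print(sentence)
```

In this sketch the redundancy penalty is what keeps a repeated sentence from being selected twice. In the approach described above, the sentence scores would instead come from the trained LSTM, with BMICO tuning its weights.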

Author Biographies

Sunilkumar Ketineni, Department of School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, India-522237

Sunilkumar Ketineni received his M.Tech degree from JNTUK. He is currently pursuing a Ph.D. at VIT-AP University, Andhra Pradesh, India. His areas of interest are natural language processing and deep learning. He has published five research papers in international journals and conferences of repute in data mining and NLP.

Sheela J, Department of School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, India-522237

Sheela J has served VIT, Andhra Pradesh as an Assistant Professor in the School of Computer Science and Engineering (SCOPE). She was a faculty member at KITS, Warangal before joining VIT, Andhra Pradesh. She graduated with a B.Eng. from Sri Krishna College of Technology, Coimbatore, affiliated to Anna University, Chennai, and obtained a Master of Engineering from Anna University Campus, securing the third rank at Anna University, Coimbatore. She received her Ph.D. from the National Institute of Technology, Tiruchirappalli, India. She cleared the National Eligibility Test for Lectureship by Tamil Nadu in 2016. Before beginning the Ph.D. program, Sheela worked as an Assistant Professor at the Hindustan College of Engineering and Technology. She has six years of teaching experience as an Assistant Professor at Anna University and Hindustan College of Engineering and Technology, Coimbatore. She has published over 20 research papers in international journals and conferences of repute in data mining and NLP.

Published

2023-10-25

How to Cite

Ketineni, S., & J, S. (2023). Metaheuristic Aided Improved LSTM for Multi-document Summarization: A Hybrid Optimization Model. Journal of Web Engineering, 22(04), 701–730. https://doi.org/10.13052/jwe1540-9589.2246
