HCNN-LSTM: Hybrid Convolutional Neural Network with Long Short-Term Memory Integrated for Legitimate Web Prediction

Authors

  • Candra Zonyfar, Department of Computer Science and Electronics Engineering, Sun Moon University, Korea
  • Jung-Been Lee, 1) Department of Computer Science and Electronics Engineering, Sun Moon University, Korea 2) Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea
  • Jeong-Dong Kim, 1) Department of Computer Science and Electronics Engineering, Sun Moon University, Korea 2) Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea 3) Genom-Based BioIT Convergence Institute, Sun Moon University, Korea

DOI:

https://doi.org/10.13052/jwe1540-9589.2251

Keywords:

Phishing detection, cyber threat, CNN-LSTM, deep learning, machine learning

Abstract

Phishing is the threat attackers use most frequently to deceive Internet users and obtain sensitive information such as login credentials and credit card numbers. Users therefore need to recognize legitimate websites in order to avoid the traps of fake ones. However, lay users find it difficult to distinguish legitimate websites because phishing techniques evolve continually. A legitimate-website detection system therefore offers users an easy way to avoid phishing websites. To address this problem, we present a hybrid deep learning model that combines a convolutional neural network with long short-term memory (HCNN-LSTM). The model pairs a one-dimensional CNN with an LSTM network, with shared estimation across all sublayers. We evaluate the proposed model on a benchmark dataset for phishing prediction that consists of 11,430 URLs with 87 extracted attributes, of which 56 are selected from URL structure and syntax. In binary classification, the HCNN-LSTM model achieved accuracy, precision, recall, and F1-score of 95.19%, 95.00%, 95.00%, and 95.00%, respectively, outperforming the standalone CNN and LSTM models. These results show that the proposed model is a competitive new model for legitimate web prediction tasks.
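The four metrics reported in the abstract can be made concrete with a minimal sketch that derives them from confusion-matrix counts. The function name `binary_metrics`, the toy labels, and the choice of label 1 as the positive class are assumptions made for illustration only; they are not taken from the paper's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for labels in {0, 1}.

    Label 1 is treated as the positive class (an assumption for this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision and recall can differ from accuracy when the classes are imbalanced, which is one reason for reporting all four values rather than accuracy alone.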

Author Biographies

Candra Zonyfar, Department of Computer Science and Electronics Engineering, Sun Moon University, Korea

Candra Zonyfar received his bachelor’s degree in computer science from Singaperbangsa Karawang in 2013 and his master’s degree in computer science from Budi Luhur University, Indonesia, in 2019. He has been studying for a Ph.D. in computer and electronics engineering at Sun Moon University, South Korea, since 2022. His main research interests include deep learning and data science in bioinformatics.

Jung-Been Lee, 1) Department of Computer Science and Electronics Engineering, Sun Moon University, Korea 2) Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea

Jung-Been Lee received his bachelor’s degree in computer engineering from Chosun University in 2002. He received his M.Sc. and Ph.D. degrees from the College of Informatics at Korea University, Seoul, in 2011 and 2020, respectively. From 2020 to 2022, he was a research professor at the Chronobiology Institute at Korea University in Seoul. He is currently an assistant professor in the department of computer science and engineering at Sun Moon University, Asan, Korea. His primary areas of study include mining software artifacts and analysis and machine learning from wearable sensor data.

Jeong-Dong Kim, 1) Department of Computer Science and Electronics Engineering, Sun Moon University, Korea 2) Department of Computer Science and Engineering, Sun Moon University, Asan, 31460 South Korea 3) Genom-Based BioIT Convergence Institute, Sun Moon University, Korea

Jeong-Dong Kim received his bachelor’s degree in computer engineering from Sun Moon University in 2005. He received his M.Sc. and Ph.D. degrees in computer science from Korea University, Korea, in 2008 and 2012, respectively. He is an associate professor in the department of computer science and engineering at Sun Moon University, Asan, Korea. His research interests include big data analysis based on deep learning, healthcare, software and data engineering, and bioinformatics.

Published

2023-12-21

How to Cite

Zonyfar, C., Lee, J.-B., & Kim, J.-D. (2023). HCNN-LSTM: Hybrid Convolutional Neural Network with Long Short-Term Memory Integrated for Legitimate Web Prediction. Journal of Web Engineering, 22(05), 757–782. https://doi.org/10.13052/jwe1540-9589.2251

Section

ECTI