MalVulDroid: Tracing Vulnerabilities from Malware in Android using Natural Language Processing

Authors

  • Shivi Garg, Faculty of Informatics and Computing, J.C. Bose University of Science and Technology YMCA, Faridabad, India
  • Niyati Baliyan, Department of Computer Engineering, National Institute of Technology Kurukshetra, Haryana, India

DOI:

https://doi.org/10.13052/jwe1540-9589.2185

Keywords:

Android, Machine Learning, Malware, Mapping, Natural Language Processing, Vulnerability

Abstract

The Android operating system is a frequent target of mobile malware attacks, which exploit system loopholes or vulnerabilities. A single malware can exploit numerous vulnerabilities, and multiple malware can exploit a single vulnerability, which gives rise to a many-to-many (X:Y) mapping between malware and vulnerabilities. It is therefore crucial to understand malware behaviour in order to reduce vulnerabilities. This paper presents the concept of the “MalVulDroid” framework, which maps malware to vulnerabilities using a two-dimensional matrix. The many-to-many (X:Y) mapping matrix is obtained using natural language processing techniques such as Bag-of-Words (BoW) leveraging n-gram probability generation and term frequency-inverse document frequency (TF-IDF), together with supervised machine learning classifiers such as a multilayer perceptron (MLP), a support vector machine (SVM), a ripple down rule learner (RIDOR), and a pruning rule-based classification tree (PART). This study is the first of its kind in which malware-to-vulnerability mapping can be leveraged to measure the rigorousness of unknown vulnerabilities and malware during the early phases of application development. The study draws on extensive datasets such as AndroZoo, AMD, and CICInvesAndMal2019, with 150 malware families and 48,907 malware samples, and considers nine major vulnerabilities affecting Android. MalVulDroid exhibits highly promising results, with an accuracy of 98.04% for unigrams, and precision and F1-scores of over 90% using ensemble classifiers.
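The two ideas named in the abstract, a two-dimensional malware-to-vulnerability matrix and n-gram TF-IDF weighting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the family names, vulnerability labels, sample texts, and the particular TF-IDF variant (term count divided by document length, multiplied by log(N/df)) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy observations: (malware family, vulnerability it exploits).
# Names and labels are illustrative only, not taken from the paper's datasets.
observed_pairs = [
    ("FamilyA", "privilege_escalation"),
    ("FamilyA", "information_disclosure"),
    ("FamilyB", "privilege_escalation"),
    ("FamilyC", "denial_of_service"),
]


def build_mapping_matrix(pairs):
    """Return (families, vulnerabilities, matrix).

    matrix[i][j] is 1 if family i was observed exploiting vulnerability j,
    so one row may hold several 1s and one column may hold several 1s
    (a many-to-many mapping).
    """
    families = sorted({m for m, _ in pairs})
    vulns = sorted({v for _, v in pairs})
    row = {m: i for i, m in enumerate(families)}
    col = {v: j for j, v in enumerate(vulns)}
    matrix = [[0] * len(vulns) for _ in families]
    for m, v in pairs:
        matrix[row[m]][col[v]] = 1
    return families, vulns, matrix


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents, n=1):
    """Weight each n-gram by (count / document length) * log(N / df).

    This is one common textbook variant, assumed here for illustration.
    """
    bags = [Counter(ngrams(doc.lower().split(), n)) for doc in documents]
    doc_freq = Counter(term for bag in bags for term in bag)
    total_docs = len(documents)
    weights = []
    for bag in bags:
        length = sum(bag.values())
        weights.append({
            term: (count / length) * math.log(total_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


if __name__ == "__main__":
    families, vulns, matrix = build_mapping_matrix(observed_pairs)
    for family, row_values in zip(families, matrix):
        print(family, dict(zip(vulns, row_values)))
    # Hypothetical behaviour descriptions; a term shared by all documents gets weight 0.
    print(tf_idf(["send sms premium", "read sms contacts"], n=1))
```

In practice, a library vectorizer such as scikit-learn's `TfidfVectorizer` with its `ngram_range` parameter would be the more usual route, though its default weighting uses a smoothed IDF and so gives different numbers from the variant sketched here.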

Author Biographies

Shivi Garg, Faculty of Informatics and Computing, J.C. Bose University of Science and Technology YMCA, Faridabad, India

Shivi Garg is an Assistant Professor in the Department of Computer Engineering, J.C. Bose University of Science & Technology, YMCA, Faridabad. She received her Doctor of Philosophy from the Information Technology Department, Indira Gandhi Delhi Technical University for Women (IGDTUW), Delhi, India, in December 2021. Her thesis title is “Design and Analysis of Mobile Application Vulnerabilities”. She is also a postgraduate in Information Security from Delhi Technological University (DTU), Delhi, India. She has been teaching and conducting research since August 2016. Her research interests include information security, mobile security, cyber security, and machine learning. Her publications and other details can be found at: https://sites.google.com/view/shivigarg/home.

Niyati Baliyan, Department of Computer Engineering, National Institute of Technology Kurukshetra, Haryana, India

Niyati Baliyan is an Assistant Professor in the Department of Computer Engineering, National Institute of Technology Kurukshetra, Haryana. She received her Doctor of Philosophy from the Computer Science Department, Indian Institute of Technology (IIT) Roorkee, India. Her thesis title was “Quality Assessment of Semantic Web based Applications”. She also holds a Post Graduate Certificate in Information Technology from Sheffield Hallam University, Sheffield, U.K. She was awarded the Chancellor’s Gold Medal as the university topper during her postgraduate studies at Gautam Buddha University. She is co-author of “Semantic Web Based Systems: Quality Assessment Models” (SpringerBriefs in Computer Science, 2018). Her research interests include knowledge engineering, machine learning, healthcare analytics, recommender systems, information security, and natural language processing. Her publications and other details can be found at: https://sites.google.com/site/niyatibaliyan.

Published

2023-03-19

How to Cite

Garg, S., & Baliyan, N. (2023). MalVulDroid: Tracing Vulnerabilities from Malware in Android using Natural Language Processing. Journal of Web Engineering, 21(08), 2339–2362. https://doi.org/10.13052/jwe1540-9589.2185
