MalVulDroid: Tracing Vulnerabilities from Malware in Android using Natural Language Processing
Keywords: Android, Machine Learning, Malware, Mapping, Natural Language Processing, Vulnerability
The Android operating system is a frequent target of mobile malware attacks, which exploit system loopholes or vulnerabilities. A single malware can exploit numerous vulnerabilities, and multiple malware can exploit a single vulnerability, which results in a many-to-many (X : Y) mapping between malware and vulnerabilities. Understanding malware behaviour is therefore crucial to reducing vulnerabilities. This paper presents the concept of a “MalVulDroid” framework that maps malware to vulnerabilities using a two-dimensional matrix. The many-to-many (X : Y) mapping matrix is obtained using natural language processing techniques such as Bag-of-Words (BoW) with n-gram probability generation and term frequency-inverse document frequency (TF-IDF), together with supervised machine learning classifiers such as a multilayer perceptron (MLP), a support vector machine (SVM), a ripple down rule learner (RIDOR), and a pruning rule-based classification tree (PART). This study is the first of its kind in which malware-to-vulnerability mapping can be leveraged to measure the rigorousness of unknown vulnerabilities and malware during the early phases of application development. The study considers extensive datasets such as AndroZoo, AMD, and CICInvesAndMal2019, covering 150 malware families, 48,907 malware samples, and nine major vulnerabilities affecting Android. MalVulDroid shows promising results, with an accuracy of 98.04% for unigrams and precision and F1-scores above 90% using ensemble classifiers.
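To make the idea of a many-to-many mapping matrix concrete, the following is a minimal sketch and not the MalVulDroid implementation. It builds n-gram Bag-of-Words counts, weights them with a smoothed TF-IDF, and fills a binary malware-by-vulnerability matrix. Several things here are assumptions for illustration only: the function names (`ngrams`, `tfidf_vectors`, `cosine`, `mapping_matrix`), the toy text descriptions, the 0.05 threshold, and above all the use of a cosine-similarity threshold as a simple stand-in for the supervised classifiers (MLP, SVM, RIDOR, PART) named in the abstract.

```python
import math
from collections import Counter


def ngrams(text, n=1):
    """Split text on whitespace and return its word n-grams (n=1 gives unigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Return one sparse TF-IDF vector (a dict) per document, using smoothed IDF."""
    bags = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(term for bag in bags for term in bag)
    total = len(docs)
    vectors = []
    for bag in bags:
        length = sum(bag.values()) or 1
        vectors.append({
            term: (count / length) * (math.log((1 + total) / (1 + doc_freq[term])) + 1)
            for term, count in bag.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mapping_matrix(malware_docs, vuln_docs, threshold=0.05, n=1):
    """Binary matrix: rows are malware, columns are vulnerabilities.

    Assumption: a similarity threshold replaces the trained classifiers
    that the paper uses to decide each malware-vulnerability link.
    """
    vectors = tfidf_vectors(malware_docs + vuln_docs, n)
    malware_vecs = vectors[:len(malware_docs)]
    vuln_vecs = vectors[len(malware_docs):]
    return [[1 if cosine(m, v) >= threshold else 0 for v in vuln_vecs]
            for m in malware_vecs]


# Hypothetical toy descriptions, invented purely for illustration.
malware = [
    "sends premium sms without user consent",
    "steals contacts and sends sms to remote server",
]
vulnerabilities = [
    "permission misuse allows sending sms",
    "insecure data storage exposes contacts",
]

matrix = mapping_matrix(malware, vulnerabilities)
for name, row in zip(["malware_0", "malware_1"], matrix):
    print(name, row)
```

Each row of `matrix` corresponds to one malware description and each column to one vulnerability description, so a row with several ones is a malware linked to several vulnerabilities, and a column with several ones is a vulnerability linked to several malware. Passing `n=2` switches the features from unigrams to bigrams.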