Malware Analysis Through Random Forest Approach

Authors

  • Ajay Kumar Department of Computer Science & Engineering, NIT Patna, Bihar, India https://orcid.org/0000-0001-6712-6490
  • Kumar Abhishek Department of Computer Science & Engineering, NIT Patna, Bihar, India https://orcid.org/0000-0001-6825-2392
  • Shishir Kumar Shandilya Division Head, Cyber Security and Digital Forensics, Vellore Institute of Technology, VIT Bhopal University, India https://orcid.org/0000-0002-3308-4445
  • Muhammad Rukunuddin Ghalib School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore, India

DOI:

https://doi.org/10.13052/jwe1540-9589.195610

Keywords:

Deep learning, Machine intelligence, signature-centric discovery, behavioral-based detection

Abstract

This paper gives a precise and comprehensive account of malware detection using ML and deep learning techniques, and proposes a system that integrates both behavior-based and signature-based detection methods. The primary objectives of this paper are to (A) outline the difficulties associated with malware detection; (B) present a detailed categorization of ML techniques for malware detection; (C) investigate the structure of basic malware detection strategies; and (D) examine the essential deep learning approaches to malware detection that group malware using data mining. The advantages and disadvantages of various malware detection approaches are analyzed in terms of their evaluation strategies and capabilities. The proposed model uses a random forest to build an end-to-end malware detection pipeline. In a comparative study against five other state-of-the-art models, the proposed model achieved an accuracy of 99.7% on the dataset used, and the experimental results show that it outperformed all five techniques. This paper encourages researchers to consider which approach is best suited to malware detection.
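The abstract describes the proposed pipeline only at a high level. As a rough illustration of how a signature-based check and a behavior-based random forest can be chained, the sketch below first looks up a sample's SHA-256 digest in a known-bad set and otherwise takes a majority vote over bagged decision trees trained on behavioral features. This is not the authors' implementation: the feature names (`api_calls`, `packed`, two noise columns), the synthetic data, the `KNOWN_BAD` set and all hyperparameters are hypothetical, and the forest is written from scratch only so the example runs with the Python standard library. The 99.7% figure reported above comes from the authors' dataset and is not reproduced here.

```python
import hashlib
import math
import random
from collections import Counter

# --- Behavior-based part: a minimal random forest (bootstrap + random feature subsets) ---

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) pair that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Each split only considers a random subset of n_sub features.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_sub, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

class RandomForest:
    def __init__(self, n_trees=15, max_depth=4, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # sqrt(#features) candidates per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_sub, self.rng))
        return self

    def predict(self, X):
        """Majority vote over all trees."""
        return [Counter(predict_tree(t, row) for t in self.trees).most_common(1)[0][0]
                for row in X]

# --- Signature-based part: exact hash lookup against a hypothetical known-bad set ---
KNOWN_BAD = {hashlib.sha256(b"example-malicious-payload").hexdigest()}

def detect(sample_bytes, features, forest):
    """Return 1 (malicious) on a signature hit, else the forest's behavioral verdict."""
    if hashlib.sha256(sample_bytes).hexdigest() in KNOWN_BAD:
        return 1
    return forest.predict([features])[0]

def make_data(n, rng):
    """Hypothetical synthetic rows: [api_calls, packed, noise_a, noise_b]; label 1 = malicious."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        api_calls = rng.randint(5, 15) if label else rng.randint(0, 8)
        packed = int(rng.random() < (0.8 if label else 0.2))
        X.append([api_calls, packed, rng.randint(0, 9), rng.randint(0, 9)])
        y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)
forest = RandomForest().fit(X_train, y_train)
accuracy = sum(p == t for p, t in zip(forest.predict(X_test), y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Placing the hash lookup first means previously catalogued samples are flagged cheaply, while the forest handles variants with no known signature. In practice one would more likely use `sklearn.ensemble.RandomForestClassifier` than a hand-rolled forest; the version here only keeps the sketch dependency-free.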

Author Biographies

Ajay Kumar, Department of Computer Science & Engineering, NIT Patna, Bihar, India

Ajay Kumar is a Senior Network/IT Analyst with the Government of India (Ministry of Defense). He received his B.Tech. with distinction from Central University Delhi and his M.Tech. with distinction from VJTI, Mumbai. He is a Ph.D. scholar in the Department of Computer Science & Engineering, NIT Patna. His research areas are network security, authentication, IoT and machine learning. He has published more than 20 research papers in renowned international conferences and SCI-indexed journals.

Kumar Abhishek, Department of Computer Science & Engineering, NIT Patna, Bihar, India

Kumar Abhishek is an Assistant Professor in the Department of Computer Science and Engineering, National Institute of Technology Patna, India. His areas of interest include RDF, the Semantic Web, ontologies, the Semantic Sensor Web, ontology mapping and approximation. He has published more than 100 research papers in renowned international conferences and SCI-indexed journals.

Shishir Kumar Shandilya, Division Head, Cyber Security and Digital Forensics, Vellore Institute of Technology, VIT Bhopal University, India

Shishir Kumar Shandilya is the Division Head of Cyber Security and Digital Forensics at Vellore Institute of Technology, VIT Bhopal University, India. He is also a Visiting Research Fellow at Liverpool Hope University, United Kingdom, a Cambridge University Certified Professional Teacher Trainer, an ACM Distinguished Speaker and a Senior Member of IEEE. He is an Academic Advisor to National Cyber Safety Security Standards, New Delhi. He received the IDA Teaching Excellence Award for distinctive use of technology in teaching from the Indian Didactics Association, Bangalore (2016), and the Young Scientist Award for two consecutive years, 2005 and 2006, from the Indian Science Congress MP Council of Science Technology. He has published seven books with Springer Nature (Singapore), IGI (USA), River (Denmark) and Prentice Hall of India. His most recent book, Advances in Cyber Security Analytics and Decision Systems, is published by Springer.

Muhammad Rukunuddin Ghalib, School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore, India

Muhammad Rukunuddin Ghalib currently works in the Division of Analytics, VIT University. His research covers artificial neural networks, data mining, and computing in mathematics, natural science, engineering and medicine. He is currently working on IoT-based artificial rain creation.

Published

2020-12-11

How to Cite

Kumar, A., Abhishek, K., Shandilya, S. K., & Ghalib, M. R. (2020). Malware Analysis Through Random Forest Approach. Journal of Web Engineering, 19(5-6), 795–818. https://doi.org/10.13052/jwe1540-9589.195610

Section

Articles