API Call-Based Malware Classification Using Recurrent Neural Networks
Keywords:Malware classification, API call sequence, Recurrent neural network, Long short-term memory, Gated recurrent unit (GRU)
Malicious software, called malware, can perform harmful actions on computer systems, which may cause economic damage and information leakage. Therefore, malware classification is meaningful and required to prevent malware attacks. Application programming interface (API) call sequences are easily observed and are good choices as features for malware classification. However, one of the main issues is how to generate a suitable feature for the algorithms of classification to achieve a high classification accuracy. Different malware sample brings API call sequence with different lengths, and these lengths may reach millions, which may cause computation cost and time complexities. Recurrent neural networks (RNNs) is one of the most versatile approaches to process time series data, which can be used to API call-based Malware calssification. In this paper, we propose a malware classification model with RNN, especially the long short-term memory (LSTM) and the gated recurrent unit (GRU), to classify variants of malware by using long-sequences of API calls. In numerical experiments, a benchmark dataset is used to illustrate the proposed approach and validate its accuracy. The numerical results show that the proposed RNN model works well on the malware classification.
“Pandalabs quarterly report: Q2 2017,” 2017. [Online]. Available: https://www.pandasecurity.com/mediacenter/src/uploads/2017/08/Pandalabs-2017-Q2-EN.pdf, accessed: 2020-07-27.
A. Damodaran, F. D. Troia, C. A. Visaggio, T. H. Austin, and M. Stamp, “A comparison of static, dynamic, and hybrid analysis for malware detection,” J. Comput. Virol. Hacking Techn., vol. 13(1), pp. 1–12, Dec. 29, 2015.
M. Ijaz, M. H. Durad, and M. Ismail, “Static and dynamic malware analysis using machine learning,” Proceedings of the 16th Int. Bhurban Conf. Appl. Sci.Technol. (IBCAST), pp. 793–806, Jan. 8–12, 2019.
L. Nataraj, V. Yegneswaran, P. Porras, and J. Zhang, “A comparative assessment of malware classification using binary texture analysis and dynamic analysis,” Proceedings of the 4th ACM Workshop Secur. Artif. Intell. AISec, pp. 21–30, Oct. 2011.
T. Shibahara, T. Yagi, M. Akiyama, D. Chiba, and T. Yada, “Efficient dynamic malware analysis based on network behavior using deep learning,” Proceedings of the 2016 IEEE Global Commun. Conf. (GLOBECOM), pp. 1–7, Dec. 4–8, 2016.
W. Wong and M. Stamp, “Hunting for metamorphic engines,” J. Comput. Virol., vol. 2(3), pp. 211–229, 2006.
M. K. Shankarapani, S. Ramamoorthy, R. S. Movva, S. Mukkamala, “Malware detection using assembly and API call sequences,” J. Comput. Virol., vol. 2(7), pp. 107–119, 2011.
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” Proceedings of the International Conference on Machine Learning, pp. 2048–2057, 2015.
G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N Sainath, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29(6), pp. 82–97, 2012.
T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, “Recurrent neural network based language model,” In Interspeech, vol. 2(3), 2010.
I. Kwon, and E. G. Im, “Extracting the representative API call patterns of malware families using recurrent neural network,” Proceedings of the International Conference on Research in Adaptive and Convergent Systems, pp. 202–207, 2017.
C. Li, X. Zhang, M. Qaosar, S. Ahmed, K. Md. R. Alam, and Y. Morimoto, “Multi-factor based stock price prediction using hybrid neural networks with attention mechanism,” Proceedings of the 2019 IEEE International Conference on Cloud and Big Data Computing (CBDCom), pp. 961–966, Fukuoka, Japan, 2019.
S. Hochreiter, and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9 (8), pp. 1735–1780, 1997.
C. Li, Minjia He, Mahboob Qaosar, Saleh Ahmed, and Yasuhiko Morimoto, “Capturing temporal dynamics of users’ preferences from purchase history big data for recommendation system,” Proceedings of the IEEE International Conference on Big Data 2018 (BigData2018), Seattle, WA, USA, USA, Dec. 2018.
K. Cho, B. V. Merrienboer, D. Bahdanau, and B. Yoshua, “On the properties of neural machine translation: encoder–decoder approaches,” arXiv:1409.1259, 2014.
M. G. Schultz, E. Eskin, F. Zadok, and S. J. Stolfo, “Data mining methods for detection of new malicious executables,” Proceedings of the IEEE Symp. Secur. Privacy. S&P, pp. 38–49, May, 2001.
J. Z. Kolter, and M. A. Maloof, “Learning to detect malicious executables in the wild,” Proceedings of the ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining KDD, pp. 470–478, 2004.
M. K. Shankarapani, S. Ramamoorthy, R. S. Movva, and S. Mukkamala, “Malware detection using assembly and API call sequences,” J. Comput. Virology, vol. 7(2), pp. 107–119, 2011.
E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. K. Nicholas, “Malware detection by eating a whole exe,” Proceedings of the Workshops 32nd AAAI Conf. Artif. Intell., pp. 268–276, 2018.
A. R. A. Gregio, P. L. de Geus, C. Kruegel, and G. Vigna, “Tracking memory writes for malware classification and code reuse identification,” Proceeding of the Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 134–143, Berlin, Heidelberg, 2013.
R. Canzanese, S. Mancoridis, and M. Kam, “System call-based detection of malicious processes,” Proceedings of the International conference on quality, reliability, and security (QRS), 2015.
R. Canzanese, S. Mancoridis, and M. Kam, “Run-time classification of malicious processes using system call analysis,” Proceedings of the 10th International Conference on Malicious and Unwanted Software (MALWARE), pp. 21–28, 2015.
M. Eskandari, Z. Khorshidpur, and S. Hashemi, “To incorporate sequential dynamic features in malware detection engines,” Proceedings of the Eur. Intell. Secur. Informat. Conf., pp. 46–52, Aug. 2012.
F. Ahmed, H. Hameed, M. Z. Shafiq, and M. Farooq, “Using spatiotemporal information in api calls with machine learning algorithms for malware detection,” Proceedings of the 2nd ACM workshop on Security and artificial intell., pp. 55–62, 2009.
S. S. Hansen, T. M. T. Larsen, M. Stevanovic, and J. M. Pedersen, “An approach for detection and family classification of malware based on behavioral analysis,” Proceedings of the International Conference on Computing, Networking and Communications (ICNC), pp. 1–5, 2016.
Y. Qiao, J. He, Y. Yang, and L. Ji, “Analyzing malware by abstracting the frequent itemsets in API call sequences,” Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 265–270, 2013.
A. F. Yazi, F. O. Catak, and E. Gul, “Classification of Metamorphic Malware with Deep Learning (LSTM),” IEEE Signal Processing and Applications Conference, 2019.
F. O. Catak, and A. F. Yazi, “A benchmark API call dataset for windows PE malware classification,” CoRR, 2019.
F. O. Catak, A. F. Yazi, and O. Elezaj, “Deep learning based Sequential model for malware analysis using Windows exe API Calls,” PeerJ Computer Science, vol. 6(81), July 27, 2020.
S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, and T. Yagi, “Malware detection with deep neural network using process behavior,” Proceedings of the IEEE 40th Annu. Comput. Softw. Appl. Conf. (COMPSAC), pp. 577–582, Jun. 10–14, Atlanta, GA, USA, 2016.