Machine Learning for Analyzing Malware

Authors

  • Zhenyan Liu Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China
  • Yifei Zeng Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China
  • Yida Yan Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China
  • Pengfei Zhang Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China
  • Yong Wang Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

DOI:

https://doi.org/10.13052/2245-1439.631

Keywords:

Malware analysis, Machine learning, Classification, Clustering, Association analysis

Abstract

The Internet has become an indispensable part of people’s work and life, but it also provides favorable conditions for the spread of malware. As a result, new malware emerges continually and spreads ever faster, making it one of the main threats to network security today. Following the malware analysis process, from initial feature extraction and feature selection through to the analysis itself, this paper introduces machine learning approaches such as classification, clustering, and association analysis, and shows how to use them to analyze malware and its variants effectively.

Author Biographies

Zhenyan Liu, Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

Zhenyan Liu works at the School of Software, Beijing Institute of Technology. She received her Ph.D. degree in Computer Architecture from the Institute of Computing Technology, Chinese Academy of Sciences, China. Her current research interests include big data, artificial intelligence, and cyber security. She is a fellow of the China Computer Federation (CCF).

Yifei Zeng, Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

Yifei Zeng has been an M.Sc. student at Beijing Institute of Technology since autumn 2017. He attended Guangdong Ocean University in Zhanjiang, China, where he received his B.Sc. degree in Software Engineering in 2016. His M.Sc. work focuses on cyber security and artificial intelligence.

Yida Yan, Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

Yida Yan has been an M.Sc. student at Beijing Institute of Technology since autumn 2016. She attended Hebei Normal University, China, where she received her B.Sc. degree in Software Engineering in 2015. Her M.Sc. work focuses on data mining and cyber security. She holds a Data Mining Senior Engineer Certificate in China.

Pengfei Zhang, Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

Pengfei Zhang has been an M.Sc. student at Beijing Institute of Technology since autumn 2017. He attended Nanjing University of Science & Technology, China, where he received his B.Sc. degree in Computer Science in 2015. He was with TP-Link from 2015 to 2016. His M.Sc. work focuses on cyber security and artificial intelligence.

Yong Wang, Beijing Key Laboratory of Software Security Engineering Technology, School of Software, Beijing Institute of Technology, Beijing 100081, China

Yong Wang works at the School of Software, Beijing Institute of Technology. She received her Ph.D. degree in Computer Science from Beijing Institute of Technology, China. Her current research interests include cyber security and machine learning. She is a fellow of the China Computer Federation (CCF).

Published

2017-11-17

How to Cite

Liu Z, Zeng Y, Yan Y, Zhang P, Wang Y. Machine Learning for Analyzing Malware. JCSANDM [Internet]. 2017 Nov. 17 [cited 2024 Apr. 19];6(3):227-44. Available from: https://journals.riverpublishers.com/index.php/JCSANDM/article/view/5243
