Machine Learning for Analyzing Malware
DOI: https://doi.org/10.13052/2245-1439.631
Keywords: Malware analysis, Machine learning, Classification, Clustering, Association analysis
Abstract
The Internet has become an indispensable part of people’s work and life, but it also provides favorable conditions for the spread of malware. As a result, new malware appears in an endless stream, spreads ever faster, and has become one of the main threats to network security today. Following the malware analysis process, from feature extraction and feature selection through to the analysis itself, this paper introduces machine learning algorithms for classification, clustering, and association analysis, and shows how to use them to analyze malware and its variants effectively.