Can We Detect Malicious Behaviours in Encrypted DNS Tunnels Using Network Flow Entropy?

Authors

  • Yulduz Khodjaeva, Faculty of Computer Science, Dalhousie University, Canada
  • Nur Zincir-Heywood, Faculty of Computer Science, Dalhousie University, Canada
  • Ibrahim Zincir, Faculty of Engineering, Izmir University of Economics, Turkey

DOI:

https://doi.org/10.13052/jcsm2245-1439.1135

Keywords:

DNS over HTTPS, Entropy, Cybersecurity, Machine Learning, Tunnelling Attacks

Abstract

This paper explores the use of flow entropy to augment statistical flow features for detecting encrypted DNS tunnelling, specifically over DNS over HTTPS (DoH) traffic. To this end, three flow exporters are studied: Argus, DoHlyzer and Tranalyzer2. The statistical flow features that these tools generate automatically are then augmented with the flow entropy. In this work, flow entropy is calculated using three different techniques: (i) entropy over all packets of a flow, (ii) entropy over the first 96 bytes of a flow, and (iii) entropy over the first n packets of a flow. These features are provided as input to ML classifiers to detect malicious behaviours over four publicly available datasets. The model is optimized using the TPOT-AutoML system, where the Random Forest classifier provided the best performance, achieving an average F-measure of 98% over all testing datasets employed.
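
The three entropy variants listed in the abstract can be illustrated with a minimal sketch. It assumes byte-level Shannon entropy and represents a flow as a list of raw packet payloads; the abstract does not state the exact entropy definition or which bytes are counted, so the function names, the default n, and the payload representation below are hypothetical and not the authors' implementation.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 if empty)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_all_packets(packets: List[bytes]) -> float:
    """(i) Entropy over the bytes of every packet in the flow."""
    return shannon_entropy(b"".join(packets))


def entropy_first_bytes(packets: List[bytes], k: int = 96) -> float:
    """(ii) Entropy over the first k bytes of the flow (k = 96 in the abstract)."""
    return shannon_entropy(b"".join(packets)[:k])


def entropy_first_n_packets(packets: List[bytes], n: int = 4) -> float:
    """(iii) Entropy over the first n packets; n = 4 is an arbitrary placeholder."""
    return shannon_entropy(b"".join(packets[:n]))


if __name__ == "__main__":
    # Toy flow: one repetitive packet followed by one packet with all byte values.
    flow = [b"A" * 100, bytes(range(256))]
    print(entropy_all_packets(flow))
    print(entropy_first_bytes(flow))
    print(entropy_first_n_packets(flow, n=1))
```

Each function returns a single number per flow, so the result can be appended as one extra column to the statistical features exported by a tool such as Argus, DoHlyzer or Tranalyzer2 before training a classifier.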

Author Biographies

Yulduz Khodjaeva, Faculty of Computer Science, Dalhousie University, Canada

Yulduz Khodjaeva has recently received her Master of Computer Science degree from Dalhousie University, Canada. During her studies, she carried out research in the cybersecurity area, particularly the detection of malicious behaviours in DNS tunnels. She published her conference paper at ARES 2021: the 16th International Conference on Availability, Reliability and Security. Currently, Yulduz is working as a Software Developer at EY Canada.

Nur Zincir-Heywood, Faculty of Computer Science, Dalhousie University, Canada

Nur Zincir-Heywood is a University Research Professor of Computer Science at Dalhousie University. Her research interests include machine learning for cyber security, and network/service operations and management. She serves as an Associate Editor of the IEEE Transactions on Network and Service Management and Wiley International Journal of Network Management. She also promotes information communication technologies to wider audiences as a tech columnist for CBC Information Morning and a Board Member on CS-Can/INFO-Can.

Ibrahim Zincir, Faculty of Engineering, Izmir University of Economics, Turkey

Ibrahim Zincir is an Assistant Professor in the Department of Software Engineering at Izmir University of Economics. Dr. Zincir received his Ph.D. in Computer Engineering from Plymouth University with a focus on data mining for secure mobile networks. He is a member of the IEEE and regularly promotes software engineering to a wider audience through several media outlets. His research interests include data mining, machine learning, mobile networks and web centric business applications.

Published

2022-08-14

Section

Extended Workshop Papers