Tree-based Ensemble Algorithms and Feature Selection Method for Intelligent Distributed Denial of Service Attack Detection

Fauzi Adi Rafrastara; Guruh Fajar Shidik; Wildanil Ghozi; Nova Rijati; Oki Setiono

doi:10.13052/jcsm2245-1439.1411

2025, Cyber Security Issues and Solutions

Published 2025-02-28

Fauzi Adi Rafrastara

Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Guruh Fajar Shidik

Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Wildanil Ghozi

Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Nova Rijati

Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Oki Setiono

Faculty of Health Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Keywords

DDoS detection
ensemble learning
feature selection

How to Cite

[1]

F. A. Rafrastara, G. F. Shidik, W. Ghozi, N. Rijati, and O. Setiono, “Tree-based Ensemble Algorithms and Feature Selection Method for Intelligent Distributed Denial of Service Attack Detection”, JCSANDM, vol. 14, no. 01, pp. 1–24, Feb. 2025.

Abstract

Distributed denial-of-service (DDoS) attacks are one of hackers’ mainstay weapons; they can degrade network performance and damage servers. Countering them requires detecting and blocking attacks simultaneously, yet traditional classification methods are not effective at distinguishing attack traffic from normal traffic. In this study, we introduce an ensemble-based machine learning approach, paired with an improved Gini index for feature selection, to detect DDoS attacks. Our approach uses the UNSW_NB15 dataset from Kaggle. Three tree-based ensemble algorithms are evaluated: Random Forest, XGBoost, and AdaBoost. Combined with the enhanced Gini index, all three outperformed the baseline models that used a single decision tree classifier. XGBoost with the Gini index achieved the best result, with 97.30% for accuracy, recall, and precision, and 96.90% for F1-score. The approach improves the algorithms’ performance while lowering the number of features.
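To illustrate the general idea of Gini-based feature selection mentioned in the abstract, the following is a minimal sketch that ranks features by the largest decrease in Gini impurity achievable with a single threshold split. It uses the standard Gini impurity, not the improved variant proposed in the paper, whose details are not given on this page. The feature names (`rate`, `dur`, `noise`), the toy rows, and the helper functions are all hypothetical and are not taken from UNSW_NB15.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_gain(values, labels):
    """Largest impurity decrease over all binary threshold splits of one feature."""
    n = len(labels)
    parent = gini_impurity(labels)
    best = 0.0
    # Try every distinct value except the largest as a "value <= t" threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {
        name: best_gini_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Keep the names of the k highest-scoring features."""
    return [name for name, _ in rank_features(rows, labels, names)[:k]]


# Hypothetical toy traffic records: (packet rate, duration, unrelated value).
names = ["rate", "dur", "noise"]
rows = [
    (900, 0.1, 3),
    (850, 0.2, 7),
    (950, 1.4, 1),
    (40, 1.5, 3),
    (60, 1.1, 7),
    (30, 0.3, 1),
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = attack, 0 = normal (assumed encoding)

if __name__ == "__main__":
    for name, gain in rank_features(rows, labels, names):
        print(f"{name}: {gain:.3f}")
    print("selected:", select_top_k(rows, labels, names, 2))
```

In a fuller pipeline, only the selected columns would be passed on to a tree-based ensemble classifier such as Random Forest, XGBoost, or AdaBoost, which is how fewer features can coexist with equal or better detection performance.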

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
