Exploratory Analysis on Anomaly-based IDS Data Using DASK and Ensemble Learning: A Data Parallelization Approach

© 2022 by IJETT Journal
Volume-70 Issue-12
Year of Publication : 2022
Author : Abhijit Das, Pramod
DOI : 10.14445/22315381/IJETT-V70I12P236

How to Cite?

Abhijit Das, Pramod, "Exploratory Analysis on Anomaly-based IDS Data Using DASK and Ensemble Learning: A Data Parallelization Approach," International Journal of Engineering Trends and Technology, vol. 70, no. 12, pp. 370-391, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I12P236

Abstract
Many scholars and practitioners have focused on anomaly detection because of its potential for identifying novel attacks. Unfortunately, system complexity, which necessitates extensive testing, assessment, and tuning before deployment, has limited its applicability to real-world settings and has impeded exploratory analysis of anomaly-based network intrusion detection systems (AIDS). The goal of the current study was to gain valuable insights into AIDS data by applying machine learning techniques. The data considered in this research is massive and falls under the big data category. CSE-CIC-IDS2018 comprises around 16.2 million samples (16,233,002); after cleaning, 1,252,846 rows and 78 columns were obtained. The raw NSL-KDD dataset has 150,000 rows, reduced to 135,684 rows with 44 features after processing, and the UNSW-NB15 dataset has 2,540,044 rows with 44 features. All three are widely used benchmark datasets and cover a wide range of attack types. The work adopted a data parallelism approach using Dask together with machine learning algorithms. Data parallelism aims to increase processing throughput by partitioning the corpus into concurrent processing streams that all perform the same operations. The work combined machine learning techniques with data-parallel execution, intending to provide a state-of-the-art approach to analyzing large AIDS datasets and identifying the relevant features in each.
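
The abstract gives no implementation details, so the following is only a minimal sketch of the data-parallel pattern it describes: partition the records, apply the same operation to every partition concurrently, then merge the partial results. As an assumption made for portability, it uses Python's standard-library ThreadPoolExecutor as a stand-in for Dask. The function names, the toy records, and the choice of label counting as the per-partition operation are hypothetical and are not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def count_labels(chunk):
    """The operation every partition performs: tally class labels."""
    return Counter(label for _, label in chunk)


def parallel_label_counts(rows, n_parts=4):
    """Run count_labels on each partition concurrently and merge the tallies."""
    chunks = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_labels, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical toy records (flow_id, label); not real IDS data.
rows = [(i, "attack" if i % 5 == 0 else "benign") for i in range(1000)]
counts = parallel_label_counts(rows)
print(counts["attack"], counts["benign"])
```

Threads keep the sketch self-contained, but they mainly illustrate the partition, apply, and merge structure. A Dask-based workflow would typically express the same structure over partitioned dataframes and schedule the per-partition work across cores or machines.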

Keywords
Anomaly-based Intrusion Detection System (AIDS), Exploratory Data Analysis (EDA), Machine learning, Statistical approach, IDS datasets.
