Relevant Feature Selection Model Using Data Mining for Intrusion Detection System

Ayman I. Madbouly; Amr M. Gody; Tamer M. Barakat

doi:https://doi.org/10.14445/22315381/IJETT-V9P296

Research Article | Open Access

Volume 9 | Number 2 | Year 2014 | Article Id. IJETT-V9P296 | DOI : https://doi.org/10.14445/22315381/IJETT-V9P296

Citation :

Ayman I. Madbouly, Amr M. Gody, Tamer M. Barakat, "Relevant Feature Selection Model Using Data Mining for Intrusion Detection System," International Journal of Engineering Trends and Technology (IJETT), vol. 9, no. 2, pp. 501-512, 2014. Crossref, https://doi.org/10.14445/22315381/IJETT-V9P296

Abstract

Network intrusions have become a significant threat in recent years as a result of the increased demand for computer networks in critical systems. Intrusion detection systems (IDSs) have been widely deployed as a defense measure for computer networks. Features extracted from network traffic can be used as signs to detect anomalies. However, with the huge amount of network traffic, the collected data contain irrelevant and redundant features that affect the detection rate of the IDS, consume a large amount of system resources, and slow down its training and testing. In this paper, a new feature selection model is proposed that can effectively select the most relevant features for intrusion detection. Our goal is to build a lightweight intrusion detection system by using a reduced feature set. Deleting irrelevant and redundant features speeds up training and testing and reduces resource consumption while maintaining high detection rates. The effectiveness and feasibility of our feature selection model were verified by several experiments on the KDD intrusion detection dataset. The experimental results clearly showed that our model not only yields high detection rates but also speeds up the detection process.

Keywords

Intrusion detection system, traffic classification, network security, supervised learning, feature selection, data mining.
