Analysis of Feature Selection Algorithms and a Comparative study on Heterogeneous Classifier for High Dimensional Data survey

International Journal of Engineering Trends and Technology (IJETT)
© 2017 by IJETT Journal
Volume-53 Number-2
Year of Publication : 2017
Authors : Kassahun Azezew Ayidagn, Prof. Shilpa Gite
DOI :  10.14445/22315381/IJETT-V53P211

Citation 

Kassahun Azezew Ayidagn, Prof. Shilpa Gite, "Analysis of Feature Selection Algorithms and a Comparative study on Heterogeneous Classifier for High Dimensional Data survey", International Journal of Engineering Trends and Technology (IJETT), V53(2), 59-63, November 2017. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
This paper analyzes several feature selection algorithms and compares the predictive accuracy of heterogeneous classifiers on high-dimensional data. Specifically, we experimentally compare IBK (KNN), SVM, NBTree, and J48 on the KDD Cup 99 intrusion detection dataset and on one cancer diagnosis microarray dataset, and analyze their performance with vote generalization. A large number of features can introduce noise into the data and degrade the performance of a learning algorithm. To tackle this problem, it is essential to identify a feature selection method suited to the given machine learning task. Feature selection therefore plays an important role in intrusion detection, bioinformatics, and medical data analysis. This paper applies feature selection techniques to improve the predictive accuracy of learning algorithms on the microarray dataset and the KDD (Knowledge Discovery and Data Mining Tools Conference) Cup 99 dataset, each with its respective classification and feature selection algorithms. Using these two datasets, the approach illustrates feature selection in two settings: a large number of features with few samples, and a small number of features with many samples.
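The pipeline the abstract describes (select features first, then combine heterogeneous classifiers by vote) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' experimental setup: the toy data, the Fisher-style filter score, and the three simple voters (k-nearest neighbors, nearest centroid, and a one-feature threshold rule) stand in for the feature selection algorithms and for IBK, SVM, NBTree, and J48 studied in the paper.

```python
from collections import Counter
from statistics import mean, pvariance

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fisher_scores(X, y):
    """Filter score per feature: squared class-mean gap over summed class variance.
    Assumes exactly two classes (an illustrative simplification)."""
    first, second = sorted(set(y))
    rows_a = [r for r, label in zip(X, y) if label == first]
    rows_b = [r for r, label in zip(X, y) if label == second]
    scores = []
    for col_a, col_b in zip(zip(*rows_a), zip(*rows_b)):
        spread = pvariance(col_a) + pvariance(col_b) + 1e-12  # avoid division by zero
        scores.append((mean(col_a) - mean(col_b)) ** 2 / spread)
    return scores

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])

def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]

def knn_predict(X, y, x, k=3):
    """Majority label among the k nearest training rows."""
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def centroid_predict(X, y, x):
    """Label of the closest class centroid."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [r for r, l in zip(X, y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

def threshold_predict(X, y, x):
    """Label whose class mean is closest on the first selected feature only."""
    means = {}
    for label in sorted(set(y)):
        means[label] = mean(r[0] for r, l in zip(X, y) if l == label)
    return min(means, key=lambda label: abs(means[label] - x[0]))

def vote_predict(X, y, x, keep):
    """Majority vote of the three voters on the selected features."""
    Xk, xk = project(X, keep), [x[j] for j in keep]
    votes = [f(Xk, y, xk) for f in (knn_predict, centroid_predict, threshold_predict)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: features 0 and 2 separate the classes, 1 and 3 are noise.
X_train = [
    [1.0, 5.0, 0.9, 3.0],
    [1.2, 1.0, 1.1, 7.0],
    [0.8, 9.0, 1.0, 5.0],
    [3.0, 4.0, 3.1, 4.0],
    [3.2, 8.0, 2.9, 6.0],
    [2.8, 2.0, 3.0, 2.0],
]
y_train = ["normal", "normal", "normal", "attack", "attack", "attack"]

keep = select_top_k(X_train, y_train, k=2)
print(keep)
print(vote_predict(X_train, y_train, [1.1, 6.0, 1.0, 4.0], keep))
```

An odd number of voters on a two-class problem avoids tied votes. A real comparison would replace these voters with the classifiers named in the abstract and evaluate them on held-out data.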

Keywords
High dimensional data, feature selection algorithm, heterogeneous classifier, feature selection.