An Unsupervised Deep Feature Selection and Ensemble Deep Learning Model for Cancer Classification

International Journal of Engineering Trends and Technology (IJETT)
© 2022 by IJETT Journal
Volume-70 Issue-9
Year of Publication : 2022
Authors : K. Prema, A. Kumar Kombaiya
DOI : 10.14445/22315381/IJETT-V70I9P203

How to Cite?

K. Prema, A. Kumar Kombaiya, "An Unsupervised Deep Feature Selection and Ensemble Deep Learning Model for Cancer Classification," International Journal of Engineering Trends and Technology, vol. 70, no. 9, pp. 20-33, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I9P203

Microarray technology was first introduced and demonstrated with antibody microarrays in a series of patents. Within a single experiment, Microarray Data Analysis (MDA) is used to identify the expression patterns of thousands of genes. Microarray data comprise a large volume of gene expression measurements that can be used to detect cancer. However, class imbalance in microarray gene datasets and the choice of initial parameter values for the classifier lead to over-fitting and under-fitting in cancer classification. To overcome these difficulties, this article designs a stacking ensemble of deep cluster-based Deep Learning (DL) models for cancer classification, which combines several learning models into a single predictive model. The developed model has three stages. First, a Modified Harmony Search Algorithm combined with Modified Kernel-based Fuzzy C-Means (MHSAMKFC) is developed to eliminate the large number of redundant features. Second, MHSAMKFC is paired with a Convolutional Neural Network (CNN) classifier to handle uncertainties in the labelled training dataset and so improve classifier performance. Third, the over-fitting and under-fitting of MHSAMKFC-CNN are reduced by an ensemble method, which uses multiple learning models to provide better prediction accuracy. The complete process is termed En-MHSAMKFC-CNN. Finally, experiments on four Gene Expression Microarray (GEM) datasets verify that En-MHSAMKFC-CNN improves on the classification performance of SVM, KNN, RF and ANN classifiers.
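To make the clustering idea behind the first stage more concrete, the following is a minimal sketch of standard kernel-based fuzzy c-means with a Gaussian kernel. It is not the authors' modified variant and it omits the harmony search step entirely. The function names, the farthest-point initialisation, the toy data and the rule of keeping the highest-membership gene per cluster are all assumptions made for illustration only.

```python
import math


def sq_dist(x, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    return math.exp(-sq_dist(x, v) / sigma ** 2)


def farthest_point_init(points, n_clusters):
    """Deterministic initialisation (an assumption of this sketch): start at
    the first point, then repeatedly add the point farthest from all centers."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def update_memberships(points, centers, m, sigma, eps=1e-12):
    """u[k][i] is proportional to (1 - K(x_k, v_i))^(-1/(m-1)); rows sum to 1.
    eps avoids division by zero when a point coincides with a center."""
    rows = []
    for x in points:
        w = [(1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
             for v in centers]
        total = sum(w)
        rows.append([wi / total for wi in w])
    return rows


def kernel_fcm(points, n_clusters, m=2.0, sigma=1.0, n_iter=50):
    """Standard kernel fuzzy c-means with cluster centers kept in input space.
    Returns (centers, memberships)."""
    centers = farthest_point_init(points, n_clusters)
    dim = len(points[0])
    for _ in range(n_iter):
        u = update_memberships(points, centers, m, sigma)
        for i in range(n_clusters):
            # Each point is weighted by u^m times its kernel value to the center.
            coeffs = [(u[k][i] ** m) * gaussian_kernel(x, centers[i], sigma)
                      for k, x in enumerate(points)]
            denom = sum(coeffs)
            if denom > 0.0:
                centers[i] = [sum(c * x[d] for c, x in zip(coeffs, points)) / denom
                              for d in range(dim)]
    # Recompute memberships so they match the final centers.
    return centers, update_memberships(points, centers, m, sigma)


def pick_representatives(memberships):
    """Hypothetical selection rule: for each cluster, keep the index of the
    item with the highest membership as that cluster's representative."""
    n_clusters = len(memberships[0])
    return [max(range(len(memberships)), key=lambda k: memberships[k][i])
            for i in range(n_clusters)]


# Toy data: each row stands in for one gene's expression over two samples.
genes = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centers, memberships = kernel_fcm(genes, n_clusters=2, sigma=3.0)
print("representative gene indices:", pick_representatives(memberships))
```

In this sketch, genes with similar expression profiles receive high membership in the same cluster, and keeping one representative per cluster is one simple way to discard redundant features. Kernel fuzzy c-means is sensitive to initialisation and to the kernel width sigma, which is one plausible reason to pair it with a search heuristic such as harmony search.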

Microarray Data Analysis, Convolutional Neural Network, Fuzzy C-Means, Harmony Search Algorithm, Cancer Classification.
