Compound Feature Generation And Boosting Model For Cancer Gene Classification

S. Jafar Ali Ibrahim; A. Mohamed Affir; M. Thangamani; S. Nallusamy

DOI: https://doi.org/10.14445/22315381/IJETT-V68I10P208

Research Article | Open Access

Volume 68 | Issue 10 | Year 2020 | Article Id. IJETT-V68I10P208

Citation:

S. Jafar Ali Ibrahim, A. Mohamed Affir, M. Thangamani, S. Nallusamy, "Compound Feature Generation And Boosting Model For Cancer Gene Classification," International Journal of Engineering Trends and Technology (IJETT), vol. 68, no. 10, pp. 48-51, 2020. https://doi.org/10.14445/22315381/IJETT-V68I10P208

Abstract

Large-scale data processing applications are built with data mining or deep learning approaches, and in such systems computational complexity is the key problem: high-dimensional data analysis requires immense computing time and computing resources. Dimensionality reduction methods are therefore applied to improve visualization, optimize the data, eliminate noise, and improve comprehensibility and generalization, and they control the resulting data output. In high-dimensional settings, feature selection models are used to reduce complexity. During feature selection, feature subsets are filtered according to their significance using statistical techniques: the T-test identifies poorly performing features, F-test models remove unnecessary features, and Q-statistics are added to test the features. A boosting algorithm is used to enhance the features, and the Naïve Bayes algorithm is used for classification. Relevant features are identified with filter methods, and feature extraction is applied to the microarray data values to capture complex properties. Compound feature generation integrates feature selection with feature extraction, introducing several percentage-based attribute associations to combine features. The boosting approach is combined with compound feature generation, and classification is performed with the Naïve Bayes algorithm on the generated feature values.
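The abstract names the pipeline stages but not their exact formulas, so the following is only a minimal sketch of one possible reading, written with NumPy. It assumes two classes labelled 0/1, a Welch T-test filter that drops the weakest features, a one-way ANOVA F-test filter that keeps the best of the survivors, PCA as the feature-extraction step, a hypothetical "compound" feature set formed by concatenating the selected genes with their PCA projections, and discrete AdaBoost with a weighted Gaussian Naïve Bayes base learner. The Q-statistic step and the percentage-based attribute associations are not modeled because the abstract does not define them. All function names and parameters (`drop_frac`, `k`, `m`, `rounds`) are illustrative, and the data is synthetic.

```python
import numpy as np

def welch_t(X, y):
    """Absolute Welch t-statistic of each feature for two classes labelled 0/1."""
    a, b = X[y == 0], X[y == 1]
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (den + 1e-12)

def anova_f(X, y):
    """One-way ANOVA F-statistic of each feature across the classes in y."""
    classes, grand = np.unique(y), X.mean(axis=0)
    ss_b = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    ss_w = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    return (ss_b / (len(classes) - 1)) / (ss_w / (len(y) - len(classes)) + 1e-12)

def select_features(X, y, drop_frac=0.5, k=10):
    """Assumed two-stage filter: drop the weakest features by t, keep the top k by F."""
    keep = np.argsort(welch_t(X, y))[int(drop_frac * X.shape[1]):]
    f = anova_f(X[:, keep], y)
    return keep[np.argsort(f)[::-1][:k]]  # ordered from highest to lowest F

def fit_pca(S, m):
    """Mean and top-m principal axes of S, computed with an SVD."""
    mu = S.mean(axis=0)
    _, _, vt = np.linalg.svd(S - mu, full_matrices=False)
    return mu, vt[:m]

def compound(X, idx, mu, axes):
    """Hypothetical compound features: selected genes plus their PCA projections."""
    S = X[:, idx]
    return np.hstack([S, (S - mu) @ axes.T])

class WeightedGaussianNB:
    """Gaussian Naive Bayes whose priors, means and variances use sample weights."""
    def fit(self, X, y, w):
        self.classes, self.params = np.unique(y), []
        for c in self.classes:
            Xc, wc = X[y == c], w[y == c]
            mu = np.average(Xc, axis=0, weights=wc)
            var = np.average((Xc - mu) ** 2, axis=0, weights=wc) + 1e-6
            self.params.append((wc.sum() / w.sum(), mu, var))
        return self

    def predict(self, X):
        ll = [np.log(p) - 0.5 * np.sum(np.log(2 * np.pi * v) + (X - mu) ** 2 / v, axis=1)
              for p, mu, v in self.params]
        return self.classes[np.argmax(np.array(ll), axis=0)]

def adaboost_nb(X, y, rounds=10):
    """Discrete AdaBoost for labels 0/1 with weighted Gaussian NB as the base learner."""
    w = np.full(len(y), 1.0 / len(y))
    models, alphas = [], []
    for _ in range(rounds):
        clf = WeightedGaussianNB().fit(X, y, w)
        miss = clf.predict(X) != y
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:  # base learner no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(np.where(miss, alpha, -alpha))  # up-weight misclassified samples
        w /= w.sum()
        models.append(clf)
        alphas.append(alpha)
    return models, alphas

def predict_boosted(models, alphas, X):
    """Weighted vote of the boosted models, mapped back to labels 0/1."""
    score = np.zeros(len(X))
    for m, a in zip(models, alphas):
        score += a * np.where(m.predict(X) == 1, 1.0, -1.0)
    return (score > 0).astype(int)

# Synthetic stand-in for microarray data: 80 samples, 200 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))
y = np.repeat([0, 1], 40)
X[y == 1, :5] += 2.0  # only the first 5 genes carry class signal
idx = select_features(X, y, drop_frac=0.5, k=10)
mu, axes = fit_pca(X[:, idx], m=3)
Z = compound(X, idx, mu, axes)
models, alphas = adaboost_nb(Z, y, rounds=10)
acc = (predict_boosted(models, alphas, Z) == y).mean()
print("top genes:", idx[:5], "compound shape:", Z.shape, "train accuracy:", round(acc, 2))
```

The accuracy printed here is training accuracy on synthetic data; evaluating a model on real microarray data would require held-out samples or cross-validation. Concatenating raw genes with their PCA projections also duplicates information, which violates the Naïve Bayes independence assumption, so this is a structural illustration rather than a tuned design.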

Keywords

High Dimensional Data Classification, Feature Selection, Feature Extraction, Feature Generation, Naïve Bayesian Classifier
