Data Mining Based Imputation Techniques to Handle Missing Values in Gene Expressed Dataset

Amarjeet Yadav; Aditya Dubey; Akhtar Rasool; Nilay Khare

DOI: https://doi.org/10.14445/22315381/IJETT-V69I9P229

Research Article | Open Access

Volume 69 | Issue 9 | Year 2021 | Article Id. IJETT-V69I9P229

Citation:

Amarjeet Yadav, Aditya Dubey, Akhtar Rasool, Nilay Khare, "Data Mining Based Imputation Techniques to Handle Missing Values in Gene Expressed Dataset," International Journal of Engineering Trends and Technology (IJETT), vol. 69, no. 9, pp. 242-250, 2021. https://doi.org/10.14445/22315381/IJETT-V69I9P229

Abstract

Microarray analysis produces datasets in which the rows hold the expression levels of a large number of genes and the columns correspond to different laboratory conditions. Because of experimental errors, these datasets frequently contain missing entries. Missing values significantly reduce the efficiency and accuracy of downstream analysis and can affect the outcome of visualization studies of gene expression. Accurately predicting the missing entries is therefore essential for examining the underlying structure of the data, and missing data imputation has attracted considerable attention from researchers. This paper summarizes most of the techniques proposed for imputing missing data. It discusses in detail the advantages and disadvantages of global, local, hybrid, and knowledge-assisted approaches. It also describes the three mechanisms used to characterize missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Finally, the article compares the methods reviewed to provide a better understanding of these techniques.
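To make the contrast between simple and local imputation concrete, here is a minimal Python sketch. It is not the authors' code. It assumes genes are rows, conditions are columns, and `None` marks a missing entry. The sketch hides entries completely at random (the MCAR mechanism), fills gaps with a per-gene row mean as a baseline, and applies a local k-nearest-neighbour estimate in the spirit of KNNimpute. The function names (`mask_mcar`, `row_mean_impute`, `rms_distance`, `knn_impute`), the root-mean-square distance over shared columns, the inverse-distance weighting, and the toy matrix are all illustrative assumptions.

```python
import math
import random
from typing import List, Optional

# Rows = genes, columns = conditions; None marks a missing value (assumed layout).
Matrix = List[List[Optional[float]]]


def mask_mcar(data: List[List[float]], rate: float, seed: int = 0) -> Matrix:
    """Hide each entry independently with probability `rate`.

    The chance of being missing depends on no value at all, which is
    what MCAR (missing completely at random) means.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in data]


def row_mean_impute(data: Matrix) -> List[List[float]]:
    """Baseline: fill each gap with the mean of the observed values of that gene."""
    out = []
    for row in data:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        out.append([mean if v is None else v for v in row])
    return out


def rms_distance(a: List[Optional[float]], b: List[Optional[float]]) -> Optional[float]:
    """Root-mean-square difference over columns observed in both genes (assumed metric)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(data: Matrix, k: int = 2) -> List[List[float]]:
    """Local approach: estimate a missing entry (g, j) from the k genes closest to g
    that have column j observed, weighting each neighbour by 1 / distance."""
    out = [list(r) for r in row_mean_impute(data)]  # row mean is the fallback
    for g, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for h, other in enumerate(data):
                if h == g or other[j] is None:
                    continue
                d = rms_distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no usable neighbour: keep the row-mean fallback
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            out[g][j] = sum(w * val for w, (_, val) in zip(weights, nearest)) / sum(weights)
    return out


# Toy matrix: genes 0-2 rise across conditions, genes 3-4 fall.
TRUTH = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.9, 3.1, 3.8],
    [4.0, 3.0, 2.0, 1.0],
    [4.1, 2.9, 2.1, 0.9],
]

masked = [row[:] for row in TRUTH]
masked[1][2] = None  # hide one value whose true level is 2.9
print(round(row_mean_impute(masked)[1][2], 3))  # 2.467
print(round(knn_impute(masked, k=2)[1][2], 3))  # 3.033
```

In this toy case the neighbour-based estimate draws on genes 0 and 2, which share gene 1's rising pattern, so it lands closer to the hidden value than the row mean, which ignores that correlation structure. On real data the choice of k and of the distance measure would need validating, for example by masking known entries with `mask_mcar` and measuring the error.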

Keywords

Correlation Structure, Gene Expression Data, Imputation, Missing Value.