Single and Multiple Imputation Techniques to Treat Missing Numerical Variables (MNV) in Perspectives of Data Science Project - A Case Study

© 2022 by IJETT Journal
Volume-70 Issue-5
Year of Publication: 2022
Authors: Dharmendra Patel, Octavio Loyola-González, Arpit Trivedi, Hardik Rajgor, Tushar Mehta, Sanskruti Patel, Pranav Vyas, Nilay Ganatra, Hardik I Patel
DOI: 10.14445/22315381/IJETT-V70I5P202

How to Cite?

Dharmendra Patel, Octavio Loyola-González, Arpit Trivedi, Hardik Rajgor, Tushar Mehta, Sanskruti Patel, Pranav Vyas, Nilay Ganatra, Hardik I Patel, "Single and Multiple Imputation Techniques to Treat Missing Numerical Variables (MNV) in Perspectives of Data Science Project - A Case Study," International Journal of Engineering Trends and Technology, vol. 70, no. 5, pp. 9-14, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I5P202

Abstract
Data science is used extensively across industrial domains to make sense of large volumes of data and to derive insights that support smarter business decisions. The quality of the data plays a vital role in insight generation, and it can be improved by imputing appropriate values in place of missing data, which makes imputation a critical step in a data science project. In this paper, we describe single and multiple imputation techniques for missing numerical variables, illustrating each with a suitable case. We explain the scenarios in which each imputation technique is appropriate for a data science project, and we present the results of applying the techniques to simple, meaningful examples.
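As a rough illustration of the difference between the two families of techniques named above, the following Python sketch contrasts mean imputation (a single imputation) with a heavily simplified multiple imputation. It is not the procedure used in the paper: the `ages` data, the function names, the normal-draw model, and the choice of m = 5 are all assumptions made for this example, and a full multiple imputation would also draw the model parameters themselves rather than fixing them at their observed estimates.

```python
import random
import statistics


def mean_impute(values):
    """Single imputation: replace every missing value (None) with the
    mean of the observed values, giving one completed data set."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def multiple_impute_mean(values, m=5, seed=0):
    """Simplified multiple imputation (illustrative assumption, see lead-in).

    Builds m completed copies of the data by drawing each missing value
    from a normal distribution fitted to the observed values, estimates
    the mean on each copy, and pools the m estimates by averaging them.
    Also returns the between-imputation variance of the estimates, which
    reflects the uncertainty caused by the missing values.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sigma = statistics.stdev(observed)

    estimates = []
    for _ in range(m):
        completed = [rng.gauss(mu, sigma) if v is None else v for v in values]
        estimates.append(statistics.fmean(completed))

    pooled = statistics.fmean(estimates)
    between_var = statistics.variance(estimates) if m > 1 else 0.0
    return pooled, between_var


# Hypothetical numerical variable with two missing entries.
ages = [23.0, 25.0, None, 31.0, None, 28.0]

filled = mean_impute(ages)
pooled_mean, between_var = multiple_impute_mean(ages, m=5, seed=0)

print("Mean-imputed data:", filled)
print("Pooled mean over 5 imputations:", pooled_mean)
print("Between-imputation variance:", between_var)
```

Mean imputation fills both gaps with the same value and so understates the variability of the variable, whereas the multiple-imputation sketch produces several plausible completed data sets and reports how much the estimate varies between them.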

Keywords
Single Imputation, Data Science, Numerical Variables, Missing Completely at Random (MCAR), Regression, Multiple Imputation.
