Data Preprocessing Techniques for Handling Time Series data for Environmental Science Studies
How to Cite?
Ebin Antony, N S Sreekanth, R K Sunil Kumar, Nishanth T, "Data Preprocessing Techniques for Handling Time Series data for Environmental Science Studies," International Journal of Engineering Trends and Technology, vol. 69, no. 5, pp. 196-207, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I5P227
Abstract
This article discusses preprocessing techniques suitable for time series data in environmental science studies. It considers errors and noise caused by electronic sensor faults, faults in the communication channel, and similar sources. Such errors or glitches, which occur during the data acquisition or transmission phases, need to be eliminated before the data are fed to forecasting or classification systems. Computationally simple and efficient techniques are discussed so that they can be adopted even in a hard real-time system environment. When these techniques are applied, some genuine values may be wrongly treated as outliers. A special indicator function, the moving Inter Quartile Range (MIQR) algorithm, is proposed to handle such cases.
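The abstract does not give the MIQR formulation, so the following is only a generic sketch of a moving interquartile-range check, not the authors' algorithm. It assumes a centred window, the conventional 1.5 x IQR fences, and that the point under test is excluded from its own window; the function name `moving_iqr_flags` and the parameters `window` and `k` are hypothetical.

```python
from statistics import quantiles


def moving_iqr_flags(values, window=5, k=1.5):
    """Flag points lying outside IQR fences built from their local neighbours.

    Assumed sketch: the window size, the multiplier k and the exclusion of
    the tested point are illustrative choices, not taken from the article.
    """
    half = window // 2
    flags = []
    for i, v in enumerate(values):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        # Neighbours inside the window, leaving out the point being tested.
        neighbours = values[lo:i] + values[i + 1:hi]
        if len(neighbours) < 2:
            # Too few points to estimate quartiles; leave unflagged.
            flags.append(False)
            continue
        q1, _, q3 = quantiles(neighbours, n=4, method="inclusive")
        iqr = q3 - q1
        flags.append(v < q1 - k * iqr or v > q3 + k * iqr)
    return flags


# Hypothetical sensor readings with one spike at index 3.
readings = [20.1, 20.3, 20.2, 95.0, 20.4, 20.2, 20.5]
flags = moving_iqr_flags(readings)
```

Because the fences are recomputed for every window, a value is judged against its local neighbourhood rather than the whole record. Near the ends of the series the window holds few points, so flags there are less reliable.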
Keywords
Time Series Analysis, Data Preprocessing, Moving Inter Quartile Range, Environmental Science, Data Science