Bidirectional Recurrence Neural Network Imputation For Recovering Missing Daily Streamflow Data
How to Cite?
Fatimah Bibi Hamzah, Firdaus Mohd Hamzah, Siti Fatin Mohd Razali, Juanita Zainudin, "Bidirectional Recurrence Neural Network Imputation For Recovering Missing Daily Streamflow Data," International Journal of Engineering Trends and Technology, vol. 69, no. 8, pp. 1-10, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I8P201
Missing values are common in hydrological research, and interest in recovering missing streamflow data is growing because accurate information is required for various purposes. Given the limitations that missing data impose, this study evaluates the performance of an RNN-based imputation method against non-RNN-based imputation methods for predicting recurrence in a streamflow dataset. Daily streamflow datasets from Malaysia's Langat River Basin were used. The imputed datasets were then fed into a Multiple Linear Regression (MLR) model. The best estimation method was validated on estimation error, using measures such as the Nash-Sutcliffe Efficiency Coefficient (CE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). The findings revealed that the RNN-based method coupled with MLR (BRNN-MLR) outperformed all the other approaches examined for filling missing values in streamflow datasets, with the highest CE value and the lowest MAPE and RMSE values regardless of the missing-data condition.
Keywords: BRNN, imputation, MICE, missing data, streamflow, MLR.
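The three error measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of CE, MAPE, and RMSE, since the exact formulas used in the study are not given here. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, estimated):
    """Root Mean Squared Error: square root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mape(observed, estimated):
    """Mean Absolute Percentage Error, in percent.

    Assumes no observed value is zero, because each error is divided
    by the corresponding observation.
    """
    n = len(observed)
    return 100.0 / n * sum(abs((o - e) / o) for o, e in zip(observed, estimated))


def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe Efficiency Coefficient (CE).

    1.0 means a perfect match; 0.0 means the estimates are no better
    than using the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical daily flows: values that were hidden, and their imputed estimates.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

print("RMSE:", rmse(observed, estimated))
print("MAPE (%):", mape(observed, estimated))
print("CE:", nash_sutcliffe(observed, estimated))
```

Under these definitions a better imputation method gives a CE closer to 1 and smaller MAPE and RMSE, which is the sense in which the abstract reports the highest CE and the lowest MAPE and RMSE for BRNN-MLR.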