Quantile Regression and Machine Learning based hybrid approach for Outlier Detection in Multivariate Time Series data

© 2022 by IJETT Journal
Volume-70 Issue-6
Year of Publication : 2022
Authors : Dharmendra Patel, Pranav Vyas, Arpit Trivedi, Tushar Mehta, Kanubhai K Patel, Sanskruti Patel, Hardik Rajgor
DOI : 10.14445/22315381/IJETT-V70I6P221

How to Cite?

Dharmendra Patel, Pranav Vyas, Arpit Trivedi, Tushar Mehta, Kanubhai K Patel, Sanskruti Patel, Hardik Rajgor, "Quantile Regression and Machine Learning based hybrid approach for Outlier Detection in Multivariate Time Series data," International Journal of Engineering Trends and Technology, vol. 70, no. 6, pp. 185-194, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I6P221

Abstract
Outliers in multivariate time series data can be detected with either univariate or multivariate techniques. Univariate approaches are difficult to apply because they require prior adjustments. Multivariate approaches, by contrast, require no prior adjustments and find outliers directly in the original data. Most multivariate approaches rely on dimensionality reduction to find outliers or anomalies. The most common dimensionality reduction technique is Principal Component Analysis (PCA), which is widely used in the literature to discover outliers and anomalies. However, PCA has several disadvantages: it is less interpretable, it requires feature scaling before use, and it loses information. This research proposes a new algorithm that uses a hybrid approach of Quantile Regression and Machine Learning to find outliers in multivariate time series data. The algorithm is compared with two well-known techniques, PCA and ordinary least squares regression (OLSR). The experimental results show that the proposed algorithm is simple and effective, and that it retains all information while detecting outliers.
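
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea of using quantile regression to flag outliers in one series given the others. It is not the authors' method. The function names `fit_quantile` and `flag_outliers`, the 0.25/0.75 quantiles, the 1.5 fence multiplier, the synthetic data, and the iteratively reweighted least squares solver are all assumptions made for illustration.

```python
import numpy as np

def fit_quantile(X, y, tau, n_iter=100, eps=1e-6):
    """Linear quantile regression for quantile tau, solved approximately
    by iteratively reweighted least squares (an illustrative solver choice)."""
    A = np.column_stack([np.ones(len(y)), X])        # add intercept column
    beta = np.linalg.lstsq(A, y, rcond=None)[0]      # start from the OLS fit
    for _ in range(n_iter):
        r = y - A @ beta
        # Weights that turn squared error into the asymmetric absolute (pinball) loss.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        beta = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))
    return beta

def flag_outliers(X, y, lo=0.25, hi=0.75, k=1.5):
    """Flag points outside the conditional quartile band widened by k band widths."""
    A = np.column_stack([np.ones(len(y)), X])
    q_lo = A @ fit_quantile(X, y, lo)
    q_hi = A @ fit_quantile(X, y, hi)
    width = np.maximum(q_hi - q_lo, 0.0)             # guard against crossing quantile lines
    return (y < q_lo - k * width) | (y > q_hi + k * width)

# Synthetic three-variable series (assumed data): y depends linearly on x1 and x2.
rng = np.random.default_rng(0)
t = np.arange(300)
x1 = np.sin(2 * np.pi * t / 50)
x2 = np.cos(2 * np.pi * t / 30)
y = 2 * x1 - x2 + rng.normal(0, 0.1, size=300)
y[50] += 5.0                                         # injected outliers
y[200] -= 5.0

flags = flag_outliers(np.column_stack([x1, x2]), y)
print(np.flatnonzero(flags))                         # indices of flagged time steps
```

Because the band is fitted on the original variables, no scaling or projection step is needed, which is the kind of contrast with PCA that the abstract draws. A few ordinary points may also fall outside the fence, so the flags are candidates for review rather than confirmed outliers.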

Keywords
Outlier, Anomaly, Univariate, Multivariate, Quantile Regression, Principal Component Analysis (PCA), Ordinary Least Squares Regression (OLSR).
