A Systematic Ensemble Approach for Concept Drift Detector Selection in Data Stream Classifiers

International Journal of Engineering Trends and Technology (IJETT)
© 2022 by IJETT Journal
Volume-70 Issue-9
Year of Publication : 2022
Authors : Rucha Chetan Samant, Suhas H. Patil, Rahul Nand Sinha, Amol K. Kadam
DOI : 10.14445/22315381/IJETT-V70I9P212

How to Cite?

Rucha Chetan Samant, Suhas H. Patil, Rahul Nand Sinha, Amol K. Kadam, "A Systematic Ensemble Approach for Concept Drift Detector Selection in Data Stream Classifiers," International Journal of Engineering Trends and Technology, vol. 70, no. 9, pp. 119-130, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I9P212

In the Big Data world, most applications generate data as streams. Mining these data streams is necessary to extract meaningful information from such large volumes of data. To succeed in this field of analytics, traditional classification, clustering, and aggregation techniques must be improved. Ensemble-based classifiers built with bagging, boosting, or hybrid methods have outperformed traditional single classifiers, and the ensemble concept has been shown to improve both classifier accuracy and design diversity. At the same time, using a drift detector to address the concept drift problem of data streams has yielded promising results. The primary goal of the proposed system is to provide a methodology for selecting an appropriate drift detector for an effective ensemble classifier by combining a cutting-edge base ensemble classifier with standard drift detectors. This paper also examines a proposed boosting ensemble strategy with several drift detectors to determine the most effective combination for handling all types of concept drift. The results and analysis presented here are expected to be useful for selecting the proper drift detector parameters and designing robust ensemble classifiers.
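The selection procedure described above (pair one ensemble with several candidate drift detectors and compare the outcomes) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' method: it uses Oza-style online bagging rather than the paper's proposed boosting strategy, toy per-bin majority-vote members, two standard detectors picked only for illustration (DDM of Gama et al. and the Page-Hinkley test), a synthetic stream with one abrupt label flip, and a "rebuild every member" reaction on drift. All names (`DDM`, `PageHinkley`, `OnlineBagging`, `prequential`) and all parameter values are hypothetical. Detectors are compared by prequential (test-then-train) accuracy and by when they fire.

```python
# Hypothetical sketch: toy stream, toy ensemble members, illustrative thresholds.
import math
import random


class DDM:
    """Drift Detection Method (Gama et al., 2004) on a 0/1 error stream."""
    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n, self.p, self.p_min, self.s_min = 0, 1.0, math.inf, math.inf

    def update(self, error):  # error: 1 = misclassified, 0 = correct
        self.n += 1
        self.p += (error - self.p) / self.n  # running error rate
        s = math.sqrt(self.p * (1 - self.p) / self.n)  # its standard deviation
        if self.n < self.min_instances:
            return False
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s  # lowest error level seen so far
        if self.p + s > self.p_min + 3 * self.s_min:  # drift level (warning would use 2)
            self.reset()
            return True
        return False


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean error."""
    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta, self.threshold, self.min_instances = delta, threshold, min_instances
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def update(self, error):
        self.n += 1
        self.mean += (error - self.mean) / self.n
        self.cum += error - self.mean - self.delta  # cumulative deviation above the mean
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_instances and self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


def poisson1(rng):  # Knuth's multiplication method for Poisson(lambda=1)
    limit, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


class OnlineBagging:
    """Online bagging over toy members: each member is a table of per-bin class counts."""
    def __init__(self, size, rng, bins=10):
        self.size, self.rng, self.bins = size, rng, bins
        self.reset()

    def reset(self):
        self.members = [[[0, 0] for _ in range(self.bins)] for _ in range(self.size)]

    def predict(self, x):
        b = min(int(x * self.bins), self.bins - 1)
        votes = sum(1 for m in self.members if m[b][1] > m[b][0])
        return 1 if 2 * votes > self.size else 0  # majority vote

    def learn(self, x, y):
        b = min(int(x * self.bins), self.bins - 1)
        for m in self.members:
            m[b][y] += poisson1(self.rng)  # example weight k ~ Poisson(1)


def prequential(detector, n=4000, drift_at=2000, noise=0.1, seed=1):
    """Test-then-train accuracy on a stream whose concept flips at drift_at."""
    rng = random.Random(seed)
    model, correct, detections = OnlineBagging(5, rng), 0, []
    for t in range(n):
        x = rng.random()
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)  # abrupt drift
        if rng.random() < noise:
            y = 1 - y  # label noise
        error = int(model.predict(x) != y)  # test first...
        correct += 1 - error
        if detector is not None and detector.update(error):
            detections.append(t)
            model.reset()  # simplest reaction: rebuild every member
        model.learn(x, y)  # ...then train
    return correct / n, detections


if __name__ == "__main__":
    for name, det in [("none", None), ("DDM", DDM()), ("Page-Hinkley", PageHinkley())]:
        acc, hits = prequential(det)
        print(f"{name:13s} accuracy={acc:.3f} detections={hits}")
```

Because the stream is seeded and the random draws do not depend on model state, every detector sees the same examples, so the printed accuracies and detection times are directly comparable. Trying another detector only requires an object with the same `update(error) -> bool` method; a real study would also vary detector parameters and drift types (gradual, incremental, recurring) rather than the single abrupt flip assumed here.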

Concept Drift, Data Stream Mining, Drift Detector, Ensemble-based Learning, Real-time Data Analysis.
