Comparative Analysis of Various Tree Classifier Algorithms for Disease Datasets

© 2021 by IJETT Journal
Volume-69 Issue-6
Year of Publication : 2021
Authors : Sajithra N, Dr. D. Ramyachitra
DOI : 10.14445/22315381/IJETT-V69I6P202

How to Cite?

Sajithra N, Dr. D. Ramyachitra, "Comparative Analysis of Various Tree Classifier Algorithms for Disease Datasets," International Journal of Engineering Trends and Technology, vol. 69, no. 6, pp. 8-13, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I6P202

Abstract
Tree-based classification is a widely used white-box classification technique. It aims to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. This research work analyzes the performance of five tree-based classification algorithms, namely Decision Stump, J48, Logistic Model Trees (LMT), Random Forest, and REPTree. Various disease datasets, such as breast cancer, Pima diabetes, and hypothyroid, are used to evaluate the classification algorithms by applying 10-fold cross-validation with respect to the given class label. Finally, a comparative analysis is carried out using classification accuracy, kappa value, performance factors, and error rate measures for all the algorithms. The experimental outcomes show that LMT provides better results on all the disease datasets than the other algorithms, namely Decision Stump, J48, Random Forest, and REPTree.
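The evaluation described above rests on 10-fold cross-validation together with classification accuracy and the kappa statistic. As a minimal illustration (this is not the authors' code; the classifiers named in the abstract are typically run in the WEKA toolkit, which reports these measures directly), the sketch below shows how the fold indices and the two headline metrics can be computed in plain Python:

```python
def kfold_indices(n, k=10):
    """Split range(n) into k contiguous folds of near-equal size,
    as used in k-fold cross-validation."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is expected chance agreement."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_o = accuracy(y_true, y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

In a full experiment, each of the ten folds would in turn serve as the test set while a classifier is trained on the remaining nine, and the metrics above would be averaged across folds; a kappa near 1 indicates agreement well beyond chance, while a kappa near 0 indicates chance-level prediction.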

Keywords
Decision Stump, J48, LMT, Random Forest, REPTree.

References
[1] Zhou Jian et al., Masquerade detection by boosting decision stumps using UNIX commands, Computers & Security, 26(4) (2007) 311-318.
[2] S. Kokilavani Sankaralingam, N. Sathishkumar Nagarajan, A.S. Narmadha, Energy-aware decision stump linear programming boosting node classification based data aggregation in WSN, Computer Communications, 155(1) (2020) 133-142.
[3] Niels Landwehr, Mark Hall, Eibe Frank, Logistic model trees, Machine Learning: ECML 2003, LNCS 2837 (2003) 241-252.
[4] Watshara Shoombuatong, Sayamon Hongjaisee, Francis Barin, Jeerayut Chaijaruwanich, HIV-1 CRF01_AE coreceptor usage prediction using kernel methods based logistic model trees, Computers in Biology and Medicine, 42(9) (2012), http://dx.doi.org/10.1016/j.compbiomed.2012.06.011.
[5] Andrew Daly, Estimating "tree" logit models, Transportation Research Part B: Methodological, 21(4) (1987) 251-267, http://dx.doi.org/10.1016/0191-2615(87)90026-9.
[6] Mohmad Badr Al Snousy, Hesham Mohamed El-Deeb, Khaled Badran, Ibrahim Ali Al Khlil, Suite of decision tree-based classification algorithms on cancer gene expression data, Egyptian Informatics Journal, 12 (2011) 73-82, http://dx.doi.org/10.1016/j.eij.2011.04.003.
[7] Mu-Yen Chen, Predicting corporate financial distress based on the integration of decision tree classification and logistic regression, Expert Systems with Applications, 38(9) (2011) 11261-11272.
[8] Lakshmi Devasena C, Proficiency comparison of LADTree and REPTree classifiers for credit risk forecast, International Journal on Computational Sciences & Applications (IJCSA), 5(1) (2015) 39-50.
[9] A. Franco-Arcega, J.A. Carrasco-Ochoa, G. Sánchez-Díaz, J.Fco. Martínez-Trinidad, Decision tree induction using a fast splitting attribute selection for large datasets, Expert Systems with Applications, 38(11) (2011) 14290-14300.
[10] S. Aljawarneh, M.B. Yassein, M. Aljundi, An enhanced J48 classification algorithm for the anomaly intrusion detection systems, Cluster Computing, 22 (2019) 10549-10565.
[11] Ranjit Panigrahi, Samarjeet Borah, Rank allocation to J48 group of decision tree classifiers using binary and multiclass intrusion detection datasets, Procedia Computer Science, 132 (2018) 323-332, http://dx.doi.org/10.1016/j.procs.2018.05.186.
[12] Kellie J. Archer, Ryan V. Kimes, Empirical characterization of random forest variable importance measures, Computational Statistics & Data Analysis, 52(4) (2008), http://dx.doi.org/10.1016/j.csda.2007.08.015.
[13] Lidia Auret, Chris Aldrich, Empirical comparison of tree ensemble variable importance measures, Chemometrics and Intelligent Laboratory Systems, 105 (2011) 157-170.
[14] Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Variable selection using random forests, Pattern Recognition Letters, 31 (2010) 2225-2236.
[15] Robert C. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993).
[16] N. Landwehr, M. Hall, E. Frank, Logistic model trees, Machine Learning, 59(1-2) (2005) 161-205.
[17] L. Breiman, Random forests, Machine Learning, 45(1) (2001) 5-32.