Student Performance Analysis using Bayesian Optimized Random Forest Classifier and KNN

© 2023 by IJETT Journal
Volume-71 Issue-5
Year of Publication : 2023
Author : Safira Begum, Sunita S Padmannavar
DOI : 10.14445/22315381/IJETT-V71I5P213

How to Cite?

Safira Begum, Sunita S Padmannavar, "Student Performance Analysis using Bayesian Optimized Random Forest Classifier and KNN," International Journal of Engineering Trends and Technology, vol. 71, no. 5, pp. 132-140, 2023. Crossref, https://doi.org/10.14445/22315381/IJETT-V71I5P213

Abstract
Educational Data Mining (EDM), an emerging interdisciplinary field that builds on several other disciplines, is growing in importance. It is closely tied to data mining (DM), a core step in knowledge discovery in databases (KDD). Educational data is expanding rapidly and may contain valuable hidden information for its users, both teachers and students. Such knowledge can be represented as models, patterns, or other schemes that enable better use of educational systems, and applying data mining to uncover it is what gave rise to EDM. Because the setting is complex, many approaches and learning algorithms are typically applied to obtain the best results. In recent times, educational systems have seen a surge in the use of artificial intelligence (AI), especially for extracting relevant information. EDM combines various techniques to support the capture, processing, and analysis of educational records. Its primary method is machine learning, which has been used in data processing for decades and has been applied more frequently since the emergence of big data to extract useful information from large volumes of data. EDM tools and algorithms can be used to assess student academic achievement. This study offers a new approach to predicting student success in middle school Portuguese and mathematics subjects. Hyperparameter tuning is essential to reduce the misclassification seen with conventional classifiers. To predict student performance on the UCI dataset, this work proposes Bayesian-optimized KNN and random forest classifiers. The attained accuracy is 87% for random forest and 73% for KNN.

Keywords
Bayesian Optimization, Educational Data Mining, KNN, RF, UCI.
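
As a rough illustration of the Bayesian hyperparameter tuning described in the abstract, the sketch below tunes the number of neighbours of a KNN classifier with a small hand-rolled Bayesian optimization loop (a Gaussian-process surrogate plus an expected-improvement acquisition). It is not the authors' implementation. The synthetic data from make_classification stands in for the UCI student records, and the function names (cv_accuracy, expected_improvement, bayes_opt_k), the search range for k, the fold count, and the iteration budget are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the UCI student records are not bundled here.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)


def cv_accuracy(k):
    """Mean 3-fold cross-validated accuracy of KNN with k neighbours."""
    model = KNeighborsClassifier(n_neighbors=int(k))
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement of each candidate over the best score so far."""
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt_k(k_min=1, k_max=40, n_init=5, n_iter=10, seed=0):
    """Search k in [k_min, k_max] using a GP surrogate and EI acquisition."""
    rng = np.random.default_rng(seed)
    candidates = np.arange(k_min, k_max + 1)
    # Start from a few randomly chosen values of k.
    tried = [int(k) for k in rng.choice(candidates, size=n_init, replace=False)]
    scores = [cv_accuracy(k) for k in tried]
    for _ in range(n_iter):
        # Fit the surrogate to every (k, accuracy) pair observed so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                      normalize_y=True, random_state=seed)
        gp.fit(np.array(tried, dtype=float).reshape(-1, 1), np.array(scores))
        # Score the untried values of k and evaluate the most promising one.
        untried = np.array([k for k in candidates if k not in tried])
        mu, sigma = gp.predict(untried.reshape(-1, 1).astype(float),
                               return_std=True)
        ei = expected_improvement(mu, sigma, max(scores))
        next_k = int(untried[np.argmax(ei)])
        tried.append(next_k)
        scores.append(cv_accuracy(next_k))
    best = int(np.argmax(scores))
    return tried[best], float(scores[best])


best_k, best_acc = bayes_opt_k()
print(f"best k = {best_k}, CV accuracy = {best_acc:.3f}")
```

The same loop could in principle tune a random forest by swapping the model inside cv_accuracy and searching over, for example, the number of trees. The accuracies it prints on synthetic data are not comparable to the 87% and 73% reported above.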
