Diagnosing Diabetes Onset with Machine Learning Enhanced Predictive Analysis in Pima Indians
© 2025 by IJETT Journal
Volume-73 Issue-8
Year of Publication : 2025
Author : Abhishek Kumar, Partha Sarathi Bishnu, Pushpanjali R. Ojha
DOI : 10.14445/22315381/IJETT-V73I8P127
How to Cite?
Abhishek Kumar, Partha Sarathi Bishnu, Pushpanjali R. Ojha, "Diagnosing Diabetes Onset with Machine Learning Enhanced Predictive Analysis in Pima Indians", International Journal of Engineering Trends and Technology, vol. 73, no. 8, pp. 312-332, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I8P127
Abstract
As diabetes becomes more common, early and precise prediction is essential for prevention and management. This study predicts diabetes mellitus in the Pima Indian community using Deep Learning (DL) and advanced Machine Learning (ML) techniques. Careful data preprocessing significantly improved model performance: imputation of missing values, data balancing, feature engineering, and statistical methods for handling incomplete data and identifying key features. The models employed include a multilayer deep learning model, the Tree-based Pipeline Optimization Tool (TPOT), and ensemble learning using LightGBM and K-Nearest Neighbors (KNN). After rigorous evaluation and refinement, each model attained accuracy, precision, recall, and F1 scores of roughly 94.5%. Comprehensive experiments were conducted, and the results were analyzed graphically and numerically, yielding detailed insights and recommendations. The proposed approach outperforms existing state-of-the-art techniques, demonstrating its efficacy and underscoring the role of prompt and precise prediction in preventing and treating diabetes in high-risk populations.
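The four metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the sample labels are all hypothetical, and the standard binary-classification definitions are assumed, with the diabetic class treated as positive.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels invented for illustration (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On balanced data the four scores tend to sit close together. On imbalanced data, accuracy can stay high while recall on the minority class drops, which is one reason to report all four.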
Keywords
Deep Learning, Diabetes prediction, Feature engineering, Machine Learning, Medical informatics.