Research Article | Open Access
Volume 74 | Issue 6 | Year 2026 | Article Id. IJETT-V74I6P120 | DOI : https://doi.org/10.14445/22315381/IJETT-V74I6P120
FastPedia-ML: An Interpretable Machine-Learning Framework for Pediatric Leukemia Subtype Classification using Gene-Expression Data
SAMUNDI R, VIJAYARANI J
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 28 Dec 2025 | 24 Mar 2026 | 28 Mar 2026 | 27 Jun 2026 |
Citation :
SAMUNDI R, VIJAYARANI J, "FastPedia-ML: An Interpretable Machine-Learning Framework for Pediatric Leukemia Subtype Classification using Gene-Expression Data," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 6, pp. 269-293, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I6P120
Abstract
Pediatric Acute Myeloid Leukemia (pAML) is a heterogeneous disease whose complex genomic variation makes accurate subclassification difficult. This paper proposes a robust and explainable machine-learning framework for classifying pediatric leukemia subtypes from high-dimensional microarray gene-expression data. The framework combines variance-based filtering with ANOVA-based feature selection to reduce dimensionality, and uses adaptive SMOTE to handle class imbalance. Nested cross-validation keeps hyperparameter optimization separate from model evaluation, so that the reported performance estimate is unbiased. Three classifiers, Random Forest (RF), Support Vector Machine (SVM), and XGBoost, are compared in terms of weighted F1-score, Matthews correlation coefficient (MCC), and ROC-AUC. On the GSE9476 dataset, RF and SVM achieve perfect classification (F1-score = 1.000), while XGBoost achieves competitive results (F1 = 0.919). Permutation testing indicates that the results are statistically significant (p = 0.001). SHAP-based analysis further identifies biologically meaningful genes associated with leukemia development. The proposed framework combines high predictive power, robustness, and interpretability, indicating its potential for use in precision-medicine diagnosis of pediatric leukemia.
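The evaluation quantities named in the abstract (weighted F1-score, MCC, and a permutation-test p-value) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names, the toy labels, and the count of 999 permutations are assumptions made for illustration, and the abstract does not state how many permutations were used.

```python
from collections import Counter
from math import sqrt


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true support."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = 0.0
    for k in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += support[k] * f1
    return total / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient in its multiclass form."""
    s = len(y_true)                                       # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = Counter(y_true)                                 # true count per class
    p_k = Counter(y_pred)                                 # predicted count per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denom = sqrt((s * s - sum(v * v for v in t_k.values()))
                 * (s * s - sum(v * v for v in p_k.values())))
    return numerator / denom if denom else 0.0


def permutation_p_value(observed, permuted_scores):
    """Share of label-permuted scores at least as good as the observed score,
    with the +1 correction that counts the observed labelling itself."""
    hits = sum(1 for score in permuted_scores if score >= observed)
    return (hits + 1) / (len(permuted_scores) + 1)


# Toy labels (hypothetical, not taken from GSE9476): one of four samples is misclassified.
y_true = ["AML", "AML", "healthy", "healthy"]
y_pred = ["AML", "healthy", "healthy", "healthy"]
f1 = weighted_f1(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)

# If none of 999 permuted runs (an assumed count) matches a perfect score,
# the p-value is 1 / 1000.
p = permutation_p_value(1.0, [0.5] * 999)
```

Under these assumptions, a p-value of 0.001 is the smallest value this estimator can return for 999 permutations, so a reported p = 0.001 may reflect the resolution of the test rather than the exact strength of the effect.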
Keywords
Pediatric Acute Myeloid Leukemia (pAML), Gene Expression Analysis, Microarray Data, Machine Learning, Random Forest, Support Vector Machine, XGBoost, Feature Selection, ANOVA, SMOTE, Nested Cross-Validation, High-Dimensional Data, SHAP, Explainable Artificial Intelligence, Biomarker Identification, Precision Medicine.