Diabetes prediction based on Ensemble Methods

Authors

  • Jihan Askandar Mosa Information Technology Management Dept., Technical College of Administration, Duhok Polytechnic University, Duhok, Iraq; Information Technology Dept., Shekhan Technical Institute, Duhok Polytechnic University, Duhok, Iraq
  • Adnan Mohsin Abdulazeez Technical College of Engineering, Duhok Polytechnic University, Duhok, Iraq https://orcid.org/0000-0002-4357-7331

DOI:

https://doi.org/10.19139/soic-2310-5070-2771

Keywords:

Diabetes Prediction, Ensemble Learning, Gradient Boosting, AdaBoost, XGBoost

Abstract

The incidence of diabetes, a chronic disease, is increasing worldwide, especially in low- and middle-income countries. Early and accurate prediction is critical to reducing complications and improving patient outcomes. This study presents an ensemble-based machine learning framework for diabetes prediction, evaluated on two benchmark datasets: the Diabetes Prediction dataset and the Pima Indians Diabetes dataset. Two ensemble strategies were compared: a sequential ensemble combining XGBoost, Gradient Boosting, and AdaBoost, and a parallel ensemble using a soft voting classifier over Logistic Regression, Decision Tree, and K-Nearest Neighbors. Forward feature selection was used to identify the most relevant predictors, improving model performance and generalizability. The data was split into 70% for training, 15% for validation, and 15% for testing. On the Pima Indians dataset, the sequential ensemble achieved a training accuracy of 98.95%, a validation accuracy of 97.59%, and an F1 score of 97.77%. The parallel ensemble scored lower, with a training accuracy of 98.16%, a validation accuracy of 96.38%, and an F1 score of 96.62%. Overall, the sequential model outperformed the parallel model on both datasets, including the Diabetes Prediction dataset. These results demonstrate how feature selection methods and boosting-based ensemble models can work together to create accurate and reliable medical prediction systems.
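The 70/15/15 split and the voting step of the parallel ensemble can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `split_70_15_15` and `soft_vote`, the sample count of 768, and the per-model probabilities are all hypothetical, and the sketch assumes the voting classifier averages class probabilities across base models (soft voting).

```python
import random

def split_70_15_15(n_samples, seed=0):
    """Shuffle indices and split them 70% / 15% / 15% (train / validation / test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test

def soft_vote(probabilities):
    """Average per-model class probabilities; return (label, averaged probabilities)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged

# Illustrative sample count (assumption, not taken from the abstract).
train, val, test = split_70_15_15(768)

# Hypothetical [P(no diabetes), P(diabetes)] outputs of three base models
# (e.g. logistic regression, decision tree, k-NN) for a single patient.
label, averaged = soft_vote([[0.40, 0.60], [0.55, 0.45], [0.30, 0.70]])
```

Averaging probabilities lets a confident base model outweigh an uncertain one, which a simple majority vote over hard labels would not capture.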

Published

2025-10-04

Section

Research Articles

How to Cite

Diabetes prediction based on Ensemble Methods. (2025). Statistics, Optimization & Information Computing, 14(6), 3359-3379. https://doi.org/10.19139/soic-2310-5070-2771