Ensemble Based Machine Learning Model for Prediction of Diabetes
DOI: https://doi.org/10.3126/nprcjmr.v3i2.91272

Keywords: Diabetes, Accuracy, Prediction, XGBoost, Machine Learning, Ensemble Learning

Abstract
Background: Early prediction of diabetes is of prime importance for reducing the risk of complications and healthcare expenses, especially in developing countries. Although conventional machine learning algorithms are employed for disease prediction, ensemble learning algorithms are more robust for this purpose.
Methods: This paper evaluates the performance of ensemble learning models, namely AdaBoost, Gradient Boosting, XGBoost, and a Stacking Ensemble, on the PIMA Indian Diabetes dataset. Data preprocessing comprised missing-value handling, normalization, and train-test splitting. Model performance is measured by accuracy, precision, recall, F1-score, and ROC-AUC.
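The pipeline described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: synthetic data stands in for the PIMA CSV so the snippet runs standalone, and scikit-learn's GradientBoostingClassifier stands in for XGBoost among the base learners.

```python
# Sketch of the described pipeline: preprocessing, a stacking ensemble of
# boosted base learners, and evaluation by accuracy and ROC-AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the 8-feature, 768-row PIMA data. With the real CSV, the
# missing-value step would also replace biologically impossible zeros
# (e.g. glucose or BMI of 0) with column medians before scaling.
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X = StandardScaler().fit_transform(X)  # normalization step
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Stacking Ensemble: boosted trees as base learners, logistic regression
# as the meta-learner that combines their cross-validated predictions.
stack = StackingClassifier(
    estimators=[("ada", AdaBoostClassifier(random_state=42)),
                ("gb", GradientBoostingClassifier(random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.2f}  roc_auc={auc:.2f}")
```

On the real dataset, the same loop would be repeated for each standalone boosting model so that all five metrics can be compared across models.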
Results: Among all the models tested, the Stacking Ensemble provided the best results on every metric, with 0.86 accuracy, 0.82 precision, 0.80 recall, 0.81 F1-score, and 0.91 ROC-AUC. The XGBoost model also performed well, achieving 0.82 accuracy and 0.88 ROC-AUC. These results indicate a clear performance improvement from the stacked ensemble method.
Conclusion: Ensemble learning improves the early-stage prediction of diabetes. The Stacking Ensemble model achieved the highest performance, with 86% accuracy, 0.82 precision, 0.80 recall, 0.81 F1-score, and 0.91 ROC-AUC. This study provides a foundation for future research and the development of robust predictive models for diabetes care and the prevention of diabetic complications.
Implications: The closeness of precision (0.82) and recall (0.80) for the Stacking model indicates a good balance, which is critical since the dataset may be imbalanced. Although XGBoost offers a good trade-off between complexity and accuracy, the results show that the greater complexity of the Stacking Ensemble is needed for the best results.
Copyright (c) 2026 Ramesh Prasad Bhatta

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
