Ensemble Based Machine Learning Model for Prediction of Diabetes
DOI: https://doi.org/10.3126/nprcjmr.v3i2.91272

Keywords: Diabetes, Accuracy, Prediction, XGBoost, Machine Learning, Ensemble Learning

Abstract
Background: Early prediction of diabetes is of prime importance for reducing the risk of complications and healthcare expenses, especially in developing countries. Although conventional machine learning algorithms are employed for disease prediction, ensemble learning algorithms are more robust for this purpose.
Methods: This paper evaluates the performance of ensemble learning models, namely AdaBoost, Gradient Boosting, XGBoost, and a Stacking Ensemble, on the PIMA Indian Diabetes dataset. Data preprocessing comprised missing-value handling, normalization, and train-test splitting. Model performance is measured by accuracy, precision, recall, F1-score, and ROC-AUC.
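The pipeline described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: synthetic data stands in for the PIMA CSV so the snippet runs standalone, and scikit-learn's GradientBoostingClassifier stands in for XGBoost among the base learners.

```python
# Sketch of the described pipeline: preprocessing, a stacking ensemble of
# boosted base learners, and evaluation by accuracy and ROC-AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the 8-feature, 768-row PIMA data. With the real CSV, the
# missing-value step would also replace biologically impossible zeros
# (e.g. glucose or BMI of 0) with column medians before scaling.
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X = StandardScaler().fit_transform(X)  # normalization step
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Stacking Ensemble: boosted trees as base learners, logistic regression
# as the meta-learner that combines their cross-validated predictions.
stack = StackingClassifier(
    estimators=[("ada", AdaBoostClassifier(random_state=42)),
                ("gb", GradientBoostingClassifier(random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.2f}  roc_auc={auc:.2f}")
```

On the real dataset, the same loop would be repeated for each standalone boosting model so that all five metrics can be compared across models.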
Results: Among all the models tested, the Stacking Ensemble provided the best results on every metric, with 0.86 accuracy, 0.82 precision, 0.80 recall, 0.81 F1-score, and 0.91 ROC-AUC. The XGBoost model also performed well, achieving 0.82 accuracy and 0.88 ROC-AUC. These results indicate a clear performance improvement from the stacked ensemble method.
Conclusion: Ensemble learning improves the early-stage prediction of diabetes. The Stacking Ensemble model achieved the highest performance, with 86% accuracy, 0.82 precision, 0.80 recall, 0.81 F1-score, and 0.91 ROC-AUC. This study provides a foundation for future research and the development of robust predictive models for diabetes care and the prevention of diabetic complications.
Implications: The closeness of precision (0.82) and recall (0.80) for the Stacking model indicates a good balance, which is critical since the dataset may be imbalanced. Although XGBoost offers a good trade-off between complexity and accuracy, the results show that the greater complexity of the Stacking Ensemble is needed for the best results.
Copyright (c) 2026 Ramesh Prasad Bhatta

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
