Early Prediction of Chronic Kidney Disease Using Extra Trees and LightGBM with SHAP Visualization
DOI: https://doi.org/10.3126/kjse.v10i1.93861
Keywords: Chronic Kidney Disease, Explainable AI, Extra Trees, Interpretability, LightGBM, Machine Learning, Medical Diagnosis, SHAP, UCI Repository
Abstract
This study aims to detect Chronic Kidney Disease (CKD) at an early stage from blood and urine test parameters, using two computationally efficient machine learning algorithms, LightGBM and Extra Trees. The UCI Chronic Kidney Disease dataset, with 400 instances, 24 features, and a binary target class (“ckd” or “notckd”), was preprocessed by imputing missing values, clipping outliers, and normalizing. The models were validated with 5-fold cross-validation. Extra Trees and LightGBM achieved accuracies of 0.993 and 0.985, respectively. SHAP visualization identified albumin, hemoglobin, and specific gravity (sg = 1.025) as the key features for Extra Trees, whereas the key features for LightGBM were albumin, hemoglobin, and serum creatinine. Taken together, albumin, hemoglobin, specific gravity = 1.025, hypertension, and serum creatinine are significant indicators of CKD. The fact that one specific value of sg, 1.025, contributes so strongly may itself merit further medical study. Hence, this study offers an interpretable, lightweight framework suitable for integration into routine clinical blood testing for early CKD detection. The same approach can also be used to identify the features that contribute to a disease: when model accuracy is high, those features can be prioritized in clinical research to uncover their biological meaning and their potential as hidden biomarkers. Notably, low hemoglobin, which is linked to the reduced erythropoietin production seen in impaired kidneys, strongly predicted “ckd”, illustrating one such critical biomarker. This suggests strong potential for applying the method to other diseases as well.
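The preprocessing and validation steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name the imputation strategy, the outlier rule, or the normalization method, so median imputation, IQR-based clipping (the hypothetical `IQRClipper` class), and min-max scaling are assumptions. It also assumes every feature has already been numerically encoded; the nominal UCI CKD columns (for example hypertension) would need their own encoding first. Putting the preprocessing inside the pipeline means each of the five folds learns its medians, clip bounds, and scaling ranges from its own training split only, so no information leaks from the held-out fold.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


class IQRClipper(TransformerMixin, BaseEstimator):
    """Hypothetical outlier clipper: bounds each column to [Q1 - k*IQR, Q3 + k*IQR].

    The bounds are learned in fit(), i.e. on the training fold only.
    """

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.percentile(X, [25, 75], axis=0)  # per-column quartiles
        iqr = q3 - q1
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        # Broadcast the per-column bounds over every row.
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


def build_pipeline(seed=0):
    # Order follows the abstract: impute -> clip outliers -> normalize -> classify.
    return make_pipeline(
        SimpleImputer(strategy="median"),  # assumption: median imputation
        IQRClipper(k=1.5),                 # assumption: IQR rule for clipping
        MinMaxScaler(),                    # assumption: min-max normalization
        ExtraTreesClassifier(n_estimators=200, random_state=seed),
    )


def cv_accuracy(X, y, seed=0):
    # 5-fold cross-validation; stratified so each fold keeps the ckd/notckd ratio.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(build_pipeline(seed), X, y, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in with the dataset's shape (400 x 24); NOT the UCI CKD data.
    X = rng.normal(size=(400, 24))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    X[rng.random(X.shape) < 0.1] = np.nan  # inject roughly 10% missing values
    scores = cv_accuracy(X, y)
    print("per-fold accuracy:", scores, "mean:", scores.mean())
```

Because the final step only needs a scikit-learn-compatible classifier, the LightGBM model could in principle be swapped in through `lightgbm.LGBMClassifier`, and a fitted tree model could be passed to `shap.TreeExplainer` to obtain per-feature SHAP values; both of those packages are separate installs and are left out of the sketch.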