Hallucinating Health: Assessing the Clinical Reliability of LIME, Grad-SHAP and Grad-CAM in Small-Scale Medical Imaging

Isu Sharma; Aaditya Kafle; Aayush Maharjan; Bigyan Moktan

doi:10.3126/injet.v3i2.95504

Authors

Isu Sharma, Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
Aaditya Kafle, Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
Aayush Maharjan, Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal
Bigyan Moktan, Department of Computer and Electronics Engineering, Kantipur Engineering College, Dhapakhel, Lalitpur, Nepal

Keywords:

Explainable Artificial Intelligence, XAI Reliability, Hallucinated Explanations, Small Dataset Learning, Grad-CAM, SHAP, LIME, Medical Image Classification, Saliency Maps, Data Scarcity, Transfer Learning, Few-Shot Learning, Trustworthy AI, Clinical AI Safety

Abstract

Explainable Artificial Intelligence (XAI) methods such as Grad-CAM, SHAP, and LIME are increasingly employed in medical imaging to provide post-hoc interpretations of deep learning models. However, their reliability under data-scarce conditions remains insufficiently understood, even though it is critical for real-world clinical deployment. In this work, we investigate the phenomenon of hallucinated explanations, in which XAI methods produce visually plausible but clinically ungrounded saliency maps because the underlying model has learned spurious correlations from limited training data. We conduct a controlled study on pediatric chest X-ray pneumonia classification by systematically reducing the training set size (N = 1000 → 500 → 200 → 100 → 50) and evaluating explanation quality using perturbation-based faithfulness metrics, localization consistency, and qualitative clinical alignment. Our results demonstrate a non-linear degradation in explanation reliability as dataset size decreases, with gradient-based and perturbation-based methods exhibiting distinct failure modes. Notably, high model confidence persists even as explanation faithfulness collapses, highlighting a critical decoupling between predictive confidence and interpretability. We further evaluate mitigation strategies, including data augmentation and transfer learning, and find that transfer learning partially stabilizes explanation fidelity but does not eliminate hallucination effects at very low sample sizes. These findings underscore the need for rigorous, quantitative evaluation of XAI methods prior to clinical adoption and suggest that commonly used saliency techniques may be unreliable in low-data regimes. This work contributes a systematic framework for auditing XAI reliability in medical imaging and provides practical insights toward safer deployment of interpretable AI systems.
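
The abstract does not state which perturbation-based faithfulness metric the study uses. As an illustration only, the sketch below shows one common choice, a deletion curve: pixels are masked in descending order of saliency and the model score is recorded after each step, so a faithful map should make the score fall quickly (a small area under the curve). The function names `deletion_curve` and `deletion_auc`, the toy `predict` function, and the 8×8 image are all hypothetical and are not taken from the paper.

```python
import numpy as np

def deletion_curve(predict, image, saliency, steps=8, baseline=0.0):
    """Mask pixels in descending saliency order; record the score after each step."""
    order = np.argsort(saliency.ravel())[::-1]      # most salient pixels first
    masked = image.astype(float).ravel().copy()     # working copy, flattened
    scores = [predict(masked.reshape(image.shape))]
    chunk = int(np.ceil(order.size / steps))
    for i in range(steps):
        masked[order[i * chunk:(i + 1) * chunk]] = baseline
        scores.append(predict(masked.reshape(image.shape)))
    return np.array(scores)

def deletion_auc(scores):
    """Trapezoidal area under the deletion curve, x-axis scaled to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(x)))

# Toy stand-in for a classifier (assumption): the score depends only on the
# top-left 4x4 patch of an 8x8 image.
def predict(img):
    return float(img[:4, :4].mean())

image = np.ones((8, 8))

# A map that highlights the region the toy model actually uses ...
faithful_map = np.zeros((8, 8))
faithful_map[:4, :4] = 1.0
# ... and a "hallucinated" map that highlights everything else.
inverted_map = 1.0 - faithful_map

auc_faithful = deletion_auc(deletion_curve(predict, image, faithful_map))
auc_inverted = deletion_auc(deletion_curve(predict, image, inverted_map))
print(auc_faithful, auc_inverted)
```

In this toy setting the faithful map yields the smaller area, since masking its top-ranked pixels removes exactly the evidence the model relies on. Applied to a real classifier, `predict` would wrap the model's pneumonia probability, and the saliency maps would come from Grad-CAM, SHAP, or LIME.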
