Hallucinating Health: Assessing the Clinical Reliability of LIME, Grad-SHAP and Grad-CAM in Small-Scale Medical Imaging
DOI:
https://doi.org/10.3126/injet.v3i2.95504
Keywords:
Explainable Artificial Intelligence, XAI Reliability, Hallucinated Explanations, Small Dataset Learning, Grad-CAM, SHAP, LIME, Medical Image Classification, Saliency Maps, Data Scarcity, Transfer Learning, Few-Shot Learning, Trustworthy AI, Clinical AI Safety
Abstract
Explainable Artificial Intelligence (XAI) methods such as Grad-CAM, SHAP, and LIME are increasingly employed in medical imaging to provide post-hoc interpretations of deep learning models. However, their reliability under data-scarce conditions remains insufficiently understood, even though it is critical for real-world clinical deployment. In this work, we investigate hallucinated explanations, in which XAI methods produce visually plausible but clinically ungrounded saliency maps because the model has learned spurious correlations from limited training data. We conduct a controlled study on pediatric chest X-ray pneumonia classification, systematically reducing the training set size (N = 1000 → 500 → 200 → 100 → 50) and evaluating explanation quality with perturbation-based faithfulness metrics, localization consistency, and qualitative clinical alignment. Our results show a non-linear degradation in explanation reliability as dataset size decreases, with gradient-based and perturbation-based methods exhibiting distinct failure modes. Notably, model confidence remains high even as explanation faithfulness collapses, revealing a critical decoupling between predictive performance and interpretability. We further evaluate mitigation strategies, including data augmentation and transfer learning, and find that transfer learning partially stabilizes explanation fidelity but does not eliminate hallucination effects at very low sample sizes. These findings underscore the need for rigorous, quantitative evaluation of XAI methods before clinical adoption and suggest that commonly used saliency techniques may be unreliable in low-data regimes. This work contributes a systematic framework for auditing XAI reliability in medical imaging and offers practical insights toward the safer deployment of interpretable AI systems.
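The abstract does not say which perturbation-based faithfulness metric the study uses. Purely as an illustration, a common choice is a deletion curve. Pixels are removed in order of decreasing saliency, and the model's probability for the predicted class is tracked after each step. A faithful map makes that probability drop quickly, which yields a low area under the curve. The sketch below makes several assumptions: `deletion_auc` and `toy_model` are hypothetical names, the image is a single-channel (H, W) array, "removal" means setting pixels to a constant baseline, and the model is any function that returns a vector of class probabilities.

```python
import numpy as np


def deletion_auc(model_fn, image, saliency, target, steps=10, baseline=0.0):
    """Hypothetical deletion-curve faithfulness score (lower = more faithful).

    Assumes `image` and `saliency` are (H, W) arrays and `model_fn` maps an
    (H, W) array to a 1-D array of class probabilities.
    """
    h, w = saliency.shape
    # Pixel indices sorted from most to least salient.
    order = np.argsort(saliency.ravel())[::-1]
    n = order.size
    probs = []
    for k in range(steps + 1):
        n_removed = int(round(k * n / steps))
        # Work on a flat copy so the original image is never modified.
        perturbed = image.astype(float).ravel().copy()
        perturbed[order[:n_removed]] = baseline
        probs.append(float(model_fn(perturbed.reshape(h, w))[target]))
    probs = np.asarray(probs)
    # Trapezoidal area under the curve over the fraction of pixels removed, in [0, 1].
    x = np.linspace(0.0, 1.0, steps + 1)
    return float(np.sum((probs[1:] + probs[:-1]) / 2.0 * np.diff(x)))


def toy_model(img):
    """Hypothetical model: 'pneumonia' evidence is the mean of the top-left 4x4 patch."""
    p = float(img[:4, :4].mean())
    return np.array([1.0 - p, p])


if __name__ == "__main__":
    img = np.ones((8, 8))
    faithful = np.zeros((8, 8))
    faithful[:4, :4] = 1.0          # saliency on the true evidence region
    misleading = 1.0 - faithful     # saliency everywhere except the evidence
    print(deletion_auc(toy_model, img, faithful, target=1, steps=16))
    print(deletion_auc(toy_model, img, misleading, target=1, steps=16))
```

With this toy model, the map that highlights the true evidence region scores 0.125. The map that highlights everything else scores 0.875, even though both maps could look equally "plausible" to a viewer. This is the kind of gap a quantitative faithfulness check is meant to expose.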
License
Copyright (c) 2026 International Journal on Engineering Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.