Minireviews
Copyright ©The Author(s) 2025.
World J Gastroenterol. Sep 21, 2025; 31(35): 111033
Published online Sep 21, 2025. doi: 10.3748/wjg.v31.i35.111033
Table 1 Overview of studies selected for the review of explainable artificial intelligence techniques in inflammatory bowel disease care
Ref. | XAI technique | IBD type | Modality | Classifier | Contribution | Evaluation metrics (highest)
Onwuka et al[14] | SHAP | CD and UC | Multiomics dataset | LGBM | This study applies SHAP-based XAI to identify key fecal metabolites that robustly predict IBD and their diet associations. It demonstrates that XAI not only enhances metabolite prioritization across datasets but also clarifies how diet-driven microbial metabolites distinctly influence IBD pathology | AUC = 0.93 (distinguishing between IBD and non-IBD cases)
Patel et al[26] | Grad-CAM, saliency maps, integrated gradients, and LIME | UC | Endoscopic images | ResNet50 and MobileNetV2 | This study evaluates the predictions of four XAI techniques in IBD-related endoscopic image classification. The analysis reveals that Grad-CAM most consistently highlights relevant regions, supporting its use in visualizing model focus and validating TeleXGI’s multi-XAI strategy for clinical interpretability | ACC = 98.8% (remote health-risk prediction from UC endoscopic images)
Chierici et al[37] | Saliency and gradient-based techniques | CD and UC | Endoscopic images | DenseNet121 | This study applied saliency and guided backpropagation to a ResNet50 model for IBD endoscopic image classification. The attribution maps revealed clinically relevant features, with guided backpropagation offering clearer, less noisy insights | MCC = 0.94 (classification of healthy controls and IBD patients)
Sutton et al[38] | Grad-CAM | UC | Endoscopic images | DenseNet121 | This study uses explainable deep learning on HyperKvasir endoscopic images to accurately distinguish UC from non-UC conditions and stratify disease severity based on Mayo scores. The DenseNet121 model achieved the highest performance, with Grad-CAM enhancing interpretability through visual explanations | AUC = 0.90 (classification of UC and non-UC images)
Tsai and Lee[39] | Grad-CAM | UC | Endoscopic images | DenseNet201, InceptionV3, and VGG19 | This study leverages Grad-CAM to interpret deep learning model predictions for UC classification, comparing individual CNNs and ensemble models. The triplet ensemble produced the most focused and clinically relevant heatmaps | ACC = 91%
Ma et al[40] | Grad-CAM | CD | Ultrasound images and clinical data | ResNet50 | This study developed a deep learning model combining intestinal ultrasound images and clinical data to predict mucosal healing in CD after one year of treatment. Using Grad-CAM for interpretability, the model highlighted key features such as the bowel wall and mesentery, achieving a high PPV | AUC = 0.73 (classification of mucosal healing and non-mucosal healing)
Maurício and Domingues[41] | Grad-CAM, LIME, SHAP, and occlusion sensitivity | CD and UC | Endoscopic images | CNN + LSTM | This study develops CNN and ViT models to classify CD and UC from endoscopic images, creating lighter versions via knowledge distillation for clinical use. The ViT-S/16 model achieved the best performance, accurately identifying IBD features while ignoring irrelevant elements. Interpretability analyses with multiple XAI techniques confirmed its reliability, with minimal misclassifications after distillation using temperature-based methods | ACC = 95% (distinguishing between active and non-active inflammation)
Weng et al[42] | SHAP and LIME | CD | Clinical data | Extreme gradient boosting | This study integrates SHAP and LIME to interpret an extreme gradient boosting model for differentiating intestinal tuberculosis from CD. These XAI methods identified and visualized the key clinical features influencing the model’s predictions | MCC = 96.9%
Zhen et al[43] | LIME | CD and UC | Clinical data | SVC | This study applied LIME to interpret an SVM model predicting quality-of-life impairment in IBD patients, identifying key modifiable risk factors such as anxiety, abdominal pain, and glucocorticoid use | AUC = 0.80 (evaluating IBD-related quality-of-life impairments)
Deng et al[44] | Trainable attention mechanisms | CD | Pathological images | RFC and GNN | This study introduces a cross-scale attention mechanism for multi-instance learning that captures inter-scale interactions in whole slide images for CD diagnosis. Trained on approximately 250000 hematoxylin and eosin-stained patches, the model’s cross-scale attention visualizations localize lesion patterns at different magnifications, enhancing both diagnostic accuracy and interpretability | AUC = 0.89 (distinguishing between healthy controls and CD patients)
de Maissin et al[45] | Trainable attention mechanisms | CD | Endoscopic images | ResNet34 | This study introduces a recurrent attention neural network trained on a multi-expert annotated CD capsule endoscopy dataset. It demonstrates that higher annotation quality significantly boosts diagnostic accuracy (up to 93.7% precision). The network mimics human visual focus, enabling interpretable lesion localization and outperforming standard CNNs as annotations improve | ACC = 94.6% (detection of pathological and non-pathological images)
Wu et al[46] | Trainable attention mechanisms | UC | Endoscopic images | FLATer | This study presents FLATer, an explainable transformer-based model combining CNN and ViT components for gastrointestinal disease classification. An ablation study reveals the crucial role of the residual block and spatial attention in boosting performance and interpretability, with saliency maps confirming enhanced localization of pathological regions | ACC = 99.7% (multi-class classification to categorize images into specific diseases)
Sucipto et al[47] | Trainable attention mechanisms | CD | Pathological images | RFC and GNN | This study introduces a cross-scale attention mechanism for multi-instance learning that captures inter-scale interactions in whole slide images for CD diagnosis. Trained on approximately 250000 hematoxylin and eosin-stained patches, it achieved an AUC of 0.8924. Cross-scale attention visualizations localize lesion patterns at different magnifications, enhancing both diagnostic accuracy and model interpretability | ACC = 87% and 85% (prediction of histologic remission)
Ahamed et al[48] | SHAP, heatmaps, Grad-CAM, and saliency maps | UC | Endoscopic images | EELM | This study introduces a lightweight, parallel-depth CNN optimized for gastrointestinal image classification using the GastroVision dataset, including IBD cases. Integrated with multiple XAI techniques (Grad-CAM, SHAP, and saliency maps), the model enhances interpretability and diagnostic transparency. Using an EELM for final classification, the system achieved high accuracy and robust generalizability | AUC = 0.987 (distinguishing between normal and benign ulcer)
Elmagzoub et al[49] | Trainable attention mechanisms | UC | Endoscopic images | ResNet101 | This study presents a grid search-optimized ResNet101 model with an integrated attention mechanism for classifying gastrointestinal diseases, including UC, from endoscopic images | ACC = 93.5%
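Several of the tabulated studies (Onwuka et al[14], Weng et al[42], Ahamed et al[48]) rely on SHAP, which approximates classical Shapley values: the average marginal contribution of a feature over all possible feature coalitions. The following is a minimal conceptual sketch, not code from any of the studies above; it computes exact Shapley values for a toy linear scorer in pure NumPy (the function name, the baseline convention, and the toy model are illustrative assumptions), and is exponential in the number of features, so it is only feasible for tiny examples.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance under a black-box `predict`.

    Features outside a coalition S are fixed at `baseline` values.
    phi[i] = sum over S not containing i of
             |S|!(n-|S|-1)!/n! * (f(S + {i}) - f(S)).
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                z_without = baseline.copy()
                z_without[list(S)] = x[list(S)]          # coalition S only
                z_with = baseline.copy()
                z_with[list(S) + [i]] = x[list(S) + [i]]  # coalition S + {i}
                phi[i] += weight * (predict(z_with) - predict(z_without))
    return phi

# Toy linear scorer: for linear models the Shapley value of feature i
# reduces to w_i * (x_i - baseline_i).
w = np.array([2.0, -1.0, 0.5])
predict = lambda z: float(w @ z)
x = np.array([1.0, 3.0, 2.0])
baseline = np.zeros(3)
phi = exact_shapley(predict, x, baseline)
```

By the local-accuracy property, the attributions sum to `predict(x) - predict(baseline)`; SHAP's tree and kernel explainers estimate the same quantities without enumerating all coalitions.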
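LIME, used by Patel et al[26], Weng et al[42], and Zhen et al[43], explains a single prediction by fitting a proximity-weighted linear surrogate to perturbations of the instance. The sketch below is a simplified tabular variant written for this review, not the `lime` package itself; the function name, Gaussian perturbation scheme, and kernel are illustrative assumptions.

```python
import numpy as np

def lime_attributions(predict, x, n_samples=2000, noise=0.5, kernel_width=1.0, seed=0):
    """Local linear surrogate around instance `x` (simplified tabular LIME).

    Draws Gaussian perturbations of x, queries the black-box `predict`,
    weights each sample by an exponential kernel on its distance to x,
    and solves weighted least squares; the surrogate's coefficients are
    the local feature attributions.
    """
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=noise, size=(n_samples, x.size))
    y = np.array([predict(z) for z in Z])
    dist = np.linalg.norm(Z - x, axis=1)
    sw = np.sqrt(np.exp(-dist**2 / kernel_width**2))   # sqrt of kernel weights
    A = np.hstack([Z, np.ones((n_samples, 1))])        # append intercept column
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                                   # drop the intercept

# For a linear black box the surrogate recovers the true weights exactly.
f = lambda z: 2.0 * z[0] - 1.0 * z[1] + 4.0
phi = lime_attributions(f, np.array([1.0, 3.0]))
```

For a nonlinear model the coefficients instead approximate the model's local behavior around `x`, which is why LIME explanations are only valid in a neighborhood of the explained instance.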