Copyright
©The Author(s) 2025.
World J Gastroenterol. Sep 21, 2025; 31(35): 111033
Published online Sep 21, 2025. doi: 10.3748/wjg.v31.i35.111033
Table 1 Overview of studies selected for the review of explainable artificial intelligence techniques in inflammatory bowel disease care
Ref. | XAI technique | IBD type | Modality | Classifier | Contribution | Evaluation metrics (highest) |
Onwuka et al[14] | SHAP | CD and UC | Multiomics dataset | LGBM | This study applies SHAP-based XAI to identify key fecal metabolites that robustly predict IBD and to characterize their dietary associations. It demonstrates that XAI not only enhances metabolite prioritization across datasets but also clarifies how diet-driven microbial metabolites distinctly influence IBD pathology | AUC = 0.93 (distinguishing between IBD and non-IBD cases) |
Patel et al[26] | Grad-CAM, saliency maps, integrated gradients, and LIME | UC | Endoscopic images | ResNet50 and MobileNetV2 | This study evaluates four XAI techniques for interpreting predictions in IBD-related endoscopic image classification. The analysis reveals that Grad-CAM most consistently highlights relevant regions, supporting its use in visualizing model focus and validating TeleXGI’s multi-XAI strategy for clinical interpretability | ACC = 98.8% (health risk prediction of UC endoscopic images remotely) |
Chierici et al[37] | Saliency and gradient-based techniques | CD and UC | Endoscopic images | DenseNet121 | This study applied saliency and guided backpropagation to a ResNet50 model for IBD endoscopic image classification. The attribution maps revealed clinically relevant features, with guided backpropagation offering clearer, less noisy insights | MCC = 0.94 (classification of healthy controls and IBD patients) |
Sutton et al[38] | Grad-CAM | UC | Endoscopic images | DenseNet121 | This study uses explainable deep learning on HyperKvasir endoscopic images to accurately distinguish UC from non-UC conditions and stratify disease severity based on Mayo scores. The DenseNet121 model achieved the highest performance, with Grad-CAM enhancing interpretability through visual explanations | AUC = 0.90 (classification of UC and non-UC images) |
Tsai and Lee[39] | Grad-CAM | UC | Endoscopic images | DenseNet201, InceptionV3, and VGG19 | This study leverages Grad-CAM to interpret deep learning model predictions for UC classification, comparing individual CNNs and ensemble models. The triplet ensemble produced the most focused and clinically relevant heatmaps | ACC = 91% |
Ma et al[40] | Grad-CAM | CD | Ultrasound images and clinical data | ResNet50 | This study developed a deep learning model combining intestinal ultrasound images and clinical data to predict mucosal healing in CD after one year of treatment. Using Grad-CAM for interpretability, the model highlighted key features such as the bowel wall and mesentery, achieving a high PPV | AUC = 0.73 (classification of mucosal healing and non-mucosal healing) |
Maurício and Domingues[41] | Grad-CAM, LIME, SHAP, occlusion sensitivity | CD and UC | Endoscopic images | CNN + LSTM | This study develops CNN and ViT models to classify CD and UC from endoscopic images, creating lighter versions via knowledge distillation for clinical use. The ViT-S/16 model achieved the best performance, accurately identifying IBD features while ignoring irrelevant elements. Interpretability analyses with multiple XAI techniques confirmed its reliability, with minimal misclassifications after distillation using temperature-based methods | ACC = 95% (distinguishing between active and non-active inflammation) |
Weng et al[42] | SHAP and LIME | CD | Clinical data | Extreme gradient boosting | This study integrates SHAP and LIME to interpret an extreme gradient boosting-based model for differentiating intestinal tuberculosis from CD. These XAI methods successfully identified and visualized the key clinical features influencing the machine learning model’s predictions for differentiating intestinal tuberculosis and CD | MCC = 96.9% |
Zhen et al[43] | LIME | CD and UC | Clinical data | SVC | This study applied LIME to interpret an SVM model predicting quality-of-life impairment in IBD patients, identifying key modifiable risk factors such as anxiety, abdominal pain, and glucocorticoid use | AUC = 80% (evaluating IBD-related quality-of-life impairments) |
Deng et al[44] | Trainable attention mechanisms | CD | Pathological images | RFC and GNN | This study introduces a cross-scale attention mechanism for multi-instance learning that captures inter-scale interactions in whole slide images for CD diagnosis. Trained on approximately 250000 hematoxylin and eosin-stained patches, the model's cross-scale attention visualizations localize lesion patterns at different magnifications, enhancing both diagnostic accuracy and model interpretability | AUC = 0.89 (distinguishing between healthy controls and CD patients) |
de Maissin et al[45] | Trainable attention mechanisms | CD | Endoscopic images | ResNet 34 | This study introduces a recurrent attention neural network trained on a multi-expert annotated CD capsule endoscopy dataset. It demonstrates that higher annotation quality significantly boosts diagnostic accuracy (up to 93.7% precision). The network mimics human visual focus, enabling interpretable lesion localization and outperforming standard CNNs as annotations improve | ACC = 94.6% (detection of pathological and non-pathological images) |
Wu et al[46] | Trainable attention mechanisms | UC | Endoscopic images | FLATer | This study presents FLATer, an explainable transformer-based model combining CNN and ViT for GIT disease classification. An ablation study reveals the crucial role of the residual block and spatial attention in boosting performance and interpretability, with saliency maps confirming enhanced localization of pathological regions | ACC = 99.7% (multi-class classification to categorize images into specific diseases) |
Sucipto et al[47] | Trainable attention mechanisms | CD | Pathological images | RFC and GNN | This study introduces a cross-scale attention mechanism for multi-instance learning that captures inter-scale interactions in whole slide images for CD diagnosis. Trained on approximately 250000 hematoxylin and eosin-stained patches, it achieved an AUC of 0.8924. Cross-scale attention visualizations localize lesion patterns at different magnifications, enhancing both diagnostic accuracy and model interpretability | ACC = 87% and 85% (prediction of histologic remission) |
Ahamed et al[48] | SHAP, heatmap, Grad-CAM, and saliency maps | UC | Endoscopic images | EELM | This study introduces a lightweight, parallel-depth CNN optimized for gastrointestinal image classification using the GastroVision dataset, including IBD cases. Integrated with multiple XAI techniques (Grad-CAM, SHAP, saliency maps), the model enhances interpretability and diagnostic transparency. Using EELM for final classification, the system achieved high accuracy and robust generalizability | AUC = 0.987 (distinguishing between normal and benign ulcer) |
Elmagzoub et al[49] | Trainable attention mechanisms | UC | Endoscopic images | ResNet101 | This study presents a grid search-optimized ResNet101 model with an integrated attention mechanism for classifying gastrointestinal diseases, including UC, from endoscopic images | ACC = 93.5% |
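Several of the perturbation-based techniques surveyed above (occlusion sensitivity, and to an extent LIME and SHAP) share a common logic: perturb part of the input and measure how much the model's output changes. The following is a minimal illustrative sketch of occlusion sensitivity using a toy scoring function; it does not reproduce any model from the cited studies, and the function and parameter names are hypothetical.

```python
import numpy as np

def occlusion_sensitivity(image, score_fn, patch=4, baseline=0.0):
    """Slide an occluding patch over the image and record the score drop.

    A larger drop means the occluded region contributed more to the
    model's score, so the heatmap highlights influential regions.
    """
    h, w = image.shape
    base_score = score_fn(image)
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline  # mask one patch
            heatmap[i // patch, j // patch] = base_score - score_fn(occluded)
    return heatmap

# Toy "model": scores an image by mean intensity of its top-left quadrant.
def toy_score(img):
    return img[:8, :8].mean()

img = np.random.rand(16, 16)
hm = occlusion_sensitivity(img, toy_score)
# Cells covering the top-left quadrant show positive score drops;
# occluding other regions leaves the toy score unchanged.
```

In clinical settings such as the endoscopic studies above, `score_fn` would be a trained classifier's probability for a disease class, and the resulting heatmap is what clinicians inspect to verify that the model attends to lesions rather than artifacts.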
- Citation: Okpete UE, Byeon H. Explainable artificial intelligence for personalized management of inflammatory bowel disease: A minireview of recent advances. World J Gastroenterol 2025; 31(35): 111033
- URL: https://www.wjgnet.com/1007-9327/full/v31/i35/111033.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i35.111033