Artificial intelligence in inflammatory bowel disease: Current applications and future directions

doi:10.3748/wjg.v31.i39.111353

Publication name: World Journal of Gastroenterology
ISSN: 1007-9327
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Review

World J Gastroenterol. Oct 21, 2025; 31(39): 111353
Published online Oct 21, 2025. doi: 10.3748/wjg.v31.i39.111353

Table 1 Role of artificial intelligence-based endoscopy in the evaluation of patients with inflammatory bowel diseases

Ref.	Disease/number of patients	Type of study	Endoscopic technique	Number of training samples	Number of test samples	AI/model	Main findings
Stidham et al[22]	UC/3082 patients	Retrospective, single center	WLE	14862 images	1652 images	DL-CNN	Discriminating ER (MES ≤ 1) from moderate-to-severe disease (MES ≥ 2) (AUC = 0.966, sensitivity = 83.0%, specificity = 96.0%). Agreement of AI with expert reviewers vs agreement between expert reviewers (κ = 0.840 vs κ = 0.860)
Maeda et al[38]	UC/187 patients	Retrospective, single center	Endocytoscopy	12900 still images	525 segments	CAD	Prediction of persistent histological inflammation (GS ≥ 3.1) (sensitivity = 74.0%, specificity = 97.0%, precision = 91.0%); reproducibility of the CAD output (κ = 1.000)
Ozawa et al[40]	UC/955 patients	Retrospective, single center	WLE	26304 still images	3981 still images	CAD-CNN	AI performance for mucosal healing (MES ≤ 1, AUC = 0.980)
Takenaka et al[25]	UC/875 patients	Prospective, single center	WLE	40789 still images	4187 still images	DNUC	Evaluation of ER (UCEIS ≤ 2) (accuracy = 90.1%, ICC = 0.917). Evaluation of HR (GS < 3.1) (accuracy = 92.9%, κ = 0.859)
Bossuyt et al[41]	UC/35 patients	Prospective, multicenter	Prototype endoscope	NR	NR	CAD	RD for endoscopic/histological inflammation: Correlation with MES (r = 0.76), UCEIS (r = 0.74), RHI (r = 0.74). RD score (≤ 60) predicts HR (AUC = 0.950, sensitivity = 96.0%, specificity = 80.0%)
Yao et al[27]	UC/157 patients	Prospective, multicenter	WLE	NR	264 high-resolution videos	DL-CNN	The still-image classifier for informative frames performed well (sensitivity = 90.2%, specificity = 87.0%). MES was correctly predicted in 78.0% of videos (κ = 0.840)
Gottlieb et al[30]	UC/249 patients	Prospective, multicenter	WLE	629 videos	157 videos	RNN	Endoscopic healing evaluation according to UCEIS (accuracy = 97.0%) and MES (accuracy = 95.5%). Agreement of the model with human experts for MES (QWK = 0.844) and UCEIS (QWK = 0.855)
Bossuyt et al[42]	UC/58 patients	Prospective, single center	SWE	NR	113 still images	CAD	AI algorithm yielded better HR accuracy (86.0%) than MES (74.0%) or UCEIS (79.0%)
Huang et al[43]	UC/54 patients	Retrospective, single center	HD endoscopy	600 still images	256 still images	DNN, SVM, k-NN	Performance of the combined model for differentiation between MES ≤ 1 and MES 2 (AUC = 0.927, accuracy = 94.5%, sensitivity = 89.2%, specificity = 96.3%)
Takenaka et al[44]	UC/770 patients	Prospective, multicenter	WLE	NR	NR	DNUC	Prediction of HR (sensitivity = 97.9%, specificity = 94.6%). Agreement between the DNUC and experts for endoscopic assessment (ICC = 0.927)
Patel et al[45]	UC/73 patients	Prospective, single center	HD endoscopy	55 video images	18 video images	MLA	Differentiation between remission (UCEIS 0-1) and active inflammation (UCEIS ≥ 2) (accuracy = 90.0%, κ = 0.900), and between mild (UCEIS 2-3) and moderate-to-severe inflammation (UCEIS ≥ 4) (accuracy = 98.0%, κ = 0.960)
Kim et al[19]	UC/492 patients	Retrospective, single center	WLE	904 still images	80 still images	DL-CNN	Discrimination of MES 0 from MES 1: internal test against IBD expert labels (F1 score = 0.92, AUC = 0.970); external test on the HyperKvasir dataset (F1 score = 0.89, AUC = 0.860)
Polat et al[20]	UC/564 patients	Retrospective, single center	WLE	11276 still images	1658 still images	DL-CNN	Excellent concordance between the five CNN models and endoscopists for MES evaluation (QWK: 0.847-0.854) and for classification of remission cases (QWK: 0.834-0.852)
Wang et al[21]	UC/308 patients	Retrospective, single center	WLE	37515 still images	3191 still images	CNN	Diagnosis of ER (MES ≤ 1) (AUC = 0.980, accuracy = 95.1%, sensitivity = 92.9%, specificity = 95.4%, κ = 0.884)
Iacucci et al[46]	UC/283 patients	Prospective, multicenter	WLE, VCE	239 video images; 245 video images	242 video images; 244 video images	CNN	Detection of ER using VCE (PICaSSO ≤ 3) (AUC = 0.940, sensitivity = 79.0%, specificity = 95.0%, κ = 0.730) achieved better performance than WLE (UCEIS ≤ 1) (AUC = 0.850, sensitivity = 72.0%, specificity = 87.0%, κ = 0.510)
Byrne et al[47]	UC/NR	Prospective, single center	HD endoscopy	134 video images	NR	DL-CNN	Performance for disease severity discrimination: MES ≤ 1 vs MES ≥ 2 (AUC = 0.941, accuracy = 94.0%, sensitivity = 96.7%, specificity = 91.3%, QWK = 0.880); UCEIS ≤ 3 vs UCEIS > 3 (AUC = 0.936, accuracy = 94.0%, sensitivity = 93.9%, specificity = 93.4%, QWK = 0.870)
Stidham et al[31]	UC/748 patients	Prospective, multicenter	WLE	NR	NR	ML	CDS had better performance for detecting endoscopic changes than MES (Hedges’ g: 0.743 vs 0.460, P < 0.001)
Takabayashi et al[48]	UC/812 patients	Retrospective, multicenter	WLE	14208 still images	13826 still images	CNN	Disease severity grading: correlation between UCEGS and MES (ρ = 0.890, P < 0.001) and between UCEGS and IBD experts (ρ: 0.960-0.987, P < 0.001)
Ogata et al[49]	UC/110 patients	Prospective, single center	WLE	74713 still images	11452 still images	CNN	Performance for evaluating ER based on MES (sensitivity = 96.9%, specificity = 78.4%, accuracy = 93.4%). Interobserver and intraobserver agreement with AI (ICC: 0.84-0.86 and 0.89) vs without AI (ICC: 0.64-0.76 and 0.76)
Sinonquel et al[50]	UC/36 patients	Prospective, single center	SWE	NR	NR	CAD	Histological assessment using SWE-CAD (sensitivity = 96.1%, specificity = 85.5%, accuracy = 96.4%). The accuracy of classification into mild, moderate, and severe disease was 97.7%, 62.8% and 95.0%, respectively
Aoki et al[51]	CD/131 patients	Retrospective, single center	CE	5360 images	10440 images	CNN	Ulcer recognition in small bowel video frames (AUC = 0.958, sensitivity = 88.2%, specificity = 90.9%, accuracy = 90.8%)
Klang et al[52]	CD/49 patients	Retrospective, single center	CE	14112 images	3528 images	DL-CNN	High performance in ulcer detection (AUC = 0.990, accuracy: 95.4%-96.7%)
Klang et al[32]	CD/145 patients	Retrospective, single center	CE	27892 images	1449 images	DNN	Performance for: Stricture detection (AUC = 0.971, accuracy = 93.5%); Differential diagnosis between strictures and normal mucosa (AUC = 0.989); Discrimination between strictures and ulcers (AUC = 0.942)
Barash et al[53]	CD/49 patients	Retrospective, single center	CE	1242 images	248 images	CNN	Classification of ulcer severity grades: Grade 1 vs 3 (AUC = 0.958, accuracy = 91.0%, κ = 0.910); Grade 2 vs 3 (AUC = 0.939, accuracy = 79.0%, κ = 0.790); Grade 1 vs 2 (AUC = 0.565, accuracy = 62.4%, κ = 0.670)
Majtner et al[54]	CD/38 patients	Retrospective, single center	CE	5421 images	1549 images	DL	Performance in ulcer detection (sensitivity = 95.7%, specificity = 99.8%, accuracy = 98.4%). Agreement between the model and manual reading of ulcerations (κ = 0.720)
Udristoiu et al[55]	CD/54 patients	Retrospective, single center	pCLE	5081 images	1124 images	CNN	Differentiation between inflammation and intact colonic mucosa (AUC = 0.980, accuracy = 95.3%, specificity = 92.8%, sensitivity = 94.6%)
de Maissin et al[56]	CD/63 patients	Retrospective, multicenter	CE	2449 images	700 images	RNN	Performance for discriminating pathological vs non-pathological images (accuracy = 93.7%, sensitivity = 93.0%, specificity = 95.0%, κ = 0.790)
Ribeiro et al[57]	CD/124 patients	Retrospective, multicenter	CE	37319 images	124 images	CNN	Identification of colonic ulcerations and erosions (AUC = 1.000, accuracy = 99.6%, sensitivity = 96.9%, specificity = 99.9%)
Ferreira et al[58]	CD/NR	Retrospective, multicenter	CE	19740 images	4935 images	DL-CNN	Performance of the model for lesion detection (sensitivity = 90.0%, specificity = 96.0%, precision = 97.1%, accuracy = 92.4%)
Afonso et al[59]	CD/NR	Retrospective, single center	CE	4904 images	1226 images	CNN	Detection of ulcers and erosions in the small intestine mucosa (accuracy = 95.6%, sensitivity = 90.8%, specificity = 97.1%)
Martins et al[60]	CD/250 patients	Retrospective, single center	DAE	250 DAE images	6772 images	CNN	Identification of colonic ulcerations and erosions (AUC = 1.000, accuracy = 98.7%, sensitivity = 88.5%, specificity = 99.7%)
Brodersen et al[34]	CD/131 patients	Prospective, multicenter	CE	NR	NR	DL	Identification of CD (sensitivity: 92.0%-96.0%, specificity: 90.0%-93.0%) and IBD (sensitivity = 97.0%, specificity: 90.0%-91.0%)
Xie et al[61]	CD/628 patients	Retrospective, single center	DBE	NR	28155 images	DL	The accuracy for detection of ulcers (96.3%), inflammatory stenosis (95.7%), and non-inflammatory stenosis (96.7%). The grading of ulcers based on surface area, size, and depth (precision between 85.2% and 87.8%)

AI: Artificial intelligence; CAD: Computer-aided diagnosis; CD: Crohn’s disease; CE: Capsule endoscopy; CNN: Convolutional neural network; DAE: Device-assisted enteroscopy; DBE: Double-balloon endoscopy; DL: Deep learning; DNUC: Deep neural network for evaluation of ulcerative colitis; HD: High definition; NR: Not reported; pCLE: Probe-based confocal laser endomicroscopy; SWE: Single-wavelength endoscopy; WLE: White-light endoscopy; VCE: Virtual chromoendoscopy; UC: Ulcerative colitis; k-NN: K-nearest neighbors; RNN: Recurrent neural network; SVM: Support vector machine; ML: Machine learning; MLA: Machine learning algorithm; ER: Endoscopic remission; MES: Mayo endoscopic score; AUC: Area under the curve; GS: Geboes score; HR: Histological remission; UCEIS: Ulcerative colitis endoscopic index of severity; ICC: Intraclass correlation coefficient; RD: Red density; RHI: Robarts histological index; QWK: Quadratic weighted kappa; PICaSSO: Paddington international virtual chromoendoscopy score; IBD: Inflammatory bowel disease; UCEGS: Ulcerative colitis endoscopic gradation scale; CDS: Cumulative disease score; DNN: Deep neural network.
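
Most rows in Table 1 report sensitivity, specificity, accuracy, precision, or κ for a dichotomized endoscopic score, for example MES ≤ 1 (remission) vs MES ≥ 2 (active disease). The minimal sketch below uses hypothetical counts that do not come from any of the cited studies; it shows how all of these figures follow from a single 2 × 2 confusion matrix. It also makes one relationship explicit: accuracy is the prevalence-weighted mean of sensitivity and specificity, so it always lies between those two values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical 2x2 counts; here 'positive' means active disease (MES >= 2)."""

    tp: int  # active disease called active by the model
    fn: int  # active disease called remission
    tn: int  # remission called remission
    fp: int  # remission called active

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.n

    def sensitivity(self) -> float:  # recall, true-positive rate
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def f1(self) -> float:  # harmonic mean of precision and sensitivity
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)

    def cohen_kappa(self) -> float:
        """Agreement with the reference standard, corrected for chance agreement."""
        observed = self.accuracy()
        pred_pos = (self.tp + self.fp) / self.n  # share the model calls active
        ref_pos = self.prevalence()              # share the reference calls active
        expected = pred_pos * ref_pos + (1 - pred_pos) * (1 - ref_pos)
        return (observed - expected) / (1 - expected)


cm = ConfusionMatrix(tp=90, fn=10, tn=285, fp=15)
# Accuracy equals the prevalence-weighted average of sensitivity and specificity.
weighted = cm.prevalence() * cm.sensitivity() + (1 - cm.prevalence()) * cm.specificity()
print(f"sens={cm.sensitivity():.3f} spec={cm.specificity():.3f} acc={cm.accuracy():.4f}")
print(f"weighted={weighted:.4f} f1={cm.f1():.3f} kappa={cm.cohen_kappa():.3f}")
```

Because κ subtracts the agreement expected by chance from the observed agreement, it is always lower than raw accuracy whenever the model's and the reference standard's class proportions make chance agreement non-trivial.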


Table 2 Artificial intelligence-enabled digital pathology for histological assessment in inflammatory bowel diseases

Ref.	Disease/number of patients	Type of study	Number of training samples	Number of test samples	AI/model	Main findings
Vande Casteele et al[66]	UC/88 patients	Retrospective, single center	20 tissue regions	88 biopsies	DL	Eosinophil identification in WSIs (sensitivity = 86.4%, accuracy = 91.8%, F1 score = 0.89); strong agreement with four human experts (ICC: 0.805-0.917)
Gui et al[69]	UC/307 patients	Prospective, multicenter	97 biopsies	41 biopsies	CAD-CNN	Discrimination between HR (PHRI < 1) and non-remission (PHRI ≥ 1) based on the presence of neutrophils (sensitivity = 78.0%, specificity = 91.7%, accuracy = 86.0%, ICC = 0.840)
Ohara et al[71]	UC/114 patients	Retrospective, single center	2300 WSIs	114 biopsies	DL	Rate of relapse higher for GCR ≤ 12% compared to GCR > 12% (45.0% vs 6.5%, P < 0.010)
Najdawi et al[68]	UC/577 patients	Retrospective, single center	512 WSIs	308 WSIs	CNN-RFC	HR prediction (NHI ≤ 1, accuracy = 97.0%) was comparable to expert pathologist assessments (κ = 0.910, Spearman’s correlation ρ = 0.890, P < 0.010)
Iacucci et al[70]	UC/273 patients	Prospective, multicenter	118 biopsies	375 biopsies (1); 154 biopsies (2)	CAD-CNN	(1) Performance in distinguishing HR from active inflammation: RHI (AUC = 0.850, accuracy = 80.0%, sensitivity = 94.0%, specificity = 76.0%); NHI (AUC = 0.860, accuracy = 81.0%, sensitivity = 89.0%, specificity = 79.0%); PHRI (AUC = 0.870, accuracy = 87.0%, sensitivity = 89.0%, specificity = 85.0%); and (2) The hazard ratio for disease recurrence according to PHRI was higher for AI assessment compared with pathologist evaluation (4.64 vs 3.56, P < 0.001)
Peyrin-Biroulet et al[73]	UC/NR	Retrospective, single center	160 WSIs	40 WSIs	CNN	The average ICC between histopathologists and the AI tool for histological assessment based on NHI (ICC = 0.872)
Ohara et al[72]	UC/96 patients	Prospective, single center	11260 patches	135 WSIs	DL	Histological evaluation based on neutrophil quantification in WSIs (accuracy = 77.0%, F1 score = 0.79). AI-predicted histological scores (PHRI, NHI) correlated strongly with pathological diagnoses (Spearman’s ρ = 0.680-0.800, P < 0.050)
Klein et al[78]	CD/105 patients	Retrospective, single center	Biopsies NR	Biopsies NR	NNET	Differentiation of clinical phenotypes (sensitivity = 78.0%, specificity = 77.0%). Prediction of surgical intervention (sensitivity = 80.0%, specificity = 91.0%)
Kiyokawa et al[76]	CD/68 patients	Retrospective, single center	619464 tile images	308705 tile images	CNN	Adipocyte shrinkage and increased mast cell infiltration in subserosal adipose tissue predict postoperative recurrence (AUC = 0.995, accuracy = 96.9%, precision = 96.4%, sensitivity = 96.5%)
Wang et al[77]	CD/205 patients	Retrospective, multicenter	310 WSIs	278 WSIs	DL-CNN	Severity of myenteric plexitis (accuracy = 83.3%) and postoperative recurrence prediction (AUC = 0.980)
Rymarczyk et al[79]	UC/887 patients; CD/302 patients	Retrospective, multicenter	2696 biopsies	800 biopsies	RNN, FV + RF, SA-AbMILP	Automated histological assessment with SA-AbMILP: GS (accuracy = 65.0%-85.0%, κ = 0.440-0.680); GHAS in the colon (accuracy = 80.0%-89.0%, κ = 0.540-0.650) and ileum (accuracy = 65.0%-82.0%, κ = 0.460-0.670)
Furlanello et al[74]	IBD/52 patients	Prospective, single center	4981 WSIs	356 biopsies	DL	Automated quantification of basal plasmacytosis discriminates IBD from non-IBD (accuracy = 90.0%)

AI: Artificial intelligence; CD: Crohn’s disease; IBD: Inflammatory bowel disease; UC: Ulcerative colitis; NR: Not reported; WSI: Whole slide image; CNN: Convolutional neural network; CAD: Computer-aided diagnosis; DL: Deep learning; RFC: Random forest classifier; NNET: Neural network; RNN: Recurrent neural network; FV + RF: Fisher vector with random forest; SA-AbMILP: Self-attention-based multi-instance learning pooling; ICC: Intraclass correlation coefficient; HR: Histological remission; PHRI: Paddington international virtual chromoendoscopy score histologic remission index; GCR: Goblet cell ratio; NHI: Nancy histological index; RHI: Robarts histologic index; AUC: Area under the curve; GS: Geboes score; GHAS: Global histological disease activity score.
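
Several models in Table 2 score whole-slide images (WSIs) that are too large to process in one pass, so each slide is cut into tiles and the tile-level features must be pooled into a single slide-level prediction. The NumPy sketch below illustrates only the basic attention-based multi-instance learning pooling on which SA-AbMILP (Rymarczyk et al[79]) builds: each tile embedding h_k receives a weight a_k = softmax over k of wᵀ tanh(V h_k), and the slide embedding is the weighted sum Σ a_k h_k. The self-attention step, the CNN feature extractor, and training are omitted; V, w, the logistic head u, and the tile features are random placeholders, not learned parameters or real data.

```python
import numpy as np


def attention_mil_pool(tiles: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Attention-based MIL pooling of tile embeddings into one slide embedding.

    tiles: (K, D) features of the K tiles cut from one biopsy/WSI
    V:     (L, D) projection matrix of the attention network
    w:     (L,)   attention vector
    Returns the (D,) slide embedding and the (K,) attention weights.
    """
    scores = np.tanh(tiles @ V.T) @ w  # (K,) unnormalized attention scores
    scores = scores - scores.max()     # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()           # softmax over tiles: non-negative, sums to 1
    return weights @ tiles, weights    # attention-weighted sum of tile embeddings


def slide_probability(slide_embedding: np.ndarray, u: np.ndarray, b: float) -> float:
    """Hypothetical logistic head, e.g. P(active histological inflammation)."""
    return float(1.0 / (1.0 + np.exp(-(slide_embedding @ u + b))))


rng = np.random.default_rng(0)
K, D, L = 50, 16, 8                     # tiles per slide, feature size, attention size
tiles = rng.normal(size=(K, D))         # stand-in for CNN tile features
V = rng.normal(scale=0.1, size=(L, D))  # random, untrained parameters
w = rng.normal(size=L)
u = rng.normal(size=D)

z, a = attention_mil_pool(tiles, V, w)
print("slide probability:", round(slide_probability(z, u, 0.0), 3))
print("most-attended tile:", int(a.argmax()))

# Pooling ignores tile order: shuffling the tiles leaves the slide embedding unchanged.
perm = rng.permutation(K)
z_shuffled, _ = attention_mil_pool(tiles[perm], V, w)
assert np.allclose(z, z_shuffled)
```

Because the slide embedding is a weighted sum, it does not depend on tile order, and the attention weights can be inspected as a rough indication of which tiles contributed most to the slide-level output.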


Table 3 Artificial intelligence-assisted prediction of therapeutic response in inflammatory bowel diseases

Ref.	Disease/number of patients	Study design	Therapy	AI/model	Main findings
Waljee et al[91]	UC/491 patients	Retrospective, multicenter	VDZ	ML-RF	Prediction of long-term steroid-free ER (AUC = 0.730) using laboratory data from the first 6 weeks of VDZ therapy
Waljee et al[88]	CD/401 patients	Retrospective, multicenter	UST	ML-RF	Week 8 CRP/ALB ratio predicts UST non-response (AUC = 0.780, sensitivity = 79.0%, specificity = 67.0%) vs baseline data (AUC = 0.590, sensitivity = 63.0%, specificity = 64.0%)
Con et al[81]	CD/146 patients	Retrospective, single center	IFX, ADA	DL-RNN	AI model using CRP < 5 mg/L better predicts post-therapy remission than conventional model (AUC: 0.754 vs 0.659, P = 0.036)
Li et al[92]	CD/174 patients	Retrospective, single center	IFX	ML-RF	Response to IFX predicted by clinical/serological data (AUC = 0.900, accuracy = 85.0%, sensitivity = 81.0%, specificity = 94.0%)
He et al[93]	CD/86 patients	Retrospective, single center	UST	ML	UST response prediction based on HSD3B1, MUC4, CF1, and CCL11 expression (AUC: 0.734-0.746)
Park et al[94]	CD/234 patients	Prospective, multicenter	anti-TNF	ML	Non-durable response was associated with overexpression of DPY19L3 (β = 2.703) and GSTT1 (β = 1.735) and with decreased NUCB1 concentration (β = -2.142)
Kellerman et al[33]	CD/101 patients	Retrospective, single center	ADA, IFX, VDZ	DL-TimeSformer	Prediction of biologic initiation in newly diagnosed patients (AUC = 0.860, accuracy: 81.0%-82.0%), outperforming a human reader (AUC = 0.700) and FC (AUC = 0.740)
Iacucci et al[36]	IBD/29 patients	Prospective, single center	anti-TNF, anti-α4β7	CAD	pCLE-detected crypt/vessel abnormalities and fluorescein leakage predict therapy response in UC (AUC = 0.930, accuracy = 85.0%) and CD (AUC = 0.790, accuracy = 80.0%); better anti-TNF prediction in UC (AUC = 0.830) than in CD (AUC = 0.580)
Stidham et al[31]	UC/748 patients (induction); 348 patients (maintenance)	Prospective, multicenter	UST	CAD	CDSs were significantly lower with UST than with placebo at both week 8 (141.9 vs 184.3, P < 0.0001) and week 44 (78.2 vs 151.5, P < 0.0001). Stratification by baseline CDS showed greater UST efficacy in patients with severe disease than in those with mild disease (-85.0 vs -55.4, P < 0.0001)
Harun et al[87]	UC/1684 patients with induction; 463 patients with maintenance	Prospective, multicenter	Etrolizumab	ML-SHAP	Remission prediction post-induction (AUC = 0.740) and maintenance (AUC = 0.750) using combined demographic, clinical, physiologic, and histological data
Qiu et al[89]	CD/746 patients	Retrospective, single center	IFX	ML-SHAP	Response discrimination using integrated predictors (HB, WBC, ESR, ALB, PLT, age at diagnosis, Montreal classification) (training set AUC = 0.910, si-test set AUC = 0.710)

AI: Artificial intelligence; CD: Crohn’s disease; IBD: Inflammatory bowel disease; UC: Ulcerative colitis; VDZ: Vedolizumab; UST: Ustekinumab; ADA: Adalimumab; IFX: Infliximab; TNF: Tumor necrosis factor; ML: Machine learning; RF: Random forests; RNN: Recurrent neural network; SHAP: Shapley additive explanations; CAD: Computer-aided diagnosis; DL: Deep learning; ER: Endoscopic remission; AUC: Area under the curve; ALB: Albumin; CRP: C-reactive protein; FC: Fecal calprotectin; pCLE: Probe-based confocal laser endomicroscopy; CDS: Cumulative disease score; HB: Hemoglobin; WBC: White blood cells; ESR: Erythrocyte sedimentation rate; PLT: Platelets.
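
Table 3 compares models mainly by the area under the ROC curve (AUC). One useful reading of that number is that the AUC equals the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder, so 0.5 corresponds to chance and 1.0 to perfect ranking. The pure-Python sketch below computes the AUC directly from that pairwise definition (the Mann-Whitney U statistic divided by the number of responder/non-responder pairs); the scores are hypothetical. Because the AUC describes ranking on whichever patients it is computed on, training-set and test-set values, such as those reported by Qiu et al[89], are best read separately.

```python
from itertools import product
from typing import Sequence


def pairwise_auc(responder_scores: Sequence[float], nonresponder_scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random responder > score of a random non-responder).

    Ties count as half a win, which matches the Mann-Whitney U statistic divided by
    n_responders * n_nonresponders. O(n^2) pairs, fine for small cohorts.
    """
    if not responder_scores or not nonresponder_scores:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for r, nr in product(responder_scores, nonresponder_scores):
        if r > nr:
            wins += 1.0
        elif r == nr:
            wins += 0.5
    return wins / (len(responder_scores) * len(nonresponder_scores))


# Hypothetical predicted probabilities of response from some model.
responders = [0.9, 0.8, 0.7, 0.4]
non_responders = [0.6, 0.3, 0.2]
auc = pairwise_auc(responders, non_responders)
print(f"AUC = {auc:.3f}")  # 11 of 12 pairs ranked correctly
```

For larger cohorts, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity from sorted scores instead of enumerating every pair.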


Table 4 Overview of machine learning models for diagnosis, prognosis, and treatment optimization in inflammatory bowel disease

Ref.	AI model	Field of application	Disease	Outcomes	Performance
Najdawi et al[68]	ML-RF	Histological assessment	UC	Evaluation of HR	Strong agreement with pathologists in relation to the NHI score (κ = 0.910, Spearman coefficient of ρ = 0.890) (P < 0.001)
Waljee et al[91]		Therapy	UC	Prediction of corticosteroid-free ER with VDZ at week 52	AUC = 0.730, sensitivity = 72.0%, specificity = 68.0%, based on data up to week 6
Waljee et al[88]		Therapy	CD	Prediction of UST response at week 42	AUC = 0.780, sensitivity = 79.0%, specificity = 67.0%, based on demographic and laboratory data up to week 8
Li et al[92]				Prediction of therapeutic response to IFX	AUC = 0.900, accuracy = 85.0%, sensitivity = 81.0%, specificity = 94.0%
He et al[93]				Prediction of therapeutic response to UST based on the expression profile of four genes	AUC: 0.734-0.746
Stidham et al[103]		Risk stratification	CD	Evaluation of surgical outcomes	AUC = 0.780
Maeda et al[90]	ML-SVM	Endoscopic assessment	UC	Evaluation of persistent inflammation	Sensitivity = 74.0%, specificity = 97.0%, precision = 91.0%
Maeda et al[90]	ML-SVM	Risk stratification	UC	Assessment of relapse risk	Increased rate in patients with active form (28.4%) compared with those in clinical remission (4.9%, P < 0.001)
Harun et al[87]	ML-XGBoost	Therapy	UC	Remission prediction post-induction and maintenance for etrolizumab	AUC: 0.740-0.750
Park et al[94]		Therapy	CD	Prediction of durable response to anti-TNF	Non-durable response associated with overexpression of DPY19L3 (β = 2.703) and GSTT1 (β = 1.735), and decreased NUCB1 concentration (β = -2.142)
Qiu et al[89]		Therapy	CD	Prediction of therapeutic response to IFX	AUC = 0.910 (training set)
Takenaka et al[44]	DL-DNN	Endoscopic assessment	UC	Prediction of HR	Sensitivity = 97.9%, specificity = 94.6%, ICC = 0.927
Huang et al[43]		Endoscopic assessment	UC	Evaluation of mucosal healing	AUC = 0.927, accuracy = 93.8%, sensitivity = 84.6%, specificity = 96.9%
Klang et al[32]		Endoscopic assessment	CD	Identification of strictures	AUC = 0.989, accuracy = 93.5%
Klang et al[32]		Endoscopic assessment	CD	Grading the severity of ulcerations	AUC = 0.992 (mild cases); AUC = 0.975 (moderate cases); AUC = 0.889 (severe cases)
Ozawa et al[40]	DL-CNN	Endoscopic assessment	UC	Discrimination between MES ≤ 1 and MES 2; diagnosis of ER (MES ≤ 1)	AUC = 0.980
Wang et al[21]		Endoscopic assessment	UC	Discrimination between MES ≤ 1 and MES 2; diagnosis of ER (MES ≤ 1)	AUC = 0.980, accuracy = 95.1%, sensitivity = 92.9%, specificity = 95.4%, κ = 0.884
Stidham et al[22]		Endoscopic assessment	UC	Discriminating ER from active endoscopic disease	AUC = 0.966, sensitivity = 83.0%, specificity = 96.0%. Excellent agreement between expert reviewers (κ = 0.860)
Gottlieb et al[30]		Endoscopic assessment	UC	Evaluation of mucosal healing	Accuracy: 95.5%-97.0%. Agreement with expert readers for MES (QWK = 0.844) and UCEIS (QWK = 0.855)
Takenaka et al[25]		Endoscopic assessment	UC	Prediction of ER and HR	Accuracy = 90.1%, ICC = 0.917 (UCEIS ≤ 2). Accuracy = 92.9%, κ = 0.859 (GS < 3.1)
Yao et al[27]	DL-CNN (Inception-V3)	Endoscopic assessment	UC	Assessment of disease severity	AUC = 0.939, sensitivity = 90.2%, specificity = 87.0%
Gui et al[69]	DL-CNN	Histological assessment	UC	Prediction of HR (PHRI < 1) according to the presence or absence of neutrophils	Sensitivity = 78.0%, specificity = 91.7%, accuracy = 86.0%, ICC = 0.84
Iacucci et al[70]		Histological assessment	UC	Prediction of HR (PHRI < 1) according to the presence or absence of neutrophils	AUC = 0.870, accuracy = 87.0%, sensitivity = 89.0%, specificity = 85.0%
Vande Casteele et al[66]		Histological assessment	UC	Quantification of eosinophils in colonic biopsies	Sensitivity = 86.4%, accuracy = 91.8%, F1 score = 0.89
Udristoiu et al[55]	DL-CNN	Endoscopic assessment	CD	Differentiation between inflammation and intact colonic mucosa	AUC = 0.980, accuracy = 95.3%, specificity = 92.8%, sensitivity = 94.6%
Majtner et al[54]	DL-CNN (ResNet-50)	Endoscopic assessment	CD	Ulcer detection	The diagnostic accuracy was 98.5% for the small bowel and 98.1% for the colon
Kellerman et al[33]	DL-TimeSformer	Endoscopic assessment	CD	Prediction of biologic initiation in newly diagnosed patients	AUC = 0.860, accuracy: 81.0%-82.0%
Rymarczyk et al[79]	DL-CNN (SA-AbMILP)	Histological assessment	CD	Automatic histological assessment for GHAS and GS	Accuracy: 65.0%-89.0%
Furlanello et al[74]	DL-CNN (StarDist)	Histological assessment	IBD	Discriminates IBD from non-IBD mucosa	Accuracy = 90.0%
Kiyokawa et al[76]	DL-CNN (EfficientNet-b5)	Risk stratification	CD	Prediction of postoperative recurrence	AUC = 0.995, accuracy = 96.9%, precision = 96.4%
Con et al[81]	DL-RNN	Therapy	CD	Prediction of post-therapy remission with anti-TNF	AUC = 0.754

AI: Artificial intelligence; ML: Machine learning; RF: Random forests; DL: Deep learning; ER: Endoscopic remission; AUC: Area under the curve; SVM: Support vector machine; XGBoost: Extreme gradient boosting; DNN: Deep neural network; CNN: Convolutional neural network; SA-AbMILP: Self-attention-based multi-instance learning pooling; RNN: Recurrent neural network; CD: Crohn’s disease; IBD: Inflammatory bowel disease; UC: Ulcerative colitis; HR: Histological remission; VDZ: Vedolizumab; UST: Ustekinumab; IFX: Infliximab; TNF: Tumor necrosis factor; MES: Mayo endoscopic score; PHRI: Paddington international virtual chromoendoscopy score histologic remission index; GHAS: Global histological disease activity score; NHI: Nancy histological index; ICC: Intraclass correlation coefficient; UCEIS: Ulcerative colitis endoscopic index of severity; GS: Geboes score.
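
Several agreement figures in Tables 1 and 4 are quadratic weighted kappas (QWK), which suit ordinal scales such as the MES (0-3): a disagreement of MES 0 vs MES 3 is penalized nine times more heavily than MES 0 vs MES 1, whereas unweighted Cohen's κ treats all disagreements alike. The minimal pure-Python sketch below computes QWK from paired ratings; the expert and model scores are hypothetical.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], n_categories: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for ordinal scores 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # observed[i][j] = number of cases scored i by rater A and j by rater B
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2  # 0 on the diagonal
            # Count expected if the raters were independent with the same marginals.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


expert_mes = [0, 1, 2, 3]
model_mes = [1, 1, 2, 3]  # one off-by-one disagreement (expert 0, model 1)
print(round(quadratic_weighted_kappa(expert_mes, model_mes, n_categories=4), 3))
```

scikit-learn offers the same statistic as `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")`. The ratio is undefined (division by zero) when both raters assign every case to the same single category.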


Citation: Minea H, Singeap AM, Minea M, Chiriac S, Stanciu C, Trifan A. Artificial intelligence in inflammatory bowel disease: Current applications and future directions. World J Gastroenterol 2025; 31(39): 111353
URL: https://www.wjgnet.com/1007-9327/full/v31/i39/111353.htm
DOI: https://dx.doi.org/10.3748/wjg.v31.i39.111353