©Author(s) (or their employer(s)) 2026.
Artif Intell Gastrointest Endosc. Mar 8, 2026; 7(1): 117988
Published online Mar 8, 2026. doi: 10.37126/aige.v7.i1.117988
Table 1 Clinical applications of artificial intelligence in capsule endoscopy¹
| Lesion type | Ref. | Model | Capsule type | Number of training images | Performance metrics |
|---|---|---|---|---|---|
| GI hemorrhage | Jia et al[38] | Deep CNN for bleeding detection | Small bowel | 10000 images (bleeding and non-bleeding) | Improved precision and recall |
|  | Spada et al[35] | Multicenter study of AI-assisted lesion reading | Small bowel | 158235 images | Reading time reduced from 33.7 min to 3.8 min (P < 0.0001); improved accuracy |
| Erosions and ulcers | Ribeiro et al[40] | CNN model for colonic ulcers | Colon capsule | 37319 (3570 with lesions) | Sensitivity: 96.9%, specificity: 99.9% |
| Vascular lesions | Mascarenhas et al[42] | Multicenter study utilizing CNN | Small bowel and colon capsule | 34665 (11091 with lesions) | Diagnostic accuracy: 95%, sensitivity: 86.4%, specificity: 98.3% |
| Polyps and tumors | Kjølhede et al[44] | Systematic review and meta-analysis of polyp detection across sizes from < 6 mm to ≥ 10 mm | CCE-2 | Combined across studies | Sensitivity: 85%-87%, specificity: 85%-95% |
| Polyps and tumors | Moen et al[45] | Systematic review of AI models for polyp or colorectal neoplasia detection | CCE-2 | Varied across studies (thousands to 30000) | Per-frame sensitivity: 47.4%-98.1%, specificity: 87.0%-96.3%; per-lesion sensitivity: 81.3%-98.1% |
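Table 1 reports some results per frame and others per lesion, and the two are not interchangeable. A minimal sketch, assuming binary per-frame ground truth and predictions plus hypothetical lesion identifiers (the function names and the toy sequence are illustrative only, not taken from the cited studies), shows how each is computed and why per-lesion sensitivity can exceed per-frame sensitivity: a lesion counts as detected once any one of its frames is flagged.

```python
def frame_metrics(labels, preds):
    """Per-frame sensitivity and specificity from binary ground-truth labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def lesion_sensitivity(lesion_ids, preds):
    """Per-lesion sensitivity: a lesion is detected if at least one of its frames is flagged.

    lesion_ids[i] is None for a lesion-free frame, otherwise a (hypothetical) lesion identifier.
    """
    detected = {}
    for lesion, p in zip(lesion_ids, preds):
        if lesion is None:
            continue
        detected[lesion] = detected.get(lesion, False) or bool(p)
    if not detected:
        return float("nan")
    return sum(detected.values()) / len(detected)


# Toy 10-frame sequence with two lesions ("A" spans 3 frames, "B" spans 2).
lesion_ids = [None, "A", "A", "A", None, None, "B", "B", None, None]
preds = [0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
labels = [lesion is not None for lesion in lesion_ids]

sens, spec = frame_metrics(labels, preds)
print(f"per-frame sensitivity={sens:.2f}, specificity={spec:.2f}")  # per-frame sensitivity=0.40, specificity=0.80
print(f"per-lesion sensitivity={lesion_sensitivity(lesion_ids, preds):.2f}")  # per-lesion sensitivity=1.00
```

Here only 2 of 5 lesion frames are flagged, yet both lesions are caught, so the per-lesion figure is higher. This is one reason per-lesion ranges such as those reported by Moen et al[45] need not match per-frame ranges.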
Table 2 Comparative overview of image-only vs multimodal artificial intelligence architectures in capsule endoscopy¹
| Model | Modality type | Capsule platform | Key diagnostic application | Performance metrics |
|---|---|---|---|---|
| Ding et al[27], 2023 (CNN) | Image only | Small bowel capsule | Lesion detection (ulcers, bleeding) | Sensitivity 99.2%, specificity 96.7% |
| Nam et al[11], 2024 (CNN-LSTM) | Multimodal (video + IMU) | PillCam™ SB3 | Organ localization and transit time estimation | > 95% accuracy |
| Turan et al[31], 2017 (Endo-VMFuseNet) | Multimodal (video + magnetic) | Experimental platform | Capsule localization (3D trajectory mapping) | Sub-millimeter accuracy |
| Mascarenhas et al[42], 2024 (CNN) | Image only | Multibrand CE | Vascular lesion detection | Accuracy 95%, sensitivity 86.4%, specificity 98.3% |
| Vedaei and Wahid[13], 2021 (prototype) | Multimodal (video + IMU) | Research prototype | 3D trajectory reconstruction | Improved localization accuracy |
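Table 2 separates image-only models from multimodal ones that pair video with IMU or magnetic data. One step such fusion generally requires is aligning higher-rate sensor samples to video frames by timestamp before combining features. The following is a minimal sketch of that alignment followed by early (feature-level) fusion by concatenation. It assumes sorted timestamps in seconds, a hypothetical ±window/2 alignment window, simple acceleration-magnitude summaries, and made-up image feature vectors. It is not the architecture of Nam et al[11], Vedaei and Wahid[13], or any other model in the table.

```python
import math
from bisect import bisect_left, bisect_right


def imu_features_per_frame(frame_times, imu_times, imu_accel, window):
    """Summarize IMU samples within +/- window/2 seconds of each frame timestamp.

    imu_times must be sorted ascending; imu_accel holds (ax, ay, az) tuples.
    Returns [mean, std] of acceleration magnitude per frame (zeros if no samples fall in the window).
    """
    half = window / 2
    feats = []
    for t in frame_times:
        lo = bisect_left(imu_times, t - half)   # first sample at or after t - half
        hi = bisect_right(imu_times, t + half)  # one past the last sample at or before t + half
        mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in imu_accel[lo:hi]]
        if mags:
            mean = sum(mags) / len(mags)
            std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
        else:
            mean, std = 0.0, 0.0
        feats.append([mean, std])
    return feats


def fuse(image_feats, imu_feats):
    """Early (feature-level) fusion: concatenate per-frame image and IMU feature vectors."""
    if len(image_feats) != len(imu_feats):
        raise ValueError("image and IMU feature sequences must have the same length")
    return [list(img) + list(imu) for img, imu in zip(image_feats, imu_feats)]


# Hypothetical data: 2 video frames, 6 IMU samples, made-up 2-D image features.
frame_times = [0.0, 1.0]
imu_times = [0.0, 0.2, 0.5, 0.9, 1.0, 1.2]
imu_accel = [(0, 0, 1), (0, 0, 1), (3, 4, 0), (3, 4, 0), (0, 0, 5), (3, 4, 0)]
image_feats = [[0.1, 0.2], [0.3, 0.4]]

fused = fuse(image_feats, imu_features_per_frame(frame_times, imu_times, imu_accel, window=0.5))
print(fused)  # [[0.1, 0.2, 1.0, 0.0], [0.3, 0.4, 5.0, 0.0]]
```

The first frame picks up the samples at 0.0 s and 0.2 s (magnitude 1.0 each). The second picks up the three samples between 0.75 s and 1.25 s (magnitude 5.0 each). Concatenation is the simplest fusion choice. Learned alternatives, such as passing the fused per-frame vectors to a recurrent layer, are possible but lie outside this sketch.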
- Citation: Chowdhary R, Sheth PD, Rampurawala IM, Kapadia C, Vohra C, Chowdhary R, Arora K, Taranikanti V, Vuthaluru AR, Goyal O, Goyal MK. Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics. Artif Intell Gastrointest Endosc 2026; 7(1): 117988
- URL: https://www.wjgnet.com/2689-7164/full/v7/i1/117988.htm
- DOI: https://dx.doi.org/10.37126/aige.v7.i1.117988
