Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics

doi:10.37126/aige.v7.i1.117988

Journal Information of This Article

Publication Name

Artificial Intelligence in Gastrointestinal Endoscopy

ISSN

2689-7164

Publisher of This Article

Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Minireviews

©Author(s) (or their employer(s)) 2026. No commercial re-use. See Permissions. Published by Baishideng Publishing Group Inc.

Artif Intell Gastrointest Endosc. Mar 8, 2026; 7(1): 117988
Published online Mar 8, 2026. doi: 10.37126/aige.v7.i1.117988

Table 1 Clinical applications of artificial intelligence in capsule endoscopy¹

Lesion type	Ref.	Model	Capsule type	Number of training images	Performance metrics
GI hemorrhage	Jia et al[38]	Deep CNN for bleeding detection	Small bowel	10000 images (bleeding and non-bleeding)	Improved precision, recall
GI hemorrhage	Spada et al[35]	A multicenter study utilizing AI-assisted reading of lesions	Small bowel	158235 images	Reading time reduced from 33.7 min to 3.8 min (P < 0.0001); improved accuracy
Erosions and ulcers	Ribeiro et al[40]	CNN model for colonic ulcers	Colon capsule	37319 (3570 with lesions)	Sensitivity: 96.9%, specificity: 99.9%
Vascular lesions	Mascarenhas et al[42]	Multicenter study utilizing CNN	Small bowel and colon capsule	34665 (11091 with lesions)	Diagnostic accuracy: 95%, sensitivity: 86.4%, specificity: 98.3%
Polyps and tumors	Kjølhede et al[44]	Systematic review and meta-analysis of polyp detection across size categories from < 6 mm to ≥ 10 mm	CCE-2	Combined across studies	Sensitivity: 85%-87%, specificity: 85%-95%
Polyps and tumors	Moen et al[45]	Systematic review of AI models for polyp or colorectal neoplasia detection	CCE-2	Varied across studies (thousands to 30000)	Per-frame sensitivity: 47.4%-98.1%, specificity: 87.0%-96.3%; per-lesion sensitivity: 81.3%-98.1%

¹Summary of key studies evaluating the diagnostic performance of artificial intelligence (AI)-assisted capsule endoscopy across various gastrointestinal lesion types. Convolutional neural network-based models have demonstrated improved precision, sensitivity, and specificity in detecting gastrointestinal bleeding, erosions, ulcers, vascular lesions, and colorectal polyps compared with conventional reading. AI integration has also significantly reduced reading time and enhanced diagnostic consistency among readers of different experience levels.

CCE: Colon capsule endoscopy; CNN: Convolutional neural networks; AI: Artificial intelligence.
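
The accuracy, sensitivity, and specificity figures in Table 1 are linked through lesion prevalence: overall accuracy equals sensitivity weighted by the share of lesion images plus specificity weighted by the share of normal images. The following is a minimal sketch of that arithmetic using the Mascarenhas et al[42] row. It assumes, purely for illustration, that the evaluation set had the same lesion proportion as the image counts reported in the table (the table does not state the evaluation split), and the function name `accuracy_from_rates` is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Overall accuracy implied by sensitivity, specificity, and lesion prevalence.

    accuracy = sensitivity * prevalence + specificity * (1 - prevalence)
    """
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Figures taken from the Mascarenhas et al[42] row of Table 1.
total_images = 34665
lesion_images = 11091

# ASSUMPTION: the evaluation set shares the lesion proportion of the
# reported image counts; the table does not state the actual split.
prevalence = lesion_images / total_images

implied_accuracy = accuracy_from_rates(
    sensitivity=0.864,
    specificity=0.983,
    prevalence=prevalence,
)

print(f"prevalence = {prevalence:.3f}")
print(f"implied accuracy = {implied_accuracy:.3f}")
```

Under that assumption the implied accuracy is about 94.5%, close to the reported 95%. The same relationship shows why a high specificity dominates overall accuracy when lesion images are a minority of the dataset.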

Table 2 Comparative overview of image-only vs multimodal artificial intelligence architectures in capsule endoscopy¹

Model	Modality type	Capsule platform	Key diagnostic application	Reported performance
Ding et al[27], 2023 (CNN)	Image only	Small bowel capsule	Lesion detection (ulcers, bleeding)	Sensitivity 99.2%, specificity 96.7%
Nam et al[11], 2024 (CNN-LSTM)	Multimodal (video + IMU)	PillCam™ SB3	Organ localization and transit time estimation	> 95% accuracy
Turan et al[31], 2017 (Endo-VMFuseNet)	Multimodal (video + magnetic)	Experimental platform	Capsule localization (3D trajectory mapping)	Sub-millimeter accuracy
Mascarenhas et al[42], 2024 (CNN)	Image only	Multibrand CE	Vascular lesion detection	Accuracy 95%, sensitivity 86.4%, specificity 98.3%
Vedaei and Wahid[13], 2021 (prototype)	Multimodal (video + IMU)	Research prototype	3D trajectory reconstruction	Improved localization accuracy

¹Summary of representative studies evaluating artificial intelligence (AI) systems in capsule endoscopy, comparing image-only convolutional neural network (CNN) models with multimodal architectures that integrate additional sensor inputs such as inertial measurement units or magnetic trackers. Image-only CNN models show excellent lesion detection accuracy (95%-99%) for bleeding and ulcer identification, whereas multimodal systems, such as CNN-long short-term memory and Endo-VMFuseNet, achieve superior spatial localization and temporal mapping capabilities (> 95% accuracy, sub-millimeter precision). These findings highlight that while image-only AI excels in visual lesion recognition, multimodal approaches provide enhanced anatomical context and diagnostic robustness.

LSTM: Long short-term memory network; 3D: Three-dimensional; CNN: Convolutional neural networks; IMU: Inertial measurement units; CE: Capsule endoscopy.

Citation: Chowdhary R, Sheth PD, Rampurawala IM, Kapadia C, Vohra C, Chowdhary R, Arora K, Taranikanti V, Vuthaluru AR, Goyal O, Goyal MK. Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics. Artif Intell Gastrointest Endosc 2026; 7(1): 117988
URL: https://www.wjgnet.com/2689-7164/full/v7/i1/117988.htm
DOI: https://dx.doi.org/10.37126/aige.v7.i1.117988
