Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics

doi:10.37126/aige.v7.i1.117988

Journal Information

Publication Name: Artificial Intelligence in Gastrointestinal Endoscopy
ISSN: 2689-7164
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Minireviews

©Author(s) (or their employer(s)) 2026. No commercial re-use. See Permissions. Published by Baishideng Publishing Group Inc.

Artif Intell Gastrointest Endosc. Mar 8, 2026; 7(1): 117988
Published online Mar 8, 2026. doi: 10.37126/aige.v7.i1.117988


Figure 1 Components of multimodal artificial intelligence. This schematic illustrates how modern capsule endoscopy integrates multiple data streams to enable multimodal artificial intelligence (AI). As the capsule traverses the gastrointestinal tract, it captures video data and generates additional sensor-derived signals (e.g., localization, motility metrics, transit time, pH). These heterogeneous inputs are processed through multimodal AI frameworks that combine visual and non-visual modalities to enhance lesion detection, anatomical localization, motility assessment, and overall diagnostic accuracy. AI: Artificial intelligence. Created in BioRender.


Figure 2 Multimodal fusion paradigms in capsule endoscopy. Schematic representation of three primary strategies for integrating visual and sensor data in capsule endoscopy. A: Early fusion (e.g., Endo-VMFuseNet) combines visual and sensor inputs at the feature level using long short-term memory networks, achieving sub-millimeter localization accuracy without explicit calibration; B: Late fusion analyzes each modality independently, then merges the model outputs via weighted voting or averaging to produce a robust final prediction; C: Hybrid fusion (e.g., a convolutional neural network-long short-term memory hybrid) integrates spatial features from video with temporal features from inertial measurement unit data, enabling accurate organ localization and transit-time estimation (> 95% accuracy). LSTM: Long short-term memory; IMU: Inertial measurement unit. Created in BioRender.
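
The late fusion strategy in panel B can be illustrated with a minimal sketch. It assumes each modality-specific model has already produced class probabilities, and it merges them by weighted averaging. The function names (`late_fusion`, `predict`), the modality keys, the class labels, and the weights are hypothetical choices for illustration and are not taken from the article or from any specific system it reviews.

```python
from typing import Dict, List, Sequence


def late_fusion(probs: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Merge per-modality class probabilities by weighted averaging.

    probs   maps a modality name to that model's class probabilities.
    weights maps a modality name to a non-negative weight (assumed values).
    """
    total = sum(weights[m] for m in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for modality, p in probs.items():
        if len(p) != n_classes:
            raise ValueError(f"{modality}: expected {n_classes} classes")
        w = weights[modality] / total  # normalize so weights sum to 1
        for i, value in enumerate(p):
            fused[i] += w * value
    return fused


def predict(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


# Hypothetical outputs of two independent models for one video segment.
labels = ["lesion", "normal"]
probs = {"video": [0.7, 0.3], "sensor": [0.4, 0.6]}
weights = {"video": 0.75, "sensor": 0.25}  # illustrative, not tuned

fused = late_fusion(probs, weights)
print(predict(fused, labels), fused)
```

Because each modality is modeled separately, a missing or noisy sensor stream can be handled by lowering or zeroing its weight without retraining the other model, which is one common reason to prefer late fusion over feature-level fusion.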


Figure 3 Strengths, weaknesses, opportunities, and threats analysis of multimodal artificial intelligence systems for capsule endoscopy. This figure illustrates the strengths, weaknesses, opportunities, and threats associated with integrating multimodal artificial intelligence into capsule endoscopy. Key advantages include enhanced lesion localization, diagnostic accuracy, and workflow efficiency. Challenges span computational demands and limited clinical validation. The approach offers opportunities for advanced lesion mapping and integration of diverse sensor data, while regulatory, interoperability, and data security concerns represent potential barriers. GI: Gastrointestinal; CE: Capsule endoscopy; IBD: Inflammatory bowel disease.

Citation: Chowdhary R, Sheth PD, Rampurawala IM, Kapadia C, Vohra C, Chowdhary R, Arora K, Taranikanti V, Vuthaluru AR, Goyal O, Goyal MK. Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics. Artif Intell Gastrointest Endosc 2026; 7(1): 117988
URL: https://www.wjgnet.com/2689-7164/full/v7/i1/117988.htm
DOI: https://dx.doi.org/10.37126/aige.v7.i1.117988

