©Author(s) (or their employer(s)) 2026.
Artif Intell Gastrointest Endosc. Mar 8, 2026; 7(1): 117988
Published online Mar 8, 2026. doi: 10.37126/aige.v7.i1.117988
Figure 1 Components of multimodal artificial intelligence.
This schematic illustrates how modern capsule endoscopy integrates multiple data streams to enable multimodal artificial intelligence (AI). As the capsule traverses the gastrointestinal tract, it captures video data and generates additional sensor-derived signals (e.g., localization, motility metrics, transit time, pH). These heterogeneous inputs are processed through multimodal AI frameworks that combine visual and non-visual modalities to enhance lesion detection, anatomical localization, motility assessment, and overall diagnostic accuracy. AI: Artificial intelligence. Created in BioRender.
Figure 2 Multimodal fusion paradigms in capsule endoscopy.
Schematic representation of three primary strategies for integrating visual and sensor data in capsule endoscopy. A: Early fusion (e.g., Endo-VMFuseNet) combines visual and sensor inputs at the feature level using long short-term memory networks, achieving sub-millimeter localization accuracy without explicit calibration; B: Late fusion performs modality-specific analyses independently, then merges model outputs via weighted voting or averaging to produce a robust final prediction; C: Hybrid fusion (e.g., convolutional neural network-long short-term memory hybrid) integrates spatial features from video with temporal features from inertial measurement unit data, enabling accurate organ localization and transit-time estimation (> 95% accuracy). LSTM: Long short-term memory; IMU: Inertial measurement unit. Created in BioRender.
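To make the difference between panels A and B concrete, the sketch below contrasts where the modalities are joined. It is a toy illustration under stated assumptions, not the authors' method and not Endo-VMFuseNet: the function names (`early_fusion`, `late_fusion`, `weighted_vote`), the modality keys, the weights, and the probability values are all hypothetical. Early fusion is reduced here to the feature-concatenation step that would precede a single joint model (such as an LSTM); late fusion is shown as the weighted averaging and weighted voting over separate per-modality model outputs that the caption describes.

```python
"""Toy contrast of early vs late fusion for two or more modalities (hypothetical data)."""
from __future__ import annotations

from typing import Sequence


def early_fusion(visual_features: Sequence[float], sensor_features: Sequence[float]) -> list[float]:
    """Concatenate per-frame feature vectors into one joint input for a single downstream model."""
    return list(visual_features) + list(sensor_features)


def late_fusion(prob_by_modality: dict[str, Sequence[float]], weights: dict[str, float]) -> list[float]:
    """Weighted average of class-probability vectors produced by separate, modality-specific models."""
    total = sum(weights[m] for m in prob_by_modality)
    n_classes = len(next(iter(prob_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, probs in prob_by_modality.items():
        w = weights[modality] / total  # normalise so the fused vector still sums to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def weighted_vote(labels_by_modality: dict[str, int], weights: dict[str, float]) -> int:
    """Return the label with the largest summed weight across modality-specific predictions."""
    scores: dict[int, float] = {}
    for modality, label in labels_by_modality.items():
        scores[label] = scores.get(label, 0.0) + weights[modality]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical two-class outputs (e.g., lesion vs no lesion) from a video model and a sensor model.
    print(late_fusion({"video": [0.9, 0.1], "sensor": [0.6, 0.4]}, {"video": 2.0, "sensor": 1.0}))
    print(weighted_vote({"video": 1, "sensor": 0, "ph": 0}, {"video": 0.5, "sensor": 0.3, "ph": 0.3}))
```

With the video model weighted twice as heavily as the sensor model, the fused lesion probability is about 0.8. In the voting example, two lower-weight modalities together outvote the single video prediction. This illustrates why late fusion is often described as robust: no single modality's error dominates unless its weight is set to allow it. In practice the weights would be tuned on validation data rather than fixed by hand.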
Figure 3 Strengths, weaknesses, opportunities, and threats analysis of multimodal artificial intelligence systems for capsule endoscopy.
This figure illustrates the strengths, weaknesses, opportunities, and threats associated with integrating multimodal artificial intelligence into capsule endoscopy. Key advantages include enhanced lesion localization, diagnostic accuracy, and workflow efficiency. Challenges span computational demands and limited clinical validation. The approach offers opportunities for advanced lesion mapping and integration of diverse sensor data, while regulatory, interoperability, and data security concerns represent potential barriers. GI: Gastrointestinal; CE: Capsule endoscopy; IBD: Inflammatory bowel disease.
- Citation: Chowdhary R, Sheth PD, Rampurawala IM, Kapadia C, Vohra C, Chowdhary R, Arora K, Taranikanti V, Vuthaluru AR, Goyal O, Goyal MK. Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics. Artif Intell Gastrointest Endosc 2026; 7(1): 117988
- URL: https://www.wjgnet.com/2689-7164/full/v7/i1/117988.htm
- DOI: https://dx.doi.org/10.37126/aige.v7.i1.117988
