Minireviews Open Access
©Author(s) (or their employer(s)) 2026. No commercial re-use. See Permissions. Published by Baishideng Publishing Group Inc.
Artif Intell Gastrointest Endosc. Mar 8, 2026; 7(1): 117988
Published online Mar 8, 2026. doi: 10.37126/aige.v7.i1.117988
Multimodal artificial intelligence in capsule endoscopy: Integrating video and sensor data for advanced gastrointestinal diagnostics
Rishi Chowdhary, Department of Medicine, MetroHealth Medical Center, Cleveland, OH 44109, United States
Param Darpan Sheth, Insiya Mohammed Rampurawala, Department of Internal Medicine, J.S.S Medical College, JSS Academy of Higher Education and Research, Mysuru 570015, Karnataka, India
Chitresh Kapadia, Department of Internal Medicine, Government Medical College, Miraj 416410, Maharashtra, India
Chirag Vohra, Department of Medicine, All India Institute of Medical Sciences, Jodhpur 342005, Rajasthan, India
Rahul Chowdhary, Kirti Arora, Manjeet Kumar Goyal, Department of Internal Medicine, Cleveland Clinic Akron General Hospital, Akron, OH 44307, United States
Varna Taranikanti, Department of Foundational Medical Studies, Oakland University William Beaumont School of Medicine Rochester, Rochester, MI 48309, United States
Ashita Rukmini Vuthaluru, Department of Anesthesiology, All India Institute of Medical Sciences, New Delhi 110029, Delhi, India
Omesh Goyal, Department of Gastroenterology, Dayanand Medical College and Hospital, Tagore Nagar, Ludhiana 141001, Punjab, India
ORCID number: Rishi Chowdhary (0000-0002-3075-0684); Omesh Goyal (0000-0002-6347-0988); Manjeet Kumar Goyal (0000-0002-5511-2099).
Author contributions: Chowdhary Ri, Sheth PD, Rampurawala IM, Chowdhary Ra, and Goyal MK performed the conceptualization of the study; Chowdhary Ri, Goyal O, and Kapadia C developed the methodology and design; Chowdhary Ra, Rampurawala IM, Vohra C, Chowdhary Ri, Taranikanti V, and Arora K conducted the literature review and data curation; Rampurawala IM and Goyal MK carried out the visualization and figure preparation; Chowdhary Ra, Sheth PD, Vuthaluru AR, and Rampurawala IM wrote the original draft; all authors contributed to the review and editing of the subsequent versions of the manuscript; Goyal MK provided supervision and validation of the study; all authors read and approved the final version of the manuscript.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Corresponding author: Manjeet Kumar Goyal, DM, DNB, MD, Department of Internal Medicine, Cleveland Clinic Akron General Hospital, 1 Akron General Avenue, Akron, OH 44308, United States. manjeetgoyal@gmail.com
Received: December 22, 2025
Revised: January 8, 2026
Accepted: January 22, 2026
Published online: March 8, 2026
Processing time: 73 Days and 0.6 Hours

Abstract

Capsule endoscopy (CE) enables noninvasive visualization of the gastrointestinal tract but generates tens of thousands of images per examination, creating substantial interpretive burden, reader fatigue, and uncertainty in lesion localization. Artificial intelligence (AI), particularly deep learning-based image analysis, has markedly improved lesion detection and reduced reading time; however, image-only models remain limited in anatomical localization and spatiotemporal context. This narrative review summarizes current evidence on multimodal AI in CE, focusing on systems integrating capsule video with sensor-derived data such as inertial, magnetic, and physiological inputs. A structured literature search was conducted. Image-only convolutional neural network models consistently demonstrated high sensitivity (> 95%) for detecting gastrointestinal bleeding, ulcers, vascular lesions, polyps, and neoplasia, while reducing mean reading time from approximately 40-60 minutes to under 5 minutes. Multimodal architectures, including convolutional neural network-long short-term memory hybrids and sensor-fusion frameworks, further enabled accurate organ classification, transit time estimation, and improved lesion localization, particularly under suboptimal image quality. Overall, image-only AI excels in visual lesion detection, whereas multimodal AI enhances spatial awareness and diagnostic context. Key challenges include sensor heterogeneity, limited prospective validation, and regulatory considerations. With standardization and large-scale validation, multimodal AI may enable more context-aware and clinically interpretable CE.

Key Words: Capsule endoscopy; Artificial intelligence; Multimodal artificial intelligence; Deep learning; Convolutional neural networks; Gastrointestinal diagnostics; Data fusion; Lesion detection

Core Tip: Capsule endoscopy (CE) generates thousands of images per study, creating diagnostic and workflow challenges due to manual interpretation and localization errors. The integration of multimodal artificial intelligence combining visual data with sensor inputs such as inertial measurement units, magnetic trackers, and physiological monitors has significantly improved lesion detection, localization, and reading efficiency. Advanced architectures achieve sub-millimeter localization accuracy and > 95% diagnostic precision. These developments represent a paradigm shift in CE, transforming it from a passive imaging tool into an intelligent, context-aware diagnostic platform with the potential to enhance accuracy, reduce reading time, and standardize interpretation across clinicians.



INTRODUCTION

Capsule endoscopy (CE) has emerged as a noninvasive method for examining the gastrointestinal (GI) tract. It is used to diagnose various small bowel and large bowel diseases, including Crohn’s disease, celiac disease, suspected small bowel hemorrhage or lesions, and nonsteroidal anti-inflammatory drug-induced enteropathy in both the small and large bowel. It also enables visualization of areas in the jejunum and ileum that traditional endoscopy cannot reach or access easily, thus expanding the diagnostic capabilities for GI disorders[1-4]. However, a single CE examination generates approximately 10000 images, requiring 40-50 minutes of reading time per study, a tedious task for gastroenterologists[5]. Interpretation is also highly dependent on the experience of the practicing gastroenterologist[6]. This results in significant intraobserver and interobserver variability in the detection of vascular lesions, ulcers, and neoplasms, which may be confined to a few frames and are therefore at increased risk of being overlooked[1,7].

To overcome these drawbacks, artificial intelligence (AI) algorithms are being integrated with CE, improving the detection of abnormalities and enhancing both sensitivity and specificity[8]. AI is defined as the ability of a computer to perform cognitive tasks comparable to those of a human. In recent years, AI has been increasingly utilized in radiology, gastroenterology, orthopedics, and other medical fields to support healthcare professionals in diagnosing multiple conditions[9]. Deep learning is a form of AI that uses neural networks arranged in multiple layers to structure data and extract information from it without requiring human assistance. In particular, convolutional neural networks (CNNs) are commonly used in deep learning models for image recognition and classification[10].

Despite these successes, the drawback of analyzing image data alone is that it can be confounded by several artifacts, such as poor lighting in the GI tract, food debris, and uncontrolled peristaltic motion, all of which hamper image quality[2,11]. More importantly, conventional AI cannot reliably determine the exact location of a lesion within the GI tract, even though this information is crucial for clinical management[12]. Recent studies incorporating multimodal AI in CE have overcome these drawbacks by combining visual, sensory, and motion data to localize abnormalities in the GI tract with enhanced accuracy[13]. Figure 1 illustrates how multimodal AI combines capsule video with sensor data to improve lesion localization and diagnostic accuracy. To contextualize this evolution, this review discusses the use of both traditional and multimodal AI in CE.

Figure 1
Figure 1 Components of multimodal artificial intelligence. This schematic illustrates how modern capsule endoscopy integrates multiple data streams to enable multimodal artificial intelligence (AI). As the capsule traverses the gastrointestinal tract, it captures video data and generates additional sensor-derived signals (e.g., localization, motility metrics, transit time, pH). These heterogeneous inputs are processed through multimodal AI frameworks that combine visual and non-visual modalities to enhance lesion detection, anatomical localization, motility assessment, and overall diagnostic accuracy. AI: Artificial intelligence. Created in BioRender.
BACKGROUND AND CURRENT STATE OF CE

CE was first introduced in 2001 and has since become an integral part of diagnosing diseases of the small intestine[14]. It involves patients swallowing a microcapsule, essentially a camera that captures images of the GI tract and transmits them to a device located outside the patient’s body[14,15]. In clinical practice, CE is widely used for small bowel evaluation because it is painless, requires no anesthesia, and allows visualization of otherwise obscure regions[9,15]. It is also used for screening and surveillance in familial adenomatous polyposis, Barrett’s esophagus, and esophageal varices and is increasingly explored for triaging and managing acute GI bleeding in the emergency and inpatient settings. In the colon it is reserved for incomplete colonoscopy or patients at high sedation risk; it is contraindicated in cognitive impairment, high retention risk, and active Crohn’s disease, with caution advised in those with cardiac implantable electronic devices[2].

Since the development of CE, several variations have entered the market, offering improved image resolution, a wider field of view, longer battery life, and variable frame rates of 2 to 35 images per second[16]. These include the PillCam SB3, Olympus Endocapsule 10, MiroCam, and OMOM, all of which offer similar diagnostic accuracy[2,17]. Newer hybrid and technology-driven capsules, particularly the magnetically controlled gastric CE, enable better evaluation by steering capsule movement through the GI tract magnetically[18]. However, they are expensive and complex to operate[16].

Although CE has several advantages, examining large volumes of images often results in lengthy reading times. Additionally, detection accuracy is constrained by the length of the small intestine and by peristaltic motion, which can degrade image quality. This not only strains physician time but also risks fatigue-related misses[9].

The integration of AI into CE has therefore enabled efficient analysis of vast volumes of visual data, assisting gastroenterologists in rapidly identifying clinically significant findings. Newer generations of high-definition capsules further enhance this capability by providing richer datasets for training CNNs, incorporating features such as the suspected blood indicator, adaptive frame rate technology, and the quick-view algorithm, which facilitates rapid detection of masses or bleeding sources[9,17,19]. In addition, modern capsule endoscopes now incorporate inertial and magnetic sensors that provide real-time positional and motion data, enabling more accurate localization beyond visual cues alone[20]. By integrating these multimodal data streams through AI, CE achieves improved diagnostic precision and spatial awareness, reflecting the evolution toward data-integrated medical AI systems[11,13].

ROLE OF AI IN CE

AI-based systems have begun to revolutionize medical imaging by performing initial interpretations and flagging abnormalities for clinicians, thus acting as a second pair of eyes[21]. CNNs are a specialized class of artificial neural networks that have become dominant in computer vision applications, including medical imaging[22]. Artificial neural networks form the foundational structure of both machine learning and deep learning and consist of multiple layers of interconnected “neurons” (mathematical functions) that learn complex patterns from data[23]. When trained on large, labeled datasets, such as CE image frames depicting ulcers, bleeding, or normal mucosa, the network iteratively adjusts its internal weights to achieve high classification accuracy[24,25]. Once trained, the model can categorize newly encountered images into clinically relevant groups (e.g., bleeding, ulcers, polyps), thereby reducing the cognitive burden associated with manual frame-by-frame review and improving overall sensitivity[26].

AI-based reading has also helped novice readers detect pathological lesions with significantly greater sensitivity than those without AI assistance and has brought their work performance and efficiency on par with those of experts. This performance advantage was demonstrated in studies by Ding et al[27] and Xie et al[28] in which junior readers supported by AI achieved significantly higher sensitivities for small bowel lesion detection (99.2% and 96.7%) compared with expert readers using conventional interpretation alone (91.1% and 88.8%, respectively). Notably, Ding et al[27] also reported a 33.3% reduction in missed diagnoses among novice readers when AI assistance was used, underscoring the role of AI in narrowing the expertise gap. This implies that AI can function as a built-in tutor or fail-safe, ensuring that less experienced physicians do not miss critical findings, thus standardizing CE interpretation quality across different centers and practitioners, an appealing prospect for healthcare systems aiming to deliver uniform care.

Additionally, AI can accelerate the reading process. A meta-analysis by Cortegoso Valdivia et al[26] demonstrated the superiority of AI-assisted reading over conventional interpretation in small bowel CE with significantly higher diagnostic accuracy and sensitivity. Across multiple studies AI-assisted reading achieved sensitivity values of 1.00, 0.99, 0.93, and 0.98 compared with 0.75, 0.88, 0.79, and 0.89 for conventional reading. AI assistance also markedly reduced the number of images requiring review, resulting in a dramatic reduction in mean reading time from 56.7 minutes to 4.7 minutes[26]. In a recent multicenter prospective study on small bowel bleeding, integrating an AI system into routine reading reduced the average reading time from 33.7 minutes to 3.8 minutes without compromising the diagnostic yield. This efficiency gain not only improves workflow but also potentially allows earlier decision-making[29].

MULTIMODAL AI IN CE: INTEGRATING VIDEO WITH SENSOR DATA
Multimodal fusion paradigms

Multimodal AI systems fundamentally differ from conventional image-only CNN models in CE analysis by fusing diverse data streams across different levels of integration. Integration can occur at different stages of the AI pipeline, generally categorized into three principal fusion strategies: Early; late; and hybrid. Figure 2 illustrates these fusion paradigms conceptually along with the strengths and weaknesses of each approach (strengths, weaknesses, opportunities, and threats analysis in Figure 3)[30].

Figure 2
Figure 2 Multimodal fusion paradigms in capsule endoscopy. Schematic representation of three primary strategies for integrating visual and sensor data in capsule endoscopy. A: Early fusion (e.g., Endo-VMFuseNet) combines visual and sensory inputs at the feature level using long short-term memory networks, achieving sub-millimeter localization accuracy without explicit calibration; B: Late fusion performs modality-specific analyses independently, then merges model outputs via weighted voting or averaging to produce a robust final prediction; C: Hybrid fusion (e.g., convolutional neural network-long short-term memory hybrid) integrates spatial features from video with temporal features from inertial measurement unit data, enabling accurate organ localization and transit-time estimation (> 95% accuracy). LSTM: Long short-term memory; IMU: Inertial measurement units. Created in BioRender.
Figure 3
Figure 3 Strengths, weaknesses, opportunities, and threats analysis of multimodal artificial intelligence systems for capsule endoscopy. This figure illustrates the strengths, weaknesses, opportunities, and threats associated with integrating multimodal artificial intelligence into capsule endoscopy. Key advantages include enhanced lesion localization, diagnostic accuracy, and workflow efficiency. Challenges span computational demands and limited clinical validation. The approach offers opportunities for advanced lesion mapping and integration of diverse sensor data, while regulatory, interoperability, and data security concerns represent potential barriers. GI: Gastrointestinal; CE: Capsule endoscopy; IBD: Inflammatory bowel disease.

Early fusion (feature-level integration): In early fusion raw or extracted features from video and sensor data are concatenated prior to being input into a unified model. This approach allows simultaneous learning from visual and non-visual cues. A representative model is Endo-VMFuseNet, which employs CNNs to process visual odometry and magnetic sensor inputs, subsequently merging them through hierarchical long short-term memory (LSTM) networks. This architecture achieved sub-millimeter localization accuracy without requiring explicit sensor calibration or synchronization, thereby addressing one of the major challenges in capsule localization[31].
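As a minimal sketch of the idea (the array shapes and feature dimensions below are illustrative assumptions, not the published Endo-VMFuseNet design), feature-level fusion amounts to concatenating the per-timestep feature vectors of each modality before a single temporal model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-timestep features; dimensions are assumptions chosen
# for demonstration, not the sizes used by Endo-VMFuseNet.
T = 50
visual = rng.normal(size=(T, 128))    # e.g. CNN features from video frames
magnetic = rng.normal(size=(T, 16))   # e.g. magnetic sensor readings

# Early (feature-level) fusion: concatenate the feature vectors at each
# timestep, so one downstream temporal model (hierarchical LSTMs in
# Endo-VMFuseNet) learns from both modalities jointly.
fused = np.concatenate([visual, magnetic], axis=1)
print(fused.shape)  # (50, 144)
```

Because the modalities interact from the first layer onward, early fusion can exploit correlations between visual and sensor cues, but it requires both streams to be time-aligned at every step.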

Late fusion (decision-level integration): Late fusion involves analyzing each modality independently and subsequently combining its outputs via weighted voting or averaging to generate final predictions. This strategy enhances system robustness because performance degradation in one modality, such as temporary sensor dropout, does not drastically impair the final decision, improving reliability in real-world applications[32].
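The robustness argument can be sketched in a few lines; the weights and probability scores here are invented for illustration, not taken from any cited system:

```python
def late_fusion(scores, weights):
    """Weighted average of per-modality lesion probabilities.

    Modalities that dropped out (score is None) are simply skipped, so a
    temporary sensor failure degrades the prediction rather than breaking it.
    """
    pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total = sum(w for _, w in pairs)
    return sum(s * w for s, w in pairs) / total

# Video CNN says 0.9, IMU model says 0.6, magnetic tracker dropped out.
p = late_fusion([0.9, 0.6, None], weights=[0.5, 0.25, 0.25])
print(round(p, 3))  # 0.8
```

The remaining modalities are renormalized over their weights, which is what makes the final decision tolerant of a single-stream dropout.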

Hybrid fusion (multistage integration): Hybrid fusion combines the strengths of both early and late fusion. Here, modality-specific features are first extracted individually and then integrated at a later stage for contextual refinement. Based on this, Nam et al[11] developed a CNN–LSTM hybrid network that fuses spatial features from CE video with temporal inertial measurement unit data. Their system demonstrated over 95% accuracy in classifying organ locations and estimating GI transit times.
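A hedged sketch of the multistage pattern (the encoders below are stand-in random projections and the dimensions are assumptions, not the CNN-LSTM network of Nam et al[11]):

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, out_dim):
    """Stand-in for a modality-specific encoder (a CNN for frames, a
    temporal network for IMU data); a fixed random projection here."""
    w = rng.normal(size=(x.shape[-1], out_dim))
    return np.tanh(x @ w)

T = 30
frame_features = rng.normal(size=(T, 64))  # spatial features per frame
imu_features = rng.normal(size=(T, 8))     # inertial readings per timestep

# Hybrid fusion: extract modality-specific embeddings first, then merge
# the embeddings at a later stage for a joint prediction (e.g. organ label).
z_video = encode(frame_features, 32)
z_imu = encode(imu_features, 32)
joint = np.concatenate([z_video, z_imu], axis=1)
print(joint.shape)  # (30, 64)
```

Keeping separate encoders lets each modality be preprocessed on its own terms, while the late-stage concatenation still gives the final classifier access to cross-modal context.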

In practice the choice of fusion strategy depends on the application. Early fusion suits tightly linked data in which early interactions matter. Late fusion is useful when one data source may be unreliable, and hybrid fusion is increasingly favored as a balanced approach that combines both. Across all methods, accurate time alignment of data streams and robustness to noise remain key challenges[33,34].

CLINICAL APPLICATIONS

The clinical utility of AI-driven CE extends across a broad spectrum of GI pathologies in which its ability to enhance diagnostic accuracy, efficiency, and reproducibility has been repeatedly demonstrated. Multiple studies have evaluated AI-driven systems in the detection of GI hemorrhage, erosions, ulcers, vascular lesions, polyps, and tumors, conditions that often pose interpretative challenges during manual CE review.

GI hemorrhage

GI bleeding is the most common anomaly of the GI tract and a cardinal manifestation in conditions such as ulcers, polyps, tumors, and inflammatory bowel disease. This makes the detection of GI bleeding an essential component of CE[27,35]. AI systems have demonstrated excellent pooled diagnostic performance for GI ulcers and hemorrhage, with an overall accuracy of 95.4%, sensitivity of 95.5%, and specificity of 95.8%, and even higher performance for hemorrhage detection alone (sensitivity 96% and specificity 97%)[36,37]. Jia et al[38] demonstrated superior precision and recall with a CNN model compared with conventional bleeding detection methods, along with a higher F1 score (a composite metric defined as the harmonic mean of precision and recall that reflects the optimal balance between false-positive and false-negative classifications). A higher F1 score indicates more reliable and clinically robust bleeding detection, particularly in class-imbalanced settings such as CE, in which accuracy alone can be misleading[38,39]. Similarly, in a multicenter prospective study by Spada et al[35], integrating AI for suspected small bowel bleeding significantly increased diagnostic yield at the patient level while reducing the mean reading time from 33.7 minutes to approximately 4 minutes (a nearly 10-fold reduction).
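To make the metric concrete, a minimal computation of the F1 score is shown below; the frame counts are invented for illustration and are not taken from the cited studies:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    In class-imbalanced settings such as CE (a handful of bleeding frames
    among tens of thousands of normal ones), plain accuracy can look high
    even for a model that misses many lesions, whereas F1 cannot.
    """
    precision = tp / (tp + fp)   # fraction of flagged frames truly bleeding
    recall = tp / (tp + fn)      # fraction of bleeding frames that were found
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 90 bleeding frames detected, 10 false alarms,
# 30 bleeding frames missed -> precision 0.90, recall 0.75.
print(round(f1_score(tp=90, fp=10, fn=30), 3))  # 0.818
```

Here a model with 0.90 precision but only 0.75 recall earns an F1 of about 0.82, making the missed lesions visible in a way that overall frame-level accuracy would not.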

Erosion and ulcers

Erosions and ulceration are the most commonly encountered abnormalities in CE[25,35]. They can be subtle and numerous, making them tedious to identify manually. AI algorithms have excelled here as well. For example, Ribeiro et al[40] designed and tested a CNN model for detecting colonic ulcers and erosions in colon CE images, achieving a sensitivity of 96.9% and a specificity of 99.9%. While that study was in the colon, similar approaches in small bowel CE have also shown high sensitivity for ulcer detection. Aoki et al[41] presented a deep learning-based CNN model that demonstrated high accuracy for automatic detection of small bowel erosions and ulcerations on CE, achieving an area under the curve of 0.958 with 88.2% sensitivity and 90.9% specificity.

Vascular lesions

Angioectasias (vascular malformations) and varices appear as faint red spots and can be easily overlooked. Mascarenhas et al[42] conducted a multicenter study for the detection of vascular lesions based on a set of 1022 CE procedures; it was the first proof-of-concept CNN model for pan-endoscopic detection of vascular lesions. The results showed 95% overall diagnostic accuracy with a sensitivity and specificity of 86.4% and 98.3%, respectively. Similarly, a CNN-based model for CE demonstrated by Ribeiro et al[43] showed excellent performance for automatic detection of small bowel vascular lesions, achieving a sensitivity of 91.8%, specificity of 95.9%, and overall accuracy of 94.4% for identifying red spots, angioectasias, and varices. Notably, the algorithm processed images at a rate of 145 frames per second, enabling near-real-time analysis and markedly improving reading efficiency while maintaining high diagnostic precision[43]. This level of accuracy is promising because vascular lesions often cause occult bleeding; an AI that consistently flags them could therefore improve diagnostic yield for obscure bleeding cases.

Polyps and tumors

CE is also used for colorectal cancer screening in patients who cannot undergo colonoscopy (e.g., due to anesthesia risk). The second-generation colon capsule (CCE-2) has a reported sensitivity of 85%-87% and a specificity of 85%-95% for the detection of colorectal polyps ranging in size from < 6 mm to ≥ 10 mm[44]. A systematic review by Moen et al[45] reported sensitivities ranging from 47.4% to 98.1% and specificities of 87.0% to 96.3% in per-frame analysis using proposed AI designs for polyp or colorectal neoplasia detection[46-48]. The two studies that performed per-lesion analysis showed significantly improved sensitivity for polyp or colorectal neoplasia detection of 81.3%-98.1%[45,46]. These findings are summarized in Table 1.

Table 1 Clinical applications of artificial intelligence in capsule endoscopy1

| Lesion type | Ref. | Model | Capsule type | Number of training images | Performance metrics |
|---|---|---|---|---|---|
| GI hemorrhage | Jia et al[38] | Deep CNN for bleeding detection | Small bowel | 10000 images (bleeding and non-bleeding) | Improved precision and recall |
| GI hemorrhage | Spada et al[35] | Multicenter study utilizing AI-assisted reading of lesions | Small bowel | 158235 images | Reading time reduced: 33.7 min → 3.8 min (P < 0.0001); improved accuracy |
| Erosions and ulcers | Ribeiro et al[40] | CNN model for colonic ulcers | Colon capsule | 37319 (3570 with lesions) | Sensitivity: 96.9%; specificity: 99.9% |
| Vascular lesions | Mascarenhas et al[42] | Multicenter study utilizing CNN | Small bowel and colon capsule | 34665 (11091 with lesions) | Diagnostic accuracy: 95%; sensitivity: 86.4%; specificity: 98.3% |
| Polyps and tumors | Kjølhede et al[44] | Systematic review and meta-analysis for detection of polyps < 6 mm to ≥ 10 mm | CCE-2 | Combined across studies | Sensitivity: 85%-87%; specificity: 85%-95% |
| Polyps and tumors | Moen et al[45] | Systematic review of AI models for polyp or colorectal neoplasia detection | CCE-2 | Varied across studies (thousands to 30000) | Per-frame sensitivity: 47.4%-98.1%, specificity: 87.0%-96.3%; per-lesion sensitivity: 81.3%-98.1% |
COMPARATIVE PERFORMANCE OF IMAGE-ONLY AND MULTIMODAL AI SYSTEMS

Across diagnostic contexts, image-only CNNs consistently achieve excellent lesion-level sensitivity (95%-99%) for bleeding and ulcer detection, but their localization accuracy remains limited. In contrast, multimodal systems, such as CNN-LSTM hybrids and Endo-VMFuseNet, demonstrate marked gains in spatiotemporal mapping and organ classification (> 95% accuracy, sub-millimeter localization) even when visual quality is suboptimal[11,49]. These results suggest that image-only AI excels for feature-intensive, high-contrast lesions (e.g., active bleeding), whereas multimodal AI confers the greatest advantage in complex tasks requiring anatomical context or motion tracking, such as transit time estimation or three-dimensional trajectory reconstruction[49]. Thus, performance superiority depends on the diagnostic task rather than the architecture alone. A summary of representative studies evaluating image-only and multimodal AI systems is presented in Table 2.

Table 2 Comparative overview of image-only vs multimodal artificial intelligence architectures in capsule endoscopy1

| Model | Modality type | Capsule platform | Key diagnostic application | Performance accuracy |
|---|---|---|---|---|
| Ding et al[27], 2023 (CNN) | Image only | Small bowel capsule | Lesion detection (ulcers, bleeding) | Sensitivity 99.2%, specificity 96.7% |
| Nam et al[11], 2024 (CNN-LSTM) | Multimodal (video + IMU) | PillCam™ SB3 | Organ localization and transit time estimation | > 95% accuracy |
| Turan et al[31], 2017 (Endo-VMFuseNet) | Multimodal (video + magnetic) | Experimental platform | Capsule localization (3D trajectory mapping) | Sub-millimeter accuracy |
| Mascarenhas et al[42], 2024 (CNN) | Image only | Multibrand CE | Vascular lesion detection | Accuracy 95%, sensitivity 86.4%, specificity 98.3% |
| Vedaei and Wahid[13], 2021 (prototype) | Multimodal (video + IMU) | Research prototype | 3D trajectory reconstruction | Improved localization accuracy |
BENEFITS, IMPACT, CHALLENGES, AND FUTURE DIRECTIONS

Multimodal AI integration significantly enhances CE diagnostics by providing precise lesion localization, improved detection sensitivity, and dramatically reduced physician review time[11,50,51]. However, practical implementation faces challenges, including sensor misalignment, drift, computational complexity, and limited large-scale clinical validation[11,13]. Future research directions include adopting advanced AI architectures such as transformer models and graph neural networks, optimizing lightweight on-device models, and conducting comprehensive multicenter trials to validate clinical efficacy across diverse patient populations and CE platforms[13,50].

Beyond current architectures, transformer-based multimodal fusion (e.g., Vision Transformer and Swin Transformer hybrids) and self-supervised pretraining approaches may enable models to learn cross-modal representations from fewer labeled datasets[52]. The integration of edge AI hardware could further enable real-time, low-latency inference for lesion detection and localization. Future work should also explore adaptive AI pipelines capable of online learning from clinician feedback, advancing CE from static analysis toward autonomous, context-aware diagnostics.

CHALLENGES AND LIMITATIONS

While multimodal AI in CE offers transformative potential, several challenges and limitations hinder its seamless clinical adoption. One major challenge is the generalizability of AI models. Many current algorithms are trained on retrospective, often single-center datasets that may not capture the full diversity of real-world patient populations or endoscopic findings. This can result in spectrum bias and reduced performance in broader clinical practice[53].

Technical limitations are also significant. CE images are frequently affected by motion artifacts, suboptimal bowel preparation, and variable lighting, all of which can impair AI accuracy and reliability. Sensor-based localization, while vital for mapping lesions, remains susceptible to signal loss, interference, and a lack of standardization across different platforms[54]. Furthermore, most AI systems have not been validated in large, prospective, multicenter trials, raising concerns about overfitting and investigator bias[55].

Ethical and legal considerations are increasingly important as AI systems become more autonomous. Issues such as patient data privacy, algorithmic bias, and the allocation of responsibility for diagnostic errors remain unresolved[56]. The economic burden of implementing advanced AI and capsule technology, including costs for devices, data storage, and computational infrastructure, may also restrict access, especially in resource-limited settings.

Finally, despite advances in automation, expert human oversight remains essential. False positives and negatives can occur, and AI should currently be viewed as an adjunct rather than a replacement for clinical expertise[57].

CONCLUSION

Multimodal AI represents a significant advancement in CE by extending analysis beyond visual pattern recognition to incorporate spatial, temporal, and contextual information derived from auxiliary sensor data. Image-only deep learning models have consistently demonstrated excellent performance in lesion detection, particularly for bleeding, ulcers, vascular abnormalities, and neoplasia, while substantially reducing physician reading time. However, their reliance on visual cues alone limits anatomical localization and interpretability in dynamic GI environments.

By contrast, multimodal architectures that integrate video with inertial, magnetic, or physiological sensor inputs provide enhanced lesion localization, organ classification, and transit time estimation, addressing long-standing limitations of conventional CE. These capabilities support more precise clinical decision-making, facilitate targeted downstream interventions, and improve diagnostic confidence, particularly in complex or equivocal cases. Importantly, the complementary strengths of image-only and multimodal systems suggest that performance gains are task-dependent rather than architecture-dependent.

Despite these advances, widespread clinical adoption remains constrained by challenges related to data heterogeneity, platform-specific sensor variability, limited prospective multicenter validation, and unresolved ethical and regulatory considerations. Continued emphasis on standardized benchmarking, transparent model development, and large-scale clinical trials will be essential to ensure safe, reproducible, and equitable implementation.

With ongoing technological refinement and appropriate clinical validation, multimodal AI has the potential to transform CE from a passive imaging modality into an intelligent, context-aware diagnostic platform that enhances accuracy, efficiency, and consistency of GI disease evaluation.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: United States

Peer-review report’s classification

Scientific Quality: Grade C

Novelty: Grade C

Creativity or Innovation: Grade C

Scientific Significance: Grade C

Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

P-Reviewer: Shah K, PhD, Associate Professor, Pakistan; S-Editor: Liu H; L-Editor: A; P-Editor: Xu J