Meta-Analysis Open Access
Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.
World J Gastrointest Endosc. Mar 16, 2026; 18(3): 116381
Published online Mar 16, 2026. doi: 10.4253/wjge.v18.i3.116381
Artificial intelligence in predicting colorectal polyp histology: Systematic review and meta-analysis of diagnostic accuracy in real-time procedures
Princess Curlej, Department of Gastroenterology, University of South Wales in Association with Learna Ltd., Cardiff CF37 1DL, United Kingdom
Jonathan Soldera, Department of Gastroenterology and Acute Medicine, University of South Wales in Association with Learna Ltd., Cardiff CF37 1DL, United Kingdom
Jonathan Soldera, Department of Gastroenterology, Logan Hospital, Brisbane 4131, Queensland, Australia
ORCID number: Jonathan Soldera (0000-0001-6055-4783).
Co-first authors: Princess Curlej and Jonathan Soldera.
Author contributions: Curlej P and Soldera J participated in the concept and design research, drafted the manuscript, contributed to data acquisition, analysis and interpretation, and they contributed equally to this manuscript and are co-first authors; Soldera J contributed to study supervision. All authors contributed to critical revision of the manuscript for important intellectual content.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist, and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.
Corresponding author: Jonathan Soldera, MD, PhD, Tutor, Department of Gastroenterology and Acute Medicine, University of South Wales in Association with Learna Ltd., 86-88 Adam Street, Cardiff CF37 1DL, United Kingdom. jonathansoldera@gmail.com
Received: November 11, 2025
Revised: December 10, 2025
Accepted: January 20, 2026
Published online: March 16, 2026
Processing time: 123 Days and 4.9 Hours

Abstract
BACKGROUND

Colorectal cancer remains a major global health burden. Accurate real-time characterization of colorectal polyp histology during colonoscopy is pivotal for early detection and management. Artificial intelligence (AI)-assisted endoscopy has emerged as a transformative tool capable of augmenting diagnostic precision and reducing dependence on conventional histopathology.

AIM

To determine the diagnostic accuracy of AI in predicting colorectal polyp histology during real-time colonoscopy.

METHODS

A comprehensive literature search was conducted in accordance with Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines and prospectively registered in the International Prospective Register of Systematic Reviews (No. CRD420251012404). Nine eligible studies underwent critical appraisal using the Quality Assessment of Diagnostic Accuracy Studies-2 framework. Diagnostic performance metrics, including sensitivity, specificity, positive and negative predictive values, and relative risk, were synthesized in R software using random-effects modeling to account for between-study variability.

RESULTS

The meta-analysis incorporated 3245 patients encompassing 4752 polyps. Pooled analysis demonstrated that AI achieved an overall diagnostic accuracy of 93%, compared to 82% for human experts (relative risk = 1.13; 95% confidence interval: 1.07-1.20; P < 0.0001). AI consistently outperformed human endoscopists, particularly in cohorts involving less experienced operators or suboptimal imaging conditions. Substantial heterogeneity was observed (I2 = 74.3%), attributed to methodological differences in imaging modalities, AI architectures, and operator proficiency.

CONCLUSION

AI demonstrates high diagnostic accuracy for real-time colorectal polyp histology and may enhance clinical decision-making, although expert oversight remains essential for atypical or high-risk lesions.

Key Words: Artificial intelligence; Adenoma; Colonic polyps; Endoscopy; Gastrointestinal; Diagnosis, computer-assisted

Core Tip: Artificial intelligence enables accurate, real-time differentiation between neoplastic and non-neoplastic colorectal polyps, fulfilling the optical biopsy criteria. By supporting “resect-and-discard” strategies for diminutive lesions and providing decision assistance to less experienced endoscopists, artificial intelligence can streamline colonoscopy workflows, mitigate pathology workloads, and enhance colorectal cancer prevention programs.



INTRODUCTION

Colorectal cancer (CRC) remains a major global health challenge[1]. It is the third most common cancer worldwide and a leading cause of cancer-related mortality, with approximately 1.93 million new cases and 935000 deaths each year[2,3]. Incidence rates vary widely across regions: Developed countries have historically high CRC prevalence, while developing regions are experiencing rising incidence, reflecting the interplay of genetic predispositions, lifestyle and dietary factors, and healthcare disparities[4]. This growing burden underscores an urgent need for effective prevention strategies and early diagnostic interventions[3].

One cornerstone of CRC prevention is colonoscopy, the gold-standard screening procedure that allows both detection and removal of premalignant polyps, thereby interrupting the adenoma-carcinoma sequence[5]. A critical goal during colonoscopy is to accurately distinguish neoplastic polyps (adenomas and sessile serrated lesions) from non-neoplastic lesions (such as hyperplastic polyps) in order to guide appropriate management and surveillance intervals[6]. Definitive classification, however, traditionally relies on histopathological examination of excised polyps, a process that is invasive, resource-intensive, costly, and slow. Waiting days for pathology results can cause patient anxiety and impedes immediate clinical decision-making, highlighting the drawbacks of the current "resect first, diagnose later" approach[7].

Diagnostic accuracy in polyp assessment also varies notably among endoscopists and pathologists, leading to inconsistent clinical decisions[8]. Misclassification of polyps can have significant consequences: An overestimation of risk may lead to unnecessary polypectomies and overly frequent surveillance (burdening patients and healthcare resources), whereas underestimation might result in inadequate surveillance and a higher chance of missed interval cancers[9]. Additionally, polypectomy itself carries non-trivial risks, including bleeding, perforation, and other complications, particularly in older patients or those with comorbidities. These issues underscore the need to improve real-time diagnostic precision during colonoscopy and to minimize unnecessary invasive procedures[10].

At the same time, healthcare systems are under increasing pressure from aging populations and constrained resources, emphasizing the importance of more efficient approaches to CRC screening and surveillance[11]. An ideal solution would be an accurate, real-time diagnostic tool that can reliably characterize polyp histology in vivo during the endoscopic procedure. Such a tool could streamline decision-making by enabling immediate determination of whether a polyp is benign or precancerous, thereby reducing unnecessary removals, biopsies, and associated costs and risks[12]. In this context, artificial intelligence (AI) has garnered significant attention as a potential game-changer for meeting these clinical needs.

In recent years, rapid advancements in AI, particularly in machine learning and deep learning, have yielded sophisticated prognostication models and image analysis tools capable of interpreting medical images with high speed and accuracy[13-17]. In gastroenterology, deep convolutional neural networks have been developed to automatically analyze endoscopic images, allowing precise real-time identification and characterization of colorectal polyps[18,19]. Modern AI-assisted colonoscopy systems can also enhance visualization using advanced imaging modalities such as narrow-band imaging (NBI), blue laser imaging, and ultra-magnification endocytoscopy, which provide greater mucosal detail than conventional white-light endoscopy and improve diagnostic accuracy for differentiating polyp types[20,21]. Preliminary clinical studies have been promising: For example, AI-based endoscopic analysis has successfully distinguished neoplastic from non-neoplastic polyps with accuracy rates on par with, or even superior to, expert histopathology. Notably, these AI systems have achieved negative predictive values exceeding 90%, surpassing the threshold recommended by the American Society for Gastrointestinal Endoscopy (ASGE) for a "resect-and-discard" strategy and thereby validating the safety of leaving certain diminutive polyps in place without resection[22,23].

Building on these advances, a growing body of clinical evidence indicates that AI can enhance polyp detection and characterization during colonoscopy. The COACH trial by Renner et al[24], for instance, reported that an AI system outperformed expert endoscopists in classifying colorectal neoplasms, achieving sensitivity rates above 90%. Similarly, Wang et al[25] showed that real-time AI feedback significantly reduced adenoma miss-rates in a tandem colonoscopy study, and Kudo et al[26] demonstrated that an AI-assisted NBI platform could diagnose polyp histology with higher accuracy than experienced endoscopists. These improvements have been consistently observed in studies across various clinical settings, including multicenter trials in different patient populations[24,27,28], suggesting that AI’s benefits are generalizable and not limited to single-center expertise. Professional guidelines are beginning to reflect this progress: The British Society of Gastroenterology (BSG) now encourages the adoption of new technologies such as AI to help endoscopists meet key quality benchmarks, including higher adenoma detection rates and more accurate real-time histologic assessments of polyps[9]. Early clinical implementations of AI have also reported practical advantages, such as shorter procedure times, fewer unnecessary biopsies, and increased diagnostic confidence for endoscopists, which in turn can improve patient satisfaction[9,23]. Nonetheless, for AI to be broadly implemented in routine practice, several practical barriers must be overcome, namely ensuring clinician acceptance of AI tools, addressing costs and reimbursement issues, providing standardized training, and integrating these systems seamlessly with diverse endoscopy hardware and workflows[21].

Despite its promise, AI integration into colonoscopy is not without challenges. Many AI models are trained on specific image datasets under controlled conditions, and differences in algorithm design or training data can lead to variability in performance when these systems are applied in new settings[21,29]. This raises concerns about generalizability and reproducibility: An AI tool that performs well in one hospital or patient group may not instantly translate to all populations or endoscope types. There are also important ethical and transparency issues. AI algorithms often operate as “black boxes”, making it difficult to understand the basis for a given prediction, and any biases present in training data (for example, under-representation of certain demographics or polyp types) could result in skewed outcomes in practice[26]. These considerations necessitate rigorous validation of AI tools across diverse patient populations and endoscopy systems, as well as standardized reporting of performance metrics. Robust clinical governance and oversight will be essential to ensure that AI-assisted diagnosis remains accurate, safe, and equitable as it is integrated into gastroenterological practice[21].

Several knowledge gaps in the literature remain to be addressed. Most studies to date have been relatively small or single-center, and they often evaluate AI based on per-polyp diagnostic accuracy rather than patient-level outcomes. It therefore remains unclear how the routine use of AI in colonoscopy might affect long-term patient outcomes, such as reducing interval CRC rates or improving overall surveillance efficiency[30]. Additionally, comparative studies between AI systems and human experts have yielded mixed results, with AI showing greater benefit in some contexts than others; notably, the incremental advantage of AI may depend on the baseline skill and adenoma detection rate of endoscopists in a given setting[24,29]. A lack of standardization in study methodologies and reporting further complicates the picture: Researchers have used different definitions, performance metrics, and validation protocols, making it difficult to directly compare results across studies or to formulate uniform guidelines for AI implementation[31]. These gaps highlight the need for a comprehensive synthesis of the available evidence to determine where AI truly stands in augmenting colorectal polyp diagnostics.

The aim of this study was to systematically review and meta-analyze prospective evidence on real-time AI-assisted diagnosis of colorectal polyps, comparing its histologic accuracy with that of expert endoscopists. The review sought to determine whether current AI systems meet clinical performance benchmarks and to identify settings where AI offers the greatest diagnostic advantage, thereby guiding its safe integration into CRC screening and surveillance.

MATERIALS AND METHODS

This systematic review and meta-analysis was designed to evaluate the diagnostic accuracy of AI systems in predicting colorectal polyp histology during real-time endoscopic assessment. The study was conducted in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses[32] statement and was prospectively registered in the International Prospective Register of Systematic Reviews (No. CRD420251012404). The review was undertaken to address a key evidence gap in endoscopic imaging, evaluating whether AI can improve the accuracy of optical diagnosis, reduce reliance on histopathology, and enhance clinical decision-making and workflow efficiency in CRC prevention.

Inclusion and exclusion criteria

The primary objective was to determine the rate of correct histological diagnoses made by AI-assisted colonoscopy systems compared to conventional histopathology and human expert endoscopists. A structured research question was formulated using the PICO (Population, Intervention, Comparison, Outcome) framework (Table 1)[33]. Additional criteria: (1) Peer-reviewed journal articles published in English; and (2) Studies published within the last 10 years (2014-2024).

Table 1 PICO (Population, Intervention, Comparison, Outcome) framework.

Research question: What is the diagnostic accuracy and clinical impact of AI-based real-time histology prediction for colorectal polyps compared to conventional histopathology and endoscopists?
Inclusion: Studies must specifically investigate AI-assisted systems for real-time polyp histology prediction. Exclusion: Studies that do not focus on real-time AI applications in live colonoscopy settings.

Population: Patients undergoing colonoscopy with colorectal polyp detection and histology prediction.
Inclusion: (1) Adults (≥ 18 years) undergoing colonoscopy; (2) Studies involving patients with colorectal polyps (adenomatous, hyperplastic, sessile serrated); and (3) Human subjects (no in vitro or animal studies). Exclusion: (1) Studies focusing on animal models, in vitro, or simulation-based research; and (2) Pediatric studies (patients < 18 years).

Intervention: AI-based systems for real-time histology prediction of colorectal polyps.
Inclusion: (1) AI-assisted colonoscopy systems for polyp detection and classification; (2) Machine learning and deep learning models (e.g., convolutional neural networks); and (3) AI-enhanced imaging techniques (e.g., narrow-band imaging, endocytoscopy). Exclusion: AI models used only for retrospective analysis (not real-time).

Comparison: Standard histopathological methods or expert endoscopists’ assessments.
Inclusion: (1) Histopathological examination as the gold standard; (2) Comparison with experienced endoscopists’ accuracy; and (3) Conventional endoscopy methods without AI assistance. Exclusion: AI models compared only with other AI models (without human or histological reference).

Outcome: Diagnostic accuracy of AI systems in polyp histology prediction.
Inclusion, primary outcomes: (1) Sensitivity, specificity, accuracy, and negative predictive value of AI models; and (2) Adenoma detection rate. Secondary outcomes: (1) Reduction in unnecessary polypectomies; (2) Interobserver variability between AI and human experts; and (3) Time efficiency and cost-effectiveness of AI-assisted endoscopy. Exclusion: (1) Studies with incomplete or insufficient clinical validation of AI performance; and (2) Studies that do not report key diagnostic accuracy metrics (e.g., missing sensitivity, specificity, or adenoma detection rate).
Study selection and data extraction

A comprehensive and systematic search strategy was developed using Medical Subject Headings (MeSH) and Boolean operators to identify all relevant studies evaluating AI systems for real-time colorectal polyp histology prediction. The following structured search query was executed in PubMed: ("Artificial Intelligence"[Mesh] OR "Machine Learning"[Mesh] OR "Deep Learning"[Mesh] OR "Neural Networks, Computer"[Mesh] OR "Convolutional Neural Networks"[tiab] OR "AI-assisted"[tiab] OR "computer-aided diagnosis"[tiab]) AND ("Colorectal Neoplasms"[Mesh] OR "Colonic Polyps"[Mesh] OR "Adenomatous Polyps"[Mesh] OR "Sessile Serrated Polyps"[tiab] OR "Hyperplastic Polyps"[tiab] OR "colorectal polyp histology"[tiab]) AND ("Colonoscopy"[Mesh] OR "Real-time endoscopy"[tiab] OR "Narrow-band imaging"[tiab] OR "Endocytoscopy"[tiab]) AND ("Diagnostic Yield"[tiab] OR "Sensitivity and Specificity"[Mesh] OR "Predictive Value of Tests"[Mesh] OR "Diagnostic Accuracy"[tiab] OR "Negative Predictive Value"[tiab] OR "Positive Predictive Value"[tiab] OR "Area Under Curve"[tiab] OR "AUROC"[tiab]).

Data extraction was systematically conducted using a standardized extraction form designed to capture essential study information, including author details, year of publication, country, study design, and clinical setting. Patient characteristics such as the number enrolled, demographics, and polyp attributes were documented alongside detailed descriptions of the AI systems employed, specifying the model type, algorithm design, and real-time deployment modalities (standard white-light colonoscopy, NBI, or endocytoscopy). Information regarding comparison groups, including expert endoscopists or standard histopathology methods, was also recorded. Diagnostic outcome measures extracted were sensitivity, specificity, overall accuracy, positive predictive value, negative predictive value (NPV), diagnostic odds ratio, and area under the receiver operating characteristic curve. Data extraction was independently performed by the student author and subsequently cross-validated by the supervising tutor to ensure accuracy and consistency, with any discrepancies resolved through consensus.

Quality assessment and risk of bias

Quality assessment and risk of bias evaluation were conducted using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool, assessing four domains: Patient selection, index test conduct, reference standard, and patient flow and timing. This comprehensive quality appraisal ensures robustness in interpreting and generalizing the review findings.

Statistical analysis

Meta-analysis was performed to quantitatively compare the diagnostic accuracy, defined as the rate of correct histological diagnoses, of AI systems relative to expert human endoscopists during real-time colorectal polyp assessment. The primary aim was to evaluate the comparative effectiveness of AI across multiple clinical studies included in the systematic review. Data extracted from each eligible study included the number of correctly diagnosed lesions and the total number of lesions assessed by both the AI systems and human comparator groups. Using these data, proportions of correct diagnoses (diagnostic accuracy rates) were calculated separately for each group within each study.

To directly compare AI performance with human expert performance, relative risk (RR) was selected as the primary effect measure. The RR represents the ratio of correct diagnoses by AI compared to those by human endoscopists. An RR greater than 1 indicated that AI provided superior diagnostic accuracy compared to humans, whereas an RR less than 1 would suggest superior human performance. An RR value of exactly 1 represented equivalent diagnostic accuracy between the two groups. The meta-analysis employed the inverse variance method to pool the RRs across studies. This method assigns weights to individual studies according to the precision of their estimates, reflecting differences in sample size and variability in diagnostic outcomes across studies[34].
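As a concrete illustration of the inverse variance pooling described above, the sketch below computes a fixed-effect pooled RR on the log scale from four of the included studies, using the correct-diagnosis counts reported in Table 2. This is a simplified, illustrative re-implementation in Python, not the published analysis, which used the R meta package with a random-effects model over all nine studies.

```python
import math

# Counts from Table 2: (AI correct, AI total, human correct, human total)
studies = {
    "Kudo":  (188, 194, 161, 194),
    "Sato":  (214, 217, 177, 217),
    "Wang":  (124, 144,  96, 144),
    "Barua": (325, 359, 317, 359),
}

def pooled_rr(data):
    """Fixed-effect (inverse variance) pooled relative risk on the log scale."""
    num = den = 0.0
    for a, n1, c, n2 in data.values():
        log_rr = math.log((a / n1) / (c / n2))
        # Approximate variance of log(RR) for two binomial proportions
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2
        weight = 1 / var            # weight = precision of the estimate
        num += weight * log_rr
        den += weight
    est = num / den
    se = math.sqrt(1 / den)
    return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

rr, ci_lo, ci_hi = pooled_rr(studies)
print(f"Pooled RR = {rr:.3f} (95%CI {ci_lo:.3f}-{ci_hi:.3f})")
```

Because the lower confidence bound stays above 1, this subset alone already points in the same direction as the full analysis: AI making more correct diagnoses than the human comparators.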

Given anticipated methodological and clinical heterogeneity stemming from variations in AI systems, patient populations, lesion characteristics, and clinical settings, a random-effects model was used as the primary analytical approach. This model assumes inherent variability across studies and incorporates between-study variation into the pooled estimates, providing more conservative and clinically realistic results. Additionally, a common-effect model was calculated for comparative purposes to assess the consistency and robustness of the pooled effect estimates.

To evaluate heterogeneity among the included studies, two widely accepted statistical methods were employed: (1) Cochran’s Q-test was performed to determine whether the observed variation among study estimates exceeded chance alone. Statistical significance indicating heterogeneity was defined as a P-value less than 0.05; and (2) The I2 statistic quantified the extent of observed heterogeneity, expressed as the proportion of total variability attributable to true differences between studies rather than random sampling error. The interpretation of I2 values followed conventional guidelines: 0%-25% indicating low heterogeneity, 26%-50% moderate heterogeneity, and greater than 50% substantial heterogeneity.

Additionally, τ2 and τ were calculated as measures of between-study variance and standard deviation, respectively, providing further insight into the magnitude and clinical significance of the observed heterogeneity.
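The three heterogeneity statistics described above (Cochran's Q, I2, and the between-study variance τ2) can be sketched in a few lines. The function below takes per-study effect estimates y_i (e.g., log RRs) with their variances v_i and applies the standard formulas, using the DerSimonian-Laird estimator for τ2; the input numbers are synthetic, chosen only to demonstrate a clearly heterogeneous set of studies.

```python
import math

def heterogeneity(effects, variances):
    """Cochran's Q, I2, and DerSimonian-Laird tau2 from study-level estimates."""
    w = [1 / v for v in variances]                 # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of non-chance variation
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    return q, i2, tau2

# Three synthetic studies with widely spread effects and equal precision
q, i2, tau2 = heterogeneity([0.10, 0.50, 0.90], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f}, I2 = {100 * i2:.1f}%, tau2 = {tau2:.3f}, tau = {math.sqrt(tau2):.3f}")
```

With these inputs the observed spread far exceeds sampling error, so I2 lands in the "substantial" band (> 50%) discussed above, mirroring the interpretation applied to the included studies.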

To assess the potential impact of publication bias, Deeks’ funnel plot asymmetry test was performed, as specifically recommended for diagnostic accuracy meta-analyses[35]. Funnel plots were visually inspected for asymmetry, and the statistical significance of asymmetry was tested using a regression-based method, with a P-value less than 0.10 indicating significant publication bias or small-study effects.

Results of the meta-analysis were graphically displayed using forest plots, which depicted individual study-level RR estimates, their corresponding 95% confidence intervals (CI), and the pooled RR estimate from the random-effects model. These plots facilitated easy interpretation and comparison of the diagnostic accuracy of AI systems relative to human experts across the included studies. Additionally, sensitivity analyses were performed by sequentially excluding individual studies from the analysis to assess the stability and robustness of pooled estimates, ensuring that results were not disproportionately influenced by any single study[36].
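The leave-one-out sensitivity analysis can be sketched as follows. For brevity this illustrative Python sketch recomputes only a crude aggregate accuracy (total correct/total assessed) for the AI arm after dropping each study in turn, using the counts from Table 2; the review itself refitted the full random-effects model at each exclusion.

```python
# AI correct-diagnosis counts from Table 2: study -> (correct, total)
ai_counts = {
    "Barua": (325, 359), "Chino": (542, 556), "Djinbachian": (48, 52),
    "Kudo": (188, 194), "Mori": (263, 287), "Renner": (48, 52),
    "Sato": (214, 217), "van der Zander": (92, 100), "Wang": (124, 144),
}

def leave_one_out(counts):
    """Recompute crude pooled accuracy with each study excluded in turn."""
    results = {}
    for left_out in counts:
        correct = sum(c for s, (c, n) in counts.items() if s != left_out)
        total = sum(n for s, (c, n) in counts.items() if s != left_out)
        results[left_out] = correct / total
    return results

for study, acc in leave_one_out(ai_counts).items():
    print(f"without {study}: pooled AI accuracy = {acc:.3f}")
```

Even in this crude form, no single exclusion moves the aggregate accuracy by more than a couple of percentage points, consistent with the stability the sensitivity analyses were designed to confirm.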

All analyses were conducted using the meta package in R statistical software, a widely validated tool specifically designed for meta-analysis[37]. Statistical significance for all analyses was determined at a P-value threshold of less than 0.05, unless otherwise stated.

RESULTS

The initial systematic search yielded 121 articles. After applying eligibility criteria (peer-reviewed articles, English language, published within the last 10 years) and removing duplicates, 45 studies remained. These underwent full-text review, resulting in the exclusion of 33 studies for reasons including non-real-time AI applications, insufficient diagnostic data, or incompatible study designs. Manual screening of reference lists revealed no additional eligible studies.

Ultimately, 12 studies met eligibility criteria after full-text screening. However, full-text access could not be obtained for 3 studies, despite attempts to contact authors. Consequently, these studies were excluded due to the inability to ensure data completeness and accuracy. Finally, 9 studies were included for systematic review and meta-analysis. Figure 1 illustrates this study selection process.

Figure 1
Figure 1 Preferred Reporting Items for Systematic reviews and Meta-Analyses flow diagram of the search strategy process. Adapted from Page et al[32]. RCT: Randomized controlled trial.

The nine studies included in this systematic review and meta-analysis collectively evaluated 3245 patients, accounting for a total of 4752 colorectal lesions. These studies encompassed diverse populations from multiple countries, including Germany, Japan, China, the United Kingdom, Canada, the Netherlands, France, and Norway, reflecting a broad geographical distribution and ensuring international generalizability.

Study designs varied significantly, including single-center observational studies[23,24], prospective observational studies[26,38], prospective randomized trials[25,39], and randomized controlled and multicenter studies[27,28,38]. Patient populations across these studies included adults undergoing either screening or diagnostic colonoscopy, contributing to the applicability of the findings across different clinical scenarios.

AI interventions consistently employed advanced deep-learning algorithms, primarily convolutional neural networks, designed for real-time polyp histology classification during endoscopic procedures. Diagnostic modalities included standard high-definition white-light colonoscopy, NBI, and magnification-based technologies such as endocytoscopy. Comparator groups typically comprised experienced human endoscopists. Diagnostic outcomes focused on the number of correctly classified polyps, with clearly documented comparisons between AI systems and human experts. A concise summary of these core characteristics is provided in Table 2.

Table 2 Summary of included studies.

Ref. | Country | Study design and setting | Total lesions | Patient population description | AI intervention description | Comparison group | AI correct diagnoses | Human correct diagnoses
Barua et al[38] | United Kingdom | Randomized controlled trial | 359 | Screening colonoscopy in general practice | AI-assisted white-light and narrow-band imaging | Experienced endoscopists | 325/359 | 317/359
Chino et al[39] | Japan | Prospective observational | 556 | Diagnostic colonoscopy patients | AI system for polyp detection/classification | Human experts | 542/556 | Not reported
Djinbachian et al[40] | Canada | Prospective randomized trial | 52 | Screening colonoscopy patients | Autonomous AI system | Human experts | 48/52 | 44/52 (expert 1), 40/52 (expert 2)
Kudo et al[26] | Japan | Prospective observational | 194 | Patients with colorectal polyps | AI with magnifying narrow-band imaging (EndoBRAIN system) | Human experts | 188/194 | 161/194
Mori et al[23] | Japan | Prospective single-center | 287 | Patients undergoing magnifying endoscopy | AI-assisted endocytoscopy (stained mode) | Human experts | 263/287 | Not reported
Renner et al[24] | Germany | Single-center observational | 52 | Adults with colorectal polyps | AI-assisted histology prediction | Two expert readers | 48/52 | 44/52 (expert 1), 40/52 (expert 2)
Sato et al[27] | Netherlands | Prospective multicenter | 217 | Patients undergoing colonoscopy | AI using magnifying BLI | Human endoscopists | 214/217 | 177/217
van der Zander et al[28] | Netherlands | Prospective multicenter | 100 | Patients with colorectal polyps | AI system with heatmapping and imaging | Two expert readers | 92/100 | 84/100 (expert 1), 77/100 (expert 2)
Wang et al[25] | China | Prospective randomized tandem | 144 | Screening colonoscopy patients | Deep learning CNN (EndoScreener) | Human endoscopists | 124/144 | 96/144

This table facilitates an understanding of the heterogeneity and methodological approaches of the included evidence base and serves as a foundation for interpreting the pooled diagnostic accuracy and RR analyses presented later. It highlights key differences in AI systems, clinical settings, and human comparator expertise, which are important considerations when assessing AI’s diagnostic utility in routine clinical practice (Table 2).

Quality assessment of included studies (QUADAS-2)

The quality assessment of each study was rigorously conducted using the QUADAS-2 tool. Overall, the methodological quality of included studies was high, though minor concerns were identified in certain domains: Mori et al[23] showed low overall risk of bias, with minor limitations related to incomplete reporting of the human comparator’s results. Renner et al[24] demonstrated overall low risk of bias, although minor concerns existed regarding clarity in patient selection methods. Wang et al[25], a prospective randomized study, exhibited low risk of bias, particularly in domains related to index test conduct and patient selection. Kudo et al[26] showed low risk of bias across all assessed domains, with robust methodology clearly outlined. Sato et al[27] exhibited high methodological quality and low risk of bias with clear, well-defined patient selection and diagnostic criteria. van der Zander et al[28] was similarly robust, with slight concerns regarding variability between human experts but low overall risk of bias. Barua et al[38] was methodologically robust, exhibiting minimal bias due to its randomized controlled design and clearly defined inclusion criteria. Chino et al[39] had overall low risk of bias, although minor concerns arose from incomplete reporting of human comparator data. Djinbachian et al[40] was methodologically sound as a randomized controlled trial, though small sample size posed minor concerns about precision.

These minor concerns did not significantly compromise the overall quality or validity of the pooled results. Recognizing and transparently reporting these variations enables careful interpretation and enhances the credibility of this systematic review’s conclusions.

Diagnostic accuracy of AI systems

Diagnostic accuracy, defined as the proportion of correctly classified colorectal lesions by AI systems, was evaluated across the nine included studies. A total of 1961 lesions were analyzed, with AI correctly diagnosing 1844 lesions.

The pooled diagnostic accuracy for AI systems using a random-effects model was 93.97% (95%CI: 90.46%-96.24%), while the common-effect model provided a similar estimate of 92.71% (95%CI: 91.32%-93.89%). This indicates high and consistent performance across diverse clinical settings and imaging modalities (see forest plot for AI accuracy, Figure 2A). However, substantial heterogeneity was observed among these studies, quantified with an I2 value of 81.1% (95%CI: 65.2%-89.8%, Q = 42.42, P < 0.0001), highlighting methodological and clinical variations, such as differences in imaging methods and patient characteristics, influencing diagnostic outcomes.
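The common-effect pooled proportion quoted above can be illustrated with a short sketch that applies the logit transform with inverse-variance weights to the per-study AI counts from Table 2. This is a simplified re-derivation for illustration only; the published analysis used the R meta package, and the random-effects estimate additionally incorporates between-study variance, which re-balances the study weights.

```python
import math

# Per-study AI counts (correct, total) from Table 2
ai_counts = [
    (325, 359), (542, 556), (48, 52), (188, 194), (263, 287),
    (48, 52), (214, 217), (92, 100), (124, 144),
]

def pooled_logit_proportion(counts):
    """Common-effect pooled proportion via inverse-variance weighting of logits."""
    num = den = 0.0
    for x, n in counts:
        logit = math.log(x / (n - x))
        var = 1 / x + 1 / (n - x)   # variance of the logit of a proportion
        num += logit / var
        den += 1 / var
    pooled_logit = num / den
    return 1 / (1 + math.exp(-pooled_logit))  # back-transform to a proportion

print(f"Common-effect pooled AI accuracy ~= {100 * pooled_logit_proportion(ai_counts):.2f}%")
```

With these counts the sketch lands close to the 92.71% common-effect figure reported above; the higher random-effects estimate arises because τ2 shifts weight toward smaller studies.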

Figure 2
Figure 2 Forest plot. A: Artificial intelligence (AI) rate of correct diagnosis: Showing pooled diagnostic accuracy for AI; B: Human rate of correct diagnosis: Showing pooled diagnostic accuracy for human experts; C: Comparison: Illustrating the direct comparative diagnostic accuracy of AI vs human experts. CI: Confidence interval; AI: Artificial intelligence.
Diagnostic accuracy of human experts

Human diagnostic accuracy was assessed in seven studies, encompassing a total of 1118 colorectal lesions, of which human experts correctly diagnosed 923. Notably, several studies involved more than one expert reader, revealing inter-observer variability. In Renner et al[24], expert 1 correctly classified 44 of 52 lesions while expert 2 correctly classified 40. A similar pattern appeared in van der Zander et al[28] (84 vs 77 correct diagnoses) and Djinbachian et al[40] (44 vs 40). While these differences highlight the subjectivity and variability of human interpretation, their overall impact on pooled diagnostic accuracy was small: the discrepancies between individual experts, typically 4 to 7 lesions, were modest in magnitude and were diluted when aggregated with data from larger studies involving single expert readers, such as Sato et al[27] and Barua et al[38].

The meta-analysis revealed that the pooled diagnostic accuracy for human experts was lower than that of AI systems. Using a random-effects model, the estimated correct diagnosis rate for human experts was 82.20% (95%CI: 76.51%-86.75%), while the common-effect model yielded a nearly identical estimate of 81.79% (95%CI: 79.33%-84.01%). These pooled results, though relatively high, were consistently outperformed by AI in most direct comparisons.

Substantial heterogeneity was also noted among human expert assessments, with an I2 value of 81.3% (95%CI: 62.2%-90.7%), a Q statistic of 32.01, and P < 0.0001, indicating statistically significant inconsistency across studies (see forest plot for human rate of correct diagnosis, Figure 2B). This heterogeneity likely reflects variability in operator experience, training standards, classification criteria, image quality, polyp characteristics, and real-time decision-making environments.

Comparison of AI and human diagnostic accuracy

A direct comparison of diagnostic accuracy between AI and humans was performed through meta-analysis of RRs from seven comparative studies (n = 2236 observations). RR is a particularly effective measure in this context, as it quantifies the ratio of correct diagnoses made by AI systems to those made by human experts. An RR greater than 1 suggests superior performance by AI; an RR less than 1 suggests human superiority.

The pooled RR using a random-effects model indicated that AI systems were 13.20% more likely to correctly diagnose colorectal polyps (RR = 1.1320; 95%CI: 1.0659-1.2023; P < 0.0001) compared to human experts. The common-effect model supported this finding, yielding a pooled RR of 1.1153 (95%CI: 1.0821-1.1495; P < 0.0001). These results demonstrate a robust and statistically significant advantage for AI (Figure 2C).
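The RR arithmetic for a single comparative study can be illustrated with a short sketch; the counts below are hypothetical (they are not drawn from any included study), and the 95%CI uses the standard log-scale normal approximation for a ratio of two independent proportions.

```python
import math

def risk_ratio(ai_correct: int, ai_total: int, hum_correct: int, hum_total: int):
    """RR of correct diagnosis (AI vs human) with a 95% CI computed on the log scale."""
    rr = (ai_correct / ai_total) / (hum_correct / hum_total)
    # Standard error of log(RR) for two independent binomial proportions
    se = math.sqrt(1 / ai_correct - 1 / ai_total + 1 / hum_correct - 1 / hum_total)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: AI correct on 93/100 lesions, human expert correct on 82/100
rr, lo, hi = risk_ratio(93, 100, 82, 100)
print(f"RR = {rr:.3f} (95%CI: {lo:.3f}-{hi:.3f})")  # RR about 1.13, CI excluding 1
```

Because the lower CI bound exceeds 1 in this example, AI's advantage would be statistically significant, mirroring the pooled finding.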

Interpretation of individual study RRs

Renner et al[24], van der Zander et al[28], and Djinbachian et al[40] reported RRs ranging from 1.09 to 1.10, but with CIs that included 1, indicating that the results were not individually statistically significant; still, their consistent trend in favor of AI contributes to the overall positive pooled effect. Wang et al[25] reported the highest observed RR of 1.2917 (95%CI: 1.1310-1.4751), signifying that AI outperformed humans by 29.17%, a highly significant result. Kudo et al[26] reported an RR of 1.1677 (95%CI: 1.0904-1.2505), suggesting that AI was 16.77% more likely to correctly diagnose histology; the CI excludes 1, indicating statistical significance. Sato et al[27] reported an RR of 1.2090 (95%CI: 1.1327-1.2905), demonstrating an even greater advantage of 20.90%, with a strongly significant CI. Barua et al[38] reported an RR of 1.0252 (95%CI: 0.9749-1.0782), indicating only a 2.52% improvement by AI; however, the CI includes 1, so the result is not statistically significant, suggesting equivalence between AI and human experts in this setting.

Evaluation and interpretation of heterogeneity

Substantial heterogeneity was detected in the comparative analysis of AI vs humans, with an I2 value of 74.3% (95%CI: 45.2%-88.0%), Q = 23.35, and P = 0.0007. This indicates that variability across studies was greater than expected by chance alone.

The between-study variance was quantified as τ2 = 0.0041 (τ = 0.0644). This heterogeneity is likely due to differences in AI model design, operator skill level, endoscopic imaging modalities, patient demographics, and clinical practice environments. Importantly, such heterogeneity underscores the context-specific nature of AI’s performance.
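For readers wishing to reproduce such estimates, τ2 in a random-effects model is commonly obtained with the DerSimonian-Laird estimator; the sketch below uses hypothetical effect sizes and within-study variances rather than the study-level data of this review.

```python
def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance tau^2.

    effects:   per-study effect sizes (e.g., log risk ratios)
    variances: per-study within-study variances
    """
    w = [1 / v for v in variances]                       # inverse-variance weights
    theta = sum(wi * e for wi, e in zip(w, effects)) / sum(w)  # fixed-effect mean
    q = sum(wi * (e - theta) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)       # scaling constant
    return max(0.0, (q - df) / c)                        # truncate at zero

# Hypothetical log-RRs with equal within-study variances
print(dersimonian_laird_tau2([0.0, 0.2, 0.4], [0.01, 0.01, 0.01]))
```

When all effects agree exactly, Q equals zero and the estimator returns τ2 = 0, i.e., no between-study variance beyond sampling error.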

Publication bias

To explore potential publication bias, Deeks’ funnel plot asymmetry test was performed. Visual and statistical assessment indicated no significant evidence of publication bias (P > 0.10). This supports confidence in the meta-analysis findings and suggests that the observed pooled estimates accurately reflect the published evidence.

DISCUSSION

The present meta-analysis shows that AI systems meaningfully outperform human endoscopists in the real-time optical diagnosis of colorectal polyp histology. Pooling nine studies and nearly two thousand lesions, AI reached an overall correct-diagnosis rate of approximately 94% (random-effects proportion 0.9397; 95%CI: 0.9046-0.9624), while human endoscopists averaged approximately 82% (0.8220; 95%CI: 0.7651-0.8675). When compared directly, AI was 11%-13% more likely to deliver a correct call than humans (random-effects RR = 1.13; 95%CI: 1.07-1.20; P < 0.0001). In practical terms, where an endoscopist would correctly characterize approximately 82 of 100 polyps, a contemporary AI under similar conditions would correctly characterize approximately 94. Although these gains were not uniform across all trials, the direction of effect favored AI in most comparisons, and the pooled advantage remained statistically robust despite between-study variability.

Patterns across individual studies help explain the pooled result and its heterogeneity. In trials that leveraged advanced imaging and high-quality training data, AI often achieved near-ceiling performance and decisively outpaced experts. Kudo et al[26] and Sato et al[27] exemplify this: AI’s relative likelihood of a correct diagnosis exceeded that of human readers by 17%-21% (RR = 1.17 and RR = 1.21, respectively), with absolute accuracies for AI in the high 80s to high 90s and human accuracies around the low 80s to low 90s. By contrast, in settings where experts already performed near a practical “ceiling” with non-magnified imaging and rigorous credentialing, the incremental gain from AI was smaller[26,27]. Barua et al[38] reported approximately 90.5% accuracy with AI vs approximately 88.3% without, with an RR of approximately 1.03 and a CI crossing 1, indicating equivalence rather than clear superiority in that high-performing environment[24]. Where baseline human optical diagnosis was modest, AI provided larger absolute improvements: For example, Wang et al[25] described human accuracy of roughly two-thirds being raised by AI to the mid-80s (RR of approximately 1.29), highlighting AI’s potential to elevate performance in more challenging or variable clinical contexts.

The broader literature supports this gradient of benefit. Early prospective work demonstrated that AI can meet international thresholds for “optical biopsy”, including the ASGE’s PIVI benchmark of ≥ 90% NPV for diminutive rectosigmoid hyperplastic polyps, an essential prerequisite for “resect-and-discard” and “diagnose-and-leave” strategies[40]. Subsequent multicenter studies show that the effect size varies with context. In Barua et al[38], sensitivity and specificity gains were small and not statistically significant, but AI markedly increased the proportion of high-confidence diagnoses (from approximately 74% to approximately 93%), suggesting a standardizing influence on decision-making even where accuracy differences are minimal[24]. Sato et al[27] found that AI’s accuracy was comparable to experts using magnifying imaging, while non-experts supported by AI performed below experts but above unaided novices, underscoring the interaction among imaging modality, operator experience, and algorithm capability. The randomized trial by Djinbachian et al[40] is instructive: An autonomous computer-aided diagnosis achieved approximately 77% accuracy, non-inferior to approximately 72% for endoscopists using AI assistance, and produced surveillance interval recommendations more concordant with pathology (91.5% vs 82.1%), providing evidence that AI can apply guideline logic consistently even when absolute classification accuracy leaves room to improve[12]. Experimental work on human-AI interaction shows that presentation matters: In van der Zander et al[28], endoscopists’ accuracy rose from 69% to 77% with a simple AI dichotomous suggestion, whereas nuanced probability displays helped only when confidence was very high, implying that interface design can shape the realized benefit.

Although our review focused on histology prediction (computer-aided diagnosis), it sits atop a complementary success story in polyp detection (computer-aided detection). Tandem and randomized trials show that computer-aided detection reduces adenoma miss rates and improves adenoma detection rate across experience levels, particularly helping lower-detectors approach expert performance[25]. This matters because characterization can only improve outcomes if detection first ensures the lesion is seen. High-sensitivity computer-aided detection (e.g., Chino et al’s system reporting[39] approximately 97.5% sensitivity for detection) enlarges the funnel of polyps that reach the “optical biopsy” step, where computer-aided diagnosis can then add value. In short, computer-aided detection and computer-aided diagnosis are synergistic: Detection elevates the floor of quality, and characterization raises the ceiling of immediate, risk-stratified management.

Three recurring themes explain the differences among these studies. First, imaging modality is pivotal. Magnifying NBI/blue laser imaging and endocytoscopy yield richer pit-vascular detail, enabling both humans and algorithms to perform better; unsurprisingly, AI trained and deployed with magnification shows larger effect sizes than AI working from standard white-light views. Trials without magnification, such as Barua et al[38] and Djinbachian et al[40], generally report lower absolute accuracies for both groups and narrower AI-human gaps. Second, operator expertise shapes the headroom for improvement. Where experts already achieve high performance, AI adds confidence and consistency more than raw accuracy; where performance is variable (general practice, trainees, or visually subtle lesions), AI frequently raises both sensitivity and specificity. Third, dataset provenance and algorithm generalizability matter. Models trained on narrow distributions may falter on different scopes, patient populations, or lesion spectra; multicenter trials help, but heterogeneity remains an unavoidable feature during diffusion of innovation.

Clinically, the implications are considerable. AI augments real-time decision-making, offering a “second pair of eyes” and a disciplined application of pre-specified criteria. When AI predicts with high confidence that a small rectosigmoid lesion is hyperplastic, immediate “diagnose-and-leave” becomes plausible under ASGE PIVI/European Society for Gastrointestinal Endoscopy thresholds; when AI flags neoplastic features, the endoscopist can ensure complete resection and retrieval, or opt for advanced en bloc techniques. Gains in confidence and standardization are not trivial: The increase in high-confidence calls in Barua et al[38] illustrates how AI can expand the proportion of clinically actionable optical diagnoses even without dramatic accuracy differences. Health-economic modelling suggests that implementing resect-and-discard and leave-in-situ strategies under validated AI support could reduce pathology submissions and procedural time, yielding meaningful cost offsets at scale (e.g., estimates in Mori et al[23] and subsequent analyses) while maintaining safety[22]. In real-world practice, these potential gains must be balanced against acquisition and maintenance costs, requirements for regulatory approval and local governance, and the need for ongoing surveillance of model performance over time, including monitoring for model drift. The policy climate is also evolving: Societies such as the BSG endorse adopting technologies that help meet quality standards, for example adenoma detection and accurate characterization, provided systems are validated and embedded within governance, training, and audit frameworks[38]. Dedicated training curricula and credentialing for AI-supported optical diagnosis will therefore likely be required before such systems can be implemented safely and at scale.

A recurrent question is whether AI should augment or replace human judgement. The evidence and current practice favor augmentation: AI is best conceived as a vigilant, tireless assistant that reduces oversight errors and narrows performance variability, while the endoscopist retains contextual reasoning and accountability. Even the autonomous computer-aided diagnosis arm in Djinbachian et al[40] matched average human performance rather than surpassing expert level, and absolute accuracies of approximately 75%-80% in pragmatic diminutive polyp cohorts indicate that neither party is infallible; in our pooled estimates, approximately 6%-9% of AI diagnoses were still incorrect, emphasizing the ongoing value of a clinician in the loop. Limited “replacement” may be acceptable for specific tasks, for example foregoing histology on diminutive distal hyperplastic lesions when validated AI provides ≥ 90% NPV with high confidence, but even in that context, the endoscopist adjudicates uncertainties, weighs comorbidity risks, and integrates patient preferences[22]. Over the near term, the highest-value model remains partnership: Computer-aided detection to ensure few lesions are missed; computer-aided diagnosis to standardize and accelerate optical biopsy; and clinicians to manage exceptions, complications, and shared decisions. Practical implementation will therefore require not only technical validation but also structured training, workflow integration, and clear medical-legal frameworks.

This synthesis has limitations that mirror the field’s growing pains. Between-study heterogeneity was substantial (I2 = 74%-81%, P < 0.001), reflecting differences in imaging platforms, operator expertise, lesion spectra, and study design; we addressed this with random-effects modelling, but residual variability means that the pooled average may not uniformly apply to every setting. Publication bias is a concern in a rapidly advancing area, given the preferential visibility of positive trials; nonetheless, neutral findings have been reported and included, tempering optimism[24]. Outcome definitions varied (per-polyp vs per-patient; overall vs high-confidence calls), and several studies allowed algorithms to abstain in low-confidence cases, which can inflate apparent accuracy compared with humans who must decide in real time. Histopathology was treated as a perfect reference, yet it can err in tiny or fragmented specimens; such misclassification would undercount true AI (and human) performance. Finally, AI systems evolve quickly; some models included were first-generation, so pooled estimates may be conservative for the latest iterations, while also reminding us that validation and regulatory oversight must keep pace with innovation[21,26].

Despite these caveats, the weight of evidence points to a consistent conclusion. AI-assisted characterization is at least as good as, and usually better than, expert optical diagnosis across diverse settings, with the largest gains seen when baseline human performance is modest, imaging is enhanced, and outputs are integrated thoughtfully into workflow. Coupled with the well-established improvements in detection (lower adenoma miss rates and better adenoma detection rates across skill levels)[37], the characterization gains suggest that AI can raise both the floor and the ceiling of colonoscopy quality. The path to routine, safe implementation runs through multicenter pragmatic trials, standardized reporting, human-factors-aware interfaces that display interpretable, well-calibrated outputs, and realistic assessments of cost and resource implications. As societies such as the BSG and ASGE refine guidance on resect-and-discard and leave-in-situ policies, carefully governed AI can help endoscopy units deliver more consistent care, reduce unnecessary pathology and procedures, and direct resources where they have the greatest clinical impact[9,22,23]. In that sense, AI is less a replacement than a multiplier, one that extends expert-level performance across teams and settings while preserving the clinician’s central role in judgment, communication, and patient-centered care.

The evolution toward AI-supported endoscopy also parallels broader shifts in how gastroenterologists acquire and apply expertise. As shown in a recent systematic review, immersive technologies such as virtual reality have already proven effective in accelerating endoscopic skill acquisition, reducing procedural time, and improving accuracy among trainees. When combined with AI-guided systems, these training modalities can create a feedback-rich environment in which performance improvement becomes continuous and quantifiable[41]. At the same time, translational work in colorectal oncology underscores the biological complexity that persists beyond the visual field, illustrating how tumor micro-environmental factors such as macrophage polarization continue to shape disease behavior and prognosis[42]. Complementing these technological and biological insights, comparative diagnostic studies in endoscopy - such as the meta-analysis by Woods and Soldera[43] demonstrating the non-inferiority of colon capsule endoscopy to conventional colonoscopy for polyp detection - reinforce how innovation can preserve diagnostic quality while expanding accessibility and patient acceptance. Together, these advances highlight that AI integration is not merely a technical enhancement but part of a larger redefinition of precision gastroenterology - one that couples machine vision with biological insight and human expertise to achieve more individualized, evidence-driven care.

This review has several limitations. First, the primary search was conducted in PubMed/MEDLINE, which, although broadly representative of biomedical and endoscopy research, does not capture all AI-related publications. Important studies indexed exclusively in EMBASE, Scopus, Web of Science, or Cochrane Central Register of Controlled Trials may therefore have been missed, introducing potential database-related selection bias. Second, three eligible studies could not be included because full texts were unobtainable despite attempts to contact corresponding authors and institutions. This may have affected the precision of our pooled estimates, particularly given the modest number of included trials. Third, several studies lacked reporting of human-reader accuracy or comparator data. In accordance with standards for diagnostic test-accuracy meta-analysis, these studies were included only in pooled estimates of AI performance and excluded from the AI-vs-human comparisons; however, incomplete reporting limits the depth of comparative analysis. Fourth, although substantial heterogeneity is expected in imaging and AI literature, the limited and inconsistently reported methodological details across studies prevented meaningful subgroup analyses (e.g., by imaging modality, operator expertise, AI architecture, or study design). As more standardized and higher-quality trials become available, future updates of this work will be able to explore these sources of heterogeneity more robustly. Together, these factors should be considered when interpreting the results.

CONCLUSION

AI is redefining the standard of optical diagnosis in colonoscopy. The evidence synthesized in this review demonstrates that AI can deliver a level of diagnostic accuracy once achievable only by highly experienced endoscopists, while maintaining reproducibility across different clinical environments. Its integration into practice represents a pivotal advance in the move toward real-time, image-based histologic assessment. Rather than replacing human expertise, AI elevates it - standardizing decision-making, reducing uncertainty, and reinforcing confidence in optical biopsy as a safe, evidence-based alternative to routine histopathology.

Looking forward, the adoption of validated AI systems should be viewed not as experimental but as an essential step in modern CRC prevention. Their use can rationalize resource allocation, reduce procedural inefficiencies, and expand access to high-quality diagnosis in both expert and community settings. For policymakers and professional societies, the challenge now lies not in proving AI’s value but in defining frameworks for its safe deployment, auditing, and continuous learning. With thoughtful integration, AI will not merely assist colonoscopy - it will help redefine what constitutes excellence in endoscopic practice.

ACKNOWLEDGEMENTS

We extend our appreciation to the Faculty of Life Sciences and Education at the University of South Wales in association with Learna Ltd. for the Master of Science in Gastroenterology program and their invaluable support in our work. We sincerely acknowledge the efforts of the University of South Wales and commend them for their commitment to providing life-long learning opportunities and advanced life skills to healthcare professionals.

References
1.  Vabi BW, Gibbs JF, Parker GS. Implications of the growing incidence of global colorectal cancer. J Gastrointest Oncol. 2021;12:S387-S398.
2.  World Health Organization. Global Cancer Observatory: Cancer Today. [cited 15 October 2025]. Available from: https://gco.iarc.fr/today.
3.  Morgan E, Arnold M, Gini A, Lorenzoni V, Cabasag CJ, Laversanne M, Vignat J, Ferlay J, Murphy N, Bray F. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut. 2023;72:338-344.
4.  Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66:683-691.
5.  Brenner H, Stock C, Hoffmeister M. Effect of screening sigmoidoscopy and screening colonoscopy on colorectal cancer incidence and mortality: systematic review and meta-analysis of randomised controlled trials and observational studies. BMJ. 2014;348:g2467.
6.  Gupta S, Lieberman D, Anderson JC, Burke CA, Dominitz JA, Kaltenbach T, Robertson DJ, Shaukat A, Syngal S, Rex DK. Recommendations for Follow-Up After Colonoscopy and Polypectomy: A Consensus Update by the US Multi-Society Task Force on Colorectal Cancer. Am J Gastroenterol. 2020;115:415-434.
7.  Yang P, Teng F, Bai S, Xia Y, Xie Z, Cheng Z, Li J, Lei Z, Wang K, Zhang B, Yang T, Wan X, Yin H, Shen H, Pawlik TM, Lau WY, Fu Z, Shen F. Liver resection versus liver transplantation for hepatocellular carcinoma within the Milan criteria based on estimated microvascular invasion risks. Gastroenterol Rep (Oxf). 2023;11:goad035.
8.  Rastogi A, Keighley J, Singh V, Callahan P, Bansal A, Wani S, Sharma P. High accuracy of narrow band imaging without magnification for the real-time characterization of polyp histology and its comparison with high-definition white light colonoscopy: a prospective study. Am J Gastroenterol. 2009;104:2422-2430.
9.  Rutter MD, East J, Rees CJ, Cripps N, Docherty J, Dolwani S, Kaye PV, Monahan KJ, Novelli MR, Plumb A, Saunders BP, Thomas-Gibson S, Tolan DJM, Whyte S, Bonnington S, Scope A, Wong R, Hibbert B, Marsh J, Moores B, Cross A, Sharp L. British Society of Gastroenterology/Association of Coloproctology of Great Britain and Ireland/Public Health England post-polypectomy and post-colorectal cancer resection surveillance guidelines. Gut. 2020;69:201-223.
10.  Heldwein W, Dollhopf M, Rösch T, Meining A, Schmidtsdorff G, Hasford J, Hermanek P, Burlefinger R, Birkner B, Schmitt W; Munich Gastroenterology Group. The Munich Polypectomy Study (MUPS): prospective analysis of complications and risk factors in 4000 colonic snare polypectomies. Endoscopy. 2005;37:1116-1122.
11.  Kruk ME, Gage AD, Arsenault C, Jordan K, Leslie HH, Roder-DeWan S, Adeyi O, Barker P, Daelmans B, Doubova SV, English M, García-Elorrio E, Guanais F, Gureje O, Hirschhorn LR, Jiang L, Kelley E, Lemango ET, Liljestrand J, Malata A, Marchant T, Matsoso MP, Meara JG, Mohanan M, Ndiaye Y, Norheim OF, Reddy KS, Rowe AK, Salomon JA, Thapa G, Twum-Danso NAY, Pate M. High-quality health systems in the Sustainable Development Goals era: time for a revolution. Lancet Glob Health. 2018;6:e1196-e1252.
12.  Byrne MF, Chapados N, Soudan F, Oertel C, Linares Pérez M, Kelly R, Iqbal N, Chandelier F, Rex DK. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut. 2019;68:94-100.
13.  Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44-56.
14.  Chongo G, Soldera J. Use of machine learning models for the prognostication of liver transplantation: A systematic review. World J Transplant. 2024;14:88891.
15.  Soldera J, Tomé F, Corso LL, Rech MM, Ferrazza AD, Terres AZ, Cini BT, Eberhardt LZ, Balensiefer JIL, Balbinot RS, Muscope ALF, Longen ML, Schena B, Rost GL Jr, Furlan RG, Balbinot RA, Balbinot SS. Use of a machine learning algorithm to predict rebleeding and mortality for oesophageal variceal bleeding in cirrhotic patients. EMJ Gastroenterol. 2020;9:46-48.
16.  Soldera J, Corso LL, Rech MM, Ballotin VR, Bigarella LG, Tomé F, Moraes N, Balbinot RS, Rodriguez S, Brandão ABM, Hochhegger B. Predicting major adverse cardiovascular events after orthotopic liver transplantation using a supervised machine learning model: A cohort study. World J Hepatol. 2024;16:193-210.
17.  Ballotin VR, Bigarella LG, Soldera J, Soldera J. Deep learning applied to the imaging diagnosis of hepatocellular carcinoma. Artif Intell Gastrointest Endosc. 2021;2:127-135.
18.  Abut S, Okut H, Kallail KJ. Paradigm shift from Artificial Neural Networks (ANNs) to deep Convolutional Neural Networks (DCNNs) in the field of medical image processing. Expert Syst Appl. 2024;244:122983.
19.  Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med. 2019;25:24-29.
20.  Misawa M, Kudo SE, Mori Y, Cho T, Kataoka S, Yamauchi A, Ogawa Y, Maeda Y, Takeda K, Ichimasa K, Nakamura H, Yagawa Y, Toyoshima N, Ogata N, Kudo T, Hisayuki T, Hayashi T, Wakamura K, Baba T, Ishida F, Itoh H, Roth H, Oda M, Mori K. Artificial Intelligence-Assisted Polyp Detection for Colonoscopy: Initial Experience. Gastroenterology. 2018;154:2027-2029.e3.
21.  Lou S, Du F, Song W, Xia Y, Yue X, Yang D, Cui B, Liu Y, Han P. Artificial intelligence for colorectal neoplasia detection during colonoscopy: a systematic review and meta-analysis of randomized clinical trials. EClinicalMedicine. 2023;66:102341.
22.  Rex DK, Anderson JC, Butterly LF, Day LW, Dominitz JA, Kaltenbach T, Ladabaum U, Levin TR, Shaukat A, Achkar JP, Farraye FA, Kane SV, Shaheen NJ. Quality Indicators for Colonoscopy. Am J Gastroenterol. 2024;119:1754-1780.
23.  Mori Y, Kudo SE, Misawa M, Saito Y, Ikematsu H, Hotta K, Ohtsuka K, Urushibara F, Kataoka S, Ogawa Y, Maeda Y, Takeda K, Nakamura H, Ichimasa K, Kudo T, Hayashi T, Wakamura K, Ishida F, Inoue H, Itoh H, Oda M, Mori K. Real-Time Use of Artificial Intelligence in Identification of Diminutive Polyps During Colonoscopy: A Prospective Study. Ann Intern Med. 2018;169:357-366.
24.  Renner J, Phlipsen H, Haller B, Navarro-Avila F, Saint-Hill-Febles Y, Mateus D, Ponchon T, Poszler A, Abdelhafez M, Schmid RM, von Delius S, Klare P. Optical classification of neoplastic colorectal polyps - a computer-assisted approach (the COACH study). Scand J Gastroenterol. 2018;53:1100-1106.
25.  Wang P, Liu P, Glissen Brown JR, Berzin TM, Zhou G, Lei S, Liu X, Li L, Xiao X. Lower Adenoma Miss Rate of Computer-Aided Detection-Assisted Colonoscopy vs Routine White-Light Colonoscopy in a Prospective Tandem Study. Gastroenterology. 2020;159:1252-1261.e5.
26.  Kudo SE, Misawa M, Mori Y, Hotta K, Ohtsuka K, Ikematsu H, Saito Y, Takeda K, Nakamura H, Ichimasa K, Ishigaki T, Toyoshima N, Kudo T, Hayashi T, Wakamura K, Baba T, Ishida F, Inoue H, Itoh H, Oda M, Mori K. Artificial Intelligence-assisted System Improves Endoscopic Identification of Colorectal Neoplasms. Clin Gastroenterol Hepatol. 2020;18:1874-1881.e2.
27.  Sato K, Kuramochi M, Tsuchiya A, Yamaguchi A, Hosoda Y, Yamaguchi N, Nakamura N, Itoi Y, Hashimoto Y, Kasuga K, Tanaka H, Kuribayashi S, Takeuchi Y, Uraoka T. Multicentre study to assess the performance of an artificial intelligence instrument to support qualitative diagnosis of colorectal polyps. BMJ Open Gastroenterol. 2024;11:e001553.
28.  van der Zander QEW, Roumans R, Kusters CHJ, Dehghani N, Masclee AAM, de With PHN, van der Sommen F, Snijders CCP, Schoon EJ. Appropriate trust in artificial intelligence for the optical diagnosis of colorectal polyps: the role of human/artificial intelligence interaction. Gastrointest Endosc. 2024;100:1070-1078.e10.
29.  Wang P, Berzin TM, Glissen Brown JR, Bharadwaj S, Becq A, Xiao X, Liu P, Li L, Song Y, Zhang D, Li Y, Xu G, Tu M, Liu X. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut. 2019;68:1813-1819.
30.  Choudhury A, Asan O. Role of Artificial Intelligence in Patient Safety Outcomes: Systematic Literature Review. JMIR Med Inform. 2020;8:e18599.
31.  Andersen ES, Birk-Korch JB, Hansen RS, Fly LH, Röttger R, Arcani DMC, Brasen CL, Brandslund I, Madsen JS. Monitoring performance of clinical artificial intelligence in health care: a scoping review. JBI Evid Synth. 2024;22:2423-2446.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 22]  [Reference Citation Analysis (0)]
32.  Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 44932]  [Cited by in RCA: 52734]  [Article Influence: 10546.8]  [Reference Citation Analysis (2)]
33.  Frandsen TF, Bruun Nielsen MF, Lindhardt CL, Eriksen MB. Using the full PICO model as a search tool for systematic reviews resulted in lower recall for some PICO elements. J Clin Epidemiol. 2020;127:69-75.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 20]  [Cited by in RCA: 120]  [Article Influence: 20.0]  [Reference Citation Analysis (0)]
34.  Deeks JJ, Higgins JP, Altman DG, McKenzie JE, Veroniki AA.   Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch V, editors. Cochrane Handbook for Systematic Reviews of Interventions version 6.5. London: Cochrane, 2024.  [PubMed]  [DOI]
35.  Song F, Khan KS, Dinnes J, Sutton AJ. Asymmetric funnel plots and publication bias in meta-analyses of diagnostic accuracy. Int J Epidemiol. 2002;31:88-95.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 319]  [Cited by in RCA: 361]  [Article Influence: 15.0]  [Reference Citation Analysis (0)]
36.  Thabane L, Mbuagbaw L, Zhang S, Samaan Z, Marcucci M, Ye C, Thabane M, Giangregorio L, Dennis B, Kosa D, Borg Debono V, Dillenburg R, Fruci V, Bawor M, Lee J, Wells G, Goldsmith CH. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol. 2013;13:92.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 520]  [Cited by in RCA: 602]  [Article Influence: 46.3]  [Reference Citation Analysis (0)]
37.  Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019;22:153-160.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1954]  [Cited by in RCA: 3780]  [Article Influence: 540.0]  [Reference Citation Analysis (0)]
38.  Barua I, Wieszczy P, Kudo SE, Misawa M, Holme Ø, Gulati S, Williams S, Mori K, Itoh H, Takishima K, Mochizuki K, Miyata Y, Mochida K, Akimoto Y, Kuroki T, Morita Y, Shiina O, Kato S, Nemoto T, Hayee B, Patel M, Gunasingam N, Kent A, Emmanuel A, Munck C, Nilsen JA, Hvattum SA, Frigstad SO, Tandberg P, Løberg M, Kalager M, Haji A, Bretthauer M, Mori Y. Real-Time Artificial Intelligence-Based Optical Diagnosis of Neoplastic Polyps during Colonoscopy. NEJM Evid. 2022;1:EVIDoa2200003.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 52]  [Cited by in RCA: 77]  [Article Influence: 19.3]  [Reference Citation Analysis (0)]
39.  Chino A, Ide D, Abe S, Yoshinaga S, Ichimasa K, Kudo T, Ninomiya Y, Oka S, Tanaka S, Igarashi M. Performance evaluation of a computer-aided polyp detection system with artificial intelligence for colonoscopy. Dig Endosc. 2024;36:185-194.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 5]  [Article Influence: 2.5]  [Reference Citation Analysis (0)]
40.  Djinbachian R, Haumesser C, Taghiakbari M, Pohl H, Barkun A, Sidani S, Liu Chen Kiow J, Panzini B, Bouchard S, Deslandres E, Alj A, von Renteln D. Autonomous Artificial Intelligence vs Artificial Intelligence-Assisted Human Optical Diagnosis of Colorectal Polyps: A Randomized Controlled Trial. Gastroenterology. 2024;167:392-399.e2.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 26]  [Cited by in RCA: 46]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
41.  Dương TQ, Soldera J. Virtual reality tools for training in gastrointestinal endoscopy: A systematic review. Artif Intell Gastrointest Endosc. 2024;5:92090.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 4]  [Reference Citation Analysis (4)]
42.  Brambilla E, Brambilla DJF, Tregnago AC, Riva F, Pasqualotto FF, Soldera J. Exploring macrophage polarization as a prognostic indicator for colorectal cancer: Unveiling the impact of metalloproteinase mutations. World J Clin Cases. 2025;13:105011.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
43.  Woods M, Soldera J. Colon capsule endoscopy polyp detection rate vs colonoscopy polyp detection rate: Systematic review and meta-analysis. World J Meta-Anal. 2024;12:100726.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
Footnotes

Peer review: Externally peer reviewed.

Peer-review model: Single blind

Corresponding Author's Membership in Professional Societies: Federação Brasileira de Gastroenterologia; Sociedade Brasileira de Endoscopia Digestiva; Sociedade Brasileira de Hepatologia; Grupo de Estudos da Doença Inflamatória Intestinal do Brasil.

Specialty type: Gastroenterology and hepatology

Country of origin: United Kingdom

Peer-review report’s classification

Scientific quality: Grade B, Grade C

Novelty: Grade B, Grade C

Creativity or innovation: Grade B, Grade C

Scientific significance: Grade B, Grade C

P-Reviewer: Osera S, PhD, Chief Physician, Japan; S-Editor: Zuo Q; L-Editor: A; P-Editor: Zhang YL