Revised: November 22, 2025
Accepted: January 4, 2026
Published online: February 28, 2026
Processing time: 121 Days and 1.5 Hours
In this editorial we comment on the article by Chan et al. The study presents the most comprehensive comparative evaluation to date of deep learning models for multi-class segmentation of upper gastrointestinal diseases, leveraging a novel 3313-image, nine-class clinical dataset alongside the public EDD2020 benchmark. Their results demonstrate that hierarchical, pre-trained encoders (notably Swin-UMamba-D) deliver the highest segmentation accuracy, while SegFormer balances accuracy with computational efficiency, making it the more practical choice for real-time clinical use.
Core Tip: This study evaluates 17 advanced deep learning models, including convolutional neural network-, transformer-, and mamba-based architectures, for multi-class upper gastrointestinal disease segmentation. Swin-UMamba-D achieves the highest segmentation accuracy, while SegFormer balances efficiency and performance. Automated segmentation demonstrates clear clinical potential, but dataset diversity, annotation standardization, and prospective clinical validation remain prerequisites for routine adoption.
- Citation: Yang YH. Bridging innovation and clinical reality: Interpreting the comparative study of deep learning models for multi-class upper gastrointestinal disease segmentation. World J Gastroenterol 2026; 32(8): 115297
- URL: https://www.wjgnet.com/1007-9327/full/v32/i8/115297.htm
- DOI: https://dx.doi.org/10.3748/wjg.v32.i8.115297
Endoscopy remains the clinical gold standard for diagnosing many upper gastrointestinal (UGI) conditions, yet missed lesions and observer variability are longstanding, clinically significant problems[1,2]. With the development of artificial intelligence (AI), and more specifically deep learning (DL), its application to UGI endoscopy has moved from algorithmic novelty to an active front in the clinical armamentarium[3].
Over the last five years, convolutional neural networks (CNNs) and endoscopy-specific DL systems have evolved from proof-of-concept demonstrations into tools achieving near-human, and in some tasks better-than-endoscopist, performance for lesion detection, characterization, and outcome prediction in the esophagus and stomach[4-6]. Automated segmentation has become one of the most important applications of these systems, integrating approaches that span CNNs, vision transformers, and hybrid or mamba-based architectures. It has been shown to reliably delineate multiple disease classes under varied imaging conditions, materially reducing missed diagnoses, improving triage during procedures, and standardizing documentation for downstream care[7,8]. These advances enable not only lesion detection but also automated delineation of lesion boundaries useful for biopsy guidance and surgical planning.
Multi-class segmentation that simultaneously identifies ulcers, polyps, varices, neoplasms, and other pathologies is of strong clinical interest because it reflects real-world decision tasks. However, moving from impressive benchmark metrics to routine clinical utility remains nontrivial. Comparative segmentation studies have illuminated both pragmatic opportunities and persistent obstacles along this path.
Chan et al[9] have provided a practical decision-making toolkit for a realistic, multi-label clinical scenario, considering AI augmentation of endoscopic workflows rather than benchmark models alone. This editorial aims to interpret this comparative study of DL models for multi-class upper GI disease segmentation in the context of the broader literature and to argue for further steps toward bridging laboratory performance and routine clinical practice.
In recent years, the technical performance of endoscopy-specific DL models has been impressive but task-specific. Multiple studies focusing on UGI conditions have reported high sensitivities and accuracies; for example, DL-based diagnostic systems have shown detection sensitivities in the range of 80%-90% for esophageal and gastric neoplasia[10,11]. These models can delineate lesion margins, predict invasion depth, and even detect Helicobacter pylori with promising accuracy[12], reflecting the maturity of CNN architectures and AI training strategies as well as the availability of annotated datasets enabling multi-class segmentation and classification. Additionally, DL systems have displayed operational advantages in randomized trials, reducing lesion miss rates during endoscopy[13-15].
However, some critical caveats temper the promise. First, regarding training-data realism and distributional shift, most high-performing models are trained on high-quality annotated images from tertiary centers, and their applicability in community practice and low-prevalence screening populations is limited by selection bias and disease-enriched datasets[5,16]. Small, unrepresentative datasets also compound class imbalance and the risk of overfitting[4]. Second, multi-class segmentation is annotation-intensive: manual frame-by-frame delineation is subject to inter-rater variability and often lacks standardized criteria across centers, and current label-propagation tools and consensus annotation protocols remain immature[17]. Third, static image segmentation metrics are insufficient for real clinical tasks. Video-based endoscopic practice is strongly affected by navigation, insufflation, and mucosal preparation, and few AI systems have robust prospective validation in live procedures, so individualized frame-level analysis remains impractical even within randomized controlled trials[18].
At its core, the study by Chan et al[9] demonstrates that contemporary DL architectures can achieve competitive segmentation performance across multiple lesion classes when trained on curated endoscopic image sets (Table 1; a minimal illustrative sketch of the underlying pixel-level metrics follows the table). The work also convincingly shows that hierarchical encoders with ImageNet pretraining (Swin-UMamba-D, SegFormer) dominate in accuracy and generalization. Beyond raw progress in CNNs, design choices proved decisive: encoding multi-scale anatomical priors and leveraging transfer learning mattered more than chasing incremental architecture permutations, which argues for prioritizing clinically meaningful inductive biases such as multi-scale hierarchies and boundary awareness in future model innovation. Moreover, efficiency, as captured by the PET score, is not optional. As a pragmatic choice, SegFormer's PET profile suggests that a slightly lower intersection over union (IoU) can be acceptable when it buys real-time clinical utility in terms of inference time, memory footprint, and integration complexity. In constrained clinical environments, AI-assisted endoscopy tools need to be optimized for edge inference, regulatory documentation, and user acceptance, not merely IoU.
| Model name | Architecture family | Pretraining | Mean IoU, % | Mean Dice, % | PET score, % | GRR, % | Inference latency (milliseconds/frame) | Model size | Notes |
| U-Net | CNN | N/A | 74.88 | (Self 79.37 + EDD2020 67.63)/2 = 73.50% | 45.74 | 65.41 | 3.64 | 31.46 M, approximately 126 MB | Classic encoder-decoder with skip connections; excellent throughput with very high FPS but lower PET; simple architecture may limit generalization |
| ResNet + U-Net | CNN | ResNet encoder typically ImageNet-1K | 80.63 | (83.59 + 73.97)/2 = 78.78% | 82.58 | 67.03 | 5.18 | 32.52 M, approximately 130 MB | Residual backbone improves accuracy and robustness with modest compute; good PET |
| ConvNeXt + UPerNet | CNN | Typically ImageNet-1K | 82.69 | (85.65 + 76.90)/2 = 81.27% | 84.70 | 68.79 | 5.65 | 41.37 M, approximately 165 MB | Modern CNN with ViT-inspired design; strong accuracy and good throughput |
| M2SNet | CNN | N/A | 80.87 | (84.24 + 74.81)/2 = 79.53% | 72.33 | 67.64 | 14.86 | 29.89 M, approximately 120 MB | Multi-scale subtraction units for improved feature complementarity and edge clarity; slower inference among CNNs |
| Dilated SegNet | CNN | ResNet50 backbone | 80.55 | (83.35 + 73.64)/2 = 78.49% | 74.88 | 68.08 | 9.23 | 18.111 M, approximately 72 MB | Dilated convolutions for real-time polyp segmentation; good trade-off, moderate speed |
| PraNet | CNN | N/A | 73.74 | (74.15 + 61.12)/2 = 67.63% | 48.81 | 65.79 | 12.16 | 32.56 M, approximately 130 MB | Parallel partial decoder + reverse attention for boundary refinement; decent accuracy but lower PET and generalization |
| SwinV2 + UPerNet | Transformer | SwinV2 typically ImageNet-pretrained | 82.74 | (85.59 + 76.97)/2 = 81.28% | 78.41 | 68.12 | 12.19 | 41.91 M, approximately 168 MB | SwinV2 hierarchical transformer backbone with UPerNet decoder; strong accuracy with moderate compute |
| SegFormer | Transformer | Typically ImageNet-1K | 82.86 | (93.14 + 77.20)/2 = 85.17% | 92.02 | 70.11 | 9.68 | 24.73 M, approximately 99 MB | Excellent performance-efficiency balance; low FLOPs (4.23 GFLOPs), good generalization: Recommended for real-time clinical use per paper |
| SETR-MLA | Transformer | ViT backbone | 77.42 | (82.14 + 71.48)/2 = 76.81% | 52.45 | 69.67 | 5.55 | 90.77 M, approximately 363 MB | Segmentation transformer with multi-level aggregation; large parameter count but relatively fast inference in this setup |
| TransUNet | Hybrid (CNN + transformer) | ResNet50 + ViT | 74.81 | (77.35 + 65.06)/2 = 71.21% | 26.39 | 67.14 | 13.03 | 105.00 M, approximately 420 MB | Combines CNN encoder and ViT: Strong representational power but heavy compute and lower PET |
| PVTV2 + EMCAD | Transformer | PVTV2 usually ImageNet-pretrained | 82.91 | (85.81 + 77.07)/2 = 81.44% | 88.14 | 71.38 | 12.56 | 26.77 M, approximately 107 MB | Pyramid vision transformer v2 + efficient multi-scale decoding; strong generalization and good PET |
| FCBFormer | Transformer | PVTV2 backbone | 82.00 | (85.18 + 76.03)/2 = 80.61% | 61.89 | 71.52 | 21.43 | 33.09 M, approximately 132 MB | Polyp-specialized transformer variant; strong generalization (highest GRR) but highest inference time among many models, limiting real-time use at high resolution |
| Swin-UMamba | Hybrid (Mamba based) | N/A | 79.45 | (81.23 + 71.12)/2 = 76.18% | 53.31 | 70.93 | 13.00 | 59.89 M, approximately 240 MB | Mamba hybrid leveraging visual-state-space-model; good generalization but relatively large and slower training/inference cost |
| Swin-UMamba-D | Hybrid (Mamba based) | N/A | 83.29 | (86.15 + 77.53)/2 = 81.84% | 88.39 | 69.36 | 12.97 | 27.50 M, approximately 110 MB | Best segmentation performance (average IoU) among study but relatively high training and inference cost; strong segmentation accuracy but moderate generalization |
| UMamba-Bot | Hybrid (Mamba based) | N/A | 71.61 | (75.03 + 61.79)/2 = 68.41% | 39.45 | 64.78 | 6.27 | 28.77 M, approximately 115 MB | Lightweight mamba variant; good FPS but weaker accuracy and generalization |
| UMamba-Enc | Hybrid (Mamba based) | N/A | 71.28 | (75.24 + 61.82)/2 = 68.53% | 37.15 | 65.33 | 7.28 | 27.56 M, approximately 110 MB | Encoder-focused mamba variant; similar trade-offs to UMamba-Bot: Faster but lower accuracy |
| VM-UNETV2 | Hybrid (Mamba like) | N/A | 81.63 | (84.36 + 74.89)/2 = 79.63% | 83.48 | 69.49 | 12.90 | 22.77 M, approximately 91 MB | VM-UNET encoder variant; strong PET and competitive accuracy, GPU-focused design |
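For readers less familiar with the pixel-level metrics summarized in Table 1, the following is a minimal, illustrative sketch, not the authors' evaluation code, of how per-class IoU and Dice are typically computed from predicted and ground-truth label maps before being averaged into the reported means; the class indices and toy masks are hypothetical.

```python
# Illustrative per-class IoU and Dice from integer label maps (hypothetical data).
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, num_classes: int):
    """Return per-class (IoU, Dice); classes absent from both maps yield NaN."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        denom = p.sum() + t.sum()
        ious.append(inter / union if union else np.nan)
        dices.append(2 * inter / denom if denom else np.nan)
    return np.array(ious), np.array(dices)

# Toy example: a 4x4 frame with background (0) and two lesion classes (1, 2).
pred   = np.array([[0, 1, 1, 0], [0, 1, 2, 2], [0, 0, 2, 2], [0, 0, 0, 0]])
target = np.array([[0, 1, 1, 0], [0, 1, 1, 2], [0, 0, 2, 2], [0, 0, 0, 0]])
iou, dice = iou_dice(pred, target, num_classes=3)
print("mean IoU:", np.nanmean(iou), "mean Dice:", np.nanmean(dice))
```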
Dataset diversity and annotation rigor have remained the bottlenecks: the self-collected dataset substantially improved class coverage and size relative to EDD2020, yet single-region sourcing and class imbalance remained persistent limitations.
Chan et al's comparative methodology[9] addresses a longstanding need for reproducibility and fair benchmarking in gastrointestinal DL-based computer vision research, using shared metrics and standardized ground truth. Several contextual realities make this work timely and consequential (Table 2). First, missed and late diagnoses remain a serious global health problem in upper GI disease, especially for early cancer[19]. Second, DL-based research has mainly focused on architectural novelty and leaderboard performance without sufficient consideration of clinical constraints such as interpretability and domain shift[20]. Third, multi-class segmentation is more clinically practical than single-target tasks because clinicians frequently distinguish among multiple concurrent pathologies, such as ulcers, polyps, and varices, but it is intrinsically harder, and most prior systems could not be relied upon robustly[21]. Chan et al[9] aimed to bridge these gaps by expanding dataset diversity and disease coverage and by contrasting architectural families under the same evaluation protocol.
| Translational challenge | Concrete solution | Responsible stakeholders | Measurable success metrics |
| Dataset bias and limited diversity | Multi-center data sharing agreements; standardized metadata schema (demographics, device, protocol); stratified sampling and targeted collection for underrepresented cohorts; federated learning to enable cross-site models while preserving privacy | Clinical consortiums, data governance teams, hospital IT, study PIs, legal/compliance | Number of centers and countries represented; device/vendor diversity index; demographic coverage (age/sex/ethnicity) proportions; change in GRR and external IoU on held-out sites |
| Annotation variability and subjectivity | Develop and enforce standardized annotation protocol and labeling guidelines; multi-expert consensus labeling; adjudication workflows; active learning to prioritize ambiguous cases; periodic re-annotation audits | Clinical experts (gastroenterologists), annotation managers, platform vendors, data scientists | Inter-rater agreement (Cohen’s kappa/mean IoU across annotators); % masks adjudicated; annotation time per case; model performance gains after consensus labels |
| Class imbalance/rare pathology sensitivity | Oversampling/targeted collection of rare classes; class-aware loss functions (focal, class-weighted; a minimal loss sketch follows this table); synthetic data and augmentation for rare classes; curriculum learning focusing on rare classes | Data acquisition teams, ML engineers, clinical partners, biostatisticians | Per-class recall/sensitivity (especially for rare classes); AUPRC for rare classes; reduction in false-negative rate for underrepresented labels |
| Imaging variability (lighting, specular reflection, motion blur) | Advanced preprocessing (illumination normalization, reflection removal), robust augmentation (exposure, blur, specular sim), self-supervised pretraining on large unlabeled endoscopy corpora; spatio-temporal modeling for videos | ML research team, imaging engineers, clinical endoscopy unit, vendors | Performance stratified by exposure/quality buckets (IoU under overexposed vs normal); reduction in failure cases linked to artifacts; frame-level temporal consistency metrics (temporal IoU) |
| Poor cross-dataset generalization/overfitting | Cross-dataset evaluation, domain adaptation techniques, federated or multi-site training, hold-out external validation sets, regularization and ensembling | ML engineers, external collaborators, validation leads, statisticians | Delta IoU/Dice between internal test and external test sets; GRR improvement on external cohorts; calibration metrics (Brier score) |
| Real-time performance and resource constraints (PET) | Model compression and pruning; lightweight architectures (e.g., SegFormer variants); hardware benchmark targeting (edge GPU/CPU); optimized inference pipelines | ML engineers, DevOps, clinical IT, hardware vendors | Inference latency (milliseconds/frame), throughput (fps) on target hardware; memory usage, FLOPs; PET score or task-specific tradeoff metric; clinician acceptance for live use |
| Clinical validation and impact on workflow | Prospective clinical studies, reader studies comparing model + clinician vs clinician alone; integration pilots in endoscopy suite; user-centred UI/UX design and training | Clinical investigators, hospital operations, human factors specialists, clinical IT | Diagnostic accuracy improvement (sensitivity/specificity) in prospective trials; change in missed-lesion rate; time-to-report; clinician satisfaction and adoption rates |
| Trust, explainability and clinician acceptance | Provide visual explanations (attention maps, uncertainty overlays); case-level confidence scores; reporting of failure modes and limitations; clinician training modules | ML explainability team, clinical educators, product managers, regulatory/QA | Proportion of model outputs with uncertainty flags; clinician trust scores in surveys; reduction in dismissed correct alerts; explainability usability ratings |
| Privacy, legal & regulatory readiness | Data de-identification pipeline, DPIAs, early engagement with regulators, pre-specified validation plan, post-market surveillance plan, robust audit trails | Legal/compliance, regulatory affairs, data governance, QA, cybersecurity | Completion of DPIA and IRB approvals; regulatory submission milestones (pre-submission, submission, approvals); number of privacy incidents; time to resolve security findings |
| Multi-modal and longitudinal integration | Design multi-modal models (image + report + temporal video), link endoscopy frames with pathology/EMR metadata, adopt interoperable standards (DICOM/HL7/FHIR) | Data engineers, clinical informatics, pathology, ML researchers, standards officers | Increase in model performance when adding modalities (delta IoU/Dice); % cases with linked pathology; successful end-to-end FHIR/DICOM integrations; improvement in clinically-relevant outcome measures (e.g., appropriate biopsy rate) |
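As a concrete illustration of the class-aware loss functions recommended in the class-imbalance row of Table 2, below is a minimal PyTorch sketch of a class-weighted focal loss for multi-class segmentation. The number of classes, weights, and gamma value are illustrative assumptions, not the training recipe used by Chan et al.

```python
# Illustrative class-weighted focal loss for multi-class segmentation (PyTorch).
import torch
import torch.nn.functional as F

def focal_loss(logits, target, class_weights, gamma=2.0):
    """logits: (N, C, H, W); target: (N, H, W) integer class labels."""
    log_p = F.log_softmax(logits, dim=1)
    # Per-pixel weighted cross-entropy, kept unreduced so the focal term can modulate it.
    ce = F.nll_loss(log_p, target, weight=class_weights, reduction="none")
    # Probability assigned to the true class at each pixel.
    p_t = log_p.gather(1, target.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()

# Toy usage with nine disease classes; rare classes receive larger (hypothetical) weights.
logits = torch.randn(2, 9, 64, 64, requires_grad=True)
target = torch.randint(0, 9, (2, 64, 64))
weights = torch.ones(9)
weights[7:] = 5.0
loss = focal_loss(logits, target, weights)
loss.backward()
print(float(loss))
```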
This kind of comparative study involving multiple DL segmentation models shows clear strengths in boundary delineation and computational efficiency. However, biases emerge when models are ranked merely on task metrics such as IoU, Dice, sensitivity, or specificity, with the implication that higher numbers will lead to better clinical decisions. The training and validation design of comparative studies can confound external validity: selection bias arises when initial models are trained on curated datasets collected at tertiary centers with higher disease prevalence and expert operators, which limits generalizability to other settings. Such biases are difficult to eliminate through data augmentation or synthetic images. Within a comparative design, a superior metric may only indicate the model's fit to the specific dataset rather than genuine clinical utility.
Without careful regularization and external validation, DL models often overfit idiosyncrasies of image acquisition rather than pathology, meaning that a model with marginally better Dice scores may not remain reliable when exposed to different scopes or endoscopists. Several studies have reported unsatisfactory explainability in DL-based models: clinicians with limited AI knowledge require intelligible reasoning, such as heatmaps and bounding boxes tied to interpretable features, to accept an advanced model's output[22,23]. Comparative superiority in raw metrics says little about how easily a model can be interrogated in clinical practice. In addition, human-AI interaction matters because a DL model's value depends partly on how it alters endoscopist behavior; it remains unclear whether an automated segmentation model that excels at delineating gastric intestinal metaplasia boundaries would actually increase biopsy yield at meaningful sites or improve clinical decision making[24]. Given the occurrence of false positives, comparative evaluations must examine not only per-frame metrics but also their clinical consequences, to avoid unnecessary interventions. Moreover, given the requirements of regulatory approval, robust post-market surveillance, and cybersecurity, models with superior performance in a comparative analysis may still fail to scale. Clinical equity is likewise hard to assure, because training datasets often have limited demographic, ethnic, and equipment diversity.
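One low-cost step toward the interpretability discussed above is to surface the model's own per-pixel confidence alongside its masks. The sketch below is illustrative only, not a validated explainability method; the entropy threshold and the stand-in model output are assumptions. It derives a normalized uncertainty map from softmax outputs that could be overlaid on the endoscopic frame to flag low-confidence regions for the endoscopist.

```python
# Illustrative per-pixel uncertainty map from segmentation logits (PyTorch).
import torch

def uncertainty_map(logits: torch.Tensor) -> torch.Tensor:
    """logits: (C, H, W) -> normalized entropy in [0, 1]; higher means less certain."""
    probs = torch.softmax(logits, dim=0)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=0)
    return entropy / torch.log(torch.tensor(float(logits.shape[0])))

# Stand-in for a segmentation network's output on one frame (9 classes, 256x256).
logits = torch.randn(9, 256, 256)
unc = uncertainty_map(logits)
flagged = (unc > 0.8).float().mean()  # fraction of pixels above an illustrative threshold
print(f"{100 * flagged:.1f}% of pixels flagged as uncertain")
```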
Comparative studies of multi-class segmentation models need a multi-dimensional evaluation framework to move beyond superficial comparisons and achieve clinical translation. Multi-site external validation should compare models on demographically diverse test sets to simulate community practice and non-expert operators. Segmentation metrics should be tightly integrated with clinically relevant outcomes, such as biopsy yield and treatment decisions. It is also warranted to evaluate sensitivity to nuisance factors, partial occlusion, and prevalence shifts. Observational human-AI interaction trials should have endoscopists use model outputs in a simulated or live environment to measure behavioral changes. Cost-effectiveness analyses should examine computational costs and annotation burden to project the impact on health-economic metrics. Finally, model performance should be reported stratified by patient subgroup and scope equipment, as sketched below.
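As a minimal sketch of the subgroup-stratified reporting argued for above (the per-case IoU values, scope vendors, and age bands are hypothetical), per-case performance can simply be grouped by acquisition and patient strata so that gaps are visible rather than averaged away.

```python
# Illustrative subgroup-stratified reporting of per-case IoU (hypothetical data).
import pandas as pd

cases = pd.DataFrame({
    "iou":       [0.86, 0.81, 0.62, 0.78, 0.59, 0.84],
    "scope":     ["vendor_A", "vendor_A", "vendor_B", "vendor_A", "vendor_B", "vendor_A"],
    "age_group": ["<60", ">=60", ">=60", "<60", ">=60", "<60"],
})

report = (cases.groupby(["scope", "age_group"])["iou"]
               .agg(["mean", "count"])
               .rename(columns={"mean": "mean_iou", "count": "n_cases"}))
print(report)
```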
Several structural factors converge to explain why translation has lagged despite rapid architectural progress. First, misaligned incentives: academic and industry incentives reward novelty and benchmark gains, including new architectures and leaderboard rank, more directly than labor-intensive clinical validation, so teams optimize for leaderboard performance rather than deployment readiness.
To move from exhortation to action, we propose a concrete four-pillar evaluation blueprint to operationalize clinical translation for multi-class endoscopic segmentation systems. Each pillar defines measurable requirements, minimum datasets/experiments, and pragmatic pass/fail criteria.
What to measure: Procedure-level and patient-relevant endpoints based on changes in biopsy yield at clinically meaningful sites, reduction in missed lesions per procedure, procedure time saved, and improvement in novice endoscopist diagnostic accuracy.
Minimum evaluation: Report both pixel-level metrics (IoU/Dice) and at least two procedure-level outcomes in retrospective or simulated workflows; include lesion-level sensitivity/specificity and time-to-decision if applicable.
Pass/fail criterion: Demonstrate a pre-specified clinically meaningful improvement in at least one procedure-level outcome in simulated or pilot clinical use, for example, increase in targeted biopsy yield or reduction in missed lesions.
What to measure: Cross-site performance, device/manufacturer stratification, lighting and prep variability, and performance across demographic subgroups such as age, sex, and ethnicity.
Minimum evaluation: Multi-site external validation across at least three geographically and device-diverse centers; report GRR and per-subgroup performance with confidence intervals.
Pass/fail criterion: A minimum GRR threshold aligned with a pre-specified regulatory standard (for instance, GRR ≥ 0.8) and no clinically significant performance degradation in any defined demographic or device subgroup.
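As one way to operationalize this generalizability pillar, the sketch below computes a GRR-like ratio, here assumed to be external-site mean IoU divided by internal-site mean IoU (the paper's exact GRR formula is not reproduced in this editorial), together with a bootstrap confidence interval; the per-case IoU values are simulated placeholders.

```python
# Illustrative cross-site generalization check with a bootstrap CI (simulated data).
import numpy as np

rng = np.random.default_rng(0)
internal_iou = rng.normal(0.83, 0.05, size=200)   # hypothetical per-case IoU, internal test set
external_iou = rng.normal(0.74, 0.08, size=150)   # hypothetical per-case IoU, external site

def grr(internal, external):
    """Assumed GRR-like ratio: external mean performance relative to internal."""
    return external.mean() / internal.mean()

boot = [grr(rng.choice(internal_iou, internal_iou.size),
            rng.choice(external_iou, external_iou.size)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"GRR-like ratio: {grr(internal_iou, external_iou):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```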
What to measure: Effects on clinician behavior and safety, including automation bias, alert fatigue, and changes to biopsy selection; interpretability/usability; and failure-mode analysis.
Minimum evaluation: Observational human-AI interaction studies in simulated or live settings measuring decision change, task time, false positive-driven unnecessary interventions, and clinician trust/usability in accordance with standardized instruments.
Pass/fail criterion: Absence of net harm with no increase in unnecessary interventions and no clinically significant deterioration in decision quality; acceptable usability scores; pre-specified mitigation strategies for common failure modes.
What to measure: Inference latency/memory for target deployment environment, annotation cost and reproducibility, cybersecurity/data governance posture, and cost-effectiveness.
Minimum evaluation: Profile edge performance for real-time feasibility, conduct an annotation variability analysis, and build a basic health-economic model, such as projected cost per additional lesion detected.
Pass/fail criterion: Real-time inference on intended clinical hardware, standardized annotation protocol with inter-rater agreement above a threshold, and a positive or justified cost-effectiveness estimate for the intended use case.
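To make the real-time requirement concrete, the following minimal sketch (a placeholder network and input size, not a formal benchmarking protocol) times repeated forward passes on the intended hardware and reports milliseconds per frame and frames per second, the quantities used in Table 1 and in this pillar's pass/fail criterion.

```python
# Illustrative latency/throughput profiling with a placeholder network (PyTorch).
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 9, 1)).eval()   # stand-in for the segmentation network
frame = torch.randn(1, 3, 512, 512)                 # one endoscopic frame (assumed resolution)

with torch.no_grad():
    for _ in range(10):                             # warm-up iterations
        model(frame)
    start = time.perf_counter()
    n = 100
    for _ in range(n):
        model(frame)
    elapsed = time.perf_counter() - start

print(f"{1000 * elapsed / n:.2f} ms/frame, {n / elapsed:.1f} FPS")
```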
Integration of these pillars moves evaluation from isolated technical metrics to a reproducible and clinically meaningful validation pathway, which can be operationalized in pre-clinical trials and regulatory submissions.
Chan et al[9] presented compelling evidence that DL-based multi-class segmentation is nearing clinical viability for UGI endoscopy. Their comparative study of DL models for multi-class upper GI segmentation is a necessary and informative exercise that underscores the importance of architectural design, pretraining, and multi-scale feature modeling. It also incisively addresses the translational bottlenecks, including dataset diversity, annotation uncertainty, and evaluation standards. The study calls for the next phase of endoscopic AI to unite technical ingenuity with clinical rigor, multi-institutional collaboration, and ethically grounded validation. Only by following such a path can automated segmentation mature into a practice-changing tool for gastrointestinal care.
1. Bhat P, Kaffes AJ, Lassen K, Aabakken L. Upper gastrointestinal endoscopy in the surgically altered patient. Dig Endosc. 2024;36:1077-1093.
2. Veitch AM, Uedo N, Yao K, East JE. Optimizing early upper gastrointestinal cancer detection at endoscopy. Nat Rev Gastroenterol Hepatol. 2015;12:660-667.
3. He Q, Bano S, Ahmad OF, Yang B, Chen X, Valdastri P, Lovat LB, Stoyanov D, Zuo S. Deep learning-based anatomical site classification for upper gastrointestinal endoscopy. Int J Comput Assist Radiol Surg. 2020;15:1085-1094.
4. Yan T, Wong PK, Qin YY. Deep learning for diagnosis of precancerous lesions in upper gastrointestinal endoscopy: A review. World J Gastroenterol. 2021;27:2531-2544.
5. Sharma P, Hassan C. Artificial Intelligence and Deep Learning for Upper Gastrointestinal Neoplasia. Gastroenterology. 2022;162:1056-1066.
6. Tokat M, van Tilburg L, Koch AD, Spaander MCW. Artificial Intelligence in Upper Gastrointestinal Endoscopy. Dig Dis. 2022;40:395-408.
7. Ren X, Zhou W, Yuan N, Li F, Ruan Y, Zhou H. Prompt-based polyp segmentation during endoscopy. Med Image Anal. 2025;102:103510.
8. Wang S, Cong Y, Zhu H, Chen X, Qu L, Fan H, Zhang Q, Liu M. Multi-Scale Context-Guided Deep Network for Automated Lesion Segmentation With Endoscopy Images of Gastrointestinal Tract. IEEE J Biomed Health Inform. 2021;25:514-525.
9. Chan IN, Wong PK, Yan T, Hu YY, Chan CI, Qin YY, Wong CH, Chan IW, Lam IH, Wong SH, Li Z, Gao S, Yu HH, Yao L, Zhao BL, Hu Y. Assessing deep learning models for multi-class upper endoscopic disease segmentation: A comprehensive comparative study. World J Gastroenterol. 2025;31:111184.
10. Nakao E, Yoshio T, Kato Y, Namikawa K, Tokai Y, Yoshimizu S, Horiuchi Y, Ishiyama A, Hirasawa T, Kurihara N, Ishizuka N, Ishihara R, Tada T, Fujisaki J. Randomized controlled trial of an artificial intelligence diagnostic system for the detection of esophageal squamous cell carcinoma in clinical practice. Endoscopy. 2025;57:210-217.
11. Li SW, Zhang LH, Cai Y, Zhou XB, Fu XY, Song YQ, Xu SW, Tang SP, Luo RQ, Huang Q, Yan LL, He SQ, Zhang Y, Wang J, Ge SQ, Gu BB, Peng JB, Wang Y, Fang LN, Wu WD, Ye WG, Zhu M, Luo DH, Jin XX, Yang HD, Zhou JJ, Wang ZZ, Wu JF, Qin QQ, Lu YD, Wang F, Chen YH, Chen X, Xu SJ, Tung TH, Luo CW, Ye LP, Yu HG, Mao XL. Deep learning assists detection of esophageal cancer and precursor lesions in a prospective, randomized controlled study. Sci Transl Med. 2024;16:eadk5395.
12. Ebigbo A, Messmann H, Lee SH. Artificial Intelligence Applications in Image-Based Diagnosis of Early Esophageal and Gastric Neoplasms. Gastroenterology. 2025;169:396-415.e2.
13. Glissen Brown JR, Mansour NM, Wang P, Chuchuca MA, Minchenberg SB, Chandnani M, Liu L, Gross SA, Sengupta N, Berzin TM. Deep Learning Computer-aided Polyp Detection Reduces Adenoma Miss Rate: A United States Multi-center Randomized Tandem Colonoscopy Study (CADeT-CS Trial). Clin Gastroenterol Hepatol. 2022;20:1499-1507.e4.
14. Wu L, Shang R, Sharma P, Zhou W, Liu J, Yao L, Dong Z, Yuan J, Zeng Z, Yu Y, He C, Xiong Q, Li Y, Deng Y, Cao Z, Huang C, Zhou R, Li H, Hu G, Chen Y, Wang Y, He X, Zhu Y, Yu H. Effect of a deep learning-based system on the miss rate of gastric neoplasms during upper gastrointestinal endoscopy: a single-centre, tandem, randomised controlled trial. Lancet Gastroenterol Hepatol. 2021;6:700-708.
15. Wallace MB, Sharma P, Bhandari P, East J, Antonelli G, Lorenzetti R, Vieth M, Speranza I, Spadaccini M, Desai M, Lukens FJ, Babameto G, Batista D, Singh D, Palmer W, Ramirez F, Palmer R, Lunsford T, Ruff K, Bird-Liebermann E, Ciofoaia V, Arndtz S, Cangemi D, Puddick K, Derfus G, Johal AS, Barawi M, Longo L, Moro L, Repici A, Hassan C. Impact of Artificial Intelligence on Miss Rate of Colorectal Neoplasia. Gastroenterology. 2022;163:295-304.e5.
16. Jong MR, Jaspers TJM, Kusters CHJ, Jukema JB, van Eijck van Heslinga RAH, Fockens KN, Boers TGW, Visser LS, van der Putten JA, van der Sommen F, de With PH, de Groof AJ, Bergman JJ; BONS-AI consortium. Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real-World Imaging Conditions on Barrett's Neoplasia Detection. United European Gastroenterol J. 2025;13:929-937.
17. Nathani P, Sharma P. Role of Artificial Intelligence in the Detection and Management of Premalignant and Malignant Lesions of the Esophagus and Stomach. Gastrointest Endosc Clin N Am. 2025;35:319-353.
18. Luo H, Xu G, Li C, He L, Luo L, Wang Z, Jing B, Deng Y, Jin Y, Li Y, Li B, Tan W, He C, Seeruttun SR, Wu Q, Huang J, Huang DW, Chen B, Lin SB, Chen QM, Yuan CM, Chen HX, Pu HY, Zhou F, He Y, Xu RH. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol. 2019;20:1645-1654.
19. Danpanichkul P, Auttapracha T, Kongarin S, Ponvilawan B, Simadibrata DM, Duangsonk K, Jaruvattanadilok S, Saowapa S, Suparan K, Lui RN, Liangpunsakul S, Wallace MB, Wijarnpreecha K. Global epidemiology of early-onset upper gastrointestinal cancer: trend from the Global Burden of Disease Study 2019. J Gastroenterol Hepatol. 2024;39:1856-1868.
20. Neri A, Penza V, Baldini C, Mattos LS. Surgical augmented reality registration methods: A review from traditional to deep learning approaches. Comput Med Imaging Graph. 2025;124:102616.
21. Weisman AJ, Huff DT, Govindan RM, Chen S, Perk TG. Multi-organ segmentation of CT via convolutional neural network: impact of training setting and scanner manufacturer. Biomed Phys Eng Express. 2023;9.
22. Habe TT, Haataja K, Toivanen P. Review of Deep Learning Performance in Wireless Capsule Endoscopy Images for GI Disease Classification. F1000Res. 2024;13:201.
23. Krenzer A, Heil S, Fitting D, Matti S, Zoller WG, Hann A, Puppe F. Automated classification of polyps using deep learning architectures and few-shot learning. BMC Med Imaging. 2023;23:59.
24. Campion JR, O'Connor DB, Lahiff C. Human-artificial intelligence interaction in gastrointestinal endoscopy. World J Gastrointest Endosc. 2024;16:126-135.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
