Assessing deep learning models for multi-class upper endoscopic disease segmentation: A comprehensive comparative study

doi:10.3748/wjg.v31.i41.111184

Journal Information

Publication Name: World Journal of Gastroenterology
ISSN: 1007-9327
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Basic Study

World J Gastroenterol. Nov 7, 2025; 31(41): 111184
Published online Nov 7, 2025. doi: 10.3748/wjg.v31.i41.111184

In Neng Chan, Pak Kin Wong, Tao Yan, Yan-Yan Hu, Chon In Chan, Ye-Ying Qin, Chi Hong Wong, In Weng Chan, Ieng Hou Lam, Sio Hou Wong, Zheng Li, Shan Gao, Hon Ho Yu, Liang Yao, Bao-Liang Zhao, Ying Hu

In Neng Chan, Pak Kin Wong, Ye-Ying Qin, Department of Electromechanical Engineering, University of Macau, Macau 999078, China

Tao Yan, School of Mechanical Engineering, Hubei University of Arts and Science, Xiangyang 441053, Hubei Province, China

Yan-Yan Hu, Zheng Li, Shan Gao, Department of Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang 441021, Hubei Province, China

Chon In Chan, Ieng Hou Lam, Sio Hou Wong, Hon Ho Yu, Department of Gastroenterology, Kiang Wu Hospital, Macau 999078, China

Chi Hong Wong, In Weng Chan, Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China

Liang Yao, Bao-Liang Zhao, Ying Hu, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong Province, China

Author contributions: Chan IN and Wong PK conceptualized the study, contributed to methodology and resources, were responsible for writing the original draft, and contributed equally to this work; Wong PK and Yan T provided supervision, project administration, and funding acquisition; Chan IN developed the software and conducted the investigation; Yan T contributed to resources and critically revised the manuscript; Hu YY and Chan CI contributed to conceptualization, data curation, and resources; Qin YY conducted the investigation and contributed to writing, review, and editing; Wong CH and Chan IW contributed to data curation, review, and editing; Lam IH, Wong SH, and Li Z were responsible for data curation; Gao S, Yu HH, Yao L, Zhao BL, and Hu Y provided supervision; all authors read and approved the final version of the manuscript.

Supported by the Guangdong Basic and Applied Basic Research Foundation, No. 2021B1515130003; the Key Research and Development Plan of Hubei Province, No. 2022BCE034; and the Natural Science Foundation of Hubei Province, No. 2024AFB1054.

Institutional review board statement: The study was reviewed and approved by the Medical Ethics Committee of Xiangyang Central Hospital (No. 2024-145) and the Medical Ethics Committee of Kiang Wu Hospital, Macao (No. 2019-005), and conducted in accordance with the principles of the Declaration of Helsinki.

Institutional animal care and use committee statement: This study did not involve any animal experiments or the use of laboratory animals.

Conflict-of-interest statement: The authors declare that they have no conflict of interest.

Data sharing statement: The self-collected dataset is available upon request; the EDD2020 dataset is a public dataset that is freely accessible to researchers.

Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

Corresponding author: Pak Kin Wong, PhD, Professor, Department of Electromechanical Engineering, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China. fstpkw@um.edu.mo

Received: June 26, 2025
Revised: August 3, 2025
Accepted: September 28, 2025
Published online: November 7, 2025
Processing time: 134 Days and 4 Hours

Abstract

BACKGROUND

Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice.

AIM

To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability.

METHODS

This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset).
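As a point of reference for the segmentation metric reported in this abstract, the following is a minimal sketch of per-class and mean intersection over union (IoU) for a pair of label masks. It assumes each pixel carries exactly one class label and that the mean is taken over the classes present in either mask; the study's exact protocol (treatment of background, multi-label pixels, per-image vs dataset-level aggregation) is not stated here, so the function names `per_class_iou` and `mean_iou` and these conventions are illustrative assumptions only.

```python
from typing import Dict, Sequence


def per_class_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> Dict[int, float]:
    """Return IoU for every class that appears in pred or target.

    Assumption: masks are 2D grids of integer class labels in
    [0, num_classes), one label per pixel (no multi-label pixels).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts once toward intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    # Classes absent from both masks are skipped (IoU undefined).
    return {
        c: intersection[c] / union[c]
        for c in range(num_classes)
        if union[c] > 0
    }


def mean_iou(pred, target, num_classes: int) -> float:
    """Average the per-class IoU over the classes that are present."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious)


# Toy 3x3 masks with three hypothetical classes (not study data).
pred_mask = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]
true_mask = [[0, 0, 1], [0, 0, 1], [2, 2, 1]]
print(per_class_iou(pred_mask, true_mask, num_classes=3))
print(round(mean_iou(pred_mask, true_mask, num_classes=3), 4))
```

In the toy example, class 0 overlaps on 3 of 4 pixels, class 1 on 2 of 4, and class 2 on 2 of 3, so the mean IoU is about 0.64.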

RESULTS

Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt + UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and mamba-based models.
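The abstract does not define the generalization retention rate. One plausible reading, shown here purely as an assumption, is the cross-dataset IoU expressed as a percentage of the in-domain IoU for the same model. The helper name `retention_rate` and the input values below are hypothetical and are not results from the study.

```python
def retention_rate(cross_dataset_iou: float, in_domain_iou: float) -> float:
    """Percentage of in-domain IoU retained on an unseen dataset.

    Assumed definition (not confirmed by the source):
        retention = 100 * cross_dataset_iou / in_domain_iou
    Both inputs must use the same unit (both fractions or both percentages).
    """
    if in_domain_iou <= 0:
        raise ValueError("in_domain_iou must be positive")
    return 100.0 * cross_dataset_iou / in_domain_iou


# Hypothetical model: 80.0% IoU in-domain, 56.0% IoU on the external dataset.
example = retention_rate(cross_dataset_iou=56.0, in_domain_iou=80.0)
print(f"{example:.2f}%")
```

Under this reading, a lower retention rate means a larger share of in-domain performance is lost when the model is applied to data from a different source.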

CONCLUSION

Hierarchical architectures such as Swin-UMamba and SegFormer show promise for UGI disease segmentation, with the potential to reduce missed diagnoses and improve clinical workflows, but robust clinical validation is crucial before real-world deployment.

Keywords: Deep learning; Upper endoscopy; Medical imaging; Gastrointestinal diseases; Disease segmentation

Core Tip: This study evaluates 17 advanced deep learning models, including convolutional neural network-, transformer-, and mamba-based architectures, for multi-class upper gastrointestinal disease segmentation. Swin-UMamba achieves the highest segmentation accuracy, while SegFormer balances efficiency and performance. Automated segmentation demonstrates significant clinical value by improving diagnostic precision, reducing missed diagnoses, streamlining treatment planning, and easing physician workload. Key challenges include lighting variability, vague lesion boundaries, multi-label complexities, and dataset limitations. Future directions, such as multi-modal learning, self-supervised techniques, spatio-temporal modeling, and rigorous clinical validation, are essential to enhance model robustness and ensure applicability in diverse healthcare settings for better patient outcomes.