Copyright
©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
Assessing deep learning models for multi-class upper endoscopic disease segmentation: A comprehensive comparative study
In Neng Chan, Pak Kin Wong, Tao Yan, Yan-Yan Hu, Chon In Chan, Ye-Ying Qin, Chi Hong Wong, In Weng Chan, Ieng Hou Lam, Sio Hou Wong, Zheng Li, Shan Gao, Hon Ho Yu, Liang Yao, Bao-Liang Zhao, Ying Hu
In Neng Chan, Pak Kin Wong, Ye-Ying Qin, Department of Electromechanical Engineering, University of Macau, Macau 999078, China
Tao Yan, School of Mechanical Engineering, Hubei University of Arts and Science, Xiangyang 441053, Hubei Province, China
Yan-Yan Hu, Zheng Li, Shan Gao, Department of Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang 441021, Hubei Province, China
Chon In Chan, Ieng Hou Lam, Sio Hou Wong, Hon Ho Yu, Department of Gastroenterology, Kiang Wu Hospital, Macau 999078, China
Chi Hong Wong, In Weng Chan, Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China
Liang Yao, Bao-Liang Zhao, Ying Hu, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong Province, China
Author contributions: Chan IN and Wong PK conceptualized the study, contributed to methodology and resources, were responsible for writing the original draft, and contributed equally to this work; Wong PK and Yan T provided supervision, project administration, and funding acquisition; Chan IN developed the software and conducted the investigation; Yan T contributed to resources and critically revised the manuscript; Hu YY and Chan CI contributed to conceptualization, data curation, and resources; Qin YY conducted the investigation and contributed to writing, review, and editing; Wong CH and Chan IW contributed to data curation, review, and editing; Lam IH, Wong SH, and Li Z were responsible for data curation; Gao S, Yu HH, Yao L, Zhao BL, and Hu Y provided supervision; all authors read and approved the final version of the manuscript.
Supported by the Guangdong Basic and Applied Basic Research Foundation, No. 2021B1515130003; the Key Research and Development Plan of Hubei Province, No. 2022BCE034; and the Natural Science Foundation of Hubei Province, No. 2024AFB1054.
Institutional review board statement: The study was reviewed and approved by the Medical Ethics Committee of Xiangyang Central Hospital (No. 2024-145) and the Medical Ethics Committee of Kiang Wu Hospital, Macao (No. 2019-005), and conducted in accordance with the principles of the Declaration of Helsinki.
Institutional animal care and use committee statement: This study did not involve any animal experiments or the use of laboratory animals.
Conflict-of-interest statement: The authors declare that they have no conflict of interest.
Data sharing statement: The self-collected dataset is available upon request; the EDD2020 dataset is public and freely accessible to researchers.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:
https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Pak Kin Wong, PhD, Professor, Department of Electromechanical Engineering, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China.
fstpkw@um.edu.mo
Received: June 26, 2025
Revised: August 3, 2025
Accepted: September 28, 2025
Published online: November 7, 2025
Processing time: 134 Days and 4 Hours
BACKGROUND
Upper gastrointestinal (UGI) diseases present diagnostic challenges during endoscopy due to visual similarities, indistinct boundaries, and observer variability, which can lead to missed diagnoses and delayed treatment. Automated segmentation using deep learning (DL) models offers the potential to assist endoscopists, improve diagnostic accuracy, and reduce workload. However, multi-class UGI disease segmentation remains underexplored, with limited annotated datasets and insufficient focus on clinical validation. This study hypothesizes that comparative analysis of different DL architectures can identify models suitable for clinical application, providing actionable insights to reduce diagnostic errors and support clinical decision-making in endoscopic practice.
AIM
To evaluate 17 state-of-the-art DL models for multi-class UGI disease segmentation, emphasizing clinical translation and real-world applicability.
METHODS
This study evaluated 17 DL models spanning convolutional neural network (CNN)-, transformer-, and mamba-based architectures using a self-collected dataset from two hospitals in Macao and Xiangyang (3313 images, 9 classes) and the public EDD2020 dataset (386 images, 5 classes). Models were assessed for segmentation performance and performance-efficiency trade-off. Statistical analyses were conducted to examine performance differences across architectures. Generalization capability was measured through a cross-dataset evaluation (training models on the self-collected dataset and testing on the EDD2020 dataset).
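As an illustration of the segmentation metric used in this kind of evaluation, the following minimal sketch computes per-class and mean intersection over union (IoU) for a predicted mask against a ground-truth mask. It assumes single-label integer masks and averages over the classes present in either mask; the function names `per_class_iou` and `mean_iou` are hypothetical, and the study's exact averaging scheme (per class, per image, or per dataset) is not stated here and may differ.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of integer class labels


def per_class_iou(pred: Mask, target: Mask, num_classes: int) -> Dict[int, float]:
    """Return IoU for every class that appears in pred or target.

    Assumption: each pixel carries exactly one class label (no multi-label pixels).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts toward both intersection and union of that class.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: counts toward the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Classes absent from both masks are skipped to avoid division by zero.
    return {c: intersection[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Average IoU over the classes present in either mask."""
    scores = per_class_iou(pred, target, num_classes)
    return sum(scores.values()) / len(scores)


# Toy 3x3 masks with three classes (illustrative values, not study data).
toy_pred = [[0, 0, 1], [0, 1, 1], [2, 2, 1]]
toy_target = [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
toy_scores = per_class_iou(toy_pred, toy_target, num_classes=3)
```

For the toy masks, class 0 overlaps on 3 of 4 pixels in the union and class 1 on 2 of 4, so their IoU values are 0.75 and 0.5 respectively.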
RESULTS
Swin-UMamba achieved the highest segmentation performance across both datasets [intersection over union (IoU): 89.06% ± 0.20% self-collected, 77.53% ± 0.32% EDD2020], followed by SegFormer (IoU: 88.94% ± 0.38% self-collected, 77.20% ± 0.98% EDD2020) and ConvNeXt + UPerNet (IoU: 88.48% ± 0.09% self-collected, 76.90% ± 0.61% EDD2020). Statistical analyses showed no significant differences between paradigms, though hierarchical architectures with pre-trained encoders consistently outperformed simpler designs. SegFormer achieved the best balance of accuracy and computational efficiency with a performance-efficiency trade-off score of 92.02%, making it suitable for real-time clinical use. Cross-dataset evaluation revealed significant performance drops, with generalization retention rates of 64.78% to 71.52%. Transformer-based models, particularly pyramid vision transformer v2 + efficient multi-scale convolutional decoding (IoU: 63.35% ± 1.44%), generalized better than CNN- and mamba-based models.
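The "mean ± standard deviation" figures and the generalization retention rates above can be read through the following minimal sketch. It rests on two assumptions that are not confirmed by the text: that the spread is the sample standard deviation over repeated training runs, and that the retention rate is the cross-dataset IoU divided by the in-domain IoU. The function names and all numeric inputs are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(iou_per_run: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of IoU over repeated runs.

    Assumption: sample (n - 1) standard deviation; the study may use another form.
    """
    return mean(iou_per_run), stdev(iou_per_run)


def retention_rate(cross_dataset_iou: float, in_domain_iou: float) -> float:
    """Percentage of in-domain IoU retained on the unseen dataset.

    Assumption: retention = cross-dataset IoU / in-domain IoU * 100.
    """
    if in_domain_iou <= 0:
        raise ValueError("in_domain_iou must be positive")
    return 100.0 * cross_dataset_iou / in_domain_iou


# Hypothetical IoU values (in %) from three runs of one model; not study data.
run_mean, run_std = summarize_runs([88.0, 89.0, 90.0])
print(f"IoU: {run_mean:.2f}% ± {run_std:.2f}%")

# Hypothetical in-domain and cross-dataset IoU for the same model.
print(f"Retention: {retention_rate(60.0, 80.0):.2f}%")
```

Under this reading, a retention rate near 70% means a model keeps roughly seven tenths of its in-domain IoU when tested on a dataset it was not trained on.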
CONCLUSION
Hierarchical architectures such as Swin-UMamba and SegFormer show promise for UGI disease segmentation and may help reduce missed diagnoses and improve workflows, but robust clinical validation is crucial before real-world deployment.
Core Tip: This study evaluates 17 advanced deep learning models, including convolutional neural network-, transformer-, and mamba-based architectures, for multi-class upper gastrointestinal disease segmentation. Swin-UMamba achieves the highest segmentation accuracy, while SegFormer balances efficiency and performance. Automated segmentation demonstrates significant clinical value by improving diagnostic precision, reducing missed diagnoses, streamlining treatment planning, and easing physician workload. Key challenges include lighting variability, vague lesion boundaries, multi-label complexities, and dataset limitations. Future directions, such as multi-modal learning, self-supervised techniques, spatio-temporal modeling, and rigorous clinical validation, are essential to enhance model robustness and ensure applicability in diverse healthcare settings for better patient outcomes.