Published online Aug 21, 2025. doi: 10.3748/wjg.v31.i31.109389
Revised: May 22, 2025
Accepted: July 25, 2025
Processing time: 101 Days and 19 Hours
In 2025, Shi et al constructed a model utilizing machine learning techniques to predict the one-year recurrence of colorectal polyps following endoscopic mucosal resection, showing excellent discriminatory performance with an area under the curve exceeding 0.90. However, limitations exist regarding its narrow temporal scope, potential overestimation due to feature collinearity and imputation opacity, and limited generalizability due to single-center derivation and validation. Moreover, no clear clinical implementation strategy was outlined. Prospective multicenter validation and integration of endoscopist variability, longitudinal outcome data, and deployment mechanisms are warranted to ensure broader applicability and clinical utility.
Core Tip: This letter provides a critical appraisal of a recent machine learning model designed to predict colorectal polyp recurrence after endoscopic mucosal resection. It highlights key methodological issues, such as endpoint selection, imputation trans
- Citation: Li GY, Zhai LL. Insights into a machine learning-based prediction model for colorectal polyp recurrence after endoscopic mucosal resection. World J Gastroenterol 2025; 31(31): 109389
- URL: https://www.wjgnet.com/1007-9327/full/v31/i31/109389.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i31.109389
We read with interest the study by Shi et al[1], who introduced and assessed a machine learning (ML) model aimed at predicting one-year recurrence of colorectal polyps following endoscopic mucosal resection (EMR). While the model is an encouraging step toward precision surveillance, several important limitations and enhancement opportunities should be discussed to improve both methodological robustness and clinical relevance.
First, the study’s endpoint, one-year recurrence, is somewhat narrow in scope. International guidelines typically recommend surveillance intervals of 3 years or longer after adenoma removal, reserving earlier examinations for selected high-risk situations[2,3]. These recommendations reflect a risk-stratified approach to surveillance. While the authors note that recurrence risk peaks within one year in their Chinese cohort, adding multi-year outcomes (e.g., 3-year recurrence) would strengthen clinical applicability and harmonize the model’s scope with international standards[2,3].
Second, although the model’s performance metrics, particularly the area under the curve of the XGBoost model (> 0.90), are impressive, they may be overoptimistic because of possible data leakage and unaddressed confounders[4,5]. The use of SHapley Additive exPlanations (SHAP) to interpret feature importance is commendable[6], but the authors do not discuss collinearity or redundancy among the included features. Notably, some predictors (e.g., polyp number and size) are closely interrelated and may have inflated apparent model performance. Moreover, the handling of missing data by multiple imputation warrants more transparency, especially for biomarkers with > 15% missingness. Proper documentation of imputation methods and sensitivity analyses would enhance the study’s reproducibility and credibility. As highlighted in recent methodological studies, robust imputation techniques and transparent reporting are essential to reduce bias and maintain model validity in clinical prediction research[4,5].
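The collinearity concern can be checked with a standard diagnostic. The following is a minimal sketch, not the authors' analysis: it computes variance inflation factors (VIFs) with NumPy on synthetic data in which polyp number and size are correlated by construction. The variable names, the simulated distributions, and the choice of VIF as the diagnostic are our assumptions for illustration; none of the values come from the cohort of Shi et al[1].

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = patients, columns = predictors).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)


# Hypothetical synthetic predictors (illustrative only, not study data).
rng = np.random.default_rng(0)
n = 500
polyp_number = rng.poisson(2.0, n) + 1
# Size is made to depend on polyp number, so the two are collinear by design.
largest_size_mm = 5.0 + 2.0 * polyp_number + rng.normal(0.0, 1.0, n)
age = rng.normal(60.0, 10.0, n)  # generated independently of the other two

X = np.column_stack([polyp_number, largest_size_mm, age])
vif = variance_inflation_factors(X)
for name, value in zip(["polyp_number", "largest_size_mm", "age"], vif):
    print(f"{name}: VIF = {value:.2f}")
```

In this synthetic example the two interrelated predictors show clearly elevated VIFs, while the independent predictor stays close to 1. A commonly cited rule of thumb treats values above roughly 5 to 10 as a signal of problematic collinearity; reporting such a diagnostic alongside feature importance would let readers judge how much redundancy is present.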
Third, the generalizability of the model remains limited. All data were derived from a single geographic region in China, and external validation was conducted only on a modest sample from the same institution[7]. The absence of international, multi-center validation raises concerns about overfitting to local practice patterns. Additionally, the model does not account for endoscopist-related variability, which is a known determinant of recurrence[7].
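To illustrate why a modest validation sample limits confidence in the reported discrimination, the sketch below estimates a percentile bootstrap confidence interval for the area under the curve in a small and a large simulated cohort. It is a minimal example under stated assumptions: the cohort sizes (60 and 1000), the 30% recurrence rate, the score distribution, and the function names are all hypothetical and are not taken from Shi et al[1].

```python
import numpy as np


def auc_rank(y_true, y_score):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    y_true = np.asarray(y_true).astype(bool)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y_true], s[~y_true]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


def bootstrap_auc_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)
        yt = y_true[idx]
        if yt.min() == yt.max():  # skip resamples containing a single class
            continue
        aucs.append(auc_rank(yt, y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def simulate_cohort(n, seed):
    """Hypothetical cohort: 30% recurrence, scores shifted upward for recurrences."""
    rng = np.random.default_rng(seed)
    y = (np.arange(n) < int(0.3 * n)).astype(int)
    score = 1.5 * y + rng.normal(0.0, 1.0, n)
    return y, score


y_small, s_small = simulate_cohort(60, seed=1)
y_large, s_large = simulate_cohort(1000, seed=2)
ci_small = bootstrap_auc_ci(y_small, s_small)
ci_large = bootstrap_auc_ci(y_large, s_large)
width_small = ci_small[1] - ci_small[0]
width_large = ci_large[1] - ci_large[0]
print(f"n=60   AUC 95% CI: {ci_small[0]:.3f} to {ci_small[1]:.3f}")
print(f"n=1000 AUC 95% CI: {ci_large[0]:.3f} to {ci_large[1]:.3f}")
```

With the same underlying score quality, the smaller cohort yields a much wider interval. Reporting interval estimates of this kind for the validation set would make the uncertainty of a single-center validation explicit.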
Lastly, although the authors developed an online calculator based on the XGBoost model, the clinical implementation pathway remains vague. How this tool would integrate with existing surveillance guidelines and electronic health records[8,9], or whether clinicians would realistically adopt such a model, is not addressed. A decision impact analysis or user-acceptability study would be useful next steps[10].
In summary, the authors present an important advancement in post-polypectomy risk stratification using ML. However, additional methodological transparency, broader validation, and implementation planning are essential for translating this promising tool into clinical practice.
While the ML model developed by Shi et al[1] represents a promising tool for predicting colorectal polyp recurrence, its current limitations, such as narrow endpoint selection, limited external validation, and incomplete methodological transparency, highlight the need for further refinement. Future prospective, multi-center studies and the incorporation of real-world clinical variables will be critical for improving the model’s robustness, generalizability, and clinical integration.
1. Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H, Fei SJ. Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection. World J Gastroenterol. 2025;31:102387.
2. Hassan C, Antonelli G, Dumonceau JM, Regula J, Bretthauer M, Chaussade S, Dekker E, Ferlitsch M, Gimeno-Garcia A, Jover R, Kalager M, Pellisé M, Pox C, Ricciardiello L, Rutter M, Helsingen LM, Bleijenberg A, Senore C, van Hooft JE, Dinis-Ribeiro M, Quintero E. Post-polypectomy colonoscopy surveillance: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2020. Endoscopy. 2020;52:687-700.
3. Lieberman DA, Rex DK, Winawer SJ, Giardiello FM, Johnson DA, Levin TR. Guidelines for colonoscopy surveillance after screening and polypectomy: a consensus update by the US Multi-Society Task Force on Colorectal Cancer. Gastroenterology. 2012;143:844-857.
4. Sisk R, Sperrin M, Peek N, van Smeden M, Martin GP. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study. Stat Methods Med Res. 2023;32:1461-1477.
5. Tsvetanova A, Sperrin M, Jenkins D, Peek N, Buchan I, Hyland S, Martin G. Compatibility in Missing Data Handling Across the Prediction Model Pipeline: A Simulation Study. Stud Health Technol Inform. 2024;310:1476-1477.
6. Guan C, Gong A, Zhao Y, Yin C, Geng L, Liu L, Yang X, Lu J, Xiao B. Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit Care. 2024;28:349.
7. Kaminski MF, Regula J, Kraszewska E, Polkowski M, Wojciechowska U, Didkowska J, Zwierko M, Rupinski M, Nowacki MP, Butruk E. Quality indicators for colonoscopy and the risk of interval cancer. N Engl J Med. 2010;362:1795-1803.
8. US Preventive Services Task Force, Davidson KW, Barry MJ, Mangione CM, Cabana M, Caughey AB, Davis EM, Donahue KE, Doubeni CA, Krist AH, Kubik M, Li L, Ogedegbe G, Owens DK, Pbert L, Silverstein M, Stevermer J, Tseng CW, Wong JB. Screening for Colorectal Cancer: US Preventive Services Task Force Recommendation Statement. JAMA. 2021;325:1965-1977.
9. Hewitson P, Glasziou P, Watson E, Towler B, Irwig L. Cochrane systematic review of colorectal cancer screening using the fecal occult blood test (hemoccult): an update. Am J Gastroenterol. 2008;103:1541-1549.
10. Kahi CJ, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, Lieberman D, Levin TR, Robertson DJ, Rex DK; United States Multi-Society Task Force on Colorectal Cancer. Colonoscopy Surveillance After Colorectal Cancer Resection: Recommendations of the US Multi-Society Task Force on Colorectal Cancer. Gastroenterology. 2016;150:758-768.e11.