Retrospective Study Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Mar 21, 2025; 31(11): 102387
Published online Mar 21, 2025. doi: 10.3748/wjg.v31.i11.102387
Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection
Yi-Heng Shi, Jun-Liang Liu, Cong-Cong Cheng, Wen-Ling Li, Su-Juan Fei, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Yi-Heng Shi, Cong-Cong Cheng, Wen-Ling Li, The First Clinical Medical College of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Han Sun, Xi-Liang Zhou, Department of Gastroenterology, Xuzhou Central Hospital, The Affiliated Xuzhou Hospital of Medical College of Southeast University, Xuzhou 221009, Jiangsu Province, China
Hong Wei, Department of Gastroenterology, Xuzhou New Health Hospital, North Hospital of Xuzhou Cancer Hospital, Xuzhou 221007, Jiangsu Province, China
ORCID number: Su-Juan Fei (0000-0002-9753-0803).
Author contributions: Shi YH and Liu JL conceived and designed the study; Shi YH, Liu JL, Cheng CC, Li WL and Sun H participated in data processing and statistical analysis; Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H and Fei SJ drafted the manuscript; Shi YH and Liu JL contributed to data analysis and interpretation; Fei SJ supervised the review of the study; All authors critically revised and approved the final manuscript.
Institutional review board statement: The study was designed as per the Declaration of Helsinki and was conducted according to the TRIPOD guidelines, with ethical approval granted by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University under the approval number XYFY2023-KL360-01.
Informed consent statement: Written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Su-Juan Fei, MD, Chief Physician, Professor, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, No. 99 West Huaihai Road, Xuzhou 221002, Jiangsu Province, China. xyfyfeisj99@163.com
Received: October 16, 2024
Revised: January 25, 2025
Accepted: February 14, 2025
Published online: March 21, 2025
Processing time: 148 Days and 5.6 Hours

Abstract
BACKGROUND

Colorectal polyps are precursor lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR.

AIM

To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR.

METHODS

This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, 166 patients were enrolled to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal model was selected based on multiple performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance.

RESULTS

Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model.

CONCLUSION

The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.

Key Words: Colorectal polyps; Machine learning; Predictive model; Risk factors; SHapley Additive exPlanation

Core Tip: This study is the first to use machine learning methods to construct and validate a prediction model for one-year recurrence of colorectal polyps after endoscopic mucosal resection. Key predictors included age, smoking, family history, diarrhea, hazard classification, Helicobacter pylori infection, and the number and size of polyps. According to receiver operating characteristic curves, sensitivity, specificity, accuracy, precision, and F1 scores, the eXtreme Gradient Boosting model had the best performance. Based on this model, an online web calculator was built to help clinicians better distinguish high-risk groups and provide patients with personalized colonoscopy follow-up recommendations.



INTRODUCTION

Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths[1]. According to the latest cancer statistics[2], over 150000 new cases of CRC were diagnosed in the United States in 2022, with more than 50000 deaths. Colorectal polyps are precursors to CRC, with 60%-85% of sporadic CRCs evolving through the conventional adenoma-carcinoma pathway[3], while a smaller portion arises from the serrated or inflammatory pathways[4]. Therefore, colonoscopy screening and early endoscopic polypectomy are effective in preventing CRC by interrupting the polyp-to-cancer progression and reducing patient mortality[5,6].

Endoscopic mucosal resection (EMR) is one of the most commonly used methods for polyp removal, offering safety, efficiency, and cost-effectiveness for the excision of most colorectal lesions[7]. However, a major limitation of EMR is its relatively high recurrence rate. A meta-analysis of 33 studies reported an average recurrence rate of 15%, with rates as high as 50%[8]. As a result, patients require regular colonoscopic follow-up after EMR. According to current guidelines, a 3-year interval for colonoscopic surveillance is generally recommended after the removal of most adenomatous polyps[9-11]. However, these guidelines are primarily based on baseline characteristics of polyps, such as size, morphology, and pathology. In fact, a variety of factors influence polyp recurrence after removal. Previous studies have shown that patient age, sex, smoking, alcohol consumption, family history, body mass index (BMI), and Helicobacter pylori (H. pylori) infection are associated with polyp recurrence[12,13]. Therefore, some patients with a high risk of recurrence may require shorter surveillance intervals. In China, a large multicenter study found that recurrence after colorectal polyp removal is concentrated almost entirely in the first year, with a recurrence rate approaching 60%[14]. In addition, the surveillance intervals recommended by Chinese expert consensus for colonoscopy are significantly shorter than those in international guidelines. For example, the Expert consensus on management strategies for precancerous lesions and conditions of CRC in China[15] suggests that for most adenomas, one year after polypectomy can serve as the starting time for colonoscopic follow-up. Thus, first-year follow-ups may be more valuable than second- or third-year follow-ups.

Recently, machine learning (ML) has gained widespread attention in medicine. ML-based clinical models have demonstrated significant advantages in disease prediction, risk assessment, diagnostic assistance, and patient management[16-18]. Therefore, this study was carried out to develop a novel clinical predictive model using ML methods to explore the risk of colorectal polyp recurrence within one year post-EMR, so as to help stratify high-risk patients and provide individualized colonoscopy monitoring strategies.

MATERIALS AND METHODS
Patient source and ethics statement

This is a multicenter retrospective study with prospective validation. Data were collected from 1694 patients who underwent their first EMR for colorectal polyp removal in the Departments of Gastroenterology at the Affiliated Hospital of Xuzhou Medical University, Xuzhou Central Hospital, and Xuzhou New Health Geriatric Hospital between September 2018 and August 2023, with a one-year follow-up colonoscopy. Additionally, 166 patients treated at the Affiliated Hospital of Xuzhou Medical University between September 2023 and September 2024 were prospectively enrolled. The study was designed as per the Declaration of Helsinki and was conducted according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guidelines[19], with ethical approval granted by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University under the approval number XYFY2023-KL360-01. Written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.

Inclusion criteria and exclusion criteria

Inclusion criteria: (1) First diagnosis of colorectal polyps and completed EMR of all visible polyps; (2) Age ≥ 18, successful completion of gastrointestinal endoscopy, and a one-year follow-up colonoscopy after the procedure; (3) All resected polyps underwent pathological examination; (4) Adequate bowel preparation, with a Boston Bowel Preparation Scale score of 6-9; (5) Endoscopists had at least three years of experience in endoscopic treatment, with an annual EMR case volume exceeding 300; and (6) Complete clinical data, including patient medical records, endoscopy reports, laboratory tests, and pathology results.

Exclusion criteria: (1) Previous history of colorectal polyp resection or colorectal surgery; (2) Incomplete clinical data or missing follow-up information; (3) Poor bowel preparation, severely impairing observation and procedure, or failure to complete full colonoscopy due to inability to reach the cecum; (4) Patients who did not undergo follow-up colonoscopy one year after EMR; (5) Diagnosed with familial polyposis, inflammatory bowel disease, malignancies, hematologic diseases, or those who had undergone biologic therapy, chemotherapy, or radiotherapy; or (6) Patients with severe heart, liver, lung or kidney diseases, infections, pregnancy, or cachexia.

Research variables

General data: (1) Patient demographic information: Sex, age, body mass index (BMI); and (2) Clinical symptoms and medical history: Diarrhea, constipation, hematochezia, hypertension, diabetes, coronary heart disease (CHD), cigarette preference, alcohol preference, hyperlipidemia, family history, and H. pylori infection.

Laboratory tests: Serum levels of uric acid (UA), total bilirubin (TBIL), total bile acid (TBA), hypersensitive C-reactive protein (hsCRP), carcinoembryonic antigen, and carbohydrate antigens (CA724, CA199, CA242) were tested.

Endoscopy data: Complete records of bowel preparation quality, cecal intubation success, colorectal polyp location, size, number, endoscopic morphology, pathological findings, and the presence of gastric polyps were documented. High-quality endoscopic images were also included. The endoscopes used in this study were purchased from Olympus, Japan, models CF-H290I and CF-H290I, as well as from Fujifilm, Japan, models EC-601WM and EC-760R-VM.

Polyp-related definitions

In this study, polyp locations were classified into those at proximal colon, distal colon, and whole colon (proximal and distal)[20]. Endoscopic classification was categorized according to the Japanese Yamada classification system[21] into Types I, II, III, and IV. Pathological types were classified into non-neoplastic and neoplastic polyps. Non-neoplastic polyps included inflammatory polyps, hyperplastic polyps, and hamartomatous polyps, while neoplastic polyps included tubular adenomas, villous adenomas, tubulovillous adenomas, sessile serrated lesions, and traditional serrated adenomas. Hazard classification was based on pathology. Non-progressive adenomas were defined as tubular adenomas < 10 mm in size without high-grade dysplasia. Progressive adenomas were characterized by adenoma size ≥ 10 mm, or tubulovillous/villous adenomas, or adenomas with high-grade dysplasia. Multiple polyps were defined as the presence of ≥ 2 polyps, with the largest polyp and highest pathological grade used for characterization.
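The hazard classification rules above can be written as a small decision function. This is a minimal Python sketch, not the authors' code: the function name, argument names, and histology labels are hypothetical, and because the text does not say how serrated lesions were graded, the sketch simply treats any neoplastic histology other than tubulovillous or villous like a tubular adenoma.

```python
def hazard_class(is_neoplastic, size_mm, histology, high_grade_dysplasia):
    """Map one index polyp to the three hazard classes described in the text.

    Assumption: `histology` is a lowercase label such as "tubular",
    "tubulovillous", or "villous" (hypothetical encoding).
    """
    if not is_neoplastic:
        return "non-neoplastic polyp"
    # Any single criterion is sufficient for the progressive class:
    # size >= 10 mm, a villous component, or high-grade dysplasia.
    if (
        size_mm >= 10
        or histology in {"tubulovillous", "villous"}
        or high_grade_dysplasia
    ):
        return "progressive adenoma"
    # Otherwise: < 10 mm and no high-grade dysplasia.
    return "non-progressive adenoma"


print(hazard_class(True, 6, "tubular", False))   # small tubular adenoma
print(hazard_class(True, 12, "tubular", False))  # size criterion met
```

For patients with multiple polyps, the text characterizes the patient by the largest polyp and the highest pathological grade, so a function like this would be applied to those patient-level values.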

Polyp outcome evaluation

Polyp recurrence was assessed through electronic medical records, the endoscopy workstation, and telephone follow-up to determine whether the patient underwent a colonoscopy one year after colorectal polyp removal. Recurrence was defined as the discovery of new polyps at the original site (local recurrence) or metachronous distant polyps in the colorectal region during follow-up colonoscopy[22]. The definitions, evaluation criteria, and data assignment standards for the feature variables in this study are detailed in Supplementary Table 1.

Modeling methods

Data preprocessing: Data cleaning and imputation were performed. In this study, some laboratory indicators (TBIL, TBA, hsCRP) had missing data. The retrospective dataset included data from 1694 patients, with missing TBIL, TBA, and hsCRP values for 113, 292, and 332 patients, accounting for 6.67%, 17.24%, and 19.60%, respectively. Multiple imputation was performed on the missing data using the "mice" package in R software to create a complete dataset. The missing data situation is illustrated with a bar chart (Supplementary Figure 1).
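The missing-data percentages follow directly from the reported counts. The short Python check below is only an arithmetic illustration (the study itself used R); the variable names are hypothetical, and the 20% cut-off is the imputation eligibility threshold stated in the Statistical analysis section.

```python
N_PATIENTS = 1694
missing_counts = {"TBIL": 113, "TBA": 292, "hsCRP": 332}


def missing_percent(n_missing, n_total):
    """Percentage of missing values, rounded to two decimals."""
    return round(100 * n_missing / n_total, 2)


percentages = {
    name: missing_percent(count, N_PATIENTS)
    for name, count in missing_counts.items()
}
# Variables below the 20% threshold are eligible for imputation.
eligible = [name for name, pct in percentages.items() if pct < 20]

print(percentages)
print(eligible)
```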

Model construction and evaluation: The retrospective dataset was randomly divided into a training set and a validation set in a 7:3 ratio. The training set was used to develop the models, and the validation set was used to evaluate their performance and detect overfitting. Additionally, a prospective cohort served as a test set to assess the models' generalizability. Univariate and multivariate logistic regression (LR) analyses were used for feature variable screening in the training set. First, univariate regression analysis was performed on each feature variable independently, and variables with a P value < 0.05 were entered into the multivariate regression. The P values of all selected variables were subjected to false discovery rate (FDR) correction to limit false positives. After multivariate analysis, risk factors with a P value < 0.05 were retained as the final feature variables. Considering the potential interactions between variables, and to improve the reliability of the model, we further conducted collinearity and correlation analyses. Five ML algorithms, namely LR, Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost), were employed to construct the predictive models. The performance of the models was assessed using receiver operating characteristic (ROC) curve analysis and the area under the ROC curve (AUC). Sensitivity, specificity, accuracy, precision, and F1 scores were also calculated to further compare model performance. Decision curve analysis (DCA) was performed to evaluate the clinical utility of the models. Finally, an interactive web-based calculator was developed using the Shiny framework.
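The text does not state which FDR procedure was applied. As an illustration only, the Python sketch below implements the widely used Benjamini-Hochberg adjustment; the choice of this procedure, the function name, and the example P values are assumptions, not details taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four candidate variables.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A variable would then be carried forward only if its adjusted P value stays below the chosen significance level.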

Model feature interpretation: SHapley Additive exPlanations (SHAP) analysis was used to interpret the best-performing black-box model[23]. Feature importance was determined by the mean absolute SHAP value for each feature, and SHAP values for each feature in all samples were plotted to understand overall patterns and impact across the dataset.
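Ranking features by mean absolute SHAP value amounts to a column-wise average of absolute contributions. The Python sketch below shows only that aggregation step on made-up numbers; the SHAP values, feature names, and function name are hypothetical, and computing the SHAP values themselves is outside its scope.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, largest first."""
    n_samples = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per sample, one column per feature.
features = ["age", "family_history", "smoking"]
shap_values = [
    [0.2, -0.5, 0.9],
    [-0.4, 0.1, -0.7],
    [0.3, 0.6, 0.8],
]
ranking = mean_abs_shap(shap_values, features)
print(ranking)
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel.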

Statistical analysis

All data analyses and graphical plots for model construction were completed using R v4.3.2. To minimize bias caused by sample exclusion, the percentage of missing values was calculated for each continuous variable. For variables with less than 20% missing data, multiple imputation based on RFs was applied to predict the missing values using the R package “mice”. Five imputed results were generated, and the average of these five predictions was used as the final value. The ML models were constructed using the R packages “rpart”, “randomForest”, “e1071”, and “xgboost”. Statistical analysis was performed using the R package “tableone”. Categorical data were presented as numbers and percentages, while normally distributed continuous variables were expressed as mean ± SD. Non-normally distributed continuous variables were expressed as median and interquartile range. To compare differences between groups, the χ2 test was used for categorical variables, while the t-test or Wilcoxon rank-sum test was employed for continuous variables, depending on normality. P < 0.05 was considered statistically significant.

RESULTS
Baseline characteristics

Based on the inclusion and exclusion criteria, a total of 1694 patients were included in the retrospective study, with 742 patients (43.8%) in the non-recurrence group and 952 patients (56.2%) in the recurrence group. Among the patients, 60.6% were males and 39.4% were females. A total of 1071 non-older patients (< 60 years) and 623 older patients (≥ 60 years) were included. Patients were randomly divided into a training set (n = 1186) and a validation set (n = 508) in a 7:3 ratio. The median age of patients in the training set was 56.00 (49.00, 63.00), and the median age of patients in the validation set was 56.50 (49.00, 64.00). Comparison of baseline characteristics between the two sets showed no significant differences for most variables, except for hypertension (P = 0.018) and concomitant gastric polyps (P = 0.022), suggesting that the clinical data of the two cohorts were balanced overall. Additionally, 166 patients were enrolled as a prospective cohort to validate the model. Possibly owing to the limited sample size, there were statistically significant differences between the prospective validation set and the training set in polyp location (P = 0.023), size (P < 0.001), endoscopic classification (P < 0.001), hazard classification (P = 0.002), concomitant gastric polyp (P = 0.001), and TBIL (P = 0.001). The demographic and clinical characteristics of the patients are provided in Table 1. The study design is illustrated in Figure 1.
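The set sizes are consistent with a 7:3 split of 1694 patients (0.7 × 1694 = 1185.8, which rounds to 1186). A minimal Python sketch of such a random split follows; the seed, the function name, and the use of simple unstratified shuffling are assumptions, since the text does not describe how the randomization was carried out.

```python
import random


def split_indices(n, train_fraction=0.7, seed=2025):
    """Shuffle 0..n-1 reproducibly and cut at the training fraction."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed value is hypothetical
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(1694)
print(len(train_idx), len(valid_idx))
```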

Figure 1 Flowchart of the study design. EMR: Endoscopic mucosal resection; LR: Logistic Regression; DT: Decision Trees; RF: Random Forest; SVM: Support Vector Machine; XGBoost: eXtreme Gradient Boosting; ROC: Receiver operating characteristic; DCA: Decision curve analysis; SHAP: SHapley Additive exPlanations.
Table 1 Baseline demographic and clinicopathological characteristics of all patients, n (%).
| Variables | Training set (n = 1186) | Validation set (n = 508) | Prospective set (n = 166) | P value |
| Gender | | | | |
|     Female | 474 (40.0) | 194 (38.2) | 67 (40.4) | 0.769 |
|     Male | 712 (60.0) | 314 (61.8) | 99 (59.6) | |
| Age, median (IQR) | 56.00 (49.00, 63.00) | 56.50 (49.00, 64.00) | 56.00 (50.00, 63.00) | 0.948 |
| BMI, median (IQR) | 24.20 (22.31, 26.40) | 24.32 (22.49, 26.47) | 24.22 (22.02, 26.67) | 0.445 |
| Hypertension | | | | |
|     No | 924 (77.9) | 368 (72.4) | 128 (77.1) | 0.051 |
|     Yes | 262 (22.1) | 140 (27.6) | 38 (22.9) | |
| Diabetes | | | | |
|     No | 1057 (89.1) | 448 (88.2) | 143 (86.1) | 0.497 |
|     Yes | 129 (10.9) | 60 (11.8) | 23 (13.9) | |
| CHD | | | | |
|     No | 1117 (94.2) | 468 (92.1) | 153 (92.2) | 0.231 |
|     Yes | 69 (5.8) | 40 (7.9) | 13 (7.8) | |
| Family history | | | | |
|     No | 1088 (91.7) | 479 (94.3) | 149 (89.8) | 0.089 |
|     Yes | 98 (8.3) | 29 (5.7) | 17 (10.2) | |
| Cigarette preference | | | | |
|     No | 972 (82.0) | 413 (81.3) | 137 (82.5) | 0.921 |
|     Yes | 214 (18.0) | 95 (18.7) | 29 (17.5) | |
| Alcohol preference | | | | |
|     No | 971 (81.9) | 422 (83.1) | 133 (80.1) | 0.669 |
|     Yes | 215 (18.1) | 86 (16.9) | 33 (19.9) | |
| Constipation | | | | |
|     No | 1086 (91.6) | 470 (92.5) | 146 (88.0) | 0.185 |
|     Yes | 100 (8.4) | 38 (7.5) | 20 (12.0) | |
| Diarrhea | | | | |
|     No | 918 (77.4) | 399 (78.5) | 128 (77.1) | 0.860 |
|     Yes | 268 (22.6) | 109 (21.5) | 38 (22.9) | |
| Hematochezia | | | | |
|     No | 1112 (93.8) | 479 (94.3) | 160 (96.4) | 0.397 |
|     Yes | 74 (6.2) | 29 (5.7) | 6 (3.6) | |
| Anatomical location | | | | |
|     Proximal colon | 261 (22.0) | 117 (23.0) | 54 (32.5) | 0.023 |
|     Distal colon | 505 (42.6) | 208 (40.9) | 69 (41.6) | |
|     Total colon | 420 (35.4) | 183 (36.0) | 43 (25.9) | |
| Number of polyps | | | | |
|     < 3 | 672 (56.7) | 276 (54.3) | 106 (63.9) | 0.099 |
|     ≥ 3 | 514 (43.3) | 232 (45.7) | 60 (36.1) | |
| Number of adenomas | | | | |
|     0 | 392 (33.1) | 150 (29.5) | 57 (34.3) | 0.542 |
|     1-2 | 545 (46.0) | 237 (46.7) | 74 (44.6) | |
|     ≥ 3 | 249 (21.0) | 121 (23.8) | 35 (21.1) | |
| Size (cm) | | | | |
|     < 0.5 | 400 (33.7) | 157 (30.9) | 118 (71.1) | < 0.001 |
|     0.5-1 | 627 (52.9) | 266 (52.4) | 37 (22.3) | |
|     > 1 | 159 (13.4) | 85 (16.7) | 11 (6.6) | |
| Endoscopic classification | | | | |
|     I | 512 (43.2) | 201 (39.6) | 99 (59.6) | < 0.001 |
|     II | 442 (37.3) | 191 (37.6) | 54 (32.5) | |
|     III-IV | 232 (19.6) | 116 (22.8) | 13 (7.8) | |
| Hazard classification | | | | |
|     Non-neoplastic polyps | 394 (33.2) | 152 (29.9) | 72 (43.4) | 0.002 |
|     Non-progressive adenoma | 564 (47.6) | 250 (49.2) | 79 (47.6) | |
|     Progressive adenoma | 228 (19.2) | 106 (20.9) | 15 (9.0) | |
| Concomitant gastric polyp | | | | |
|     No | 761 (64.2) | 356 (70.1) | 127 (76.5) | 0.001 |
|     Yes | 425 (35.8) | 152 (29.9) | 39 (23.5) | |
| H. pylori | | | | |
|     No | 699 (58.9) | 308 (60.6) | 104 (62.7) | 0.586 |
|     Yes | 487 (41.1) | 200 (39.4) | 62 (37.3) | |
| Hyperlipidemia | | | | |
|     No | 810 (68.3) | 341 (67.1) | 117 (70.5) | 0.714 |
|     Yes | 376 (31.7) | 167 (32.9) | 49 (29.5) | |
| Uric acid levels | | | | |
|     Normal | 1105 (93.2) | 475 (93.5) | 153 (92.2) | 0.839 |
|     Elevated | 81 (6.8) | 33 (6.5) | 13 (7.8) | |
| TBIL, median (IQR) | 11.90 (9.30, 15.78) | 12.00 (9.30, 15.62) | 9.90 (5.90, 13.97) | < 0.001 |
| TBA, median (IQR) | 3.00 (1.80, 4.68) | 3.30 (1.90, 4.93) | 2.90 (1.80, 5.60) | 0.149 |
| hsCRP, median (IQR) | 0.60 (0.50, 1.30) | 0.60 (0.50, 1.30) | 0.50 (0.50, 1.40) | 0.775 |
| CEA | | | | |
|     Normal | 1150 (97.0) | 496 (97.6) | 161 (97.0) | 0.741 |
|     Elevated | 36 (3.0) | 12 (2.4) | 5 (3.0) | |
| CA724 | | | | |
|     Normal | 1137 (95.9) | 487 (95.9) | 155 (93.4) | 0.323 |
|     Elevated | 49 (4.1) | 21 (4.1) | 11 (6.6) | |
| CA199 | | | | |
|     Normal | 1160 (97.8) | 499 (98.2) | 163 (98.2) | 0.833 |
|     Elevated | 26 (2.2) | 9 (1.8) | 3 (1.8) | |
| CA242 | | | | |
|     Normal | 1154 (97.3) | 491 (96.7) | 159 (95.8) | 0.492 |
|     Elevated | 32 (2.7) | 17 (3.3) | 7 (4.2) | |

Feature screening of risk factors for colorectal polyp recurrence

Univariate analysis showed that 17 variables were associated with polyp recurrence one year after EMR (P < 0.05), including sex, age, BMI, hypertension, CHD, cigarette preference, family history, diarrhea, polyp location, polyp number, polyp size, number of adenomas, endoscopic classification, hazard classification, H. pylori infection, hyperlipidemia, and serum UA levels. The P values of these variables were subjected to FDR correction (Supplementary Table 2). Entering these variables into the multivariate analysis revealed that eight were independent predictors of colorectal polyp recurrence one year after EMR. These variables included age [odds ratio (OR) = 1.05; 95%CI: 1.03-1.06], family history (none as reference, present: OR = 11.34; 95%CI: 5.09-25.26), cigarette preference (none as reference, present: OR = 3.92; 95%CI: 2.50-6.14), diarrhea (none as reference, present: OR = 1.42; 95%CI: 1.02-1.99), polyp size (< 0.5 cm as reference, 0.5-1 cm: OR = 2.05; 95%CI: 1.50-2.80; > 1 cm: OR = 3.98; 95%CI: 2.02-7.86), number of polyps (< 3 as reference, ≥ 3: OR = 1.54; 95%CI: 1.07-2.21), H. pylori infection (none as reference, present: OR = 1.82; 95%CI: 1.37-2.42), and hazard classification (non-neoplastic polyps as reference, non-progressive adenoma: OR = 4.48; 95%CI: 1.02-19.73; progressive adenoma: OR = 5.29; 95%CI: 1.07-26.20). Details can be found in Table 2. We calculated the variance inflation factor (VIF) for all candidate features. All variables had VIF values < 5, confirming that no severe multicollinearity existed among them (Supplementary Table 3). We also examined the correlation coefficients. All correlation coefficients (r) were < 0.7, suggesting that there is no strong correlation between variables (Supplementary Figure 2).
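To read these odds ratios on the scale on which logistic regression estimates them, an OR and its confidence interval can be converted back to a log-odds coefficient and an approximate standard error. The Python sketch below assumes the reported intervals are Wald intervals (symmetric on the log scale, critical value 1.96) and that age was entered in years; neither assumption is stated in the text, and the function names are hypothetical.

```python
import math


def or_to_logit(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover (coefficient, standard error, z statistic) from an OR and its CI.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se, beta / se


def scale_or(odds_ratio, units):
    """OR for a change of `units` in a continuous predictor."""
    return odds_ratio ** units


# Family history: OR = 11.34 (95%CI: 5.09-25.26), as reported above.
beta, se, z = or_to_logit(11.34, 5.09, 25.26)
print(round(beta, 2), round(se, 2), round(z, 1))

# Age: OR = 1.05 per unit, so ten units compound multiplicatively.
print(round(scale_or(1.05, 10), 2))
```

Under these assumptions, the age effect compounds to roughly a 1.6-fold increase in the odds of recurrence per decade, which puts the seemingly small per-unit OR on the same footing as the binary predictors.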

Table 2 Univariate and multivariate logistic regression analysis of colorectal polyp recurrence 1 year after endoscopic mucosal resection.
| Variables | Univariable analysis, OR (95%CI) | Univariable analysis, P value | Multivariable analysis, OR (95%CI) | Multivariable analysis, P value |
| Age | 1.04 (1.03-1.05) | < 0.001a | 1.05 (1.03-1.06) | < 0.001a |
| Gender (%) | | | | |
|     Female | Reference | - | Reference | - |
|     Male | 1.73 (1.37-2.19) | < 0.001a | 0.94 (0.69-1.27) | 0.684 |
| BMI | 1.07 (1.03-1.11) | < 0.001a | 1.05 (1-1.1) | 0.056 |
| Hypertension (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.75 (1.31-2.33) | < 0.001a | 1.07 (0.75-1.52) | 0.722 |
| Diabetes (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.42 (0.98-2.08) | 0.067 | - | - |
| CHD (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 2.33 (1.34-4.04) | 0.003a | 1.41 (0.73-2.72) | 0.300 |
| Family history (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 10.07 (4.84-20.96) | < 0.001a | 11.34 (5.09-25.26) | < 0.001a |
| Cigarette preference (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 5.14 (3.5-7.54) | < 0.001a | 3.92 (2.50-6.14) | < 0.001a |
| Alcohol preference (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.34 (0.99-1.82) | 0.056 | - | - |
| Constipation (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.05 (0.69-1.58) | 0.831 | - | - |
| Diarrhea (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.40 (1.06-1.85) | 0.018a | 1.42 (1.02-1.99) | 0.038a |
| Hematochezia (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.24 (0.76-2.00) | 0.389 | - | - |
| Anatomical location (%) | | | | |
|     Proximal colon | Reference | - | Reference | - |
|     Distal colon | 1.08 (0.80-1.45) | 0.630 | 1.09 (0.77-1.05) | 0.635 |
|     Total colon | 2.16 (1.57-2.96) | < 0.001a | 0.92 (0.60-1.40) | 0.683 |
| Number of polyps (%) | | | | |
|     < 3 | Reference | - | Reference | - |
|     ≥ 3 | 3.10 (2.43-3.96) | < 0.001a | 1.54 (1.07-2.21) | 0.019a |
| Number of adenomas (%) | | | | |
|     0 | Reference | - | Reference | - |
|     1-2 | 2.02 (1.55-2.63) | < 0.001a | 0.44 (0.10-1.95) | 0.279 |
|     ≥ 3 | 8.96 (6.00-13.38) | < 0.001a | 1.01 (0.22-4.70) | 0.989 |
| Size, cm (%) | | | | |
|     < 0.5 | Reference | - | Reference | - |
|     0.5-1 | 2.18 (1.69-2.81) | < 0.001a | 2.05 (1.50-2.80) | < 0.001a |
|     > 1 | 6.65 (4.25-10.43) | < 0.001a | 3.98 (2.02-7.86) | < 0.001a |
| Endoscopic classification (%) | | | | |
|     I | Reference | - | Reference | - |
|     II | 1.40 (1.09-1.81) | 0.010a | 0.71 (0.50-1.02) | 0.066 |
|     III-IV | 3.32 (2.35-4.68) | < 0.001a | 0.60 (0.32-1.11) | 0.106 |
| Hazard classification (%) | | | | |
|     Non-neoplastic polyps | Reference | - | Reference | - |
|     Non-progressive adenoma | 2.47 (1.89-3.21) | < 0.001a | 4.48 (1.02-19.73) | 0.048a |
|     Progressive adenoma | 6.07 (4.17-8.84) | < 0.001a | 5.29 (1.07-26.20) | 0.041a |
| Concomitant gastric polyp (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 0.90 (0.71-1.15) | 0.397 | - | - |
| H. pylori (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.89 (1.49-2.4) | < 0.001a | 1.82 (1.37-2.42) | < 0.001a |
| Hyperlipidemia (%) | | | | |
|     No | Reference | - | Reference | - |
|     Yes | 1.48 (1.15-1.90) | 0.002a | 1.20 (0.87-1.64) | 0.263 |
| Uric acid levels (%) | | | | |
|     Normal | Reference | - | Reference | - |
|     Elevated | 2.08 (1.26-3.41) | 0.004a | 1.15 (0.63-2.09) | 0.646 |
| TBIL | 1.01 (1.00-1.04) | 0.259 | - | - |
| TBA | 1.02 (0.99-1.05) | 0.298 | - | - |
| hsCRP | 1.03 (0.98-1.06) | 0.164 | - | - |
| CEA (%) | | | | |
|     Normal | Reference | - | Reference | - |
|     Elevated | 1.10 (0.56-2.16) | 0.773 | - | - |
| CA724 (%) | | | | |
|     Normal | Reference | - | Reference | - |
|     Elevated | 0.96 (0.54-1.72) | 0.899 | - | - |
| CA199 (%) | | | | |
|     Normal | Reference | - | Reference | - |
|     Elevated | 0.92 (0.42-2.00) | 0.824 | - | - |
| CA242 (%) | | | | |
|     Normal | Reference | - | Reference | - |
|     Elevated | 0.60 (0.30-1.22) | 0.162 | - | - |

Construction and validation of the ML prediction model

The eight feature variables mentioned above were incorporated into the model construction. In the training set, the ROC-AUC values of the prediction models built using five ML algorithms are shown in Figure 2A: LR (AUC = 0.803, 95%CI: 0.779-0.828), DT (AUC = 0.754, 95%CI: 0.728-0.781), RF (AUC = 0.861, 95%CI: 0.84-0.881), SVM (AUC = 0.808, 95%CI: 0.784-0.832), and XGBoost (AUC = 0.909, 95%CI: 0.893-0.925). The models were validated in the validation set, with the results displayed in Figure 2B. The AUC values for the five ML models were as follows: LR (AUC = 0.81, 95%CI: 0.774-0.847), DT (AUC = 0.799, 95%CI: 0.761-0.837), RF (AUC = 0.902, 95%CI: 0.877-0.928), SVM (AUC = 0.819, 95%CI: 0.784-0.855), and XGBoost (AUC = 0.921, 95%CI: 0.898-0.944). In the prospective set (Figure 2C), the AUC values for each model were: LR (AUC = 0.779, 95%CI: 0.708-0.851), DT (AUC = 0.812, 95%CI: 0.746-0.877), RF (AUC = 0.943, 95%CI: 0.912-0.974), SVM (AUC = 0.791, 95%CI: 0.722-0.86), and XGBoost (AUC = 0.963, 95%CI: 0.938-0.988). Among the five ML algorithms, the XGBoost model performed the best.
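The AUC values compared above have a direct probabilistic reading: the AUC is the chance that a randomly chosen patient with recurrence receives a higher predicted score than a randomly chosen patient without recurrence. The Python sketch below computes it that way on made-up data; the labels, scores, and function name are hypothetical, and it does not reproduce the confidence intervals reported in the text.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = recurrence) and predicted probabilities.
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2]
print(roc_auc(y_true, y_score))
```

Because only the ordering of scores matters, the AUC says nothing about calibration, which is one reason threshold-based metrics and decision curve analysis are reported alongside it.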

Figure 2 Receiver operating characteristic curves of different models across various datasets. A: Training set; B: Validation set; C: Prospective set. LR: Logistic Regression; DT: Decision Trees; RF: Random Forest; SVM: Support Vector Machine; XGBoost: eXtreme Gradient Boosting; AUC: Area under the curve.
Evaluation of the ML prediction models

To further assess model performance, we calculated the sensitivity, specificity, accuracy, precision, and F1-score for each model based on the confusion matrix results. These metrics, combined with the AUC values, formed a comprehensive evaluation system (Table 3). A comparative analysis indicated that the XGBoost model had the best overall performance.
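The five threshold-based metrics are all simple ratios of confusion-matrix cells. The Python sketch below shows the formulas; the article does not report the confusion matrices, so the counts used here are an illustrative assumption, chosen because they sum to the 166 prospective patients and reproduce the XGBoost prospective-set row of Table 3 after rounding.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall for the recurrence class
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Assumed counts (not from the article): 68 + 5 + 80 + 13 = 166.
metrics = classification_metrics(tp=68, fp=5, tn=80, fn=13)
for name, value in metrics.items():
    print(name, round(value, 3))
```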

Table 3 Comparison of the performance of different models in training set, validation set and prospective set.
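| Dataset | Model | AUC | Sensitivity | Specificity | Accuracy | Precision | F1 score |
| Training set | LR | 0.803 | 0.733 | 0.728 | 0.731 | 0.774 | 0.753 |
| Training set | DT | 0.754 | 0.806 | 0.613 | 0.721 | 0.726 | 0.764 |
| Training set | RF | 0.861 | 0.727 | 0.835 | 0.775 | 0.849 | 0.784 |
| Training set | SVM | 0.808 | 0.720 | 0.753 | 0.734 | 0.788 | 0.752 |
| Training set | XGBoost | 0.909 | 0.756 | 0.904 | 0.820 | 0.907 | 0.824 |
| Validation set | LR | 0.809 | 0.743 | 0.686 | 0.719 | 0.756 | 0.750 |
| Validation set | DT | 0.799 | 0.785 | 0.723 | 0.758 | 0.788 | 0.786 |
| Validation set | RF | 0.902 | 0.750 | 0.918 | 0.823 | 0.923 | 0.828 |
| Validation set | SVM | 0.819 | 0.743 | 0.705 | 0.726 | 0.767 | 0.755 |
| Validation set | XGBoost | 0.921 | 0.788 | 0.914 | 0.843 | 0.923 | 0.850 |
| Prospective set | LR | 0.779 | 0.568 | 0.847 | 0.711 | 0.780 | 0.657 |
| Prospective set | DT | 0.812 | 0.765 | 0.729 | 0.747 | 0.765 | 0.747 |
| Prospective set | RF | 0.943 | 0.667 | 0.988 | 0.831 | 0.982 | 0.794 |
| Prospective set | SVM | 0.791 | 0.617 | 0.824 | 0.723 | 0.769 | 0.685 |
| Prospective set | XGBoost | 0.963 | 0.840 | 0.941 | 0.892 | 0.932 | 0.883 |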

The clinical applicability of the XGBoost model was assessed using DCA. This method integrates patients’ benefits and physicians’ preferences to assist doctors in making optimal clinical decisions tailored to patient needs[24,25]. The X-axis represents threshold probability, and the Y-axis represents net benefit. The results (Figure 3) indicated that decisions made using this model yield a greater net clinical benefit compared to the "review all" or "review none" strategies, suggesting that the model has strong clinical applicability. The "review all" strategy, which involves examining every patient regardless of their risk, often leads to high costs and unnecessary interventions. Conversely, the "review none" strategy, which avoids any interventions, risks overlooking patients who could benefit from treatment.
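The net benefit plotted on the Y-axis weighs true positives against false positives at a given threshold probability. The Python sketch below uses the standard decision-curve formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), on made-up data; the labels, probabilities, and function names are hypothetical, and the text does not say which software produced the published curves.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of reviewing patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_review_all(labels, threshold):
    """Net benefit of the 'review all' strategy at the same threshold."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = recurrence) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.4, 0.1]

print(net_benefit(y_true, y_prob, 0.5))       # model-guided review
print(net_benefit_review_all(y_true, 0.5))    # review all
print(0.0)                                    # review none, by definition
```

A model is useful at a given threshold only if its net benefit exceeds both reference strategies, which is the comparison the decision curves display across the full range of thresholds.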

Figure 3 Decision curves of the eXtreme Gradient Boosting model across various datasets. A: Training set; B: Validation set; C: Prospective set. XGBoost: eXtreme Gradient Boosting.
Interpretability analysis of the XGBoost model

We used SHAP analysis to quantify each feature's contribution to the model, enhancing interpretability. As shown in Figure 4A, the importance ranking of the feature variables in the XGBoost model, from highest to lowest, is as follows: Cigarette preference, family history, age, number of polyps ≥ 3, progressive adenoma, diarrhea, H. pylori infection, polyp size > 1 cm, non-progressive adenoma, and polyp size 0.5-1 cm. Additionally, based on each sample's SHAP values, we used SHAP beeswarm plots to visualize the contribution of each sample to the prediction results (Figure 4B).

Figure 4 SHapley Additive exPlanations analysis of the XGBoost model. A: SHapley Additive exPlanations (SHAP) summary bar plot, where features are ranked in descending order according to the mean absolute SHAP value; B: SHAP beeswarm plot, displaying the SHAP value of each feature for every sample in the dataset. Each row represents a feature, and each dot corresponds to a sample. The color of the dots indicates feature values, with yellow representing high values and purple representing low values. SHAP: SHapley Additive exPlanations.
Construction of the online web calculator

Based on the results of the XGBoost model, an online web calculator was developed (https://webcalculatorsyh.shinyapps.io/XGBoost/). By adjusting the values or conditions of a patient's clinical characteristics and clicking the "Predict" button, the calculator automatically estimates the one-year recurrence risk of colorectal polyps after EMR. For example, as shown in Figure 5, a 70-year-old patient with a smoking habit and family history, polyps > 1 cm in size, more than 3 polyps, and the highest-grade polyp pathology classified as advanced adenoma, has a 94.04% probability of colorectal polyp recurrence 1 year after EMR. The physician should strongly recommend that this patient undergo a follow-up colonoscopy 1 year after the procedure.
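As a rough illustration of how such a calculator might turn form inputs into model inputs, the sketch below encodes a patient's characteristics into the ten feature variables listed in the SHAP ranking. The field names, the treatment of age as a number in years, the reference categories (polyp size < 0.5 cm, non-neoplastic pathology), and the handling of boundary values are all assumptions; the actual encoding used by the web calculator is not described in the text, and no prediction is computed here.

```python
from dataclasses import dataclass

PATHOLOGY_LEVELS = ("non_neoplastic", "non_progressive_adenoma", "progressive_adenoma")


@dataclass
class Patient:
    """Hypothetical input record; field names are illustrative only."""
    age: int
    cigarette_preference: bool
    family_history: bool
    diarrhea: bool
    h_pylori: bool
    polyp_count: int
    max_polyp_size_cm: float
    pathology: str  # one of PATHOLOGY_LEVELS (highest-grade polyp)


def encode_patient(p: Patient) -> dict:
    """Map a patient record to indicator variables (assumed encoding)."""
    if p.pathology not in PATHOLOGY_LEVELS:
        raise ValueError(f"unknown pathology: {p.pathology!r}")
    return {
        "cigarette_preference": int(p.cigarette_preference),
        "family_history": int(p.family_history),
        "age": p.age,
        "polyp_number_ge_3": int(p.polyp_count >= 3),
        "progressive_adenoma": int(p.pathology == "progressive_adenoma"),
        "diarrhea": int(p.diarrhea),
        "h_pylori_infection": int(p.h_pylori),
        "polyp_size_gt_1cm": int(p.max_polyp_size_cm > 1.0),
        "non_progressive_adenoma": int(p.pathology == "non_progressive_adenoma"),
        "polyp_size_0_5_to_1cm": int(0.5 <= p.max_polyp_size_cm <= 1.0),
    }


# Hypothetical patient; diarrhea and H. pylori status are set arbitrarily.
example = Patient(
    age=70,
    cigarette_preference=True,
    family_history=True,
    diarrhea=False,
    h_pylori=False,
    polyp_count=4,
    max_polyp_size_cm=1.5,
    pathology="progressive_adenoma",
)
features = encode_patient(example)
print(features)
```

The resulting dictionary would then be passed to the fitted model to obtain a recurrence probability.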

Figure 5 Online web calculator for predicting colorectal polyp recurrence 1 year after endoscopic mucosal resection. EMR: Endoscopic mucosal resection; XGBoost: EXtreme Gradient Boosting; CRC: Colorectal cancer.
DISCUSSION

This is a multicenter retrospective study with prospective validation. A clinical prediction model was constructed using ML methods to evaluate the risk of colorectal polyp recurrence one year after EMR. Eight key features were selected for model development and validation, with the XGBoost model demonstrating the best predictive performance. SHAP analysis was employed to calculate the contribution of each feature to the model, and clinical applicability was assessed by DCA. Finally, an online web calculator based on the XGBoost model was developed to assist clinicians in formulating individualized colonoscopy surveillance plans based on each patient's polyp recurrence risk.

From a demographic perspective, this study identified age, family history, and cigarette preference as independent predictors of colorectal polyp recurrence. Previous studies have suggested that as patients age, the incidence, recurrence, and malignancy rates of colorectal polyps gradually increase, which may be attributed to higher rates of genetic mutations, declining immune function, and chronic inflammation in the intestines of older individuals[26,27]. Some studies[28] have also pointed out that smoking increases the risk of adenoma recurrence after polypectomy. Davenport et al[29] found that smoking status, duration, and intensity are associated with an increased risk of different types of polyps, including sessile serrated polyps, conventional adenomas, and hyperplastic polyps. This may be due to the multi-directional and multi-site harmful effects of cigarette components, which cause irreversible damage to the genetic material of colorectal cells. For example, smoking can lead to abnormal methylation of CpG islands and mutations in genes such as c-MYC, KRAS, and BRAF, promoting the malignant transformation of colorectal cells[30]. We found that patients with a family history of CRC or polyps were more prone to polyp recurrence, highlighting the significant role of genetic susceptibility in the development of colorectal polyps and tumors[31]. A large-scale nationwide case-control study from Sweden reported that the risk of polyp development increases by 40% in patients with first-degree relatives who have colorectal polyps, and that this risk escalates with an increasing number of affected first-degree relatives and a younger age at diagnosis[32]. Additionally, a study by Samadder et al[33] demonstrated that having a first-degree relative with CRC increases the risk of developing adenomatous polyps [hazard rate ratio (HRR), 1.82] and advanced villous adenomas (HRR, 2.43).

From clinical data, diarrhea, H. pylori infection, polyp size, polyp number, and hazard classification were identified as independent predictors of colorectal polyp recurrence. Our study found that patients with diarrhea have a higher risk of short-term recurrence after EMR, which may be related to gut microbiota disturbance. Gut microbiota disturbance plays a critical role in the adenoma-carcinoma sequence. Studies have shown that, compared to healthy individuals, patients with adenomatous polyps exhibit significantly higher abundances of microbial species such as Fusobacterium mortiferum, Fusobacterium nucleatum, Ruminococcus gnavus (R. gnavus), and Bacteroides fragilis[34,35]. Conversely, the levels of the genera Bifidobacterium, Faecalibacterium, and Blautia were found to be reduced in patients with adenomatous polyps[36]. Similar findings have been observed in patients with diarrhea-predominant irritable bowel syndrome, where pathogenic bacteria such as R. gnavus are increased[37], while beneficial bacteria like Bifidobacterium are reduced[38]. Our study also identified a higher recurrence risk in patients infected with H. pylori. Previous studies have established H. pylori as a causative agent of chronic gastritis, gastric polyps, and gastric cancer[39], but its relationship with colorectal polyps remains inconclusive. A meta-analysis by Lu et al[40] found that H. pylori infection is independently associated with adenomatous polyps, advanced adenomatous polyps, and hyperplastic polyps, suggesting that H. pylori infection is a risk factor for colorectal polyps. The potential mechanisms may include hypergastrinemia induced by H. pylori, which promotes the proliferation of the colorectal mucosa, and direct stimulation by H. pylori, leading to dysbiosis of the gut microbiota and the development of colorectal lesions[40,41]. Compared with non-neoplastic polyps, progressive adenomas exhibit a notably higher risk of recurrence. 
These polyps show more dysplasia and atypia histologically, with a higher proportion of villous components, making them more prone to progression[42]. Previous studies have also reported that patients with adenomas at their initial colonoscopy are more likely to experience adenoma recurrence[4,43]. Consistent with our findings, studies have shown that patients with multiple polyps (≥ 3) had an increased risk of recurrence[44]. This could be partly due to genetic factors in these patients and partly because multiple polyps are more difficult to remove completely during surgery, increasing the likelihood of missing polyps[45]. Polyp size is also a factor influencing recurrence[46]: in our study, polyps larger than 1 cm in diameter had a higher recurrence risk, consistent with the findings of Murakami et al[47]. The study by Martínez et al[48] also confirmed that polyp number and size correlate with recurrence, showing in particular that patients with five or more baseline adenomas or adenomas ≥ 2 cm in diameter had a higher risk of developing metachronous advanced adenomas. Although polyp location was not an independent predictor of recurrence in this study, existing research has shown that both the left and right colon are associated with polyp recurrence[49,50]. However, these findings are inconsistent, and no definitive conclusions have been reached. Further original research is needed to validate them.

From the perspective of model construction, with the advancement of precision medicine, clinical prediction models serve as quantitative tools for assessing risk and benefit, providing personalized guidance for patients, physicians, and healthcare decision-makers. As a result, these models are increasingly applied in clinical practice[51]. Several researchers have previously developed clinical prediction models to evaluate the occurrence and recurrence risk of colorectal polyps. For instance, Huang et al[52] constructed a nomogram based on risk factors associated with colorectal polyps to predict the likelihood of polyp occurrence, with an AUC of 0.747 (95%CI: 0.692-0.801). He et al[53] developed a nomogram based on the neutrophil-to-lymphocyte ratio and fibrinogen-to-lymphocyte ratio to predict colorectal adenoma recurrence, with AUCs of 0.846 and 0.841 in the training and validation sets, respectively, demonstrating favorable clinical applicability. However, most of these models are based on traditional linear regression analysis; they are not ideal for handling nonlinear data and present several limitations in real-world clinical settings[54]. In contrast, ML algorithms offer unique advantages in processing high-dimensional data with multiple variables and features. Therefore, in this study, we selected ML to build a prediction model for post-EMR polyp recurrence. Among the five ML algorithms tested, XGBoost performed the best, with AUC values of 0.909 (95%CI: 0.89-0.92) and 0.921 (95%CI: 0.90-0.94) in the training and validation sets, respectively. Additionally, we validated the model using a prospective cohort, where the XGBoost model again showed the best performance, with an AUC of 0.963 (95%CI: 0.94-0.99), along with relatively high sensitivity and specificity.
Despite the higher computational complexity and the trade-off in relative interpretability compared to the LR model, XGBoost was selected in our study due to its superior predictive performance and its ability to handle complex feature interactions. To facilitate individualized decision-making by physicians, we transformed the results of the XGBoost model into an online web calculator. By inputting patient-specific characteristics and clinical indicators, the tool quickly provides a recurrence risk percentage, helping clinicians develop tailored follow-up strategies.
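The AUC values and 95%CIs quoted above can be understood through a small worked example. The sketch below computes the AUC as the probability that a randomly chosen recurrence case receives a higher predicted risk than a randomly chosen non-recurrence case, and estimates a percentile bootstrap confidence interval. The data are invented, and the bootstrap is only one common way of obtaining such an interval; the text does not state which method the study used.

```python
import random


def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (assumed): 1 = recurrence at one year, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.7, 0.55, 0.05]

point = auc(y_true, y_score)
lower, upper = bootstrap_ci(y_true, y_score)
print(f"AUC={point:.3f}, 95%CI: {lower:.3f}-{upper:.3f}")
```

With very small samples the interval becomes wide and often touches 1.0, so a high AUC in a small cohort should be interpreted with caution.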

From the perspective of model interpretability, the lack of transparency in ML, often referred to as its "black box" nature, has been a major challenge hindering its broader clinical adoption. This is because the decision boundaries in ML models are complex and often lack the transparency necessary for researchers to fully understand how the model extracts information from the data and makes decisions[55]. To address this issue, we introduced SHAP values to help explain the model's decision-making logic and prediction process. The core concept of SHAP is to compute the marginal contribution of each feature to the model's output for each sample, offering insights into both global and local explanations of the "black box" model[56]. The key advantage of this approach is that it visually demonstrates the actual contribution of each feature to the model's decision-making process and whether that contribution has a positive or negative impact[57]. In this study, the SHAP bar plot showed the importance ranking of the eight influencing factors in the model, with cigarette preference, family history, and age being the top three contributors. This helps us understand the contribution of each feature to the prediction results on a global level. The SHAP beeswarm plot offers a more detailed depiction. Each point on the plot represents a sample; its horizontal position indicates the SHAP value of the feature for that sample, and its color indicates the feature value. This visualization not only highlights the relative importance of each feature vertically but also illustrates how each feature impacts the prediction across all samples, enabling a horizontal comparison of their influence. Additionally, we used DCA to evaluate the clinical applicability of the model. The DCA results demonstrate that the XGBoost model strikes a balance between the "review all" and "review none" strategies.
By predicting patient outcomes with high accuracy, the model enables doctors to identify those who are most likely to benefit from interventions, thereby optimizing resource allocation and patient care. This translates into a significant net clinical benefit, reflecting both the improved health outcomes for patients and the cost-effectiveness of the interventions. For example, in the study by Tong et al[58], within the 0%-30% threshold range, the decision to delay extubation after thoracoscopic lung cancer surgery was clearly beneficial. By comparing threshold probabilities with net benefits, we found that the model demonstrated favorable net benefits, which can help guide clinicians in selecting the optimal clinical strategy.

Inevitably, this study has several limitations. First, it is unusual for a model to perform better in the prospective validation set than in the training set, as observed here. This may be due to the small number of samples in the prospective validation set, the similarity of some features to those in the training set, and the characteristics of the model itself. Thus, additional prospective data are needed to validate the model and ensure its generalizability and external applicability. Second, despite collecting comprehensive clinicopathological data based on current research, we were unable to include all potential risk factors that might influence polyp recurrence. For example, individual dietary preferences, lifestyle factors, nonsteroidal anti-inflammatory drug use, and biliary diseases could also be related to the occurrence of colorectal polyps. However, due to the difficulty in quantifying these factors and the lack of standardized assessment criteria, they were not included in our analysis.

CONCLUSION

This study is the first to develop ML models to predict the recurrence of colorectal polyps one year after EMR and to identify associated risk factors. Among these models, the XGBoost model performed the best. Additionally, we developed an online web calculator based on the XGBoost predictions, which can help clinicians quickly calculate a patient's recurrence risk, facilitating the joint development of appropriate and accurate follow-up plans between clinicians and patients.

ACKNOWLEDGEMENTS

We sincerely thank all the staff and collaborators who participated in this study. In addition, we acknowledge the contributions of editors and reviewers for their constructive feedback and suggestions on earlier versions of this manuscript.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Corresponding Author's Membership in Professional Societies: Member of the Ninth Internal Medicine Branch Committee of Jiangsu Medical Association; Member of Psychosomatic Disease Collaboration Group, Chinese Society of Gastroenterology; Chairman of the Digestive and Psychosomatic Committee of Jiangsu Province; President-designate, Branch of Physicians of Jiangsu Medical Association; Consultant of Gastrointestinal Motility Group of Jiangsu Gastroenterology Society.

Specialty type: Gastroenterology and hepatology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade A, Grade B, Grade B, Grade C

Novelty: Grade B, Grade B, Grade B, Grade B

Creativity or Innovation: Grade B, Grade B, Grade B, Grade C

Scientific Significance: Grade A, Grade A, Grade B, Grade C

P-Reviewer: Dai Z, Hanada E, Ling YW; S-Editor: Li L; L-Editor: A; P-Editor: Wang WB

References
1.  Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71:209-249.
2.  Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72:7-33.
3.  Nakanishi Y, Diaz-Meco MT, Moscat J. Serrated Colorectal Cancer: The Road Less Travelled? Trends Cancer. 2019;5:742-754.
4.  Li J, Ma X, Chakravarti D, Shalapour S, DePinho RA. Genetic and biological hallmarks of colorectal cancer. Genes Dev. 2021;35:787-820.
5.  Dekker E, Rex DK. Advances in CRC Prevention: Screening and Surveillance. Gastroenterology. 2018;154:1970-1984.
6.  Sung JJY, Chiu HM, Lieberman D, Kuipers EJ, Rutter MD, Macrae F, Yeoh KG, Ang TL, Chong VH, John S, Li J, Wu K, Ng SSM, Makharia GK, Abdullah M, Kobayashi N, Sekiguchi M, Byeon JS, Kim HS, Parry S, Cabral-Prodigalidad PAI, Wu DC, Khomvilai S, Lui RN, Wong S, Lin YM, Dekker E. Third Asia-Pacific consensus recommendations on colorectal cancer screening and postpolypectomy surveillance. Gut. 2022;71:2152-2166.
7.  Ferlitsch M, Hassan C, Bisschops R, Bhandari P, Dinis-Ribeiro M, Risio M, Paspatis GA, Moss A, Libânio D, Lorenzo-Zúñiga V, Voiosu AM, Rutter MD, Pellisé M, Moons LMG, Probst A, Awadie H, Amato A, Takeuchi Y, Repici A, Rahmi G, Koecklin HU, Albéniz E, Rockenbauer LM, Waldmann E, Messmann H, Triantafyllou K, Jover R, Gralnek IM, Dekker E, Bourke MJ. Colorectal polypectomy and endoscopic mucosal resection: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2024. Endoscopy. 2024;56:516-545.
8.  Belderbos TD, Leenders M, Moons LM, Siersema PD. Local recurrence after endoscopic mucosal resection of nonpedunculated colorectal lesions: systematic review and meta-analysis. Endoscopy. 2014;46:388-402.
9.  Gupta S, Lieberman D, Anderson JC, Burke CA, Dominitz JA, Kaltenbach T, Robertson DJ, Shaukat A, Syngal S, Rex DK. Recommendations for Follow-Up After Colonoscopy and Polypectomy: A Consensus Update by the US Multi-Society Task Force on Colorectal Cancer. Am J Gastroenterol. 2020;115:415-434.
10.  Hassan C, Antonelli G, Dumonceau JM, Regula J, Bretthauer M, Chaussade S, Dekker E, Ferlitsch M, Gimeno-Garcia A, Jover R, Kalager M, Pellisé M, Pox C, Ricciardiello L, Rutter M, Helsingen LM, Bleijenberg A, Senore C, van Hooft JE, Dinis-Ribeiro M, Quintero E. Post-polypectomy colonoscopy surveillance: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2020. Endoscopy. 2020;52:687-700.
11.  Rutter MD, East J, Rees CJ, Cripps N, Docherty J, Dolwani S, Kaye PV, Monahan KJ, Novelli MR, Plumb A, Saunders BP, Thomas-Gibson S, Tolan DJM, Whyte S, Bonnington S, Scope A, Wong R, Hibbert B, Marsh J, Moores B, Cross A, Sharp L. British Society of Gastroenterology/Association of Coloproctology of Great Britain and Ireland/Public Health England post-polypectomy and post-colorectal cancer resection surveillance guidelines. Gut. 2020;69:201-223.
12.  Hao Y, Wang Y, Qi M, He X, Zhu Y, Hong J. Risk Factors for Recurrent Colorectal Polyps. Gut Liver. 2020;14:399-411.
13.  Lee S, Do YS, Lee HJ, Kim GU, Park HW, Chang HS, Choe J, Byeon JS, Lee JY. Gastrointestinal: Weight gain increases the risk of metachronous advanced colorectal neoplasm observed in post-polypectomy surveillance colonoscopy. J Gastroenterol Hepatol. 2024;39:47-54.
14.  Gao QY, Chen HM, Sheng JQ, Zheng P, Yu CG, Jiang B, Fang JY. The first year follow-up after colorectal adenoma polypectomy is important: a multiple-center study in symptomatic hospital-based individuals in China. Front Med China. 2010;4:436-442.
15.  Li ZS, Linghu EQ, Wang GQ, Bai Y; National Clinical Research Center for Digestive Diseases (Shanghai); Chinese Society of Digestive Endoscopology; Cancer Endoscopy Professional Committee of China Anti-Cancer Association; Digestive Endoscopy Professional Committee of Chinese Endoscopist Association; Endoscopic Health Management and Medical Examination Professional Committee of Chinese Endoscopist Association. [Expert consensus on management strategies for precancerous lesions and conditions of colorectal cancer in China]. Zhonghua Xiaohua Neijingzazhi. 2022;39:1-18.
16.  Adlung L, Cohen Y, Mor U, Elinav E. Machine learning in clinical decision making. Med. 2021;2:642-665.
17.  Robinson-Weiss C, Patel J, Bizzo BC, Glazer DI, Bridge CP, Andriole KP, Dabiri B, Chin JK, Dreyer K, Kalpathy-Cramer J, Mayo-Smith WW. Machine Learning for Adrenal Gland Segmentation and Classification of Normal and Adrenal Masses at CT. Radiology. 2023;306:e220101.
18.  Elemento O, Leslie C, Lundin J, Tourassi G. Artificial intelligence in cancer research, diagnosis and therapy. Nat Rev Cancer. 2021;21:747-752.
19.  Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594.
20.  Liu L, Messer K, Baron JA, Lieberman DA, Jacobs ET, Cross AJ, Murphy G, Martinez ME, Gupta S. A prognostic model for advanced colorectal neoplasia recurrence. Cancer Causes Control. 2016;27:1175-1185.
21.  Kjølhede T, Ølholm AM, Kaalby L, Kidholm K, Qvist N, Baatrup G. Diagnostic accuracy of capsule endoscopy compared with colonoscopy for polyp detection: systematic review and meta-analyses. Endoscopy. 2021;53:713-721.
22.  Facciorusso A, Di Maso M, Serviddio G, Vendemiale G, Muscatiello N. Development and validation of a risk score for advanced colorectal adenoma recurrence after endoscopic resection. World J Gastroenterol. 2016;22:6049-6056.
23.  Ren Y, Zhang Y, Zhan J, Sun J, Luo J, Liao W, Cheng X. Machine learning for prediction of delirium in patients with extensive burns after surgery. CNS Neurosci Ther. 2023;29:2986-2997.
24.  Vickers AJ, Holland F. Decision curve analysis to evaluate the clinical benefit of prediction models. Spine J. 2021;21:1643-1648.
25.  Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA. 2015;313:409-410.
26.  Han X, Qian W, Liu Y, Zheng T, Su XJ, Zhang PP, Chen Y, Hu LH, Li ZS. Effects of age, sex and pathological type on the risk of multiple polyps: A Chinese teaching hospital study. J Dig Dis. 2020;21:505-511.
27.  Sninsky JA, Shore BM, Lupu GV, Crockett SD. Risk Factors for Colorectal Polyps and Cancer. Gastrointest Endosc Clin N Am. 2022;32:195-213.
28.  Reid ME, Marshall JR, Roe D, Lebowitz M, Alberts D, Battacharyya AK, Martinez ME. Smoking exposure as a risk factor for prevalent and recurrent colorectal adenomas. Cancer Epidemiol Biomarkers Prev. 2003;12:1006-1011.
29.  Davenport JR, Su T, Zhao Z, Coleman HG, Smalley WE, Ness RM, Zheng W, Shrubsole MJ. Modifiable lifestyle factors associated with risk of sessile serrated polyps, conventional adenomas and hyperplastic polyps. Gut. 2018;67:456-465.
30.  Mármol I, Sánchez-de-Diego C, Pradilla Dieste A, Cerrada E, Rodriguez Yoldi MJ. Colorectal Carcinoma: A General Overview and Future Perspectives in Colorectal Cancer. Int J Mol Sci. 2017;18:197.
31.  Kastrinos F, Samadder NJ, Burt RW. Use of Family History and Genetic Testing to Determine Risk of Colorectal Cancer. Gastroenterology. 2020;158:389-403.
32.  Song M, Emilsson L, Roelstraete B, Ludvigsson JF. Risk of colorectal cancer in first degree relatives of patients with colorectal polyps: nationwide case-control study in Sweden. BMJ. 2021;373:n877.
33.  Samadder NJ, Curtin K, Tuohy TM, Rowe KG, Mineau GP, Smith KR, Pimentel R, Wong J, Boucher K, Burt RW. Increased risk of colorectal neoplasia among family members of patients with colorectal cancer: a population-based study in Utah. Gastroenterology. 2014;147:814-821.e5; quiz e15.
34.  Liang S, Mao Y, Liao M, Xu Y, Chen Y, Huang X, Wei C, Wu C, Wang Q, Pan X, Tang W. Gut microbiome associated with APC gene mutation in patients with intestinal adenomatous polyps. Int J Biol Sci. 2020;16:135-146.
35.  Kordahi MC, Stanaway IB, Avril M, Chac D, Blanc MP, Ross B, Diener C, Jain S, McCleary P, Parker A, Friedman V, Huang J, Burke W, Gibbons SM, Willis AD, Darveau RP, Grady WM, Ko CW, DePaolo RW. Genomic and functional characterization of a mucosal symbiont involved in early-stage colorectal cancer. Cell Host Microbe. 2021;29:1589-1598.e6.
36.  Valciukiene J, Strupas K, Poskus T. Tissue vs. Fecal-Derived Bacterial Dysbiosis in Precancerous Colorectal Lesions: A Systematic Review. Cancers (Basel). 2023;15:1602.
37.  Zhai L, Huang C, Ning Z, Zhang Y, Zhuang M, Yang W, Wang X, Wang J, Zhang L, Xiao H, Zhao L, Asthana P, Lam YY, Chow CFW, Huang JD, Yuan S, Chan KM, Yuan CS, Lau JY, Wong HLX, Bian ZX. Ruminococcus gnavus plays a pathogenic role in diarrhea-predominant irritable bowel syndrome by increasing serotonin biosynthesis. Cell Host Microbe. 2023;31:33-44.e5.
38.  Altomare A, Di Rosa C, Imperia E, Emerenziani S, Cicala M, Guarino MPL. Diarrhea Predominant-Irritable Bowel Syndrome (IBS-D): Effects of Different Nutritional Patterns on Intestinal Dysbiosis and Symptoms. Nutrients. 2021;13:1506.
39.  Waldum H, Fossmark R. Gastritis, Gastric Polyps and Gastric Cancer. Int J Mol Sci. 2021;22:6548.
40.  Lu D, Wang M, Ke X, Wang Q, Wang J, Li D, Wang M, Wang Q. Association Between H. pylori Infection and Colorectal Polyps: A Meta-Analysis of Observational Studies. Front Med (Lausanne). 2021;8:706036.
41.  Chen QF, Zhou XD, Fang DH, Zhang EG, Lin CJ, Feng XZ, Wang N, Wu JS, Wang D, Lin WH. Helicobacter pylori infection with atrophic gastritis: An independent risk factor for colorectal adenomas. World J Gastroenterol. 2020;26:5682-5692.
42.  Komeda Y, Watanabe T, Sakurai T, Kono M, Okamoto K, Nagai T, Takenaka M, Hagiwara S, Matsui S, Nishida N, Tsuji N, Kashida H, Kudo M. Risk factors for local recurrence and appropriate surveillance interval after endoscopic resection. World J Gastroenterol. 2019;25:1502-1512.
43.  Huang Y, Gong W, Su B, Zhi F, Liu S, Bai Y, Jiang B. Recurrence and surveillance of colorectal adenoma after polypectomy in a southern Chinese population. J Gastroenterol. 2010;45:838-845.
44.  Chi Z, Lin Y, Huang J, Lv MY, Chen J, Chen X, Zhang B, Chen Y, Hu J, He X, Lan P. Risk factors for recurrence of colorectal conventional adenoma and serrated polyp. Gastroenterol Rep (Oxf). 2022;10:goab038.
45.  Leufkens AM, van Oijen MG, Vleggaar FP, Siersema PD. Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy. 2012;44:470-475.
46.  Djinbachian R, Iratni R, Durand M, Marques P, von Renteln D. Rates of Incomplete Resection of 1- to 20-mm Colorectal Polyps: A Systematic Review and Meta-Analysis. Gastroenterology. 2020;159:904-914.e12.
47.  Murakami T, Yoshida N, Yasuda R, Hirose R, Inoue K, Dohi O, Kamada K, Uchiyama K, Konishi H, Naito Y, Morinaga Y, Kishimoto M, Konishi E, Ogiso K, Inada Y, Itoh Y. Local recurrence and its risk factors after cold snare polypectomy of colorectal polyps. Surg Endosc. 2020;34:2918-2925.
48.  Martínez ME, Baron JA, Lieberman DA, Schatzkin A, Lanza E, Winawer SJ, Zauber AG, Jiang R, Ahnen DJ, Bond JH, Church TR, Robertson DJ, Smith-Warner SA, Jacobs ET, Alberts DS, Greenberg ER. A pooled analysis of advanced colorectal neoplasia diagnoses after colonoscopic polypectomy. Gastroenterology. 2009;136:832-841.
49.  Ateş Ö, Sivri B, Kılıçkap S. Evaluation of risk factors for the recurrence of colorectal polyps and colorectal cancer. Turk J Med Sci. 2017;47:1370-1376.
50.  Harrington LX, Wei JW, Suriawinata AA, Mackenzie TA, Hassanpour S. Predicting colorectal polyp recurrence using time-to-event analysis of medical records. AMIA Jt Summits Transl Sci Proc. 2020;2020:211-220.
51.  Knudsen MD, Wang K, Wang L, Polychronidis G, Berstad P, Wu K, He X, Hang D, Fang Z, Ogino S, Chan AT, Giovannucci E, Wang M, Song M. Development and validation of a risk prediction model for post-polypectomy colorectal cancer in the USA: a prospective cohort study. EClinicalMedicine. 2023;62:102139.  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in RCA: 1]  [Article Influence: 0.5]  [Reference Citation Analysis (0)]
52.  Huang Y, Liu Y, Yin X, Zhang T, Hao Y, Zhang P, Yang Y, Gao Z, Liu S, Yu S, Li H, Wang G. Establishment of clinical predictive model based on the study of influence factors in patients with colorectal polyps. Front Surg. 2023;10:1077175.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Reference Citation Analysis (0)]
53.  He Q, Du S, Wang X, Liu J, Xu X, Liu W, Zhang J, Jiang K. Development and validation of a nomogram based on neutrophil-to-lymphocyte ratio and fibrinogen-to-lymphocyte ratio for predicting recurrence of colorectal adenoma. J Gastrointest Oncol. 2022;13:2269-2281.  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited in This Article: ]  [Cited by in Crossref: 2]  [Reference Citation Analysis (0)]
54.  Tang H, Jin Z, Deng J, She Y, Zhong Y, Sun W, Ren Y, Cao N, Chen C. Development and validation of a deep learning model to predict the survival of patients in ICU. J Am Med Inform Assoc. 2022;29:1567-1576.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in RCA: 9]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
55.  The Lancet Respiratory Medicine. Opening the black box of machine learning. Lancet Respir Med. 2018;6:801.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Cited by in Crossref: 56]  [Cited by in RCA: 90]  [Article Influence: 12.9]  [Reference Citation Analysis (0)]
56.  Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell. 2020;2:56-67.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Cited by in Crossref: 2808]  [Cited by in RCA: 2027]  [Article Influence: 405.4]  [Reference Citation Analysis (0)]
57.  Han Y, Wang S. Disability risk prediction model based on machine learning among Chinese healthy older adults: results from the China Health and Retirement Longitudinal Study. Front Public Health. 2023;11:1271595.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Reference Citation Analysis (0)]
58.  Tong C, Miao Q, Zheng J, Wu J. A novel nomogram for predicting the decision to delayed extubation after thoracoscopic lung cancer surgery. Ann Med. 2023;55:800-807.  [PubMed]  [DOI]  [Full Text]  [Cited in This Article: ]  [Cited by in RCA: 17]  [Reference Citation Analysis (0)]