Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H, Fei SJ. Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection. World J Gastroenterol 2025; 31(11): 102387 [PMID: 40124266 DOI: 10.3748/wjg.v31.i11.102387]
Corresponding Author of This Article
Su-Juan Fei, MD, Chief Physician, Professor, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, No. 99 West Huaihai Road, Xuzhou 221002, Jiangsu Province, China. xyfyfeisj99@163.com
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Retrospective Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Yi-Heng Shi, Jun-Liang Liu, Cong-Cong Cheng, Wen-Ling Li, Su-Juan Fei, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Yi-Heng Shi, Cong-Cong Cheng, Wen-Ling Li, The First Clinical Medical College of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Han Sun, Xi-Liang Zhou, Department of Gastroenterology, Xuzhou Central Hospital, The Affiliated Xuzhou Hospital of Medical College of Southeast University, Xuzhou 221009, Jiangsu Province, China
Hong Wei, Department of Gastroenterology, Xuzhou New Health Hospital, North Hospital of Xuzhou Cancer Hospital, Xuzhou 221007, Jiangsu Province, China
Author contributions: Shi YH and Liu JL conceived and designed the study; Shi YH, Liu JL, Cheng CC, Li WL and Sun H participated in data processing and statistical analysis; Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H and Fei SJ drafted the manuscript; Shi YH and Liu JL contributed to data analysis and interpretation; Fei SJ supervised the review of the study; all authors critically revised and approved the final manuscript.
Institutional review board statement: The study was designed as per the Declaration of Helsinki and was conducted according to the TRIPOD guidelines, with ethical approval granted by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University under the approval number XYFY2023-KL360-01.
Informed consent statement: Written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Received: October 16, 2024 Revised: January 25, 2025 Accepted: February 14, 2025 Published online: March 21, 2025 Processing time: 148 Days and 5.6 Hours
Abstract
BACKGROUND
Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR.
AIM
To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR.
METHODS
This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, a total of 166 patients were collected to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance.
RESULTS
Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model.
CONCLUSION
The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
Core Tip: This study is the first to use machine learning methods to construct and validate a prediction model for one-year recurrence of colorectal polyps after endoscopic mucosal resection. Key predictors included age, smoking, family history, diarrhea, hazard classification, Helicobacter pylori infection, and the number and size of polyps. According to receiver operating characteristic curves, sensitivity, specificity, accuracy, precision, and F1 scores, the eXtreme Gradient Boosting model had the best performance. Based on this model, an online web calculator was built to help clinicians better distinguish high-risk groups and provide patients with personalized colonoscopy follow-up recommendations.
Colorectal cancer (CRC) is the third most common cancer globally and the second leading cause of cancer-related deaths[1]. According to the latest cancer statistics[2], over 150000 new cases of CRC were diagnosed in the United States in 2022, with more than 50000 deaths. Colorectal polyps are precursors to CRC, with 60%-85% of sporadic CRCs evolving through the conventional adenoma-carcinoma pathway[3], while a smaller portion arises from the serrated or inflammatory pathways[4]. Therefore, colonoscopy screening and early endoscopic polypectomy are effective in preventing CRC by interrupting the polyp-to-cancer progression and reducing patient mortality[5,6].
Endoscopic mucosal resection (EMR) is one of the most commonly used methods for polyp removal, offering safety, efficiency, and cost-effectiveness for the excision of most colorectal lesions[7]. However, a major limitation of EMR is its relatively high recurrence rate. A meta-analysis of 33 studies reported an average recurrence rate of 15%, with rates as high as 50%[8]. As a result, patients require regular colonoscopic follow-up after EMR. According to current guidelines, a 3-year interval for colonoscopic surveillance is generally recommended after the removal of most adenomatous polyps[9-11]. However, these guidelines are primarily based on baseline characteristics of polyps, such as size, morphology, and pathology. In fact, a variety of factors influence polyp recurrence after removal. Previous studies have shown that patient age, sex, smoking, alcohol consumption, family history, BMI, and Helicobacter pylori (H. pylori) infection are associated with polyp recurrence and are considered risk factors for recurrence[12,13]. Therefore, some patients with a high risk of recurrence may require shorter surveillance intervals. In China, a large multicenter study found that the peak recurrence period after colorectal polyp removal occurs almost entirely within the first year, with a recurrence rate approaching 60%[14]. In addition, the surveillance intervals recommended by Chinese expert consensus for colonoscopy are significantly shorter than those in foreign guidelines. For example, the Expert consensus on management strategies for precancerous lesions and conditions of CRC in China[15] suggests that for most adenomas, one year after polypectomy can serve as the starting time for colonoscopic follow-up. Thus, first-year follow-ups may be more valuable than second- or third-year follow-ups.
Recently, machine learning (ML) has gained widespread attention in medicine. ML-based clinical models have demonstrated significant advantages in disease prediction, risk assessment, diagnostic assistance, and patient management[16-18]. Therefore, this study was carried out to develop a novel clinical predictive model using ML methods to explore the risk of colorectal polyp recurrence within one year post-EMR, so as to help stratify high-risk patients and provide individualized colonoscopy monitoring strategies.
MATERIALS AND METHODS
Patient source and ethics statement
This is a multicenter retrospective study with prospective validation. Data were collected from 1694 patients who underwent their first EMR for colorectal polyp removal in the Departments of Gastroenterology at the Affiliated Hospital of Xuzhou Medical University, Xuzhou Central Hospital, and Xuzhou New Health Geriatric Hospital between September 2018 and August 2023, with a one-year follow-up colonoscopy. Additionally, 166 patients treated at the Affiliated Hospital of Xuzhou Medical University between September 2023 and September 2024 were prospectively enrolled. The study was designed as per the Declaration of Helsinki and was conducted according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guidelines[19], with ethical approval granted by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University under the approval number XYFY2023-KL360-01. Written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.
Inclusion criteria and exclusion criteria
Inclusion criteria: (1) First diagnosis of colorectal polyps and completed EMR of all visible polyps; (2) Age ≥ 18, successful completion of gastrointestinal endoscopy, and a one-year follow-up colonoscopy after the procedure; (3) All resected polyps underwent pathological examination; (4) Adequate bowel preparation, with a Boston Bowel Preparation Scale score of 6-9; (5) Endoscopists had at least three years of experience in endoscopic treatment, with an annual EMR case volume exceeding 300; and (6) Complete clinical data, including patient medical records, endoscopy reports, laboratory tests, and pathology results.
Exclusion criteria: (1) Previous history of colorectal polyp resection or colorectal surgery; (2) Incomplete clinical data or missing follow-up information; (3) Poor bowel preparation, severely impairing observation and procedure, or failure to complete full colonoscopy due to inability to reach the cecum; (4) Patients who did not undergo follow-up colonoscopy one year after EMR; (5) Diagnosed with familial polyposis, inflammatory bowel disease, malignancies, hematologic diseases, or those who had undergone biologic therapy, chemotherapy, or radiotherapy; or (6) Patients with severe heart, liver, lung or kidney diseases, infections, pregnancy, or cachexia.
Research variables
General data: (1) Patient demographic information: Sex, age, body mass index (BMI); and (2) Clinical symptoms and medical history: Diarrhea, constipation, hematochezia, hypertension, diabetes, coronary heart disease (CHD), cigarette preference, alcohol preference, hyperlipidemia, family history, and H. pylori infection.
Laboratory tests: Serum levels of uric acid (UA), total bilirubin (TBIL), total bile acid (TBA), hypersensitive C-reactive protein (hsCRP), carcinoembryonic antigen, and carbohydrate antigens (CA724, CA199, CA242) were tested.
Endoscopy data: Complete records of bowel preparation quality, cecal intubation success, colorectal polyp location, size, number, endoscopic morphology, pathological findings, and the presence of gastric polyps were documented. High-quality endoscopic images were also included. The endoscopes used in this study were purchased from Olympus, Japan, models CF-H290I and CF-H290I, as well as from Fujifilm, Japan, models EC-601WM and EC-760R-VM.
Polyp-related definitions
In this study, polyp locations were classified into those at proximal colon, distal colon, and whole colon (proximal and distal)[20]. Endoscopic classification was categorized according to the Japanese Yamada classification system[21] into Types I, II, III, and IV. Pathological types were classified into non-neoplastic and neoplastic polyps. Non-neoplastic polyps included inflammatory polyps, hyperplastic polyps, and hamartomatous polyps, while neoplastic polyps included tubular adenomas, villous adenomas, tubulovillous adenomas, sessile serrated lesions, and traditional serrated adenomas. Hazard classification was based on pathology. Non-progressive adenomas were defined as tubular adenomas < 10 mm in size without high-grade dysplasia. Progressive adenomas were characterized by adenoma size ≥ 10 mm, or tubulovillous/villous adenomas, or adenomas with high-grade dysplasia. Multiple polyps were defined as the presence of ≥ 2 polyps, with the largest polyp and highest pathological grade used for characterization.
Polyp outcome evaluation
Polyp recurrence was assessed through electronic medical records, the endoscopy workstation, and telephone follow-up to determine whether the patient underwent a colonoscopy one year after colorectal polyp removal. Recurrence was defined as the discovery of new polyps at the original site (local recurrence) or metachronous distant polyps in the colorectal region during follow-up colonoscopy[22]. The definitions, evaluation criteria, and data assignment standards for the feature variables in this study are detailed in Supplementary Table 1.
Modeling methods
Data preprocessing: Data cleaning and imputation were performed. In this study, some laboratory indicators (TBIL, TBA, hsCRP) had a small amount of missing data. The retrospective dataset included data from 1694 patients, with missing TBIL, TBA, and hsCRP values for 113, 292, and 332 patients, respectively, accounting for 6.67%, 17.24%, and 19.6%. Multiple imputation was performed on the missing data using the "mice" package in R software to create a complete dataset. The missing data situation is illustrated with a bar chart (Supplementary Figure 1).
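The missing-data proportions quoted above follow directly from the reported counts. The short Python sketch below simply redoes that arithmetic; the variable and function names are illustrative and are not taken from the study's own R code.

```python
# Missing-value counts reported for the retrospective cohort (n = 1694).
N_PATIENTS = 1694
missing_counts = {"TBIL": 113, "TBA": 292, "hsCRP": 332}


def missing_percent(n_missing, n_total):
    """Return the percentage of records with a missing value."""
    return 100.0 * n_missing / n_total


# Percentage of missing values per laboratory indicator, rounded to 2 decimals.
missing_pct = {
    name: round(missing_percent(count, N_PATIENTS), 2)
    for name, count in missing_counts.items()
}

for name, pct in missing_pct.items():
    print(f"{name}: {pct:.2f}% missing")
```

All three proportions fall below the 20% limit that the Statistical analysis section uses as the condition for imputation.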
Model construction and evaluation: The retrospective dataset was randomly divided into a training set and a validation set in a 7:3 ratio. The training set was used to develop the model, allowing it to learn data patterns and extract effective features, while the validation set was used to evaluate the model's performance and detect overfitting. Additionally, a prospective cohort was used as a test set to assess the model's generalizability. Univariate and multivariate logistic regression (LR) analyses were used for feature variable screening in the training set. First, univariate regression analysis was performed on each feature variable independently, and variables with a P value < 0.05 were further analyzed using multivariate regression. The P values of all selected variables were subjected to false discovery rate (FDR) correction to reduce false positives. After multivariate analysis, risk factors with a P value < 0.05 were identified as final feature variables. Considering the potential interactions between variables, and to improve the reliability of the model, we further conducted collinearity and correlation analyses. Five ML algorithms, namely LR, Decision Trees (DT), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost), were employed to construct the predictive models. The performance of the models was assessed using receiver operating characteristic (ROC) curve analysis and the area under the ROC curve (AUC). Sensitivity, specificity, accuracy, precision, and F1 scores were also calculated to further compare model performance. Decision curve analysis (DCA) was performed to evaluate the clinical utility of the models. Finally, an interactive and visual web-based calculator was developed using the Shiny framework.
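The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment and applies it to made-up P values; neither the function name nor the numbers come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical univariate P values (assumption, for illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(example_p)

for raw, adj in zip(example_p, adjusted_p):
    print(f"raw P = {raw:.3f} -> FDR-adjusted P = {adj:.5f}")
```

In this toy input, two variables with raw P < 0.05 (0.039 and 0.041) rise just above 0.05 after adjustment, which is the kind of borderline false positive the correction is meant to screen out.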
Model feature interpretation: SHapley Additive exPlanations (SHAP) analysis was used to interpret the best-performing black-box model[23]. Feature importance was determined by the mean absolute SHAP value for each feature, and SHAP values for each feature in all samples were plotted to understand overall patterns and impact across the dataset.
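The ranking rule described here (mean absolute SHAP value per feature) can be sketched in a few lines of Python. The SHAP matrix below is invented for illustration and the function name is hypothetical; in the study the SHAP values come from the fitted XGBoost model.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across samples.

    shap_rows: one list of per-feature SHAP values for each sample.
    Returns (feature, importance) pairs, most important first.
    """
    n_samples = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    importance = [(name, totals[j] / n_samples)
                  for j, name in enumerate(feature_names)]
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Made-up SHAP values for three samples and three features (assumption).
features = ["age", "cigarette preference", "diarrhea"]
toy_shap = [
    [0.2, -0.5, 0.1],
    [-0.4, 0.6, 0.0],
    [0.3, 0.7, -0.1],
]
ranking = rank_by_mean_abs_shap(toy_shap, features)

for name, score in ranking:
    print(f"{name}: mean |SHAP| = {score:.3f}")
```

Because absolute values are averaged, a feature that pushes predictions strongly in either direction ranks high; the sign of the effect is what the per-sample plot is used to read off.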
Statistical analysis
All data analyses and graphical plots for model construction were completed using R v4.3.2. To minimize bias caused by sample exclusion, the percentage of missing values was calculated for each continuous variable. For variables with less than 20% missing data, multiple imputation based on RFs was applied to predict the missing values using the R package “mice”. Five imputed results were generated, and the average of these five predictions was used as the final value. The DT, RF, SVM, and XGBoost models were constructed using the R packages “rpart”, “randomForest”, “e1071”, and “xgboost”, respectively. Statistical analysis was performed using the R package “tableone”. Categorical data were presented as numbers and percentages, while normally distributed continuous variables were expressed as mean ± SD. Non-normally distributed continuous variables were expressed as median and interquartile range. To compare differences between groups, the χ2 test was used for categorical variables, while the t-test or Wilcoxon rank-sum test was employed for continuous variables, depending on normality. P < 0.05 was considered statistically significant.
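The averaging step for the five imputed results can be pictured with the following Python sketch. It assumes the five completed datasets are already available (in the study they are produced by “mice”); the function name and the values are hypothetical.

```python
def average_imputations(observed, imputed_sets):
    """Fill each missing entry with the mean of its imputed values.

    observed: list of values with None marking a missing entry.
    imputed_sets: list of completed lists, one per imputation.
    Observed (non-missing) entries are returned unchanged.
    """
    completed = []
    for i, value in enumerate(observed):
        if value is None:
            draws = [dataset[i] for dataset in imputed_sets]
            completed.append(sum(draws) / len(draws))
        else:
            completed.append(value)
    return completed


# Toy TBIL column with one missing value and five imputed versions (assumption).
tbil_observed = [11.9, None, 9.3]
tbil_imputed = [
    [11.9, 10.0, 9.3],
    [11.9, 12.0, 9.3],
    [11.9, 11.0, 9.3],
    [11.9, 13.0, 9.3],
    [11.9, 14.0, 9.3],
]
tbil_final = average_imputations(tbil_observed, tbil_imputed)
print(tbil_final)
```

This mirrors the single averaged dataset described in the text; it does not pool model estimates across imputations.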
RESULTS
Baseline characteristics
Based on the inclusion and exclusion criteria, a total of 1694 patients were included in the retrospective study, with 742 patients (43.8%) in the non-recurrence group and 952 patients (56.2%) in the recurrence group. Among the patients, 60.6% were males and 39.4% were females. A total of 1071 non-older patients (< 60 years) and 623 older patients (≥ 60 years) were included. Patients were randomly divided into a training set (n = 1186) and a validation set (n = 508) in a 7:3 ratio. The median age of patients in the training set was 56.00 (49.00, 63.00), and the median age of patients in the validation set was 56.50 (49.00, 64.00). Comparison of baseline characteristics between the two groups showed no significant differences for most variables, except for hypertension (P = 0.018) and concomitant gastric polyps (P = 0.022), suggesting that the clinical data of the two cohorts were overall balanced. Additionally, 166 patients were selected as a prospective cohort to validate the model. Due to the limitation of sample size, there were statistically significant differences between the prospective validation set and the training set in terms of variables such as polyp location (P = 0.023), size (P < 0.001), endoscopic classification (P < 0.001), hazard classification (P = 0.002), concomitant gastric polyp (P = 0.001), and TBIL (P = 0.001). The demographic and clinical characteristics of the patients are provided in Table 1. The study design is illustrated in Figure 1.
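The training and validation sizes follow from applying the 7:3 ratio to the 1694 patients. A minimal Python sketch of such a split is given below; the seed and the function name are arbitrary assumptions, and the study itself performed the split in R.

```python
import random


def split_indices(n_total, train_fraction, seed):
    """Shuffle 0..n_total-1 and cut it into training and validation index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = round(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


# Cohort size and ratio are from the text; the seed is an arbitrary assumption.
train_idx, valid_idx = split_indices(1694, 0.7, seed=2024)
print(len(train_idx), len(valid_idx))
```

Rounding 1694 × 0.7 gives 1186 training patients and leaves 508 for validation, matching the reported set sizes.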
Figure 1 Flowchart of study design route.
EMR: Endoscopic mucosal resection; LR: Logistic Regression; DT: Decision Trees; RF: Random Forest; SVM: Support Vector Machine; XGBoost: EXtreme Gradient Boosting; ROC: Receiver operating characteristic; DCA: Decision curve analysis; SHAP: SHapley Additive exPlanations.
Table 1 Baseline demographic and clinicopathological characteristics of all patients, n (%).
| Variables | Training set (n = 1186) | Validation set (n = 508) | Prospective set (n = 166) | P value |
| --- | --- | --- | --- | --- |
| Gender | | | | |
| Female | 474 (40.0) | 194 (38.2) | 67 (40.4) | 0.769 |
| Male | 712 (60.0) | 314 (61.8) | 99 (59.6) | |
| Age, median (IQR) | 56.00 (49.00, 63.00) | 56.50 (49.00, 64.00) | 56.00 (50.00, 63.00) | 0.948 |
| BMI, median (IQR) | 24.20 (22.31, 26.40) | 24.32 (22.49, 26.47) | 24.22 (22.02, 26.67) | 0.445 |
| Hypertension | | | | |
| No | 924 (77.9) | 368 (72.4) | 128 (77.1) | 0.051 |
| Yes | 262 (22.1) | 140 (27.6) | 38 (22.9) | |
| Diabetes | | | | |
| No | 1057 (89.1) | 448 (88.2) | 143 (86.1) | 0.497 |
| Yes | 129 (10.9) | 60 (11.8) | 23 (13.9) | |
| CHD | | | | |
| No | 1117 (94.2) | 468 (92.1) | 153 (92.2) | 0.231 |
| Yes | 69 (5.8) | 40 (7.9) | 13 (7.8) | |
| Family history | | | | |
| No | 1088 (91.7) | 479 (94.3) | 149 (89.8) | 0.089 |
| Yes | 98 (8.3) | 29 (5.7) | 17 (10.2) | |
| Cigarette preference | | | | |
| No | 972 (82.0) | 413 (81.3) | 137 (82.5) | 0.921 |
| Yes | 214 (18.0) | 95 (18.7) | 29 (17.5) | |
| Alcohol preference | | | | |
| No | 971 (81.9) | 422 (83.1) | 133 (80.1) | 0.669 |
| Yes | 215 (18.1) | 86 (16.9) | 33 (19.9) | |
| Constipation | | | | |
| No | 1086 (91.6) | 470 (92.5) | 146 (88.0) | 0.185 |
| Yes | 100 (8.4) | 38 (7.5) | 20 (12.0) | |
| Diarrhea | | | | |
| No | 918 (77.4) | 399 (78.5) | 128 (77.1) | 0.860 |
| Yes | 268 (22.6) | 109 (21.5) | 38 (22.9) | |
| Hematochezia | | | | |
| No | 1112 (93.8) | 479 (94.3) | 160 (96.4) | 0.397 |
| Yes | 74 (6.2) | 29 (5.7) | 6 (3.6) | |
| Anatomical location | | | | |
| Proximal colon | 261 (22.0) | 117 (23.0) | 54 (32.5) | 0.023 |
| Distal colon | 505 (42.6) | 208 (40.9) | 69 (41.6) | |
| Total colon | 420 (35.4) | 183 (36.0) | 43 (25.9) | |
| Number of polyps | | | | |
| < 3 | 672 (56.7) | 276 (54.3) | 106 (63.9) | 0.099 |
| ≥ 3 | 514 (43.3) | 232 (45.7) | 60 (36.1) | |
| Number of adenomas | | | | |
| 0 | 392 (33.1) | 150 (29.5) | 57 (34.3) | 0.542 |
| 1-2 | 545 (46.0) | 237 (46.7) | 74 (44.6) | |
| ≥ 3 | 249 (21.0) | 121 (23.8) | 35 (21.1) | |
| Size (cm) | | | | |
| < 0.5 | 400 (33.7) | 157 (30.9) | 118 (71.1) | < 0.001 |
| 0.5-1 | 627 (52.9) | 266 (52.4) | 37 (22.3) | |
| > 1 | 159 (13.4) | 85 (16.7) | 11 (6.6) | |
| Endoscopic classification | | | | |
| I | 512 (43.2) | 201 (39.6) | 99 (59.6) | < 0.001 |
| II | 442 (37.3) | 191 (37.6) | 54 (32.5) | |
| III-IV | 232 (19.6) | 116 (22.8) | 13 (7.8) | |
| Hazard classification | | | | |
| Non-neoplastic polyps | 394 (33.2) | 152 (29.9) | 72 (43.4) | 0.002 |
| Non-progressive adenoma | 564 (47.6) | 250 (49.2) | 79 (47.6) | |
| Progressive adenoma | 228 (19.2) | 106 (20.9) | 15 (9.0) | |
| Concomitant gastric polyp | | | | |
| No | 761 (64.2) | 356 (70.1) | 127 (76.5) | 0.001 |
| Yes | 425 (35.8) | 152 (29.9) | 39 (23.5) | |
| H. pylori | | | | |
| No | 699 (58.9) | 308 (60.6) | 104 (62.7) | 0.586 |
| Yes | 487 (41.1) | 200 (39.4) | 62 (37.3) | |
| Hyperlipidemia | | | | |
| No | 810 (68.3) | 341 (67.1) | 117 (70.5) | 0.714 |
| Yes | 376 (31.7) | 167 (32.9) | 49 (29.5) | |
| Uric acid levels | | | | |
| Normal | 1105 (93.2) | 475 (93.5) | 153 (92.2) | 0.839 |
| Elevated | 81 (6.8) | 33 (6.5) | 13 (7.8) | |
| TBIL, median (IQR) | 11.90 (9.30, 15.78) | 12.00 (9.30, 15.62) | 9.90 (5.90, 13.97) | < 0.001 |
| TBA, median (IQR) | 3.00 (1.80, 4.68) | 3.30 (1.90, 4.93) | 2.90 (1.80, 5.60) | 0.149 |
| hsCRP, median (IQR) | 0.60 (0.50, 1.30) | 0.60 (0.50, 1.30) | 0.50 (0.50, 1.40) | 0.775 |
| CEA | | | | |
| Normal | 1150 (97.0) | 496 (97.6) | 161 (97.0) | 0.741 |
| Elevated | 36 (3.0) | 12 (2.4) | 5 (3.0) | |
| CA724 | | | | |
| Normal | 1137 (95.9) | 487 (95.9) | 155 (93.4) | 0.323 |
| Elevated | 49 (4.1) | 21 (4.1) | 11 (6.6) | |
| CA199 | | | | |
| Normal | 1160 (97.8) | 499 (98.2) | 163 (98.2) | 0.833 |
| Elevated | 26 (2.2) | 9 (1.8) | 3 (1.8) | |
| CA242 | | | | |
| Normal | 1154 (97.3) | 491 (96.7) | 159 (95.8) | 0.492 |
| Elevated | 32 (2.7) | 17 (3.3) | 7 (4.2) | |
Feature screening of risk factors for colorectal polyp recurrence
Univariate analysis showed that 17 variables were associated with polyp recurrence one year after EMR (P < 0.05), including sex, age, BMI, hypertension, CHD, cigarette preference, family history, diarrhea, polyp location, polyp number, polyp size, number of adenomas, endoscopic classification, hazard classification, H. pylori infection, hyperlipidemia, and serum UA levels. The P values of these variables were subjected to the FDR correction (Supplementary Table 2). Entering these variables into the multivariate analysis revealed that eight variables were independent predictors of colorectal polyp recurrence one year after EMR. These variables included age [odds ratio (OR) = 1.05; 95%CI: 1.03-1.06], family history (none as reference, present: OR = 11.34; 95%CI: 5.09-25.26), cigarette preference (none as reference, present: OR = 3.92; 95%CI: 2.50-6.14), diarrhea (none as reference, present: OR = 1.42; 95%CI: 1.02-1.99), polyp size (< 0.5 cm as reference, 0.5-1 cm: OR = 2.05; 95%CI: 1.50-2.80; > 1 cm: OR = 3.98; 95%CI: 2.02-7.86), number of polyps (< 3 as reference, ≥ 3: OR = 1.54; 95%CI: 1.07-2.21), H. pylori infection (none as reference, present: OR = 1.82; 95%CI: 1.37-2.42), and hazard classification (non-neoplastic polyps as reference, non-progressive adenoma: OR = 4.48; 95%CI: 1.02-19.73; progressive adenoma: OR = 5.29; 95%CI: 1.07-26.20). Details can be found in Table 2. We calculated the variance inflation factor (VIF) for all candidate features. All variables had VIF values < 5, confirming that no severe multicollinearity existed among the variables (Supplementary Table 3). We also examined the correlation coefficients. All correlation coefficients (r) were < 0.7, suggesting that there is no strong correlation between variables (Supplementary Figure 2).
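An odds ratio and its Wald 95%CI are the exponentials of a logistic regression coefficient and of that coefficient plus or minus 1.96 standard errors. The Python sketch below illustrates this relationship by recovering the coefficient and standard error implied by the reported family-history estimate and rebuilding the interval. The regression output itself is not given in the text, so the recovered values are back-calculations, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of the log odds ratio implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported estimate for family history: OR = 11.34, 95%CI 5.09-25.26.
beta_family = math.log(11.34)            # implied coefficient (back-calculated)
se_family = se_from_ci(5.09, 25.26)      # implied standard error (back-calculated)
or_family, ci_low, ci_high = odds_ratio_ci(beta_family, se_family)

print(f"beta = {beta_family:.3f}, SE = {se_family:.3f}")
print(f"OR = {or_family:.2f}, 95%CI: {ci_low:.2f}-{ci_high:.2f}")
```

The rebuilt interval agrees with the reported one to two decimals, which is what one expects when the interval is symmetric on the log scale.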
Table 2 Univariate and multivariate logistic regression analysis of colorectal polyp recurrence 1 year after endoscopic mucosal resection.
Construction and validation of the ML prediction model
The eight feature variables mentioned above were incorporated into the model construction. In the training set, the ROC-AUC values of the prediction models built using five ML algorithms are shown in Figure 2A: LR (AUC = 0.803, 95%CI: 0.779-0.828), DT (AUC = 0.754, 95%CI: 0.728-0.781), RF (AUC = 0.861, 95%CI: 0.84-0.881), SVM (AUC = 0.808, 95%CI: 0.784-0.832), and XGBoost (AUC = 0.909, 95%CI: 0.893-0.925). The models were validated in the validation set, with the results displayed in Figure 2B. The AUC values for the five ML models were as follows: LR (AUC = 0.81, 95%CI: 0.774-0.847), DT (AUC = 0.799, 95%CI: 0.761-0.837), RF (AUC = 0.902, 95%CI: 0.877-0.928), SVM (AUC = 0.819, 95%CI: 0.784-0.855), and XGBoost (AUC = 0.921, 95%CI: 0.898-0.944). In the prospective set (Figure 2C), the AUC values for each model were: LR (AUC = 0.779, 95%CI: 0.708-0.851), DT (AUC = 0.812, 95%CI: 0.746-0.877), RF (AUC = 0.943, 95%CI: 0.912-0.974), SVM (AUC = 0.791, 95%CI: 0.722-0.86), and XGBoost (AUC = 0.963, 95%CI: 0.938-0.988). Among the five ML algorithms, the XGBoost model performed the best.
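For readers less familiar with the AUC, it equals the probability that a randomly chosen recurrence case receives a higher predicted risk than a randomly chosen non-recurrence case. A minimal Python sketch of that pairwise definition follows; the labels and scores are invented, and the study computed its AUCs in R.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (assumption): 1 = recurrence, 0 = no recurrence.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

Here five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6.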
Figure 2 Receiver operating characteristic curves of different models across various datasets.
A: Training set; B: Validation set; C: Prospective set. LR: Logistic Regression; DT: Decision Trees; RF: Random Forest; SVM: Support Vector Machine; XGBoost: EXtreme Gradient Boosting; AUC: Area under the curve.
Evaluation of the ML prediction models
To further assess model performance, we calculated the sensitivity, specificity, accuracy, precision, and F1-score for each model based on the confusion matrix results. These metrics, combined with the AUC values, formed a comprehensive evaluation system (Table 3). A comparative analysis indicated that the XGBoost model had the best overall performance.
Table 3 Comparison of the performance of different models in training set, validation set and prospective set.
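| Dataset | Model | AUC | Sensitivity | Specificity | Accuracy | Precision | F1 score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Training set | LR | 0.803 | 0.733 | 0.728 | 0.731 | 0.774 | 0.753 |
| Training set | DT | 0.754 | 0.806 | 0.613 | 0.721 | 0.726 | 0.764 |
| Training set | RF | 0.861 | 0.727 | 0.835 | 0.775 | 0.849 | 0.784 |
| Training set | SVM | 0.808 | 0.720 | 0.753 | 0.734 | 0.788 | 0.752 |
| Training set | XGB | 0.909 | 0.756 | 0.904 | 0.820 | 0.907 | 0.824 |
| Validation set | LR | 0.809 | 0.743 | 0.686 | 0.719 | 0.756 | 0.750 |
| Validation set | DT | 0.799 | 0.785 | 0.723 | 0.758 | 0.788 | 0.786 |
| Validation set | RF | 0.902 | 0.750 | 0.918 | 0.823 | 0.923 | 0.828 |
| Validation set | SVM | 0.819 | 0.743 | 0.705 | 0.726 | 0.767 | 0.755 |
| Validation set | XGB | 0.921 | 0.788 | 0.914 | 0.843 | 0.923 | 0.850 |
| Prospective set | LR | 0.779 | 0.568 | 0.847 | 0.711 | 0.780 | 0.657 |
| Prospective set | DT | 0.812 | 0.765 | 0.729 | 0.747 | 0.765 | 0.747 |
| Prospective set | RF | 0.943 | 0.667 | 0.988 | 0.831 | 0.982 | 0.794 |
| Prospective set | SVM | 0.791 | 0.617 | 0.824 | 0.723 | 0.769 | 0.685 |
| Prospective set | XGB | 0.963 | 0.840 | 0.941 | 0.892 | 0.932 | 0.883 |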
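The threshold-based metrics in Table 3 all derive from the four cells of a confusion matrix. The Python sketch below shows the formulas. The cell counts are not reported in the text; the integers used here are an assumption, chosen because they sum to the 166 prospective patients and reproduce the rounded XGBoost prospective-set row.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, precision, and F1 from counts."""
    sensitivity = tp / (tp + fn)          # recall among true recurrences
    specificity = tn / (tn + fp)          # recall among non-recurrences
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Back-calculated counts (assumption, not reported in the article).
xgb_prospective = classification_metrics(tp=68, fn=13, tn=80, fp=5)

for metric, value in xgb_prospective.items():
    print(f"{metric}: {value:.3f}")
```

With these counts the five values round to 0.840, 0.941, 0.892, 0.932, and 0.883, the same figures listed for XGB in the prospective set.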
The clinical applicability of the XGBoost model was assessed using DCA. This method integrates patients’ benefits and physicians’ preferences to assist doctors in making optimal clinical decisions tailored to patient needs[24,25]. The X-axis represents threshold probability, and the Y-axis represents net benefit. The results (Figure 3) indicated that decisions made using this model yield a greater net clinical benefit compared to the "review all" or "review none" strategies, suggesting that the model has strong clinical applicability. The "review all" strategy, which involves examining every patient regardless of their risk, often leads to high costs and unnecessary interventions. Conversely, the "review none" strategy, which avoids any interventions, risks overlooking patients who could benefit from treatment.
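The net benefit plotted on the Y-axis of a decision curve is, in its standard form, the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold probability. The Python sketch below evaluates that formula for a model and for the "review all" and "review none" strategies at a single threshold. The counts are illustrative assumptions, not values reported in the article.

```python
def net_benefit(tp, fp, n_total, threshold):
    """Net benefit = TP/n - FP/n * threshold / (1 - threshold)."""
    weight = threshold / (1.0 - threshold)
    return tp / n_total - (fp / n_total) * weight


# Illustrative cohort (assumption): 166 patients, 81 with recurrence.
N_TOTAL = 166
N_RECURRENCE = 81
THRESHOLD = 0.5

# Model flags 73 patients for review: 68 true positives, 5 false positives.
nb_model = net_benefit(tp=68, fp=5, n_total=N_TOTAL, threshold=THRESHOLD)
# "Review all": every recurrence is caught, every non-recurrence is a false positive.
nb_all = net_benefit(tp=N_RECURRENCE, fp=N_TOTAL - N_RECURRENCE,
                     n_total=N_TOTAL, threshold=THRESHOLD)
# "Review none": nobody is flagged, so the net benefit is zero by definition.
nb_none = 0.0

print(f"model: {nb_model:.4f}, review all: {nb_all:.4f}, review none: {nb_none:.4f}")
```

A full decision curve repeats this calculation over a range of threshold probabilities, reclassifying patients at each threshold before counting true and false positives.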
Figure 3 Decision curves of the eXtreme Gradient Boosting model across various datasets.
A: Training set; B: Validation set; C: Prospective set. XGBoost: EXtreme Gradient Boosting.
Interpretability analysis of the XGBoost model
We used SHAP analysis to quantify each feature's contribution to the model, enhancing interpretability. As shown in Figure 4A, the importance ranking of the feature variables in the XGBoost model, from highest to lowest, is as follows: Cigarette preference, family history, age, number of polyps ≥ 3, progressive adenoma, diarrhea, H. pylori infection, polyp size > 1 cm, non-progressive adenoma, and polyp size 0.5-1 cm. Additionally, based on each sample's SHAP values, we used SHAP beeswarm plots to visualize the contribution of each sample to the prediction results (Figure 4B).
Figure 4 SHapley Additive exPlanations analysis of the XGBoost model.
A: SHapley Additive exPlanations (SHAP) summary bar plot, where features are ranked in descending order according to the mean absolute SHAP value; B: SHAP beeswarm plot, displaying the SHAP value of each feature for every sample in the dataset. Each row represents a feature, and each dot corresponds to a sample. The color of the dots indicates feature values, with yellow representing high values and purple representing low values. SHAP: SHapley Additive exPlanations.
Construction of the online web calculator
Based on the results of the XGBoost model, an online web calculator was developed (https://webcalculatorsyh.shinyapps.io/XGBoost/). By adjusting the values or conditions of a patient's clinical characteristics and clicking the "Predict" button, the calculator automatically estimates the one-year recurrence risk of colorectal polyps after EMR. For example, as shown in Figure 5, a 70-year-old patient with a smoking habit and family history, polyps > 1 cm in size, more than 3 polyps, and the highest-grade polyp pathology classified as advanced adenoma, has a 94.04% probability of colorectal polyp recurrence 1 year after EMR. The physician should strongly recommend that this patient undergo a follow-up colonoscopy 1 year after the procedure.
Figure 5 Online web calculator for predicting colorectal polyp recurrence 1 year after endoscopic mucosal resection.
EMR: Endoscopic mucosal resection; XGBoost: EXtreme Gradient Boosting; CRC: Colorectal cancer.
DISCUSSION
This is a multicenter retrospective study with prospective validation. A clinical prediction model was constructed using ML methods to evaluate the risk of colorectal polyp recurrence one year after EMR. Eight key features were selected for model development and validation, with the XGBoost model demonstrating the best predictive performance. SHAP analysis was employed to calculate the contribution of each feature to the model, and clinical applicability was assessed by DCA. Finally, an online web calculator based on the XGBoost model was developed to assist clinicians in formulating individualized colonoscopy surveillance plans based on each patient's polyp recurrence risk.
From a demographic perspective, this study identified age, family history, and cigarette preference as independent predictors of colorectal polyp recurrence. Previous studies have suggested that as patients age, the incidence, recurrence, and malignancy rates of colorectal polyps gradually increase, which may be attributed to higher rates of genetic mutations, declining immune function, and chronic inflammation in the intestines of older individuals[26,27]. A previous study[28] also reported that smoking increases the risk of adenoma recurrence after polypectomy. Davenport et al[29] found that smoking status, duration, and intensity are associated with an increased risk of different types of polyps, including sessile serrated polyps, conventional adenomas, and hyperplastic polyps. This may be due to the multi-directional and multi-site harmful effects of cigarette components, which cause irreversible damage to the genetic material of colorectal cells. For example, smoking can lead to abnormal methylation of CpG islands and mutations in genes such as c-MYC, KRAS, and BRAF, promoting the malignant transformation of colorectal cells[30]. We found that patients with a family history of CRC or polyps were more prone to polyp recurrence, highlighting the significant role of genetic susceptibility in the development of colorectal polyps and tumors[31]. A large-scale nationwide case-control study from Sweden reported that the risk of polyp development increases by 40% in patients with first-degree relatives who have colorectal polyps, and that this risk rises with the number of affected first-degree relatives and with younger age at diagnosis[32]. Additionally, a study by Samadder et al[33] demonstrated that having a first-degree relative with CRC increases the risk of developing adenomatous polyps [hazard rate ratio (HRR), 1.82] and advanced villous adenomas (HRR, 2.43).
From clinical data, diarrhea, H. pylori infection, polyp size, polyp number, and hazard classification were identified as independent predictors of colorectal polyp recurrence. Our study found that patients with diarrhea have a higher risk of short-term recurrence after EMR, which may be related to gut microbiota disturbance. Gut microbiota disturbance plays a critical role in the adenoma-carcinoma sequence. Studies have shown that, compared to healthy individuals, patients with adenomatous polyps exhibit significantly higher abundances of microbial species such as Fusobacterium mortiferum, Fusobacterium nucleatum, Ruminococcus gnavus (R. gnavus), and Bacteroides fragilis[34,35]. Conversely, the levels of the genera Bifidobacterium, Faecalibacterium, and Blautia were found to be reduced in patients with adenomatous polyps[36]. Similar findings have been observed in patients with diarrhea-predominant irritable bowel syndrome, where pathogenic bacteria such as R. gnavus are increased[37], while beneficial bacteria like Bifidobacterium are reduced[38]. Our study also identified a higher recurrence risk in patients infected with H. pylori. Previous studies have established H. pylori as a causative agent of chronic gastritis, gastric polyps, and gastric cancer[39], but its relationship with colorectal polyps remains inconclusive. A meta-analysis by Lu et al[40] found that H. pylori infection is independently associated with adenomatous polyps, advanced adenomatous polyps, and hyperplastic polyps, suggesting that H. pylori infection is a risk factor for colorectal polyps. The potential mechanisms may include hypergastrinemia induced by H. pylori, which promotes the proliferation of the colorectal mucosa, and direct stimulation by H. pylori, leading to dysbiosis of the gut microbiota and the development of colorectal lesions[40,41]. Compared with non-neoplastic polyps, progressive adenomas exhibit a notably higher risk of recurrence. 
These polyps show more dysplasia and atypia histologically, with a higher proportion of villous components, making them more prone to progression[42]. Previous studies have also reported that patients with adenomas at their initial colonoscopy are more likely to experience adenoma recurrence[4,43]. Consistent with our findings, studies have shown that patients with multiple polyps (≥ 3) had an increased risk of recurrence[44]. This could be partly due to genetic factors in these patients and partly because multiple polyps are more difficult to completely remove during surgery, increasing the likelihood of missing polyps[45]. Additionally, polyp size is also a factor influencing recurrence[46], with our study showing that polyps larger than 1 cm in diameter had a higher recurrence risk, consistent with the findings of Murakami et al[47]. The study by Martínez et al[48] also confirmed the correlation between polyp number and size with recurrence, particularly showing that patients with five or more baseline adenomas or adenomas ≥ 2 cm in diameter had a higher risk of developing metachronous advanced adenomas. Although in this study, polyp location was not an independent predictor of recurrence, existing research has shown that both the left and right colon are associated with polyp recurrence[49,50]. However, these findings are inconsistent, and no definitive conclusions have been reached. Further original research is needed for validation in the future.
From the perspective of model construction, with the advancement of precision medicine, clinical prediction models serve as quantitative tools for assessing risk and benefit, providing personalized guidance for patients, physicians, and healthcare decision-makers. As a result, these models are increasingly applied in clinical practice[51]. In previous studies, several researchers have developed clinical prediction models aimed at evaluating the occurrence and recurrence risk of colorectal polyps. For instance, Huang et al[52] constructed a nomogram based on risk factors associated with colorectal polyps to predict the likelihood of polyp occurrence, with an AUC of 0.747 (95%CI: 0.692-0.801). He et al[53] developed a nomogram based on the neutrophil-to-lymphocyte ratio and fibrinogen-to-lymphocyte ratio to predict colorectal adenoma recurrence, with AUCs of 0.846 and 0.841 in the training and validation sets, respectively, demonstrating favorable clinical applicability. However, most of these models are based on traditional linear regression analysis; they are not ideal for handling nonlinear data and present several limitations in real-world clinical settings[54]. In contrast, ML algorithms offer unique advantages in processing high-dimensional data with multiple variables and features. Therefore, in this study, we selected ML to build a prediction model for post-EMR polyp recurrence. Among the five ML algorithms tested, XGBoost performed the best, with AUC values of 0.909 (95%CI: 0.89-0.92) and 0.921 (95%CI: 0.90-0.94) in the training and validation sets, respectively. Additionally, we validated the model using a prospective cohort, where the XGBoost model again showed the best performance, with an AUC of 0.963 (95%CI: 0.94-0.99), along with relatively high sensitivity and specificity.
Despite the higher computational complexity and the trade-off in relative interpretability compared to the LR model, XGBoost was selected in our study due to its superior predictive performance and its ability to handle complex feature interactions. To facilitate individualized decision-making by physicians, we transformed the results of the XGBoost model into an online web calculator. By inputting patient-specific characteristics and clinical indicators, the tool quickly provides a recurrence risk percentage, helping clinicians develop tailored follow-up strategies.
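For readers less familiar with the discrimination metrics quoted above, the following is a minimal sketch of how an AUC, sensitivity, and specificity can be computed from predicted risks and observed outcomes. The labels, scores, and the 0.5 cut-off are made-up toy values, not study data, and the pairwise formulation shown here is one standard way to define the AUC rather than a description of the software used in the study.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 1 = recurrence, 0 = no recurrence (hypothetical values)
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]

print(auc(labels, scores))                           # 0.9375
print(sensitivity_specificity(labels, scores, 0.5))  # (0.75, 1.0)
```

The AUC does not depend on any cut-off, whereas sensitivity and specificity change with the chosen threshold, which is why both kinds of metric are usually reported together.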
From the perspective of model interpretability, the lack of transparency in ML, often referred to as its "black box" nature, has been a major challenge hindering its broader clinical adoption. This is because the decision boundaries in ML models are complex and often lack the transparency necessary for researchers to fully understand how the model extracts information from the data and makes decisions[55]. To address this issue, we introduced SHAP values to help explain the model's decision-making logic and prediction process. The core concept of SHAP is to compute the marginal contribution of each feature to the model's output for each sample, offering insights into both global and local explanations of the "black box" model[56]. The key advantage of this approach is that it visually demonstrates the actual contribution of each feature to the model's decision-making process and whether that contribution has a positive or negative impact[57]. In this study, the SHAP bar plot visually demonstrated the importance ranking of the eight influencing factors in the model, with cigarette preference, family history, and age being the top three contributors. This helps us understand the contribution of each feature to the prediction results on a global level. The SHAP beeswarm plot offers a more detailed depiction. Each point on the plot represents a sample; its horizontal position gives the SHAP value of the feature for that sample, and its color indicates the feature value (yellow for high values and purple for low values). This visualization not only highlights the relative importance of each feature vertically but also illustrates how each feature impacts the prediction across all samples, enabling a horizontal comparison of their influence. Additionally, we used DCA to evaluate the clinical applicability of the model. The DCA results demonstrate that the XGBoost model strikes a balance between "review all" and "review none" strategies.
By predicting patient outcomes with high accuracy, the model enables doctors to identify those who are most likely to benefit from interventions, thereby optimizing resource allocation and patient care. This translates into a significant net clinical benefit, reflecting both the improved health outcomes for patients and the cost-effectiveness of the interventions. For example, in the study by Tong et al[58], within the 0%-30% threshold range, the decision to delay extubation after thoracoscopic lung cancer surgery was clearly beneficial. By comparing threshold probabilities with net benefits, we found that the model demonstrated favorable net benefits, which can help guide clinicians in selecting the optimal clinical strategy.
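The comparison of threshold probabilities with net benefits can be sketched as follows, using the standard decision curve definition in which net benefit equals the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold probability. The outcome labels and predicted risks below are hypothetical toy values, and the function names are introduced only for this example; the "review all" line corresponds to re-examining every patient, and "review none" has a net benefit of zero by definition.

```python
def net_benefit_model(labels, scores, pt):
    """Net benefit of reviewing patients whose predicted risk is >= pt."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_review_all(labels, pt):
    """Net benefit of reviewing every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


# Toy data: 1 = recurrence, 0 = no recurrence (hypothetical values)
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]

# One point of the decision curve per threshold probability
for pt in (0.1, 0.3, 0.5):
    print(pt,
          round(net_benefit_model(labels, scores, pt), 4),
          round(net_benefit_review_all(labels, pt), 4),
          0.0)  # "review none"
```

A model is considered clinically useful over the range of threshold probabilities where its curve lies above both reference strategies.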
Inevitably, this study has several limitations. First, it is unusual to observe that the model performs better in the prospective validation set than in the training set. This may be due to the small number of samples in the prospective validation set, the similarity of some features to those in the training set, and the characteristics of the model itself. Thus, additional prospective data are needed to validate the model and ensure its generalizability and external applicability. Second, despite collecting comprehensive clinicopathological data based on current research, we were unable to include all potential risk factors that might influence polyp recurrence. For example, individual dietary preferences, lifestyle factors, nonsteroidal anti-inflammatory drug use, and biliary diseases could also be related to the occurrence of colorectal polyps. However, due to the difficulty in quantifying these factors and the lack of standardized assessment criteria, they were not included in our analysis.
CONCLUSION
This study is the first to develop ML models to predict the recurrence of colorectal polyps one year after EMR and to identify associated risk factors. Among these models, the XGBoost model performed the best. Additionally, we developed an online web calculator based on the XGBoost predictions, which can help clinicians quickly calculate a patient's recurrence risk, facilitating the joint development of appropriate and accurate follow-up plans between clinicians and patients.
ACKNOWLEDGEMENTS
We sincerely thank all the staff and collaborators who participated in this study. In addition, we acknowledge the contributions of editors and reviewers for their constructive feedback and suggestions on earlier versions of this manuscript.
Footnotes
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Peer-review model: Single blind
Corresponding Author's Membership in Professional Societies: Member of the Ninth Internal Medicine Branch Committee of Jiangsu Medical Association; Member of Psychosomatic Disease Collaboration Group, Chinese Society of Gastroenterology; Chairman of the Digestive and Psychosomatic Committee of Jiangsu Province; President-designate, Branch of Physicians of Jiangsu Medical Association; Consultant of Gastrointestinal Motility Group of Jiangsu Gastroenterology Society.
Specialty type: Gastroenterology and hepatology
Country of origin: China
Peer-review report’s classification
Scientific Quality: Grade A, Grade B, Grade B, Grade C
Novelty: Grade B, Grade B, Grade B, Grade B
Creativity or Innovation: Grade B, Grade B, Grade B, Grade C
Scientific Significance: Grade A, Grade A, Grade B, Grade C
P-Reviewer: Dai Z, Hanada E, Ling YW; S-Editor: Li L; L-Editor: A; P-Editor: Wang WB
Sung JJY, Chiu HM, Lieberman D, Kuipers EJ, Rutter MD, Macrae F, Yeoh KG, Ang TL, Chong VH, John S, Li J, Wu K, Ng SSM, Makharia GK, Abdullah M, Kobayashi N, Sekiguchi M, Byeon JS, Kim HS, Parry S, Cabral-Prodigalidad PAI, Wu DC, Khomvilai S, Lui RN, Wong S, Lin YM, Dekker E. Third Asia-Pacific consensus recommendations on colorectal cancer screening and postpolypectomy surveillance. Gut. 2022;71:2152-2166.
Ferlitsch M, Hassan C, Bisschops R, Bhandari P, Dinis-Ribeiro M, Risio M, Paspatis GA, Moss A, Libânio D, Lorenzo-Zúñiga V, Voiosu AM, Rutter MD, Pellisé M, Moons LMG, Probst A, Awadie H, Amato A, Takeuchi Y, Repici A, Rahmi G, Koecklin HU, Albéniz E, Rockenbauer LM, Waldmann E, Messmann H, Triantafyllou K, Jover R, Gralnek IM, Dekker E, Bourke MJ. Colorectal polypectomy and endoscopic mucosal resection: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2024. Endoscopy. 2024;56:516-545.
Gupta S, Lieberman D, Anderson JC, Burke CA, Dominitz JA, Kaltenbach T, Robertson DJ, Shaukat A, Syngal S, Rex DK. Recommendations for Follow-Up After Colonoscopy and Polypectomy: A Consensus Update by the US Multi-Society Task Force on Colorectal Cancer. Am J Gastroenterol. 2020;115:415-434.
Hassan C, Antonelli G, Dumonceau JM, Regula J, Bretthauer M, Chaussade S, Dekker E, Ferlitsch M, Gimeno-Garcia A, Jover R, Kalager M, Pellisé M, Pox C, Ricciardiello L, Rutter M, Helsingen LM, Bleijenberg A, Senore C, van Hooft JE, Dinis-Ribeiro M, Quintero E. Post-polypectomy colonoscopy surveillance: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2020. Endoscopy. 2020;52:687-700.
Rutter MD, East J, Rees CJ, Cripps N, Docherty J, Dolwani S, Kaye PV, Monahan KJ, Novelli MR, Plumb A, Saunders BP, Thomas-Gibson S, Tolan DJM, Whyte S, Bonnington S, Scope A, Wong R, Hibbert B, Marsh J, Moores B, Cross A, Sharp L. British Society of Gastroenterology/Association of Coloproctology of Great Britain and Ireland/Public Health England post-polypectomy and post-colorectal cancer resection surveillance guidelines. Gut. 2020;69:201-223.
Lee S, Do YS, Lee HJ, Kim GU, Park HW, Chang HS, Choe J, Byeon JS, Lee JY. Gastrointestinal: Weight gain increases the risk of metachronous advanced colorectal neoplasm observed in post-polypectomy surveillance colonoscopy. J Gastroenterol Hepatol. 2024;39:47-54.
Li ZS, Linghu EQ, Wang GQ, Bai Y; National Clinical Research Center for Digestive Diseases (Shanghai); Chinese Society of Digestive Endoscopology; Cancer Endoscopy Professional Committee of China Anti-Cancer Association; Digestive Endoscopy Professional Committee of Chinese Endoscopist Association; Endoscopic Health Management and Medical Examination Professional Committee of Chinese Endoscopist Association. [Expert consensus on management strategies for precancerous lesions and conditions of colorectal cancer in China]. Zhonghua Xiaohua Neijingzazhi. 2022;39:1-18.
Robinson-Weiss C, Patel J, Bizzo BC, Glazer DI, Bridge CP, Andriole KP, Dabiri B, Chin JK, Dreyer K, Kalpathy-Cramer J, Mayo-Smith WW. Machine Learning for Adrenal Gland Segmentation and Classification of Normal and Adrenal Masses at CT. Radiology. 2023;306:e220101.
Ren Y, Zhang Y, Zhan J, Sun J, Luo J, Liao W, Cheng X. Machine learning for prediction of delirium in patients with extensive burns after surgery. CNS Neurosci Ther. 2023;29:2986-2997.
Reid ME, Marshall JR, Roe D, Lebowitz M, Alberts D, Battacharyya AK, Martinez ME. Smoking exposure as a risk factor for prevalent and recurrent colorectal adenomas. Cancer Epidemiol Biomarkers Prev. 2003;12:1006-1011.
Davenport JR, Su T, Zhao Z, Coleman HG, Smalley WE, Ness RM, Zheng W, Shrubsole MJ. Modifiable lifestyle factors associated with risk of sessile serrated polyps, conventional adenomas and hyperplastic polyps. Gut. 2018;67:456-465.
Samadder NJ, Curtin K, Tuohy TM, Rowe KG, Mineau GP, Smith KR, Pimentel R, Wong J, Boucher K, Burt RW. Increased risk of colorectal neoplasia among family members of patients with colorectal cancer: a population-based study in Utah. Gastroenterology. 2014;147:814-821.e5; quiz e15.
Kordahi MC, Stanaway IB, Avril M, Chac D, Blanc MP, Ross B, Diener C, Jain S, McCleary P, Parker A, Friedman V, Huang J, Burke W, Gibbons SM, Willis AD, Darveau RP, Grady WM, Ko CW, DePaolo RW. Genomic and functional characterization of a mucosal symbiont involved in early-stage colorectal cancer. Cell Host Microbe. 2021;29:1589-1598.e6.
Zhai L, Huang C, Ning Z, Zhang Y, Zhuang M, Yang W, Wang X, Wang J, Zhang L, Xiao H, Zhao L, Asthana P, Lam YY, Chow CFW, Huang JD, Yuan S, Chan KM, Yuan CS, Lau JY, Wong HLX, Bian ZX. Ruminococcus gnavus plays a pathogenic role in diarrhea-predominant irritable bowel syndrome by increasing serotonin biosynthesis. Cell Host Microbe. 2023;31:33-44.e5.
Murakami T, Yoshida N, Yasuda R, Hirose R, Inoue K, Dohi O, Kamada K, Uchiyama K, Konishi H, Naito Y, Morinaga Y, Kishimoto M, Konishi E, Ogiso K, Inada Y, Itoh Y. Local recurrence and its risk factors after cold snare polypectomy of colorectal polyps. Surg Endosc. 2020;34:2918-2925.
Martínez ME, Baron JA, Lieberman DA, Schatzkin A, Lanza E, Winawer SJ, Zauber AG, Jiang R, Ahnen DJ, Bond JH, Church TR, Robertson DJ, Smith-Warner SA, Jacobs ET, Alberts DS, Greenberg ER. A pooled analysis of advanced colorectal neoplasia diagnoses after colonoscopic polypectomy. Gastroenterology. 2009;136:832-841.
Harrington LX, Wei JW, Suriawinata AA, Mackenzie TA, Hassanpour S. Predicting colorectal polyp recurrence using time-to-event analysis of medical records. AMIA Jt Summits Transl Sci Proc. 2020;2020:211-220.
Huang Y, Liu Y, Yin X, Zhang T, Hao Y, Zhang P, Yang Y, Gao Z, Liu S, Yu S, Li H, Wang G. Establishment of clinical predictive model based on the study of influence factors in patients with colorectal polyps. Front Surg. 2023;10:1077175.
He Q, Du S, Wang X, Liu J, Xu X, Liu W, Zhang J, Jiang K. Development and validation of a nomogram based on neutrophil-to-lymphocyte ratio and fibrinogen-to-lymphocyte ratio for predicting recurrence of colorectal adenoma. J Gastrointest Oncol. 2022;13:2269-2281.
Han Y, Wang S. Disability risk prediction model based on machine learning among Chinese healthy older adults: results from the China Health and Retirement Longitudinal Study. Front Public Health. 2023;11:1271595.