Copyright
©The Author(s) 2025.
World J Gastroenterol. Mar 21, 2025; 31(11): 102387
Published online Mar 21, 2025. doi: 10.3748/wjg.v31.i11.102387
Table 1 Baseline demographic and clinicopathological characteristics of all patients, n (%)
Variables | Training set (n = 1186) | Validation set (n = 508) | Prospective set (n = 166) | P value |
Gender | ||||
Female | 474 (40.0) | 194 (38.2) | 67 (40.4) | 0.769 |
Male | 712 (60.0) | 314 (61.8) | 99 (59.6) | |
Age, median (IQR) | 56.00 (49.00, 63.00) | 56.50 (49.00, 64.00) | 56.00 (50.00, 63.00) | 0.948 |
BMI, median (IQR) | 24.20 (22.31, 26.40) | 24.32 (22.49, 26.47) | 24.22 (22.02, 26.67) | 0.445 |
Hypertension | ||||
No | 924 (77.9) | 368 (72.4) | 128 (77.1) | 0.051 |
Yes | 262 (22.1) | 140 (27.6) | 38 (22.9) | |
Diabetes | ||||
No | 1057 (89.1) | 448 (88.2) | 143 (86.1) | 0.497 |
Yes | 129 (10.9) | 60 (11.8) | 23 (13.9) | |
CHD | ||||
No | 1117 (94.2) | 468 (92.1) | 153 (92.2) | 0.231 |
Yes | 69 (5.8) | 40 (7.9) | 13 (7.8) | |
Family history | ||||
No | 1088 (91.7) | 479 (94.3) | 149 (89.8) | 0.089 |
Yes | 98 (8.3) | 29 (5.7) | 17 (10.2) | |
Cigarette preference | ||||
No | 972 (82.0) | 413 (81.3) | 137 (82.5) | 0.921 |
Yes | 214 (18.0) | 95 (18.7) | 29 (17.5) | |
Alcohol preference | ||||
No | 971 (81.9) | 422 (83.1) | 133 (80.1) | 0.669 |
Yes | 215 (18.1) | 86 (16.9) | 33 (19.9) | |
Constipation | ||||
No | 1086 (91.6) | 470 (92.5) | 146 (88.0) | 0.185 |
Yes | 100 (8.4) | 38 (7.5) | 20 (12.0) | |
Diarrhea | ||||
No | 918 (77.4) | 399 (78.5) | 128 (77.1) | 0.860 |
Yes | 268 (22.6) | 109 (21.5) | 38 (22.9) | |
Hemafecia | ||||
No | 1112 (93.8) | 479 (94.3) | 160 (96.4) | 0.397 |
Yes | 74 (6.2) | 29 (5.7) | 6 (3.6) | |
Anatomical location | ||||
Proximal colon | 261 (22.0) | 117 (23.0) | 54 (32.5) | 0.023 |
Distal colon | 505 (42.6) | 208 (40.9) | 69 (41.6) | |
Total colon | 420 (35.4) | 183 (36.0) | 43 (25.9) | |
Number of polyps | ||||
< 3 | 672 (56.7) | 276 (54.3) | 106 (63.9) | 0.099 |
≥ 3 | 514 (43.3) | 232 (45.7) | 60 (36.1) | |
Number of adenomas | ||||
0 | 392 (33.1) | 150 (29.5) | 57 (34.3) | 0.542 |
1-2 | 545 (46.0) | 237 (46.7) | 74 (44.6) | |
≥ 3 | 249 (21.0) | 121 (23.8) | 35 (21.1) | |
Size (cm) | ||||
< 0.5 | 400 (33.7) | 157 (30.9) | 118 (71.1) | < 0.001 |
0.5-1 | 627 (52.9) | 266 (52.4) | 37 (22.3) | |
> 1 | 159 (13.4) | 85 (16.7) | 11 (6.6) | |
Endoscopic classification | ||||
I | 512 (43.2) | 201 (39.6) | 99 (59.6) | < 0.001 |
II | 442 (37.3) | 191 (37.6) | 54 (32.5) | |
III-IV | 232 (19.6) | 116 (22.8) | 13 (7.8) | |
Hazard classification | ||||
Non-neoplastic polyps | 394 (33.2) | 152 (29.9) | 72 (43.4) | 0.002 |
Non-progressive adenoma | 564 (47.6) | 250 (49.2) | 79 (47.6) | |
Progressive adenoma | 228 (19.2) | 106 (20.9) | 15 (9.0) | |
Concomitant gastric polyp | ||||
No | 761 (64.2) | 356 (70.1) | 127 (76.5) | 0.001 |
Yes | 425 (35.8) | 152 (29.9) | 39 (23.5) | |
H. pylori | ||||
No | 699 (58.9) | 308 (60.6) | 104 (62.7) | 0.586 |
Yes | 487 (41.1) | 200 (39.4) | 62 (37.3) | |
Hyperlipidemia | ||||
No | 810 (68.3) | 341 (67.1) | 117 (70.5) | 0.714 |
Yes | 376 (31.7) | 167 (32.9) | 49 (29.5) | |
Uric acid levels | ||||
Normal | 1105 (93.2) | 475 (93.5) | 153 (92.2) | 0.839 |
Elevated | 81 (6.8) | 33 (6.5) | 13 (7.8) | |
TBIL, median (IQR) | 11.90 (9.30, 15.78) | 12.00 (9.30, 15.62) | 9.90 (5.90, 13.97) | < 0.001 |
TBA, median (IQR) | 3.00 (1.80, 4.68) | 3.30 (1.90, 4.93) | 2.90 (1.80, 5.60) | 0.149 |
hsCRP, median (IQR) | 0.60 (0.50, 1.30) | 0.60 (0.50, 1.30) | 0.50 (0.50, 1.40) | 0.775 |
CEA | ||||
Normal | 1150 (97.0) | 496 (97.6) | 161 (97.0) | 0.741 |
Elevated | 36 (3.0) | 12 (2.4) | 5 (3.0) | |
CA724 | ||||
Normal | 1137 (95.9) | 487 (95.9) | 155 (93.4) | 0.323 |
Elevated | 49 (4.1) | 21 (4.1) | 11 (6.6) | |
CA199 | ||||
Normal | 1160 (97.8) | 499 (98.2) | 163 (98.2) | 0.833 |
Elevated | 26 (2.2) | 9 (1.8) | 3 (1.8) | |
CA242 | ||||
Normal | 1154 (97.3) | 491 (96.7) | 159 (95.8) | 0.492 |
Elevated | 32 (2.7) | 17 (3.3) | 7 (4.2) |
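The P values for the categorical rows of Table 1 compare the three cohorts. The sketch below assumes they come from a Pearson chi-square test on the raw counts; the table does not name the test, so this is an assumption. The function names are illustrative. For a 2 x 3 table the test has 2 degrees of freedom, and the chi-square survival function then reduces to exp(-x/2), so no statistics library is needed.

```python
import math

def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_df2(stat):
    """Upper-tail chi-square probability; this closed form is valid only for 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Gender row of Table 1: counts in the training, validation and prospective sets.
GENDER = [
    [474, 194, 67],   # Female
    [712, 314, 99],   # Male
]

stat, dof = pearson_chi2(GENDER)
print(dof, round(p_value_df2(stat), 3))  # 2 0.769
```

Under this assumption the Gender counts give P = 0.769, which agrees with the tabulated value. Rows with three categories, such as Anatomical location, have 4 degrees of freedom and would need a general chi-square survival function instead of the closed form.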
Table 2 Univariate and multivariate logistic regression analysis of colorectal polyp recurrence 1 year after endoscopic mucosal resection
Variables | Univariable OR (95%CI) | Univariable P value | Multivariable OR (95%CI) | Multivariable P value |
Age | 1.04 (1.03-1.05) | < 0.001a | 1.05 (1.03-1.06) | < 0.001a |
Gender (%) | ||||
Female | Reference | - | Reference | - |
Male | 1.73 (1.37-2.19) | < 0.001a | 0.94 (0.69-1.27) | 0.684 |
BMI | 1.07 (1.03-1.11) | < 0.001a | 1.05 (1.00-1.10) | 0.056 |
Hypertension (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.75 (1.31-2.33) | < 0.001a | 1.07 (0.75-1.52) | 0.722 |
Diabetes (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.42 (0.98-2.08) | 0.067 | - | - |
CHD (%) | ||||
No | Reference | - | Reference | - |
Yes | 2.33 (1.34-4.04) | 0.003a | 1.41 (0.73-2.72) | 0.300 |
Family history (%) | ||||
No | Reference | - | Reference | - |
Yes | 10.07 (4.84-20.96) | < 0.001a | 11.34 (5.09-25.26) | < 0.001a |
Cigarette preference (%) | ||||
No | Reference | - | Reference | - |
Yes | 5.14 (3.50-7.54) | < 0.001a | 3.92 (2.50-6.14) | < 0.001a |
Alcohol preference (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.34 (0.99-1.82) | 0.056 | - | - |
Constipation (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.05 (0.69-1.58) | 0.831 | - | - |
Diarrhea (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.40 (1.06-1.85) | 0.018a | 1.42 (1.02-1.99) | 0.038a |
Hemafecia (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.24 (0.76-2.00) | 0.389 | - | - |
Anatomical location (%) | ||||
Proximal colon | Reference | - | Reference | - |
Distal colon | 1.08 (0.80-1.45) | 0.630 | 1.09 (0.77-1.05) | 0.635 |
Total colon | 2.16 (1.57-2.96) | < 0.001a | 0.92 (0.60-1.40) | 0.683 |
Number of polyps (%) | ||||
< 3 | Reference | - | Reference | - |
≥ 3 | 3.10 (2.43-3.96) | < 0.001a | 1.54 (1.07-2.21) | 0.019a |
Number of adenomas (%) | ||||
0 | Reference | - | Reference | - |
1-2 | 2.02 (1.55-2.63) | < 0.001a | 0.44 (0.10-1.95) | 0.279 |
≥ 3 | 8.96 (6.00-13.38) | < 0.001a | 1.01 (0.22-4.70) | 0.989 |
Size (%) | ||||
< 0.5 | Reference | - | Reference | - |
0.5-1 | 2.18 (1.69-2.81) | < 0.001a | 2.05 (1.50-2.80) | < 0.001a |
> 1 | 6.65 (4.25-10.43) | < 0.001a | 3.98 (2.02-7.86) | < 0.001a |
Endoscopic classification (%) | ||||
I | Reference | - | Reference | - |
II | 1.40 (1.09-1.81) | 0.010a | 0.71 (0.50-1.02) | 0.066 |
III-IV | 3.32 (2.35-4.68) | < 0.001a | 0.60 (0.32-1.11) | 0.106 |
Hazard classification (%) | ||||
Non-neoplastic polyps | Reference | - | Reference | - |
Non-progressive adenoma | 2.47 (1.89-3.21) | < 0.001a | 4.48 (1.02-19.73) | 0.048a |
Progressive adenoma | 6.07 (4.17-8.84) | < 0.001a | 5.29 (1.07-26.20) | 0.041a |
Concomitant gastric polyp (%) | ||||
No | Reference | - | Reference | - |
Yes | 0.90 (0.71-1.15) | 0.397 | - | - |
H. pylori (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.89 (1.49-2.40) | < 0.001a | 1.82 (1.37-2.42) | < 0.001a |
Hyperlipidemia (%) | ||||
No | Reference | - | Reference | - |
Yes | 1.48 (1.15-1.90) | 0.002a | 1.20 (0.87-1.64) | 0.263 |
Uric acid levels (%) | ||||
Normal | Reference | - | Reference | - |
Elevated | 2.08 (1.26-3.41) | 0.004a | 1.15 (0.63-2.09) | 0.646 |
TBIL | 1.01 (1.00-1.04) | 0.259 | - | - |
TBA | 1.02 (0.99-1.05) | 0.298 | - | - |
hsCRP | 1.03 (0.98-1.06) | 0.164 | - | - |
CEA (%) | ||||
Normal | Reference | - | Reference | - |
Elevated | 1.10 (0.56-2.16) | 0.773 | - | - |
CA724 (%) | ||||
Normal | Reference | - | Reference | - |
Elevated | 0.96 (0.54-1.72) | 0.899 | - | - |
CA199 (%) | ||||
Normal | Reference | - | Reference | - |
Elevated | 0.92 (0.42-2.00) | 0.824 | - | - |
CA242 (%) | ||||
Normal | Reference | - | Reference | - |
Elevated | 0.60 (0.30-1.22) | 0.162 | - | - |
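Each odds ratio in Table 2 is the exponential of a logistic regression coefficient, so a row can be converted back to the log scale. The sketch below assumes the intervals are Wald intervals with z = 1.96; the table does not state this. The function names are illustrative. It recovers the coefficient and its standard error, and checks that the printed interval contains the point estimate.

```python
import math

def log_scale(odds_ratio, lower, upper, z=1.96):
    """Recover (coefficient, standard error) from an OR and its 95% CI,
    assuming a Wald interval that is symmetric on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

def ci_contains_estimate(odds_ratio, lower, upper):
    """A valid interval must bracket its own point estimate."""
    return lower <= odds_ratio <= upper

# Multivariable row for family history: 11.34 (5.09-25.26).
beta, se = log_scale(11.34, 5.09, 25.26)
print(round(beta, 3), round(se, 3))            # 2.428 0.409
print(ci_contains_estimate(11.34, 5.09, 25.26))  # True

# Multivariable row for distal colon as printed: 1.09 (0.77-1.05).
print(ci_contains_estimate(1.09, 0.77, 1.05))    # False
```

The multivariable distal colon row, printed as 1.09 (0.77-1.05), fails the containment check. This suggests a transcription error in the interval, but the intended bound cannot be recovered from the table alone.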
Table 3 Comparison of the performance of different models in training set, validation set and prospective set
Dataset | Model | AUC | Sensitivity | Specificity | Accuracy | Precision | F1 score |
Training set | LR | 0.803 | 0.733 | 0.728 | 0.731 | 0.774 | 0.753 |
Training set | DT | 0.754 | 0.806 | 0.613 | 0.721 | 0.726 | 0.764 |
Training set | RF | 0.861 | 0.727 | 0.835 | 0.775 | 0.849 | 0.784 |
Training set | SVM | 0.808 | 0.720 | 0.753 | 0.734 | 0.788 | 0.752 |
Training set | XGB | 0.909 | 0.756 | 0.904 | 0.820 | 0.907 | 0.824 |
Validation set | LR | 0.809 | 0.743 | 0.686 | 0.719 | 0.756 | 0.750 |
Validation set | DT | 0.799 | 0.785 | 0.723 | 0.758 | 0.788 | 0.786 |
Validation set | RF | 0.902 | 0.750 | 0.918 | 0.823 | 0.923 | 0.828 |
Validation set | SVM | 0.819 | 0.743 | 0.705 | 0.726 | 0.767 | 0.755 |
Validation set | XGB | 0.921 | 0.788 | 0.914 | 0.843 | 0.923 | 0.850 |
Prospective set | LR | 0.779 | 0.568 | 0.847 | 0.711 | 0.780 | 0.657 |
Prospective set | DT | 0.812 | 0.765 | 0.729 | 0.747 | 0.765 | 0.747 |
Prospective set | RF | 0.943 | 0.667 | 0.988 | 0.831 | 0.982 | 0.794 |
Prospective set | SVM | 0.791 | 0.617 | 0.824 | 0.723 | 0.769 | 0.685 |
Prospective set | XGB | 0.963 | 0.840 | 0.941 | 0.892 | 0.932 | 0.883 |
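The F1 score in Table 3 is the harmonic mean of precision and sensitivity (recall), so the last column can be recomputed from the two columns before it. The sketch below does this for a handful of rows. The function names and the 0.002 rounding tolerance are illustrative choices, not part of the source.

```python
def f1_from(precision, sensitivity):
    """F1 score as the harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)

def check_rows(rows, tol=0.002):
    """Recompute F1 for each row and flag whether it matches the reported value
    within a tolerance that allows for three-decimal rounding of the inputs."""
    results = []
    for dataset, model, precision, sensitivity, reported in rows:
        recomputed = f1_from(precision, sensitivity)
        ok = abs(recomputed - reported) <= tol
        results.append((dataset, model, round(recomputed, 3), reported, ok))
    return results

# (dataset, model, precision, sensitivity, reported F1), copied from Table 3.
ROWS = [
    ("Training set", "LR", 0.774, 0.733, 0.753),
    ("Validation set", "XGB", 0.923, 0.788, 0.850),
    ("Prospective set", "RF", 0.982, 0.667, 0.794),
    ("Prospective set", "DT", 0.765, 0.765, 0.747),
]

for row in check_rows(ROWS):
    print(row)
```

The first three rows agree with the reported F1 after rounding. The prospective-set DT row does not: with precision and sensitivity both 0.765, the harmonic mean is 0.765, not the printed 0.747, so one of the three values in that row may need checking against the original publication.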
- Citation: Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H, Fei SJ. Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection. World J Gastroenterol 2025; 31(11): 102387
- URL: https://www.wjgnet.com/1007-9327/full/v31/i11/102387.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i11.102387