Retrospective Study
Copyright ©The Author(s) 2025.
World J Gastroenterol. Nov 14, 2025; 31(42): 112180
Published online Nov 14, 2025. doi: 10.3748/wjg.v31.i42.112180
Table 1 Baseline characteristics of colorectal cancer patients, n (%)
| Characteristic | Overall (n = 765) | Training set (n = 612) | Validation set (n = 153) | P value |
|---|---|---|---|---|
| Age, median (95%CI) | 62.26 (61.58-62.96) | 62.12 (61.36-62.88) | 62.84 (61.16-64.51) | 0.422 |
| BSA, median (95%CI) | 1.71 (1.69-1.72) | 1.71 (1.69-1.72) | 1.71 (1.68-1.74) | 0.996 |
| BMI, median (95%CI) | 22.94 (22.70-23.17) | 22.99 (22.72-23.26) | 22.75 (22.23-23.27) | 0.428 |
| ALB, median (95%CI) | 37.53 (37.27-37.80) | 37.48 (37.19-37.50) | 37.74 (37.10-38.37) | 0.453 |
| CEA, median (95%CI) | 182.30 (125.72-246.75) | 178.10 (115.79-240.41) | 199.10 (53.75-344.45) | 0.770 |
| CA19-9, median (95%CI) | 169.03 (113.66-237.21) | 177.95 (108.12-247.78) | 133.33 (0.11-266.55) | 0.571 |
| CA125, median (95%CI) | 22.20 (19.20-25.77) | 21.93 (18.30-25.56) | 23.30 (13.48-33.12) | 0.804 |
| Gender | | | | 0.607 |
| Male | 449 (58.7) | 362 (59.15) | 87 (56.86) | |
| Female | 316 (41.3) | 250 (40.85) | 66 (43.14) | |
| Smoking | | | | 0.572 |
| Yes | 275 (35.9) | 223 (36.44) | 52 (33.99) | |
| No | 490 (64.1) | 389 (63.56) | 101 (66.01) | |
| Diabetes | | | | 0.529 |
| Yes | 128 (16.7) | 105 (17.16) | 23 (15.03) | |
| No | 637 (83.2) | 507 (82.84) | 130 (84.97) | |
| Hypertension | | | | 0.417 |
| Yes | 307 (40.1) | 250 (40.85) | 57 (37.25) | |
| No | 458 (59.8) | 362 (59.15) | 96 (62.75) | |
| T | | | | 0.491 |
| 1 | 18 (2.4) | 15 (2.45) | 3 (1.96) | |
| 2 | 84 (11.0) | 67 (10.95) | 17 (11.11) | |
| 3 | 399 (52.2) | 323 (52.78) | 76 (49.67) | |
| 4 | 264 (34.5) | 207 (33.82) | 57 (37.25) | |
| N | | | | 0.255 |
| 0 | 162 (21.2) | 126 (20.59) | 36 (23.53) | |
| 1 | 329 (43.0) | 261 (42.65) | 68 (44.44) | |
| 2 | 270 (35.3) | 222 (36.27) | 48 (31.37) | |
| 3 | 4 (0.5) | 3 (0.49) | 1 (0.65) | |
| M | | | | 0.103 |
| 0 | 355 (46.4) | 293 (47.88) | 62 (40.52) | |
| 1 | 410 (53.5) | 319 (52.12) | 91 (59.48) | |
| Staging | | | | 0.180 |
| I | 15 (2.0) | 11 (1.8) | 4 (2.61) | |
| II | 66 (8.6) | 56 (9.15) | 10 (6.54) | |
| III | 267 (34.9) | 219 (35.78) | 48 (31.37) | |
| IV | 417 (54.5) | 326 (53.27) | 91 (59.48) | |
| Position | | | | 0.687 |
| Ascending colon | 148 (19.3) | 118 (19.28) | 30 (19.61) | |
| Transverse colon | 21 (2.7) | 15 (2.45) | 6 (3.92) | |
| Descending colon | 27 (3.5) | 24 (3.92) | 3 (1.96) | |
| Sigmoid colon | 209 (27.3) | 164 (26.8) | 45 (29.41) | |
| Rectum | 360 (47.1) | 291 (47.55) | 69 (45.1) | |
| Hepatic metastasis | | | | 0.195 |
| Yes | 261 (34.1) | 202 (33.01) | 59 (38.56) | |
| No | 504 (65.8) | 410 (66.99) | 94 (61.44) | |
| Lung metastasis | | | | 0.064 |
| Yes | 214 (27.9) | 162 (26.47) | 52 (33.99) | |
| No | 551 (72.0) | 450 (73.53) | 101 (66.01) | |
| Peritoneal metastasis | | | | 0.201 |
| Yes | 47 (6.1) | 41 (6.7) | 6 (3.92) | |
| No | 718 (93.8) | 571 (93.3) | 147 (96.08) | |
| KRAS | | | | 0.961 |
| Yes | 124 (16.2) | 99 (16.18) | 25 (16.34) | |
| No | 641 (83.7) | 513 (83.82) | 128 (83.66) | |
| BRAF | | | | 0.051 |
| Yes | 15 (1.9) | 15 (2.45) | 0 (0) | |
| No | 750 (98) | 597 (97.55) | 153 (100) | |
| TP53 | | | | 0.286 |
| Yes | 61 (7.9) | 52 (8.5) | 9 (5.88) | |
| No | 704 (92) | 560 (91.5) | 144 (94.12) | |
| Myelosuppression | | | | 0.336 |
| Yes | 250 (32.6) | 195 (31.86) | 55 (35.95) | |
| No | 515 (67.3) | 417 (68.14) | 98 (64.05) | |
| Chemotherapy cycles | | | | 0.572 |
| 1-2 cycles | 315 (41.2) | 258 (42.16) | 57 (37.25) | |
| 3-4 cycles | 237 (31.0) | 182 (29.74) | 55 (35.95) | |
| 5 or more cycles | 213 (27.8) | 172 (28.1) | 41 (26.8) | |
| Chemotherapy regimens | | | | 0.248 |
| CapeOx | 571 (74.6) | 463 (75.65) | 108 (70.59) | |
| FOLFOX | 85 (11.1) | 63 (10.29) | 22 (14.38) | |
| FOLFIRI | 109 (14.2) | 86 (14.05) | 23 (15.03) | |
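The P values for the categorical rows of Table 1 compare the training and validation sets. This section does not name the test used, so the following is only a sketch under the assumption of a Pearson chi-square test without continuity correction; the function name `pearson_chi2_2x2` is hypothetical. It is applied to the Gender row (training set 362 male and 250 female, validation set 87 male and 66 female).

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    Assumption: this is the test behind Table 1; the source does not say so here.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Gender row of Table 1: rows are training / validation, columns are male / female.
stat, p = pearson_chi2_2x2(362, 250, 87, 66)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Under this assumption the result is close to the P value of 0.607 reported for Gender, which suggests, but does not prove, that an uncorrected chi-square test was used for the categorical variables.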
Table 2 Performance of 10 machine learning models for predicting myelosuppression after first-line chemotherapy for colorectal cancer
| Model | AUC (95%CI) | AUPRC (95%CI) | Accuracy | Sensitivity | Specificity | F1 score | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| Training set | | | | | | | | |
| Adaptive boosting | 0.88 (0.85-0.91) | 0.79 (0.73-0.84) | 0.79 | 0.49 | 0.94 | 0.61 | 0.81 | 0.78 |
| Artificial neural network | 0.96 (0.95-0.97) | 0.94 (0.91-0.96) | 0.73 | 0.53 | 0.83 | 0.57 | 0.61 | 0.78 |
| Decision tree | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Extra trees | 1.00 (1.00-1.00) | 1.00 (1.00-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Gradient boosting machine | 0.99 (0.99-1.00) | 0.99 (0.99-1.00) | 0.98 | 0.93 | 0.99 | 0.96 | 0.99 | 0.97 |
| K-Nearest neighbors | 0.91 (0.89-0.93) | 0.79 (0.75-0.84) | 0.90 | 0.85 | 0.92 | 0.85 | 0.85 | 0.92 |
| Logistic regression | 0.75 (0.71-0.79) | 0.57 (0.50-0.65) | 0.69 | 0.25 | 0.92 | 0.35 | 0.61 | 0.70 |
| Random forest | 1.00 (0.99-1.00) | 1.00 (0.99-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Support vector machine | 0.87 (0.83-0.90) | 0.79 (0.73-0.84) | 0.68 | 0.05 | 1.00 | 0.10 | 1.00 | 0.67 |
| Extreme gradient boosting | 1.00 (0.99-1.00) | 1.00 (0.99-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Validation set | | | | | | | | |
| Adaptive boosting | 0.83 (0.76-0.89) | 0.72 (0.60-0.85) | 0.78 | 0.35 | 0.95 | 0.47 | 0.71 | 0.79 |
| Artificial neural network | 0.69 (0.60-0.78) | 0.61 (0.48-0.74) | 0.65 | 0.42 | 0.75 | 0.40 | 0.39 | 0.77 |
| Decision tree | 0.70 (0.62-0.78) | 0.52 (0.41-0.64) | 0.82 | 0.81 | 0.83 | 0.72 | 0.65 | 0.92 |
| Extra trees | 0.94 (0.89-0.97) | 0.90 (0.82-0.96) | 0.87 | 0.72 | 0.93 | 0.76 | 0.79 | 0.89 |
| Gradient boosting machine | 0.92 (0.86-0.97) | 0.90 (0.84-0.95) | 0.83 | 0.67 | 0.89 | 0.69 | 0.71 | 0.88 |
| K-Nearest neighbors | 0.75 (0.67-0.83) | 0.62 (0.48-0.74) | 0.80 | 0.70 | 0.85 | 0.67 | 0.64 | 0.88 |
| Logistic regression | 0.67 (0.59-0.76) | 0.49 (0.38-0.65) | 0.68 | 0.20 | 0.86 | 0.27 | 0.38 | 0.74 |
| Random forest | 0.96 (0.93-0.98) | 0.93 (0.87-0.97) | 0.88 | 0.72 | 0.95 | 0.78 | 0.84 | 0.90 |
| Support vector machine | 0.75 (0.67-0.83) | 0.65 (0.52-0.77) | 0.75 | 0.12 | 1.00 | 0.21 | 1.00 | 0.74 |
| Extreme gradient boosting | 0.97 (0.94-0.99) | 0.92 (0.83-0.99) | 0.88 | 0.79 | 0.92 | 0.79 | 0.79 | 0.92 |
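The threshold-based columns of Table 2 (accuracy, sensitivity, specificity, F1 score, PPV, NPV) all derive from a single confusion matrix. The sketch below shows the standard definitions. The function name `classification_metrics` and the cell counts are hypothetical: the counts only match the validation set class sizes from Table 1 (55 patients with myelosuppression, 98 without) and are not taken from any model in the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: true positives among actual positives
    specificity = tn / (tn + fp)          # true negatives among actual negatives
    ppv = tp / (tp + fp)                  # precision: true positives among predicted positives
    npv = tn / (tn + fn)                  # true negatives among predicted negatives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "ppv": ppv,
        "npv": npv,
    }

# Hypothetical counts for illustration only (55 positives, 98 negatives in total).
metrics = classification_metrics(tp=40, fp=8, tn=90, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because F1 is the harmonic mean of PPV and sensitivity, it always lies between those two values, which gives a quick consistency check on any row of the table.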
Table 3 Logistic regression based on 10 machine learning models for predicting myelosuppression
| Model | Univariate OR (95%CI) | Univariate P value | Multivariate OR (95%CI) | Multivariate P value |
|---|---|---|---|---|
| Adaptive boosting | 49.558 (15.807-155.376) | < 0.001 | 0.083 (0.003-2.486) | 0.151 |
| Artificial neural network | 54.444 (13.032-227.465) | < 0.001 | 1.572 (0.043-57.273) | 0.805 |
| Decision tree | 10.587 (5.084-22.047) | < 0.001 | 0.563 (0.102-3.109) | 0.510 |
| Extra trees | 390.471 (75.734-2013.192) | < 0.001 | 31.948 (2.468-413.586) | 0.008 |
| Gradient boosting machine | 104.831 (32.579-337.322) | < 0.001 | 2.169 (0.176-26.731) | 0.546 |
| K-Nearest neighbors | 24.992 (9.652-64.711) | < 0.001 | 3.081 (0.419-22.650) | 0.269 |
| Logistic regression | 279.116 (24.997-3116.614) | < 0.001 | 0.225 (0.000-129.448) | 0.646 |
| Random forest | 404.139 (88.973-1835.710) | < 0.001 | 94.621 (1.178-7597.788) | 0.042 |
| Support vector machine | 4.180 (0.332-52.698) | 0.269 | 23.780 (0.434-1303.403) | 0.121 |
| Extreme gradient boosting | 131.875 (39.641-438.709) | < 0.001 | 2.669 (0.170-41.930) | 0.485 |
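An odds ratio and its 95%CI in Table 3 can be mapped back to a logistic regression coefficient, its standard error, and a P value. The sketch below assumes Wald-type intervals, i.e. OR = exp(beta) and CI = exp(beta ± 1.96 × SE); this section does not state how the intervals were computed, and the function name `wald_from_or` is hypothetical.

```python
import math

def wald_from_or(or_point, ci_low, ci_high, z_crit=1.959964):
    """Recover (beta, se, z, p) from an odds ratio and its Wald-type 95% CI.

    Assumption: the CI was built as exp(beta +/- z_crit * se).
    """
    beta = math.log(or_point)
    # The CI is symmetric on the log scale, so its width equals 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Multivariate rows of Table 3 for extra trees and random forest.
for name, row in {
    "Extra trees": (31.948, 2.468, 413.586),
    "Random forest": (94.621, 1.178, 7597.788),
}.items():
    beta, se, z, p = wald_from_or(*row)
    print(f"{name}: beta = {beta:.3f}, SE = {se:.3f}, z = {z:.3f}, p = {p:.3f}")
```

Under the Wald assumption, both rows give P values close to those reported in the table (0.008 and 0.042). The very wide intervals correspond to large standard errors on the log-odds scale.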
Table 4 Performance comparison of extreme gradient boosting, clinic nomogram, and clinic-machine learning nomograms
| Model | AUC (95%CI) | AUPRC (95%CI) | Accuracy | Sensitivity | Specificity | F1 score | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| Training set | | | | | | | | |
| Extreme gradient boosting | 1.00 (1.00-1.00) | 1.00 (0.99-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Clinic nomogram | 0.68 (0.63-0.72) | 0.49 (0.43-0.56) | 0.69 | 0.25 | 0.92 | 0.69 | 0.61 | 0.71 |
| Clinic-machine learning nomogram | 0.99 (0.99-1.00) | 0.99 (0.99-1.00) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Validation set | | | | | | | | |
| Extreme gradient boosting | 0.97 (0.94-0.99) | 0.92 (0.83-0.99) | 0.88 | 0.79 | 0.92 | 0.79 | 0.79 | 0.92 |
| Clinic nomogram | 0.61 (0.52-0.71) | 0.48 (0.36-0.62) | 0.68 | 0.21 | 0.86 | 0.27 | 0.38 | 0.74 |
| Clinic-machine learning nomogram | 0.96 (0.92-0.98) | 0.93 (0.87-0.98) | 0.90 | 0.77 | 0.95 | 0.80 | 0.85 | 0.91 |
Table 5 Performance of clinic-machine learning nomogram in the testing set
| Model | AUC (95%CI) | AUPRC (95%CI) | Accuracy | Sensitivity | Specificity | F1 score | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| Clinic-machine learning nomogram | 0.95 (0.93-0.99) | 0.83 (0.71-0.92) | 0.81 | 0.52 | 0.97 | 0.65 | 0.89 | 0.79 |