Retrospective Study
Copyright: ©Author(s) 2026.
World J Gastrointest Oncol. Jun 15, 2026; 18(6): 117851
Published online Jun 15, 2026. doi: 10.4251/wjgo.v18.i6.117851
Figure 1
Figure 1 Patient selection and data analysis flow of patients with esophageal cancer. CT: Computed tomography; ESCC: Esophageal squamous cell carcinoma; ESD: Endoscopic submucosal dissection.
Figure 2
Figure 2 Performance of models built with different feature selection strategies and machine-learning algorithms. A and B: Area under the curve (AUC) and F1 score in the training cohort; C and D: AUC and F1 score in the validation cohort. Each heatmap shows the AUC or F1 score for every combination of feature selection method (rows) and machine-learning algorithm (columns). Warmer colors (orange) indicate higher performance. Ada: Adaptive; DC: Decision tree; GBDT: Gradient boosting decision tree; KNN: K-nearest neighbors; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; NB: Naïve Bayes; RF: Random forest; RFECV: Recursive feature elimination with cross-validation; RFLV: Removing features with low variance; SFM: SelectFromModel; SVM: Support vector machine; UFS: Univariate feature selection; XGBoost: Extreme gradient boosting machine.
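As a reading aid for the two metrics shown in each heatmap cell, the following is a minimal sketch of how AUC and F1 score can be computed from true labels and model outputs. It is not the authors' code; the function names `auc_score` and `f1_score`, the 0/1 label coding (1 = lymph node metastasis present), and the toy inputs are assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all (no TP, FP, or FN).
    return 2 * tp / denom if denom else 0.0


def auc_score(y_true, y_score):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.4 else 0 for s in scores]  # assumed cutoff of 0.4

toy_auc = auc_score(labels, scores)
toy_f1 = f1_score(labels, preds)
```

AUC depends only on how the scores rank the cases, whereas F1 depends on the chosen cutoff, which is one reason the two heatmaps can favor different model combinations.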
Figure 3
Figure 3 Performance comparison between machine-learning models and computed tomography in evaluating lymph node metastasis status in esophageal cancer patients. A: Receiver operating characteristic curves in the training cohort, where lines of different colors represent different models or computed tomography (CT); B-F: Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy in the training and validation sets. Model-1, which had the highest area under the curve in the training set, was developed with the random forest algorithm and included five factors: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; (4) CT-reported results; and (5) Number of abnormal protein biomarkers. Model-2, which had the best F1 score in the training cohort, was established with the naïve Bayes algorithm and contained four variables: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; and (4) CT results.
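The five measures in panels B-F all derive from the same 2 × 2 confusion table. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-table counts.

    tp/fn: metastasis-positive patients called positive/negative.
    tn/fp: metastasis-negative patients called negative/positive.
    """
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
example = diagnostic_metrics(tp=30, fp=10, tn=50, fn=10)
```

Sensitivity and specificity are properties of the test itself, while PPV and NPV also depend on how common lymph node metastasis is in the cohort, so the latter two are not directly comparable across cohorts with different prevalence.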
Figure 4
Figure 4 Comparison of cell-free DNA features grouped by lymph node metastasis status (lymph node-negative vs lymph node-positive groups). A: Cell-free DNA concentration; B: Number of circulating tumor DNA (ctDNA) variants; C: Average variant allele frequency (VAF) of ctDNA mutations; D: Average VAF of ctDNA TP53 mutations; E: Average VAF of ctDNA PIK3CA mutations; F: Average VAF of PTCH1 mutations. cfDNA: Cell-free DNA; ctDNA: Circulating tumor DNA; LNM: Lymph node metastasis; VAF: Variant allele frequency.
Figure 5
Figure 5 Performance comparison among different random forest models with or without circulating tumor DNA features. A: Area under the curve and F1 score of different model combinations in the circulating tumor DNA cohort; B: Performance metrics of three selected machine-learning models. AUC: Area under the curve; CT: Computed tomography; PPV: Positive predictive value; NPV: Negative predictive value; VAF: Variant allele frequency.

