Machine-learning models integrating preoperative clinical factors and circulating tumor DNA features predict lymph node metastasis in esophageal carcinoma

doi:10.4251/wjgo.v18.i6.117851

Publication Name: World Journal of Gastrointestinal Oncology
ISSN: 1948-5204
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Retrospective Study

Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.

World J Gastrointest Oncol. Jun 15, 2026; 18(6): 117851
Published online Jun 15, 2026. doi: 10.4251/wjgo.v18.i6.117851

Figure 1 Patient selection and data analysis flow for patients with esophageal cancer. CT: Computed tomography; ESCC: Esophageal squamous cell carcinoma; ESD: Endoscopic submucosal dissection.

Figure 2 Performance of models built with different feature selection strategies and machine-learning algorithms. A and B: Area under the curve (AUC) and F1 score in the training cohort; C and D: AUC and F1 score in the validation cohort. Each heatmap shows the AUC or F1 score for every combination of feature selection method (rows) and machine-learning algorithm (columns); warmer colors (orange) indicate higher performance. Ada: Adaptive; DC: Decision tree; GBDT: Gradient boosting decision tree; KNN: K-nearest neighbors; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; NB: Naïve Bayes; RF: Random forest; RFECV: Recursive feature elimination with cross-validation; RFLV: Removing features with low variance; SFM: SelectFromModel; SVM: Support vector machine; UFS: Univariate feature selection; XGBoost: Extreme gradient boosting machine.
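
The two metrics shown in each heatmap cell can be illustrated with a minimal sketch. It assumes binary labels (1 = lymph node metastasis, 0 = none) and a model score per patient; the function names and the toy data are hypothetical and do not come from the article, and the article's own implementation is not described here.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, purely illustrative (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
# Assumed decision threshold of 0.5 for turning scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_f1 = f1_score(toy_labels, toy_preds)
```

AUC depends only on how the scores rank patients, whereas F1 depends on the chosen decision threshold, which is one reason the two heatmaps can favor different combinations of feature selection method and algorithm.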

Figure 3 Performance comparison between machine-learning models and computed tomography in evaluating lymph node metastasis status in patients with esophageal cancer. A: Receiver operating characteristic curves in the training cohort; lines of different colors represent different models or computed tomography (CT); B-F: Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy in the training and validation cohorts. Model-1, which had the highest area under the curve in the training cohort, was developed with the random forest algorithm and included five factors: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; (4) CT-reported results; and (5) Number of abnormal protein biomarkers. Model-2, which had the best F1 score in the training cohort, was established with the naïve Bayes algorithm and contained four variables: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; and (4) CT-reported results.
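
The five metrics in panels B-F all derive from a 2 x 2 confusion matrix. The following is a minimal sketch under the assumption that predictions and pathology results are coded as 1 (lymph node metastasis) or 0 (none); the class name, function name, and toy data are hypothetical and not taken from the article.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)
    accuracy: float     # (TP + TN) / all cases


def _ratio(numerator: int, denominator: int) -> float:
    """Return NaN instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> DiagnosticMetrics:
    """Compute the five metrics from paired true and predicted binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return DiagnosticMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
    )


# Toy data, purely illustrative (not from the study): TP=2, FN=1, FP=1, TN=4.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0]
toy_calls = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = diagnostic_metrics(toy_truth, toy_calls)
```

The same function can score a CT report (coded as positive or negative) and a model's thresholded output against the same pathology labels, so the comparison uses identical definitions for both.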

Figure 4 Comparison of cell-free DNA features by lymph node metastasis status. A: Cell-free DNA concentration; B: Number of circulating tumor DNA (ctDNA) variants; C: Average variant allele frequency (VAF) of ctDNA mutations; D: Average VAF of ctDNA TP53 mutations; E: Average VAF of ctDNA PIK3CA mutations; F: Average VAF of PTCH1 mutations. Each panel compares the lymph node-negative and lymph node-positive groups. cfDNA: Cell-free DNA; ctDNA: Circulating tumor DNA; LNM: Lymph node metastasis; VAF: Variant allele frequency.
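
As a rough illustration of how a per-patient average VAF could be derived and split by lymph node status, the sketch below uses the standard definition of VAF (variant-supporting reads divided by total read depth at that position). The record layout, function names, and toy values are hypothetical, and leaving out patients with no detected variants is an assumption made here, not a statement of the article's method.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical variant record: (gene, variant-supporting reads, total read depth).
Variant = Tuple[str, int, int]


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency = variant-supporting reads / total depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def average_vaf(variants: List[Variant], gene: Optional[str] = None) -> Optional[float]:
    """Mean VAF over all variants, or over one gene; None if nothing matches."""
    values = [vaf(alt, depth) for g, alt, depth in variants if gene is None or g == gene]
    return mean(values) if values else None


def group_by_lnm(
    patients: Dict[str, Tuple[bool, List[Variant]]], gene: Optional[str] = None
) -> Dict[bool, List[float]]:
    """Collect per-patient average VAFs keyed by LNM status (True = positive).

    Assumption: patients without a matching variant are left out of the group.
    """
    groups: Dict[bool, List[float]] = {False: [], True: []}
    for lnm_positive, variants in patients.values():
        value = average_vaf(variants, gene)
        if value is not None:
            groups[lnm_positive].append(value)
    return groups


# Toy data, purely illustrative (not from the study).
toy_patients = {
    "P1": (True, [("TP53", 10, 200), ("PIK3CA", 30, 200)]),
    "P2": (False, [("TP53", 2, 400)]),
    "P3": (False, []),
}
all_gene_groups = group_by_lnm(toy_patients)
tp53_groups = group_by_lnm(toy_patients, gene="TP53")
```

Passing a gene name restricts the average to that gene, which mirrors the gene-specific panels (TP53, PIK3CA, PTCH1); the resulting two lists are what a group comparison would then be run on.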

Figure 5 Performance comparison among random forest models with or without circulating tumor DNA features. A: Area under the curve (AUC) and F1 score of different model combinations in the circulating tumor DNA cohort; B: Performance metrics of three selected machine-learning models. AUC: Area under the curve; CT: Computed tomography; PPV: Positive predictive value; NPV: Negative predictive value; VAF: Variant allele frequency.

Citation: Gu RT, Li X, Cheng W, Wang XW, Jin H, Liu T. Machine-learning models integrating preoperative clinical factors and circulating tumor DNA features predict lymph node metastasis in esophageal carcinoma. World J Gastrointest Oncol 2026; 18(6): 117851
URL: https://www.wjgnet.com/1948-5204/full/v18/i6/117851.htm
DOI: https://dx.doi.org/10.4251/wjgo.v18.i6.117851
