Machine-learning models integrating preoperative clinical factors and circulating tumor DNA features predict lymph node metastasis in esophageal carcinoma

doi:10.4251/wjgo.v18.i6.117851

Publication name: World Journal of Gastrointestinal Oncology
ISSN: 1948-5204
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Retrospective Study Open Access

Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.

World J Gastrointest Oncol. Jun 15, 2026; 18(6): 117851
Published online Jun 15, 2026. doi: 10.4251/wjgo.v18.i6.117851

Ren-Tong Gu, Xin Li, Wen Cheng, Xiao-Wei Wang, Hai Jin, Tao Liu

Ren-Tong Gu, Department of Thoracic Surgery, Eastern Hepatobiliary Surgery Hospital, Naval Medical University, Shanghai 201800, China

Xin Li, Xiao-Wei Wang, Hai Jin, Department of Thoracic Surgery, Changhai Hospital, Naval Medical University, Shanghai 200433, China

Wen Cheng, Department of Thoracic Surgery, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai 200434, China

Tao Liu, Department of Thoracic Surgery, Peking University First Hospital, Beijing 100034, China

ORCID number: Ren-Tong Gu (0009-0005-7257-3988); Xin Li (0009-0008-2415-4385); Wen Cheng (0000-0001-9587-3813); Xiao-Wei Wang (0000-0002-6287-6679); Hai Jin (0000-0003-0430-2834); Tao Liu (0009-0001-5963-8215).

Co-first authors: Ren-Tong Gu and Xin Li.

Co-corresponding authors: Hai Jin and Tao Liu.

Author contributions: Gu RT and Li X have played indispensable roles in the experimental design and data interpretation as co-first authors; Gu RT, Li X, Cheng W and Wang XW were involved in data curation, formal analysis, and writing original draft; Jin H and Liu T were responsible for supervision and writing review and editing as co-corresponding authors; all of the authors read and approved the final version of the manuscript to be published.

Institutional review board statement: This study complied with all relevant national regulations and institutional policies, was conducted in accordance with the tenets of the Helsinki Declaration (as revised in 2013), and was approved by the Institutional Review Board of Changhai Hospital (No. CHEC2020-021).

Informed consent statement: All participants provided informed consent.

Conflict-of-interest statement: All authors declare no conflict of interest in publishing the manuscript.

Data sharing statement: The data used in this study may be obtained upon reasonable request from the corresponding authors.

Corresponding author: Tao Liu, MD, Department of Thoracic Surgery, Peking University First Hospital, No. 8 Xishiku Street, Xicheng District, Beijing 100034, China. liu-ta0@outlook.com

Received: December 18, 2025
Revised: January 31, 2026
Accepted: March 19, 2026
Published online: June 15, 2026
Processing time: 173 Days and 21.1 Hours

Abstract

BACKGROUND

Accurate preoperative assessment of lymph node metastasis (LNM) in patients with esophageal cancer (EC) guides decisions on neoadjuvant therapy and the extent of lymphadenectomy.

AIM

To construct machine learning (ML) models using routine clinical data to predict LNM in patients with EC, exploring predictive capacity after integrating circulating tumor DNA (ctDNA) features.

METHODS

In this retrospective study, we collected demographic information, risk factors, protein biomarkers, computed tomography (CT), endoscopic, and pathological data of 206 patients with EC. The ctDNA data were available for 57 patients. A total of 81 models were developed using different feature-selection techniques and ML algorithms. A total of 79 (38.3%) patients had pathologically confirmed LNM.

RESULTS

The ML models demonstrated good predictive performance, with a median area under the curve (AUC) of 0.767 (interquartile range: 0.679, 0.828) and a median F1 score of 0.715 (interquartile range: 0.672, 0.772). The best model was built with the random forest algorithm using variables selected through univariate and multivariate logistic regression analyses: Tumor length, tumor location, CT results, depth of tumor invasion, and number of aberrant protein biomarkers. It achieved an AUC of 0.79 (95%CI: 0.65-0.93) and an accuracy of 82.26% (95%CI: 70.47%-90.80%), which were superior to the CT results. Incorporating ctDNA features yielded modest improvements in AUC (9.0%) and F1 score (14.3%); however, these gains were not statistically significant.

CONCLUSION

Combining ctDNA features with preoperative clinical factors and CT results may enhance the predictive ability of LNM models in patients with EC, although the observed gains did not reach statistical significance.

Key Words: Lymph node metastasis; Computed tomography; Circulating tumor DNA; Variant allele frequency; Esophageal cancer

Core Tip: This retrospective study developed machine learning models to predict lymph node metastasis in 206 esophageal cancer patients. The optimal random forest model, using clinical, computed tomography, and pathological features, achieved an area under the curve of 0.79 and 82.26% accuracy, outperforming computed tomography alone. Integrating circulating tumor DNA features from a 57-patient subset further improved area under the curve and F1 score by 9.0% and 14.3%, respectively, demonstrating enhanced predictive capability.

Citation: Gu RT, Li X, Cheng W, Wang XW, Jin H, Liu T. Machine-learning models integrating preoperative clinical factors and circulating tumor DNA features predict lymph node metastasis in esophageal carcinoma. World J Gastrointest Oncol 2026; 18(6): 117851
URL: https://www.wjgnet.com/1948-5204/full/v18/i6/117851.htm
DOI: https://dx.doi.org/10.4251/wjgo.v18.i6.117851

INTRODUCTION

Esophageal cancer (EC) is one of the most common and fatal malignant tumors worldwide, ranking eighth in incidence and sixth in mortality[1]. In China, esophageal squamous cell carcinoma (ESCC) accounts for up to 90% of all EC cases[2,3]. For patients with resectable EC, surgery remains the cornerstone treatment. However, critical preoperative decisions – such as whether to administer neoadjuvant therapy[4] or how extensively to perform lymphadenectomy[5,6] – depend heavily on an accurate assessment of lymph node metastasis (LNM). This assessment is crucial not only for prognostic stratification[7-9] but also for tailoring treatment to balance survival benefits against surgical risks[5,6,10]. Consequently, the inability to reliably identify LNM preoperatively represents a significant unmet clinical need[11].

Currently, preoperative LNM evaluation relies on puncture biopsy (which is often anatomically infeasible) and imaging techniques such as contrast-enhanced computed tomography (CT)[12-15]. While these imaging modalities can visualize lymph node (LN) morphology, they struggle to reliably distinguish metastasis from inflammation, leading to suboptimal diagnostic performance. For example, CT exhibits high specificity but low sensitivity, particularly for abdominal LNM[13,16,17]. Therefore, there is a pressing demand for more accurate predictive tools that integrate novel indicators[11].

Machine learning (ML) models have emerged as a promising avenue to address this challenge. By integrating multidimensional preoperative data – including demographic, clinical, imaging (radiomics), and even gene expression signatures[11,18-20] – these models can potentially improve LNM prediction accuracy[21-24]. Furthermore, liquid biopsy biomarkers like circulating tumor DNA (ctDNA) offer a novel dimension reflecting real-time tumor biology. Although studies in other cancers, such as breast cancer and non-small cell lung cancer (NSCLC), have explored the link between ctDNA features [e.g., variant allele frequency (VAF)] and LNM with mixed results[25-27], and ESCC data suggest that VAF correlates with tumor burden[28], the utility of preoperative ctDNA for LNM prediction in EC remains unexplored.

Given the moderate predictive ability of models based on conventional data[11,20,21], and the unclear role of ctDNA in this setting, we conducted this study with two primary aims: (1) To develop and compare various ML models for preoperative LNM prediction in EC using routine clinical and imaging data; and (2) To explore whether incorporating ctDNA features could enhance the predictive power of these models.

MATERIALS AND METHODS

Study design

This retrospective study was conducted at Changhai Hospital and included 206 patients with EC.

Inclusion criteria: (1) Age ≥ 18 years; (2) Primary EC confirmed by pathology; (3) Radical resection of EC; (4) Pathological examination performed to determine the status of LNM; and (5) CT data were available within 2 weeks before surgery.

Exclusion criteria: (1) The presence of multiple primary carcinomas; and (2) Fewer than 10 LNs harvested during surgery.

This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Changhai Hospital (No. CHEC2020-021). All participants provided written informed consent.

Data collection and follow-up

Clinical data were collected retrospectively from the hospital's medical database. They comprised demographic information (age, sex, height, and weight), risk factors (smoking, drinking, and history of cancer), CT-reported LN status, endoscopic data [tumor location, length (defined as the longitudinal extent measured during endoscopy), and differentiation], and pathological reports (LNM status, location of positive LNs, number of harvested LNs, and tumor node metastasis stage).

All patients underwent preoperative contrast-enhanced chest and abdominal CT. Positive LNs were reported by consensus of two experienced radiologists, who were blinded to the final pathological outcomes, when either of the following criteria was met: (1) The maximum short diameter of a LN exceeded 10 mm; or (2) The LN showed inhomogeneous enhancement.

Additionally, the number of aberrant protein biomarkers in the preoperative examination was counted. The expression levels of four commonly used protein biomarkers [squamous cell carcinoma antigen (SCC), cytokeratin 19 fragment antigen 21-1 (CYFRA21-1), carcinoembryonic antigen (CEA), and carbohydrate antigen 199] were collected, and the corresponding cutoff values were defined as 1.5 ng/mL, 2.08 ng/mL, 5 ng/mL, and 37 U/mL, respectively.

LNM was evaluated in the pathological reports according to the 8th edition of the American Joint Committee on Cancer tumor node metastasis staging system. All patients were systematically followed up via telephone to obtain their survival data. Disease-free survival (DFS) was determined from the surgery date until the occurrence of recurrent disease, death from any cause, or the last follow-up.
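
The biomarker count described above can be sketched in a few lines of Python. This is only an illustration: the cutoffs come from the text, but treating a marker as aberrant when its level is strictly above the cutoff is an assumption (the article does not say whether the comparison is strict), and the names `CUTOFFS` and `count_aberrant_biomarkers` are hypothetical.

```python
# Cutoffs taken from the text; the "strictly greater than" rule is an assumption.
CUTOFFS = {
    "SCC": 1.5,         # ng/mL
    "CYFRA21-1": 2.08,  # ng/mL
    "CEA": 5.0,         # ng/mL
    "CA199": 37.0,      # U/mL
}


def count_aberrant_biomarkers(levels):
    """Return how many of the four markers exceed their cutoff.

    `levels` maps a marker name to its measured value; markers that were
    not measured are simply skipped.
    """
    return sum(
        1
        for marker, cutoff in CUTOFFS.items()
        if marker in levels and levels[marker] > cutoff
    )


# Hypothetical patient: SCC and CEA above cutoff, the other two within range.
example = {"SCC": 2.3, "CYFRA21-1": 1.4, "CEA": 6.1, "CA199": 22.0}
n_aberrant = count_aberrant_biomarkers(example)
print(n_aberrant)
```

The resulting integer (0 to 4) is the kind of value that would enter the models as the "number of aberrant protein biomarkers" feature.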

The cell-free DNA and ctDNA analyses

Fifty-seven patients with archived preoperative plasma samples underwent cell-free DNA (cfDNA) detection. Selection for ctDNA testing was based on sample availability and sequencing budget rather than on specific clinical or pathological characteristics, to minimize selection bias. A comparison of key baseline clinical characteristics (e.g., age, gender, tumor stage) between the ctDNA subgroup and the rest of the cohort showed no statistically significant differences (Supplementary Table 1), supporting the representativeness of the ctDNA subset. The mutational landscape of ctDNA has been previously reported[27]. The cfDNA sequencing procedure can be summarised as follows: (1) Extraction of cfDNA from plasma using the QIAamp Circulating Nucleic Acid Kit (QIAGEN, Hilden, Germany), followed by quantification and quality assessment using the Qubit fluorometer 3.0 (Life Technologies, Grand Island, NY, United States) and 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, United States); (2) Construction of cfDNA libraries using the KAPA library preparation kit (KAPA Biosystems, Boston, MA, United States)[28]; (3) Capture of target genes using a 61-gene panel (Yunsheng Medical Laboratory Co., Ltd, Shanghai, China); (4) Sequencing of the enriched libraries on a HiSeq 2500 (Illumina, San Diego, CA, United States) in paired-end 150 bp mode; (5) Calling of variants using a custom pipeline[28,29]; and (6) Extraction of variables from the sequencing data, including the cfDNA concentration, average VAF per patient, total number of variants per patient, mutation status, and average VAF of frequently mutated genes.

Feature selection

Patients were randomly divided into training and validation sets in a ratio of 7:3. All numerical features were standardised to zero mean and unit standard deviation using the Python package scikit-learn (sklearn). All data, except the ctDNA variables, were initially used for feature selection in the training set using several methods. Traditional univariate and multivariate logistic regression analyses were performed to identify independent risk factors by calculating odds ratios (ORs). Subsequently, we applied ML-based feature-selection methods: Removal of features with low variance, univariate feature selection, recursive feature elimination with cross-validation, feature selection using SelectFromModel, and shrinkage with the least absolute shrinkage and selection operator with cross-validation. A subset of variables reported in the literature to affect the LNM status of patients with EC was also used. Some variables were encoded both categorically and numerically. For example, smoking history was categorised as smoking or no smoking and quantified as smoking index in pack years. Similarly, drinking history was categorised as drinking or not drinking and quantified as alcohol units. The protein biomarkers SCC, CYFRA21-1, CEA, and carbohydrate antigen 199 were categorised as being in the normal or abnormal range and quantified by their expression levels.
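
The split, standardisation, and feature-selection steps named above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, and the stratified split, the choice to fit the scaler on the training set only, the base estimators, and all hyperparameters (`k=5`, `cv=5`, thresholds) are illustrative choices that the article does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    RFECV, SelectFromModel, SelectKBest, VarianceThreshold, f_classif,
)
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical table (206 patients, 13 candidate features).
X, y = make_classification(
    n_samples=206, n_features=13, n_informative=5, random_state=0
)

# 7:3 split; stratifying on the outcome is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The low-variance filter runs before scaling, because scaling equalises variances.
keep = VarianceThreshold(threshold=0.0).fit(X_train).get_support()
X_train, X_val = X_train[:, keep], X_val[:, keep]

# Fit the scaler on the training set only (assumption), then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

selectors = {
    "univariate": SelectKBest(f_classif, k=5),
    "rfecv": RFECV(LogisticRegression(max_iter=1000), cv=5, scoring="roc_auc"),
    "select_from_model": SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0)
    ),
    "lasso_cv": SelectFromModel(LassoCV(cv=5, random_state=0), threshold=1e-5),
}

selected = {}
for name, selector in selectors.items():
    selector.fit(X_train_s, y_train)
    # Column indices of the features each method keeps.
    selected[name] = np.flatnonzero(selector.get_support())
    print(name, selected[name])
```

Each entry of `selected` corresponds to one candidate feature set; the study additionally used feature sets from logistic regression and from the literature.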

Model development and validation

Different ML algorithms were used to train the models: Logistic regression, decision tree, random forest (RF), support vector machine, k-nearest neighbour, naïve Bayes, adaptive boosting, gradient boosting decision tree, and extreme gradient boosting machine. The average area under the curve (AUC) and F1 score under 10-fold cross-validation (CV) were used to evaluate the overall performance of each model. The models with the highest average AUC and F1 score in the training set were selected to assess the LNM status of the validation set. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were used to evaluate the predictive capabilities of the selected models. The model whose performance metrics were most comparable between the training and validation sets was chosen as the best model and fitted to the ctDNA cohort. The AUC and F1 score of the various variable combinations were evaluated and compared with the CT-reported nodal results.
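
The comparison of algorithms by mean AUC and F1 score under 10-fold CV could look roughly like the sketch below. It is illustrative only: the data are synthetic, all estimators use default hyperparameters (an assumption), scaling is placed inside a pipeline so that it is refitted in each fold (also an assumption), and extreme gradient boosting is omitted because it requires the separate `xgboost` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic training set: 144 patients, 5 features (sizes mirror the article).
X, y = make_classification(
    n_samples=144, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "Ada": AdaBoostClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_validate(pipe, X, y, cv=cv, scoring=["roc_auc", "f1"])
    # Mean over the 10 folds, as in the "average AUC" and "average F1" above.
    results[name] = (scores["test_roc_auc"].mean(), scores["test_f1"].mean())

for name, (auc, f1) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:5s} AUC={auc:.3f} F1={f1:.3f}")
```

Repeating this loop for each feature set gives the grid of algorithm-by-feature-set scores that the study summarises as heatmaps.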

Statistical analysis

The Kaplan-Meier method was used to estimate the median DFS and overall survival (OS), and the log-rank test was used to compare survival curves between groups. To identify differences in clinical characteristics between the training and validation groups, we used the independent-samples t-test or Mann-Whitney U test for continuous variables and the χ² test for categorical variables. The two-sided significance level was set at P < 0.05. Statistical analyses and graphing were performed using R version 4.2.2 (The R Foundation for Statistical Computing, Vienna, Austria). Feature selection and modelling were performed using Python version 3.9 with scikit-learn version 1.3.0.
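
For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation behind a median survival time can be sketched as follows. The authors performed this analysis in R; the Python below is only an illustration, the function name `kaplan_meier` is hypothetical, and the follow-up times are invented toy data, not study data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with >= 1 event.

    `events[i]` is 1 if the event (e.g., recurrence or death) was observed
    at `times[i]`, and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    return next((t for t, s in curve if s <= 0.5), None)


# Toy follow-up data in months (hypothetical).
times = [5, 8, 12, 12, 20, 24, 30]
events = [1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
median = median_survival(curve)
print(curve, median)
```

If the curve never reaches 0.5, `median_survival` returns `None`, which corresponds to a median that has not been reached during follow-up.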

RESULTS

Clinical characteristics of the ESCC cohort

A total of 206 patients with ESCC who met the inclusion criteria were randomly assigned to the training group (n = 144, mean age: 64.5 ± 7.5 years) and validation group (n = 62, mean age: 63.1 ± 7.4 years) in a ratio of 7:3 (Figure 1 and Table 1). The study included 161 men (78.2%) and 45 women (21.8%) with an average age of 64.1 ± 7.5 years. Additionally, 51.9% of the patients had a smoking history, whereas 38.8% had a history of drinking alcohol (Table 1). The median endoscopic tumor lengths were 3 cm [interquartile range (IQR): 2, 4.5], 3.4 cm (IQR: 2, 4.5), and 3 cm (IQR: 2, 4) in the entire cohort, training set, and validation set, respectively. The rates of pathologically positive LNM were 38.3% (79/206), 38.2% (55/144), and 38.7% (24/62) in the entire cohort, training set, and validation set, respectively. The LNM positivity rates reported by CT were 28.2% (58/206), 29.9% (43/144), and 24.2% (15/62), respectively. The training and validation sets were comparable in age, sex, body mass index, smoking history, drinking history, protein biomarker levels, tumor location, tumor length, tumor grade, tumor invasion depth, and LNM status (Table 1).

Figure 1 Patient selection and data analysis flow of patients with esophageal cancer. CT: Computed tomography; ESCC: Esophageal squamous cell carcinoma; ESD: Endoscopic submucosal dissection.

Table 1 Clinical characteristics of patients with esophageal cancer, n (%), median (interquartile range), or mean (SD).

Variables	Whole cohort (n = 206)	Training cohort (n = 144)	Validation cohort (n = 62)	P value
Age	64.1 (7.5)	64.5 (7.5)	63.1 (7.4)	0.218
Gender				0.570
Female	45 (21.8%)	33 (22.9%)	12 (19.4%)
Male	161 (78.2%)	111 (77.1%)	50 (80.6%)
Body mass index	23.1 (2.65)	23.1 (2.7)	23.1 (2.5)	0.965
Smoking and drinking				0.267
None	86 (41.7%)	60 (41.7%)	26 (41.9%)
Smoking alone	40 (19.4%)	29 (20.1%)	11 (17.7%)
Drinking alone	13 (6.3%)	6 (4.2%)	7 (11.3%)
Both	67 (32.5%)	49 (34%)	18 (29%)
Squamous cell carcinoma antigen level (ng/mL)	1 (0.6, 1.4)	1 (0.6, 1.4)	1.1 (0.8, 1.3)	0.520
Cytokeratin 19 fragment antigen 21-1 level (ng/mL)	1.4 (0.9, 1.8)	1.4 (0.9, 1.9)	1.3 (0.8, 1.8)	0.290
Carcinoembryonic antigen level (ng/mL)	3.4 (2.6, 4.3)	3.4 (2.6, 4.4)	3.5 (2.6, 4.2)	0.924
Carbohydrate antigen 199 level (U/mL)	22.3 (14.9, 29.7)	22.5 (14.8, 30)	22.2 (16.8, 27.2)	0.828
Tumor location				0.480
Upper	63 (30.6%)	44 (30.6%)	19 (30.6%)
Middle	130 (63.1%)	89 (61.8%)	41 (66.1%)
Lower	13 (6.3%)	11 (7.6%)	2 (3.2%)
Tumor length	3 (2, 4.5)	3.4 (2, 4.5)	3 (2, 4)	0.082
Computed tomography-reported nodal status				0.407
Negative	148 (71.8%)	101 (70.1%)	47 (75.8%)
Positive	58 (28.2%)	43 (29.9%)	15 (24.2%)
Tumor grade				0.057
G1	16 (7.8%)	7 (4.9%)	9 (14.5%)
G2	150 (72.8%)	109 (75.7%)	41 (66.1%)
G3	40 (19.4%)	28 (19.4%)	12 (19.4%)
Depth of tumor invasion				0.912
cT1	47 (22.8%)	33 (22.9%)	14 (22.6%)
cT2	58 (28.2%)	40 (27.8%)	18 (29%)
cT3	91 (44.2%)	63 (43.8%)	28 (45.2%)
cT4	10 (4.9%)	8 (5.6%)	2 (3.2%)
Node stage				0.506
pN0	127 (61.7%)	89 (61.8%)	38 (61.3%)
pN1	46 (22.3%)	35 (24.3%)	11 (17.7%)
pN2	19 (9.2%)	12 (8.3%)	7 (11.3%)
pN3	14 (6.8%)	8 (5.6%)	6 (9.7%)
Lymph node metastasis status				0.944
Negative	127 (61.7%)	89 (61.8%)	38 (61.3%)
Positive	79 (38.3%)	55 (38.2%)	24 (38.7%)

Follow-up and survival analysis

The median DFS of the entire cohort, training set, and validation set was 23.0 months (IQR: 11.7, 33.3), 22.8 months (IQR: 13.6, 31.9), and 23.9 months (IQR: 10.0, 41.4), respectively. In the training cohort, 38.9% (56/144) of the patients relapsed after surgery, and 82.1% (46/56) of these relapses occurred within 2 years; 31.3% (45/144) died during follow-up, with 62.2% (28/45) of deaths occurring within 2 years. In the validation cohort, 41.9% (26/62) of the patients relapsed postoperatively, with 50.0% (13/26) of relapses occurring within 2 years; 32.3% (20/62) died during follow-up, with 65.0% (13/20) of deaths occurring within 2 years. Both DFS and OS were significantly shorter in the LNM-positive (LNM+) group than in the LNM-negative (LNM-) group across the datasets (Supplementary Figure 1).

Performance analysis of CT in evaluating LNM

Overall, the LNM status reported by CT agreed poorly with the pathological evaluation (Kappa = 0.232, P = 0.001). CT also showed poor agreement with pathology for LNM in the thoracic (Kappa = 0.347, P < 0.001) and abdominal cavities (Kappa = 0.296, P < 0.001; Supplementary Table 2). The sensitivity of CT was considerably higher for thoracic than for abdominal LNM (47.92% vs 28.89%), whereas the specificity showed the opposite trend (86.08% vs 95.03%) (Table 2). Thus, CT evaluation of LNM in patients with EC showed low sensitivity and favourable specificity, consistent with previously reported findings, and other variables must be integrated with CT findings to improve the sensitivity of LNM prediction.

Table 2 Performance of computed tomography in evaluating lymph node metastasis status in patients with esophageal cancer.

Statistic	Whole cohort (n = 206), value (95%CI)	Training set (n = 144), value (95%CI)	Validation set (n = 62), value (95%CI)
Overall lymph node status evaluation
Sensitivity	42.11% (30.86%-53.98%)	42.59% (29.23%-56.79%)	40.91% (20.71%-63.65%)
Specificity	80% (72.08%-86.50%)	77.78% (67.79%-85.87%)	85% (70.16%-94.29%)
PPV	55.17% (44.38%-65.50%)	53.49% (41.21%-65.36%)	60% (38.06%-78.55%)
NPV	70.27% (65.70%-74.47%)	69.31% (63.64%-74.45%)	72.34% (64.34%-79.13%)
Accuracy	66.02% (59.11%-72.46%)	64.58% (56.19%-72.37%)	69.35% (56.35%-80.44%)
Thoracic lymph node status evaluation
Sensitivity	47.92% (33.29%-62.81%)	51.35% (34.40%-68.08%)	36.36% (10.93%-69.21%)
Specificity	86.08% (79.68%-91.06%)	85.05% (76.86%-91.20%)	88.24% (76.13%-95.56%)
PPV	51.11% (39.11%-62.99%)	54.29% (40.66%-67.30%)	40% (18.39%-66.35%)
NPV	84.47% (80.46%-87.79%)	83.49% (78.25%-87.66%)	86.54% (80.26%-91.04%)
Accuracy	77.18% (70.84%-82.73%)	76.39% (68.60%-83.06%)	79.03% (66.82%-88.34%)
Abdominal lymph node status evaluation
Sensitivity	28.89% (16.37%-44.31%)	26.67% (12.28%-45.89%)	33.33% (11.82%-61.62%)
Specificity	95.03% (90.44%-97.83%)	93.86% (87.76%-97.50%)	97.87% (88.71%-99.95%)
PPV	61.9% (41.80%-78.62%)	53.33% (31.05%-74.36%)	83.33% (38.76%-97.53%)
NPV	82.7% (79.82%-85.25%)	82.95% (79.59%-85.85%)	82.14% (76.24%-86.83%)
Accuracy	80.58% (74.51%-85.75%)	79.86% (72.37%-86.08%)	82.26% (70.47%-90.80%)

NPV: Negative predictive value; PPV: Positive predictive value.
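
The statistics in Table 2 and the Kappa value for overall nodal status all follow from a single 2×2 table of CT result against pathology. The sketch below illustrates the arithmetic. The four cell counts are an assumption: they are back-calculated from the whole-cohort percentages in the "Overall lymph node status evaluation" rows and are not reported directly in the article. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Diagnostic accuracy statistics and Cohen's kappa from a 2x2 table.

    tp/fn: pathology-positive patients called positive/negative by the test.
    fp/tn: pathology-negative patients called positive/negative by the test.
    """
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column margins.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Back-calculated counts (assumption): chosen so that the sensitivity,
# specificity, PPV, NPV, and accuracy match the whole-cohort column.
ct_overall = diagnostic_metrics(tp=32, fn=44, fp=26, tn=104)
for name, value in ct_overall.items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch reproduces the reported whole-cohort sensitivity, specificity, PPV, NPV, and accuracy to two decimals, and gives a kappa of about 0.23, in line with the reported value.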

ML models for predicting LNM of ESCC with routine clinical data

Using the training dataset alone, we explored factors associated with LNM in ESCC using different approaches. Univariate and multivariate logistic regression analyses identified five factors independently associated with LNM in patients with ESCC: CT-reported nodal status (OR = 2.35, 95%CI: 1.10-5.03, P = 0.028), number of aberrant protein biomarkers (OR = 1.76, 95%CI: 1.11-2.80, P = 0.017), tumor length (OR = 1.37, 95%CI: 1.06-1.77, P = 0.017), depth of tumor invasion (OR = 2.72, 95%CI: 1.71-4.32, P < 0.001), and tumor location (OR = 0.35, 95%CI: 0.18-0.67, P = 0.002) (Supplementary Tables 3 and 4). Additionally, we selected a second set of variables reported in at least three previous studies, comprising tumor location, histologic grade, depth of tumor invasion, tumor length, and preoperative CT results (Supplementary Tables 4 and 5). A third variable set was established by combining the number of aberrant protein biomarkers with the five components of set 2 (Supplementary Table 4). Six feature-selection strategies implemented in scikit-learn were used to determine the optimal combination of variables (Supplementary Table 4). In total, nine feature sets were obtained, drawing on five categorical variables and eight numerical factors (Supplementary Table 4).
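
The link between each odds ratio, its confidence interval, and its P value can be illustrated with a short calculation. The sketch assumes that the intervals are Wald intervals on the log-odds scale, which is the usual output of logistic regression software but is not stated in the article; the function name `wald_from_or` is hypothetical. Under that assumption, the coefficient is ln(OR), its standard error is the width of the log-scale interval divided by twice the critical value, and the P value follows from the resulting z statistic.

```python
from math import log
from statistics import NormalDist


def wald_from_or(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover coefficient, standard error, z, and two-sided P from an OR and its CI.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


# Odds ratios and 95% CIs as reported for the five factors.
reported = {
    "CT-reported nodal status": (2.35, 1.10, 5.03),
    "Aberrant protein biomarkers": (1.76, 1.11, 2.80),
    "Tumor length": (1.37, 1.06, 1.77),
    "Depth of tumor invasion": (2.72, 1.71, 4.32),
    "Tumor location": (0.35, 0.18, 0.67),
}

recovered = {name: wald_from_or(*values) for name, values in reported.items()}
for name, (beta, se, z, p) in recovered.items():
    print(f"{name}: beta={beta:.3f} se={se:.3f} z={z:.2f} P={p:.4f}")
```

Because the reported ORs and limits are rounded, the recovered P values are approximate. An OR below 1, as for tumor location, gives a negative coefficient, meaning that higher values of the coded variable are associated with lower odds of LNM.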

The ML models were trained on the training dataset with nine ML modelling algorithms and nine feature sets, yielding 81 models. Figure 2 presents the average AUC and F1 score for each model. In the training set, the supervised ML models had a median AUC of 0.76 (IQR: 0.68, 0.83) and a median F1 score of 0.57 (IQR: 0.45, 0.65). The highest AUC in the training set was 0.86 (95%CI: 0.78-0.94), achieved by a model developed using the RF algorithm. It included five factors selected using traditional univariate and multivariate logistic regression analyses: Tumor location, depth of tumor invasion, tumor length, CT-reported nodal results, and number of abnormal protein biomarkers (labelled as model-1) (Figure 2A). The best F1 score in the training group was 0.71 (95%CI: 0.58-0.83), achieved by a model developed using the naïve Bayes algorithm. It contained four variables determined using least absolute shrinkage and selection operator with cross-validation: Tumor location, depth of tumor invasion, tumor length, and CT results (labelled as model-2) (Figure 2B). The median average AUC and F1 score in the validation set were 0.73 (IQR: 0.62, 0.79) and 0.50 (IQR: 0.40, 0.56), respectively (Figure 2C and D).

Figure 2 Performance of various models established by different feature selection strategies and machine-learning algorithms. A and B: Area under the curve (AUC) and F1 score of the training set; C and D: AUC and F1 score of the validation cohort. Each heatmap shows the AUC or F1 score of each feature selection method (rows) combined with each machine-learning algorithm (columns); warmer colors (orange) indicate higher performance. Ada: Adaptive; DC: Decision tree; GBDT: Gradient boosting decision tree; KNN: K-nearest neighbors; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; NB: Naïve Bayes; RF: Random forest; RFECV: Recursive feature elimination with cross-validation; RFLV: Removing features with low variance; SFM: SelectFromModel; SVM: Support vector machine; UFS: Univariate feature selection; XGBoost: Extreme gradient boosting machine.

Table 3 displays the predictive abilities of the two best-performing models in the training and validation datasets. In the validation cohort, the sensitivity, specificity, PPV, NPV, and accuracy of model-1 were 66.67% (44.68%-84.37%), 92.11% (78.62%-98.34%), 84.21% (63.45%-94.25%), 81.40% (71.15%-88.59%), and 82.26% (70.47%-90.80%), respectively; those of model-2 were 75.00% (53.29%-90.23%), 89.47% (75.20%-97.06%), 81.82% (63.38%-92.12%), 85.00% (73.75%-91.95%), and 83.87% (72.33%-91.98%), respectively. Both models outperformed the CT-reported nodal results (sensitivity: 40.9%, 20.7%-63.7%; specificity: 85.0%, 70.2%-94.3%; PPV: 60.0%, 38.1%-78.6%; NPV: 72.3%, 64.3%-79.1%; and accuracy: 69.4%, 56.4%-80.4%). The performance of model-2 was more consistent between the training and validation datasets than that of model-1 (Figure 3). We selected model-2 as the optimal model for further exploration because its balance between sensitivity and PPV (as reflected by the F1 score), together with its high specificity, is clinically pertinent. In LNM prediction, high sensitivity is crucial to avoid missing true positive cases (who may need more aggressive treatment), while reasonable specificity is needed to prevent overtreatment of node-negative patients. The F1 score, the harmonic mean of sensitivity and PPV, captures the first requirement directly.

The superior sensitivity of our ML model suggests its potential value in addressing the critical challenge of occult LN metastases (CT-negative but pathology-positive). In the validation cohort, CT imaging failed to detect approximately 14 of the 24 pathologically confirmed LNM cases (sensitivity of 40.9%). In contrast, model-2 identified 18 of these 24 true LNM cases (sensitivity of 75.0%). This implies that model-2 could correctly reclassify a substantial proportion (approximately 8 of 14) of the CT-occult metastases as high-risk, demonstrating its potential to add clinical value beyond conventional CT assessment alone.

Figure 3 Performance comparison between machine-learning models and computed tomography in evaluating lymph node metastasis status in esophageal cancer patients. A: The receiver operating characteristic curves in the training cohort, where lines of different colors represent different models or computed tomography (CT); B-F: The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the training and validation sets. Model-1 with the highest area under the curve in the training set, developed by the random forest algorithm, included five factors: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; (4) CT reported results; and (5) Number of abnormal protein biomarkers. Model-2 with the best F1 score in the training cohort, established by naïve Bayes algorithm, contained four variables: (1) Tumor location; (2) Depth of tumor invasion; (3) Tumor length; and (4) CT results.

Table 3 Comparison of two machine-learning models with computed tomography in evaluating lymph node metastasis in patients with esophageal cancer.

Statistic	Training set, value	Training set, 95%CI	Validation set, value	Validation set, 95%CI
Computed tomography predicted
Sensitivity	42.6%	29.2%-56.8%	40.9%	20.7%-63.7%
Specificity	77.8%	67.8%-85.9%	85.0%	70.2%-94.3%
PPV	53.5%	41.2%-65.4%	60.0%	38.1%-78.6%
NPV	69.3%	63.6%-74.4%	72.3%	64.3%-79.1%
Accuracy	64.6%	56.2%-72.4%	69.4%	56.4%-80.4%
Model-1 predicted
Sensitivity	85.45%	73.34%-93.50%	66.67%	44.68%-84.37%
Specificity	88.76%	80.31%-94.48%	92.11%	78.62%-98.34%
PPV	82.46%	72.18%-89.49%	84.21%	63.45%-94.25%
NPV	90.80%	83.82%-94.95%	81.40%	71.15%-88.59%
Accuracy	87.50%	80.97%-92.42%	82.26%	70.47%-90.80%
Model-2 predicted
Sensitivity	89.09%	77.75%-95.89%	75.00%	53.29%-90.23%
Specificity	85.39%	76.32%-91.99%	89.47%	75.20%-97.06%
PPV	79.03%	69.34%-86.27%	81.82%	63.38%-92.12%
NPV	92.68%	85.56%-96.44%	85.00%	73.75%-91.95%
Accuracy	86.81%	80.16%-91.87%	83.87%	72.33%-91.98%

NPV: Negative predictive value; PPV: Positive predictive value.
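
The confidence intervals for proportions such as accuracy in Table 3 can be approximated with an exact binomial interval. The sketch below assumes that the intervals for sensitivity, specificity, and accuracy are Clopper-Pearson intervals, which the article does not state. It also assumes that the model-2 validation accuracy of 83.87% corresponds to 52 correct predictions out of 62, a count back-calculated from the reported percentages. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo=0.0, hi=1.0, iters=60):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion x/n."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Model-2 validation accuracy: 52 of 62 correct (back-calculated, an assumption).
low, high = clopper_pearson(52, 62)
print(f"accuracy = {52 / 62:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

The width of the interval reflects the small validation set: with 62 patients, even an accuracy above 80% is compatible with true values roughly ten percentage points lower.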

The ctDNA features improve the predictive capacity

We also explored whether incorporating preoperative ctDNA features could improve the predictive capacity of the above-mentioned optimal model (model-2). The ctDNA cohort included 39 patients in the training set and 18 in the validation set. The preoperative plasma cfDNA concentrations in the LNM- and LNM+ groups were 24.80 ng/mL and 19.73 ng/mL, respectively (P = 0.290; Figure 4A). The number of ctDNA variants did not differ significantly between the LNM- and LNM+ groups (P = 0.880; Figure 4B). The average VAF of ctDNA variants in each patient was significantly lower in LNM- cases than in LNM+ cases (0.43% vs 0.97%, P = 0.026; Figure 4C). As TP53, PIK3CA, and PTCH1 were the most frequently altered genes in ESCC, we also analysed their average VAF. The difference between the LNM- and LNM+ groups was statistically significant for TP53 (0.05% vs 0.62%, P = 0.008; Figure 4D), but not for PIK3CA (P = 0.202; Figure 4E) or PTCH1 (P = 0.550; Figure 4F). Moreover, the average VAF of TP53 mutations was an independent risk factor for LNM (OR = 2.41, 95%CI: 1.04-5.57, P = 0.040; Supplementary Table 5).

Figure 4 The comparison of cell-free DNA features grouped by lymph node metastasis status. A: The cell-free DNA concentration; B: Number of circulating tumor DNA (ctDNA) variants; C: Average variant allele frequency (VAF) of ctDNA mutations; D: Average VAF of ctDNA TP53 mutations; E: Average VAF of ctDNA PIK3CA mutations; F: Average VAF of PTCH1 mutations of the lymph node negative and positive groups. cfDNA: Cell-free DNA; ctDNA: Circulating tumor DNA; LNM: Lymph node metastasis; VAF: Variant allele frequency.

We then added the mean VAF of each patient and the mean VAF of TP53 mutations, alone or in combination, to the five factors used in model-2; the average AUC and F1 score from five-fold cross-validation were used to evaluate each model’s overall performance, and the results are shown in Figure 5A. In the ctDNA cohort, the average AUC and F1 score of the previous optimal model were 0.88 (95%CI: 0.76-1.00) and 0.79 (95%CI: 0.69-0.89), respectively, which are comparable to those in the validation set. When ctDNA features were added to the model, the average AUC and F1 score increased to varying degrees. Combining both the ctDNA mean VAF and the TP53 mean VAF with the five factors gave the best performance, with an average AUC of 0.96 (95%CI: 0.89-1.00) and an average F1 score of 0.91 (95%CI: 0.79-1.00), outperforming the CT-reported results [AUC: 0.72 (95%CI: 0.54-0.90), P = 0.013; F1 score: 0.64 (95%CI: 0.43-0.85), P = 0.025]. Compared with model-2, the average AUC increased by 9.0% (P = 0.202) and the F1 score by 14.3% (P = 0.103), although neither increase reached statistical significance. Our findings suggest that integrating the CT results with the clinical data and ctDNA features may improve the prediction of LNM, particularly its sensitivity, in patients with ESCC, though this requires validation in larger studies (Figure 5B).
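
To make the two summary metrics concrete, the sketch below computes the AUC (as the probability that a randomly chosen LNM+ case receives a higher score than a randomly chosen LNM- case) and the F1 score on each held-out fold, then averages over folds as in k-fold cross-validation. It is a stdlib-only illustration under stated assumptions: the labels and predicted probabilities are made up, the 0.5 decision threshold is assumed, only two folds are shown for brevity, and model fitting (a random forest in the study) is omitted.

```python
from statistics import mean

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) at the given (assumed) threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical held-out folds: (true LNM labels, predicted probabilities).
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.1]),
    ([1, 0, 0, 1], [0.8, 0.75, 0.2, 0.7]),
]
mean_auc = mean(auc(y, s) for y, s in folds)
mean_f1 = mean(f1_score(y, s) for y, s in folds)
print(f"mean AUC = {mean_auc:.2f}, mean F1 = {mean_f1:.2f}")
```

The first fold shows why the two metrics can diverge: every LNM+ case is ranked above every LNM- case (AUC of 1), yet one LNM+ case falls below the threshold, which lowers the F1 score.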

Figure 5 Performance comparison among different random forest models with or without circulating tumor DNA features. A: Area under the curve and F1 score of the circulating tumor DNA cohort. It compares the area under the curve and F1 score of different model combinations in the circulating tumor DNA cohort; B: Performance capacity of three machine-learning models. It details the performance metrics of three selected machine-learning models. AUC: Area under the curve; CT: Computed tomography; PPV: Positive predictive value; NPV: Negative predictive value; VAF: Variant allele frequency.

DISCUSSION

In this study, we developed ML models to address the unmet need for accurate preoperative prediction of LNM in EC. Our optimal model, which integrated five routinely available preoperative factors (CT-reported results, tumor length, depth of invasion, tumor location, and the number of aberrant protein biomarkers), demonstrated significantly superior predictive performance compared with CT alone. Furthermore, in an exploratory analysis of a subset of patients, we found that a high TP53 ctDNA VAF was an independent risk factor for LNM, and its inclusion enhanced the model’s predictive capacity.

The primary clinical value of a reliable preoperative LNM prediction model lies in its potential to inform critical treatment decisions. For instance, patients identified as high-risk for LNM could be more confidently recommended for neoadjuvant therapy, which is established to benefit those with nodal involvement[4]. Conversely, for patients predicted to have a very low risk of LNM, the model might support a more tailored surgical approach, potentially aiding in the discussion about the necessity and extent of lymphadenectomy, thus balancing oncological radicality against the risks of surgical morbidity[5,6]. Our model, by synthesizing routinely available preoperative data, offers a framework for such individualized risk assessment that extends beyond conventional imaging.

Our findings align with and extend previous research. The association between negative nodal status and better disease-free and overall survival reinforces the prognostic importance of LNM[7-9]. The features selected in our best-performing models (tumor length, depth of invasion, location, and CT findings) have all been previously linked to LNM[21,30-32]. Regarding biomarkers, prior studies have reported associations between LNM and individual serum markers such as SCC-Ag, CYFRA21-1, or CEA, with inconsistent results that are often due to varying cutoff values[33,34], whereas our study uniquely identified the number of abnormal protein biomarkers as an independent risk factor. We chose this composite variable rather than the concentration of any single marker because it may provide a more robust and integrative reflection of systemic tumor burden and biological aggressiveness. Individual marker levels can be influenced by non-malignant conditions, whereas an increasing number of concurrently elevated biomarkers is less likely to occur by chance and may better correlate with an advanced disease state and metastatic potential. This interpretation is supported by the association of the composite measure with higher tumor stage in our cohort (Supplementary Figure 2). Compared with existing predictive models based on demographics, clinicopathology, or radiomics, which show moderate performance (Supplementary Table 6), and with those utilizing gene expression signatures[23,35], our work integrates a different and more readily available data dimension. Most notably, to our knowledge, this is the first study to incorporate preoperative ctDNA features into an ML model for LNM prediction in EC, addressing a previously unexplored association.

The biological plausibility of using ctDNA VAF is supported by its correlation with tumor burden. We confirmed that a high average TP53 ctDNA VAF is an independent risk factor for LNM in ESCC. This observation finds context in studies of other cancers but also highlights important nuances. In NSCLC, ctDNA VAF correlates with tumor stage and size[36], and some studies link high preoperative VAF to LNM[26], while others do not[27]. Similarly, in ESCC, higher VAF is associated with advanced disease[28]. These comparative observations suggest that while VAF may serve as a general indicator of tumor load, its specific association with LNM is not uniform across all cancer types. The strength and clinical utility of this association likely depend on a confluence of tumor-specific factors. For instance, differences in intrinsic tumor biology (such as cfDNA shedding rates), the anatomical complexity of lymphatic drainage in the esophageal region, and the distinct genomic landscape of ESCC compared to adenocarcinomas of the esophagus or other organs, could all influence the detectability and predictive value of ctDNA for LNM. Therefore, the discrepancy between our positive findings in ESCC and the mixed results in NSCLC[26,27] underscores a critical point: Liquid biopsy markers like ctDNA VAF must be rigorously validated within specific disease contexts. Our findings contribute a necessary and specific data point for ESCC within this evolving landscape, reinforcing the principle that predictive biomarkers require disease-specific evaluation.

Our study has several important limitations that must be acknowledged. First, all models were developed and validated on data from a single center. While internal cross-validation was employed, the generalizability of our findings, particularly for the ctDNA-enhanced model, remains to be established through rigorous external validation in prospective, multi-center cohorts. Second, the overall cohort size (n = 206) limits more granular subgroup analyses. Additionally, the observed discrepancy in CT performance between thoracic and abdominal LNM evaluation could be influenced by inherent challenges such as respiratory motion artifacts, differences in background tissue contrast, and the complex anatomical landscape of different nodal basins, which are limitations intrinsic to current imaging assessment itself. Third, and most pertinent to the novel ctDNA aspect, the subset with ctDNA data was relatively small (n = 57), increasing the risk of overfitting. The performance improvement seen with ctDNA integration, while encouraging, must therefore be interpreted as exploratory and hypothesis-generating, requiring confirmation in larger studies. Furthermore, technical factors like panel design and sequencing depth can influence ctDNA variant detection. Fourth, while combining ctDNA features improved model performance, the current additional cost and limited accessibility of ctDNA sequencing present practical barriers to routine clinical adoption. A formal cost-benefit analysis, which was beyond the scope of this exploratory study, would be necessary to determine whether the incremental predictive accuracy justifies the increased expense in real-world settings. Nonetheless, sequencing costs are declining[37], and the clinical utility of ctDNA is expanding. Finally, the retrospective design necessitates prospective validation. Future work should focus on such external validation, directly testing the model’s utility in guiding neoadjuvant therapy decisions or surgical planning. Although our primary aim was prediction, the confirmed link between pathological LNM and worse survival (Supplementary Figure 1) underscores the clinical importance of accurate preoperative assessment.

CONCLUSION

In conclusion, effective prediction of LNM in patients with EC could assist clinicians in personalizing treatment strategies. In this study, RF prediction models for LNM were developed based on clinical characteristics alone and, in an exploratory analysis, in combination with ctDNA features. Both approaches showed promising predictive capacity, with the ctDNA-integrated model yielding particularly encouraging but preliminary results. This framework may facilitate the personalized preoperative evaluation of LNM in patients with EC. However, prospective multicenter validation studies are essential to confirm its clinical utility before any routine application can be considered.

ACKNOWLEDGMENTS

We are grateful to all the study participants and our colleagues in the pathology and endoscopy departments for their help.

References

1. Then EO, Lopez M, Saleem S, Gayam V, Sunkara T, Culliford A, Gaduputi V. Esophageal Cancer: An Updated Surveillance Epidemiology and End Results Database Analysis. World J Oncol. 2020;11:55-64.

2. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71:209-249.

3. National Health Commission of The People's Republic of China. Chinese guidelines for diagnosis and treatment of esophageal carcinoma 2018 (English version). Chin J Cancer Res. 2019;31:223-258.

4. Thuss-Patience P, Vecchione L, Keilholz U. Should cT2 esophageal cancer get neoadjuvant treatment before surgery? J Thorac Dis. 2017;9:2819-2823.

5. Hagens ERC, van Berge Henegouwen MI, Cuesta MA, Gisbertz SS. The extent of lymphadenectomy in esophageal resection for cancer should be standardized. J Thorac Dis. 2017;9:S713-S723.

6. Nishimura E, Matsuda S, Takeuchi M, Kawakubo H, Kitagawa Y. Trends in Lymphadenectomy for Esophageal/Esophagogastric Junction Cancer. Lymphatics. 2023;1:77-86.

7. Li K, Nie X, Li C, He W, Wang C, Du K, Li K, Liu K, Li Z, Lu S, Ni K, Huang Y, Jiang L, Wang K, Li H, Fang Q, Xiao W, Han Y, Leng X, Peng L. Mapping of Lymph Node Metastasis and Efficacy Index in Thoracic Esophageal Squamous Cell Carcinoma: A Large-Scale Retrospective Analysis. Ann Surg Oncol. 2023;30:5856-5865.

8. Peyre CG, Hagen JA, DeMeester SR, Altorki NK, Ancona E, Griffin SM, Hölscher A, Lerut T, Law S, Rice TW, Ruol A, van Lanschot JJ, Wong J, DeMeester TR. The number of lymph nodes removed predicts survival in esophageal cancer: an international study on the impact of extent of surgical resection. Ann Surg. 2008;248:549-556.

9. Jiang D, Liu XB, Xing WQ, Chen PN, Feng SK, Yan S, Lerut T, Sun HB. Survival impact of the number of lymph nodes dissection in patients receiving neoadjuvant chemotherapy for esophageal squamous cell carcinoma. Dis Esophagus. 2023;36:doac082.

10. Fang W, Kato H, Tachimori Y, Igaki H, Sato H, Daiko H. Analysis of pulmonary complications after three-field lymph node dissection for esophageal cancer. Ann Thorac Surg. 2003;76:903-908.

11. Li DL, Zhang L, Yan HJ, Zheng YB, Guo XG, Tang SJ, Hu HY, Yan H, Qin C, Zhang J, Guo HY, Zhou HN, Tian D. Machine learning models predict lymph node metastasis in patients with stage T1-T2 esophageal squamous cell carcinoma. Front Oncol. 2022;12:986358.

12. Karashima R, Watanabe M, Imamura Y, Ida S, Baba Y, Iwagami S, Miyamoto Y, Sakamoto Y, Yoshida N, Baba H. Advantages of FDG-PET/CT over CT alone in the preoperative assessment of lymph node metastasis in patients with esophageal cancer. Surg Today. 2015;45:471-477.

13. Foley KG, Christian A, Fielding P, Lewis WG, Roberts SA. Accuracy of contemporary oesophageal cancer lymph node staging with radiological-pathological correlation. Clin Radiol. 2017;72:693.e1-693.e7.

14. Vazquez-Sequeiros E, Norton ID, Clain JE, Wang KK, Affi A, Allen M, Deschamps C, Miller D, Salomao D, Wiersema MJ. Impact of EUS-guided fine-needle aspiration on lymph node staging in patients with esophageal carcinoma. Gastrointest Endosc. 2001;53:751-757.

15. Shan HB, Zhang R, Li Y, Gao XY, Lin SY, Luo GY, Li JJ, Xu GL. Application of Endobronchial Ultrasonography for the Preoperative Detecting Recurrent Laryngeal Nerve Lymph Node Metastasis of Esophageal Cancer. PLoS One. 2015;10:e0137400.

16. Li B, Li N, Liu S, Li Y, Qian B, Zhang Y, He H, Chen X, Sun Y, Xiang J, Hu H, Chen H. Does [18F] fluorodeoxyglucose-positron emission tomography/computed tomography have a role in cervical nodal staging for esophageal squamous cell carcinoma? J Thorac Cardiovasc Surg. 2020;160:544-550.

17. Peng G, Zhan Y, Wu Y, Zeng C, Wang S, Guo L, Liu W, Luo L, Wang R, Huang K, Huang B, Chen J, Chen C. Radiomics models based on CT at different phases predicting lymph node metastasis of esophageal squamous cell carcinoma (GASTO-1089). Front Oncol. 2022;12:988859.

18. Saikali S, Reddy S, Gokaraju M, Goldsztein N, Dyer A, Gamal A, Jaber A, Moschovas M, Rogers T, Vangala A, Briscoe J, Toleti C, Patel P, Patel V. Development and Assessment of an AI-based Machine Learning Model for Predicting Urinary Continence and Erectile Function Recovery after Robotic-Assisted Radical Prostatectomy: Insights from a Prostate Cancer Referral Center. Comput Methods Programs Biomed. 2025;259:108522.

19. Miao S, Xuan Q, Huang W, Jiang Y, Sun M, Qi H, Li A, Liu Z, Li J, Ding X, Wang R. Multi-region nomogram for predicting central lymph node metastasis in papillary thyroid carcinoma using multimodal imaging: A multicenter study. Comput Methods Programs Biomed. 2025;261:108608.

20. Xu L, Guo J, Qi S, Xie HN, Wei XF, Yu YK, Cao P, Zhang RX, Chen XK, Li Y. Development and validation of a nomogram model for the prediction of 4L lymph node metastasis in thoracic esophageal squamous cell carcinoma. Front Oncol. 2022;12:887047.

21. Zhang Y, Zhang L, Li B, Ye T, Zhang Y, Yu Y, Ma Y, Sun Y, Xiang J, Li Y, Chen H. Machine learning to predict occult metastatic lymph nodes along the recurrent laryngeal nerves in thoracic esophageal squamous cell carcinoma. BMC Cancer. 2023;23:197.

22. Chen L, Ouyang Y, Liu S, Lin J, Chen C, Zheng C, Lin J, Hu Z, Qiu M. Radiomics Analysis of Lymph Nodes with Esophageal Squamous Cell Carcinoma Based on Deep Learning. J Oncol. 2022;2022:8534262.

23. Sonohara F, Gao F, Iwata N, Kanda M, Koike M, Takahashi N, Yamada Y, Kodera Y, Wang X, Goel A. Genome-wide Discovery of a Novel Gene-expression Signature for the Identification of Lymph Node Metastasis in Esophageal Squamous Cell Carcinoma. Ann Surg. 2019;269:879-886.

24. Wu J, Chen QX, Shen DJ, Zhao Q. A prediction model for lymph node metastasis in T1 esophageal squamous cell carcinoma. J Thorac Cardiovasc Surg. 2018;155:1902-1908.

25. Lee JH, Jeong H, Choi JW, Oh HE, Kim YS. Liquid biopsy prediction of axillary lymph node metastasis, cancer recurrence, and patient survival in breast cancer: A meta-analysis. Medicine (Baltimore). 2018;97:e12862.

26. Zhang R, Zhang X, Huang Z, Wang F, Lin Y, Wen Y, Liu L, Li J, Liu X, Xie W, Huang M, Wang G, Yang L, Zhao D, Yu X, Xi K, Wang W, Cai L, Zhang L. Development and validation of a preoperative noninvasive predictive model based on circular tumor DNA for lymph node metastasis in resectable non-small cell lung cancer. Transl Lung Cancer Res. 2020;9:722-730.

27. Zhang Q, Luo J, Wu S, Si H, Gao C, Xu W, Abdullah SE, Higgs BW, Dennis PA, van der Heijden MS, Segal NH, Chaft JE, Hembrough T, Barrett JC, Hellmann MD. Prognostic and Predictive Impact of Circulating Tumor DNA in Patients with Advanced Cancers Treated with Immune Checkpoint Blockade. Cancer Discov. 2020;10:1842-1853.

28. Iwaya T, Endo F, Yaegashi M, Sasaki N, Fujisawa R, Hiraki H, Akiyama Y, Sasaki A, Suzuki Y, Masuda M, Yamada T, Takahashi F, Tokino T, Sasaki Y, Nishizuka S. Frequent tumor burden monitoring of esophageal squamous cell carcinoma with circulating tumor DNA using individually designed digital PCR. 2020 Preprint. Available from: medRxiv.

29. Liu T, Li M, Cheng W, Yao Q, Xue Y, Wang X, Jin H. A clinical prognostic model for patients with esophageal squamous cell carcinoma based on circulating tumor DNA mutation features. Front Oncol. 2022;12:1025284.

30. Zeybek A, Erdoğan A, Gülkesen KH, Ergin M, Sarper A, Dertsiz L, Demircan A. Significance of tumor length as prognostic factor for esophageal cancer. Int Surg. 2013;98:234-240.

31. Yang J, Liu Y, Li B, Jiang P, Wang C. Prognostic significance of tumor length in patients with esophageal cancer undergoing radical resection: A PRISMA-compliant meta-analysis. Medicine (Baltimore). 2019;98:e15029.

32. Chen H, Zhou X, Tang X, Li S, Zhang G. Prediction of Lymph Node Metastasis in Superficial Esophageal Cancer Using a Pattern Recognition Neural Network. Cancer Manag Res. 2020;12:12249-12258.

33. Mei X, Zhu X, Zuo L, Wu H, Guo M, Liu C. Predictive significance of CYFRA21-1, squamous cell carcinoma antigen and carcinoembryonic antigen for lymph node metastasis in patients with esophageal squamous cancer. Int J Biol Markers. 2019;34:200-204.

34. Chen L, Peng K, Han Z, Yu S, Huang Z, Xu H, Kang M. Development and validation of a nomogram for preoperative prediction of lymph node metastasis in pathological T1 esophageal squamous cell carcinoma. Medicine (Baltimore). 2022;101:e29299.

35. Tamoto E, Tada M, Murakawa K, Takada M, Shindo G, Teramoto K, Matsunaga A, Komuro K, Kanai M, Kawakami A, Fujiwara Y, Kobayashi N, Shirata K, Nishimura N, Okushiba S, Kondo S, Hamada J, Yoshiki T, Moriuchi T, Katoh H. Gene-expression profile changes correlated with tumor progression and lymph node metastasis in esophageal cancer. Clin Cancer Res. 2004;10:3629-3638.

36. Abbosh C, Birkbak NJ, Wilson GA, Jamal-Hanjani M, Constantin T, Salari R, Le Quesne J, Moore DA, Veeriah S, Rosenthal R, Marafioti T, Kirkizlar E, Watkins TBK, McGranahan N, Ward S, Martinson L, Riley J, Fraioli F, Al Bakir M, Grönroos E, Zambrana F, Endozo R, Bi WL, Fennessy FM, Sponer N, Johnson D, Laycock J, Shafi S, Czyzewska-Khan J, Rowan A, Chambers T, Matthews N, Turajlic S, Hiley C, Lee SM, Forster MD, Ahmad T, Falzon M, Borg E, Lawrence D, Hayward M, Kolvekar S, Panagiotopoulos N, Janes SM, Thakrar R, Ahmed A, Blackhall F, Summers Y, Hafez D, Naik A, Ganguly A, Kareht S, Shah R, Joseph L, Marie Quinn A, Crosbie PA, Naidu B, Middleton G, Langman G, Trotter S, Nicolson M, Remmen H, Kerr K, Chetty M, Gomersall L, Fennell DA, Nakas A, Rathinam S, Anand G, Khan S, Russell P, Ezhil V, Ismail B, Irvin-Sellers M, Prakash V, Lester JF, Kornaszewska M, Attanoos R, Adams H, Davies H, Oukrif D, Akarca AU, Hartley JA, Lowe HL, Lock S, Iles N, Bell H, Ngai Y, Elgar G, Szallasi Z, Schwarz RF, Herrero J, Stewart A, Quezada SA, Peggs KS, Van Loo P, Dive C, Lin CJ, Rabinowitz M, Aerts HJWL, Hackshaw A, Shaw JA, Zimmermann BG; TRACERx consortium; PEACE consortium, Swanton C. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature. 2017;545:446-451.

37. Shajii A, Numanagić I, Baghdadi R, Berger B, Amarasinghe S; MIT CSAIL, USA. Seq: A High-Performance Language for Bioinformatics. Proc ACM Program Lang. 2019;3:125.

Footnotes

Peer review: Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Oncology

Country of origin: China

Peer-review report’s classification

Scientific quality: Grade A, Grade A, Grade B, Grade B, Grade C

Novelty: Grade A, Grade A, Grade A, Grade B, Grade C

Creativity or innovation: Grade A, Grade A, Grade B, Grade B, Grade C

Scientific significance: Grade A, Grade A, Grade B, Grade B, Grade C

P-Reviewer: Chen GY, MD, Assistant Professor, China; Torun M, MD, FACS, FESC, Türkiye; Yao JH, Researcher, China

S-Editor: Luo ML

L-Editor: A

P-Editor: Zhao YQ