Ji XL, Xu S, Li XY, Xu JH, Han RS, Guo YJ, Duan LP, Tian ZB. Prognostic prediction models for postoperative patients with stage I to III colorectal cancer based on machine learning. World J Gastrointest Oncol 2024; 16(12): 4597-4613 [DOI: 10.4251/wjgo.v16.i12.4597]
Corresponding Author of This Article
Li-Ping Duan, MD, PhD, Chief Physician, Professor, Department of Gastroenterology, Beijing Key Laboratory for Helicobacter Pylori Infection and Upper Gastrointestinal Diseases, Peking University Third Hospital, No. 49 Garden Road, Haidian District, Beijing 100191, China. duanlp@bjmu.edu.cn
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Retrospective Cohort Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Xiao-Lin Ji, Li-Ping Duan, Department of Gastroenterology, Beijing Key Laboratory for Helicobacter Pylori Infection and Upper Gastrointestinal Diseases, Peking University Third Hospital, Beijing 100191, China
Shuo Xu, Beijing Aerospace Wanyuan Science Technology Co., Ltd., China Academy of Launch Vehicle Technology, Beijing 100176, China
Xiao-Yu Li, Rong-Shuang Han, Ying-Jie Guo, Zi-Bin Tian, Department of Gastroenterology, The Affiliated Hospital of Qingdao University, Qingdao 266003, Shandong Province, China
Jin-Huan Xu, Institute of Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, Shandong Province, China
Co-corresponding authors: Zi-Bin Tian and Li-Ping Duan.
Author contributions: Ji XL and Xu S designed the study, acquired and analyzed the data, and wrote the manuscript; Ji XL and Xu S contributed equally to this work as co-first authors; Li XY prepared the materials; Xu JH provided methods; Han RS and Guo YJ participated in the data acquisition and analysis; Tian ZB and Duan LP managed and designed the project and performed critical revisions of the manuscript; Tian ZB and Duan LP contributed equally to this work; all authors have read and approved the final manuscript. The designation of Tian ZB and Duan LP as co-corresponding authors of this work is primarily based on the following three reasons. First, this research project spans multiple disciplines. As the main provider of data, Tian ZB ensured the reliability and integrity of the research. His work in data collection, collation and analysis was crucial to the quality of the paper, while Duan LP ensured the comprehensiveness and depth of the research. Second, Duan LP was the main provider of the core ideas of the paper, setting the foundation for the direction and framework of the entire research and promoting the innovation and scientific value of the research. Moreover, Tian ZB put forward suggestions during this process. The two co-corresponding authors made similar contributions to the project and worked closely together. Balancing their contributions is crucial for fairness and transparency. Third, Tian ZB and Duan LP jointly undertook the task of revising the manuscript, reducing the burden on a single corresponding author and ensuring timely and efficient responses. In short, designating two corresponding authors helps promote cooperation, enhance academic influence, and improve the quality of research results. The contributions of Tian ZB and Duan LP were equally important at different stages, so being co-corresponding authors more fairly reflects their collaboration and contribution to this research.
Supported by National Natural Science Foundation of China, No. 81802777.
Institutional review board statement: The study was reviewed and approved for publication by the Ethics Committee of the Affiliated Hospital of Qingdao University (Grant No. QYFYWZLL26957).
Informed consent statement: All study participants or their legal guardians provided informed verbal consent for personal and medical data collection prior to study enrollment.
Conflict-of-interest statement: The authors have no conflicts of interest related to the manuscript.
Data sharing statement: The original anonymized dataset is available upon request from the corresponding author at duanlp@bjmu.edu.cn.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Received: March 19, 2024 Revised: September 7, 2024 Accepted: September 14, 2024 Published online: December 15, 2024 Processing time: 237 Days and 23.7 Hours
Abstract
BACKGROUND
Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates. With machine learning (ML) algorithms, patient, tumor, and treatment features can be used to develop and validate models for predicting survival. In addition, important variables can be screened and different applications can be provided that could serve as vital references when making clinical decisions and potentially improving patient outcomes in clinical settings.
AIM
To construct prognostic prediction models and screen important variables for patients with stage I to III CRC.
METHODS
More than 1000 postoperative CRC patients were grouped according to survival time (with cutoff values of 3 years and 5 years) and assigned to training and testing cohorts (7:3). For each 3-category survival time, predictions were made by 4 ML algorithms (all-variable and important variable-only datasets), each of which was validated via 5-fold cross-validation and bootstrap validation. Important variables were screened with multivariable regression methods. Model performance was evaluated and compared before and after variable screening with the area under the curve (AUC). SHapley Additive exPlanations (SHAP) further demonstrated the impact of important variables on model decision-making. Nomograms were constructed for practical model application.
RESULTS
Our ML models performed well; the model performance before and after important parameter identification was consistent, and variable screening was effective. The highest pre- and postscreening model AUCs (95% confidence intervals) in the testing set were 0.87 (0.81-0.92) and 0.89 (0.84-0.93) for overall survival, 0.75 (0.69-0.82) and 0.73 (0.64-0.81) for disease-free survival, 0.95 (0.88-1.00) and 0.88 (0.75-0.97) for recurrence-free survival, and 0.76 (0.47-0.95) and 0.80 (0.53-0.94) for distant metastasis-free survival. Repeated cross-validation and bootstrap validation were performed in both the training and testing datasets. The SHAP values of the important variables were consistent with the clinicopathological characteristics of patients with tumors. Nomograms were created for each outcome.
CONCLUSION
We constructed a comprehensive, high-accuracy, important variable-based ML architecture for predicting the 3-category survival times. This architecture could serve as a vital reference for managing CRC patients.
Core Tip: We developed and validated a promising machine learning architecture for predicting the 3-category survival times (cutoff values of 3 years and 5 years) for four survival times (overall, disease-free, recurrence-free, and distant metastasis-free survival) and screened corresponding important variables. Fivefold cross validation and bootstrap validation were conducted. The models were evaluated with the area under the curve (AUC); moreover, the effectiveness of our variable screening methods was evaluated by comparing the models’ pre- and post-screening AUCs. SHapley Additive exPlanations were used to explain the decision-making process. Nomograms were drawn for various applications.
Colorectal cancer (CRC) is characterized by high heterogeneity and aggressiveness and high morbidity and mortality rates[1] due to disease progression and inadequate treatment strategies[2]. Furthermore, overdiagnosis, overtreatment, false positives, false reassurance, uncertain findings, and complications are common and can lead to unnecessary psychological burdens on the patients[3-6]. Accurate prediction of the outcomes of CRC patients could be a vital reference when making clinical decisions. The American Joint Committee on Cancer (AJCC) classification system for CRC remains the primary tool for predicting these outcomes, especially for making adjuvant chemoradiotherapy decisions[7,8]. However, the survival observations associated with the AJCC classifications for CRC patients have been reported to exhibit certain inconsistencies[9-11]. Researchers have investigated and built prognostic models for CRC patients with traditional statistical methods; some of these models include the tumor node metastasis (TNM) stage, whereas others do not[12-15]. However, the performance of these models is not very satisfactory, likely due to methodological limitations. Therefore, it seems possible that the power of machine learning (ML) could be leveraged for improvements.
ML is a branch of artificial intelligence in which a computer generates rules underlying or based on raw data[16]; it has gradually been found to be useful in various applications in the field of medicine[17-20]. ML can be used to directly compare the accuracy of two or more quantitative tests for the same disease/condition[21], playing a role in formulating diagnosis and treatment rules[22-24]. ML algorithms have also been used to construct risk forecast models that predict the hazard ratio of adverse events[25,26] or predict the classification of double-class/multiclass endpoints at a specific time[27]. Nevertheless, these models cannot predict longitudinally the time interval in which specific oncological outcomes occur in CRC patients, and some of the models' important variables are unknown, casting doubt on their clinical credibility.
In our work, we developed a new ML architecture to predict 3-category occurrence times (cutoff values of 3 years and 5 years) of four oncological outcomes (death, tumor recurrence/distant metastasis, tumor recurrence, and tumor distant metastasis) in patients with stage I, II, and III CRC who underwent curative resection. This longitudinal predictive approach differs from that of previous studies. Moreover, the models' important variables and their order of importance were determined. These important parameters were used as clinical references, allowing the identification of hotspot indicators such as the lymph node ratio (LNR) and improving the clinical credibility of our models. The performances of the prediction models with all variables and with only the important variables were compared to demonstrate the effectiveness of the variable screening method. SHapley Additive exPlanations (SHAP) was used to provide a more intuitive analysis of the importance of characteristic variables for predicting the net benefits of the model. Nomograms based on different outcomes and classifications were generated for use and reference.
MATERIALS AND METHODS
Patient selection
This study was performed in accordance with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (Grant number QYFYWZLL26957). We retrospectively analyzed the data of patients who underwent curative surgery for primary stage I, II, and III CRC at the Affiliated Hospital of Qingdao University from 2001 to 2020. Patients who received neoadjuvant chemoradiotherapy or who died due to a noncancer-specific cause were excluded. The postoperative adjuvant chemoradiotherapy history of our CRC patients was unclear, and the data were acquired through the hospital information system. A detailed flowchart is shown in Figure 1.
Figure 1 Flowchart showing colorectal cancer patient cohort selection and model training and performance evaluation.
A total of 1330 patients were recruited for model development and were grouped by oncological outcomes; then, the data were randomly divided at a 7:3 ratio into a training set and a testing set. The hyperparameters were determined for both the training set and testing set, and model performance was evaluated on the basis of the area under the receiver operating characteristic curve. Finally, we predicted patient outcomes, screened important variables, and processed the bootstrap iterations. DFS: Disease-free survival; DMFS: Distant metastasis-free survival; OS: Overall survival; RFS: Recurrence-free survival.
Potential variables for model construction included age, sex, body mass index (BMI), hypertension, diabetes mellitus (DM), chronic heart disease (CHD), smoking history, drinking history, family history of tumors, family history of gastrointestinal tumors, serum carcinoembryonic antigen (CEA) level, serum C-reactive protein (CRP) level, tumor position (ascending colon vs transverse colon vs descending colon vs sigmoid colon vs rectum), tumor differentiation grade, histological type, tumor size (diameter, 20 mm cutoff), perineural invasion (PNI), lymphovascular invasion, lesion number (unifocal vs multifocal), Ki-67 protein level, operation method (laparotomy vs laparoscopy), LNR, and TNM stage. These characteristics mainly consisted of patient demographics and health, tumor and treatment characteristics. Missing values only appeared in the serum CEA, CRP, and Ki-67 protein level data and were processed as a corresponding variable classification.
Outcome selection
The outcomes were the 3-category survival times (cutoff values of 3 years and 5 years). The four survival times were overall survival (OS), disease-free survival (DFS), recurrence-free survival (RFS), and distant metastasis-free survival (DMFS), which were defined as the time from the date of surgery to the date of patient death, tumor recurrence/distant metastasis, tumor recurrence, and distant tumor metastasis, respectively. The three classifications of each outcome were labeled class 1 (cl 1) (< 3 years), class 2 (cl 2) (3-5 years), and class 3 (cl 3) (> 5 years).
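The class labeling described above can be illustrated with a minimal Python sketch. The function name `survival_class` and the example times are hypothetical, and the handling of the exact boundaries (placing 3.0 and 5.0 years in class 2) is an assumption read from the "3-5 years" label rather than something the text states.

```python
def survival_class(years: float) -> int:
    """Map a time-to-event in years to class 1 (< 3), class 2 (3-5), or class 3 (> 5).

    Assumption: the boundary values 3.0 and 5.0 fall in class 2.
    """
    if years < 0:
        raise ValueError("time-to-event must be non-negative")
    if years < 3:
        return 1
    if years <= 5:
        return 2
    return 3


# Hypothetical times from surgery to the event, in years.
times = [0.5, 3.0, 4.2, 5.0, 7.8]
labels = [survival_class(t) for t in times]
print(labels)
```

The same function would apply to each of the four outcomes (OS, DFS, RFS, and DMFS), with only the event definition changing.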
ML algorithms and multivariable regression methods
The four ML algorithms used here were linear regression (LR)[28], linear discriminant analysis (LDA), eXtreme gradient boosting (XGBoost)[29], and categorical features and gradient boosting (CatBoost)[30]. Best subset selection regression, ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and LASSO cross-validation methods were selected for multivariable regression methods.
SHAP
SHAP[31] was used to analyze how the important input variables screened for each prediction model contributed to its net benefit. Input variables whose SHAP values were approximately 0 were candidates for further removal, which could ultimately improve the net benefit of the prediction model. In the SHAP summary plots, the Y-axis lists the characteristic variables and the X-axis shows their SHAP values; each point represents a sample, the color of the point represents the feature value, the jitter of the points reflects the distribution of SHAP values, and the order of the variables represents their importance.
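To make the notion of a SHAP value concrete, the following sketch computes exact Shapley values for a single sample by enumerating all feature coalitions. It is an illustration under stated assumptions, not the authors' implementation: the toy linear model, its weights, and the choice to replace absent features with a fixed baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical linear risk score; the third feature has no effect."""
    weights = [0.5, -0.2, 0.0]
    return 0.1 + sum(w * f for w, f in zip(weights, features))


x = [1.0, 1.0, 1.0]         # sample to explain
baseline = [0.0, 0.0, 0.0]  # reference values for absent features
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the third feature receives a Shapley value of 0, which is the situation the text describes as a candidate for removal, and the values sum to the difference between the prediction for the sample and the prediction for the baseline.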
Nomogram
Nomograms based on the important variables screened for each outcome class for the four ML models were constructed to improve the applicability of our work. The variables in the nomograms are arranged according to the order mentioned in the variable selection.
ML model training and validation
Figure 1 shows the process of dataset division. Preliminary predictions were made with the four algorithms with the data from all variables, yielding the corresponding areas under the curve (AUCs). The outcomes were then classified prior to the important variable screening step. For each ML model, variables were screened with the four multivariable regression methods, and the most appropriate regression method was selected on the basis of the difference between the predicted and true values [e.g., the mean-square error (MSE); the smaller the MSE was, the better the fit]. Moreover, commonly used clinical guidelines were used to aid in determining important variables. The selected predictors were subsequently input into the four algorithms, and the AUCs were recalculated. All ML algorithm models were validated with 5-fold cross-validation and bootstrap validation with 300 resamplings. The SHAP values of important variables were also categorized, and nomograms were constructed for practical use.
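The 7:3 split and the bootstrap validation with 300 resamplings can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the helper names, the random seeds, the percentile method for the interval, and the binary labels (standing in for one class scored against the other two) are all assumptions.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle indices and cut them into training and testing parts."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data: 200 samples, 25% positives, noisy scores.
rng = random.Random(42)
labels = [1 if i % 4 == 0 else 0 for i in range(200)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

train_idx, test_idx = split_indices(len(labels))
test_y = [labels[i] for i in test_idx]
test_s = [scores[i] for i in test_idx]
point = auc(test_y, test_s)
low, high = bootstrap_auc_ci(test_y, test_s)
print(f"test AUC {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

For the 3-category outcomes, one plausible reading of the per-class AUCs reported in the Results is that each class is scored against the other two in this way.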
Optimization and hyperparameter configuration for classification models
The optimizer was configured for optimizing the XGBoost- and CatBoost-based models. To improve the predictive stability of our models and reduce the imbalance in the data, three optimization steps were performed. First, the maximum depth of the tree in the XGBoost algorithm was reduced and the penalty coefficients of the L2 and L1 regularization weights were increased to prevent overfitting of the prediction model. Second, the maximum depth in the CatBoost algorithm was reduced, and the corresponding L2 weight was increased. Third, we introduced the kernel density estimation (KDE) algorithm[32] for classifying small sample sizes. The maximum number of iterations was 500. The learning rate was set to 0.1 in the hyperparameter configurations. The other hyperparameters were set to their default optimized values.
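A configuration along these lines might look like the sketch below. Only the learning rate (0.1) and the maximum number of iterations (500) come from the text; the depth and regularization values are hypothetical placeholders chosen to illustrate "reduced depth" and "increased penalty" relative to the library defaults. The keys are the parameter names that would be passed to `xgboost.XGBClassifier` and `catboost.CatBoostClassifier`; reading "maximum number of iterations" as the number of boosting rounds is an assumption.

```python
# Hypothetical settings; only learning_rate and the iteration count are from the text.
xgboost_params = {
    "max_depth": 3,        # assumed: reduced from the library default of 6
    "reg_lambda": 5.0,     # assumed: L2 penalty raised above the default of 1
    "reg_alpha": 1.0,      # assumed: L1 penalty raised above the default of 0
    "learning_rate": 0.1,  # stated in the text
    "n_estimators": 500,   # stated in the text, read here as boosting rounds
}

catboost_params = {
    "depth": 4,            # assumed: reduced from the library default of 6
    "l2_leaf_reg": 10.0,   # assumed: L2 penalty raised above the default of 3
    "learning_rate": 0.1,  # stated in the text
    "iterations": 500,     # stated in the text
}

for name, params in (("XGBoost", xgboost_params), ("CatBoost", catboost_params)):
    print(name, params)
```

Shallower trees and stronger penalties both limit model complexity, which is the mechanism the text relies on to reduce overfitting.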
Proposed architecture
The technical architecture used in this study was Model-View-View Model (model: preprocessing of databases; view: executing predictions; view model: implementation of the prediction algorithms). The specific implementation plan involved passing the data processed in R to the ML models that had been built and finally obtaining the prediction results of the entire architecture. Previous works involved either only processing the data in R to build the models or only performing the prediction tasks with the ML models. Our work combined the advantages of the R language itself with those of mature ML models. Our prediction algorithms were developed in Python 3.11, and our regression methods were based on the R language (version 4.3.2). The AUCs were compared and the SHAP values were computed after the data iterations. The nomograms were established in R and were used as scoring systems in our work.
Statistical analysis
The clinicopathological characteristics of the patients were compared between the training and testing sets for each model with the χ2 test or Fisher's exact test.
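As an illustration of the χ2 comparison, the sketch below computes the Pearson χ2 statistic for a contingency table of one categorical variable by cohort. The counts are hypothetical, the function names are illustrative, and the p-value helper is valid only for 1 degree of freedom; Fisher's exact test, which the text uses as the alternative, is not shown.

```python
from math import erfc, sqrt


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Hypothetical counts: rows are cohorts (training, testing), columns are categories.
table = [[10, 20], [20, 10]]
stat, df = chi_square(table)
p = p_value_df1(stat)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

A large p-value for a variable would indicate no detectable difference in its distribution between the training and testing sets.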
Biostatistics
The statistical methods used in this study were reviewed by Shu-Cheng Si from Peking University Third Hospital.
RESULTS
Study population characteristics
The clinical and therapeutic characteristics of the study population are detailed in Supplementary Table 1. In addition, we performed χ2 tests/Fisher's exact tests to compare the variable distributions between the training sets and testing sets of the four models (OS, DFS, RFS, and DMFS models) and found almost no significant differences. In the OS model, 66.3% (689/1039) of the patients died within 3 years after surgery, and 81.9% (716/874), 81.7% (138/169) and 82.7% (663/802) of the patients experienced tumor recurrence/distant metastasis, recurrence, and distant metastasis, respectively, within 3 years after surgery. After the 5-year follow-up, 18.5% (192/1039) of the patients were still alive; additionally, 3.9% (34/874), 4.7% (8/169) and 3.4% (27/802) of the patients did not demonstrate tumor recurrence/distant metastasis, recurrence, or distant metastasis, respectively.
Performance of the all-variable database models
The results for the full-variable models are shown as solid lines. Only the results in the testing sets in terms of the receiver operating characteristic (ROC) curves are shown here; the results in the training sets are shown in detail in Figures 2, 3, 4 and 5. The AUCs [95% confidence intervals (CIs)] of the OS models produced by the four ML algorithms are shown in Figure 2 [LR: cl 1: 0.75 (0.70-0.80), cl 2: 0.65 (0.59-0.72), cl 3: 0.87 (0.81-0.92); LDA: cl 1: 0.76 (0.71-0.81), cl 2: 0.66 (0.60-0.72), cl 3: 0.86 (0.80-0.92); XGBoost: cl 1: 0.71 (0.64-0.76), cl 2: 0.62 (0.55-0.70), cl 3: 0.79 (0.71-0.86); and CatBoost: cl 1: 0.75 (0.70-0.81), cl 2: 0.65 (0.59-0.72), cl 3: 0.83 (0.77-0.89)]. Figure 3 shows the AUCs (95%CI) for the DFS models [LR: cl 1: 0.71 (0.61-0.80), cl 2: 0.70 (0.61-0.80), cl 3: 0.65 (0.35-0.89); LDA: cl 1: 0.71 (0.61-0.80), cl 2: 0.70 (0.61-0.80), cl 3: 0.62 (0.34-0.88); XGBoost: cl 1: 0.69 (0.60-0.78), cl 2: 0.66 (0.57-0.76), cl 3: 0.70 (0.46-0.89); and CatBoost: cl 1: 0.75 (0.69-0.82), cl 2: 0.71 (0.63-0.80), cl 3: 0.68 (0.43-0.90)]. The obtained AUCs (95%CIs) for the RFS models are shown in Figure 4 [LR: cl 1: 0.80 (0.65-0.92), cl 2: 0.76 (0.53-0.92), cl 3: 0.95 (0.88-1.00); LDA: cl 1: 0.83 (0.69-0.93), cl 2: 0.77 (0.57-0.93), cl 3: 0.92 (0.83-0.99); XGBoost: cl 1: 0.81 (0.65-0.93), cl 2: 0.70 (0.47-0.87), cl 3: 0.79 (0.56-0.96); and CatBoost: cl 1: 0.82 (0.67-0.93), cl 2: 0.72 (0.51-0.90), cl 3: 0.83 (0.67-0.96)]. Figure 5 shows the AUCs (95%CI) for the DMFS models [LR: cl 1: 0.65 (0.56-0.73), cl 2: 0.63 (0.54-0.71), cl 3: 0.76 (0.47-0.95); LDA: cl 1: 0.64 (0.56-0.72), cl 2: 0.63 (0.55-0.71), cl 3: 0.74 (0.45-0.93); XGBoost: cl 1: 0.66 (0.57-0.75), cl 2: 0.65 (0.55-0.74), cl 3: 0.76 (0.47-0.93); and CatBoost: cl 1: 0.65 (0.56-0.74), cl 2: 0.62 (0.54-0.71), cl 3: 0.75 (0.50-0.92)].
Figure 2 Prediction of overall survival by machine learning models.
The plots show the areas under the curve (AUCs) and their 95% confidence interval (CI). A: The linear regression (LR) model in the training set [class 1 (cl 1): Before: 0.76 (0.72-0.79), after: 0.74 (0.70-0.78); class 2 (cl 2): Before: 0.71 (0.66-0.75), after: 0.69 (0.64-0.73); class 3 (cl 3): Before: 0.81 (0.75-0.86), after: 0.78 (0.71-0.84)]; B: The LR model in the testing set [cl 1: Before: 0.75 (0.70-0.80), after: 0.78 (0.73-0.83); cl 2: Before: 0.65 (0.59-0.72), after: 0.67 (0.61-0.74); cl 3: Before: 0.87 (0.81-0.92), after: 0.88 (0.82-0.93)]; C: The linear discriminant analysis (LDA) model in the training set [cl 1: Before: 0.76 (0.73-0.80), after: 0.75 (0.71-0.78); cl 2: Before: 0.71 (0.67-0.75), after: 0.69 (0.65-0.74); cl 3: Before: 0.81 (0.75-0.86), after: 0.78 (0.71-0.84)]; D: The LDA model in the testing set [cl 1: Before: 0.76 (0.71-0.81), after: 0.78 (0.74-0.84); cl 2: Before: 0.66 (0.60-0.72), after: 0.67 (0.61-0.74); cl 3: Before: 0.86 (0.80-0.92), after: 0.89 (0.84-0.93)]; E: The eXtreme gradient boosting (XGBoost) model in the training set [cl 1: Before: 0.93 (0.92-0.95), after: 0.79 (0.76-0.82); cl 2: Before: 0.94 (0.93-0.96), after: 0.76 (0.72-0.80); cl 3: Before: 0.98 (0.97-0.99), after: 0.82 (0.76-0.87)]; F: The XGBoost model in the testing set [cl 1: Before: 0.71 (0.64-0.76), after: 0.76 (0.70-0.81); cl 2: Before: 0.62 (0.55-0.70), after: 0.64 (0.57-0.71); cl 3: Before: 0.79 (0.71-0.86), after: 0.85 (0.77-0.91)]; G: The categorical features and gradient boosting (CatBoost) model in the training set [cl 1: Before: 0.88 (0.86-0.91), after: 0.79 (0.75-0.82); cl 2: Before: 0.87 (0.85-0.90), after: 0.76 (0.72-0.80); cl 3: Before: 0.95 (0.93-0.97), after: 0.84 (0.78-0.88)]; H: The CatBoost model in the testing set [cl 1: Before: 0.75 (0.70-0.81), after: 0.77 (0.72-0.82); cl 2: Before: 0.65 (0.59-0.72), after: 0.65 (0.58-0.72); cl 3: Before: 0.83 (0.77-0.89), after: 0.86 (0.78-0.92)].
The curves of the models constructed with the full-variable datasets and the datasets containing only important variables are depicted with solid lines and dashed lines, respectively (abbreviated as “before” and “after” in this annotation).
Figure 3 Prediction of disease-free survival by machine learning models.
The plots show the areas under the curve (AUCs) and their 95%CI. A: The linear regression (LR) model in the training set [class 1 (cl 1): Before: 0.77 (0.73-0.82), after: 0.75 (0.70-0.79); class 2 (cl 2): Before: 0.75 (0.71-0.80), after: 0.69 (0.63-0.74); class 3 (cl 3): Before: 0.90 (0.84-0.95), after: 0.87 (0.81-0.92)]; B: The LR model in the testing set [cl 1: Before: 0.71 (0.61-0.80), after: 0.70 (0.60-0.79); cl 2: Before: 0.70 (0.61-0.80), after: 0.69 (0.57-0.80); cl 3: Before: 0.65 (0.35-0.89), after: 0.68 (0.43-0.87)]; C: The linear discriminant analysis (LDA) model in the training set [cl 1: Before: 0.77 (0.73-0.82), after: 0.75 (0.70-0.80); cl 2: Before: 0.76 (0.71-0.80), after: 0.69 (0.63-0.75); cl 3: Before: 0.89 (0.83-0.95), after: 0.87 (0.81-0.92)]; D: The LDA model in the testing set [cl 1: Before: 0.71 (0.61-0.80), after: 0.70 (0.60-0.79); cl 2: Before: 0.70 (0.61-0.80), after: 0.69 (0.57-0.79); cl 3: Before: 0.62 (0.34-0.88), after: 0.69 (0.43-0.88)]; E: The eXtreme gradient boosting (XGBoost) model in the training set [cl 1: Before: 0.96 (0.95-0.98), after: 0.79 (0.75-0.83); cl 2: Before: 0.94 (0.92-0.96), after: 0.74 (0.69-0.79); cl 3: Before: 0.99 (0.97-1.00), after: 0.88 (0.84-0.92)]; F: The XGBoost model in the testing set [cl 1: Before: 0.69 (0.60-0.78), after: 0.71 (0.62-0.80); cl 2: Before: 0.66 (0.57-0.76), after: 0.72 (0.62-0.81); cl 3: Before: 0.70 (0.46-0.89), after: 0.70 (0.48-0.88)]; G: The categorical features and gradient boosting (CatBoost) model in the training set [cl 1: Before: 0.91 (0.88-0.93), after: 0.80 (0.75-0.84); cl 2: Before: 0.91 (0.88-0.93), after: 0.76 (0.71-0.81); cl 3: Before: 0.99 (0.97-1.00), after: 0.89 (0.85-0.93)]; H: The CatBoost model in the testing set [cl 1: Before: 0.75 (0.69-0.82), after: 0.73 (0.64-0.81); cl 2: Before: 0.71 (0.63-0.80), after: 0.72 (0.63-0.81); cl 3: Before: 0.68 (0.43-0.90), after: 0.70 (0.50-0.87)].
The curves of the models constructed with the full-variable datasets and the datasets containing only important variables are depicted with solid lines and dashed lines, respectively (abbreviated as “before” and “after” in this annotation).
Figure 4 Prediction of recurrence-free survival by machine learning models.
The plots show the areas under the curve (AUCs) and their 95%CI. A: The linear regression (LR) model in the training set [class 1 (cl 1): Before: 0.84 (0.76-0.91), after: 0.76 (0.67-0.84); class 2 (cl 2): Before: 0.88 (0.81-0.93), after: 0.76 (0.66-0.85); class 3 (cl 3): Before: 0.99 (0.96-1.00), after: 0.95 (0.89-0.98)]; B: The LR model in the testing set [cl 1: Before: 0.80 (0.65-0.92), after: 0.68 (0.53-0.82); cl 2: Before: 0.76 (0.53-0.92), after: 0.66 (0.44-0.82); cl 3: Before: 0.95 (0.88-1.00), after: 0.85 (0.66-0.97)]; C: The linear discriminant analysis (LDA) model in the training set [cl 1: Before: 0.84 (0.76-0.91), after: 0.76 (0.67-0.84); cl 2: Before: 0.86 (0.79-0.92), after: 0.75 (0.64-0.85); cl 3: Before: 0.97 (0.92-1.00), after: 0.93 (0.88-0.98)]; D: The LDA model in the testing set [cl 1: Before: 0.83 (0.69-0.93), after: 0.70 (0.56-0.85); cl 2: Before: 0.77 (0.57-0.93), after: 0.67 (0.46-0.83); cl 3: Before: 0.92 (0.83-0.99), after: 0.85 (0.65-0.98)]; E: The eXtreme gradient boosting (XGBoost) model in the training set [cl 1: Before: 0.93 (0.86-0.97), after: 0.89 (0.82-0.94); cl 2: Before: 0.92 (0.85-0.97), after: 0.84 (0.75-0.92); cl 3: Before: 0.96 (0.91-1.00), after: 0.94 (0.86-1.00)]; F: The XGBoost model in the testing set [cl 1: Before: 0.81 (0.65-0.93), after: 0.83 (0.68-0.95); cl 2: Before: 0.70 (0.47-0.87), after: 0.70 (0.45-0.87); cl 3: Before: 0.79 (0.56-0.96), after: 0.88 (0.75-0.97)]; G: The categorical features and gradient boosting (CatBoost) model in the training set [cl 1: Before: 0.95 (0.91-0.99), after: 0.87 (0.79-0.93); cl 2: Before: 0.93 (0.87-0.97), after: 0.82 (0.72-0.91); cl 3: Before: 0.96 (0.90-1.00), after: 0.88 (0.73-1.00)]; and H: The CatBoost model in the testing set [cl 1: Before: 0.82 (0.67-0.93), after: 0.79 (0.64-0.91); cl 2: Before: 0.72 (0.51-0.90), after: 0.68 (0.45-0.85); cl 3: Before: 0.83 (0.67-0.96), after: 0.84 (0.68-0.93)]. 
The curves of the models constructed with the full-variable datasets and the datasets containing only important variables are depicted with solid lines and dashed lines, respectively (abbreviated as “before” and “after” in this annotation).
Figure 5 Prediction of distant metastasis-free survival by machine learning models.
The plots show the areas under the curve (AUCs) and their 95%CI. A: The linear regression (LR) model in the training set [class 1 (cl 1): Before: 0.81 (0.77-0.85), after: 0.77 (0.72-0.81); class 2 (cl 2): Before: 0.80 (0.75-0.84), after: 0.73 (0.67-0.79); class 3 (cl 3): Before: 0.90 (0.83-0.96), after: 0.84 (0.75-0.91)]; B: The LR model in the testing set [cl 1: Before: 0.65 (0.56-0.73), after: 0.67 (0.58-0.76); cl 2: Before: 0.63 (0.54-0.71), after: 0.63 (0.54-0.72); cl 3: Before: 0.76 (0.47-0.95), after: 0.79 (0.48-0.95)]; C: The linear discriminant analysis (LDA) model in the training set [cl 1: Before: 0.82 (0.78-0.85), after: 0.77 (0.72-0.81); cl 2: Before: 0.81 (0.77-0.84), after: 0.74 (0.68-0.79); cl 3: Before: 0.90 (0.84-0.96), after: 0.83 (0.75-0.90)]; D: The LDA model in the testing set [cl 1: Before: 0.64 (0.56-0.72), after: 0.65 (0.56-0.74); cl 2: Before: 0.63 (0.55-0.71), after: 0.62 (0.53-0.71); cl 3: Before: 0.74 (0.45-0.93), after: 0.79 (0.48-0.95)]; E: The eXtreme gradient boosting (XGBoost) model in the training set [cl 1: Before: 0.96 (0.94-0.97), after: 0.80 (0.75-0.84); cl 2: Before: 0.96 (0.94-0.98), after: 0.76 (0.70-0.81); cl 3: Before: 0.97 (0.96-0.99), after: 0.85 (0.79-0.91)]; F: The XGBoost model in the testing set [cl 1: Before: 0.66 (0.57-0.75), after: 0.68 (0.60-0.76); cl 2: Before: 0.65 (0.55-0.74), after: 0.64 (0.55-0.73); cl 3: Before: 0.76 (0.47-0.93), after: 0.80 (0.53-0.94)]; G: The categorical features and gradient boosting (CatBoost) model in the training set [cl 1: Before: 0.95 (0.93-0.96), after: 0.79 (0.74-0.83); cl 2: Before: 0.96 (0.94-0.97), after: 0.77 (0.72-0.82); cl 3: Before: 0.98 (0.97-1.00), after: 0.87 (0.81-0.92)]; H: The CatBoost model in the testing set [cl 1: Before: 0.65 (0.56-0.74), after: 0.67 (0.59-0.75); cl 2: Before: 0.62 (0.54-0.71), after: 0.63 (0.54-0.72); cl 3: Before: 0.75 (0.50-0.92), after: 0.78 (0.54-0.94)]. 
The curves of the models constructed with the full-variable datasets and the datasets containing only important variables are depicted with solid lines and dashed lines, respectively (abbreviated as “before” and “after” in this annotation).
Model explanatory features
The important variables differed by study outcome. The detailed MSEs are shown in Table 1. We selected ridge regression for the patient OS model, and the most important variable for predicting patient death was tumor differentiation grade, whereas the tumor differentiation grade, Ki-67 protein level, TNM stage, histological type, CHD, PNI, serum CRP level, and tumor size were the leading features for predicting multicategory OS (8 in total, Table 2 and Supplementary Figure 1). Subset regression was used for the DFS model, and the six important indicators were PNI, Ki-67 protein level, tumor differentiation grade, serum CRP level, histological type, and TNM stage; moreover, the most important variable was PNI (Supplementary Figure 2). For RFS, we used LASSO regression, and DM was found to be the most important variable. The nine important indicators were DM, histological type, serum CEA level, Ki-67 protein level, PNI, serum CRP level, drinking history, LNR, and BMI (Table 2 and Supplementary Figure 3). Subset regression was also chosen for the DMFS model, and PNI, Ki-67 protein level, TNM stage, tumor differentiation grade, and histological type were identified as the five important features; of these, PNI was the most important variable (Supplementary Figure 2).
Table 1 Comparisons of the mean square errors (MSEs) of the four regression methods. All values are MSEs.

| Regression method | OS | DFS | RFS | DMFS |
| --- | --- | --- | --- | --- |
| Subset regression method | 0.3451435 | 0.2798500 | 0.2982825 | 0.2473204 |
| Ridge regression method | 0.3446552 | 0.2851649 | 0.3134211 | 0.2618467 |
| LASSO regression method | 0.3539798 | 0.2841859 | 0.3051014 | 0.2600286 |
| LASSO cross-validation method | 0.3594004 | 0.3240556 | 0.3087686 | 0.2833409 |
Table 2 Regression coefficients for each variable after ridge regression for overall survival, and least absolute shrinkage and selection operator regression for recurrence-free survival.

| No. | Variable | Ridge regression, OS (s1) | LASSO regression, RFS (s1) |
| --- | --- | --- | --- |
| 1 | Sex | -0.018391078 | 0 |
| 2 | Age | -0.038454863 | 0 |
| 3 | BMI | 0.021728968 | -0.001371348 |
| 4 | HP | 0.004982681 | 0 |
| 5 | DM | 0.001375364 | 0.119909730 |
| 6 | CHD | 0.096041621 | 0 |
| 7 | Smoking history | 0.028673390 | 0 |
| 8 | Drinking history | 0.057936668 | 0.013536804 |
| 9 | Family history of tumors | -0.006440690 | 0 |
| 10 | Family history of gastrointestinal tumors | 0.049916084 | 0 |
| 11 | Serum CEA level | 0.045104522 | -0.043589364 |
| 12 | Serum CRP level | -0.069915886 | -0.026111125 |
| 13 | Tumor position | 0.001514524 | 0 |
| 14 | Tumor differentiation grade | -0.295140732 | 0 |
| 15 | Histological type | -0.109175968 | -0.063182813 |
| 16 | Tumor size | -0.063522898 | 0 |
| 17 | PNI | -0.092586921 | -0.031648909 |
| 18 | LVI | -0.026426021 | 0 |
| 19 | Lesion number | 0.037347444 | 0 |
| 20 | Ki-67 protein level | -0.163277916 | -0.033951556 |
| 21 | Operation method | 0.055820722 | 0 |
| 22 | LNR | 0.040875360 | -0.009499988 |
| 23 | TNM stage | -0.150669578 | 0 |
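The link between Table 2 and the variable lists in the text can be illustrated with a short sketch. It assumes, as our own reading rather than a stated rule of the study, that variables are ranked by the absolute value of their coefficient; under that assumption the eight largest ridge coefficients and the nine nonzero LASSO coefficients come out in the same order as the text lists them. The names `TABLE2` and `rank_by_magnitude` are hypothetical.

```python
# Coefficients transcribed from Table 2: (variable, ridge OS, LASSO RFS).
TABLE2 = [
    ("Sex", -0.018391078, 0.0),
    ("Age", -0.038454863, 0.0),
    ("BMI", 0.021728968, -0.001371348),
    ("HP", 0.004982681, 0.0),
    ("DM", 0.001375364, 0.119909730),
    ("CHD", 0.096041621, 0.0),
    ("Smoking history", 0.028673390, 0.0),
    ("Drinking history", 0.057936668, 0.013536804),
    ("Family history of tumors", -0.006440690, 0.0),
    ("Family history of gastrointestinal tumors", 0.049916084, 0.0),
    ("Serum CEA level", 0.045104522, -0.043589364),
    ("Serum CRP level", -0.069915886, -0.026111125),
    ("Tumor position", 0.001514524, 0.0),
    ("Tumor differentiation grade", -0.295140732, 0.0),
    ("Histological type", -0.109175968, -0.063182813),
    ("Tumor size", -0.063522898, 0.0),
    ("PNI", -0.092586921, -0.031648909),
    ("LVI", -0.026426021, 0.0),
    ("Lesion number", 0.037347444, 0.0),
    ("Ki-67 protein level", -0.163277916, -0.033951556),
    ("Operation method", 0.055820722, 0.0),
    ("LNR", 0.040875360, -0.009499988),
    ("TNM stage", -0.150669578, 0.0),
]


def rank_by_magnitude(column, top=None):
    """Names of variables with nonzero coefficients, largest |coefficient| first.

    column: 1 for the ridge (OS) column, 2 for the LASSO (RFS) column.
    Ranking by absolute value is an assumption of this sketch.
    """
    nonzero = [(row[0], abs(row[column])) for row in TABLE2 if row[column] != 0.0]
    nonzero.sort(key=lambda item: item[1], reverse=True)
    names = [name for name, _ in nonzero]
    return names if top is None else names[:top]


os_top8 = rank_by_magnitude(1, top=8)   # cutoff of 8 taken from the text
rfs_nonzero = rank_by_magnitude(2)      # LASSO keeps only nonzero coefficients
print(os_top8)
print(rfs_nonzero)
```

Ridge regression shrinks coefficients without setting them to zero, so a cutoff is needed for the OS model, whereas LASSO removes variables by setting their coefficients exactly to zero.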
Performance of important variable models
The ROC curves of the models based on datasets containing only important variables identified after variable screening are shown as dashed lines. Only the results in the testing sets are shown here; the results in the training sets are shown in detail in Figures 2, 3, 4 and 5. The AUCs of the OS models obtained by the four ML algorithms are shown in Figure 2 [LR: cl 1: 0.78 (0.73-0.83), cl 2: 0.67 (0.61-0.74), cl 3: 0.88 (0.82-0.93); LDA: cl 1: 0.78 (0.74-0.84), cl 2: 0.67 (0.61-0.74), cl 3: 0.89 (0.84-0.93); XGBoost: cl 1: 0.76 (0.70-0.81), cl 2: 0.64 (0.57-0.71), cl 3: 0.85 (0.77-0.91); and CatBoost: cl 1: 0.77 (0.72-0.82), cl 2: 0.65 (0.58-0.72), cl 3: 0.86 (0.78-0.92)]. Figure 3 shows the AUCs for the DFS models in detail [LR: cl 1: 0.70 (0.60-0.79), cl 2: 0.69 (0.57-0.80), cl 3: 0.68 (0.43-0.87); LDA: cl 1: 0.70 (0.60-0.79), cl 2: 0.69 (0.57-0.79), cl 3: 0.69 (0.43-0.88); XGBoost: cl 1: 0.71 (0.62-0.80), cl 2: 0.72 (0.62-0.81), cl 3: 0.70 (0.48-0.88); and CatBoost: cl 1: 0.73 (0.64-0.81), cl 2: 0.72 (0.63-0.81), cl 3: 0.70 (0.50-0.87)]. The obtained AUCs for the RFS models are shown in Figure 4 [LR: cl 1: 0.68 (0.53-0.82), cl 2: 0.66 (0.44-0.82), cl 3: 0.85 (0.66-0.97); LDA: cl 1: 0.70 (0.56-0.85), cl 2: 0.67 (0.46-0.83), cl 3: 0.85 (0.65-0.98); XGBoost: cl 1: 0.83 (0.68-0.95), cl 2: 0.70 (0.45-0.87), cl 3: 0.88 (0.75-0.97); and CatBoost: cl 1: 0.79 (0.64-0.91), cl 2: 0.68 (0.45-0.85), cl 3: 0.84 (0.68-0.93)]. Figure 5 shows the AUCs for the DMFS models [LR: cl 1: 0.67 (0.58-0.76), cl 2: 0.63 (0.54-0.72), cl 3: 0.79 (0.48-0.95); LDA: cl 1: 0.65 (0.56-0.74), cl 2: 0.62 (0.53-0.71), cl 3: 0.79 (0.48-0.95); XGBoost: cl 1: 0.68 (0.60-0.76), cl 2: 0.64 (0.55-0.73), cl 3: 0.80 (0.53-0.94); and CatBoost: cl 1: 0.67 (0.59-0.75), cl 2: 0.63 (0.54-0.72), cl 3: 0.78 (0.54-0.94)]. The model AUCs did not significantly decrease and, in some cases, even increased after reducing the variables to only those identified as important.
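A per-class AUC with a 95%CI, as reported above, can be sketched as follows. This is a minimal illustration and not the study's code: it assumes a one-vs-rest binarization of the three classes and a percentile bootstrap for the interval, and the toy labels, probabilities, and function names are hypothetical.

```python
import random


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(classes, probs, target):
    """Binarize the labels against `target` and score that class's probability."""
    labels = [1 if c == target else 0 for c in classes]
    scores = [p[target] for p in probs]
    return binary_auc(labels, scores)


def bootstrap_ci(classes, probs, target, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the one-vs-rest AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(classes)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(one_vs_rest_auc([classes[i] for i in idx],
                                         [probs[i] for i in idx], target))
        except ValueError:
            continue  # resample contained only one of the two binarized labels
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical true classes and predicted class probabilities for nine patients.
classes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
probs = [
    {1: 0.7, 2: 0.2, 3: 0.1}, {1: 0.6, 2: 0.3, 3: 0.1}, {1: 0.3, 2: 0.5, 3: 0.2},
    {1: 0.2, 2: 0.6, 3: 0.2}, {1: 0.4, 2: 0.4, 3: 0.2}, {1: 0.1, 2: 0.3, 3: 0.6},
    {1: 0.1, 2: 0.2, 3: 0.7}, {1: 0.2, 2: 0.2, 3: 0.6}, {1: 0.5, 2: 0.3, 3: 0.2},
]
auc_cl1 = one_vs_rest_auc(classes, probs, target=1)
ci_cl1 = bootstrap_ci(classes, probs, target=1, n_boot=200)
print(round(auc_cl1, 2), ci_cl1)
```

With few patients in a class, the bootstrap interval becomes wide, which is consistent with the broad intervals reported for cl 3 in the smaller testing sets.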
SHAP value
Supplementary Figure 4 shows the plots of the model SHAP values. A higher probability of outcome occurrence is represented by a SHAP value less than zero. Patients with poor tumor differentiation, high Ki-67 protein level, poor histological type, late TNM stage, large tumor size, PNI, high serum CRP level, and CHD tended to have poorer OS. For DFS, PNI, high Ki-67 protein level, poor tumor differentiation, poor histological type, late TNM stage, and high serum CRP level were related to earlier tumor recurrence/distant metastasis. According to the RFS model, tumor recurrence was more frequent in patients with poor histological type, high BMI, high Ki-67 protein level, drinking history, high serum CEA level, PNI, DM, high LNR, and high serum CRP level. The DMFS model suggested that distant metastasis was more likely for a patient with PNI, high Ki-67 protein level, poor tumor differentiation, late TNM stage, and poor histological type.
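The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. This is a sketch under stated assumptions, not the procedure used in the study: features outside a coalition are replaced by a baseline value, and the model `toy_model`, the inputs, and the baseline are hypothetical. Exact enumeration is only feasible for a handful of features; SHAP libraries rely on faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    Features not in a coalition take their baseline value (an assumed convention).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical score with an interaction between the first two features."""
    return z[0] * z[1] + 2 * z[2]


x = [2, 3, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions always sum to the difference between the prediction for the patient and the prediction for the baseline, which is what allows each variable's contribution to be read off a SHAP plot.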
Nomogram
Supplementary Figures 5-8 show the nomograms of each ML model. The total scores and the probability of outcome occurrence can be obtained from these nomograms.
Model validation
The 5-fold cross-validation and bootstrap validation results of our ML models are detailed in Supplementary Figures 9-12.
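For readers unfamiliar with the two validation schemes, the index bookkeeping can be sketched as below. The details are assumptions of this illustration (shuffling before splitting, resampling with replacement at the original sample size, the seed, and the function names); the study does not describe its implementation at this level.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def bootstrap_indices(n_samples, n_rounds=3, seed=0):
    """Yield (resample, out_of_bag) index lists drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        sample = [rng.randrange(n_samples) for _ in range(n_samples)]
        chosen = set(sample)
        out_of_bag = [i for i in range(n_samples) if i not in chosen]
        yield sample, out_of_bag


folds = list(k_fold_indices(23, k=5))
rounds = list(bootstrap_indices(23, n_rounds=3))
for train, test in folds:
    print(len(train), len(test))
```

In each round a model would be refitted on the training (or resampled) indices and scored on the held-out indices, and the per-round AUCs would then be summarized.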
DISCUSSION
In this study, we developed and validated a promising ML architecture for predicting the 3-class occurrence time (cutoff values of 3 years and 5 years) for four oncological outcomes (patient death, tumor recurrence/distant metastasis, tumor recurrence, and distant tumor metastasis) and identified the corresponding important variables. Moreover, 5-fold cross-validations and bootstrap validations were conducted. The AUC was calculated to evaluate our predictive models, and the effectiveness of our variable screening methods was evaluated by comparing the pre- and post-screening AUCs of the models. SHAP values helped explain the decision-making processes of the models. Moreover, nomograms were produced for ease of application of the models. This architecture represents a comprehensive, practical, and robust tool that clinicians can use when making clinical decisions. Additionally, given the nature of the included patient data, our architecture has good tolerance for heterogeneity and does not require clear patient medical histories, lowering the threshold for use. We divided survival times into categories, predicted them as multicategory endpoints, and assessed patient outcomes longitudinally, providing a perspective that differs from those of previous studies. Our ML models were designed on the basis of specific oncological outcomes to predict the possible occurrence time category. Some of the important indicators, such as LNR, were newly identified as important predictors of CRC patient outcomes, providing possible insights for researchers and clinicians. Our work demonstrated the feasibility of applying ML models to CRC patients to a certain extent; moreover, the adaptability and interpretability of our architecture can help promote its application in hospitals at different levels.
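The 3-class endpoint with cutoffs at 3 and 5 years can be written as a small labeling function. The mapping of class numbers to intervals and the handling of values that fall exactly on a cutoff are assumptions of this sketch, since this section does not specify them; `occurrence_class` is a hypothetical name.

```python
def occurrence_class(years_to_event):
    """Map time to event (in years) to a 3-category label.

    Assumed mapping: 1 = within 3 years, 2 = later than 3 and within 5 years,
    3 = later than 5 years. Boundary values are assigned to the earlier class.
    """
    if years_to_event < 0:
        raise ValueError("time to event cannot be negative")
    if years_to_event <= 3:
        return 1
    if years_to_event <= 5:
        return 2
    return 3


# Hypothetical follow-up times in years.
times = [0.8, 3.0, 4.5, 5.0, 7.2]
labels = [occurrence_class(t) for t in times]
print(labels)
```

Each of the four outcomes (OS, DFS, RFS, and DMFS) would receive its own label of this kind, computed from the time to the corresponding event.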
To improve the practicality of the architecture, we selected clinicopathological indicators that are easily obtained, although several genetic and molecular markers have been proven to be correlated with patient prognoses[25,33]. To limit selection bias and avoid relying on TNM stage alone, we entered a broad set of candidate variables into the parameter screening step and let the ML algorithms identify the variables that performed best. For the sake of applicability, and to limit the bias that improper imputation can introduce, missing values were treated as a separate category of the corresponding categorical variable. In practice, clinicians can choose the category that represents missing data when a value is unavailable. We randomly divided the patients into training and testing cohorts to avoid selection bias, and most differences in baseline characteristics between the two sets were not significant. Therefore, we did not further explore potential confounding factors. However, future studies should consider investigating the effects of such confounding factors on model outcomes.
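Treating missing values as their own category can be sketched as follows. The one-hot representation, the level name `missing`, and the example column are assumptions of this illustration; the study only states that missing values formed a separate category.

```python
MISSING = "missing"  # hypothetical label for the extra category


def encode_categorical(values):
    """One-hot encode a categorical column, keeping missing values as a level."""
    cleaned = [MISSING if v is None or v == "" else str(v) for v in values]
    levels = sorted(set(cleaned))
    rows = [[1 if c == level else 0 for level in levels] for c in cleaned]
    return levels, rows


# Hypothetical PNI column with one unrecorded value.
pni = ["yes", "no", None, "yes"]
levels, rows = encode_categorical(pni)
print(levels)
print(rows)
```

Because no value is imputed, a patient with an unrecorded indicator can still be scored by selecting the missing category at prediction time.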
We included patients with a clear medical history before the operation who underwent curative initial treatment; in this way, we excluded stage I, II, and III CRC patients who had received neoadjuvant chemoradiotherapy (which may have led to a vague history). Patients with stage IV CRC who were not eligible for radical surgery were also excluded. However, the history of postoperative adjuvant chemoradiotherapy in our patients was unclear. Consequently, the models might serve only as rough references when physicians at higher-level hospitals redesign treatment strategies for patients from lower-level hospitals with unclear postoperative radiotherapy and chemotherapy histories. Treatment options for these patients are difficult to determine, and it is difficult for oncologists to obtain references from previous studies that stratify patients by chemotherapy regimen. Moreover, to avoid bias, we excluded patients who did not have endpoint data. When applying our architecture, patients who were predicted to not have outcomes were classified as having an outcome greater than 5 years. We cautiously excluded patients with long follow-up intervals and patients with clear, noncancer-specific deaths to further avoid bias. In addition, the number of patients with tumor recurrence was the smallest; after the strict 7:3 split into training and testing cohorts, the sample size of the testing cohort for the RFS model was still small, so we used the KDE algorithm to address this situation. The similarity between the fitting and original data was evaluated by the bandwidth in KDE, and the fitted data were merged with the original data to form a new RFS dataset. The data introduced here were only used as model training data to assess the performance of the prediction model.
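The KDE step can be illustrated for a single numeric feature. This sketch assumes a Gaussian kernel and a normal-reference rule-of-thumb bandwidth, neither of which is stated in the text, and the data and function names are hypothetical. Drawing from a Gaussian KDE is equivalent to picking an observed value at random and adding Gaussian noise whose standard deviation equals the bandwidth, so a smaller bandwidth yields synthetic values closer to the originals.

```python
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth for a 1-D Gaussian KDE (assumed choice)."""
    n = len(values)
    return 1.06 * statistics.stdev(values) * n ** (-1 / 5)


def sample_from_kde(values, n_new, bandwidth=None, seed=0):
    """Draw n_new synthetic values from a Gaussian KDE fitted to `values`."""
    rng = random.Random(seed)
    h = rule_of_thumb_bandwidth(values) if bandwidth is None else bandwidth
    return [rng.choice(values) + rng.gauss(0.0, h) for _ in range(n_new)]


# Hypothetical values of one continuous feature in a small outcome class.
observed = [21.5, 23.0, 24.2, 26.8, 22.9, 25.1]
synthetic = sample_from_kde(observed, n_new=12)
augmented = observed + synthetic  # merged set, used for model training only
print(len(augmented), round(rule_of_thumb_bandwidth(observed), 3))
```

A multivariate version would apply the same idea to whole patient records, and categorical variables would need separate handling.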
We decomposed the 3-category outcomes into binary outcomes and introduced the data into an LR prediction model; ultimately, 23 original characteristic variables were given outcome labels. As the dataset was multidimensional, the supervised dimensionality reduction algorithm LDA[34,35] was introduced to effectively improve the establishment of the prediction models. We selected XGBoost to optimize the prediction models, although calculating leaf weights is a complex process. The predictive values were obtained by directly summing the leaf weights of all the weak classifiers. We adopted CatBoost because it is based on a tree structure algorithm that can increase the robustness of prediction models. Most importantly, CatBoost can process categorical features into numerical features. The algorithm counts the categorical features, calculates the frequency of appearance of each categorical feature, and subsequently adds hyperparameters to generate new numerical features. CatBoost also uses combined category features, which can leverage the connections between features, greatly enriching the feature dimensions. After analyzing the performance of CatBoost and comparing it with that of LDA, we found that the advantages of the former outweighed those of the latter. To avoid bias in gradient estimation and address prediction bias, we adopted a sorting enhancement method to combat noise in the training dataset. In addition, to better predict the 3-category survival time for patients with different oncological outcomes, we configured and optimized the hyperparameters wherever possible. Furthermore, to reduce bias, the computer experts were blinded to the meaning of each indicator when building the ML models.
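The conversion of a categorical feature into a numerical one, as described for CatBoost, can be sketched with a simplified ordered target statistic. This is an illustration of the general idea rather than CatBoost's exact implementation: it uses a single fixed row order instead of random permutations, and the prior, the smoothing weight `a`, and the toy data are hypothetical.

```python
def ordered_target_statistic(categories, targets, prior=0.5, a=1.0):
    """Encode each row from the targets of earlier rows with the same category.

    value = (sum of earlier targets in the category + a * prior)
            / (number of earlier rows in the category + a)
    Using only earlier rows keeps a row's own label out of its encoding.
    """
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        encoded.append((s + a * prior) / (n + a))
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded


# Hypothetical categorical feature and binary outcome for five patients.
categories = ["A", "B", "A", "A", "B"]
targets = [1, 0, 1, 0, 1]
encoded = ordered_target_statistic(categories, targets)
print([round(v, 3) for v in encoded])
```

The first row of each category receives the prior alone, and later rows move toward the category's observed outcome frequency as counts accumulate.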
Taking survival time as a categorical variable and predicting the time interval in which an oncological outcome occurs reflects one of the potential applications of our models in clinical practice and provides a perspective on prediction that differs from that of previous studies. A more accurate prediction of possible patient outcomes could translate into more precise treatment planning and patient management strategies. Extending survival time is a shared goal among clinicians and oncology patients, and quantifying patient outcomes aids in shared decision making[36]. Because of the heterogeneity of CRC, physicians and patients must seriously consider the trade-offs between adverse effects and benefits[37] when choosing a treatment strategy. It is possible to improve outcomes by administering closer follow-up or additional chemoradiotherapy to patients who are predicted to have poorer outcomes. Consequently, we suggest that patients expected to have a shorter DMFS receive prophylactic chemotherapy or regional radiotherapy at common CRC metastasis sites, as described by Jiang et al[38]. Moreover, the identification of patients with better outcomes could reduce medical care costs and improve the level of humanistic care by reducing the psychological burden on patients and their families. Therefore, predictive tools such as those produced with our architecture should be adopted quickly in clinical practice. However, the results output by models with unclear vital parameters for managing patients are not always acceptable to clinicians and patients[39,40]. Model interpretability is important, especially in biomedicine[41,42]. To turn a model with unclear important parameters into one with clear important parameters, we screened the corresponding predictors and showed their importance for different patient outcomes.
TNM stage, the primary indicator for chemoradiotherapy decisions, was included in the OS, DFS, and DMFS models, which made our models more credible. Moreover, indicators that have been widely found to be correlated with patient outcome, such as PNI[43-45], pathological type[46-49], and tumor differentiation grade[50], were also included in the models, further confirming the credibility of our architecture. One of the potential benefits of using ML models is that important variables can be identified, while less important parameters can be ignored. Several predictors that have not been widely used as important predictors of CRC patient outcome were also included in our models, providing new insights into the prediction of patient outcomes. The LNR, whose high prognostic value has been previously demonstrated[51,52], was selected for inclusion in our RFS model and could emerge as an important prognostic indicator for clinical decision-making in CRC patients. The levels of serum CRP and tumor Ki-67 protein were also shown to be prognostic factors, which is consistent with studies showing that high serum CRP levels are associated with increased postoperative complication rates[53,54] and that Ki-67 levels reflect the proliferative capacity of cells[55], especially tumor cells[56]. Our models also identified several predictors that were not previously considered to be directly linked to a poor outcome (unifocal vs multifocal lesions and laparotomy vs laparoscopy)[57-60]. However, these factors are more likely to be directly related to surgical trauma than to survival time. For the OS and RFS models, the chronic diseases CHD and DM were selected together with alcohol consumption history and BMI, suggesting that clinicians need to be mindful of cancer patients with underlying diseases and poor health conditions. Moreover, we validated the effectiveness of our screening method by comparing the AUCs of the all-variable and important variable-only predictive models.
Model performance did not significantly decrease and in some cases even improved when constructed only with the important variables. This finding indicates that our important variables indeed play key roles in our prediction models. After calculating SHAP values, we found that different variables had different degrees of impact on the predictions. Furthermore, clinicians could directly and conveniently reference our results through the nomograms provided in the Results section.
Our findings revealed the formidable predictive power of ML methods, particularly for heterogeneous diseases whose outcome stratifications serve as important clinical references. ML has unique value in clinical applications; when guiding patient management, improving patient outcomes, and tailoring treatment regimens, it could provide important reference values, especially under conditions of resource scarcity (for example, when only clinicopathological and surgical variables are available for analysis). In terms of the computer calculations used, the function parameters were those of the multiple LR model, whereas the optimization parameter b applied in the Cox proportional hazards model is consistent with that of the multiple LR model. Moreover, compared with the Cox proportional hazards model[61], ML has some advantages, although the Cox proportional hazards model and the multivariate linear model in the ML model are similar. When selecting the important parameters, the Cox proportional hazards model commonly shows independent prognostic factors and more indirectly compares their predictive value, whereas ML finds important factors and compares their importance more reliably and directly. When building models and determining their performance, the Cox proportional hazards model typically takes a specific survival time and builds double/multiclass risk stratifications based on this survival time, whereas we predicted the 3-category occurrence time of the oncological outcomes, i.e. patient survival times from a longitudinal perspective, with our ML models. Moreover, the number of variables for which the Cox proportional hazards model performed best was less than that of the ML model, which could include multiple variables[62,63]. In addition, the ML and Cox proportional hazards models also differed in terms of the AUC. 
In our study, the AUC refers to the percentage of correct prediction results in the total sample and specifically refers to the ratio of 3-category survival times (OS, DFS, RFS, and DMFS) to the corresponding sample size of the datasets. In the ML model, we binarized the three categories, and the AUC value was the same as the conventional value. The final AUC was obtained from the last module of the ML model[64] and corresponded to the three outcome classifications. Although we obtained encouragingly high predictive performance, high robustness, and transparently important variables, more progress is needed before ML can be fully relied upon. In addition, in clinical practice, traditional performance measures such as the AUC must be translated into medically relevant measures to elucidate the patient-centric value of the ML model, indicating that ML is still lacking in some ways.
The limitations of our study must be noted. First, the sample size available for the ML models was not large (especially for the RFS model). When the data were input into the ML models for parameter optimization, the sensitivity of parameter adjustment could not be estimated because of the small sample size. To compensate for this limitation, we introduced the KDE algorithm for the RFS model (classes 2 and 3); the algorithm was only applied during the training and testing process of the models and achieved good results. Second, the sample uniformity of the data could not be estimated. This is a common problem in ML[41] that can possibly affect the final results. Furthermore, this was a retrospective study, which implies the presence of certain selection biases. However, the patient data were obtained from a well-conceived and well-characterized cohort, which increases the credibility of our results; thus, this study can serve as the basis for subsequent prospective studies. In addition, postoperative treatment information, such as information regarding specific radiotherapy and chemotherapy treatments as well as details of the surgical methods, was not available in our datasets and should be considered for inclusion in future work.
Prospective research with a larger sample size and a more comprehensive and rigorous design is needed in the future. Moreover, omics data and microbial analyses are well worth including in the predictive models. We look forward to carrying out more in-depth work in the future.
CONCLUSION
We successfully designed and validated comprehensive, accessible, and robust clinicopathological-based ML prediction models built from clearly identified important variables. Our work could serve as a reference for CRC patient management and outcome improvement. We demonstrated the potential of the proposed ML architecture for clinical application.
ACKNOWLEDGEMENTS
We thank all the authors for their contributions to the manuscript and for their recognition of the data and conclusions; moreover, we thank Shu-Cheng Si, who reviewed the statistical methods of this study.
Footnotes
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Oncology
Country of origin: China
Peer-review report’s classification
Scientific Quality: Grade B, Grade C, Grade E
Novelty: Grade B, Grade B, Grade D
Creativity or Innovation: Grade B, Grade B, Grade D
Scientific Significance: Grade B, Grade B, Grade D
P-Reviewer: Cengiz F, Stabellini N, Wang Z; S-Editor: Qu XL; L-Editor: Filipodia; P-Editor: Wang WB
Dienstmann R, Mason MJ, Sinicrope FA, Phipps AI, Tejpar S, Nesbakken A, Danielsen SA, Sveen A, Buchanan DD, Clendenning M, Rosty C, Bot B, Alberts SR, Milburn Jessup J, Lothe RA, Delorenzi M, Newcomb PA, Sargent D, Guinney J. Prediction of overall survival in stage II and III colon cancer beyond TNM system: a retrospective, pooled biomarker study. Ann Oncol. 2017;28:1023-1031.
Chu QD, Zhou M, Medeiros KL, Peddi P, Kavanaugh M, Wu XC. Poor survival in stage IIB/C (T4N0) compared to stage IIIA (T1-2 N1, T1N2a) colon cancer persists even after adjusting for adequate lymph nodes retrieved and receipt of adjuvant chemotherapy. BMC Cancer. 2016;16:460.
Rekhraj S, Aziz O, Prabhudesai S, Zacharakis E, Mohr F, Athanasiou T, Darzi A, Ziprin P. Can intra-operative intraperitoneal free cancer cell detection techniques identify patients at higher recurrence risk following curative colorectal cancer resection: a meta-analysis. Ann Surg Oncol. 2008;15:60-68.
Choi JY, Jung SA, Shim KN, Cho WY, Keum B, Byeon JS, Huh KC, Jang BI, Chang DK, Jung HY, Kong KA; Korean ESD Study Group. Meta-analysis of predictive clinicopathologic factors for lymph node metastasis in patients with early colorectal carcinoma. J Korean Med Sci. 2015;30:398-406.
Grimm LJ, Plichta JK, Hwang ES. More Than Incremental: Harnessing Machine Learning to Predict Breast Cancer Risk. J Clin Oncol. 2022;40:1713-1717.
Xie C, Zhuang XX, Niu Z, Ai R, Lautrup S, Zheng S, Jiang Y, Han R, Gupta TS, Cao S, Lagartos-Donate MJ, Cai CZ, Xie LM, Caponio D, Wang WW, Schmauck-Medina T, Zhang J, Wang HL, Lou G, Xiao X, Zheng W, Palikaras K, Yang G, Caldwell KA, Caldwell GA, Shen HM, Nilsen H, Lu JH, Fang EF. Amelioration of Alzheimer's disease pathology by mitophagy inducers identified via machine learning and a cross-species workflow. Nat Biomed Eng. 2022;6:76-93.
Kim M, Chen C, Wang P, Mulvey JJ, Yang Y, Wun C, Antman-Passig M, Luo HB, Cho S, Long-Roche K, Ramanathan LV, Jagota A, Zheng M, Wang Y, Heller DA. Detection of ovarian cancer via the spectral fingerprinting of quantum-defect-modified carbon nanotubes in serum by machine learning. Nat Biomed Eng. 2022;6:267-275.
Liang H, Tsui BY, Ni H, Valentim CCS, Baxter SL, Liu G, Cai W, Kermany DS, Sun X, Chen J, He L, Zhu J, Tian P, Shao H, Zheng L, Hou R, Hewett S, Li G, Liang P, Zang X, Zhang Z, Pan L, Cai H, Ling R, Li S, Cui Y, Tang S, Ye H, Huang X, He W, Liang W, Zhang Q, Jiang J, Yu W, Gao J, Ou W, Deng Y, Hou Q, Wang B, Yao C, Liang Y, Zhang S, Duan Y, Zhang R, Gibson S, Zhang CL, Li O, Zhang ED, Karin G, Nguyen N, Wu X, Wen C, Xu J, Xu W, Wang B, Wang W, Li J, Pizzato B, Bao C, Xiang D, He W, He S, Zhou Y, Haw W, Goldbaum M, Tremoulet A, Hsu CN, Carter H, Zhu L, Zhang K, Xia H. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019;25:433-438.
D'Ascenzo F, De Filippo O, Gallone G, Mittone G, Deriu MA, Iannaccone M, Ariza-Solé A, Liebetrau C, Manzano-Fernández S, Quadri G, Kinnaird T, Campo G, Simao Henriques JP, Hughes JM, Dominguez-Rodriguez A, Aldinucci M, Morbiducci U, Patti G, Raposeiras-Roubin S, Abu-Assi E, De Ferrari GM; PRAISE study group. Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. Lancet. 2021;397:199-207.
Parikh RB, Manz CR, Nelson MN, Evans CN, Regli SH, O'Connor N, Schuchter LM, Shulman LN, Patel MS, Paladino J, Shea JA. Clinician perspectives on machine learning prognostic algorithms in the routine care of patients with cancer: a qualitative study. Support Care Cancer. 2022;30:4363-4372.
Song JH, Yu M, Kang KM, Lee JH, Kim SH, Nam TK, Jeong JU, Jang HS, Lee JW, Jung JH. Significance of perineural and lymphovascular invasion in locally advanced rectal cancer treated by preoperative chemoradiotherapy and radical surgery: Can perineural invasion be an indication of adjuvant chemotherapy? Radiother Oncol. 2019;133:125-131.
Kim SH, Shin SJ, Lee KY, Kim H, Kim TI, Kang DR, Hur H, Min BS, Kim NK, Chung HC, Roh JK, Ahn JB. Prognostic value of mucinous histology depends on microsatellite instability status in patients with stage III colon cancer treated with adjuvant FOLFOX chemotherapy: a retrospective cohort study. Ann Surg Oncol. 2013;20:3407-3413.
Nitsche U, Zimmermann A, Späth C, Müller T, Maak M, Schuster T, Slotta-Huspenina J, Käser SA, Michalski CW, Janssen KP, Friess H, Rosenberg R, Bader FG. Mucinous and signet-ring cell colorectal cancers differ from classical adenocarcinomas in tumor biology and prognosis. Ann Surg. 2013;258:775-82; discussion 782.
Garrity MM, Burgart LJ, Mahoney MR, Windschitl HE, Salim M, Wiesenfeld M, Krook JE, Michalak JC, Goldberg RM, O'Connell MJ, Furth AF, Sargent DJ, Murphy LM, Hill E, Riehle DL, Meyers CH, Witzig TE; North Central Cancer Treatment Group. Prognostic value of proliferation, apoptosis, defective DNA mismatch repair, and p53 overexpression in patients with resected Dukes' B2 or C colon cancer: a North Central Cancer Treatment Group Study. J Clin Oncol. 2004;22:1572-1582.
Orive M, Anton A, Gonzalez N, Aguirre U, Anula R, Lázaro S, Redondo M, Bare M, Briones E, Escobar A, Sarasqueta C, Ferreiro J, Quintana JM; REDISSEC-CARESS/CCR group. Factors associated with colon cancer early, intermediate and late recurrence after surgery for stage I-III: A 5-year prospective study. Eur J Cancer Care (Engl). 2020;29:e13317.
Muñoz JL, Alvarez MO, Cuquerella V, Miranda E, Picó C, Flores R, Resalt-Pereira M, Moya P, Pérez A, Arroyo A. Procalcitonin and C-reactive protein as early markers of anastomotic leak after laparoscopic colorectal surgery within an enhanced recovery after surgery (ERAS) program. Surg Endosc. 2018;32:4003-4010.
Schlüter C, Duchrow M, Wohlenberg C, Becker MH, Key G, Flad HD, Gerdes J. The cell proliferation-associated antigen of antibody Ki-67: a very large, ubiquitous nuclear protein with numerous repeated elements, representing a new kind of cell cycle-maintaining proteins. J Cell Biol. 1993;123:513-522.
Starborg M, Gell K, Brundell E, Höög C. The murine Ki-67 cell proliferation antigen accumulates in the nucleolar and heterochromatic regions of interphase cells and at the periphery of the mitotic chromosomes in a process essential for cell cycle progression. J Cell Sci. 1996;109 (Pt 1):143-153.
Fleshman J, Branda ME, Sargent DJ, Boller AM, George VV, Abbas MA, Peters WR Jr, Maun DC, Chang GJ, Herline A, Fichera A, Mutch MG, Wexner SD, Whiteford MH, Marks J, Birnbaum E, Margolin DA, Larson DW, Marcello PW, Posner MC, Read TE, Monson JRT, Wren SM, Pisters PWT, Nelson H. Disease-free Survival and Local Recurrence for Laparoscopic Resection Compared With Open Resection of Stage II to III Rectal Cancer: Follow-up Results of the ACOSOG Z6051 Randomized Controlled Trial. Ann Surg. 2019;269:589-595.
Hida K, Okamura R, Sakai Y, Konishi T, Akagi T, Yamaguchi T, Akiyoshi T, Fukuda M, Yamamoto S, Yamamoto M, Nishigori T, Kawada K, Hasegawa S, Morita S, Watanabe M; Japan Society of Laparoscopic Colorectal Surgery. Open versus Laparoscopic Surgery for Advanced Low Rectal Cancer: A Large, Multicenter, Propensity Score Matched Cohort Study in Japan. Ann Surg. 2018;268:318-324.
Poortmans PM, Collette S, Kirkove C, Van Limbergen E, Budach V, Struikmans H, Collette L, Fourquet A, Maingon P, Valli M, De Winter K, Marnitz S, Barillot I, Scandolaro L, Vonk E, Rodenhuis C, Marsiglia H, Weidner N, van Tienhoven G, Glanzmann C, Kuten A, Arriagada R, Bartelink H, Van den Bogaert W; EORTC Radiation Oncology and Breast Cancer Groups. Internal Mammary and Medial Supraclavicular Irradiation in Breast Cancer. N Engl J Med. 2015;373:317-327.
Kadan A, Ryczko K, Wildman A, Wang R, Roitberg A, Yamazaki T. Accelerated Organic Crystal Structure Prediction with Genetic Algorithms and Machine Learning. J Chem Theory Comput. 2023;19:9388-9402.