Published online Aug 24, 2025. doi: 10.5306/wjco.v16.i8.107306
Revised: May 11, 2025
Accepted: July 3, 2025
Processing time: 153 Days and 4.7 Hours
Ki-67 is routinely assessed in clinical pathology departments. However, its prognostic value requires further investigation, and research applying machine learning (ML) to Ki-67 remains relatively underdeveloped.
To investigate the prognostic value of Ki-67 in cases of colorectal carcinoma (CRC) and explore the potential application of ML algorithms to predict the Ki-67 index.
Case data and pathological sections were systematically collected from two centers. Multiple cutoff values were established to analyze the prognostic value of the Ki-67 index in CRC. In parallel, three mainstream ML algorithms, namely support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost), were used to construct prediction models from the histological features of hematoxylin and eosin-stained CRC images. The ability of these models to classify and predict the Ki-67 index was then evaluated.
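As a minimal sketch of the multiple-cutoff step, assuming Ki-67 indices are recorded as percentages of positive cells: each candidate cutoff splits the cohort into a high and a low group, which can then be compared on clinicopathological variables or survival. The function name, cutoff grid, and sample values below are hypothetical and are not taken from the study.

```python
def dichotomize_ki67(indices, cutoffs):
    """Label each Ki-67 index (percent) as high (1) or low (0) for every cutoff."""
    return {c: [1 if x >= c else 0 for x in indices] for c in cutoffs}

# Hypothetical Ki-67 indices (%) for six cases; not study data.
ki67 = [15, 40, 55, 70, 85, 90]
# Hypothetical cutoff grid, chosen only for illustration.
labels = dichotomize_ki67(ki67, cutoffs=[40, 50, 80])

for cutoff, groups in labels.items():
    n_high = sum(groups)
    print(f"Ki-67 >= {cutoff}%: {n_high} high, {len(groups) - n_high} low")
```

Each resulting grouping would then feed a separate non-parametric test or survival comparison, one per cutoff.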
Non-parametric tests revealed that Ki-67 ≥ 40% correlated with a high histological grade (P = 0.017), that deficient mismatch repair protein status was associated with each cutoff from ≥ 50% to ≥ 90% (all P ≤ 0.028), and that Ki-67 ≥ 80% was linked to lymph node metastasis (P = 0.006). Kaplan-Meier analysis showed that Ki-67 ≥ 50% predicted higher survival (log-rank P = 0.0299, hazard ratio = 2.142), with no significant differences for other cutoffs. Cox regression identified the Ki-67 positive rate as a significant predictor (P = 0.027, hazard ratio = 2.583), while other variables showed no association. The SVM, RF, and XGBoost models achieved training area under the curve (AUC) values of 0.851, 0.948, and 0.872, with corresponding test set AUC values of 0.795, 0.755, and 0.750, respectively. During external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively.
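For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed for a binary Ki-67 classifier. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name, labels, and scores are hypothetical; this is not the study's evaluation code.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win
    (the Mann-Whitney U formulation of the area under the ROC curve).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = high Ki-67) and model scores; not study data.
y = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(y, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the training, test, and external validation values are reported.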
Core Tip: This study pioneers the application of machine learning to predict Ki-67 status in colorectal carcinoma directly from hematoxylin and eosin-stained images. Analysis of multiple cutoffs identified 50% as the optimal Ki-67 cutoff, with high expression linked to improved survival and low expression associated with advanced tumor stage and lymph node metastasis. Predictive models built with the support vector machine, random forest, and eXtreme gradient boosting algorithms achieved area under the curve values of 0.851-0.948 in training, 0.750-0.795 in the test set, and 0.749-0.783 in external validation. This approach highlights the potential of machine learning to enhance prognostic accuracy.
