Published online Sep 28, 2022. doi: 10.3748/wjg.v28.i36.5338
Peer-review started: July 20, 2022
First decision: August 6, 2022
Revised: August 14, 2022
Accepted: September 6, 2022
Article in press: September 6, 2022
Published online: September 28, 2022
Processing time: 64 Days and 19.4 Hours
The most important consideration in determining treatment strategies for undifferentiated early gastric cancer (UEGC) is the risk of lymph node metastasis (LNM). A biomarker that reliably predicts LNM would therefore help guide treatment selection.
To develop a machine learning (ML)-based integral procedure for constructing a gray-level co-occurrence matrix (GLCM)-based model that predicts LNM.
We retrospectively selected 526 cases of UEGC confirmed by pathological examination after radical gastrectomy without prior endoscopic treatment in four tertiary hospitals between January 2015 and December 2021. We extracted GLCM-based features from grayscale images and applied ML to classify candidate predictive variables. The robustness and clinical utility of each model were evaluated using the receiver operating characteristic (ROC) curve, decision curve analysis, and the clinical impact curve.
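The GLCM feature-extraction step can be illustrated with a minimal sketch. It assumes that "inertia value" corresponds to the standard GLCM contrast and "inverse gap" to the inverse difference moment; the abstract does not define these terms, so this mapping is an assumption. The function names (`glcm`, `texture_features`) and the 4 x 4 toy image are hypothetical and are not taken from the study.

```python
import math

def glcm(image, dx, dy, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pixel pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Inertia (contrast), inverse difference moment, and entropy of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    inertia = sum(v * (i - j) ** 2 for i, j, v in cells)
    inverse_diff = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    return {"inertia": inertia, "inverse_difference": inverse_diff, "entropy": entropy}

# Hypothetical toy image with 4 gray levels (not study data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# Pixel offsets (dx, dy) for the 0 degree and 45 degree directions;
# dy is negative for 45 degrees because row indices grow downward.
OFFSETS = {"0": (1, 0), "45": (1, -1)}

features = {
    angle: texture_features(glcm(image, dx, dy, levels=4))
    for angle, (dx, dy) in OFFSETS.items()
}

for angle, values in features.items():
    print(angle, values)
```

A "full angle" variant could plausibly be obtained by averaging each feature over all directions, but the abstract does not state how that value was computed.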
GLCM-based features were significantly correlated with LNM. The top 7 GLCM-based factors included inertia value 0° (IV_0), inertia value 45° (IV_45), inverse gap 0° (IG_0), inverse gap 45° (IG_45), inverse gap full angle (IG_all), Haralick 30° (Haralick_30), Haralick full angle (Haralick_all), and Entropy. The areas under the ROC curve (AUCs) of the random forest classifier (RFC), support vector machine, eXtreme gradient boosting, artificial neural network, and decision tree models ranged from 0.805 [95% confidence interval (CI): 0.258-1.352] to 0.925 (95%CI: 0.378-1.472) in the training set and from 0.794 (95%CI: 0.237-1.351) to 0.912 (95%CI: 0.355-1.469) in the testing set. The RFC model (training set: AUC: 0.925, 95%CI: 0.378-1.472; testing set: AUC: 0.912, 95%CI: 0.355-1.469), which incorporates Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45, had the highest predictive accuracy.
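As a rough illustration of how an AUC and its 95%CI can be computed, the sketch below uses the rank-based (Mann-Whitney) definition of the AUC and a percentile bootstrap interval. The abstract does not state which CI method the authors used, so the bootstrap is an assumption; the helper names (`auc`, `bootstrap_ci`) and the labels and scores are hypothetical toy values, not study data.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = LNM present, 0 = LNM absent.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.75, 0.90]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f}, 95%CI: {low:.3f}-{high:.3f}")
```

Because every bootstrap replicate is itself an AUC, a percentile interval built this way always lies between 0 and 1.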
The evaluation results indicate that selecting radiological and textural features is effective for discriminating LNM in UEGC patients. Additionally, the ML-based prediction model developed using the RFC can be used to identify LNM and inform treatment options, which may improve clinical outcomes.
Core Tip: Gray-level co-occurrence matrix-based feature extraction can be a robust and promising tool for improving the efficiency of predicting lymph node metastasis in individual patients with undifferentiated early gastric cancer. Additionally, machine learning adopts more optimized algorithms and clearer feature extraction. The model developed using the random forest classifier, which incorporates Entropy, Haralick full angle, Haralick 30°, inverse gap full angle, inverse gap 45°, inverse gap 0°, and inertia value 45°, had the highest predictive accuracy. Further research is required to develop these models for clinical practice.
