Retrospective Cohort Study
Copyright ©The Author(s) 2022. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Sep 28, 2022; 28(36): 5338-5350
Published online Sep 28, 2022. doi: 10.3748/wjg.v28.i36.5338
Machine learning-based gray-level co-occurrence matrix signature for predicting lymph node metastasis in undifferentiated-type early gastric cancer
Xin Wei, Xue-Jiao Yan, Yu-Yan Guo, Jie Zhang, Guo-Rong Wang, Arsalan Fayyaz, Jiao Yu
Xin Wei, Department of Oncology, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Xue-Jiao Yan, Department of Magnetic Resonance, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Yu-Yan Guo, Department of Radiotherapy, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710004, Shaanxi Province, China
Jie Zhang, Department of Gastrointestinal Surgery, Shaanxi Provincial Tumour Hospital, Xi’an 710068, Shaanxi Province, China
Guo-Rong Wang, Department of General Surgery, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Arsalan Fayyaz, School of Management, Northwestern Polytechnical University, Xi’an 710072, Shaanxi Province, China
Jiao Yu, Department of Radiotherapy, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Author contributions: Yu J and Wei X conceived and designed the study and wrote the manuscript; Yan XJ, Guo YY, Zhang J, Wang GR, and Arsalan F collected the data, performed the data analysis, and interpreted the outcomes; and all authors critically reviewed the content of the manuscript and helped with the drafts.
Supported by the General Project-Social Development Field of Shaanxi Province Science and Technology Department, No. 2021SF-313; and Innovation Capability Support Plan of Shaanxi Science and Technology Department - Science and Technology Innovation Team, No. 2020TD-048.
Institutional review board statement: This study was approved by the Institutional Review Committee of Shaanxi Provincial People’s Hospital (2021-Y024).
Informed consent statement: Written informed consent was not required because the study was a retrospective chart review.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
STROBE statement: The authors have read the STROBE Statement checklist of items, and the manuscript was prepared and revised according to the STROBE Statement checklist of items.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Jiao Yu, MD, Radiologist, Department of Radiotherapy, Shaanxi Provincial People’s Hospital, No. 256 Youyi West Road, Beilin District, Xi’an 710068, Shaanxi Province, China. shawn170215@163.com
Received: July 20, 2022
Peer-review started: July 20, 2022
First decision: August 6, 2022
Revised: August 14, 2022
Accepted: September 6, 2022
Article in press: September 6, 2022
Published online: September 28, 2022
Processing time: 64 Days and 19.4 Hours
ARTICLE HIGHLIGHTS
Research background

Gray-level co-occurrence matrix (GLCM)-based feature extraction could serve as a robust and promising tool for improving the prediction of lymph node metastasis (LNM) in individual patients with undifferentiated early gastric cancer (UEGC). Additionally, machine learning (ML) offers more optimized algorithms and clearer feature extraction. The model built with the random forest classifier (RFC) achieved the highest predictive accuracy when it incorporated Entropy, Haralick full angle (Haralick_all), Haralick 30° (Haralick_30), Inverse gap full angle (IG_all), Inverse gap 45° (IG_45), Inverse gap 0° (IG_0), and Inertia value 45° (IV_45). Further research is needed to develop these models for clinical practice.
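To make the feature names above more concrete, the following is a minimal, dependency-free sketch of how a GLCM and three texture statistics can be computed from a small quantized grayscale image. It is not the authors' pipeline. The function names are hypothetical, "inertia" is taken in its common meaning of GLCM contrast, and "inverse gap" is assumed to correspond to the inverse difference moment; the Haralick features are not reproduced here.

```python
import math

def glcm(image, dx, dy, levels):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dx, dy).

    image  : list of rows, each a list of integer gray levels in [0, levels)
    dx, dy : column and row offset; (1, 0) is 0 degrees, (1, -1) is 45 degrees
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def entropy(p):
    """Shannon entropy (base 2) of the co-occurrence probabilities."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

def inertia(p):
    """Inertia (contrast): large when neighbouring gray levels differ strongly."""
    return sum(v * (i - j) ** 2
               for i, row in enumerate(p) for j, v in enumerate(row))

def inverse_gap(p):
    """Inverse difference moment (assumed meaning of 'inverse gap')."""
    return sum(v / (1 + (i - j) ** 2)
               for i, row in enumerate(p) for j, v in enumerate(row))

# Toy 4x4 image with four gray levels (illustrative data only).
example_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p_0 = glcm(example_image, dx=1, dy=0, levels=4)    # 0 degree direction
p_45 = glcm(example_image, dx=1, dy=-1, levels=4)  # 45 degree direction
features = {
    "entropy_0": entropy(p_0),
    "inertia_0": inertia(p_0),
    "inverse_gap_0": inverse_gap(p_0),
    "inertia_45": inertia(p_45),
}
```

A "full angle" variant could plausibly be obtained by averaging a statistic over several directions, but that reading is an assumption rather than something stated in the text.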

Research motivation

The evaluation results indicate that selecting radiological and textural features is effective for discriminating LNM in UEGC patients. In addition, an ML-based prediction model developed using RFC can be used to guide treatment selection and to identify LNM, which may improve clinical outcomes.

Research objectives

GLCM-based features correlated significantly with LNM. The top-ranked GLCM-based factors included Inertia value 0°, IV_45, IG_0, IG_45, IG_all, Haralick_30, Haralick_all, and Entropy. Across the RFC, support vector machine (SVM), eXtreme gradient boosting (XGBoost), artificial neural network (ANN), and decision tree (DT) models, the areas under the receiver operating characteristic (ROC) curve (AUCs) ranged from 0.805 [95% confidence interval (CI): 0.258-1.352] to 0.925 (95%CI: 0.378-1.472) in the training set and from 0.794 (95%CI: 0.237-1.351) to 0.912 (95%CI: 0.355-1.469) in the testing set. The RFC model incorporating Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45 had the highest predictive accuracy (training set: AUC: 0.925, 95%CI: 0.378-1.472; testing set: AUC: 0.912, 95%CI: 0.355-1.469).
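For readers unfamiliar with how an AUC and its interval can be obtained, the sketch below computes the AUC as the probability that a randomly chosen LNM-positive case receives a higher score than a randomly chosen LNM-negative case, and estimates a percentile bootstrap interval. It illustrates one common approach and is not a description of how the intervals reported above were derived. The function names and the toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney statistic); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC.

    Resamples cases with replacement; resamples containing only one class
    are skipped because the AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = LNM present, 0 = LNM absent (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores)
```

Because every bootstrap replicate is itself an AUC, an interval built this way always lies between 0 and 1.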

Research methods

We retrospectively selected 526 cases of UEGC confirmed by pathological examination after radical gastrectomy without endoscopic treatment in four tertiary hospitals between January 2015 and December 2021. GLCM-based features were extracted from grayscale images, and ML was applied to classify candidate predictive variables. To evaluate the robustness and clinical utility of each model, ROC curve analysis, decision curve analysis, and clinical impact curves were used.
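The quantity plotted in a decision curve analysis is the net benefit of acting on a model's predictions at a given threshold probability. The following is a minimal sketch of the standard net benefit formula, with a "treat all" reference strategy for comparison. The function names and the example data are hypothetical and are not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every case whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data: 1 = LNM present, 0 = LNM absent (illustrative only).
toy_labels = [1, 1, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2, 0.1]

# A decision curve evaluates net benefit over a range of thresholds.
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]
curve = [(t, net_benefit(toy_labels, toy_probs, t)) for t in thresholds]
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, which is the net benefit of treating no one.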

Research results

Identifying a potential biomarker that predicts LNM would be very useful in determining treatment.

Research conclusions

To develop an ML-based integrated procedure for constructing a GLCM-based prediction model for LNM.

Research perspectives

The risk of LNM is the most important consideration in determining treatment strategies for UEGC. Therefore, identifying a potential biomarker that predicts LNM would be very useful in determining treatment.