Retrospective Study
©Author(s) (or their employer(s)) 2026. No commercial re-use. See Permissions. Published by Baishideng Publishing Group Inc.
World J Gastrointest Surg. Feb 27, 2026; 18(2): 114951
Published online Feb 27, 2026. doi: 10.4240/wjgs.v18.i2.114951
Development of a machine learning-based model for predicting postoperative survival in gastric cancer
Ya-Na Lü, Dong Liu, Shuai Tao, Ju Wu, Shu-Juan Yu, Hui-Ling Yuan
Ya-Na Lü, Dong Liu, Shuai Tao, Shu-Juan Yu, Hui-Ling Yuan, School of Information Engineering, Dalian University, Dalian 116622, Liaoning Province, China
Ju Wu, Department of General Surgery, Zhongshan Hospital Affiliated to Dalian University, Dalian 116001, Liaoning Province, China
Author contributions: Lü YN contributed to conceptualization, methodology, and supervision; Liu D and Tao S contributed to formal analysis, software, validation, and visualization; Wu J, Yu SJ and Yuan HL contributed to data curation, investigation, and resources. All authors contributed to writing - original draft, review and editing, and approved the final manuscript.
Institutional review board statement: This study was approved by the Institutional Review Board of the Zhongshan Hospital Affiliated to Dalian University (Approval No. KY2023-002-2).
Informed consent statement: All study participants and their legal guardians provided written informed consent before recruitment.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Corresponding author: Ju Wu, Department of General Surgery, Zhongshan Hospital Affiliated to Dalian University, No. 6 Jiefang Street, Zhongshan District, Dalian 116001, Liaoning Province, China. wuju@s.dlu.edu.cn
Received: October 11, 2025
Revised: December 2, 2025
Accepted: January 8, 2026
Published online: February 27, 2026
Processing time: 147 Days and 2.4 Hours
Abstract
BACKGROUND

Accurate prediction of postoperative survival is crucial for the personalized management of gastric cancer. However, the development of robust predictive models is often constrained by incomplete clinical data, while their clinical utility is limited by poor interpretability and the absence of practical applications.

AIM

To develop an interpretable machine learning model for predicting 3-year survival after gastric cancer surgery, to propose a novel data imputation method for handling missing values, and to build a user-friendly online tool that facilitates clinical decision-making.

METHODS

A retrospective analysis was conducted on a cohort of 304 patients with gastric adenocarcinoma. A hybrid imputation method (HDI-MF-Gower) was developed and compared against conventional techniques. Key prognostic factors were identified by combining least absolute shrinkage and selection operator (LASSO) regression with the Boruta algorithm. Ten machine learning models were then trained and validated.
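
The abstract does not specify how HDI-MF-Gower works. As a rough illustration only, and assuming that "Gower" in the method name refers to the Gower distance for mixed numeric and categorical data, the sketch below computes that distance while skipping missing entries and fills each gap from the nearest record. The feature names, values, and the `impute_nearest` helper are hypothetical and are not the authors' method or data.

```python
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing entry


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[Optional[float]]) -> float:
    """Gower distance between two mixed-type records.

    ranges[i] is the observed range of numeric feature i, or None if
    feature i is categorical. Features missing in either record are skipped.
    """
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue
        if r is None:            # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        else:                    # numeric: range-normalized absolute difference
            total += abs(x - y) / r if r > 0 else 0.0
        used += 1
    return total / used if used else float("inf")


def impute_nearest(records: Sequence[Sequence[Value]],
                   ranges: Sequence[Optional[float]]) -> List[List[Value]]:
    """Fill each missing value from the Gower-nearest record that observed it."""
    completed = [list(r) for r in records]
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is not None:
                continue
            donors = [d for k, d in enumerate(records)
                      if k != i and d[j] is not None]
            if donors:
                best = min(donors, key=lambda d: gower_distance(rec, d, ranges))
                completed[i][j] = best[j]
    return completed


# Hypothetical records: (age in years, tumor size in cm, differentiation)
records = [
    (62.0, 3.5, "poor"),
    (58.0, None, "poor"),
    (45.0, 1.2, "well"),
    (70.0, 5.0, None),
]
ranges = (25.0, 3.8, None)  # age range, size range, categorical marker

completed = impute_nearest(records, ranges)
```

Here the second record borrows its tumor size from the first, which is its closest neighbor on the features both records have. A hybrid method such as the one named in the abstract presumably combines a distance like this with other components; that part is not described here and is not reproduced.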

RESULTS

The proposed HDI-MF-Gower method achieved higher imputation accuracy than the conventional techniques. Seven features were selected for the final model. The extra trees classifier achieved the best performance on the independent validation set, with an area under the curve (AUC) of 0.853 and an accuracy of 0.772. The optimal model was interpreted using SHapley Additive exPlanations (SHAP) analysis and deployed as an online prediction tool.
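
For readers less familiar with the two reported metrics, the following minimal sketch shows how an area under the curve and an accuracy can be computed from predicted probabilities. The labels and scores are made-up illustration values, not the study's validation data, and the 0.5 decision threshold is an assumption; the abstract does not state which threshold was used.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical validation cases: 1 = survived 3 years, 0 = did not
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.7, 0.1]

auc = roc_auc(y_true, y_score)        # 15 of 16 positive-negative pairs ranked correctly
acc = accuracy(y_true, y_score)       # 6 of 8 cases classified correctly
```

The AUC measures ranking quality across all thresholds, whereas accuracy depends on the single threshold chosen, which is why the two values can differ noticeably for the same model.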

CONCLUSION

A robust and interpretable predictive model integrating advanced data imputation was successfully developed. The deployed tool facilitates individualized prognostic assessment and shows potential for enhancing personalized treatment planning in gastric cancer.

Keywords: Gastric cancer; Machine learning; Survival prediction; Missing data imputation; Extra trees

Core Tip: This study developed a novel hybrid imputation method (HDI-MF-Gower) to handle missing clinical data. We then built and validated a robust machine learning model (extra trees classifier) for predicting postoperative 3-year survival in gastric cancer patients. The model demonstrated high performance (area under the curve of 0.853), and its clinical application is facilitated by interpretable SHapley Additive exPlanations analysis and a user-friendly online prediction tool, aiding personalized treatment planning.