Retrospective Cohort Study
Copyright ©The Author(s) 2026. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Jan 21, 2026; 32(3): 115527
Published online Jan 21, 2026. doi: 10.3748/wjg.v32.i3.115527
Application of machine learning models in predicting the risk of thromboembolic events in patients with nonvariceal gastrointestinal bleeding
Chao Lu, Hao-Yang Cheng, Ren-Ke Zhu, Yi-De Zhou, Ke-Fang Sun, Lei Xu, Jian-Zhong Sang, Jiao-E Chen, Chao-Hui Yu, Yu-Lu Qin, Lan Li
Chao Lu, Yi-De Zhou, Chao-Hui Yu, Lan Li, Department of Gastroenterology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, Zhejiang Province, China
Hao-Yang Cheng, Yu-Lu Qin, Laboratory of Ultrafast Intelligent Optoelectronic Information, College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, Zhejiang Province, China
Ren-Ke Zhu, Department of Gastroenterology, Zhejiang University School of Medicine, Hangzhou 310003, Zhejiang Province, China
Ke-Fang Sun, Department of Internal Medicine Residency Program, Rochester General Hospital, New York, NY 10041NY212, United States
Lei Xu, Department of Gastroenterology, Ningbo First Hospital, Ningbo 315010, Zhejiang Province, China
Jian-Zhong Sang, Department of Gastroenterology, Renmin Hospital of Yuyao City, Yuyao 315499, Zhejiang Province, China
Jiao-E Chen, Department of Gastroenterology, Sanmen People's Hospital of Zhejiang Province, Sanmen 317100, Zhejiang Province, China
Co-first authors: Chao Lu and Hao-Yang Cheng.
Co-corresponding authors: Yu-Lu Qin and Lan Li.
Author contributions: Lu C and Cheng HY wrote the manuscript as co-first authors; Lu C, Cheng HY and Zhu RK participated in the conception and design of the study and were involved in the acquisition, analysis, or interpretation of data; Sun KF and Yu CH accessed and verified the study data; Zhou YD, Xu L, Sang JZ, and Chen JE collected data; Qin YL and Li L revised the manuscript as co-corresponding authors; all authors critically reviewed and approved the final manuscript to be published.
Institutional review board statement: The study protocol was approved by the Clinical Research Ethics Committee of the First Affiliated Hospital, Zhejiang University School of Medicine (No. 2024-1142).
Informed consent statement: The requirement for informed consent was waived.
Conflict-of-interest statement: All authors declare no conflict of interest in publishing the manuscript.
STROBE statement: The authors have read the STROBE Statement – checklist of items, and the manuscript was prepared and revised according to the STROBE Statement – checklist of items.
Data sharing statement: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Lan Li, Chief Physician, Department of Gastroenterology, The First Affiliated Hospital, Zhejiang University, No. 79 Qingchun Road, Hangzhou 310003, Zhejiang Province, China. nalil@zju.edu.cn
Received: October 21, 2025
Revised: November 10, 2025
Accepted: December 16, 2025
Published online: January 21, 2026
Processing time: 89 Days and 21.2 Hours
Abstract
BACKGROUND

Clinically, patients with nonvariceal gastrointestinal bleeding (NVGB) are prone to thromboembolic events, but the specific risk remains unclear.

AIM

To identify risk factors and evaluate the performance of five machine learning (ML) models in predicting the risk of thromboembolic events in patients with NVGB.

METHODS

This retrospective cohort study enrolled 866 patients from one tertiary hospital for model training and internal validation, and 282 patients from three other tertiary hospitals for external validation. These data were used to develop five ML models to predict the risk of thromboembolic events in patients with NVGB. After initial feature selection through ML model training, ten variables were selected to construct simplified ML models. Model performance was evaluated using accuracy, precision, sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve. Calibration curves and decision curve analysis were used to further evaluate the predicted probabilities and net benefits of the models.
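As a minimal illustration of the evaluation metrics named above, the following Python sketch computes accuracy, precision, sensitivity, specificity, F1-score, a rank-based area under the receiver operating characteristic curve, and the net benefit used in decision curve analysis. The function names, the 0.5 classification threshold, and the toy labels and probabilities are assumptions made for illustration. They are not the study's code or data.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, TN, FN), calling a case positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-dependent metrics derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true, y_prob):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt), for 0 < pt < 1."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = thromboembolic event, 0 = no event.
Y_TRUE = [1, 0, 1, 0, 0, 1, 0, 0]
Y_PROB = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]

print(classification_metrics(Y_TRUE, Y_PROB))
print("AUC:", auc(Y_TRUE, Y_PROB))
print("Net benefit at pt=0.2:", net_benefit(Y_TRUE, Y_PROB, 0.2))
```

Plotting `net_benefit` over a range of threshold probabilities, alongside the "treat all" and "treat none" strategies, gives a decision curve.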

RESULTS

During hospitalization, the incidence of thromboembolic events in patients with NVGB was 25.61%. The categorical boosting (CatBoost) algorithm, which combined variable importance and SHapley Additive exPlanations values, identified 10 independent predictors of thromboembolic events: (1) History of anticoagulant drug use; (2) D-dimer level; (3) Age; (4) History of thromboembolism; (5) Length of hospital stay; (6) Intensive care unit (ICU) admission; (7) Hemoglobin level; (8) Use of hemostatic drugs; (9) Heart rate; and (10) Serum albumin level. We developed five simplified ML prediction models (L1 regularized logistic regression, random forest, support vector machines, extreme gradient boosting, and CatBoost) based on the above 10 predictors, which achieved areas under the receiver operating characteristic curve of 0.805, 0.804, 0.806, 0.746, and 0.815 in external validation, respectively. The performance of all five ML models significantly exceeded that of D-dimer alone in both internal and external validation. The CatBoost model demonstrated good calibration and accuracy, achieving the lowest Brier scores of 0.131 and 0.110 in the internal and external validation sets, respectively. Of the five models, the CatBoost model was considered the preferred choice in clinical settings.
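The Brier score cited above is the mean squared difference between the predicted probability and the observed binary outcome. Lower values indicate better-calibrated and more accurate probabilities, with 0 being perfect. The sketch below shows the calculation, together with the score of a non-informative forecast that always predicts the event prevalence. The function names and toy values are hypothetical and do not reproduce the study's data.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def prevalence_baseline_brier(y_true):
    """Brier score of always predicting the observed prevalence: pi * (1 - pi)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence * (1 - prevalence)


# Toy data (hypothetical): 1 = thromboembolic event, 0 = no event.
outcomes = [1, 0, 0, 1]
predicted = [0.8, 0.1, 0.3, 0.6]

print("Model Brier score:", brier_score(outcomes, predicted))
print("Prevalence-only baseline:", prevalence_baseline_brier(outcomes))
```

A model is only informative when its Brier score falls below the prevalence-only baseline computed on the same data set.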

CONCLUSION

The findings of this study can enable effective and timely preventive interventions for high-risk patients and help avoid unnecessary monitoring in low-risk patients.

Keywords: Nonvariceal gastrointestinal bleeding; Thromboembolic event; Machine learning; Categorical boosting; D-dimer

Core Tip: This multicenter study developed and validated five machine learning models to predict thromboembolic risk in patients with nonvariceal gastrointestinal bleeding. Using ten key clinical variables identified by categorical boosting and SHapley Additive exPlanations analysis, all models showed superior predictive performance to D-dimer alone, with the categorical boosting model achieving the best calibration and accuracy. These models can help clinicians identify high-risk patients for early intervention while reducing unnecessary monitoring in low-risk individuals.