Xiong FX, Sun L, Zhang XJ, Chen JL, Zhou Y, Ji XM, Meng PP, Wu T, Wang XB, Hou YX. Machine learning-based models for advanced fibrosis in non-alcoholic steatohepatitis patients: A cohort study. World J Gastroenterol 2025; 31(9): 101383 [PMID: 40061588 DOI: 10.3748/wjg.v31.i9.101383]
Corresponding Author of This Article
Yi-Xin Hou, PhD, Center of Integrative Medicine, Beijing Ditan Hospital, Capital Medical University, No. 8 Jingshun East Street, Chaoyang District, Beijing 100015, China. xuexin162@163.com
Research Domain of This Article
Computer Science, Artificial Intelligence
Article-Type of This Article
Retrospective Cohort Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Gastroenterol. Mar 7, 2025; 31(9): 101383. Published online Mar 7, 2025. doi: 10.3748/wjg.v31.i9.101383
Machine learning-based models for advanced fibrosis in non-alcoholic steatohepatitis patients: A cohort study
Fei-Xiang Xiong, Lei Sun, Xue-Jie Zhang, Jia-Liang Chen, Yang Zhou, Xiao-Min Ji, Pei-Pei Meng, Tong Wu, Xian-Bo Wang, Yi-Xin Hou
Fei-Xiang Xiong, Lei Sun, Xue-Jie Zhang, Jia-Liang Chen, Yang Zhou, Xiao-Min Ji, Pei-Pei Meng, Tong Wu, Xian-Bo Wang, Yi-Xin Hou, Center of Integrative Medicine, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
Lei Sun, Department of Pathology, Beijing Ditan Hospital, Beijing 100015, China
Co-corresponding authors: Xian-Bo Wang and Yi-Xin Hou.
Author contributions: Xiong FX performed the methodology and writing; Sun L, Zhang XJ, Chen JL, Zhou Y, Ji XM, Meng PP and Wu T collected the data; Hou YX and Wang XB designed the research and revised the manuscript; Wang XB and Hou YX conceptualized and designed the research; Sun L, Ji XM, Meng PP and Wu T screened patients and acquired clinical data; Zhang XJ, Chen JL and Zhou Y collected blood specimens and performed laboratory analyses; Xiong FX performed data analysis and wrote the paper; all the authors have read and approved the final manuscript. Both Wang XB and Hou YX played important and indispensable roles in the experimental design, data interpretation and manuscript preparation as the co-corresponding authors.
Supported by the Natural Science Foundation of China, No. 81970512; the Beijing Hospitals Authority Youth Programme, No. QMl220201802; the Beijing Traditional Chinese Medicine Science and Technology Development Fund Project, No. Qn-2020-25; and the High-Level Public Health Technical Personnel Construction Project.
Institutional review board statement: The study was approved by the Ethics Committee of Beijing Ditan Hospital, Capital Medical University, No. DTEC-KY2024-009-02.
Informed consent statement: Written informed consent was obtained from each patient.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement—checklist of items, and the manuscript was prepared and revised according to the STROBE Statement—checklist of items.
Data sharing statement: Data are unavailable due to privacy or ethical restrictions.
Received: September 12, 2024 Revised: December 2, 2024 Accepted: January 8, 2025 Published online: March 7, 2025 Processing time: 158 Days and 22.6 Hours
Abstract
BACKGROUND
The global prevalence of non-alcoholic steatohepatitis (NASH) and its associated risk of adverse outcomes, particularly in patients with advanced liver fibrosis, underscore the importance of early and accurate diagnosis.
AIM
To develop a machine learning-based diagnostic model for advanced liver fibrosis in NASH patients.
METHODS
A total of 749 patients who underwent liver biopsy at Beijing Ditan Hospital, Capital Medical University, between January 2010 and January 2020 were included. Patients were randomly divided into training (n = 522) and validation (n = 224) cohorts. Five machine learning models were applied to predict advanced liver fibrosis, with feature selection based on Shapley Additive Explanations (SHAP). The diagnostic performance of these models was compared with that of traditional scores, namely the aspartate aminotransferase to platelet ratio index (APRI) and the fibrosis index based on four factors (FIB-4), using the area under the receiver operating characteristic curve (AUROC), decision curve analysis (DCA), and calibration curves.
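For readers unfamiliar with the two comparator scores, the sketch below computes APRI and FIB-4 from their commonly published formulas. The abstract does not restate these formulas or the laboratory reference limits used, so the function names, the upper limit of normal for AST (40 U/L), and the example patient are illustrative assumptions only.

```python
import math


def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) / platelet count (10^9/L) x 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100.0


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count (10^9/L) x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient (not taken from the study): age 50 years, AST 80 U/L,
# ALT 64 U/L, platelets 100 x 10^9/L, assumed AST upper limit of normal 40 U/L.
example_apri = apri(80.0, 40.0, 100.0)
example_fib4 = fib4(50, 80.0, 64.0, 100.0)
print(example_apri, example_fib4)
```

Both scores rise as AST increases and platelet count falls. Unlike the machine learning models, they use a fixed formula with no fitted parameters.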
RESULTS
The Extreme Gradient Boosting (XGBoost) model outperformed all other machine learning models, achieving an AUROC of 0.934 (95%CI: 0.914-0.955) in the training cohort and 0.917 (95%CI: 0.880-0.953) in the validation cohort (P < 0.001). Incorporating liver stiffness measurement into the model further improved its performance, with an AUROC of 0.977 (95%CI: 0.966-0.980) in the training cohort and 0.970 (95%CI: 0.950-0.990) in the validation cohort, significantly surpassing APRI and FIB-4 scores (P < 0.001). The XGBoost model also demonstrated superior clinical utility, as evidenced by DCA and calibration curve analysis in both cohorts.
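As a rough illustration of how an AUROC and its 95%CI can be obtained, the sketch below computes the AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, then estimates a percentile bootstrap interval. The abstract does not state how the reported intervals were derived, so the bootstrap is an assumption here, and the labels and scores are toy values rather than study data.

```python
import random


def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = advanced fibrosis, 0 = no advanced fibrosis (hypothetical).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
toy_auc = auroc(toy_labels, toy_scores)
toy_lower, toy_upper = bootstrap_ci(toy_labels, toy_scores)
print(toy_auc, toy_lower, toy_upper)
```

With only eight toy cases the interval is very wide. With cohorts of several hundred patients, as in this study, it narrows considerably.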
CONCLUSION
The XGBoost model provides a highly accurate, non-invasive diagnosis of advanced liver fibrosis in NASH patients, outperforming traditional methods. An online tool based on this model has been developed to assist clinicians in evaluating the risk of advanced liver fibrosis.
Core Tip: This study employed Shapley Additive Explanations (SHAP) to select key features for diagnosing advanced liver fibrosis in non-alcoholic steatohepatitis patients. Among five machine learning models, the Extreme Gradient Boosting model achieved the best performance and was further developed into an online diagnostic tool. SHAP was also used to provide local explanations of individual predictions, clarifying the model's applicability across clinical populations.
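To make the idea of a local explanation concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature subset. It is a brute-force illustration of the underlying definition, not the authors' pipeline: the scoring function, feature values, and all-zero baseline are hypothetical, and practical SHAP implementations for tree models use faster algorithms and a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_score(features):
    """Hypothetical risk score with one interaction term (not the study model)."""
    a, b, c = features
    return 0.5 * a + 0.2 * b + 0.1 * a * c


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is what lets each one be read as that feature's share of a single patient's predicted risk.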