Observational Study
Copyright © The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Jul 21, 2025; 31(27): 108200
Published online Jul 21, 2025. doi: 10.3748/wjg.v31.i27.108200
Machine learning-based identification of biochemical markers to predict hepatic steatosis in patients at high metabolic risk
Yuan Tian, Hang-Yi Zhou, Ming-Lin Liu, Yi Ruan, Zhao-Xian Yan, Xiao-Hua Hu, Juan Du
Yuan Tian, Hang-Yi Zhou, Ming-Lin Liu, Zhao-Xian Yan, Juan Du, Department of Chinese Medicine, Changhai Hospital, Naval Medical University, Shanghai 200433, China
Yuan Tian, Hang-Yi Zhou, Ming-Lin Liu, Zhao-Xian Yan, Juan Du, School of Traditional Chinese Medicine, Naval Medical University, Shanghai 200433, China
Yi Ruan, PLA Naval Medical Center, Shanghai 200433, China
Zhao-Xian Yan, School of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
Xiao-Hua Hu, Digital Innovation Laboratory, Changhai Hospital, Naval Medical University, Shanghai 200433, China
Co-first authors: Yuan Tian and Hang-Yi Zhou.
Co-corresponding authors: Xiao-Hua Hu and Juan Du.
Author contributions: Hu XH and Du J contributed equally to this study as co-corresponding authors; Hu XH and Du J conceived and planned this study; Tian Y and Zhou HY contributed equally to this study as co-first authors; Tian Y and Zhou HY performed the vast majority of the data acquisition and analysis for this experiment; Liu ML, Ruan Y, and Yan ZX performed the remaining data collection and analysis; Tian Y and Du J wrote the first draft of the manuscript; Hu XH and Du J were responsible for the execution and supervision of the entire project.
Institutional review board statement: The study was reviewed and approved by the Shanghai Changhai Hospital Medical Ethics Committee (Approval No. CHEC2025-129).
Informed consent statement: All study participants, or their legal guardian, provided informed written consent prior to study enrollment.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement—checklist of items, and the manuscript was prepared and revised according to the STROBE Statement—checklist of items.
Data sharing statement: The data are available from the corresponding author upon reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Juan Du, Department of Chinese Medicine, Changhai Hospital, Naval Medical University, No. 168 Changhai Road, Yangpu District, Shanghai 200433, China. dujuan714@163.com
Received: April 8, 2025
Revised: May 20, 2025
Accepted: July 1, 2025
Published online: July 21, 2025
Processing time: 105 Days and 1.8 Hours
Abstract
BACKGROUND

Metabolic-associated fatty liver disease (MAFLD) is the most common cause of chronic liver disease and remains under-recognized within the health check-up population. Ultrasonography performed during physical examinations fails to accurately identify at-risk patients because MAFLD risk involves multiple metabolic aspects.

AIM

To rapidly identify hepatic steatosis patients from high-metabolic-risk populations and reduce medical costs.

METHODS

We analyzed all data from a prospective cohort study to identify potential predictors of MAFLD risk. Least absolute shrinkage and selection operator (LASSO) regression and recursive feature elimination were used for feature selection. Four machine learning models were then used to construct prediction models for hepatic steatosis.
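The feature-screening step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the feature names, the `alpha` value, and the number of features kept by recursive feature elimination are all hypothetical, and a plain linear LASSO is fitted to the binary label as a simplification (the study may well have used a penalized logistic model).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 711 rows to echo the
# abstract's sample size, 30 hypothetical candidate predictors.
X, y = make_classification(
    n_samples=711,
    n_features=30,
    n_informative=8,
    n_redundant=4,
    n_clusters_per_class=1,
    weights=[0.29, 0.71],  # roughly 207 vs 504
    random_state=0,
)
X = StandardScaler().fit_transform(X)  # LASSO is scale-sensitive
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

# LASSO: the L1 penalty shrinks some coefficients exactly to zero;
# features with nonzero coefficients are kept. alpha is illustrative.
lasso = Lasso(alpha=0.02)
lasso.fit(X, y)
lasso_selected = set(feature_names[lasso.coef_ != 0])

# Recursive feature elimination: repeatedly fit a model and drop the
# weakest feature until the requested number remains (15 is illustrative).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe.fit(X, y)
rfe_selected = set(feature_names[rfe.support_])

# Keep only the features that both methods agree on.
overlap = sorted(lasso_selected & rfe_selected)
print(f"LASSO kept {len(lasso_selected)}, RFE kept {len(rfe_selected)}, "
      f"overlap {len(overlap)}: {overlap}")
```

In practice the penalty strength and the number of retained features would be tuned by cross-validation rather than fixed by hand as they are here.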

RESULTS

We found that 86.2% of the 1011 individuals in the trial phase exhibited metabolic abnormalities, with 70.8% presenting with hepatic steatosis. After data cleaning, 711 participants (207 non-MAFLD patients vs 504 MAFLD patients) were included, and the prediction models were validated. After taking the overlap of the selected feature sets and reducing it further based on feature importance ranking, we developed an interpretable final XGBoost model with 10 features, which achieved an area under the curve (AUC) of 0.82.
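For readers less familiar with the metric, the sketch below shows one way an area under the curve can be computed, assuming it refers to the receiver operating characteristic curve (the abstract does not say so explicitly). Under that assumption, the value equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hepatic steatosis, 0 = no steatosis (made-up values).
labels = [1, 1, 1, 0, 1, 0, 0, 1]
scores = [0.90, 0.80, 0.35, 0.40, 0.70, 0.20, 0.60, 0.55]
auc = roc_auc(labels, scores)
print(auc)  # 12 of 15 positive/negative pairs are ordered correctly -> 0.8
```

Read this way, an AUC of 0.82 would mean that in about 82% of such pairs the model ranks the participant with steatosis above the one without.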

CONCLUSION

We have introduced a valuable noninvasive tool for efficiently identifying hepatic steatosis patients in high-metabolic-risk populations. This tool may improve screening effectiveness and reduce medical costs.

Keywords: Metabolic-associated fatty liver disease; Machine learning; Prediction model; Hepatic steatosis; High metabolic risk population

Core Tip: We used a prospective cohort to develop and optimize a high-performance machine learning model, demonstrating its potential to screen for hepatic fat deposition in high-risk populations. We also integrated the facial and tongue diagnosis of traditional Chinese medicine (TCM) with the heterogeneity of metabolic-associated fatty liver disease (MAFLD) and introduced TCM-related indicators to increase the diversity of the metrics. Our model targets a more specific population and is applicable to a broader range of scenarios, which lays the foundation for significantly improving MAFLD check-up efficiency and reducing related medical expenses.