Letter to the Editor Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 7, 2025; 31(45): 114413
Published online Dec 7, 2025. doi: 10.3748/wjg.v31.i45.114413
Machine learning to predict metabolic-associated fatty liver disease
Ottavia Cicerone, Department of Clinical-Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia 27100, Italy
Marcello Maestri, General Surgery Unit I - Liver Service, Fondazione IRCCS Policlinico San Matteo, Pavia 27100, Italy
ORCID number: Ottavia Cicerone (0009-0004-9712-2553); Marcello Maestri (0000-0002-5693-9151).
Author contributions: Maestri M contributed to the project administration; Cicerone O and Maestri M contributed to the concept and design of the study, the writing of the original draft and the review and editing of the manuscript.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Marcello Maestri, MD, PhD, Professor, General Surgery Unit I - Liver Service, Fondazione IRCCS Policlinico San Matteo, P.le Golgi 19, Pavia 27100, Italy. m.maestri@smatteo.pv.it
Received: September 18, 2025
Revised: September 29, 2025
Accepted: October 28, 2025
Published online: December 7, 2025
Processing time: 76 Days and 13.2 Hours

Abstract

Metabolic-associated fatty liver disease (MAFLD) represents the most common cause of chronic liver disease worldwide and remains frequently underdiagnosed in its early stages. Tian et al recently reported a prospective observational study that developed a machine learning-based model to predict hepatic steatosis in high-risk individuals. The resulting XGBoost model demonstrated excellent predictive performance (area under the curve 0.82; cross-validation mean area under the curve 0.918). Importantly, the study highlighted clinically meaningful predictors such as the aspartate aminotransferase/alanine aminotransferase ratio, triglycerides, and waist circumference, alongside novel traditional Chinese medicine-derived features like greasy tongue coating and tongue edge redness. Nonetheless, challenges remain, including the need for standardized traditional Chinese medicine assessment, external multicenter validation, and refined modeling to account for MAFLD heterogeneity. Future studies should expand biomarker panels, incorporate advanced imaging, and evaluate clinical outcomes of model-driven interventions. Overall, Tian et al provide a valuable contribution by demonstrating that machine learning can improve early detection and personalized management of MAFLD.

Key Words: Metabolic-associated fatty liver disease; Hepatic steatosis; Machine learning; Predictive model; Chronic liver disease

Core Tip: Machine learning can enhance early detection of metabolic-associated fatty liver disease by integrating biochemical, clinical, and traditional Chinese medicine features into predictive models. Tian et al provide a promising framework, though external validation and refinement for disease heterogeneity are needed before widespread clinical adoption.



TO THE EDITOR

Tian et al’s study[1] presents an innovative application of machine learning (ML) for the prediction of hepatic steatosis in individuals at high metabolic risk - defined as patients with one or more metabolic risk factors such as obesity, type 2 diabetes, dyslipidemia, or hypertension. Metabolic-associated fatty liver disease (MAFLD) has emerged as the most common chronic liver disorder worldwide, yet its silent progression in the early stages makes detection challenging. According to the international consensus definition, MAFLD is diagnosed when hepatic steatosis is present together with one of the following: Overweight/obesity, type 2 diabetes mellitus, or evidence of metabolic dysregulation[2]. In this context, the study by Tian et al[1] is particularly noteworthy, as it leverages ML to transform readily available biochemical and clinical information into a practical tool for risk prediction. While ultrasonography remains the most widely used screening modality, its limited sensitivity underscores the need for alternative approaches that are both accurate and scalable.

Model development and validation

A methodological strength of the study is its comprehensive use of multiple ML algorithms - XGBoost, random forest, support vector machine, and logistic regression - representing diverse modeling approaches. This allowed for robust comparative analysis, with each algorithm selected for specific strengths: XGBoost for nonlinear interactions, random forest for overfitting resistance, support vector machine for high-dimensional data, and logistic regression for baseline interpretability[3,4]. By benchmarking these models against one another, the authors ensure that the final choice - XGBoost - is the result of systematic validation. In this context, the superior performance of XGBoost likely reflects its ability to capture complex non-linear interactions between metabolic biomarkers and traditional Chinese medicine (TCM)-derived features, providing an advantage over other algorithms. The reported area under the curve values (0.82 in the test set; 0.918 in cross-validation) and balanced F1-score demonstrate that the model is not only accurate but also reliable across internal validation settings (Table 1).
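To make the two reported figures concrete, the following is a minimal sketch of how an area under the curve can be computed from predicted scores, and how a cross-validation mean is obtained by averaging per-fold values. The function names and the toy labels and scores are hypothetical and are not taken from Tian et al; the sketch only illustrates why a single held-out test set and a cross-validation average can yield different numbers for the same model.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_cv_auc(fold_results):
    """Average the per-fold AUCs; each fold is a (labels, scores) pair."""
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: 1 = steatosis present, 0 = absent.
test_labels = [1, 1, 1, 0, 0, 1, 0, 1]
test_scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]
print(round(roc_auc(test_labels, test_scores), 3))  # prints 0.8
```

The pairwise loop is quadratic in the number of cases and is chosen here for readability; rank-based implementations in standard libraries give the same value more efficiently.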

Table 1 Overview of machine learning algorithms used by Tian et al[1].
| Algorithm | Main strengths | Limitations/considerations | Role in Tian et al’s study[1] |
| --- | --- | --- | --- |
| LR | Simple, interpretable, baseline comparator | Limited handling of non-linear relationships | Served as reference model |
| RF | Robust to overfitting, good for tabular data | Less interpretable, may require tuning | Moderate performance |
| SVM | Effective with high-dimensional data | Sensitive to parameter choice, less scalable | Tested but lower accuracy |
| XGBoost | Handles non-linear interactions, high accuracy, efficient | “Black box” risk, requires interpretability tools | Best-performing model (AUC = 0.82; CV AUC = 0.918) |

Feature selection and predictors

Equally important is the study’s rigorous feature selection process. The combined use of recursive feature elimination and least absolute shrinkage and selection operator regression distilled a complex dataset of 156 candidate variables into 10 robust predictors[5]. Notably, these included markers that hepatologists already recognize as clinically meaningful (e.g., aspartate aminotransferase/alanine aminotransferase ratio, triglycerides, and waist circumference), together with additional predictors of metabolic dysfunction such as low- and high-density lipoproteins, the albumin/globulin ratio, and the creatinine-to-body-weight ratio. Each of these variables has established clinical associations with MAFLD pathophysiology: Dyslipidemia captured by low- and high-density lipoproteins reflects systemic metabolic imbalance; the albumin/globulin ratio provides indirect information on liver synthetic function and systemic inflammation; and the creatinine-to-body-weight ratio is emerging as a surrogate for muscle mass and renal metabolic load, both relevant to metabolic risk. Their inclusion reinforces the model’s clinical relevance and underscores its ability to integrate diverse pathophysiological dimensions.
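The way a LASSO penalty discards uninformative variables can be illustrated with a small sketch. The code below assumes a squared-error loss, centered predictors, and no intercept, purely for brevity; the exact LASSO configuration used by Tian et al is not described here, and the recursive feature elimination step is not shown. All names and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    X is a list of rows; columns are assumed centered, and no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0    # covariance of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for i in range(n):
                # Current fit with feature j left out.
                fit_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                scale += X[i][j] ** 2
            if scale == 0.0:
                coef[j] = 0.0  # a constant (all-zero) column carries no information
                continue
            coef[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return coef


# Hypothetical toy data: the outcome depends on the first column only.
toy_X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
toy_y = [-6.0, -3.0, 0.0, 3.0, 6.0]
print(lasso_coordinate_descent(toy_X, toy_y, penalty=0.5))  # prints [2.75, 0.0]
```

In this toy case the informative coefficient is shrunk from 3 to 2.75 and the uninformative one is set exactly to zero, which is the behavior that lets a LASSO step reduce a large candidate set to a short list of predictors.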

Methodological aspects and enhanced interpretability

In addition, several methodological aspects deserve emphasis. Class imbalance (70.8% MAFLD vs 29.2% non-MAFLD) was addressed through the synthetic minority over-sampling technique, which partially corrected the underrepresentation of non-MAFLD cases[6]. Hepatic steatosis was diagnosed using FibroScan® with a controlled attenuation parameter threshold of ≥ 238 dB/m, a noninvasive approach that provides higher accuracy and reproducibility compared with conventional ultrasonography. This strengthens the validity of the reference standard used for model development[7].
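The core idea of the synthetic minority over-sampling technique can be sketched in a few lines: each synthetic case is placed at a random point on the segment joining a minority-class case and one of its nearest minority-class neighbours. The sketch below is an illustration under stated assumptions, not the implementation used in the study; the function name, the choice of k, and the two-feature toy rows are hypothetical. In the cohort discussed here the minority class would be the non-MAFLD group.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (Euclidean distance).
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority rows: [AST/ALT ratio, waist circumference in cm].
minority_rows = [[1.0, 90.0], [1.2, 95.0], [0.9, 88.0], [1.1, 92.0]]
for row in smote_sketch(minority_rows, n_new=3):
    print([round(value, 2) for value in row])
```

Because the synthetic rows are interpolations of existing cases, they add no new information about the minority class, and over-sampling is usually restricted to the training data so that performance is still assessed on real cases only.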

Another innovative aspect was incorporating TCM indicators alongside conventional clinical metrics. Two TCM-derived features, greasy tongue coating and tongue edge redness, emerged prominently among the top predictors, bridging traditional holistic diagnostics and contemporary data analytics. This methodological choice pushes beyond conventional biomedical boundaries, suggesting that subtle, visually accessible traits may carry diagnostic value when objectively quantified. In the study, these features were acquired using the Intelligent Constitution Identifier system and subsequently reviewed by expert physicians, representing an important attempt to standardize and objectify TCM diagnostics. However, replicating such measures outside the original clinical setting remains challenging, given the lack of internationally standardized protocols and the potential for cultural or technological variability in their assessment. These limitations underscore the need for further harmonization and validation before TCM-derived indicators can be widely integrated into clinical practice.

Importantly, the authors also addressed the “black box” concern often associated with complex ML models[8]. By employing SHapley Additive exPlanation analysis, they quantified the relative contribution of each predictor, thereby enhancing transparency and bridging the gap between algorithmic complexity and clinical interpretability (Table 2). This effort is particularly valuable, as interpretability is essential for fostering clinician trust and encouraging real-world adoption. Future studies could expand on this interpretability framework, but the current work already provides a meaningful step forward in making ML models clinically accessible.
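The quantity that SHapley Additive exPlanation analysis attributes to each predictor can be illustrated by computing exact Shapley values for a very small model. The sketch below enumerates every coalition of features, which is feasible only for a handful of variables; software implementing this analysis relies on faster model-specific or sampling-based algorithms, and the explainer used by Tian et al is not specified here. The toy risk function, its arbitrary coefficients, and the example values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for one prediction.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n_features, so this is for small illustrations only.
    """
    n = len(instance)

    def coalition_value(members):
        mixed = [instance[i] if i in members else baseline[i] for i in range(n)]
        return predict(mixed)

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(set(subset) | {i})
                without_i = coalition_value(set(subset))
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values


def toy_risk(features):
    """Hypothetical score with arbitrary coefficients and one interaction term."""
    ast_alt_ratio, triglycerides, waist_cm = features
    return (-0.4 * ast_alt_ratio + 0.3 * triglycerides + 0.01 * waist_cm
            + 0.002 * triglycerides * waist_cm)


patient = [0.8, 2.4, 102.0]     # hypothetical individual
reference = [1.1, 1.3, 88.0]    # hypothetical cohort-average baseline
contributions = shapley_values(toy_risk, patient, reference)
print([round(c, 4) for c in contributions])
```

The attributions sum to the difference between the individual's prediction and the baseline prediction, which is the property that allows each predictor's share of a given risk estimate to be reported to the clinician.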

Table 2 Key methodological features of the study by Tian et al[1].
| Methodological aspect | Approach used | Rationale/significance |
| --- | --- | --- |
| Reference standard for steatosis | FibroScan® (CAP ≥ 238 dB/m) | More accurate and reproducible than conventional ultrasonography; strengthens diagnostic validity |
| Class imbalance | SMOTE | Mitigates underrepresentation of non-MAFLD cases; reduces bias in training |
| Feature selection | Recursive feature elimination + LASSO regression | Reduced 156 variables to 10 key predictors with strong pathophysiological relevance |
| TCM feature acquisition | ICI + expert review | Attempt to standardize and objectify TCM-derived indicators |
| Model interpretability | SHAP analysis | Quantified relative contribution of each predictor; improved clinical interpretability |

Limitations and future directions

Nevertheless, further challenges remain. First, external validation in multicenter cohorts is essential, given the study’s single-center design and potential population-specific biases. Moreover, the absence of subgroup analyses (e.g., diabetic vs non-diabetic patients, different grades of steatosis) limits insights into disease heterogeneity. Comorbidities such as obesity, type 2 diabetes, and metabolic dysregulation not only define high-risk individuals but may also strongly influence model performance, potentially limiting its applicability in populations with different risk profiles. Furthermore, 70.8% of the study cohort was diagnosed with MAFLD, a relatively high prevalence that may have contributed to the model’s strong performance while potentially limiting its applicability to populations with lower disease burden. To mitigate this imbalance, the authors applied the synthetic minority over-sampling technique, which partially addressed the underrepresentation of non-MAFLD cases, although external validation in more balanced populations remains necessary. In addition, the cohort was predominantly male (over 70%), which raises questions about the generalizability of the findings to more diverse or balanced populations. Future validation efforts should therefore aim to include cohorts with broader demographic representation and varying disease prevalence. It should also be acknowledged that the controlled attenuation parameter threshold of ≥ 238 dB/m, used as the reference standard for steatosis, provides only a dichotomous definition of disease presence and does not capture the degree of severity, thereby limiting insights into disease progression.

In addition, while cross-validation provided important internal checks, bootstrap resampling could have further strengthened the reliability of the model by quantifying the variance and optimism of performance metrics. Bootstrapping draws repeated samples with replacement from the original dataset, generating empirical distributions of metrics such as the area under the curve or F1-score. This approach would have provided a more detailed picture of the model’s stability in the absence of external validation. Moving forward, combining both approaches - bootstrap-based internal validation and multicenter external validation - would ensure maximum robustness and reproducibility. Such efforts could ultimately facilitate integration of ML-based prediction tools into digital applications and clinical workflows[3,4,9]. Future research should therefore prioritize multicenter validation, stratified analyses, and, ideally, prospective interventional trials where predictions are linked to tailored management strategies.
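The resampling procedure described above can be sketched as follows. This is a minimal illustration, assuming a fixed set of model predictions and a simple percentile interval; the function names, default settings, and toy labels are hypothetical and do not reproduce any analysis from the study.

```python
import random


def f1_score(labels, predictions):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when that denominator is zero."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_interval(labels, predictions, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        # Draw n case indices with replacement and score the resampled cohort.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([labels[i] for i in idx], [predictions[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = steatosis present, 0 = absent.
true_labels = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1]
print(round(f1_score(true_labels, predicted), 3))
print(bootstrap_interval(true_labels, predicted, f1_score))
```

This sketch captures only the sampling variability of a metric for a model that has already been fitted. Estimating optimism additionally requires refitting the model within each bootstrap sample and comparing its performance on that sample with its performance on the original data.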

From a clinical standpoint, the value of such a model lies in its ability to stratify risk in routine care. By identifying individuals who would most benefit from advanced imaging or closer follow-up, the tool has the potential to optimize resources - an especially relevant feature in health systems with constrained capacity. Integration into electronic health records, or even mobile health applications, could further accelerate its real-world impact. However, such clinical integration should be preceded by rigorous external validation and, ideally, prospective interventional studies assessing the impact of model-guided strategies on patient outcomes and decision-making.

Conclusion

In summary, Tian et al[1] provide a thoughtful and innovative contribution to the evolving field of predictive hepatology. Their work illustrates not only the power of ML in refining diagnostic strategies but also the importance of methodological rigor and openness to unconventional data sources. While external validation and refinement are still needed, the study lays a strong foundation for future tools aimed at improving early MAFLD detection and patient-centered care.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Corresponding Author’s Membership in Professional Societies: International Hepato-Pancreato-Biliary Association; European-African Hepato-Pancreato-Biliary Association; European Society of Surgical Oncology; International Society of Liver Surgeons.

Specialty type: Gastroenterology and hepatology

Country of origin: Italy

Peer-review report’s classification

Scientific Quality: Grade A, Grade A, Grade B, Grade B, Grade C

Novelty: Grade B, Grade B, Grade B, Grade C, Grade C

Creativity or Innovation: Grade A, Grade B, Grade C, Grade C, Grade C

Scientific Significance: Grade B, Grade B, Grade B, Grade C, Grade C

P-Reviewer: Gutiérrez-Cuevas J, PhD, Full Professor, Mexico; Liao WZ, PhD, Assistant Professor, China; Lis-Gutiérrez JP, PhD, Colombia

S-Editor: Wang JJ

L-Editor: A

P-Editor: Wang WB

References
1.  Tian Y, Zhou HY, Liu ML, Ruan Y, Yan ZX, Hu XH, Du J. Machine learning-based identification of biochemical markers to predict hepatic steatosis in patients at high metabolic risk. World J Gastroenterol. 2025;31:108200.
2.  Eslam M, Newsome PN, Sarin SK, Anstee QM, Targher G, Romero-Gomez M, Zelber-Sagi S, Wai-Sun Wong V, Dufour JF, Schattenberg JM, Kawaguchi T, Arrese M, Valenti L, Shiha G, Tiribelli C, Yki-Järvinen H, Fan JG, Grønbæk H, Yilmaz Y, Cortez-Pinto H, Oliveira CP, Bedossa P, Adams LA, Zheng MH, Fouad Y, Chan WK, Mendez-Sanchez N, Ahn SH, Castera L, Bugianesi E, Ratziu V, George J. A new definition for metabolic dysfunction-associated fatty liver disease: An international expert consensus statement. J Hepatol. 2020;73:202-209.
3.  Jiang T, Gradus JL, Rosellini AJ. Supervised Machine Learning: A Brief Primer. Behav Ther. 2020;51:675-687.
4.  Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng. 2024;52:1159-1183.
5.  Xi LJ, Guo ZY, Yang XK, Ping ZG. [Application of LASSO and its extended method in variable selection of regression analysis]. Zhonghua Yu Fang Yi Xue Za Zhi. 2023;57:107-111.
6.  Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321-357.
7.  Mikolasevic I, Orlic L, Franjic N, Hauser G, Stimac D, Milic S. Transient elastography (FibroScan®) with controlled attenuation parameter in the assessment of liver steatosis and fibrosis in patients with nonalcoholic fatty liver disease - Where do we stand? World J Gastroenterol. 2016;22:7236-7251.
8.  Handelman GS, Kok HK, Chandra RV, Razavi AH, Huang S, Brooks M, Lee MJ, Asadi H. Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods. AJR Am J Roentgenol. 2019;212:38-43.
9.  Klement W, El Emam K. Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies: Development and Validation. J Med Internet Res. 2023;25:e48763.