Published online Dec 7, 2025. doi: 10.3748/wjg.v31.i45.114413
Revised: September 29, 2025
Accepted: October 28, 2025
Processing time: 76 Days and 13.2 Hours
Metabolic-associated fatty liver disease (MAFLD) represents the most common cause of chronic liver disease worldwide and remains frequently underdiagnosed in its early stages. Tian et al recently reported a prospective observational study that developed a machine learning-based model to predict hepatic steatosis in high-risk individuals. The resulting XGBoost model demonstrated excellent pre
Core Tip: Machine learning can enhance early detection of metabolic-associated fatty liver disease by integrating bioche
- Citation: Cicerone O, Maestri M. Machine learning to predict metabolic-associated fatty liver disease. World J Gastroenterol 2025; 31(45): 114413
- URL: https://www.wjgnet.com/1007-9327/full/v31/i45/114413.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i45.114413
Tian et al’s study[1] presents an innovative application of machine learning (ML) for predicting hepatic steatosis in individuals at high metabolic risk, defined as patients with one or more metabolic risk factors such as obesity, type 2 diabetes, dyslipidemia, or hypertension. Metabolic-associated fatty liver disease (MAFLD) has emerged as the most common chronic liver disorder worldwide, yet its silent progression in the early stages makes detection challenging. According to the international consensus definition, MAFLD is diagnosed when hepatic steatosis is present together with at least one of the following: Overweight/obesity, type 2 diabetes mellitus, or evidence of metabolic dysregulation[2]. Ultrasonography remains the most widely used screening modality, but its limited sensitivity underscores the need for alternative approaches that are both accurate and scalable. In this context, the study by Tian et al[1] is particularly noteworthy, as it leverages ML to transform readily available biochemical and clinical information into a practical tool for risk prediction.
A methodological strength of the study is its comparison of four ML algorithms (XGBoost, random forest, support vector machine, and logistic regression) that represent diverse modeling approaches. Each algorithm was selected for specific strengths: XGBoost for nonlinear interactions, random forest for overfitting resistance, support vector machine for high-dimensional data, and logistic regression for baseline interpretability[3,4]. By benchmarking these models against one another, the authors ensure that the final choice, XGBoost, is the result of systematic validation. Its superior performance likely reflects its ability to capture complex nonlinear interactions between metabolic biomarkers and traditional Chinese medicine (TCM)-derived features, providing an advantage over the other algorithms. The reported area under the curve (AUC) values (0.82 in the test set; 0.918 in cross-validation) and balanced F1-score indicate that the model is not only accurate but also reliable across internal validation settings (Table 1).
| Algorithm | Main strengths | Limitations/considerations | Role in Tian et al’s study[1] |
| --- | --- | --- | --- |
| Logistic regression (LR) | Simple, interpretable, baseline comparator | Limited handling of non-linear relationships | Served as reference model |
| Random forest (RF) | Robust to overfitting, good for tabular data | Less interpretable, may require tuning | Moderate performance |
| Support vector machine (SVM) | Effective with high-dimensional data | Sensitive to parameter choice, less scalable | Tested but lower accuracy |
| XGBoost | Handles non-linear interactions, high accuracy, efficient | “Black box” risk, requires interpretability tools | Best-performing model (test-set AUC = 0.82; cross-validation AUC = 0.918) |
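To make the reported metrics concrete, the following minimal sketch computes an AUC and an F1-score from scratch. The labels and scores are invented toy values, not data from Tian et al, and the function names `roc_auc` and `f1_at_threshold` are illustrative assumptions rather than the authors' code.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1-score (harmonic mean of precision and recall) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical values): 1 = steatosis present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
f1 = f1_at_threshold(labels, scores)
print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")
```

On this toy data, 14 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.875. An AUC of 0.82 can be read the same way: it is the probability that a randomly chosen case with steatosis receives a higher score than a randomly chosen case without it.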
Equally important is the study’s rigorous feature selection process. The combined use of recursive feature elimination and least absolute shrinkage and selection operator regression distilled a complex dataset of 156 candidate variables into 10 robust predictors[5]. Notably, these included markers that hepatologists already recognize as clinically meaningful (e.g., aspartate aminotransferase/alanine aminotransferase ratio, triglycerides, and waist circumference), together with additional predictors of metabolic dysfunction such as low- and high-density lipoproteins, the albumin/globulin ratio, and the creatinine-to-body-weight ratio. Each of these variables has established clinical associations with MAFLD pathophysiology: Dyslipidemia captured by low- and high-density lipoproteins reflects systemic metabolic imbalance; the albumin/globulin ratio provides indirect information on liver synthetic function and systemic inflammation; and the creatinine-to-body-weight ratio is emerging as a surrogate for muscle mass and renal metabolic load, both relevant to metabolic risk. Their inclusion reinforces the model’s clinical relevance and underscores its ability to integrate diverse pathophysiological dimensions.
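The selection effect of LASSO can be illustrated with a minimal coordinate-descent sketch. It shows only the L1 shrinkage step, not recursive feature elimination, and it is not the authors' pipeline: the three-feature design, the penalty `alpha = 0.1`, and the coefficients are assumptions chosen so that the result is easy to check by hand.

```python
from itertools import product


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Hypothetical design: three uncorrelated features coded as -1/+1.
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
# Two strong effects and one weak effect (0.05) that falls below the penalty.
y = [2.0 * a - 1.5 * b + 0.05 * c for a, b, c in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

Because the toy features are uncorrelated, each coefficient is simply its unpenalised value shrunk by `alpha`: the two strong effects survive (about 1.9 and -1.4), while the weak one is set exactly to zero and drops out. This is the mechanism by which a large candidate set can be reduced to a short list of predictors.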
In addition, several methodological aspects deserve emphasis. Class imbalance (70.8% MAFLD vs 29.2% non-MAFLD) was addressed through the synthetic minority over-sampling technique, which partially corrected the underrepresentation of non-MAFLD cases[6]. Hepatic steatosis was diagnosed using FibroScan® with a controlled attenuation para
Another innovative aspect was incorporating TCM indicators alongside conventional clinical metrics. Two TCM-derived features, greasy tongue coating and tongue edge redness, emerged prominently among the top predictors, bridging traditional holistic diagnostics and contemporary data analytics. This methodological choice pushes beyond conventional biomedical boundaries, suggesting that subtle, visually accessible traits may carry diagnostic value when objectively quantified. In the study, these features were acquired using the Intelligent Constitution Identifier system and subsequently reviewed by expert physicians, representing an important attempt to standardize and objectify TCM diagnostics. However, replicating such measures outside the original clinical setting remains challenging, given the lack of internationally standardized protocols and the potential for cultural or technological variability in their assessment. These limitations underscore the need for further harmonization and validation before TCM-derived indicators can be widely integrated into clinical practice.
Importantly, the authors also addressed the “black box” concern often associated with complex ML models[8]. By employing SHapley Additive exPlanations (SHAP) analysis, they quantified the relative contribution of each predictor, thereby enhancing transparency and bridging the gap between algorithmic complexity and clinical interpretability (Table 2). This effort is particularly valuable, as interpretability is essential for fostering clinician trust and encouraging real-world adoption. Future studies could expand on this interpretability framework, but the current work already provides a meaningful step forward in making ML models clinically accessible.
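The idea behind Shapley-based attribution can be shown with an exact calculation on a tiny model. This is a sketch under stated assumptions: `toy_risk_score`, its coefficients, and the all-zero baseline are hypothetical, and absent features are simply replaced by baseline values, which is a simplification of how SHAP tooling estimates contributions for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_score(features):
    """Hypothetical score with one interaction term between features a and c."""
    a, b, c = features
    return 0.5 * a + 0.2 * b + 0.3 * a * c


x = [1.0, 2.0, 1.0]          # hypothetical individual
baseline = [0.0, 0.0, 0.0]   # hypothetical reference profile
phi = shapley_values(toy_risk_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the individual's score and the baseline score (1.2 here), and the interaction term is split equally between the two features involved. This additivity is what allows per-predictor contributions to be displayed and compared.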
| Methodological aspect | Approach used | Rationale/significance |
| --- | --- | --- |
| Reference standard for steatosis | FibroScan® (controlled attenuation parameter, CAP ≥ 238 dB/m) | More accurate and reproducible than conventional ultrasonography; strengthens diagnostic validity |
| Class imbalance | Synthetic minority over-sampling technique (SMOTE) | Mitigates underrepresentation of non-MAFLD cases; reduces bias in training |
| Feature selection | Recursive feature elimination + least absolute shrinkage and selection operator (LASSO) regression | Reduced 156 variables to 10 key predictors with strong pathophysiological relevance |
| TCM feature acquisition | Intelligent Constitution Identifier (ICI) + expert review | Attempt to standardize and objectify TCM-derived indicators |
| Model interpretability | SHapley Additive exPlanations (SHAP) analysis | Quantified relative contribution of each predictor; improved clinical interpretability |
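The oversampling step listed above can be sketched in a few lines. This is a simplified illustration of the SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the reference implementation or the authors' code; the function name `smote_sketch`, the two-dimensional points, and the parameters `k` and `seed` are assumptions.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        # k nearest minority-class neighbours of the chosen sample
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature data: 7 majority-class and 3 minority-class samples.
majority = [(5.0, 5.2), (5.5, 4.8), (6.0, 5.1), (5.8, 5.9), (6.2, 4.5), (5.1, 6.0), (6.4, 5.5)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2)]

new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Every synthetic point lies on a line segment between two real minority-class samples, so oversampling adds no information beyond what those samples already contain. In general, synthetic samples should be generated from the training data only, so that evaluation data remain free of interpolated cases.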
Nevertheless, further challenges remain. First, external validation in multicenter cohorts is essential, given the study’s single-center design and potential population-specific biases. Moreover, the absence of subgroup analyses (e.g., diabetic vs non-diabetic patients, different grades of steatosis) limits insights into disease heterogeneity. Comorbidities such as obesity, type 2 diabetes, and metabolic dysregulation not only define high-risk individuals but may also strongly influence model performance, potentially limiting its applicability in populations with different risk profiles. Fur
From a clinical standpoint, the value of such a model lies in its ability to stratify risk in routine care. By identifying individuals who would most benefit from advanced imaging or closer follow-up, the tool has the potential to optimize resources, an especially relevant feature in health systems with constrained capacity. Integration into electronic health records, or even mobile health applications, could further accelerate its real-world impact. However, such clinical integration should be preceded by rigorous external validation and, ideally, prospective interventional studies assessing the impact of model-guided strategies on patient outcomes and decision-making.
In summary, Tian et al[1] provide a thoughtful and innovative contribution to the evolving field of predictive hepatology. Their work illustrates not only the power of ML in refining diagnostic strategies but also the importance of methodological rigor and openness to unconventional data sources. While external validation and refinement are still needed, the study lays a strong foundation for future tools aimed at improving early MAFLD detection and patient-centered care.
| 1. | Tian Y, Zhou HY, Liu ML, Ruan Y, Yan ZX, Hu XH, Du J. Machine learning-based identification of biochemical markers to predict hepatic steatosis in patients at high metabolic risk. World J Gastroenterol. 2025;31:108200. |
| 2. | Eslam M, Newsome PN, Sarin SK, Anstee QM, Targher G, Romero-Gomez M, Zelber-Sagi S, Wai-Sun Wong V, Dufour JF, Schattenberg JM, Kawaguchi T, Arrese M, Valenti L, Shiha G, Tiribelli C, Yki-Järvinen H, Fan JG, Grønbæk H, Yilmaz Y, Cortez-Pinto H, Oliveira CP, Bedossa P, Adams LA, Zheng MH, Fouad Y, Chan WK, Mendez-Sanchez N, Ahn SH, Castera L, Bugianesi E, Ratziu V, George J. A new definition for metabolic dysfunction-associated fatty liver disease: An international expert consensus statement. J Hepatol. 2020;73:202-209. |
| 3. | Jiang T, Gradus JL, Rosellini AJ. Supervised Machine Learning: A Brief Primer. Behav Ther. 2020;51:675-687. |
| 4. | Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng. 2024;52:1159-1183. |
| 5. | Xi LJ, Guo ZY, Yang XK, Ping ZG. [Application of LASSO and its extended method in variable selection of regression analysis]. Zhonghua Yu Fang Yi Xue Za Zhi. 2023;57:107-111. |
| 6. | Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321-357. |
| 7. | Mikolasevic I, Orlic L, Franjic N, Hauser G, Stimac D, Milic S. Transient elastography (FibroScan®) with controlled attenuation parameter in the assessment of liver steatosis and fibrosis in patients with nonalcoholic fatty liver disease - Where do we stand? World J Gastroenterol. 2016;22:7236-7251. |
| 8. | Handelman GS, Kok HK, Chandra RV, Razavi AH, Huang S, Brooks M, Lee MJ, Asadi H. Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods. AJR Am J Roentgenol. 2019;212:38-43. |
| 9. | Klement W, El Emam K. Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies: Development and Validation. J Med Internet Res. 2023;25:e48763. |
