Sun JR, Sun XN, Lu BJ, Deng BC. Artificial intelligence in hepatopathy diagnosis and treatment: Big data analytics, deep learning, and clinical prediction models. World J Gastroenterol 2025; 31(46): 111176 [DOI: 10.3748/wjg.v31.i46.111176]
Corresponding Author of This Article
Bao-Cheng Deng, PhD, The Second Department of Infectious Diseases, The First Affiliated Hospital, China Medical University, No. 155 Nanjing North Street, Shenyang 110001, Liaoning Province, China. sydengbc@163.com
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Review
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Journal Information of This Article
Publication Name
World Journal of Gastroenterology
ISSN
1007-9327
Publisher of This Article
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
Jing-Ran Sun, Bao-Cheng Deng, The Second Department of Infectious Diseases, The First Affiliated Hospital, China Medical University, Shenyang 110001, Liaoning Province, China
Jing-Ran Sun, Bing-Jiu Lu, Department of Hepatology, Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Shenyang 110032, Liaoning Province, China
Xiao-Ning Sun, Department of Geriatrics, Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Shenyang 110032, Liaoning Province, China
Co-corresponding authors: Bing-Jiu Lu and Bao-Cheng Deng.
Author contributions: Sun JR and Sun XN contributed equally to this work; they participated in the literature review, data collection, and manuscript writing. Lu BJ and Deng BC contributed equally to this work; they designed the draft and critically reviewed the manuscript for academic rigor. All authors have read and approved the final manuscript.
Supported by the Science Planning Project of Liaoning Province, No. 2019JH2/10300031-05; and the National Natural Science Foundation of China, No. 12171074.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Received: June 30, 2025 Revised: August 31, 2025 Accepted: October 21, 2025 Published online: December 14, 2025 Processing time: 165 Days and 13.5 Hours
Abstract
Artificial intelligence (AI) is rapidly transforming the landscape of hepatology by enabling automated data interpretation, early disease detection, and individualized treatment strategies. Chronic liver diseases, including non-alcoholic fatty liver disease, cirrhosis, and hepatocellular carcinoma, often progress silently and pose diagnostic challenges due to reliance on invasive biopsies and operator-dependent imaging. This review explores the integration of AI across key domains such as big data analytics, deep learning-based image analysis, histopathological interpretation, biomarker discovery, and clinical prediction modeling. AI algorithms have demonstrated high accuracy in liver fibrosis staging, hepatocellular carcinoma detection, and non-alcoholic fatty liver disease risk stratification, while also enhancing survival prediction and treatment response assessment. For instance, convolutional neural networks trained on portal venous-phase computed tomography have achieved areas under the curve of up to 0.92 for significant fibrosis (F2-F4) and 0.89 for advanced fibrosis, with magnetic resonance imaging-based models reporting comparable performance. Advanced methodologies such as federated learning preserve patient privacy during cross-center model training, and explainable AI techniques promote transparency and clinician trust. Despite these advancements, clinical adoption remains limited by challenges including data heterogeneity, algorithmic bias, regulatory uncertainty, and lack of real-time integration into electronic health records. Looking forward, the convergence of multi-omics, imaging, and clinical data through interpretable and validated AI frameworks holds great promise for precision liver care. Continued efforts in model standardization, ethical oversight, and clinician-centered deployment will be essential to realize the full potential of AI in hepatopathy diagnosis and treatment.
Core Tip: This review highlights how artificial intelligence is transforming hepatology by enabling early diagnosis, fibrosis staging, hepatocellular carcinoma detection, and personalized treatment. Key innovations include deep learning for imaging, multi-omics integration, and privacy-preserving federated learning. Explainable artificial intelligence builds clinician trust. Despite promising results, challenges like data heterogeneity, regulatory barriers, and limited real-time integration remain. Continued efforts in validation, ethical oversight, and user-centered design are essential for clinical adoption.
Liver diseases are a major global health problem. They include viral hepatitis, nonalcoholic fatty liver disease (NAFLD), alcoholic liver disease, autoimmune hepatitis, cirrhosis, and hepatocellular carcinoma (HCC). In 2019, chronic liver diseases caused more than 2 million deaths worldwide. Cirrhosis is now one of the top ten causes of death, especially in low- and middle-income countries[1]. NAFLD affects over 25% of people worldwide. It is one of the most common chronic liver diseases[2]. These liver diseases often have no symptoms in the early stages. Many patients are not diagnosed until fibrosis or cancer has already developed. This makes early detection and treatment difficult.
Doctors often rely on blood tests like alanine aminotransferase and aspartate aminotransferase, imaging tools such as ultrasound, computed tomography (CT), or magnetic resonance imaging (MRI), and liver biopsy to make a diagnosis. Although biopsy is still the best way to assess fibrosis, it is invasive, expensive, and prone to sampling errors and disagreement between doctors[3]. Imaging also depends on the operator’s skill and often fails to detect early-stage disease. Most current clinical systems do not use detailed data like genetic profiles, long-term medical records, or imaging features, which limits precision diagnosis and personalized care.
Artificial intelligence (AI) is helping change this. AI uses large and complex datasets to detect patterns, make predictions, and support clinical decisions[4,5]. In liver disease, AI can analyze medical images, identify patients at risk, predict disease progression, and help guide treatment. For example, convolutional neural networks (CNNs) can classify liver tumors on scans with accuracy similar to expert radiologists[6]. Deep learning also improves the detection of HCC on CT images[7]. As an example, CT-based deep learning systems have reported area under the curve (AUC) of 0.89-0.92 for fibrosis staging[8,9], while multiphasic MRI CNNs reach approximately 0.91 for HCC differentiation, underscoring the clinical promise of AI[10]. This review discusses how AI is improving the diagnosis and treatment of liver disease. It covers AI applications in big data analytics, image analysis, pathology, biomarker discovery, and prediction models for diagnosis, staging, and outcome prediction. It also highlights current challenges such as model generalizability, interpretability, clinical use, and ethical concerns. Together, these topics show how AI is shaping the future of precision liver care.
Throughout this review, several technical terms are used. Explainable AI (XAI) refers to models that provide interpretable predictions, making them more acceptable in clinical practice. Federated learning (FL) enables decentralized training of models across hospitals without sharing raw patient data. Edge AI refers to running AI models locally on devices near data sources (e.g., imaging machines), which reduces latency and protects privacy. These concepts will be discussed in context within relevant sections.
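The idea behind FL can be illustrated with a minimal sketch of sample-weighted parameter averaging (commonly called federated averaging). This is an illustration under stated assumptions, not the procedure of any study cited here: the function name `federated_average`, the dictionary layout of the weights, and the two hospital updates are all hypothetical.

```python
from typing import Dict, List, Tuple

# A model is represented here as named lists of parameters (an assumption
# made for illustration; real frameworks use tensors).
Weights = Dict[str, List[float]]


def federated_average(site_updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-site weights, weighting each site by its sample count."""
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute samples")
    first_weights = site_updates[0][0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in site_updates) / total
            for i in range(len(values))
        ]
    return averaged


# Hypothetical updates: each hospital trains locally and shares only
# parameters and a sample count, never patient-level records.
hospital_a = ({"coef": [0.2, 0.4], "bias": [0.0]}, 100)
hospital_b = ({"coef": [0.6, 0.0], "bias": [1.0]}, 300)
global_weights = federated_average([hospital_a, hospital_b])
```

In this toy setting the larger site contributes three times the weight of the smaller one. A full FL system would repeat this aggregation over many training rounds and would typically add secure aggregation, which the sketch omits.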
In particular, given the rising clinical concern around drug-induced liver injury (DILI), and the growing application of AI-based models for hepatotoxicity prediction, this review includes a dedicated section on AI applications for DILI. We believe that separating DILI from general treatment decision modeling provides clearer thematic coherence and highlights its distinct importance in hepatology.
OVERVIEW OF AI TECHNOLOGIES IN MEDICINE
AI, including machine learning (ML), deep learning (DL), and big data analytics, is showing clear value in hepatology. These tools help extract useful clinical information from different types of data. Common data sources include medical images (such as CT, MRI, and ultrasound), multi-omics datasets (like genomics, proteomics, and metabolomics), and electronic health records (EHRs). Using these inputs, AI can support more accurate diagnosis, better disease staging, and personalized treatment planning.
AI-based clinical decision support systems (CDSS) usually follow several steps. First, they collect and clean the data. Then, they build models using algorithms that handle both structured and unstructured data. After model development, they test performance and integrate the system into daily clinical workflows[11-14]. For example, CNNs have reached high accuracy in detecting liver tumors and mapping liver structures on scans[15,16]. Other AI models built from EHR data can predict how liver disease will progress or how patients may respond to treatment, especially in cases of cirrhosis or HCC[12]. Studies also show that combining imaging with clinical or molecular data improves predictions more than using a single type of data alone[17].
Even with this progress, there are still important challenges. One issue is that many AI models do not work well across different hospitals or patient populations. Another is that doctors often cannot understand how the model makes decisions, which makes them less likely to use it. In addition, these tools must fit smoothly into existing hospital systems. Data privacy, fairness, and transparency are also major concerns[18]. To move forward, researchers and developers need to create clear standards for testing and reporting AI models. They also need to design systems that explain how results are generated. Finally, real-world studies are essential to show that AI can improve care in everyday clinical practice[19].
BIG DATA ANALYTICS IN HEPATOLOGY
The growing volume of clinical, molecular, and imaging data is pushing hepatology toward predictive, preventive, and personalized care. EHRs have been used with ML to find patients who are likely to have poor outcomes. These include disease progression from NAFLD to nonalcoholic steatohepatitis, liver failure due to cirrhosis, or the development of HCC[20,21]. These tools help guide targeted screening, which can lead to earlier treatment and reduce medical costs.
From a health-economics perspective, AI reduces costs through three complementary pathways. First, avoiding unnecessary invasive procedures: Noninvasive models for liver fibrosis - built on MRI/CT radiomics or routine laboratory data - support triage and can curb referrals for liver biopsy[22,23]. Second, enabling earlier diagnosis and less complex care: Opportunistic CT/MRI radiomics and primary-care AI screening surface fibrosis earlier, allowing timelier lifestyle or pharmacologic interventions; risk-stratified HCC surveillance further concentrates imaging where the expected benefit is highest[24,25]. Third, improving resource allocation and operational efficiency: In radiology services, implementation studies and early health-technology assessments show workflow gains (e.g., faster triage/reading) and favorable return on investment, with some tools projected to be cost-saving vs standard care when integrated into reporting workflows[26,27].
Using omics data - such as genomics, transcriptomics, and metabolomics - also helps define subtypes of liver disease. For example, genome-wide studies have linked specific genes (like MTARC1 and GPAM) to liver fat and fibrosis risk in NAFLD patients[28]. By combining multiple omics datasets, researchers have identified molecular subgroups of NAFLD and discovered key pathways involved in fibrosis[29].
Large imaging databases now support AI-based tools that detect liver lesions, measure liver size, and stage fibrosis. CT and MRI images have been used to train DL models that spot early signs of liver damage with high accuracy[3]. Radiomics, which analyzes image textures, can improve predictions when combined with clinical information. These models often perform better than standard imaging reports[30].
Still, several challenges limit clinical use. Medical data often vary between institutions. Some datasets lack labels or contain sensitive patient information. These issues make it hard to build models that work well across hospitals. Solutions such as FL, standardized data formats (like OMOP or FHIR), and secure computing systems can help researchers share data while protecting privacy[31,32]. Ethical issues also matter. AI tools must avoid bias and fairly represent all patient groups to support equal care.
Beyond individual care, big data also helps with public health planning. For example, EHR-based tools can estimate how common NAFLD or cirrhosis is in specific groups. This information supports health policy and resource planning. Tracking patient data over time also allows for flexible risk prediction, helping doctors adjust care based on how a disease changes[33]. In short, big data is changing hepatology by combining clinical records, omics, and imaging to better detect disease, divide patients into subgroups, and support population-level planning. To fully benefit from this approach, future work must overcome technical and ethical barriers and ensure broad, safe, and fair use of data.
DL APPLICATIONS IN LIVER DISEASE DIAGNOSIS
DL is changing how liver diseases are diagnosed. It is widely used in imaging, histology, and biomarker research.
Imaging: Fibrosis diagnosis and staging
In imaging, CNNs trained on contrast-enhanced CT or MRI scans have shown strong performance in staging liver fibrosis. For instance, one model using portal venous-phase CT achieved AUC values of 0.92 for significant fibrosis (F2-F4), 0.89 for advanced fibrosis (F3-F4), and 0.88 for cirrhosis (F4)[8]. A pilot study using contrast-enhanced CT found a moderate correlation with biopsy results (Spearman ρ = 0.48) and reported AUCs between 0.73 and 0.76 for fibrosis staging[9]. Newer MRI-based methods have further improved detection. Some even apply XAI to show which image areas influence predictions[34].
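To make the staging metrics concrete, the sketch below shows one way threshold-specific AUCs can be computed when an ordinal biopsy stage (F0-F4) is dichotomized at F2, F3, and F4. It assumes a single continuous model score per patient, which may differ from how the cited studies produced their outputs; the stages and scores are invented for illustration.

```python
from typing import Dict, Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def staging_aucs(stages: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """One AUC per dichotomized fibrosis threshold (>=F2, >=F3, >=F4)."""
    return {
        f">=F{t}": auc_from_scores([1 if s >= t else 0 for s in stages], scores)
        for t in (2, 3, 4)
    }


# Hypothetical biopsy stages and model scores for ten patients.
stages = [0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
scores = [0.10, 0.20, 0.35, 0.30, 0.55, 0.60, 0.50, 0.80, 0.70, 0.90]
aucs = staging_aucs(stages, scores)
```

The pairwise definition is quadratic in the number of patients, which is fine for a sketch; rank-based implementations are preferable for large cohorts.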
Methodological details of key DL studies. Beyond general statements, pivotal imaging studies in hepatology report explicit architectures and external validation metrics. For fibrosis staging on contrast-enhanced CT, Choi et al[9] used a multi-phase 3D CNN with residual blocks; on an external cohort (n = 100), AUCs reached 0.96 (≥ F2), 0.97 (≥ F3), and 0.95 (≥ F4) with balanced sensitivity/specificity profiles. On hepatobiliary-phase gadoxetic-acid MRI, Yasaka et al[35] trained a deep CNN that achieved high diagnostic performance for fibrosis staging in independent testing. More recently, deep residual networks on non-contrast or plain CT have also shown non-invasive staging capability with robust external testing[36]. In parallel, transformer-based models have emerged for liver tasks such as preoperative microvascular invasion prediction in HCC, showing competitive AUCs against strong CNN baselines and providing attention-based interpretability[37]. Together, these exemplars substantiate the claims of “high accuracy” with concrete architectures and validation-set performance.
Tumor detection (HCC and differentials)
DL also helps detect liver tumors. CNNs trained on multi-phase MRI have reached 91% accuracy (AUC = 0.912) in telling HCC apart from other liver lesions[10]. These tools offer fast and reliable options for tumor screening.
Pathology and biomarker modeling
In pathology, DL can analyze liver biopsy slides. One study used CNNs to grade how well HCC cells were differentiated, based on standard hematoxylin and eosin staining. The model helped doctors improve diagnostic accuracy[38]. For biomarker discovery, DL models have combined CT images and pathology data to assess the NAFLD activity score and fibrosis stage. These models provide non-invasive tools to help assess disease severity and guide treatment.
Translation toward clinical use and reporting standards
Some of these tools have already moved toward clinical use. A model trained on gadoxetic acid-enhanced MRI showed similar accuracy to MR elastography in staging fibrosis[39]. This suggests it could be used in regular clinical practice. But challenges remain. These include the need for large, high-quality datasets, making sure models work across different scanner types, and improving how well doctors can understand the model’s decisions. Guidelines such as the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-AI (TRIPOD-AI) statement now support better model reporting, testing, and sharing across hospitals[40].
Explainability in clinical review (XAI)
Overall, DL in liver disease is moving beyond early tests. New tools are becoming more useful in real-world settings. They can help with image reading, biopsy grading, and risk scoring, offering faster, more objective, and more consistent results in liver care. In practice, attention heatmaps on liver MRI can visually highlight sub-lesional rims, wash-in/wash-out zones, or peritumoral capsules that drove a malignant classification, enabling side-by-side review with radiologists[41,42]. Layer-wise relevance propagation (LRP)-style attributions on CT can mark parenchymal textures and periportal regions that contributed most to a fibrosis stage prediction, thereby increasing clinician confidence and facilitating error analysis[8,43,44].
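As a rough illustration of how such relevance maps arise, the sketch below uses occlusion sensitivity, a simpler model-agnostic technique that is swapped in here for attention maps and LRP and is not the method of the cited studies. The scoring function `toy_score` is a hypothetical stand-in for a trained classifier, and the 4 x 4 grid stands in for an image.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    baseline: float = 0.0,
) -> Image:
    """Return a heatmap of score drops when each patch is blanked out."""
    rows, cols = len(image), len(image[0])
    reference = score_fn(image)
    heat = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            cells = [
                (r, c)
                for r in range(r0, min(r0 + patch, rows))
                for c in range(c0, min(c0 + patch, cols))
            ]
            for r, c in cells:
                occluded[r][c] = baseline
            drop = reference - score_fn(occluded)
            for r, c in cells:
                heat[r][c] = drop  # large drop = region the model relied on
    return heat


def toy_score(img: Image) -> float:
    """Hypothetical classifier that responds only to the upper-left 2x2 region."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
```

Regions with a large score drop are those the model depended on, which is the kind of evidence a radiologist could compare against clinically credible cues during side-by-side review.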
CLINICAL PREDICTION MODELS FOR HEPATOPATHY
AI-based models for early diagnosis
AI is improving the early diagnosis of liver diseases. For NAFLD, Hsu et al[45] used a random forest (RF) model on large population datasets and reached an area under the receiver operating characteristic curve (AUROC) of 0.83 for identifying high-risk individuals. Zhang et al[46] built a CNN that analyzed ultrasound images and achieved an AUC of 0.89 for detecting ≥ F2 fibrosis. Yin et al[8] applied CNNs to portal-venous CT scans and obtained AUROC values of up to 0.92 for significant fibrosis. Yasaka et al[3] reported similar accuracy using CT-based DL tools with an AUC of about 0.89. Some multi-center studies that combine CT and MRI have shown improved generalization across patient populations[30].
AI also plays a growing role in early detection of HCC. For example, Xu et al’s ML tool improved the sensitivity of HCC screening in patients with hepatitis B-related cirrhosis[47]. Other studies using CNNs on multiphasic MRI achieved AUCs of 0.91 for distinguishing HCC from benign liver lesions[6]. These findings are supported by additional neural network approaches using medical images[48,49]. Together, these models offer non-invasive, fast, and consistent tools for population screening. They reduce the need for liver biopsy and lower the impact of operator-dependent variability.
Prognostic models for disease progression and survival
AI models are also used to predict how liver diseases will progress and how long patients may survive. Katzman et al[50] created an extreme gradient boosting model that outperformed traditional Cox regression in forecasting survival in HCC patients. Radiomics-based nomograms that combine clinical and imaging features can predict recurrence-free survival after liver surgery, with a C-index of about 0.76[51]. Other tools include improved versions of existing scores. For example, Zhu et al[52] modified the model for end-stage liver disease score to better predict outcomes in primary liver cancer. DeepSurv, a DL model, has been used to make personalized predictions in both cirrhosis and HCC[53]. Some research also focuses on dynamic models built from EHRs. These allow real-time updates of patient risk as new data becomes available[54,55]. Clinicians are beginning to use these models to adjust how often patients are monitored, how aggressive treatments should be, and whether liver transplant should be considered.
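The C-index quoted for survival models can be sketched as follows. This is a minimal version of Harrell's concordance index under simplifying assumptions (right-censoring only, no special handling of tied event times); the follow-up times, event flags, and risk scores are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: months of follow-up, event flag (1 = event, 0 = censored),
# and model-predicted risk.
times = [5, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for reported values such as 0.76.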
Treatment decision models and precision medicine
AI tools are helping doctors tailor treatment plans based on disease severity. For NAFLD, Zhang et al[56] developed an RF model using lab tests and achieved an AUROC of 0.91 for detecting moderate-to-severe disease, showing potential for guiding lifestyle or drug therapy. In HCC, Peng et al[57] used CT-based radiomics and DL to predict how patients would respond to transarterial chemoembolization. Their model reached AUC values of up to 0.97 in training cohorts, with AUCs of 0.94 in internal validation and around 0.90 in external validation[57].
ML is also being used to choose the right systemic treatment or immunotherapy. These models may include imaging features, genetic mutations, programmed death-ligand 1 status, and lab markers. Although many are still in testing, early results show they may predict treatment response better than traditional methods[58,59].
Reporting of model architecture and validation. For clinical prediction models that integrate laboratory tests and EHR features, we explicitly report the learning algorithm and validation metrics whenever available (e.g., RF or gradient-boosted decision trees with AUROC, sensitivity, and specificity on external cohorts). Representative EHR-integrated or radiomics-augmented models in hepatology report external AUROCs of 0.85-0.92 for advanced fibrosis/HCC-related endpoints, with calibration and decision-curve analyses complementing discrimination[60]. Our reporting is aligned with TRIPOD-AI/CONSORT-AI recommendations to enhance transparency and clinical interpretability.
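Decision-curve analysis, mentioned above as a complement to discrimination, rests on the net-benefit quantity: true positives per patient minus false positives per patient, the latter weighted by the odds of the chosen risk threshold. The sketch below is a minimal illustration with hypothetical outcomes and predicted probabilities, not a reproduction of any cited analysis.

```python
from typing import Sequence


def net_benefit(
    labels: Sequence[int], probs: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene on every patient regardless of the model."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = advanced fibrosis) and model probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.3, 0.15]
nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.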
AI for DILI prediction
DILI is a major cause of drug development failure and post-marketing drug withdrawal. Conventional toxicology approaches often lack predictive accuracy and mechanistic resolution, particularly in early-phase risk stratification. In this context, AI has emerged as a powerful tool to model, predict, and interpret DILI risk across diverse data modalities. Current AI applications in DILI prediction primarily include: Compound-based prediction using chemical descriptors, off-target profiling, and physicochemical properties, often implemented via RF, support vector machines, and deep neural networks; biological data modeling, such as transcriptomic or microarray analyses, to uncover gene-level predictors and regulatory cascade patterns preceding liver toxicity; integrative platforms that combine clinical laboratory data, pharmacogenomic inputs, and multi-source datasets for individualized DILI risk assessment; and mechanistic and interpretable models (e.g., Shapley additive explanations [SHAP], virtual liver lobule simulations) that bridge black-box AI with biological interpretability and regulatory applicability (Table 1[61-69]).
Table 1 Artificial intelligence-based studies on drug-induced liver injury prediction, highlighting their methodological frameworks and key performance outcomes.
AI-powered CDSS are being developed to assist with liver disease management. These systems help predict HCC risk, monitor complications in cirrhosis, and guide treatment choices by analyzing imaging and clinical data. For example, Malik et al[10] outlined the expanding role of AI across early diagnosis, prognosis, and therapy selection in liver diseases. One CT-based tool, PLAN-B-DF, achieved strong predictive power, with a C-index of 0.91 in internal validation and 0.89 in external datasets. This outperformed traditional scoring models in predicting HCC risk among patients with chronic hepatitis B[70]. However, most of these systems remain in early development stages. External validation and widespread clinical use are still limited[71].
Regulatory, ethical, and interpretability considerations
Bringing AI-CDSS into clinical settings requires meeting regulatory standards and addressing ethical concerns. In the United States, the Food and Drug Administration (FDA) has introduced its software as a medical device action plan. This includes guidance on algorithm transparency, lifecycle monitoring, and good ML practices. In the European Union, the General Data Protection Regulation enforces strict rules for data privacy in healthcare AI[72]. Ethical issues remain a major concern. These include risks of algorithmic bias, lack of fairness, and automation bias, where clinicians may over-rely on AI decisions. Studies show that unless bias is actively addressed, AI systems may reinforce health disparities[73].
To improve transparency and model quality, guidelines like TRIPOD-AI and CONSORT-AI have been developed. These frameworks promote better reporting, validation, and reproducibility of AI-based clinical prediction models[74]. Operationally, saliency- or attention-based visualizations reviewed with radiologists can document whether the model relies on clinically credible cues (e.g., arterial rim, peritumoral capsule), supporting model verification in the imaging report workflow[41]. Pixel-wise relevance maps (e.g., LRP) further enable case-level audits on CT by localizing fibrosis-related textures (e.g., periportal change), aligning with governance requirements for post-hoc explainability and error analysis[8,43,75,76]. Despite these advances, common barriers still exist. These include alert fatigue, poor integration with EHRs, and limited real-time interpretability. Such issues continue to hinder routine clinical adoption.
Clinical translation and deployment challenges
Despite promising results in experimental and retrospective settings, the readiness of AI systems for clinical hepatology remains limited. Several barriers must be addressed before widespread deployment.
Regulatory hurdles: Both the United States FDA and the European Medicines Agency now provide frameworks for AI/ML-enabled medical devices, requiring continuous performance monitoring and transparent reporting of algorithm updates. FDA’s 2021 action plan for AI/ML-based software as a medical device emphasizes real-world performance monitoring and change control protocols[77].
Real-world barriers: Clinical translation is constrained by privacy regulations (Health Insurance Portability and Accountability Act/General Data Protection Regulation), lack of interoperability between hospital EHRs, and low clinician trust in ‘black-box’ models. For example, across-site validation studies show significant performance drops due to domain shifts, while surveys reveal that hepatologists express concerns over liability and interpretability[78,79].
Cost-effectiveness: AI can potentially reduce costs by avoiding unnecessary liver biopsies, enabling earlier disease detection and treatment, and optimizing imaging resource allocation. For instance, cost-effectiveness analyses of AI-enabled imaging in oncology indicate that early detection strategies reduce downstream treatment expenditures by up to 30%[80]. Similar modeling studies are beginning to emerge for chronic liver disease, though prospective economic evaluations remain limited. Taken together, these considerations underscore the importance of regulatory compliance, interoperability, and health economic validation to ensure safe, equitable, and sustainable adoption of AI in hepatology practice (Table 2).
Table 2 Artificial intelligence-augmented vs human-only diagnostic accuracy: Current evidence.
Successful AI implementation depends on clinician trust and workflow fit. Doctors, especially transplant hepatologists, often stress that AI should support, not replace, human judgment[81,82]. Transparency and ease of use are essential. Practical design solutions can mitigate these barriers. For instance, interactive dashboards such as the GutGPT system were shown to reduce alert fatigue by prioritizing clinically relevant notifications[76], and similar dashboard interfaces are used to guide decisions and improve compliance with clinical guidelines. Likewise, case studies of application programming interface (API)-based integration with EHRs have demonstrated smoother adoption by embedding AI predictions directly into radiology reports or hepatology consult notes, minimizing workflow disruption[27]. While these tools can enhance care quality, they may also disrupt workflow or lead to alert fatigue. Many reviews have found that lack of EHR integration and limited real-time feedback are key reasons why clinicians hesitate to adopt AI systems.
CHALLENGES AND FUTURE DIRECTIONS
Data quality, bias, and generalizability
Reliable AI tools in hepatology depend on access to diverse and high-quality data. But most current datasets are retrospective and collected from single centers. These datasets often lack variation in patient ethnicity, liver disease causes, and imaging methods. As a result, AI models may show poor performance when tested in new settings. Ghosh et al[83] reported that some models lost more than 20% accuracy when applied to data from different hospitals.
Beyond performance drops across centers, several studies have highlighted real-world failures of AI in hepatology. For example, Yin et al[8] reported that a DL model for liver fibrosis staging developed on predominantly Western image datasets showed a notable drop in performance (e.g., lower AUCs) when applied to an Asian cohort, highlighting the risk of ethnic and etiological bias in model generalization. Similarly, Abràmoff et al[73] emphasized that insufficient subgroup validation can reinforce health inequities if AI is deployed without fairness audits. In addition to ethnic variability, gender bias has also been documented. For instance, ML tools based on routine clinical and demographic features, such as FibrAIm, under-detect early steatosis and fibrosis in certain subpopulations, raising concerns about subgroup performance disparities in early screening tools for patients with metabolic dysfunction-associated steatotic liver disease/steatohepatitis[84]. These examples caution against premature clinical deployment and highlight the need for prospective, multi-ethnic validation before routine use.
Cross-site domain shifts (scanner vendors, acquisition protocols, disease etiologies, and ethnicity) remain major sources of performance degradation. Model bias has been well documented in medical AI, and addressing it requires pre-specified subgroup analyses and fairness audits[72,73]. Adopting TRIPOD-AI for transparent reporting and conducting prospective, multi-center external validation are therefore essential to avoid spectrum bias and improve real-world reliability[74].
Moreover, evidence suggests that AI performance in hepatology is uneven across underrepresented populations. For instance, Obermeyer et al[85] showed that an algorithm widely used in United States healthcare underestimated risk in Black patients because it relied on healthcare costs as a proxy for illness, demonstrating how systemic bias in training data can exacerbate disparities. Similarly, Nam et al[86] emphasized that most hepatology AI studies are derived from Western cohorts, with markedly lower accuracy when applied to Asian populations, underscoring the need for multi-ethnic validation before clinical deployment. Gender bias has also been reported: for example, blood-test-based AI models missed 44% of liver disease cases in females compared with 23% in males, highlighting subgroup-specific risks that could worsen inequities if unaddressed[87]. A further obstacle is that standard data models and exchange formats, such as OMOP and FHIR, are rarely used in liver studies. To improve generalizability, future research should focus on building multicenter, prospective datasets and on harmonizing metadata to reduce bias across institutions.
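A pre-specified subgroup analysis of the kind discussed in this section can be illustrated with a short script. The following is a minimal sketch, not the method of any cited study: the function name `subgroup_miss_rates`, the record layout, and the counts are assumptions made for illustration, and the toy counts were chosen only so that the resulting miss rates resemble the sex-specific figures quoted above.

```python
from collections import defaultdict


def subgroup_miss_rates(records):
    """Return the false-negative rate (missed cases / true cases) per subgroup.

    Each record is a (subgroup, has_disease, predicted_positive) tuple.
    Only patients who truly have the disease contribute to the rate.
    """
    true_cases = defaultdict(int)
    missed = defaultdict(int)
    for subgroup, has_disease, predicted_positive in records:
        if has_disease:
            true_cases[subgroup] += 1
            if not predicted_positive:
                missed[subgroup] += 1
    return {group: missed[group] / true_cases[group] for group in true_cases}


# Synthetic toy data (an assumption, not study data): 100 diseased patients per group.
records = (
    [("female", True, False)] * 44 + [("female", True, True)] * 56
    + [("male", True, False)] * 23 + [("male", True, True)] * 77
)

rates = subgroup_miss_rates(records)
gap = rates["female"] - rates["male"]  # absolute difference in miss rate
print(rates, round(gap, 2))
```

Reporting the per-subgroup miss rate alongside the overall figure makes a disparity visible that a single pooled sensitivity would hide. A real audit would also report confidence intervals and would fix the subgroups before looking at the results.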
Multi-omics integration and real-time analytics
AI has enabled the integration of multi-omics data, such as genomics, transcriptomics, and metabolomics, into liver disease models. These tools can improve diagnosis, risk assessment, and treatment planning. A recent review in Gut highlighted how AI turns omics data into meaningful clinical insights[83]. By combining single-cell and bulk omics data using DL or graph-based models, researchers can better classify disease subtypes and understand how they progress[88,89]. For example, by integrating genomic variants (e.g., PNPLA3[90], MTARC1[28]), transcriptomic signatures of fibrogenic activation[91], metabolomic lipid pathway shifts[92], and routine laboratory tests, an AI meta-model can identify NAFLD subtypes that respond preferentially to glucagon-like peptide 1 analogs[93] versus pioglitazone[94], enabling tailored therapy selection in the clinic. However, technical challenges remain, including differences in data types and timing and limited interpretability. Some early-stage solutions use bio-inspired AI frameworks to link genotype with phenotype, but they are not yet widely adopted[95]. In addition, real-time AI tools at the bedside (edge AI) are limited by hardware constraints and data processing speed. More research is needed to bring these systems into routine use.
FL and privacy-preserving models
Protecting patient privacy is a key challenge in AI model development. Centralized data sharing is often not permitted, which is a particular problem in liver disease research, where multi-institutional data are critical. FL offers a solution: models are trained across sites without moving patient data, and only the model updates are shared. Several studies show that FL works well for liver imaging tasks. Bernecker et al[96] developed a method called Federated Normalization, which adapts FL to both CT and MRI data. The model achieved near-centralized performance, with Dice coefficients close to 0.96 across six liver imaging datasets[97]. In pathology, Lusnig et al[98] built a hybrid quantum FL model to grade hepatic steatosis. Their approach used quantum neural networks and achieved over 90% accuracy without any data sharing. These methods are especially useful in settings where biopsy data are sensitive.
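For readers less familiar with the segmentation metric reported above, the Dice coefficient measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below is illustrative only: the function name, the flat-list representation of the masks, and the toy values are assumptions, and real pipelines operate on three-dimensional image arrays.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # A voxel is in the intersection only when both masks mark it as liver (1).
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 8-voxel example (hypothetical values): 3 voxels overlap, 4 marked in each mask.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 1, 1, 1, 0, 1, 0, 0]
score = dice_coefficient(pred, truth)
print(score)  # 2 * 3 / (4 + 4) = 0.75
```

A value close to 0.96, as reported for the federated liver segmentation model, therefore indicates that predicted and reference masks agree on nearly all liver voxels.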
Even with these successes, challenges remain. Data from different centers are often not identically distributed (non-IID), which can make training unstable and hurt performance[99]. Other barriers include high communication costs, synchronization issues, and the need for substantial computing power. To address these problems, researchers are testing strategies such as Federated Averaging, FedSGD, and split learning to improve speed and stability[100].
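The basic idea behind Federated Averaging can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in any cited study: it trains a one-feature linear model on two synthetic "sites", and the function names, learning rate, number of local epochs, and data are all hypothetical. Each site runs gradient descent on its own data, and a coordinator averages the resulting weights in proportion to site sample size, so only weights, never patient-level rows, are exchanged.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Gradient descent on one site's data for the model y ≈ w * x + b (mean squared error)."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


# Two synthetic sites (assumption): both follow y = 2x + 1 but cover different x ranges.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates, [len(data) for data in sites])

print(global_weights)  # approaches (2.0, 1.0)
```

In this toy setting both sites share the same underlying relationship, so averaging converges cleanly. With non-IID data, local models can pull in different directions, which is the instability described above.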
Towards explainable and trustworthy AI in hepatology
Building trust in AI requires transparency and interpretability. Tools such as SHAP values and attention maps are now being used to explain AI decisions in omics and imaging[101]. For multi-omics models, surveys have shown how XAI techniques can point to key features and explain which data types are most important[102,103]. Combining explainable models with FL can lead to systems that are both secure and interpretable[74,97], which makes it easier for clinicians to trust and adopt these tools. Reporting guidelines also support this shift: TRIPOD-AI and DECIDE-AI push for better reporting and validation of AI models[104]. Future deployment should also include clinician-centered design, regular feedback loops, and real-world testing of how AI affects workload and decision-making[105].
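SHAP values are based on Shapley values, which assign each input feature its average marginal contribution to a prediction over all orders in which features could be added. The sketch below computes these values exactly for a three-feature toy model. It does not use the `shap` library, and the risk function, its coefficients, the feature names, and the baseline and patient values are all hypothetical, chosen only to show the calculation.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features not yet added in an ordering keep their baseline value.
    The cost grows factorially with the number of features, so this
    brute-force version is suitable for toy examples only.
    """
    n = len(instance)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]  # switch feature i from baseline to patient value
            value = predict(current)
            contrib[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contrib]


def toy_risk(features):
    """Hypothetical risk score with an age x ALT interaction (illustrative coefficients)."""
    age, alt, platelets = features
    return 0.02 * age + 0.01 * alt - 0.005 * platelets + 0.0001 * age * alt


baseline = [50.0, 30.0, 250.0]  # assumed reference patient
patient = [65.0, 90.0, 120.0]   # assumed patient to explain
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that makes this kind of explanation easy to audit. Practical tools approximate the calculation because exact enumeration is infeasible for models with many features.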
FUTURE DIRECTIONS
AI is transforming the landscape of hepatopathy diagnosis and treatment by enabling automated data analysis, improved disease stratification, and personalized therapeutic decision-making. This review highlights AI applications across multiple domains: big data analytics, DL-based imaging interpretation, histopathological analysis, biomarker discovery, and clinical prediction modeling. AI has demonstrated high diagnostic accuracy in liver fibrosis staging, HCC detection, and NAFLD stratification using CT, MRI, and ultrasound. Prognostic models integrating radiomics and EHR data offer improved survival predictions and facilitate treatment selection. AI-driven decision support systems have shown promise in enhancing the efficiency and precision of clinical workflows.
Furthermore, AI has enabled scalable early screening tools and non-invasive biomarkers, thus minimizing reliance on liver biopsy. FL has addressed data privacy issues while maintaining model performance across decentralized datasets. Advances in XAI have contributed to clinician trust by enhancing transparency in complex models. However, widespread clinical integration remains limited by issues such as data heterogeneity, regulatory ambiguity, and lack of real-time interpretability.
Looking forward, the promise of AI in hepatology lies in its potential to integrate multi-omics, imaging, and clinical data into unified, interpretable, and actionable models. This requires the development of robust, externally validated algorithms trained on large, ethnically diverse, and longitudinal datasets. Interoperability standards, such as OMOP and FHIR, should be adopted to harmonize data input across institutions. FL and edge AI represent promising frameworks for ensuring privacy-preserving, real-time analytics at the point of care. Moreover, reporting guidelines such as TRIPOD-AI and DECIDE-AI should be universally adopted to standardize AI model reporting and validation. Clinician-in-the-loop design, user-centered interface development, and alert burden mitigation are critical to promoting AI adoption in hepatology. Equally important is the incorporation of bioethical safeguards to ensure algorithmic fairness, accountability, and transparency.
CONCLUSION
This review summarizes the current progress and challenges of AI in the diagnosis and management of liver diseases. AI technologies, including big data analytics, DL, and clinical prediction modeling, have demonstrated promising potential across multiple domains, such as fibrosis staging, HCC detection, and non-invasive risk stratification. These tools support earlier diagnosis, individualized therapy, and more efficient clinical workflows. In addition, FL offers privacy-preserving solutions for multicenter model training, while XAI improves transparency and builds clinician trust. Despite these advances, barriers such as data heterogeneity, lack of real-time interpretability, and regulatory uncertainty remain. Therefore, we believe AI should be actively integrated into hepatology but critically evaluated. Future efforts should focus on large-scale validation, harmonized data standards, and user-centered design. With sustained investment in clinical translation, interpretability, and infrastructure, AI is poised to become a central component in the precision management of liver diseases. Looking ahead, we envision a learning hepatology ecosystem in which harmonized EHR, imaging, and multi-omics streams continuously update validated, explainable models at the bedside. In this future state, AI serves as an accountable clinical co-pilot that is auditable, bias-aware, interoperable, and aligned with practice guidelines, supporting prevention, earlier diagnosis, and individualized therapy while reducing unwarranted variation. Progress should be judged not only by benchmark AUCs but also by patient-centered outcomes, equity, safety, and efficiency. Realizing this vision will require shared standards, prospective trials, and governance that earns durable trust among patients, clinicians, and regulators.
Footnotes
Provenance and peer review: Invited article; Externally peer reviewed.
Oestmann PM, Wang CJ, Savic LJ, Hamm CA, Stark S, Schobert I, Gebauer B, Schlachter T, Lin M, Weinreb JC, Batra R, Mulligan D, Zhang X, Duncan JS, Chapiro J. Deep learning-assisted differentiation of pathologically proven atypical and typical hepatocellular carcinoma (HCC) versus non-HCC on contrast-enhanced MRI of the liver.Eur Radiol. 2021;31:4981-4990.
Thrift AP, Nguyen Wenker TH, Godwin K, Balakrishnan M, Duong HT, Loomba R, Kanwal F, El-Serag HB. An Electronic Health Record Model for Predicting Risk of Hepatic Fibrosis in Primary Care Patients.Dig Dis Sci. 2024;69:2430-2436.
Ding J, Liu H, Zhang X, Zhao N, Peng Y, Shi J, Chen J, Chi X, Li L, Zhang M, Liu WY, Zhang L, Ouyang J, Yuan Q, Liao M, Tan Y, Li M, Xu Z, Tang W, Xie C, Li Y, Pan Q, Xu Y, Cai SY, Byrne CD, Targher G, Ouyang X, Zhang L, Jiang Z, Zheng MH, Sun F, Chai J. Integrative multiomic analysis identifies distinct molecular subtypes of NAFLD in a Chinese population.Sci Transl Med. 2024;16:eadh9940.
Dayan I, Roth HR, Zhong A, Harouni A, Gentili A, Abidin AZ, Liu A, Costa AB, Wood BJ, Tsai CS, Wang CH, Hsu CN, Lee CK, Ruan P, Xu D, Wu D, Huang E, Kitamura FC, Lacey G, de Antônio Corradi GC, Nino G, Shin HH, Obinata H, Ren H, Crane JC, Tetreault J, Guan J, Garrett JW, Kaggie JD, Park JG, Dreyer K, Juluru K, Kersten K, Rockenbach MABC, Linguraru MG, Haider MA, AbdelMaseeh M, Rieke N, Damasceno PF, E Silva PMC, Wang P, Xu S, Kawano S, Sriswasdi S, Park SY, Grist TM, Buch V, Jantarabenjakul W, Wang W, Tak WY, Li X, Lin X, Kwon YJ, Quraini A, Feng A, Priest AN, Turkbey B, Glicksberg B, Bizzo B, Kim BS, Tor-Díez C, Lee CC, Hsu CJ, Lin C, Lai CL, Hess CP, Compas C, Bhatia D, Oermann EK, Leibovitz E, Sasaki H, Mori H, Yang I, Sohn JH, Murthy KNK, Fu LC, de Mendonça MRF, Fralick M, Kang MK, Adil M, Gangai N, Vateekul P, Elnajjar P, Hickman S, Majumdar S, McLeod SL, Reed S, Gräf S, Harmon S, Kodama T, Puthanakit T, Mazzulli T, de Lavor VL, Rakvongthai Y, Lee YR, Wen Y, Gilbert FJ, Flores MG, Li Q. Federated learning for predicting clinical outcomes in patients with COVID-19.Nat Med. 2021;27:1735-1743.
Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong IC, Rijnbeek PR, van der Lei J, Pratt N, Norén GN, Li YC, Stang PE, Madigan D, Ryan PB. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers.Stud Health Technol Inform. 2015;216:574-578.
Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A, van Wijk Y, Woodruff H, van Soest J, Lustberg T, Roelofs E, van Elmpt W, Dekker A, Mottaghy FM, Wildberger JE, Walsh S. Radiomics: the bridge between medical imaging and personalized medicine.Nat Rev Clin Oncol. 2017;14:749-762.
Dong X, Jia X, Zhang W, Zhang J, Xu H, Xu L, Ma C, Hu H, Luo J, Zhang J, Wang Z, Ji W, Yang D, Yang Z. Interpretable and generalizable deep learning model for preoperative assessment of microvascular invasion and outcome in hepatocellular carcinoma based on MRI: a multicenter study.Insights Imaging. 2025;16:151.
Hsu C, Caussy C, Imajo K, Chen J, Singh S, Kaulback K, Le MD, Hooker J, Tu X, Bettencourt R, Yin M, Sirlin CB, Ehman RL, Nakajima A, Loomba R. Magnetic Resonance vs Transient Elastography Analysis of Patients With Nonalcoholic Fatty Liver Disease: A Systematic Review and Pooled Analysis of Individual Participants.Clin Gastroenterol Hepatol. 2019;17:630-637.e8.
Xu Y, Zhang B, Zhou F, Yi YP, Yang XL, Ouyang X, Hu H. Development of machine learning-based personalized predictive models for risk evaluation of hepatocellular carcinoma in hepatitis B virus-related cirrhosis patients with low levels of serum alpha-fetoprotein.Ann Hepatol. 2024;29:101540.
Yang CJ, Wang CK, Fang YD, Wang JY, Su FC, Tsai HM, Lin YJ, Tsai HW, Yeh LR. Clinical application of mask region-based convolutional neural network for the automatic detection and segmentation of abnormal liver density based on hepatocellular carcinoma computed tomography datasets.PLoS One. 2021;16:e0255605.
Martinez Chanza N, Werner L, Plimack E, Yu EY, Alva AS, Crabb SJ, Powles T, Rosenberg JE, Baniel J, Vaishampayan UN, Berthold DR, Ladoire S, Hussain SA, Milowsky MI, Agarwal N, Necchi A, Pal SK, Sternberg CN, Bellmunt J, Galsky MD, Harshman LC; RISC Investigators. Incidence, Patterns, and Outcomes with Adjuvant Chemotherapy for Residual Disease After Neoadjuvant Chemotherapy in Muscle-invasive Urinary Tract Cancers.Eur Urol Oncol. 2020;3:671-679.
Zhang L, Huang Y, Huang M, Zhao CH, Zhang YJ, Wang Y. Development of Cost-Effective Fatty Liver Disease Prediction Models in a Chinese Population: Statistical and Machine Learning Approaches.JMIR Form Res. 2024;8:e53654.
Cui H, Zeng L, Li R, Li Q, Hong C, Zhu H, Chen L, Liu L, Zou X, Xiao L. Radiomics signature based on CECT for non-invasive prediction of response to anti-PD-1 therapy in patients with hepatocellular carcinoma.Clin Radiol. 2023;78:e37-e44.
Sarvestany SS, Kwong JC, Azhie A, Dong V, Cerocchi O, Ali AF, Karnam RS, Kuriry H, Shengir M, Candido E, Duchen R, Sebastiani G, Patel K, Goldenberg A, Bhat M. Development and validation of an ensemble machine learning framework for detection of all-cause advanced hepatic fibrosis: a retrospective cohort study.Lancet Digit Health. 2022;4:e188-e199.
Shin H, Hur MH, Song BG, Park SY, Kim GA, Choi G, Nam JY, Kim MA, Park Y, Ko Y, Park J, Lee HA, Chung SW, Choi NR, Park MK, Lee YB, Sinn DH, Kim SU, Kim HY, Kim JM, Park SJ, Lee HC, Lee DH, Chung JW, Kim YJ, Yoon JH, Lee JH. AI model using CT-based imaging biomarkers to predict hepatocellular carcinoma in patients with chronic hepatitis B.J Hepatol. 2025;82:1080-1088.
Abràmoff MD, Tarver ME, Loyo-Berrios N, Trujillo S, Char D, Obermeyer Z, Eydelman MB; Foundational Principles of Ophthalmic Imaging and Algorithmic Interpretation Working Group of the Collaborative Community for Ophthalmic Imaging Foundation, Washington, D. C, Maisel WH. Considerations for addressing bias in artificial intelligence for health equity.NPJ Digit Med. 2023;6:170.
Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S, Denniston AK, Faes L, Geerts B, Ibrahim M, Liu X, Mateen BA, Mathur P, McCradden MD, Morgan L, Ordish J, Rogers C, Saria S, Ting DSW, Watkinson P, Weber W, Wheatstone P, McCulloch P; DECIDE-AI expert group. Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI.Nat Med. 2022;28:924-933.
United States Food and Drug Administration. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. 2021. [cited 24 April 2025]. Available from: https://www.fda.gov/media/145022/download.
Parra NS, Ross HM, Khan A, Wu M, Goldberg R, Shah L, Mukhtar S, Beiriger J, Gerber A, Halegoua-DeMarzio D. Advancements in the Diagnosis of Hepatocellular Carcinoma.Int J Transl Med. 2023;3:51-65.
Ginter-Matuszewska B, Adamek A, Majchrzak M, Rozplochowski B, Zientarska A, Kowala-Piaskowska A, Lukasiak P. FibrAIm - The machine learning approach to identify the early stage of liver fibrosis and steatosis.Int J Med Inform. 2025;197:105837.
Lv Y, Ding H, Wu H, Zhao Y, Zhang L. FedRDS: Federated Learning on Non-IID Data via Regularization and Data Sharing.Appl Sci. 2023;13:12962.
Wan L, Liu R, Sun L, Nie H, Wang X. UAV swarm based radar signal sorting via multi-source data fusion: A deep transfer learning framework.Inf Fusion. 2022;78:90-101.