Review
Copyright ©The Author(s) 2026.
World J Gastroenterol. Jan 14, 2026; 32(2): 113059
Published online Jan 14, 2026. doi: 10.3748/wjg.v32.i2.113059
Table 1 Studies that evaluated the ability of artificial intelligence to stage liver fibrosis[40,42-57,110,111]
Ref. | Type of study | Population | Number of patients | AI technique employed | Main results
Ruan et al[43] | Retrospective, multi-center | HBV | 508 | MSTNet DL | High accuracy in detecting both moderate (≥ F2) and advanced (F4) liver fibrosis, outperforming conventional clinical tools (APRI, FIB-4 and Forns) and human sonographers
Song et al[42] | Retrospective, single-center | HBV | 93 | ANNs DL | Excellent predictive capability for staging liver fibrosis, superior to serum fibrosis tests
Zhang et al[44] | Retrospective, multi-center | HBV | 1500 | CNNs DL | High-frequency images outperformed low-frequency ones across all trained CNN models, as well as FIB-4, APRI and SWE, in staging liver fibrosis
Duan et al[45] | Retrospective, two-center | CLD | 434 | GAN model DL | Good performance in staging liver fibrosis. Good predictive accuracy in identifying liver cirrhosis
Miura et al[55] | Retrospective, single-center | CLD | 517 | CNNs DL | Higher diagnostic accuracy than human scoring for detecting significant fibrosis (≥ F2)
Li et al[40] | Prospective, single-center | Chronic HBV infection | 144 | Adaptive boosting, random forest, SVM ML | ML algorithms improve the accuracy of liver fibrosis assessment. Combining conventional radiomics, ORF and CEMF data with ML algorithms enhances accuracy in detecting significant liver fibrosis
Durot et al[46] | Retrospective | CLD or elevated liver enzymes | 204 | SVM ML | SVM ML algorithm demonstrated excellent diagnostic accuracy in distinguishing significant liver fibrosis (≥ F2) when applied to both p-SWE and 2D-SWE data from two different systems, compared with MRE
Gatos et al[47] | Prospective, single-center | 54 healthy patients, 31 with CLD | 85 | ML | Good accuracy in distinguishing healthy individuals from patients with CLD
Gatos et al[48] | Retrospective | 56 healthy patients, 70 with CLD | 126 | ML | Good accuracy in distinguishing healthy individuals from patients with CLD by combining different cluster features
Destrempes et al[49] | Retrospective, cross-sectional | CLD (HBV, HCV, NAFLD, AIH) | 82 | ML | Combining QUS and p-SWE in an ML model enhanced accuracy in staging fibrosis, inflammation and steatosis
Wang et al[51] | Prospective, multi-center | HBV | 398 | DLRE | DLRE outperformed 2D-SWE in detecting cirrhosis and advanced fibrosis. It was more reliable than biomarkers (FIB-4, APRI) in identifying all fibrosis stages
Lu et al[52] | Retrospective, multi-center | CLD | 807 | DLRE2.0 | DLRE2.0 achieved a higher AUC than DLRE for significant fibrosis, but the difference was not statistically significant
Kagadis et al[50] | Retrospective | 88 healthy individuals, 112 with CLD | 200 | GoogLeNet, AlexNet, VGG16, ResNet50, DenseNet201 DL | All pre-trained DL networks achieved good to excellent performance in staging liver fibrosis, outperforming radiologists. ResNet50 and DenseNet201 showed high accuracy across all fibrosis stages
Xue et al[54] | Retrospective | Local liver lesions treated by partial hepatectomy | 466 | Inception-V3 network (DL), TL | Grayscale US images and 2D-SWE images analyzed with Inception-V3 (DL) using TL achieved excellent performance in staging liver fibrosis
Brattain et al[53] | Retrospective | NAFLD | 328 | Random forest, SVM ML; CNN DL | CNN demonstrated the highest performance in classifying liver fibrosis as significant or not
Zhou et al[57] | Retrospective | 94 patients with liver fibrosis; 143 patients with liver fibrosis and liver steatosis | 237 | iANN, DL | Radiomics with iANN-based homodyned-K US imaging outperformed both the standalone iANN method and radiomics on uncompressed US data for liver fibrosis assessment
Park et al[110] | Retrospective, multi-center | Patients who underwent liver biopsy or hepatectomy | 933 | DL (VGGNet, ResNet, DenseNet, EfficientNet, ViT) | Deep CNNs accurately staged liver fibrosis by METAVIR score from B-mode US images. EfficientNet showed the best performance among models
Lee et al[111] | Retrospective, multi-center | Healthy individuals and patients with CLD | 838 | DCNN, DL | DCNN accurately assessed METAVIR score from US images and outperformed radiologists in diagnosing cirrhosis in simulated US examination
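Several studies in Table 1 report performance at a dichotomized threshold, such as significant fibrosis (≥ F2) or F4. As a minimal sketch of how such a threshold turns a five-stage METAVIR staging into sensitivity and specificity, the Python snippet below uses made-up stage labels. The function names and the example data are illustrative assumptions and are not taken from any of the cited studies.

```python
from typing import Sequence, Tuple

# METAVIR fibrosis stages, ordered from no fibrosis (F0) to cirrhosis (F4).
METAVIR_STAGES = ("F0", "F1", "F2", "F3", "F4")


def at_least(stage: str, threshold: str) -> bool:
    """Return True if `stage` is at or above `threshold` on the METAVIR scale."""
    return METAVIR_STAGES.index(stage) >= METAVIR_STAGES.index(threshold)


def sensitivity_specificity(
    reference: Sequence[str], predicted: Sequence[str], threshold: str = "F2"
) -> Tuple[float, float]:
    """Dichotomize both stagings at `threshold` and compare them case by case."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_pos = at_least(ref, threshold)
        pred_pos = at_least(pred, threshold)
        if ref_pos and pred_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: reference (e.g. histology) stages and model-predicted stages.
reference = ["F0", "F1", "F2", "F3", "F4", "F1", "F2", "F0"]
predicted = ["F0", "F2", "F2", "F3", "F3", "F1", "F1", "F0"]

print(sensitivity_specificity(reference, predicted, "F2"))  # (0.75, 0.75)
print(sensitivity_specificity(reference, predicted, "F4"))  # (0.0, 1.0)
```

In this toy example the same predictions score differently depending on the threshold, which is one reason results reported for ≥ F2 and for F4 are not directly comparable.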
Table 2 Studies that evaluated the ability of artificial intelligence to stage liver steatosis[49,68,73-75,112-120]
Ref. | Type of study | Population | Number of patients | AI technique employed | Main results
Fujii et al[112] | Prospective, cross-sectional | MASLD | 486 | DL (U-net) | DL-based segmentation reliably identified the surface irregularity of the liver
Drazinos et al[113] | Retrospective, single-center | MASLD | 112 | DL (Inception-V3, MobileNetV2, ResNet50, DenseNet201 and NASNet mobile) | DenseNet201 achieved the highest overall performance, while Inception-V3 showed superior accuracy in the binary classification of steatosis
Chou et al[114] | Retrospective | Healthy patients and patients with liver steatosis | 2070 | DL | DL models achieved high sensitivity (88.7%) for mild steatosis and consistent accuracy across all grades (normal 91.8%, moderate 77.3%, severe 84.4%)
Vianna et al[115] | Retrospective | Healthy patients and patients with liver steatosis | 199 | DL (VGG16, ResNet50 and Inception-V3) | DL-based analysis of B-mode US images demonstrated diagnostic performance comparable to expert human readers in both the detection and grading of hepatic steatosis
Vianna et al[116] | Retrospective, multi-center | Datasets of patients with suspected hepatic steatosis | Not specified | DL | Diagnostic AUC for steatosis detection increased from 0.78 to 0.97. Test-time adaptation improved the robustness and generalizability of DL models on B-mode US
Cao et al[117] | Prospective, cross-sectional | Healthy patients and patients with liver steatosis | 240 | DL | The methods showed a good ability (AUC > 0.7) to identify steatosis, particularly in distinguishing moderate from severe steatosis (AUC = 0.958)
Han et al[73] | Prospective | Healthy individuals and patients with NAFLD | 204 | CNN DL | Accurate diagnosis of NAFLD and fat quantification using US radiofrequency signals
Byra et al[74] | Prospective | Patients with steatosis and/or obesity | 55 | DL (Inception ResNet-v2) | The AI-based model performed best (AUC = 0.977), outperforming the hepatorenal sonographic index (difference not significant) and the grey-level co-occurrence matrix (significant difference)
Constantinescu et al[75] | Retrospective | Healthy patients and patients with liver steatosis | 60 | DL (Inception-V3 and VGG-16) | DL algorithms demonstrated excellent diagnostic performance, achieving accuracy rates exceeding 90%
Jeon et al[68] | Prospective | Suspected steatosis | 173 | DL | DL algorithm combining QUS parametric maps with B-mode imaging accurately estimated hepatic fat fraction and reliably diagnosed hepatic steatosis
Gómez-Gavara et al[118] | Prospective | Livers from brain-dead donors, evaluated during the procurement phase | 192 livers | ML | Integrating ML with texture and color analysis of smartphone liver images enables highly accurate estimation of hepatic steatosis severity
Santoro et al[119] | Prospective, cross-sectional | Healthy patients and patients with liver steatosis | 134 | ML | AI application enhances both the diagnostic accuracy and efficiency of US in the assessment of hepatic steatosis
Kaffas et al[120] | Retrospective, single-center | Healthy patients and patients with liver steatosis | 403 | DL | The DL algorithm achieved accurate estimation of hepatic fat fraction and reliable diagnosis of hepatic steatosis
Destrempes et al[49] | Prospective | CLD | 82 | ML (random forest) | Random forest integration of QUS and SWE markedly enhanced diagnostic performance vs SWE alone, particularly for steatosis assessment, increasing AUC by 25%-50%
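Many entries in Table 2 summarize performance as an AUC. One common reading of the AUC is the probability that a randomly chosen patient with the condition receives a higher model score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes that quantity directly in Python. The scores are hypothetical and the function name is an assumption for illustration, not the evaluation code of any cited study.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 if the scores tie,
    and 0 otherwise (the Mann-Whitney U statistic divided by the pair count).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs (higher = more likely steatosis).
steatosis_scores = [0.9, 0.8, 0.6, 0.4]
healthy_scores = [0.7, 0.3, 0.2, 0.4]

print(auc(steatosis_scores, healthy_scores))  # 0.84375
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for values such as the AUC > 0.7 and AUC = 0.958 reported above.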
Table 3 Comparative summary of the main artificial intelligence models applied to liver ultrasound, outlining their key features, strengths, limitations, and representative clinical applications
Model type | Main features | Strengths | Limitations | Typical clinical applications
Convolutional neural networks | Deep-learning models extracting hierarchical image features from B-mode or SWE data | High accuracy in fibrosis staging; automatic feature extraction; excellent for large datasets | Require large training datasets; limited interpretability (“black box”) | Fibrosis staging, steatosis grading, lesion detection
Support vector machines | Supervised ML classifier using kernel-based separation of data | Robust for small datasets; interpretable decision boundaries | Lower performance for complex, high-dimensional data | Early fibrosis detection, ML radiomics, feature selection
Random forest | Ensemble ML algorithm combining multiple decision trees | Handles mixed data (imaging + clinical); resistant to overfitting | Limited ability to capture image texture; less suitable for pixel-level analysis | Integration of US features with clinical and laboratory data
Generative adversarial networks | DL models using generator-discriminator structure | Effective for data augmentation; improves synthetic image realism and model generalizability | Computationally demanding; risk of instability during training | Image synthesis, dataset expansion, quality enhancement
Hybrid/multimodal models | Combine DL image-based features with ML classifiers or clinical variables | Capture complementary information; improve diagnostic precision | Require harmonized data and complex implementation | Comprehensive multiparametric liver assessment (fibrosis + steatosis)