Retrospective Study Open Access
Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.
World J Gastroenterol. Mar 21, 2026; 32(11): 116220
Published online Mar 21, 2026. doi: 10.3748/wjg.v32.i11.116220
Explainable machine learning model integrating clinical and radiomic features for predicting acute suppurative cholecystitis
Guo-Dong Chen, Yu-Hua Ge, Department of Radiology, Panjin Liaohe Oilfield Gem Flower Hospital, Panjin 124010, Liaoning Province, China
Bai-Qing Chen, Department of Radiology, The People’s Hospital of Liaoning Province, Shenyang 110067, Liaoning Province, China
Ji-Liang Liu, Department of Ophthalmology, Zigong Fourth People’s Hospital, Zigong 643000, Sichuan Province, China
Kai-Wen Cheng, Shenyang Pharmaceutical University, Shenyang 110016, Liaoning Province, China
Han-Wei Xiao, Beijing Anzhen Nanchong Hospital of Capital Medical University and Nanchong Central Hospital, Nanchong 63700, Sichuan Province, China
Hong-Yu Long, Feng Xie, Department of Interventional Medicine, Jin Qiu Hospital of Liaoning Province, Shenyang 110016, Liaoning Province, China
ORCID number: Bai-Qing Chen (0000-0001-5089-7512); Feng Xie (0009-0005-2480-6492).
Co-first authors: Guo-Dong Chen and Bai-Qing Chen.
Co-corresponding authors: Hong-Yu Long and Feng Xie.
Author contributions: Chen GD and Chen BQ contributed equally to this manuscript; Chen GD and Chen BQ collected the papers and analyzed data, analyzed the conclusions, and drafted the manuscript; Ge YH, Liu JL, Cheng KW, and Xiao HW presented the idea of this paper, reviewed the data and conclusions; Long HY and Xie F analyzed the conclusions, and drafted and revised the manuscript; Xie F and Long HY are the corresponding authors; all authors read and approved the final manuscript.
Institutional review board statement: Institutional review board approval was obtained from the Institutional Review Board of the People’s Hospital of Liaoning Province (No. 2023K047), the Institutional Review Board of Panjin Liaohe Oilfield Gem Flower Hospital (No. LLSC-2025-LW-01), and the Institutional Review Board of Nanchong Central Hospital (No. 2025-125).
Informed consent statement: Given the retrospective design of this study, a waiver of participant informed consent was granted by the Institutional Review Board of the People’s Hospital of Liaoning Province, the Institutional Review Board of Panjin Liaohe Oilfield Gem Flower Hospital, and the Institutional Review Board of Nanchong Central Hospital.
Conflict-of-interest statement: The authors declare that they have no conflict of interest.
Data sharing statement: Anonymized data not presented herein are available from the corresponding author upon reasonable request by any qualified researcher.
Corresponding author: Feng Xie, MD, Doctor, Department of Interventional Medicine, Jin Qiu Hospital of Liaoning Province, No. 317 Xiaonan Road, Shenhe District, Shenyang 110016, Liaoning Province, China. 15040255877@163.com
Received: November 6, 2025
Revised: December 4, 2025
Accepted: January 8, 2026
Published online: March 21, 2026
Processing time: 131 Days and 6.2 Hours

Abstract
BACKGROUND

Acute suppurative cholecystitis (ASC) is a critical stage in the progression of acute cholecystitis. ASC indicates an escalation of local inflammation in the gallbladder from mild to significant. At this stage, the surgical difficulty and mortality of laparoscopic cholecystectomy increase significantly.

AIM

To develop a model integrating clinical characteristics and computed tomography (CT) radiomics features to improve the predictive performance of ASC.

METHODS

Patients diagnosed with acute cholecystitis were retrospectively recruited from three independent centers. Patients were grouped into purulent and non-purulent phases based on the results of percutaneous cholecystostomy or laparoscopic cholecystectomy. A clinical model was established from visual analysis of radiologic features combined with clinical information. Radiomics features were extracted from CT images, and a radiomics model was constructed from these features. A fusion model was then built by using a stacking ensemble strategy to integrate the clinical and radiomics models.

RESULTS

A total of 311 patients were included (mean ± SD age, 66 ± 15 years; 154 men; center 1, training and validation datasets; centers 2 and 3, test dataset; training dataset, n = 150; validation dataset, n = 61; test dataset, n = 100). Model performance was evaluated with the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) were used to assess the importance of radiomics features. In the test dataset, the fusion model predicted ASC better than the clinical model and the radiomics model (AUC = 0.82 vs 0.75 vs 0.76, P < 0.05), with similar specificity (83.1% vs 87.7% vs 73.9%) and higher sensitivity (71.4% vs 45.7% vs 62.9%). In addition, SHAP analysis identified logarithm glszm ZoneEntropy as the main predictor in the radiomics model.

CONCLUSION

The clinical-radiomics model constructed based on the stacking ensemble strategy could significantly improve ASC predictive accuracy.

Key Words: Acute cholecystitis; Suppuration; Computed tomography; Radiomics; SHapley Additive exPlanations value

Core Tip: This multi-center study developed and validated a fusion model to preoperatively predict acute suppurative cholecystitis (ASC). By integrating clinical characteristics and computed tomography radiomics features using a stacking ensemble strategy, the fusion model achieved an area under the receiver operating characteristic curve (AUC) of 0.82 on the external test dataset, significantly outperforming the clinical (AUC = 0.75) and radiomics (AUC = 0.76) models alone. It also showed higher sensitivity (71.4%) while maintaining high specificity (83.1%). The study concludes that this clinical-radiomics model can significantly improve the predictive accuracy for ASC, aiding in better surgical planning and risk assessment.



INTRODUCTION

The purulent phase is a critical stage in the progression of acute cholecystitis (AC). It indicates an escalation of local inflammation in the gallbladder (GB) from mild to significant, and the severity grade also escalates from mild to moderate[1]. The surgical difficulty, rate of conversion to open surgery, and 30-day mortality of laparoscopic cholecystectomy (LC) increase significantly[2-5]. Given this, biliary drainage should be considered a more important treatment option, rather than one reserved for patients who cannot withstand surgery. However, there is a lack of high-level evidence comparing the efficacy of the two approaches in the purulent phase. This is because the diagnosis of acute suppurative cholecystitis (ASC) is usually based on intraoperative observation or biliary drainage, and there is currently a lack of effective noninvasive methods to characterize ASC prior to these interventions.

Ultrasound[6,7], computed tomography (CT)[8], and magnetic resonance imaging[9] can help diagnose the purulent phase, in which pus or purulent bile and pericholecystic abscess are direct manifestations of ASC. However, these manifestations are not specific on any of these imaging modalities. Pus within the GB resembles sludge[9], making it challenging to differentiate, while the appearance of pericholecystic abscess is complex and varied. AC progresses in 3 distinct phases after cystic duct obstruction[10]. The first phase is characterized by inflammation and manifests as GB wall congestion and edema. The second phase is characterized by hemorrhage and necrosis of the GB wall, which may lead to perforation at the site of ischemic gangrene. The third phase is the purulent phase. This indicates that suppuration may sometimes coexist with necrosis and perforation. Therefore, describing the imaging features of ASC is challenging.

Non-enhanced CT is commonly used as the initial diagnostic tool for AC[11,12] and to assess complications of AC[8]. It also serves as the first-line diagnostic option for acute abdominal pain[13-15]. Therefore, improving its efficacy in diagnosing ASC would bring wide-ranging benefits and cost savings. However, even when combined with laboratory parameters, its current diagnostic efficacy remains unsatisfactory[16].

Radiomics is likely to change this by enabling high-throughput mining of quantitative image features from standard-of-care medical imaging, capturing imaging characteristics that are difficult or impossible to characterize by the human eye[17-20]. Thus, this study aimed to evaluate the diagnostic performance of non-enhanced CT radiomics in predicting ASC, using samples obtained during percutaneous cholecystostomy (PC) and LC as a reference standard.

MATERIALS AND METHODS
Study population

This retrospective multicenter study was approved by the institutional review boards of the People’s Hospital of Liaoning Province (No. 2023K047), Panjin Liaohe Oilfield Gem Flower Hospital (No. LLSC-2025-LW-01), and Nanchong Central Hospital (No. 2025-125), and the requirement for informed consent was waived. The training and validation datasets were acquired from center 1, while the test dataset included patients from centers 2 and 3.

AC patients who underwent their first PC or LC between January 2020 and January 2023 were considered for inclusion in this study. The diagnosis of AC relied on clinical manifestations and radiological studies[16]. Figure 1 shows the inclusion and exclusion process. The exclusion criteria were: (1) Concurrent or secondary pancreatitis and pancreatic trauma; (2) Bloody, mucinous, or unclassifiable bile; (3) Lack of CT images and laboratory values within 48 hours before PC or LC; and (4) Poor image quality. A total of 823 patients were initially screened at the three centers. First, 106 patients were excluded due to concurrent or secondary pancreatitis and pancreatic trauma. Next, 20 patients were excluded because their bile appeared bloody, mucinous, or unclassifiable. Subsequently, 304 patients were excluded due to lack of CT images and laboratory values within 48 hours before PC or LC. Finally, 82 patients were excluded due to poor image quality. After these exclusions, 311 patients were ultimately included in the study. Among them, 211 patients from center 1 were randomly divided into the training dataset (n = 150) and the validation dataset (n = 61). The remaining 100 patients from centers 2 (n = 46) and 3 (n = 54) comprised the independent test dataset. The training dataset was used for model construction and all parameter optimization via internal cross-validation. The validation dataset was used to provide an internal, independent assessment of the final selected model. The test dataset from centers 2 and 3 was completely held out during model development and used exclusively as an independent external validation cohort for final performance assessment of the locked model.
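The exclusion cascade and dataset allocation described above can be checked with simple arithmetic. The short Python sketch below only restates the counts reported in the text; the variable names are illustrative.

```python
# Counts are copied from the text; the names are illustrative only.
screened = 823
exclusions = {
    "pancreatitis or pancreatic trauma": 106,
    "bloody, mucinous, or unclassifiable bile": 20,
    "no CT images or laboratory values within 48 h": 304,
    "poor image quality": 82,
}

included = screened - sum(exclusions.values())

# Dataset allocation reported in the text.
center1 = {"training": 150, "validation": 61}
external = {"center 2": 46, "center 3": 54}

print(included)                 # 311
print(sum(center1.values()))    # 211
print(sum(external.values()))   # 100
```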

Figure 1 Flowchart of patient selection and dataset allocation. A total of 823 patients were initially screened from three centers. After applying exclusion criteria, 211 patients from center 1 were allocated into the training (n = 150) and validation (n = 61) sets, while 100 patients from centers 2 and 3 were used as the independent test set. AC: Acute cholecystitis; CT: Computed tomography; PC: Percutaneous cholecystostomy; LC: Laparoscopic cholecystectomy; ASC: Acute suppurative cholecystitis.
Diagnostic criteria for ASC

Interventional radiologists performed PC and observed the bile sample obtained intraoperatively. General surgeons performed LC and observed the intraoperative GB specimens. The diagnostic criterion for ASC was the observation of purulent bile samples and/or pericholecystic abscesses during PC/LC.

Extraction of clinical features

The clinical characteristics included sex, age, body mass index (BMI), and the most recent laboratory indicators and CT imaging features obtained within 48 hours before PC/LC. Detailed CT scan parameters used by each center are provided in Supplementary material (scan parameters of CT). Two radiologists independently documented all radiologic features. The details of the features are described in Supplementary material (radiologic feature analysis of CT). Upon completion, any disagreement on the features of a case was jointly reviewed, and the final classification was made by the consensus of two other senior radiologists. For all image reviewing, radiologists were blinded to clinical information and pathology results.

To prevent model overfitting from excessive variables while capturing key aspects of the systemic inflammatory response, biliary obstruction, and local anatomical changes of the GB, imaging and clinical features were selected based on the core pathophysiological mechanisms of AC and the 2018 Tokyo Guidelines. The final set of features included in the analysis was: age, sex, BMI, white blood cells (WBC), neutrophil granulocytes (NE), alanine aminotransferase, serum total bilirubin, unconjugated bilirubin, stones in the cystic duct or GB neck, GB stones, stratification of bile in the lumen, gas within the GB lumen, necrosis of the GB wall, pericholecystic exudation or fluid, and GB wall thickness.

Radiomics feature extraction

An abdominal radiologist (reader 1) manually drew volume-of-interest (VOI) regions layer by layer around the GB on non-enhanced CT images using 3D Slicer software (version 5.2.2; http://www.slicer.org/). The method is detailed in Supplementary material (image segmentation). After 1 month, 20 patients were randomly selected from the training dataset. Their VOI regions were resegmented by reader 1 and by another radiologist (reader 2) using the same method to construct two resegmentation datasets. Radiomics features were extracted using PyRadiomics software (version 3.0.1; pyradiomics community). Before feature extraction, segmented images were preprocessed to minimize the influence of contrast and brightness variations on texture features: Images were spatially resampled to 3 mm × 3 mm × 3 mm using the SimpleITK sitkNearestNeighbor interpolator; signal intensity values were discretized with a bin width of 25 with relative intensity rescaling. Radiomic features were extracted from both the original images and filtered versions processed with various algorithms, including wavelet (eight directions), logarithm, square, local binary pattern-3D (three variants), gradient, exponential, and square root filters. Feature categories included first-order statistics, shape features (extracted only from the original images), and texture features. A total of 1595 radiomic features were generated per patient. Intra- and interobserver reproducibility was evaluated using correlation coefficients. Although some features showed low correlation coefficients, they were retained due to their potential biological relevance. The entire feature extraction workflow is illustrated in Figure 2.
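To make the discretization step concrete, the following minimal sketch applies the fixed-bin-width rule described in the PyRadiomics documentation (bin index = floor(x / W) - floor(min / W) + 1) with W = 25. The function name and the attenuation values are hypothetical, and the sketch does not reproduce the resampling or intensity rescaling steps.

```python
import math

def discretize_fixed_bin_width(values, bin_width=25):
    """Map intensities to 1-based gray-level bins of constant width."""
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]

# Hypothetical CT attenuation values (HU) inside a gallbladder VOI.
voi_hu = [-12, 3, 18, 24, 26, 49, 51, 80]
print(discretize_fixed_bin_width(voi_hu))  # [1, 2, 2, 2, 3, 3, 4, 5]
```

A fixed bin width keeps the gray-level spacing constant across patients, so texture matrices computed from different scans remain comparable.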

Figure 2 Workflow of model development. Radiomic features were extracted using PyRadiomics after manual segmentation. Feature selection involved t-test, Pearson correlation filtering, and least absolute shrinkage and selection operator regression. Radiomics and clinical models were constructed using logistic regression. A stacking strategy was used to integrate outputs from both models into a fusion model. WBC: White blood cells; GB: Gallbladder; STB: Serum total bilirubin; NE: Neutrophil granulocytes; UCB: Unconjugated bilirubin; LASSO: Least absolute shrinkage and selection operator; SHAP: SHapley Additive exPlanations.
Clinical model construction

After standardizing continuous variables in the training dataset (Z-score), we used a univariate t-test or rank sum test for continuous variables and a χ2 test for categorical variables to preliminarily screen for statistically significant variables (P ≤ 0.05). The screened variables were then entered into a multivariate logistic regression model, and forward stepwise logistic regression was used to determine the variables for model construction.
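As an illustration of the Z-score step, the sketch below estimates the mean and standard deviation on the training data and reuses them for other cohorts. Reusing training statistics and using the population standard deviation are assumptions, since the text does not specify either; the values are hypothetical and are not patient data.

```python
from statistics import fmean, pstdev

def fit_zscore(train_values):
    """Return the mean and population SD estimated on the training set only."""
    return fmean(train_values), pstdev(train_values)

def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical neutrophil percentages; not patient data.
ne_train = [72.0, 80.5, 84.0, 88.5, 91.0, 93.5]
ne_test = [70.0, 86.0, 95.0]

mu, sigma = fit_zscore(ne_train)
ne_train_z = apply_zscore(ne_train, mu, sigma)
ne_test_z = apply_zscore(ne_test, mu, sigma)  # assumption: reuse training statistics
```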

Radiomics model construction

The radiomic features extracted from the training dataset were standardized (Z-score) and preliminarily screened using a univariate t-test (P < 0.01) to remove insignificant variables. Then, redundant features (|ρ| ≥ 0.9) were removed using Spearman correlation analysis. Finally, key radiomic features for model construction were selected using least absolute shrinkage and selection operator (LASSO) regression, with 5-fold cross-validation and the area under the receiver operating characteristic curve (AUC) as the performance metric to select the optimal regularization parameter. SHapley Additive exPlanations (SHAP) values were subsequently calculated based on the final LASSO-logistic regression model to interpret the contribution of each selected feature to model predictions.
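The redundancy filter can be sketched as a greedy pass over the features that drops any feature whose absolute Spearman correlation with an already retained feature reaches 0.9. The greedy order (keep the first feature of a correlated pair) is an assumption, since the text does not state which member of a pair was kept; the feature names and values are hypothetical.

```python
def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |rho| < threshold with every feature kept so far."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns (one value per patient).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [1.1, 2.3, 2.9, 4.2, 5.1, 6.4],  # monotone in f1, so rho = 1
    "f3": [3.0, 1.0, 6.0, 2.0, 5.0, 4.0],
}
print(drop_redundant(features))  # ['f1', 'f3']
```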

Fusion model construction

This study employs a stacking ensemble strategy, using the clinical model and the radiomics model as the base learners. The construction methods for the two base learners mirror those detailed in the preceding clinical model and radiomics model construction sections. Both base learners utilize logistic regression and output probabilities. To train the meta-learner, we implemented a 5-fold out-of-fold (OOF) prediction strategy on the training cohort. This strategy requires that, in each iteration of the 5-fold cross-validation, the base learners follow their respective model construction procedures and generate unbiased predictions only for the data subset left out of training in the current fold. These OOF predicted probabilities are then concatenated to form the complete fused feature matrix. The secondary model (meta-learner) is a logistic regression model, which learns how to optimally combine the predictions of the base models by fitting the OOF fused feature matrix. Finally, the fused model is saved for subsequent validation and application.
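The out-of-fold stacking procedure can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the data are synthetic, the logistic regression is fitted by basic gradient descent, the folds are not stratified, and the per-fold feature selection described above is omitted. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def oof_predictions(X, y, n_folds=5, seed=0):
    """Predict each sample with a model that did not see it during fitting."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    oof = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict_proba(model, [X[i] for i in held_out])):
            oof[i] = p
    return oof

# Synthetic stand-ins for the clinical and radiomics feature blocks.
rng = random.Random(42)
y = [int(rng.random() < 0.4) for _ in range(150)]
X_clin = [[yi + rng.gauss(0, 1.0)] for yi in y]
X_rad = [[yi + rng.gauss(0, 1.0), rng.gauss(0, 1.0)] for yi in y]

# Level 0: out-of-fold probabilities from each base learner.
meta_X = [list(p) for p in zip(oof_predictions(X_clin, y), oof_predictions(X_rad, y))]
# Level 1: logistic regression meta-learner fitted on the OOF matrix.
meta_model = fit_logistic(meta_X, y)
# For later application, the base learners are refitted on the full training set.
base_clin, base_rad = fit_logistic(X_clin, y), fit_logistic(X_rad, y)

def fusion_predict(x_clin, x_rad):
    """Fusion probability for one new patient."""
    p = [predict_proba(base_clin, [x_clin])[0], predict_proba(base_rad, [x_rad])[0]]
    return predict_proba(meta_model, [p])[0]
```

Because every row of the meta-learner's input comes from a base model that never saw that patient, the meta-learner is not fitted to overly optimistic in-sample probabilities.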

Statistical analysis

Statistical analysis was conducted using R software (version 3.6.3; R Foundation for Statistical Computing), SPSS Statistics (version 24.0; IBM), and Python (version 3.10.9; Python Software Foundation). Continuous variables that followed a normal distribution were analyzed using independent samples t-tests, while those that did not were analyzed using Mann-Whitney U tests. Categorical variables were analyzed using χ2 tests or Fisher’s exact tests. The association between continuous variables was assessed using Spearman’s rank correlation coefficient. Model performance was evaluated using the AUC and decision curve analysis. To enhance the stability and accuracy of the tests, model comparisons were performed using DeLong’s test based on the predicted probability distributions. These distributions were obtained from 2000 bootstrap resamples, which were used solely to estimate confidence intervals and assess the stability of performance metrics, not to train the models themselves. In addition, the calibration of all three models was assessed across all cohorts using calibration plots and the Brier score to evaluate the agreement between predicted probabilities and observed outcomes.
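The AUC, the Brier score, and a bootstrap confidence interval can be illustrated with the minimal sketch below. The percentile method for the interval is an assumption, as the text does not state how the intervals were derived, and the labels and probabilities are hypothetical.

```python
import random

def auc(y_true, y_prob):
    """Probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(y_true, y_prob, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples with a single class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # AUC needs both classes
            stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.90, 0.30, 0.45]

print(round(auc(y_true, y_prob), 2))    # 0.92
print(round(brier(y_true, y_prob), 2))  # 0.14
ci_low, ci_high = bootstrap_ci(y_true, y_prob, auc)
```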

RESULTS
Patient characteristics

A total of 311 AC patients were included (Table 1 and Figure 1), of whom 114 (36.7%) were diagnosed with ASC. In the training, validation, and test datasets, 60 cases (40.0%), 19 cases (31.1%), and 35 cases (35.0%) were diagnosed with ASC, respectively. With the exception of stratification of bile in the lumen (P = 0.046), no statistically significant differences in clinical characteristics were observed among the datasets.

Table 1 The demographic data and radiologic features of 311 patients, n (%).
| Variable | Training set (n = 150) | Validation set (n = 61) | Test set (n = 100) | P value |
|---|---|---|---|---|
| Age (years), median (IQR) | 67.00 (56.50, 78.00) | 69.00 (57.50, 78.50) | 68.00 (55.25, 78.00) | 0.883 |
| Sex (male) | 76 (50.7) | 30 (49.2) | 48 (48.0) | 0.917 |
| WBC (× 10⁹/L), median (IQR) | 10.36 (7.50, 14.72) | 9.85 (6.55, 14.84) | 9.95 (6.66, 13.34) | 0.544 |
| NE (%), median (IQR) | 84.75 (72.35, 90.83) | 83.40 (70.00, 88.35) | 82.40 (71.42, 90.92) | 0.309 |
| ALT (U/L), median (IQR) | 29.40 (16.30, 53.58) | 25.00 (16.50, 42.60) | 25.40 (15.65, 50.75) | 0.406 |
| STB (μmol/L), median (IQR) | 20.35 (13.20, 32.58) | 17.20 (11.40, 27.00) | 21.15 (14.45, 35.00) | 0.120 |
| UCB (μmol/L), median (IQR) | 13.05 (9.20, 20.05) | 11.00 (6.70, 16.80) | 13.45 (7.60, 20.38) | 0.065 |
| GB wall thickness (mm), median (IQR) | 3.20 (2.60, 4.20) | 3.00 (2.30, 3.85) | 3.20 (2.33, 4.00) | 0.390 |
| GB stones | 93 (62.0) | 36 (59.0) | 65 (65.0) | 0.799¹ |
| Cystic duct or GB neck stones | 65 (43.3) | 23 (37.7) | 40 (40.0) | 0.723 |
| Stratification of bile in the lumen | 17 (11.3) | 5 (8.2) | 3 (3.0) | 0.046¹ |
| Gas within the GB lumen | 4 (2.7) | 0 (0.0) | 6 (6.0) | 0.093 |
| Necrosis of the GB wall | 31 (20.7) | 14 (23.0) | 19 (19.0) | 0.834 |
| Pericholecystic exudation or fluid | 66 (44.0) | 25 (41.0) | 32 (32.0) | 0.159 |
| Pus | 60 (40) | 19 (31.14) | 35 (35) | 0.441 |
Clinical model

Univariate analysis of 14 clinical characteristics (Table 2) identified 7 predictors significantly associated with ASC: age, WBC, NE, GB wall thickness, gas within the GB lumen, necrosis of the GB wall, and pericholecystic exudation or fluid. A multivariate logistic regression model was constructed via forward stepwise selection (significance level α = 0.05), ultimately including NE [odds ratio (OR) = 2.456, 95% confidence interval (CI): 1.520-3.969; P < 0.001] and necrosis of the GB wall (OR = 5.255, 95%CI: 2.091-13.206; P < 0.001) as independent predictive factors. The model achieved AUC values of 0.784, 0.745, and 0.746 in the training, validation, and test datasets, respectively (Figure 3).
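The relationship between a logistic regression coefficient and the reported OR can be illustrated as follows. The sketch assumes a Wald-type interval, OR = exp(β) with limits exp(β ± 1.96 × SE), and back-calculates β and SE from the published values for NE; these derived quantities are not reported in the text. Because continuous variables were Z-scored, the OR for NE presumably corresponds to a one standard deviation increase.

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Reported OR and 95%CI for NE; beta and SE are back-calculated assumptions.
or_ne, lo_ne, hi_ne = 2.456, 1.520, 3.969
beta_ne = math.log(or_ne)
se_ne = (math.log(hi_ne) - math.log(lo_ne)) / (2 * 1.96)

print(round(beta_ne, 3), round(se_ne, 3))
print(tuple(round(v, 3) for v in or_with_ci(beta_ne, se_ne)))
```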

Figure 3 Receiver operating characteristic curves, decision curve analysis plots, and calibration analysis for the three models in the training, validation, and test cohorts. A: Panels illustrate the diagnostic performance (area under the receiver operating characteristic curve) of the clinical model, radiomics model, and fusion model across different cohorts; B: Panels compare the net benefit of each model across the cohorts; C: Panels show the agreement between predicted probabilities and observed outcomes for each model. The fusion model consistently outperformed the individual models in all cohorts, indicating the complementary value of clinical and radiomics features. AUC: Area under the receiver operating characteristic curve.
Table 2 Univariate and multivariate analysis of clinical features, n (%).
| Variable | ASC (n = 60) | Non-ASC (n = 90) | Univariate P value | Multivariate P value | Multivariate OR (95%CI) |
|---|---|---|---|---|---|
| Age (years), median (IQR) | 70.5 (59.00, 81.50) | 65.50 (50.00, 75.25) | 0.041 | | |
| Sex (male) | 34 (56.7) | 42 (46.7) | 0.23 | | |
| BMI, median (IQR) | 24.17 (21.38, 27.13) | 24.00 (21.59, 27.08) | 0.844 | | |
| WBC (× 10⁹/L), median (IQR) | 11.77 (8.31, 15.65) | 9.44 (6.77, 14.19) | 0.016 | | |
| NE (%), median (IQR) | 90.35 (81.33, 93.50) | 80.60 (66.70, 88.75) | < 0.001 | < 0.001 | 2.456 (1.520-3.969) |
| ALT (U/L), median (IQR) | 28.00 (16.00, 60.03) | 30.25 (19.48, 50.60) | 0.524 | | |
| STB (μmol/L), median (IQR) | 23.00 (14.30, 37.83) | 18.65 (12.70, 27.80) | 0.107 | | |
| UCB (μmol/L), median (IQR) | 13.60 (9.33, 22.88) | 12.60 (9.00, 19.83) | 0.322 | | |
| GB wall thickness (mm), median (IQR) | 3.65 (2.70, 4.50) | 3.10 (2.43, 3.85) | 0.017 | | |
| GB stones | 41 (68.3) | 55 (61.1) | 0.367 | | |
| Cystic duct or GB neck stones | 29 (48.3) | 36 (40) | 0.313 | | |
| Stratification of bile in the lumen | 4 (6.7) | 13 (14.4) | 0.141 | | |
| Gas within the GB lumen | 4 (6.7) | 0 (0.0) | 0.024¹ | | |
| Necrosis of the GB wall | 22 (36.7) | 9 (10) | < 0.001 | < 0.001 | 5.255 (2.091-13.206) |
| Pericholecystic exudation or fluid | 38 (63.3) | 28 (31.1) | < 0.001 | | |
| Time between CT and intervention (days) | 0 (0-1) | 0 (0-1) | 0.672 | | |
Construction and interpretability of a radiomics model

A total of 1595 radiomics features were initially extracted from non-contrast CT images of the GB. Following initial screening using independent samples t-tests, 333 features were retained. Redundant features were subsequently removed using Pearson correlation analysis, reducing the feature set to 42. LASSO regression was then applied to select 11 optimal radiomics features. The pairwise correlations among these selected features were all below 0.7 (Supplementary Figure 2). The radiomics model achieved AUC values of 0.804, 0.781, and 0.763 in the training, validation, and test datasets, respectively (Figure 3A).

To elucidate the contribution of individual features to model predictions, the SHAP algorithm was applied, and a SHAP beeswarm plot was constructed (Figure 4). The results revealed that the two most influential features were logarithm glszm ZoneEntropy and wavelet-LLH gldm dependence nonuniformity normalized, both exhibiting positive SHAP values, suggesting a positive association with the risk of ASC. In contrast, features such as square root glszm size zone nonuniformity and lbp-3D-k glszm gray level nonuniformity exhibited negative SHAP values, indicating a potential association with decreased risk.

Figure 4 The SHapley Additive exPlanations beeswarm plot of the radiomics model. The X-axis shows SHapley Additive exPlanations values representing the magnitude and direction of feature contributions. Each dot is a sample, colored by the feature value (red high, blue low), helping interpret feature importance and effects. All features used in the radiomics model are derived from non-contrast computed tomography images. The features include: Original first-order and shape features, texture features from gray level size zone matrix, gray level dependence matrix, gray level run length matrix, neighboring gray tone difference matrix, and local binary pattern families, and filtered/transformed features using logarithm, gradient, wavelet, or square root operations. The Y-axis is limited to the range (-2, 2) to focus on the main impact range and improve plot readability by reducing the influence of extreme values. SHAP: SHapley Additive exPlanations.

Figure 5 presents four representative cases and their corresponding SHAP force plots, illustrating how individual features contribute positively or negatively to the prediction outcome. The baseline value in each plot represents the probability from the baseline model, while f(x) denotes the final predicted probability.
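For a logistic regression model, the additive structure of a force plot can be illustrated with a short sketch. It assumes a linear explainer with independent features working in log-odds space, where each SHAP value equals the coefficient times the feature's deviation from its mean; the text does not state which explainer or output scale was used. The coefficients and patient values are hypothetical.

```python
import math

def linear_shap(weights, x, feature_means):
    """SHAP values of a linear model under feature independence."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, feature_means)]

# Hypothetical coefficients for three standardized radiomic features.
weights = [0.9, 0.6, -0.5]
bias = -0.4
means = [0.0, 0.0, 0.0]  # Z-scored features have mean 0 in the training set
x = [1.2, -0.3, 0.8]     # one hypothetical patient

phi = linear_shap(weights, x, means)
base_value = bias + sum(w * m for w, m in zip(weights, means))  # expected log-odds
log_odds = base_value + sum(phi)       # baseline plus contributions gives f(x)
prob = 1 / (1 + math.exp(-log_odds))   # converted to a probability
```

Positive entries in `phi` push the prediction toward ASC and negative entries push it away, which corresponds to the red and blue segments of a force plot.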

Figure 5 The SHapley Additive exPlanations force plots. Red features indicate an increased risk of acute suppurative cholecystitis (ASC), while blue features indicate a decreased risk. A: For a patient with ASC, the model predicts a 97.5% probability of a positive result; B: For a patient without ASC, the model predicts an 84.1% probability of a negative result; C: For a patient with ASC, the model predicts a 58.4% probability of a negative result; D: For a patient without ASC, the model predicts a 54.8% probability of a positive result.
Fusion model

The fusion model was constructed by integrating the prediction probabilities from both the clinical model and the radiomics model using logistic regression. The AUC values for the training, validation, and test datasets were 0.848, 0.840, and 0.826, respectively (Figure 3A). In instances where the two models yielded conflicting predictions (e.g., high probability from the clinical model but low probability from the radiomics model), the fusion model leveraged dynamic weight allocation to mitigate misclassifications. This adaptive integration enhanced the robustness and accuracy of the overall predictive performance (Figure 6).

Figure 6 Visualization of the fusion model’s predicted probabilities in the training set based on the predicted probabilities from the clinical model and the radiomics model. Each point represents a sample, with the X-axis indicating the probability predicted by the clinical model and the Y-axis indicating the probability predicted by the radiomics model. The color gradient reflects the predicted probability from the fusion model, which integrates both modalities, with blue indicating lower probability and red indicating higher probability. Circle and square markers represent negative and positive ground-truth labels, respectively.
Comparison of diagnostic performance between the clinical model, the radiomics model, and the fusion model

Compared with the clinical and radiomics models, the fusion model consistently achieved higher AUC values across all datasets (Figure 3A), with statistically significant differences observed in the training and test datasets but not in the validation dataset (Table 3). Decision curve analysis further demonstrated that the fusion model yielded a greater net benefit than either individual model across almost the entire range of threshold probabilities in the training cohort (Figure 3B). In the validation cohort, the fusion model showed superior net benefit primarily within the 0.2-0.8 threshold range, although the advantage over the other models was minimal in the 0.4-0.6 interval. In the test cohort, the fusion model provided the highest net benefit in the clinically relevant 0.4-0.7 threshold range. Calibration analysis using calibration plots and Brier scores indicated that the fusion model also exhibited better agreement between predicted probabilities and observed outcomes across the training, validation, and test cohorts (Supplementary Table 1 and Figure 3C). Taking into account multiple evaluation metrics, including sensitivity, specificity, and overall accuracy, as well as generalizability, the fusion model demonstrated the best performance among the three models.
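The net benefit plotted in a decision curve can be computed with the standard formula NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below is a minimal illustration on hypothetical labels and probabilities, not a reproduction of the study's curves.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in which every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.90, 0.30, 0.45]

print(round(net_benefit(y_true, y_prob, 0.5), 2))        # 0.2
print(round(net_benefit_treat_all(y_true, 0.5), 2))      # 0.0
```

A model is useful at a given threshold only when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.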

Table 3 Diagnostic performance of the clinical model, radiomics model and fusion model for predicting acute suppurative cholecystitis.
| Model | Dataset | AUC (95%CI) | Sensitivity (%) | Specificity (%) | Accuracy (%) | DeLong test P vs clinical | DeLong test P vs radiomics |
|---|---|---|---|---|---|---|---|
| Clinical | Training | 0.7841 (0.7079-0.8557) | 53.3 | 86.7 | 73.3 | | |
| Radiomics | Training | 0.8043 (0.7224-0.8724) | 63.3 | 85.5 | 76.7 | 0.686 | |
| Fusion | Training | 0.8478 (0.7773-0.9070) | 65.0 | 85.6 | 77.3 | 0.046 | 0.039 |
| Clinical | Validation | 0.7450 (0.5915-0.8800) | 57.9 | 92.9 | 82.0 | | |
| Radiomics | Validation | 0.7807 (0.6405-0.9021) | 63.2 | 83.3 | 77.1 | 0.713 | |
| Fusion | Validation | 0.8396 (0.7214-0.9385) | 63.2 | 90.5 | 82.0 | 0.140 | 0.192 |
| Clinical | Test | 0.7459 (0.6345-0.8497) | 45.7 | 87.7 | 73.0 | | |
| Radiomics | Test | 0.7631 (0.6658-0.8515) | 62.9 | 73.9 | 70.0 | 0.794 | |
| Fusion | Test | 0.8264 (0.7327-0.9063) | 71.4 | 83.1 | 79.0 | 0.049 | 0.047 |
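The sensitivity, specificity, and accuracy in Table 3 follow directly from a confusion matrix. As a worked example, the sketch below uses the test dataset composition reported in the text (35 ASC and 65 non-ASC patients); the individual cell counts are back-calculated assumptions chosen to match the fusion model's reported percentages and are not reported values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Test dataset: 35 ASC and 65 non-ASC patients (reported).
# The cell counts are back-calculated assumptions, not reported values.
fusion_test = diagnostic_metrics(tp=25, fn=10, tn=54, fp=11)
print({k: round(100 * v, 1) for k, v in fusion_test.items()})
# {'sensitivity': 71.4, 'specificity': 83.1, 'accuracy': 79.0}
```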
DISCUSSION

Our study aimed to evaluate the value of radiomics in improving the diagnostic performance of non-enhanced CT for ASC. This study first constructed a clinical model incorporating laboratory indicators and radiologic features, and then independently developed a machine learning model based on CT radiomics features. In the test dataset, both base models demonstrated moderate diagnostic performance with no significant difference (P = 0.80). The further constructed fusion model, by integrating the two types of features, significantly improved diagnostic performance compared to single models (P < 0.05). This indicates that combining radiomics features with conventional clinical indicators can effectively enhance the diagnostic efficacy of non-enhanced CT for ASC, suggesting that radiomics features contain important incremental diagnostic information. Moreover, in the test cohort, decision curve analysis showed that the fusion model provided a clear net benefit over both the clinical and radiomics models. This benefit was most pronounced within the clinically relevant 0.4-0.7 threshold probability range, highlighting its usefulness for guiding decisions in patients with intermediate risk.

In this study, we systematically collected two types of data to construct the clinical model. One type comprised conventional demographic characteristics and laboratory indicators. The other comprised CT imaging manifestations collected based on the pathological development patterns of AC. These imaging features were classified into two categories: (1) Etiology-related indicators (e.g., GB stones, cystic duct stones); and (2) Disease severity assessment indicators (e.g., pericholecystic exudation, necrosis of the GB wall, which directly reflect the severity of inflammation). The study design deliberately incorporated the phased nature of AC disease progression. The pathological process of AC indicates that cystic duct obstruction and the passage of time are determining factors in the progression of AC to the purulent phase[10]. However, in clinical practice, there are significant individual differences in the duration of illness reported by patients, especially in the elderly population, where the correlation between symptoms and actual disease duration is weak. Therefore, this study did not collect the subjective onset time of symptoms as a variable. Without accurate timing, the value of etiological indicators such as cystic duct obstruction may also be significantly reduced. The univariate analysis supported this, with no statistically significant association found between ASC and etiological indicators such as GB stones (P > 0.05). The clinical model included NE (OR = 2.456, 95%CI: 1.520-3.969; P < 0.001) and necrosis of the GB wall (OR = 5.255, 95%CI: 2.091-13.206; P < 0.001). Both features directly reflect the degree of inflammation, with the former reflecting systemic inflammation and the latter reflecting local inflammation.

The advantage of radiologic features that directly reflect the severity of inflammation is that they more accurately reveal the role of temporal factors. In a previous study, CT performed within 48 hours before PC showed a higher proportion of pericholecystic exudate or effusion and thicker GB walls than CT performed more than 48 hours before PC[16]. This is also the basis for excluding patients lacking CT within 48 hours before PC/LC in this study. Even so, there was an interval of about 24 hours between CT and PC/LC. This delay may be an important factor preventing CT from reaching its full potential.

To further enhance the diagnostic utility of non-contrast CT in AC, we developed a radiomics-based model. Given that radiomics models are often perceived as “black boxes” in clinical decision-making, we applied the SHAP method to conduct interpretability analysis. By visualizing both global and individual SHAP values, we quantitatively assessed the contribution of each feature to the model’s predictions[21]. As illustrated in the SHAP beeswarm plot, the most influential features included logarithm glszm ZoneEntropy, wavelet-LLH gldm dependence nonuniformity normalized, and lbp-3D-k first order 10Percentile. Features derived from the logarithm and wavelet domains effectively captured high- and low-frequency information in the grayscale texture of CT images, thereby revealing subtle heterogeneity caused by inflammatory changes. SHAP values for logarithm glszm ZoneEntropy and wavelet-LLH gldm dependence nonuniformity normalized were predominantly positive, indicating a strong association with increased risk of ASC. In contrast, features such as squareroot glszm size zone nonuniformity and lbp-3D-k glszm gray level nonuniformity showed negative SHAP contributions, indicating that they shifted predictions toward lower risk and may help identify low-risk cases. Interestingly, original shape maximum two-dimensional diameter row was the only shape feature retained, suggesting that two-dimensional GB enlargement may reflect morphological alterations during suppuration. Overall, radiomics features, with their quantitative, multidimensional, and multiscale imaging representations, enable identification of microstructural changes often undetectable by conventional imaging, thus offering substantial complementary value in clinical diagnosis[19,22].
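The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. This sketch is not the SHAP library and not the model used in this study: the three features, the scoring function, and its coefficients are hypothetical, and "absent" features are simply replaced by a baseline value, which is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical risk score with one interaction term (coefficients invented)."""
    return 0.8 * z[0] + 0.5 * z[1] - 0.6 * z[2] + 0.3 * z[0] * z[1]


x = [1.0, 2.0, 1.0]          # hypothetical standardized feature values for one patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference patient

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(("texture_a", "texture_b", "texture_c"), phi):
    print(f"{name}: {contribution:+.2f}")
print("sum of contributions:", round(sum(phi), 6))
print("score minus baseline score:", toy_model(x) - toy_model(baseline))
```

The contributions add up to the difference between the patient's score and the baseline score, which is the additivity property that allows positive and negative SHAP values to be read as pushes toward higher or lower predicted risk.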

Although the clinical and radiomics models showed similar performance (AUC: 0.746 vs 0.763), they are complementary. This is because they focus on different biological levels: The former reflects systemic and local inflammatory responses, while the latter captures local microstructural alterations in the GB. This intrinsic complementarity motivated the construction of a fusion model, which achieved superior diagnostic performance (AUC = 0.826) on the test cohort compared with either single model. This improvement underscores the added value of multimodal data integration in enhancing the diagnostic potential of non-contrast CT for ASC. Furthermore, the incorporation of a dynamic weighting mechanism allowed the fusion model to resolve inconsistencies between the two individual models (e.g., high clinical probability but low radiomics probability), thereby mitigating potential misclassifications. These findings highlight the synergistic and non-redundant contributions of clinical and radiomics features in the diagnosis of ASC.
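The complementarity argument can be made concrete with a small sketch. The AUC is computed from its rank-based definition (the probability that a positive case scores higher than a negative case). The fusion step uses equal-weight averaging of the two probabilities purely as a stand-in, because the Stacking meta-learner and the dynamic weighting mechanism are not specified in this section; all labels and probabilities are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = ASC) and base-model probabilities.
y_true   = [1,   1,   1,   1,   0,   0,   0,   0]
clinical = [0.9, 0.8, 0.3, 0.6, 0.5, 0.2, 0.7, 0.1]
radiomic = [0.4, 0.7, 0.9, 0.8, 0.6, 0.5, 0.2, 0.3]

# Stand-in fusion: equal-weight average (NOT the study's Stacking meta-learner).
fused = [0.5 * c + 0.5 * r for c, r in zip(clinical, radiomic)]

print("clinical AUC:", roc_auc(y_true, clinical))
print("radiomics AUC:", roc_auc(y_true, radiomic))
print("fused AUC:", roc_auc(y_true, fused))
```

In this toy data, the first patient has a high clinical but low radiomics probability and the third patient shows the opposite pattern; each base model misranks cases that the other ranks correctly, so the combined score separates the classes better than either model alone.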

This study has several limitations. First, scanning protocols and baseline characteristics differed among centers, which may have affected the radiomics features. Second, because laboratory testing items differed among these centers, this study was unable to include some laboratory indicators with potential diagnostic value, such as C-reactive protein.

CONCLUSION

In conclusion, our study showed that a fusion model integrating the clinical model and the radiomics model through a Stacking ensemble strategy could accurately predict ASC.

References
1. Yokoe M, Hata J, Takada T, Strasberg SM, Asbun HJ, Wakabayashi G, Kozaka K, Endo I, Deziel DJ, Miura F, Okamoto K, Hwang TL, Huang WS, Ker CG, Chen MF, Han HS, Yoon YS, Choi IS, Yoon DS, Noguchi Y, Shikata S, Ukai T, Higuchi R, Gabata T, Mori Y, Iwashita Y, Hibi T, Jagannath P, Jonas E, Liau KH, Dervenis C, Gouma DJ, Cherqui D, Belli G, Garden OJ, Giménez ME, de Santibañes E, Suzuki K, Umezawa A, Supe AN, Pitt HA, Singh H, Chan ACW, Lau WY, Teoh AYB, Honda G, Sugioka A, Asai K, Gomi H, Itoi T, Kiriyama S, Yoshida M, Mayumi T, Matsumura N, Tokumura H, Kitano S, Hirata K, Inui K, Sumiyama Y, Yamamoto M. Tokyo Guidelines 2018: diagnostic criteria and severity grading of acute cholecystitis (with videos). J Hepatobiliary Pancreat Sci. 2018;25:41-54.
2. Okamoto K, Suzuki K, Takada T, Strasberg SM, Asbun HJ, Endo I, Iwashita Y, Hibi T, Pitt HA, Umezawa A, Asai K, Han HS, Hwang TL, Mori Y, Yoon YS, Huang WS, Belli G, Dervenis C, Yokoe M, Kiriyama S, Itoi T, Jagannath P, Garden OJ, Miura F, Nakamura M, Horiguchi A, Wakabayashi G, Cherqui D, de Santibañes E, Shikata S, Noguchi Y, Ukai T, Higuchi R, Wada K, Honda G, Supe AN, Yoshida M, Mayumi T, Gouma DJ, Deziel DJ, Liau KH, Chen MF, Shibao K, Liu KH, Su CH, Chan ACW, Yoon DS, Choi IS, Jonas E, Chen XP, Fan ST, Ker CG, Giménez ME, Kitano S, Inomata M, Hirata K, Inui K, Sumiyama Y, Yamamoto M. Tokyo Guidelines 2018: flowchart for the management of acute cholecystitis. J Hepatobiliary Pancreat Sci. 2018;25:55-72.
3. Ambe PC, Jansen S, Macher-Heidrich S, Zirngibl H. Surgical management of empyematous cholecystitis: a register study of over 12,000 cases from a regional quality control database in Germany. Surg Endosc. 2016;30:5319-5324.
4. Griffiths EA, Hodson J, Vohra RS, Marriott P; CholeS Study Group, Katbeh T, Zino S, Nassar AHM; West Midlands Research Collaborative. Utilisation of an operative difficulty grading scale for laparoscopic cholecystectomy. Surg Endosc. 2019;33:110-121.
5. Nugent JP, Li J, Pang E, Harris A. What's new in the hot gallbladder: the evolving radiologic diagnosis and management of acute cholecystitis. Abdom Radiol (NY). 2023;48:31-46.
6. Charalel RA, Jeffrey RB, Shin LK. Complicated cholecystitis: the complementary roles of sonography and computed tomography. Ultrasound Q. 2011;27:161-170.
7. Sagrini E, Pecorelli A, Pettinari I, Cucchetti A, Stefanini F, Bolondi L, Piscaglia F. Contrast-enhanced ultrasonography to diagnose complicated acute cholecystitis. Intern Emerg Med. 2016;11:19-30.
8. Shakespear JS, Shaaban AM, Rezvani M. CT findings of acute cholecystitis and its complications. AJR Am J Roentgenol. 2010;194:1523-1529.
9. Watanabe Y, Nagayama M, Okumura A, Amoh Y, Katsube T, Suga T, Koyama S, Nakatani K, Dodo Y. MR imaging of acute biliary disorders. Radiographics. 2007;27:477-495.
10. Gallaher JR, Charles A. Acute Cholecystitis: A Review. JAMA. 2022;327:965-975.
11. Wertz JR, Lopez JM, Olson D, Thompson WM. Comparing the Diagnostic Accuracy of Ultrasound and CT in Evaluating Acute Cholecystitis. AJR Am J Roentgenol. 2018;211:W92-W97.
12. Martellotto S, Dohan A, Pocard M. Evaluation of the CT Scan as the First Examination for the Diagnosis and Therapeutic Strategy for Acute Cholecystitis. World J Surg. 2020;44:1779-1789.
13. Lee D, Appel S, Nunes L. CT findings and outcomes of acute cholecystitis: is additional ultrasound necessary? Abdom Radiol (NY). 2021;46:5434-5442.
14. Min JH, Shin KS, Lee JE, Choi SY, Ahn S. Combination of CT findings can reliably predict radiolucent common bile duct stones: a novel approach using a CT-based nomogram. Eur Radiol. 2019;29:6447-6457.
15. Expert Panel on Gastrointestinal Imaging, Scheirey CD, Fowler KJ, Therrien JA, Kim DH, Al-Refaie WB, Camacho MA, Cash BD, Chang KJ, Garcia EM, Kambadakone AR, Lambert DL, Levy AD, Marin D, Moreno C, Noto RB, Peterson CM, Smith MP, Weinstein S, Carucci LR. ACR Appropriateness Criteria® Acute Nonlocalized Abdominal Pain. J Am Coll Radiol. 2018;15:S217-S231.
16. Chen BQ, Xie F, Chen GD, Li X, Mao X, Jia B. Value of nonenhanced CT combined with laboratory examinations in the diagnosis of acute suppurative cholecystitis treated with percutaneous cholecystostomy: a retrospective study. BMC Gastroenterol. 2022;22:155.
17. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A, van Wijk Y, Woodruff H, van Soest J, Lustberg T, Roelofs E, van Elmpt W, Dekker A, Mottaghy FM, Wildberger JE, Walsh S. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14:749-762.
18. Lafata KJ, Wang Y, Konkel B, Yin FF, Bashir MR. Radiomics: a primer on high-throughput image phenotyping. Abdom Radiol (NY). 2022;47:2986-3002.
19. Liu Z, Wang S, Dong D, Wei J, Fang C, Zhou X, Sun K, Li L, Li B, Wang M, Tian J. The Applications of Radiomics in Precision Diagnosis and Treatment of Oncology: Opportunities and Challenges. Theranostics. 2019;9:1303-1322.
20. Sohn JH, Fields BKK. Radiomics and Deep Learning to Predict Pulmonary Nodule Metastasis at CT. Radiology. 2024;311:e233356.
21. Li MD, Cheng MQ, Chen LD, Hu HT, Zhang JC, Ruan SM, Huang H, Kuang M, Lu MD, Li W, Wang W. Reproducibility of radiomics features from ultrasound images: influence of image acquisition and processing. Eur Radiol. 2022;32:5843-5851.
22. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2016;278:563-577.
Footnotes

Peer review: Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: China

Peer-review report’s classification

Scientific quality: Grade B, Grade B, Grade B, Grade B

Novelty: Grade B, Grade B, Grade B, Grade B

Creativity or innovation: Grade B, Grade B, Grade B, Grade B

Scientific significance: Grade B, Grade B, Grade B, Grade B

P-Reviewer: Wen J, PhD, China; Yang YH, MD, Postdoc, China

S-Editor: Fan M

L-Editor: A

P-Editor: Lei YY