Published online May 27, 2026. doi: 10.4240/wjgs.v18.i5.115903
Revised: December 16, 2025
Accepted: February 4, 2026
Published online: May 27, 2026
Processing time: 209 Days and 6.9 Hours
Acute necrotizing pancreatitis (ANP), a more severe form of acute pancreatitis, requires early diagnosis and accurate severity stratification for optimal patient prognosis and treatment. Currently, most scholars do not clearly differentiate between ANP and severe acute pancreatitis, although they are distinct clinical entities.
To investigate the value of radiomics derived from contrast-enhanced computed tomography (CECT) of the pancreatic parenchyma and peripancreatic necrotic collections, combined with various machine learning algorithms, to differentiate between severe and moderately severe ANP.
We conducted a retrospective cohort study of 184 ANP patients (72 severe, 112 moderately severe), randomly divided into training and test cohorts in a 7:3 ratio. On portal venous phase CECT images, regions of interest encompassing the entire pancreatic parenchyma and peripancreatic necrotic collections were manually delineated on a slice-by-slice basis. Radiomic features were then extracted from the regions of interest using the PyRadiomics package. Feature selection was performed using intraclass and interclass correlation coefficients, independent samples t-tests, and the light gradient boosting machine algorithm. The classification models were constructed using support vector machine, random forest (RF), k-nearest neighbor, gradient boosting decision tree, and extreme gradient boosting algorithms combined with 10-fold cross-validation, developing three distinct models: the pancreatic model, the peripancreatic model, and the combined model. The performance of each model was evaluated by analyzing receiver operating characteristic curves, the area under the curve, accuracy, sensitivity, specificity, F1-score, and Brier score.
The combined RF model demonstrated superior performance compared to other models (support vector machine, k-nearest neighbor, gradient boosting decision tree, and extreme gradient boosting) for differentiating between severe and moderately severe ANP. It achieved the best results in the test cohort, with an area under the curve of 0.896 (95% confidence interval: 0.778-0.977), accuracy of 0.839, sensitivity of 0.650, specificity of 0.944, F1-score of 0.743, and Brier score of 0.134.
Radiomic analysis of both the pancreatic parenchyma and peripancreatic necrotic collections on CECT, combined with machine learning, effectively differentiates between severe and moderately severe ANP. The combined RF model showed superior performance. This approach shows potential for improving early diagnostic accuracy, aiding clinical decision-making, and optimizing treatment strategies. The refined classification system facilitates better resource allocation, patient triage, and stratification.
Core Tip: Acute necrotizing pancreatitis (ANP), a more severe form of acute pancreatitis, requires early diagnosis and accurate severity stratification for optimal patient prognosis and treatment. This study demonstrates that radiomics based on contrast-enhanced computed tomography of both pancreatic parenchyma and peripancreatic necrotic collections, combined with machine learning algorithms, can effectively differentiate between severe and moderately severe ANP. This model may serve as a valuable adjunct clinical decision support tool, and its refined classification of ANP into severe and moderately severe categories could help optimize resource allocation and improve patient triage.
- Citation: Feng Y, Hu XH, Xiao B. Machine learning and radiomics for differentiating severe from moderately severe acute necrotizing pancreatitis on contrast-enhanced computed tomography. World J Gastrointest Surg 2026; 18(5): 115903
- URL: https://www.wjgnet.com/1948-9366/full/v18/i5/115903.htm
- DOI: https://dx.doi.org/10.4240/wjgs.v18.i5.115903
Acute pancreatitis (AP) is a common acute abdominal emergency characterized by both local and systemic inflammatory responses. Its clinical course ranges from a self-limiting mild form to moderately severe AP (MSAP) or severe AP (SAP). Globally, AP has an annual incidence of approximately 33.74 per 100000 person-years and a mortality rate of about 1.16 per 100000. Furthermore, its incidence has been increasing annually[1-3]. The 2012 revised Atlanta classification[4] categorizes AP into two types based on morphology and pathology: Interstitial edematous pancreatitis and acute necrotizing pancreatitis (ANP). ANP, the more severe form, often involves multiple organ systems and is associated with higher mortality and poorer outcomes. Approximately 20% of AP patients progress to MSAP or SAP. SAP is the most critical form, characterized by a high mortality rate (20%-40%) and poor prognosis[3]. Although the terms ANP (morphological) and SAP (clinical) are often used interchangeably in the literature, their definitions and clinical manifestations are not identical. To establish a clearer differentiation based on disease severity within the ANP population, this study introduces the terms “acute necrotizing moderately severe pancreatitis (ANMSP)” and “acute necrotizing severe pancreatitis (ANSP)”. This terminology integrates both radiological and clinical characteristics, providing a more precise framework for classification and management.
Contrast-enhanced computed tomography (CECT) is the primary imaging modality for evaluating the morphological characteristics of necrotizing pancreatitis[5]. CECT is more widely used than magnetic resonance imaging for both diagnosing AP and assessing its severity. This is likely due to the wider availability of computed tomography (CT), faster scanning times, and the easier interpretation of its findings for clinicians[6]. CECT visualizes the necrotic areas (only peripancreatic necrosis, only pancreatic necrosis, or both) by highlighting differences in parenchymal enhancement and peripancreatic vessel opacification, thereby providing a comprehensive assessment of the severity of ANP and the extent of surrounding tissue involvement. In addition, the concept of radiomics was first introduced by Lambin et al[7] in 2012. It refers to the high-throughput extraction and analysis of large volumes of advanced quantitative imaging features from medical images[8]. In the early stages of AP, morphological changes in the pancreas may not be evident on imaging studies in some patients, particularly in cases of pancreatic necrosis, potentially leading to an underestimation of disease severity[9]. As a non-invasive approach, radiomics can capture subtle heterogeneity within lesions that is undetectable by conventional imaging in the early stages. By quantitatively analyzing these features, it creates a critical link between imaging findings and clinical practice, thereby aiding in treatment selection[10].
Recently, radiomics has been primarily used to diagnose pancreatic tumors and differentiate between types of pancreatitis[11]. However, few studies have focused on assessing the severity of ANP, and none have explored its early differential diagnosis using combined radiomic features from both pancreatic parenchyma and peripancreatic necrotic collections. Therefore, this study aims to develop and validate machine learning models for differentiating severe from moderately severe ANP (i.e., ANSP from ANMSP). These models will be based on radiomic features extracted from portal venous phase CECT images of the pancreatic parenchyma, peripancreatic necrotic collections, and their combination. The diagnostic performance of these models will be evaluated to enable early identification of disease severity and support clinical decision-making regarding treatment.
We retrospectively analyzed medical records of patients with ANP treated at our hospital from May 2016 to June 2024. The diagnosis of AP was based on the 2012 revised Atlanta classification[4], which required at least two of the following: (1) Persistent epigastric pain; (2) Serum amylase and/or lipase levels at least three times the upper limit of normal; or (3) Imaging findings consistent with AP. ANP was defined as AP accompanied by pancreatic parenchymal necrosis and/or peripancreatic necrosis. Pancreatic parenchymal necrosis is defined as non-enhancing or hypo-enhancing areas (< 30 HU) within the pancreas[12], while peripancreatic necrosis is defined by fluid collections containing fatty necrotic debris[13].
The inclusion criteria were the following: (1) Hospitalization and undergoing CECT within 7 days of ANP onset; (2) Availability of complete laboratory data, medical records, and imaging studies; and (3) Age ≥ 18 years. The exclusion criteria were the following: (1) Admission to the hospital more than 7 days after ANP onset; (2) Admission within 7 days of ANP onset without undergoing a CECT examination; (3) Previous history of chronic pancreatitis, pancreatic malignancy, or pancreatic surgery; (4) Pregnancy or age < 18 years; or (5) Incomplete imaging series (absence of portal venous phase axial images), poor image quality that precludes evaluation, missing essential clinical data, or admission primarily for other acute abdominal conditions (e.g., gastrointestinal bleeding, intestinal obstruction). The severe group was defined by persistent organ dysfunction (> 48 hours), a modified Marshall score ≥ 2, and at least two of the following: Ranson score ≥ 3, Glasgow-Imrie score > 3, or Bedside Index for Severity in Acute Pancreatitis (BISAP) score ≥ 3. The moderately severe group was defined by transient organ dysfunction (resolving within 48 hours) without meeting the criteria for severe disease. In total, 184 ANP patients were enrolled (Figure 1 presents the flowchart of patient selection), comprising 72 severe (ANSP) and 112 moderately severe (ANMSP) patients. These patients were then randomly allocated to a training cohort (n = 128; 52 severe, 76 moderately severe) and a test cohort (n = 56; 20 severe, 36 moderately severe) in a 7:3 ratio.
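The grouping rule described above can be written as a small decision function. The following is a minimal sketch only: the `SeverityInputs` container and its field names are hypothetical and not taken from the study, and the function assumes the patient already meets the inclusion criteria for ANP.

```python
from dataclasses import dataclass


@dataclass
class SeverityInputs:
    """Hypothetical container; field names are illustrative assumptions."""
    organ_dysfunction_hours: float  # duration of organ dysfunction
    modified_marshall: int          # modified Marshall score
    ranson: int                     # Ranson score
    glasgow_imrie: int              # Glasgow-Imrie score
    bisap: int                      # BISAP score


def classify_anp_severity(p: SeverityInputs) -> str:
    """Return 'ANSP' (severe) or 'ANMSP' (moderately severe)."""
    persistent = p.organ_dysfunction_hours > 48
    marshall_met = p.modified_marshall >= 2
    # Count how many of the three bedside scores reach their threshold.
    scores_met = sum([p.ranson >= 3, p.glasgow_imrie > 3, p.bisap >= 3])
    if persistent and marshall_met and scores_met >= 2:
        return "ANSP"
    return "ANMSP"


# Illustrative patients (invented values, not study data).
print(classify_anp_severity(SeverityInputs(72, 2, 3, 4, 1)))
print(classify_anp_severity(SeverityInputs(24, 1, 3, 4, 3)))
```

Requiring all three conditions together mirrors the conjunctive wording of the criteria; a patient with high bedside scores but only transient organ dysfunction stays in the moderately severe group.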
After screening for CECT contraindications, all patients underwent the examination within 7 days of symptom onset. The revised Atlanta classification[4] describes the dynamic course of AP as having two mortality peaks (early and late phases)[14]. Our study focused on portal venous phase CT images obtained during the first 7 days after ANP onset. Patient position: Head first, supine, with arms raised above the head. Scan range: From the diaphragmatic dome to below the inferior poles of both kidneys. CT parameters are detailed in Table 1. A non-contrast scan was acquired first. Subsequently, iohexol (1-2 mL/kg) was administered as an intravenous bolus via the antecubital vein at 3.5-4 mL/second using a high-pressure injector. Arterial and portal venous phase scans were acquired at 25-30 seconds and 65-75 seconds post-injection, respectively. All images were reconstructed and uploaded to a dedicated workstation for analysis before being archived in the picture archiving and communication system. Two abdominal radiologists, each with more than 4 years of experience, independently reviewed all studies for diagnostic interpretation.
| Scanner model | Tube voltage (kV) | Tube current (mA) | Acquisition matrix | Pitch | Slice thickness (mm) | Slice interval (mm) |
| Siemens SOMATOM Force computed tomography | 120 | 200 | 512 × 512 | 0.6 | 5 | 5 |
| United Imaging uCT 710 (64-slice) | 120 | 108 | 512 × 512 | 1.0 | 5 | 5 |
| Brilliance 64 | 120 | 200 | 512 × 512 | 0.8 | 5 | 5 |
Portal venous phase CECT images were analyzed using 3D-Slicer software (https://www.slicer.org/) for feature extraction. Regions of interest (ROIs) were manually delineated by two radiologists (with 4 years and 5 years of experience in pancreatitis diagnosis, respectively). To ensure segmentation consistency, all 184 patient ROIs for the final model were delineated by radiologist 1. Radiologist 2 then independently resegmented a random subset of 30 patients, blinded to the initial results. The ROIs included two distinct components (Figures 2 and 3): Label 1 for the entire pancreatic parenchyma (including necrotic areas but excluding bile ducts and vessels) and label 2 for peripancreatic necrotic collections at corresponding anatomical levels. These collections were found in spaces such as the anterior pararenal, perirenal, posterior pararenal, lesser sac, perihepatic, gastrosplenic, and pancreatosplenic spaces. All images were first resampled to an isotropic 5 mm³ voxel size using bilinear interpolation. Radiomic features were then automatically extracted from the volumes of interest using the PyRadiomics package in Python (version 3.11.9). The extracted feature set included first-order gray-level histogram features, shape features, and second- and higher-order texture features (GLCM, GLSZM, GLRLM, and GLDM).
We calculated interclass and intraclass correlation coefficients (ICCs) to evaluate consistency among radiologists, with ICC values > 0.75 indicating good agreement[15]. Subsequently, independent samples t-tests were performed on the features with good agreement (ICCs > 0.75) to identify those with statistically significant differences (P < 0.05). These significant features were then discretized into 10 bins (values 0-9) using both equal-width binning and equal-frequency binning methods. This binning process transforms features to an appropriate scale without requiring normalization or standardization, while also simplifying logistic regression models and reducing overfitting risk[16]. Finally, based on the light gradient boosting machine algorithm, the discretized features were shuffled and evaluated using 10-fold cross-validation to iteratively derive the optimal feature subset.
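The two discretization schemes can be illustrated with a short standard-library sketch. It is not the authors' code: the `_cut` and `_qcut` suffixes in the selected feature names suggest that pandas-style binning was used, and the rank-based equal-frequency version below is a simplified stand-in that, unlike a quantile-edge approach, may split tied values across adjacent bins.

```python
def equal_width_bins(values, n_bins=10):
    """Assign each value a code 0..n_bins-1 using bins of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins=10):
    """Assign codes 0..n_bins-1 so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)
    for rank, idx in enumerate(order):
        codes[idx] = rank * n_bins // len(values)
    return codes


# Invented, skewed feature values to show how the two schemes differ.
skewed = [1.0, 2.0, 3.0, 4.0, 100.0]
print(equal_width_bins(skewed, n_bins=2))
print(equal_frequency_bins(skewed, n_bins=2))
```

With skewed values, equal-width binning places almost all samples in the lowest bin, whereas equal-frequency binning spreads them evenly, which may be why both versions of each feature were offered to the selection step.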
Classification models were constructed using the optimal radiomic features and a 10-fold cross-validation framework. Five machine learning algorithms [support vector machine (SVM), random forest (RF), k-nearest neighbor (KNN), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost)] were applied to portal venous phase CECT data to build three distinct models: The pancreatic model, the peripancreatic model, and the combined model. Model performance was assessed using area under the curve (AUC), accuracy, sensitivity, specificity, F1-score, and Brier score. The Brier score quantifies performance through the mean squared error between predicted probabilities and observed outcomes, with lower values indicating better performance.
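For reference, the Brier score for N patients with predicted probabilities p_i and observed outcomes o_i (1 = severe, 0 = moderately severe) is the mean of (p_i - o_i)². A minimal sketch of the calculation follows; the example probabilities are invented for illustration and are not study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Invented example: four patients, two severe (1) and two moderately severe (0).
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.8, 0.3]
print(round(brier_score(outcomes, predicted), 4))
```

A model that always predicts 0.5 scores 0.25, so values well below 0.25 indicate probabilities that carry real information about the outcome.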
Statistical analyses were conducted using SPSS version 25.0 and Python version 3.11.9. Normally distributed continuous variables are presented as mean ± SD, and non-normally distributed continuous variables as median (interquartile range). Categorical variables are expressed as n (%). Group comparisons for continuous variables were made using independent samples t-tests or Mann-Whitney U tests, while categorical variables were compared using the χ² test or Fisher’s exact test. P values < 0.05 were considered statistically significant.
In the training cohort, the severe and moderately severe ANP groups did not differ significantly in gender, alcohol consumption history, white blood cell count, serum lipase levels, or pancreatic necrosis types (P > 0.05). In contrast, patients in the severe group were significantly older and had higher serum amylase levels and Modified CT Severity Index (MCTSI) scores (P < 0.05; Table 2).
| Characteristics | | MSAP (n = 76) | SAP (n = 52) | Z/χ²/t/Fisher’s exact test | P value |
| Age (years) | | 45.01 ± 13.58 | 52.88 ± 14.68 | -3.116 | 0.002b |
| Gender | Male | 50 (65.79) | 34 (65.38) | 0.002 | 0.962 |
| | Female | 26 (34.21) | 18 (34.62) | | |
| Alcohol consumption history | No | 47 (61.84) | 37 (71.15) | 1.187 | 0.276 |
| | Yes | 29 (38.16) | 15 (28.85) | | |
| WBC (× 10⁹/L) | | 14.12 ± 4.21 | 15.47 ± 6.24 | -1.369 | 0.175 |
| Serum amylase (× 10 U/L) | | 23.80 (7.18-43.35) | 47.31 (10.83-116.60) | -2.159 | 0.031a |
| Serum lipase (× 10 U/L) | | 27.45 (11.23-94.00) | 59.41 (15.76-117.91) | -1.356 | 0.175 |
| MCTSI score | | 8.00 (6.50-8.00) | 8.00 (8.00-10.00) | -2.879 | 0.004b |
| Pancreatic necrosis types | Only pancreatic parenchymal necrosis type | 11 (14.47) | 5 (9.62) | 0.730 | 0.781 |
| | Mixed type | 60 (78.95) | 44 (84.61) | | |
| | Only peripancreatic necrosis type | 5 (6.58) | 3 (5.77) | | |
In the test cohort, there were no significant differences between the severe and moderately severe ANP groups in age, gender, alcohol consumption history, serum lipase levels, or pancreatic necrosis types (P > 0.05). However, the severe group had significantly higher white blood cell counts, serum amylase levels, and MCTSI scores than the moderately severe group (P < 0.05; Table 3).
| Characteristics | | MSAP (n = 36) | SAP (n = 20) | Z/χ²/t/Fisher’s exact test | P value |
| Age (years) | | 44.39 ± 11.15 | 50.30 ± 17.46 | -1.367 | 0.182 |
| Gender | Male | 24 (66.67) | 12 (60.00) | 0.249 | 0.618 |
| | Female | 12 (33.33) | 8 (40.00) | | |
| Alcohol consumption history | No | 17 (47.22) | 13 (65.00) | 1.634 | 0.201 |
| | Yes | 19 (52.78) | 7 (35.00) | | |
| WBC (× 10⁹/L) | | 13.38 ± 4.22 | 16.46 ± 4.60 | -2.535 | 0.014a |
| Serum amylase (× 10 U/L) | | 12.85 (6.98-36.53) | 47.00 (19.83-126.78) | -2.539 | 0.011a |
| Serum lipase (× 10 U/L) | | 20.80 (9.93-82.75) | 50.10 (23.65-145.43) | -1.351 | 0.177 |
| MCTSI score | | 8.00 (8.00-8.00) | 9.00 (8.00-10.00) | -2.492 | 0.013a |
| Pancreatic necrosis types | Only pancreatic parenchymal necrosis type | 6 (16.67) | 1 (5.00) | 1.820 | 0.475 |
| | Mixed type | 29 (80.55) | 18 (90.00) | | |
| | Only peripancreatic necrosis type | 1 (2.78) | 1 (5.00) | | |
We assessed feature reproducibility via ICC using a two-way random-effects model (absolute agreement, based on average measures). The median and interquartile range of ICC for features extracted from the pancreatic parenchyma and peripancreatic necrotic collections are detailed in Tables 4 and 5. Unreliable features (ICC < 0.75) were excluded.
| Feature type | Feature count | Median ICC | IQR | ICC < 0.75 (%) |
| First-order statistics | 316 | 0.959 | 0.891-0.985 | 17.7% (56/316) |
| Shape features | 14 | 0.944 | 0.937-0.971 | 7.1% (1/14) |
| GLCM features | 408 | 0.928 | 0.834-0.974 | 23.5% (96/408) |
| GLSZM features | 272 | 0.745 | 0.581-0.889 | 52.2% (142/272) |
| GLRLM features | 272 | 0.894 | 0.744-0.960 | 25.4% (69/272) |
| GLDM features | 238 | 0.902 | 0.728-0.966 | 28.2% (67/238) |
| Total | 1520 | 0.912 | 0.730-0.972 | 28.3% (431/1520) |
| Feature type | Feature count | Median ICC | IQR | ICC < 0.75 (%) |
| First-order statistics | 316 | 0.942 | 0.841-0.975 | 23.4% (74/316) |
| Shape features | 14 | 0.972 | 0.841-0.975 | 0% (0/14) |
| GLCM features | 408 | 0.871 | 0.693-0.938 | 33.6% (137/408) |
| GLSZM features | 272 | 0.883 | 0.706-0.970 | 35.6% (97/272) |
| GLRLM features | 272 | 0.980 | 0.803-0.992 | 21.7% (59/272) |
| GLDM features | 238 | 0.944 | 0.824-0.988 | 21.8% (52/238) |
| Total | 1520 | 0.923 | 0.744-0.978 | 27.6% (419/1520) |
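The reproducibility statistic named above, a two-way random-effects ICC with absolute agreement and average measures, can be computed from the ANOVA mean squares as (MSR - MSE) / (MSR + (MSC - MSE) / n), where MSR, MSC, and MSE are the subject, rater, and residual mean squares and n is the number of subjects. The sketch below assumes a complete subjects-by-raters matrix for a single feature; the function name and example values are illustrative and not taken from the study.

```python
def icc_two_way_random_average(ratings):
    """ICC, two-way random effects, absolute agreement, average of k raters.

    ratings: list of rows, one per subject, each holding one value per rater.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


# Invented example: one feature measured by two readers on three patients,
# with reader 2 systematically one unit higher than reader 1.
feature_values = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc = icc_two_way_random_average(feature_values)
print(round(icc, 3), "retained" if icc > 0.75 else "excluded")
```

Because absolute agreement is required, a constant offset between readers lowers the ICC even when the two readers rank patients identically.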
We extracted 1520, 1520, and 3040 radiomic features from the pancreatic parenchyma, peripancreatic necrotic collections, and combined regions, respectively. After consistency testing, 1089 (pancreatic), 1101 (peripancreatic), and 2190 (combined) features exhibited good agreement (ICC > 0.75). Following independent samples t-tests, 620, 515, and 1135 features showed significant differences (P < 0.05) between severity groups for each model. The final optimal subsets, determined by light gradient boosting machine and cross-validation, contained 10 pancreatic, 9 peripancreatic, and 14 combined features (Tables 6, 7, and 8).
| Feature number | Optimal radiomic features |
| 1 | Wavelet-LLL_glcm_Correlation_qcut |
| 2 | Wavelet-HHL_gldm_DependenceVariance_qcut |
| 3 | Original_shape_Maximum2DDiameterRow_qcut |
| 4 | Wavelet-LHL_glszm_SmallAreaEmphasis_qcut |
| 5 | Wavelet-LHL_glszm_SmallAreaEmphasis_cut |
| 6 | Logarithm_firstorder_10Percentile_qcut |
| 7 | Log-sigma-5-0-mm-3D_glcm_Imc1_cut |
| 8 | Gradient_gldm_GrayLevelVariance_cut |
| 9 | Wavelet-LHL_firstorder_Mean_qcut |
| 10 | Log-sigma-5-0-mm-3D_glcm_Imc1_qcut |
| Feature number | Optimal radiomic features |
| 1 | Wavelet-HHH_glszm_GrayLevelNonUniformity_qcut |
| 2 | Log-sigma-3-0-mm-3D_glcm_Idm_qcut |
| 3 | Original_glszm_ZoneEntropy_qcut |
| 4 | Square_gldm_DependenceVariance_qcut |
| 5 | Exponential_gldm_DependenceVariance_qcut |
| 6 | Wavelet-LLL_glrlm_LongRunLowGrayLevelEmphasis_qcut |
| 7 | Log-sigma-5-0-mm-3D_glcm_Idmn_qcut |
| 8 | Log-sigma-2-0-mm-3D_firstorder_10Percentile_qcut |
| 9 | Original_shape_Flatness_qcut |
| Feature number | Optimal radiomic features |
| 1 | a_wavelet-LLH_gldm_LowGrayLevelEmphasis_cut |
| 2 | p_gradient_glrlm_ShortRunEmphasis_qcut |
| 3 | a_log-sigma-4-0-mm-3D_glcm_MCC_qcut |
| 4 | p_wavelet-HLL_glcm_Imc1_qcut |
| 5 | p_wavelet-HLL_firstorder_Mean_qcut |
| 6 | a_wavelet-HHH_glszm_GrayLevelNonUniformity_qcut |
| 7 | a_log-sigma-2-0-mm-3D_glszm_ZoneEntropy_qcut |
| 8 | a_log-sigma-2-0-mm-3D_firstorder_Mean_qcut |
| 9 | p_wavelet-HLL_firstorder_Mean_cut |
| 10 | a_wavelet-HLH_glrlm_RunVariance_qcut |
| 11 | a_wavelet-LLL_firstorder_RootMeanSquared_cut |
| 12 | p_log-sigma-5-0-mm-3D_glcm_Imc1_qcut |
| 13 | a_wavelet-LLH_glszm_GrayLevelVariance_qcut |
| 14 | a_wavelet-LHH_glszm_LowGrayLevelZoneEmphasis_cut |
We developed radiomics-based models to discriminate disease severity using five machine learning algorithms (SVM, RF, KNN, GBDT, XGBoost) and optimal feature subsets from portal venous phase CECT data. Three distinct models were constructed: The pancreatic model, the peripancreatic model, and the combined model. Their receiver operating characteristic curves are shown in Figure 4. Table 9 shows the final hyperparameter values after tuning.
| Model | Hyperparameter | Value |
| SVM | Kernel, C, gamma | Poly, 0.2 |
| RF | n_estimators, max_depth, min_samples_split, max_features | 50, 5, 15, 10 |
| KNN | n_neighbors, weights, p | 30, distance, 1 |
| GBDT | n_estimators, learning_rate, max_depth, subsample | 20, 0.3, 3, 0.8 |
| XGBoost | n_estimators, learning_rate, max_depth, subsample | 20, 0.1, 5, 0.5 |
| LightGBM | n_estimators, learning_rate, num_leaves, colsample_bytree, subsample, max_depth | 4000, 0.08, 32 (25), 0.65, 0.9, 5 |
Model performance metrics, including the AUC, accuracy, sensitivity, specificity, F1-score, and Brier score with their 95% confidence intervals, are detailed in Tables 10 and 11. Overfitting was observed for models such as KNN and GBDT, which reached AUCs of 0.996-1.000 in the training cohort but only 0.814-0.854 in the test cohort. This likely reflects the interplay of high model complexity, the dimensionality of the initial radiomic feature set, and the limited sample size, which may have led the models to learn noise rather than generalizable patterns. The performance drop on the independent test cohort directly reflects their limited generalizability. To mitigate this risk, our study design included a strict 7:3 training-test split and 10-fold cross-validation during training and feature selection. Ultimately, the combined RF model was selected because it demonstrated a more balanced and generalizable profile, maintaining high specificity (0.944) and a robust AUC (0.896) on the test cohort.
| Model | Algorithm | AUC | 95%CI | Accuracy, % | Sensitivity, % | Specificity, % | F1-score |
| Pancreatic parenchyma | SVM | 0.916 | 0.858-0.962 | 84.4 (78.1-90.6) | 88.5 (79.0-96.4) | 81.6 (72.1-90.0) | 0.826 (0.734-0.893) |
| | RF | 0.962 | 0.928-0.988 | 89.8 (84.4-94.5) | 80.7 (69.8-90.4) | 96.1 (91.3-100) | 0.866 (0.785-0.929) |
| | KNN | 1.000 | 1.000-1.000 | 100 (100-100) | 100 (100-100) | 100 (100-100) | 1.000 (1.000-1.000) |
| | GBDT | 0.996 | 0.989-1.000 | 96.9 (93.8-99.2) | 92.3 (83.7-98.3) | 100 (100-100) | 0.960 (0.911-0.991) |
| | XGBoost | 0.971 | 0.938-0.993 | 82.2 (77.5-86.9) | 88.5 (79.5-96.1) | 94.7 (89.2-98.8) | 0.902 (0.835-0.956) |
| Peripancreatic necrotic collections | SVM | 0.924 | 0.877-0.967 | 87.5 (81.3-92.9) | 90.4 (81.5-97.7) | 85.5 (77.6-92.9) | 0.855 (0.776-0.917) |
| | RF | 0.969 | 0.942-0.990 | 90.6 (85.2-95.3) | 86.5 (77.2-94.9) | 93.4 (87.5-98.7) | 0.882 (0.813-0.939) |
| | KNN | 1.000 | 1.000-1.000 | 100 (100-100) | 100 (100-100) | 100 (100-100) | 1.000 (1.000-1.000) |
| | GBDT | 1.000 | 0.999-1.000 | 100 (100-100) | 100 (100-100) | 100 (100-100) | 1.000 (1.000-1.000) |
| | XGBoost | 0.977 | 0.956-0.995 | 93.0 (88.3-96.8) | 88.5 (78.6-96.1) | 96.1 (91.5-100) | 0.911 (0.846-0.961) |
| Combined pancreatic parenchyma and peripancreatic necrotic collections | SVM | 0.951 | 0.904-0.986 | 89.8 (84.4-94.6) | 86.5 (76.1-94.8) | 92.1 (85.5-97.5) | 0.874 (0.796-0.935) |
| | RF | 0.973 | 0.950-0.991 | 89.8 (84.4-94.5) | 84.6 (75.0-93.9) | 93.4 (87.1-98.6) | 0.871 (0.795-0.934) |
| | KNN | 1.000 | 1.000-1.000 | 100 (100-100) | 100 (100-100) | 100 (100-100) | 1.000 (1.000-1.000) |
| | GBDT | 1.000 | 0.999-1.000 | 100 (100-100) | 100 (100-100) | 100 (100-100) | 1.000 (1.000-1.000) |
| | XGBoost | 0.971 | 0.942-0.992 | 93.8 (89.8-97.6) | 92.3 (84.6-98.2) | 94.7 (89.3-98.8) | 0.923 (0.865-0.968) |
| Model | Algorithm | AUC | 95%CI | Accuracy, % | Sensitivity, % | Specificity, % | F1-score | Brier score |
| Pancreatic parenchyma | SVM | 0.829 | 0.706-0.932 | 75.0 (62.5-85.7) | 85.0 (66.7-100) | 69.4 (52.7-83.3) | 0.708 (0.540-0.836) | 0.164 (0.117-0.214) |
| | RF | 0.840 | 0.719-0.936 | 78.6 (67.8-89.3) | 75.0 (52.9-93.7) | 80.6 (67.7-93.5) | 0.714 (0.524-0.851) | 0.159 (0.116-0.209) |
| | KNN | 0.814 | 0.690-0.918 | 75.0 (64.3-85.7) | 65.0 (43.4-86.4) | 80.6 (66.7-91.9) | 0.650 (0.452-0.810) | 0.175 (0.139-0.215) |
| | GBDT | 0.814 | 0.682-0.940 | 73.2 (62.5-83.9) | 85.0 (66.7-100) | 66.7 (51.4-81.3) | 0.694 (0.524-0.821) | 0.172 (0.119-0.233) |
| | XGBoost | 0.828 | 0.694-0.933 | 78.6 (67.8-89.3) | 75.0 (55.6-93.7) | 80.6 (66.7-92.3) | 0.714 (0.542-0.850) | 0.161 (0.109-0.211) |
| Peripancreatic necrotic collections | SVM | 0.857 | 0.744-0.942 | 75.0 (62.5-85.7) | 65.0 (43.7-86.4) | 80.5 (65.8-92.7) | 0.650 (0.462-0.809) | 0.147 (0.096-0.199) |
| | RF | 0.868 | 0.756-0.960 | 82.1 (71.4-91.1) | 65.0 (43.7-86.4) | 91.7 (81.1-99.8) | 0.722 (0.522-0.872) | 0.146 (0.089-0.206) |
| | KNN | 0.825 | 0.695-0.923 | 75.0 (64.3-85.7) | 60.0 (36.8-81.8) | 83.3 (70.3-94.3) | 0.632 (0.424-0.784) | 0.159 (0.112-0.211) |
| | GBDT | 0.817 | 0.686-0.922 | 75.0 (67.9-89.3) | 70.0 (50.0-59.5) | 83.3 (70.6-94.6) | 0.700 (0.513-0.850) | 0.168 (0.104-0.238) |
| | XGBoost | 0.847 | 0.733-0.947 | 80.4 (69.6-91.1) | 70.0 (47.6-88.9) | 86.1 (74.2-95.1) | 0.718 (0.526-0.857) | 0.147 (0.102-0.195) |
| Combined pancreatic parenchyma and peripancreatic necrotic collections | SVM | 0.879 | 0.780-0.959 | 80.4 (69.6-91.1) | 70.0 (50.0-90.0) | 86.1 (74.4-96.9) | 0.718 (0.540-0.864) | 0.142 (0.094-0.189) |
| | RF | 0.896 | 0.778-0.977 | 83.9 (74.9-92.9) | 65.0 (42.9-85.7) | 94.4 (86.1-100) | 0.743 (0.600-0.889) | 0.134 (0.097-0.173) |
| | KNN | 0.849 | 0.735-0.945 | 82.1 (71.4-92.8) | 60.0 (38.1-82.6) | 94.4 (85.4-100) | 0.706 (0.500-0.875) | 0.156 (0.113-0.199) |
| | GBDT | 0.854 | 0.744-0.947 | 78.6 (67.9-89.3) | 70.0 (50.0-88.9) | 83.3 (71.1-94.7) | 0.700 (0.533-0.840) | 0.166 (0.085-0.247) |
| | XGBoost | 0.853 | 0.744-0.949 | 78.6 (67.8-87.5) | 70.0 (50.0-88.2) | 83.3 (70.3-94.3) | 0.700 (0.519-0.833) | 0.149 (0.086-0.221) |
In summary, the RF algorithm applied to the combined feature set outperformed the other algorithms (SVM, KNN, GBDT, and XGBoost) in distinguishing severe from moderately severe ANP. The combined RF model achieved an AUC of 0.896 (95% confidence interval: 0.778-0.977), accuracy of 0.839, sensitivity of 0.650, specificity of 0.944, F1-score of 0.743, and a Brier score of 0.134 in the test cohort.
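The reported test-cohort metrics for the combined RF model are mutually consistent, as the following sketch illustrates. The confusion-matrix counts are an assumption: they are inferred from the reported sensitivity and specificity together with the test-cohort composition (20 severe, 36 moderately severe) and are not stated in the article.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (positive class = severe ANP)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Assumed counts: 13 of 20 severe cases detected, 34 of 36 moderately severe cases ruled out.
metrics = binary_metrics(tp=13, fn=7, tn=34, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, 7 of 20 severe patients would be missed, which corresponds to the false negative rate of roughly 35% implied by the sensitivity of 0.650.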
The novel terminology (ANMSP/ANSP) represents a key innovation of this study. To evaluate its clinical utility, we compared the moderately severe (ANMSP) and severe (ANSP) groups on key outcomes: Hospital length of stay, intensive care unit (ICU) admission rate, mortality, and receipt of surgical/interventional procedures (e.g., thoracocentesis, paracentesis, laparoscopic cholecystectomy, biliary drainage). Significant differences were observed between groups in length of hospital stay, ICU admission, and receipt of surgery/interventions (P < 0.05; Table 12). With only two deaths, in-hospital mortality in the severe group was lower than previously reported[3]. We noted that 26.4% (19/72) of severe and 5.4% (6/112) of moderately severe patients left against medical advice or refused ICU transfer due to high costs, plans for treatment elsewhere, or a perceived poor prognosis. This pattern may partly explain the lower mortality and ICU utilization observed in the severe group.
| Characteristics | | MSAP (n = 112) | SAP (n = 72) | Z/χ²/t/Fisher’s exact test | P value |
| Length of stay | | 11.00 (8.00-16.00) | 16.00 (10.00-20.75) | -3.562 | < 0.001b |
| ICU admission | No | 112 (100.00) | 60 (83.33) | 17.329 | < 0.001b |
| | Yes | 0 (0.00) | 12 (16.67) | | |
| Mortality | No | 112 (100.00) | 70 (97.22) | | 0.152 |
| | Yes | 0 (0.00) | 2 (2.78) | | |
| Surgery/intervention | No | 100 (89.29) | 54 (75.00) | 6.554 | 0.010a |
| | Yes | 12 (10.71) | 18 (25.00) | | |
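The mortality comparison involves very small cell counts (0 vs 2 deaths), which is the setting where Fisher's exact test is typically preferred over the χ² test. The sketch below is a plain standard-library implementation of the two-sided test, summing the probabilities of all tables no more likely than the observed one. It is offered as an illustration and is not the SPSS procedure used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: moderately severe (0 deaths, 112 survivors) and severe (2 deaths, 70 survivors).
p_value = fisher_exact_two_sided(0, 112, 2, 70)
print(round(p_value, 3))
```

Applied to the mortality counts in the table, this calculation agrees with the reported P value of 0.152.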
In this study, we developed radiomics-based machine learning models to differentiate between severe and moderately severe ANP, enabling a more refined severity classification. The combined RF model, which utilized features from both the pancreatic parenchyma and peripancreatic necrotic collections on portal venous phase CECT, demonstrated superior performance. It achieved AUCs of 0.973 and 0.896 and accuracies of 89.8% and 83.9% in the training and test cohorts, respectively. These results suggest that radiomics can capture early ANP lesion heterogeneity - information not discernible on conventional imaging[9,10] - thus allowing for early identification of severe cases and guiding personalized treatment strategies. The refined classification of ANP severity, along with the proposed novel terminology of ANMSP and ANSP, enables both radiologists and clinicians to more accurately determine whether a patient with radiologically confirmed necrotizing pancreatitis should be classified as severe or moderately severe. This differentiation facilitates clinical decision-making for treatment strategies and supports early triage and stratification of such patients, thereby optimizing resource allocation. Critically, ANMSP and ANSP correlate strongly with hard endpoints (length of stay, ICU admission, need for intervention), which robustly supports the clinical utility of this classification.
The combined RF model demonstrated high sensitivity and specificity in the training cohort (0.846 and 0.934, respectively). In the test cohort, it maintained a high specificity of 0.944 but showed a more moderate sensitivity of 0.65. From a clinical perspective, high specificity is advantageous for early ANP severity stratification, as it effectively “rules in” severe cases. This minimizes false positives, thereby avoiding overtreatment of ANMSP patients and unnecessary ICU resource allocation - a key consideration in emergency settings. However, the sensitivity of 0.65 reflects a nonnegligible false negative rate (approximately 35%), which could delay critical interventions. Our model is intended as a decision-support tool, not a stand-alone diagnostic; thus, detecting false negatives requires combined clinical judgment and ongoing monitoring. To improve sensitivity and reduce the risk of missed diagnosis, two strategies are proposed. First, our high-specificity model serves as the first screen, followed by a more sensitive (potentially different) test or close monitoring for model-negative patients. Second, we plan to develop a multimodal model that integrates radiomic features with additional clinical data - such as early lab results and bedside scores - to better identify severe cases missed by imaging alone.
The dynamic evolution of necrosis and peripancreatic collections in early ANP is a key clinical characteristic. This “temporal variability” introduces instability into radiomic features extracted from a single early CT scan, posing a major challenge to model robustness and feature reproducibility. We argue that, despite this instability, early prediction remains clinically valuable even with its inherent uncertainty. Such a prediction can serve as an “alert” that prompts immediate intensified monitoring of high-risk patients, since delayed intervention may lead to serious outcomes. Looking ahead, robustness could potentially be enhanced by developing models that integrate serial CT imaging to capture the temporal trajectory of imaging features.
Furthermore, the RF-based radiomics models showed progressively higher AUC values from the pancreatic to the peripancreatic to the combined model. In the training cohort, the AUC values were 0.962, 0.969, and 0.973, respectively; the corresponding values in the test cohort were 0.840, 0.868, and 0.896. This pattern may be related to the prevalence of the three ANP subtypes. According to the 2012 revised Atlanta classification, ANP most frequently presents as the mixed type, less commonly as peripancreatic necrosis alone, and rarely as pancreatic parenchymal necrosis alone[4]. Together, these observations suggest that the discriminatory performance (AUC) of each radiomics model tracks the prevalence of the corresponding ANP subtype. Given that ANP often involves both pancreatic and peripancreatic necrosis, Zhou et al[17] manually outlined extra-pancreatic collections on T2-weighted images (including the largest cross-section and adjacent slices) and the entire pancreatic parenchyma on late arterial-phase MRI to predict early peripancreatic necrosis in AP. However, their study did not combine these two regions into a unified model, and their delineation of peripancreatic areas was more limited than in our approach. Thus, integrating radiomic features from both pancreatic and peripancreatic regions, together with machine learning, allows for a more comprehensive and accurate assessment of disease involvement in ANP[18].
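The comparison of single-region and combined models can be illustrated with a minimal sketch that assumes scikit-learn and NumPy are available. The feature matrices below are synthetic random data, not our radiomic features, and the feature counts, effect size, and hyperparameters are arbitrary assumptions. The sketch also omits the feature-selection steps and the 7:3 training/test split used in this study. It only shows how features from two regions can be concatenated and scored with a random forest under 10-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Class sizes follow the cohort (72 severe, 112 moderately severe);
# everything else here is SYNTHETIC stand-in data.
y = np.array([1] * 72 + [0] * 112)  # 1 = severe, 0 = moderately severe
n_patients = y.size

# Hypothetical feature blocks: 8 features per region, weakly shifted by class.
X_pancreatic = rng.normal(size=(n_patients, 8)) + 0.4 * y[:, None]
X_peripancreatic = rng.normal(size=(n_patients, 8)) + 0.4 * y[:, None]

# The "combined" model simply sees both blocks side by side.
X_combined = np.hstack([X_pancreatic, X_peripancreatic])


def cv_auc(X: np.ndarray, y: np.ndarray, seed: int = 0) -> float:
    """Mean ROC AUC of a random forest over stratified 10-fold CV."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=folds, scoring="roc_auc")
    return float(scores.mean())


auc_by_model = {
    "pancreatic": cv_auc(X_pancreatic, y),
    "peripancreatic": cv_auc(X_peripancreatic, y),
    "combined": cv_auc(X_combined, y),
}

if __name__ == "__main__":
    for name, auc in auc_by_model.items():
        print(f"{name:>15}: mean CV AUC = {auc:.3f}")
```

Because the data are synthetic, the printed AUCs say nothing about ANP; whether combining regions helps on real data depends on how much complementary information each region carries.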
Radiomics addresses clinical needs by non-invasively extracting and analyzing large numbers of quantitative features from medical images for diagnostic purposes[19]. It is complemented by machine learning, a branch of artificial intelligence that uses algorithms to identify risk factors, discover patterns, and build predictive models from complex datasets, and whose applications continue to grow rapidly[20,21]. Radiomics has been successfully applied to differentiate pancreatic malignancies from inflammatory diseases and to predict the severity of AP[4,9,22,23]. Yet its potential for grading severity in ANP remains largely unexplored. To our knowledge, this is the first machine learning-based radiomics model designed for the early stratification of ANP patients into moderately severe and severe categories. Chen et al[24] used a limited peripancreatic region (within 5 mm of the pancreatic surface) in their CT-based prognostic model for AP. Lin et al[9] restricted their analysis to the pancreatic parenchyma in a magnetic resonance imaging-based model for AP severity prediction and did not explore the peripancreatic region. Moving beyond these approaches, our study integrated the complete peripancreatic necrotic collections with pancreatic features in a combined model that showed better discriminatory performance than either single-region model. Compared with established scoring systems (e.g., Acute Physiology and Chronic Health Evaluation II, BISAP, MCTSI), our model offers greater objectivity through quantitative imaging, reduces subjective variability, and may enable earlier assessment by using admission CT scans. It also directly reflects local disease morphology. However, it depends on high-quality contrast-enhanced CT and dedicated computational analysis.
By providing an objective, quantitative data point derived from early imaging, our model adds value by augmenting, rather than replacing, traditional clinical judgment and established scores, particularly during the early diagnostic phase of ANP.
This study has several limitations. First, its retrospective single-center design may introduce selection bias; we plan to conduct multicenter, standardized, large-scale prospective studies in the next phase. Second, manual delineation of ROIs is time-consuming and subject to observer variability; subsequent research could use artificial intelligence for automated segmentation to overcome this limitation. Third, the isotropic resampling to 5 mm voxels, while chosen for standardization and reproducibility, may not capture fine-grained texture details; future studies with access to greater computational resources could explore the impact of finer resolutions (e.g., 1 mm) on feature robustness and model performance. Finally, this study did not include a direct performance comparison with established clinical scoring systems (e.g., Acute Physiology and Chronic Health Evaluation II, BISAP); future prospective studies are needed for such head-to-head validation.
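The computational cost of finer resampling can be made concrete with a short calculation. The ROI extent below is a hypothetical bounding box chosen for round numbers, not a measurement from our cohort; only the 5 mm and 1 mm spacings come from the text above.

```python
import math


def voxel_count(extent_mm: tuple[float, float, float], spacing_mm: float) -> int:
    """Voxels in an isotropic grid that covers a box of the given extent."""
    return math.prod(math.ceil(side / spacing_mm) for side in extent_mm)


# HYPOTHETICAL bounding box around the pancreas and peripancreatic collections.
roi_extent_mm = (150.0, 100.0, 100.0)

coarse_voxels = voxel_count(roi_extent_mm, spacing_mm=5.0)  # 30 x 20 x 20
fine_voxels = voxel_count(roi_extent_mm, spacing_mm=1.0)    # 150 x 100 x 100
ratio = fine_voxels / coarse_voxels

if __name__ == "__main__":
    print(f"5 mm grid: {coarse_voxels} voxels")
    print(f"1 mm grid: {fine_voxels} voxels")
    print(f"ratio: {ratio:.0f}x")
```

Whatever the actual ROI size, moving from 5 mm to 1 mm isotropic voxels multiplies the voxel count by roughly 5 × 5 × 5 = 125, which is why finer resolutions demand considerably more memory and computation for texture feature extraction.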
In summary, integrating CECT-based radiomic features from both the pancreatic parenchyma and peripancreatic necrotic collections using multiple machine learning algorithms effectively differentiates between severe and moderately severe ANP. The combined RF model showed the best diagnostic performance among the models tested. This approach may improve the early diagnosis of severe ANP and support clinical decision-making. Furthermore, refining the ANP classification into severe and moderately severe categories enables better patient triage, optimizes resource allocation, and may ultimately improve patient prognosis.
| 1. | Lankisch PG, Apte M, Banks PA. Acute pancreatitis. Lancet. 2015;386:85-96. |
| 2. | Xiao AY, Tan ML, Wu LM, Asrani VM, Windsor JA, Yadav D, Petrov MS. Global incidence and mortality of pancreatic diseases: a systematic review, meta-analysis, and meta-regression of population-based cohort studies. Lancet Gastroenterol Hepatol. 2016;1:45-55. |
| 3. | Boxhoorn L, Voermans RP, Bouwense SA, Bruno MJ, Verdonk RC, Boermeester MA, van Santvoort HC, Besselink MG. Acute pancreatitis. Lancet. 2020;396:726-734. |
| 4. | Banks PA, Bollen TL, Dervenis C, Gooszen HG, Johnson CD, Sarr MG, Tsiotos GG, Vege SS; Acute Pancreatitis Classification Working Group. Classification of acute pancreatitis--2012: revision of the Atlanta classification and definitions by international consensus. Gut. 2013;62:102-111. |
| 5. | Thoeni RF. The revised Atlanta classification of acute pancreatitis: its importance for the radiologist and its effect on treatment. Radiology. 2012;262:751-764. |
| 6. | Busireddy KK, AlObaidy M, Ramalho M, Kalubowila J, Baodong L, Santagostino I, Semelka RC. Pancreatitis-imaging approach. World J Gastrointest Pathophysiol. 2014;5:252-270. |
| 7. | Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, Aerts HJ. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441-446. |
| 8. | Kumar V, Gu Y, Basu S, Berglund A, Eschrich SA, Schabath MB, Forster K, Aerts HJ, Dekker A, Fenstermacher D, Goldgof DB, Hall LO, Lambin P, Balagurunathan Y, Gatenby RA, Gillies RJ. Radiomics: the process and the challenges. Magn Reson Imaging. 2012;30:1234-1248. |
| 9. | Lin Q, Ji YF, Chen Y, Sun H, Yang DD, Chen AL, Chen TW, Zhang XM. Radiomics model of contrast-enhanced MRI for early prediction of acute pancreatitis severity. J Magn Reson Imaging. 2020;51:397-406. |
| 10. | Yu NJ, Li XH, Liu C, Chen C, Xu WH, Chen C, Chen Y, Liu TT, Chen TW, Zhang XM. Radiomics models of contrast-enhanced computed tomography for predicting the activity and prognosis of acute pancreatitis. Insights Imaging. 2024;15:158. |
| 11. | Zhong J, Hu Y, Xing Y, Ge X, Ding D, Zhang H, Yao W. A systematic review of radiomics in pancreatitis: applying the evidence level rating tool for promoting clinical transferability. Insights Imaging. 2022;13:139. |
| 12. | Balthazar EJ. Acute pancreatitis: assessment of severity with clinical and CT evaluation. Radiology. 2002;223:603-613. |
| 13. | Xiao B, Xu HB, Jiang ZQ, Zhang J, Zhang XM. Current concepts for the diagnosis of acute pancreatitis by multiparametric magnetic resonance imaging. Quant Imaging Med Surg. 2019;9:1973-1985. |
| 14. | Brizi MG, Perillo F, Cannone F, Tuzza L, Manfredi R. The role of imaging in acute pancreatitis. Radiol Med. 2021;126:1017-1029. |
| 15. | Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15:155-163. |
| 16. | Tsai CF, Chen YC. The optimal combination of feature selection and data discretization: An empirical study. Inf Sci. 2019;505:282-293. |
| 17. | Zhou T, Xie CL, Chen Y, Deng Y, Wu JL, Liang R, Yang GD, Zhang XM. Magnetic Resonance Imaging-Based Radiomics Models to Predict Early Extrapancreatic Necrosis in Acute Pancreatitis. Pancreas. 2021;50:1368-1375. |
| 18. | Zhao Y, Wei J, Xiao B, Wang L, Jiang X, Zhu Y, He W. Early prediction of acute pancreatitis severity based on changes in pancreatic and peripancreatic computed tomography radiomics nomogram. Quant Imaging Med Surg. 2023;13:1927-1936. |
| 19. | Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2016;278:563-577. |
| 20. | Zhou Y, Ge YT, Shi XL, Wu KY, Chen WW, Ding YB, Xiao WM, Wang D, Lu GT, Hu LH. Machine learning predictive models for acute pancreatitis: A systematic review. Int J Med Inform. 2022;157:104641. |
| 21. | Deo RC. Machine Learning in Medicine. Circulation. 2015;132:1920-1930. |
| 22. | Ren S, Zhang J, Chen J, Cui W, Zhao R, Qiu W, Duan S, Chen R, Chen X, Wang Z. Evaluation of Texture Analysis for the Differential Diagnosis of Mass-Forming Pancreatitis From Pancreatic Ductal Adenocarcinoma on Contrast-Enhanced CT Images. Front Oncol. 2019;9:1171. |
| 23. | Park S, Chu LC, Hruban RH, Vogelstein B, Kinzler KW, Yuille AL, Fouladi DF, Shayesteh S, Ghandili S, Wolfgang CL, Burkhart R, He J, Fishman EK, Kawamoto S. Differentiating autoimmune pancreatitis from pancreatic ductal adenocarcinoma with CT radiomics features. Diagn Interv Imaging. 2020;101:555-564. |
| 24. | Chen H, Wen Y, Li X, Li X, Su L, Wang X, Wang F, Liu D. Integrating CT-based radiomics and clinical features to better predict the prognosis of acute pancreatitis. Insights Imaging. 2025;16:8. |