Yang YH, Li Y. Deep learning-based imaging model to predict early hematoma enlargement and hospital mortality in spontaneous intracerebral hemorrhage. World J Radiol 2026; 18(1): 115504 [DOI: 10.4329/wjr.v18.i1.115504]
Corresponding Author of This Article
Yu-Han Yang, MD, West China Hospital, Sichuan University, No. 17 People’s South Road, Chengdu 610041, Sichuan Province, China. yyh_1023@163.com
Research Domain of This Article
Computer Science, Artificial Intelligence
Article-Type of This Article
Retrospective Cohort Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Jan 28, 2026 (publication date) through Jan 28, 2026
Journal Information of This Article
Publication Name
World Journal of Radiology
ISSN
1949-8470
Publisher of This Article
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
Yu-Han Yang, West China Hospital, Sichuan University, Chengdu 610041, Sichuan Province, China
Yuan Li, State Key Laboratory of Biotherapy and Cancer Center, Department of Pediatric Surgery, Laboratory of Digestive Surgery, West China Hospital, Sichuan University, Chengdu 610041, Sichuan Province, China
Author contributions: Yang YH drafted the manuscript and performed critical revision of the manuscript; Li Y and Yang YH contributed to the study conception and design, data acquisition, and analysis and interpretation of data; and all authors read and approved the manuscript.
Institutional review board statement: This study was approved by the Medical Ethics Committee of West China School of Medicine, Sichuan University.
Informed consent statement: The requirement for written informed consent was waived by the institutional review boards.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: De-identified individual participant data that underlie the results reported in this article are available from the corresponding author upon reasonable request. Data sharing is subject to approval by the relevant institutional review boards and execution of a data-use agreement to ensure protection of patient privacy and compliance with applicable regulations. Due to institutional policies and patient privacy considerations, raw imaging data or any data containing potentially identifying information will not be publicly released.
Corresponding author: Yu-Han Yang, MD, West China Hospital, Sichuan University, No. 17 People’s South Road, Chengdu 610041, Sichuan Province, China. yyh_1023@163.com
Received: October 20, 2025 Revised: November 17, 2025 Accepted: January 6, 2026 Published online: January 28, 2026 Processing time: 100 Days and 13.4 Hours
Abstract
BACKGROUND
Spontaneous intracerebral hemorrhage (ICH) is a severe form of stroke with high early mortality. Hematoma enlargement (HE) occurs in roughly one-third of patients and strongly predicts poor outcomes. Quantitative image analysis using handcrafted radiomics and deep learning-derived features can objectively capture the heterogeneity of the hematoma and perihematomal edema (PHE), and combining these approaches with clinical data may improve early prediction of HE and in-hospital mortality.
AIM
To evaluate and validate the predictive performance of hematoma- and PHE-derived features on non-contrast computed tomography via handcrafted radiomics and automatic deep learning analysis for prediction of early HE and hospital mortality in spontaneous ICH.
METHODS
A total of 322 patients with basal ganglia ICH treated between June 2018 and June 2020 were retrospectively included and assigned to a training cohort (n = 225) and a testing cohort (n = 97). Features of the hematoma and PHE subregions were extracted by handcrafted radiomics analysis of manually delineated regions and, automatically, by deep learning analysis using pretrained convolutional neural networks with transfer learning. A support vector machine was used as the classifier for predicting HE and hospital mortality. The clinical-radiological integrated models for HE and hospital mortality were built from clinical data and the radiological signatures generated by the radiological models with the highest area under the receiver operating characteristic curve in the testing cohort.
RESULTS
For prediction of HE, the clinical-radiological model combining clinical information with hematoma- and PHE-derived computed tomography features achieved an area under the receiver operating characteristic curve of 0.828 (95% confidence interval: 0.714-0.942), with an accuracy of 72.89%, a sensitivity of 70.00%, and a specificity of 74.52% in the testing cohort. The model integrating clinical and radiological features also performed well in predicting hospital mortality, showing good classification and discrimination after validation.
CONCLUSION
Quantitative radiomics features from hematoma and PHE regions on non-contrast computed tomography images showed good performance for predicting HE and hospital mortality in patients with ICH.
Core Tip: In this work, we developed quantitative and easy-to-use tools for predicting early hematoma enlargement in spontaneous intracerebral hemorrhage based on radiological features obtained by deep learning or handcrafted radiomics, and validated the predictive models in an independent cohort to confirm their discriminative capacity. These artificial intelligence-based computer-aided diagnosis methods, applied to computed tomography images, could help clinicians decide at an early stage after admission whether to pursue active surgical intervention.
Spontaneous intracerebral hemorrhage (ICH) accounts for 10%-30% of all strokes[1-3] and is a lethal subtype associated with high mortality, especially in hospital after onset[4,5]. Hematoma enlargement (HE) occurs in nearly one-third of patients with spontaneous ICH[6]; it is an independent predictor of poor prognosis[7] and a potential guide for clinical intervention[8]. In addition, perihematomal edema (PHE), the edema surrounding the hematoma, is common and significantly associated with patient outcomes because of the injury caused by the growing hematoma and the toxic effects of blood and its degradation products[9,10]. Early and accurate identification of these primary and secondary biomarkers might assist clinical decision-making and individualized treatment planning.
Non-contrast computed tomography (NCCT) is the first-line radiological diagnostic method for acute stroke[1]. NCCT has proven valuable for detecting conventional imaging characteristics such as the swirl sign, blend sign, black hole sign, and irregular shape, all of which have been considered potential predictors[11-13]. However, these qualitative signs have overlapping criteria and are highly subjective, which makes them hard to standardize and generalize. Thus, a quantitative and easy-to-use NCCT-based tool is warranted for evaluating the hematoma and PHE.
With the development of radiomics techniques, medical images have provided increasingly rich quantitative information to support clinical decisions[14,15]. Such quantitative information has shown significant predictive value across oncological research, and a growing number of studies have focused on its application in ICH, reporting that handcrafted radiomics features can independently predict early HE[16-18]. Deep learning methods can identify and segment structures directly on images, automatically derive accurate and consistent imaging features, and reveal associations with various clinical endpoints via multilayered convolutional neural networks (CNNs)[19,20]. For ICH, a previous study achieved satisfactory automatic segmentation of the hematoma and PHE with deep learning methods. Transfer learning with pretrained CNNs, an important branch of deep learning, can automatically extract accurate and stable quantitative features from small clinical datasets[21,22]. Furthermore, combining handcrafted and automatically learned features can offset their respective shortcomings and yield stable performance in predicting clinical outcomes[23]. To date, few studies have used handcrafted radiomics and deep learning features extracted from both the hematoma and PHE to predict HE.
The objectives of our study were to apply a noninvasive NCCT-based tool that uses deep learning and traditional handcrafted radiomics to quantify the heterogeneity of the hematoma and PHE, and to predict the occurrence of HE and hospital mortality.
MATERIALS AND METHODS
Patients
Institutional review board approval was obtained for this retrospective study at our hospital, and the requirement for informed consent was waived. We retrospectively recruited patients with spontaneous ICH between June 2018 and June 2020. All patients underwent a baseline NCCT examination within 6 hours of the onset of ICH symptoms and a follow-up NCCT examination at 24 hours. All hematomas were located in the basal ganglia at the initial attack. The exclusion criteria were as follows: (1) Cerebral hemorrhage caused by trauma or other secondary causes such as cerebral aneurysms, vascular malformations, venous sinus embolism, and brain tumors; (2) Primary intraventricular hemorrhage; (3) Surgery for hematoma removal before the 24-hour follow-up NCCT; and (4) Images with artifacts. All eligible patients had complete baseline data, including demographic and clinical data. Detailed information on age, gender, time to arrival, time to baseline computed tomography (CT), systolic blood pressure, diastolic blood pressure, heart rate, Glasgow Coma Scale score, National Institutes of Health Stroke Scale score, medical history, and medication history was collected at admission. The included patients were randomly assigned to the training and testing cohorts at a ratio of 7:3. The detailed CT image acquisition procedure is described in the Supplementary material.
Reference standards for endpoints
Hematoma expansion, the primary outcome of this study, was defined as hematoma volume growth exceeding 6 mL or 33% of the baseline volume[8,24]. Baseline and follow-up ICH volumes were measured with computer-assisted multi-slice planimetric and voxel-based calculation techniques in Python. Hospital mortality, the secondary outcome, was defined as death occurring in hospital after admission.
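The endpoint definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `hematoma_volume_ml` and `is_hematoma_enlargement`, the voxel-count input, the example voxel spacing, and the reading of "33%" as relative growth from the baseline volume are all introduced here for the example.

```python
def hematoma_volume_ml(n_voxels, spacing_mm):
    """Volume of a segmented region in mL from its voxel count.

    spacing_mm is the (x, y, z) voxel size in millimetres; 1 mL = 1000 mm^3.
    """
    sx, sy, sz = spacing_mm
    return n_voxels * sx * sy * sz / 1000.0


def is_hematoma_enlargement(baseline_ml, followup_ml,
                            abs_threshold_ml=6.0, rel_threshold=0.33):
    """Flag HE when growth exceeds 6 mL or 33% of the baseline volume."""
    growth = followup_ml - baseline_ml
    if growth > abs_threshold_ml:
        return True
    # Relative criterion; guarded against a zero baseline volume.
    return baseline_ml > 0 and growth / baseline_ml > rel_threshold


# Hypothetical scan geometry: 0.5 mm x 0.5 mm in-plane voxels, 5 mm slices.
baseline = hematoma_volume_ml(12000, (0.5, 0.5, 5.0))
followup = hematoma_volume_ml(17600, (0.5, 0.5, 5.0))
print(baseline, followup, is_hematoma_enlargement(baseline, followup))
```

In this hypothetical case the hematoma grows from 15 mL to 22 mL, so the absolute criterion alone is met; a growth from 10 mL to 14 mL would instead be flagged by the relative criterion.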
Hematoma and PHE segmentation
The regions of interest (ROIs) of the hematoma and PHE were delineated separately on baseline and follow-up NCCT images by two experienced radiologists using ITK-SNAP software[25], with a window level of 30-40 HU and a window width of 90-100 HU. For handcrafted radiomics feature extraction, the contoured regions covered the whole hematoma volume and the surrounding PHE. To assess the inter-observer reproducibility of the whole-volume handcrafted radiomics features, 20% of the patients in the training cohort were selected randomly, with the radiologists blinded to the selection, and their ROIs were segmented again one week after the first segmentation. Only radiomics features with intra-class correlation coefficients > 0.85 were retained for further analysis. For deep learning feature extraction, we contoured the three consecutive slices with the maximum cross-sectional hematoma area; these were cropped with a bounding box covering the radiologist-delineated ROIs and resized to 224 pixels × 224 pixels to match the input layer of the pretrained CNNs.
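The reproducibility filter can be illustrated as follows. The text does not state which form of the intra-class correlation coefficient was used, so this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function names and the toy feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of n subjects (rows) by k segmentations (columns).

    Assumes the values are not all identical (otherwise the ratio is undefined).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, segmentations, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom


def reproducible_features(feature_tables, threshold=0.85):
    """Names of features whose ICC exceeds the threshold."""
    return [name for name, table in feature_tables.items()
            if icc_2_1(table) > threshold]


# Toy values: each row is one patient, each column one segmentation session.
features = {
    "stable_feature": [[10.0, 10.2], [14.0, 13.9], [20.0, 20.3], [25.0, 24.8]],
    "unstable_feature": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [4.0, 2.5]],
}
print(reproducible_features(features))
```

With these toy values only the first feature passes the 0.85 cut-off; a different ICC form would change the numbers but not the filtering logic.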
Handcrafted radiomics features
The handcrafted radiomics features were generated from manual ROIs identified and delineated visually by radiologists. Experienced radiologists drew the exact area of the hematoma and PHE on NCCT images. Handcrafted radiomics features were then derived automatically from the exact ROIs of the hematoma and PHE, respectively[26], with and without wavelet filtering. A total of 851 features were extracted from each of the hematoma and PHE, comprising 107 features from the original NCCT images and 744 features from the wavelet-filtered images. The study design complied with the image biomarker standardization initiative (IBSI) reporting guidelines. The radiomics features fell into three categories: (1) First-order statistics; (2) Shape features; and (3) Second-order statistics: Gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, gray level dependence matrix, and neighborhood gray tone difference matrix. Most of these handcrafted radiomics features are consistent with the feature definitions in the IBSI guidelines[27,28]. Radiomics extraction was performed with PyRadiomics v2.1.2. Images were resampled to isotropic voxels of 1 mm × 1 mm × 1 mm. Image intensity discretization used a fixed bin width of 25 HU and a voxel array shift of 1000. Segmented voxels were restricted to intensities in the range 50-400 HU prior to feature extraction to avoid air and bone artifacts. We extracted original and wavelet-filtered features; wavelet decomposition used separable 1D high-pass (H) and low-pass (L) filters along the x, y, and z axes, yielding eight combinations (LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH).
Extracted feature classes included first-order statistics, shape, and texture matrices (gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, gray level dependence matrix) as defined by PyRadiomics/IBSI conventions; minor definition differences between PyRadiomics and IBSI (e.g., Total Energy, kurtosis normalization) are noted and handled as described in the Supplementary material.
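The extraction settings and feature counts given above can be checked with a short sketch. The dictionary keys follow PyRadiomics parameter names, but how the authors actually passed the settings is an assumption; the block does not call PyRadiomics and only enumerates the wavelet sub-bands and verifies the arithmetic.

```python
from itertools import product

# Settings as stated in the text; the key names mirror PyRadiomics parameters
# and are an assumption about how the extraction was configured.
settings = {
    "resampledPixelSpacing": [1, 1, 1],  # isotropic 1 mm voxels
    "binWidth": 25,                      # fixed bin width (HU)
    "voxelArrayShift": 1000,
    "resegmentRange": [50, 400],         # HU range kept inside the mask
}

# Eight wavelet sub-bands: low-pass (L) or high-pass (H) along x, y, z.
subbands = ["".join(p) for p in product("LH", repeat=3)]

n_original = 107   # features from the original image (from the text)
n_wavelet = 744    # features from wavelet-filtered images (from the text)
per_subband = n_wavelet // len(subbands)
total = n_original + n_wavelet

print(subbands)
print(per_subband, n_original - per_subband, total)
```

The 744 wavelet features divide evenly into 93 per sub-band, and the 14 features that appear only for the original image are consistent with shape descriptors being computed once on the unfiltered mask, which is the usual PyRadiomics behavior but is not stated in the text.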
Deep learning features
Six base models were applied to the hematoma and PHE image sequences, respectively, to extract representative deep learning features: Xception[29], VGG16[30], VGG19[30], ResNet50[31], InceptionV3[32], and InceptionResNetV2[33]. These six CNNs are commonly used and were pre-trained on the large-scale, well-annotated ImageNet database[34]. We removed the last fully-connected layer at the top of each CNN and used global max pooling (GMP) to take the maximum value of each feature map channel, thereby converting the feature maps into a vector of raw values. The features extracted from the different pre-trained CNNs were then used for feature selection and model construction with machine learning approaches in the next step[35]. The mechanism underlying the predictive potential of deep learning-derived features remains unclear given the complexity of pre-trained CNNs. All CNN backbones were initialized with ImageNet weights. For each backbone, we removed the final fully-connected layers and applied a GMP layer to the last convolutional feature maps to produce feature vectors. During transfer learning with ResNet50, we froze the initial blocks (conv1 through layer3) and fine-tuned the final convolutional block (layer4) together with the appended GMP layer and a small classifier head. We also tested an alternative scheme in which all convolutional layers were frozen and only the classifier head was trainable; this produced inferior performance. A detailed description of the deep learning features is provided in the Supplementary material. Guided gradient-weighted class activation mapping was used to visualize the output of the last convolutional layer of the CNNs[36], highlighting the specific subregions on which the CNNs focused when generating deep learning features.
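To make the pooling step concrete, the sketch below applies global max pooling to a toy stack of feature maps and contrasts it with global average pooling. It uses plain Python lists to stay dependency-free; the function names and toy activations are hypothetical, and a real pipeline would apply the same reduction to the output tensor of the last convolutional block.

```python
def global_max_pool(feature_maps):
    """Reduce each channel (a 2D list, H x W) to its single largest activation."""
    return [max(max(row) for row in channel) for channel in feature_maps]


def global_avg_pool(feature_maps):
    """Reduce each channel to the mean of all its activations."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]


# Two toy channels of size 2 x 2.
toy_maps = [
    [[0.0, 0.1], [0.2, 4.0]],  # one strong focal activation
    [[1.0, 1.0], [1.0, 1.0]],  # uniform activation
]

gmp = global_max_pool(toy_maps)
gap = global_avg_pool(toy_maps)
print(gmp, gap)
```

Max pooling keeps the focal response of the first channel at full strength, whereas averaging dilutes it to roughly the level of the uniform channel. Either way the output has one value per channel, so the feature vector length equals the number of channels in the last convolutional block.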
Input to the CNNs comprised three consecutive axial slices centered on the slice with the maximum hematoma cross-sectional area (i.e., the slice with the greatest hematoma ROI area), chosen to capture contiguous spatial context while limiting input dimensionality. The selected three-slice patches were cropped with a bounding box around the radiologist-delineated ROI and resized to 224 pixels × 224 pixels, consistent with the input size of the pre-trained networks. Intensity normalization was applied per slice, with mean subtraction and division by the standard deviation, to match the pretraining distribution. GMP was chosen after empirical testing because it emphasizes the most strongly activated locations in the feature maps, which likely correspond to focal hematoma heterogeneity, and because it produced lower-dimensional representations with good discriminative performance in preliminary experiments. We compared GMP against global average pooling and flattened feature maps; GMP achieved slightly higher areas under the receiver operating characteristic (ROC) curve (AUCs) on cross-validation and a lower tendency to overfit, so we used GMP in the final models.
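The slice selection and per-slice normalization described above might look like the following sketch. The function names are hypothetical, the handling of a maximum-area slice at the edge of the volume (shifting the window inward) is an assumption not covered in the text, and cropping and resizing are omitted.

```python
from statistics import mean, pstdev


def select_three_slices(slice_areas):
    """Indices of three consecutive slices centered on the largest ROI area.

    If the largest slice is first or last, the window is shifted inward
    (an assumption; the text does not describe this edge case).
    """
    n = len(slice_areas)
    if n < 3:
        raise ValueError("need at least three slices")
    center = max(range(n), key=lambda i: slice_areas[i])
    center = min(max(center, 1), n - 2)
    return [center - 1, center, center + 1]


def zscore_slice(pixels):
    """Per-slice normalization: subtract the mean, divide by the standard deviation."""
    flat = [v for row in pixels for v in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in pixels]
    return [[(v - mu) / sigma for v in row] for row in pixels]


# Hypothetical hematoma ROI area (in pixels) on each axial slice.
areas = [0, 12, 40, 55, 31, 0]
print(select_three_slices(areas))

normalized = zscore_slice([[40.0, 60.0], [50.0, 50.0]])
print(normalized)
```

For the example areas the window covers slices 2 to 4, and the normalized toy slice has zero mean and unit standard deviation.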
Feature selection and model construction
Three kinds of radiological models, based on hematoma features, PHE features, and combined features, were trained to predict the occurrence of HE and hospital mortality, respectively. For the construction of radiological models, a previous study combined various feature selection methods and classifiers and evaluated their predictive performance by the AUC[37]. We adopted the combination with the largest AUC among all combinations as the standard method in this study and applied it to the deep learning and handcrafted radiomics features in the training cohort. The top 20% of features most predictive of HE or hospital mortality were selected by univariate analysis. Next, we applied a wrapper feature selection strategy based on the recursive feature addition algorithm to select the predictive features with high AUC values. We used a support vector machine (SVM) with a radial basis function kernel as the classifier[35]. In detail, feature selection proceeded in three steps. First, the univariate association of each feature with the outcome was assessed, features were ranked by absolute effect size, and the top 20% by this ranking were retained for multivariate selection. Second, we applied a recursive feature addition (wrapper) algorithm with cross-validation in the training set: Features were added iteratively in order of univariate ranking if their inclusion increased the cross-validated AUC on the training folds by a prespecified margin (≥ 0.005), until no further improvement occurred. Third, to reduce multicollinearity, we computed pairwise Spearman correlations among the retained features and, for any pair with correlation r > 0.9, removed the feature with the lower univariate ranking. We also report variance inflation factors for the final feature set, which were < 5 for all features.
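The three selection steps can be sketched as below. This is an illustration under assumptions rather than the study's implementation: the cross-validated AUC and the Spearman correlation are supplied by the caller as functions (`cv_auc` and `corr`, both hypothetical), the starting reference AUC of 0.5 is assumed, the 20% cut is rounded up, and candidates are scanned in a single pass through the ranking.

```python
import math


def top_fraction(ranked_features, fraction=0.20):
    """Step 1: keep the top fraction of an already ranked list (at least one)."""
    keep = max(1, math.ceil(len(ranked_features) * fraction))
    return ranked_features[:keep]


def recursive_feature_addition(ranked_features, cv_auc, margin=0.005):
    """Step 2: add features in rank order when they raise the AUC by >= margin.

    cv_auc(features) must return the cross-validated AUC for that subset.
    """
    selected, best = [], 0.5  # 0.5 = chance-level AUC (assumed starting point)
    for feat in ranked_features:
        score = cv_auc(selected + [feat])
        if score - best >= margin:
            selected.append(feat)
            best = score
    return selected, best


def drop_correlated(selected, corr, threshold=0.9):
    """Step 3: for pairs with r > threshold, drop the lower-ranked feature."""
    kept = []
    for feat in selected:  # selected is ordered from best to worst rank
        if all(corr(feat, k) <= threshold for k in kept):
            kept.append(feat)
    return kept


# Toy lookup tables standing in for real cross-validation and correlation.
toy_auc = {("f1",): 0.70, ("f1", "f2"): 0.702,
           ("f1", "f3"): 0.75, ("f1", "f3", "f4"): 0.751}
toy_corr = {("f1", "f3"): 0.95}


def corr(a, b):
    return toy_corr.get((a, b), toy_corr.get((b, a), 0.0))


candidates = top_fraction(["f1", "f2", "f3", "f4"] + ["x"] * 16)
selected, best_auc = recursive_feature_addition(candidates,
                                                lambda f: toy_auc[tuple(f)])
print(candidates, selected, best_auc, drop_correlated(selected, corr))
```

In the toy run, `f2` and `f4` are rejected because they improve the AUC by less than 0.005, and `f3` is then removed in the third step because it is highly correlated with the higher-ranked `f1`.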
For the prediction of HE, the integrated model consisted of the selected handcrafted radiomics features and the deep learning features from the deep learning model with the highest AUC in the testing cohort among all deep learning models. To optimize the discriminative performance of the prediction model for hospital mortality, we selected the radiological model with the highest AUC in the testing cohort to generate the deep learning signature, took the effect of HE into consideration, and constructed a clinical-radiological integrated model with the SVM method. For comparison, the predictive value of the clinical signature (presence or absence of HE) for hospital mortality was also assessed in the training and testing cohorts. The SVM with a radial basis function (RBF) kernel was selected based on prior literature and internal benchmarking. In the training cohort, we compared the RBF-kernel SVM with L2-penalized logistic regression, random forest, and XGBoost using nested cross-validation (5 folds × 5 folds); the RBF-kernel SVM achieved the highest mean AUC for both the HE and mortality tasks. The SVM hyperparameters (C, gamma) were tuned by grid search in the inner cross-validation loop.
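The structure of the 5 × 5 nested cross-validation can be shown with a dependency-free sketch that only generates the index splits. The function names, the random seed, the unstratified shuffling, and the example hyperparameter grid are assumptions made for illustration; the text does not give the grid or state whether folds were stratified.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so tuning never sees the outer test fold.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for a, inner_val in enumerate(inner_folds):
            inner_train = [idx for b, fold in enumerate(inner_folds)
                           if b != a for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Hypothetical grid for the RBF-kernel SVM (C, gamma).
param_grid = [(C, gamma) for C in (0.1, 1, 10) for gamma in (0.001, 0.01, 0.1)]

splits = list(nested_cv_splits(225))  # 225 = size of the training cohort
n_inner_fits = sum(len(inner) for _, _, inner in splits) * len(param_grid)
print(len(splits), n_inner_fits)
```

Each outer fold would be scored once with the hyperparameters chosen on its inner splits, and the mean of the five outer AUCs is the quantity compared across classifiers. With class imbalance, stratified folds would usually be preferred over the plain shuffling used here.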
Statistical analysis
The distributions of baseline characteristics in the HE and non-HE groups were compared with the χ2 test for categorical data and the non-parametric Mann-Whitney test for continuous data. To evaluate the predictive performance of our models, we used ROC curves with their AUC values and 95% confidence intervals, and precision-recall plots with their Brier scores. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score were calculated from the confusion matrix to assess the predictive models quantitatively. Calibration was measured with the Hosmer-Lemeshow test and visualized with calibration plots[38]. Discrimination was quantified with Harrell’s concordance index (C-index)[39]. The clinical usefulness of the clinical-radiological model for predicting hospital mortality was evaluated by the net benefit in decision curve analysis[40]. A two-tailed P value < 0.05 was considered statistically significant. All statistical analyses and graphics were produced with Python (version 3.8) and R (version 3.6.1). All packages used in this study are listed in the Supplementary material.
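The confusion-matrix metrics listed above follow directly from four counts, as the sketch below shows. The example counts are not taken from the paper's tables: they were back-calculated here so that they agree with the percentages reported in the Results for the ResNet50-SVM combined model in the 97-patient testing cohort, and should be read as an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Counts inferred from reported percentages (assumption, see lead-in).
metrics = classification_metrics(tp=31, fp=34, tn=28, fn=4)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

These counts reproduce an accuracy of 60.82%, sensitivity of 88.57%, specificity of 45.16%, PPV of 47.69%, and NPV of 87.50%, which illustrates how a model can combine high sensitivity with modest accuracy when many negatives are misclassified.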
RESULTS
Demographic characteristics
The overall workflow, including feature extraction, predictive model development, and performance assessment, is illustrated in Figure 1. A total of 322 patients were enrolled, of whom 225 comprised the training set and 97 comprised the testing set. Baseline characteristics were compared between patients who experienced HE and those who did not in both cohorts (Table 1). No statistically significant differences were observed in baseline variables between the HE and non-HE groups in either the training or testing cohort (all P > 0.05).
Figure 1 A general flowchart of data analysis.
A: Radiological features were extracted from non-contrast computed tomography images of the hemorrhage and perihematomal edema by deep learning analysis and handcrafted radiomics analysis, respectively; B: The prediction models for identifying early enlargement of spontaneous intracerebral hemorrhage were built from radiological features with machine learning methods; C: The prognostic model for predicting hospital death took radiological features and the effect of hematoma expansion into account and was visualized as a nomogram. ROC: Receiver operating characteristic; SVM: Support vector machine.
Table 1 Clinical characteristics of patients in the training and testing cohorts, n (%).
In the handcrafted radiomics approach, eight features were ultimately selected from the primary hematoma and eight from PHE. Individually, the hematoma- and PHE-derived radiomics models achieved AUCs of 0.902 and 0.873, with accuracies of 82.67% and 80.89%, respectively, in the training cohort. Performance declined substantially in the independent testing cohort, where the hematoma- and PHE-based models yielded AUCs of 0.564 and 0.507 and accuracies of 53.61% and 55.67% (Supplementary Tables 1 and 2). A combined model incorporating features from both the hematoma and PHE attained an AUC of 0.895 and accuracy of 80.00% in the training set. On the test set, the integrated handcrafted model produced an AUC (and C-index) of 0.635, accuracy of 54.64%, sensitivity 62.86%, specificity 50.00%, PPV 41.51%, and NPV 70.45% (Supplementary Tables 1 and 2). Corresponding ROC curves, precision–recall curves, and calibration plots for the hematoma-, PHE-, and combined models (whole-volume analysis) are presented in Supplementary Figure 1.
Eighteen predictive models based on deep-learning-derived features were developed, comprising six hematoma-specific models, six PHE-specific models, and six combined models. These models were constructed using features selected from various pre-trained CNNs (Supplementary Table 3). In the hematoma- and PHE-derived models, AUC values in the training set ranged from 0.865 to 0.968, whereas AUCs in the independent testing set ranged from 0.571 to 0.747 (Supplementary Tables 1 and 2). ROC curves and calibration plots for both cohorts are presented in Supplementary Figure 2. Among the combined models, the ResNet50-based model coupled with an SVM classifier (RN-SVM) demonstrated the best discrimination on the test data, achieving an AUC of 0.841 in the training cohort and an AUC of 0.774 in the testing cohort (C-index 0.774). In the testing set this model yielded an accuracy of 60.82%, sensitivity of 88.57%, specificity of 45.16%, PPV of 47.69%, and NPV of 87.50% (Supplementary Tables 1 and 2). Comparative performance and calibration of the RN-SVM and other combined models are depicted in Supplementary Figure 3. In addition, for the ResNet50 architecture we assessed models built from features extracted at earlier convolutional layers to confirm that features from the final pre-fully connected layer were superior for predictive performance; these results support the chosen feature-extraction strategy (Supplementary Table 4). Finally, activation (feature) maps derived from ResNet50 identified salient subregions contributing to model outputs (Figure 2), and the resulting heatmaps delineated hematoma and PHE regions that were most informative for the learned feature representations, facilitating clinical interpretation.
Figure 2 Feature heatmaps of representative patients generated from the ResNet50 deep learning model by guided gradient-weighted class activation mapping.
The original non-contrast computed tomography images and their corresponding feature heatmaps are shown from left to right. Red highlights the regions of the hemorrhage and perihematomal edema on which the network focused during the deep learning analysis. Four cases are shown with subregions of hemorrhage and perihematomal edema, respectively. The left two are hematoma enlargement cases, and the right two are non-hematoma enlargement cases.
In the testing cohort, the clinical model yielded limited discrimination (AUC = 0.534; C-index = 0.534) with an overall accuracy of 55.67%, sensitivity of 45.71%, and specificity of 61.29%. By contrast, a radiological model incorporating selected handcrafted radiomics and deep features extracted with ResNet50 achieved substantially higher performance: AUC = 0.933 and accuracy = 87.56% in the training set, and in the independent testing cohort an AUC = 0.713 (C-index = 0.713), accuracy = 72.16%, sensitivity = 60.00%, and specificity = 79.03% (Table 2, Supplementary Table 1). An integrated clinical-radiological model combining clinical variables with the radiological signature further improved discrimination, reaching AUC = 0.973 and accuracy = 92.00% in training, and AUC = 0.828 (C-index = 0.828), accuracy = 72.89%, sensitivity = 70.00%, and specificity = 74.52% in testing (Table 2, Supplementary Table 1). Both the radiological and the combined models showed good classification performance on ROC and precision-recall analyses (radiological: Figure 3A and B; clinical-radiological: Figure 3D and E) and exhibited satisfactory calibration (radiological: Figure 3C; clinical-radiological: Figure 3F).
Figure 3 Evaluation of predictive performances for the radiological and clinical-radiological models in prediction of early enlargement of spontaneous intracerebral hemorrhage.
A: Receiver operating characteristic curves for the predictive performance of the radiological model in the training and testing cohorts, respectively; B: Precision-recall plots for the predictive performance of the radiological model in the training and testing cohorts, respectively; C: Curves of the calibration analysis for the radiological model in the training and testing cohorts, respectively; D: Receiver operating characteristic curves for the predictive performance of the clinical-radiological model in the training and testing cohorts, respectively; E: Precision-recall plots for the predictive performance of the clinical-radiological model in the training and testing cohorts, respectively; F: Curves of the calibration analysis for the clinical-radiological model in the training and testing cohorts, respectively.
Table 2 Predictive performance of radiological models, clinical model and clinical-radiological model in prediction of early enlargement of spontaneous intracerebral hemorrhage on patients in the testing cohort.
Baseline characteristics were compared between patients with and without hospital mortality in the training and testing cohorts, respectively (Supplementary Table 5). No significant differences in baseline variables were found between survivors and non-survivors in either the training or the testing cohort (P > 0.05).
As a single predictor of hospital mortality, the presence of HE yielded AUCs of 0.969 and 0.546 and accuracies of 96.00% and 65.98% in the training and testing datasets, respectively (Table 3, Supplementary Table 6); the corresponding ROC curves, precision-recall plots, and calibration plots are shown in Supplementary Figure 4.
Table 3 Predictive performance of radiological models, clinical model and clinical-radiological model in prediction of hospital death on patients in the testing cohort.
Six radiomic features extracted from the primary hematoma and six from PHE were retained for the handcrafted radiomics models (Supplementary Table 7). In the training set, the hematoma- and PHE-derived models achieved AUC values of 0.970 and 0.996, with corresponding accuracy of 94.22% and 96.89%, respectively. However, performance declined on the independent test set: The hematoma model yielded an AUC of 0.600 and 70.10% accuracy, while the PHE model attained an AUC of 0.516 and 57.73% accuracy (Supplementary Tables 3 and 6). Combining hematoma- and PHE-derived features produced a composite handcrafted radiomics model with an AUC of 0.975 and 95.56% accuracy in the training cohort. On the test cohort this integrated model achieved an AUC (and C-index) of 0.603, accuracy 67.01%, sensitivity 50.00%, specificity 68.97%, PPV 15.63%, and NPV 92.31% (Supplementary Tables 3 and 6). The ROC curves, precision-recall curves, and calibration plots for the hematoma, PHE, and combined models based on whole-volume analysis are presented in Supplementary Figure 5.
We developed 18 predictive models based on deep-learning-derived imaging features: Six derived from hematoma regions, six from PHE, and six combining both feature sets (Supplementary Table 8). Across all hematoma- and PHE-specific models, AUC values in the training cohort were very high (0.977-0.998) but decreased substantially in the independent testing cohort (0.502-0.606) (Supplementary Tables 3 and 6). Corresponding ROC curves and calibration plots for both cohorts are presented in Supplementary Figure 6.
Among the combined-feature classifiers, the RN-SVM model demonstrated the best discrimination on the held-out data. Its performance metrics in the testing cohort were: AUC = 0.705 (training AUC = 0.982), concordance index = 0.705, accuracy = 64.95%, sensitivity = 60.00%, specificity = 65.52%, PPV = 16.67%, and NPV = 93.44% (Supplementary Tables 3 and 6). ROC curves and calibration plots comparing the RN-SVM to other combined models are shown in Supplementary Figure 7. Finally, to confirm the appropriateness of our feature-extraction strategy for ResNet50, we compared models built using features from earlier network layers with those using the final pre-fully connected layer (Supplementary Table 9); these results supported the superiority of features extracted from the last convolutional stage.
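For a binary outcome such as hospital death, the concordance index and the AUC measure the same quantity, which is why the two values reported above coincide. The following is a minimal sketch of that pairwise definition; the function name `concordance_auc` and the toy labels and scores are illustrative assumptions and are not taken from the study's pipeline or data.

```python
from itertools import product


def concordance_auc(labels, scores):
    """Fraction of (event, non-event) pairs in which the event case
    receives the higher score; tied scores count as half a pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7]
toy_auc = concordance_auc(toy_labels, toy_scores)
print(f"C-index / AUC on toy data: {toy_auc:.3f}")
```

In this toy example, eight of the nine event/non-event pairs are ranked correctly. The same pairwise count underlies both the C-index and the area under the ROC curve when the outcome is binary.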
A deep learning signature was derived from the combined RN-SVM model, and a handcrafted radiomics signature was generated from the features selected for hospital mortality. A radiological model combining the handcrafted radiomics and deep learning signatures achieved an AUC of 0.655, a C-index of 0.655, accuracy of 67.01%, sensitivity of 30.00%, and specificity of 71.26% in the testing cohort (Supplementary Figure 8). Furthermore, a clinical-radiological integrated model was constructed by incorporating the radiological signatures and the occurrence of HE (Figure 4A). This integrated model showed an AUC of 0.992 and accuracy of 96.00% in the training cohort, and an AUC of 0.754, a C-index of 0.754, accuracy of 71.13%, sensitivity of 60.00%, specificity of 72.41%, PPV of 20.00%, and NPV of 94.03% in the testing cohort (Table 3, Supplementary Table 6). The low PPVs and high NPVs reflect the low outcome prevalence: The hospital mortality rate was 10.2% in the training cohort and 10.3% in the testing cohort. We therefore additionally reported precision-recall curves and used decision-curve analysis as a clinically oriented metric. The clinical-radiological model showed good classification ability on the ROC curves and precision-recall plots (Figure 4B and C), with consistent calibration (Figure 4D) and satisfactory clinical benefit (Figure 4E).
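The dependence of PPV and NPV on outcome prevalence follows directly from Bayes' rule. The sketch below assumes only the testing-cohort sensitivity, specificity, and mortality rate reported above for the clinical-radiological model; the function name `predictive_values` is illustrative and not part of the study's code.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity and
    specificity applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Inputs reported for the clinical-radiological model (testing cohort).
ppv, npv = predictive_values(sensitivity=0.60, specificity=0.7241, prevalence=0.103)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")

# Hypothetical comparison: same test characteristics at 50% prevalence.
ppv_balanced, _ = predictive_values(0.60, 0.7241, 0.50)
print(f"PPV at 50% prevalence = {ppv_balanced:.1%}")
```

With the reported inputs, this gives a PPV near 20% and an NPV near 94%, in line with the values above. At a hypothetical prevalence of 50%, the same sensitivity and specificity would give a PPV above 60%, which illustrates that the low PPV is driven mainly by the rarity of the outcome.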
Figure 4 Predictive performance of the integrated nomogram model, combining hematoma expansion and the deep learning signature, for hospital death.
A: Nomogram model combining hematoma expansion and the deep learning signature generated from the best radiological model considering area under the receiver operating characteristic curve of the testing cohort; B: Receiver operating characteristic curves for the predictive performance of the integrated nomogram model in the training and testing cohorts, respectively; C: Precision-recall plots for the predictive performance of the integrated nomogram model in the training and testing cohorts, respectively; D: Curves of the calibration analysis for the integrated nomogram model in the training and testing cohorts, respectively; E: The decision curve analysis for the integrated nomogram model.
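Decision-curve analysis, as in panel E, compares the net benefit of acting on a model's predictions with the net benefit of treating all patients across a range of threshold probabilities. The following is a minimal sketch of the standard net-benefit formula, NB = TP/N - (FP/N) x pt/(1 - pt). The labels and predicted risks are hypothetical, none of the numbers come from the study, and the function names are illustrative.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of one false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical cohort of 10 patients: 1 = died in hospital, 0 = survived.
dca_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
dca_probs = [0.8, 0.1, 0.3, 0.6, 0.05, 0.2, 0.4, 0.1, 0.15, 0.25]

for pt in (0.1, 0.2, 0.3):
    model_nb = net_benefit(dca_labels, dca_probs, pt)
    all_nb = net_benefit_treat_all(dca_labels, pt)
    print(f"pt={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.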
DISCUSSION
In this study, we characterized the heterogeneity of the hematoma and PHE and developed three kinds of prediction models for HE using hematoma-derived, PHE-derived, and combined features from handcrafted radiomics and deep learning analysis. The clinical-radiological prediction model for HE, which integrated clinical information with hematoma- and PHE-derived CT features, outperformed the purely radiological models. In addition, a clinical-radiological model for the prediction of hospital mortality showed satisfactory performance. The present study generated a reliable and easy-to-use tool to identify patients at high risk of HE and/or hospital mortality at an early stage, which may help guide the choice of an appropriate therapeutic regimen.
Previous studies have demonstrated a relationship between several subjective NCCT biomarkers, such as the swirl sign, blend sign, and island sign, and the occurrence of HE. These biomarkers essentially describe the heterogeneity of the hematoma on NCCT images, in that they reflect hypo- and/or iso-attenuating ROIs within the hyper-attenuating hematoma[41]. There is plausible evidence explaining the association between hematoma heterogeneity and HE. The hyper-attenuating ROIs can be interpreted as coagulated and contracted clots, and the hypo- and/or iso-attenuating ROIs as fresh bleeding[42]. The heterogeneity of the hematoma increases with the growth of hypo- and/or iso-attenuating regions, which indicates more active bleeding points within the hematoma. Continued active bleeding results in the rupture of peripheral vessels and expansion[24], which might explain the occurrence of HE pathologically. However, these subjective signs have shortcomings in characterizing hematomas, including relatively low accuracy and a lack of standardized criteria[43]. Therefore, quantitative measurement of ROIs in ICH is still considered a reliable solution for further clinical use.
With the development of computer-aided diagnosis (CAD), an increasing number of quantitative techniques have been used to predict clinical outcomes in ICH. Initially, CT densitometry was applied to extract quantitative features for predicting ICH growth[44,45]. With the development of texture analysis, handcrafted radiomics features have achieved satisfactory prediction performance in ICH. Various extraction filters have been used to comprehensively evaluate variance, uniformity, and heterogeneity on NCCT and thereby strengthen the link with HE[16,17]. In this study, we also used handcrafted radiomics features, and the resulting radiological models showed relatively low performance (AUC: 0.635, accuracy: 54.64%) compared with the results of a previous study (AUC: 0.729, accuracy: 72.6%). However, radiomics analysis using handcrafted features is still limited by inter-observer differences and the variety of extractors, which hinders standardization. An automatic segmentation approach for detecting ROIs has therefore been sought to interpret associations with clinical outcomes in ICH. Recently, deep learning, as an advanced branch of CAD, has been expected to achieve automated delineation and enhance the efficiency of prediction models in medical imaging. However, training a deep CNN from scratch on a limited, institution-specific dataset often leads to pronounced overfitting, particularly for narrowly defined clinical tasks where labeled medical images are scarce. A practical remedy is to use pre-trained CNNs as feature extractors: Transfer learning enables representations learned from large-scale source datasets to be adapted to small medical-image cohorts, thereby improving model generalizability and facilitating reproducible validation and replication efforts[22]. In previous literature, the efficacy of deep learning (DL)-based transfer learning has been demonstrated in ICH growth prediction, with satisfactory predictive results[46-48].
The internal operations of these pretrained CNNs were not directly interpretable for explaining how radiological features were generated. Therefore, we visualized the networks’ feature maps to localize the subregions that contributed most to feature formation and converted those activations into highlighted areas. This visualization demonstrated that deep learning methods can effectively identify image patterns associated with hematoma and PHE.
In this study, we compared radiological models based on handcrafted radiomics features with those based on automatic deep learning features for the prediction of HE. The models based on deep learning features achieved better discriminative performance than those based on handcrafted radiomics features. We also expected to improve predictive ability by integrating handcrafted radiomics features and deep learning features, but the actual results were unsatisfactory, which might be attributable to underlying contradictions between these two kinds of radiological features. In addition, hospital mortality was defined as the secondary outcome in our study because mortality is an important endpoint and warranted further investigation to explore the efficiency of our radiological models. We considered the effect of HE on hospital mortality and then generated a clinical-radiological model for the prediction of hospital mortality, which might be an effective complement to conventional baseline evaluation. All CAD models for the prediction of HE and/or hospital mortality assessed the intact lesions to quantify the heterogeneity of the hematoma and/or PHE and thus decrease uncertainty, and they provided plausible suggestions for optimizing personalized treatment at admission, as early as possible after onset. Combining handcrafted radiomics and deep learning features reduced HE testing performance relative to DL alone. Potential reasons include: (1) Redundancy or contradiction between feature types, which caused the wrapper selection to include conflicting signals; (2) Addition of noisy radiomics features, which increased dimensionality and overfitting; and (3) Differences in information scale (local texture vs high-level learned patterns), which meant that naive concatenation harmed model generalization.
Our research group performed a pilot ablation analysis, which showed that: (1) The concatenated feature set without correlation pruning led to worse testing performance; (2) The concatenated set with aggressive correlation removal (r > 0.8) partially recovered performance; and (3) Supervised dimensionality reduction of the concatenated features did not outperform DL alone. Based on these results, we prioritized DL-only radiological signatures for HE while retaining the combined approach in exploratory analyses. For hospital mortality, the radiological signature with the best testing AUC (RN-SVM) was selected and integrated with the occurrence of HE and clinical variables. A combined radiomics plus DL signature was considered but performed worse on internal testing, so the final mortality model used the best-performing radiological signature to maximize generalizability. The low PPVs and high NPVs reflect the low outcome prevalence: The hospital mortality rate was 10.2% in the training cohort and 10.3% in the testing cohort. We therefore additionally reported the models’ performance with precision-recall plots and balanced accuracy in the main tables and used decision-curve analysis as a clinically oriented metric.
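Correlation pruning of a concatenated feature set can be illustrated with a simple greedy filter. The sketch below is an assumption about how such a step could be implemented, not the authors' actual procedure: it uses the absolute Pearson correlation, keeps features in their original order, and applies the 0.8 cutoff mentioned above. The feature names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no linear correlation
    return cov / sqrt(vx * vy)


def prune_correlated(features, threshold=0.8):
    """Keep a feature only if its |r| with every feature already kept
    does not exceed the threshold (greedy, order-dependent)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical concatenated feature set (one value per patient).
toy_features = {
    "dl_feat_1": [0.1, 0.4, 0.35, 0.8, 0.9],
    "radiomics_entropy": [0.2, 0.8, 0.7, 1.6, 1.8],    # 2 x dl_feat_1, r = 1
    "radiomics_sphericity": [0.5, 0.1, 0.9, 0.3, 0.4],  # weakly correlated
}
selected = prune_correlated(toy_features)
print(selected)
```

Because the filter is greedy, the result depends on feature order. Placing the deep learning features first, as in this toy example, means that a redundant radiomics feature is dropped rather than the DL feature it duplicates.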
This study has some limitations. First, selection bias might have been introduced by the strict inclusion criteria and small sample size. We included only a small subset of ICH cases, which might limit the clinical use of the radiological model. Since our testing cohort was internal and single-center, external validation is required in further research. We plan a multi-center external validation on independent NCCT datasets, including lobar and cerebellar ICH, to assess generalizability beyond basal ganglia ICH. Second, we only chose short-term clinical outcomes, which need to be supplemented with long-term clinical outcomes and quality of life after discharge. In addition, despite the use of transfer learning, a mismatch remains between the domains of the pre-trained CNN backbones and the target datasets. An optimal remedy would be the creation of large, task-specific medical imaging repositories with comprehensive expert annotations, enabling training of CNNs from scratch and thereby enhancing generalizability and clinical utility. Moreover, our approach to evaluating feature robustness has limitations: Contour-based ROI methods may be somewhat less reliable for repeatability assessment than test-retest imaging protocols[49,50]. The test-retest framework is more suitable for establishing feature robustness in prospective studies, which our team is currently planning to undertake. In the present study, only short-term hospital mortality was used as a secondary outcome because long-term functional outcomes were not available for all patients. We will collect and analyze functional outcomes, such as the modified Rankin scale at discharge and at 3 months, in future prospective cohorts to link predicted HE risk to functional outcomes and decision-making for surgical intervention.
For now, we phrase our conclusions cautiously: The present models might assist early risk stratification but should not by themselves determine surgical decisions until they are validated against functional outcomes.
CONCLUSION
In conclusion, this study validated radiological models based on handcrafted radiomics features and automatic deep learning features for the prediction of HE. The clinical-radiological model for the prediction of HE achieved satisfactory discrimination performance. In addition, an integrated model for hospital mortality showed good prediction performance and may provide valuable prognostic information. We believe that our radiological models, after further validation, may assist treatment and surveillance for patients with ICH to optimize patients’ outcomes.
Footnotes
Provenance and peer review: Invited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Radiology, nuclear medicine and medical imaging
Country of origin: China
Peer-review report’s classification
Scientific Quality: Grade C
Novelty: Grade B
Creativity or Innovation: Grade C
Scientific Significance: Grade C
P-Reviewer: Qu CS, Chief, China S-Editor: Bai Y L-Editor: A P-Editor: Lei YY
Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, McKeown A, Yang G, Wu X, Yan F, Dong J, Prasadha MK, Pei J, Ting MYL, Zhu J, Li C, Hewett S, Dong J, Ziyar I, Shi A, Zhang R, Zheng L, Hou R, Shi W, Fu X, Duan Y, Huu VAN, Wen C, Zhang ED, Zhang CL, Li O, Wang X, Singer MA, Sun X, Xu J, Tafreshi A, Lewis MA, Xia H, Zhang K. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. 2018;172:1122-1131.e9.
Boulouis G, Morotti A, Brouwers HB, Charidimou A, Jessel MJ, Auriel E, Pontes-Neto O, Ayres A, Vashkevich A, Schwab KM, Rosand J, Viswanathan A, Gurol ME, Greenberg SM, Goldstein JN. Association Between Hypodensities Detected by Computed Tomography and Hematoma Expansion in Patients With Intracerebral Hemorrhage. JAMA Neurol. 2016;73:961-968.
Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2014 Preprint. Available from: arXiv: 1409.1556.
He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2015 Preprint. Available from: arXiv: 1512.03385.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016 Jun 27-30; Las Vegas, NV. New York: IEEE Xplore, 2016: 2818-2826.
Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In: Singh S, Markovitch S. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2017 Feb 4-9; San Francisco, California. New York: ACM, 2017: 4278-4284.
Stein J, Huerta K. When looking at a non-contrast head CT, what actually appears white in an acute hemorrhagic stroke? Cal J Emerg Med. 2002;3:70-71.
Tran AT, Zeevi T, Haider SP, Abou Karam G, Berson ER, Tharmaseelan H, Qureshi AI, Sanelli PC, Werring DJ, Malhotra A, Petersen NH, de Havenon A, Falcone GJ, Sheth KN, Payabvash S. Uncertainty-aware deep-learning model for prediction of supratentorial hematoma expansion from admission non-contrast head computed tomography scan. NPJ Digit Med. 2024;7:26.