Yang YH, Li Y. Magnetic resonance imaging-based deep-learning radiomics score for survival prediction and risk stratification in pediatric hepatoblastoma receiving surgical resection. World J Radiol 2026; 18(1): 115503 [DOI: 10.4329/wjr.v18.i1.115503]
Corresponding Author of This Article
Yuan Li, Laboratory of Digestive Surgery, State Key Laboratory of Biotherapy and Cancer Center, Department of Pediatric Surgery, West China Hospital, Sichuan University, No. 37 Guoxue Alley, Chengdu 6100000, Sichuan Province, China. l13258389785@126.com
Research Domain of This Article
Pediatrics
Article-Type of This Article
Retrospective Cohort Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Journal Information of This Article
Publication Name
World Journal of Radiology
ISSN
1949-8470
Publisher of This Article
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
World J Radiol. Jan 28, 2026; 18(1): 115503 Published online Jan 28, 2026. doi: 10.4329/wjr.v18.i1.115503
Magnetic resonance imaging-based deep-learning radiomics score for survival prediction and risk stratification in pediatric hepatoblastoma receiving surgical resection
Yu-Han Yang, West China Hospital, Sichuan University, Chengdu 6100041, Sichuan Province, China
Yuan Li, Laboratory of Digestive Surgery, State Key Laboratory of Biotherapy and Cancer Center, Department of Pediatric Surgery, West China Hospital, Sichuan University, Chengdu 6100000, Sichuan Province, China
Author contributions: Yang YH and Li Y contributed to study conception and design; Yang YH contributed to data acquisition, analysis, and data interpretation, drafting of the manuscript; Li Y contributed to critical revision.
Institutional review board statement: This retrospective study involving human participants was reviewed and approved by the Institutional Review Boards of the West China Hospital, Sichuan University and the First Hospital of Liangshan. All procedures were conducted in accordance with the ethical standards of the institutional and/or national research committees and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent statement: The requirement for written informed consent was waived by the institutional review boards of both participating institutions because the study was retrospective, used existing clinical and imaging records, and analyzed de-identified data.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: De-identified individual participant data that underlie the results reported in this article are available from the corresponding author upon reasonable request. Data sharing is subject to approval by the relevant institutional review boards and execution of a data-use agreement to ensure protection of patient privacy and compliance with applicable regulations. Due to institutional policies and patient privacy considerations, raw imaging data or any data containing potentially identifying information will not be publicly released.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
Corresponding author: Yuan Li, Laboratory of Digestive Surgery, State Key Laboratory of Biotherapy and Cancer Center, Department of Pediatric Surgery, West China Hospital, Sichuan University, No. 37 Guoxue Alley, Chengdu 6100000, Sichuan Province, China. l13258389785@126.com
Received: October 20, 2025 Revised: November 12, 2025 Accepted: December 15, 2025 Published online: January 28, 2026 Processing time: 100 Days and 14.1 Hours
Abstract
BACKGROUND
Hepatoblastoma (HB) in children remains highly heterogeneous, with distinct survival outcomes among individuals after surgical resection. It is therefore essential to identify high-risk patients with poor outcomes before surgery so that appropriate neoadjuvant chemotherapy can be added to improve prognosis.
AIM
To evaluate the performance of a deep learning (DL)-based radiomics (DLBR) score in predicting event-free survival (EFS) in patients with early-stage HB who underwent surgical resection.
METHODS
A total of 106 patients who underwent magnetic resonance imaging and surgical excision at two hospitals were retrospectively included and assigned to the training cohort (n = 74) from one institution and the testing cohort (n = 32) from the other institution. Widely adopted clinicopathologic variables were collected, and magnetic resonance imaging-derived DL-based features were extracted after automatic segmentation. We developed a DLBR score based on the DL-based features and an integrated clinical-DL nomogram model, and validated both externally.
RESULTS
The DLBR score incorporated four DL-based features, including three T1-derived features and one T2-derived feature. The integrated clinical-DL nomogram was constructed from the Pretreatment Extent of Disease stage, alpha-fetoprotein concentration, and the DLBR score. The integrated nomogram showed better prognostic and calibration abilities and lower prediction error than the clinicopathologic predictors alone and the DLBR score alone in both training and external validation. Additionally, the DLBR score stratified HB patients into two EFS-related risk subgroups, and distinguished patients with different survival outcomes within identical subgroups of clinical predictors.
CONCLUSION
The DLBR score served as a noninvasive and reliable tool for predicting EFS in early-stage HB patients receiving surgical resection, and might guide therapeutic planning to improve prognosis.
Core Tip: In the present research, we generated a deep learning (DL)-based radiomics score for predicting event-free survival in children with hepatoblastoma receiving surgical resection at two institutions, and developed an integrated clinical-DL nomogram based on widely accepted clinicopathologic predictors and the DL-based radiomics score. On external validation, the integrated nomogram predicted event-free survival better than the clinicopathologic predictors alone or the DL-based radiomics score alone, which might guide therapeutic interventions. Further research is needed to validate its risk identification performance and to improve its clinical practicability and generalizability.
Citation: Yang YH, Li Y. Magnetic resonance imaging-based deep-learning radiomics score for survival prediction and risk stratification in pediatric hepatoblastoma receiving surgical resection. World J Radiol 2026; 18(1): 115503
INTRODUCTION
Hepatoblastoma (HB) is the most common primary hepatic malignancy in childhood, with an incidence of 1.5 cases per million people worldwide[1]. For early-stage HB without distant metastasis, complete resection of the primary hepatic lesions is considered the first-line treatment[2]. However, HB lesions remain highly heterogeneous, with distinct survival outcomes among individuals after surgical resection[3]. It is therefore essential to identify high-risk patients with poor outcomes before surgery so that appropriate neoadjuvant chemotherapy can be added to improve prognosis[4].
Several prognosis-related risk stratification systems have been established by cooperative research groups in prospective studies, including the Children’s Oncology Group, the International Childhood Liver Tumors Strategy Group, the German Society for Pediatric Oncology and Hematology, the Children’s Hepatic tumors International Collaboration (CHIC), and the Japanese Study Group for Pediatric Liver Tumors[5-8]. These systems support personalized therapeutic strategies based on pretreatment risk stratification[3,9]. Notably, the Pretreatment Extent of Disease (PRETEXT) system introduced by the International Childhood Liver Tumors Strategy Group is a major advance in the imaging-based preoperative evaluation of HB[10]. A risk stratification model integrating the PRETEXT system, alpha-fetoprotein (AFP) concentration, and imaging-derived annotation factors has shown significant prognostic value in large pooled trial data from cooperative groups[3]. Nevertheless, visual imaging-based assessment of the PRETEXT stage has shown unsatisfactory accuracy and a tendency toward overstaging[11]. Moreover, the current model discriminates insufficiently among HB patients when selecting appropriate therapeutic options, because it was developed from low-dimensional clinical and qualitative imaging data only and ignores the high-dimensional features contained in the imaging data[12]. The risk stratification system could therefore be improved by incorporating high-dimensional imaging features.
Medical imaging plays a vital role in supporting clinical decisions, especially since the emergence of radiomics. Radiomics analysis extracts high-throughput quantitative features from medical images to characterize tumors, and has benefited from a growing range of feature screening methods and improved model architectures[13]. Radiomics models that incorporate deep learning (DL) features automatically extracted by convolutional neural networks (CNNs), referred to as DL-based radiomics (DLBR) models, have outperformed conventional radiomics models in evaluating tumor prognosis by capturing tumor heterogeneity[14,15]. Identifying HB heterogeneity is crucial for accurate risk stratification and prognosis prediction[12]. However, the current system provides inadequate evidence of tumoral heterogeneity, as prognosis varies markedly among HB patients with an identical PRETEXT stage[16]. DLBR methods might therefore help to predict HB prognosis and to distinguish low- and high-risk patients within an identical subgroup of widely accepted factors. In the present study, we aimed to evaluate the prognostic value of a DLBR score derived from magnetic resonance imaging (MRI) for predicting event-free survival (EFS) in patients with early-stage HB receiving surgical resection. In addition, we incorporated the widely accepted risk stratification system to assess the additional value of the DLBR score and to construct an integrated clinical-imaging model for model comparison.
MATERIALS AND METHODS
Patients
This retrospective cohort study was approved by the Institutional Review Boards of West China Hospital, Sichuan University (Approval No. HX-20230202) and the First Hospital of Liangshan (Approval No. LSZDY-202306-0090). The requirement for written informed consent was waived by both Institutional Review Boards owing to the retrospective nature of the study. We retrospectively recruited pediatric HB patients who underwent complete surgical resection of primary lesions at two hospitals from January 2009 to June 2019. Histopathological confirmation of HB was obtained from the surgical specimens, which were reviewed by two experienced pathologists who determined the histologic subtype by consensus. The inclusion criteria were as follows: (1) Age less than 18 years; (2) HB diagnosed with histopathologic evidence; (3) Complete surgical resection only, without neoadjuvant and/or adjuvant treatment; (4) Preoperative MRI within 2 weeks before surgery; and (5) Available medical records and follow-up data. The exclusion criteria were as follows: (1) Receipt of other therapy; (2) Inadequate clinical or imaging data; and (3) Low-quality MRI images. Finally, 106 patients were included in the present study: 74 patients from one institution were assigned to the training cohort for model development, and 32 patients from the other institution were assigned to the testing cohort for external validation.
We collected clinical information, including sex, age, tumor size, AFP concentration, and annotation factors, together with preoperative MRI images for subsequent analysis. The serum AFP concentration was classified into two groups: ≤ 1000 ng/mL and > 1000 ng/mL. Tumor stage and annotation factors were reclassified according to the 2017 PRETEXT system with respect to the presence of vascular involvement (V, hepatic vein/inferior vena cava; P, portal vein), multifocality (F), tumor rupture (R), extrahepatic tumor extension (E), involvement of the caudate lobe (C), lymph-node metastases (N), and distant metastases (M)[3]. Distant metastatic status was evaluated by chest computed tomography (CT) and brain MRI before treatment. Tumor size was recorded as the maximum tumor size on preoperative MRI; in cases with multifocality, the largest lesion was used.
Follow-up
In the present study, EFS was the primary endpoint, defined as the period from the date of surgery to the date of recurrence, development of a second malignancy, disease progression, death, or the last follow-up[3]. Patients underwent MRI and/or CT follow-up examinations every 3-6 months for the first 2 years after surgery and every 6 months thereafter. The duration of follow-up ranged from 36 months to 122 months, with a median of 76 months. Follow-up was censored in May 2022.
MRI sequence
All eligible patients underwent preoperative conventional MRI, including axial T1WI and axial FS-T2WI acquisitions. Scanning was performed on GE HDx 3.0T (GE Medical Systems) and Siemens 3.0T Magnetom Skyra scanners. The repetition time was 500-600 ms and the echo time was 10-15 ms for the T1WI sequence, and the repetition time was 2400-4500 ms and the echo time was 70-120 ms for the FS-T2WI sequence. Both sequences used a slice thickness of 3-5 mm, a slice spacing of 1 mm, a matrix of 320 × 320, and a field of view of 200-400 mm. Conventional T1WI and T2WI sequences were used in the present study because they were consistently available across scanners and centers in our retrospective cohort, facilitating multicenter analysis. Diffusion-weighted imaging (DWI) and perfusion/contrast-enhanced sequences were acquired variably, with heterogeneous protocols across centers. Given their inconsistent availability and parameters, DWI and dynamic contrast-enhanced sequences were excluded to avoid introducing additional acquisition-related variability. Likewise, not all MRI examinations were contrast-enhanced, so contrast-enhanced images were not used for DL feature extraction owing to heterogeneity in timing and sequence parameters. To address multicenter variability, all images were resampled to 1 mm isotropic voxels using SimpleITK vX, and intensity harmonization was applied using Nyul standardization. Subsequent grey-level discretisation used a fixed bin width of 25. ComBat harmonization was then applied to the extracted features to further remove scanner effects. Feature consistency was verified by computing the coefficient of variation and the intraclass correlation coefficient across scanners before and after harmonization to quantify the improvement.
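The fixed-bin-width discretisation and the coefficient of variation mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy intensities, and the choice to anchor bins at the minimum intensity of the region are assumptions made for illustration only.

```python
import math
import statistics


def discretise_fixed_bin_width(intensities, bin_width=25.0):
    """Map intensities to 1-based grey-level bins of a fixed width.

    Assumption: bins are anchored at the minimum intensity of the region;
    the paper does not state which anchor was used.
    """
    lowest = min(intensities)
    return [int(math.floor((value - lowest) / bin_width)) + 1 for value in intensities]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the values."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical ROI intensities on the 0-255 scale.
roi = [0, 10, 24, 25, 49, 50, 255]
print(discretise_fixed_bin_width(roi))  # [1, 1, 1, 2, 2, 3, 11]

# Hypothetical values of one feature measured on three scanners.
print(round(coefficient_of_variation([10, 12, 14]), 4))
```

Comparing the coefficient of variation of each feature across scanners before and after harmonization is one simple way to quantify the improvement described in the text.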
Tumor segmentation and feature extraction
The workflow of this study is shown in Figure 1. The regions of interest (ROIs) of HB lesions were segmented automatically. Before segmentation, the MRI image stack was pre-processed to restrict it to the target anatomical region so that the segmentation algorithm could run more efficiently. A bounding box covering the target anatomical region along the image stack was created by a thresholding operation, and all slices were cropped to it programmatically. Next, images were normalized to 256 grey levels (0-255) to facilitate further processing. Automatic tumor segmentation was performed by unsupervised clustering-based algorithms, including simple linear iterative clustering superpixels (SLIC-S)[17] and fuzzy c-means clustering (FCM)[18]. SLIC-S generates supervoxels by clustering voxels according to intensity similarity and in-plane proximity[17]. FCM uses fuzzy partitioning of voxel intensities to assign voxels to a specified number of clusters, locating cluster centers such that the dissimilarity measure is minimized[18]. Tumor regions appeared hyperintense, with two to three distinct intensity levels within the tumor depending on its heterogeneity. Healthy muscle tissue, bone, and background showed lower signal intensity. Both SLIC-S and FCM were applied to the MRI images to generate clusters of voxels according to intensity similarity, and hyperintense clusters were selected programmatically to segment the tumor volume[19]. Automatic segmentation results were visually checked by two pediatric radiologists (5 years and 7 years of experience). A total of 12 cases (approximately 11% of the cohort) were randomly selected, and the automatic segmentation was compared against manual contours [mean Dice coefficient = 0.906 (SD = 0.167)].
Where gross segmentation failure was identified (Dice < 0.6), manual correction was applied prior to feature extraction for segmentation quality control.
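The quality-control rule above can be sketched as follows. This is an illustrative example rather than the study code: the function names and toy masks are hypothetical, and masks are represented as flat lists of 0/1 voxel labels for simplicity.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A and B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


def needs_manual_correction(dice, threshold=0.6):
    """Flag gross segmentation failure using the Dice < 0.6 rule from the text."""
    return dice < threshold


# Hypothetical automatic and manual masks for six voxels.
automatic = [0, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 0, 0, 0]
score = dice_coefficient(automatic, manual)
print(score, needs_manual_correction(score))  # 0.8 False
```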
Figure 1 Flowchart of deep learning features extraction, and deep learning-based radiomics score generation and related prognostic models construction for predicting event-free survival in hepatoblastoma patients receiving surgical resection.
A: Flowchart of deep learning features extraction; B: Related prognostic models construction. DL: Deep learning; MR: Magnetic resonance; DLBR: Deep learning-based radiomics.
DL features were extracted from the automatically segmented ROIs using a ResNet CNN architecture with 34 layers (ResNet34) that was pretrained on ImageNet and fine-tuned on our HB MR image ROIs[20]. ResNet34 pretrained on ImageNet was used as a frozen feature extractor, with the classifier head not retrained. The ResNet34 model was implemented in PyTorch 1.4.0 using transfer learning. ROIs were resized to 224 × 224 to match the network’s input size, with flipping and rotation applied. The model was trained by updating the network weights using a cross-entropy loss function and the adaptive moment estimation optimizer with a learning rate of 0.001 for 1000 epochs and a batch size of 16. The network with the highest accuracy was then fixed and used as the DL-based feature extractor, and 512 DL-based features were extracted from the penultimate layer of the ResNet34 CNN for each patient. The same strategy was applied separately to the T1WI and T2WI sequences. After elimination of null features, 93 DL-based features (40 from the T1WI sequence and 53 from the T2WI sequence) remained for subsequent analysis. To mitigate the risk of overfitting due to the sample size, we performed internal validation using 5-fold cross-validation for feature selection and model training and computed average performance metrics [concordance index (C-index), area under the receiver operating characteristic curve] across folds. In addition, bootstrap resampling (n = 1000) was used to estimate the optimism-corrected C-index and confidence intervals (CIs) for the training cohort. Post-hoc power for the observed C-index was calculated assuming alpha = 0.05.
Predictive signature generation
All DL-based features were standardized to z scores using the mean and standard deviation, separately for the training and testing cohorts. A ComBat compensation method was used to filter out features that were inconsistent across different MRI equipment and scanning parameters and to retain the better-performing features[21]. All 93 DL-based features were entered into the construction of the predictive signature, the DLBR score. Feature selection was performed in the training cohort with a three-step procedure. First, features associated with EFS were identified by univariate Cox regression analysis, and those with P < 0.05 were retained. Second, the minimum redundancy maximum relevance algorithm was used to select the top 10 features with the strongest relevance and the least redundancy. Finally, further screening of the features and generation of the DLBR score were performed by the least absolute shrinkage and selection operator Cox regression algorithm. The number of events was also reported to ensure that the final prognostic model respected recommended events-per-variable guidance: The integrated nomogram included 34 events and 3 predictor degrees of freedom. Additionally, internal cross-validation and bootstrap analyses were performed to assess the stability of the selected features and the predictive metrics. The features selected in the training cohort were applied to the testing cohort following the same procedure. We used X-tile (version 3.6.1) to identify the optimal cutoff value of the DLBR score in the training cohort, by which patients in both the training and testing cohorts were divided into low- and high-score groups[22].
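As a rough illustration of how a score of this kind is typically assembled, the sketch below standardizes features to z scores, forms a weighted sum of the selected features (the linear predictor of a penalized Cox model), and dichotomizes it at a cutoff. The feature names and the cutoff of 0.0 come from the Results; the coefficients and feature values are hypothetical placeholders, and assigning scores strictly above the cutoff to the high-score group is an assumption.

```python
import statistics


def z_scores(values):
    """Standardize values using their own sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


def dlbr_score(features, coefficients):
    """Weighted sum of standardized features (Cox linear predictor)."""
    return sum(coefficients[name] * features[name] for name in coefficients)


def risk_group(score, cutoff=0.0):
    """Assumption: scores strictly above the cutoff form the high-score group."""
    return "high" if score > cutoff else "low"


# Hypothetical coefficients; the fitted values are not given in the text.
coefficients = {"T1-DL175": 0.5, "T2-DL88": -0.25, "T2-DL117": 0.125, "T2-DL281": 0.25}
# Hypothetical standardized feature values for one patient.
patient = {"T1-DL175": 1.0, "T2-DL88": 2.0, "T2-DL117": 0.0, "T2-DL281": -1.0}

score = dlbr_score(patient, coefficients)
print(score, risk_group(score))  # -0.25 low
print(z_scores([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```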
Model development and validation
Clinical characteristics with statistical significance were selected by univariate Cox regression analysis in the training cohort. A clinical model based on the selected characteristics was then built by multivariate Cox regression analysis with backward stepwise elimination using the Akaike information criterion for both the training and testing cohorts. In addition, we constructed an integrated nomogram model for the prediction of EFS, combining the statistically significant clinical characteristics and the DLBR score, for both the training and testing cohorts. The EFS probabilities of different groups of clinical factors and of the DLBR score were estimated using the Kaplan-Meier method and compared using the log-rank test. The survival probabilities of the DLBR score groups were also evaluated within different risk groups.
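For readers unfamiliar with the Kaplan-Meier method, a minimal product-limit estimator is sketched below. The analysis in this study was performed in R; this Python version, its function name, and the toy follow-up data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if an EFS event was observed at times[i], 0 if censored.
    Patients censored at a given time are counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((current_time, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in months and event indicators.
months = [6, 12, 12, 30, 48]
observed = [1, 0, 1, 1, 0]
for time_point, probability in kaplan_meier(months, observed):
    print(time_point, round(probability, 3))
```

With these toy data the estimated survival is 0.8 at 6 months, 0.6 at 12 months, and 0.3 at 30 months; curves for the low- and high-score groups would be estimated separately and compared with the log-rank test.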
Statistical analysis
We report the number of EFS events: In the full cohort, n-events = 34 (34/106, 32.08%); training cohort n-events = 23 (23/74, 31.08%); testing cohort n-events = 11 (11/32, 34.38%). Given the mild imbalance, we used standard Cox regression for time-to-event modelling. Feature selection used univariate Cox filtering, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator-Cox regression, the last of which provides inherent regularization. In sensitivity analyses, we re-ran the least absolute shrinkage and selection operator with event-weighting to examine the stability of the selected features, and the results were consistent. Categorical variables are presented as counts and corresponding proportions, and continuous variables as median values with corresponding interquartile range (IQR). Baseline comparisons of clinical information and DLBR score between the training and testing cohorts were performed using the Mann-Whitney U test for continuous variables and the Fisher exact test or χ2 test for categorical variables. The predictive accuracy of different risk factors was assessed by the C-index and time-dependent receiver operating characteristic (ROC) analysis at time points ranging from 1 month to 90 months at intervals of 1 month[23]. The integrated Brier score (IBS), which reflects the distance between the predicted and observed survival probabilities, was used to evaluate prediction errors with the “Boot632plus” splitting method[24]. A lower IBS indicates better overall model performance. Calibration intercepts and slopes were estimated at 3-year and 5-year EFS using bootstrap validation (1000 resamples). Time-specific Brier scores and IBSs were computed, and all metrics and their 95%CIs were obtained by bootstrap.
Reclassification between the clinical and integrated models was assessed using continuous net reclassification improvement and integrated discrimination improvement with 95%CIs from 1000 bootstrap resamples. Decision-curve analysis was conducted for 5-year EFS to evaluate clinical usefulness by comparing the net benefit of the clinical model vs the integrated nomogram across threshold probabilities of 5%-30%. Statistical analysis was performed using R software (version 3.6.3, available from: https://www.r-project.org/), and two-tailed P values less than 0.05 were considered statistically significant.
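The C-index used throughout the analysis can be illustrated with a small sketch of Harrell's concordance index for right-censored data. This is a simplified illustration, not the R routine used in the study: the function name and toy data are hypothetical, and pairs with tied follow-up times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. The pair is concordant when the
    patient with the earlier event has the higher risk score; ties in risk
    score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators, and risk scores.
months = [5, 10, 15, 20]
observed = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(months, observed, risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index values should be read.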
RESULTS
Baseline characteristics
No statistically significant differences in clinical characteristics were found between the training and testing cohorts (all P > 0.05; Table 1). There was also no significant difference in DLBR score between the two cohorts, with a median of 0.002 (IQR: 0.050) in the training cohort and a median of 0.002 (IQR: 0.058) in the testing cohort (P = 0.625). Internal validation via 5-fold cross-validation yielded a mean C-index of 0.65 (SD = 0.04). The apparent C-index in the training cohort was 0.669 and the optimism estimated from 1000 bootstraps was 0.021, yielding an optimism-corrected C-index of 0.648 (95%CI: 0.611-0.685), which suggested limited optimism from model fitting. Post-hoc power to detect a C-index of 0.67 vs the null value of 0.5 was 32% in the training cohort and 34% in the testing cohort, indicating that the external validation (n = 32) was underpowered and that CIs were wide (C-index = 0.696, 95%CI: 0.688-0.704).
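The optimism correction reported above follows the usual bootstrap arithmetic, written out here as a sketch. The notation is an assumption introduced for illustration: B is the number of bootstrap resamples, C_boot is the C-index of a model refitted and evaluated on the b-th resample, and C_orig is the C-index of that same refitted model evaluated on the original training cohort.

```latex
\begin{align*}
\bar{O} &= \frac{1}{B}\sum_{b=1}^{B}\left(C^{(b)}_{\mathrm{boot}} - C^{(b)}_{\mathrm{orig}}\right), \qquad B = 1000,\\
C_{\mathrm{corrected}} &= C_{\mathrm{apparent}} - \bar{O} = 0.669 - 0.021 = 0.648.
\end{align*}
```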
Table 1 Demographic characteristics of hepatoblastoma patients receiving surgical resection in the training and testing cohorts, n (%).
Four features were selected to generate the DLBR score: T1-DL175 from the T1WI sequence, and T2-DL88, T2-DL117, and T2-DL281 from the T2WI sequence. Two clinical parameters, PRETEXT stage and serum AFP concentration, were identified as significant clinical predictors of EFS by univariable Cox regression analysis in the training cohort. These two clinical variables were combined to construct the clinical model. The PRETEXT stage and serum AFP concentration were also statistically significant in the testing cohort. The integrated clinical-DL model was constructed from the two significant clinical predictors (PRETEXT stage and serum AFP concentration) and the DLBR score, and is presented as a nomogram to estimate individualized risks (Figure 2A).
Figure 2 Nomogram and Kaplan-Meier plots of deep learning-based radiomics score for event-free survival in hepatoblastoma patients receiving surgical resection.
A: Nomogram developed by significant clinical variables and deep learning-based radiomics score to predict event risks in the training cohort; B-D: Kaplan-Meier plots of deep learning-based radiomics score (B), and stratified by 2017 PRE-Treatment EXTent of tumor stage (C) and serum alpha-fetoprotein concentration (D) on event-free survival compared by log-rank tests in the training (left) and testing (right) cohorts, respectively. PRETEXT: 2017 PRE-Treatment EXTent of tumor; AFP: Alpha-fetoprotein; DLBR: Deep learning-based radiomics.
The predictive performance of clinical variables, DLBR score, and different models in the training and testing cohorts is shown in Table 2 and Figure 2B-D. The C-index of the DLBR score was 0.610 (95%CI: 0.599-0.620) in the training cohort and 0.642 (95%CI: 0.633-0.652) in the testing cohort. The C-index of the integrated nomogram was 0.669 (95%CI: 0.661-0.677) in the training cohort and 0.696 (95%CI: 0.688-0.704) in the testing cohort. Calibration plots of all predictors and models for predicting 3-year and 5-year EFS of HB patients are shown in Figure 3A and B. The DLBR score, the clinical model, and the integrated nomogram model showed good calibration, with consistency between observed and predicted results. The results of the time-dependent ROC analysis are shown in Table 3 and Figure 3C. According to this analysis, the area under the receiver operating characteristic curve values of the integrated nomogram model were higher than those of the other predictors or models at the various time points in both model training and external validation. The prediction errors are shown in Figure 3D, with IBS values of the DLBR score and the integrated nomogram of 0.246 and -0.634 in the training cohort, and 0.378 and -0.760 in the testing cohort, respectively. The integrated model showed a calibration slope at 5 years of 0.98 (95%CI: 0.85-1.12) and an intercept of -0.03 (95%CI: -0.12 to 0.06), with a continuous net reclassification improvement at 5 years of 0.21 (95%CI: 0.05-0.36) and an integrated discrimination improvement of 0.034 (95%CI: 0.010-0.058). For 5-year EFS, the integrated nomogram provided a higher net benefit than the clinical model (PRETEXT + AFP) across thresholds of 5%-30%. At a 10% threshold, the net benefit was 0.027 for the integrated nomogram vs 0.012 for the clinical model (difference 0.015).
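The net benefit compared in the decision-curve analysis is conventionally defined as the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability. The sketch below shows that arithmetic for a binary outcome. It is a simplification: the function name and counts are hypothetical, and for censored survival data the true- and false-positive proportions would be estimated with survival methods rather than counted directly.

```python
def net_benefit(true_positives, false_positives, n_patients, threshold):
    """Net benefit = TP/N - FP/N * pt / (1 - pt) at threshold probability pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)
    return true_positives / n_patients - false_positives / n_patients * weight


# Hypothetical counts at a 10% threshold: 8 true positives and 9 false
# positives among 100 patients.
print(round(net_benefit(8, 9, 100, 0.10), 3))  # 0.07

# Difference between the two models reported in the text at the 10% threshold.
print(round(0.027 - 0.012, 3))  # 0.015
```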
Figure 3 The calibration and discrimination performance of deep learning-based radiomics score for event-free survival in hepatoblastoma patients receiving surgical resection.
A and B: Calibration curves for deep learning-based radiomics (DLBR) score, the clinical model, and the integrated nomogram model showing the agreement between predicted and observed survival probabilities of event-free survival (EFS) for patients in the training (left) and testing (right) cohorts at 36 months (A) and 60 months (B). The gray line of y = x represents perfect prediction by an ideal model; the closer a curve lies to this diagonal line, the better the model’s calibration; C: Time-dependent Harrell’s C-indexes for DLBR score, the clinical model, and the integrated nomogram model on EFS for the training (left) and testing (right) cohorts; D: Time-dependent Brier scores in estimation of prediction errors for DLBR score, the clinical model, and the integrated nomogram model on EFS for the training (left) and testing (right) cohorts. DLBR: Deep learning-based radiomics; AUC: Area under the receiver operating characteristic curve.
Table 2 Prediction performance of clinical variables, deep learning-based radiomics score, and integrated nomogram models for prognostication of event-free survival in the training and testing cohorts of hepatoblastoma patients receiving surgical resection.
Table 3 Time-dependent area under the receiver operating characteristic curve value of clinical variables, deep learning-based radiomics score, and integrated nomogram models for prognostication of event-free survival in the training and testing cohorts of hepatoblastoma patients receiving surgical resection.
Using a cutoff score of 0.0 derived from the training cohort, we divided patients into low-score and high-score DLBR groups for EFS. The survival differences between these risk groups are shown in Table 4. Survival differed significantly between the low-score and high-score groups, with mean survival times of 104.83 months vs 71.58 months in the training cohort (P < 0.001) and 106.73 months vs 61.74 months in the testing cohort (P < 0.001). Within each PRETEXT stage subgroup, the DLBR score identified patients with significantly different survival outcomes. Likewise, within each serum AFP concentration subgroup, survival differed significantly between the DLBR risk groups.
Table 4 Kaplan-Meier analysis of deep learning-based radiomics score stratified by significant clinical variables for prognostication of event-free survival in the training and testing cohorts of hepatoblastoma patients receiving surgical resection.
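The stratification described above, in which patients are split at a DLBR cutoff of 0.0 and each group is summarized by Kaplan-Meier analysis, can be sketched as follows. This is an illustrative sketch only: whether a score exactly equal to the cutoff falls in the high or low group is an assumption here (scores strictly above the cutoff are labeled high), the helper names are hypothetical, and the toy scores and follow-up times are invented rather than drawn from the study.

```python
def stratify(scores, cutoff=0.0):
    """Label each score 'high' if strictly above the cutoff, else 'low' (assumed rule)."""
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical toy data, not taken from the study cohorts.
scores = [-0.8, -0.3, 0.4, 0.9, -0.1, 0.6]
times = [60, 48, 12, 20, 72, 30]
events = [0, 1, 1, 1, 0, 0]

groups = stratify(scores)
km_low = kaplan_meier(
    [t for t, g in zip(times, groups) if g == "low"],
    [e for e, g in zip(events, groups) if g == "low"],
)
km_high = kaplan_meier(
    [t for t, g in zip(times, groups) if g == "high"],
    [e for e, g in zip(events, groups) if g == "high"],
)
print(km_low, km_high)
```

A formal comparison of the two curves would additionally require a log-rank test, which is omitted here for brevity.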
The present study developed the MRI-based DLBR score and validated its prognostic performance for predicting EFS risk in HB patients receiving surgical resection. We found that the integrated clinical-DL model predicted EFS better than either the widely accepted clinical predictors or the DLBR score alone in terms of prediction accuracy, calibration capacity, and prediction error. Additionally, the DLBR score stratified HB patients into two EFS-risk subgroups, and this stratification held in the external validation cohort, which indicated generalizability and stability. The DLBR score also distinguished patients with different survival outcomes within identical subgroups of clinical predictors.
In this study, a quantitative DL-based radiomics approach was developed on MRI images to achieve accurate pretreatment risk stratification in early-stage HB patients, with the aim of assisting individualized therapy and improving survival. The resulting DLBR score had statistically independent predictive value for EFS in both model training and external validation. The DLBR score derived from pretreatment MRI images performed similarly for EFS to the clinicopathologic predictors defined by postoperative specimens. The nomogram integrating the DLBR score and the clinicopathologic predictors outperformed either the DLBR score or the clinicopathologic predictors alone for individualized estimation of EFS risk, which was consistent with results from a previous study based on CT images[25]. Furthermore, the DLBR score classified patients into a low-risk group and a high-risk group with statistically significant differences in EFS. Patients within identical subgroups of clinicopathological predictors were separated by the DLBR score into groups with different survival outcomes, which suggested that the score could supplement the present PRETEXT system and improve pretreatment risk evaluation. This study explored, for the first time on MRI images, the DL-based radiomics approach in association with clinical outcomes of early-stage HB patients receiving surgical resection. Although prior radiomics studies, such as Jiang et al’s development of CT-based models[25], have demonstrated the potential of imaging-derived signatures for HB prognosis, our study provides novel contributions in three respects. 
First, MRI, unlike CT, offers superior soft-tissue contrast and multiparametric information, for example, T1 and T2 signal characteristics that better capture intratumoral heterogeneity in pediatric liver tumors without additional ionizing radiation[26], which is particularly relevant in children. Second, we used DL-based feature extraction (ResNet34) to obtain high-dimensional features that capture hierarchical and spatial patterns not readily quantified by hand-crafted radiomics; this may enhance robustness to subtle image texture and morphological cues. Third, we integrated the MRI-derived DLBR score with established clinical variables (PRETEXT, AFP) to construct a clinically usable nomogram and performed external validation at a second center. Together, these points represent an incremental advance in leveraging MRI and DL features specifically for early-stage, surgically resected pediatric HB.
Recently, an increasing number of studies have investigated the association between risk factors and HB survival[5-8]. The CHIC group has provided a preliminary risk stratification strategy based on several clinicopathologic factors, including AFP concentration, PRETEXT stage, and annotation factors in all-stage HB patients[3]. However, this strategy might have limited practicability because it uses many variables in risk estimation, offers limited discrimination among early-stage HB patients, and shows little prognostic association with certain outcomes[27]. In addition, since both localized and metastatic HB patients were included in previous studies, the CHIC strategy might be of insufficient value for precision treatment. Standards based on extremely heterogeneous lesions across a broad range of stages might be limited in estimating individualized risks precisely in patients at a given stage, especially early-stage patients[8]. Therefore, the exploration of prognostic factors for early-stage HB patients was warranted. We included readily available indicators from routine examinations for accurate prognostication of EFS. Following suggestions from previous studies, we investigated whether morphologic characteristics on conventional imaging were associated with survival outcomes of early-stage HB[11,28]. However, none of the morphologic features had significant prognostic value for EFS in either training or external validation; this observation might be attributed to the rare presence of PRETEXT annotation factors in early-stage HB receiving surgical excision. In contrast, the PRETEXT stage and AFP concentration in the CHIC-related strategy were significant independent predictors in both training and testing cohorts, in accordance with previous studies[3,8]. 
We incorporated these two risk factors into a clinical model, which improved risk estimation in HB patients receiving surgical excision. In the present study, Harrell’s C-index values of approximately 0.61-0.69 indicated modest discrimination. While these values did not imply perfect prognostic separation, the integrated nomogram demonstrated consistent, modest improvements over clinical predictors alone, together with better calibration and lower Brier scores. In the context of early-stage HB, where existing clinical predictors have limited discriminatory power, such incremental improvements might assist clinicians when combined with other clinical information. We also acknowledge the uncertainty arising from the sample size: CIs around the performance metrics have been reported and should be considered when interpreting clinical impact, and prospective validation in larger cohorts is needed to establish clinical utility and decision thresholds.
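For readers less familiar with the metric, Harrell’s C-index discussed above is the proportion of comparable patient pairs in which the patient with the higher predicted risk experiences the event earlier; 0.5 corresponds to chance and 1.0 to perfect ranking. The following is a minimal sketch under stated assumptions: a pair is treated as comparable only when the patient with the shorter follow-up had an observed event, tied risk scores count as half-concordant, and tied follow-up times are ignored. The function name and the toy data are hypothetical, and the sketch is not the implementation used in the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    risk_scores: higher values mean higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study cohorts.
toy_times = [12, 20, 30, 48, 60]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index = harrell_c_index(toy_times, toy_events, toy_scores)
c_reversed = harrell_c_index(toy_times, toy_events, [-s for s in toy_scores])
print(c_index, c_reversed)
```

In this toy example, 6 of the 8 comparable pairs are ranked correctly, so values in the 0.61-0.69 range reported in the study correspond to correct ranking in roughly six to seven of every ten comparable pairs.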
Nowadays, much attention has been paid to applying high-throughput screening to imaging data to assist clinical decisions. The radiomics approach has emerged to quantify tumor characteristics in a noninvasive and cost-effective manner by translating the spatial arrangement of imaging voxels and changes in signal intensity into high-dimensional information[29]. Radiomics analysis has shown considerable potential to indicate tumor progression and recurrence because it captures intratumor heterogeneity[30]. The results of this study implied that the radiomics score combining four DL-based features had prediction accuracy similar to that of the widely accepted clinicopathological predictors. The four DL-based features composing the DLBR score were abstract deep network activations and did not map one-to-one to conventional radiologic descriptors. In a pilot exploration, we extracted standard handcrafted radiomics features and computed Spearman correlations between the DL features and the classical radiomics features; the overlap identified between activation maps from the ResNet34 model and classical radiomics suggested that these DL-based features respond to: (1) Intratumoral texture heterogeneity; (2) Lesion boundary irregularity and margin sharpness; (3) Relative signal intensity heterogeneity between central and peripheral tumor zones; and (4) Presence of internal septations or nodularity. Such qualitative interpretations could help bridge feature activations and radiological appearance. The automatically quantified CNN-based features on MRI images could reflect the risk of HB progression, especially after feature stability was enhanced by the resampling methodology[31] and the ComBat compensation technique[21]. The DLBR score might capture underlying prognosis-related information through the extraction of high-dimensional data. Notably, the DLBR score, as a surrogate biomarker, could stratify patients into low- and high-risk groups. 
Patients with higher DLBR scores had a higher risk and worse EFS, which suggested that intensive treatment might be indicated in these patients to improve survival.
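The pilot exploration mentioned above relied on Spearman correlations between DL features and handcrafted radiomics features. As a rough illustration of that calculation, the sketch below computes Spearman’s rank correlation as the Pearson correlation of average ranks. The feature names (a single DL activation and a texture entropy value) and all numbers are hypothetical placeholders; the source does not specify which handcrafted features were compared or which software was used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-patient values, not taken from the study.
dl_feature = [0.12, 0.45, 0.33, 0.80, 0.51]
texture_entropy = [2.1, 3.0, 2.7, 3.9, 4.2]

rho = spearman(dl_feature, texture_entropy)
print(round(rho, 3))
```

A high absolute correlation between a DL activation and a handcrafted descriptor would support, but not prove, the kind of qualitative interpretation listed in the text.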
Nomograms have been developed as reliable prediction tools to estimate individualized risk[32]. An integrated clinical-DL nomogram was developed and externally validated; it incorporated the risk factors mentioned above, namely two clinicopathological predictors (PRETEXT stage and AFP concentration) and the DLBR score, to improve prediction performance for EFS in early-stage HB patients. Overall, the integrated nomogram yielded more robust prediction of EFS than the DLBR score alone or the clinicopathologic predictors alone, with higher prediction accuracy, lower prediction error, and better calibration in both model training and validation. This combination model suggested that the comprehensive analysis of clinicopathologic and imaging data reinforced the prognostic ability for EFS and added predictive value to conventional clinical risk factors. The present nomogram could serve as a tool to assist pediatricians in personalizing treatment and monitoring long-term follow-up for early-stage HB patients. For high-risk patients with a high probability of progression and recurrence, local control rates and prognosis can be improved by neoadjuvant and/or adjuvant chemotherapy[33]. The present nomogram might support the formulation of therapeutic plans, including whether to add chemotherapy. Pediatricians might consider surgery alone without adjuvant therapy to reduce chemotherapy toxicity in low-risk HB patients, whereas a combined regimen of surgery and neoadjuvant and/or adjuvant chemotherapy might be an appropriate intervention for high-risk patients. Because the DLBR score was derived from preoperative MRI, it could potentially inform treatment planning prior to resection. 
For example, patients with high DLBR scores, indicating an increased risk of recurrence/progression, might be considered for neoadjuvant or adjuvant chemotherapy or closer surveillance, whereas low-score patients might be spared chemotherapy toxicity and treated with surgery alone. It should be emphasized that our cohort included only patients who ultimately underwent complete resection; prospective studies are needed to test whether the DLBR score could also guide the decision to offer neoadjuvant therapy or to select candidates for transplantation. Until then, the DLBR score could be considered an adjunctive preoperative risk-stratification tool complementary to PRETEXT and AFP.
There were some limitations in the present study. The multicenter retrospective design introduced selection bias in subject inclusion, which we tried to reduce through strict inclusion and exclusion criteria. MRI images were acquired on different scanners, so signal intensity standardization was applied to reduce the inconsistency in pixel parameters. Additionally, various HB histologic subtypes were included with a small sample size, which might decrease the practicability of the present model in each subtype due to high heterogeneity. Future studies will recruit more patients across all subtypes to enhance the generalizability of the model. We acknowledge that external validation was performed on a single-center cohort of 32 patients and that further external testing across multiple institutions is required to demonstrate broader generalizability. The internal cross-validation and bootstrap analyses included in the present study might provide additional assurance against overfitting; nevertheless, larger multi-institutional prospective studies are needed.
CONCLUSION
The present study generated a DLBR score for the prediction of EFS in HB patients receiving surgical resection, and developed an integrated clinical-DL nomogram based on widely accepted clinicopathologic predictors and the DLBR score. The integrated nomogram showed better prediction performance for EFS than its individual components, including in external validation, and might inform therapeutic interventions. Further research is needed to validate its risk identification performance and to improve its clinical practicability and generalizability.
Footnotes
Provenance and peer review: Invited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Radiology, nuclear medicine and medical imaging
Country of origin: China
Peer-review report’s classification
Scientific Quality: Grade B, Grade B, Grade C
Novelty: Grade A, Grade B, Grade C
Creativity or Innovation: Grade B, Grade B, Grade C
Scientific Significance: Grade B, Grade B, Grade C
P-Reviewer: Yang K, Director, China; Yu RQ, PhD, Associate Professor, Director, Lecturer, China; S-Editor: Bai SR; L-Editor: A; P-Editor: Zheng XM
Meyers RL, Maibach R, Hiyama E, Häberle B, Krailo M, Rangaswami A, Aronson DC, Malogolowkin MH, Perilongo G, von Schweinitz D, Ansari M, Lopez-Terrada D, Tanaka Y, Alaggio R, Leuschner I, Hishiki T, Schmid I, Watanabe K, Yoshimura K, Feng Y, Rinaldi E, Saraceno D, Derosa M, Czauderna P. Risk-stratified staging in paediatric hepatoblastoma: a unified analysis from the Children's Hepatic tumors International Collaboration. Lancet Oncol. 2017;18:122-131.
Zsiros J, Brugieres L, Brock P, Roebuck D, Maibach R, Zimmermann A, Childs M, Pariente D, Laithier V, Otte JB, Branchereau S, Aronson D, Rangaswami A, Ronghe M, Casanova M, Sullivan M, Morland B, Czauderna P, Perilongo G; International Childhood Liver Tumours Strategy Group (SIOPEL). Dose-dense cisplatin-based chemotherapy and surgery for children with high-risk hepatoblastoma (SIOPEL-4): a prospective, single-arm, feasibility study. Lancet Oncol. 2013;14:834-842.
Aronson DC, Schnater JM, Staalman CR, Weverling GJ, Plaschkes J, Perilongo G, Brown J, Phillips A, Otte JB, Czauderna P, MacKinlay G, Vos A. Predictive value of the pretreatment extent of disease system in hepatoblastoma: results from the International Society of Pediatric Oncology Liver Tumor Study Group SIOPEL-1 study. J Clin Oncol. 2005;23:1245-1252.
Fuchs J, Rydzynski J, Von Schweinitz D, Bode U, Hecker H, Weinel P, Bürger D, Harms D, Erttmann R, Oldhafer K, Mildenberger H; Study Committee of the Cooperative Pediatric Liver Tumor Study Hb 94 for the German Society for Pediatric Oncology and Hematology. Pretreatment prognostic factors and treatment results in children with hepatoblastoma: a report from the German Cooperative Pediatric Liver Tumor Study HB 94. Cancer. 2002;95:172-182.
Czauderna P, Haeberle B, Hiyama E, Rangaswami A, Krailo M, Maibach R, Rinaldi E, Feng Y, Aronson D, Malogolowkin M, Yoshimura K, Leuschner I, Lopez-Terrada D, Hishiki T, Perilongo G, von Schweinitz D, Schmid I, Watanabe K, Derosa M, Meyers R. The Children's Hepatic tumors International Collaboration (CHIC): Novel global rare tumor database yields new prognostic factors in hepatoblastoma and becomes a research model. Eur J Cancer. 2016;52:92-101.
Towbin AJ, Meyers RL, Woodley H, Miyazaki O, Weldon CB, Morland B, Hiyama E, Czauderna P, Roebuck DJ, Tiao GM. 2017 PRETEXT: radiologic staging system for primary hepatic malignancies of childhood revised for the Paediatric Hepatic International Tumour Trial (PHITT). Pediatr Radiol. 2018;48:536-554.
Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, McKeown A, Yang G, Wu X, Yan F, Dong J, Prasadha MK, Pei J, Ting MYL, Zhu J, Li C, Hewett S, Dong J, Ziyar I, Shi A, Zhang R, Zheng L, Hou R, Shi W, Fu X, Duan Y, Huu VAN, Wen C, Zhang ED, Zhang CL, Li O, Wang X, Singer MA, Sun X, Xu J, Tafreshi A, Lewis MA, Xia H, Zhang K. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. 2018;172:1122-1131.e9.
Peng H, Dong D, Fang MJ, Li L, Tang LL, Chen L, Li WF, Mao YP, Fan W, Liu LZ, Tian L, Lin AH, Sun Y, Tian J, Ma J. Prognostic Value of Deep Learning PET/CT-Based Radiomics: Potential Role for Future Individual Induction Chemotherapy in Advanced Nasopharyngeal Carcinoma. Clin Cancer Res. 2019;25:4271-4279.
Selvathi D, Arulmurgan A, Seivi ST, Alagappan S. MRI image segmentation using unsupervised clustering techniques. Proceedings of the Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'05); 2005 Aug 16-18; Las Vegas, NV, United States. United States: IEEE, 2005.
Baidya Kayal E, Kandasamy D, Sharma R, Bakhshi S, Mehndiratta A. Segmentation of osteosarcoma tumor using diffusion weighted MRI: a comparative study using nine segmentation algorithms. SIViP. 2020;14:727-735.
Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A, van Wijk Y, Woodruff H, van Soest J, Lustberg T, Roelofs E, van Elmpt W, Dekker A, Mottaghy FM, Wildberger JE, Walsh S. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14:749-762.