INVITED COMMENTARY ON HOT ARTICLES
Hepatocellular carcinoma (HCC) is the sixth most common cancer type and the third leading cause of cancer-related death worldwide[1]. The major risk factor for HCC is chronic infection with hepatitis B virus (HBV) and/or hepatitis C virus (HCV)[2]. Current curative treatments for HCC include orthotopic liver transplantation, surgical resection and percutaneous ablation. However, recurrence rates remain high and long-term survival is poor.
HCC recurrence is of two types, early and late, which arise through different mechanisms. Early recurrence (< 2 years after treatment) is mostly caused by metastasis and dissemination of the primary HCC, whereas late recurrence (≥ 2 years after treatment) mainly results from de novo tumors arising as a consequence of the field effect in the diseased liver, which is closely associated with high viral loads and hepatic inflammatory activity[3,4]. Treatment after curative therapy varies greatly, depending on the individual patient's profile[5]. The traditional prognostic markers of HCC include vascular invasion (both macroscopic and microscopic), which is the most significant factor, as well as tumor size, number of nodules, α-fetoprotein level, degree of differentiation, and satellites[6]. Recent advances in the field have shown that viral factors and inflammation-related conditions are also associated with HCC prognosis. Viral load, genotype C, viral mutations, and expression of inflammatory molecules in HBV-related HCC tissues are significantly associated with poor prognosis. Host inflammation-related factors, such as an imbalance between intratumoral CD8+ T lymphocytes and regulatory T lymphocytes, and between T helper (Th)1 and Th2 cytokines in peritumoral tissues, are also predictors of prognosis in HBV-related HCC[7,8]. In addition, non-coding RNAs play a significant role in HCC progression[9]. However, even after viral and other factors are incorporated, predictive power remains suboptimal. Therefore, it is crucial to identify new prognostic markers that open opportunities for individualized therapy for HCC patients.
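The two-year cut-off described above can be written as a simple rule. The following is a minimal sketch only: the function name `classify_recurrence`, the constant `EARLY_CUTOFF_MONTHS`, and the choice of months as the time unit are illustrative assumptions and are not taken from any of the cited studies.

```python
from typing import Optional

# Assumption: time is measured in months from curative treatment.
EARLY_CUTOFF_MONTHS = 24  # 2 years

def classify_recurrence(months_to_recurrence: Optional[float]) -> str:
    """Label a recurrence as early (< 2 years) or late (>= 2 years)."""
    if months_to_recurrence is None:
        # No recurrence recorded during follow-up.
        return "no recurrence observed"
    if months_to_recurrence < 0:
        raise ValueError("time to recurrence cannot be negative")
    if months_to_recurrence < EARLY_CUTOFF_MONTHS:
        return "early"
    return "late"
```

A recurrence at exactly 24 mo falls in the late group under this rule, which matches the "≥ 2 years" definition used in the text.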
The application of high-throughput methods has provided new opportunities for analyzing the diversity and heterogeneity of cancers. Microarray-based gene expression profiling studies in breast cancer have been highly successful and have led to a working model for a molecular taxonomy of breast cancer[10]. Gene expression signatures have succeeded in predicting prognosis and treatment response in HCC[11], and they are promising for the development of personalized cancer medication[12]. Gene expression profiles may add new and important prognostic information beyond that provided by the standard clinical predictors. It is important to incorporate molecular information to predict early and overall recurrence of HCC more accurately.
We read with great interest the recent article by Villanueva et al[13]. In this article, the authors developed an integrated prognostic model combining genomic and clinicopathologic data to improve outcome prediction in patients with single-nodule early HCC. They analyzed the prognostic power of 22 previously reported gene signatures in a cohort of 287 early-stage HCC patients. The analysis showed that the proliferation signature was the most prevalent prediction (prevalence being the number of patients identified with the signature divided by the total number of patients), and that there was a substantial association among three groups of signatures: (1) signatures related to increased cell proliferation, progression in the cell cycle and activation of specific pathways; (2) signatures generated in the adjacent tissues; and (3) the cytokeratin-19 gene signature. They found that the G3 (tumoral) signature and the poor-survival (non-tumoral) signature, along with satellites, were independent predictors of early tumor recurrence and overall recurrence. They also reported that the genomic profiles of tumor and adjacent tissues were complementary in refining the prediction.
Advanced imaging techniques such as computed tomography and magnetic resonance imaging have been used to detect vascular invasion and to evaluate satellites before surgery, both of which are helpful in the pre-operative prediction of HCC prognosis. Genomic profiling of tumor and adjacent tissues obtained by fine-needle biopsy may provide complementary and/or confirmatory information, and thus has great potential when combined with imaging findings in clinical practice. Many studies have used array-based gene expression profiling of tumoral or non-tumoral tissues to predict HCC prognosis. However, the number and heterogeneity of the signatures hinder their further application. The study of Villanueva et al[13] attempted to address these issues. The authors evaluated the prognostic power of previously reported gene signatures in an independent cohort, and then developed a “composite genomic-based prognostic model”. They further validated the stability of the model using samples from different sites of the same tumor nodule to test whether the genomic signature was consistent throughout the tumor[13]. This study presents a unified approach to systematically evaluating and independently validating HCC prognostic gene signatures, and the procedure it developed may benefit future studies of other complex diseases.
Cancer gene signatures may indicate specific biological traits of heterogeneous tumor sub-phenotypes that cannot be identified by traditional methods. They may be associated with tumor biology and the tumor microenvironment, such as chromosomal instability, wounded stroma, or invasiveness, and may also be linked to certain signaling pathways[14]. Gene signatures may have functional implications and may be predictive of response to specific therapeutic agents such as antiviral medications. The signatures identified in the study of Villanueva et al[13] (the tumoral G3-proliferation signature and the non-tumoral poor-prognosis signature) reflect biological events that are highly relevant to outcome prediction and point to possible pathways in which to search for biomarkers as therapeutic targets. If used appropriately, gene signatures should be an important complement to current clinicopathological risk stratification systems[15]. Integrating gene signatures into HCC prognosis prediction may improve patient outcomes, provide a better understanding of the underlying HCC biology, and help identify effective therapeutic options for an individual patient.
HCC is not a single disease at the molecular level. Using gene signatures to classify HCC into molecular subtypes with similar prognostic implications can guide clinical decision-making, particularly regarding therapy. However, these signatures lack prognostic power. The assignment of a given patient to a subgroup depends strongly on the gene signature used, and the results from studies of a single gene signature cannot necessarily be generalized. Furthermore, few genes overlap among gene expression signatures, even among those that reflect common cellular phenotypes and yield similar predictions. Therefore, it is not appropriate to use overlap in gene identity to measure the reproducibility of gene expression profiles[16]. Thus, systematic evaluation of different gene expression datasets and validation in independent cohorts provide a basis for identifying true genomic signatures that are associated with oncogenic pathways, tumor biology and the tumor microenvironment. Nevertheless, there are problems in using gene signatures to classify sub-phenotypes and predict HCC prognosis. In the following section, we take the paper of Villanueva et al[13] as an example to discuss several pressing issues in the field.
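The distinction drawn above between overlap in gene identity and agreement in prediction can be illustrated with a small calculation. The sketch below is hypothetical throughout: the gene names, the patient calls, and the helper functions `jaccard` and `concordance` are invented for illustration and do not come from any of the 22 signatures discussed.

```python
def jaccard(a, b):
    """Fraction of genes shared between two signatures (intersection / union)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def concordance(calls_a, calls_b):
    """Fraction of patients given the same prognostic call by both signatures."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both signatures must be applied to the same patients")
    if not calls_a:
        raise ValueError("no patients supplied")
    agree = sum(x == y for x, y in zip(calls_a, calls_b))
    return agree / len(calls_a)

# Hypothetical toy data: two signatures sharing only one gene...
signature_a = {"GENE1", "GENE2", "GENE3", "GENE4"}
signature_b = {"GENE4", "GENE5", "GENE6", "GENE7"}
# ...but assigning most of the same five patients to the same risk group.
calls_a = ["poor", "good", "poor", "good", "poor"]
calls_b = ["poor", "good", "poor", "good", "good"]

overlap = jaccard(signature_a, signature_b)   # 1 shared gene out of 7
agreement = concordance(calls_a, calls_b)     # 4 of 5 patients agree
```

In this toy case the two signatures share 1 of 7 genes yet agree on 4 of 5 patients, which is why agreement in predictions, rather than overlap in gene lists, is the more informative measure of reproducibility.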
First, the paper does not mention whether the quality of the different gene signatures was evaluated. These signatures were generated from different samples with different biological backgrounds. Studies may vary greatly in quality, including patient selection criteria, RNA quality, follow-up criteria, definition of prognosis, and treatment after surgery. Patient differences, including different staging and underlying conditions, may reflect etiological differences and thus result in heterogeneity of gene signatures. Prognostic accuracy might differ among tumors of different stages. Additionally, the multiple end points used in the analyses, such as overall recurrence, early recurrence, late recurrence, overall survival, or metastasis-free survival, are also a source of heterogeneity. There is also the possibility of stromal contamination, namely, gene signatures derived from analysis of tumor specimens with a high proportion of adjacent tissue contamination, and vice versa. The general reproducibility of these signatures stands out as an important issue.
Second, it is inappropriate to directly combine datasets from different platforms and different experiments because of non-biological experimental variation, or batch effects. In the study of Villanueva et al[13], gene expression data were obtained from 3 high-throughput genomic platforms, and these datasets cannot be readily combined because of their heterogeneity. Again, the authors did not mention whether any standardization procedures were applied. In addition, the method used to integrate and/or standardize data from different platforms is itself a challenge. Choosing a robust normalization method that suits the features of the dataset and reduces batch effects is essential for further computational analyses[17].
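To make the batch-effect problem concrete, the sketch below standardizes each gene within each platform (gene-wise z-scores) before the data are pooled. This is one simple approach chosen here for illustration only; it is not the procedure used by Villanueva et al, it does not replace dedicated batch-correction methods, and the function name `standardize_within_platform` and the toy values are assumptions.

```python
from statistics import mean, pstdev

def standardize_within_platform(expr):
    """Convert each gene's values to z-scores using samples from ONE platform.

    expr maps a gene name to the list of expression values measured on a
    single platform. Genes with zero variance are set to 0.0.
    """
    standardized = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)  # population standard deviation
        standardized[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return standardized

# Hypothetical toy data: the same gene measured on two platforms whose raw
# intensity scales differ by a factor of 100.
platform_a = {"GENE1": [2.0, 4.0, 6.0]}
platform_b = {"GENE1": [200.0, 400.0, 600.0]}

z_a = standardize_within_platform(platform_a)
z_b = standardize_within_platform(platform_b)
```

After standardization the two platforms yield the same relative values for the gene, so the scale difference between platforms no longer dominates a pooled analysis. Standardizing within a platform also removes any true biological difference in average expression between the cohorts, which is one reason the choice of method has to fit the dataset.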
Third, the authors did not describe whether they applied a gene mapping procedure. Gene databases are updated over time as information accumulates, so the annotation of a platform used several years ago may not be comparable to the gene database in service now. Without mapping, the genes in the 22 signatures produced at different time points may not correspond well. Accurately mapping and matching a gene across different signatures generated by different platforms at different time points is an important quality control step that enables the finding of true signatures.
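The mapping step described above can be sketched as a lookup from each reported identifier to a current symbol, with unmapped identifiers reported rather than silently dropped. Everything named here is hypothetical: the alias table, the identifiers, and the function `harmonize` are placeholders, and a real analysis would draw the alias table from a current gene annotation resource.

```python
def harmonize(signature, alias_to_current):
    """Map reported identifiers to current symbols.

    Returns (mapped, unmapped): the set of current symbols found, and the set
    of original identifiers that could not be mapped.
    """
    mapped, unmapped = set(), set()
    for identifier in signature:
        key = identifier.strip().upper()  # tolerate case and stray spaces
        if key in alias_to_current:
            mapped.add(alias_to_current[key])
        else:
            unmapped.add(identifier)
    return mapped, unmapped

# Hypothetical alias table: old and current names point to one current symbol.
alias_to_current = {
    "GENE_A": "GENE_A",
    "GENE_A_OLD": "GENE_A",  # retired name for the same gene
    "GENE_B": "GENE_B",
}

# Hypothetical signatures reported at two different time points.
sig_older = ["GENE_A_OLD", "GENE_B", "PROBE_123"]
sig_newer = ["gene_a", "GENE_B"]

mapped_older, unmapped_older = harmonize(sig_older, alias_to_current)
mapped_newer, unmapped_newer = harmonize(sig_newer, alias_to_current)
shared = mapped_older & mapped_newer
```

Compared as raw strings, the two toy signatures share only one identifier; after mapping they share two genes, and the unmappable probe is flagged for manual review.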
Last but not least, the quality of the survival analyses used to generate these signatures differs. The frequently used statistical methods, such as the significance analysis of microarrays tool, the trend filter tool, and the Cox proportional hazards model, may contribute to the great variety of gene expression signatures[17]. Studies also vary in the follow-up information collected, the covariates adjusted for in multivariate analysis, and the handling of non-informative censoring. These factors directly affect the gene signatures generated.
For gene signatures to be used in clinical practice to accurately predict HCC prognosis, the following procedures are required. To begin with, tissue composition should be standardized. Without appropriate and standardized samples, further experiments to determine a robust signature will be difficult. For example, the variable selection procedure is crucial in developing a reliable and reproducible gene signature because pre-analytical variables such as stromal component and tissue processing directly affect gene expression profiles. In addition, to enable the use of data by other researchers and future investigators, a detailed description of data processing and analytical methods is required. A further step is to establish unified, stringent criteria for generating gene expression signatures. Moreover, it is also important to identify gene signatures that predict early and late recurrence of HCC. HCCs are a group of diverse and heterogeneous diseases. Gene expression patterns can provide a basis for distinguishing sub-phenotypes within the heterogeneous subgroups characterized by conventional clinicopathological variables, and can also provide important information for the individualization of therapy[18]. Viral mutations in the preS and the basal core promoter regions of HBV are significantly associated with HCC risk[19-23]. HBV mutations including A1762T/G1764A, preS deletion at nt.107-141, and preS2 mutations in adjacent hepatic tissues, and HCV mutations such as M91L, are significantly associated with poor prognosis of HCC[24-26]. These viral mutations should be reasonably integrated into HCC prognosis-related gene signatures.
To summarize, this paper drew our interest because gene expression signatures have shown great promise in classifying cancer subtypes and predicting prognosis. The Villanueva team has introduced an effective approach to systematically integrating different types of data for HCC prognosis prediction. With the increasing amount of data produced, there is an urgent need for standardized methods in systems biology to integrate descriptive data from cohort studies and other sources, such as clinicopathological features, massively parallel DNA and RNA sequencing, and proteomics, along with functional data to guide therapeutic decisions. In addition, data on the vascular features of HCC from imaging techniques may help select and validate true gene expression signatures associated with HCC prognosis. Future studies should also correlate these two non-invasive and innovative methods. It is still premature to use the current gene signatures for predicting HCC prognosis in clinical practice. An enormous amount of work remains before these gene signatures can be used in routine clinical practice and treatment decision making.