INTRODUCTION
Cancer is a public health problem worldwide[1]. Predictions suggest that 13 million people will die each year from cancer by 2030[2]. Tumor heterogeneity represents an important obstacle to establishing effective therapeutic strategies. Over the last decades, large-scale pan-genomic studies have made it possible to address tumor heterogeneity in multiple cancers and to provide a landscape of alterations occurring at multiple levels in tumor cells (e.g. at the DNA, RNA and protein levels). To this end, international consortia have been initiated, including The Cancer Genome Atlas (TCGA) and its landmark cancer genomics program, which has molecularly characterized over 84000 cases from 67 primary sites so far (https://portal.gdc.cancer.gov). Altogether, TCGA and other cancer programs have generated over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. This explosive growth of data has been a major driving force for the development of innovative artificial intelligence (AI) methods, including deep learning algorithms, capable of analyzing large and multifaceted datasets in an integrated and comprehensive way[3]. By using algorithms that imitate the thinking process, deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction and to discover intricate structure in large datasets[4]. These automated methods, popularized by image and speech recognition algorithms, are now moving into the field of health, including cancer research. Indeed, innovative algorithms are being developed to extract meaningful genomic patterns and to translate this basic conceptual information into clinical applications, notably to improve cancer diagnosis, prognosis prediction and treatment efficacy (Figure 1).
Here, I briefly review some examples of supervised and unsupervised analyses of big data derived from TCGA programs and comment on how AI algorithms have been applied to improve the management of patients with cancer.
Figure 1 Artificial intelligence and omics to improve the management of patients with cancer.
Current artificial intelligence algorithms are mainly fueled with clinical data (e.g. clinical records, computed tomography scans, magnetic resonance imaging) and omics data, as exemplified by those from The Cancer Genome Atlas consortium (e.g. genetic, epigenetic, transcriptomic, proteomic, metabolomic profiles). They pave the way for future models that will integrate personalized clinical information related to the lifestyle of each patient, including the exposome and microbiome, in order to improve cancer diagnosis, prognosis prediction and treatment efficacy. AI: Artificial intelligence; TCGA: The Cancer Genome Atlas.
BIG DATA FROM TCGA
TCGA programs represented a major advance in the field of cancer research, allowing both supervised analysis of specific cancers and unsupervised analysis of pan-cancer datasets. Thus, supervised comparative and comprehensive analyses that distinguished clinically relevant molecular subtypes were reported in several cancers, including gastrointestinal (GI) cancers[5], gynecologic and breast cancers[6], and pancreatic[7] or liver[8] cancers. Unsupervised analyses have also been performed using pan-cancer datasets. Notably, by analyzing mutation profiles, copy-number changes, gene fusions, mRNA expression, and DNA methylation in 9125 tumors profiled by TCGA, a detailed landscape of oncogenic pathway alterations was charted in 33 cancer types. Tumors were stratified into 64 subtypes, and patterns of co-occurring and mutually exclusive alterations were identified using SELECT, a method that infers conditional selection dependencies between alterations from occurrence patterns[9]. Importantly, using a dedicated knowledge base of clinically actionable alterations, it was shown that 57% of tumors had at least one potentially targetable alteration and 30% of tumors had multiple targetable alterations, indicating opportunities for combination therapy[9]. This type of information will be crucial in the current era of cancer precision medicine to develop effective combination therapies that address or prevent resistance to initially successful single-agent therapies. Pan-cancer supervised analyses were also performed to highlight frequent alterations in key signaling pathways involved in cancer progression. Transforming growth factor beta (TGFβ) is a pleiotropic cytokine that harbors a functional duality in cancer, i.e. exhibiting tumor suppressive features at early stages but switching toward pro-metastatic activities at late tumor stages[10].
Interestingly, genetic alterations in TGFβ signaling, affecting mostly metastasis-associated genes, were observed in 39% of pan-cancer TCGA cases, and were particularly enriched in GI cancers[11]. Specific algorithms have also been used to characterize the immune tumor microenvironment across 33 cancer types analyzed by TCGA. By integrating major immunogenomics methods, including analysis of genomic profiles, hematoxylin and eosin-stained tumor sections and deconvolution analysis of mRNA sequencing (mRNA-seq) data, six immune subtypes were characterized, spanning multiple tumor types, with potential therapeutic and prognostic implications for cancer management[12]. Interestingly, one so-called TGFβ dominant subtype displayed the highest TGFβ signature and a high lymphocytic infiltrate. This observation is particularly relevant with the emergence of effective immunotherapies, including the recent development of an innovative immunotherapeutic that simultaneously blocks the PD-L1 checkpoint protein and the TGFβ signaling pathway[13].
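To make the notion of co-occurring and mutually exclusive alterations more concrete, the following minimal sketch tests a single pair of alterations with a one-sided Fisher exact (hypergeometric) test. This is a deliberately simplified stand-in and not the SELECT method used in the cited study; the function names and the toy alteration vectors are assumptions introduced only for illustration.

```python
from math import comb


def hypergeom_pmf(k, n_a, n_b, n_total):
    """Probability that exactly k tumors carry both alterations,
    assuming the two alterations occur independently."""
    return comb(n_a, k) * comb(n_total - n_a, n_b - k) / comb(n_total, n_b)


def cooccurrence_pvalues(altered_a, altered_b):
    """Return (overlap, p_exclusive, p_cooccur) for two boolean lists.

    altered_a[i] and altered_b[i] say whether tumor i carries alteration
    A or B. p_exclusive is the probability of an overlap this small or
    smaller; p_cooccur is the probability of an overlap this large or larger.
    """
    n_total = len(altered_a)
    n_a = sum(altered_a)
    n_b = sum(altered_b)
    overlap = sum(1 for a, b in zip(altered_a, altered_b) if a and b)
    k_min = max(0, n_a + n_b - n_total)  # smallest possible overlap
    k_max = min(n_a, n_b)                # largest possible overlap
    p_exclusive = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                      for k in range(k_min, overlap + 1))
    p_cooccur = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                    for k in range(overlap, k_max + 1))
    return overlap, p_exclusive, p_cooccur


# Hypothetical toy cohort of 10 tumors: A and B never occur together.
alteration_a = [True] * 5 + [False] * 5
alteration_b = [False] * 5 + [True] * 5
overlap, p_excl, p_co = cooccurrence_pvalues(alteration_a, alteration_b)
print(overlap, p_excl, p_co)
```

A small `p_exclusive` suggests mutual exclusivity and a small `p_cooccur` suggests co-occurrence. A pan-cancer analysis would additionally need to correct for multiple testing and for differences in alteration rates between tumor types, which this sketch ignores.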
From a basic research perspective, several efforts have also been made to integrate multi omics data and to provide a better understanding of tumor biology. As an example, a deep learning-based predictive model using a deep denoising auto-encoder and a multi-layer perceptron was developed to quantitatively capture how genetic and epigenetic alterations correlate with the directionality of gene expression in liver cancer[14]. Similarly, an innovative one-class logistic regression machine-learning algorithm was used to identify stemness features associated with oncogenic dedifferentiation[15]. Interestingly, an unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells was highlighted in the tumor microenvironment[15]. The analysis of gene regulatory networks from available omics data is a challenging task given that biological data are prone to different kinds of noise and ambiguity. Soft computing tools, such as fuzzy sets, evolutionary strategies, and neurocomputing, have been found to be helpful in providing low-cost, acceptable solutions in the presence of various types of uncertainty[16].
AI AND OMICS FOR CANCER DIAGNOSIS AND PROGNOSIS
Cancer diagnosis using deep learning has recently been reviewed[17]. Soft computing techniques have also provided solutions for cancer diagnosis, prediction, inference and classification[18,19,20]. These approaches are mainly based on segmentation processes using convolutional neural networks (CNN) applied to clinical images, notably those acquired by computed tomography (CT) and magnetic resonance imaging (MRI). AI allows integrating quantitative, multiparametric and functional imaging data to automatically recognize complex patterns and to provide quantitative, rather than qualitative, assessments of radiographic characteristics[21]. A classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs, nicely illustrates the interest and the power of AI algorithms[22]. Indeed, a CNN trained using a dataset of 129450 clinical images (2032 different diseases) was capable of classifying skin lesions with a level of competence comparable to dermatologists[22]. By helping clinicians to characterize early benign and/or malignant lesions, AI has recently emerged as the next step towards precision pathology. Screening programs for early detection of colorectal cancer (CRC) have been shown to reduce mortality in multiple studies. In this context, a machine learning-based algorithm (MeScore) was trained to predict the occurrence of CRC and to identify a group of individuals at high risk for CRC. Remarkably, MeScore can help identify individuals in the population who would benefit most from CRC screening, including those with no clinical signs or symptoms of CRC[23]. In another study, a total of 1970 whole slide images of 731 cases of nasopharyngeal carcinoma were divided into training, validation and testing sets. A CNN model was trained to classify images into three categories: chronic nasopharyngeal inflammation, lymphoid hyperplasia and nasopharyngeal carcinoma.
Remarkably, the model performed on par with a senior pathologist in terms of accuracy, specificity, sensitivity, area under the curve and consistency[24]. Thus, these examples suggest that deep learning algorithms could potentially assist pathologists in clinical practice by providing a second opinion and thus increasing diagnostic consistency.
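The performance measures mentioned above (accuracy, sensitivity, specificity and area under the curve) can be computed from a classifier's outputs as in the following minimal sketch. It assumes a binary setting (e.g. carcinoma vs non-carcinoma) rather than the three-class task of the cited study, and the labels and scores are invented toy values, not data from any of the studies discussed.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: fraction of true cancers that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-cancers correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def area_under_curve(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: ground truth, model scores, and calls at a 0.5 cutoff.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
calls = [1 if s >= 0.5 else 0 for s in scores]
print(diagnostic_metrics(truth, calls))
print(area_under_curve(truth, scores))
```

Sensitivity and specificity depend on the chosen cutoff, whereas the area under the curve summarizes performance over all cutoffs, which is one reason such studies usually report them together.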
Gene expression profiling has been extensively used to derive prognostic signatures in multiple types of cancers. However, these signatures are usually derived from a single type of omics data (e.g. mRNA, miRNA, lncRNA profiling). Integration of multifaceted datasets with different levels of information appears relevant to better reflect the biology of a specific tumor. Accordingly, integrated genome-wide epigenetic and multi omics analyses using AI have entered the era of precision medicine with the surge of data generated over the last decades[25]. For instance, a deep learning multi omics model integrating RNA-seq, miRNA-seq, and methylation data from TCGA was reported to robustly predict survival of patients with liver cancer[26]. A more aggressive subtype was associated with frequent inactivating TP53 mutations, higher expression of stemness markers, and activated WNT and AKT signaling pathways[26]. Pathway-based biomarker identification with crosstalk analysis has also been reported in liver cancer for efficiently differentiating patients into moderate or aggressive risk subtypes with significant differences in terms of survival[27]. In addition, deep-learning algorithms based on whole slide histological images were reported to predict the prognosis of patients with liver cancer. By using a training set made of 390 slides from 206 tumors and a validation set made of 342 slides from 328 patients, a model was built for predicting the survival of patients after surgical resection of hepatocellular carcinoma[28]. Notably, the study highlights the importance of pathologist/machine interactions for the construction of deep-learning algorithms[28]. By processing 5202 digital pathology images from 13 cancer types, a deep-learning model established tumor-infiltrating lymphocyte maps correlated with molecular data, tumor subtypes, immune profiles and patient survival[29].
The application of deep learning in cancer prognosis has been shown to be equivalent to or better than current approaches, as recently reviewed[30].
AI AND OMICS FOR CANCER TREATMENT
Deep learning-based analysis of multi omics data finds its natural place in the development of personalized therapies in cancer, notably by linking molecular actionable alterations with specific drugs already developed for these alterations or through a drug repositioning process (also referred to as drug repurposing). Deep learning models also enable large-scale virtual screening of compound databases for predictive activity profiling against targets important for multiple cancers. Such large-scale screening facilitates the quick and cost-effective repurposing of existing drugs[31]. By using a pharmacogenomics database of 1001 cancer cell lines, deep neural networks were trained to predict drug response and their performance was assessed on multiple clinical cohorts[32]. By integrating RNA-seq, copy number, and mutation data from 33 different cancer types (TCGA PanCanAtlas project), a deep learning model was shown to successfully predict RAS activation across cancer types and to identify phenocopying variants (e.g. NF1 loss). The model represents a useful tool to predict response to MEK inhibitors and identify the best responders[33]. Specific algorithms for drug repurposing have also been developed, based notably on linking gene expression profiles of tumors with gene signatures of bioactive molecules. For example, the L1000 Connectivity Map is a library of gene expression signatures established in cell lines after pharmacologic or genetic (knockdown or over-expression) perturbation (approximately 20000 compounds, 4500 knockdowns, and 3000 over-expressions)[34]. This approach has been successfully used to propose epigenetic modulators (e.g. HDAC inhibitors) as relevant innovative therapeutics to target several hallmarks of liver cancer[35]. Using the same approach, anthelminthic drugs were also identified as potential therapeutic candidates in liver cancer[36].
Thus, combined with a robust stratification of human tumors, AI would help predict the response to individual therapies. Although translation from research to clinical practice requires fully addressing the question of the reproducibility and interpretability of the developed algorithms, there is no doubt that AI will positively impact clinical decision-making, providing a more personalized management of patients[37]. Another aspect that needs to be fully appraised is the regulatory issue for AI technologies, including clinically approved algorithms (Software as a Medical Device, SaMD), e.g. in terms of personal data sharing[38].
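The principle of signature-based drug repurposing described above, i.e. looking for compounds whose expression signature opposes that of the tumor, can be sketched as follows. This is a simplified illustration under stated assumptions and not the scoring scheme actually used by the L1000 Connectivity Map: the score here is just the mean drug-induced change of tumor-up genes minus that of tumor-down genes, and all gene names, compound names and values are hypothetical.

```python
def reversal_score(up_genes, down_genes, drug_profile):
    """Mean drug-induced change of tumor-up genes minus that of tumor-down genes.

    drug_profile maps gene name -> expression change after treatment
    (e.g. a z-score). A strongly negative score means the compound tends
    to reverse the tumor signature.
    """
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    if not up or not down:
        raise ValueError("profile must cover some up and some down genes")
    return sum(up) / len(up) - sum(down) / len(down)


def rank_compounds(up_genes, down_genes, profiles):
    """Sort compounds from most reversing (lowest score) to most mimicking."""
    scored = [(name, reversal_score(up_genes, down_genes, profile))
              for name, profile in profiles.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical tumor signature and compound profiles (toy values only).
tumor_up = ["GENE1", "GENE2"]
tumor_down = ["GENE3", "GENE4"]
compound_profiles = {
    "compound_A": {"GENE1": -2.0, "GENE2": -1.0, "GENE3": 1.5, "GENE4": 0.5},
    "compound_B": {"GENE1": 1.0, "GENE2": 2.0, "GENE3": -1.0, "GENE4": -1.0},
    "compound_C": {"GENE1": 0.0, "GENE2": 0.5, "GENE3": 0.5, "GENE4": 0.0},
}
for name, score in rank_compounds(tumor_up, tumor_down, compound_profiles):
    print(name, score)
```

In this toy example, compound_A ranks first because it lowers the genes that are up in the tumor and raises those that are down. Real analyses rely on rank-based enrichment statistics over genome-wide profiles and require experimental validation of any candidate.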
CONCLUSION
Over the last decades, cancer genomic programs have generated a large amount of multi omics data. This information has fueled the development of innovative algorithms to extract meaningful information possibly translatable into clinical practice. AI emerged only recently in the field of cancer research. However, specific studies have already demonstrated the potential of AI to improve the diagnosis and prognosis of patients with cancer and to develop innovative targeted therapeutics. Although current algorithms are fueled mainly with omics data and clinical images (e.g. genetic, epigenetic, transcriptomic, proteomic, metabolomic profiles, CT scan, MRI), they pave the way for future models that will also integrate personalized clinical information related to the lifestyle of each patient, including environmental exposure (exposome) or microbiome composition that may influence response to treatment[39] (Figure 1). As a promising future direction, research on the exposome, genetic factors, microbiome, immunity, and molecular tissue biomarkers is needed using AI and omics technologies. This field, referred to as molecular pathological epidemiology (MPE), aims to investigate those factors in relation to molecular pathologies and clinical outcomes by means of computational analyses. Thus, MPE represents a promising area of investigation to better understand how a particular exposure influences the carcinogenic and pathologic process[40,41].
In this context, the journal Artificial Intelligence in Cancer was specifically launched to promote the development of this discipline, by serving as a forum to publish high-quality basic and clinical research articles in various fields of AI in oncology.
Manuscript source: Invited manuscript
Corresponding Author's Membership in Professional Societies: European Association for the Study of the Liver; European Network for the Study of Cholangiocarcinoma; and International Lactation Consultant Association.
Specialty type: Oncology
Country/Territory of origin: France
Peer-review report’s scientific quality classification
Grade A (Excellent): A, A, A
Grade B (Very good): B
Grade C (Good): C
Grade D (Fair): 0
Grade E (Poor): 0
P-Reviewer: Hu B, Jurman G, Liu Y, Ogino S, Santos-García G S-Editor: Wang JL L-Editor: A E-Editor: Liu JH