1
Rebello RJ, Posner A, Dong R, Prall OWJ, Sivakumaran T, Mitchell CB, Flynn A, Caneborg A, Mitchell C, Kanwal S, Fedele C, Webb S, Fisher K, Wong HL, Balachander S, Zhu W, Nicolson S, Dimitriadis V, Wilcken N, DeFazio A, Gao B, Singh M, Collins IM, Steer C, Warren M, Karanth N, Xu H, Fellowes A, Hicks RJ, Stewart KP, Shale C, Priestley P, Dawson SJ, Vissers JHA, Fox SB, Schofield P, Bowtell D, Hofmann O, Grimmond SM, Mileshkin L, Tothill RW. Whole genome sequencing improves tissue-of-origin diagnosis and treatment options for cancer of unknown primary. Nat Commun 2025; 16:4422. [PMID: 40393956 PMCID: PMC12092688 DOI: 10.1038/s41467-025-59661-x] [Received: 08/08/2024] [Accepted: 04/23/2025] [Indexed: 05/22/2025]
Abstract
Genomics can inform both tissue-of-origin (TOO) and precision treatments for patients with cancer of unknown primary (CUP). Here, we use whole genome and transcriptome sequencing (WGTS) for 72 patients and show diagnostic superiority of WGTS over panel testing (386-523 genes) in 71 paired cases. WGTS detects all reportable DNA features found by panel testing as well as additional mutations of diagnostic or therapeutic relevance in 76% of cases. Curated WGTS features and a CUP prediction algorithm (CUPPA) trained on WGTS data of known cancer types inform TOO in 71% of cases otherwise undiagnosed by clinicopathology review. WGTS informs treatments for 79% of patients, compared to 59% by panel testing. Finally, WGS of cell-free DNA (cfDNA) from patients with a high cfDNA tumour fraction (>7%) enables high-likelihood CUPPA predictions in 41% of cases. WGTS is therefore superior to panel testing, broadens treatment options, and is feasible using routine pathology samples and cfDNA.
Affiliation(s)
- Richard J Rebello
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Atara Posner
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Ruining Dong
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Owen W J Prall
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Tharani Sivakumaran
  - Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Camilla B Mitchell
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Aidan Flynn
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Alex Caneborg
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Catherine Mitchell
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Sehrish Kanwal
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Clare Fedele
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Samantha Webb
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Krista Fisher
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Hui-Li Wong
  - Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Shiva Balachander
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Wenying Zhu
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Shannon Nicolson
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Voula Dimitriadis
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Nicholas Wilcken
  - The Westmead Institute for Medical Research, Sydney, NSW, Australia
- Anna DeFazio
  - The Westmead Institute for Medical Research, Sydney, NSW, Australia
  - Department of Gynaecological Oncology, Westmead Hospital, Sydney, NSW, Australia
  - The Daffodil Centre, The University of Sydney, a joint venture with Cancer Council NSW, Sydney, NSW, Australia
- Bo Gao
  - Department of Medical Oncology, Crown Princess Mary Cancer Centre, Westmead Hospital, Sydney, NSW, Australia
- Madhu Singh
  - Department of Medical Oncology, Barwon Health Cancer Services, Geelong, VIC, Australia
- Ian M Collins
  - Department of Medical Oncology, Southwest HealthCare, Warrnambool and Deakin University, Geelong, VIC, Australia
- Christopher Steer
  - Border Medical Oncology, Albury Wodonga Regional Cancer Centre, Albury, NSW, Australia and UNSW School of Clinical Medicine, Rural Clinical Campus, Albury, NSW, Australia
- Mark Warren
  - Department of Medical Oncology, Bendigo Health, Bendigo, VIC, Australia
- Narayan Karanth
  - Division of Medicine, Alan Walker Cancer Centre, Darwin, NT, Australia
- Huiling Xu
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Andrew Fellowes
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Rodney J Hicks
  - The St Vincent's Hospital Department of Medicine, University of Melbourne, Melbourne, VIC, Australia
- Kym Pham Stewart
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Sarah-Jane Dawson
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Joseph H A Vissers
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Stephen B Fox
  - Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Penelope Schofield
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Department of Psychology, and Iverson Health Innovation Research Institute, Swinburne University, Melbourne, VIC, Australia
  - School of Computing, Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, Australia
- David Bowtell
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Oliver Hofmann
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Sean M Grimmond
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
- Linda Mileshkin
  - Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Richard W Tothill
  - Department of Clinical Pathology, University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, University of Melbourne, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia

2
Droogers E, Teunissen Y, van Puffelen AJ, Groenendijk FH, Veldhuijzen van Zanten SEM, Wagner A, Verheul HMW, Robbrecht DGJ. Impact of whole genome sequencing on the care pathway for patients with cancer of unknown primary. ESMO Open 2025; 10:105069. [PMID: 40345055 DOI: 10.1016/j.esmoop.2025.105069] [Received: 12/19/2024] [Revised: 03/26/2025] [Accepted: 03/27/2025] [Indexed: 05/11/2025]
Abstract
BACKGROUND Patients with metastatic disease and no identifiable primary tumor, thus diagnosed with cancer of unknown primary (CUP), typically have a poor prognosis. Tumor DNA sequencing has recently shown promise in identifying the molecular tissue of origin. This study evaluated the value of whole genome sequencing (WGS) in the CUP care pathway by comparing patient outcomes with a historical control cohort. The value of whole transcriptome sequencing (WTS) was also explored.
PATIENTS AND METHODS A prospective intervention cohort was established of provisional CUP patients (≥18 years of age) who had WGS carried out on metastatic tissue between August 2021 and August 2023. A control cohort was established of CUP patients (≥18 years of age) diagnosed between December 2016 and April 2021 without the availability of WGS. The CUP prediction algorithm was applied to WGS data, and data on definitive diagnosis, molecular actionable alterations [ESMO Scale for Clinical Actionability of molecular Targets (ESCAT) tier 1-3], therapy, diagnostic timelines, and overall survival (OS) were captured.
RESULTS A total of 159 provisional CUP patients (n = 54 intervention cohort, n = 105 control cohort) were included. WGS and WTS were successfully carried out in 46 (85%) and 27 (50%) patients, respectively. A primary tumor diagnosis was established in 76% of the intervention cohort compared with 16% of the control cohort (P < 0.001). WGS contributed to a primary tumor diagnosis in 34 patients (63%) and identified an actionable alteration in 34 patients (63%). WTS contributed to a primary tumor diagnosis in three patients (6%). Following WGS, treatment recommendations could be made for 38 patients (70%), of whom 22 (58%) started the recommended therapies. The median OS was 11 and 9 months in the intervention and control cohorts, respectively (P = 0.345).
CONCLUSION Incorporating WGS into the CUP care pathway is of significant value for diagnosing a primary tumor of origin and contributes to the identification of actionable alterations in the majority of patients.
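The percentages in the results above follow from the stated counts. As a quick consistency check, the minimal sketch below recomputes them. It assumes that each percentage is taken over the 54-patient intervention cohort, except the treatment-start rate, which is taken over the 38 patients who received a recommendation; the variable and function names are illustrative and do not come from the study.

```python
# Counts as reported in the abstract; all names here are illustrative only.
INTERVENTION_N = 54

counts_over_cohort = {
    "WGS successful": 46,
    "WTS successful": 27,
    "WGS contributed to diagnosis": 34,
    "Actionable alteration found": 34,
    "WTS contributed to diagnosis": 3,
    "Treatment recommendation made": 38,
}


def pct(numerator: int, denominator: int) -> int:
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Percentages over the whole intervention cohort (assumed denominator: 54).
cohort_pct = {label: pct(n, INTERVENTION_N) for label, n in counts_over_cohort.items()}

# Treatment starts are reported relative to patients with a recommendation (38).
started_pct = pct(22, counts_over_cohort["Treatment recommendation made"])

for label, value in cohort_pct.items():
    print(f"{label}: {value}%")
print(f"Started recommended therapy: {started_pct}%")
```

Under these assumed denominators, the recomputed values can be compared directly with the figures quoted in the abstract.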
Affiliation(s)
- E Droogers
  - Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Y Teunissen
  - Department of Medical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- A J van Puffelen
  - Department of Medical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- F H Groenendijk
  - Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- S E M Veldhuijzen van Zanten
  - Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- A Wagner
  - Department of Clinical Genetics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- H M W Verheul
  - Department of Medical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- D G J Robbrecht
  - Department of Medical Oncology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands

3
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2025; 67:1269-1289. [PMID: 38565775 PMCID: PMC11928429 DOI: 10.1007/s12033-024-01133-6] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Suzanna Abraham
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Akshita Singh
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- S Balaji
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- K S Mukunthan
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

4
Goel I, Bhaskar Y, Kumar N, Singh S, Amanullah M, Dhar R, Karmakar S. Role of AI in empowering and redefining the oncology care landscape: perspective from a developing nation. Front Digit Health 2025; 7:1550407. [PMID: 40103737 PMCID: PMC11913822 DOI: 10.3389/fdgth.2025.1550407] [Received: 12/23/2024] [Accepted: 02/17/2025] [Indexed: 03/20/2025]
Abstract
Early diagnosis and accurate prognosis play a pivotal role in the clinical management of cancer and in preventing cancer-related mortalities. The burgeoning population of Asia in general, and of South Asian countries like India in particular, poses significant challenges to the healthcare system. Regrettably, the demand for healthcare services in India far exceeds the available resources, resulting in overcrowded hospitals, prolonged wait times, and inadequate facilities. The scarcity of trained manpower in rural settings, lack of awareness, and low penetrance of screening programs further compound the problem. Artificial Intelligence (AI), driven by advancements in machine learning, deep learning, and natural language processing, can profoundly transform the underlying shortcomings in the healthcare industry, especially for populous nations like India. With about 1.4 million cancer cases reported annually and 0.9 million deaths, India has a significant cancer burden that surpasses that of several nations. Further, India's diverse and large ethnic population is a data goldmine for healthcare research. Under these circumstances, AI-assisted technology, coupled with digital health solutions, could support effective oncology care and reduce the economic burden of GDP loss in terms of years of potential productive life lost (YPPLL) due to India's large cancer burden. This review explores different aspects of cancer management, such as prevention, diagnosis, precision treatment, prognosis, and drug discovery, where AI has demonstrated promising clinical results. By harnessing the capabilities of AI in oncology research, healthcare professionals can enhance their ability to diagnose cancers at earlier stages, leading to more effective treatments and improved patient outcomes. With continued research and development, AI and digital health can play a transformative role in mitigating the challenges posed by the growing population and advancing the fight against cancer in India.
Moreover, AI-driven technologies can assist in tailoring personalized treatment plans, optimizing therapeutic strategies, and supporting oncologists in making well-informed decisions. However, it is essential to ensure responsible implementation and address potential ethical and privacy concerns associated with using AI in healthcare.
Affiliation(s)
- Isha Goel
  - Department of Biochemistry, All India Institute of Medical Sciences (AIIMS), New Delhi, India
  - Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), New Delhi, India
- Yogendra Bhaskar
  - ICMR Computational Genomics Centre, Indian Council of Medical Research (ICMR), New Delhi, India
- Nand Kumar
  - Department of Psychiatry, All India Institute of Medical Sciences (AIIMS), New Delhi, India
- Sunil Singh
  - Department of Biochemistry, All India Institute of Medical Sciences (AIIMS), New Delhi, India
- Mohammed Amanullah
  - Department of Clinical Biochemistry, College of Medicine, King Khalid University, Abha, Saudi Arabia
- Ruby Dhar
  - Department of Biochemistry, All India Institute of Medical Sciences (AIIMS), New Delhi, India
- Subhradip Karmakar
  - Department of Biochemistry, All India Institute of Medical Sciences (AIIMS), New Delhi, India

5
Ellrott K, Wong CK, Yau C, Castro MAA, Lee JA, Karlberg BJ, Grewal JK, Lagani V, Tercan B, Friedl V, Hinoue T, Uzunangelov V, Westlake L, Loinaz X, Felau I, Wang PI, Kemal A, Caesar-Johnson SJ, Shmulevich I, Lazar AJ, Tsamardinos I, Hoadley KA, Robertson AG, Knijnenburg TA, Benz CC, Stuart JM, Zenklusen JC, Cherniack AD, Laird PW. Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets. Cancer Cell 2025; 43:195-212.e11. [PMID: 39753139 PMCID: PMC11949768 DOI: 10.1016/j.ccell.2024.12.002] [Received: 08/16/2024] [Revised: 08/26/2024] [Accepted: 12/05/2024] [Indexed: 02/12/2025]
Abstract
Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology, bringing hope to inform a patient's prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine-learning approaches and genomic data platforms. For each cancer and data type we provide containerized versions of the top-performing models as a public resource.
Affiliation(s)
- Kyle Ellrott
  - Oregon Health and Science University, Portland, OR 97239, USA
- Christopher K Wong
  - Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Christina Yau
  - University of California, San Francisco, Department of Surgery, San Francisco, CA 94158, USA; Buck Institute for Research on Aging, Novato, CA 94945, USA
- Mauro A A Castro
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba, PR 81520-260, Brazil
- Jordan A Lee
  - Oregon Health and Science University, Portland, OR 97239, USA
- Jasleen K Grewal
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Vincenzo Lagani
  - JADBio Gnosis DA, GR-700 13 Heraklion, Crete, Greece; Institute of Chemical Biology, Ilia State University, Tbilisi 0162, Georgia
- Bahar Tercan
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
- Verena Friedl
  - Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Toshinori Hinoue
  - Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA
- Vladislav Uzunangelov
  - Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Lindsay Westlake
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Xavier Loinaz
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Ina Felau
  - Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA
- Peggy I Wang
  - Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA
- Anab Kemal
  - Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA
- Ilya Shmulevich
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
- Alexander J Lazar
  - Departments of Pathology & Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ioannis Tsamardinos
  - JADBio Gnosis DA, GR-700 13 Heraklion, Crete, Greece; Department of Computer Science, University of Crete, GR-700 13 Heraklion, Crete, Greece; Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Crete, Greece
- Katherine A Hoadley
  - Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27519, USA
- A Gordon Robertson
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Theo A Knijnenburg
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
- Joshua M Stuart
  - Biomolecular Engineering Department, School of Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Jean C Zenklusen
  - Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA
- Andrew D Cherniack
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02115, USA
- Peter W Laird
  - Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA

6
Zhou L, Li J, Tan W. M-NET: Transforming Single Nucleotide Variations Into Patient Feature Images for the Prediction of Prostate Cancer Metastasis and Identification of Significant Pathways. IEEE J Biomed Health Inform 2025; 29:1199-1208. [PMID: 39509309 DOI: 10.1109/jbhi.2024.3493618] [Indexed: 11/15/2024]
Abstract
High-performance prediction of prostate cancer metastasis based on single nucleotide variations remains a challenge. Therefore, we developed a novel biologically informed deep learning framework, named M-NET, for the prediction of prostate cancer metastasis. Within the framework, we transformed single nucleotide variations into patient feature images that are optimal for fitting convolutional neural networks. Moreover, we identified significant pathways associated with the metastatic status. The experimental results showed that M-NET significantly outperformed other comparison methods based on single nucleotide variations, achieving improvements in accuracy, precision, recall, F1-score, area under the receiver operating characteristics curve, and area under the precision-recall curve by 6.3%, 8.4%, 5.1%, 0.070, 0.041, and 0.026, respectively. Furthermore, M-NET identified some important pathways associated with the metastatic status, such as signaling by the hedgehog pathway. In summary, compared with other comparative methods, M-NET exhibited a better performance in the prediction of prostate cancer metastasis.

7
Vashisht V, Vashisht A, Mondal AK, Woodall J, Kolhe R. From Genomic Exploration to Personalized Treatment: Next-Generation Sequencing in Oncology. Curr Issues Mol Biol 2024; 46:12527-12549. [PMID: 39590338 PMCID: PMC11592618 DOI: 10.3390/cimb46110744] [Received: 09/04/2024] [Revised: 10/29/2024] [Accepted: 11/03/2024] [Indexed: 11/28/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized personalized oncology care by providing exceptional insights into the complex genomic landscape. NGS offers comprehensive cancer profiling, which enables clinicians and researchers to better understand the molecular basis of cancer and to tailor treatment strategies accordingly. Targeted therapies based on genomic alterations identified through NGS have shown promise in improving patient outcomes across various cancer types, circumventing resistance mechanisms and enhancing treatment efficacy. Moreover, NGS facilitates the identification of predictive biomarkers and prognostic indicators, aiding in patient stratification and personalized treatment approaches. By uncovering driver mutations and actionable alterations, NGS empowers clinicians to make informed decisions regarding treatment selection and patient management. However, the full potential of NGS in personalized oncology can only be realized through bioinformatics analyses. Bioinformatics plays a crucial role in processing raw sequencing data, identifying clinically relevant variants, and interpreting complex genomic landscapes. This comprehensive review investigates the diverse NGS techniques, including whole-genome sequencing (WGS), whole-exome sequencing (WES), and single-cell RNA sequencing (sc-RNA-Seq), elucidating their roles in understanding the complex genomic/transcriptomic landscape of cancer. Furthermore, the review explores the integration of NGS data with bioinformatics tools to facilitate personalized oncology approaches, from understanding tumor heterogeneity to identifying driver mutations and predicting therapeutic responses. Challenges and future directions in NGS-based cancer research are also discussed, underscoring the transformative impact of these technologies on cancer diagnosis, management, and treatment strategies.
Affiliation(s)
- Ravindra Kolhe
  - Department of Pathology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (V.V.); (A.V.); (A.K.M.); (J.W.)

8
Wu Q, Morrow EM, Gamsiz Uzun ED. A deep learning model for prediction of autism status using whole-exome sequencing data. PLoS Comput Biol 2024; 20:e1012468. [PMID: 39514604 DOI: 10.1371/journal.pcbi.1012468] [Received: 03/15/2023] [Revised: 11/20/2024] [Accepted: 09/06/2024] [Indexed: 11/16/2024]
Abstract
Autism is a developmental disability. Research has demonstrated that children with autism benefit from early diagnosis and early intervention. Genetic factors are considered major contributors to the development of autism. Machine learning (ML), including deep learning (DL), has been evaluated in phenotype prediction, but this method has been limited in its application to autism. We developed a DL model, the Separate Translated Autism Research Neural Network (STAR-NN), to predict autism status. The model was trained and tested using whole exome sequencing data from 43,203 individuals (16,809 individuals with autism and 26,394 non-autistic controls). Polygenic scores from common variants and the aggregated count of rare variants on genes were used as input. In STAR-NN, protein truncating variants, possibly damaging missense variants, and mild effect missense variants on the same gene were separated at the input level and merged into one gene node. In this way, rare variants with different levels of pathogenic effect were treated separately. We further validated the performance of STAR-NN using an independent dataset, including 13,827 individuals with autism and 14,052 non-autistic controls. STAR-NN achieved a modest ROC-AUC of 0.7319 on the testing dataset and 0.7302 on the independent dataset. STAR-NN outperformed other traditional ML models. Gene Ontology analysis on the selected gene features showed an enrichment for potentially informative pathways including calcium ion transport.
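The input design described in the abstract, three rare-variant categories kept separate per gene and then merged into a single gene node, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, array shapes, ReLU activation, and example weights are all assumptions made for illustration, and the polygenic-score input is omitted.

```python
import numpy as np


def gene_node_layer(counts, weights, bias):
    """Merge per-category rare-variant counts into one activation per gene.

    counts:  (n_individuals, n_genes, 3) counts of protein truncating,
             possibly damaging missense, and mild effect missense variants.
    weights: (n_genes, 3) one weight per gene and variant category.
    bias:    (n_genes,) one bias per gene node.
    """
    # Each gene node sees only its own three inputs, so the weighted sum
    # runs over the category axis and never mixes different genes.
    pre_activation = np.einsum("igc,gc->ig", counts, weights) + bias
    # ReLU is an assumed activation, chosen only for illustration.
    return np.maximum(pre_activation, 0.0)


# Hypothetical example: 1 individual, 2 genes, 3 variant categories.
counts = np.array([[[1, 0, 2], [0, 1, 0]]], dtype=float)
weights = np.array([[1.0, 0.5, 0.1], [1.0, 0.5, 0.1]])
bias = np.zeros(2)

gene_nodes = gene_node_layer(counts, weights, bias)
print(gene_nodes)
```

Keeping a separate weight per category lets a model of this kind learn, for example, a larger contribution from protein truncating variants than from mild effect missense variants on the same gene.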
Affiliation(s)
- Qing Wu
  - Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, United States of America
  - Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Eric M Morrow
  - Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, United States of America
  - Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Developmental Disorders Genetics Research Program, Department of Psychiatry and Human Behavior, Emma Pendleton Bradley Hospital, East Providence, Rhode Island, United States of America
- Ece D Gamsiz Uzun
  - Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
  - Department of Pathology and Laboratory Medicine, Warren Alpert Medical School of Brown University, Providence, Rhode Island, United States of America
  - Department of Pathology and Laboratory Medicine, Rhode Island Hospital, Providence, Rhode Island, United States of America
|
9
|
Sheng Y, Zhao B, Cheng H, Yu Y, Wang W, Yang Y, Ding Y, Qiu L, Qin Z, Yao Z, Zhang X, Ren Y. A Convolutional Neural Network Model for Distinguishing Hemangioblastomas From Other Cerebellar-and-Brainstem Tumors Using Contrast-Enhanced MRI. J Magn Reson Imaging 2024; 60:1512-1520. [PMID: 38206839 DOI: 10.1002/jmri.29230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/26/2023] [Accepted: 12/27/2023] [Indexed: 01/13/2024] Open
Abstract
BACKGROUND Hemangioblastoma (HB) is a highly vascularized tumor most commonly occurring in the posterior cranial fossa, requiring accurate preoperative diagnosis to avoid accidental intraoperative hemorrhage and even death. PURPOSE To accurately distinguish HBs from other cerebellar-and-brainstem tumors using a convolutional neural network model based on a contrast-enhanced brain MRI dataset. STUDY TYPE Retrospective. POPULATION Four hundred five patients (182 = HBs; 223 = other cerebellar-and-brainstem tumors): 305 cases for model training, and 100 for evaluation. FIELD STRENGTH/SEQUENCE 3 T/contrast-enhanced T1-weighted imaging (T1WI + C). ASSESSMENT A CNN-based 2D classification network was trained using data sliced along the z-axis. To improve the performance of the network, we introduced demographic information, various data-augmentation methods and an auxiliary task to segment the tumor region. This method was then compared with evaluations performed by experienced and intermediate-level neuroradiologists, and the heatmap of deep features, which indicates the contribution of each pixel to the model prediction, was visualized by Grad-CAM to analyze the misclassified cases. STATISTICAL TESTS The Pearson chi-square test and an independent t-test were used to test for distribution differences in age and sex. The independent t-test was also used to compare the performance of experts with that of our proposed method. A P value <0.05 was considered significant. RESULTS The trained network showed a higher accuracy for identifying HBs (accuracy = 0.902 ± 0.031, F1 = 0.891 ± 0.035, AUC = 0.926 ± 0.040) than experienced (accuracy = 0.887 ± 0.013, F1 = 0.868 ± 0.011, AUC = 0.881 ± 0.008) and intermediate-level (accuracy = 0.827 ± 0.037, F1 = 0.768 ± 0.068, AUC = 0.810 ± 0.047) neuroradiologists. The recall values were 0.910 ± 0.050, 0.659 ± 0.084, and 0.828 ± 0.019 for the trained network, intermediate and experienced neuroradiologists, respectively. Additional ablation experiments verified the utility of the introduced demographic information, data augmentation, and the auxiliary-segmentation task. DATA CONCLUSION Our proposed method can successfully distinguish HBs from other cerebellar-and-brainstem tumors and showed diagnostic efficiency comparable to that of experienced neuroradiologists. EVIDENCE LEVEL 3 TECHNICAL EFFICACY: Stage 2.
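For readers comparing the accuracy, recall and F1 figures quoted above, the following sketch shows how those metrics are derived from a binary confusion matrix. The counts are hypothetical (chosen to total 100 cases, like the evaluation set size) and are not taken from the study; the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fn: positive cases (e.g. HB) classified correctly / missed
    fp/tn: negative cases flagged wrongly / classified correctly
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only, not data from the paper.
metrics = classification_metrics(tp=40, fp=5, fn=5, tn=50)
```

Recall depends only on the positive cases, so a reader can have high accuracy and still miss many HBs, which is the pattern the abstract reports for the intermediate-level group.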
Affiliation(s)
- Yaru Sheng
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Botao Zhao
- Research Center for Augmented Intelligence, Zhejiang Lab, Hangzhou, China
- Haixia Cheng
- Neuropathology Department of Huashan Hospital, Fudan University, Shanghai, China
- Yang Yu
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Weiwei Wang
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Yang Yang
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Yueyue Ding
- Department of Ultrasonography, Jing'an District Centre Hospital of Shanghai, Shanghai, China
- Longhua Qiu
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Zhiyong Qin
- Neurosurgery Department of Huashan Hospital, Fudan University, Shanghai, China
- Zhenwei Yao
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
- Xiaoyong Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Yan Ren
- Radiology Department of Huashan Hospital, Fudan University, Shanghai, China
|
10
|
Mamalakis M, Macfarlane SC, Notley SV, Gad AKB, Panoutsos G. A novel pipeline employing deep multi-attention channels network for the autonomous detection of metastasizing cells through fluorescence microscopy. Comput Biol Med 2024; 181:109052. [PMID: 39216406 DOI: 10.1016/j.compbiomed.2024.109052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 08/09/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Abstract
Metastasis driven by cancer cell migration is the leading cause of cancer-related deaths. It involves significant changes in the organization of the cytoskeleton, which includes the actin microfilaments and the vimentin intermediate filaments. Understanding how these filaments change cells from normal to invasive offers insights that can be used to improve cancer diagnosis and therapy. We have developed a computational, transparent, large-scale and imaging-based pipeline that can distinguish between normal human cells and their isogenically matched, oncogenically transformed, invasive and metastasizing counterparts, based on the spatial organization of actin and vimentin filaments in the cell cytoplasm. Due to the intricacy of these subcellular structures, manual annotation is not trivial to automate. We used established deep learning methods and our new multi-attention channel architecture. To ensure a high level of interpretability of the network, which is crucial for the application area, we developed a global explainability approach correlating the weighted geometric mean of the total cell images and their local GradCam scores. The methods offer a detailed, objective and measurable understanding of how different components of the cytoskeleton contribute to metastasis, insights that can be used for future development of novel diagnostic tools, such as a nanometer-level, vimentin filament-based biomarker for digital pathology, and for new treatments that can significantly increase patient survival.
Affiliation(s)
- Michail Mamalakis
- School of Electrical and Electronic Engineering, University of Sheffield, Sheffield, UK; Insigneo Institute for in-silico, Medicine, University of Sheffield, Sheffield, UK; Department of Infection, Immunity and Cardiovascular Disease, and Department of Computer science, Sheffield, UK; Department of Psychiatry, Cambridge University, Cambridge, UK.
- Sarah C Macfarlane
- Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK
- Scott V Notley
- Insigneo Institute for in-silico, Medicine, University of Sheffield, Sheffield, UK; Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, UK
- Annica K B Gad
- Insigneo Institute for in-silico, Medicine, University of Sheffield, Sheffield, UK; Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK; Madeira Chemistry Research Centre, University of Madeira, Funchal, Portugal; Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
- George Panoutsos
- School of Electrical and Electronic Engineering, University of Sheffield, Sheffield, UK; Insigneo Institute for in-silico, Medicine, University of Sheffield, Sheffield, UK; Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK.
|
11
|
Hwang J, Lee Y, Yoo SK, Kim JI. Image-based deep learning model using DNA methylation data predicts the origin of cancer of unknown primary. Neoplasia 2024; 55:101021. [PMID: 38943996 PMCID: PMC11261876 DOI: 10.1016/j.neo.2024.101021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 06/24/2024] [Indexed: 07/01/2024]
Abstract
Cancer of unknown primary (CUP) is a rare type of metastatic cancer in which the origin of the tumor is unknown. Since the treatment strategy for patients with metastatic tumors depends on knowing the primary site, accurate identification of the origin site is important. Here, we developed an image-based deep-learning model that utilizes a vision transformer algorithm for predicting the origin of CUP. Using a DNA methylation dataset of 8,233 primary tumors from The Cancer Genome Atlas (TCGA), we categorized 29 cancer types into 18 organ classes and extracted 2,312 differentially methylated CpG sites (DMCs) from the non-squamous cancer group and 420 DMCs from the squamous cell cancer group. Using these DMCs, we created organ-specific DNA methylation images and used them for model training and testing. Model performance was evaluated using 394 metastatic cancer samples from TCGA (TCGA-meta) and 995 samples (693 primary and 302 metastatic cancers) obtained from 20 independent external studies. We found that the DNA methylation image reveals a distinct pattern based on the origin of the cancer. Our model achieved an overall accuracy of 96.95% in the TCGA-meta dataset. In the external validation datasets, our classifier achieved overall accuracies of 96.39% and 94.37% in primary and metastatic tumors, respectively. In particular, the overall accuracies for primary and metastatic samples of non-squamous cell cancer were exceptionally high, at 96.79% and 96.85%, respectively.
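The abstract does not say how the methylation values are arranged into an image, so the sketch below is only one plausible layout: a vector of per-CpG beta values is padded to the next square size and reshaped row by row. The function name, the zero padding, and the row-major ordering are all assumptions made for illustration.

```python
import math

def to_square_image(beta_values, fill=0.0):
    """Arrange a 1-D list of methylation beta values into a square 2-D grid.

    The vector is padded with `fill` up to the next perfect square and then
    split into rows (hypothetical layout, not the published method).
    """
    n = len(beta_values)
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    padded = list(beta_values) + [fill] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]

# 2,312 values (the DMC count reported for the non-squamous group),
# all set to a placeholder beta value of 0.5.
image = to_square_image([0.5] * 2312)
```

With 2,312 inputs this layout gives a 49 x 49 grid with 89 padded cells; any fixed ordering of CpG sites would work as long as it is identical for training and test samples.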
Affiliation(s)
- Jinha Hwang
- Department of Laboratory Medicine, Korea University Anam Hospital, Seoul, the Republic of Korea
- Yeajina Lee
- Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, the Republic of Korea; Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, the Republic of Korea
- Seong-Keun Yoo
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
- Jong-Il Kim
- Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, the Republic of Korea; Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, the Republic of Korea.
|
12
|
Neagu AI, Poalelungi DG, Fulga A, Neagu M, Fulga I, Nechita A. Enhanced Immunohistochemistry Interpretation with a Machine Learning-Based Expert System. Diagnostics (Basel) 2024; 14:1853. [PMID: 39272638 PMCID: PMC11394116 DOI: 10.3390/diagnostics14171853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/26/2024] [Accepted: 08/22/2024] [Indexed: 09/15/2024] Open
Abstract
BACKGROUND In recent decades, machine-learning (ML) technologies have advanced the management of high-dimensional and complex cancer data by developing reliable and user-friendly automated diagnostic tools for clinical applications. Immunohistochemistry (IHC) is an essential staining method that enables the identification of cellular origins by analyzing the expression of specific antigens within tissue samples. The aim of this study was to identify a model that could predict histopathological diagnoses based on specific immunohistochemical markers. METHODS The XGBoost learning model was applied, where the target (output) variable was the histopathological diagnosis and the predictors (independent variables influencing the target variable) were the immunohistochemical markers. RESULTS Our study demonstrated a precision rate of 85.97% within the dataset, indicating a high level of performance and suggesting that the model is generally reliable in producing accurate predictions. CONCLUSIONS This study demonstrated the feasibility and clinical efficacy of utilizing the probabilistic decision tree algorithm to differentiate tumor diagnoses according to immunohistochemistry profiles.
Affiliation(s)
- Anca Iulia Neagu
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint John Clinical Emergency Hospital for Children, 800487 Galati, Romania
- Diana Gina Poalelungi
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Ana Fulga
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Marius Neagu
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Iuliu Fulga
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Aurel Nechita
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint John Clinical Emergency Hospital for Children, 800487 Galati, Romania
|
13
|
Li L, Sun M, Wang J, Wan S. Multi-omics based artificial intelligence for cancer research. Adv Cancer Res 2024; 163:303-356. [PMID: 39271266 DOI: 10.1016/bs.acr.2024.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2024]
Abstract
With significant advancements in next-generation sequencing technologies, large amounts of multi-omics data, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics, have been accumulated, offering an unprecedented opportunity to explore the heterogeneity and complexity of cancer across various molecular levels and scales. One of the promising aspects of multi-omics lies in its capacity to offer a holistic view of the biological networks and pathways underpinning cancer, facilitating a deeper understanding of its development, progression, and response to treatment. However, the exponential growth of data generated by multi-omics studies presents significant analytical challenges. Processing, analyzing, integrating, and interpreting these multi-omics datasets to extract meaningful insights is an ambitious task that stands at the forefront of current cancer research. The application of artificial intelligence (AI) has emerged as a powerful solution to these challenges, demonstrating exceptional capabilities in deciphering complex patterns and extracting valuable information from large-scale, intricate omics datasets. This review delves into the synergy of AI and multi-omics, highlighting its revolutionary impact on oncology. We dissect how this confluence is reshaping the landscape of cancer research and clinical practice, particularly in the realms of early detection, diagnosis, prognosis, treatment and pathology. Additionally, we elaborate on the latest AI methods for multi-omics integration to provide comprehensive insight into the complex biological mechanisms and inherent heterogeneity of cancer. Finally, we discuss the current challenges of data harmonization, algorithm interpretability, and ethical considerations. Addressing these challenges necessitates multidisciplinary collaboration, paving the way for more precise, personalized, and effective treatments for cancer patients.
Affiliation(s)
- Lusheng Li
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
- Mengtao Sun
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
- Jieqiong Wang
- Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE, United States
- Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States.
|
14
|
Shahamatdar S, Saeed-Vafa D, Linsley D, Khalil F, Lovinger K, Li L, McLeod HT, Ramachandran S, Serre T. Deceptive learning in histopathology. Histopathology 2024; 85:116-132. [PMID: 38556922 PMCID: PMC11162337 DOI: 10.1111/his.15180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/02/2024]
Abstract
AIMS Deep learning holds immense potential for histopathology, automating tasks that are simple for expert pathologists and revealing novel biology for tasks that were previously considered difficult or impossible to solve by eye alone. However, the extent to which the visual strategies learned by deep learning models in histopathological analysis are trustworthy or not has yet to be systematically analysed. Here, we systematically evaluate deep neural networks (DNNs) trained for histopathological analysis in order to understand if their learned strategies are trustworthy or deceptive. METHODS AND RESULTS We trained a variety of DNNs on a novel data set of 221 whole-slide images (WSIs) from lung adenocarcinoma patients, and evaluated their effectiveness at (1) molecular profiling of KRAS versus EGFR mutations, (2) determining the primary tissue of a tumour and (3) tumour detection. While DNNs achieved above-chance performance on molecular profiling, they did so by exploiting correlations between histological subtypes and mutations, and failed to generalise to a challenging test set obtained through laser capture microdissection (LCM). In contrast, DNNs learned robust and trustworthy strategies for determining the primary tissue of a tumour as well as detecting and localising tumours in tissue. CONCLUSIONS Our work demonstrates that DNNs hold immense promise for aiding pathologists in analysing tissue. However, they are also capable of achieving seemingly strong performance by learning deceptive strategies that leverage spurious correlations, and are ultimately unsuitable for research or clinical work. The framework we propose for model evaluation and interpretation is an important step towards developing reliable automated systems for histopathological analysis.
Affiliation(s)
- Sahar Shahamatdar
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- The Warren Alpert Medical School, Brown University, Providence, RI, USA
- Daryoush Saeed-Vafa
- Department of Anatomic Pathology, H. Lee Moffitt Cancer and Research Institute, Tampa, FL, USA
- Drew Linsley
- Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI, USA
- Farah Khalil
- Department of Anatomic Pathology, H. Lee Moffitt Cancer and Research Institute, Tampa, FL, USA
- Katherine Lovinger
- Department of Molecular Biology, H. Lee Moffitt Cancer and Research Institute, Tampa, FL, USA
- Lester Li
- University of Rochester, Rochester, NY, USA
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, RI, USA
- The Data Science Initiative, Brown University, Providence, RI, USA
- Thomas Serre
- Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Department of Cognitive Linguistic & Psychological Sciences, Brown University, Providence, RI, USA
|
15
|
Garg S. A Deep Learning Model for Cancer Type Prediction Sets a New Standard. Cancer Discov 2024; 14:906-908. [PMID: 38826098 DOI: 10.1158/2159-8290.cd-24-0280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
SUMMARY Classifying tumor types using machine learning approaches is not always trivial, particularly for challenging cases such as cancers of unknown primary. In this issue of Cancer Discovery, Darmofal and colleagues describe a new tool that uses information from a clinical sequencing panel to diagnose tumor type, and show that the model is particularly robust. See related article by Darmofal et al., p. 1064 (1).
Affiliation(s)
- Salil Garg
- Laboratory Medicine and Pathology, Yale University, New Haven, Connecticut
|
16
|
Darmofal M, Suman S, Atwal G, Toomey M, Chen JF, Chang JC, Vakiani E, Varghese AM, Balakrishnan Rema A, Syed A, Schultz N, Berger MF, Morris Q. Deep-Learning Model for Tumor-Type Prediction Using Targeted Clinical Genomic Sequencing Data. Cancer Discov 2024; 14:1064-1081. [PMID: 38416134 PMCID: PMC11145170 DOI: 10.1158/2159-8290.cd-23-0996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/07/2023] [Accepted: 02/23/2024] [Indexed: 02/29/2024]
Abstract
Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a data set of 39,787 solid tumors sequenced using a clinically targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivaling the performance of WGS-based methods. GDD-ENS can also guide diagnoses of rare type and cancers of unknown primary and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows could provide clinically relevant tumor-type predictions to guide treatment decisions in real time. SIGNIFICANCE We describe a highly accurate tumor-type prediction model, designed specifically for clinical implementation. Our model relies only on widely used cancer gene panel sequencing data, predicts across 38 distinct cancer types, and supports integration of patient-specific nongenomic information for enhanced decision support in challenging diagnostic situations. See related commentary by Garg, p. 906. This article is featured in Selected Articles from This Issue, p. 897.
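The two ideas in this abstract that lend themselves to a short illustration are ensembling (combining several networks' outputs) and reporting accuracy only on high-confidence predictions. The sketch below averages class probabilities across ensemble members and then scores only predictions above a confidence cutoff. It is not GDD-ENS itself: the averaging rule, the 0.75 cutoff, the function names, and the toy numbers are all assumptions.

```python
def ensemble_predict(member_probs):
    """Average class-probability vectors from several models.

    member_probs: list of per-model probability lists (same class order).
    Returns (predicted_class_index, mean_probability_of_that_class).
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=mean.__getitem__)
    return best, mean[best]

def high_confidence_accuracy(predictions, labels, threshold):
    """Accuracy restricted to predictions with confidence >= threshold.

    predictions: list of (class_index, confidence) pairs.
    Returns (accuracy_on_kept, fraction_kept); accuracy is None if none kept.
    """
    kept = [(p, y) for (p, conf), y in zip(predictions, labels) if conf >= threshold]
    if not kept:
        return None, 0.0
    accuracy = sum(p == y for p, y in kept) / len(kept)
    return accuracy, len(kept) / len(labels)

# Hypothetical toy values; the 0.75 cutoff is an assumed threshold.
label, confidence = ensemble_predict([[0.9, 0.1], [0.7, 0.3]])
acc, coverage = high_confidence_accuracy(
    [(0, 0.95), (1, 0.60), (1, 0.90)], [0, 0, 1], threshold=0.75)
```

Reporting coverage next to accuracy matters here, because a figure quoted for high-confidence predictions says nothing about the cases that fall below the cutoff.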
Affiliation(s)
- Madison Darmofal
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York
- Shalabh Suman
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Gurnit Atwal
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Michael Toomey
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York
- Jie-Fu Chen
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Jason C. Chang
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Efsevia Vakiani
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Anna M. Varghese
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Aijazuddin Syed
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Nikolaus Schultz
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
- Michael F. Berger
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York
- Quaid Morris
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York
|
17
|
Edsjö A, Russnes HG, Lehtiö J, Tamborero D, Hovig E, Stenzinger A, Rosenquist R. High-throughput molecular assays for inclusion in personalised oncology trials - State-of-the-art and beyond. J Intern Med 2024; 295:785-803. [PMID: 38698538 DOI: 10.1111/joim.13785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
In the last decades, the development of high-throughput molecular assays has revolutionised cancer diagnostics, paving the way for the concept of personalised cancer medicine. This progress has been driven by the introduction of such technologies through biomarker-driven oncology trials. In this review, strengths and limitations of various state-of-the-art sequencing technologies, including gene panel sequencing (DNA and RNA), whole-exome/whole-genome sequencing and whole-transcriptome sequencing, are explored, focusing on their ability to identify clinically relevant biomarkers with diagnostic, prognostic and/or predictive impact. This includes the need to assess complex biomarkers, for example microsatellite instability, tumour mutation burden and homologous recombination deficiency, to identify patients suitable for specific therapies, including immunotherapy. Furthermore, the crucial role of biomarker analysis and multidisciplinary molecular tumour boards in selecting patients for trial inclusion is discussed in relation to various trial concepts, including drug repurposing. Recognising that today's exploratory techniques will evolve into tomorrow's routine diagnostics and clinical study inclusion assays, the importance of emerging technologies for multimodal diagnostics, such as proteomics and in vivo drug sensitivity testing, is also discussed. In addition, key regulatory aspects and the importance of patient engagement in all phases of a clinical trial are described. Finally, we propose a set of recommendations for consideration when planning a new precision cancer medicine trial.
Affiliation(s)
- Anders Edsjö
- Department of Clinical Genetics, Pathology and Molecular Diagnostics, Office for Medical Services, Region Skåne, Lund, Sweden
- Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Hege G Russnes
- Department of Pathology, Oslo University Hospital, Oslo, Norway
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Janne Lehtiö
- Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Cancer genomics and proteomics, Karolinska University Hospital, Solna, Sweden
- David Tamborero
- Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Eivind Hovig
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Albrecht Stenzinger
- Institute of Pathology, Division of Molecular Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Richard Rosenquist
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Clinical Genetics and Genomics, Karolinska University Hospital, Solna, Sweden
|
18
|
Staiger RD, Mehra T, Haile SR, Domenghino A, Kümmerli C, Abbassi F, Kozbur D, Dutkowski P, Puhan MA, Clavien PA. Experts vs. machine - comparison of machine learning to expert-informed prediction of outcome after major liver surgery. HPB (Oxford) 2024; 26:674-681. [PMID: 38423890 DOI: 10.1016/j.hpb.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 02/01/2024] [Accepted: 02/11/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND Machine learning (ML) has been successfully implemented for classification tasks (e.g., cancer diagnosis). ML performance for more challenging predictions is largely unexplored. This study's objective was to compare machine learning vs. expert-informed predictions for surgical outcome in patients undergoing major liver surgery. METHODS Single tertiary center data on preoperative parameters and postoperative complications for elective hepatic surgery patients were included (2008-2021). Expert-informed prediction models were established on 14 parameters identified by two expert liver surgeons to impact on postoperative outcome. ML models used all available preoperative patient variables (n = 62). Model performance was compared for predicting 3-month postoperative overall morbidity. Temporal validation and additional analysis in major liver resection patients were conducted. RESULTS 889 patients included. Expert-informed models showed low average bias (2-5 CCI points) with high over/underprediction. ML models performed similarly: average prediction 5-10 points higher than observed CCI values with high variability (95% CI -30 to 50). No performance improvement for major liver surgery patients. CONCLUSION No clinical relevance in the application of ML for predicting postoperative overall morbidity was found. Despite being a novel hype, ML has the potential for application in clinical practice. However, at this stage it does not replace established approaches of prediction modelling.
Affiliation(s)
- Roxane D Staiger
- Department of Surgery & Transplantation, University Hospital Zurich, Zurich, Switzerland
- Tarun Mehra
- Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Sarah R Haile
- Department of Epidemiology, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Anja Domenghino
- Department of Surgery & Transplantation, University Hospital Zurich, Zurich, Switzerland
- Fariba Abbassi
- Department of Surgery & Transplantation, University Hospital Zurich, Zurich, Switzerland
- Damian Kozbur
- Department of Economics, University of Zurich, Zurich, Switzerland
- Philipp Dutkowski
- Department of Surgery & Transplantation, University Hospital Zurich, Zurich, Switzerland
- Milo A Puhan
- Department of Epidemiology, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Pierre-Alain Clavien
- Department of Surgery & Transplantation, University Hospital Zurich, Zurich, Switzerland
|
19
|
Maji S, Ghosh SK, Jha JK, Chaturvedi V. A prospective observational study to assess the epidemiological profile of multiple primary cancers in Eastern India. J Cancer Res Ther 2024; 20:888-892. [PMID: 39023596 DOI: 10.4103/jcrt.jcrt_1603_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Accepted: 07/29/2022] [Indexed: 07/20/2024]
Abstract
BACKGROUND Multiple primary cancers, once thought to be rare, have become increasingly common as the lifespan of cancer survivors has increased with the availability of better and more effective cancer treatment. However, their exact incidence is not known and data on their epidemiological characteristics are not available. AIM This study aims to describe the epidemiologic characteristics of multiple primary cancers in the eastern region of India. MATERIALS AND METHOD The study was conducted in the Department of Surgical Oncology, Medical College, Kolkata, from 2017 to 2020, over a period of 3 years. All patients with a diagnosis of a second primary as per the International Agency for Research on Cancer (IARC) definition, or those developing a second primary within the study period, were included for analysis. Data were recorded using preformed questionnaires. All cases were followed up for at least 12 months. RESULT Fifty cases of multiple primary tumors were identified, of which 21 were synchronous and the remaining 29 were metachronous. The male-female ratio was 1:1.2. The median age at presentation for the index malignancy was 50 years. The most common malignancy in the synchronous group was a combination of various gastrointestinal (GI) cancers (six cases). In the metachronous category, a combination of reproductive cancers (breast, ovary, cervix, and endometrium) with GI cancers (colon, rectum) was most frequently found (eight cases). Definite risk factors for multiple primary tumors were identifiable in 10 cases: arsenic exposure in 5 cases, hereditary in 4 cases, and immunosuppression in 1, while in 8 cases, risk factors were only speculative (radiation in 5 cases, chemotherapy in 3). At the time of the last follow-up, 36 subjects were alive and 3 dead, while the status of 11 subjects was unknown. CONCLUSION This is the first comprehensive study on multiple primary cancers and the largest so far in India. Our study overcomes the shortcomings of previous case series from our subcontinent. The merits of our study include the use of the most accepted IARC definition, updated staging guidelines with long follow-up, and reliable survival data. Additionally, we could identify risk factors in 50% of our subjects. Our study also shows various new combinations of cancers not reported before. Clustering of cases in the young adolescent group (25-49 years) is also a new finding. We also highlight the existing ambiguity in the way this entity is defined. Demerits include the loss of follow-up data in a significant number of patients.
Affiliation(s)
- Suvendu Maji
- Department of Surgical Oncology, Medical College, Kolkata, West Bengal, India
|
20
|
Ma T, Wang J. GraphPath: a graph attention model for molecular stratification with interpretability based on the pathway-pathway interaction network. Bioinformatics 2024; 40:btae165. [PMID: 38530778 PMCID: PMC11007237 DOI: 10.1093/bioinformatics/btae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 02/22/2024] [Accepted: 03/22/2024] [Indexed: 03/28/2024] Open
Abstract
MOTIVATION Studying the molecular heterogeneity of cancer is essential for achieving personalized therapy. At the same time, understanding the biological processes that drive cancer development can lead to the identification of valuable therapeutic targets. Therefore, achieving accurate and interpretable clinical predictions requires thoroughly characterizing patients at both the molecular and biological pathway levels. RESULTS Here, we present GraphPath, a biological knowledge-driven graph neural network with a multi-head self-attention mechanism that implements the pathway-pathway interaction network. We train GraphPath to classify the cancer status of patients with prostate cancer based on their multi-omics profiling. Experimental results show that our method outperforms P-NET and other baseline methods. In addition, two external cohorts are used to validate that the model can be generalized to unseen samples with adequate predictive performance. We reduce the dimensionality of latent pathway embeddings and visualize corresponding classes to further demonstrate the performance of the model. Additionally, since GraphPath's predictions are interpretable, we identify target cancer-associated pathways that significantly contribute to the model's predictions. Such a robust and interpretable model has the potential to greatly enhance our understanding of cancer's biological mechanisms and accelerate the development of targeted therapies. AVAILABILITY AND IMPLEMENTATION https://github.com/amazingma/GraphPath.
Affiliation(s)
- Teng Ma
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 41083, Hunan, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 41083, Hunan, China
|
21
|
Unger M, Kather JN. Deep learning in cancer genomics and histopathology. Genome Med 2024; 16:44. [PMID: 38539231 PMCID: PMC10976780 DOI: 10.1186/s13073-024-01315-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/15/2023] [Accepted: 03/13/2024] [Indexed: 07/08/2024] Open
Abstract
Histopathology and genomic profiling are cornerstones of precision oncology and are routinely obtained for patients with cancer. Traditionally, histopathology slides are manually reviewed by highly trained pathologists. Genomic data, on the other hand, are evaluated by engineered computational pipelines. In both applications, the advent of modern artificial intelligence methods, specifically machine learning (ML) and deep learning (DL), has opened up a fundamentally new way of extracting actionable insights from raw data, which could augment and potentially replace some aspects of traditional evaluation workflows. In this review, we summarize current and emerging applications of DL in histopathology and genomics, including basic diagnostic as well as advanced prognostic tasks. Based on a growing body of evidence, we suggest that DL could be the groundwork for a new kind of workflow in oncology and cancer research. However, we also point out that DL models can have biases and other flaws that users in healthcare and research need to know about, and we propose ways to address them.
Affiliation(s)
- Michaela Unger
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Department of Medicine I, University Hospital Dresden, Dresden, Germany
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
|
22
|
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024] Open
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. Prior knowledge, such as gene regulatory networks and pathway information, harbors useful information on gene-gene interactions and gene functional modules. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved higher accuracy than existing methods. Furthermore, key factors nonlinearly associated with tumor pathogenesis are prioritized by employing two interpretation algorithms for neural networks (i.e. GNN-Explainer and IGscore), at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, and many of them, such as SEC61G and CYP27B1, have been previously reported in the literature.
Affiliation(s)
- Hongxi Yan
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
- Dawei Weng
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
|
23
|
Chakraborty S, Guan Z, Begg CB, Shen R. Topical hidden genome: discovering latent cancer mutational topics using a Bayesian multilevel context-learning approach. Biometrics 2024; 80:ujae030. [PMID: 38682463 PMCID: PMC11056772 DOI: 10.1093/biomtc/ujae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 03/18/2024] [Accepted: 04/04/2024] [Indexed: 05/01/2024]
Abstract
Inferring the cancer-type specificities of ultra-rare, genome-wide somatic mutations is an open problem. Traditional statistical methods cannot handle such data due to their ultra-high dimensionality and extreme data sparsity. To harness information in rare mutations, we have recently proposed a formal multilevel multilogistic "hidden genome" model. Through its hierarchical layers, the model condenses information in ultra-rare mutations through meta-features embodying mutation contexts to characterize cancer types. Consistent, scalable point estimation of the model can incorporate tens of millions of variants across thousands of tumors and permit impressive prediction and attribution. However, principled statistical inference is infeasible due to the volume, correlation, and noninterpretability of mutation contexts. In this paper, we propose a novel framework that leverages topic models from computational linguistics to reduce the dimension of mutation contexts, producing interpretable, decorrelated meta-feature topics. We propose an efficient MCMC algorithm for implementation that permits rigorous full Bayesian inference at a scale that is orders of magnitude beyond the capability of existing out-of-the-box inferential high-dimensional multi-class regression methods and software. Applying our model to the Pan Cancer Analysis of Whole Genomes dataset reveals interesting biological insights, including somatic mutational topics associated with UV exposure in skin cancer, aging in colorectal cancer, and a strong influence of epigenome organization in liver cancer. Under cross-validation, our model demonstrates highly competitive predictive performance against the black-box methods of random forest and deep learning.
Affiliation(s)
- Saptarshi Chakraborty
- Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Zoe Guan
- Biostatistics Center, Mass General Research Institute, Boston, MA 02114, USA
- Colin B Begg
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
- Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
|
24
|
Rydzewski NR, Shi Y, Li C, Chrostek MR, Bakhtiar H, Helzer KT, Bootsma ML, Berg TJ, Harari PM, Floberg JM, Blitzer GC, Kosoff D, Taylor AK, Sharifi MN, Yu M, Lang JM, Patel KR, Citrin DE, Sundling KE, Zhao SG. A platform-independent AI tumor lineage and site (ATLAS) classifier. Commun Biol 2024; 7:314. [PMID: 38480799 PMCID: PMC10937974 DOI: 10.1038/s42003-024-05981-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 02/27/2024] [Indexed: 03/17/2024] Open
Abstract
Histopathologic diagnosis and classification of cancer play a critical role in guiding treatment. Advances in next-generation sequencing have ushered in new complementary molecular frameworks. However, existing approaches do not independently assess both site-of-origin (e.g. prostate) and lineage (e.g. adenocarcinoma) and have minimal validation in metastatic disease, where classification is more difficult. Utilizing gradient-boosted machine learning, we developed ATLAS, a pair of separate AI Tumor Lineage and Site-of-origin models, from RNA expression data on 8249 tumor samples. We assessed performance independently in 10,376 total tumor samples, including 1490 metastatic samples, achieving an accuracy of 91.4% for cancer site-of-origin and 97.1% for cancer lineage. High-confidence predictions (encompassing the majority of cases) were accurate 98-99% of the time in both localized and, remarkably, even metastatic samples. We also identified emergent properties of our lineage scores for tumor types on which the model was never trained (zero-shot learning). Adenocarcinoma/sarcoma lineage scores differentiated epithelioid from biphasic/sarcomatoid mesothelioma. Also, predicted lineage de-differentiation identified neuroendocrine/small cell tumors and was associated with poor outcomes across tumor types. Our platform-independent single-sample approach can be easily translated to existing RNA-seq platforms. ATLAS can complement and guide traditional histopathologic assessment in challenging situations and tumors of unknown primary.
Affiliation(s)
- Nicholas R Rydzewski
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Yue Shi
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Chenxuan Li
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Hamza Bakhtiar
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Kyle T Helzer
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Matthew L Bootsma
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Tracy J Berg
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Paul M Harari
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- John M Floberg
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Grace C Blitzer
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- David Kosoff
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Amy K Taylor
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Marina N Sharifi
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Menggang Yu
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- Joshua M Lang
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Krishnan R Patel
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Deborah E Citrin
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kaitlin E Sundling
- Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, WI, USA
- Wisconsin State Laboratory of Hygiene, University of Wisconsin, Madison, WI, USA
- Shuang G Zhao
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- William S. Middleton Veterans Hospital, Madison, WI, USA
|
25
|
Hassan J, Saeed SM, Deka L, Uddin MJ, Das DB. Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges. Pharmaceutics 2024; 16:260. [PMID: 38399314 PMCID: PMC10892549 DOI: 10.3390/pharmaceutics16020260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/29/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] Open
Abstract
The use of data-driven high-throughput analytical techniques, which has given rise to computational oncology, is undisputed. The widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fueled the advancement in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes concerning different types of cancer research, vast databases of varied information with manifold dimensions are required, and indeed, all this information can only be managed by an automated system developed utilizing ML and MM. In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anti-cancer substances to improve cancer treatment, and also to refine the quality of existing treatment models by being incorporated at all steps of research and development related to cancer and in routine patient care. This review will serve as a consolidation of the advancement and benefits of ML and MM techniques with a special focus on the area of cancer prognosis and anticancer therapy, leading to the identification of challenges (data quantity, ethical consideration, and data privacy) which are yet to be fully addressed in current studies.
Affiliation(s)
- Jasmin Hassan
- Drug Delivery & Therapeutics Lab, Dhaka 1212, Bangladesh
- Lipika Deka
- Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK
- Md Jasim Uddin
- Department of Pharmaceutical Technology, Faculty of Pharmacy, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Diganta B. Das
- Department of Chemical Engineering, Loughborough University, Loughborough LE11 3TU, UK
|
26
|
Lorkowski SW, Dermawan JK, Rubin BP. The practical utility of AI-assisted molecular profiling in the diagnosis and management of cancer of unknown primary: an updated review. Virchows Arch 2024; 484:369-375. [PMID: 37999736 DOI: 10.1007/s00428-023-03708-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/12/2023] [Revised: 11/07/2023] [Accepted: 11/14/2023] [Indexed: 11/25/2023]
Abstract
Cancer of unknown primary (CUP) presents a complex diagnostic challenge, characterized by metastatic tumors of unknown tissue origin and a dismal prognosis. This review delves into the emerging significance of artificial intelligence (AI) and machine learning (ML) in transforming the landscape of CUP diagnosis, classification, and treatment. ML approaches, trained on extensive molecular profiling data, have shown promise in accurately predicting tissue of origin. Genomic profiling, encompassing driver mutations and copy number variations, plays a pivotal role in CUP diagnosis by providing insights into tumor type-specific oncogenic alterations. Mutational signatures (MS), reflecting somatic mutation patterns, offer further insights into CUP diagnosis. Known MS with established etiology, such as ultraviolet (UV) light-induced DNA damage and tobacco exposure, have been identified in cases of dedifferentiated/transdifferentiated melanoma and carcinoma. Deep learning models that integrate gene expression data and DNA methylation patterns offer insights into tissue lineage and tumor classification. In digital pathology, machine learning algorithms analyze whole-slide images to aid in CUP classification. Finally, precision oncology, guided by molecular profiling, offers targeted therapies independent of primary tissue identification. Clinical trials assigning CUP patients to molecularly guided therapies, including targetable alterations and tumor mutation burden as an immunotherapy biomarker, have resulted in improved overall survival in a subset of patients. In conclusion, AI- and ML-driven approaches are revolutionizing CUP management by enhancing diagnostic accuracy. Precision oncology utilizing enhanced molecular profiling facilitates the identification of targeted therapies that transcend the need to identify the tissue of origin, ultimately improving patient outcomes.
Affiliation(s)
- Shuhui Wang Lorkowski
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Josephine K Dermawan
- Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Brian P Rubin
- Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
|
27
|
Salvadores M, Supek F. Cell cycle gene alterations associate with a redistribution of mutation risk across chromosomal domains in human cancers. Nat Cancer 2024; 5:330-346. [PMID: 38200245 DOI: 10.1038/s43018-023-00707-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/23/2022] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
Mutations in human cells exhibit increased burden in heterochromatic, late DNA replication time (RT) chromosomal domains, with variation in mutation rates between tissues mirroring variation in heterochromatin and RT. We observed that regional mutation risk further varies between individual tumors in a manner independent of cell type, identifying three signatures of domain-scale mutagenesis in >4,000 tumor genomes. The major signature reflects remodeling of heterochromatin and of the RT program domains seen across tumors, tissues and cultured cells, and is robustly linked with higher expression of cell proliferation genes. Regional mutagenesis is associated with loss of activity of the tumor-suppressor genes RB1 and TP53, consistent with their roles in cell cycle control, with distinct mutational patterns generated by the two genes. Loss of regional heterogeneity in mutagenesis is associated with deficiencies in various DNA repair pathways. These mutation risk redistribution processes modify the mutation supply towards important genes, diverting the course of somatic evolution.
Affiliation(s)
- Marina Salvadores
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
|
28
|
Anaya J, Sidhom JW, Mahmood F, Baras AS. Multiple-instance learning of somatic mutations for the classification of tumour type and the prediction of microsatellite status. Nat Biomed Eng 2024; 8:57-67. [PMID: 37919367 PMCID: PMC10805698 DOI: 10.1038/s41551-023-01120-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/30/2023] [Indexed: 11/04/2023]
Abstract
Large-scale genomic data are well suited to analysis by deep learning algorithms. However, for many genomic datasets, labels are at the level of the sample rather than for individual genomic measures. Machine learning models leveraging these datasets generate predictions by using statically encoded measures that are then aggregated at the sample level. Here we show that a single weakly supervised end-to-end multiple-instance-learning model with multi-headed attention can be trained to encode and aggregate the local sequence context or genomic position of somatic mutations, hence allowing for the modelling of the importance of individual measures for sample-level classification and thus providing enhanced explainability. The model solves synthetic tasks that conventional models fail at, and achieves best-in-class performance for the classification of tumour type and for predicting microsatellite status. By improving the performance of tasks that require aggregate information from genomic datasets, multiple-instance deep learning may generate biological insight.
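The aggregation step that this abstract describes, in which per-mutation encodings are combined into one sample-level representation with attention weights, can be sketched in a few lines. This is a single-head illustration only, not the authors' model: the scoring vector `w` stands in for learned parameters, and the function names and the toy encodings are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(instances, w):
    """Pool a bag of instance vectors into one vector.

    Each instance gets a score (dot product with w), scores are turned
    into attention weights, and the pooled vector is the weighted sum.
    """
    scores = [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    pooled = [sum(a * x[d] for a, x in zip(weights, instances)) for d in range(dim)]
    return pooled, weights

# Hypothetical bag: three mutations from one sample, each encoded as a 2-d vector.
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, 0.0]  # assumed (normally learned) scoring vector
pooled, weights = attention_pool(instances, w)
print(pooled, weights)
```

Because the weights are computed per instance, they can be inspected afterwards to see which mutations contributed most to the sample-level vector, which is the kind of explainability the abstract refers to.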
Affiliation(s)
- Jordan Anaya
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- John-William Sidhom
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Bloomberg~Kimmel Institute for Cancer Immunotherapy, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
- Alexander S Baras
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Bloomberg~Kimmel Institute for Cancer Immunotherapy, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
29
|
Yang Y, Zeng Q, Liu G, Zheng S, Luo T, Guo Y, Tang J, Huang Y. Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer. BMC Bioinformatics 2023; 24:465. [PMID: 38066424 PMCID: PMC10709847 DOI: 10.1186/s12859-023-05529-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 10/12/2023] [Indexed: 12/18/2023] Open
Abstract
Hierarchical classification offers a more specific categorization of data and breaks down large classification problems into subproblems, providing improved prediction accuracy and predictive power for undefined categories, while also mitigating the impact of poor-quality data. Despite these advantages, its application in predicting primary cancer is rare. To leverage the similarity of cancers and the specificity of methylation patterns among them, we developed the Cancer Hierarchy Classification Tool (CHCT) using the idea of hierarchical classification, with methylation data from 30 cancer types and 8239 methylome samples downloaded from publicly available databases (The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO)). We used unsupervised clustering to divide the classification subproblems and screened differentially methylated sites using the analysis of variance (ANOVA) test, the Tukey-Kramer test, and the Boruta algorithm to construct models for each classifier module. After validation, CHCT accurately classified 1568 out of 1660 cases in the test set, with an average accuracy of 94.46%. We further curated an independent validation cohort of 677 cancer samples from GEO and assigned a diagnosis using CHCT, which showed high diagnostic potential with generally high accuracies (an average accuracy of 91.40%). Moreover, CHCT demonstrates predictive capability for additional cancer types beyond its original classifier scope, as demonstrated in the medulloblastoma and pituitary tumor datasets. In summary, CHCT can hierarchically classify primary cancer by methylation profile by splitting a large-scale classification of 30 cancer types into ten smaller classification problems. These results indicate that cancer hierarchical classification has the potential to be an accurate and robust cancer classification method.
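The idea of splitting one large classification into smaller subproblems can be shown with a two-level toy example. The sketch below is not CHCT: it assumes a nearest-centroid rule at each level as a stand-in for the trained classifier modules, and the group names, type names and centroid values are all hypothetical.

```python
def nearest(sample, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(center):
        return sum((s - c) ** 2 for s, c in zip(sample, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def hierarchical_predict(sample, group_centroids, type_centroids_by_group):
    """First choose a coarse group, then choose a type only within that group."""
    group = nearest(sample, group_centroids)
    cancer_type = nearest(sample, type_centroids_by_group[group])
    return group, cancer_type

# Hypothetical two-feature methylation profiles summarised as centroids.
group_centroids = {"group_A": [0.2, 0.2], "group_B": [0.8, 0.8]}
type_centroids_by_group = {
    "group_A": {"type_1": [0.1, 0.3], "type_2": [0.3, 0.1]},
    "group_B": {"type_3": [0.7, 0.9], "type_4": [0.9, 0.7]},
}

prediction = hierarchical_predict([0.85, 0.7], group_centroids, type_centroids_by_group)
print(prediction)
```

The second-level classifier only has to separate the types inside one group, so each subproblem has fewer classes than the original task.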
Affiliation(s)
- Youpeng Yang
- Medicine School, Sun Yat-sen University, Shenzhen, 518107, China
| | - Qiuhong Zeng
- Geneplus-Shenzhen Institute, Shenzhen, 518118, China
| | - Gaotong Liu
- Geneplus-Shenzhen Institute, Shenzhen, 518118, China
| | - Shiyao Zheng
- Medicine School, Sun Yat-sen University, Shenzhen, 518107, China
| | - Tianyang Luo
- Medicine School, Sun Yat-sen University, Shenzhen, 518107, China
| | - Yibin Guo
- Medicine School, Sun Yat-sen University, Shenzhen, 518107, China.
| | - Jia Tang
- NHC Key Laboratory of Male Reproduction and Genetics, Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital), Guangzhou, 510062, China.
- School of Medicine, Jinan University, Guangzhou, 510632, China.
| | - Yi Huang
- Geneplus-Shenzhen Institute, Shenzhen, 518118, China.
|
30
|
Zelli V, Manno A, Compagnoni C, Ibraheem RO, Zazzeroni F, Alesse E, Rossi F, Arbib C, Tessitore A. Classification of tumor types using XGBoost machine learning model: a vector space transformation of genomic alterations. J Transl Med 2023; 21:836. [PMID: 37990214 PMCID: PMC10664515 DOI: 10.1186/s12967-023-04720-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/16/2023] [Accepted: 11/10/2023] [Indexed: 11/23/2023] Open
Abstract
BACKGROUND Machine learning (ML) represents a powerful tool to capture relationships between molecular alterations and cancer types and to extract biological information. Here, we developed a plain ML model aimed at distinguishing cancer types based on genetic lesions, providing an additional tool to improve cancer diagnosis, particularly for tumors of unknown origin.
METHODS TCGA data from 9,927 samples spanning 32 different cancer types were downloaded from cBioportal. A vector space model type data transformation technique was designed to build consistently homogeneous new datasets containing, as predictive features, calls for somatic point mutations and copy number variations at chromosome arm-level, thus allowing the use of the XGBoost classifier models. Considering the imbalance in the dataset, due to the large difference in the number of cases for each tumor, two preprocessing strategies were considered: i) setting a percentage cut-off threshold to remove less represented cancer types, ii) dividing cancer types into different groups based on biological criteria and training a specific XGBoost model for each of them. The performance of all trained models was mainly assessed by the out-of-sample balanced accuracy (BACC) and the AUC scores.
RESULTS The XGBoost classifier achieved the best performance (BACC 77%; AUC 97%) on a dataset containing the 10 most represented tumor types. Moreover, when the 18 most represented cancers were divided into three different groups (endocrine-related carcinomas, other carcinomas and other cancers), the group-specific models achieved 78%, 71% and 86% BACC, respectively, with AUC scores greater than 96%. In addition, the model capable of linking each group to a specific cancer type reached 81% BACC and 94% AUC. Overall, the diagnostic potential of our model was comparable to or higher than that of others already described in the literature and based on similar molecular data and ML approaches.
CONCLUSIONS A boosted ML approach able to accurately discriminate different cancer types was developed. The methodology builds datasets simpler and more interpretable than the original data, while keeping enough information to accurately train standard ML models without resorting to sophisticated Deep Learning architectures. In combination with histopathological examinations, this approach could improve cancer diagnosis by using specific DNA alterations, processed by a replicable and easy-to-use automated technology. The study encourages new investigations which could further increase the classifier's performance, for example by considering more features and dividing tumors into their main molecular subtypes.
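Illustrative note (not from the paper): the abstract reports balanced accuracy (BACC) because the tumor types are unevenly represented. BACC is the unweighted mean of per-class recall, so a model that always predicts the majority class scores poorly even when plain accuracy looks acceptable. The function name and the toy labels below are assumptions chosen for illustration only.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced example: 8 samples of one class, 2 of another,
# and a degenerate model that always predicts the majority class.
y_true = ["BRCA"] * 8 + ["UVM"] * 2
y_pred = ["BRCA"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class model is right on 8 of 10 samples, but its recall is 1.0 on one class and 0.0 on the other, giving a balanced accuracy of 0.5.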
Affiliation(s)
- Veronica Zelli
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, 67100, L'Aquila, Italy
- Center for Molecular Diagnostics and Advanced Therapies, University of L'Aquila, Via Petrini, 67100, L'Aquila, Italy
| | - Andrea Manno
- Department of Information Engineering, Computer Science and Mathematics, Center of Excellence DEWS, University of L'Aquila, 67100, L'Aquila, Italy
| | - Chiara Compagnoni
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, 67100, L'Aquila, Italy
| | - Rasheed Oyewole Ibraheem
- Department of Information Engineering, Computer Science and Mathematics, Center of Excellence DEWS, University of L'Aquila, 67100, L'Aquila, Italy
| | - Francesca Zazzeroni
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, 67100, L'Aquila, Italy
| | - Edoardo Alesse
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, 67100, L'Aquila, Italy
| | - Fabrizio Rossi
- Department of Information Engineering, Computer Science and Mathematics, Center of Excellence DEWS, University of L'Aquila, 67100, L'Aquila, Italy
| | - Claudio Arbib
- Department of Information Engineering, Computer Science and Mathematics, Center of Excellence DEWS, University of L'Aquila, 67100, L'Aquila, Italy
| | - Alessandra Tessitore
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, 67100, L'Aquila, Italy.
- Center for Molecular Diagnostics and Advanced Therapies, University of L'Aquila, Via Petrini, 67100, L'Aquila, Italy.
|
31
|
Fawaz A, Ferraresi A, Isidoro C. Systems Biology in Cancer Diagnosis Integrating Omics Technologies and Artificial Intelligence to Support Physician Decision Making. J Pers Med 2023; 13:1590. [PMID: 38003905 PMCID: PMC10672164 DOI: 10.3390/jpm13111590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 11/26/2023] Open
Abstract
Cancer is the second major cause of disease-related death worldwide, and its accurate early diagnosis and therapeutic intervention are fundamental for saving the patient's life. Cancer, as a complex and heterogeneous disorder, results from the disruption and alteration of a wide variety of biological entities, including genes, proteins, mRNAs, miRNAs, and metabolites, that eventually emerge as clinical symptoms. Traditionally, diagnosis is based on clinical examination, blood tests for biomarkers, the histopathology of a biopsy, and imaging (MRI, CT, PET, and US). Additionally, omics biotechnologies help to further characterize the genome, metabolome, and microbiome traits of the patient that could have an impact on the prognosis and the patient's response to therapy. The integration of all these data relies on gathering several experts and may require considerable time, and, unfortunately, it is not without the risk of error in the interpretation and therefore in the decision. Systems biology algorithms exploit Artificial Intelligence (AI) combined with omics technologies to perform a rapid and accurate analysis and integration of the patient's big data, and support the physician in making a diagnosis and tailoring the most appropriate therapeutic intervention. However, AI is not free from possible diagnostic and prognostic errors in the interpretation of images or biochemical-clinical data. Here, we first describe the methods used by systems biology for combining AI with omics and then discuss the potential, challenges, limitations, and critical issues in using AI in cancer research.
Affiliation(s)
| | | | - Ciro Isidoro
- Laboratory of Molecular Pathology, Department of Health Sciences, Università del Piemonte Orientale, 28100 Novara, Italy; (A.F.); (A.F.)
|
32
|
Zhang S, He S, Zhu X, Wang Y, Xie Q, Song X, Xu C, Wang W, Xing L, Xia C, Wang Q, Li W, Zhang X, Yu J, Ma S, Shi J, Gu H. DNA methylation profiling to determine the primary sites of metastatic cancers using formalin-fixed paraffin-embedded tissues. Nat Commun 2023; 14:5686. [PMID: 37709764 PMCID: PMC10502058 DOI: 10.1038/s41467-023-41015-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/28/2022] [Accepted: 08/18/2023] [Indexed: 09/16/2023] Open
Abstract
Identifying the primary site of metastatic cancer is critical to guiding the subsequent treatment. Approximately 3-9% of metastatic patients are diagnosed with cancer of unknown primary sites (CUP) even after a comprehensive diagnostic workup. However, a widely accepted molecular test is still not available. Here, we report a method that applies formalin-fixed, paraffin-embedded tissues to construct reduced representation bisulfite sequencing libraries (FFPE-RRBS). We then generate and systematically evaluate 28 molecular classifiers, built on four DNA methylation scoring methods and seven machine learning approaches, using the RRBS library dataset of 498 fresh-frozen tumor tissues from primary cancer patients. Among these classifiers, the beta value-based linear support vector (BELIVE) performs the best, achieving overall accuracies of 81-93% for identifying the primary sites in 215 metastatic patients using top-k predictions (k = 1, 2, 3). Coincidentally, BELIVE also successfully predicts the tissue of origin in 81-93% of CUP patients (n = 68).
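Illustrative note (not from the paper): the abstract reports accuracy for top-k predictions (k = 1, 2, 3), meaning a case counts as correct when the true primary site is among the k highest-scoring sites. A minimal sketch of that metric follows; the function name, site labels, and scores are made-up assumptions and are not data from the study.

```python
def top_k_accuracy(y_true, scores, k):
    """Fraction of samples whose true label is among the k top-scoring labels.

    `scores` is a list of dicts mapping each candidate label to a score.
    """
    hits = 0
    for truth, s in zip(y_true, scores):
        ranked = sorted(s, key=s.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(y_true)


# Toy example with three candidate sites and four samples.
y_true = ["lung", "liver", "colon", "liver"]
scores = [
    {"lung": 0.70, "liver": 0.20, "colon": 0.10},  # truth ranked 1st
    {"lung": 0.50, "liver": 0.30, "colon": 0.20},  # truth ranked 2nd
    {"lung": 0.40, "liver": 0.35, "colon": 0.25},  # truth ranked 3rd
    {"lung": 0.10, "liver": 0.60, "colon": 0.30},  # truth ranked 1st
]

for k in (1, 2, 3):
    print(k, top_k_accuracy(y_true, scores, k))
```

Top-k accuracy can only stay the same or rise as k grows, which is why a range such as the one quoted in the abstract is reported across k = 1, 2, 3.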
Affiliation(s)
- Shirong Zhang
- Translational Medicine Research Center, Hangzhou First People's Hospital, 310006, Hangzhou, Zhejiang Province, China
- Key Laboratory of Clinical Cancer Pharmacology and Toxicology Research of Zhejiang Province, Hangzhou First People's Hospital, 310006, Hangzhou, Zhejiang Province, China
| | - Shutao He
- State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, 200031, Shanghai, China
- Institute of Biotechnology and Health, Beijing Academy of Science and Technology, 100089, Beijing, China
| | - Xin Zhu
- Key Laboratory of Head & Neck Cancer Translational Research of Zhejiang Province, Zhejiang Cancer Hospital, 310022, Hangzhou, Zhejiang Province, China
| | - Yunfei Wang
- Zhejiang ShengTing Biotech Co. Ltd, 310018, Hangzhou, Zhejiang Province, China
| | - Qionghuan Xie
- Zhejiang ShengTing Biotech Co. Ltd, 310018, Hangzhou, Zhejiang Province, China
| | - Xianrang Song
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, 250117, Jinan, Shandong Province, China
| | - Chunwei Xu
- Department of Respiratory Medicine, Jinling Hospital, Nanjing University School of Medicine, 210002, Nanjing, Jiangshu Province, China
| | - Wenxian Wang
- Key Laboratory of Head & Neck Cancer Translational Research of Zhejiang Province, Zhejiang Cancer Hospital, 310022, Hangzhou, Zhejiang Province, China
| | - Ligang Xing
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, 250117, Jinan, Shandong Province, China
| | - Chengqing Xia
- Zhejiang ShengTing Biotech Co. Ltd, 310018, Hangzhou, Zhejiang Province, China
| | - Qian Wang
- Department of Respiratory Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Jiangsu Province Hospital of Chinese Medicine, 210029, Nanjing, Jiangshu Province, China
| | - Wenfeng Li
- Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University, 325000, Wenzhou, Zhejiang Province, China
| | - Xiaochen Zhang
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, 310006, Hangzhou, Zhejiang Province, China
| | - Jinming Yu
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, 250117, Jinan, Shandong Province, China
| | - Shenglin Ma
- Translational Medicine Research Center, Hangzhou First People's Hospital, 310006, Hangzhou, Zhejiang Province, China.
- Department of Oncology, Hangzhou Cancer Hospital, 310006, Hangzhou, Zhejiang Province, China.
| | - Jiantao Shi
- State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, 200031, Shanghai, China.
| | - Hongcang Gu
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, 230031, Hefei, Anhui Province, China.
- Hefei Cancer Hospital, Chinese Academy of Sciences, 230031, Hefei, Anhui Province, China.
|
33
|
Darmofal M, Suman S, Atwal G, Chen JF, Chang JC, Toomey M, Vakiani E, Varghese AM, Rema AB, Syed A, Schultz N, Berger M, Morris Q. Deep Learning Model for Tumor Type Prediction using Targeted Clinical Genomic Sequencing Data. medRxiv 2023:2023.09.08.23295131. [PMID: 37732244 PMCID: PMC10508812 DOI: 10.1101/2023.09.08.23295131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a dataset of 39,787 solid tumors sequenced using a clinical targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivalling performance of WGS-based methods. GDD-ENS can also guide diagnoses on rare type and cancers of unknown primary, and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows has enabled clinically-relevant tumor type predictions to guide treatment decisions in real time.
Affiliation(s)
- Madison Darmofal
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine; New York, NY 10065, USA
| | - Shalabh Suman
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Gurnit Atwal
- Computational Biology Program, Ontario Institute for Cancer Research; Toronto, ON M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto; Toronto, ON M5S 1A8, Canada
- Vector Institute; Toronto, ON M5G 1M1, Canada
| | - Jie-Fu Chen
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Jason C. Chang
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Michael Toomey
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine; New York, NY 10065, USA
| | - Efsevia Vakiani
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Anna M Varghese
- Department of Medicine, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | | | - Aijazuddin Syed
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Nikolaus Schultz
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Michael Berger
- Department of Pathology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
| | - Quaid Morris
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center; New York, NY 10065, USA
|
34
|
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023] Open
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
| | - Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
|
35
|
Ning W, Wu T, Wu C, Wang S, Tao Z, Wang G, Zhao X, Diao K, Wang J, Chen J, Chen F, Liu XS. Accurate prediction of pan-cancer types using machine learning with minimal number of DNA methylation sites. J Mol Cell Biol 2023; 15:mjad023. [PMID: 37037781 PMCID: PMC10635511 DOI: 10.1093/jmcb/mjad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 02/08/2023] [Accepted: 04/07/2023] [Indexed: 04/12/2023] Open
Abstract
DNA methylation analysis has been applied to determine the primary site of cancer; however, robust and accurate prediction of cancer types with a minimum number of sites is still a significant scientific challenge. To build an accurate and robust cancer type prediction tool with a minimum number of DNA methylation sites, we internally benchmarked different DNA methylation site selection and ranking procedures, as well as different classification models. We used The Cancer Genome Atlas dataset (26 cancer types with 8296 samples) to train and test models and used an independent dataset (17 cancer types with 2738 samples) for model validation. A deep neural network model using a combined feature selection procedure (named MethyDeep) can predict 26 cancer types using 30 methylation sites with superior performance compared with the known methods for both primary and metastatic cancers in independent validation datasets. In conclusion, MethyDeep is an accurate and robust cancer type predictor with the minimum number of DNA methylation sites; it could help the cost-effective clarification of cancer of unknown primary patients and the liquid biopsy-based early screening of cancers.
Affiliation(s)
- Wei Ning
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
- Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100049, China
| | - Tao Wu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Chenxu Wu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Shixiang Wang
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Ziyu Tao
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Guangshuai Wang
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Xiangyu Zhao
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Kaixuan Diao
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Jinyu Wang
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Jing Chen
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
| | - Fuxiang Chen
- Department of Clinical Immunology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
| | - Xue-Song Liu
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
|
36
|
Wang Z, Zhou Y, Zhang Y, Mo YK, Wang Y. XMR: an explainable multimodal neural network for drug response prediction. Front Bioinform 2023; 3:1164482. [PMID: 37600972 PMCID: PMC10433751 DOI: 10.3389/fbinf.2023.1164482] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/12/2023] [Accepted: 07/14/2023] [Indexed: 08/22/2023] Open
Abstract
Introduction: Existing large-scale preclinical cancer drug response databases provide us with a great opportunity to identify and predict potentially effective drugs to combat cancers. Deep learning models built on these databases have been developed and applied to tackle the cancer drug-response prediction task. Their prediction has been demonstrated to significantly outperform traditional machine learning methods. However, due to the "black box" characteristic, biologically faithful explanations are hardly derived from these deep learning models. Interpretable deep learning models that rely on visible neural networks (VNNs) have been proposed to provide biological justification for the predicted outcomes. However, their performance does not meet the expectation to be applied in clinical practice.
Methods: In this paper, we develop an XMR model, an eXplainable Multimodal neural network for drug Response prediction. XMR is a new compact multimodal neural network consisting of two sub-networks: a visible neural network for learning genomic features and a graph neural network (GNN) for learning drugs' structural features. Both sub-networks are integrated into a multimodal fusion layer to model the drug response for the given gene mutations and the drug's molecular structures. Furthermore, a pruning approach is applied to provide better interpretations of the XMR model. We use five pathway hierarchies (cell cycle, DNA repair, diseases, signal transduction, and metabolism), which are obtained from the Reactome Pathway Database, as the architecture of VNN for our XMR model to predict drug responses of triple negative breast cancer.
Results: We find that our model outperforms other state-of-the-art interpretable deep learning models in terms of predictive performance. In addition, our model can provide biological insights into explaining drug responses for triple-negative breast cancer.
Discussion: Overall, combining both VNN and GNN in a multimodal fusion layer, XMR captures key genomic and molecular features and offers reasonable interpretability in biology, thereby better predicting drug responses in cancer patients. Our model would also benefit personalized cancer therapy in the future.
Affiliation(s)
- Zihao Wang
- Department of Computer Science, Indiana University Bloomington, Bloomington, IN, United States
| | - Yun Zhou
- Department of Environmental and Occupational Health, School of Public Health, Indiana University Bloomington, Bloomington, IN, United States
| | - Yu Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN, United States
| | - Yu K. Mo
- Department of Computer Science, Indiana University Bloomington, Bloomington, IN, United States
| | - Yijie Wang
- Department of Computer Science, Indiana University Bloomington, Bloomington, IN, United States
|
37
|
Moon I, LoPiccolo J, Baca SC, Sholl LM, Kehl KL, Hassett MJ, Liu D, Schrag D, Gusev A. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat Med 2023; 29:2057-2067. [PMID: 37550415 PMCID: PMC11484892 DOI: 10.1038/s41591-023-02482-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 01/06/2023] [Accepted: 06/30/2023] [Indexed: 08/09/2023]
Abstract
Cancer of unknown primary (CUP) is a type of cancer that cannot be traced back to its primary site and accounts for 3-5% of all cancers. Established targeted therapies are lacking for CUP, leading to generally poor outcomes. We developed OncoNPC, a machine-learning classifier trained on targeted next-generation sequencing (NGS) data from 36,445 tumors across 22 cancer types from three institutions. Oncology NGS-based primary cancer-type classifier (OncoNPC) achieved a weighted F1 score of 0.942 for high confidence predictions ([Formula: see text]) on held-out tumor samples, which made up 65.2% of all the held-out samples. When applied to 971 CUP tumors collected at the Dana-Farber Cancer Institute, OncoNPC predicted primary cancer types with high confidence in 41.2% of the tumors. OncoNPC also identified CUP subgroups with significantly higher polygenic germline risk for the predicted cancer types and with significantly different survival outcomes. Notably, patients with CUP who received first palliative intent treatments concordant with their OncoNPC-predicted cancers had significantly better outcomes (hazard ratio (HR) = 0.348; 95% confidence interval (CI) = 0.210-0.570; P = [Formula: see text]). Furthermore, OncoNPC enabled a 2.2-fold increase in patients with CUP who could have received genomically guided therapies. OncoNPC thus provides evidence of distinct CUP subgroups and offers the potential for clinical decision support for managing patients with CUP.
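Illustrative note (not from the paper): the abstract reports a weighted F1 score on the subset of held-out samples predicted with high confidence, together with the share of samples that subset represents. The sketch below shows how such a pair of numbers can be computed. The confidence threshold is elided in this listing ("[Formula: see text]"), so the value 0.9 used here is a hypothetical placeholder, as are the function names, labels, and probabilities.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for c, n in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def high_confidence_subset(y_true, y_pred, p_max, threshold):
    """Keep samples whose top predicted probability meets the threshold.

    Returns (coverage, kept true labels, kept predicted labels).
    """
    keep = [i for i, p in enumerate(p_max) if p >= threshold]
    coverage = len(keep) / len(p_max)
    return coverage, [y_true[i] for i in keep], [y_pred[i] for i in keep]


# Toy held-out set: true label, predicted label, top predicted probability.
y_true = ["NSCLC", "NSCLC", "BRCA", "BRCA", "CRC"]
y_pred = ["NSCLC", "BRCA", "BRCA", "BRCA", "CRC"]
p_max = [0.95, 0.55, 0.92, 0.97, 0.60]

THRESHOLD = 0.9  # hypothetical; the paper's actual cut-off is not shown here
coverage, t_hc, p_hc = high_confidence_subset(y_true, y_pred, p_max, THRESHOLD)

print("all samples  F1:", weighted_f1(y_true, y_pred))
print("high-conf    F1:", weighted_f1(t_hc, p_hc), "coverage:", coverage)
```

Restricting to confident predictions typically trades coverage for accuracy, which is why the two figures are reported together.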
Affiliation(s)
- Intae Moon
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
| | - Jaclyn LoPiccolo
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Sylvan C Baca
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Lynette M Sholl
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Kenneth L Kehl
- Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
| | - Michael J Hassett
- Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
| | - David Liu
- Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- The Broad Institute of MIT & Harvard, Cambridge, MA, USA
| | - Deborah Schrag
- Memorial Sloan Kettering Cancer Center, New York City, NY, USA
| | - Alexander Gusev
- Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA.
- The Broad Institute of MIT & Harvard, Cambridge, MA, USA.
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
|
38
|
Ashekyan O, Shahbazyan N, Bareghamyan Y, Kudryavzeva A, Mandel D, Schmidt M, Loeffler-Wirth H, Uduman M, Chand D, Underwood D, Armen G, Arakelyan A, Nersisyan L, Binder H. Transcriptomic Maps of Colorectal Liver Metastasis: Machine Learning of Gene Activation Patterns and Epigenetic Trajectories in Support of Precision Medicine. Cancers (Basel) 2023; 15:3835. [PMID: 37568651 PMCID: PMC10417131 DOI: 10.3390/cancers15153835] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/06/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023] Open
Abstract
The molecular mechanisms of the liver metastasis of colorectal cancer (CRLM) remain poorly understood. Here, we applied machine learning and bioinformatics trajectory inference to analyze a gene expression dataset of CRLM. We studied the co-regulation patterns at the gene level, the potential paths of tumor development, their functional context, and their prognostic relevance. Our analysis confirmed the subtyping of five liver metastasis subtypes (LMS). We provide gene-marker signatures for each LMS, and a comprehensive functional characterization that considers both the hallmarks of cancer and the tumor microenvironment. The ordering of CRLMs along a pseudotime-tree revealed a continuous shift in expression programs, suggesting a developmental relationship between the subtypes. Notably, trajectory inference and personalized analysis discovered a range of epigenetic states that shape and guide metastasis progression. By constructing prognostic maps that divided the expression landscape into regions associated with favorable and unfavorable prognoses, we derived a prognostic expression score. This was associated with critical processes such as epithelial-mesenchymal transition, treatment resistance, and immune evasion. These factors were associated with responses to neoadjuvant treatment and the formation of an immuno-suppressive, mesenchymal state. Our machine learning-based molecular profiling provides an in-depth characterization of CRLM heterogeneity with possible implications for treatment and personalized diagnostics.
Affiliation(s)
- Ohanes Ashekyan
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Nerses Shahbazyan
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Yeva Bareghamyan
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Anna Kudryavzeva
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Daria Mandel
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Maria Schmidt
  - IZBI, Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16–18, 04107 Leipzig, Germany; (M.S.); (H.L.-W.)
- Henry Loeffler-Wirth
  - IZBI, Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16–18, 04107 Leipzig, Germany; (M.S.); (H.L.-W.)
- Mohamed Uduman
  - Agenus Inc., 3 Forbes Road, Lexington, MA 7305, USA; (M.U.); (D.C.); (D.U.); (G.A.)
- Dhan Chand
  - Agenus Inc., 3 Forbes Road, Lexington, MA 7305, USA; (M.U.); (D.C.); (D.U.); (G.A.)
- Dennis Underwood
  - Agenus Inc., 3 Forbes Road, Lexington, MA 7305, USA; (M.U.); (D.C.); (D.U.); (G.A.)
- Garo Armen
  - Agenus Inc., 3 Forbes Road, Lexington, MA 7305, USA; (M.U.); (D.C.); (D.U.); (G.A.)
- Arsen Arakelyan
  - Institute of Molecular Biology of the National Academy of Sciences of the Republic of Armenia, 7 Has-Ratyan Str., Yerevan 0014, Armenia;
- Lilit Nersisyan
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
- Hans Binder
  - Armenian Bioinformatics Institute, 3/6 Nelson Stepanyan Str., Yerevan 0062, Armenia; (O.A.); (N.S.); (Y.B.); (A.K.); (D.M.); (L.N.)
  - IZBI, Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16–18, 04107 Leipzig, Germany; (M.S.); (H.L.-W.)
39
|
Huang Y, Pfeiffer SM, Zhang Q. Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients. Comput Struct Biotechnol J 2023; 21:3865-3874. [PMID: 37593720 PMCID: PMC10432138 DOI: 10.1016/j.csbj.2023.07.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 07/16/2023] [Accepted: 07/25/2023] [Indexed: 08/19/2023] Open
Abstract
Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. The XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types from genomics data of 12,060 patients in a de-identified US nationwide clinico-genomic database (CGDB), derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached 0.954 AUC and 0.733 MCC and made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration), and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and positive MCC of 0.540 in 507 patients with missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, showing promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes.
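The abstract above reports a Matthews correlation coefficient (MCC) for a 13-class problem. As a rough illustration of how a multiclass MCC can be computed from true and predicted labels, here is a minimal sketch using the standard multiclass generalisation of the binary formula. The function name `multiclass_mcc` and the toy labels are hypothetical and are not taken from the paper; the authors' own evaluation code is not shown in the source.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (illustrative sketch).

    Uses the common generalisation of the binary MCC:
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples whose true class is k, and p_k the number of
    samples predicted as class k.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in labels)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())
    denom = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    # By convention, return 0 when the score is undefined
    # (for example, when every prediction is the same class).
    return 0.0 if denom == 0 else (c * s - sum_pt) / denom


# Hypothetical toy labels, for illustration only.
truth = ["NSCLC", "NSCLC", "breast", "colorectal"]
perfect = list(truth)
one_error = ["NSCLC", "breast", "breast", "colorectal"]
print(multiclass_mcc(truth, perfect))
print(multiclass_mcc(truth, one_error))
```

Unlike plain accuracy, MCC accounts for how predictions are distributed across classes, which is one reason it is often reported alongside AUC when tumor types are imbalanced.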
Affiliation(s)
- Qing Zhang
  - Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, United States
40
|
Sanjaya P, Maljanen K, Katainen R, Waszak SM, Aaltonen LA, Stegle O, Korbel JO, Pitkänen E. Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping. Genome Med 2023; 15:47. [PMID: 37420249 PMCID: PMC10326961 DOI: 10.1186/s13073-023-01204-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/19/2022] [Accepted: 06/21/2023] [Indexed: 07/09/2023] Open
Abstract
BACKGROUND: Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown. METHODS: We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts. RESULTS: We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations. CONCLUSIONS: Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.
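The results above distinguish plain prediction accuracy from top-5 accuracy. The following minimal sketch shows how a top-k accuracy can be computed from per-sample class scores: a sample counts as correct if its true tumour type is among the k highest-scoring classes. The function name `top_k_accuracy`, the dictionary layout, and the toy scores are assumptions for illustration; they do not reproduce MuAt's outputs or evaluation code.

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    score_rows: one dict per sample, mapping class label -> predicted score.
    true_labels: the true class label of each sample, in the same order.
    """
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Rank class labels from highest to lowest score and keep the top k.
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_labels)


# Hypothetical scores for two samples over three tumour types.
scores = [
    {"melanoma": 0.6, "prostate": 0.3, "pancreas": 0.1},
    {"melanoma": 0.5, "prostate": 0.2, "pancreas": 0.3},
]
truth = ["melanoma", "pancreas"]
print(top_k_accuracy(scores, truth, k=1))
print(top_k_accuracy(scores, truth, k=2))
```

Top-k accuracy is never lower than top-1 accuracy, which is consistent with the gap between the top-1 and top-5 figures reported in the abstract.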
Affiliation(s)
- Prima Sanjaya
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Katri Maljanen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Riku Katainen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
  - Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Sebastian M Waszak
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Swiss Institute for Experimental Cancer Research School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Department of Neurology, University of California, San Francisco (UCSF), San Francisco, CA, USA
- Lauri A Aaltonen
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Oliver Stegle
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jan O Korbel
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Esa Pitkänen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
41
|
Michuda J, Breschi A, Kapilivsky J, Manghnani K, McCarter C, Hockenberry AJ, Mineo B, Igartua C, Dudley JT, Stumpe MC, Beaubier N, Shirazi M, Jones R, Morency E, Blackwell K, Guinney J, Beauchamp KA, Taxter T. Validation of a Transcriptome-Based Assay for Classifying Cancers of Unknown Primary Origin. Mol Diagn Ther 2023; 27:499-511. [PMID: 37099070 PMCID: PMC10300170 DOI: 10.1007/s40291-023-00650-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 04/02/2023] [Indexed: 04/27/2023]
Abstract
INTRODUCTION: Cancers assume a variety of distinct histologies, and may originate from a myriad of sites including solid organs, hematopoietic cells, and connective tissue. Clinical decision-making based on consensus guidelines such as those of the National Comprehensive Cancer Network (NCCN) is often predicated on a specific histologic and anatomic diagnosis, supported by clinical features and pathologist interpretation of morphology and immunohistochemical (IHC) staining patterns. However, in patients with nonspecific morphologic and IHC findings (in addition to ambiguous clinical presentations such as recurrence versus new primary), a definitive diagnosis may not be possible, resulting in the patient being categorized as having a cancer of unknown primary (CUP). Therapeutic options are limited and clinical outcomes are poor for patients with CUP, with a median survival of 8-11 months. METHODS: Here, we describe and validate the Tempus Tumor Origin (Tempus TO) assay, an RNA-sequencing-based machine learning classifier capable of discriminating between 68 clinically relevant cancer subtypes. Model accuracy was assessed using primary and/or metastatic samples with known subtype. RESULTS: We show that the Tempus TO model is 91% accurate when assessed on both a retrospectively held out cohort and a set of samples sequenced after model freeze that collectively contained 9210 total samples with known diagnoses. When evaluated on a cohort of CUPs, the model recapitulated established associations between genomic alterations and cancer subtype. DISCUSSION: Combining diagnostic prediction tests (e.g., Tempus TO) with sequencing-based variant reporting (e.g., Tempus xT) may expand therapeutic options for patients with cancers of unknown primary or uncertain histology.
42
|
Zheng W, Pu M, Li X, Du Z, Jin S, Li X, Zhou J, Zhang Y. Deep learning model accurately classifies metastatic tumors from primary tumors based on mutational signatures. Sci Rep 2023; 13:8752. [PMID: 37253775 PMCID: PMC10229594 DOI: 10.1038/s41598-023-35842-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 05/24/2023] [Indexed: 06/01/2023] Open
Abstract
Metastatic propagation is the leading cause of death for most cancers. Prediction and elucidation of the metastatic process are crucial for the treatment of cancer. Even though somatic mutations have been linked to tumorigenesis and metastasis, it is less explored whether metastatic events can be identified through genomic mutational signatures, which are concise descriptions of the mutational processes. Here, we developed MetaWise, a Deep Neural Network (DNN) model, by applying mutational signatures as input features calculated from Whole-Exome Sequencing (WES) data of TCGA and other metastatic cohorts. This model can accurately classify metastatic tumors from primary tumors and outperforms traditional machine learning (ML) models and a deep learning (DL) model, DiaDeL. Signatures of non-coding mutations also have a major impact on the model's performance. SHapley Additive exPlanations (SHAP) and Local Surrogate (LIME) analyses identify several mutational signatures that are directly correlated with metastatic spread in cancers, including APOBEC-mutagenesis, UV-induced signatures, and DNA damage response deficiency signatures.
Affiliation(s)
- Weisheng Zheng
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
- Mengchen Pu
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
- Xiaorong Li
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
  - Minzu University of China, Beijing, China
- Zhaolan Du
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
  - Beijing University of Technology, Beijing, China
- Sutong Jin
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
  - Harbin Institute of Technology, Weihai, Shandong, China
- Xingshuai Li
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
- Jielong Zhou
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China
- Yingsheng Zhang
  - Beijing StoneWise Technology Co Ltd., Haidian District, Beijing, China.
43
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND: There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS: This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS: We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g., pathways or Protein-Protein-Interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS: The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
- Marie Zufferey
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
- Dónal Landers
  - DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
44
|
MacDonald S, Foley H, Yap M, Johnston RL, Steven K, Koufariotis LT, Sharma S, Wood S, Addala V, Pearson JV, Roosta F, Waddell N, Kondrashova O, Trzaskowski M. Generalising uncertainty improves accuracy and safety of deep learning analytics applied to oncology. Sci Rep 2023; 13:7395. [PMID: 37149669 PMCID: PMC10164181 DOI: 10.1038/s41598-023-31126-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 03/07/2023] [Indexed: 05/08/2023] Open
Abstract
Uncertainty estimation is crucial for understanding the reliability of deep learning (DL) predictions, and critical for deploying DL in the clinic. Differences between training and production datasets can lead to incorrect predictions with underestimated uncertainty. To investigate this pitfall, we benchmarked one pointwise and three approximate Bayesian DL models for predicting cancer of unknown primary, using three RNA-seq datasets with 10,968 samples across 57 cancer types. Our results highlight that simple and scalable Bayesian DL significantly improves the generalisation of uncertainty estimation. Moreover, we designed a prototypical metric, the area between development and production curve (ADP), which evaluates the accuracy loss when deploying models from development to production. Using ADP, we demonstrate that Bayesian DL improves accuracy under data distributional shifts when utilising 'uncertainty thresholding'. In summary, Bayesian DL is a promising approach for generalising uncertainty, improving performance, transparency, and safety of DL models for deployment in the real world.
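The abstract above refers to 'uncertainty thresholding'. As a generic illustration of the idea (not the authors' implementation, and not their ADP metric, whose exact definition is not given in the source), the sketch below keeps only predictions whose confidence meets a threshold and reports how many samples are retained and how accurate the retained predictions are. The function name `threshold_report` and the toy values are hypothetical.

```python
def threshold_report(confidences, is_correct, threshold):
    """Apply a confidence threshold and summarise the retained predictions.

    confidences: model confidence for each prediction (higher = more certain).
    is_correct: whether each prediction matched the true label.
    Returns (coverage, accuracy): the fraction of samples retained and the
    accuracy among the retained samples (NaN if none are retained).
    """
    kept = [ok for conf, ok in zip(confidences, is_correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Hypothetical confidences and correctness flags for four predictions.
conf = [0.9, 0.8, 0.55, 0.4]
correct = [True, True, False, True]
for t in (0.0, 0.5, 0.85):
    print(t, threshold_report(conf, correct, t))
```

Raising the threshold trades coverage for accuracy only when confidence tracks correctness, which is why well-calibrated uncertainty matters under distribution shift.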
Affiliation(s)
- Samual MacDonald
  - Max Kelsen, Brisbane, QLD, Australia
  - ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia
  - The University of Queensland, Brisbane, Australia
- Sowmya Sharma
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
  - ACL Pathology, Bella Vista, NSW, Australia
- Scott Wood
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- John V Pearson
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Fred Roosta
  - ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia
  - The University of Queensland, Brisbane, Australia
- Nicola Waddell
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Olga Kondrashova
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia.
- Maciej Trzaskowski
  - Max Kelsen, Brisbane, QLD, Australia.
  - ARC Training Centre for Information Resilience (CIRES), Brisbane, Australia.
  - The University of Queensland, Brisbane, Australia.
  - QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia.
45
|
Cheng N, Lou B, Wang H. Discovering the digital biomarker of hepatocellular carcinoma in serum with SERS-based biosensors and intelligence vision. Colloids Surf B Biointerfaces 2023; 226:113315. [PMID: 37086688 DOI: 10.1016/j.colsurfb.2023.113315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2023] [Revised: 03/31/2023] [Accepted: 04/11/2023] [Indexed: 04/24/2023]
Abstract
Non-biomarker-reliant molecular detection has recently shown bright prospects for cancer screening, but its clinical application is hindered by the shortage of measurable criteria analogous to biomarkers. Here, we report a digital biomarker, as a new-concept serum biomarker, of hepatocellular carcinoma (HCC) found with SERS-based biosensors and a deep neural network "digital retina" for visualizing and explicitly defining spectral fingerprints. We validate the discovered digital biomarker (a collection of 10 characteristic peaks in the serum SERS spectra) with unsupervised clustering of spectra from an independent sample batch comprising normal individuals and HCC cases; the validation results show clustering accuracies of 95.71% and 100.00%, respectively. Furthermore, we find that the digital biomarker of HCC shares a few common peaks with three clinically applied serum biomarkers, which means it could convey essential biomolecular information similar to these biomarkers. Accordingly, we present an intelligent method for early HCC detection that leverages the digital biomarker, which has traits similar to those of biomarkers. Employing the digital biomarker, we could accurately stratify HCC, hepatitis B, and normal populations with linear classifiers, exhibiting accuracies over 92% and area under the receiver operating characteristic curve values above 0.93. It is anticipated that this non-biomarker-reliant molecular detection method will facilitate mass cancer screening.
Affiliation(s)
- Ningtao Cheng
  - School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310058, China.
- Bin Lou
  - Department of Laboratory Medicine, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Hongyang Wang
  - International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai 200438, China; National Center for Liver Cancer, Shanghai 201805, China.
46
|
Swanson K, Wu E, Zhang A, Alizadeh AA, Zou J. From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 2023; 186:1772-1791. [PMID: 36905928 DOI: 10.1016/j.cell.2023.01.035] [Citation(s) in RCA: 211] [Impact Index Per Article: 105.5] [Received: 11/04/2022] [Revised: 01/10/2023] [Accepted: 01/26/2023] [Indexed: 03/12/2023]
Abstract
Machine learning (ML) is increasingly used in clinical oncology to diagnose cancers, predict patient outcomes, and inform treatment planning. Here, we review recent applications of ML across the clinical oncology workflow. We review how these techniques are applied to medical imaging and to molecular data obtained from liquid and solid tumor biopsies for cancer diagnosis, prognosis, and treatment design. We discuss key considerations in developing ML for the distinct challenges posed by imaging and molecular data. Finally, we examine ML models approved for cancer-related patient usage by regulatory agencies and discuss approaches to improve the clinical usefulness of ML.
Affiliation(s)
- Kyle Swanson
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Eric Wu
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Angela Zhang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Ash A Alizadeh
  - Department of Medicine, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
47
|
Bae M, Kim G, Lee TR, Ahn JM, Park H, Park SR, Song KB, Jun E, Oh D, Lee JW, Park YS, Song KW, Byeon JS, Kim BH, Sohn JH, Kim MH, Kim GM, Chie EK, Kang HC, Kong SY, Woo SM, Lee JE, Ryu JM, Lee J, Kim D, Ki CS, Cho EH, Choi JK. Integrative modeling of tumor genomes and epigenomes for enhanced cancer diagnosis by cell-free DNA. Nat Commun 2023; 14:2017. [PMID: 37037826 PMCID: PMC10085982 DOI: 10.1038/s41467-023-37768-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/20/2022] [Accepted: 03/22/2023] [Indexed: 04/12/2023] Open
Abstract
Multi-cancer early detection remains a key challenge in cell-free DNA (cfDNA)-based liquid biopsy. Here, we perform cfDNA whole-genome sequencing to generate two test datasets covering 2125 patient samples of 9 cancer types and 1241 normal control samples, and also a reference dataset for background variant filtering based on 20,529 low-depth healthy samples. An external cfDNA dataset consisting of 208 cancer and 214 normal control samples is used for additional evaluation. Accuracy for cancer detection and tissue-of-origin localization is achieved using our algorithm, which incorporates cancer type-specific profiles of mutation distribution and chromatin organization in tumor tissues as model references. Our integrative model detects early-stage cancers, including those of pancreatic origin, with high sensitivity that is comparable to that of late-stage detection. Model interpretation reveals the contribution of cancer type-specific genomic and epigenomic features. Our methodologies may lay the groundwork for accurate cfDNA-based cancer diagnosis, especially at early stages.
Affiliation(s)
- Mingyun Bae
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Gyuhee Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Tae-Rim Lee
  - Genome Research Center, GC Genome, Yongin, Republic of Korea
- Jin Mo Ahn
  - Genome Research Center, GC Genome, Yongin, Republic of Korea
- Hyunwook Park
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Sook Ryun Park
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Ki Byung Song
  - Division of Hepato-Biliary and Pancreatic Surgery, Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Eunsung Jun
  - Division of Hepato-Biliary and Pancreatic Surgery, Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Dongryul Oh
  - Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jeong-Won Lee
  - Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Young Sik Park
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Ki-Won Song
  - Division of Hepatopancreatobiliary Surgery and Liver Transplantation, Department of Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jeong-Sik Byeon
  - Department of Gastroenterology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Bo Hyun Kim
  - Center for Liver and Pancreatobiliary Cancer, National Cancer Center, Goyang, Republic of Korea
- Joo Hyuk Sohn
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea
  - AIMA, Inc., Avison Biomedical Research Center, Seoul, Republic of Korea
- Min Hwan Kim
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea
- Gun Min Kim
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea
- Eui Kyu Chie
  - Department of Radiation Oncology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Hyun-Cheol Kang
  - Department of Radiation Oncology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Sun-Young Kong
  - Department of Laboratory Medicine, National Cancer Center, Goyang, Republic of Korea
- Sang Myung Woo
  - Center for Liver and Pancreatobiliary Cancer, National Cancer Center, Goyang, Republic of Korea
- Jeong Eon Lee
  - Department of Surgery, Samsung Medical Center, Seoul, Republic of Korea
- Jai Min Ryu
  - Department of Surgery, Samsung Medical Center, Seoul, Republic of Korea
- Junnam Lee
  - Genome Research Center, GC Genome, Yongin, Republic of Korea
- Dasom Kim
  - Genome Research Center, GC Genome, Yongin, Republic of Korea
- Chang-Seok Ki
  - Genome Research Center, GC Genome, Yongin, Republic of Korea
- Eun-Hae Cho
  - Genome Research Center, GC Genome, Yongin, Republic of Korea.
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
48
|
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
- The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
- The Wistar Institute, Philadelphia, PA 19104, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
|
49
|
Zhang F, Zhang R, Wei M, Li G. A machine learning based approach for quantitative evaluation of cell migration in Transwell assays based on deformation characteristics. Analyst 2023; 148:1371-1382. [PMID: 36857714 DOI: 10.1039/d2an01882a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
Many pathological and physiological processes, including embryonic development, immune response and cancer metastasis, involve studies on cell migration, and especially detection methods, for which it is difficult to satisfy the requirements for rapid and quantitative evaluation and analysis. In view of the shortcomings in simultaneously quantifying the number of migrated cells and non-migrated cells using Transwell assays, we propose a novel approach for the evaluation of cell migration by distinguishing whether the cells have migrated based on the regularity of the cell morphology changes. Traditionally, the status of living cells and dead cells is detected and analyzed by machine learning using some common morphological characteristics, e.g., area and perimeter of the cells. However, the accuracy of detecting whether cells have migrated or not using these common characteristics is not high, and the characteristics are not appropriate for our studies. Therefore, from the point of view of mechanism analysis for the migration behavior, we examined the regularity of different morphology changes of migrated cells and non-migrated cells, and thus discovered the distinguishable morphological characteristics. Two deformation characteristics, deformation index and taper index, are then proposed, and a machine learning based algorithm that can identify migrated cells according to the proposed deformation characteristics was devised. In addition, images of migrated cells and non-migrated cells were obtained from the Transwell assays. This algorithm was trained, and was able to successfully identify migrated cells with an accuracy of 84% using the proposed morphological characteristics. This method greatly improves the identification accuracy when compared with identification using traditional characteristics, for which the accuracy was about 54.7%. This machine learning based method might be employed as a potential tool for cell counting and evaluation of cell migration with the aim of reducing time and improving automation compared with the traditional method. This method is effective, rapid, and incorporates advances in artificial intelligence which could be used for adapting the current evaluation methods.
Affiliation(s)
- Fei Zhang
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China.
- Rongbiao Zhang
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China.
- Mingji Wei
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China.
- Guoxiao Li
- School of Information Engineering, Jiangsu Vocational College of Agriculture and Forestry, Jurong, Jiangsu 212400, China
|
50
|
Giesemann J, Delgadillo J, Schwartz B, Bennemann B, Lutz W. Predicting dropout from psychological treatment using different machine learning algorithms, resampling methods, and sample sizes. Psychother Res 2023:1-13. [PMID: 36669124 DOI: 10.1080/10503307.2022.2161432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023] Open
Abstract
OBJECTIVE: The occurrence of dropout from psychological interventions is associated with poor treatment outcome and high health, societal and economic costs. Recently, machine learning (ML) algorithms have been tested in psychotherapy outcome research. Dropout predictions are usually limited by imbalanced datasets and the size of the sample. This paper aims to improve dropout prediction by comparing ML algorithms, sample sizes and resampling methods. METHOD: Twenty ML algorithms were examined in twelve subsamples (drawn from a sample of N = 49,602) using four resampling methods in comparison to the absence of resampling and to each other. Prediction accuracy was evaluated in an independent holdout dataset using the F1-Measure. RESULTS: Resampling methods improved the performance of ML algorithms and down-sampling can be recommended, as it was the fastest method and as accurate as the other methods. For the highest mean F1-Score of .51 a minimum sample size of N = 300 was necessary. No specific algorithm or algorithm group can be recommended. CONCLUSION: Resampling methods could improve the accuracy of predicting dropout in psychological interventions. Down-sampling is recommended as it is the least computationally taxing method. The training sample should contain at least 300 cases.
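For readers unfamiliar with the two techniques named in this abstract, the following is a minimal sketch of random down-sampling and the F1 measure. It is not the authors' implementation: the function names `down_sample` and `f1_score`, the binary 0/1 label coding (1 = dropout), and the fixed seed are all assumptions made for illustration.

```python
import random


def down_sample(records, labels, seed=0):
    """Randomly drop majority-class cases until both classes have equal size.

    Hypothetical helper: assumes binary labels coded 0/1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority case plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [records[i] for i in kept], [labels[i] for i in kept]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Down-sampling is applied to the training data only; the holdout set keeps its original class balance so that the F1 measure reflects performance on realistic data.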
Affiliation(s)
- Julia Giesemann
- Clinical Psychology and Psychotherapy, Department of Psychology, University of Trier, Trier, Germany
- Jaime Delgadillo
- Clinical and Applied Psychology Unit, Department of Psychology, University of Sheffield, Sheffield, UK
- Brian Schwartz
- Clinical Psychology and Psychotherapy, Department of Psychology, University of Trier, Trier, Germany
- Björn Bennemann
- Clinical Psychology and Psychotherapy, Department of Psychology, University of Trier, Trier, Germany
- Wolfgang Lutz
- Clinical Psychology and Psychotherapy, Department of Psychology, University of Trier, Trier, Germany
|