1
Belčič Mikič T, Arnol M. The Use of Machine Learning in the Diagnosis of Kidney Allograft Rejection: Current Knowledge and Applications. Diagnostics (Basel) 2024; 14:2482. [PMID: 39594148 PMCID: PMC11592658 DOI: 10.3390/diagnostics14222482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 11/28/2024] Open
Abstract
Kidney allograft rejection is one of the main limitations to long-term kidney transplant survival. The diagnostic gold standard for detecting rejection is a kidney biopsy, an invasive procedure that can often give imprecise results due to complex diagnostic criteria and high interobserver variability. In recent years, several additional diagnostic approaches to rejection have been investigated, some of them with the aid of machine learning (ML). In this review, we addressed studies that investigated the detection of kidney allograft rejection over the last decade using various ML algorithms. Various ML techniques were used in three main categories: (a) histopathologic assessment of kidney tissue with the aim to improve the diagnostic accuracy of a kidney biopsy, (b) assessment of gene expression in rejected kidney tissue or peripheral blood and the development of diagnostic classifiers based on these data, (c) radiologic assessment of kidney tissue using diffusion-weighted magnetic resonance imaging and the construction of a computer-aided diagnostic system. In histopathology, ML algorithms could serve as a support to the pathologist to avoid misclassifications and overcome interobserver variability. Diagnostic platforms based on biopsy-based transcripts serve as a supplement to a kidney biopsy, especially in cases where histopathologic diagnosis is inconclusive. ML models based on radiologic evaluation or gene signature in peripheral blood may be useful in cases where kidney biopsy is contraindicated in addition to other non-invasive biomarkers. The implementation of ML-based diagnostic methods is usually slow and undertaken with caution considering ethical and legal issues. In summary, the approach to the diagnosis of rejection should be individualized and based on all available diagnostic tools (including ML-based), leaving the responsibility for over- and under-treatment in the hands of the clinician.
Affiliation(s)
- Tanja Belčič Mikič
- Department of Nephrology, University Medical Centre Ljubljana, Zaloška 7, 1000 Ljubljana, Slovenia
- Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia
- Miha Arnol
- Department of Nephrology, University Medical Centre Ljubljana, Zaloška 7, 1000 Ljubljana, Slovenia
- Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia
2
Lundy DJ, Szomolay B, Liao CT. Systems Approaches to Cell Culture-Derived Extracellular Vesicles for Acute Kidney Injury Therapy: Prospects and Challenges. Function 2024; 5:zqae012. [PMID: 38706963 PMCID: PMC11065115 DOI: 10.1093/function/zqae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/02/2024] [Accepted: 03/05/2024] [Indexed: 05/07/2024] Open
Abstract
Acute kidney injury (AKI) is a heterogeneous syndrome, comprising diverse etiologies of kidney insults that result in high mortality and morbidity if not well managed. Although great efforts have been made to investigate underlying pathogenic mechanisms of AKI, there are limited therapeutic strategies available. Extracellular vesicles (EV) are membrane-bound vesicles secreted by various cell types, which can serve as cell-free therapy through transfer of bioactive molecules. In this review, we first overview the AKI syndrome and EV biology, with a particular focus on the technical aspects and therapeutic application of cell culture-derived EVs. Second, we illustrate how multi-omic approaches to EV miRNA, protein, and genomic cargo analysis can yield new insights into their mechanisms of action and address unresolved questions in the field. We then summarize major experimental evidence regarding the therapeutic potential of EVs in AKI, which we subdivide into stem cell and non-stem cell-derived EVs. Finally, we highlight the challenges and opportunities related to the clinical translation of animal studies into human patients.
Affiliation(s)
- David J Lundy
- Graduate Institute of Biomedical Materials & Tissue Engineering, Taipei Medical University, Taipei 235603, Taiwan
- International PhD Program in Biomedical Engineering, Taipei Medical University, Taipei 235603, Taiwan
- Center for Cell Therapy, Taipei Medical University Hospital, Taipei 110301, Taiwan
- Barbara Szomolay
- Systems Immunity Research Institute, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
- Division of Infection and Immunity, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
- Chia-Te Liao
- Division of Nephrology, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan
- Division of Nephrology, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Research Center of Urology and Kidney, Taipei Medical University, Taipei 110, Taiwan
3
Heil BJ, Crawford J, Greene CS. The effect of non-linear signal in classification problems using gene expression. PLoS Comput Biol 2023; 19:e1010984. [PMID: 36972227 PMCID: PMC10079219 DOI: 10.1371/journal.pcbi.1010984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 04/06/2023] [Accepted: 02/28/2023] [Indexed: 03/29/2023] Open
Abstract
Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because, while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
- Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, United States of America
- Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Colorado, United States of America
4
Zhang Y, Wang M, Wang Z, Liu Y, Xiong S, Zou Q. MetaSEM: Gene Regulatory Network Inference from Single-Cell RNA Data by Meta-Learning. Int J Mol Sci 2023; 24:2595. [PMID: 36768917 PMCID: PMC9916710 DOI: 10.3390/ijms24032595] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Regulators in gene regulatory networks (GRNs) are crucial for identifying cell states. However, GRN inference based on scRNA-seq data has several problems, including high dimensionality and sparsity, and requires more labeled data. Therefore, we propose a meta-learning GRN inference framework to identify regulatory factors. Specifically, meta-learning solves the parameter optimization problem caused by high-dimensional sparse data features. In addition, a few-shot solution was used to address the lack of labeled data. A structural equation model (SEM) was embedded in the model to identify important regulators. We integrated the parameter optimization strategy into the bi-level optimization to extract features consistent with GRN reasoning. This unique design makes our model robust to small-scale data. By studying the GRN inference task, we confirmed that the selected regulators were closely related to gene expression specificity. We further analyzed the inferred GRN to find the important regulators in cell type identification. Extensive experimental results showed that our model effectively captured the regulators in single-cell GRN inference. Finally, the visualization results verified the importance of the selected regulators for cell type recognition.
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Zixuan Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yuhang Liu
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Shuwen Xiong
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
5
Liu F, Tao W, Yang J, Wu W, Wang J. STNet: A novel spiking neural network combining its own time signal with the spatial signal of an artificial neural network. Front Neurosci 2023; 17:1151949. [PMID: 37144088 PMCID: PMC10153670 DOI: 10.3389/fnins.2023.1151949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/10/2023] [Indexed: 05/06/2023] Open
Abstract
Introduction: This article proposes a novel hybrid network that combines the temporal signal of a spiking neural network (SNN) with the spatial signal of an artificial neural network (ANN), namely the Spatio-Temporal Combined Network (STNet). Methods: Inspired by the way the visual cortex in the human brain processes visual information, two versions of STNet are designed: a concatenated one (C-STNet) and a parallel one (P-STNet). In the C-STNet, the ANN, simulating the primary visual cortex, first extracts the simple spatial information of objects; the obtained spatial information is then encoded as spiking time signals and transmitted to the rear SNN, which simulates the extrastriate visual cortex and processes and classifies the spikes. With the view that information from the primary visual cortex reaches the extrastriate visual cortex via ventral and dorsal streams, in P-STNet, the parallel combination of the ANN and the SNN is employed to extract the original spatio-temporal information from samples, and the extracted information is transferred to a posterior SNN for classification. Results: The experimental results of the two STNets obtained on six small and two large benchmark datasets were compared with eight commonly used approaches, demonstrating that the two STNets can achieve improved performance in terms of accuracy, generalization, stability, and convergence. Discussion: These results show that the idea of combining ANN and SNN is feasible and can greatly improve the performance of SNN.
Affiliation(s)
- Fang Liu
- School of Mathematical Sciences, Dalian University of Technology, Dalian, China
- Key Laboratory for Computational Mathematics and Data Intelligence of Liaoning Province, Dalian, China
- Wentao Tao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, China
- Key Laboratory for Computational Mathematics and Data Intelligence of Liaoning Province, Dalian, China
- Jie Yang
- School of Mathematical Sciences, Dalian University of Technology, Dalian, China
- Key Laboratory for Computational Mathematics and Data Intelligence of Liaoning Province, Dalian, China
- *Correspondence: Jie Yang
- Wei Wu
- School of Mathematical Sciences, Dalian University of Technology, Dalian, China
- Key Laboratory for Computational Mathematics and Data Intelligence of Liaoning Province, Dalian, China
- Jian Wang
- College of Science, China University of Petroleum (East China), Qingdao, China
6
Grobe N, Scheiber J, Zhang H, Garbe C, Wang X. Omics and Artificial Intelligence in Kidney Diseases. Advances in Kidney Disease and Health 2023; 30:47-52. [PMID: 36723282 DOI: 10.1053/j.akdh.2022.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 10/28/2022] [Accepted: 11/16/2022] [Indexed: 01/20/2023]
Abstract
Omics applications in nephrology may have relevance in the future to improve clinical care of kidney disease patients. In the short term, patients will benefit from specific measurement and computational analyses around biomarkers identified at various omics levels. In the mid and long term, these approaches will need to be integrated into a holistic representation of the kidney and all its influencing factors for individualized patient care. Research provides robust data to justify the application of omics for better understanding, risk stratification, and individualized treatment of kidney disease patients. Despite these advances in the research setting, there is still a lack of evidence showing the combination of omics technologies with artificial intelligence and its application in clinical diagnostics and care of patients with kidney disease.
Affiliation(s)
- Christian Garbe
- Frankfurter Innovationszentrum Biotechnologie, Frankfurt am Main, Germany
7
Moreno M, Vilaça R, Ferreira PG. Scalable transcriptomics analysis with Dask: applications in data science and machine learning. BMC Bioinformatics 2022; 23:514. [PMID: 36451115 PMCID: PMC9710082 DOI: 10.1186/s12859-022-05065-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 11/16/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. METHODS: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. RESULTS: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures are made available at https://github.com/martaccmoreno/gexp-ml-dask. CONCLUSION: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
Affiliation(s)
- Marta Moreno
- Department of Computer Science, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Laboratory of Artificial Intelligence and Decision Support, INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
- Ricardo Vilaça
- High-Assurance Software Laboratory, INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
- Department of Informatics, Minho Advanced Computing Center, University of Minho, Gualtar, 4710-070 Braga, Portugal
- Pedro G. Ferreira
- Department of Computer Science, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Laboratory of Artificial Intelligence and Decision Support, INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto, Institute for Research and Innovation in Health (i3s), R. Alfredo Allen 208, 4200-135 Porto, Portugal
8
Stafford IS, Gosink MM, Mossotto E, Ennis S, Hauben M. A Systematic Review of Artificial Intelligence and Machine Learning Applications to Inflammatory Bowel Disease, with Practical Guidelines for Interpretation. Inflamm Bowel Dis 2022; 28:1573-1583. [PMID: 35699597 PMCID: PMC9527612 DOI: 10.1093/ibd/izac115] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/03/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Inflammatory bowel disease (IBD) is a chronic gastrointestinal disease with an unpredictable disease course. Computational methods such as machine learning (ML) have the potential to stratify IBD patients for the provision of individualized care. The use of ML methods for IBD was surveyed, with an additional focus on how the field has changed over time. METHODS: On May 6, 2021, a systematic review was conducted through a search of MEDLINE and Embase databases, with the search structure ("machine learning" OR "artificial intelligence") AND ("Crohn* Disease" OR "Ulcerative Colitis" OR "Inflammatory Bowel Disease"). Exclusion criteria included studies not written in English, no human patient data, publication before 2001, studies that were not peer reviewed, nonautoimmune disease comorbidity research, and record types that were not primary research. RESULTS: Seventy-eight (of 409) records met the inclusion criteria. Random forest methods were most prevalent, and there was an increase in neural networks, mainly applied to imaging data sets. The main applications of ML to clinical tasks were diagnosis (18 of 78), disease course (22 of 78), and disease severity (16 of 78). The median sample size was 263. Clinical and microbiome-related data sets were most popular. Five percent of studies used an external data set after training and testing for additional model validation. DISCUSSION: Availability of longitudinal and deep phenotyping data could lead to better modeling. Machine learning pipelines that consider imbalanced data and that perform feature selection only on training data will generate more generalizable models. Machine learning models are increasingly being applied to more complex clinical tasks for specific phenotypes, indicating progress towards personalized medicine for IBD.
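The recommendation that feature selection use only training data can be illustrated with a minimal sketch. It assumes a toy two-feature data set and a simple class-mean-gap ranking; the function names `select_features` and `split_then_select` and all values are hypothetical and are not taken from the reviewed studies.

```python
from statistics import mean


def select_features(X, y, k):
    """Return the indices of the k features with the largest class-mean gap."""
    n_features = len(X[0])
    gaps = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gaps.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: gaps[j], reverse=True)
    return ranked[:k]


def split_then_select(X, y, test_idx, k):
    """Hold out the test rows first, then choose features from training rows only."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in held_out]
    # The held-out rows are never passed to the selection step.
    chosen = select_features([X[i] for i in train_idx], [y[i] for i in train_idx], k)
    X_train = [[X[i][j] for j in chosen] for i in train_idx]
    X_test = [[X[i][j] for j in chosen] for i in sorted(held_out)]
    return chosen, X_train, X_test


# Toy data (hypothetical): the last row is an outlier in feature 1.
X = [[0.0, 5.0], [0.1, 5.1], [1.0, 5.0], [1.1, 5.1], [0.5, 9.0]]
y = [0, 0, 1, 1, 1]
chosen, X_train, X_test = split_then_select(X, y, test_idx=[4], k=1)
leaky_choice = select_features(X, y, 1)  # selection that also sees the test row
```

In this toy case the training-only selection picks feature 0, whereas selecting on all five rows picks feature 1 because the held-out outlier inflates its class gap, which is the kind of leakage the recommendation guards against.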
Affiliation(s)
- Imogen S Stafford
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Institute for Life Sciences, University of Southampton, Southampton, UK
- NIHR Southampton Biomedical Research, University Hospital Southampton, Southampton, UK
- Enrico Mossotto
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Sarah Ennis
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Manfred Hauben
- Pfizer Inc, New York, NY, USA
- NYU Langone Health, Department of Medicine, New York, NY, USA
10
Talukder A, Zhang W, Li X, Hu H. A deep learning method for miRNA/isomiR target detection. Sci Rep 2022; 12:10618. [PMID: 35739186 PMCID: PMC9226005 DOI: 10.1038/s41598-022-14890-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/29/2022] [Accepted: 06/14/2022] [Indexed: 11/30/2022] Open
Abstract
Accurate identification of microRNA (miRNA) targets at base-pair resolution has been an open problem for over a decade. The recent discovery of miRNA isoforms (isomiRs) adds more complexity to this problem. Despite the existence of many methods, none considers isomiRs, and their performance is still suboptimal. We hypothesize that by taking the isomiR-mRNA interactions into account and applying a deep learning model to study miRNA-mRNA interaction features, we may improve the accuracy of miRNA target predictions. We developed a deep learning tool called DMISO to capture the intricate features of miRNA/isomiR-mRNA interactions. Based on tenfold cross-validation, DMISO showed high precision (95%) and recall (90%). Evaluated on three independent datasets, DMISO had superior performance to five tools, including three popular conventional tools and two recently developed deep learning-based tools. By applying two popular feature interpretation strategies, we demonstrated the importance of the miRNA regions other than their seeds and the potential contribution of the RNA-binding motifs within miRNAs/isomiRs and mRNAs to the miRNA/isomiR-mRNA interactions.
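For readers less familiar with the two metrics quoted in this abstract, precision and recall can be computed from binary labels as in the sketch below. The function name `precision_recall` and the toy labels are hypothetical; this is not DMISO's evaluation code and does not reproduce its reported figures.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = target, 0 = non-target)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Precision is the share of predicted targets that are real; recall is the share of real targets that are recovered. In a tenfold cross-validation these would typically be computed per fold and then averaged.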
Affiliation(s)
- Amlan Talukder
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Wencai Zhang
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, 32816, USA
- Xiaoman Li
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, 32816, USA
- Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL, 32816, USA
11
Zhang L, Mao R, Lau CT, Chung WC, Chan JCP, Liang F, Zhao C, Zhang X, Bian Z. Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods. Sci Rep 2022; 12:9962. [PMID: 35705632 PMCID: PMC9200771 DOI: 10.1038/s41598-022-14048-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/18/2022] [Accepted: 05/31/2022] [Indexed: 12/11/2022] Open
Abstract
Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis of UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed an association between gene expression signature and disease severity, we aim to assess whether genes can help to diagnose UC and predict its correlation with immune regulation. A total of ten eligible microarrays (including 387 UC patients and 139 healthy subjects) were included in this study, with six microarrays (GSE48634, GSE6731, GSE114527, GSE13367, GSE36807, and GSE3629) in the training group and four microarrays (GSE53306, GSE87473, GSE74265, and GSE96665) in the testing group. After data processing, we found 87 differentially expressed genes. Furthermore, a total of six machine learning methods, including support vector machine, least absolute shrinkage and selection operator, random forest, gradient boosting machine, principal component analysis, and neural network, were adopted to identify potentially useful genes. The synthetic minority oversampling technique (SMOTE) was used to adjust the imbalanced sample size for the two groups (if any). Consequently, six genes were selected for model establishment. According to the receiver operating characteristic, two genes, OLFM4 and C4BPB, were finally identified. The average values of the area under the curve for these two genes are higher than 0.8, in both the original and the SMOTE-adjusted datasets. In addition, these two genes were significantly correlated with six immune cell types, namely Macrophages M1, Macrophages M2, Mast cells activated, Mast cells resting, Monocytes, and NK cells activated (P < 0.05). OLFM4 and C4BPB may be conducive to identifying patients with UC. Further verification studies could be conducted.
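The area under the receiver operating characteristic curve used to rank genes here can be sketched as a pairwise comparison, assuming one expression score per sample and binary disease labels. The function name `roc_auc` and the toy values are hypothetical; this is not the authors' pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy expression scores (hypothetical): 1 = UC patient, 0 = healthy subject.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

With these toy values, three of the four patient/healthy pairs are ordered correctly, so the AUC is 0.75. The pairwise form is quadratic in sample count; rank-based formulas give the same value more efficiently on large cohorts.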
Affiliation(s)
- Lin Zhang
- Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Rui Mao
- Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Chung Tai Lau
- Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Wai Chak Chung
- Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Jacky C P Chan
- Department of Computer Science, HKBU Faculty of Science, Hong Kong Baptist University, Hong Kong, SAR, China
- Feng Liang
- Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Chenchen Zhao
- Oncology Department, The Second Affiliated Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Xuan Zhang
- Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Centre for Chinese Herbal Medicine Drug Development, Hong Kong Baptist University, Hong Kong, SAR, China
- Zhaoxiang Bian
- Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Centre for Chinese Herbal Medicine Drug Development, Hong Kong Baptist University, Hong Kong, SAR, China
12
Bourgeais V, Zehraoui F, Hanczar B. GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression. Bioinformatics 2022; 38:2504-2511. [PMID: 35266505 DOI: 10.1093/bioinformatics/btac147] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/21/2021] [Revised: 02/02/2022] [Accepted: 03/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. Applying sophisticated machine learning models, in particular deep learning, to these data can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based deep learning models appear to be a promising solution. RESULTS: In this paper, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient and the information from its neighboring neurons. The experiments described in the paper confirm that our model not only performs as accurately as state-of-the-art (non-explainable) models but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting. AVAILABILITY: GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victoria Bourgeais
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
- Farida Zehraoui
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
- Blaise Hanczar
- IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes, 91020, France
13
Maudsley S, Leysen H, van Gastel J, Martin B. Systems Pharmacology: Enabling Multidimensional Therapeutics. Comprehensive Pharmacology 2022:725-769. [DOI: 10.1016/b978-0-12-820472-6.00017-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2025]
14
Bourgeais V, Zehraoui F, Ben Hamdoune M, Hanczar B. Deep GONet: self-explainable deep neural network based on Gene Ontology for phenotype prediction from gene expression data. BMC Bioinformatics 2021; 22:455. [PMID: 34551707 PMCID: PMC8456586 DOI: 10.1186/s12859-021-04370-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/31/2021] [Accepted: 09/08/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which prompts the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profiles. Existing deep learning models are usually considered black boxes that provide accurate predictions but are not interpretable. However, accuracy and interpretation are both essential for precision medicine. In addition, most models do not integrate domain knowledge. Hence, making deep learning models interpretable for medical applications using prior biological knowledge is the main focus of this paper. RESULTS: In this paper, we propose a new self-explainable deep learning model, called Deep GONet, integrating the Gene Ontology into the hierarchical architecture of the neural network. This model is based on a fully-connected architecture constrained by the Gene Ontology annotations, such that each neuron represents a biological function. The experiments on cancer diagnosis datasets demonstrate that Deep GONet is both easily interpretable and performs well in discriminating cancer and non-cancer samples. CONCLUSIONS: Our model provides an explanation of its predictions by identifying the most important neurons and associating them with biological functions, making the model understandable for biologists and physicians.
Affiliation(s)
- Victoria Bourgeais
  - IBISC, Univ Evry, Université Paris-Saclay, 91020 Évry-Courcouronnes, France
- Farida Zehraoui
  - IBISC, Univ Evry, Université Paris-Saclay, 91020 Évry-Courcouronnes, France
- Blaise Hanczar
  - IBISC, Univ Evry, Université Paris-Saclay, 91020 Évry-Courcouronnes, France
|
15
|
Wartmann H, Heins S, Kloiber K, Bonn S. Bias-invariant RNA-sequencing metadata annotation. Gigascience 2021; 10:giab064. [PMID: 34553213 PMCID: PMC8559615 DOI: 10.1093/gigascience/giab064] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/16/2020] [Revised: 06/11/2021] [Accepted: 09/01/2021] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.
Affiliation(s)
- Hannes Wartmann
  - Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sven Heins
  - Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Karin Kloiber
  - Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Stefan Bonn
  - Institute of Medical Systems Biology, Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
|
16
|
Gubatan J, Levitte S, Patel A, Balabanis T, Wei MT, Sinha SR. Artificial intelligence applications in inflammatory bowel disease: Emerging technologies and future directions. World J Gastroenterol 2021; 27:1920-1935. [PMID: 34007130 PMCID: PMC8108036 DOI: 10.3748/wjg.v27.i17.1920] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 01/26/2021] [Revised: 03/04/2021] [Accepted: 04/13/2021] [Indexed: 02/06/2023] Open
Abstract
Inflammatory bowel disease (IBD) is a complex and multifaceted disorder of the gastrointestinal tract that is increasing in incidence worldwide and associated with significant morbidity. The rapid accumulation of large datasets from electronic health records, high-definition multi-omics (including genomics, proteomics, transcriptomics, and metagenomics), and imaging modalities (endoscopy and endomicroscopy) has provided powerful tools to unravel novel mechanistic insights and help address unmet clinical needs in IBD. Although the application of artificial intelligence (AI) methods has facilitated the analysis, integration, and interpretation of large datasets in IBD, significant heterogeneity in AI methods, datasets, and clinical outcomes and the need for unbiased prospective validation studies are current barriers to the incorporation of AI into clinical practice. The purpose of this review is to summarize the most recent advances in the application of AI and machine learning technologies in the diagnosis and risk prediction, assessment of disease severity, and prediction of clinical outcomes in patients with IBD.
Affiliation(s)
- John Gubatan
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
- Steven Levitte
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
- Akshar Patel
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
- Tatiana Balabanis
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
- Mike T Wei
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
- Sidhartha R Sinha
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
|
17
|
Crawford J, Greene CS. Incorporating biological structure into machine learning models in biomedicine. Curr Opin Biotechnol 2020; 63:126-134. [PMID: 31962244 PMCID: PMC7308204 DOI: 10.1016/j.copbio.2019.12.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/17/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022]
Abstract
In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful. The area of research would benefit from performant open source implementations and independent benchmarking efforts.
Affiliation(s)
- Jake Crawford
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, United States
|
18
|
Jin S, Zeng X, Xia F, Huang W, Liu X. Application of deep learning methods in biological networks. Brief Bioinform 2020; 22:1902-1917. [PMID: 32363401 DOI: 10.1093/bib/bbaa043] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 12/25/2019] [Revised: 02/19/2020] [Accepted: 03/05/2020] [Indexed: 01/07/2023] Open
Abstract
The increase in biological data and the formation of various biomolecule interaction databases enable us to obtain diverse biological networks. These biological networks provide a wealth of raw materials for further understanding of biological systems, the discovery of complex diseases and the search for therapeutic drugs. However, the increase in data also increases the difficulty of biological network analysis. Therefore, algorithms that can handle large, heterogeneous and complex data are needed to better analyze the data of these network structures and mine their useful information. Deep learning is a branch of machine learning that extracts more abstract features from a larger set of training data. Through the establishment of an artificial neural network with a network hierarchy structure, deep learning can extract and screen the input information layer by layer and has representation learning ability. Improved deep learning algorithms can be used to process complex and heterogeneous graph data structures and are increasingly being applied to the mining of network data information. In this paper, we first introduce the deep learning models used for network data. Afterwards, we summarize the applications of deep learning to biological networks. Finally, we discuss the future development prospects of this field.
|
19
|
Smith AM, Walsh JR, Long J, Davis CB, Henstock P, Hodge MR, Maciejewski M, Mu XJ, Ra S, Zhao S, Ziemek D, Fisher CK. Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. BMC Bioinformatics 2020; 21:119. [PMID: 32197580 PMCID: PMC7085143 DOI: 10.1186/s12859-020-3427-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 01/30/2020] [Accepted: 02/21/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, we report a comprehensive analysis spanning prediction tasks from ulcerative colitis, atopic dermatitis, diabetes, to many cancer subtypes for a total of 24 binary and multiclass prediction problems and 26 survival analysis tasks. We systematically investigate the influence of gene subsets, normalization methods and prediction algorithms. Crucially, we also explore the novel use of deep representation learning methods on large transcriptomics compendia, such as GTEx and TCGA, to boost the performance of state-of-the-art methods. The resources and findings in this work should serve as both an up-to-date reference on attainable performance, and as a benchmarking resource for further research. RESULTS: Approaches that combine large numbers of genes outperformed single gene methods consistently and with a significant margin, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that using l2-regularized regression methods applied to centered log-ratio transformed transcript abundances provide the best predictive analyses overall. CONCLUSIONS: Transcriptomics-based phenotype prediction benefits from proper normalization techniques and state-of-the-art regularized regression approaches. In our view, breakthrough performance is likely contingent on factors which are independent of normalization and general modeling techniques; these factors might include reduction of systematic errors in sequencing data, incorporation of other data types such as single-cell sequencing and proteomics, and improved use of prior knowledge.
Affiliation(s)
- John Long
  - Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA
- Craig B Davis
  - Oncology Global Product Development, Pfizer Inc., San Diego, CA, USA
- Martin R Hodge
  - Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA
- Mateusz Maciejewski
  - Inflammation and Immunology, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA
- Xinmeng Jasmine Mu
  - Oncology Research & Development, Worldwide Research & Development, Pfizer Inc., San Diego, CA, USA
- Stephen Ra
  - Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA
- Shanrong Zhao
  - Computational Sciences, Worldwide Research & Development, Pfizer Inc., Cambridge, MA, USA
- Daniel Ziemek
  - Inflammation and Immunology, Worldwide Research & Development, Pfizer Pharma GmbH, Berlin, Germany
|
20
|
Stafford IS, Kellermann M, Mossotto E, Beattie RM, MacArthur BD, Ennis S. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases. NPJ Digit Med 2020; 3:30. [PMID: 32195365 PMCID: PMC7062883 DOI: 10.1038/s41746-020-0229-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Received: 08/12/2019] [Accepted: 01/17/2020] [Indexed: 02/07/2023] Open
Abstract
Autoimmune diseases are chronic, multifactorial conditions. Through machine learning (ML), a branch of the wider field of artificial intelligence, it is possible to extract patterns within patient data, and exploit these patterns to predict patient outcomes for improved clinical management. Here, we surveyed the use of ML methods to address clinical problems in autoimmune disease. A systematic review was conducted using the MEDLINE, Embase, and Computers and Applied Sciences Complete databases. Relevant papers included "machine learning" or "artificial intelligence" and the autoimmune diseases search term(s) in their title, abstract or key words. Exclusion criteria: studies not written in English, no real human patient data included, publication prior to 2001, studies that were not peer reviewed, non-autoimmune disease comorbidity research and review papers. 169 (of 702) studies met the criteria for inclusion. Support vector machines and random forests were the most popular ML methods used. ML models using data on multiple sclerosis, rheumatoid arthritis and inflammatory bowel disease were most common. A small proportion of studies (7.7% or 13/169) combined different data types in the modelling process. Cross-validation combined with a separate testing set, for more robust model evaluation, occurred in 8.3% of papers (14/169). The field may benefit from adopting a best practice of validation, cross-validation and independent testing of ML models. Many models achieved good predictive results in simple scenarios (e.g. classification of cases and controls). Progression to more complex predictive models may be achievable in future through integration of multiple data types.
Affiliation(s)
- I. S. Stafford
  - Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
  - Institute for Life Sciences, University of Southampton, Southampton, UK
- M. Kellermann
  - Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- E. Mossotto
  - Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
  - Institute for Life Sciences, University of Southampton, Southampton, UK
- R. M. Beattie
  - Department of Paediatric Gastroenterology, Southampton Children’s Hospital, Southampton, UK
- B. D. MacArthur
  - Institute for Life Sciences, University of Southampton, Southampton, UK
- S. Ennis
  - Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
|
21
|
Azodi CB, Pardo J, VanBuren R, de Los Campos G, Shiu SH. Transcriptome-Based Prediction of Complex Traits in Maize. The Plant Cell 2020; 32:139-151. [PMID: 31641024 PMCID: PMC6961623 DOI: 10.1105/tpc.19.00332] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 05/06/2019] [Revised: 09/24/2019] [Accepted: 10/21/2019] [Indexed: 05/11/2023]
Abstract
The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.
Affiliation(s)
- Christina B Azodi
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
- Jeremy Pardo
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
- Robert VanBuren
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
- Gustavo de Los Campos
  - Epidemiology and Biostatistics and Statistics and Probability Departments, Michigan State University, East Lansing, Michigan 48824
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan 48824
|
22
|
Xu B, Liu Y, Yu S, Wang L, Dong J, Lin H, Yang Z, Wang J, Xia F. A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network. BMC Med Genomics 2019; 12:188. [PMID: 31865919 PMCID: PMC6927107 DOI: 10.1186/s12920-019-0627-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Prediction of pathogenic genes is crucial for disease prevention, diagnosis, and treatment. But traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main methods for finding candidate pathogenic genes. METHODS: We propose a pathogenic gene prediction method based on network embedding, called Multipath2vec. Firstly, we construct a heterogeneous network called GP-network. It is constructed based on three kinds of relationships between genes and phenotypes: correlations between phenotypes, interactions between genes, and known gene-phenotype pairs. Then, in order to embed the network better, we design the multi-path to guide random walks in GP-network. The multi-path includes multiple paths between genes and phenotypes, which can capture complex structural information of the heterogeneous network. Finally, we use the learned vector representation of each phenotype and protein to calculate the similarities and rank according to the similarities between candidate genes and the target phenotype. RESULTS: We implemented Multipath2vec and four baseline approaches (i.e., CATAPULT, PRINCE, Deepwalk and Metapath2vec) on many-genes gene-phenotype data, single-gene gene-phenotype data and whole gene-phenotype data. Experimental results show that Multipath2vec outperformed the state-of-the-art baselines in the pathogenic gene prediction task. CONCLUSIONS: We propose Multipath2vec, which can be utilized to predict pathogenic genes, and experimental results show its higher accuracy in pathogenic gene prediction.
Affiliation(s)
- Bo Xu
  - School of Software, Dalian University of Technology, Dalian, 116000 China
  - Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, 116000 China
- Yu Liu
  - School of Software, Dalian University of Technology, Dalian, 116000 China
- Shuo Yu
  - School of Software, Dalian University of Technology, Dalian, 116000 China
- Lei Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116000 China
- Jie Dong
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116000 China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116000 China
- Zhihao Yang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116000 China
- Jian Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116000 China
- Feng Xia
  - School of Software, Dalian University of Technology, Dalian, 116000 China
  - Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, 116000 China
|
23
|
Hooiveld-Noeken J, Fehrmann R, de Vries E, Jalving M. Driving innovation for rare skin cancers: utilizing common tumours and machine learning to predict immune checkpoint inhibitor response. Immuno-Oncology Technology 2019; 4:1-7. [PMID: 35755000 PMCID: PMC9216707 DOI: 10.1016/j.iotech.2019.11.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/27/2019] [Revised: 11/15/2019] [Accepted: 11/19/2019] [Indexed: 12/30/2022]
Abstract
Metastatic Merkel cell carcinoma (MCC) and cutaneous squamous cell carcinoma (cSCC) are rare and both show impressive responses to immune checkpoint inhibitor treatment. However, at least 40% of patients do not respond to these expensive and potentially toxic drugs. Development of predictive biomarkers of response and rational, effective combination treatment strategies in these rare, often frail patient populations is challenging. This review discusses the pathophysiology and treatment of MCC and cSCC, with a particular focus on potential biomarkers of response to immunotherapy, and discusses how transfer learning using big data collected from patients with common tumours can be used in combination with deep phenotyping of rare tumours to develop predictive biomarkers and elucidate novel treatment targets.
Metastatic Merkel cell carcinoma and cutaneous squamous cell carcinoma are rare tumours. Immunotherapy gives impressive responses but most patients do not survive long term. Small patient numbers prevent extensive biomarker research in clinical trials. Pooled data from common and rare tumours can be used to train neural networks. In rare cancers, neural networks can help identify biomarkers and novel treatment targets.
|
24
|
Huang T, Huang X, Shi B, Yao M. GEREDB: Gene expression regulation database curated by mining abstracts from literature. J Bioinform Comput Biol 2019; 17:1950024. [PMID: 31617460 DOI: 10.1142/s0219720019500240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023]
Abstract
Understanding how genes are expressed and regulated in different biological processes is a fundamental and challenging issue. Considerable progress has been made in studying the relationship between the expression and regulation of human genes. However, it is difficult to use these resources productively to analyze gene expression data. GEREDB (www.thua45.cn/geredb) has been developed to facilitate analyses that will provide insights into the regulation of genes that govern specific biological responses. GEREDB is a publicly available, manually curated biological database that stores data on the relationships between expression and regulation of human genes. To date, more than 39,000 Links have been contextually annotated by reviewing more than 53,000 abstracts. GEREDB can be searched using the official NCBI gene symbol as a query, and it can be downloaded along with the GEREA software package. GEREDB can analyze user-supplied gene expression data in a causal-analysis-oriented manner using the GEREA bioinformatics tool.
Affiliation(s)
- Tinghua Huang
  - College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, P. R. China
- Xiali Huang
  - College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, P. R. China
- Bomei Shi
  - College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, P. R. China
- Min Yao
  - College of Animal Science, Yangtze University, Jingzhou, Hubei 434025, P. R. China
|
25
|
Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 2019; 95:133-145. [DOI: 10.1016/j.artmed.2018.10.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 06/21/2018] [Revised: 10/23/2018] [Accepted: 10/23/2018] [Indexed: 01/14/2023]
|
26
|
Li Z, Gao N, Martini JWR, Simianer H. Integrating Gene Expression Data Into Genomic Prediction. Front Genet 2019; 10:126. [PMID: 30858865 PMCID: PMC6397893 DOI: 10.3389/fgene.2019.00126] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/14/2018] [Accepted: 02/04/2019] [Indexed: 01/14/2023] Open
Abstract
Gene expression profiles potentially hold valuable information for the prediction of breeding values and phenotypes. In this study, the utility of transcriptome data for phenotype prediction was tested with 185 inbred lines of Drosophila melanogaster for nine traits in two sexes. We incorporated the transcriptome data into genomic prediction via two methods: GTBLUP and GRBLUP, both combining single nucleotide polymorphisms (SNPs) and transcriptome data. The genotypic data was used to construct the common additive genomic relationship, which was used in genomic best linear unbiased prediction (GBLUP) or jointly in a linear mixed model with a transcriptome-based linear kernel (GTBLUP), or with a transcriptome-based Gaussian kernel (GRBLUP). We studied the predictive ability of the models and discuss a concept of "omics-augmented broad sense heritability" for the multi-omics era. For most traits, GRBLUP and GBLUP provided similar predictive abilities, but GRBLUP explained more of the phenotypic variance. There was only one trait (olfactory perception to Ethyl Butyrate in females) in which the predictive ability of GRBLUP (0.23) was significantly higher than the predictive ability of GBLUP (0.21). Our results suggest that accounting for transcriptome data has the potential to improve genomic predictions if transcriptome data can be included on a larger scale.
Affiliation(s)
- Zhengcao Li
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
- Ning Gao
  - State Key Laboratory of Biocontrol, Guangzhou Higher Education Mega Center, School of Life Science, Sun Yat-sen University, Guangzhou, China
- Henner Simianer
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
|
27
|
Gold MP, LeNail A, Fraenkel E. Shallow Sparsely-Connected Autoencoders for Gene Set Projection. Pacific Symposium on Biocomputing 2019; 24:374-385. [PMID: 30963076 PMCID: PMC6417803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
When analyzing biological data, it can be helpful to consider gene sets, or predefined groups of biologically related genes. Methods exist for identifying gene sets that are differential between conditions, but large public datasets from consortium projects and single-cell RNA-Sequencing have opened the door for gene set analysis using more sophisticated machine learning techniques, such as autoencoders and variational autoencoders. We present shallow sparsely-connected autoencoders (SSCAs) and variational autoencoders (SSCVAs) as tools for projecting gene-level data onto gene sets. We tested these approaches on single-cell RNA-Sequencing data from blood cells and on RNA-Sequencing data from breast cancer patients. Both SSCA and SSCVA can recover known biological features from these datasets and the SSCVA method often outperforms SSCA (and six existing gene set scoring algorithms) on classification and prediction tasks.
Affiliation(s)
- Maxwell P. Gold
  - Department of Biological Engineering, Massachusetts Institute of Technology, 21 Ames St. Cambridge, MA, 02139, USA
- Alexander LeNail
  - Department of Biological Engineering, Massachusetts Institute of Technology, 21 Ames St. Cambridge, MA, 02139, USA
- Ernest Fraenkel
  - Department of Biological Engineering, Massachusetts Institute of Technology, 21 Ames St. Cambridge, MA, 02139, USA
|