1
Akki AJ, Patil SA, Hungund S, Sahana R, Patil MM, Kulkarni RV, Raghava Reddy K, Zameer F, Raghu AV. Advances in Parkinson's disease research - A computational network pharmacological approach. Int Immunopharmacol 2024; 139:112758. [PMID: 39067399 DOI: 10.1016/j.intimp.2024.112758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 07/22/2024] [Accepted: 07/22/2024] [Indexed: 07/30/2024]
Abstract
Parkinson's disease (PD), the second most prevalent neurodegenerative disorder, is projected to see a significant rise in incidence over the next three decades. The precise treatment of PD remains a formidable challenge, prompting ongoing research into early diagnostic methodologies. Network pharmacology, a burgeoning field grounded in systems biology, examines the intricate networks of biological systems to identify critical signal nodes, facilitating the development of multi-target therapeutic molecules. This approach systematically maps the components of Parkinson's disease, thereby reducing its complexity. In this review, we explore the application of network pharmacology workflows in PD, discuss the techniques employed in this field, and evaluate the current advancements and status of network pharmacology in the context of Parkinson's disease. These insights should open new paths for exploring early disease biomarkers and for developing diagnostics through holistic in silico, in vitro, in vivo, and clinical studies.
Affiliation(s)
- Ali Jawad Akki
- Faculty of Science and Technology, BLDE (Deemed-to-be University), Vijayapura 586 103, India
- Shruti A Patil
- Faculty of Science and Technology, BLDE (Deemed-to-be University), Vijayapura 586 103, India
- Sphoorty Hungund
- Faculty of Science and Technology, BLDE (Deemed-to-be University), Vijayapura 586 103, India
- R Sahana
- Department of Computer Science and Engineering, RV Institute of Technology and Management, 560 076 Bengaluru, India
- Malini M Patil
- Department of Computer Science and Engineering, RV Institute of Technology and Management, 560 076 Bengaluru, India.
- Raghavendra V Kulkarni
- Faculty of Science and Technology, BLDE (Deemed-to-be University), Vijayapura 586 103, India
- K Raghava Reddy
- School of Chemical and Biomolecular Engineering, The University of Sydney, Sydney, NSW 12 2006, Australia
- Farhan Zameer
- Department of Dravyaguna (Ayurveda Pharmacology), Alva's Ayurveda Medical College, and PathoGutOmics Laboratory, ATMA Research Centre, Dakshina Kannada 574 227, India.
- Anjanapura V Raghu
- Department of Basic Sciences, Faculty of Engineering and Technology, CMR University, 562149 Bangalore, India.
2
Cheng C, Messerschmidt L, Bravo I, Waldbauer M, Bhavikatti R, Schenk C, Grujic V, Model T, Kubinec R, Barceló J. A General Primer for Data Harmonization. Sci Data 2024; 11:152. [PMID: 38297013 PMCID: PMC10831085 DOI: 10.1038/s41597-024-02956-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/11/2024] [Indexed: 02/02/2024] Open
Affiliation(s)
- Cindy Cheng
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany.
- Luca Messerschmidt
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Isaac Bravo
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Marco Waldbauer
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Caress Schenk
- School of Humanities and Social Sciences, Nazarbayev University, Kabanbay Batry Ave., 53, Astana, 010000, Kazakhstan
- Vanja Grujic
- Faculty of Law, University of Brasilia, Campus Universitário Darcy Ribeiro Asa Norte, Brasília, 10587, Brazil
- Tim Model
- Delve, 2225 3rd St, San Francisco, 94107, California, USA
- Robert Kubinec
- Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
- Joan Barceló
- Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
3
Kalpana S, Lin WY, Wang YC, Fu Y, Wang HY. Alternate Antimicrobial Therapies and Their Companion Tests. Diagnostics (Basel) 2023; 13:2490. [PMID: 37568853 PMCID: PMC10417861 DOI: 10.3390/diagnostics13152490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 07/14/2023] [Indexed: 08/13/2023] Open
Abstract
New antimicrobial approaches are essential to counter antimicrobial resistance. The drug development pipeline is exhausted with the emergence of resistance, resulting in unsuccessful trials. The lack of an effective drug developed from the conventional drug portfolio has mandated the introspection into the list of potentially effective unconventional alternate antimicrobial molecules. Alternate therapies with clinically explicable forms include monoclonal antibodies, antimicrobial peptides, aptamers, and phages. Clinical diagnostics optimize the drug delivery. In the era of diagnostic-based applications, it is logical to draw diagnostic-based treatment for infectious diseases. Selection criteria of alternate therapeutics in infectious diseases include detection, monitoring of response, and resistance mechanism identification. Integrating these diagnostic applications is disruptive to the traditional therapeutic development. The challenges and mitigation methods need to be noted. Applying the goals of clinical pharmacokinetics that include enhancing efficacy and decreasing toxicity of drug therapy, this review analyses the strong correlation of alternate antimicrobial therapeutics in infectious diseases. The relationship between drug concentration and the resulting effect, defined by the pharmacodynamic parameters, is also analyzed. This review analyzes the perspectives of aligning diagnostic initiatives with the use of alternate therapeutics, with a particular focus on companion diagnostic applications in infectious diseases.
Affiliation(s)
- Sriram Kalpana
- Department of Laboratory Medicine, Linkou Chang Gung Memorial Hospital, Taoyuan 333423, Taiwan;
- Wan-Ying Lin
- Department of Medicine, University of California San Diego, San Diego, CA 92093, USA;
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA;
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Yu-Chiang Wang
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA;
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Yiwen Fu
- Department of Medicine, Kaiser Permanente Santa Clara Medical Center, Santa Clara, CA 95051, USA;
- Hsin-Yao Wang
- Department of Laboratory Medicine, Linkou Chang Gung Memorial Hospital, Taoyuan 333423, Taiwan;
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA;
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
4
Tran L, He K, Wang D, Jiang H. A cross-validation statistical framework for asymmetric data integration. Biometrics 2023; 79:1280-1292. [PMID: 35524490 PMCID: PMC9637892 DOI: 10.1111/biom.13685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 04/19/2022] [Indexed: 11/26/2022]
Abstract
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
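The weighting idea in this abstract can be illustrated with a toy sketch. It is not the authors' method: the abstract describes rewriting the LOOCV error as a function of the integration weights, whereas the sketch below simply tries a grid of candidate weights for one external data set in a one-predictor linear model without intercept. The function names (`fit_slope`, `loocv_error`, `best_weight`) and all data values are hypothetical.

```python
def fit_slope(xs, ys, ws):
    """Weighted least-squares slope for the model y = b * x (no intercept)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, ws))
    den = sum(w * x * x for x, w in zip(xs, ws))
    return num / den


def loocv_error(local, external, weight):
    """Mean squared leave-one-out error, evaluated on the local data only.

    Each fold drops one local point, fits on the remaining local points
    (weight 1) plus all external points (the candidate weight), and
    predicts the dropped point.
    """
    total = 0.0
    for i, (x_out, y_out) in enumerate(local):
        kept = local[:i] + local[i + 1:]
        xs = [x for x, _ in kept] + [x for x, _ in external]
        ys = [y for _, y in kept] + [y for _, y in external]
        ws = [1.0] * len(kept) + [weight] * len(external)
        b = fit_slope(xs, ys, ws)
        total += (y_out - b * x_out) ** 2
    return total / len(local)


def best_weight(local, external, grid):
    """Grid search (an assumption of this sketch) over candidate weights."""
    return min(grid, key=lambda w: loocv_error(local, external, w))


# Hypothetical toy data: the local slope is close to 2.
local = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
compatible_external = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]   # slope 2
biased_external = [(1.5, 6.0), (2.5, 10.0), (3.5, 14.0)]     # slope 4
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

w_compatible = best_weight(local, compatible_external, grid)
w_biased = best_weight(local, biased_external, grid)
print(w_compatible, w_biased)
```

In this toy setting the compatible external set receives the largest weight on the grid and the biased set receives weight zero, which is the qualitative behaviour the abstract argues for.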
Affiliation(s)
- Lam Tran
- Department of Biostatistics, University of Michigan, Ann Arbor MI, USA
- Kevin He
- Department of Biostatistics, University of Michigan, Ann Arbor MI, USA
- Di Wang
- Department of Biostatistics, University of Michigan, Ann Arbor MI, USA
- Hui Jiang
- Department of Biostatistics, University of Michigan, Ann Arbor MI, USA
5
Asif M, Martiniano HFMC, Lamurias A, Kausar S, Couto FM. DGH-GO: dissecting the genetic heterogeneity of complex diseases using gene ontology. BMC Bioinformatics 2023; 24:171. [PMID: 37101154 PMCID: PMC10134522 DOI: 10.1186/s12859-023-05290-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 04/14/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND Complex diseases such as neurodevelopmental disorders (NDDs) exhibit multiple etiologies. The multi-etiological nature of complex diseases emerges from distinct but functionally similar groups of genes. Different diseases sharing genes of such groups show related clinical outcomes that further restrict our understanding of disease mechanisms, thus limiting the applications of personalized medicine approaches to complex genetic disorders. RESULTS Here, we present an interactive and user-friendly application called DGH-GO. DGH-GO allows biologists to dissect the genetic heterogeneity of complex diseases by stratifying the putative disease-causing genes into clusters that may contribute to distinct disease outcome development. It can also be used to study the shared etiology of complex diseases. DGH-GO creates a semantic similarity matrix for the input genes by using Gene Ontology (GO). The resultant matrix can be visualized in 2D plots using different dimension reduction methods (t-SNE, principal component analysis, UMAP and principal coordinate analysis). In the next step, clusters of functionally similar genes are identified from the genes' functional similarities assessed through GO. This is achieved by employing four different clustering methods (K-means, Hierarchical, Fuzzy and PAM). The user may change the clustering parameters and explore their effect on stratification immediately. DGH-GO was applied to genes disrupted by rare genetic variants in Autism Spectrum Disorder (ASD) patients. The analysis confirmed the multi-etiological nature of ASD by identifying four clusters of genes that were enriched for distinct biological mechanisms and clinical outcomes. In the second case study, the analysis of genes shared by different NDDs showed that genes causing multiple disorders tend to aggregate in similar clusters, indicating a possible shared etiology.
CONCLUSION DGH-GO is a user-friendly application that allows biologists to study the multi-etiological nature of complex diseases by dissecting their genetic heterogeneity. In summary, functional similarities, dimension reduction and clustering methods, coupled with interactive visualization and control over analysis, allow biologists to explore and analyze their datasets without requiring expert knowledge of these methods. The source code of the proposed application is available at https://github.com/Muh-Asif/DGH-GO.
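The clustering step described in this abstract (grouping genes from a pairwise similarity matrix) can be sketched in a few lines. This is an illustration only and is not taken from the DGH-GO source code: it uses plain average-linkage hierarchical clustering, the gene labels are placeholders, and the similarity values are invented rather than computed from Gene Ontology.

```python
def average_linkage_clusters(sim, k):
    """Merge the two most similar clusters until only k clusters remain.

    `sim` is a symmetric matrix of pairwise similarities; the similarity
    between two clusters is the mean of all cross-cluster pair values.
    """
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        candidates = [
            (link(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        ]
        _, p, q = max(candidates)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Placeholder gene labels and invented similarity scores in [0, 1].
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

groups = average_linkage_clusters(similarity, k=2)
named_groups = [[genes[i] for i in group] for group in groups]
print(named_groups)
```

Changing `k` and re-running mirrors, in a very reduced form, the interactive parameter exploration the abstract describes.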
Affiliation(s)
- Muhammad Asif
- Biomedical Data Science Lab, Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan.
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
- Hugo F M C Martiniano
- Instituto Nacional de Saúde Doutor Ricardo Jorge, Avenida Padre Cruz, 1649-016, Lisbon, Portugal
- BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Andre Lamurias
- Department of Computer Science, Aalborg University, Ålborg, Denmark
- NOVA LINCS, NOVA School of Science and Technology, Lisboa, Portugal
- Samina Kausar
- DeepOmicsVision, Avenue de Luminy, 13009, Marseille, France
- Francisco M Couto
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
6
Husereau D, Steuten L, Muthu V, Thomas DM, Spinner DS, Ivany C, Mengel M, Sheffield B, Yip S, Jacobs P, Sullivan T. Effective and Efficient Delivery of Genome-Based Testing-What Conditions Are Necessary for Health System Readiness? Healthcare (Basel) 2022; 10:2086. [PMID: 36292532 PMCID: PMC9602865 DOI: 10.3390/healthcare10102086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 10/09/2022] [Accepted: 10/12/2022] [Indexed: 01/09/2023] Open
Abstract
Health systems internationally must prepare for a future of genetic/genomic testing to inform healthcare decision-making while creating research opportunities. High functioning testing services will require additional considerations and health system conditions beyond traditional diagnostic testing. Based on a literature review of good practices, key informant interviews, and expert discussion, this article attempts to synthesize what conditions are necessary, and what good practice may look like. It is intended to aid policymakers and others designing future systems of genome-based care and care prevention. These conditions include creating communities of practice and healthcare system networks; resource planning; across-region informatics; having a clear entry/exit point for innovation; evaluative function(s); concentrated or coordinated service models; mechanisms for awareness and care navigation; integrating innovation and healthcare delivery functions; and revisiting approaches to financing, education and training, regulation, and data privacy and security. The list of conditions we propose was developed with an emphasis on describing conditions that would be applicable to any healthcare system, regardless of capacity, organizational structure, financing, population characteristics, standardization of care processes, or underlying culture.
Affiliation(s)
- Don Husereau
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON K1G 5Z3, Canada
- Correspondence: Tel.: +1-6132994379
- Lotte Steuten
- Office of Health Economics, London SE1 2HB, UK
- City Health Economics Centre (CHEC), City University of London, London EC1V 0HB, UK
- Vivek Muthu
- Marivek Healthcare Consulting, Epsom KT18 7PF, UK
- David M. Thomas
- Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Omico, Sydney, NSW 2010, Australia
- Daryl S. Spinner
- Menarini Silicon Biosystems Inc., Huntingdon Valley, PA 19006, USA
- Craig Ivany
- Provincial Health Services Authority, Vancouver, BC V5Z 1G1, Canada
- Michael Mengel
- Department of Laboratory Medicine & Pathology, University of Alberta, Edmonton, AB T6G 2S2, Canada
- Stephen Yip
- Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Vancouver, BC V6T 1Z7, Canada
- Philip Jacobs
- Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Terrence Sullivan
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON M5T 3M6, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, QC H4A 3T2, Canada
7
Płuciennik A, Płaczek A, Wilk A, Student S, Oczko-Wojciechowska M, Fujarewicz K. Data Integration–Possibilities of Molecular and Clinical Data Fusion on the Example of Thyroid Cancer Diagnostics. Int J Mol Sci 2022; 23:11880. [PMID: 36233181 PMCID: PMC9569592 DOI: 10.3390/ijms231911880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 09/24/2022] [Accepted: 09/28/2022] [Indexed: 11/23/2022] Open
Abstract
(1) Background: The data from independent gene expression sources may be integrated for the purpose of molecular diagnostics of cancer. So far, multiple approaches were described. Here, we investigated the impacts of different data fusion strategies on classification accuracy and feature selection stability, which allow the costs of diagnostic tests to be reduced. (2) Methods: We used molecular features (gene expression) combined with a feature extracted from the independent clinical data describing a patient’s sample. We considered the dependencies between selected features in two data fusion strategies (early fusion and late fusion) compared to classification models based on molecular features only. We compared the best accuracy classification models in terms of the number of features, which is connected to the potential cost reduction of the diagnostic classifier. (3) Results: We show that for thyroid cancer, the extracted clinical feature is correlated with (but not redundant to) the molecular data. The usage of data fusion allows a model to be obtained with similar or even higher classification quality (with a statistically significant accuracy improvement, a p-value below 0.05) and with a reduction in molecular dimensionality of the feature space from 15 to 3–8 (depending on the feature selection method). (4) Conclusions: Both strategies give comparable quality results, but the early fusion method provides better feature selection stability.
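The difference between the two fusion strategies named in this abstract can be shown with a small sketch. It does not reproduce the authors' classifiers or data: a nearest-centroid model stands in for whatever classifier is used, the feature values are invented, and the names `predict_early` and `predict_late` are hypothetical. Early fusion concatenates molecular and clinical features before training one model; late fusion trains one model per source and combines their outputs.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(features, labels):
    """Nearest-centroid model: one mean vector per class label."""
    return {
        c: centroid([f for f, y in zip(features, labels) if y == c])
        for c in set(labels)
    }


def score(model, x):
    """Positive when x is closer to the class-1 centroid than to class 0."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    return dist[0] - dist[1]


def probability(model, x):
    """Logistic squashing of the score (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-score(model, x)))


# Invented training data: two molecular features and one clinical feature.
molecular = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.7, 1.0]]
clinical = [[0.0], [0.2], [1.0], [0.8]]
labels = [0, 0, 1, 1]

# Early fusion: concatenate the feature vectors, then train a single model.
early_model = train([m + c for m, c in zip(molecular, clinical)], labels)

# Late fusion: train one model per data source, then average their outputs.
molecular_model = train(molecular, labels)
clinical_model = train(clinical, labels)


def predict_early(m, c):
    return int(score(early_model, m + c) > 0)


def predict_late(m, c):
    mean_p = (probability(molecular_model, m) + probability(clinical_model, c)) / 2
    return int(mean_p > 0.5)


print(predict_early([0.8, 0.9], [0.9]), predict_late([0.8, 0.9], [0.9]))
print(predict_early([0.2, 0.1], [0.1]), predict_late([0.2, 0.1], [0.1]))
```

On clearly separated samples like these the two strategies agree; they can differ when the sources disagree, which is where a comparison such as the one in the abstract becomes informative.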
Affiliation(s)
- Alicja Płuciennik
- Department of Systems Biology and Engineering, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Department of Technology Development, Gabos Software Sp z o.o., Mikołowska 100, 40-065 Katowice, Poland
- Correspondence: (A.P.); (S.S.)
- Aleksander Płaczek
- Department of Technology Development, Gabos Software Sp z o.o., Mikołowska 100, 40-065 Katowice, Poland
- Department of Applied Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Agata Wilk
- Department of Systems Biology and Engineering, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Department of Biostatistics and Bioinformatics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice Branch, Wybrzeze AK 14, 44-100 Gliwice, Poland
- Sebastian Student
- Department of Systems Biology and Engineering, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Biotechnology Center, Silesian University of Technology, Bolesława Krzywoustego 8, 44-100 Gliwice, Poland
- Correspondence: (A.P.); (S.S.)
- Małgorzata Oczko-Wojciechowska
- Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice Branch, Wybrzeze AK 14, 44-100 Gliwice, Poland
- Krzysztof Fujarewicz
- Department of Systems Biology and Engineering, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
8
Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. BIG DATA 2022; 10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/14/2023]
Abstract
The amount of available data is continuously growing. This phenomenon has given rise to a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Affiliation(s)
- Gabriel Dall'Alba
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Genome Science and Technology Program, Faculty of Science, The University of British Columbia, Vancouver, Canada
- Pedro Lenz Casa
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Fernanda Pessi de Abreu
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Daniel Luis Notari
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Scheila de Avila E Silva
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
9
Youn J, Rai N, Tagkopoulos I. Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes. Nat Commun 2022; 13:2360. [PMID: 35487919 PMCID: PMC9055065 DOI: 10.1038/s41467-022-29993-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Accepted: 03/04/2022] [Indexed: 11/09/2022] Open
Abstract
We present a machine learning framework to automate knowledge discovery through knowledge graph construction, inconsistency resolution, and iterative link prediction. By incorporating knowledge from 10 publicly available sources, we construct an Escherichia coli antibiotic resistance knowledge graph with 651,758 triples from 23 triple types after resolving 236 sets of inconsistencies. Iteratively applying link prediction to this graph and wet-lab validation of the generated hypotheses reveal 15 antibiotic resistant E. coli genes, with 6 of them never associated with antibiotic resistance for any microbe. Iterative link prediction leads to a performance improvement and more findings. The probability of positive findings highly correlates with experimentally validated findings (R² = 0.94). We also identify 5 homologs in Salmonella enterica that are all validated to confer resistance to antibiotics. This work demonstrates how evidence-driven decisions are a step toward automating knowledge discovery with high confidence and accelerated pace, thereby substituting traditional time-consuming and expensive methods.
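To make the terms "triple" and "inconsistency resolution" concrete, the sketch below merges subject-predicate-object statements from several sources and keeps, for each subject-object pair, the predicate backed by the most distinct sources. This majority rule is an assumption chosen for illustration; the abstract does not say how the authors resolve conflicts. The entity, predicate, and source names are placeholders.

```python
from collections import defaultdict


def resolve(triples):
    """Keep one predicate per (subject, object) pair by source majority.

    `triples` holds (subject, predicate, object, source) tuples. Pairs whose
    top two predicates are tied are dropped rather than guessed.
    """
    support = defaultdict(lambda: defaultdict(set))
    for subj, pred, obj, source in triples:
        support[(subj, obj)][pred].add(source)

    resolved = set()
    for (subj, obj), by_pred in support.items():
        ranked = sorted(by_pred.items(), key=lambda kv: len(kv[1]), reverse=True)
        if len(ranked) > 1 and len(ranked[0][1]) == len(ranked[1][1]):
            continue  # unresolved tie: leave it out of the merged graph
        resolved.add((subj, ranked[0][0], obj))
    return resolved


# Placeholder statements from three hypothetical sources.
statements = [
    ("gene_x", "confers resistance to", "antibiotic_y", "source_1"),
    ("gene_x", "confers resistance to", "antibiotic_y", "source_2"),
    ("gene_x", "confers no resistance to", "antibiotic_y", "source_3"),
    ("gene_z", "confers resistance to", "antibiotic_y", "source_1"),
    ("gene_w", "confers resistance to", "antibiotic_y", "source_1"),
    ("gene_w", "confers no resistance to", "antibiotic_y", "source_2"),
]

graph = resolve(statements)
print(sorted(graph))
```

Link prediction would then operate on the resolved triples; that step is not sketched here.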
Affiliation(s)
- Jason Youn
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Genome Center, University of California, Davis, CA, 95616, USA
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA
- Navneet Rai
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Genome Center, University of California, Davis, CA, 95616, USA
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, CA, 95616, USA.
- Genome Center, University of California, Davis, CA, 95616, USA.
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA.
10
Das S, Mukhopadhyay I. TiMEG: an integrative statistical method for partially missing multi-omics data. Sci Rep 2021; 11:24077. [PMID: 34911979 PMCID: PMC8674330 DOI: 10.1038/s41598-021-03034-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/01/2021] [Accepted: 11/24/2021] [Indexed: 11/25/2022] Open
Abstract
Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. More often, genotype data are available for all individuals under study but gene expression and/or methylation information are missing for different subsets of those individuals. Here, we develop a statistical model TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially, in presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals, that remain unidentified in single-omic analysis or common imputation-based methods. Its application on a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
Affiliation(s)
- Sarmistha Das
- Human Genetics Unit, Indian Statistical Institute, Kolkata, 700108, India
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, USA
11
Kovanda A, Zimani AN, Peterlin B. How to design a national genomic project-a systematic review of active projects. Hum Genomics 2021; 15:20. [PMID: 33761998 PMCID: PMC7988644 DOI: 10.1186/s40246-021-00315-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/15/2020] [Accepted: 02/23/2021] [Indexed: 01/18/2023] Open
Abstract
An increasing number of countries are investing efforts to exploit the human genome, in order to improve genetic diagnostics and to pave the way for the integration of precision medicine into health systems. The expected benefits include improved understanding of normal and pathological genomic variation, shorter time-to-diagnosis, cost-effective diagnostics, targeted prevention and treatment, and research advances. We review the 41 currently active individual national projects concerning their aims and scope, the number and age structure of included subjects, funding, data sharing goals and methods, and linkage with biobanks, medical data, and non-medical data (exposome). The main aims of ongoing projects were to determine normal genomic variation (90%), determine pathological genomic variation (rare disease, complex diseases, cancer, etc.) (71%), improve infrastructure (59%), and enable personalized medicine (37%). The number of subjects to be sequenced ranges substantially, from a hundred to over a million, representing in some cases a significant portion of the population. Approximately half of the projects report public funding, with the rest having various mixed or private funding arrangements. 90% of projects report data sharing (public, academic, and/or commercial with various levels of access) and plan on linking genomic data and medical data (78%), existing biobanks (44%), and/or non-medical data (24%) as the basis for enabling personal/precision medicine in the future. Our results show substantial diversity in the analysed categories of 41 ongoing national projects. The overview of current designs will hopefully inform national initiatives in designing new genomic projects and contribute to standardisation and international collaboration.
Affiliation(s)
- Anja Kovanda
- Clinical Institute of Genomic Medicine, University Medical Centre Ljubljana, Slajmerjeva 4, Ljubljana, Slovenia
- Ana Nyasha Zimani
- Clinical Institute of Genomic Medicine, University Medical Centre Ljubljana, Slajmerjeva 4, Ljubljana, Slovenia
- Borut Peterlin
- Clinical Institute of Genomic Medicine, University Medical Centre Ljubljana, Slajmerjeva 4, Ljubljana, Slovenia.
12
Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners uncover the functional aspects of the human cellular system through exploratory searching over semantically integrated, heterogeneous, and geographically dispersed omics annotations.
Background:
Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the poorly understood aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data are growing at an astronomical rate and are becoming increasingly heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations that they naturally possess in the human cell.
Objective:
To develop an aspect-oriented formal data integration model for semantically integrating heterogeneous and geographically dispersed omics annotations and for providing exploratory querying on the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered and multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects within or across the various associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
Affiliation(s)
- Omer Irshad
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
| | - Muhammad Usman Ghani Khan
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
|
13
|
Samra H, Li A, Soh B. GENE2D: A NoSQL Integrated Data Repository of Genetic Disorders Data. Healthcare (Basel) 2020; 8:257. [PMID: 32781728 PMCID: PMC7551627 DOI: 10.3390/healthcare8030257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/04/2020] [Revised: 07/27/2020] [Accepted: 08/04/2020] [Indexed: 11/16/2022] [Open Access]
Abstract
There are few sources from which to obtain clinical and genetic data for use in research in Saudi Arabia. Numerous obstacles make it difficult to integrate these data from silos and scattered sources and to provide standardized access to large data sets for patients with common health conditions. To this end, we sought to contribute to this area and offer a practical and easy-to-implement solution. In this paper, we aim to design and implement a "not only SQL" (NoSQL) based integration framework to generate an Integrated Data Repository of Genetic Disorders Data (GENE2D) that integrates data from various genetic clinics and research centers in Saudi Arabia and provides an easy-to-use query interface for researchers to conduct their studies on large datasets. The major components of the GENE2D architecture are the data sources, the integrated data repository (IDR) as a central database, and the application interface. The IDR uses a NoSQL document store via MongoDB (an open source document-oriented database program) as a backend database. The application interface, called Query Builder, provides multiple services for data retrieval from the database using a custom query to answer simple or complex research questions. The GENE2D system demonstrates its potential to help grow and develop a national genetic disorders database in Saudi Arabia.
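As a rough illustration of the kind of custom query a document-store query builder might assemble, the sketch below turns simple criteria into a MongoDB-style filter document and evaluates it against in-memory records. The field names, the `build_filter` and `matches` helpers, and the sample records are hypothetical and are not taken from GENE2D; a real deployment would hand the filter to MongoDB rather than match documents in Python.

```python
from typing import Any

def build_filter(criteria: list[tuple[str, str, Any]]) -> dict:
    """Turn (field, operator, value) triples into a MongoDB-style filter."""
    ops = {"eq": "$eq", "gt": "$gt", "lt": "$lt", "in": "$in"}
    query: dict = {}
    for field, op, value in criteria:
        # Several conditions on the same field share one sub-document.
        query.setdefault(field, {})[ops[op]] = value
    return query

def matches(doc: dict, query: dict) -> bool:
    """Evaluate the small operator subset above against one document."""
    checks = {
        "$eq": lambda a, b: a == b,
        "$gt": lambda a, b: a is not None and a > b,
        "$lt": lambda a, b: a is not None and a < b,
        "$in": lambda a, b: a in b,
    }
    return all(
        checks[op](doc.get(field), value)
        for field, conds in query.items()
        for op, value in conds.items()
    )

# Hypothetical records, for illustration only.
records = [
    {"patient_id": 1, "disorder": "thalassemia", "age": 12},
    {"patient_id": 2, "disorder": "sickle cell disease", "age": 30},
    {"patient_id": 3, "disorder": "thalassemia", "age": 41},
]
query = build_filter([("disorder", "eq", "thalassemia"), ("age", "gt", 18)])
hits = [r["patient_id"] for r in records if matches(r, query)]
```

Keeping the filter as a plain dictionary means the same structure could, in principle, be passed unchanged to a document database driver.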
Affiliation(s)
- Halima Samra
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3086, Australia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Alice Li
- La Trobe Business School, La Trobe University, Melbourne, VIC 3086, Australia
| | - Ben Soh
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3086, Australia
|
14
|
Irshad O, Khan MUG. Integration and Querying of Heterogeneous Omics Semantic Annotations for Biomedical and Biomolecular Knowledge Discovery. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190409112025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Background: Exploring various functional aspects of a biological cell system has been a focused research trend for many decades. Biologists, scientists and researchers are continuously striving to unveil the mysteries of these functional aspects to improve the health standards of life. To gain such understanding, astronomically growing, heterogeneous and geographically dispersed omics data needs to be critically analyzed. Currently, omics data is available in different types and formats through various data access interfaces. Applications that require offline and integrated data encounter many data heterogeneity and global dispersion issues. Objective: To facilitate such applications in particular, heterogeneous data must be collected, integrated and warehoused in a loosely coupled way so that each molecular entity can be computationally understood independently or in association with other entities within or across the various cellular aspects. Methods: In this paper, we propose an omics data integration schema and its corresponding data warehouse system for integrating, warehousing and presenting heterogeneous and geographically dispersed omics entities according to the cellular functional aspects. Results & Conclusion: Such aspect-oriented data integration, warehousing and data access interfacing through graphical search, web services and application programming interfaces make our proposed integrated data schema and warehouse system better and more useful than other contemporary ones.
Affiliation(s)
- Omer Irshad
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
| | - Muhammad Usman Ghani Khan
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
|
15
|
Mihaylov I, Kańduła M, Krachunov M, Vassilev D. A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models. Biol Direct 2019; 14:22. [PMID: 31752974 PMCID: PMC6868770 DOI: 10.1186/s13062-019-0249-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/23/2018] [Accepted: 09/20/2019] [Indexed: 12/17/2022] [Open Access]
Abstract
Background Recently, high-throughput technologies have been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. Given the lack of effective integration strategies, novel models are needed for efficient and operative data integration, where both clinical and molecular information can be effectively joined for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies. Results We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma), provided in the CAMDA 2018 ‘Cancer Data Integration Challenge’, and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that utilizes NoSQL databases, where we combined clinical and expression profile data, using both raw data records and external knowledge sources. Utilizing the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction. Conclusion We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach facilitates enrichment of the studied data by discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results. Reviewers This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera.
Affiliation(s)
- Iliyan Mihaylov
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria
| | - Maciej Kańduła
- Department of Biotechnology, Boku University, Vienna, 1180, Austria; Institute for Machine Learning, Johannes Kepler University, Linz, 4040, Austria
| | - Milko Krachunov
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria
| | - Dimitar Vassilev
- Faculty of Mathematics and Informatics, Sofia University, "St. Kliment Ohridski", 5 James Bourchier Blvd., Sofia, 1164, Bulgaria.
|
16
|
Emam I, Elyasigomari V, Matthews A, Pavlidis S, Rocca-Serra P, Guitton F, Verbeeck D, Grainger L, Borgogni E, Del Giudice G, Saqi M, Houston P, Guo Y. PlatformTM, a standards-based data custodianship platform for translational medicine research. Sci Data 2019; 6:149. [PMID: 31409798 PMCID: PMC6692384 DOI: 10.1038/s41597-019-0156-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/23/2018] [Accepted: 07/25/2019] [Indexed: 12/20/2022] [Open Access]
Abstract
Biomedical informatics has traditionally adopted a linear view of the informatics process (collect, store and analyse) in translational medicine (TM) studies; focusing primarily on the challenges in data integration and analysis. However, a data management challenge presents itself with the new lifecycle view of data emphasized by the recent calls for data re-use, long term data preservation, and data sharing. There is currently a lack of dedicated infrastructure focused on the 'manageability' of the data lifecycle in TM research between data collection and analysis. Current community efforts towards establishing a culture for open science prompt the creation of a data custodianship environment for management of TM data assets to support data reuse and reproducibility of research results. Here we present the development of a lifecycle-based methodology to create a metadata management framework based on community driven standards for standardisation, consolidation and integration of TM research data. Based on this framework, we also present the development of a new platform (PlatformTM) focused on managing the lifecycle for translational research data assets.
Affiliation(s)
- Ibrahim Emam
- Data Science Institute, Imperial College London, London, UK.
| | - Alex Matthews
- Clinical Research Centre, University of Surrey, Guildford, UK
| | - Mansoor Saqi
- Data Science Institute, Imperial College London, London, UK
| | - Paul Houston
- CDISC, Clinical Data Interchange Standards Consortium and CDISC EU Foundation, London, UK
| | - Yike Guo
- Data Science Institute, Imperial College London, London, UK
|
17
|
Ethier J, McGilchrist M, Barton A, Cloutier A, Curcin V, Delaney BC, Burgun A. The TRANSFoRm project: Experience and lessons learned regarding functional and interoperability requirements to support primary care. Learn Health Syst 2018; 2:e10037. [PMID: 31245579 PMCID: PMC6508823 DOI: 10.1002/lrh2.10037] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/08/2016] [Revised: 07/05/2017] [Accepted: 07/12/2017] [Indexed: 01/02/2023] [Open Access]
Abstract
INTRODUCTION The current model of medical knowledge production, transfer, and application suffers from serious shortcomings. Learning health systems (LHS) have recently emerged as a potential solution: systems in which health information generated from patients is continuously analyzed to improve knowledge that will be transferred to patient care. METHOD Various approaches to data integration already exist and could be considered for the implementation of an LHS. We discuss the possible informatics approaches to address the functional requirements of an LHS, in the specific context of primary care, and present the experience and lessons learned from the TRANSFoRm project. RESULT Implemented in 4 countries around 5 systems, TRANSFoRm is based on a local-as-view data mediation approach integrating the structural and terminological models in the same framework. It clearly demonstrated that it has the potential to address the requirements for an LHS in primary care, by dealing with data fragmented across multiple points of service. Also, it has the potential to support the generation of hypotheses from the context of clinical care, retrospective and prospective research, and decision support systems that improve the relevance of medical decisions. CONCLUSION The LHS approach embodies a shift from an institution-centered to a patient-centered perspective in knowledge production and transfer and can address important challenges in the primary care setting.
Affiliation(s)
- Jean‐François Ethier
- Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Canada
- INSERM UMR 1138 team 22, Centre de Recherche des Cordeliers, Faculté de médecine, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Mark McGilchrist
- Division of Population Health Sciences, University of Dundee, Dundee, UK
| | - Adrien Barton
- Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Canada
| | - Anne‐Marie Cloutier
- Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Canada
| | - Vasa Curcin
- Division of Health and Social Care Research, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - Brendan C. Delaney
- Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
| | - Anita Burgun
- INSERM UMR 1138 team 22, Centre de Recherche des Cordeliers, Faculté de médecine, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
|
18
|
Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022] [Open Access]
Abstract
Objective Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion Our manual review shows that the ingestion pipeline could achieve an accuracy of 90%, and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
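For readers unfamiliar with the metric, precision at k is usually computed as the fraction of the top k results that are relevant, so a P@10 of 0.6022 corresponds to roughly six relevant datasets among the first ten results on average. The sketch below is a generic illustration and not DataMed's evaluation code; the function name and the toy dataset identifiers are assumptions.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the first k ranked results that are judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    # Dividing by k (not len(top_k)) penalises result lists shorter than k.
    return hits / k

# Toy ranking of twelve hypothetical dataset identifiers.
ranked = [f"d{i}" for i in range(1, 13)]
relevant = {"d1", "d3", "d4", "d7", "d8", "d10", "d12"}
score = precision_at_k(ranked, relevant, k=10)
```

In this toy case six of the ten top-ranked identifiers are relevant (`d12` falls outside the cutoff), so `score` is 0.6.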
Affiliation(s)
- Xiaoling Chen
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Anupama E Gururaj
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ruiling Liu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ergin Soysal
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Firat Tiryaki
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yueling Li
- Center for Research in Biological Systems
| | - Nansu Zong
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
| | - Min Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Deevakar Rogith
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Mandana Salimi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Hyeon-Eui Kim
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
| | - Claudiu Farcas
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
| | - Todd Johnson
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ron Margolis
- National Institutes of Health, Bethesda, MD, USA
| | - Ian M Fore
- National Institutes of Health, Bethesda, MD, USA
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
| | - Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
19
|
Al Kawam A, Sen A, Datta A, Dickey N. Understanding the Bioinformatics Challenges of Integrating Genomics into Healthcare. IEEE J Biomed Health Inform 2017; 22:1672-1683. [PMID: 29990071 DOI: 10.1109/jbhi.2017.2778263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
Genomic data is paving the way towards personalized healthcare. By unveiling genetic disease-contributing factors, genomic data can aid in the detection, diagnosis, and treatment of a wide range of complex diseases. Integrating genomic data into healthcare is riddled with a wide range of challenges spanning social, ethical, legal, educational, economic, and technical aspects. Bioinformatics is a core integration aspect presenting an overwhelming number of unaddressed challenges. In this paper we tackle the fundamental bioinformatics integration concerns including: genomic data generation, storage, representation, and utilization in conjunction with clinical data. We divide the bioinformatics challenges into a series of seven intertwined integration aspects spanning the areas of informatics, knowledge management, and communication. For each aspect, we provide a detailed discussion of the current research directions, outstanding challenges, and possible resolutions. This paper seeks to help narrow the gap between the genomic applications, which are being predominantly utilized in research settings, and the clinical adoption of these applications.
|
20
|
Conceptual Modeling for Genomics: Building an Integrated Repository of Open Data. Conceptual Modeling 2017. [DOI: 10.1007/978-3-319-69904-2_26] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/03/2023]
|
21
|
Savonnet M, Leclercq E, Naubourg P. eClims: An Extensible and Dynamic Integration Framework for Biomedical Information Systems. IEEE J Biomed Health Inform 2016; 20:1640-1649. [DOI: 10.1109/jbhi.2015.2464353] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
|
22
|
Abstract
The rise of genomically targeted therapies and immunotherapy has revolutionized the practice of oncology in the last 10–15 years. At the same time, new technologies and the electronic health record (EHR) in particular have permeated the oncology clinic. Initially designed as billing and clinical documentation systems, EHR systems have not anticipated the complexity and variety of genomic information that needs to be reviewed, interpreted, and acted upon on a daily basis. Improved integration of cancer genomic data with EHR systems will help guide clinician decision making, support secondary uses, and ultimately improve patient care within oncology clinics. Some of the key factors relating to the challenge of integrating cancer genomic data into EHRs include: the bioinformatics pipelines that translate raw genomic data into meaningful, actionable results; the role of human curation in the interpretation of variant calls; and the need for consistent standards with regard to genomic and clinical data. Several emerging paradigms for integration are discussed in this review, including: non-standardized efforts between individual institutions and genomic testing laboratories; “middleware” products that portray genomic information, albeit outside of the clinical workflow; and application programming interfaces that have the potential to work within clinical workflow. The critical need for clinical-genomic knowledge bases, which can be independent or integrated into the aforementioned solutions, is also discussed.
Affiliation(s)
- Jeremy L Warner
- Department of Medicine, Division of Hematology/Oncology, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37232, USA; Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Sandeep K Jain
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37232, USA; Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
| | - Mia A Levy
- Department of Medicine, Division of Hematology/Oncology, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37232, USA; Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
|
23
|
Liu Y, Chiaromonte F, Li B. Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units. Biometrics 2016; 73:529-539. [PMID: 27649087 DOI: 10.1111/biom.12579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/01/2015] [Revised: 07/01/2016] [Accepted: 07/01/2016] [Indexed: 11/29/2022]
Abstract
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.
Affiliation(s)
- Yang Liu
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
| | - Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
| | - Bing Li
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
|
24
|
Myneni S, Patel VL, Bova GS, Wang J, Ackerman CF, Berlinicke CA, Chen SH, Lindvall M, Zack DJ. Resolving complex research data management issues in biomedical laboratories: Qualitative study of an industry-academia collaboration. Comput Methods Programs Biomed 2016; 126:160-70. [PMID: 26652980 PMCID: PMC4778387 DOI: 10.1016/j.cmpb.2015.11.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/26/2015] [Revised: 10/21/2015] [Accepted: 11/03/2015] [Indexed: 06/05/2023]
Abstract
This paper describes a distributed collaborative effort between industry and academia to systematize data management in an academic biomedical laboratory. The heterogeneous and voluminous nature of research data created in biomedical laboratories makes information management difficult and research unproductive. One such collaborative effort was evaluated over a period of four years using data collection methods including ethnographic observations, semi-structured interviews, web-based surveys, progress reports, conference call summaries, and face-to-face group discussions. Data were analyzed using qualitative methods to (1) characterize specific problems faced by biomedical researchers with traditional information management practices, (2) identify intervention areas to introduce a new research information management system called Labmatrix, and (3) evaluate and delineate important general collaboration (intervention) characteristics that can optimize outcomes of an implementation process in biomedical laboratories. Results emphasize the importance of end user perseverance, human-centric interoperability evaluation, and demonstration of return on investment of effort and time of laboratory members and industry personnel for the success of the implementation process. In addition, there is an intrinsic learning component associated with the implementation process of an information management system. Technology transfer in a complex environment such as the biomedical laboratory can be eased with the use of information systems that support human and cognitive interoperability. Such informatics features can also contribute to successful collaboration and hopefully to scientific productivity.
Affiliation(s)
- Sahiti Myneni
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, United States.
| | - Vimla L Patel
- New York Academy of Medicine, New York, NY, United States; Department of Biomedical Informatics, Arizona State University, United States
| | - G Steven Bova
- Departments of Pathology, Genetic Medicine, Health Sciences Informatics, Oncology, and Urology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Jian Wang
- BioFortis Inc., Columbia, MD, United States
| | - Christopher F Ackerman
- Fraunhofer Institute for Experimental Software Engineering, College Park, MD, United States
| | - Mikael Lindvall
- Fraunhofer Institute for Experimental Software Engineering, College Park, MD, United States
| | - Donald J Zack
- Departments of Pathology, Genetic Medicine, Health Sciences Informatics, Oncology, and Urology, Johns Hopkins University School of Medicine, Baltimore, MD, United States; Wilmer Eye Institute, United States; Institute of Genetic Medicine Johns Hopkins University School of Medicine, Baltimore, MD, United States
|
25
|
Masseroli M, Canakoglu A, Ceri S. Integration and Querying of Genomic and Proteomic Semantic Annotations for Biomedical Knowledge Extraction. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:209-219. [PMID: 27045824 DOI: 10.1109/tcbb.2015.2453944] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Understanding complex biological phenomena involves answering complex biomedical questions on multiple types of biomolecular information simultaneously. This information is expressed through multiple genomic and proteomic semantic annotations scattered across many distributed and heterogeneous data sources; such heterogeneity and dispersion hamper biologists' ability to ask global queries and perform global evaluations. To overcome this problem, we developed a software architecture to create and maintain a Genomic and Proteomic Knowledge Base (GPKB), which integrates several of the most relevant sources of such dispersed information (including Entrez Gene, UniProt, IntAct, Expasy Enzyme, GO, GOA, BioCyc, KEGG, Reactome, and OMIM). Our solution is general, as it uses a flexible, modular, and multilevel global data schema based on abstraction and generalization of integrated data features, and a set of automatic procedures for easing data integration and maintenance, including when the integrated data sources evolve in data content, structure, and number. These procedures also assure consistency, quality, and provenance tracking of all integrated data, and perform the semantic closure of the hierarchical relationships of the integrated biomedical ontologies. At http://www.bioinformatics.deib.polimi.it/GPKB/, a Web interface allows easy graphical composition of queries, even complex ones, on the knowledge base, and also supports semantic query expansion and comprehensive explorative search of the integrated data to better sustain biomedical knowledge extraction.
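The "semantic closure of the hierarchical relationships" mentioned in this abstract can be read as a transitive closure over ontology parent links: if term A is a child of B and B is a child of C, then A is also recorded as a descendant of C. The following is a minimal sketch under that reading, assuming an acyclic hierarchy; the `ancestors_closure` helper and the toy term names are hypothetical and are not part of GPKB.

```python
def ancestors_closure(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map every term to the full set of its direct and indirect ancestors."""
    closure: dict[str, set[str]] = {}

    def visit(term: str) -> set[str]:
        if term in closure:
            return closure[term]
        # Placeholder entry stops infinite recursion if the input has a cycle.
        closure[term] = set()
        result: set[str] = set()
        for parent in parents.get(term, set()):
            result.add(parent)
            result |= visit(parent)
        closure[term] = result
        return result

    for term in parents:
        visit(term)
    return closure

# Toy hierarchy: child -> parent -> grandparent -> root.
toy_parents = {
    "child": {"parent"},
    "parent": {"grandparent"},
    "grandparent": {"root"},
}
closure = ancestors_closure(toy_parents)
```

With the closure stored, a query for everything annotated under `grandparent` can also return items annotated only with `child`, which is one plausible way query expansion over an ontology could be supported.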
26
Herr TM, Bielinski SJ, Bottinger E, Brautbar A, Brilliant M, Chute CG, Cobb BL, Denny JC, Hakonarson H, Hartzler AL, Hripcsak G, Kannry J, Kohane IS, Kullo IJ, Lin S, Manzi S, Marsolo K, Overby CL, Pathak J, Peissig P, Pulley J, Ralston J, Rasmussen L, Roden DM, Tromp G, Uphoff T, Weng C, Wolf W, Williams MS, Starren J. Practical considerations in genomic decision support: The eMERGE experience. J Pathol Inform 2015; 6:50. [PMID: 26605115 PMCID: PMC4629307 DOI: 10.4103/2153-3539.165999] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 03/31/2015] [Accepted: 07/23/2015] [Indexed: 11/04/2022]
Abstract
BACKGROUND Genomic medicine has the potential to improve care by tailoring treatments to the individual. There is consensus in the literature that pharmacogenomics (PGx) may be an ideal starting point for real-world implementation, due to the presence of well-characterized drug-gene interactions. Clinical Decision Support (CDS) is an ideal avenue by which to implement PGx at the bedside. Previous literature has established theoretical models for PGx CDS implementation and discussed a number of anticipated real-world challenges. However, work detailing actual PGx CDS implementation experiences has been limited. Anticipated challenges include data storage and management, system integration, physician acceptance, and more. METHODS In this study, we analyzed the experiences of ten members of the Electronic Medical Records and Genomics (eMERGE) Network, and one affiliate, in their attempts to implement PGx CDS. We examined the resulting PGx CDS system characteristics and conducted a survey to understand the unanticipated implementation challenges sites encountered. RESULTS Ten sites have successfully implemented at least one PGx CDS rule in the clinical setting. The majority of sites elected to create an Omic Ancillary System (OAS) to manage genetic and genomic data. All sites were able to adapt their existing CDS tools for PGx knowledge. The most common and impactful delays were not PGx-specific issues. Instead, they were general IT implementation problems, with top challenges including team coordination/communication and staffing. The challenges encountered caused a median total delay in system go-live of approximately two months. CONCLUSIONS These results suggest that barriers to PGx CDS implementations are generally surmountable. Moreover, PGx CDS implementation may not be any more difficult than other healthcare IT projects of similar scope, as the most significant delays encountered were not unique to genomic medicine. 
These are encouraging results for any institution considering implementing a PGx CDS tool, and for the advancement of genomic medicine.
Affiliation(s)
- Timothy M Herr
- Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Erwin Bottinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine, Mount Sinai, New York, USA
- Ariel Brautbar
- Division of Genetics and Endocrinology, Cook Children's Medical Center, Fort Worth, Texas, USA
- Murray Brilliant
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA
- Christopher G Chute
- Division of General Internal Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Beth L Cobb
- Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Hakon Hakonarson
- Department of Pediatrics, The Children's Hospital of Philadelphia, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Medical Center, New York, USA
- Joseph Kannry
- Icahn School of Medicine, Mount Sinai, New York, USA
- Isaac S Kohane
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Simon Lin
- Nationwide Children's Hospital, Columbus, Ohio, USA
- Shannon Manzi
- Department of Pharmacy, Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts, USA
- Keith Marsolo
- Department of Pediatrics, University of Cincinnati College of Medicine, Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Jyotishman Pathak
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Peggy Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA
- Jill Pulley
- Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- James Ralston
- Group Health Research Institute, Seattle, Washington, USA
- Luke Rasmussen
- Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Dan M Roden
- Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Gerard Tromp
- Weis Center for Research, Geisinger Clinic, Danville, Pennsylvania, USA
- Timothy Uphoff
- Molecular Pathology, Marshfield Labs, Marshfield, Wisconsin, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, USA
- Wendy Wolf
- Department of Pediatrics, Harvard Medical School, Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts, USA
- Marc S Williams
- Genomic Medicine Institute, Geisinger Health System, Danville, Pennsylvania, USA
- Justin Starren
- Department of Preventive Medicine, Division of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
27
Orechia J, Pathak A, Shi Y, Nawani A, Belozerov A, Fontes C, Lakhiani C, Jawale C, Patel C, Quinn D, Botvinnik D, Mei E, Cotter E, Byleckie J, Ullman-Cullere M, Chhetri P, Chalasani P, Karnam P, Beaudoin R, Sahu S, Belozerova Y, Mathew JP. OncDRS: An integrative clinical and genomic data platform for enabling translational research and precision medicine. Appl Transl Genom 2015; 6:18-25. [PMID: 27054074 PMCID: PMC4803771 DOI: 10.1016/j.atg.2015.08.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/29/2015] [Accepted: 08/05/2015] [Indexed: 02/01/2023]
Abstract
We live in the genomic era of medicine, where a patient's genomic/molecular data is becoming increasingly important for disease diagnosis, identification of targeted therapy, and risk assessment for adverse reactions. However, decoding genomic test results and integrating them with clinical data for retrospective studies and for cohort identification in prospective clinical trials is still a challenging task. To overcome these barriers, we developed an overarching enterprise informatics framework for translational research and personalized medicine called Synergistic Patient and Research Knowledge Systems (SPARKS) and a suite of tools called Oncology Data Retrieval Systems (OncDRS). OncDRS enables seamless data integration and secure, self-navigated query and extraction of clinical and genomic data from heterogeneous sources. Within a year of release, the system has facilitated more than 1500 research queries and has delivered data for more than 50 research studies.
Affiliation(s)
- John Orechia
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Ameet Pathak
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Yunling Shi
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Aniket Nawani
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Andrey Belozerov
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Caitlin Fontes
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Camille Lakhiani
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Chetan Jawale
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Chetansharan Patel
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Daniel Quinn
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Dmitry Botvinnik
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Eddie Mei
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Elizabeth Cotter
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- James Byleckie
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Padam Chhetri
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Poornima Chalasani
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Purushotham Karnam
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Ronald Beaudoin
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Sandeep Sahu
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Yelena Belozerova
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
- Jomol P Mathew
- Dana-Farber Cancer Institute, 450 Brookline Ave., Boston, MA-02215, United States
28
Roman S, Panduro A. Genomic medicine in gastroenterology: A new approach or a new specialty? World J Gastroenterol 2015; 21:8227-8237. [PMID: 26217074 PMCID: PMC4507092 DOI: 10.3748/wjg.v21.i27.8227] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/27/2015] [Revised: 03/24/2015] [Accepted: 05/04/2015] [Indexed: 02/06/2023]
Abstract
Throughout history, many medical milestones have been achieved to prevent and treat human diseases. Man's early conception of illness was naturally holistic or integrative. However, scientific knowledge was atomized into quantitative and qualitative research. In the field of medicine, the main trade-off was the creation of many medical specialties that commonly treat patients in advanced stages of disease. However, now that we are immersed in the post-genomic era, how should we reevaluate medicine? Genomic medicine has evoked a medical paradigm shift based on the plausibility to predict the genetic susceptibility to disease. Additionally, the development of chronic diseases should be viewed as a continuum of interactions between the individual's genetic make-up and environmental factors such as diet, physical activity, and emotions. Thus, personalized medicine is aimed at preventing or reversing clinical symptoms, and providing a better quality of life by integrating the genetic, environmental and cultural factors of diseases. Whether using genomic medicine in the field of gastroenterology is a new approach or a new medical specialty remains an open question. To address this issue, it will require the mutual work of educational and governmental authorities with public health professionals, with the goal of translating genomic medicine into better health policies.
29
Livingston KM, Bada M, Baumgartner WA, Hunter LE. KaBOB: ontology-based semantic integration of biomedical databases. BMC Bioinformatics 2015; 16:126. [PMID: 25903923 PMCID: PMC4448321 DOI: 10.1186/s12859-015-0559-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 10/23/2014] [Accepted: 03/30/2015] [Indexed: 04/04/2023]
Abstract
Background The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. Results We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. Conclusions KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. 
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
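One of the processes named in this abstract is aggregating sets of identifiers that denote the same biomedical concept across data sources. KaBOB itself works over RDF with declarative rules; the sketch below swaps in a plain union-find structure to illustrate only the grouping idea. The function name `aggregate_identifiers` and the identifiers such as `db_a:001` are hypothetical.

```python
def aggregate_identifiers(equivalences):
    """Group identifiers linked by pairwise 'same concept' assertions.

    equivalences: iterable of (id_a, id_b) pairs.
    Returns a list of sets, one per inferred concept.
    """
    parent = {}

    def find(x):
        # Locate the representative of x, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for ident in parent:
        groups.setdefault(find(ident), set()).add(ident)
    return list(groups.values())


# Hypothetical cross-references between three made-up sources
pairs = [
    ("db_a:001", "db_b:X9"),
    ("db_b:X9", "db_c:77"),
    ("db_a:002", "db_c:88"),
]
groups = aggregate_identifiers(pairs)
```

Here `db_a:001` and `db_c:77` end up in the same set even though no source links them directly, which is the kind of shared identity the abstract aims for.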
Affiliation(s)
- Kevin M Livingston
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Michael Bada
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- William A Baumgartner
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
30
Ashish N, Toga AW. Medical data transformation using rewriting. Front Neuroinform 2015; 9:1. [PMID: 25750622 PMCID: PMC4335467 DOI: 10.3389/fninf.2015.00001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/26/2014] [Accepted: 01/29/2015] [Indexed: 11/13/2022]
Abstract
This paper presents a system for declaratively transforming medical subjects' data into a common data model representation. Our work is part of the “GAAIN” project on Alzheimer's disease data federation across multiple data providers. We present a general purpose data transformation system that we have developed by leveraging the existing state-of-the-art in data integration and query rewriting. In this work we have further extended the current technology with new formalisms that facilitate expressing a broader range of data transformation tasks, plus new execution methodologies to ensure efficient data transformation for disease datasets.
Affiliation(s)
- Naveen Ashish
- Laboratory of Neuroimaging, Institute for Neuroimaging and Neuroinformatics, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA, USA
- Arthur W Toga
- Laboratory of Neuroimaging, Institute for Neuroimaging and Neuroinformatics, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA, USA
31
Machado CM, Rebholz-Schuhmann D, Freitas AT, Couto FM. The semantic web in translational medicine: current applications and future directions. Brief Bioinform 2015; 16:89-103. [PMID: 24197933 PMCID: PMC4293377 DOI: 10.1093/bib/bbt079] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/22/2013] [Accepted: 10/08/2013] [Indexed: 11/14/2022]
Abstract
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice.
Affiliation(s)
- Catia M. Machado
- Corresponding author. Catia M. Machado, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal and Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Universidade de Lisboa, Portugal.
32
Wade TD, Zelarney PT, Hum RC, McGee S, Batson DH. Using patient lists to add value to integrated data repositories. J Biomed Inform 2014; 52:72-7. [PMID: 24534444 DOI: 10.1016/j.jbi.2014.02.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/17/2013] [Revised: 12/20/2013] [Accepted: 02/04/2014] [Indexed: 01/16/2023]
Abstract
Patient lists are project-specific sets of patients that can be queried in integrated data repositories (IDRs). By allowing a set of patients to be an addition to the qualifying conditions of a query, returned results will refer to, and only to, that set of patients. We report a variety of use cases for such lists, including: restricting retrospective chart review to a defined set of patients; following a set of patients for practice management purposes; distributing "honest-brokered" (deidentified) data; adding phenotypes to biosamples; and enhancing the content of study or registry data. Among the capabilities needed to implement patient lists in an IDR are: capture of patient identifiers from a query and feedback of these into the IDR; the existence of a permanent internal identifier in the IDR that is mappable to external identifiers; the ability to add queryable attributes to the IDR; the ability to merge data from multiple queries; and suitable control over user access and de-identification of results. We implemented patient lists in a custom IDR of our own design. We reviewed capabilities of other published IDRs for focusing on sets of patients. The widely used i2b2 IDR platform has various ways to address patient sets, and it could be modified to add the low-overhead version of patient lists that we describe.
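The core idea in this abstract, adding a patient list to the qualifying conditions of a query so that results refer only to that set, can be illustrated with a small relational example. This is a sketch only, using Python's standard `sqlite3` module; the table names `observation` and `patient_list`, the list name `study_x`, and the function `query_with_list` are hypothetical and do not describe the authors' IDR or i2b2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observation (patient_id INTEGER, code TEXT);
    CREATE TABLE patient_list (list_name TEXT, patient_id INTEGER);
    INSERT INTO observation VALUES
        (1, 'asthma'), (2, 'asthma'), (3, 'asthma'), (3, 'copd');
    INSERT INTO patient_list VALUES ('study_x', 1), ('study_x', 3);
    """
)


def query_with_list(conn, code, list_name):
    """Return ids of patients with `code`, restricted to the named list."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.patient_id
        FROM observation AS o
        JOIN patient_list AS pl ON pl.patient_id = o.patient_id
        WHERE o.code = ? AND pl.list_name = ?
        ORDER BY o.patient_id
        """,
        (code, list_name),
    ).fetchall()
    return [row[0] for row in rows]


result = query_with_list(conn, "asthma", "study_x")
```

Patient 2 also has an asthma record but is not on the list, so it is excluded. The list acts as one more condition on the query rather than as a separate copy of the data.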
Affiliation(s)
- Ted D Wade
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Pearlanne T Zelarney
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Richard C Hum
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Sylvia McGee
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206, USA
- Deborah H Batson
- Department of Research Informatics, Children's Hospital Colorado Research Institute, Aurora, CO 80045, USA
33
Alawieh A, Sabra Z, Nokkari A, El-Assaad A, Mondello S, Zaraket F, Fadlallah B, Kobeissy FH. Bioinformatics approach to understanding interacting pathways in neuropsychiatric disorders. Methods Mol Biol 2014; 1168:157-172. [PMID: 24870135 DOI: 10.1007/978-1-4939-0847-9_9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Bioinformatics-based applications have been incorporated into several medical disciplines, including cancer, neuroscience, and recently psychiatry. Both the increasing interest in the molecular aspect of neuropsychiatry and the availability of high-throughput discovery and analysis tools have encouraged the incorporation of bioinformatics and neurosystems biology techniques into psychiatry and neuroscience research. As applied to neuropsychiatry, systems biology involves the acquisition and processing of high-throughput datasets to infer new information. A major component in bioinformatics output is pathway analysis that provides an insight into and prediction of possible underlying pathogenic processes which may help understand disease pathogenesis. In addition, this analysis serves as a tool to identify potential biomarkers implicated in these disorders. In this chapter, we summarize the different tools and algorithms used in pathway analysis along with their applications to the different layers of molecular investigations, from genomics to proteomics.
Affiliation(s)
- Ali Alawieh
- Department of Neurosciences, Medical University of South Carolina, Charleston, SC, USA
34
Keator DB, Helmer K, Steffener J, Turner JA, Van Erp TGM, Gadde S, Ashish N, Burns GA, Nichols BN. Towards structured sharing of raw and derived neuroimaging data across existing resources. Neuroimage 2013; 82:647-61. [PMID: 23727024 PMCID: PMC4028152 DOI: 10.1016/j.neuroimage.2013.05.094] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 08/01/2012] [Revised: 05/11/2013] [Accepted: 05/18/2013] [Indexed: 10/26/2022]
Abstract
Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.
Affiliation(s)
- D B Keator
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92617, USA.
35
Lengauer T. Stellenwert der Bioinformatik für die personalisierte Medizin [The importance of bioinformatics for personalized medicine]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2013; 56:1489-94. [DOI: 10.1007/s00103-013-1819-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
36
Devine EB, Capurro D, van Eaton E, Alfonso-Cristancho R, Devlin A, Yanez ND, Yetisgen-Yildiz M, Flum DR, Tarczy-Hornoch P. Preparing Electronic Clinical Data for Quality Improvement and Comparative Effectiveness Research: The SCOAP CERTAIN Automation and Validation Project. EGEMS 2013; 1:1025. [PMID: 25848565 PMCID: PMC4371452 DOI: 10.13063/2327-9214.1025] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
Background: The field of clinical research informatics includes creation of clinical data repositories (CDRs) used to conduct quality improvement (QI) activities and comparative effectiveness research (CER). Ideally, CDR data are accurately and directly abstracted from disparate electronic health records (EHRs), across diverse health-systems. Objective: Investigators from Washington State’s Surgical Care Outcomes and Assessment Program (SCOAP) Comparative Effectiveness Research Translation Network (CERTAIN) are creating such a CDR. This manuscript describes the automation and validation methods used to create this digital infrastructure. Methods: SCOAP is a QI benchmarking initiative. Data are manually abstracted from EHRs and entered into a data management system. CERTAIN investigators are now deploying Caradigm’s Amalga™ tool to facilitate automated abstraction of data from multiple, disparate EHRs. Concordance is calculated to compare automatically abstracted data with manually abstracted data. Performance measures are calculated between Amalga and each parent EHR. Validation takes place in repeated loops, with improvements made over time. When automated abstraction reaches the current benchmark for abstraction accuracy (95%), it will ‘go-live’ at each site. Progress to Date: A technical analysis was completed at 14 sites. Five sites are contributing; the remaining sites prioritized meeting Meaningful Use criteria. Participating sites are contributing 15–18 unique data feeds, totaling 13 surgical registry use cases. Common feeds are registration, laboratory, transcription/dictation, radiology, and medications. Approximately 50% of 1,320 designated data elements are being automatically abstracted—25% from structured data; 25% from text mining. Conclusion: In semi-automating data abstraction and conducting a rigorous validation, CERTAIN investigators will semi-automate data collection to conduct QI and CER, while advancing the Learning Healthcare System.
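The validation step in this abstract compares automated against manual abstraction and uses 95% accuracy as the go-live benchmark. The abstract does not define its concordance statistic, so the sketch below assumes simple percent agreement over data elements present in both sources; the function name `concordance` and the element names are hypothetical.

```python
GO_LIVE_THRESHOLD = 0.95  # benchmark stated in the abstract


def concordance(manual, automated):
    """Fraction of shared data elements whose values agree.

    manual, automated: dicts mapping element name -> abstracted value.
    Assumption: plain percent agreement; the project may use another measure.
    """
    shared = manual.keys() & automated.keys()
    if not shared:
        raise ValueError("no data elements in common")
    agreeing = sum(manual[key] == automated[key] for key in shared)
    return agreeing / len(shared)


# Hypothetical record with 20 elements, one of which disagrees
manual = {f"element_{i}": "value" for i in range(20)}
automated = dict(manual, element_7="other value")

score = concordance(manual, automated)
ready_for_go_live = score >= GO_LIVE_THRESHOLD
```

In a repeated validation loop, a score below the threshold would send the mismatched elements back for review before the next comparison.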
37
38
Shoenbill K, Fost N, Tachinardi U, Mendonca EA. Genetic data and electronic health records: a discussion of ethical, logistical and technological considerations. J Am Med Inform Assoc 2013; 21:171-80. [PMID: 23771953 PMCID: PMC3912723 DOI: 10.1136/amiajnl-2013-001694] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 01/11/2023]
Abstract
Objective The completion of sequencing the human genome in 2003 has spurred the production and collection of genetic data at ever increasing rates. Genetic data obtained for clinical purposes, as is true for all results of clinical tests, are expected to be included in patients’ medical records. With this explosion of information, questions of what, when, where and how to incorporate genetic data into electronic health records (EHRs) have reached a critical point. In order to answer these questions fully, this paper addresses the ethical, logistical and technological issues involved in incorporating these data into EHRs. Materials and methods This paper reviews journal articles, government documents and websites relevant to the ethics, genetics and informatics domains as they pertain to EHRs. Results and discussion The authors explore concerns and tasks facing health information technology (HIT) developers at the intersection of ethics, genetics, and technology as applied to EHR development. Conclusions By ensuring the efficient and effective incorporation of genetic data into EHRs, HIT developers will play a key role in facilitating the delivery of personalized medicine.
Affiliation(s)
- Kimberly Shoenbill
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, USA
39
Harrow I, Filsell W, Woollard P, Dix I, Braxenthaler M, Gedye R, Hoole D, Kidd R, Wilson J, Rebholz-Schuhmann D. Towards Virtual Knowledge Broker services for semantic integration of life science literature and data sources. Drug Discov Today 2013; 18:428-34. [DOI: 10.1016/j.drudis.2012.11.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/18/2012] [Revised: 11/09/2012] [Accepted: 11/22/2012] [Indexed: 10/27/2022]
40
Lonergan DF, Ehrenfeld JM. Advancement of information technology in outpatient and perioperative settings to support patient care and translational research. Pain Manag 2012; 2:445-9. [DOI: 10.2217/pmt.12.43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
SUMMARY Information systems assist in documentation and clinical decision support in settings ranging from an outpatient clinical encounter to the monitoring in an operating room. Such information, if stored and categorized well in a centralized database, offers a treasure trove of information for translational researchers. At Vanderbilt University Medical Center (TN, USA), there is an ongoing effort to advance information systems in all areas and couple this data with a robust genetic repository. It is hoped that such an effort will achieve improvements in quality of care and decreases in costs, while simultaneously providing a fertile ground for translational research.
Affiliation(s)
- Daniel F Lonergan
- Division of Pain Medicine, Department of Anesthesiology, Vanderbilt University Medical Center, TN, USA
- Jesse M Ehrenfeld
- Assistant Professor of Anesthesiology & Biomedical Informatics; Director, Center for Evidence-Based Anesthesia; Director, Perioperative Data Systems Research; Medical Director, Perioperative Quality; Department of Anesthesiology, Vanderbilt University Medical Center, TN, USA
|
41
|
Kounelakis MG, Zervakis ME, Giakos GC, Postma GJ, Buydens LMC, Kotsiakis X. On the relevance of glycolysis process on brain gliomas. IEEE J Biomed Health Inform 2012; 17:128-35. [PMID: 22614725 DOI: 10.1109/titb.2012.2199128] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
The proposed analysis considers aspects of both statistical and biological validation of the glycolysis effect on brain gliomas, at both the genomic and metabolic levels. In particular, two independent datasets are analyzed in parallel, one comprising genomic (microarray expression) data and the other metabolomic (magnetic resonance spectroscopy imaging) data. The aim of this study is twofold: first, to show that, apart from the already studied genes (markers), other genes such as those involved in human cell glycolysis contribute significantly to glioma discrimination; second, to demonstrate how the glycolysis process can open new ways towards the design of patient-specific therapeutic protocols. The results of our analysis demonstrate that combining genes participating in the glycolytic process (ALDOA, ALDOC, ENO2, GAPDH, HK2, LDHA, LDHB, MDH1, PDHB, PFKM, PGI, PGK1, PGM1 and PKLR) with the already known tumor suppressors (PTEN, Rb, TP53), oncogenes (CDK4, EGFR, PDGF) and HIF-1 enhances the discrimination of low- versus high-grade gliomas, providing high prediction ability in a cross-validated framework. Following these results, and supported by the biological effect of glycolytic genes on cancer cells, we address the study of glycolysis for the development of new treatment protocols.
|
42
|
Palma JP, Benitz WE, Tarczy-Hornoch P, Butte AJ, Longhurst CA. Neonatal Informatics: Transforming Neonatal Care Through Translational Bioinformatics. Neoreviews 2012; 13:e281-e284. [PMID: 22924023 DOI: 10.1542/neo.13-5-e281] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
The future of neonatal informatics will be driven by the availability of increasingly vast amounts of clinical and genetic data. The field of translational bioinformatics is concerned with linking and learning from these data and applying new findings to clinical care to transform the data into proactive, predictive, preventive, and participatory health. As a result of advances in translational informatics, the care of neonates will become more data driven, evidence based, and personalized.
Affiliation(s)
- Jonathan P Palma
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
|
43
|
Bauer-Mehren A, van Mullingen EM, Avillach P, Carrascosa MDC, Garcia-Serna R, Piñero J, Singh B, Lopes P, Oliveira JL, Diallo G, Ahlberg Helgee E, Boyer S, Mestres J, Sanz F, Kors JA, Furlong LI. Automatic filtering and substantiation of drug safety signals. PLoS Comput Biol 2012; 8:e1002457. [PMID: 22496632 PMCID: PMC3320573 DOI: 10.1371/journal.pcbi.1002457] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 08/11/2011] [Accepted: 02/20/2012] [Indexed: 02/02/2023] Open
Abstract
Drug safety issues pose serious health threats to the population and constitute a major cause of mortality worldwide. Due to the prominent implications to both public health and the pharmaceutical industry, it is of great importance to unravel the molecular mechanisms by which an adverse drug reaction can be potentially elicited. These mechanisms can be investigated by placing the pharmaco-epidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the proposed framework investigates previous evidence on the drug-event association in the context of biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. The mechanistic connections include the activity of the drug, related compounds and drug metabolites on protein targets, the association of protein targets to clinical events, and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining, in silico drug-target profiling, and analyses based on gene-disease networks and biological pathways. Application examples of these workflows carried out on selected cases of drug safety signals are discussed. The methodology and workflows presented offer a novel approach to explore the molecular mechanisms underlying adverse drug reactions.
Adverse drug reactions (ADRs) constitute a major cause of morbidity and mortality worldwide. Due to the relevance of ADRs for both public health and the pharmaceutical industry, it is important to develop efficient ways to monitor ADRs in the population. In addition, it is also essential to comprehend why a drug produces an adverse effect. To unravel the molecular mechanisms of ADRs, it is necessary to consider the ADR in the context of current biomedical knowledge that might explain it. Nowadays there are plenty of information sources that can be exploited in order to accomplish this goal. Nevertheless, the fragmentation of information and, more importantly, the diverse knowledge domains that need to be traversed, pose challenges to the task of exploring the molecular mechanisms of ADRs. We present a novel computational framework to aid in the collection and exploration of evidence that supports the causal inference of ADRs detected by mining clinical records. This framework was implemented as publicly available tools integrating state-of-the-art bioinformatics methods for the analysis of drugs, targets, biological processes and clinical events. The availability of such tools for in silico experiments will facilitate research on the mechanisms that underlie ADRs, contributing to the development of safer drugs.
Affiliation(s)
- Anna Bauer-Mehren
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Paul Avillach
- LESIM-ISPED, Université de Bordeaux, Bordeaux, France
- LERTIM, EA 3283, Faculté de Médecine, Université de Aix-Marseille, Marseille, France
- María del Carmen Carrascosa
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Ricard Garcia-Serna
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Bharat Singh
- Erasmus University Medical Center, Rotterdam, The Netherlands
- Pedro Lopes
- DETI/IEETA, Universidade de Aveiro, Aveiro, Portugal
- Gayo Diallo
- LESIM-ISPED, Université de Bordeaux, Bordeaux, France
- Jordi Mestres
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
- Jan A. Kors
- Erasmus University Medical Center, Rotterdam, The Netherlands
- Laura I. Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM-Hospital del Mar Research Institute, DCEX, Universitat Pompeu Fabra, Barcelona, Spain
|
44
|
|
45
|
Müller H, Freytag JC, Leser U. Improving data quality by source analysis. ACM Journal of Data and Information Quality 2012. [DOI: 10.1145/2107536.2107538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
In many domains, data cleaning is hampered by our limited ability to specify a comprehensive set of integrity constraints to assist in identification of erroneous data. An alternative approach to improve data quality is to exploit different data sources that contain information about the same set of objects. Such overlapping sources highlight hot-spots of poor data quality through conflicting data values and immediately provide alternative values for conflict resolution. In order to derive a dataset of high quality, we can merge the overlapping sources based on a quality assessment of the conflicting values. The quality of the resulting dataset, however, is highly dependent on our ability to assess the quality of conflicting values effectively.
The main objective of this article is to introduce methods that aid the developer of an integrated system over overlapping but contradicting sources in the task of improving the quality of data. Value conflicts between contradicting sources are often systematic, caused by some characteristic of the different sources. Our goal is to identify such systematic differences and outline data patterns that occur in conjunction with them. Evaluated by an expert user, the regularities discovered provide insights into possible conflict reasons and help to assess the quality of inconsistent values. The contributions of this article are two concepts of systematic conflicts: contradiction patterns and minimal update sequences. Contradiction patterns resemble a special form of association rules that summarize characteristic data properties for conflict occurrence. We adapt existing association rule mining algorithms for mining contradiction patterns. Contradiction patterns, however, view each class of conflicts in isolation, sometimes leading to largely overlapping patterns. Sequences of set-oriented update operations that transform one data source into the other are compact descriptions for all regular differences among the sources. We consider minimal update sequences as the most likely explanation for observed differences between overlapping data sources. Furthermore, the order of operations within the sequences points out potential dependencies between systematic differences. Finding minimal update sequences, however, is beyond reach in practice. We show that the problem is already NP-complete for a restricted set of operations. In the light of this intractability result, we present heuristics that lead to convincing results for all examples we considered.
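The idea of a contradiction pattern can be illustrated with a small sketch. It is not the authors' mining algorithm: it only counts, for two overlapping sources, how often a given attribute value co-occurs with a value conflict, which is the rule-confidence notion that association rule mining builds on. The records, attribute names, and the `min_confidence` threshold below are all hypothetical.

```python
from collections import Counter

# Hypothetical overlapping sources keyed by object id (illustrative data only).
source_a = {
    "p1": {"organism": "human", "method": "xray", "length": 120},
    "p2": {"organism": "human", "method": "nmr", "length": 95},
    "p3": {"organism": "mouse", "method": "nmr", "length": 80},
    "p4": {"organism": "mouse", "method": "xray", "length": 200},
}
source_b = {
    "p1": {"length": 120},
    "p2": {"length": 96},
    "p3": {"length": 81},
    "p4": {"length": 200},
}

def find_conflicts(a, b, attribute):
    """Ids of objects present in both sources whose `attribute` values differ."""
    shared = a.keys() & b.keys()
    return sorted(k for k in shared if a[k][attribute] != b[k][attribute])

def conflict_patterns(a, b, attribute, min_confidence=0.8):
    """Map (other_attribute, value) conditions in source `a` to the fraction of
    shared objects satisfying the condition that conflict on `attribute`.
    Only conditions reaching `min_confidence` are returned."""
    shared = a.keys() & b.keys()
    conflicts = set(find_conflicts(a, b, attribute))
    total, conflicting = Counter(), Counter()
    for key in shared:
        for other, value in a[key].items():
            if other == attribute:
                continue
            total[(other, value)] += 1
            if key in conflicts:
                conflicting[(other, value)] += 1
    patterns = {}
    for condition, count in total.items():
        confidence = conflicting[condition] / count
        if confidence >= min_confidence:
            patterns[condition] = confidence
    return patterns

conflict_ids = find_conflicts(source_a, source_b, "length")
patterns = conflict_patterns(source_a, source_b, "length")
```

In this toy data every record obtained by the hypothetical `nmr` method disagrees on `length`, so the condition `("method", "nmr")` is the only one reported, which is the kind of systematic difference an expert user would then inspect.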
Affiliation(s)
- Ulf Leser
- Humboldt-Universität zu Berlin, Berlin, Germany
|
46
|
Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics data and methods. Ann N Y Acad Sci 2012; 1260:95-100. [PMID: 22268703 DOI: 10.1111/j.1749-6632.2011.06383.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022]
Abstract
The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more-accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner.
Affiliation(s)
- Casey S Greene
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.
|
47
|
Wade TD, Hum RC, Murphy JR. A Dimensional Bus model for integrating clinical and research data. J Am Med Inform Assoc 2011; 18 Suppl 1:i96-102. [PMID: 21856687 PMCID: PMC3241170 DOI: 10.1136/amiajnl-2011-000339] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 04/27/2011] [Accepted: 07/11/2011] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Many clinical research data integration platforms rely on the Entity-Attribute-Value model because of its flexibility, even though it presents problems in query formulation and execution time. The authors sought more balance in these traits. MATERIALS AND METHODS Borrowing concepts from Entity-Attribute-Value and from enterprise data warehousing, the authors designed an alternative called the Dimensional Bus model and used it to integrate electronic medical record, sponsored study, and biorepository data. Each type of observational collection has its own table, and the structure of these tables varies to suit the source data. The observational tables are linked to the Bus, which holds provenance information and links to various classificatory dimensions that amplify the meaning of the data or facilitate its query and exposure management. RESULTS The authors implemented a Bus-based clinical research data repository with a query system that flexibly manages data access and confidentiality, facilitates catalog search, and readily formulates and compiles complex queries. CONCLUSION The design provides a workable way to manage and query mixed schemas in a data warehouse.
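The layout described in this abstract, with one table per type of observation linked to a central Bus that carries provenance, can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module; the table names, column names, and sample rows are assumptions made for the example and are not the schema published by the authors.

```python
import sqlite3

def build_repository():
    """Create an in-memory database with a hypothetical bus table and two
    observation tables whose structure differs to suit their source data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bus (
            bus_id      INTEGER PRIMARY KEY,
            patient_id  TEXT NOT NULL,
            source      TEXT NOT NULL,   -- provenance of the observation
            observed_on TEXT NOT NULL
        );
        CREATE TABLE lab_result (
            bus_id    INTEGER REFERENCES bus(bus_id),
            test_name TEXT, value REAL, unit TEXT
        );
        CREATE TABLE biospecimen (
            bus_id        INTEGER REFERENCES bus(bus_id),
            specimen_type TEXT, freezer TEXT
        );
    """)
    conn.executemany("INSERT INTO bus VALUES (?, ?, ?, ?)", [
        (1, "P001", "EMR", "2011-01-05"),
        (2, "P001", "biorepository", "2011-01-06"),
        (3, "P002", "EMR", "2011-02-01"),
        (4, "P003", "sponsored_study", "2011-02-10"),
        (5, "P003", "biorepository", "2011-02-11"),
    ])
    conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?, ?)", [
        (1, "HbA1c", 8.1, "%"),
        (3, "HbA1c", 6.2, "%"),
        (4, "HbA1c", 7.4, "%"),
    ])
    conn.executemany("INSERT INTO biospecimen VALUES (?, ?, ?)", [
        (2, "plasma", "F1"),
        (5, "serum", "F2"),
    ])
    return conn

def patients_with_lab_and_specimen(conn, test_name, threshold, specimen_type):
    """Patients whose lab value exceeds `threshold` and who also have a stored
    specimen of the given type; both observations are reached through the bus."""
    rows = conn.execute("""
        SELECT DISTINCT lab_bus.patient_id
        FROM lab_result AS lab
        JOIN bus AS lab_bus   ON lab.bus_id = lab_bus.bus_id
        JOIN bus AS spec_bus  ON spec_bus.patient_id = lab_bus.patient_id
        JOIN biospecimen AS s ON s.bus_id = spec_bus.bus_id
        WHERE lab.test_name = ? AND lab.value > ? AND s.specimen_type = ?
        ORDER BY lab_bus.patient_id
    """, (test_name, threshold, specimen_type)).fetchall()
    return [row[0] for row in rows]
```

Compared with a single Entity-Attribute-Value table, each observation type here keeps ordinary typed columns, so a cross-source question such as "elevated HbA1c and a stored plasma sample" is a plain join through the bus rather than a pivot over attribute rows.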
Affiliation(s)
- Ted D Wade
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, Colorado 80206-2761, USA.
|
48
|
Malin B, Loukides G, Benitez K, Clayton EW. Identifiability in biobanks: models, measures, and mitigation strategies. Hum Genet 2011; 130:383-92. [PMID: 21739176 PMCID: PMC3621020 DOI: 10.1007/s00439-011-1042-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 04/25/2011] [Accepted: 06/12/2011] [Indexed: 12/29/2022]
Abstract
The collection and sharing of person-specific biospecimens has raised significant questions regarding privacy. In particular, the question of identifiability, or the degree to which materials stored in biobanks can be linked to the name of the individuals from which they were derived, is under scrutiny. The goal of this paper is to review the extent to which biospecimens and affiliated data can be designated as identifiable. To achieve this goal, we summarize recent research in identifiability assessment for DNA sequence data, as well as associated demographic and clinical data, shared via biobanks. We demonstrate the variability of the degree of risk, the factors that contribute to this variation, and potential ways to mitigate and manage such risk. Finally, we discuss the policy implications of these findings, particularly as they pertain to biobank security and access policies. We situate our review in the context of real data sharing scenarios and biorepositories.
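One simple way to make the notion of identifiability in shared demographic data concrete is to count how many records are unique on a set of quasi-identifiers, and to see how coarsening a field lowers that count. The sketch below is an illustration under assumed field names and made-up records; it is not a measure taken from the paper, which reviews a range of models and mitigation strategies.

```python
from collections import Counter

# Hypothetical demographic records attached to biobank samples.
records = [
    {"zip": "37203", "birth_year": 1970, "sex": "F"},
    {"zip": "37203", "birth_year": 1974, "sex": "F"},
    {"zip": "37203", "birth_year": 1981, "sex": "M"},
    {"zip": "37212", "birth_year": 1985, "sex": "M"},
    {"zip": "37212", "birth_year": 1988, "sex": "M"},
]

def uniqueness_rate(rows, quasi_identifiers):
    """Fraction of rows whose combination of quasi-identifier values is unique."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

def generalize_birth_year(rows):
    """Return copies of the rows with birth year coarsened to its decade."""
    return [{**row, "birth_year": row["birth_year"] // 10 * 10} for row in rows]

fields = ["zip", "birth_year", "sex"]
rate_before = uniqueness_rate(records, fields)
rate_after = uniqueness_rate(generalize_birth_year(records), fields)
```

With exact birth years every toy record is unique; after generalizing to decades only one of the five remains unique, which shows the trade-off between data detail and re-identification risk that access policies have to weigh.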
Affiliation(s)
- Bradley Malin
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA. Department of Electrical Engineering and Computer Science, School of Engineering, Vanderbilt University, Nashville, USA
- Grigorios Loukides
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA
- Kathleen Benitez
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA
- Ellen Wright Clayton
- Department of Pediatrics, School of Medicine, Vanderbilt, USA. Center for Biomedical Ethics and Society, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 400, Nashville, TN 37203, USA. School of Law, Vanderbilt University, Nashville, USA
|
49
|
Sarkar IN, Butte AJ, Lussier YA, Tarczy-Hornoch P, Ohno-Machado L. Translational bioinformatics: linking knowledge across biological and clinical realms. J Am Med Inform Assoc 2011; 18:354-7. [PMID: 21561873 PMCID: PMC3128415 DOI: 10.1136/amiajnl-2011-000245] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 03/14/2011] [Accepted: 04/19/2011] [Indexed: 11/30/2022] Open
Abstract
Nearly a decade since the completion of the first draft of the human genome, the biomedical community is positioned to usher in a new era of scientific inquiry that links fundamental biological insights with clinical knowledge. Accordingly, holistic approaches are needed to develop and assess hypotheses that incorporate genotypic, phenotypic, and environmental knowledge. This perspective presents translational bioinformatics as a discipline that builds on the successes of bioinformatics and health informatics for the study of complex diseases. The early successes of translational bioinformatics are indicative of the potential to achieve the promise of the Human Genome Project for gaining deeper insights to the genetic underpinnings of disease and progress toward the development of a new generation of therapies.
Affiliation(s)
- Indra Neil Sarkar
- Center for Clinical and Translational Science, University of Vermont, Burlington, Vermont 05405, USA.
|
50
|
|