1
Rodriguez JM, Maquedano M, Cerdan-Velez D, Calvo E, Vazquez J, Tress ML. A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.14.623419. [PMID: 39605392 PMCID: PMC11601488 DOI: 10.1101/2024.11.14.623419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNA-seq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Miguel Maquedano
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Daniel Cerdan-Velez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Enrique Calvo
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
2
Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, Tress ML. Evidence for widespread translation of 5' untranslated regions. Nucleic Acids Res 2024; 52:8112-8126. [PMID: 38953162 DOI: 10.1093/nar/gkae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/03/2024]
Abstract
Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than that of 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation (five have orthologues in invertebrates, for example), the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
3
Sholeye AR, Williams AA, Loots DT, Tutu van Furth AM, van der Kuip M, Mason S. Tuberculous Granuloma: Emerging Insights From Proteomics and Metabolomics. Front Neurol 2022; 13:804838. [PMID: 35386409 PMCID: PMC8978302 DOI: 10.3389/fneur.2022.804838] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/29/2021] [Accepted: 02/24/2022] [Indexed: 12/24/2022]
Abstract
Mycobacterium tuberculosis infection, which claims hundreds of thousands of lives each year, is typically characterized by the formation of tuberculous granulomas, the histopathological hallmark of tuberculosis (TB). Our knowledge of granulomas, which comprise a biologically diverse body of pro- and anti-inflammatory cells from the host immune responses, is based mainly on examination of the lungs, in both human and animal studies, and much less on their counterparts in other organs of the TB patient, such as the brain. The biological heterogeneity of TB granulomas has led to their diverse and relatively uncoordinated categorization, which is summarized here. However, there is a pressing need to elucidate more fully the phenotype of the granulomas from infected patients. Newly emerging studies at the protein (proteomics) and metabolite (metabolomics) levels have the potential to achieve this. In this review we summarize the diverse nature of TB granulomas based upon the literature, and amplify these accounts by reporting on the relatively few emerging proteomics and metabolomics studies on TB granulomas. Metabolites (for example, trimethylamine-oxide) and proteins (such as the peptide PKAp) associated with TB granulomas, and knowledge of their localizations, help us to understand the resultant phenotype. Nevertheless, more multidisciplinary 'omics studies, especially in human subjects, are required to contribute toward ushering in a new era of understanding of TB granulomas, both at the site of infection and on a systemic level.
Affiliation(s)
- Abisola Regina Sholeye
- Department of Biochemistry, Human Metabolomics, Faculty of Natural and Agricultural Sciences, North-West University, Potchefstroom, South Africa
- Aurelia A. Williams
- Department of Biochemistry, Human Metabolomics, Faculty of Natural and Agricultural Sciences, North-West University, Potchefstroom, South Africa
- Du Toit Loots
- Department of Biochemistry, Human Metabolomics, Faculty of Natural and Agricultural Sciences, North-West University, Potchefstroom, South Africa
- A. Marceline Tutu van Furth
- Department of Pediatric Infectious Diseases and Immunology, Amsterdam University Medical Center, Emma Children's Hospital, Amsterdam, Netherlands
- Martijn van der Kuip
- Department of Pediatric Infectious Diseases and Immunology, Amsterdam University Medical Center, Emma Children's Hospital, Amsterdam, Netherlands
- Shayne Mason
- Department of Biochemistry, Human Metabolomics, Faculty of Natural and Agricultural Sciences, North-West University, Potchefstroom, South Africa
- *Correspondence: Shayne Mason
4
Abstract
Evolution gave rise to creatures that are arguably more sophisticated than the greatest human-designed systems. This feat has inspired computer scientists since the advent of computing and led to optimization tools that can evolve complex neural networks for machines, an approach known as "neuroevolution." After a few successes in designing evolvable representations for high-dimensional artifacts, the field has recently been revitalized by going beyond optimization: to many, the wonder of evolution is less in the perfect optimization of each species than in the creativity of such a simple iterative process, that is, in the diversity of species. This modern view of artificial evolution is moving the field away from microevolution, following a fitness gradient in a niche, to macroevolution, filling many niches with highly different species. It has already opened up promising applications, such as evolving gait repertoires, video game levels for different tastes, and diverse designs for aerodynamic bikes.
5
Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2018; 46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022]
Abstract
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation of the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task, since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
Affiliation(s)
- Federico Abascal
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
- David Juan
- Comparative Genomics Lab, Instituto de Biología Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Martinez
- Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
- Maria Rigau
- Computational Biology Life Sciences Group, Barcelona Supercomputing Center, Barcelona, Spain
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Jesus Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
6
Capitanio D, Moriggi M, Gelfi C. Mapping the human skeletal muscle proteome: progress and potential. Expert Rev Proteomics 2017; 14:825-839. [PMID: 28780899 DOI: 10.1080/14789450.2017.1364996] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/02/2023]
Abstract
Introduction: Human skeletal muscle represents 40% of our body mass, and deciphering its proteome composition to further understand the mechanisms regulating muscle function under physiological and pathological conditions has proved a challenge. Inter-individual variability, the presence of structurally and functionally different muscle types and the high protein dynamic range require carefully selected methodologies for the assessment of the muscle proteome. Furthermore, physiological studies are understandably hampered by ethical issues related to biopsies on healthy subjects, making it difficult to recruit the matched controls essential for comparative studies.
Areas covered: This review critically analyses studies performed on muscle to date and identifies what still remains unknown or poorly investigated in physiological and pathological states, such as training, aging, metabolic disorders and muscular dystrophies.
Expert commentary: Efforts should focus on analyses of biological fluids that target low-abundance, low-molecular-weight fragments generated by muscle cell disruption, to improve diagnosis and clinical monitoring. From a methodological point of view, particular attention should be paid to improving the characterization of intact proteins and unknown post-translational modifications to better understand the molecular mechanisms of muscle disorders.
Affiliation(s)
- Daniele Capitanio
- Department of Biomedical Sciences for Health, University of Milan, Segrate, Milan, Italy
- Manuela Moriggi
- Department of Biomedical Sciences for Health, University of Milan, Segrate, Milan, Italy
- Cecilia Gelfi
- Department of Biomedical Sciences for Health, University of Milan, Segrate, Milan, Italy
7
Abstract
In 2004, the protein estimate from the finished human genome was a surprisingly low 24,000, and the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) in different annotation portals persists. This work assesses these differences and possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown some yo-yoing, but both have now plateaued. Nine major annotation portals, reviewed at the beginning of 2017, gave a spread of counts from 21,819 down to 18,891. The 4-way cross-reference concordance (within UniProt) between Ensembl, Swiss-Prot, Entrez Gene and the Human Gene Nomenclature Committee (HGNC) drops to 18,690, indicating methodological differences in protein definitions and experimental existence support between sources. The Swiss-Prot and neXtProt evidence criteria include mass spectrometry peptide verification and also cross-references for antibody detection from the Human Protein Atlas. Nevertheless, hundreds of Swiss-Prot entries are classified as non-coding biotypes by HGNC. The only inference that protein numbers might still rise comes from numerous reports of small ORF (smORF) discovery. However, while there have been recent cases of protein verifications from previous mis-annotation of non-coding RNA, very few have passed the Swiss-Prot curation and genome annotation thresholds. The post-genomic era has seen both advances in data generation and improvements in the human reference assembly. Nevertheless, current numbers, while persistently discordant, show that the earlier yo-yoing has largely ceased.
Given the importance to biology and biomedicine of defining the canonical human proteome, the task will need more collaborative inter-source curation combined with broader and deeper experimental confirmation in vivo and in vitro of proteins predicted in silico. The eventual closure could well be below ~19,000.
Affiliation(s)
- Christopher Southan
- IUPHAR/BPS Guide to Pharmacology, Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK
8
Wetmore BA, Merrick BA. Invited Review: Toxicoproteomics: Proteomics Applied to Toxicology and Pathology. Toxicol Pathol 2004; 32:619-42. [PMID: 15580702 DOI: 10.1080/01926230490518244] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Indexed: 01/04/2023]
Abstract
Global measurement of proteins and their many attributes in tissues and biofluids defines the field of proteomics. Toxicoproteomics, as part of the larger field of toxicogenomics, seeks to identify critical proteins and pathways in biological systems that are affected by and respond to adverse chemical and environmental exposures using global protein expression technologies. Toxicoproteomics integrates 3 disciplinary areas: traditional toxicology and pathology, differential protein and gene expression analysis, and systems biology. Key topics to be reviewed are the evolution of proteomics, proteomic technology platforms and their capabilities with exemplary studies from biology and medicine, a review of over 50 recent studies applying proteomic analysis to toxicological research, and the recent development of databases designed to integrate -Omics technologies with toxicology and pathology. Proteomics is examined for its potential in discovery of new biomarkers and toxicity signatures, in mapping serum, plasma, and other biofluid proteomes, and in parallel proteomic and transcriptomic studies. The new field of toxicoproteomics is uniquely positioned toward an expanded understanding of protein expression during toxicity and environmental disease for the advancement of public health.
Affiliation(s)
- Barbara A Wetmore
- National Center for Toxicogenomics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
9
Silva F, Duarte M, Correia L, Oliveira SM, Christensen AL. Open Issues in Evolutionary Robotics. EVOLUTIONARY COMPUTATION 2015; 24:205-236. [PMID: 26581015 DOI: 10.1162/evco_a_00172] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 06/05/2023]
Abstract
One of the long-term goals in evolutionary robotics is to be able to automatically synthesize controllers for real autonomous robots based only on a task specification. While a number of studies have shown the applicability of evolutionary robotics techniques for the synthesis of behavioral control, researchers have consistently been faced with a number of issues preventing the widespread adoption of evolutionary robotics for engineering purposes. In this article, we review and discuss the open issues in evolutionary robotics. First, we analyze the benefits and challenges of simulation-based evolution and subsequent deployment of controllers versus evolution on real robotic hardware. Second, we discuss specific evolutionary computation issues that have plagued evolutionary robotics: (1) the bootstrap problem, (2) deception, and (3) the role of genomic encoding and genotype-phenotype mapping in the evolution of controllers for complex tasks. Finally, we address the absence of standard research practices in the field. We also discuss promising avenues of research. Our underlying motivation is the reduction of the current gap between evolutionary robotics and mainstream robotics, and the establishment of evolutionary robotics as a canonical approach for the engineering of autonomous robots.
Affiliation(s)
- Fernando Silva
- Bio-inspired Computation and Intelligent Machines Lab, 1649-026 Lisboa, Portugal
- BioISI, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Instituto de Telecomunicações, 1049-001 Lisboa, Portugal
- Miguel Duarte
- Bio-inspired Computation and Intelligent Machines Lab, 1649-026 Lisboa, Portugal
- Instituto de Telecomunicações, 1049-001 Lisboa, Portugal
- Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- Luís Correia
- BioISI, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Sancho Moura Oliveira
- Bio-inspired Computation and Intelligent Machines Lab, 1649-026 Lisboa, Portugal
- Instituto de Telecomunicações, 1049-001 Lisboa, Portugal
- Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
- Anders Lyhne Christensen
- Bio-inspired Computation and Intelligent Machines Lab, 1649-026 Lisboa, Portugal
- Instituto de Telecomunicações, 1049-001 Lisboa, Portugal
- Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
10
da Costa JP, Carvalhais V, Ferreira R, Amado F, Vilanova M, Cerca N, Vitorino R. Proteome signatures—how are they obtained and what do they teach us? Appl Microbiol Biotechnol 2015. [PMID: 26205520 DOI: 10.1007/s00253-015-6795-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/26/2022]
11
Abstract
Since the publication of the human genome, two key points have emerged. First, it is still not certain which regions of the genome code for proteins. Second, the number of discrete protein-coding genes is far fewer than the number of different proteins. Proteomics has the potential to address some of these postgenomic issues if the obstacles that we face can be overcome in our efforts to combine proteomic and genomic data. There are many challenges associated with high-throughput and high-output proteomic technologies. Consequently, for proteomics to continue at its current growth rate, new approaches must be developed to ease data management and data mining. Initiatives have been launched to develop standard data formats for exchanging mass spectrometry proteomic data, including the Proteomics Standards Initiative formed by the Human Proteome Organization. Databases such as Swiss-Prot and UniProt are publicly available repositories for protein sequences annotated for function, subcellular location and known potential post-translational modifications. The availability of bioinformatics solutions is crucial for proteomics technologies to fulfil their promise of adding further definition to the functional output of the human genome. The aim of the Oxford Genome Anatomy Project is to provide a framework for integrating molecular, cellular, phenotypic and clinical information with experimental genetic and proteomics data. This perspective also discusses models to make the Oxford Genome Anatomy Project accessible and beneficial for academic and commercial research and development.
Affiliation(s)
- Christian Rohlff
- Oxford Genome Sciences Ltd, 22 The Quadrant, Barton Lane, Abingdon Science Park, Abingdon, OX14 3YS, UK.
12
Abstract
In the past several years, proteomics and its subdiscipline clinical proteomics have been engaged in the discovery of the next generation of protein biomarkers. As the effort and the intensive debate it has sparked continue, it is becoming apparent that a paradigm shift is needed in proteomics in order to truly comprehend the complexity of the human proteome and assess its subtle variations among individuals. This review introduces the concept of population proteomics as a future direction in proteomics research. Population proteomics is the study of protein diversity in human populations. High-throughput, top-down mass spectrometric approaches are employed to investigate, define and understand protein diversity and modulations across and within populations. Population proteomics is a discovery-oriented endeavor with a goal of establishing the incidence of protein structural variations and quantitative regulation of these modifications. Assessing human protein variations among and within populations is viewed as a paramount undertaking that can facilitate clinical proteomics' effort in discovery and validation of protein features that can be used as markers for early diagnosis of disease, monitoring of disease progression and assessment of therapy. This review outlines the growing need for analyzing individuals' proteomes and describes the approaches that are likely to be applied in such a population proteomics endeavor.
Affiliation(s)
- Dobrin Nedelkov
- Intrinsic Bioprobes, Inc., 625 S. Smith Rd, Suite 22, Tempe, AZ 85281, USA.
13
Yu LCH, Wang JT, Wei SC, Ni YH. Host-microbial interactions and regulation of intestinal epithelial barrier function: From physiology to pathology. World J Gastrointest Pathophysiol 2012; 3:27-43. [PMID: 22368784 PMCID: PMC3284523 DOI: 10.4291/wjgp.v3.i1.27] [Citation(s) in RCA: 175] [Impact Index Per Article: 13.5] [Received: 04/11/2011] [Revised: 10/04/2011] [Accepted: 02/08/2012] [Indexed: 02/06/2023]
Abstract
The gastrointestinal tract is the largest reservoir of commensal bacteria in the human body, providing nutrients and space for the survival of microbes while concurrently operating mucosal barriers to confine the microbial population. The epithelial cells linked by tight junctions not only physically separate the microbiota from the lamina propria, but also secrete proinflammatory cytokines and reactive oxygen species in response to pathogen invasion and metabolic stress, and serve as sentinels for the underlying immune cells. Accumulating evidence indicates that commensal bacteria are involved in various physiological functions in the gut and that microbial imbalances (dysbiosis) may cause pathology. Commensal bacteria are involved in the regulation of intestinal epithelial cell turnover, promotion of epithelial restitution and reorganization of tight junctions, all of which are pivotal for fortifying barrier function. Recent studies indicate that aberrant bacterial lipopolysaccharide-mediated signaling in gut mucosa may be involved in the pathogenesis of chronic inflammation and carcinogenesis. Our perception of enteric commensals has now changed from one of opportunistic pathogens to one of active participants in maintaining intestinal homeostasis. This review attempts to explain the dynamic interaction between the intestinal epithelium and commensal bacteria in health and disease.
14
Richards AL, Jones L, Moskvina V, Kirov G, Gejman PV, Levinson DF, Sanders AR, Purcell S, Visscher PM, Craddock N, Owen MJ, Holmans P, O’Donovan MC. Schizophrenia susceptibility alleles are enriched for alleles that affect gene expression in adult human brain. Mol Psychiatry 2012; 17:193-201. [PMID: 21339752 PMCID: PMC4761872 DOI: 10.1038/mp.2011.11] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.2] [Received: 06/14/2010] [Revised: 01/12/2011] [Accepted: 01/19/2011] [Indexed: 12/11/2022]
Abstract
It is widely thought that alleles that influence susceptibility to common diseases, including schizophrenia, will frequently do so through effects on gene expression. As only a small proportion of the genetic variance for schizophrenia has been attributed to specific loci, this remains an unproven hypothesis. The International Schizophrenia Consortium (ISC) recently reported a substantial polygenic contribution to that disorder, and that schizophrenia risk alleles are enriched among single-nucleotide polymorphisms (SNPs) selected for marginal evidence of association (P<0.5) from genome-wide association studies (GWAS). It follows that if schizophrenia susceptibility alleles are enriched for those that affect gene expression, those marginally associated SNPs that are also expression quantitative trait loci (eQTLs) should carry more true association signals than marginally associated SNPs that are not eQTLs. To test this, we identified marginally associated (P<0.5) SNPs from two of the largest available schizophrenia GWAS data sets. We assigned eQTL status to those SNPs based upon an eQTL data set derived from adult human brain. Using the polygenic score method of analysis reported by the ISC, we observed, and replicated, the finding that SNPs with a higher probability of being cis-eQTLs predicted schizophrenia better than those with a lower probability of being cis-eQTLs. Our data support the hypothesis that alleles conferring risk of schizophrenia are enriched among those that affect gene expression. Moreover, our data show that notwithstanding the likely developmental origin of schizophrenia, studies of adult brain tissue can, in principle, allow relevant susceptibility eQTLs to be identified.
Affiliation(s)
- Alexander L Richards
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Lesley Jones
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Valentina Moskvina
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - George Kirov
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - Pablo V Gejman
- Center for Psychiatric Genetics, Department of Psychiatry and Behavioral Sciences, Northshore University Health System Research Institute, 1001 University Place, Evanston, IL 60201, USA
| | | | - Alan R Sanders
- Center for Psychiatric Genetics, Department of Psychiatry and Behavioral Sciences, Northshore University Health System Research Institute, 1001 University Place, Evanston, IL 60201, USA
| | | | | | - Shaun Purcell
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Massachusetts 02114, USA
- Center for Human Genetic Research, Massachusetts General Hospital, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
| | - Peter M Visscher
- Queensland Statistical Genetics Laboratory, Queensland Institute of Medical Research, 300 Herston Road, Brisbane 4006, Australia
| | - Nick Craddock
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Peter Holmans
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Michael C O’Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
15
Making every SAR point count: the development of Chemistry Connect for the large-scale integration of structure and bioactivity data. Drug Discov Today 2011; 16:1019-30. [PMID: 22024215 DOI: 10.1016/j.drudis.2011.10.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 07/21/2011] [Revised: 09/10/2011] [Accepted: 10/10/2011] [Indexed: 11/24/2022]
Abstract
The increase in drug research output from patent applications, together with the expansion of public data collections, such as ChEMBL and PubChem BioAssay, has made it essential for pharmaceutical companies to integrate both internal and external 'SAR estate'. The AstraZeneca response has been the development of an enterprise application, Chemistry Connect, containing 45 million unique chemical structures from 18 internal and external data sources. It includes merged compound-to-assay-to-result-to-target relationships extracted from patents, papers and internal data. Users can explore connections between these by searching using drug names or synonyms, chemical structures, patent numbers and target protein identifiers at a scale not previously available.
16
Hartson SD, Matts RL. Approaches for defining the Hsp90-dependent proteome. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2011; 1823:656-67. [PMID: 21906632 DOI: 10.1016/j.bbamcr.2011.08.013] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 07/07/2011] [Revised: 08/22/2011] [Accepted: 08/23/2011] [Indexed: 10/17/2022]
Abstract
Hsp90 is the target of ongoing drug discovery studies seeking new compounds to treat cancer, neurodegenerative diseases, and protein folding disorders. To better understand Hsp90's roles in cellular pathologies and in normal cells, numerous studies have utilized proteomics assays and related high-throughput tools to characterize its physical and functional protein partnerships. This review surveys these studies, and summarizes the strengths and limitations of the individual attacks. We also include downloadable spreadsheets compiling all of the Hsp90-interacting proteins identified in more than 23 studies. These tools include cross-references among gene aliases, human homologues of yeast Hsp90-interacting proteins, hyperlinks to database entries, summaries of canonical pathways that are enriched in the Hsp90 interactome, and additional bioinformatic annotations. In addition to summarizing Hsp90 proteomics studies performed to date and the insights they have provided, we identify gaps in our current understanding of Hsp90-mediated proteostasis. This article is part of a Special Issue entitled: Heat Shock Protein 90 (HSP90).
Affiliation(s)
- Steven D Hartson
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA
17
Ohlendieck K. Skeletal muscle proteomics: current approaches, technical challenges and emerging techniques. Skelet Muscle 2011; 1:6. [PMID: 21798084 PMCID: PMC3143904 DOI: 10.1186/2044-5040-1-6] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 11/05/2010] [Accepted: 02/01/2011] [Indexed: 01/08/2023]
Abstract
Background Skeletal muscle fibres represent one of the most abundant cell types in mammals. Their highly specialised contractile and metabolic functions depend on a large number of membrane-associated proteins with very high molecular masses, proteins with extensive posttranslational modifications and components that exist in highly complex supramolecular structures. This makes it extremely difficult to perform conventional biochemical studies of potential changes in protein clusters during physiological adaptations or pathological processes. Results Skeletal muscle proteomics attempts to establish the global identification and biochemical characterisation of all members of the muscle-associated protein complement. A considerable number of proteomic studies have employed large-scale separation techniques, such as high-resolution two-dimensional gel electrophoresis or liquid chromatography, and combined them with mass spectrometry as the method of choice for high-throughput protein identification. Muscle proteomics has been applied to the comprehensive biochemical profiling of developing, maturing and aging muscle, as well as the analysis of contractile tissues undergoing physiological adaptations seen in disuse atrophy, physical exercise and chronic muscle transformation. Biomedical investigations into proteome-wide alterations in skeletal muscle tissues were also used to establish novel biomarker signatures of neuromuscular disorders. Importantly, mass spectrometric studies have confirmed the enormous complexity of posttranslational modifications in skeletal muscle proteins. Conclusions This review critically examines the scientific impact of modern muscle proteomics and discusses its successful application for a better understanding of muscle biology, but also outlines its technical limitations and emerging techniques to establish new biomarker candidates.
Affiliation(s)
- Kay Ohlendieck
- Muscle Biology Laboratory, Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland.
18
Chandra A, Wormser GP, Klempner MS, Trevino RP, Crow MK, Latov N, Alaedini A. Anti-neural antibody reactivity in patients with a history of Lyme borreliosis and persistent symptoms. Brain Behav Immun 2010; 24:1018-24. [PMID: 20227484 PMCID: PMC2897967 DOI: 10.1016/j.bbi.2010.03.002] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 11/30/2009] [Revised: 02/20/2010] [Accepted: 03/02/2010] [Indexed: 01/09/2023]
Abstract
Some Lyme disease patients report debilitating chronic symptoms of pain, fatigue, and cognitive deficits despite recommended courses of antibiotic treatment. The mechanisms responsible for these symptoms, collectively referred to as post-Lyme disease syndrome (PLS) or chronic Lyme disease, remain unclear. We investigated the presence of immune system abnormalities in PLS by assessing the levels of antibodies to neural proteins in patients and controls. Serum samples from PLS patients, post-Lyme disease healthy individuals, patients with systemic lupus erythematosus, and normal healthy individuals were analyzed for anti-neural antibodies by immunoblotting and immunohistochemistry. Anti-neural antibody reactivity was found to be significantly higher in the PLS group than in the post-Lyme healthy (p<0.01) and normal healthy (p<0.01) groups. The observed heightened antibody reactivity in PLS patients could not be attributed solely to the presence of cross-reactive anti-borrelia antibodies, as the borrelial seronegative patients also exhibited elevated anti-neural antibody levels. Immunohistochemical analysis of PLS serum antibody activity demonstrated binding to cells in the central and peripheral nervous systems. The results provide evidence for the existence of a differential immune system response in PLS, offering new clues about the etiopathogenesis of the disease that may prove useful in devising more effective treatment strategies.
Affiliation(s)
- Abhishek Chandra
- Department of Neurology and Neuroscience, Cornell University, New York, NY, USA
- Gary P. Wormser
- Division of Infectious Diseases, Department of Medicine, New York Medical College, Valhalla, NY, USA
- Mary K. Crow
- Division of Rheumatology, Hospital for Special Surgery, New York, NY, USA
- Norman Latov
- Department of Neurology and Neuroscience, Cornell University, New York, NY, USA
- Armin Alaedini
- Department of Neurology and Neuroscience, Cornell University, New York, NY, USA. Corresponding author: Armin Alaedini, Department of Neurology and Neuroscience, Weill Medical College of Cornell University, 1300 York Ave., LC-819, New York, NY 10065; Phone: 212-746-7841
19
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as that of phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
20
Gauthier DJ, Lazure C. Complementary methods to assist subcellular fractionation in organellar proteomics. Expert Rev Proteomics 2008; 5:603-17. [PMID: 18761470 DOI: 10.1586/14789450.5.4.603] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023]
Abstract
Organellar proteomics aims to describe the full complement of proteins of subcellular structures and organelles. When compared with whole-cell or whole-tissue proteomes, the more focused results from subcellular proteomic studies have yielded relatively simpler datasets from which biologically relevant information can be more easily extracted. In every proteomic study, the quality and purity of the biological sample to be investigated is of the utmost importance for a successful analysis. In organellar proteomics, one of the most crucial steps in sample preparation is the initial subcellular fractionation procedure by which the enriched preparation of the sought-after organelle is obtained. In nearly all available organellar proteomic studies, the method of choice relies on one or several rounds of density-based gradient centrifugation. Although this method has been recognized for decades as yielding relatively pure preparations of organelles, recent technological advances in protein separation and identification can now reveal even minute amounts of contamination, which in turn can greatly complicate data interpretation. The scope of this review focuses on recently published innovative complementary or alternative methods to perform subcellular fractionation, which can further refine the way in which sample preparation is accomplished in organellar proteomics.
Affiliation(s)
- Daniel J Gauthier
- Neuropeptides Structure and Metabolism Research Unit, Institut de Recherches Cliniques de Montréal, University of Montréal, 110 Pine Avenue West, Montréal, Québec, Canada H2W 1R7.
21
Steinlein OK, Bertrand D. Neuronal nicotinic acetylcholine receptors: from the genetic analysis to neurological diseases. Biochem Pharmacol 2008; 76:1175-83. [PMID: 18691557 DOI: 10.1016/j.bcp.2008.07.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 05/30/2008] [Revised: 07/09/2008] [Accepted: 07/09/2008] [Indexed: 10/21/2022]
Abstract
Nicotinic acetylcholine receptors (nAChRs) are ligand-gated channels that, in the peripheral nervous system, mediate fast neurotransmission at the neuromuscular junction and in ganglia. Widely expressed in the central nervous system, neuronal nAChRs are thought to contribute both to neurotransmission and to the modulation of neuronal activity. To date, eleven genes encoding these receptors have been identified in the mammalian genome, and their structure is well conserved throughout evolution. Progress in genetics and the identification of a large number of small genetic variants, such as single nucleotide polymorphisms, raise new questions about the physiologic and pharmacologic consequences of such variations. The finding of associations between polymorphisms in the genes encoding neuronal nAChRs and neurological disorders such as schizophrenia and Alzheimer disease illustrates the importance of better understanding these receptors, from gene to function. In this work we present an overview of the progress made in understanding the role of nAChR genes in monogenic disorders such as familial epilepsy, and review the latest knowledge about genetic variants of the nAChR genes and their relationship with common disorders and behavioural traits of complex etiology.
Affiliation(s)
- O K Steinlein
- Institute of Human Genetics, University Hospital, Ludwig Maximilians University, Munich, Germany
22
Proteome of monocyte priming by lipopolysaccharide, including changes in interleukin-1beta and leukocyte elastase inhibitor. Proteome Sci 2008; 6:13. [PMID: 18492268 PMCID: PMC2413206 DOI: 10.1186/1477-5956-6-13] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 11/21/2007] [Accepted: 05/20/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Monocytes can be primed in vitro by lipopolysaccharide (LPS) for release of cytokines, for enhanced killing of cancer cells, and for enhanced release of microbicidal oxygen radicals like superoxide and peroxide. We investigated the proteins involved in regulating priming, using 2D gel proteomics. RESULTS Monocytes from 4 normal donors were cultured for 16 h in chemically defined medium in Teflon bags +/- LPS and +/- 4-(2-aminoethyl)-benzenesulfonyl fluoride (AEBSF), a serine protease inhibitor. LPS-primed monocytes released inflammatory cytokines, and produced increased amounts of superoxide. AEBSF blocked priming for enhanced superoxide, but did not affect cytokine release, showing that AEBSF was not toxic. After staining large-format 2D gels with Sypro ruby, we compared the monocyte proteome under the four conditions for each donor. We found 30 protein spots that differed significantly in response to LPS or AEBSF, and these proteins were identified by ion trap mass spectrometry. CONCLUSION We identified 19 separate proteins that changed in response to LPS or AEBSF, including ATP synthase, coagulation factor XIII, ferritin, coronin, HN ribonuclear proteins, integrin alpha IIb, pyruvate kinase, ras suppressor protein, superoxide dismutase, transketolase, tropomyosin, vimentin, and others. Interestingly, in response to LPS, precursor proteins for interleukin-1beta appeared; and in response to AEBSF, there was an increase in elastase inhibitor. The increase in elastase inhibitor provides support for our hypothesis that priming requires an endogenous serine protease.
23
Valiokas R, Klenkar G, Tinazli A, Reichel A, Tampé R, Piehler J, Liedberg B. Self-assembled monolayers containing terminal mono-, bis-, and tris-nitrilotriacetic acid groups: characterization and application. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2008; 24:4959-4967. [PMID: 18393558 DOI: 10.1021/la703709a] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 05/26/2023]
Abstract
We have undertaken a structural and functional study of self-assembled monolayers (SAMs) formed on gold from a series of alkylthiol compounds containing terminal multivalent chelators (MCHs) composed of mono-, bis-, and tris-nitrilotriacetic acid (NTA) moieties. SAMs were formed from single-component solutions of the mono-, bis-, and tris-NTA compounds, as well as from mixtures with a tri(ethylene glycol)-terminated alkylthiol (EG(3)). Contact angle goniometry, null ellipsometry, and infrared spectroscopy were used to explore the structural characteristics of the MCH SAMs. Ellipsometric measurements show that the amount of the MCH groups on surfaces increases with increasing mol % of the MCH thiols in the loading solution up to about 80 mol %. We also conclude that mixed SAMs, prepared in the solution composition regime 0-30 mol % of the MCH thiols, consist of a densely packed alkyl layer, an amorphous ethylene glycol layer, and an outermost layer of MCH groups exposed toward the ambient. Above 30 mol %, a significant degree of disorder is observed in the SAMs. Finally, functional evaluation of the three MCH SAMs prepared at 0-30 mol% reveals a consistent increase in binding strength with increasing multivalency. The tris-NTA SAM, in particular, is enabled for stable and functional immobilization of a His6-tagged extracellular receptor subunit, even at low chelator surface concentrations, which makes it suitable for applications when a low surface density of capturing sites is desirable, e.g., in kinetic analyses.
Affiliation(s)
- Ramunas Valiokas
- Department of Functional Nanomaterials, Institute of Physics, Savanorių 231, LT-02300 Vilnius, Lithuania.
24
On the use of different mass spectrometric techniques for characterization of sequence variability in genomic DNA. Anal Bioanal Chem 2008; 391:135-49. [DOI: 10.1007/s00216-008-1929-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 10/31/2007] [Revised: 01/25/2008] [Accepted: 01/31/2008] [Indexed: 10/22/2022]
25
Abstract
We accept that we are responsible for the quality of life of animals in our care. We accept that the activities of man affect all the living things with which we share this planet. But we are slow to realize that as a result we have a duty of care for all living things. That duty extends to the breeding of animals for which we are responsible. When animals are bred by man for a purpose, the aim should be to meet certain goals: to improve the precision with which breeding outcomes can be predicted; to avoid the introduction and advance of characteristics deleterious to well-being; and to manage genetic resources and diversity between and within populations as set out in the Convention on Biological Diversity. These goals are summed up in the phrase precision animal breeding. They should apply whether animals are bred as sources of usable products or services for medical or scientific research, for aesthetic or cultural considerations, or as pets. Modern molecular and quantitative genetics and advances in reproductive physiology provide the tools with which these goals can be met.
Affiliation(s)
- A P F Flint
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD, UK.
26
How a Generative Encoding Fares as Problem-Regularity Decreases. PARALLEL PROBLEM SOLVING FROM NATURE – PPSN X 2008. [DOI: 10.1007/978-3-540-87700-4_36] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/11/2023]
27
Huber LC, Stanczyk J, Jüngel A, Gay S. Epigenetics in inflammatory rheumatic diseases. Arthritis Rheum 2007; 56:3523-31. [PMID: 17968922 DOI: 10.1002/art.22948] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Affiliation(s)
- Lars C Huber
- University Hospital Zurich, Zurich Center for Integrative Human Physiology, Zurich, Switzerland.
28
Dahinden C, Parmigiani G, Emerick MC, Bühlmann P. Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries. BMC Bioinformatics 2007; 8:476. [PMID: 18072965 PMCID: PMC2233645 DOI: 10.1186/1471-2105-8-476] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/16/2007] [Accepted: 12/11/2007] [Indexed: 11/10/2022]
Abstract
Background The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. Conclusion We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems.
29
Hene L, Sreenu VB, Vuong MT, Abidi SHI, Sutton JK, Rowland-Jones SL, Davis SJ, Evans EJ. Deep analysis of cellular transcriptomes - LongSAGE versus classic MPSS. BMC Genomics 2007; 8:333. [PMID: 17892551 PMCID: PMC2104538 DOI: 10.1186/1471-2164-8-333] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 03/09/2007] [Accepted: 09/24/2007] [Indexed: 12/14/2022]
Abstract
BACKGROUND Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' e.g. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining. RESULTS We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only half (54%) the number of transcripts identified by SAGE (3,617 versus 1,955). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases). 
CONCLUSION We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.
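The detection figures quoted in this abstract can be checked with a few lines of arithmetic. This Python snippet uses only the counts stated above; the value for the three combined MPSS libraries is an approximation derived from the rounded 73% figure, not a number reported by the authors.

```python
# Counts quoted in the abstract for the 8,132-gene test set.
sage_detected = 3617  # genes detected by the LongSAGE library
mpss_detected = 1955  # genes detected by the single "classic" MPSS library

# Ratio of genes detected by MPSS to genes detected by SAGE.
fraction = mpss_detected / sage_detected
print(f"MPSS/SAGE: {fraction:.1%}")  # MPSS/SAGE: 54.1%

# Approximate gene count implied by "73%" for the three combined MPSS
# libraries; derived from a rounded percentage, so only an estimate.
combined_estimate = round(0.73 * sage_detected)
print(combined_estimate)  # 2640
```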
Affiliation(s)
- Lawrence Hene
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Vattipally B Sreenu
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Mai T Vuong
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- S Hussain I Abidi
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Julian K Sutton
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Sarah L Rowland-Jones
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Simon J Davis
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
- Edward J Evans
- Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK
30
Abstract
The human genome project and related research initiatives have enabled the identification of a significant number of genetic variants that are predictive of drug response and outcome (pharmacogenomic biomarkers). As yet, incorporation of routine pharmacogenomic testing into clinical practice has been relatively modest. Potential barriers to adoption include a relative lack of prospective controlled trials establishing the benefits of such testing, economic constraints, and ethical concerns, among others. Clinicians considering the use of pharmacogenomic testing in their practice also may be unfamiliar with the concepts and principles underlying this rapidly evolving discipline. Consequently, the purpose of this review is to provide the clinical pharmacologist with a primer on the principles and molecular mechanisms underlying pharmacogenomics. In addition, the methods currently being used to discover novel pharmacogenomic biomarkers and then apply these to clinical practice will be described.
Affiliation(s)
- Michael H Court
- Comparative and Molecular Pharmacogenomics Laboratory, Department of Pharmacology and Experimental Therapeutics, Tufts University School of Medicine, 136 Harrison Avenue, Boston, MA 02111, USA
31
Abstract
Systems biology, possibly the latest sub-discipline of biology, has arisen as a result of the shockwave of genomic and proteomic data that has appeared in the past few years. However, despite ubiquitous initiatives that carry this label, there is no precise definition of systems biology other than the implication of a new, all-encompassing, multidisciplinary endeavor. Here we propose that systems biology is more than the integration of biology with methods of the physical and computational sciences, and also more than the expansion of the single-pathway approach to embracing genome-scale networks. It is the discipline that specifically addresses the fundamental properties of the complexity that living systems represent. To facilitate the discussion, we dissect and project the multifaceted systems complexity of living organisms into five dimensions: (1) molecular complexity; (2) structural complexity; (3) temporal complexity; (4) abstraction and emergence; and (5) algorithmic complexity. This "five-dimensional space" may provide a framework for comparing, classifying, and complementing the vast diversity of existing systems biology programs and their goals, and will also give a glimpse of the magnitude of the scientific problems associated with unraveling the ultimate mysteries of life.
Affiliation(s)
- S Huang
- Harvard Medical School, Department of Surgery and Vascular Biology Program, Karp 11-212, Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, USA.
32
Abstract
Proper validation can accelerate sequence-based discovery of proteins and protein-coding genes. Databases currently contain a backlog of experimentally unverified gene models and tentative assignments of observed transcripts to coding or noncoding RNA. We present and apply a general principle, founded on base composition and the genetic code and validated here by bulk 2-D gels, that can improve the reliability of such classifications and of the algorithms or pipelines that lead to them.
Affiliation(s)
- Stéphane Cruveiller
- Atelier de Génomique Comparative, Genoscope, Centre National de Séquençage, Evry, France
33
Hillen N, Stevanovic S. Contribution of mass spectrometry-based proteomics to immunology. Expert Rev Proteomics 2006; 3:653-64. [PMID: 17181480 DOI: 10.1586/14789450.3.6.653] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 01/10/2023]
Abstract
Antigen processing conveys information about the cellular status and the proteome to the cell surface for scrutiny by the cellular immune system. Thus the repertoire of major histocompatibility complex (MHC)-bound peptides, the MHC ligandome, indirectly mirrors the proteome in order to make alterations instantly detectable and, if necessary, to oppose them. Mass spectrometry is the core technology for analysis of both the proteome and the MHC ligandome and has given rise to several strategies to gain qualitative and quantitative insight into the MHC-presented peptide repertoire. After immunoaffinity purification of detergent-solubilized peptide-MHC complexes followed by acid elution of peptides, liquid chromatography-mass spectrometry is applied to determine individual peptide sequences and, thus, allow qualitative characterization of the MHC-bound repertoire. Differential quantification based on stable isotope labeling enables the relative comparison of two samples, such as diseased and healthy tissue. Targeted searches for certain natural ligands, such as the 'predict-calibrate-detect' strategy, include motif-based epitope prediction and calibration with reference peptides. Thus, various approaches are now available for exposing and understanding the intricacies of the MHC ligand repertoire. Analysis of differences in the MHC ligandome under distinct conditions not only contributes to our understanding of basic cellular processes, but also enables the formulation of immunodiagnostic or immunotherapeutic strategies.
Affiliation(s)
- Nina Hillen
- University of Tübingen, Department of Immunology, Institute for Cell Biology, 72076 Tübingen, Germany.
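The abstract above mentions differential quantification by stable isotope labeling. As a generic illustration only (not the specific workflow of Hillen and Stevanovic), relative quantification comes down to comparing the light- and heavy-labelled intensities of each peptide. The input format (a dict mapping a peptide sequence to a `(light, heavy)` intensity pair), the peptide names and the median-centring step are all assumptions made for this sketch.

```python
import math
from statistics import median


def log2_ratios(pairs):
    """Median-centred log2(heavy/light) ratio per peptide.

    pairs: dict mapping a peptide sequence to a (light, heavy) intensity tuple.
    Peptides with a non-positive intensity in either channel are skipped,
    because the ratio is undefined for them.
    """
    raw = {
        pep: math.log2(heavy / light)
        for pep, (light, heavy) in pairs.items()
        if light > 0 and heavy > 0
    }
    if not raw:
        return {}
    # Assumption: most peptides are unchanged between the two samples,
    # so the median ratio is treated as the zero point.
    centre = median(raw.values())
    return {pep: r - centre for pep, r in raw.items()}


# Hypothetical intensities; light = healthy tissue, heavy = diseased tissue.
example = {
    "PEPTIDEA": (1000.0, 1000.0),
    "PEPTIDEB": (500.0, 2000.0),
    "PEPTIDEC": (800.0, 800.0),
    "PEPTIDED": (1000.0, 0.0),
}
print(log2_ratios(example))
# -> {'PEPTIDEA': 0.0, 'PEPTIDEB': 2.0, 'PEPTIDEC': 0.0}
```

Working on a log2 scale makes up- and down-regulation symmetric around zero (a four-fold increase is +2, a four-fold decrease is -2).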
34
Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 2007; 29:288-99. [PMID: 17295292 DOI: 10.1002/bies.20544] [Citation(s) in RCA: 403] [Impact Index Per Article: 22.4] [Indexed: 12/17/2022]
Abstract
There are two intriguing paradoxes in molecular biology--the inconsistent relationship between organismal complexity and (1) cellular DNA content and (2) the number of protein-coding genes--referred to as the C-value and G-value paradoxes, respectively. The C-value paradox may be largely explained by varying ploidy. The G-value paradox is more problematic, as the extent of protein coding sequence remains relatively static over a wide range of developmental complexity. We show by analysis of sequenced genomes that the relative amount of non-protein-coding sequence increases consistently with complexity. We also show that the distribution of introns in complex organisms is non-random. Genes composed of large amounts of intronic sequence are significantly overrepresented amongst genes that are highly expressed in the nervous system, and amongst genes downregulated in embryonic stem cells and cancers. We suggest that the informational paradox in complex organisms may be explained by the expansion of cis-acting regulatory elements and genes specifying trans-acting non-protein-coding RNAs.
Affiliation(s)
- Ryan J Taft
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Australia
35
Wishart DS. Discovering drug targets through the web. Comp Biochem Physiol Part D Genomics Proteomics 2006; 2:9-17. [PMID: 20483274 DOI: 10.1016/j.cbd.2006.01.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/22/2005] [Revised: 01/28/2006] [Accepted: 01/30/2006] [Indexed: 11/25/2022]
Abstract
Traditionally, drug-target discovery is a "wet-bench" experimental process, depending on carefully designed genetic screens, biochemical tests and cellular assays to identify proteins and genes that are associated with a particular disease or condition. However, recent advances in DNA sequencing, transcript profiling, protein identification and protein quantification are leading to a flood of genomic and proteomic data that is, or potentially could be, linked to disease data. The quantity of data generated by these high-throughput methods is forcing scientists to rethink the way they do traditional drug-target discovery. In particular, it is leading them more and more towards identifying potential drug targets using computers. In fact, drug-target identification is now being done as much on the desktop as on the bench-top. This review describes how drug-target discovery can be done in silico (i.e. via computer) using a variety of bioinformatic resources that are freely available on the web. Specifically, it highlights a number of web-accessible sequence databases, automated genome annotation tools, text mining tools, and integrated drug/sequence databases that can be used to identify drug targets for both endogenous (genetic and epigenetic) and exogenous (infectious) diseases.
Affiliation(s)
- David S Wishart
- Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E8
36
Righetti PG. Real and imaginary artefacts in proteome analysis via two-dimensional maps. J Chromatogr B Analyt Technol Biomed Life Sci 2006; 841:14-22. [PMID: 16517224 DOI: 10.1016/j.jchromb.2006.02.022] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 12/03/2005] [Revised: 01/20/2006] [Accepted: 02/11/2006] [Indexed: 11/27/2022]
Abstract
The present review touches on a long-lasting debate on possible artefacts (i.e. generation of spurious spots, not belonging to the biological sample under analysis) induced by the separation technique (in this case, two-dimensional mapping) per se. It is shown here that some of the biggest offenders, always blamed in the past (at least since 1970, i.e. since the inception of gel-based isoelectric focusing protocols), namely deamidation (of Asn and Gln residues) and carbamylation (due to cyanate produced in urea solution), simply do not occur in properly handled samples and have never actually been demonstrated in real samples, except when deliberately forced. Conversely, two unexpected major artefacts have recently been shown to plague 2D mapping. One is the formation of homo- and hetero-oligomers in samples that have been reduced but not alkylated prior to entering the electric field. The phenomenon is greatly aggravated in alkaline pH regions and can lead to an impressive number of spurious spots not existing in the original sample. Thus, alkylation (best performed with acrylamide or vinylpyridines) is a must for avoiding such spurious spots, as well as sample streaking and smearing in the alkaline gel region, and for maintaining sample integrity. In fact, the other unexpected artefact is desulfuration (beta-elimination), by which, upon prolonged electrophoresis, the sample loses the -SH group from Cys residues. In the long run, this loss is accompanied by massive protein degradation due to lysis of a C-N bond along the polypeptide chain. Here too, alkylation of the -SH groups of Cys almost completely prevents this noxious degradation phenomenon.
Affiliation(s)
- Pier Giorgio Righetti
- Polytechnic of Milano, Department of Chemistry, Giulio Natta, Materials and Engineering Chemistry, Via Mancinelli 7, Milano 20131, Italy.
37
Yao A, Charlab R, Li P. Systematic identification of pseudogenes through whole genome expression evidence profiling. Nucleic Acids Res 2006; 34:4477-85. [PMID: 16945953 PMCID: PMC1636364 DOI: 10.1093/nar/gkl591] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 10/30/2005] [Revised: 07/28/2006] [Accepted: 07/31/2006] [Indexed: 01/23/2023]
Abstract
The identification of pseudogenes is an integral and significant part of genome annotation because of their abundance and their impact on the experimental analysis of functional genes. Most computational annotation systems are not optimized for systematic pseudogene recognition and often annotate pseudogenes as functional genes; users then propagate these errors to subsequent analyses and interpretations. In order to validate gene annotations and to identify pseudogenes that are potentially mis-annotated, we developed a novel approach based on whole genome profiling of existing transcript and protein sequences. This method has two important features: it (i) detects processed and non-processed pseudogenes equally well and (ii) can identify transcribed pseudogenes. Applying this method to the human Ensembl gene predictions, we discovered that 2011 (9% of total) Ensembl genes in the categories of known and novel might be pseudogenes based on expression evidence. Of these, 1200 genes have no existing evidence of transcription, and 811 genes have transcription evidence but contain significant translation disruption. Approximately 40% of the 2011 identified pseudogenes presented a multi-exon structure, representing non-processed pseudogenes. We have demonstrated the power of whole genome profiling of expression sequences to improve the accuracy of gene annotations.
Affiliation(s)
- Alison Yao
- Celera Genomics, 45 West Gude Dr, Rockville, MD 20850, USA
- Applied Biosystems Inc, 45 West Gude Dr, Rockville, MD 20850, USA
- Rosane Charlab
- Applied Biosystems Inc, 45 West Gude Dr, Rockville, MD 20850, USA
- Peter Li
- Applied Biosystems Inc, 45 West Gude Dr, Rockville, MD 20850, USA
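The two evidence categories reported in the Yao et al. abstract (no transcription evidence; transcription evidence but a disrupted reading frame) can be expressed as a simple decision rule. This is a hypothetical sketch, not the authors' pipeline: the `GeneEvidence` fields and category labels are invented for illustration and assumed to be precomputed from transcript and protein alignments, and treating single-exon candidates as only "possibly processed" is a simplification of the abstract's multi-exon/non-processed observation.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    gene_id: str
    transcript_support: bool  # any aligned transcript (mRNA/EST) evidence
    protein_support: bool     # any aligned protein evidence
    disrupted_orf: bool       # frameshift or premature stop vs. the evidence
    exon_count: int


def classify(g: GeneEvidence) -> str:
    """Assign a gene model to an evidence category (hypothetical labels)."""
    if not g.transcript_support and not g.protein_support:
        return "pseudogene_no_expression"
    if g.protein_support:
        return "supported_gene"
    if g.disrupted_orf:
        return "pseudogene_transcribed_disrupted"
    return "supported_gene"


def pseudogene_kind(g: GeneEvidence) -> str:
    # Multi-exon candidates are taken as non-processed; single-exon ones
    # are only *possibly* processed (a simplification).
    return "non-processed" if g.exon_count > 1 else "possibly processed"


genes = [
    GeneEvidence("g1", False, False, False, 1),
    GeneEvidence("g2", True, False, True, 4),
    GeneEvidence("g3", True, True, False, 6),
]
for g in genes:
    label = classify(g)
    kind = pseudogene_kind(g) if label.startswith("pseudogene") else "-"
    print(g.gene_id, label, kind)
```

In the study, the two pseudogene categories together account for the reported total: 1200 + 811 = 2011 genes.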
38
Bortoluzzi S, Scannapieco P, Cestaro A, Danieli GA, Schiaffino S. Computational reconstruction of the human skeletal muscle secretome. Proteins 2006; 62:776-92. [PMID: 16342272 DOI: 10.1002/prot.20803] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.9] [Indexed: 11/11/2022]
Abstract
In multicellular organisms, secreted proteins play pivotal regulatory roles in intercellular communication. Proteins secreted by skeletal muscle can act locally on muscle cells through autocrine/paracrine loops and on surrounding tissues such as muscle blood vessels, or they can be released into the blood stream, thus producing systemic effects. By a computational approach, we have screened 6255 products of genes expressed in normal human skeletal muscle. Putatively secreted proteins were identified by sequential steps of sieving, through prediction of signal peptide, recognition of transmembrane regions, and analysis of protein annotation. The resulting putative skeletal muscle secretome consists of 319 proteins, including 78 still uncharacterized proteins. This is the first human skeletal muscle secretome produced by computational analysis. Knowledge of proteins secreted by skeletal muscle could stimulate development of novel treatments for different diseases, including muscle atrophy and dystrophy. In addition, better knowledge of the secretion process in skeletal muscle can be useful for future gene therapy approaches.
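The sequential sieving described in this abstract (signal peptide, then transmembrane regions, then annotation) maps naturally onto a chain of filters. The sketch below is hypothetical: it assumes signal-peptide and transmembrane predictions have already been computed by external tools (not shown), and the set of excluding annotation terms is illustrative, not the one used by Bortoluzzi et al.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    accession: str
    has_signal_peptide: bool  # precomputed signal-peptide prediction
    tm_segments: int          # predicted TM helices outside the signal peptide
    annotations: set = field(default_factory=set)


# Illustrative annotation terms that argue against secretion (assumed).
EXCLUDING_TERMS = {"ER retention", "plasma membrane", "mitochondrion", "lysosome"}


def putative_secretome(proteins):
    """Apply the three sieving steps in order and return the survivors."""
    with_sp = [p for p in proteins if p.has_signal_peptide]  # step 1
    no_tm = [p for p in with_sp if p.tm_segments == 0]       # step 2
    # step 3: drop proteins whose annotation points to a non-secreted fate
    return [p for p in no_tm if not (p.annotations & EXCLUDING_TERMS)]


candidates = [
    Protein("P1", True, 0, {"extracellular space"}),
    Protein("P2", True, 1),
    Protein("P3", False, 0),
    Protein("P4", True, 0, {"ER retention"}),
]
print([p.accession for p in putative_secretome(candidates)])  # -> ['P1']
```

Keeping each step as a separate filter makes it easy to report how many of the starting gene products (6255 in the study) survive each stage.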
39
Brinkman RR, Dubé MP, Rouleau GA, Orr AC, Samuels ME. Human monogenic disorders — a source of novel drug targets. Nat Rev Genet 2006; 7:249-60. [PMID: 16534513 DOI: 10.1038/nrg1828] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Indexed: 01/08/2023]
Abstract
The decrease in new drug applications and approvals over the past several years results from an underlying crisis in drug target identification and validation. Model organisms are being used to address this problem, in combination with novel approaches such as the International HapMap Project. What has been underappreciated is that discovery of new drug targets can also be revived by traditional Mendelian genetics. A large fraction of the human gene repertoire remains phenotypically uncharacterized, and is likely to encode many unanticipated and novel phenotypes that will be of interest to pharmaceutical and biotechnological drug developers.
Affiliation(s)
- Ryan R Brinkman
- British Columbia Cancer Research Centre, University of British Columbia, Vancouver, British Columbia V5Z 1C3, Canada
40
Kemmer D, Podowski RM, Arenillas D, Lim J, Hodges E, Roth P, Sonnhammer ELL, Höög C, Wasserman WW. NovelFam3000--uncharacterized human protein domains conserved across model organisms. BMC Genomics 2006; 7:48. [PMID: 16533400 PMCID: PMC1440326 DOI: 10.1186/1471-2164-7-48] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/18/2005] [Accepted: 03/13/2006] [Indexed: 11/16/2022]
Abstract
BACKGROUND Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. DESCRIPTION From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. CONCLUSION Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.
Affiliation(s)
- Danielle Kemmer
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Raf M Podowski
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- David Arenillas
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, Canada
- Jonathan Lim
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, Canada
- Emily Hodges
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Peggy Roth
- Department of Developmental Biology, Stockholm University, Stockholm, Sweden
- Erik LL Sonnhammer
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Christer Höög
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada
41
Abstract
The human genome project has had an impact on both biological research and its political organization; this review focuses primarily on the scientific novelty that has emerged from the project but also touches on its political dimensions. The project has generated both anticipated and novel information; in the latter category are the description of the unusual distribution of genes, the prevalence of non-protein-coding genes, and the extraordinary evolutionary conservation of some regions of the genome. The applications of the sequence data are just starting to be felt in basic, rather than therapeutic, biomedical research and in the vibrant human origins and variation debates. The political impact of the project is in the unprecedented extent to which directed funding programs have emerged as drivers of basic research and the organization of the multidisciplinary groups that are needed to utilize the human DNA sequence.
Affiliation(s)
- Peter F R Little
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney 2074, New South Wales, Australia.
42
Abstract
Recent years have brought a dramatic change in our understanding of the role of ribonucleic acids (RNAs) within the cell. In addition to the well-known classes of RNAs that take part in the transmission of genetic information from DNA to proteins, a new, highly heterogeneous group of RNA molecules has emerged. The regulatory nonprotein-coding RNAs (npcRNAs) have been shown to modulate gene expression at both the transcriptional and post-transcriptional levels. They participate in chromatin modification, regulation of transcription factor activity, and control of mRNA stability, processing, and translation. npcRNAs are key factors in genetic imprinting, dosage compensation of X-chromosome-linked genes, and many processes of differentiation and development.
Affiliation(s)
- M Szymański
- Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Noskowskiego 12, 61-704 Poznan, Poland.
43
Abstract
Proteomics reveals complex protein expression, function, interactions and localization in different neuronal phenotypes. As proteomics, regarded as a highly complex screening technology, moves from a theoretical approach to practical reality, neuroscientists have to determine the most appropriate applications for this technology. Even though proteomics complements genomics, the dynamic nature of the proteome contrasts sharply with the essentially constant genome. Neuroscientists have to surmount difficulties particular to research in neuroscience, such as limited sample amounts, heterogeneous cellular compositions in samples and the fact that many proteins of interest are hydrophobic. The need for dedicated technology, sophisticated software and skilled personnel adds to the challenge. This review examines subcellular organelle isolation, protein fractionation and separation using two-dimensional gel electrophoresis (2-DGE) as well as multi-dimensional liquid chromatography (LC) followed by mass spectrometry (MS). The methods for quantifying relative gene product expression between samples (e.g., two-dimensional difference gel electrophoresis (2D-DIGE), isotope-coded affinity tags (ICAT) and iTRAQ) are elaborated. The techniques currently used to assign post-translational modification status on a proteomic scale are also reviewed. The feasible coverage of the proteome, the ability to detect unique cell components such as post-synaptic densities and membrane proteins, resource requirements, and the quantitative and qualitative reliability of different approaches are also discussed. While there are many challenges in neuroproteomics, this field promises many returns in the future.
44
Abstract
The human and mouse genomes each contain at least 12 genes encoding LIM homeodomain (LIM-HD) transcription factors. These gene regulatory proteins feature two LIM domains in their amino termini and a characteristic DNA binding homeodomain. Studies of mouse models and human patients have established that the LIM-HD factors are critical for the development of specialized cells in multiple tissue types, including the nervous system, skeletal muscle, the heart, the kidneys, and endocrine organs such as the pituitary gland and the pancreas. In this article, we review the roles of the LIM-HD proteins in mammalian development and their involvement in human diseases.
Affiliation(s)
- Chad S Hunter
- Department of Biology and The Indiana University Center for Regenerative Biology and Medicine, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202-5132, USA
45
Kemmer D, Huang Y, Shah SP, Lim J, Brumm J, Yuen MMS, Ling J, Xu T, Wasserman WW, Ouellette BFF. Ulysses - an application for the projection of molecular interactions across species. Genome Biol 2005; 6:R106. [PMID: 16356269 PMCID: PMC1414088 DOI: 10.1186/gb-2005-6-12-r106] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 02/23/2005] [Revised: 08/03/2005] [Accepted: 11/08/2005] [Indexed: 11/21/2022]
Abstract
We describe Ulysses, new software for the parallel analysis and display of protein interactions detected in various species. We developed Ulysses as a user-oriented system built around a process called Interolog Analysis, which projects model organism interaction data onto homologous human proteins; Ulysses thus serves as an accelerator for the analysis of uncharacterized human proteins. The relevance of projections was assessed and validated against published reference collections. All source code is freely available, and the Ulysses system can be accessed via a web interface.
Affiliation(s)
- Danielle Kemmer
- Center for Genomics and Bioinformatics, Karolinska Institutet, 171 77 Stockholm, Sweden
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Yong Huang
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Sohrab P Shah
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Jonathan Lim
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Jochen Brumm
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Macaire MS Yuen
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- John Ling
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Tao Xu
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- BF Francis Ouellette
- UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
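Interolog projection, as described in the Ulysses abstract, transfers an interaction observed between two model-organism proteins to their human homologs. A minimal sketch, assuming the homology mapping is already available as a plain dictionary; the identifiers are hypothetical and this is not the Ulysses code itself.

```python
from itertools import product


def project_interologs(model_interactions, homologs):
    """Project model-organism interactions onto human proteins.

    model_interactions: iterable of (a, b) protein-ID pairs in the model organism.
    homologs: dict mapping a model protein ID to a list of human homolog IDs.
    Returns a set of human pairs, each stored as a sorted tuple so that
    (x, y) and (y, x) count as the same interaction.
    """
    projected = set()
    for a, b in model_interactions:
        # Every combination of homologs of the two partners is a candidate
        # interolog; a pair with an unmapped partner produces nothing.
        for ha, hb in product(homologs.get(a, []), homologs.get(b, [])):
            projected.add(tuple(sorted((ha, hb))))
    return projected


model_pairs = [("m1", "m2"), ("m2", "m3")]
homologs = {"m1": ["H1"], "m2": ["H2", "H3"]}  # m3 has no human homolog
print(sorted(project_interologs(model_pairs, homologs)))
# -> [('H1', 'H2'), ('H1', 'H3')]
```

Because one model protein can have several human homologs, a single observed interaction may fan out into several projected ones, which is why the relevance of projections needs to be checked against reference collections, as the abstract describes.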
46
Neverov AD, Artamonova II, Nurtdinov RN, Frishman D, Gelfand MS, Mironov AA. Alternative splicing and protein function. BMC Bioinformatics 2005; 6:266. [PMID: 16274476 PMCID: PMC1298288 DOI: 10.1186/1471-2105-6-266] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Received: 06/15/2005] [Accepted: 11/07/2005] [Indexed: 11/16/2022]
Abstract
Background Alternative splicing is a major mechanism of generating protein diversity in higher eukaryotes. Although at least half, and probably more, of mammalian genes are alternatively spliced, it was not clear whether the frequency of alternative splicing is the same in different functional categories. The problem is obscured by uneven coverage of genes by ESTs and a large number of artifacts in the EST data. Results We have developed a method that generates possible mRNA isoforms for human genes contained in the EDAS database, taking into account the effects of nonsense-mediated decay and translation initiation rules, and a procedure for offsetting the effects of uneven EST coverage. We then computed the number of mRNA isoforms for genes from different functional categories. Genes encoding ribosomal proteins and genes in the category "Small GTPase-mediated signal transduction" tend to have fewer isoforms than average, whereas genes in the category "DNA replication and chromosome cycle" have more isoforms than average. Genes encoding proteins involved in protein-protein interactions tend to be alternatively spliced more often than genes encoding non-interacting proteins, although there is no significant difference in the number of isoforms of alternatively spliced genes. Conclusion Filtering for functional isoforms satisfying biological constraints and accounting for uneven EST coverage allowed us to describe differences in alternative splicing of genes from different functional categories. The observations seem to be consistent with expectations based on current biological knowledge: fewer isoforms for ribosomal and signal transduction proteins, and more alternative splicing of interacting and cell cycle proteins.
Affiliation(s)
- AD Neverov
- State Scientific Center GosNIIGenetika, 1st Dorozhny proezd 1, Moscow, 117545, Russia
- II Artamonova
- Institute for Bioinformatics/MIPS, GSF – National Research Center for Environment and Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- RN Nurtdinov
- Department of Bioengineering and Bioinformatics, M.V.Lomonosov Moscow State University, Vorobievy Gory 1–73, Moscow, 119992, Russia
- D Frishman
- Institute for Bioinformatics/MIPS, GSF – National Research Center for Environment and Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Department of Genome Oriented Bioinformatics, Technical University of Munich, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- MS Gelfand
- State Scientific Center GosNIIGenetika, 1st Dorozhny proezd 1, Moscow, 117545, Russia
- Department of Bioengineering and Bioinformatics, M.V.Lomonosov Moscow State University, Vorobievy Gory 1–73, Moscow, 119992, Russia
- Institute for Information Transmission Problems RAS, Bolshoi Karetny pereulok 19, Moscow, 127994, Russia
- AA Mironov
- State Scientific Center GosNIIGenetika, 1st Dorozhny proezd 1, Moscow, 117545, Russia
- Department of Bioengineering and Bioinformatics, M.V.Lomonosov Moscow State University, Vorobievy Gory 1–73, Moscow, 119992, Russia
47
Yanes O, Villanueva J, Querol E, Aviles FX. Functional Screening of Serine Protease Inhibitors in the Medical Leech Hirudo medicinalis Monitored by Intensity Fading MALDI-TOF MS. Mol Cell Proteomics 2005; 4:1602-13. [PMID: 16030009 DOI: 10.1074/mcp.m500145-mcp200] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
The blood-feeding invertebrates are a rich biological source of drugs and lead compounds to treat cardiovascular diseases because they have evolved highly efficient mechanisms to feed on their hosts by blocking blood coagulation. In this work, we focused our attention on the leech Hirudo medicinalis. Using "intensity fading" MALDI-TOF mass spectrometry, we performed a comprehensive detection and functional analysis of pre-existing peptides and small proteins capable of binding to trypsin-like proteases related to blood coagulation. Combining "intensity fading MS" and off-line LC prefractionation allowed us to detect more than 75 molecules present in the leech extract that interact specifically with a trypsin-like protease, out of a sample profile of nearly 2,000 different peptides/proteins in the 2-20-kDa range. Moreover, we resolved 232 individual components from the complex mixture, 13 of which have high sequence homology with previously described serine protease inhibitors. Our findings indicate that such extracts are much more complex than expected. Additionally, intensity fading MS, when complemented with LC separation strategies, seems to be a useful tool to investigate complex biological samples, establishing a new bridge between profiling, functional peptidomics, and subsequent drug discovery.
Affiliation(s)
- Oscar Yanes
- Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain
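The idea behind intensity fading MS, as described in the Yanes et al. abstract, is that peptides binding the target protease show a drop in MALDI signal when the protease is present. A hypothetical sketch of that comparison, assuming peaks have already been matched between a control spectrum and a spectrum acquired with the target, and that intensities are normalised; the 50% threshold and the m/z values are arbitrary choices for illustration.

```python
def fading_candidates(control, with_target, min_fading=0.5):
    """Flag peaks whose intensity drops when the target protease is added.

    control, with_target: dicts mapping a matched peak m/z to its
    normalised intensity.
    min_fading: minimum fractional intensity loss needed to flag a peak.
    Returns {m/z: fractional loss} for flagged peaks.
    """
    hits = {}
    for mz, before in control.items():
        if before <= 0:
            continue
        # A peak absent from the target spectrum counts as fully faded.
        after = with_target.get(mz, 0.0)
        fading = 1.0 - after / before
        if fading >= min_fading:
            hits[mz] = round(fading, 3)
    return hits


control = {6978.2: 1000.0, 7123.5: 800.0, 5210.0: 500.0}
with_target = {6978.2: 200.0, 7123.5: 760.0}
print(fading_candidates(control, with_target))  # -> {6978.2: 0.8, 5210.0: 1.0}
```

Treating a missing peak as fully faded is a design choice of this sketch; a stricter variant could require the peak to be detected in both spectra.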
48
Orth AP, Batalov S, Perrone M, Chanda SK. The promise of genomics to identify novel therapeutic targets. Expert Opin Ther Targets 2005; 8:587-96. [PMID: 15584864 DOI: 10.1517/14728222.8.6.587] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/05/2022]
Abstract
The cataloguing of the human genome has provided an unprecedented prospectus for target identification and drug discovery. A current analysis indicates that slightly more than 3000 unique protein encoding loci are potentially amenable to pharmacological intervention (the 'druggable genome', which can be queried at http://function.gnf.org/druggable). However, the assessment of genome sequence data has not resulted in the anticipated acceleration of novel therapeutic developments. The basis for this shortfall lies in the significant attrition rates endemic to preclinical/clinical development, as well as the often underestimated complexity of gene function in higher order biological systems. To address the latter issue, a number of strategies have emerged to facilitate genomics-driven target identification and validation, including cellular profiling of gene function, in silico modelling of gene networks, and systematic analyses of protein complexes. The expectation is that the integration of these and other systems-based technologies may enable the conversion of potential genomic targets into functionally validated molecules, and result in practicable gene-based drug discovery pipelines.
Affiliation(s)
- Anthony P Orth
- The Genomics Institute of the Novartis Research Foundation, 10675 John J. Hopkins Drive, San Diego, CA 92121, USA
49
Shin JH, Krapfenbauer K, Lubec G. Column chromatographic prefractionation leads to the detection of 543 different gene products in human fetal brain. Electrophoresis 2005; 26:2759-78. [PMID: 15966016 DOI: 10.1002/elps.200500051] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
In a previous publication, a large series of proteins was identified in fetal human brain by two-dimensional electrophoresis (2-DE) with subsequent matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) and MALDI-tandem time-of-flight (TOF/TOF) analysis. Identifying many more spots by traditional 2-DE without an additional step, such as narrow immobilized pH gradient (IPG) strips or prefractionation, seemed unlikely, so we separated proteins extracted from fetal brain of the early second trimester by ion-exchange chromatography on a TSK gel DEAE-5PW column, followed by 2-DE of individual fractions and analysis by MALDI-TOF/TOF with LIFT technology. About 1880 protein spots corresponding to 543 different gene products were identified. These proteins included housekeeping, signaling, cytoskeletal, metabolic, antioxidant, and neuron/synaptosome-specific proteins. Among these, 314 gene products (314/543, 57.8%) that had never been detected by traditional 2-DE of human fetal brain were observed with this method. This updated map may serve as a database and reference map for fetal brain proteins, and the methodology applied may be a valuable analytical tool for protein expression studies in health and disease.
Affiliation(s)
- Joo-Ho Shin
- Department of Pediatrics, Medical University of Vienna, Vienna, Austria
50
Righetti PG, Castagna A, Antonioli P, Boschetti E. Prefractionation techniques in proteome analysis: the mining tools of the third millennium. Electrophoresis 2005; 26:297-319. [PMID: 15657944 DOI: 10.1002/elps.200406189] [Citation(s) in RCA: 235] [Impact Index Per Article: 11.8] [Indexed: 11/06/2022]
Abstract
The present review deals with prefractionation protocols used in proteomic investigation in preparation for mass spectrometry (MS) or two-dimensional electrophoresis (2-DE) map analysis. The methods reported start with differential centrifugation of cell organelles and chromatographic approaches, and continue in extenso with a panoply of electrophoretic methods. In chromatography, prefractionation steps using affinity, ion-exchange, and reversed-phase resins revealed several hundred new species previously undetected in unfractionated samples. Novel chromatographic prefractionation methods are also discussed, such as a multistage fractionation column consisting of a set of immobilized chemistries serially connected in a stack format (an assembly of seven blocks), each capable of harvesting a given protein population. Such a method substantially reduces the complexity of treated samples while concentrating species, resulting in a larger number of proteins visible by MS or 2-DE. Electrophoretic prefractionation protocols include all electrokinetic methodologies performed in free solution, essentially all relying on isoelectric focusing steps (although some approaches based on gels and granulated media are also discussed). Devices associated with electrophoretic separation are multichamber apparatus, such as multicompartment electrolyzers equipped with either isoelectric membranes or isoelectric beads. Multicup device electrophoresis and several other devices exploiting conventional carrier ampholyte focusing are also reviewed. The review further reports sample treatments for detecting low-abundance species, with special emphasis on reducing the concentration differences between the proteins constituting a sample. This approach uses a library of combinatorial ligands coupled to small beads. Such a library comprises hexameric ligands built from 20 amino acids, resulting in millions of different structures. When these beads are impregnated with complex proteomes (e.g., human sera) of widely differing protein compositions, they significantly reduce the concentration differences, greatly enhancing the possibility of detecting low-abundance species. It is felt that this panoply of methods could offer a strong step forward in "mining below the tip of the iceberg" for detecting the "unseen proteome".
Affiliation(s)
- Pier Giorgio Righetti
- University of Verona, Department of Industrial and Agricultural Biotechnologies, Verona, Italy