1
Wang MY, Zhang BL, Liang QQ, Lian XM, Zhang K, Yang QE, Yang WK. Chromosome-level genome assembly, annotation, and population genomic resource of argali (Ovis ammon). Sci Data 2025; 12:57. [PMID: 39799149 PMCID: PMC11724849 DOI: 10.1038/s41597-025-04400-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 01/02/2025] [Indexed: 01/15/2025] Open
Abstract
Argali stands as the largest species among wild sheep in Central and East Asia, with a concerning rate of decline estimated at 30%. The intraspecific taxonomy of argali remains contentious due to limited genomic data and unclear geographic separation. In this study, we constructed a chromosome-level genome assembly and annotation for the Tibetan argali (O. a. hodgsoni), together with population genomic resequencing of 32 individuals representing four subspecies. The contig-level genome was 2.64 Gb in size, with a contig N50 length of 71.69 Mb and an estimated genomic completeness of 96.01%. Using Hi-C sequencing data for scaffolding, 99.90% of the initially assembled sequences were mapped and oriented onto 28 pseudo-chromosomes, excluding the Y chromosome. Annotation uncovered 21,564 protein-coding genes and 46.38% repeat sequences. The average coverage of the population resequencing data was 23.74×, with a mean mapping ratio of up to 97.19%. The high-quality genome assembly and annotation of the Tibetan argali, coupled with the high-depth population genomic data, will serve as a valuable genetic resource for studies on the taxonomy and conservation of argali.
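For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how such a value is commonly computed. It assumes the usual definition (the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly size). The function name and the toy lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the cumulative sum
    first reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300, half is 150, reached at the 70-unit contig
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is reported alongside total size and completeness.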
Affiliation(s)
- Mu-Yang Wang
- Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China
- China-Tajikistan Belt and Road Joint Laboratory on Biodiversity Conservation and Sustainable Use, Urumqi, 830011, China
- Xinjiang Key Laboratory of Biodiversity Conservation and Application in Arid lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China
- Bao-Lin Zhang
- Key Laboratory of Genetic Evolution & Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650107, China
- Yunnan Key Laboratory of Biodiversity Information, Kunming, Yunnan, 650223, China
- Qi-Qi Liang
- Beijing Bio Huaxing Gene Technology Co., LTDs, Beijing, 100049, China
- Xin-Ming Lian
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China
- Ke Zhang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China
- Qi-En Yang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China.
- Wei-Kang Yang
- Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China.
- China-Tajikistan Belt and Road Joint Laboratory on Biodiversity Conservation and Sustainable Use, Urumqi, 830011, China.
- Xinjiang Key Laboratory of Biodiversity Conservation and Application in Arid lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China.
2
Betschart RO, Riccio C, Aguilera-Garcia D, Blankenberg S, Guo L, Moch H, Seidl D, Solleder H, Thalén F, Thiéry A, Twerenbold R, Zeller T, Zoche M, Ziegler A. Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control. Biom J 2024; 66:e202300278. [PMID: 38988195 DOI: 10.1002/bimj.202300278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 07/12/2024]
Abstract
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not been working with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time scaled linearly with genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
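As an illustration of one QC metric named in this abstract, the sketch below computes a per-sample Het/Hom ratio from VCF-style genotype strings. It assumes the common definition (heterozygous calls divided by homozygous non-reference calls) and diploid genotypes; the abstract does not state the exact definition used in the study, and the function name and toy genotypes are hypothetical.

```python
def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous non-reference calls.

    Genotypes are VCF-style strings such as "0/1" or "1|1".
    Missing calls (".") and non-diploid calls are skipped, and
    homozygous reference calls ("0/0") count toward neither class.
    """
    het = 0
    hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        first, second = alleles
        if first != second:
            het += 1
        elif first != "0":
            hom_alt += 1
    if hom_alt == 0:
        raise ValueError("no homozygous non-reference calls")
    return het / hom_alt


# Hypothetical genotype calls for one sample, not data from the study.
toy_calls = ["0/1", "1/1", "0|1", "0/0", "./.", "1/2", "1/1"]
print(het_hom_ratio(toy_calls))  # 3 heterozygous / 2 homozygous alt = 1.5
```

Samples whose ratio deviates strongly from the cohort are typically flagged for follow-up, for example as possible contamination or relatedness artifacts.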
Affiliation(s)
- Domingo Aguilera-Garcia
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Stefan Blankenberg
- Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Linlin Guo
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Holger Moch
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Dagmar Seidl
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Hugo Solleder
- Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Felix Thalén
- Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Raphael Twerenbold
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Tanja Zeller
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Martin Zoche
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Davos, Switzerland
- Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
3
Hadebe MT, Malgwi SA, Okpeku M. Revolutionizing Malaria Vector Control: The Importance of Accurate Species Identification through Enhanced Molecular Capacity. Microorganisms 2023; 12:82. [PMID: 38257909 PMCID: PMC10818655 DOI: 10.3390/microorganisms12010082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 12/08/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024] Open
Abstract
Many factors, such as the resistance to pesticides and a lack of knowledge of the morphology and molecular structure of malaria vectors, have made it more challenging to eradicate malaria in numerous malaria-endemic areas of the globe. The primary goal of this review is to discuss malaria vector control methods and the significance of identifying species in vector control initiatives. This was accomplished by reviewing methods of molecular identification of malaria vectors and genetic marker classification in relation to their use for species identification. Due to its specificity and consistency, molecular identification is preferred over morphological identification of malaria vectors. Enhanced molecular capacity for species identification will improve mosquito characterization, leading to accurate control strategies/treatment targeting specific mosquito species, and thus will contribute to malaria eradication. It is crucial for disease epidemiology and surveillance to accurately identify the Plasmodium spp. that are causing malaria in patients. The capacity for disease surveillance will be significantly increased by the development of more accurate, precise, automated, and high-throughput diagnostic techniques. In conclusion, although morphological identification is quick and achievable at a reduced cost, molecular identification is preferred for specificity and sensitivity. To achieve the targeted malaria elimination goal, proper identification of vectors using accurate techniques for effective control measures should be prioritized.
Affiliation(s)
- Moses Okpeku
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville, Durban 4000, South Africa
4
Shushan A, Luria N, Lachman O, Sela N, Laskar O, Belausov E, Smith E, Dombrovsky A. Characterization of a novel psyllid-transmitted waikavirus in carrots. Virus Res 2023; 335:199192. [PMID: 37558054 PMCID: PMC10448213 DOI: 10.1016/j.virusres.2023.199192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 07/19/2023] [Accepted: 08/06/2023] [Indexed: 08/11/2023]
Abstract
Carrots collected from the Western Negev region in Israel during the winter of 2019 showed disease symptoms of chlorosis, leaf curling, a loss of apical dominance, and multiple lateral roots that were not associated with known pathogens of the carrot yellows disease. Symptomatic carrots were studied for a possible involvement of plant viruses in disease manifestations using high throughput sequencing analyses. The results revealed the presence of a waikavirus, sharing a ∼70% nucleotide sequence identity with Waikavirus genus members. Virions purified from waikavirus-positive carrots were visualized by transmission electron microscopy, showing icosahedral particles ∼28 nm in diameter. The genome sequence was validated with overlapping amplicons generated using 12 designed primer sets. A complete genome sequence was achieved by rapid amplification of cDNA ends (RACE) for sequencing the 5' end, and RT-PCR with oligo dT for sequencing the 3' end. The genome encodes a single large ORF, characteristic of waikaviruses. Aligning the waikavirus-deduced amino-acid sequence with other waikavirus species at the Pro-Pol region, a conserved sequence between the putative proteinase and the RNA-dependent RNA polymerase, showed a ∼40% identity, indicating the identification of a new waikavirus species. The amino-acid sequences of the three coat proteins and their cleavage sites were experimentally determined by liquid chromatography-mass spectrometry. A phylogenetic analysis based on the Pro-Pol region revealed that the new waikavirus clusters with persimmon waikavirus and actinidia yellowing virus 1. The new waikavirus genome was localized in the phloem of waikavirus-infected carrots. The virus was transmitted to carrot and coriander plants by the psyllid Bactericera trigonica Hodkinson (Hemiptera: Triozidae).
Affiliation(s)
- Ariel Shushan
- Department of Plant Pathology and Weed Research, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeTsiyon 7528809, Israel; The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot 761001, Israel
- Neta Luria
- Department of Plant Pathology and Weed Research, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeTsiyon 7528809, Israel
- Oded Lachman
- Department of Plant Pathology and Weed Research, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeTsiyon 7528809, Israel
- Noa Sela
- Bioinformatics Unit, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeZion 7505101, Israel
- Orly Laskar
- Department of Infectious Diseases, Israel Institute for Biological Research, P.O.B 19, Ness Ziona 74100, Israel
- Eduard Belausov
- Department of Ornamental Plants and Agricultural Biotechnology, Agricultural Research Organization, The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeZion 7505101, Israel
- Elisheva Smith
- Department of Plant Pathology and Weed Research, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeTsiyon 7528809, Israel
- Aviv Dombrovsky
- Department of Plant Pathology and Weed Research, Agricultural Research Organization-The Volcani Center, 68 HaMaccabim Road, P.O.B 15159, Rishon LeTsiyon 7528809, Israel.
5
McBride DJ, Fielding C, Newington T, Vatsiou A, Fischl H, Bajracharya M, Thomson VS, Fraser LJ, Fujita PA, Becq J, Kingsbury Z, Ross MT, Moat SJ, Morgan S. Whole-Genome Sequencing Can Identify Clinically Relevant Variants from a Single Sub-Punch of a Dried Blood Spot Specimen. Int J Neonatal Screen 2023; 9:52. [PMID: 37754778 PMCID: PMC10532340 DOI: 10.3390/ijns9030052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/01/2023] [Accepted: 09/06/2023] [Indexed: 09/28/2023] Open
Abstract
The collection of dried blood spots (DBS) facilitates newborn screening for a variety of rare, but very serious conditions in healthcare systems around the world. Sub-punches of varying sizes (1.5-6 mm) can be taken from DBS specimens to use as inputs for a range of biochemical assays. Advances in DNA sequencing workflows allow whole-genome sequencing (WGS) libraries to be generated directly from inputs such as peripheral blood, saliva, and DBS. We compared WGS metrics obtained from libraries generated directly from DBS to those generated from DNA extracted from peripheral blood, the standard input for this type of assay. We explored the flexibility of DBS as an input for WGS by altering the punch number and size as inputs to the assay. We showed that WGS libraries can be successfully generated from a variety of DBS inputs, including a single 3 mm or 6 mm diameter punch, with equivalent data quality observed across a number of key metrics of importance in the detection of gene variants. We observed no difference in the performance of DBS and peripheral-blood-extracted DNA in the detection of likely pathogenic gene variants in samples taken from individuals with cystic fibrosis or phenylketonuria. WGS can be performed directly from DBS and is a powerful method for the rapid discovery of clinically relevant, disease-causing gene variants.
Affiliation(s)
- Stuart J. Moat
- Wales Newborn Screening Laboratory, University Hospital of Wales, Cardiff CF14 4XW, UK
- School of Medicine, Cardiff University, Cardiff CF14 4XW, UK
- Sian Morgan
- All Wales Genetics Laboratory, University Hospital of Wales, Cardiff CF14 4XW, UK
6
Shin JW, Shin A, Park SS, Lee JM. Haplotype-specific insertion-deletion variations for allele-specific targeting in Huntington's disease. Mol Ther Methods Clin Dev 2022; 25:84-95. [PMID: 35356757 PMCID: PMC8933729 DOI: 10.1016/j.omtm.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/07/2021] [Accepted: 03/01/2022] [Indexed: 11/25/2022]
Abstract
Huntington's disease (HD) is a dominantly inherited neurodegenerative disease caused by an expanded CAG repeat in huntingtin (HTT). Given an important role for HTT in development and significant neurodegeneration at the time of clinical manifestation in HD, early treatment of allele-specific drugs represents a promising strategy. The feasibility of an allele-specific antisense oligonucleotide (ASO) targeting single-nucleotide polymorphisms (SNPs) has been demonstrated in models of HD. Here, we constructed a map of haplotype-specific insertion-deletion variations (indels) to develop alternative mutant-HTT-specific strategies. We mapped indels annotated in the 1000 Genomes Project data on common HTT haplotypes, revealing candidate indels for mutant-specific HTT targeting. Subsequent sequencing of an HD family confirmed candidate sites and revealed additional allele-specific indels. Interestingly, the most common normal HTT haplotype carries indels of big allele length differences at many sites, further uncovering promising haplotype-specific targets. When patient-derived cells carrying the most common HTT diplotype were treated with ASOs targeting the mutant alleles of candidate indels (rs772629195 or rs72239206), complete mutant specificity was observed. In summary, our map of haplotype-specific indels permits the identification of allele-specific targets in HD subjects, potentially contributing to the development of safe HTT-lowering therapeutics that are suitable for early treatment in HD.
Affiliation(s)
- Jun Wan Shin
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Harvard Medical School, Boston, MA 02115, USA
- Aram Shin
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Seri S Park
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Jong-Min Lee
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Harvard Medical School, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute of M.I.T. and Harvard, Cambridge, MA 02142, USA
7
Schüle S, Ostheim P, Port M, Abend M. Identifying radiation responsive exon-regions of genes often used for biodosimetry and acute radiation syndrome prediction. Sci Rep 2022; 12:9545. [PMID: 35680903 PMCID: PMC9184472 DOI: 10.1038/s41598-022-13577-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 05/17/2022] [Indexed: 11/12/2022] Open
Abstract
Gene expression (GE) analysis of FDXR, DDB2, WNT3 and POU2AF1 is a promising approach for identification of clinically relevant groups (unexposed, low- and high-exposed) after radiological/nuclear events. However, results from international biodosimetry exercises have shown differences in dose estimates based on radiation-induced GE of the four genes. Also, differences between GE measured with next-generation sequencing (NGS) and its validation with quantitative real-time polymerase chain reaction (qRT-PCR) were reported. These discrepancies could be caused by radiation-responsive differences among exons of the same gene. We performed GE analysis with qRT-PCR using TaqMan-assays covering all exon-regions of FDXR, DDB2, WNT3 and POU2AF1. Peripheral whole blood from three healthy donors was X-irradiated with 0, 0.5 and 4 Gy. After 24 and 48 h a dose-dependent up-regulation across almost all exon-regions for FDXR and DDB2 (4–42-fold) was found. A down-regulation for POU2AF1 (two- to threefold) and WNT3 (< sevenfold) at the 3'-end was found at 4 Gy irradiation only. Hence, this confirms our hypothesis for radiation-responsive exon-regions for WNT3 and POU2AF1, but not for FDXR and DDB2. Finally, we identified the most promising TaqMan-assays for FDXR (e.g. AR7DTG3, Hs00244586_m1), DDB2 (AR47X6H, Hs03044951_m1), WNT3 (Hs00902258_m1, Hs00902257_m1) and POU2AF1 (Hs01573370_g1, Hs01573371_m1) for biodosimetry purposes and acute radiation syndrome prediction, considering several criteria (detection limit, dose dependency, time persistency, inter-individual variability).
Affiliation(s)
- Simone Schüle
- Bundeswehr Institute of Radiobiology Affiliated to the University Ulm, Neuherbergstr. 11, 80937, Munich, Germany
- Patrick Ostheim
- Bundeswehr Institute of Radiobiology Affiliated to the University Ulm, Neuherbergstr. 11, 80937, Munich, Germany
- Matthias Port
- Bundeswehr Institute of Radiobiology Affiliated to the University Ulm, Neuherbergstr. 11, 80937, Munich, Germany
- Michael Abend
- Bundeswehr Institute of Radiobiology Affiliated to the University Ulm, Neuherbergstr. 11, 80937, Munich, Germany.
8
Akoniyon OP, Adewumi TS, Maharaj L, Oyegoke OO, Roux A, Adeleke MA, Maharaj R, Okpeku M. Whole Genome Sequencing Contributions and Challenges in Disease Reduction Focused on Malaria. BIOLOGY 2022; 11:587. [PMID: 35453786 PMCID: PMC9027812 DOI: 10.3390/biology11040587] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/13/2022] [Revised: 03/31/2022] [Accepted: 04/01/2022] [Indexed: 12/11/2022]
Abstract
Malaria elimination remains an important goal that requires the adoption of sophisticated science and management strategies in the era of the COVID-19 pandemic. The advent of next generation sequencing (NGS) is making whole genome sequencing (WGS) a standard today in the field of life sciences, as PCR genotyping and targeted sequencing provide insufficient information compared to the whole genome. Thus, adapting WGS approaches to malaria parasites is pertinent to studying the epidemiology of the disease, as different regions are at different phases in their malaria elimination agenda. Therefore, this review highlights the applications of WGS in disease management, challenges of WGS in controlling malaria parasites, and in furtherance, provides the roles of WGS in pursuit of malaria reduction and elimination. WGS has invaluable impacts in malaria research and has helped countries to reach elimination phase rapidly by providing required information needed to thwart transmission, pathology, and drug resistance. However, to eliminate malaria in sub-Saharan Africa (SSA), with high malaria transmission, we recommend that WGS machines should be readily available and affordable in the region.
Affiliation(s)
- Olusegun Philip Akoniyon
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Taiye Samson Adewumi
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Leah Maharaj
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Olukunle Olugbenle Oyegoke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Alexandra Roux
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Matthew A. Adeleke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
- Rajendra Maharaj
- Office of Malaria Research, South African Medical Research Council, Cape Town 7505, South Africa
- Moses Okpeku
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville Campus, Durban 4041, South Africa
9
Integrated Genomic Analysis Identifies ANKRD36 Gene as a Novel and Common Biomarker of Disease Progression in Chronic Myeloid Leukemia. BIOLOGY 2021; 10:biology10111182. [PMID: 34827175 PMCID: PMC8615070 DOI: 10.3390/biology10111182] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2021] [Revised: 09/01/2021] [Accepted: 09/03/2021] [Indexed: 02/05/2023]
Abstract
Simple Summary: Chronic myeloid leukemia is a type of blood cancer that is regarded as a success story in determining the exact biological origin, pathogenesis and development of a molecularly targeted (mutation-specific) therapy that has led to successful treatment of this fatal cancer. It is caused by the BCR-ABL fusion gene, which is formed from the translocation between chromosomes 9 and 22. Anti-BCR-ABL drugs, known as tyrosine kinase inhibitors (TKIs), have led to long-term remissions in more than 80% of CML patients and even cure in about one-third of patients. Nevertheless, many patients face drug resistance, and disease progression occurs in about 30% of CML patients, leading to morbidities and mortality. Unfortunately, no biomarkers of CML progression are available due to a poor understanding of the mechanism of progression. Therefore, finding reliable molecular biomarkers of CML progression is one of the most attractive research areas in 21st-century cancer research. In this study, we report novel genomic variants exclusively found in all our advanced-phase CML patients. This study will help in identifying CML patients at risk of disease progression and timely therapeutic interventions to avoid or at least delay fatal disease progression in this cancer.
Background: Chronic myeloid leukemia (CML) is initiated in bone marrow due to chromosomal translocation t(9;22) leading to fusion oncogene BCR-ABL. Targeting BCR-ABL by tyrosine kinase inhibitors (TKIs) has changed fatal CML into an almost curable disease. Despite that, TKIs lose their effectiveness due to disease progression. Unfortunately, the mechanism of CML progression is poorly understood and common biomarkers for CML progression are unavailable. This study was conducted to find novel biomarkers of CML progression by employing whole-exome sequencing (WES).
Materials and Methods: WES of accelerated phase (AP) and blast crisis (BC) CML patients was carried out, with chronic-phase CML (CP-CML) patients as control. After DNA library preparation and exome enrichment, clustering and sequencing were carried out using Illumina platforms. Statistical analysis was carried out using SAS/STAT software version 9.4, and R package was employed to find mutations shared exclusively by all AP-/BC-CML patients. Confirmation of mutations was carried out using Sanger sequencing and protein structure modeling using I-TASSER followed by mutant generation and visualization using PyMOL.
Results: Three novel genes (ANKRD36, ANKRD36B and PRSS3) were mutated exclusively in all AP-/BC-CML patients. Only ANKRD36 gene mutations (c.1183_1184 delGC and c.1187_1185 dupTT) were confirmed by Sanger sequencing. Protein modeling studies showed that mutations induce structural changes in ANKRD36 protein.
Conclusions: Our studies show that ANKRD36 is a potential common biomarker and drug target of early CML progression. ANKRD36 is yet uncharacterized in humans. It has the highest expression in bone marrow, specifically myeloid cells. We recommend carrying out further studies to explore the role of ANKRD36 in the biology and progression of CML.
10
Tsang KY, Chan TCH, Yeung MCW, Wong TK, Lau WT, Mak CM. Validation of amplicon-based next generation sequencing panel for second-tier test in newborn screening for inborn errors of metabolism. J LAB MED 2021. [DOI: 10.1515/labmed-2021-0115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Objectives
Next generation sequencing (NGS) technology has allowed cost-effective massive parallel DNA sequencing. To evaluate the utility of NGS for newborn screening (NBS) of inborn errors of metabolism (IEM), a custom panel was designed to target 87 disease-related genes. The pilot study was primarily proposed for second-tier testing under the NBSIEM program in Hong Kong.
Methods
The validation of the panel was performed with two reference genomes and an external quality assurance (EQA) sample. Sequencing libraries were synthesized with amplicon-based approach. The libraries were pooled, spiked-in with 2% PhiX DNA as technical control, for 16-plex sequencing runs. Sequenced reads were analyzed using a commercially available pipeline.
Results
The average target region coverage was 208× and the fraction of region with target depth ≥20× was 95.7%, with a sensitivity of 91.2%. There were 85 out of 87 genes with acceptable coverage, and EQA result was satisfactory. The turnaround time from DNA extraction to completion of variant calling and quality control (QC) procedures was 2.5 days.
Conclusions
The NGS approach with the amplicon-based panel has been validated for analytical performance and is suitable for second-tier NBSIEM testing.
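The two coverage figures reported in the Results (average target coverage and the fraction of the target region at ≥20× depth) can be illustrated with a minimal Python sketch. This is not the commercial pipeline used in the study; the function name `coverage_summary` and the per-base depth list are hypothetical, and the 20× threshold is taken from the abstract.

```python
def coverage_summary(depths, min_depth=20):
    """Summarise per-base read depths over a target region.

    depths    -- hypothetical list of integer depths, one per target base
    min_depth -- depth threshold (20x, as quoted in the abstract)
    Returns (mean depth, fraction of bases with depth >= min_depth).
    """
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    mean_depth = sum(depths) / n
    fraction_covered = sum(1 for d in depths if d >= min_depth) / n
    return mean_depth, fraction_covered


if __name__ == "__main__":
    mean_depth, fraction = coverage_summary([10, 20, 30, 40])
    print(f"mean depth {mean_depth:.1f}x, fraction >=20x {fraction:.1%}")
```

In practice the depths would come from an alignment-based depth report restricted to the panel's target intervals; that step is left out here.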
Affiliation(s)
- Kwok Yeung Tsang
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Department of Pathology, Division of Chemical Pathology, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Toby Chun Hei Chan
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Department of Pathology, Division of Chemical Pathology, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Matthew Chun Wing Yeung
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Department of Pathology, Division of Chemical Pathology, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Tsz Ki Wong
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Wan Ting Lau
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Chloe Miu Mak
- Newborn Screening for Inborn Errors of Metabolism Laboratory, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
- Department of Pathology, Division of Chemical Pathology, Hong Kong Children's Hospital, Hong Kong SAR, P.R. China
|
11
|
Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021; 9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open
Abstract
Over the last few decades, genomics has been leading toward an audacious future and has been changing our views about conducting biomedical research, studying diseases, and understanding diversity in our society across the human species. Whole genome and exome sequencing (WGS/WES) are two of the most popular next-generation sequencing (NGS) methodologies currently used to detect genetic variations of clinical significance. Investigating WGS/WES data for variant discovery and genotyping relies on a nexus of different data analytic applications. Although several bioinformatics applications have been developed, and many of them are freely available and published, finding and interpreting genetic variants in a timely manner remains challenging for diagnostic laboratories and clinicians. In this study, we are interested in understanding, evaluating, and reporting the current state of solutions available to process NGS data of variable lengths and types for the identification of variants, alleles, and haplotypes. Within this scope, we consulted high-quality peer-reviewed literature published in the last 10 years. We focused on the standalone and networked bioinformatics applications proposed to efficiently process WGS and WES data and to support downstream analysis for gene-variant discovery, annotation, prediction, and interpretation. We discuss our findings in this manuscript, which include but are not limited to the set of operations, workflow, data handling, involved tools, technologies and algorithms, and limitations of the assessed applications.
Affiliation(s)
- Zeeshan Ahmed
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA; Department of Medicine, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Eduard Gibert Renart
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Saman Zeeshan
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
|
12
|
Liu Y, Yu Z, Dinger ME, Li J. Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression. Bioinformatics 2020; 35:2066-2074. [PMID: 30407482 DOI: 10.1093/bioinformatics/bty936] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/07/2018] [Revised: 11/04/2018] [Accepted: 11/07/2018] [Indexed: 01/23/2023] Open
Abstract
MOTIVATION: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been specially designed to reduce the size of these datasets for efficient storage and transmission. Reordering reads with regard to their positions in de novo assembled contigs or in explicit reference sequences has proven to be one of the most effective read compression approaches. As there is usually no good prior knowledge about the reference sequence, the current focus is on the novel construction of de novo assembled contigs. RESULTS: We introduce a new de novo compression algorithm named minicom. This algorithm uses large k-minimizers to index the reads and subgroup those that have the same minimizer. Within each subgroup, a contig is constructed. Then some pairs of the contigs derived from the subgroups are merged into longer contigs according to a (w, k)-minimizer-indexed suffix-prefix overlap similarity between two contigs. This merging process is repeated after the longer contigs are formed until no pair of contigs can be merged. We compare the performance of minicom with two reference-based methods and four de novo methods on 18 datasets (13 RNA-seq datasets and 5 whole genome sequencing datasets). In the compression of single-end reads, minicom obtained the smallest file size for 22 of 34 cases with significant improvement. In the compression of paired-end reads, minicom achieved 20-80% compression gain over the best state-of-the-art algorithm. Our method also achieved a 10% size reduction of compressed files in comparison with the best algorithm under the reads-order preserving mode. This performance is mainly attributed to exploiting the redundancy of the repetitive substrings in the long contigs. AVAILABILITY AND IMPLEMENTATION: https://github.com/yuansliu/minicom. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
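The indexing idea in this abstract (grouping reads that share a minimizer, and sampling (w, k)-minimizers along a sequence) can be sketched in Python. This is an illustrative assumption rather than minicom's actual implementation: it orders k-mers lexicographically, ignores reverse complements, and all function names are hypothetical.

```python
from collections import defaultdict


def smallest_kmer(read, k):
    """Return the lexicographically smallest k-mer of a read (its k-minimizer)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def group_reads(reads, k):
    """Bucket reads by their k-minimizer; reads shorter than k are skipped."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) >= k:
            groups[smallest_kmer(read, k)].append(read)
    return dict(groups)


def wk_minimizers(seq, w, k):
    """Return sorted (position, k-mer) pairs: the smallest k-mer in every
    window of w consecutive k-mers, with ties broken by leftmost position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    print(group_reads(["ACGTAC", "CGTACG", "TTTTGG"], k=3))
    print(wk_minimizers("ACGTAC", w=2, k=3))
```

Two contigs whose suffix and prefix share a sampled minimizer become candidates for an overlap check, which is the role the abstract assigns to the (w, k)-minimizer index.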
Affiliation(s)
- Yuansheng Liu
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, Australia
- Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, China; School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia
- Marcel E Dinger
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia; St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, Australia
|
13
|
Bewicke-Copley F, Arjun Kumar E, Palladino G, Korfi K, Wang J. Applications and analysis of targeted genomic sequencing in cancer studies. Comput Struct Biotechnol J 2019; 17:1348-1359. [PMID: 31762958 PMCID: PMC6861594 DOI: 10.1016/j.csbj.2019.10.004] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 03/31/2019] [Revised: 10/18/2019] [Accepted: 10/22/2019] [Indexed: 12/31/2022] Open
Abstract
Next Generation Sequencing (NGS) has dramatically improved the flexibility and outcomes of cancer research and clinical trials, providing highly sensitive and accurate high-throughput platforms for large-scale genomic testing. In contrast to whole-genome (WGS) or whole-exome sequencing (WES), targeted genomic sequencing (TS) focuses on a panel of genes or targets known to have strong associations with pathogenesis of disease and/or clinical relevance, offering greater sequencing depth with reduced costs and data burden. This allows targeted sequencing to identify low frequency variants in targeted regions with high confidence, thus suitable for profiling low-quality and fragmented clinical DNA samples. As a result, TS has been widely used in clinical research and trials for patient stratification and the development of targeted therapeutics. However, its transition to routine clinical use has been slow. Many technical and analytical obstacles still remain and need to be discussed and addressed before large-scale and cross-centre implementation. Gold-standard and state-of-the-art procedures and pipelines are urgently needed to accelerate this transition. In this review we first present how TS is conducted in cancer research, including various target enrichment platforms, the construction of target panels, and selected research and clinical studies utilising TS to profile clinical samples. We then present a generalised analytical workflow for TS data discussing important parameters and filters in detail, aiming to provide the best practices of TS usage and analyses.
Key Words
- BAM, Binary Alignment Map
- BWA, Burrows-Wheeler Aligner
- Background error
- CLL, Chronic Lymphocytic Leukaemia
- COSMIC, Catalogue of Somatic Mutations in Cancer
- Cancer genomics
- Clinical samples
- ESP, Exome Sequencing Project
- FF, Fresh Frozen
- FFPE, Formalin Fixed Paraffin Embedded
- FL, Follicular Lymphoma
- GATK, Genome Analysis Toolkit
- ICGC, International Cancer Genome Consortium
- MBC, Molecular Barcode
- NCCN, the National Comprehensive Cancer Network®
- NGS, Next Generation Sequencing
- NHL, Non-Hodgkin Lymphoma
- NSCLC, Non-Small Cell Lung Carcinoma
- PCR duplicates
- QC, Quality Control
- SAM, Sequence Alignment Map
- TCGA, The Cancer Genome Atlas
- TS, Targeted Sequencing
- Targeted sequencing
- UMI, Unique Molecular Identifiers
- VAF, Variant Allele Frequency
- Variant calling
- WES, Whole Exome Sequencing
- WGS, Whole Genome Sequencing
- tFL, Transformed Follicular Lymphoma
Affiliation(s)
- Findlay Bewicke-Copley
- Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Emil Arjun Kumar
- Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Giuseppe Palladino
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Koorosh Korfi
- Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Jun Wang
- Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
|
14
|
Klasberg S, Surendranath V, Lange V, Schöfl G. Bioinformatics Strategies, Challenges, and Opportunities for Next Generation Sequencing-Based HLA Genotyping. Transfus Med Hemother 2019; 46:312-325. [PMID: 31832057 PMCID: PMC6876610 DOI: 10.1159/000502487] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 05/12/2019] [Accepted: 07/30/2019] [Indexed: 12/16/2022] Open
Abstract
The advent of next generation sequencing (NGS) has altered the face of genotyping the human leukocyte antigen (HLA) system in clinical, stem cell donor registry, and research contexts. NGS has led to a dramatically increased sequencing throughput at high accuracy, while being more time and cost efficient than precursor technologies. This has led to a broader and deeper profiling of the key genes in the human immunogenetic make-up. The rapid evolution of sequencing technologies is evidenced by the progression from varied short-read sequencing platforms with differing read lengths and sequencing capacities to long-read sequencing platforms capable of profiling full genes without fragmentation. Concomitantly, a diverse set of computational analyses and software tools has been developed to deal with the various strengths and limitations of the sequencing data generated by the different sequencing platforms. This review surveys the different modalities involved in generating NGS HLA profiling sequence data. It systematically describes various computational approaches that have been developed to achieve HLA genotyping to different degrees of resolution. At each stage, this review enumerates the drawbacks and advantages of each of the platforms and analysis approaches, thus providing a comprehensive picture of the current state of HLA genotyping technologies.
|
15
|
Ibrahim O, Sutherland HG, Haupt LM, Griffiths LR. Saliva as a comparable-quality source of DNA for Whole Exome Sequencing on Ion platforms. Genomics 2019; 112:1437-1443. [PMID: 31445087 DOI: 10.1016/j.ygeno.2019.08.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/24/2019] [Revised: 08/05/2019] [Accepted: 08/19/2019] [Indexed: 11/17/2022]
Abstract
BACKGROUND: Whole Exome Sequencing (WES) utilises overlapping fragments prone to sequencing artefacts. Saliva, a non-invasive source of DNA, has been successfully used in WES studies on various platforms. This study explored the validity and quality of DNA sourced from saliva compared to whole blood on an Ion Platform. METHODS: DNA was extracted from both sample types from four individuals. WES, performed on the Ion Proton platform, was assessed for quality metrics (Depth, Genotyping Quality, etc.) and variant identification for the same source sample-pairs. RESULTS: No significant differences in quality metrics were identified between data obtained from whole blood and saliva samples, with several saliva samples demonstrating higher coverage depth. Variants within the same sample, from the two genomic DNA sources, had an average concordance similar to other studies and platforms with different chemistry. CONCLUSION: Saliva-extracted DNA provides comparable sequencing quality to whole blood for WES on Ion Torrent Platforms.
Affiliation(s)
- Omar Ibrahim
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
- Heidi G Sutherland
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
- Larisa M Haupt
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia
- Lyn R Griffiths
- Genomics Research Centre, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, Australia.
|
16
|
Hasan MS, Wu X, Zhang L. Uncovering missed indels by leveraging unmapped reads. Sci Rep 2019; 9:11093. [PMID: 31366961 PMCID: PMC6668410 DOI: 10.1038/s41598-019-47405-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/22/2019] [Accepted: 07/12/2019] [Indexed: 02/08/2023] Open
Abstract
In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the original procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel identifies 72,997 novel high-quality indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these indels shows significant enrichment of indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the indels overlap with the genes that do not have any indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies.
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA.
|
17
|
|
18
|
Falardeau F, Camurri MV, Campeau PM. Genomic approaches to diagnose rare bone disorders. Bone 2017; 102:5-14. [PMID: 27474525 DOI: 10.1016/j.bone.2016.07.020] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/12/2016] [Accepted: 07/24/2016] [Indexed: 02/01/2023]
Abstract
Skeletal dysplasias are Mendelian disorders with a prevalence of approximately 1 in every 5000 individuals and can usually be diagnosed based on clinical and radiological findings. However, given that some diseases can be caused by several different genes, and that some genes can cause a variety of different phenotypes, achieving a molecular diagnosis can be challenging. We review here different approaches, from single gene sequencing to genomic approaches using next-generation sequencing, to reach a molecular diagnosis for skeletal dysplasias. We further describe the overall advantages and limitations of first, second and third-generation sequencing, including single gene sequencing, whole-exome and genome sequencing (WES and WGS), multiple gene panel sequencing and single molecule sequencing. We also provide a brief overview of potential future applications of emerging technologies.
Affiliation(s)
- Félix Falardeau
- CHU Sainte-Justine Research Center, Montreal, Canada; Division of Molecular and Cellular Biology, Department of Biology, University of Sherbrooke, Sherbrooke, Canada
- Philippe M Campeau
- CHU Sainte-Justine Research Center, Montreal, Canada; Division of Medical Genetics, Department of Pediatrics, University of Montreal, Montreal, Canada.
|
19
|
Rabbani B, Nakaoka H, Akhondzadeh S, Tekin M, Mahdieh N. Next generation sequencing: implications in personalized medicine and pharmacogenomics. MOLECULAR BIOSYSTEMS 2017; 12:1818-30. [PMID: 27066891 DOI: 10.1039/c6mb00115g] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]
Abstract
A breakthrough in next generation sequencing (NGS) in the last decade provided an unprecedented opportunity to investigate genetic variations in humans and their roles in health and disease. NGS offers regional genomic sequencing, such as whole exome sequencing of the coding regions of all genes, as well as whole genome sequencing. RNA-seq offers sequencing of the entire transcriptome, and ChIP-seq allows for sequencing the epigenetic architecture of the genome. Identifying genetic variations in individuals can be used to predict disease risk, with the potential to halt or retard disease progression. NGS can also be used to predict the response to or adverse effects of drugs, or to calculate appropriate drug dosage. Such personalized medicine also provides the possibility to treat diseases based on the genetic makeup of the patient. Here, we review the basics of NGS technologies and their application in human diseases to foster human healthcare and personalized medicine.
Affiliation(s)
- Bahareh Rabbani
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Niayesh-Vali asr Intersection, Tehran, Iran.
- Hirofumi Nakaoka
- Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Shahin Akhondzadeh
- Psychiatric Research Center, Roozbeh Psychiatric Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Mustafa Tekin
- John P Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Nejat Mahdieh
- Cardiogenetic Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Niayesh-Vali asr Intersection, Tehran, Iran.
|
20
|
Usman T, Hadlich F, Demasius W, Weikard R, Kühn C. Unmapped reads from cattle RNAseq data: A source for missing and misassembled sequences in the reference assemblies and for detection of pathogens in the host. Genomics 2017; 109:36-42. [DOI: 10.1016/j.ygeno.2016.11.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/15/2016] [Revised: 11/21/2016] [Accepted: 11/28/2016] [Indexed: 11/15/2022]
|
21
|
Leiva-Torres GA, Nebesio N, Vidal SM. Discovery of Variants Underlying Host Susceptibility to Virus Infection Using Whole-Exome Sequencing. Methods Mol Biol 2017; 1656:209-227. [PMID: 28808973 PMCID: PMC7120756 DOI: 10.1007/978-1-4939-7237-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
The clinical course of any viral infection greatly differs in individuals. This variation results from various viral, host, and environmental factors. The identification of host genetic factors influencing inter-individual variation in susceptibility to several pathogenic viruses has tremendously increased our understanding of the mechanisms and pathways required for immunity. Next-generation sequencing of whole exomes represents a powerful tool in biomedical research. In this chapter, we briefly introduce whole-exome sequencing in the context of genetic approaches to identify host susceptibility genes to viral infections. We then describe general aspects of the workflow for whole-exome sequence analysis together with the tools and online resources that can be used to identify and annotate variant calls, and then prioritize them for their potential association to phenotypes of interest.
Affiliation(s)
- Gabriel A Leiva-Torres
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Research Center on Complex Traits, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Nestor Nebesio
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Research Center on Complex Traits, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Silvia M Vidal
- Department of Human Genetics, McGill University, Montreal, QC, Canada.
- McGill University Research Center on Complex Traits, Montreal, QC, Canada.
- Department of Medicine, McGill University, Montreal, QC, Canada.
|
22
|
Zayed H. The Qatar genome project: translation of whole-genome sequencing into clinical practice. Int J Clin Pract 2016; 70:832-834. [PMID: 27586018 DOI: 10.1111/ijcp.12871] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/28/2016] [Accepted: 08/02/2016] [Indexed: 11/29/2022] Open
Abstract
The Qatar Genome Project was launched in 2013 with the intent to sequence the genome of each Qatari citizen, in an effort to protect Qataris from the high rate of indigenous genetic diseases by allowing the mapping of disease-causing variants/rare variants and establishing a Qatari reference genome. Indeed, this project is expected to have numerous global benefits because the elevated homogeneity of the Qatari population will make Qatar an excellent genetic laboratory, generating a wealth of data that will allow us to make sense of the genotype-phenotype correlations of many diseases, especially complex multifactorial diseases, and will pave the way for changing the traditional medical practice of looking first at the phenotype rather than the genotype.
Affiliation(s)
- Hatem Zayed
- Biomedical Program, Department of Health Sciences, Qatar University, Doha, Qatar.
|
23
|
Popitsch N, Schuh A, Taylor JC. ReliableGenome: annotation of genomic regions with high/low variant calling concordance. Bioinformatics 2016; 33:155-160. [PMID: 27605105 PMCID: PMC5903559 DOI: 10.1093/bioinformatics/btw587] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/27/2016] [Revised: 08/12/2016] [Accepted: 09/04/2016] [Indexed: 12/30/2022] Open
Abstract
Motivation: The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that current practice still suffers from non-negligible numbers of false positive and negative SNV and INDEL calls that were shown to be enriched among discordant calls but also in genomic regions with low sequence complexity. Results: Here, we describe our method ReliableGenome (RG) for partitioning genomes into high and low concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than the actual sequencing data, which manifests in high recurrence of regions that can/cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows focusing resource-intensive algorithms (e.g. consensus calling methods) on the smaller, discordant share of the genome (20–30%), which might result in increased overall accuracy at reasonable costs. Our method and analysis of discordant calls may further be useful for development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines. Availability and Implementation: RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Niko Popitsch
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
- Anna Schuh
- National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK; Department of Oncology, University of Oxford, Oxford OX3 7DQ, UK
- Jenny C Taylor
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
|
24
|
Koboldt DC, Larson DE, Wilson RK. Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection. Curr Protoc Bioinformatics 2016; 44:15.4.1-17. [PMID: 25553206 DOI: 10.1002/0471250953.bi1504s44] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Indexed: 02/05/2023]
Abstract
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing- and alignment-related artifacts.
Affiliation(s)
- Daniel C Koboldt
- The Genome Institute at Washington University in St. Louis, Missouri 63108, USA
- David E Larson
- The Genome Institute at Washington University in St. Louis, Missouri, USA, 63108
- Richard K Wilson
- The Genome Institute at Washington University in St. Louis, Missouri, USA, 63108
|
25
|
Small RNA-Based Antiviral Defense in the Phytopathogenic Fungus Colletotrichum higginsianum. PLoS Pathog 2016; 12:e1005640. [PMID: 27253323 PMCID: PMC4890784 DOI: 10.1371/journal.ppat.1005640] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Received: 01/06/2016] [Accepted: 04/26/2016] [Indexed: 12/21/2022] Open
Abstract
Even though the fungal kingdom contains more than 3 million species, little is known about the biological roles of RNA silencing in fungi. The Colletotrichum genus comprises fungal species that are pathogenic for a wide range of crop species worldwide. To investigate the role of RNA silencing in the ascomycete fungus Colletotrichum higginsianum, knock-out mutants affecting genes for three RNA-dependent RNA polymerase (RDR), two Dicer-like (DCL), and two Argonaute (AGO) proteins were generated by targeted gene replacement. No effects were observed on vegetative growth for any mutant strain when grown on complex or minimal media. However, Δdcl1, Δdcl1Δdcl2 double mutant, and Δago1 strains showed severe defects in conidiation and conidia morphology. Total RNA transcripts and small RNA populations were analyzed in parental and mutant strains. The greatest effects on both RNA populations were observed in the Δdcl1, Δdcl1Δdcl2, and Δago1 strains, in which a previously uncharacterized dsRNA mycovirus [termed Colletotrichum higginsianum non-segmented dsRNA virus 1 (ChNRV1)] was derepressed. Phylogenetic analyses clearly showed a close relationship between ChNRV1 and members of the segmented Partitiviridae family, despite the non-segmented nature of the genome. Immunoprecipitation of small RNAs associated with AGO1 showed abundant loading of 5’U-containing viral siRNA. C. higginsianum parental and Δdcl1 mutant strains cured of ChNRV1 revealed that the conidiation and spore morphology defects were primarily caused by ChNRV1. Based on these results, RNA silencing involving ChDCL1 and ChAGO1 in C. higginsianum is proposed to function as an antiviral mechanism. Colletotrichum sp. comprises a diverse group of fungal pathogens that attack over 3000 plant species worldwide. Understanding the underlying mechanisms that govern fungal development and pathogenicity may enable more effective and sustainable approaches to crop disease management and control.
In most organisms, RNA silencing is an important mechanism to control endogenous and exogenous RNA. RNA silencing utilizes small regulatory molecules (small RNAs) produced by proteins called Dicer (DCL), which exercise their function through effector proteins named Argonaute (AGO). Here, we investigated the role of RNA silencing machinery in the fungus Colletotrichum higginsianum by generating deletions in genes encoding RNA silencing components. Severe defects were observed in both conidiation and conidia morphology in the Δdcl1, Δdcl1Δdcl2, and Δago1 strains. Analysis of transcripts and small RNAs revealed an uncharacterized dsRNA virus persistently infecting C. higginsianum. The virus was shown (1) to be de-repressed in the Δdcl1, Δdcl1Δdcl2 and Δago1 strains, and (2) to cause the conidiation and spore mutant phenotypes. Our results indicate that C. higginsianum employs RNA silencing as an antiviral mechanism to suppress viruses and their debilitating effects.
|
26
|
Wang PPS, Parker WT, Branford S, Schreiber AW. BAM-matcher: a tool for rapid NGS sample matching. Bioinformatics 2016; 32:2699-701. [PMID: 27153667 DOI: 10.1093/bioinformatics/btw239] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 01/11/2016] [Accepted: 04/25/2016] [Indexed: 11/14/2022]
Abstract
The standard method used by high-throughput genome sequencing facilities for detecting mislabelled samples is to use independently generated high-density SNP data to determine sample identity. However, as it has now become commonplace to have multiple samples sequenced from the same source, such as for analysis of somatic variants using matched tumour and normal samples, we can directly use the genotype information inherent in the sequence data to match samples and thus bypass the need for additional laboratory testing. Here we present BAM-matcher, a tool that can rapidly determine whether two BAM files represent samples from the same biological source by comparing their genotypes. BAM-matcher is designed to be simple to use, provides easily interpretable results, and is suitable for deployment at early stages of data processing pipelines. AVAILABILITY AND IMPLEMENTATION: BAM-matcher is licensed under the Creative Commons by Attribution license, and is available from: https://bitbucket.org/sacgf/bam-matcher SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: paul.wang@sa.gov.au.
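The genotype-comparison idea described in this abstract can be illustrated with a minimal sketch. This is not BAM-matcher's implementation: the function names, the dictionary-of-genotypes input (site label mapped to an allele pair), and the 0.95 decision threshold are all assumptions made for illustration, and the genotypes are taken as already called rather than read from BAM files.

```python
from typing import Dict, Tuple

# Unordered pair of alleles at one site, e.g. ("A", "G"). Hypothetical input format.
Genotype = Tuple[str, str]


def genotype_concordance(a: Dict[str, Genotype], b: Dict[str, Genotype]) -> float:
    """Fraction of sites genotyped in both samples that carry the same genotype."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("no sites genotyped in both samples")
    # Sort the alleles so that ("A", "G") and ("G", "A") compare as equal.
    same = sum(1 for site in shared if sorted(a[site]) == sorted(b[site]))
    return same / len(shared)


def same_source(a: Dict[str, Genotype], b: Dict[str, Genotype],
                threshold: float = 0.95) -> bool:
    """Call two samples a match when concordance reaches the threshold.

    The default threshold is an illustrative assumption, not a value from the paper.
    """
    return genotype_concordance(a, b) >= threshold
```

Only sites covered in both samples contribute, so a sample with sparse coverage is compared on a smaller but still valid set of positions.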
Affiliation(s)
- Paul P S Wang
- Department of Genetics and Molecular Pathology, and ACRF Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology, Adelaide, Australia
- Wendy T Parker
- Department of Genetics and Molecular Pathology, and ACRF Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology, Adelaide, Australia, School of Pharmacy and Medical Science, University of South Australia, Adelaide, Australia
- Susan Branford
- Department of Genetics and Molecular Pathology, and School of Pharmacy and Medical Science, University of South Australia, Adelaide, Australia, School of Biological Sciences and School of Medicine, University of Adelaide, Adelaide, Australia
- Andreas W Schreiber
- ACRF Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology, Adelaide, Australia, School of Biological Sciences and
|
27
|
Li J, Batcha AMN, Grüning B, Mansmann UR. An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology. Cancer Inform 2016; 14:87-107. [PMID: 27081306 PMCID: PMC4827795 DOI: 10.4137/cin.s30793] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/22/2015] [Revised: 03/02/2016] [Accepted: 03/17/2016] [Indexed: 12/23/2022]
Abstract
Next-generation sequencing (NGS) technologies that have advanced rapidly in the past few years possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map for the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS is facing several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflows. In order to address these challenges, we conducted a literature search and summarized a four-stage NGS workflow to provide a systematic review of NGS-based analysis, explain the strengths and weaknesses of diverse NGS-based software tools, and elucidate its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide a minimal structural layout required for NGS data storage and reproducibility.
Affiliation(s)
- Jian Li
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Aarif Mohamed Nazeer Batcha
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany; Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
- Ulrich R Mansmann
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
|
28
|
Modai S, Shomron N. Molecular Risk Factors for Schizophrenia. Trends Mol Med 2016; 22:242-253. [DOI: 10.1016/j.molmed.2016.01.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 07/01/2014] [Revised: 01/15/2016] [Accepted: 01/15/2016] [Indexed: 01/02/2023]
|
29
|
|
30
|
Moorcraft SY, Gonzalez D, Walker BA. Understanding next generation sequencing in oncology: A guide for oncologists. Crit Rev Oncol Hematol 2015; 96:463-74. [DOI: 10.1016/j.critrevonc.2015.06.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/20/2014] [Revised: 05/21/2015] [Accepted: 06/17/2015] [Indexed: 12/17/2022]
|
31
|
Beal MA, Gagné R, Williams A, Marchetti F, Yauk CL. Characterizing Benzo[a]pyrene-induced lacZ mutation spectrum in transgenic mice using next-generation sequencing. BMC Genomics 2015; 16:812. [PMID: 26481219 PMCID: PMC4617527 DOI: 10.1186/s12864-015-2004-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 02/22/2015] [Accepted: 10/03/2015] [Indexed: 11/25/2022]
Abstract
BACKGROUND: The transgenic rodent mutation reporter assay provides an efficient approach to identify mutagenic agents in vivo. A major advantage of this assay is that mutant reporter transgenes can be sequenced to provide information on the mode of action of a mutagen and to identify clonally expanded mutations. However, conventional DNA sequence analysis is laborious and expensive for long transgenes, such as lacZ (3096 bp), and is not normally implemented in routine screening. METHODS: We developed a high-throughput next-generation sequencing (NGS) approach to simultaneously sequence large numbers of barcoded mutant lacZ transgenes from different animals. We collected 3872 mutants derived from the bone marrow DNA of six Muta™Mouse males exposed to the well-established mutagen benzo[a]pyrene (BaP) and six solvent-exposed controls. Mutants within animal samples were pooled, barcoded, and then sequenced using NGS. RESULTS: We identified 1652 mutant sequences from 1006 independent mutations that underwent clonal expansion. This deep sequencing analysis of mutation spectrum demonstrated that BaP causes primarily guanine transversions (e.g. G:C → T:A), which is highly consistent with previous studies employing Sanger sequencing. Furthermore, we identified novel mutational hotspots in the lacZ transgene that were previously uncharacterized by Sanger sequencing. Deep sequencing also allowed for an unprecedented ability to correct for clonal expansion events, improving the sensitivity of the mutation reporter assay by 50%. CONCLUSION: These results demonstrate that the high-throughput nature and reduced costs offered by NGS provide a sensitive and fast approach for elucidating and comparing mutagenic mechanisms of various agents among tissues and enabling improved evaluation of genotoxins.
Affiliation(s)
- Marc A Beal
- Carleton University, Ottawa, ON, K1S 5B6, Canada.
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Rémi Gagné
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Andrew Williams
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Francesco Marchetti
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Carole L Yauk
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
|
32
|
Bae JS, Kim NKD, Lee C, Kim SC, Lee HR, Song HR, Park KB, Kim HW, Lee SH, Kim HY, Lee SC, Jeong C, Park MS, Yoo WJ, Chung CY, Choi IH, Kim OH, Park WY, Cho TJ. Comprehensive genetic exploration of skeletal dysplasia using targeted exome sequencing. Genet Med 2015; 18:563-9. [PMID: 26402641 DOI: 10.1038/gim.2015.129] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/24/2015] [Accepted: 08/10/2015] [Indexed: 02/07/2023]
Abstract
PURPOSE: The purpose of this study was to evaluate the clinical utility of targeted exome sequencing (TES) as a molecular diagnostic tool for patients with skeletal dysplasia. METHODS: A total of 185 patients either diagnosed with or suspected to have skeletal dysplasia were recruited over a period of 3 years. TES was performed for 255 genes associated with the pathogenesis of skeletal dysplasia, and candidate variants were selected using a bioinformatics analysis. All candidate variants were confirmed by Sanger sequencing, correlation with the phenotype, and a cosegregation study in the family. RESULTS: TES detected "confirmed" or "highly likely" pathogenic sequence variants in 74% (71 of 96) of cases in the assured clinical diagnosis category and 20.3% (13 of 64 cases) of cases in the uncertain clinical diagnosis category. TES successfully detected pathogenic variants in all 25 cases of previously known genotypes. The data also suggested a copy-number variation that led to a molecular diagnosis. CONCLUSION: This study demonstrates the feasibility of TES for the molecular diagnosis of skeletal dysplasia. However, further confirmation is needed for a final molecular diagnosis, including Sanger sequencing of candidate variants with suspected, poorly captured exons. Genet Med 18(6), 563-569.
Affiliation(s)
- Jun-Seok Bae
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Korea
- Nayoung K D Kim
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea
- Chung Lee
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Korea
- Sang Cheol Kim
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea
- Hey Ran Lee
- Division of Pediatric Orthopaedics, Seoul National University Children's Hospital, Seoul, Korea
- Hae-Ryong Song
- Department of Orthopaedic Surgery, Korea University Guro Hospital, Seoul, Korea
- Kun Bo Park
- Department of Orthopaedic Surgery, Inje University Haeundae Hospital, Busan, Korea
- Hyun Woo Kim
- Department of Orthopaedic Surgery, Severance Children's Hospital, Yonsei University College of Medicine, Seoul, Korea
- Soon Hyuck Lee
- Department of Orthopaedic Surgery, Korea University Anam Hospital, Seoul, Korea
- Ha Yong Kim
- Department of Orthopaedic Surgery, Eulji University Daejeon Hospital, Daejeon, Korea
- Soon Chul Lee
- Department of Orthopaedic Surgery, CHA Bundang Medical Center, CHA University School of Medicine, Seongnam, Korea
- Changhoon Jeong
- Department of Orthopaedic Surgery, Bucheon St. Mary's Hospital, the Catholic University of Korea, Bucheon, Korea
- Moon Seok Park
- Department of Orthopaedic Surgery, Seoul National University Bundang Hospital, Seongnam, Korea
- Won Joon Yoo
- Division of Pediatric Orthopaedics, Seoul National University Children's Hospital, Seoul, Korea
- Chin Youb Chung
- Department of Orthopaedic Surgery, Seoul National University Bundang Hospital, Seongnam, Korea
- In Ho Choi
- Division of Pediatric Orthopaedics, Seoul National University Children's Hospital, Seoul, Korea
- Ok-Hwa Kim
- Department of Radiology, Woorisoa Children's Hospital, Seoul, Korea
- Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon, Korea
- Tae-Joon Cho
- Division of Pediatric Orthopaedics, Seoul National University Children's Hospital, Seoul, Korea
|
33
|
MaPSeq, A Service-Oriented Architecture for Genomics Research within an Academic Biomedical Research Institution. INFORMATICS 2015. [DOI: 10.3390/informatics2030020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
|
34
|
Eduardoff M, Santos C, de la Puente M, Gross T, Fondevila M, Strobl C, Sobrino B, Ballard D, Schneider P, Carracedo Á, Lareu M, Parson W, Phillips C. Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™. Forensic Sci Int Genet 2015; 17:110-121. [DOI: 10.1016/j.fsigen.2015.04.007] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 11/14/2014] [Revised: 03/31/2015] [Accepted: 04/12/2015] [Indexed: 01/20/2023]
|
35
|
D'Antonio M, D'Onorio De Meo P, Pallocca M, Picardi E, D'Erchia AM, Calogero RA, Castrignanò T, Pesole G. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application. BMC Genomics 2015; 16:S3. [PMID: 26046471 PMCID: PMC4461013 DOI: 10.1186/1471-2164-16-s6-s3] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/25/2023]
Abstract
Background: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods: In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results: Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs.
|
36
|
Na YJ, Sohn KA, Kim JH. Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information. BMC Med Genomics 2015; 8 Suppl 2:S4. [PMID: 26045178 PMCID: PMC4460593 DOI: 10.1186/1755-8794-8-s2-s4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
Abstract
Background: The rapid advances in genome sequencing technologies have resulted in an unprecedented number of genome variations being discovered in humans. However, there has been very limited coverage of interpretation of the personal genome sequencing data in terms of diseases. Methods: In this paper we present the first computational analysis scheme for interpreting personal genome data by simultaneously considering the functional impact of damaging variants and curated disease-gene association data. This method is based on mutual information as a measure of the relative closeness between the personal genome and diseases. We hypothesize that a higher mutual information score implies that the personal genome is more susceptible to a particular disease than other diseases. Results: The method was applied to the sequencing data of 50 acute myeloid leukemia (AML) patients in The Cancer Genome Atlas. The utility of associations between a disease and the personal genome was explored using data of healthy (control) people obtained from the 1000 Genomes Project. The ranks of the disease terms in the AML patient group were compared with those in the healthy control group using "Leukemia, Myeloid, Acute" (C04.557.337.539.550) as the corresponding MeSH disease term. The mutual information rank of the disease term was substantially higher in the AML patient group than in the healthy control group, which demonstrates that the proposed methodology can be successfully applied to infer associations between the personal genome and diseases. Conclusions: Overall, the area under the receiver operating characteristics curve was significantly larger for the AML patient data than for the healthy controls. This methodology could contribute to consequential discoveries and explanations for mining personal genome sequencing data in terms of diseases, and have versatility with respect to genomic-based knowledge such as drug-gene and environmental-factor-gene interactions.
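For readers unfamiliar with the measure this abstract relies on, the standard definition of mutual information between two discrete variables can be sketched as follows. How the authors encode a personal genome and a disease is not stated here, so the per-gene binary indicators mentioned in the comments (gene carries a damaging variant, gene is associated with the disease) are an assumption for illustration, and `mutual_information` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information, in bits, between two equal-length discrete sequences.

    Illustrative encoding (an assumption): x[i] = 1 if gene i carries a damaging
    variant in the personal genome, y[i] = 1 if gene i is associated with the disease.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each (x, y) pair
    count_x = Counter(x)         # marginal counts of x
    count_y = Counter(y)         # marginal counts of y
    mi = 0.0
    for (xi, yi), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[xi] * count_y[yi]))
    return mi
```

Two identical balanced binary vectors give 1 bit, and two independent ones give 0 bits, which is the sense in which a higher score indicates a closer relationship.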
|
37
|
Pipan V, Kunej T. Initiative for standardization of the format of the next-generation sequencing (NGS) results. Discoveries (Craiova) 2015; 3:e44. [PMID: 32309567 PMCID: PMC6941547 DOI: 10.15190/d.2015.36] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/11/2015] [Revised: 05/08/2015] [Accepted: 05/08/2015] [Indexed: 12/02/2022]
Abstract
The number of published reports using next-generation sequencing (NGS) technology in cancer research is increasing. These technologies generate large amounts of data that need to be appropriately presented and available to other researchers for further use. Our goal was to create a comprehensive database of single nucleotide polymorphisms (SNPs) associated with different types of cancer for integration into our bioinformatics tools. We reviewed more than 200 scientific papers and extracted relevant information on mutations detected by NGS technology. The current version of the database contains more than 100,000 mutations in more than 70 types of cancer. However, our review of NGS studies revealed great variation in the presentation of NGS data in the scientific literature, with almost no effort toward standardization of the data format. NGS results are published in a variety of forms, which hinders the gathering of information. Therefore, we suggested a uniform format for presenting NGS data. This will allow faster database development, easier access and data sharing between laboratories. The database will be a useful tool for many researchers in the field of cancer research and can be a base for a range of studies such as genome-wide association studies, microRNA target binding, and cancer biomarker research.
Affiliation(s)
- Veronika Pipan
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Slovenia
- Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Slovenia
|
38
|
Dearing KR, Weiss GJ. Translating next-generation sequencing from clinical trials to clinical practice for the treatment of advanced cancers. Per Med 2015; 12:155-162. [PMID: 29754537 DOI: 10.2217/pme.14.54] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Next-generation sequencing (NGS) is being applied in oncology care to identify specific molecular aberrations of patients' tumors. The use of NGS now allows for sequencing entire human genomes at a reasonable cost and within practical time frames for treatment decision making. Further delineation of epigenetics, transcriptomics, metagenomics and NGS at the level of circulating tumor DNA reveals ever-increasing complexity in understanding these interactions and the roles they play in cancer. With improved understanding of proteomics, it has become clear that NGS has room for innovation to someday include sequencing of proteins. Early incorporation of NGS into clinical trials has begun. Here, we review the feasibility and practicality of translating NGS from clinical trials to clinical practice.
Affiliation(s)
- Kristen R Dearing
- Cancer Treatment Centers of America, 14200 Celebrate Life Way, Goodyear, AZ 85338, USA
- Glen J Weiss
- Cancer Treatment Centers of America, 14200 Celebrate Life Way, Goodyear, AZ 85338, USA; CRAB-Clinical Trials Consortium, 1730 Minor Ave., Seattle, WA 98101, USA
|
39
|
Lan JH, Yin Y, Reed EF, Moua K, Thomas K, Zhang Q. Impact of three Illumina library construction methods on GC bias and HLA genotype calling. Hum Immunol 2014; 76:166-75. [PMID: 25543015 DOI: 10.1016/j.humimm.2014.12.016] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 04/25/2014] [Revised: 10/17/2014] [Accepted: 12/15/2014] [Indexed: 01/04/2023]
Abstract
Next-generation sequencing (NGS) is increasingly recognized for its ability to overcome allele ambiguity and deliver high-resolution typing in the HLA system. Using this technology, non-uniform read distribution can impede the reliability of variant detection, which renders high-confidence genotype calling particularly difficult to achieve in the polymorphic HLA complex. Recently, library construction has been implicated as the dominant factor in instigating coverage bias. To study the impact of this phenomenon on HLA genotyping, we performed long-range PCR on 12 samples to amplify HLA-A, -B, -C, -DRB1, and -DQB1, and compared the relative contribution of three Illumina library construction methods (TruSeq Nano, Nextera, Nextera XT) in generating downstream bias. Here, we show high GC% to be a good predictor of low sequencing depth. Compared to standard TruSeq Nano, GC bias was more prominent in transposase-based protocols, particularly Nextera XT, likely through a combination of transposase insertion bias being coupled with a high number of PCR enrichment cycles. Importantly, our findings demonstrate non-uniform read depth can have a direct and negative impact on the robustness of HLA genotyping, which has clinical implications for users when choosing a library construction strategy that aims to balance cost and throughput with data quality.
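The relationship reported in this abstract, high GC% predicting low sequencing depth, is the kind of association that can be checked with a per-window GC fraction and a correlation coefficient. The sketch below is not the authors' analysis: the helper names, the fixed windows, and the use of Pearson correlation are assumptions chosen for illustration, and the depth values would come from the user's own coverage data.

```python
import math
from typing import Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A clearly negative coefficient between `gc_fraction` of each window and its mean depth would be consistent with the GC bias the abstract describes; comparing the coefficient across library preparation methods is one simple way to contrast them.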
Affiliation(s)
- James H Lan
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA; University of British Columbia, Clinician Investigator Program, Vancouver, BC, Canada
- Yuxin Yin
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA
- Elaine F Reed
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA
- Kevin Moua
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA
- Kimberly Thomas
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA
- Qiuheng Zhang
- UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA
|
40
|
The revolution in human monogenic disease mapping. Genes (Basel) 2014; 5:792-803. [PMID: 25198531 PMCID: PMC4198931 DOI: 10.3390/genes5030792] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/08/2014] [Revised: 08/29/2014] [Accepted: 09/01/2014] [Indexed: 12/18/2022]
Abstract
The successful completion of the Human Genome Project (HGP) was an unprecedented scientific advance that has become an invaluable resource in the search for genes that cause monogenic and common (polygenic) diseases. Prior to the HGP, linkage analysis had successfully mapped many disease genes for monogenic disorders; however, the limitations of this approach were particularly evident for identifying causative genes in rare genetic disorders affecting lifespan and/or reproductive fitness, such as skeletal dysplasias. In this review, we illustrate the challenges of mapping disease genes in such conditions through the ultra-rare disorder fibrodysplasia ossificans progressiva (FOP) and we discuss the advances that are being made through current massively parallel (“next generation”) sequencing (MPS) technologies.
|
41
|
Tae H, Karunasena E, Bavarva JH, McIver LJ, Garner HR. Large scale comparison of non-human sequences in human sequencing data. Genomics 2014; 104:453-8. [PMID: 25173571 DOI: 10.1016/j.ygeno.2014.08.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 08/21/2013] [Revised: 08/17/2014] [Accepted: 08/19/2014] [Indexed: 11/19/2022]
Abstract
Several studies have demonstrated that unmapped reads in next generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonality in non-human sequences by infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that 0.13% of reads on average showed similarities to non-human genomes. We compared results among different sample groups divided based on ethnicities, sequencing centers and enrichment methods (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.
Affiliation(s)
- Hongseok Tae
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Enusha Karunasena
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Jasmin H Bavarva
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Lauren J McIver
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
- Harold R Garner
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.
|
42
|
Yan Y, Yi G, Sun C, Qu L, Yang N. Genome-wide characterization of insertion and deletion variation in chicken using next generation sequencing. PLoS One 2014; 9:e104652. [PMID: 25133774 PMCID: PMC4136736 DOI: 10.1371/journal.pone.0104652] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 04/21/2014] [Accepted: 07/10/2014] [Indexed: 12/30/2022]
Abstract
Insertion and deletion (INDEL) is one of the main events contributing to genetic and phenotypic diversity, yet it receives less attention than SNP and large structural variation. To gain a better knowledge of INDEL variation in the chicken genome, we applied next generation sequencing on 12 diverse chicken breeds at an average effective depth of 8.6. Over 1.3 million non-redundant short INDELs (1-49 bp) were obtained, the vast majority (92.48%) of which were novel. Follow-up validation assays confirmed that most (88.00%) of the randomly selected INDELs represent true variations. The majority (95.76%) of INDELs were less than 10 bp. Both the detected number and affected bases were larger for deletions than insertions. In total, INDELs covered 3.8 Mbp, corresponding to 0.36% of the chicken genome. The average genomic INDEL density was estimated as 0.49 per kb. INDELs were ubiquitous and distributed in a non-uniform fashion across chromosomes, with lower INDEL density in micro-chromosomes than in others, and some functional regions like exons and UTRs were prone to fewer INDELs than introns and intergenic regions. Nearly 620,253 INDELs fell in genic regions, 1,765 (0.28%) of which were located in exons, spanning 1,358 (7.56%) unique Ensembl genes. Many of them are associated with economically important traits and some are the homologues of human disease-related genes. We demonstrate that sequencing multiple individuals at a medium depth offers a promising way for reliable identification of INDELs. The coding INDELs are valuable candidates for further elucidation of the association between genotypes and phenotypes. The chicken INDELs revealed by our study can be useful for future studies, including development of INDEL markers, construction of high-density linkage maps, INDEL array design, and hopefully, molecular breeding programs in chicken.
Affiliation(s)
- Yiyuan Yan
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Guoqiang Yi
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Congjiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lujiang Qu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Ning Yang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
43
|
Bahlo M, Tankard R, Lukic V, Oliver KL, Smith KR. Using familial information for variant filtering in high-throughput sequencing studies. Hum Genet 2014; 133:1331-41. [PMID: 25129038 PMCID: PMC4185103 DOI: 10.1007/s00439-014-1479-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/01/2014] [Accepted: 08/07/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing (HTS) studies have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are applicable not only to family studies but also to cohorts of apparently unrelated, ‘sporadic’ cases and to small families underpowered for linkage, and they allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks and insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as the study of populations poorly represented in the variant databases typically used for filtering, and cases of poor-quality HTS data.
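One simple way to picture the filtering idea described above is to keep only candidate variants that fall inside genomic intervals shared identical by descent among affected relatives. The sketch below assumes those intervals have already been inferred by some other tool; the record layout, the function names, and the coordinates are hypothetical and are not taken from the paper.

```python
def in_any_region(chrom, pos, regions):
    """Return True if (chrom, pos) lies inside any (chrom, start, end) interval."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def filter_by_ibd(variants, ibd_regions):
    """Keep variants located inside the supplied IBD-shared intervals."""
    return [v for v in variants if in_any_region(v["chrom"], v["pos"], ibd_regions)]


# Hypothetical IBD intervals shared by affected relatives (inclusive coordinates).
ibd_regions = [("chr1", 1_000_000, 2_000_000), ("chr7", 500_000, 900_000)]

# Hypothetical candidate variants from an HTS variant list.
variants = [
    {"id": "v1", "chrom": "chr1", "pos": 1_500_000},
    {"id": "v2", "chrom": "chr1", "pos": 3_000_000},
    {"id": "v3", "chrom": "chr7", "pos": 500_000},
]

kept = [v["id"] for v in filter_by_ibd(variants, ibd_regions)]
```

A linear scan over intervals is adequate for a handful of regions; a long variant list would more likely use an interval index.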
Affiliation(s)
- Melanie Bahlo
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
|
44
|
Noninvasive prenatal diagnosis of common aneuploidies by semiconductor sequencing. Proc Natl Acad Sci U S A 2014; 111:7415-20. [PMID: 24799683 DOI: 10.1073/pnas.1321997111] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 02/05/2023]
Abstract
Massively parallel sequencing (MPS) of cell-free fetal DNA from maternal plasma has revolutionized our ability to perform noninvasive prenatal diagnosis. This approach avoids the risk of fetal loss associated with more invasive diagnostic procedures. The present study developed an effective method for noninvasive prenatal diagnosis of common chromosomal aneuploidies using a benchtop semiconductor sequencing platform (SSP), which relies on the MPS platform but offers advantages over existing noninvasive screening techniques. A total of 2,275 pregnant subjects were included in the study; of these, 515 subjects who had full karyotyping results were used in a retrospective analysis, and 1,760 subjects without karyotyping were analyzed in a prospective study. In the retrospective study, all 55 fetal trisomy 21 cases were identified using the SSP with a sensitivity and specificity of 99.94% and 99.46%, respectively. The SSP also detected 16 trisomy 18 cases with 100% sensitivity and 99.24% specificity and 3 trisomy 13 cases with 100% sensitivity and 100% specificity. Furthermore, 15 fetuses with sex chromosome aneuploidies (ten 45,X; two 47,XYY; two 47,XXX; and one 47,XXY) were detected. In the prospective study, nine fetuses with trisomy 21, three with trisomy 18, three with trisomy 13, and one with 45,X were detected. To our knowledge, this is the first large-scale clinical study to systematically identify chromosomal aneuploidies based on cell-free fetal DNA using the SSP, and it provides an effective strategy for large-scale noninvasive screening for chromosomal aneuploidies in a clinical setting.
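For readers less familiar with the two metrics quoted above, the following sketch shows the standard definitions of sensitivity and specificity computed from confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the abstract does not state the underlying counts for its percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected cases that the test flags: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of unaffected cases that the test clears: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening outcome: 20 affected and 980 unaffected pregnancies.
tp, fn = 18, 2
tn, fp = 970, 10

sens = sensitivity(tp, fn)   # 18 / 20
spec = specificity(tn, fp)   # 970 / 980
```

With these made-up counts the sensitivity is 90% and the specificity is about 98.98%.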
|
45
|
Effective filtering strategies to improve data quality from population-based whole exome sequencing studies. BMC Bioinformatics 2014; 15:125. [PMID: 24884706 PMCID: PMC4098776 DOI: 10.1186/1471-2105-15-125] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Received: 10/25/2013] [Accepted: 04/16/2014] [Indexed: 12/12/2022]
Abstract
Background: Genotypes generated in next generation sequencing studies contain errors which can significantly impact the power to detect signals in common and rare variant association tests. These genotyping errors are not explicitly filtered by the standard GATK Variant Quality Score Recalibration (VQSR) tool and thus remain a source of errors in whole exome sequencing (WES) projects that follow GATK’s recommended best practices. Therefore, additional data filtering methods are required to effectively remove these errors before performing association analyses with complex phenotypes. Here we empirically derive thresholds for genotype and variant filters that, when used in conjunction with the VQSR tool, achieve higher data quality than when using VQSR alone. Results: The detailed filtering strategies improve the concordance of sequenced genotypes with array genotypes from 99.33% to 99.77%; improve the percent of discordant genotypes removed from 10.5% to 69.5%; and improve the Ti/Tv ratio from 2.63 to 2.75. We also demonstrate that managing batch effects by separating samples based on different target capture and sequencing chemistry protocols results in a final data set containing 40.9% more high-quality variants. In addition, imputation is an important component of WES studies and is used to estimate common variant genotypes to generate additional markers for association analyses. As such, we demonstrate filtering methods for imputed data that improve genotype concordance from 79.3% to 99.8% while removing 99.5% of discordant genotypes. Conclusions: The described filtering methods are advantageous for large population-based WES studies designed to identify common and rare variation associated with complex diseases. Compared to data processed through standard practices, these strategies result in substantially higher quality data for common and rare association analyses.
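Two of the quality metrics named in this abstract, the transition/transversion (Ti/Tv) ratio and genotype concordance against array calls, can be sketched in a few lines. The code below uses the textbook definitions only; the data structures, function names, and example calls are hypothetical, genotypes are assumed to be already normalized to the same string form, and none of the paper's empirically derived thresholds are reproduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snvs):
    """Ti/Tv over (ref, alt) single-nucleotide variants with ref != alt."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def genotype_concordance(seq_calls, array_calls):
    """Share of (sample, variant) keys present in both sets with equal genotypes."""
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        return float("nan")
    matches = sum(1 for key in shared if seq_calls[key] == array_calls[key])
    return matches / len(shared)


# Hypothetical SNVs: three transitions and two transversions.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
ratio = titv_ratio(snvs)

# Hypothetical genotype calls keyed by (sample, variant).
seq_calls = {("s1", "v1"): "0/1", ("s1", "v2"): "1/1", ("s2", "v1"): "0/0", ("s2", "v2"): "0/1"}
array_calls = {("s1", "v1"): "0/1", ("s1", "v2"): "1/1", ("s2", "v1"): "0/0", ("s2", "v2"): "0/0"}
conc = genotype_concordance(seq_calls, array_calls)
```

Recomputing both metrics before and after a filter is one way to check whether the filter removes mostly erroneous calls.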
|
46
|
Bodian DL, McCutcheon JN, Kothiyal P, Huddleston KC, Iyer RK, Vockley JG, Niederhuber JE. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One 2014; 9:e94554. [PMID: 24728327 PMCID: PMC3984285 DOI: 10.1371/journal.pone.0094554] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 09/25/2013] [Accepted: 02/17/2014] [Indexed: 01/05/2023]
Abstract
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
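The observation that a minor allele in one population can be the major allele in another can be made concrete with a small calculation. The sketch below computes the alternate-allele frequency per ancestry group from diploid genotype counts and flags variants whose frequency lies below 0.5 in one group and above 0.5 in another. Group names, genotypes, and function names are hypothetical, and this is only an illustration of the definition, not the authors' analysis.

```python
def alt_allele_frequency(alt_counts):
    """Alt-allele frequency from per-individual alt counts (0, 1, or 2) in diploids."""
    return sum(alt_counts) / (2 * len(alt_counts))


def flips_major_minor(freq_by_group):
    """True if the alt allele is minor (<0.5) in one group and major (>0.5) in another."""
    values = list(freq_by_group.values())
    return min(values) < 0.5 < max(values)


# Hypothetical genotypes for one variant in two ancestry groups.
genotypes = {
    "group_A": [0, 0, 1, 0],  # 1 alt allele among 8 chromosomes
    "group_B": [2, 1, 2, 1],  # 6 alt alleles among 8 chromosomes
}
freqs = {group: alt_allele_frequency(g) for group, g in genotypes.items()}
flipped = flips_major_minor(freqs)
```

Here the alternate allele has frequency 0.125 in the first group and 0.75 in the second, so the variant is flagged.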
Affiliation(s)
- Dale L. Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- Justine N. McCutcheon
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- Prachi Kothiyal
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- Kathi C. Huddleston
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- Ramaswamy K. Iyer
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- Joseph G. Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
- John E. Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
|
47
|
Alzu'bi A, Zhou L, Watzlaf V. Personal genomic information management and personalized medicine: challenges, current solutions, and roles of HIM professionals. Perspectives in Health Information Management 2014; 11:1c. [PMID: 24808804 PMCID: PMC3995490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
In recent years, the term personalized medicine has received increasing attention in the field of healthcare. The growing use of this term is closely related to the rapid advancement of DNA sequencing technologies and other high-throughput biotechnologies, which can generate a large amount of personal genomic data in a short time. Consequently, the need to manage, analyze, and interpret these personal genomic data to facilitate personalized care has escalated. In this article, we discuss the challenges of implementing genomics-based personalized medicine in healthcare, current solutions to these challenges, and the roles of health information management (HIM) professionals in genomics-based personalized medicine.
Affiliation(s)
- Amal Alzu'bi
- The Department of Health Information Management at the University of Pittsburgh in Pittsburgh, PA
- Leming Zhou
- The Department of Health Information Management at the University of Pittsburgh in Pittsburgh, PA
- Valerie Watzlaf
- The Department of Health Information Management at the University of Pittsburgh in Pittsburgh, PA
|
48
|
Abstract
Next-generation sequencing (NGS) has enabled whole-exome and whole-genome sequencing of tumors for causative mutations, allowing for more accurate targeting of therapies. In the process of sequencing the tumor, comparisons to the germline genome may identify variants associated with susceptibility to cancer as well as other hereditary diseases. Already, the combination of massively parallel sequencing and selective capture approaches has facilitated efficient simultaneous genetic analysis (multiplex testing) of large numbers of candidate genes. As the field of oncology incorporates NGS approaches into tumor and germline analyses, it has become clear that the ability to achieve high-throughput genotyping surpasses our current ability to interpret and appropriately apply the vast amounts of data generated from such technologies. A review of the current state of knowledge of rare and common genetic variants associated with cancer risk or treatment outcome reveals significant progress, as well as a number of challenges associated with the clinical translation of these discoveries. The combined efforts of oncologists, genetic counselors, and cancer geneticists will be required to drive the paradigm shift toward personalized or precision medicine and to ensure the incorporation of NGS technologies into the practice of preventive oncology.
Affiliation(s)
- Zsofia K. Stadler
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
- Kasmintan A. Schrader
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
- Joseph Vijai
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
- Mark E. Robson
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
- Kenneth Offit
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
|
49
|
Esteban-Jurado C, Garre P, Vila M, Lozano JJ, Pristoupilova A, Beltrán S, Abulí A, Muñoz J, Balaguer F, Ocaña T, Castells A, Piqué JM, Carracedo A, Ruiz-Ponte C, Bessa X, Andreu M, Bujanda L, Caldés T, Castellví-Bel S. New genes emerging for colorectal cancer predisposition. World J Gastroenterol 2014; 20:1961-1971. [PMID: 24587672 PMCID: PMC3934466 DOI: 10.3748/wjg.v20.i8.1961] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 10/01/2013] [Revised: 12/20/2013] [Accepted: 01/15/2014] [Indexed: 02/06/2023]
Abstract
Colorectal cancer (CRC) is one of the most frequent neoplasms and an important cause of mortality in the developed world. This cancer is caused by both genetic and environmental factors, although 35% of the variation in CRC susceptibility involves inherited genetic differences. Mendelian syndromes account for about 5% of the total burden of CRC, with Lynch syndrome and familial adenomatous polyposis the most common forms. Excluding hereditary forms, there is an important fraction of CRC cases that present familial aggregation for the disease with an unknown germline genetic cause. CRC can also be considered a complex disease under the common disease-common variant hypothesis, with a polygenic model of inheritance in which the genetic components of common complex diseases correspond mostly to variants of low/moderate effect. So far, 30 common, low-penetrance susceptibility variants have been identified for CRC. Recently, new sequencing technologies, including exome and whole-genome sequencing, have provided a new approach for identifying genes responsible for human disease predisposition. By using whole-genome sequencing, germline mutations in the POLE and POLD1 genes have been found to be responsible for a new form of CRC genetic predisposition called polymerase proofreading-associated polyposis.
|
50
|
Puckelwartz MJ, Pesce LL, Nelakuditi V, Dellefave-Castillo L, Golbus JR, Day SM, Cappola TP, Dorn GW, Foster IT, McNally EM. Supercomputing for the parallelization of whole genome analysis. Bioinformatics 2014; 30:1508-13. [PMID: 24526712 DOI: 10.1093/bioinformatics/btu071] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
MOTIVATION: The declining cost of generating DNA sequence is promoting an increase in whole genome sequencing, especially as applied to the human genome. Whole genome analysis requires the alignment and comparison of raw sequence data, and results in a computational bottleneck because of limited ability to analyze multiple genomes simultaneously. RESULTS: We adapted a Cray XE6 supercomputer to achieve the parallelization required for concurrent multiple genome analysis. This approach not only markedly reduces computation time but also results in increased usable sequence per genome. Relying on publicly available software, the Cray XE6 has the capacity to align and call variants on 240 whole genomes in ∼50 h. Multisample variant calling is also accelerated. AVAILABILITY AND IMPLEMENTATION: The MegaSeq workflow is designed to harness the size and memory of the Cray XE6, housed at Argonne National Laboratory, for whole genome analysis in a platform designed to better match current and emerging sequencing volume.
Affiliation(s)
- Megan J Puckelwartz
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Lorenzo L Pesce
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Viswateja Nelakuditi
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Lisa Dellefave-Castillo
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Jessica R Golbus
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Sharlene M Day
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Thomas P Cappola
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Gerald W Dorn
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Ian T Foster
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
- Elizabeth M McNally
- Department of Medicine, Computation Institute and Argonne National Laboratory, 9700 S. Cass Ave. Argonne, IL 60439, USA, Department of Human Genetics, The University of Chicago, 5841 S. Maryland Ave Chicago, IL 60637, USA, Department of Internal Medicine, The University of Michigan, 1150 W Medical Center Dr. Ann Arbor, MI 48109, USA, Perelman School of Medicine, Penn Cardiovascular Institute and Department of Medicine, University of Pennsylvania, 3400 Civic Center Blvd. Philadelphia, PA 19104, USA and Washington University School of Medicine, 660 S. Euclid Ave. St. Louis, MO 63110, USA
|