1
Hirai J, Hsiao ST, Yeh HM, Nishikawa J. Population panmixia of the pelagic shrimp Lucensosergia lucens between Japanese and Taiwanese waters in the western North Pacific. Sci Rep 2025; 15:7040. [PMID: 40044689 PMCID: PMC11882780 DOI: 10.1038/s41598-025-91208-4] [Received: 06/12/2024] [Accepted: 02/18/2025] [Indexed: 03/09/2025]
Abstract
The pelagic shrimp Lucensosergia lucens is a commercially important species in Japan and Taiwan; however, a significant recent decline in L. lucens catch has been reported in Suruga Bay, Japan. In the present study, multiple molecular approaches were used to understand the population structure of L. lucens in Japanese and Taiwanese waters. Our analysis of mitochondrial cytochrome c oxidase subunit I and control region sequences obtained by Sanger sequencing showed no evidence of distinct population structures, contrary to the previous study based on the control region. Genome-wide single-nucleotide polymorphism (SNP) analysis using multiplexed inter-simple sequence repeats genotyping by sequencing revealed panmixia between the Japanese and Taiwanese populations. The contemporary migration rates estimated from the SNP data suggest that the Kuroshio Current plays a key role in transporting L. lucens from Taiwan to Japan. Additionally, mitogenome sequences obtained by genome skimming showed no region-specific genetic lineages in Japan or Taiwan. Together, the results of these molecular approaches suggest that L. lucens is widely distributed, with the capacity to disperse throughout the Kuroshio and adjacent regions of the western North Pacific. Because apparent panmixia of L. lucens was observed in Japanese and Taiwanese waters, international cooperation is needed for the sustainable fishing of this shrimp.
Affiliation(s)
- Junya Hirai
- Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Japan.
- Sheng-Tai Hsiao
- Fisheries Research Institute, Ministry of Agriculture, Keelung, Taiwan
- Hsin-Ming Yeh
- Fisheries Research Institute, Ministry of Agriculture, Keelung, Taiwan
- Jun Nishikawa
- School of Marine Science and Technology, Tokai University, Shizuoka, Japan.

2
Eberth S, Koblitz J, Steenpaß L, Pommerenke C. Refined variant calling pipeline on RNA-seq data of breast cancer cell lines without matched-normal samples. BMC Res Notes 2025; 18:67. [PMID: 39955561 PMCID: PMC11829467 DOI: 10.1186/s13104-025-07140-3] [Received: 11/19/2024] [Accepted: 02/04/2025] [Indexed: 02/17/2025]
Abstract
OBJECTIVE RNA-seq delivers valuable insights into both transcriptional patterns and mutational landscapes for transcribed genes. However, as tumour cell lines frequently lack their matched-normal counterpart, variant calling without the paired normal sample is still challenging. To exclude variants arising from common genetic variation without a matched-normal control, filtering strategies need to be developed to identify tumour-relevant variants in cell lines. RESULTS Here, variants of 29 breast cancer cell lines were called on RNA-seq data via HaplotypeCaller. Low read depth sites, RNA-edit sites, and low complexity regions in coding regions were excluded. Common variants were filtered using 1000 Genomes, gnomAD, and dbSNP data. Starting from hundreds of thousands of single nucleotide variants and small insertions and deletions, about a thousand variants per sample remained after filtering. Extracted variants were validated against the Catalogue of Somatic Mutations in Cancer (COSMIC) for 10 cell lines included in both data sets. Approximately half of the COSMIC variants were successfully called. Importantly, missing variants could mainly be attributed to sites with low read depth. Moreover, the filtered variants also included all 10 Cancer Gene Census COSMIC variants, a condensed hallmark variant set.
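The filtering cascade described above can be sketched as a sequence of exclusion rules applied to each called variant. This is a minimal illustration, not the authors' pipeline. The `Variant` record, the `min_depth` threshold, and the in-memory site sets are hypothetical stand-ins for the real depth cut-off, RNA-editing databases, low-complexity annotations, and the 1000 Genomes/gnomAD/dbSNP lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal variant record (a real pipeline would parse VCF output).
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int


def filter_variants(variants, min_depth, rna_edit_sites, low_complexity, common_variants):
    """Drop variants that fail any filter; every threshold and site set is an illustrative input."""
    kept = []
    for v in variants:
        if v.depth < min_depth:
            continue  # low read depth site
        if (v.chrom, v.pos) in rna_edit_sites:
            continue  # known RNA-editing site
        if any(c == v.chrom and start <= v.pos <= end for c, start, end in low_complexity):
            continue  # falls inside a low-complexity interval
        if (v.chrom, v.pos, v.ref, v.alt) in common_variants:
            continue  # common germline variant listed in a population database
        kept.append(v)
    return kept


# Hypothetical toy input: only chr1:200 should survive every filter.
variants = [
    Variant("chr1", 100, "A", "G", 5),
    Variant("chr1", 200, "C", "T", 30),
    Variant("chr1", 300, "G", "A", 40),
    Variant("chr2", 50, "T", "C", 25),
    Variant("chr2", 80, "A", "C", 50),
]
kept = filter_variants(
    variants,
    min_depth=10,
    rna_edit_sites={("chr1", 300)},
    low_complexity=[("chr2", 40, 60)],
    common_variants={("chr2", 80, "A", "C")},
)
```

Ordering the cheap depth check first avoids database lookups for sites that would be discarded anyway.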
Affiliation(s)
- Sonja Eberth
- Human and Animal Cell Lines, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Julia Koblitz
- Bioinformatics, IT and Databases, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Laura Steenpaß
- Human and Animal Cell Lines, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Zoological Institute, Technische Universität Braunschweig, 38106, Braunschweig, Germany
- Claudia Pommerenke
- Bioinformatics, IT and Databases, Leibniz-Institute DSMZ-DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany.

3
Martínez del Río J, Menéndez-Arias L. Next-Generation Sequencing Methods to Determine the Accuracy of Retroviral Reverse Transcriptases: Advantages and Limitations. Viruses 2025; 17:173. [PMID: 40006928 PMCID: PMC11861041 DOI: 10.3390/v17020173] [Received: 12/22/2024] [Revised: 01/24/2025] [Accepted: 01/24/2025] [Indexed: 02/27/2025]
Abstract
Retroviruses, like other RNA viruses, mutate at very high rates and exist as genetically heterogeneous populations. The error-prone activity of viral reverse transcriptase (RT) is largely responsible for the observed variability, most notably in HIV-1. In addition, RTs are widely used in biotechnology to detect RNAs and to clone expressed genes, among many other applications. The fidelity of retroviral RTs has been traditionally analyzed using enzymatic (gel-based) or reporter-based assays. However, these methods are laborious and have important limitations. The development of next-generation sequencing (NGS) technologies opened the possibility of obtaining reverse transcription error rates from a large number of sequences, although appropriate protocols had to be developed. In this review, we summarize the developments in this field that allowed the determination of RNA-dependent DNA synthesis error rates for different RTs (viral and non-viral), including methods such as PRIMER IDs, REP-SEQ, ARC-SEQ, CIR-SEQ, SMRT-SEQ and ROLL-SEQ. Their advantages and limitations are discussed. Complementary DNA (cDNA) synthesis error rates obtained in different studies, using RTs and RNAs of diverse origins, are presented and compared. Future improvements in methodological pipelines will be needed for the precise identification of mutations in the RNA template, including modified bases.
Affiliation(s)
- Javier Martínez del Río
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas & Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049 Madrid, Spain
- Luis Menéndez-Arias
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas & Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049 Madrid, Spain

4
Fuchs SA, Hülse L, Tamayo T, Kolbe-Busch S, Pfeffer K, Dilthey AT. NanoCore: core-genome-based bacterial genomic surveillance and outbreak detection in healthcare facilities from Nanopore and Illumina data. mSystems 2024; 9:e0108024. [PMID: 39373471 PMCID: PMC11575142 DOI: 10.1128/msystems.01080-24] [Received: 08/13/2024] [Accepted: 09/16/2024] [Indexed: 10/08/2024]
Abstract
Genomic surveillance enables the early detection of pathogen transmission in healthcare facilities and contributes to the reduction of substantial patient harm. Fast turnaround times, flexible multiplexing, and low capital requirements make Nanopore sequencing well suited for genomic surveillance purposes; the analysis of Nanopore data, however, can be challenging. We present NanoCore, a user-friendly method for Nanopore-based genomic surveillance in healthcare facilities, enabling the calculation and visualization of cgMLST-like (core-genome multilocus sequence typing) sample distances directly from unassembled Nanopore reads. NanoCore implements a mapping, variant calling, and multilevel filtering strategy and also supports the analysis of Illumina data. We validated NanoCore on two 24-isolate data sets of methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus faecium (VRE). In the Nanopore-only mode, NanoCore-based pairwise distances between closely related isolates were near-identical to Illumina-based SeqSphere+ distances, a gold standard commercial method (average differences of 0.75 and 0.81 alleles for MRSA and VRE; sd = 0.98 and 1.00), and gave an identical clustering into closely related and non-closely related isolates. In the "hybrid" mode, in which only Nanopore data are used for some isolates and only Illumina data for others, increased average pairwise isolate distance differences were observed (average differences of 3.44 and 1.95 for MRSA and VRE, respectively; sd = 2.76 and 1.34), while clustering results remained identical. NanoCore is computationally efficient (<15 hours of wall time for the analysis of a 24-isolate data set on a workstation), available as free software, and supports installation via conda. In conclusion, NanoCore enables the effective use of the Nanopore technology for bacterial pathogen surveillance in healthcare facilities. 
IMPORTANCE Genomic surveillance involves sequencing the genomes and measuring the relatedness of bacteria from different patients or locations in the same healthcare facility, enabling an improved understanding of pathogen transmission pathways and the detection of "silent" outbreaks that would otherwise go undetected. It has become an indispensable tool for the detection and prevention of healthcare-associated infections and is routinely applied by many healthcare institutions. The earlier an outbreak or transmission chain is detected, the better; in this context, the Oxford Nanopore sequencing technology has important potential advantages over traditionally used short-read sequencing technologies, because it supports "real-time" data generation and the cost-effective "on demand" sequencing of small numbers of bacterial isolates. The analysis of Nanopore sequencing data, however, can be challenging. We present NanoCore, a user-friendly software for genomic surveillance that works directly on Nanopore sequencing reads in FASTQ format, and demonstrate that its accuracy is equivalent to traditional gold standard short read-based analyses.
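A cgMLST-like distance counts the core loci at which two isolates carry different alleles, and closely related isolates are then grouped by a distance cut-off. The sketch below illustrates that general idea only. NanoCore's actual mapping, variant calling and multilevel filtering are not reproduced. The profile dictionaries, isolate names and the `threshold` value are hypothetical. Loci missing from a profile are treated as uncalled and skipped.

```python
from itertools import combinations


def allele_distance(a, b):
    """Count loci called in both profiles whose alleles differ; uncalled loci are skipped."""
    shared = set(a) & set(b)
    return sum(1 for locus in shared if a[locus] != b[locus])


def cluster(profiles, threshold):
    """Single-linkage grouping: isolates within `threshold` alleles of each other share a cluster."""
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, t in combinations(profiles, 2):
        if allele_distance(profiles[s], profiles[t]) <= threshold:
            parent[find(s)] = find(t)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical allele profiles (locus -> allele number) for three isolates.
profiles = {
    "iso1": {"locA": 1, "locB": 4, "locC": 7},
    "iso2": {"locA": 1, "locB": 4, "locC": 8},
    "iso3": {"locA": 2, "locB": 5, "locC": 9},
}
clusters = cluster(profiles, threshold=2)
```

Here iso1 and iso2 differ at one locus and fall into one cluster, while iso3 differs from both at three loci and stays separate.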
Affiliation(s)
- Sebastian A Fuchs
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany
- Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany
- Teresa Tamayo
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany
- Susanne Kolbe-Busch
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany
- Klaus Pfeffer
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University, Düsseldorf, Germany

5
Martínez Del Río J, Frutos-Beltrán E, Sebastián-Martín A, Lasala F, Yasukawa K, Delgado R, Menéndez-Arias L. HIV-1 Reverse Transcriptase Error Rates and Transcriptional Thresholds Based on Single-strand Consensus Sequencing of Target RNA Derived From In Vitro-transcription and HIV-infected Cells. J Mol Biol 2024; 436:168815. [PMID: 39384034 DOI: 10.1016/j.jmb.2024.168815] [Received: 07/04/2024] [Revised: 10/02/2024] [Accepted: 10/02/2024] [Indexed: 10/11/2024]
Abstract
Nucleotide incorporation and lacZ-based forward mutation assays have been widely used to determine the accuracy of reverse transcriptases (RTs) in RNA-dependent DNA polymerization reactions. However, they involve quite complex and laborious procedures, and cannot provide accurate error rates. Recently, NGS-based methods using barcodes opened the possibility of detecting all errors introduced by the RT, although their widespread use is limited by cost, due to the large size of libraries to be sequenced. In this study, we describe a novel and relatively simple NGS assay based on single-strand consensus sequencing that provides robust results with a relatively small number of raw sequences (around 60 Mb). The method has been validated by determining the error rate of HIV-1 (BH10 strain) RT using the HIV-1 protease-coding sequence as target. HIV-1 reverse transcription error rates in standard conditions (37 °C/3 mM Mg²⁺) using an in vitro-transcribed RNA were around 7.3 × 10⁻⁵. In agreement with previous reports, an 8-fold increase in RT's accuracy was observed after reducing Mg²⁺ concentration to 0.5 mM. The fidelity of HIV-1 RT was also higher at 50 °C than at 37 °C (error rate 1.5 × 10⁻⁵). Interestingly, error rates obtained with HIV-1 RNA from infected cells as template of the reverse transcription at 3 mM Mg²⁺ (7.4 × 10⁻⁵) were similar to those determined with the in vitro-transcribed RNA, and were reduced to 1.8 × 10⁻⁵ in the presence of 0.5 mM Mg²⁺. Values obtained at low magnesium concentrations were modestly higher than the transcription error rates calculated for human cells, thereby suggesting a realistic transcriptional threshold for our NGS-based error rate determinations.
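The principle behind barcode-based single-strand consensus error rates can be sketched in a few lines. Reads sharing a barcode derive from one cDNA molecule. A per-position majority vote removes sporadic sequencing errors, while an error shared by the whole family (one introduced during reverse transcription) survives into the consensus and is counted. This is a generic illustration under simplifying assumptions, not the authors' protocol. The sketch assumes reads are already aligned to the template without indels. The `min_family_size` value and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Per-position majority vote over equal-length reads from one barcode family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def consensus_error_rate(tagged_reads, reference, min_family_size=3):
    """tagged_reads: (barcode, sequence) pairs aligned to `reference` without indels."""
    families = defaultdict(list)
    for barcode, seq in tagged_reads:
        families[barcode].append(seq)

    mismatches = bases = 0
    for seqs in families.values():
        if len(seqs) < min_family_size:
            continue  # too few reads to separate RT errors from sequencing errors
        cons = consensus(seqs)
        mismatches += sum(1 for c, r in zip(cons, reference) if c != r)
        bases += len(cons)
    return mismatches / bases if bases else float("nan")


# Hypothetical toy data: u1 carries one sequencing error (corrected by the vote),
# u2 carries a family-wide substitution (counted), u3 is too small and is skipped.
reference = "ACGTACGT"
reads = [
    ("u1", "ACGTACGT"), ("u1", "ACGTACGT"), ("u1", "ACGAACGT"),
    ("u2", "ACGTTCGT"), ("u2", "ACGTTCGT"), ("u2", "ACGTTCGT"),
    ("u3", "AAAAAAAA"),
]
rate = consensus_error_rate(reads, reference)
```

In this toy example one error is found across 16 consensus bases, giving 0.0625. Real assays report values on the order of 10⁻⁵, which is why they need many consensus families.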
Affiliation(s)
- Javier Martínez Del Río
- Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid), Madrid 28049, Spain
- Estrella Frutos-Beltrán
- Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid), Madrid 28049, Spain
- Alba Sebastián-Martín
- Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid), Madrid 28049, Spain
- Fátima Lasala
- Laboratory of Molecular Microbiology, Instituto de Investigación Hospital 12 de Octubre (lmas12), Madrid 28041, Spain
- Kiyoshi Yasukawa
- Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
- Rafael Delgado
- Laboratory of Molecular Microbiology, Instituto de Investigación Hospital 12 de Octubre (lmas12), Madrid 28041, Spain; CIBERINFEC, Instituto de Salud Carlos III, Madrid, Spain; School of Medicine, Universidad Complutense, Madrid 28040, Spain
- Luis Menéndez-Arias
- Centro de Biología Molecular Severo Ochoa (Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid), Madrid 28049, Spain.

6
Slapnik B, Šket R, Črepinšek K, Tesovnik T, Bizjan BJ, Kovač J. The quality and detection limits of mitochondrial heteroplasmy by long read nanopore sequencing. Sci Rep 2024; 14:26778. [PMID: 39501054 PMCID: PMC11538439 DOI: 10.1038/s41598-024-78270-0] [Received: 07/12/2024] [Accepted: 10/29/2024] [Indexed: 11/08/2024]
Abstract
This study evaluates long-read and short-read sequencing for mitochondrial DNA (mtDNA) heteroplasmy detection. A total of 592,315 bootstrapped datasets generated from two ultra-deep sequenced mtDNA samples differing at a single nucleotide were used to assess basecalling error and accuracy, the limit of heteroplasmy detection, and heteroplasmy detection across various coverage depths. Results showed high Phred scores of data with GC-rich sequence bias for long reads. A limit of detection of 12% heteroplasmy was identified, with a strong correlation (R² ≥ 0.955) with expected heteroplasmy but a tendency to underreport high-level variants. Nanopore sequencing shows potential for direct application in mitochondrial disease diagnostics, but stringent validation processes are required to ensure the quality of diagnostic results.
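A heteroplasmy level is the fraction of reads at a position that carry the variant allele. A detection limit such as the 12% reported above can then be applied as a cut-off. The sketch below shows that arithmetic, with a Wilson score interval to convey sampling uncertainty at a given depth. The decision rule, the `min_depth` value, and the use of a Wilson interval are illustrative assumptions, not the study's published criteria.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def call_heteroplasmy(alt_reads, total_reads, lod=0.12, min_depth=100):
    """Return (status, level, interval); lod defaults to the 12% limit quoted in the abstract."""
    if total_reads < min_depth:
        return "low coverage", None, None  # hypothetical minimum depth for making any call
    level = alt_reads / total_reads
    interval = wilson_interval(alt_reads, total_reads)
    status = "detected" if level >= lod else "below LOD"
    return status, level, interval


status, level, (low, high) = call_heteroplasmy(150, 1000)
```

For 150 variant reads out of 1,000, the level is 0.15, above the 12% cut-off, and the interval brackets it.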
Affiliation(s)
- Barbara Slapnik
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia
- Robert Šket
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia
- Klementina Črepinšek
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia
- Tine Tesovnik
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia
- Barbara Jenko Bizjan
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia
- Jernej Kovač
- Clinical Institute for Special Laboratory Diagnostics, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, 1000, Slovenia.
- Faculty of Medicine, University of Ljubljana, Ljubljana, 1000, Slovenia.

7
Chen Y, Shen R, Feng X, Panageas K. Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions. ARXIV 2024:arXiv:2402.00077v2. [PMID: 39575113 PMCID: PMC11581117] [Indexed: 11/28/2024]
Abstract
Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by the American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features that preserve information beyond common genes and maximize the use of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess the model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.
Affiliation(s)
- Yuan Chen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, U.S.A
- Ronglai Shen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, U.S.A
- Xiwen Feng
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Katherine Panageas
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, U.S.A

8
Cheng C, Cheng Q, Zhou W, Chen Y, Xiao P. Highly accurate single-color fluorogenic DNA decoding sequencing for mutational genotyping. J Pharm Biomed Anal 2024; 249:116397. [PMID: 39111245 DOI: 10.1016/j.jpba.2024.116397] [Received: 04/22/2024] [Revised: 07/30/2024] [Accepted: 08/03/2024] [Indexed: 08/20/2024]
Abstract
We proposed a single-color fluorogenic DNA decoding sequencing method designed to improve sequencing accuracy, increase read length and throughput, and decrease scanning time. This method involves the incorporation of a mixture of four types of 3'-O-modified nucleotide reversible terminators into each reaction. Among them, two nucleotides are labeled with the same fluorophore, while the remaining two are unlabeled. Only one nucleotide can be extended in each reaction, and an encoding that partially defines base composition can be obtained. Through cyclic interrogation of a template twice with different nucleotide combinations, two sets of encodings are sequentially obtained, enabling the determination of the sequence. We demonstrate the feasibility of this method using established sequencing chemistry, achieving a cycle efficiency of approximately 99.5%. Notably, this strategy is highly effective at detecting and correcting sequencing errors, achieving a theoretical error rate of 0.00016% at a sequencing depth of 2×, which is lower than that of Sanger sequencing. This method is theoretically compatible with existing sequencing-by-synthesis (SBS) platforms and requires a simpler instrument, which may facilitate further reductions in sequencing costs, thereby broadening its applications in biology and medicine. Moreover, we demonstrate the capability to detect known mutation sites using information from only a single sequencing run. We validate this approach by accurately identifying a mutation site in the human mitochondrial DNA.
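The two-pass encoding idea can be illustrated as follows. In each pass, a cycle yields a bright or dark signal depending on whether the incorporated base is one of the two labeled nucleotides. If the two passes label different pairs, the combined bits identify each base uniquely. The labeling pairs below are hypothetical, and the authors' actual nucleotide combinations may differ. The sketch also does not model the error detection and correction the abstract describes.

```python
# Hypothetical labeling schemes: the two bases carrying the fluorophore in each pass.
PASS1_LABELED = {"A", "C"}
PASS2_LABELED = {"A", "G"}

# Decoding table derived from the schemes, so it always stays consistent with them.
DECODE = {(b in PASS1_LABELED, b in PASS2_LABELED): b for b in "ACGT"}
assert len(DECODE) == 4, "the two schemes must give every base a distinct signal pair"


def encode(seq, labeled):
    """Simulate one pass: True (bright) where the incorporated base is labeled."""
    return [base in labeled for base in seq]


def decode(signals1, signals2):
    """Combine the bright/dark calls of both passes into a base sequence."""
    if len(signals1) != len(signals2):
        raise ValueError("both passes must cover the same number of cycles")
    return "".join(DECODE[(s1, s2)] for s1, s2 in zip(signals1, signals2))


template = "GATTACA"
decoded = decode(encode(template, PASS1_LABELED), encode(template, PASS2_LABELED))
```

Each pass alone yields only a partial encoding (two candidate bases per cycle). Combining the passes resolves this, which matches the abstract's description of obtaining two sets of encodings.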
Affiliation(s)
- Chu Cheng
- College of Medicine and Health Science, Wuhan Polytechnic University, Wuhan, China.
- Qingzhou Cheng
- College of Medicine and Health Science, Wuhan Polytechnic University, Wuhan, China
- Wei Zhou
- College of Medicine and Health Science, Wuhan Polytechnic University, Wuhan, China
- Yulong Chen
- College of Medicine and Health Science, Wuhan Polytechnic University, Wuhan, China
- Pengfeng Xiao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

9
Chen Y, Shen R, Feng X, Panageas K. Unlocking the power of multi-institutional data: Integrating and harmonizing genomic data across institutions. Biometrics 2024; 80:ujae146. [PMID: 39679742 PMCID: PMC11647914 DOI: 10.1093/biomtc/ujae146] [Received: 11/16/2023] [Revised: 09/20/2024] [Accepted: 11/12/2024] [Indexed: 12/17/2024]
Abstract
Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by the American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features that preserve information beyond common genes and maximize the use of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess the model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.
Affiliation(s)
- Yuan Chen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10017, United States
- Ronglai Shen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10017, United States
- Xiwen Feng
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States
- Katherine Panageas
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10017, United States

10
Liu YY, Cheng K, Just R, Enke S, Bright JA. Sequencing-induced artefacts in NGS STR data. Forensic Sci Int Genet 2024; 72:103086. [PMID: 38897164 DOI: 10.1016/j.fsigen.2024.103086] [Received: 01/29/2024] [Revised: 06/10/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Significant progress has been made in recent years in the development of techniques for Next Generation Sequencing (NGS), or Massively Parallel Sequencing (MPS), of forensically relevant short tandem repeat (STR) loci. However, as these technologies are investigated and adopted by forensic laboratories, new challenges emerge that require further scrutiny. In the analysis of DNA profiles generated using the MiSeq FGx sequencing system, we have observed noise sequences with relatively high read counts that are challenging to distinguish from genuine alleles. These high read count noise sequences appear as allele sequences with one or a few substituted bases compared to a known allele sequence within the profile. An examination of ForenSeq DNA Signature Prep Kit STR noise sequences revealed that the substituted base of a parent allele can align to the same position on the sequence across noise sequences. This suggests that these substitution events occur at specific positions within the amplicon, resulting in multiple noise reads with substitutions at the same position. Mapping of the noise events onto the original raw read positions revealed a high number of events, or "noise spikes", occurring at specific positions within a given sequencing run. These noise spikes affected reads across the entire run, agnostic of locus or sample, while the position, occurrence, and amplitude of the spikes differed across runs. The majority of noise sequences with high read counts in a DNA profile were generated from base changes at these spike positions, and could be classified as "noise spike artefacts". In this paper we present evidence of the noise spike artefacts and their genesis during the sequencing process in the sequencing-by-synthesis (SBS) cycles, as well as the methods developed to detect them.
The information and methods will assist laboratories with detecting noise spikes in MiSeq FGx sequencing runs, differentiating authentic allele sequences from noise spike artefacts, and developing protocols for analyst review and handling of MiSeq FGx data.
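Because noise spikes show up as an excess of substitution events at particular raw-read positions across a whole run, one simple way to flag them is to tally events per position and mark outliers. The sketch below uses a median/MAD (median absolute deviation) rule. The abstract does not describe the authors' detection method, so the outlier rule, the `k` cut-off and the toy event list are all hypothetical.

```python
from collections import Counter
from statistics import median


def find_noise_spikes(substitution_positions, read_length, k=5.0):
    """Flag raw-read positions whose substitution count is far above the run's typical level.

    substitution_positions: 0-based raw-read positions of every substitution event in one run.
    The median/MAD rule and the cut-off k are illustrative choices.
    """
    counts = Counter(substitution_positions)
    per_position = [counts.get(i, 0) for i in range(read_length)]
    med = median(per_position)
    mad = median(abs(c - med) for c in per_position) or 1  # avoid dividing by zero
    return [i for i, c in enumerate(per_position) if (c - med) / mad > k]


# Hypothetical run: a uniform background of 3 events per position plus a spike at position 7.
events = list(range(10)) * 3 + [7] * 50
spikes = find_noise_spikes(events, read_length=10)
```

A median-based rule is used here because, unlike a mean and standard deviation, it is not inflated by the spikes it is trying to detect.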
Affiliation(s)
- Yao-Yuan Liu
- ESR Limited, Private Bag 92021, Auckland, New Zealand.
- Kevin Cheng
- ESR Limited, Private Bag 92021, Auckland, New Zealand
- Rebecca Just
- National Bioforensic Analysis Center, National Biodefense Analysis and Countermeasures Center, 8300 Research Plaza, Fort Detrick, MD, United States
- Sana Enke
- National Bioforensic Analysis Center, National Biodefense Analysis and Countermeasures Center, 8300 Research Plaza, Fort Detrick, MD, United States

11
Kotlarz K, Mielczarek M, Biecek P, Guldbrandtsen B, Szyda J. Exploring the impact of sequence context on errors in SNP genotype calling with whole genome sequencing data using AI-based autoencoder approach. NAR Genom Bioinform 2024; 6:lqae131. [PMID: 39318508 PMCID: PMC11420682 DOI: 10.1093/nargab/lqae131] [Received: 03/28/2024] [Revised: 08/23/2024] [Accepted: 09/06/2024] [Indexed: 09/26/2024]
Abstract
A critical step in the analysis of whole genome sequencing data is variant calling. Despite its importance, variant calling is prone to errors. Our study investigated the association between incorrect single nucleotide polymorphism (SNP) calls and variant quality metrics and nucleotide context. In our study, incorrect SNPs were defined in 20 Holstein-Friesian cows by comparing their SNP genotypes identified by whole genome sequencing with the Illumina NovaSeq 6000 and the EuroGMD50K genotyping microarray. The dataset was divided into the correct SNP set (666 333 SNPs) and the incorrect SNP set (4 557 SNPs). The training dataset consisted of only the correct SNPs, while the test dataset contained a balanced mix of all the incorrectly and correctly called SNPs. An autoencoder was constructed to identify systematically incorrect SNPs that were marked as outliers by one-class support vector machine and isolation forest algorithms. The results showed that 59.53% (±0.39%) of the incorrect SNPs had systematic patterns, with the remainder being random errors. The frequent occurrence of the CGC 3-mer was due to mislabelling a call for C. Incorrect T calls in place of A were associated with the presence of T at the neighbouring downstream position. These errors may arise due to the fluorescence patterns of nucleotide labelling.
Affiliation(s)
- Krzysztof Kotlarz
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
- Magda Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
- Przemysław Biecek
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw 00-662, Poland
- Institute of Informatics, University of Warsaw, Warsaw 02-097, Poland
- Bernt Guldbrandtsen
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg C 1870, Denmark
- Joanna Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw 51-631, Poland
| |
Collapse
|
12
|
Severins I, Bastiaanssen C, Kim SH, Simons RB, van Noort J, Joo C. Single-molecule structural and kinetic studies across sequence space. Science 2024; 385:898-904. [PMID: 39172834 DOI: 10.1126/science.adn5968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 07/01/2024] [Indexed: 08/24/2024]
Abstract
At the core of molecular biology lies the intricate interplay between sequence, structure, and function. Single-molecule techniques provide in-depth dynamic insights into structure and function, but laborious assays impede functional screening of large sequence libraries. We introduce high-throughput Single-molecule Parallel Analysis for Rapid eXploration of Sequence space (SPARXS), integrating single-molecule fluorescence with next-generation sequencing. We applied SPARXS to study the sequence-dependent kinetics of the Holliday junction, a critical intermediate in homologous recombination. By examining the dynamics of millions of Holliday junctions, covering thousands of distinct sequences, we demonstrated the ability of SPARXS to uncover sequence patterns, evaluate sequence motifs, and construct thermodynamic models. SPARXS emerges as a versatile tool for untangling the mechanisms that underlie sequence-specific processes at the molecular scale.
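Sequence-dependent kinetics in single-molecule fluorescence experiments are often summarised as rate constants estimated from dwell times. The minimal sketch below assumes a two-state system (conformers labelled A and B here purely for illustration), single-exponential dwell-time distributions and a temperature of 295 K; it shows how switching rates, an equilibrium constant and a free-energy difference follow from dwell times. It is not the SPARXS analysis pipeline, and the dwell times are made up.

```python
import math
from statistics import mean
from typing import Sequence

R = 8.314462618  # gas constant, J mol^-1 K^-1

def rate_from_dwells(dwell_times_s: Sequence[float]) -> float:
    """Maximum-likelihood exit rate (s^-1) for exponentially distributed dwell times: k = 1 / mean."""
    return 1.0 / mean(dwell_times_s)

def delta_g_kj_per_mol(dwells_a: Sequence[float], dwells_b: Sequence[float],
                       temperature_k: float = 295.0) -> float:
    """Free-energy difference G_B - G_A (kJ/mol) from dwell times in two conformers A and B."""
    k_ab = rate_from_dwells(dwells_a)  # leaving A means switching A -> B
    k_ba = rate_from_dwells(dwells_b)  # leaving B means switching B -> A
    equilibrium_k = k_ab / k_ba        # K = [B]/[A] at equilibrium
    return -R * temperature_k * math.log(equilibrium_k) / 1000.0

# Hypothetical dwell times (seconds) for one junction sequence
print(delta_g_kj_per_mol([0.5, 1.5], [2.0, 2.0]))  # negative: B is favoured
```

Repeating this per sequence is one simple way to turn many kinetic traces into a sequence-to-free-energy table that a thermodynamic model could then be fitted to.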
Affiliation(s)
- Ivo Severins
  - Department of BioNanoScience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
  - Biological and Soft Matter Physics, Huygens-Kamerlingh Onnes Laboratory, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, the Netherlands
- Carolien Bastiaanssen
  - Department of BioNanoScience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
- Sung Hyun Kim
  - Department of BioNanoScience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
  - Department of Physics, Ewha Womans University, Seoul 03760, Republic of Korea
- Roy B Simons
  - Department of BioNanoScience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
- John van Noort
  - Biological and Soft Matter Physics, Huygens-Kamerlingh Onnes Laboratory, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, the Netherlands
- Chirlmin Joo
  - Department of BioNanoScience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
  - Department of Physics, Ewha Womans University, Seoul 03760, Republic of Korea

13
Jia H, Tan S, Zhang YE. Chasing Sequencing Perfection: Marching Toward Higher Accuracy and Lower Costs. Genomics Proteomics Bioinformatics 2024; 22:qzae024. [PMID: 38991976 PMCID: PMC11423848 DOI: 10.1093/gpbjnl/qzae024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/23/2023] [Revised: 01/25/2024] [Accepted: 01/29/2024] [Indexed: 07/13/2024]
Abstract
Next-generation sequencing (NGS), represented by Illumina platforms, has been an essential cornerstone of basic and applied research. However, the sequencing error rate of 1 per 1000 bp (10⁻³) represents a serious hurdle for research areas focusing on rare mutations, such as somatic mosaicism or microbe heterogeneity. By examining the high-fidelity sequencing methods developed in the past decade, we summarized three major factors underlying errors and the corresponding 12 strategies mitigating these errors. We then proposed a novel framework to classify 11 preexisting representative methods according to the corresponding combinatory strategies and identified three trends that emerged during methodological developments. We further extended this analysis to eight long-read sequencing methods, emphasizing error reduction strategies. Finally, we suggest two promising future directions that could achieve comparable or even higher accuracy with lower costs in both NGS and long-read sequencing.
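An error rate of 1 per 1000 bp corresponds to a Phred quality of Q30, because Q = -10·log10(p). The short worked example below (an illustration, not taken from the review) converts between the two scales and estimates how many errors a read carries; the 150 bp read length and the assumption that per-base errors are independent are illustrative simplifications.

```python
import math

def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(p) for a per-base error probability p."""
    return -10.0 * math.log10(p)

def error_from_phred(q: float) -> float:
    """Inverse mapping: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)

def expected_errors(read_length: int, p: float) -> float:
    """Expected erroneous bases per read, assuming independent per-base errors."""
    return read_length * p

def prob_error_free(read_length: int, p: float) -> float:
    """Probability that a read contains no errors under the same independence assumption."""
    return (1.0 - p) ** read_length

p = 1e-3                                   # 1 error per 1000 bp
print(phred_from_error(p))                 # Q30
print(expected_errors(150, p))             # about 0.15 errors per 150 bp read
print(round(prob_error_free(150, p), 3))   # about 0.861
```

The same arithmetic shows why rare mutations are hard to study: a true variant present in roughly 1 in 1000 molecules occurs at about the same rate as raw sequencing errors.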
Affiliation(s)
- Hangxing Jia
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Shengjun Tan
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Yong E Zhang
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China

14
Carter MH, Gribble J, Diller JR, Denison MR, Mirza SA, Chappell JD, Halasa NB, Ogden KM. Human Rotaviruses of Multiple Genotypes Acquire Conserved VP4 Mutations during Serial Passage. Viruses 2024; 16:978. [PMID: 38932271 PMCID: PMC11209247 DOI: 10.3390/v16060978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 06/06/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024]
Abstract
Human rotaviruses exhibit limited tropism and replicate poorly in most cell lines. Attachment protein VP4 is a key rotavirus tropism determinant. Previous studies in which human rotaviruses were adapted to cultured cells identified mutations in VP4. However, most such studies were conducted using only a single human rotavirus genotype. In the current study, we serially passaged 50 human rotavirus clinical specimens representing five of the genotypes most frequently associated with severe human disease, each in triplicate, three to five times in primary monkey kidney cells and then ten times in the MA104 monkey kidney cell line. From 13 of the 50 specimens, we obtained 25 rotavirus antigen-positive lineages representing all five genotypes, which tended to replicate more efficiently in MA104 cells at late versus early passage. We used Illumina next-generation sequencing and analysis to identify variants that arose during passage. In VP4, variants encoded 28 mutations that were conserved for all P[8] rotaviruses and 12 mutations that were conserved for all five genotypes. These findings suggest there may be a conserved mechanism of human rotavirus adaptation to MA104 cells. In the future, such a conserved adaptation mechanism could be exploited to study human rotavirus biology or efficiently manufacture vaccines.
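Variants that arise during passage are commonly found by comparing variant allele frequencies (VAF) between early and late passages. The sketch below computes VAF from hypothetical read counts and flags sites that are rare early but dominant late; the site labels, the 0.05 and 0.5 cut-offs and the treatment of uncovered sites as VAF 0 are illustrative assumptions, not the study's variant-calling pipeline.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, Tuple[int, int]]  # site -> (reads supporting the variant, total reads)

def vaf(alt: int, depth: int) -> float:
    """Variant allele frequency; 0.0 when the site has no coverage."""
    return alt / depth if depth else 0.0

def emerging_variants(early: Counts, late: Counts,
                      max_early: float = 0.05, min_late: float = 0.5) -> List[str]:
    """Sites that are rare or absent at early passage but dominant at late passage."""
    hits = []
    for site, (alt_late, depth_late) in late.items():
        # a site missing from the early data is treated as VAF 0 (an assumption)
        alt_early, depth_early = early.get(site, (0, 0))
        if vaf(alt_early, depth_early) <= max_early and vaf(alt_late, depth_late) >= min_late:
            hits.append(site)
    return hits

# Hypothetical counts for one lineage
early = {"site_100": (2, 400), "site_250": (180, 400)}
late = {"site_100": (350, 420), "site_250": (200, 410), "site_600": (40, 390)}
print(emerging_variants(early, late))  # ['site_100']
```

A real analysis would also require minimum depth at both passages and check that a mutation recurs across replicate lineages before calling it conserved.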
Affiliation(s)
- Maximilian H. Carter
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jennifer Gribble
  - Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Julia R. Diller
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Mark R. Denison
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Sara A. Mirza
  - Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
- James D. Chappell
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Natasha B. Halasa
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Kristen M. Ogden
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN 37232, USA

15
Chen H, Wang B, Cai L, Zhang Y, Shu Y, Liu W, Leng X, Zhai J, Niu B, Zhou Q, Cao S. The performance of homopolymer detection using dichromatic and tetrachromatic fluorogenic next-generation sequencing platforms. BMC Genomics 2024; 25:542. [PMID: 38822237 PMCID: PMC11140927 DOI: 10.1186/s12864-024-10474-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 05/29/2024] [Indexed: 06/02/2024]
Abstract
OBJECTIVES Homopolymer (HP) sequencing is error-prone in next-generation sequencing (NGS) assays and may induce false insertions/deletions and substitutions. This study aimed to evaluate the performance of dichromatic and tetrachromatic fluorogenic NGS platforms when sequencing homopolymeric regions. RESULTS An HP-containing plasmid was constructed and diluted to serial frequencies (3%, 10%, 30%, 60%) to determine the performance of an MGISEQ-2000, MGISEQ-200, and NextSeq 2000 in HP sequencing. An evident negative correlation was observed between the detected frequencies of HPs of all four nucleotides and the HP length. Significantly decreased rates (P < 0.01) were found in all 8-mer HPs in all three NGS systems at all four expected frequencies, except in the NextSeq 2000 at 3%. With the application of a unique molecular identifier (UMI) pipeline, there were no differences between the detected frequencies of any HPs and the expected frequencies, except for poly-G 8-mers using the MGISEQ-200 platform. UMIs improved the performance of all three NGS platforms in HP sequencing. CONCLUSIONS We first constructed an HP-containing plasmid based on an EGFR gene backbone to evaluate the performance of NGS platforms when sequencing homopolymeric regions. Highly comparable performance was observed between the MGISEQ-2000 and NextSeq 2000, and introducing UMIs is a promising approach to improve the performance of NGS platforms in sequencing homopolymeric regions.
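UMIs help because reads that carry the same molecular tag derive from the same original molecule, so a per-family consensus can outvote sporadic errors such as a base lost from a homopolymer run. The minimal majority-vote sketch below assumes reads have already been grouped by exact UMI match and compares whole read sequences, so homopolymer length differences are treated as distinct alleles; the function name, the minimum family size of 3 and the toy reads are hypothetical, and real UMI pipelines also correct UMI sequencing errors and work from alignments.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def umi_consensus(reads: Iterable[Tuple[str, str]], min_family_size: int = 3) -> Dict[str, str]:
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI family."""
    families: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to outvote errors; drop the family
        seq, votes = Counter(seqs).most_common(1)[0]
        if votes * 2 > len(seqs):  # require a strict majority
            consensus[umi] = seq
    return consensus

poly_g8 = "CT" + "G" * 8 + "A"  # toy fragment with an 8-mer poly-G
poly_g7 = "CT" + "G" * 7 + "A"  # same fragment with one G lost in sequencing
reads = [
    ("AAGT", poly_g8), ("AAGT", poly_g8), ("AAGT", poly_g7),
    ("CCTA", poly_g8), ("CCTA", poly_g7),
]
print(umi_consensus(reads))  # only the AAGT family is large enough; its consensus is the 8-mer
```

Counting alleles over consensus sequences rather than raw reads is what lets the detected HP frequency move back toward the expected frequency.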
Affiliation(s)
- HuiJuan Chen
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
  - WillingMed Technology Beijing Co., Ltd, Beijing, 100176, China
- Bing Wang
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- LiLi Cai
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- YiRan Zhang
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- YingShuang Shu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- Wen Liu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- Xue Leng
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- JinCheng Zhai
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
- BeiFang Niu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
  - ChosenMed Technology (Zhejiang) Co. Ltd, Zhejiang, 311103, China
- QiMing Zhou
  - Beijing ChosenMed Clinical Laboratory Co. Ltd, Beijing, 100176, China
  - ChosenMed Technology (Zhejiang) Co. Ltd, Zhejiang, 311103, China
- ShuNan Cao
  - Polar Research Institute of China, Shanghai, 201209, China

16
Muyas F, Sauer CM, Valle-Inclán JE, Li R, Rahbari R, Mitchell TJ, Hormoz S, Cortés-Ciriano I. De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat Biotechnol 2024; 42:758-767. [PMID: 37414936 PMCID: PMC11098751 DOI: 10.1038/s41587-023-01863-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/23/2022] [Accepted: 06/07/2023] [Indexed: 07/08/2023]
Abstract
Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed for the detection of somatic mutations in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data sets directly without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, compared with 0.2-0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis, and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
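The F1 score quoted above is the harmonic mean of precision P and recall R, F1 = 2PR/(P + R), which simplifies to 2TP/(2TP + FP + FN). The worked example below uses made-up true-positive, false-positive and false-negative counts purely to show the calculation; the numbers are not from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical validation counts: 60 confirmed calls, 30 false calls, 40 missed mutations
print(round(f1_score(tp=60, fp=30, fn=40), 3))  # 0.632 (precision 0.667, recall 0.6)
```

Because F1 penalises whichever of precision or recall is lower, a method cannot reach the 0.6-0.7 range by calling aggressively (high recall, low precision) or conservatively (the reverse).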
Affiliation(s)
- Francesc Muyas
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Carolin M Sauer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Jose Espejo Valle-Inclán
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Ruoyan Li
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Raheleh Rahbari
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Thomas J Mitchell
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge, UK
- Sahand Hormoz
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Isidro Cortés-Ciriano
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK

17
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024]
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
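Data writing and reading in DNA storage both rest on a mapping between bits and nucleotides. The sketch below uses the simplest conceivable scheme, 2 bits per base (00→A, 01→C, 10→G, 11→T), chosen here as an assumption for illustration only. Practical codecs of the kind this review surveys add error correction, indexing and constraints on GC content and homopolymer length, all of which this toy omits, so a single misread base here silently corrupts the decoded byte.

```python
BASES = "ACGT"  # index i encodes the 2-bit value i (toy mapping, no error correction)

def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(strand: str) -> bytes:
    """Inverse of encode; raises ValueError on a length or alphabet mismatch."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    values = [BASES.index(base) for base in strand]
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

The round trip only works because this toy assumes error-free synthesis and sequencing; the cost and latency of recovering data when that assumption fails is the reconstruction bottleneck the abstract refers to.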
Affiliation(s)
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
  - Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Xiaopeng Wei
  - School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China

18
Schiffers S, Oberdoerffer S. ac4C: a fragile modification with stabilizing functions in RNA metabolism. RNA 2024; 30:583-594. [PMID: 38531654 PMCID: PMC11019744 DOI: 10.1261/rna.079948.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 02/09/2024] [Indexed: 03/28/2024]
Abstract
In recent years, concerted efforts to map and understand epitranscriptomic modifications in mRNA have unveiled new complexities in the regulation of gene expression. These studies cumulatively point to diverse functions in mRNA metabolism, spanning pre-mRNA processing, mRNA degradation, and translation. However, this emerging landscape is not without its intricacies and sources of discrepancies. Disparities in detection methodologies, divergent interpretations of functional outcomes, and the complex nature of biological systems across different cell types pose significant challenges. With a focus on N4-acetylcytidine (ac4C), this review endeavors to unravel conflicting narratives by examining the technological, biological, and methodological factors that have contributed to discrepancies and thwarted research progress. Our goal is to mitigate detection inconsistencies and establish a unified model to elucidate the contribution of ac4C to mRNA metabolism and cellular equilibrium.
Affiliation(s)
- Sarah Schiffers
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA
- Shalini Oberdoerffer
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA

19
Cooley NP, Wright ES. Many purported pseudogenes in bacterial genomes are bona fide genes. BMC Genomics 2024; 25:365. [PMID: 38622536 PMCID: PMC11017572 DOI: 10.1186/s12864-024-10137-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 02/17/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND Microbial genomes are largely composed of protein-coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often differed substantially in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying that relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high-coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes.
Our results suggest that high read coverage is required for correct assembly and that an inflated number of pseudogenes due to internal stops indicates poor overall assembly quality.
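How a single assembly error produces an apparent pseudogene can be shown on a toy coding sequence: split the in-frame sequence into codons and report any stop codon that appears before the final codon. The sketch below assumes the sequence starts at its start codon and is already in frame, uses the stop codons TAA, TAG and TGA of the standard bacterial code, and ignores frameshift detection; it illustrates the concept and is not the authors' method.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}

def internal_stops(cds: str) -> List[int]:
    """0-based codon indices of stop codons that occur before the final codon of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

intact = "ATGAAACAGGGGTAA"  # ATG AAA CAG GGG TAA: stop only at the end
broken = "ATGAAATAGGGGTAA"  # one C->T substitution turns CAG into the stop TAG
print(internal_stops(intact))  # []
print(internal_stops(broken))  # [2]
```

Counting such calls across an assembly, and comparing the count with that of closely related genomes, is the kind of signal the abstract proposes as a proxy for assembly quality.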
Affiliation(s)
- Nicholas P Cooley
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA

20
Pommerenke C, Nagel S, Haake J, Koelz AL, Christgen M, Steenpass L, Eberth S. Molecular Characterization and Subtyping of Breast Cancer Cell Lines Provide Novel Insights into Cancer Relevant Genes. Cells 2024; 13:301. [PMID: 38391914 PMCID: PMC10886524 DOI: 10.3390/cells13040301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 01/26/2024] [Accepted: 02/02/2024] [Indexed: 02/24/2024]
Abstract
Continuous cell lines are important and commonly used in vitro models in breast cancer (BC) research. Selection of the appropriate model cell line is crucial and requires consideration of their molecular characteristics. To characterize BC cell line models in depth, we profiled a panel of 29 authenticated and publicly available BC cell lines by mRNA-sequencing, mutation analysis, and immunoblotting. Gene expression profiles separated BC cell lines into two major clusters that represent basal-like (mainly triple-negative BC) and luminal BC subtypes, respectively. HER2-positive cell lines were located within the luminal cluster. Mutation calling highlighted the frequent aberration of TP53 and BRCA2 in BC cell lines, which, therefore, share relevant characteristics with primary BC. Furthermore, we showed that the data can be used to find novel, potential oncogenic fusion transcripts, e.g., FGFR2::CRYBG1 and RTN4IP1::CRYBG1 in cell line MFM-223, and to elucidate the regulatory circuit of IRX genes and KLF15 as novel candidate tumor suppressor genes in BC. Our data indicated that KLF15 was activated by IRX1 and inhibited by IRX3. Moreover, KLF15 inhibited IRX1 in cell line HCC-1599. Each BC cell line carries unique molecular features. Therefore, the molecular characteristics of BC cell lines described here might serve as a valuable resource to improve the selection of appropriate models for BC research.
Affiliation(s)
- Claudia Pommerenke
  - Department of Bioinformatics, IT and Databases, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Stefan Nagel
  - Department of Human and Animal Cell Lines, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Josephine Haake
  - Department of Human and Animal Cell Lines, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Anne Leena Koelz
  - Department of Human and Animal Cell Lines, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Matthias Christgen
  - Institute of Pathology, Hannover Medical School, 30625 Hannover, Germany
- Laura Steenpass
  - Department of Human and Animal Cell Lines, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
  - Zoological Institute, Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Sonja Eberth
  - Department of Human and Animal Cell Lines, Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany

21
Wijeratne S, Gonzalez MEH, Roach K, Miller KE, Schieffer KM, Fitch JR, Leonard J, White P, Kelly BJ, Cottrell CE, Mardis ER, Wilson RK, Miller AR. Full-length isoform concatenation sequencing to resolve cancer transcriptome complexity. BMC Genomics 2024; 25:122. [PMID: 38287261 PMCID: PMC10823626 DOI: 10.1186/s12864-024-10021-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 01/16/2024] [Indexed: 01/31/2024]
Abstract
BACKGROUND Cancers exhibit complex transcriptomes with aberrant splicing that induces isoform-level differential expression compared to non-diseased tissues. Transcriptomic profiling using short-read sequencing offers a cost-effective approach for evaluating isoform expression, although short-read assembly displays limitations in the accurate inference of full-length transcripts. Long-read RNA sequencing (Iso-Seq), using the Pacific Biosciences (PacBio) platform, can overcome such limitations by providing full-length isoform sequence resolution which requires no read assembly and represents native expressed transcripts. A constraint of the Iso-Seq protocol is its lower read output per instrument run, which can, for example, limit the detection of lowly expressed transcripts. To address these deficiencies, we developed a concatenation workflow, PacBio Full-Length Isoform Concatemer Sequencing (PB_FLIC-Seq), designed to increase the number of unique, sequenced PacBio long reads, thereby improving overall detection of unique isoforms. In addition, we anticipate that the increase in read depth will help improve the detection of moderate to low-level expressed isoforms. RESULTS In sequencing a commercial reference (Spike-In RNA Variants; SIRV) with known isoform complexity we demonstrated a 3.4-fold increase in read output per run and improved SIRV recall when using the PB_FLIC-Seq method compared to the same samples processed with the Iso-Seq protocol. We applied this protocol to a translational cancer case, also demonstrating the utility of the PB_FLIC-Seq method for identifying differential full-length isoform expression in a pediatric diffuse midline glioma compared to its adjacent non-malignant tissue.
Our data analysis revealed increased expression of extracellular matrix (ECM) genes within the tumor sample, including an isoform of the Secreted Protein Acidic and Cysteine Rich (SPARC) gene that was expressed 11,676-fold higher than in the adjacent non-malignant tissue. Finally, by using the PB_FLIC-Seq method, we detected several cancer-specific novel isoforms. CONCLUSION This work describes a concatenation-based methodology for increasing the number of sequenced full-length isoform reads on the PacBio platform, yielding improved discovery of expressed isoforms. We applied this workflow to profile the transcriptome of a pediatric diffuse midline glioma and adjacent non-malignant tissue. Our findings of cancer-specific novel isoform expression further highlight the importance of long-read sequencing for characterization of complex tumor transcriptomes.
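In any concatenation workflow, each long read holds several transcripts joined end to end and must be split back into its components before analysis. The abstract does not describe the PB_FLIC-Seq linker design, so the sketch below is a hypothetical illustration only: it splits a read at exact matches of an assumed linker sequence and discards fragments shorter than an assumed minimum length. Real reads would need error-tolerant linker matching.

```python
from typing import List

LINKER = "GATCGATCGA"  # hypothetical linker; not the PB_FLIC-Seq adapter sequence

def split_concatemer(read: str, linker: str = LINKER, min_len: int = 5) -> List[str]:
    """Split a concatemer read at exact linker matches and keep fragments of at least min_len bases."""
    return [fragment for fragment in read.split(linker) if len(fragment) >= min_len]

# Toy concatemer: two "isoforms" plus a short trailing fragment that is filtered out
read = "AAAAACCCCC" + LINKER + "GGGGGTTTTT" + LINKER + "ACG"
print(split_concatemer(read))  # ['AAAAACCCCC', 'GGGGGTTTTT']
```

Each sequenced molecule thus yields several isoform reads instead of one, which is how concatenation can raise read output per run.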
Collapse
Affiliation(s)
- Saranga Wijeratne
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
| | - Maria E Hernandez Gonzalez
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
| | - Kelli Roach
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
| | - Katherine E Miller
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Kathleen M Schieffer
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Department of Pathology, The Ohio State University College of Medicine, Columbus, OH, USA
| | - James R Fitch
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
| | - Jeffrey Leonard
- Department of Neurosurgery, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Neurosurgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Peter White
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Benjamin J Kelly
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
| | - Catherine E Cottrell
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Department of Pathology, The Ohio State University College of Medicine, Columbus, OH, USA
- Elaine R Mardis
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Department of Neurosurgery, The Ohio State University College of Medicine, Columbus, OH, USA
- Richard K Wilson
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Anthony R Miller
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA
22
Wang S, Shi M, Zhang Y, Niu J, Li W, Yuan J, Cai C, Yang Y, Gao P, Guo X, Li B, Lu C, Cao G. Construction of LncRNA-Related ceRNA Networks in Longissimus Dorsi Muscle of Jinfen White Pigs at Different Developmental Stages. Curr Issues Mol Biol 2024; 46:340-354. [PMID: 38248324 PMCID: PMC10814722 DOI: 10.3390/cimb46010022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/23/2023] [Accepted: 12/28/2023] [Indexed: 01/23/2024]
Abstract
The development of skeletal muscle in pigs may determine pork quality. In recent years, long non-coding RNAs (lncRNAs) have been found to play an important role in skeletal muscle growth and development. In this study, we investigated the whole transcriptome of the longissimus dorsi muscle (LDM) of Jinfen White pigs at three developmental stages (1, 90, and 180 days) and performed a comprehensive analysis of lncRNAs, mRNAs, and microRNAs (miRNAs), aiming to identify the key regulators and interaction networks in Jinfen White pigs. A total of 2638 differentially expressed mRNAs (DE mRNAs) and 982 differentially expressed lncRNAs (DE lncRNAs) were identified. Compared with JFW_1d, JFW_90d had 497 up-regulated and 698 down-regulated DE mRNAs and 212 up-regulated and 286 down-regulated DE lncRNAs. Compared with JFW_1d, JFW_180d had 613 up-regulated and 895 down-regulated DE mRNAs and 184 up-regulated and 131 down-regulated DE lncRNAs. Compared with JFW_90d, JFW_180d had 615 up-regulated and 477 down-regulated DE mRNAs and 254 up-regulated and 355 down-regulated DE lncRNAs. Compared with mRNAs, lncRNAs have fewer exons, fewer ORFs, and shorter lengths. We performed GO and KEGG pathway functional enrichment analyses for the DE mRNAs and the potential target genes of the DE lncRNAs. Several of the enriched pathways are involved in muscle growth and development, such as the PI3K-Akt, MAPK, hedgehog, and hippo signaling pathways; these are among the pathways through which the mRNAs and lncRNAs function. Bioinformatic screening was also used to identify miRNAs and DE lncRNAs that could act as ceRNAs. Finally, we constructed a lncRNA-miRNA-mRNA regulatory network containing 26 mRNAs, 7 miRNAs, and 17 lncRNAs and used qRT-PCR to verify the key genes in these networks. Among these, XLOC_022984/miR-127/ENAH and XLOC_016847/miR-486/NRF1 may function as key ceRNA networks.
In this study, we obtained transcriptomic profiles from the LDM of Jinfen White pigs at three developmental stages and identified lncRNA-miRNA-mRNA regulatory networks that may provide crucial information for further exploration of the molecular mechanisms of skeletal muscle development.
Affiliation(s)
- Chang Lu
- College of Animal Science, Shanxi Agricultural University, No. 1 Mingxian South Road, Taigu 030801, China; (S.W.); (M.S.); (Y.Z.); (J.N.); (W.L.); (J.Y.); (C.C.); (Y.Y.); (P.G.); (X.G.); (B.L.)
- Guoqing Cao
- College of Animal Science, Shanxi Agricultural University, No. 1 Mingxian South Road, Taigu 030801, China; (S.W.); (M.S.); (Y.Z.); (J.N.); (W.L.); (J.Y.); (C.C.); (Y.Y.); (P.G.); (X.G.); (B.L.)
23
Chumsakul O, Nakamura K, Fukamachi K, Ishikawa S, Oshima T. GeF-seq: A Simple Procedure for Base-Pair Resolution ChIP-seq. Methods Mol Biol 2024; 2819:39-53. [PMID: 39028501 DOI: 10.1007/978-1-0716-3930-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Nucleotide sequences recognized and bound by DNA-binding proteins (DBPs) are critical to controlling and maintaining gene expression, replication, chromosome segregation, cell division, and nucleoid structure in bacterial cells. Therefore, determining the binding sequences of DBPs is important not only for studying DBP recognition mechanisms but also for understanding the fundamentals of cell homeostasis. While ChIP-seq analysis appears to be an effective way to determine DBP binding sites on the genome, its resolution is sometimes insufficient to identify the sites precisely. Here we introduce a simple and effective method named Genome Footprinting with high-throughput sequencing (GeF-seq) to determine binding sites of DBPs with single base-pair resolution. GeF-seq detects binding sites of DBPs as sharp peaks and thus makes it possible to identify the recognition sequence in each "binding peak" more easily and accurately than with conventional ChIP-seq.
Affiliation(s)
- Onuma Chumsakul
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Basic Research and Development Division, Rohto Pharmaceutical Co., Ltd., Kyoto, Japan
- Graduate School of Science, Technology and Innovation, Kobe University, Noda, Kobe, Japan
- Kensuke Nakamura
- Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi, Gunma, Japan
- Kazuki Fukamachi
- Department of Biotechnology, Toyama Prefectural University, Imizu, Toyama, Japan
- Shu Ishikawa
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Graduate School of Science, Technology and Innovation, Kobe University, Noda, Kobe, Japan
- Taku Oshima
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Department of Biotechnology, Toyama Prefectural University, Imizu, Toyama, Japan
24
Qin Y, Wu L, Zhang Q, Wen C, Van Nostrand JD, Ning D, Raskin L, Pinto A, Zhou J. Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing. mSystems 2023; 8:e0102523. [PMID: 38038441 PMCID: PMC10734440 DOI: 10.1128/msystems.01025-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
IMPORTANCE Amplicon sequencing of targeted genes is the predominant approach to estimate the membership and structure of microbial communities. However, accurate reconstruction of community composition is difficult because of sequencing errors and other methodological biases, and effective approaches to overcome these challenges are essential. Using a mock community of 33 phylogenetically diverse strains, this study evaluated the effect of GC content on sequencing results and tested different approaches to improve overall sequencing accuracy while characterizing the pros and cons of popular amplicon sequence data processing approaches. The sequencing results from this study can serve as a benchmarking data set for future algorithmic improvements. Furthermore, the new insights into sequencing error, chimera formation, and GC bias from this study will help enhance the quality of amplicon sequencing studies and support the development of new data analysis approaches.
Affiliation(s)
- Yujia Qin
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Liyou Wu
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Qiuting Zhang
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Chongqin Wen
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Fisheries College, Guangdong Ocean University, Zhanjiang, Guangdong, China
- Joy D. Van Nostrand
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Daliang Ning
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- Lutgarde Raskin
- Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Ameet Pinto
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Jizhong Zhou
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, Oklahoma, USA
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, China
- Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- School of Civil Engineering and Environmental Sciences, University of Oklahoma, Norman, Oklahoma, USA
- School of Computer Science, University of Oklahoma, Norman, Oklahoma, USA
25
Koike K, Honda R, Aoki M, Yamamoto‐Ikemoto R, Syutsubo K, Matsuura N. A quantitative sequencing method using synthetic internal standards including functional and phylogenetic marker genes. Environ Microbiol Rep 2023; 15:497-511. [PMID: 37465846 PMCID: PMC10667660 DOI: 10.1111/1758-2229.13189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 06/23/2023] [Indexed: 07/20/2023]
Abstract
The method of spiking synthetic internal standard genes (ISGs) into samples for amplicon sequencing, generating sequences and converting read counts into absolute gene numbers has been used only for phylogenetic markers and has not been applied to functional markers. In this study, we developed ISGs that include sequences of the 16S rRNA gene, pmoA (encoding a subunit of particulate methane monooxygenase) and amoA (encoding a subunit of ammonia monooxygenase). We added ISGs to the samples, amplified the target genes and performed amplicon sequencing. For the mock community, the copy numbers converted from read counts using ISGs were equivalent to those obtained by quantitative real-time polymerase chain reaction (4.0 × 10⁴ versus 4.1 × 10⁴ and 3.0 × 10³ versus 4.0 × 10³ copies μL-DNA⁻¹ for the 16S rRNA and pmoA genes, respectively), but we also identified underestimation, possibly due to primer coverage (7.8 × 10² versus 3.7 × 10³ copies μL-DNA⁻¹ for the amoA gene). We then applied this method to environmental samples and analysed phylogeny, functional diversity and absolute quantities. One Methylocystis population was the most abundant in the sludge samples [16S rRNA gene (3.8 × 10⁹ copies g⁻¹) and pmoA gene (2.3 × 10⁹ copies g⁻¹)], and these two genes were potentially interrelated. This study demonstrates that ISG spiking is useful for evaluating sequencing data processing and quantifying functional markers.
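The read-count conversion described in this abstract rests on a simple proportionality: if a known number of ISG copies is spiked into a sample and the ISG and its target amplify and sequence with similar efficiency, each ISG read corresponds to a fixed number of copies, and target reads can be scaled by that factor. The Python sketch below illustrates only this general spike-in principle; it is not the authors' pipeline, and the function name `copies_from_reads` and all numbers are hypothetical. It assumes one ISG per marker gene and equal amplification efficiency for target and standard.

```python
def copies_from_reads(target_reads: int, isg_reads: int, isg_copies_spiked: float) -> float:
    """Estimate absolute target copies from amplicon read counts (hypothetical helper).

    Assumes the target gene and its internal standard amplify and sequence with
    equal efficiency, so copies scale linearly with reads.
    """
    if isg_reads <= 0:
        raise ValueError("internal standard must have at least one read")
    copies_per_read = isg_copies_spiked / isg_reads  # calibration factor from the spike-in
    return target_reads * copies_per_read


# Hypothetical numbers for illustration only (not from the study):
# 1.0e4 ISG copies spiked, the ISG yields 2,000 reads, the target yields 8,000 reads.
estimate = copies_from_reads(target_reads=8_000, isg_reads=2_000, isg_copies_spiked=1.0e4)
print(f"{estimate:.1e}")  # 4.0e+04
```

Because the estimate scales linearly with target reads, target variants that the primers fail to amplify lower the estimate proportionally, which is one way an underestimation like the one reported for amoA could arise.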
Affiliation(s)
- Kazuyoshi Koike
- Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Japan
- Ryo Honda
- Faculty of Geosciences and Civil Engineering, Kanazawa University, Kanazawa, Japan
- Masataka Aoki
- Regional Environment Conservation Division, National Institute for Environmental Studies (NIES), Ibaraki, Japan
- Kazuaki Syutsubo
- Regional Environment Conservation Division, National Institute for Environmental Studies (NIES), Ibaraki, Japan
- Research Center for Water Environment Technology, School of Engineering, The University of Tokyo, Tokyo, Japan
- Norihisa Matsuura
- Faculty of Geosciences and Civil Engineering, Kanazawa University, Kanazawa, Japan
26
Li R, Wang Q, Yang J, Zhu J, Liu J, Wu R, Sun H. Comparison of three massively parallel sequencing platforms for single nucleotide polymorphism (SNP) genotyping in forensic genetics. Int J Legal Med 2023; 137:1361-1372. [PMID: 37336821 DOI: 10.1007/s00414-023-03035-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/10/2023] [Accepted: 05/30/2023] [Indexed: 06/21/2023]
Abstract
Three MPS platforms are being used in forensic genetic analysis: MiSeq FGx, Ion S5 XL, and MGISEQ-2000. However, few studies have compared their performance. In this study, we sequenced 83 common SNPs in 71 samples using the ForenSeq™ DNA Signature Prep Kit on MiSeq FGx, the Precision ID Identity Panel on Ion S5 XL, and the MGIEasy Signature Identification Library Prep Kit on MGISEQ-2000, and then compared their performance. Results showed that MiSeq FGx had the highest sequence quality but the lowest sequencing depth and allele balance. Discordant genotypes were observed at six SNPs, which may have been caused by variants in primer binding regions, indel errors, or misalignments. In addition, two kinds of background noise, allele-specific miscalled reads (ASMR) and allele-nonspecific miscalled reads (ANMR), were characterized. MGISEQ-2000 showed the highest level of ASMR, while Ion S5 XL had the highest level of ANMR. Site- and genotype-dependent miscall patterns were observed at several SNPs on Ion S5 XL and MGISEQ-2000, but at few on MiSeq FGx. In conclusion, the three MPS platforms perform differently with respect to sequencing quality, sequencing depth, allele balance, concordance, and background noise. These findings may be useful for data comparison, mixture deconvolution, and heteroplasmy analysis in forensic genetics.
Affiliation(s)
- Ran Li
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- School of Medicine, Jiaying University, Meizhou, 514015, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Qiangwei Wang
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Jingyi Yang
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Jianzhang Zhu
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, 510080, People's Republic of China
- Jiajun Liu
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Riga Wu
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Hongyu Sun
- Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
- Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, 510080, People's Republic of China
27
Das S, Biswas NK, Basu A. Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data. Nucleic Acids Res 2023; 51:e75. [PMID: 37378434 PMCID: PMC10415152 DOI: 10.1093/nar/gkad539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 05/16/2023] [Accepted: 06/27/2023] [Indexed: 06/29/2023]
Abstract
High-throughput sequencing (HTS) has revolutionized science by enabling rapid detection of genomic variants at base-pair resolution. Consequently, it poses the challenging problem of identifying technical artifacts, i.e. hidden non-random error patterns. Understanding the properties of sequencing artifacts is key to separating true variants from false positives. Here, we develop Mapinsights, a toolkit that performs quality control (QC) analysis of sequence alignment files and can detect outliers based on sequencing artifacts of HTS data at a deeper resolution than existing methods. Mapinsights performs a cluster analysis based on novel and existing QC features derived from the sequence alignment for outlier detection. We applied Mapinsights to community-standard open-source datasets and identified various quality issues, including technical errors related to sequencing cycles, sequencing chemistry and sequencing libraries, across various orthogonal sequencing platforms. Mapinsights also enables identification of anomalies related to sequencing depth. A logistic regression-based model built on the features of Mapinsights shows high accuracy in detecting 'low-confidence' variant sites. Quantitative estimates and probabilistic arguments provided by Mapinsights can be used to identify errors, bias and outlier samples, and can also help improve the reliability of variant calls.
Affiliation(s)
- Subrata Das
- National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
- Nidhan K Biswas
- National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
- Analabha Basu
- National Institute of Biomedical Genomics, Kalyani, 741251, West Bengal, India
28
Jeon H, Ahn J, Na B, Hong S, Sael L, Kim S, Yoon S, Baek D. AIVariant: a deep learning-based somatic variant detector for highly contaminated tumor samples. Exp Mol Med 2023; 55:1734-1742. [PMID: 37524869 PMCID: PMC10474289 DOI: 10.1038/s12276-023-01049-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/10/2023] [Accepted: 04/24/2023] [Indexed: 08/02/2023]
Abstract
The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially low tumor purity and sequencing depth.
Affiliation(s)
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Genome4me Inc., Seoul, 08826, Republic of Korea
- Junhak Ahn
- Genome4me Inc., Seoul, 08826, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Byunggook Na
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Soona Hong
- AIGENDRUG Co., Ltd., Seoul, 08826, Republic of Korea
- Lee Sael
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea
- Daehyun Baek
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Genome4me Inc., Seoul, 08826, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea
29
Wang L, Ho AT, Hurst LD, Yang S. Re-evaluating evidence for adaptive mutation rate variation. Nature 2023; 619:E52-E56. [PMID: 37495884 PMCID: PMC10371861 DOI: 10.1038/s41586-023-06314-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Accepted: 06/12/2023] [Indexed: 07/28/2023]
Affiliation(s)
- Long Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Alexander T Ho
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- Sihai Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
30
Cao J, Yu T, Xu B, Hu Z, Zhang XO, Theurkauf W, Weng Z. Epigenetic and chromosomal features drive transposon insertion in Drosophila melanogaster. Nucleic Acids Res 2023; 51:2066-2086. [PMID: 36762470 PMCID: PMC10018349 DOI: 10.1093/nar/gkad054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 01/12/2023] [Accepted: 02/07/2023] [Indexed: 02/11/2023]
Abstract
Transposons are mobile genetic elements prevalent in the genomes of most species. The distribution of transposons within a genome reflects the actions of two opposing processes: initial insertion site selection, and selective pressure from the host. By analyzing whole-genome sequencing data from transposon-activated Drosophila melanogaster, we identified 43 316 de novo and 237 germline insertions from four long-terminal-repeat (LTR) transposons, one LINE transposon (I-element), and one DNA transposon (P-element). We found that all transposon types favored insertion into promoters de novo, but otherwise displayed distinct insertion patterns. De novo and germline P-element insertions preferred replication origins, often landing in a narrow region around transcription start sites and in regions of high chromatin accessibility. De novo LTR transposon insertions preferred regions with high H3K36me3, promoters and exons of active genes; within genes, LTR insertion frequency correlated with gene expression. De novo I-element insertion density increased with distance from the centromere. Germline I-element and LTR transposon insertions were depleted in promoters and exons, suggesting strong selective pressure to remove transposons from functional elements. Transposon movement is associated with genome evolution and disease; therefore, our results can improve our understanding of genome and disease biology.
Affiliation(s)
- Jichuan Cao
- The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Tianxiong Yu
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Bo Xu
- The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Zhongren Hu
- The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiao-ou Zhang
- The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- William E Theurkauf
- Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
31
Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022]
Abstract
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design; yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
32
Cheng C, Fei Z, Xiao P. Methods to improve the accuracy of next-generation sequencing. Front Bioeng Biotechnol 2023; 11:982111. [PMID: 36741756 PMCID: PMC9895957 DOI: 10.3389/fbioe.2023.982111] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023]
Abstract
Next-generation sequencing (NGS) is used in all fields of life science; it has greatly promoted the development of basic research and is gradually being applied in clinical diagnosis. However, the cost and throughput advantages of next-generation sequencing are offset by large tradeoffs with respect to read length and accuracy. Specifically, its high error rate makes it extremely difficult to detect SNPs or low-abundance mutations, limiting its clinical applications, such as pharmacogenomics studies, which rely primarily on SNPs, and early clinical diagnosis, which relies primarily on low-abundance mutations. Sanger sequencing is still considered the gold standard because of its high accuracy, so in clinical practice next-generation sequencing results require verification by Sanger sequencing. To maintain high-quality next-generation sequencing data, a variety of improvements at the levels of template preparation, sequencing strategy and data processing have been developed. This study summarized the general procedures of next-generation sequencing platforms, highlighting the improvements involved in eliminating errors at each step. Furthermore, the challenges and future development of next-generation sequencing in clinical application were discussed.
33
Camiolo S, Hughes J, Baldanti F, Furione M, Lilleri D, Lombardi G, Angelini M, Gerna G, Zavattoni M, Davison AJ, Suárez NM. Identifying high-confidence variants in human cytomegalovirus genomes sequenced from clinical samples. Virus Evol 2022; 8:veac114. [PMID: 37091479 PMCID: PMC10120596 DOI: 10.1093/ve/veac114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 10/27/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
Understanding the intrahost evolution of viral populations has implications for pathogenesis, diagnosis, and treatment and has recently advanced considerably owing to developments in high-throughput sequencing. However, the underlying analyses are very sensitive to sources of bias, error, and artefact in the data, and it is important that these are addressed adequately if robust conclusions are to be drawn. The key factors include (1) determining the number of viral strains present in the sample analysed; (2) monitoring the extent to which the data represent these strains and assessing the quality of these data; (3) dealing with the effects of cross-contamination; and (4) ensuring that the results are reproducible. We investigated these factors by generating sequence datasets, including biological and technical replicates, directly from clinical samples obtained from a small cohort of patients who had been infected congenitally with the herpesvirus human cytomegalovirus, with the aim of developing a strategy for identifying high-confidence intrahost variants. We found that such variants were few in number and typically present in low proportions and concluded that human cytomegalovirus exhibits a very low level of intrahost variability. In addition to clarifying the situation regarding human cytomegalovirus, our strategy has wider applicability to understanding the intrahost variability of other viruses.
Affiliation(s)
- Salvatore Camiolo
- School of Infection and Immunity, MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Joseph Hughes
- School of Infection and Immunity, MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, School of Infection and Immunity, University of Pavia, Pavia 27100, Italy
- Fausto Baldanti
- Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, Pavia 27100, Italy
- Milena Furione
- Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, Pavia 27100, Italy
- Daniele Lilleri
- Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, Pavia 27100, Italy
- Giuseppina Lombardi
- Neonatal and Intensive Care Unit, Fondazione IRCCS Policlinico San Matteo, Pavia 27100, Italy
- Micol Angelini
- Neonatal and Intensive Care Unit, Fondazione IRCCS Policlinico San Matteo, Pavia 27100, Italy
- Giuseppe Gerna
- Transplant Research Area and Centre for Inherited Cardiovascular Diseases, Fondazione IRCCS Policlinico San Matteo, Pavia 27100, Italy
| | - Maurizio Zavattoni
- Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, Pavia 27100, Italy
| | - Andrew J Davison
- School of Infection and Immunity, MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
| | - Nicolás M Suárez
- School of Infection and Immunity, MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
34
Craven KE, Fischer CG, Jiang L, Pallavajjala A, Lin MT, Eshleman JR. Optimizing Insertion and Deletion Detection Using Next-Generation Sequencing in the Clinical Laboratory. J Mol Diagn 2022; 24:1217-1231. [PMID: 36162758 PMCID: PMC9808503 DOI: 10.1016/j.jmoldx.2022.08.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/20/2022] [Revised: 07/18/2022] [Accepted: 08/31/2022] [Indexed: 01/13/2023]
Abstract
Detection of insertions and deletions (InDels) by short-read next-generation sequencing (NGS) technology can be challenging because of frequent misaligned reads. A systematic analysis of short InDels (1 to 30 bases) and fms-related receptor tyrosine kinase 3 (FLT3) internal tandem duplications (ITDs; 6 to 183 bases) from 46 clinical cases of solid or hematologic malignancy processed with a clinical NGS assay identified misaligned reads in every case: 3% to 100% of the reads carrying the InDel showed mismapped bases. Mismapping also increased with InDel size. As a consequence, the clinical NGS bioinformatics pipeline undercalled the variant allele frequency by 1% to 84%, incorrectly called simultaneous single-base substitutions along with InDels, or did not report an FLT3 ITD that had been detected by capillary electrophoresis. To improve the pipeline's ability to detect and quantify InDels, we used a software program called Assembly-Based ReAligner (ABRA2) to remap reads more accurately. ABRA2 corrected 41% to 100% of the reads with mismapped bases, leading to absolute increases in the variant allele frequency of 1% to 61% and correcting all of the single-base substitutions except in two cases. ABRA2 could also detect multiple FLT3 ITD clones, with the exception of one 183-base ITD. Our analysis found that ABRA2 performs well on short InDels as well as on FLT3 ITDs shorter than 100 bases.
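Variant allele frequency (VAF) is the fraction of reads at a site that support the variant. This makes it easy to see how misaligned reads lead to undercalling. The Python sketch below assumes, for simplicity, that a misaligned InDel-bearing read still counts toward depth but no longer counts as supporting the InDel. The numbers are hypothetical and not taken from the study.

```python
def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency: fraction of reads that support the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def observed_vaf(true_alt: int, total: int, mismapped_fraction: float) -> float:
    """VAF reported when a fraction of InDel-bearing reads is misaligned.

    Simplifying assumption: a misaligned read still counts toward depth
    but no longer counts as supporting the InDel.
    """
    counted_alt = round(true_alt * (1.0 - mismapped_fraction))
    return vaf(counted_alt, total)


# Hypothetical example: 40 of 100 reads carry the InDel, 40% of them mismapped.
print(vaf(40, 100), observed_vaf(40, 100, 0.4))  # 0.4 0.24
```

In this toy case the absolute undercall is 16 percentage points. Realignment tools such as ABRA2 aim to recover those reads so that the counted support approaches the true support.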
Affiliation(s)
- Kelly E Craven
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Catherine G Fischer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Division of Cancer Prevention, National Cancer Institute, Rockville, Maryland
- LiQun Jiang
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Aparna Pallavajjala
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Ming-Tseh Lin
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- James R Eshleman
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
35
Wang Z, Moffitt AB, Andrews P, Wigler M, Levy D. Accurate measurement of microsatellite length by disrupting its tandem repeat structure. Nucleic Acids Res 2022; 50:e116. [PMID: 36095132 PMCID: PMC9723644 DOI: 10.1093/nar/gkac723] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2022] [Revised: 08/03/2022] [Accepted: 08/15/2022] [Indexed: 12/24/2022]
Abstract
Tandem repeats of simple sequence motifs, also known as microsatellites, are abundant in the genome. Because their repeat structure makes replication error-prone, variant microsatellite lengths are often generated during germline and somatic expansions. As such, microsatellite length variation can serve as a marker for cancer. However, accurate, error-free measurement of microsatellite length is difficult with current methods precisely because of this high error rate during amplification. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure of the initial templates that their sequence lengths replicate faithfully. In this work, we use bisulfite mutagenesis to convert C to U, which is later read as T. Compared with untreated templates, we achieve a three-orders-of-magnitude reduction in the error rate per round of replication. By requiring agreement between two independent first copies of an initial template, we reach error rates below one in a million. We apply this method to a thousand microsatellite loci from the human genome, revealing microsatellite length distributions that are not observable without mutagenesis.
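The following Python sketch shows how converting some C bases to T breaks up a perfect tandem repeat. This is the property the method exploits. Two simplifying assumptions apply: each C is converted independently with probability `rate`, and methylation protection is ignored. Both function names are hypothetical illustrations, not the authors' code.

```python
import random


def bisulfite_convert(seq: str, rate: float, rng: random.Random) -> str:
    """Turn each C into T with probability `rate` (C -> U, sequenced as T).

    Simplification: methylation protection is ignored, and rate < 1.0 stands
    in for the partial mutagenesis described in the abstract.
    """
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in seq.upper()
    )


def longest_tandem_run(seq: str, motif: str) -> int:
    """Copies of `motif` in the longest uninterrupted run within `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies = 0
        while seq.startswith(motif, start + copies * len(motif)):
            copies += 1
        best = max(best, copies)
    return best


rng = random.Random(0)
repeat = "CA" * 20
converted = bisulfite_convert(repeat, rate=0.3, rng=rng)
# The converted template usually has a much shorter perfect CA run.
print(longest_tandem_run(repeat, "CA"), longest_tandem_run(converted, "CA"))
```

Shorter uninterrupted runs give the polymerase fewer opportunities to slip. This is the intuition behind the reduced per-round error rate reported above.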
Affiliation(s)
- Peter Andrews
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Dan Levy
- To whom correspondence should be addressed. Tel: +1 516 367 5039; Fax: +1 516 367 8381
36
Giorgashvili E, Reichel K, Caswara C, Kerimov V, Borsch T, Gruenstaeudl M. Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. Front Plant Sci 2022; 13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage, and to determine whether these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared with its congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time, and the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When the most reliable plastid genome assemblies of C. bakuense are compared, sequence differences are detected at only three nucleotide positions, fewer than the differences potentially introduced through software choice.
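The abstract does not say how the reduced coverage levels were produced. One common approach is uniform random subsampling of reads. The Python sketch below assumes that approach and shows the arithmetic for it: mean depth is sequenced bases divided by genome length, and the fraction of reads to keep is the target depth divided by the original depth. The original depth of 4,000x and the function names are hypothetical.

```python
def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Mean depth = sequenced bases / genome length (Lander-Waterman)."""
    return n_reads * read_length / genome_length


def subsample_fraction(original_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so the expected depth equals target_depth."""
    if original_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Targets above the original depth cannot be reached by subsampling.
    return min(1.0, target_depth / original_depth)


# Hypothetical original plastid depth of 4,000x; targets from the abstract.
for target in (2000, 1000, 500, 250, 100, 50):
    print(target, subsample_fraction(4000, target))
```

Because subsampling is random, repeating it with different seeds and assembling each replicate is one way to probe the reproducibility the study evaluates.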
Affiliation(s)
- Eka Giorgashvili
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Katja Reichel
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Calvinna Caswara
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Vuqar Kerimov
- Institute of Botany, Azerbaijan National Academy of Sciences (ANAS), Baku, Azerbaijan
- Thomas Borsch
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Michael Gruenstaeudl
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
37
Wang G, Liu Y, Bai X, Cao P, Pang X, Han J. Identification and poisoning diagnosis of Aconitum materials using a genus-specific nucleotide signature. Ecotoxicol Environ Saf 2022; 237:113539. [PMID: 35489139 DOI: 10.1016/j.ecoenv.2022.113539] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/27/2022] [Revised: 04/07/2022] [Accepted: 04/16/2022] [Indexed: 06/14/2023]
Abstract
The genus Aconitum generally contains highly toxic alkaloids. Poisoning incidents due to the improper ingestion of Aconitum materials occur frequently around the world. DNA barcoding is considered a powerful tool for species identification, but complete sequences of conventional DNA barcodes are sometimes unattainable from food and highly processed products because of severe DNA degradation. A shorter molecular marker would therefore be more useful for the authentication and poisoning diagnosis of Aconitum materials. In this study, 1246 psbA-trnH sequences and chloroplast genomes representing 183 taxa of Aconitum were collected, and a 23-bp nucleotide signature unique to the genus Aconitum (5'-TATATGAGTCATTGAAGTTGCAG-3') was developed. The nucleotide signature was conserved and universal within Aconitum while divergent among other genera. The signature was then successfully applied to the detection of processed Aconitum ingredients. To further evaluate its potential for completely unknown mixtures, boiled food mixtures containing different ratios of Aconitum materials were sequenced by high-throughput sequencing. The results showed that the nucleotide signature sequence could be extracted directly from raw sequencing data, even at a low DNA concentration of 0.2 ng/µl. The 23-bp genus-specific nucleotide signature therefore represents a significant step forward in the use of DNA barcoding to identify processed samples and food mixtures with degraded DNA. This study provides a new perspective and strong support for the identification and detection of Aconitum-containing products, and the approach could be extended to the diagnosis of food poisoning.
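Extracting the signature from raw sequencing data amounts to searching reads for the 23-bp sequence on either strand. The minimal Python sketch below assumes exact matching only. Real reads contain sequencing errors, and the study's actual extraction procedure may differ. The function names are hypothetical; the signature itself is the one given in the abstract.

```python
SIGNATURE = "TATATGAGTCATTGAAGTTGCAG"  # 23-bp Aconitum signature from the abstract

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_signature_hits(reads, signature: str = SIGNATURE) -> int:
    """Count reads containing the signature on either strand (exact match)."""
    fwd = signature.upper()
    rev = reverse_complement(fwd)
    return sum(1 for read in reads if fwd in read.upper() or rev in read.upper())


reads = [
    "GG" + SIGNATURE + "CC",                      # forward-strand hit
    "AA" + reverse_complement(SIGNATURE) + "TT",  # reverse-strand hit
    "ACGT" * 10,                                  # no hit
]
print(count_signature_hits(reads))  # 2
```

Checking both strands matters because, in unstranded short-read data, a fragment can be sequenced from either strand.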
Affiliation(s)
- Gang Wang
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Yang Liu
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Xuanjiao Bai
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Pei Cao
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Xiaohui Pang
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Jianping Han
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
38
Cleal K, Baird DM. Dysgu: efficient structural variant calling using short or long reads. Nucleic Acids Res 2022; 50:e53. [PMID: 35100420 PMCID: PMC9122538 DOI: 10.1093/nar/gkac039] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/03/2021] [Revised: 12/20/2021] [Accepted: 01/24/2022] [Indexed: 12/27/2022]
Abstract
Structural variation (SV) plays a fundamental role in genome evolution and can underlie inherited or acquired diseases such as cancer. Long-read sequencing technologies have improved the characterization of structural variants (SVs), although paired-end sequencing offers better scalability. Here, we present dysgu, which calls SVs or indels using paired-end or long reads. Dysgu detects signals from alignment gaps and from discordant and supplementary mappings, and generates consensus contigs before classifying events using machine learning. Additional SVs are identified by remapping anomalous sequences. Dysgu outperforms existing state-of-the-art tools using paired-end or long reads, offering high sensitivity and precision whilst being among the fastest tools to run. We find that combining low-coverage paired-end and long reads is competitive in performance with long reads at higher coverage.
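One of the signal types mentioned above, alignment gaps, can be read directly from a read's CIGAR string. The Python sketch below is a generic illustration, not dysgu's implementation. It walks the CIGAR operations, tracks the reference position using the SAM rule that M, D, N, = and X consume reference bases, and reports insertions or deletions at least `min_len` bases long. The threshold of 30 and the function name are hypothetical choices.

```python
import re
from typing import List, Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification.
_REF_CONSUMING = set("MDN=X")


def gap_signals(ref_start: int, cigar: str, min_len: int = 30) -> List[Tuple[str, int, int]]:
    """Return (event_type, ref_position, length) for large I/D operations."""
    events = []
    pos = ref_start
    for length_str, op in _CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in ("D", "I") and length >= min_len:
            events.append(("DEL" if op == "D" else "INS", pos, length))
        if op in _REF_CONSUMING:
            pos += length
    return events


print(gap_signals(1000, "50M100D50M"))  # [('DEL', 1050, 100)]
```

Gap signals of this kind would then be pooled across reads and combined with discordant and supplementary mappings before any classification step.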
Affiliation(s)
- Kez Cleal
- Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Duncan M Baird
- Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
39
Irinyi L, Roper M, Malik R, Meyer W. Finding a needle in a haystack – in silico search for environmental traces of Candida auris. Jpn J Infect Dis 2022; 75:490-495. [DOI: 10.7883/yoken.jjid.2022.068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Affiliation(s)
- Laszlo Irinyi
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Australia
- Michael Roper
- Division of Biomedical Science and Biochemistry, Australian National University, Australia
- Richard Malik
- Centre for Veterinary Education, The University of Sydney, Australia
- Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Australia
40
Miller SJ, Campbell CE, Jimenez-Corea HA, Wu GH, Logan R. Neuroglial Senescence, α-Synucleinopathy, and the Therapeutic Potential of Senolytics in Parkinson’s Disease. Front Neurosci 2022; 16:824191. [PMID: 35516803 PMCID: PMC9063319 DOI: 10.3389/fnins.2022.824191] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/29/2021] [Accepted: 03/22/2022] [Indexed: 12/02/2022]
Abstract
Parkinson’s disease (PD) is the most common movement disorder and the second most prevalent neurodegenerative disease after Alzheimer’s disease. Despite decades of research, there is still no cure for PD, and the intricacies of its pathology are still being worked out. Much of the research on PD has focused on neurons, since the disease is characterized by neurodegeneration. However, neuroglia have become recognized as key players in the health and disease of the central nervous system. This review provides a current perspective on the interacting roles of α-synuclein and neuroglial senescence in PD. The self-amplifying and cyclical nature of oxidative stress, neuroinflammation, α-synucleinopathy, neuroglial senescence, neuroglial chronic activation, and neurodegeneration will be discussed. Finally, the compelling role that senolytics could play as a therapeutic avenue for PD is explored and encouraged.
Affiliation(s)
- Sean J. Miller
- Pluripotent Diagnostics Corp. (PDx), Molecular Medicine Research Institute, Sunnyvale, CA, United States
- Guan-Hui Wu
- Department of Neurology, Suzhou Municipal Hospital, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou, China
- Robert Logan
- Pluripotent Diagnostics Corp. (PDx), Molecular Medicine Research Institute, Sunnyvale, CA, United States
- Department of Biology, Eastern Nazarene College, Quincy, MA, United States
- Correspondence: Robert Logan
41
Marin M, Vargas R, Harris M, Jeffrey B, Epperson LE, Durbin D, Strong M, Salfinger M, Iqbal Z, Akhundova I, Vashakidze S, Crudu V, Rosenthal A, Farhat MR. Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome. Bioinformatics 2022; 38:1781-1787. [PMID: 35020793 PMCID: PMC8963317 DOI: 10.1093/bioinformatics/btac023] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Revised: 12/23/2021] [Accepted: 01/07/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short reads and PacBio long reads. We systematically studied short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. RESULTS Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and a minimum precision of 98.5% across the parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality filtering threshold, i.e. the confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative, conservative approach to variant calling that increases precision at the cost of recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS, including 52/168 PE/PPE genes (34.5%). From these results, we present a refined list of low-confidence regions across the Mtb genome, which we found to overlap frequently with regions of structural variation, low sequence uniqueness and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, for inference of transmission in public health surveillance systems and, more generally, for WGS applications in other organisms. AVAILABILITY AND IMPLEMENTATION All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
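The recall and precision figures above come from comparing filtered calls with a truth set. The Python sketch below shows that bookkeeping and how a mapping-quality threshold shifts the trade-off. It is a simplification: each candidate carries a single `mapq` value, whereas real pipelines aggregate read-level mapping quality. The data, the `Candidate` type and the `evaluate` function are hypothetical. In the study itself, the comparison was made against the PacBio long-read data.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

Call = Tuple[int, str]  # (genome position, alt allele)


@dataclass
class Candidate:
    pos: int
    alt: str
    mapq: int  # simplified: one mapping-quality value per candidate call


def evaluate(candidates: Iterable[Candidate], truth: Set[Call], min_mq: int = 40):
    """Filter calls by mapping quality, then score them against a truth set."""
    called = {(c.pos, c.alt) for c in candidates if c.mapq >= min_mq}
    tp = len(called & truth)   # true positives
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives (missed variants)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


truth = {(10, "A"), (20, "G"), (30, "T"), (40, "C")}
candidates = [
    Candidate(10, "A", 60), Candidate(20, "G", 50), Candidate(30, "T", 20),
    Candidate(50, "A", 60), Candidate(60, "T", 10),
]
print(evaluate(candidates, truth, min_mq=40))  # (0.5, 0.6666666666666666)
print(evaluate(candidates, truth, min_mq=0))   # (0.75, 0.6)
```

In this toy data, lowering the threshold recovers a true variant but also admits a false one. That is the recall-versus-precision tension the benchmark quantifies.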
Affiliation(s)
- Maximillian Marin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Roger Vargas
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Michael Harris
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20894, USA
- Brendan Jeffrey
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20894, USA
- L Elaine Epperson
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA
- David Durbin
- Mycobacteriology Reference Laboratory, Advanced Diagnostic Laboratories, National Jewish Health, Denver, CO 80206, USA
- Michael Strong
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA
- Max Salfinger
- College of Public Health and Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Zamin Iqbal
- EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Irada Akhundova
- Scientific Research Institute of Lung Diseases, Ministry of Health, Baku AZ1014, Azerbaijan
- Sergo Vashakidze
- Department of Medicine, The University of Georgia, Tbilisi 0171, Georgia
- National Center for Tuberculosis and Lung Diseases, Ministry of Health, Tbilisi 0171, Georgia
- Valeriu Crudu
- Phthisiopneumology Institute, Ministry of Health, Chisinau 2025, Republic of Moldova
- Alex Rosenthal
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20894, USA
- Maha Reda Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
42
Sun KH, Wong YT, Cheung KM, Yuen C, Chan YT, Lai WY, Chao C, Fan WS, Chow YK, Law MF, Tam HC. Update on Molecular Diagnosis in Extranodal NK/T-Cell Lymphoma and Its Role in the Era of Personalized Medicine. Diagnostics (Basel) 2022; 12:diagnostics12020409. [PMID: 35204500 PMCID: PMC8871212 DOI: 10.3390/diagnostics12020409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/11/2022] [Revised: 01/23/2022] [Accepted: 01/28/2022] [Indexed: 02/06/2023]
Abstract
Natural killer (NK)/T-cell lymphoma (NKTCL) is an aggressive malignancy with unique epidemiological, histological, molecular, and clinical characteristics. It occurs in two pathological forms, namely, extranodal NKTCL (ENKTCL) and aggressive NK leukemia, according to the latest World Health Organization (WHO) classification. Epstein–Barr virus (EBV) infection has long been proposed as the major etiology of lymphomagenesis. The adoption of high-throughput sequencing has allowed us to gain more insight into the molecular mechanisms of ENKTCL, which largely involve chromosome deletion and aberrations in Janus kinase (JAK)-signal transducer and activator of transcription (STAT), programmed cell death protein-1 (PD-1)/PD-ligand 1 (PD-L1) pathways, as well as mutations in tumor suppressor genes. The molecular findings could potentially influence the traditional chemoradiotherapy approach, which is known to be associated with significant toxicity. This article will review the latest molecular findings in NKTCL and recent advances in the field of molecular diagnosis in NKTCL. Issues of quality control and technical difficulties will also be discussed, along with future prospects in the molecular diagnosis and treatment of NKTCL.
Affiliation(s)
- Ka-Hei (Murphy) Sun
- Division of Hematopathology, Department of Anatomical and Cellular Pathology, Prince of Wales Hospital, Hong Kong
- Ka-Man (Carmen) Cheung
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Carmen (Michelle) Yuen
- Division of Hematopathology, Department of Anatomical and Cellular Pathology, Prince of Wales Hospital, Hong Kong
- Yun-Tat (Ted) Chan
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Wing-Yan (Jennifer) Lai
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Chun (David) Chao
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Wing-Sum (Katie) Fan
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Yuen-Kiu (Karen) Chow
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Man-Fai Law
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
- Corresponding author
- Ho-Chi (Tommy) Tam
- Department of Medicine and Therapeutics, Prince of Wales Hospital, Hong Kong
43
van der Vossen EWJ, Bastos D, Stols-Gonçalves D, de Goffau MC, Davids M, Pereira JPB, Li Yim AYF, Henneman P, Netea MG, de Vos WM, de Jonge W, Groen AK, Nieuwdorp M, Levin E. Effects of fecal microbiota transplant on DNA methylation in subjects with metabolic syndrome. Gut Microbes 2022; 13:1993513. [PMID: 34747338 PMCID: PMC8583152 DOI: 10.1080/19490976.2021.1993513] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 02/04/2023]
Abstract
Accumulating evidence shows that microbes residing within the human intestinal tract (i.e., the gut microbiome) influence host metabolism. Some of the strongest results come from recent fecal microbial transplant (FMT) studies that relate changes in intestinal microbiota to various markers of metabolism as well as to the pathophysiology of insulin resistance. Despite these developments, understanding of the many effects of FMT on the general physiology of the host, beyond changes in gut microbiome composition, is still limited. We examined the effect of either allogenic (lean donor) or autologous FMT on the gut microbiome, plasma metabolome, and epigenomic (DNA methylation) reprogramming in peripheral blood mononuclear cells of individuals with metabolic syndrome, measured at baseline (pre-FMT) and after 6 weeks (post-FMT). Insulin sensitivity was determined with a stable isotope-based two-step hyperinsulinemic clamp, and multivariate machine learning methodology was used to uncover discriminative microbes, metabolites, and DNA methylation loci. Allogenic FMT was associated with a larger gut microbiota shift than autologous FMT. Furthermore, the results of the allogenic FMT group indicate that the introduction of new species can potentially modulate the plasma metabolome and, as a result, the epigenome. Most notably, the introduction of Prevotella ASVs directly correlated with methylation of AFAP1, a gene involved in mitochondrial function, insulin sensitivity, and peripheral insulin resistance (Rd, rate of glucose disappearance). FMT was found to have notable effects not only on the gut microbiome but also on the host plasma metabolome and the epigenome of immune cells. These effects open new avenues of inquiry, in the context of metabolic syndrome treatment, into manipulating host physiology to improve insulin sensitivity.
Affiliation(s)
- Eduard W. J. van der Vossen
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Diogo Bastos
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Horaizon BV, Delft, The Netherlands
- Daniela Stols-Gonçalves
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Marcus C. de Goffau
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Wellcome Sanger Institute, Cambridge, UK
- Mark Davids
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Joao P. B. Pereira
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Horaizon BV, Delft, The Netherlands
- Andrew Y. F. Li Yim
- Department of Genome Diagnostics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Peter Henneman
- Department of Genome Diagnostics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Mihai G. Netea
- Department of Experimental Internal Medicine, Radboud University, Nijmegen, The Netherlands; Department for Genomics & Immunoregulation, Life and Medical Sciences Institute (Limes), University of Bonn, Bonn, Germany
- Willem M. de Vos
- Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands; Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Wouter de Jonge
- Tytgat Institute for Liver and Intestinal Research, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Albert K. Groen
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Max Nieuwdorp
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Contact: Max Nieuwdorp
- Evgeni Levin
- Department of Vascular Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Horaizon BV, Delft, The Netherlands; Address: Evgeni Levin, Department of Vascular Medicine, Amsterdam University Medical Center, Meibergdreef 9, Room G1-143, Amsterdam 1105 AZ, The Netherlands
44
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists automatically scan sequence data for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. Because it may be hard to keep up with the profusion of proposals in this dynamic field, the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and of mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to a given family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and to check the validity of those models.
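Indexing, the first foundational concept mentioned above, can be illustrated with a simple k-mer index. It maps every substring of length k to the positions where it occurs, so exact repeats show up as k-mers with more than one position. The Python sketch below is a minimal illustration under stated assumptions: exact matches only and a single strand. It does not represent any specific tool from the chapter, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer to the list of 0-based positions where it occurs."""
    if k <= 0:
        raise ValueError("k must be positive")
    index: Dict[str, List[int]] = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def repeated_kmers(seq: str, k: int, min_copies: int = 2) -> Dict[str, List[int]]:
    """Keep only the k-mers seen at least `min_copies` times."""
    return {km: pos for km, pos in kmer_index(seq, k).items() if len(pos) >= min_copies}


print(repeated_kmers("ACGTTTACGTAA", 4))  # {'ACGT': [0, 6]}
```

Production repeat finders typically use more compact structures such as suffix arrays or FM-indexes, and they also handle reverse complements and approximate matches.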
45
Fedarko MW, Kolmogorov M, Pevzner PA. Analyzing rare mutations in metagenomes assembled using long and accurate reads. Genome Res 2022; 32:2119-2133. [PMID: 36418060 PMCID: PMC9808630 DOI: 10.1101/gr.276917.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/16/2022] [Indexed: 11/25/2022]
Abstract
The advent of long and accurate "HiFi" reads has greatly improved our ability to generate complete metagenome-assembled genomes (MAGs), enabling "complete metagenomics" studies that were nearly impossible to conduct with short reads. In particular, HiFi reads simplify the identification and phasing of mutations in MAGs: It is increasingly feasible to distinguish between positions that are prone to mutations and positions that rarely ever mutate, and to identify co-occurring groups of mutations. However, the problems of identifying rare mutations in MAGs, estimating the false-discovery rate (FDR) of these identifications, and phasing identified mutations remain open in the context of HiFi data. We present strainFlye, a pipeline for the FDR-controlled identification and analysis of rare mutations in MAGs assembled using HiFi reads. We show that deep HiFi sequencing has the potential to reveal and phase tens of thousands of rare mutations in a single MAG, identify hotspots and coldspots of these mutations, and detail MAGs' growth dynamics.
Affiliation(s)
- Marcus W. Fedarko
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA; UC Santa Cruz Genomics Institute, Santa Cruz, California 95064, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, California 92093, USA
46
Bagal UR, Phan J, Welsh RM, Misas E, Wagner D, Gade L, Litvintseva AP, Cuomo CA, Chow NA. MycoSNP: A Portable Workflow for Performing Whole-Genome Sequencing Analysis of Candida auris. Methods Mol Biol 2022; 2517:215-228. [PMID: 35674957 DOI: 10.1007/978-1-0716-2417-3_17] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 06/15/2023]
Abstract
Candida auris is an urgent public health threat characterized by high rates of drug resistance and rapid spread in healthcare settings worldwide. As part of the C. auris response, molecular surveillance has helped public health officials track the global spread and investigate local outbreaks. Here, we describe whole-genome sequencing analysis methods used for routine C. auris molecular surveillance in the United States; these methods include reference selection, reference preparation, quality assessment and control of sequencing reads, read alignment, and single-nucleotide polymorphism calling and filtration. We also describe the newly developed pipeline MycoSNP, a portable workflow for performing whole-genome sequencing analysis of fungal organisms including C. auris.
Affiliation(s)
- Ujwal R Bagal
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
- John Phan
- Centers for Disease Control and Prevention, Atlanta, GA, USA
- Rory M Welsh
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Elizabeth Misas
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Lalitha Gade
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Christina A Cuomo
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nancy A Chow
- Mycotic Diseases Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
47
Morgan SJ, Durfey SL, Ravishankar S, Jorth P, Ni W, Skerrett DT, Aitken ML, McKone EF, Salipante SJ, Radey MC, Singh PK. A population-level strain genotyping method to study pathogen strain dynamics in human infections. JCI Insight 2021; 6:e152472. [PMID: 34935640 PMCID: PMC8783678 DOI: 10.1172/jci.insight.152472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]
Abstract
A hallmark of chronic bacterial infections is the long-term persistence of one or more pathogen species at the compromised site. Repeated detection of the same bacterial species can suggest that a single strain or lineage is continually present. However, infection with multiple strains of a given species, strain acquisition and loss, and changes in strain relative abundance can occur. Detecting strain-level changes and their effects on disease is challenging because most methods require labor-intensive isolate-by-isolate analyses, and thus only a few cells from large infecting populations can be examined. Here, we present a population-level method, called population multi-locus sequence typing (PopMLST), for enumerating strains and measuring their relative abundance. The method exploits PCR amplification of strain-identifying polymorphic loci, next-generation sequencing to measure allelic variants, and informatic methods to determine whether variants arise from sequencing errors or low-abundance strains. These features enable PopMLST to simultaneously interrogate hundreds of bacterial cells that are cultured en masse from patient samples or are present in DNA extracted directly from clinical specimens without ex vivo culture. This method could be used to detect epidemic or super-infecting strains, facilitate understanding of strain dynamics during chronic infections, and enable studies that link strain changes to clinical outcomes.
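The abstract does not spell out how PopMLST separates sequencing errors from low-abundance strains. One common way to frame that kind of decision, shown here only as an assumption-laden sketch and not as the PopMLST algorithm, is a one-sided binomial test: if each read shows the alternative allele by error with probability `error_rate`, how surprising is the observed alternative-allele count at a given depth? The names `binomial_tail` and `is_likely_real_variant`, the default error rate and the significance threshold are all hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def is_likely_real_variant(alt_count: int, depth: int,
                           error_rate: float = 0.001, alpha: float = 1e-6) -> bool:
    """Flag an allele as a putative low-abundance strain if errors alone are unlikely to explain it."""
    return binomial_tail(alt_count, depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical allele counts at one locus sequenced to depth 1000.
    print(is_likely_real_variant(1, 1000))   # a single read is consistent with error
    print(is_likely_real_variant(50, 1000))  # fifty reads are unlikely to be error
```

A real pipeline would also need to account for testing many loci at once and for error rates that vary by position or strand; the fixed `error_rate` here is a simplification.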
Affiliation(s)
- Sarah J. Morgan
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Samantha L. Durfey
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Sumedha Ravishankar
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Peter Jorth
- Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Wendy Ni
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Duncan T. Skerrett
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Moira L. Aitken
- Department of Medicine, University of Washington School of Medicine, Seattle, Washington, USA
- Stephen J. Salipante
- Department of Laboratory Medicine and Pathology, University of Washington School of Medicine, Seattle, Washington, USA
- Matthew C. Radey
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Pradeep K. Singh
- Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, USA
- Department of Medicine, University of Washington School of Medicine, Seattle, Washington, USA
48
Epstein HE, Hernandez-Agreda A, Starko S, Baum JK, Vega Thurber R. Inconsistent Patterns of Microbial Diversity and Composition Between Highly Similar Sequencing Protocols: A Case Study With Reef-Building Corals. Front Microbiol 2021; 12:740932. [PMID: 34899629 PMCID: PMC8656265 DOI: 10.3389/fmicb.2021.740932] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/13/2021] [Accepted: 11/01/2021] [Indexed: 12/14/2022]
Abstract
16S rRNA gene profiling (amplicon sequencing) is a popular technique for understanding host-associated and environmental microbial communities. Most protocols for sequencing amplicon libraries follow a standardized pipeline that can differ slightly depending on laboratory facility and user. Given that the same variable region of the 16S gene is targeted, it is generally accepted that sequencing output from differing protocols is comparable, and this assumption underlies our ability to identify universal patterns in microbial dynamics through meta-analyses. However, discrepant results from a combined 16S rRNA gene dataset prepared by two labs whose protocols differed only in DNA polymerase and sequencing platform led us to scrutinize the outputs and challenge the idea of confidently combining them for standard microbiome analysis. Using technical replicates of reef-building coral samples from two species, Montipora aequituberculata and Porites lobata, we evaluated the consistency of alpha and beta diversity metrics between data resulting from these highly similar protocols. While we found minimal variation in alpha diversity between platforms, most beta diversity metrics revealed significant differences, depending on host species. These inconsistencies persisted after removal of low-abundance taxa and when comparing across higher taxonomic levels, suggesting that bacterial community differences associated with sequencing protocol are likely to be context dependent and difficult to correct without extensive validation work. The results of this study encourage caution in the statistical comparison and interpretation of studies that combine rRNA gene sequence data from distinct protocols, and point to a need for further work identifying the mechanistic causes of these observed differences.
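The abstract contrasts alpha diversity (within a sample) with beta diversity (between samples) without naming the metrics used. As a hedged illustration only, the sketch below computes two widely used examples, the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, from taxon count vectors; the choice of these two metrics, the function names and the toy counts are assumptions, not details from the study.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Alpha diversity: Shannon entropy (natural log) of one sample's taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity (0 = identical, 1 = no shared taxa)."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


if __name__ == "__main__":
    # Hypothetical counts for the same four taxa in two technical replicates.
    rep1 = [120, 30, 0, 50]
    rep2 = [90, 45, 10, 55]
    print(round(shannon_index(rep1), 3), round(shannon_index(rep2), 3))
    print(round(bray_curtis(rep1, rep2), 3))
```

Counts are used as given here; differences in sequencing depth between samples would normally be addressed, for example by rarefaction or conversion to relative abundances, before such comparisons.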
Affiliation(s)
- Hannah E. Epstein
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Samuel Starko
- Department of Biology, University of Victoria, Victoria, BC, Canada
- Julia K. Baum
- Department of Biology, University of Victoria, Victoria, BC, Canada
- Rebecca Vega Thurber
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
49
Preising GA, Faber-Hammond JJ, Renn SCP. Correspondence of aCGH and long-read genome assembly for detection of copy number differences: A proof-of-concept with cichlid genomes. PLoS One 2021; 16:e0258193. [PMID: 34618847 PMCID: PMC8496808 DOI: 10.1371/journal.pone.0258193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2020] [Accepted: 09/21/2021] [Indexed: 11/18/2022]
Abstract
Copy number variation is an important source of genetic variation, yet data are often lacking because of technical limitations in detecting it with current genome assemblies. Our goal is to demonstrate the extent to which an array-based platform (aCGH) can identify genomic loci that are collapsed in genome assemblies built with short-read technology. Taking advantage of two cichlid species for which genome assemblies based on Illumina and PacBio are available, we show that inter-species aCGH log2 hybridization ratios correlate more strongly with inferred copy number differences based on PacBio-built genome assemblies than with those based on Illumina-built genome assemblies. With regard to inter-species copy number differences of specific genes identified by each platform, the set identified by aCGH intersects to a greater extent with the set identified by PacBio than with the set identified by Illumina. Gene function, according to Gene Ontology analysis, did not substantially differ among platforms, and the platforms converged on functions associated with adaptive phenotypes. The results of the current study further demonstrate that aCGH is an effective platform for identifying copy number variable sequences, particularly those collapsed in short-read genome assemblies.
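The correlation described above compares observed aCGH log2 hybridization ratios with copy number differences inferred from assemblies. If hybridization signal scaled linearly with copy number, the expected log2 ratio for a locus would be the log2 of the ratio between its copy numbers in the two species. The sketch below computes those expected values from hypothetical copy-number estimates and their Pearson correlation with hypothetical observed ratios; it illustrates the idea only and is not the authors' analysis (the linear-signal assumption, function names and numbers are invented for illustration).

```python
import math
from typing import Sequence


def expected_log2_ratio(copies_a: float, copies_b: float) -> float:
    """Log2 ratio expected if hybridization signal were proportional to copy number (assumption)."""
    if copies_a <= 0 or copies_b <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(copies_a / copies_b)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-gene copy numbers in species A and B, and observed aCGH log2(A/B).
    copies_a = [2, 4, 1, 2, 6]
    copies_b = [2, 2, 2, 1, 2]
    observed = [0.1, 0.7, -0.6, 0.8, 1.1]
    expected = [expected_log2_ratio(a, b) for a, b in zip(copies_a, copies_b)]
    print([round(e, 2) for e in expected])
    print(round(pearson_r(expected, observed), 3))
```

Observed array ratios are often compressed relative to this ideal, which is one reason a correlation, rather than exact agreement, is the natural comparison.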
Affiliation(s)
- Suzy C. P. Renn
- Department of Biology, Reed College, Portland, OR, United States of America
50
Delahaye C, Nicolas J. Sequencing DNA with nanopores: Troubles and biases. PLoS One 2021; 16:e0257521. [PMID: 34597327 PMCID: PMC8486125 DOI: 10.1371/journal.pone.0257521] [Citation(s) in RCA: 231] [Impact Index Per Article: 57.8] [Received: 02/23/2021] [Accepted: 09/06/2021] [Indexed: 12/03/2022]
Abstract
Oxford Nanopore Technologies' (ONT) long-read sequencers offer access to longer DNA fragments than previous sequencer generations, at the cost of a higher error rate. While many papers have studied read correction methods, few have addressed the detailed characterization of observed errors, a task complicated by frequent changes in chemistry and software in ONT technology. The MinION sequencer is now more stable, and this paper proposes an up-to-date view of its error landscape, using the most mature flowcell and basecaller. We studied Nanopore sequencing error biases on both bacterial and human DNA reads. We found that, although Nanopore sequencing is expected not to suffer from GC bias, GC content is a crucial parameter with respect to errors. In particular, low-GC reads have fewer errors than high-GC reads (about 6% and 8%, respectively). The error profile for homopolymeric regions or regions with short repeats, the source of about half of all sequencing errors, also depends on the GC rate and mainly shows deletions, although some reads contain long insertions. Another interesting finding is that the quality measure, although overestimated, offers valuable information for predicting the error rate as well as the abundance of reads. We supplemented this study with an analysis of a rapeseed RNA read set and found a higher error level, with more deletions, in these data. Finally, we have implemented an open source pipeline for long-term monitoring of the error profile, which enables users to easily compute the various analyses presented in this work, including for future developments of the sequencing device. Overall, we hope this work will provide a basis for the design of better error-correction methods.
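The GC dependence reported above can be made concrete with a small sketch that computes each read's GC fraction and averages per-read error rates within low-GC and high-GC groups. This is not the authors' open source pipeline: it assumes per-read error rates have already been obtained elsewhere (for example, from alignments to a reference), and the 0.5 threshold, function names and example reads are hypothetical.

```python
from typing import Dict, List, Sequence


def gc_fraction(read: str) -> float:
    """Fraction of G and C bases in a read (case-insensitive)."""
    if not read:
        raise ValueError("empty read")
    read = read.upper()
    return sum(base in "GC" for base in read) / len(read)


def mean_error_by_gc(reads: Sequence[str], error_rates: Sequence[float],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Average per-read error rate for reads below vs. at-or-above a GC threshold."""
    if len(reads) != len(error_rates):
        raise ValueError("need one error rate per read")
    groups: Dict[str, List[float]] = {"low_gc": [], "high_gc": []}
    for read, err in zip(reads, error_rates):
        key = "high_gc" if gc_fraction(read) >= threshold else "low_gc"
        groups[key].append(err)
    # Skip empty groups so we never divide by zero.
    return {name: sum(errs) / len(errs) for name, errs in groups.items() if errs}


if __name__ == "__main__":
    reads = ["ATTATAGCAT", "GCGGCATGCC", "AATTGCAATA", "CCGCGTAGGC"]
    error_rates = [0.05, 0.09, 0.06, 0.08]  # hypothetical values
    print(mean_error_by_gc(reads, error_rates))
```

Averaging per read treats short and long reads equally; weighting each read's errors by its length would be an alternative design choice.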