Article

Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity

NUCLEIC ACIDS RESEARCH (2009)

Journal

NUCLEIC ACIDS RESEARCH

Volume 37, Issue 15, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/nar/gkp492

Keywords

-

Category

Biochemistry & Molecular Biology

Funding

French 'Ministere de l'Enseignement superieur et de la Recherche'
'La ligue regionale contre le Cancer' Languedoc Roussillon
Universite de Montpellier 2
ANR [BLAN07-1_185484]

Abstract

Ultra high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped to a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length affect the prediction capacity of sequence census experiments. Here we propose a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new insights into both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it significantly increases with the position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions, if they map to the genome. This indicates that digital transcriptomics may help to characterize the wealth of yet undiscovered, low-abundance transcripts.
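
The read-length trade-off described above can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the authors' procedure. It assumes a uniform, independent nucleotide background (real genomes depart from this through compositional bias and repeats, which is why the background distribution matters), a genome of roughly 3.1 Gb (an illustrative, approximately human figure), and a Poisson approximation for the number of chance exact matches of a random read on both strands. It also shows why longer reads are more exposed to sequencing errors when the per-cycle error rate grows along the read; the linear error profile in the example is hypothetical.

```python
import math
from typing import Sequence

# Assumption: approximate human genome size, used only for illustration.
GENOME_BP = 3_100_000_000


def expected_chance_matches(read_len: int, genome_len: int, both_strands: bool = True) -> float:
    """Expected number of positions where a random read of length `read_len`
    matches exactly, assuming i.i.d. uniform nucleotides (probability 1/4 each)."""
    positions = genome_len - read_len + 1
    if both_strands:
        positions *= 2  # a read can also match the reverse complement strand
    return positions / 4 ** read_len


def prob_chance_match(read_len: int, genome_len: int, both_strands: bool = True) -> float:
    """Poisson approximation of P(at least one chance exact match)."""
    lam = expected_chance_matches(read_len, genome_len, both_strands)
    return 1.0 - math.exp(-lam)


def min_read_length(max_risk: float, genome_len: int, both_strands: bool = True) -> int:
    """Smallest read length whose chance-match probability is <= max_risk."""
    k = 1
    while prob_chance_match(k, genome_len, both_strands) > max_risk:
        k += 1
    return k


def prob_error_free(error_per_position: Sequence[float]) -> float:
    """Probability that a read contains no sequencing error, assuming errors
    occur independently with a (possibly position-dependent) rate per cycle."""
    p = 1.0
    for e in error_per_position:
        p *= 1.0 - e
    return p


if __name__ == "__main__":
    # Hypothetical error profile: the error rate rises linearly with the cycle.
    def error_profile(read_len: int) -> Sequence[float]:
        return [0.001 + 0.0002 * i for i in range(read_len)]

    for k in (16, 18, 19, 20, 25, 36):
        print(
            f"{k:>2} bp  P(chance match)={prob_chance_match(k, GENOME_BP):.4f}"
            f"  P(error-free)={prob_error_free(error_profile(k)):.3f}"
        )
    print("min length for <1% risk:", min_read_length(0.01, GENOME_BP))
```

Under these assumptions the chance-match probability drops from about 0.09 at 18 bp to about 0.006 at 20 bp, and the sketch returns 20 bp for a 1% risk threshold, while the error-free probability keeps falling as reads lengthen. The exact numbers depend entirely on the assumed genome size, background model, and error profile.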

Reviews

Primary rating

4.8

Insufficient ratings

Secondary ratings

Novelty

-

Significance

-

Scientific rigor

-

Recommended

Article Multidisciplinary Sciences

GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads

Mohamed Awad, Xiangchao Gan

Summary: This paper introduces GALA, a computational framework for chromosome-based sequencing data separation and gap-free de novo assembly. It allows integration of different data sources and addresses the challenge of achieving gap-free chromosome-scale assemblies using current workflows for long-read platforms. The method is demonstrated through the assembly of various genomes.

NATURE COMMUNICATIONS (2023)

Article Biochemistry & Molecular Biology

Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line

Shunhua Han, Guilherme B. Dias, Preston J. Basting, Raghuvir Viswanatha, Norbert Perrimon, Casey M. Bergman

Summary: Animal cell lines often undergo extreme genome restructuring events that hinder de novo whole-genome assembly. This study used long-read and linked-read technologies to sequence the genome of a tetraploid Drosophila cell line and developed a novel method called TELR for TE analysis. The results shed light on the role and mechanism of transposable elements in animal cell culture genome evolution.

NUCLEIC ACIDS RESEARCH (2022)

Article Plant Sciences

Variation in Chloroplast Genome Size: Biological Phenomena and Technological Artifacts

Ante Turudic, Zlatko Liber, Martina Grdisa, Jernej Jakse, Filip Varga, Zlatko Satovic

Summary: The development of bioinformatic solutions requires biological knowledge and often relies on assumptions. In this study, we investigated the relationship between chloroplast sequence lengths and taxonomic proximity of species using RefSeq sequences from the asterid and rosid clades. We found that chloroplast length distributions are narrow at the family and genus levels, with outliers indicating possible inaccuracies in sequence assembly.

PLANTS-BASEL (2023)

Article Biochemical Research Methods

No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study

Nicholas J. Dimonaco, Wayne Aubrey, Kim Kenobi, Amanda Clare, Christopher J. Creevey

Summary: This article presents an evaluation framework for assessing the performance of CDS prediction tools based on a comprehensive set of primary and secondary metrics. The research found that no individual tool ranked as the most accurate across all genomes or metrics analyzed, and even top-ranked tools produced conflicting gene collections.

BIOINFORMATICS (2022)

Article Biotechnology & Applied Microbiology

Identification of highly variable sequence fragments in unmapped reads for rapid bacterial genotyping

Marketa Nykrynova, Vojtech Barton, Matej Bezdicek, Martina Lengerova, Helena Skutkova

Summary: The study introduces a pipeline for identifying highly variable genomic fragments in unmapped reads through a modified hybrid assembly approach. These variable regions can be used in efficient laboratory methods for bacterial typing with high discriminatory power, such as mini-MLST, replacing expensive methods like MLST. Through this approach, infection monitoring can be carried out more rapidly.

BMC GENOMICS (2022)

Article Multidisciplinary Sciences

CaBagE: A Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing

Amelia D. Wallace, Thomas A. Sasani, Jordan Swanier, Brooke L. Gates, Jeff Greenland, Brent S. Pedersen, Katherine E. Varley, Aaron R. Quinlan

Summary: The study introduces a method called CaBagE for efficient and rapid target enrichment of large, structurally complex DNA targets. By leveraging the stable binding of Cas9 to its DNA target, desired fragments are protected from digestion, allowing for enrichment. Testing on five genomic targets showed that enrichment with CaBagE resulted in high coverage of target loci.

PLOS ONE (2021)

Article Biotechnology & Applied Microbiology

Protein length distribution is remarkably uniform across the tree of life

Yannis Nevers, Natasha M. Glover, Christophe Dessimoz, Odile Lecompte

Summary: By comparing protein length distribution across 2326 species, it was found that proteins are slightly longer on average in eukaryotes than in bacteria or archaea, but the variation of length distribution across species is low, especially compared to other genomic features. Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting that the actual variation of protein length distribution across species is even smaller.

GENOME BIOLOGY (2023)

Article Veterinary Sciences

Isolation, Identification, and Genomic Characterization of Chicken Astrovirus Isolates From China

Wei Zhao, Jialei Shi, Yongxiu Yao, Hongxia Shao, Aijian Qin, Kun Qian

Summary: This study successfully isolated and molecularly characterized two strains of Chicken astrovirus (CAstV), and observed their effect on hatchability. The genetic analysis showed that these two strains had typical characteristics of avian astroviruses, with high similarity among Chinese strains and a common origin with strains from the UK.

FRONTIERS IN VETERINARY SCIENCE (2022)

Article Microbiology

Distribution of rare N4-like viruses in temperate estuaries unveiled by viromics

Mengqi Sun, Feng Chen

Summary: The relative abundance of N4-like viruses in two temperate estuaries was assessed using four different methods, and it was found that N4-like viruses were of low abundance in these environments. The study also identified locally isolated N4-like virus species, and indicated that N4-like viruses may be more abundant in colder water. The importance of including local viral sequences in reference databases was highlighted.

ENVIRONMENTAL MICROBIOLOGY (2022)

Article Biochemical Research Methods

Assessment of selection pressure exerted on genes from complete pangenomes helps to improve the accuracy in the prediction of new genes

Alejandro Rubio, Juan Jimenez, Antonio J. Perez-Pulido

Summary: Bacterial genomes provide valuable data for understanding the complete set of genes of a species. By analyzing multiple bacterial strains, shared genes and strain-specific genes can be identified. However, current computational gene finders may miss some existing genes. This study estimated the selective pressure on genes in the Acinetobacter baumannii pangenome and found that most genes are under negative selection, but a subset showed values compatible with positive selection, which may be related to acquisition of new functions.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

CusProSe: a customizable protein annotation software with an application to the prediction of fungal secondary metabolism genes

Leonor Oliveira, Nicolas Chevrollier, Jean-Felix Dallery, Richard J. J. O'Connell, Marc-Henri Lebrun, Muriel Viaud, Olivier Lespinet

Summary: In this article, we introduce a new application called CustomProteinSearch (CusProSe), which is designed to help users search for proteins based on their domain composition. The application consists of two customizable tools, IterHMMBuild and ProSeCDA. IterHMMBuild allows for the iterative construction of Hidden Markov Model (HMM) profiles for specific protein sequences, while ProSeCDA scans a proteome using an HMM profile database and annotates identified proteins using user-defined rules. We successfully used CusProSe to identify genes encoding key enzyme families involved in secondary metabolism in fungal genomes, as well as characterize different sub-families of terpene synthases.

SCIENTIFIC REPORTS (2023)

Article Genetics & Heredity

Prediction of CTCF loop anchor based on machine learning

Xiao Zhang, Wen Zhu, Huimin Sun, Yijie Ding, Li Liu

Summary: In this study, a comparative analysis was conducted to investigate the sequence preference and binding strength of anchor and non-anchor CTCF binding sites. A machine learning model based on CTCF binding intensity and DNA sequence was proposed to predict the formation of chromatin loop anchors. The accuracy of this model reached 0.8646, and it was found that the formation of loop anchor is mainly influenced by CTCF binding strength and binding pattern.

FRONTIERS IN GENETICS (2023)

Article Genetics & Heredity

Genome-wide simple sequence repeat markers in potato: abundance, distribution, composition, and polymorphism

Yinqiao Jian, Wenyuan Yan, Jianfei Xu, Shaoguang Duan, Guangcun Li, Liping Jin

Summary: This study analyzed the abundance and distribution of SSRs in four potato genomes, identifying a large number of polymorphic markers, with a focus on intergenic regions. The high-density potato SSR markers developed will facilitate genetic research and marker-pyramiding in potato breeding.

DNA RESEARCH (2021)

Article Genetics & Heredity

FrangiPANe, a tool for creating a panreference using left behind reads

Tranchant-Dubreuil Christine, Chenal Clothilde, Blaison Mathieu, Albar Laurence, Klein Valentin, Mariac Cedric, A. Wing Rod, Vigouroux Yves, Sabot Francois

Summary: FrangiPANe is a pipeline developed to build panreference using short reads through a map-then-assemble strategy. Applying it to 248 African rice genomes using an improved CG14 reference genome, we identified an average of 8 Mb of new sequences and 5290 new contigs per individual. The pipeline allows for the anchoring of new contigs within the reference genome and annotation of new genes. It simplifies the construction of a panreference and can be used for pangenome studies and selection detection.

NAR GENOMICS AND BIOINFORMATICS (2023)

Article Agriculture, Dairy & Animal Science

Improving the accuracy of genomic prediction for meat quality traits using whole genome sequence data in pigs

Zhanwei Zhuang, Jie Wu, Yibin Qiu, Donglin Ruan, Rongrong Ding, Cineng Xu, Shenping Zhou, Yuling Zhang, Yiyi Liu, Fucai Ma, Jifei Yang, Ying Sun, Enqin Zheng, Ming Yang, Gengyuan Cai, Jie Yang, Zhenfang Wu

Summary: In this study, whole genome sequence (WGS) data was used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. The results showed that using WGS data for genomic prediction resulted in different accuracies for different meat quality traits, ranging from 0.08 to 0.47. The study also found that MultiBLUP outperformed GBLUP and yielded accuracy increases ranging from 17.39% to 75%. Furthermore, genotype imputation from 50K chip to WGS level showed a high concordance rate and correlation coefficient.

JOURNAL OF ANIMAL SCIENCE AND BIOTECHNOLOGY (2023)

Article Computer Science, Hardware & Architecture

Linking indexing data structures to de Bruijn graphs: Construction and update

Bastien Cazaux, Thierry Lecroq, Eric Rivals

JOURNAL OF COMPUTER AND SYSTEM SCIENCES (2019)

Article Mathematics, Applied

Improved online algorithms for jumbled matching

Sukhpal Singh Ghuman, Jorma Tarhio, Tamanna Chhabra

DISCRETE APPLIED MATHEMATICS (2020)

Article Biochemical Research Methods

AQUAPONY: visualization and interpretation of phylogeographic information on phylogenetic trees

Bastien Cazaux, Guillaume Castel, Eric Rivals

BIOINFORMATICS (2019)

Article Multidisciplinary Sciences

NF-Y controls fidelity of transcription initiation at gene promoters through maintenance of the nucleosome-depleted region

Andrew J. Oldfield, Telmo Henriques, Dhirendra Kumar, Adam B. Burkholder, Senthilkumar Cinghu, Damien Paulet, Brian D. Bennett, Pengyi Yang, Benjamin S. Scruggs, Christopher A. Lavender, Eric Rivals, Karen Adelman, Raja Jothi

NATURE COMMUNICATIONS (2019)

Article Computer Science, Information Systems

Hierarchical Overlap Graph

Bastien Cazaux, Eric Rivals

INFORMATION PROCESSING LETTERS (2020)

Article Biochemical Research Methods

PEWO: a collection of workflows to benchmark phylogenetic placement

Benjamin Linard, Nikolai Romashchenko, Fabio Pardi, Eric Rivals

BIOINFORMATICS (2020)

Article Biochemical Research Methods

Rapid screening and detection of inter-type viral recombinants using phylo-k-mers

Guillaume E. Scholz, Benjamin Linard, Nikolai Romashchenko, Eric Rivals, Fabio Pardi

BIOINFORMATICS (2020)

Article Multidisciplinary Sciences

FTO-mediated cytoplasmic m6Am demethylation adjusts stem-like properties in colorectal cancer cell

Sebastien Relier, Julie Ripoll, Helene Guillorit, Amandine Amalric, Cyrinne Achour, Florence Boissiere, Jerome Vialaret, Aurore Attina, Francoise Debart, Armelle Choquet, Francoise Macari, Virginie Marchand, Yuri Motorin, Emmanuelle Samalin, Jean-Jacques Vasseur, Julie Pannequin, Francesca Aguilo, Evelyne Lopez-Crapez, Christophe Hirtz, Eric Rivals, Amandine Bastide, Alexandre David

Summary: The demethylase FTO was shown to remove N6-methyladenosine (m6A) and N6, 2'-O-dimethyladenosine (m6A(m)) modifications on RNAs. Here the authors show that FTO impedes cancer stem cell-like abilities in colorectal cancer cells through its m6A(m) demethylase activity, not through internal m6A demethylase activity.

NATURE COMMUNICATIONS (2021)

Review Biochemistry & Molecular Biology

The multifaceted functions of the Fat mass and Obesity-associated protein (FTO) in normal and cancer cells

Sebastien Relier, Eric Rivals, Alexandre David

Summary: Over the past decade, mRNA modification has emerged as a new layer of gene expression regulation. FTO, as the first identified eraser of N6-methyladenosine (m6A) adducts, has attracted much attention in the field of epitranscriptomics. The contradictory studies on the regulatory role of FTO in gene expression may be attributed to its wide spectrum of substrates and RNA sequence preferences. This review focuses on current knowledge related to FTO function in healthy and cancer cells, emphasizing its divergent roles in different tissues and subcellular and molecular contexts.

RNA BIOLOGY (2022)

Article Chemistry, Analytical

Multivariate Analysis of RNA Chemistry Marks Uncovers Epitranscriptomics-Based Biomarker Signature for Adult Diffuse Glioma Diagnostics

S. Relier, A. Amalric, A. Attina, I. B. Koumare, V. Rigau, F. Burel Vandenbos, D. Fontaine, M. Baroncini, J. P. Hugnot, H. Duffau, L. Bauchet, C. Hirtz, E. Rivals, A. David

Summary: One of the main challenges in cancer management is the discovery of reliable biomarkers for decision-making and treatment outcome prediction. This study combines high-throughput molecular profiling technologies with statistical multivariate analysis to design a pipeline for identifying biomarker signatures that can guide precision medicine and improve disease diagnosis.

ANALYTICAL CHEMISTRY (2022)

Article Infectious Diseases

FT-GPI, a highly sensitive and accurate predictor of GPI-anchored proteins, reveals the composition and evolution of the GPI proteome in Plasmodium species

Lena M. M. Sauer, Rodrigo Canovas, Daniel Roche, Hosam Shams-Eldin, Patrice Ravel, Jacques Colinge, Ralph T. T. Schwarz, Choukri Ben Mamoun, Eric Rivals, Emmanuel Cornillot

Summary: Protozoan parasites attach specific and diverse proteins to their plasma membrane via a GPI anchor. The FT-GPI software can detect GPI-anchored proteins and identify new candidates for vaccines against malaria and other parasitic diseases.

MALARIA JOURNAL (2023)

Article Biochemical Research Methods

Physical modeling of ribosomes along messenger RNA: Estimating kinetic parameters from ribosome profiling experiments using a ballistic model

Carole Chevalier, Jerome Dorignac, Yahaya Ibrahim, Armelle Choquet, Alexandre David, Julie Ripoll, Eric Rivals, Frederic Geniet, Nils-Ole Walliser, John Palmeri, Andrea Parmeggiani, Jean-Charles Walter

Summary: Gene expression involves the synthesis of proteins from the information encoded on DNA, and translation of mRNA into amino acid sequences is one of the main steps in gene expression. Understanding the motion of ribosomes along mRNA is crucial for studying genetic expression. In this study, a new experimental and theoretical approach is proposed to obtain kinetic rates with better accuracy by categorizing mRNA based on the number of ribosomes and using ribo-sequencing techniques.

PLOS COMPUTATIONAL BIOLOGY (2023)

Article Biochemical Research Methods

dipwmsearch: a Python package for searching di-PWM motifs

Marie Mille, Julie Ripoll, Bastien Cazaux, Eric Rivals

Summary: This study proposes a Python package called dipwmsearch, which offers an original and efficient algorithm for searching for occurrences of dinucleotide PWMs in sequences. The package allows the enumeration of matching words and simultaneous searching in the sequence, even if it contains IUPAC codes. Users can easily install dipwmsearch via PyPI or conda, and they also have access to comprehensive documentation and executable scripts for using dinucleotide PWMs.

BIOINFORMATICS (2023)
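
To make the idea of a dinucleotide PWM concrete, the Python sketch below scores every window of a sequence by summing one score per adjacent nucleotide pair. It is a naive sliding-window scan written for illustration, not the enumeration algorithm or the API of dipwmsearch. The matrix layout (a list of dicts keyed by dinucleotide), the function names, and the toy motif are hypothetical, and windows containing IUPAC or other non-ACGT symbols are simply skipped rather than handled as the package does.

```python
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dipwm_score(matrix: list[dict[str, float]], word: str) -> float:
    """Score of a word of length len(matrix) + 1: one term per adjacent pair."""
    return sum(matrix[i][word[i:i + 2]] for i in range(len(word) - 1))


def naive_dipwm_scan(matrix: list[dict[str, float]], sequence: str, threshold: float):
    """Return (start, word, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    motif_len = len(matrix) + 1
    hits = []
    for start in range(len(sequence) - motif_len + 1):
        word = sequence[start:start + motif_len]
        if any(base not in "ACGT" for base in word):
            continue  # this sketch skips windows with ambiguous (IUPAC) symbols
        score = dipwm_score(matrix, word)
        if score >= threshold:
            hits.append((start, word, score))
    return hits


if __name__ == "__main__":
    # Hypothetical toy di-PWM rewarding "AC" at pair 0 and "CG" at pair 1.
    toy = [
        {d: (1.0 if d == "AC" else 0.0) for d in DINUCLEOTIDES},
        {d: (1.0 if d == "CG" else 0.0) for d in DINUCLEOTIDES},
    ]
    print(naive_dipwm_scan(toy, "TACGACG", threshold=2.0))
    # → [(1, 'ACG', 2.0), (4, 'ACG', 2.0)]
```

Because the score is a sum over adjacent pairs, a di-PWM can capture dependencies between neighbouring positions that an ordinary PWM, which scores each position independently, cannot.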

Article Biochemical Research Methods

EPIK: precise and scalable evolutionary placement with informative k-mers

Nikolai Romashchenko, Benjamin Linard, Fabio Pardi, Eric Rivals

Summary: Phylogenetic placement is a method for analyzing massive collections of newly sequenced DNA using a high-quality reference tree. Alignment-free approaches based on phylo-k-mers have emerged to simplify the process, but are limited by data preprocessing and the large number of k-mers to consider. The authors propose a filtering method based on mutual information to select informative phylo-k-mers, improving efficiency at the cost of a slight loss in accuracy. They develop the tools IPK and EPIK, which outperform previous software and provide fast and accurate phylogenetic placement.

BIOINFORMATICS (2023)

Article Biochemical Research Methods

Computing Phylo-k-Mers

Nikolai Romashchenko, Benjamin Linard, Eric Rivals, Fabio Pardi

Summary: The paper introduces a method for computing phylo-k-mers based on the concept of phylogenetically-informative k-mers and proposes algorithms to solve the computational problem. In practice, this method can efficiently find k-mers with probabilities above a given threshold in a phylogenetic tree.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

© Peeref 2019-2024. All rights reserved.