Article

A universal genomic coordinate translator for comparative genomics

Journal

BMC BIOINFORMATICS
Volume 15

Publisher

BMC
DOI: 10.1186/1471-2105-15-227

Keywords

Comparative genomics; Genomic coordinate translation; Genomic duplication; Cross-species gene expression analysis

Funding

  1. Science for Life Laboratory (MGG)
  2. Bioinformatics Infrastructure for Life Sciences in Sweden

Background: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be identified unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which grows as N^2 with the number of available genomes, N.

Results: Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also finds thousands of newly identified transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species.

Conclusions: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
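The idea of translating coordinates indirectly through a graph of pairwise synteny maps, rather than aligning every genome against every other, can be illustrated with a small toy sketch. This is not Kraken's implementation: the genome names, the block format, and the functions `translate_once`, `translate` and `pair_counts` are all hypothetical, and the sketch ignores strand, duplications and the re-computation of orthology that the abstract describes.

```python
from collections import deque

# Hypothetical toy data: each directed edge (source genome, target genome)
# holds syntenic blocks of the form
# (src_chrom, src_start, src_end, dst_chrom, dst_start), half-open, forward strand.
PAIRWISE = {
    ("A", "B"): [("chr1", 100, 200, "chr2", 1100)],
    ("B", "C"): [("chr2", 1000, 1300, "chrX", 50)],
}


def translate_once(blocks, chrom, pos):
    """Map one position through a single pairwise map, or return None."""
    for src_chrom, start, end, dst_chrom, dst_start in blocks:
        if chrom == src_chrom and start <= pos < end:
            return dst_chrom, dst_start + (pos - start)
    return None


def translate(pairwise, source, target, chrom, pos):
    """Breadth-first search over the graph of pairwise maps.

    Returns the (chrom, pos) reached in the target genome, or None if no
    chain of pairwise maps carries the position there.
    """
    queue = deque([(source, chrom, pos)])
    seen = {source}
    while queue:
        genome, c, p = queue.popleft()
        if genome == target:
            return c, p
        for (src, dst), blocks in pairwise.items():
            if src != genome or dst in seen:
                continue
            hit = translate_once(blocks, c, p)
            if hit is not None:
                seen.add(dst)
                queue.append((dst, hit[0], hit[1]))
    return None


def pair_counts(n):
    """Unordered genome pairs for all-to-all vs. a minimal connected chain."""
    return n * (n - 1) // 2, n - 1


print(translate(PAIRWISE, "A", "C", "chr1", 150))
print(pair_counts(12))
```

In this toy setup no direct A-to-C map exists, yet a position on genome A still reaches genome C by way of B. The `pair_counts` helper only illustrates the scaling argument: the number of pairwise alignments in an all-to-all design grows quadratically with N, whereas a connected graph needs as few as N - 1 of them.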

Recommended

Article Biology

Genome assembly of the basket willow, Salix viminalis, reveals earliest stages of sex chromosome expansion

Pedro Almeida, Estelle Proux-Wera, Allison Churcher, Lucile Soler, Jacques Dainat, Pascal Pucholt, Jessica Nordlund, Tom Martin, Ann-Christin Ronnberg-Wastljung, Bjorn Nystedt, Sofia Berlin, Judith E. Mank

BMC BIOLOGY (2020)

Article Rheumatology

Molecular pathways in patients with systemic lupus erythematosus revealed by gene-centred DNA sequencing

Johanna K. Sandling, Pascal Pucholt, Lina Hultin Rosenberg, Fabiana H. G. Farias, Sergey Kozyrev, Maija-Leena Eloranta, Andrei Alexsson, Matteo Bianchi, Leonid Padyukov, Christine Bengtsson, Roland Jonsson, Roald Omdal, Benedicte A. Lie, Laura Massarenti, Rudi Steffensen, Marianne A. Jakobsen, Soren T. Lillevang, Karoline Lerang, Oyvind Molberg, Anne Voss, Anne Troldborg, Soren Jacobsen, Ann-Christine Syvanen, Andreas Jonsen, Iva Gunnarsson, Elisabet Svenungsson, Solbritt Rantapaa-Dahlqvist, Anders A. Bengtsson, Christopher Sjowall, Dag Leonard, Kerstin Lindblad-Toh, Lars Ronnblom

Summary: The study identified two main independent pathways involved in SLE susceptibility: T lymphocyte differentiation and innate immunity, characterized by HLA and interferon, respectively. Pathway PRS could define pathways in individual patients, who on average were positive for seven pathways. Pathway PRS-based clustering allowed stratification of patients into four groups with different risk score profiles.

ANNALS OF THE RHEUMATIC DISEASES (2021)

Article Microbiology

Comparative Fungal Community Analyses Using Metatranscriptomics and Internal Transcribed Spacer Amplicon Sequencing from Norway Spruce

Andreas N. Schneider, John Sundh, Gorel Sundstrom, Kerstin Richau, Nicolas Delhomme, Manfred Grabherr, Vaughan Hurry, Nathaniel R. Street

Summary: The study compared fungal community information obtained from RNA-Seq and fungal ITS1 DNA amplicon sequencing, revealing both consistency and differences between the two methods in terms of taxonomic and functional insights. It also demonstrated the potential of transcriptomic data to provide biologically informative functional insights, advancing our understanding of the interaction and effect between host plants and their associated microbial communities.

MSYSTEMS (2021)

Article Biology

A novel canine reference genome resolves genomic architecture and uncovers transcript complexity

Chao Wang, Ola Wallerman, Maja-Louise Arendt, Elisabeth Sundstrom, Asa Karlsson, Jessika Nordin, Suvi Makelainen, Gerli Rosengren Pielberg, Jeanette Hanson, Asa Ohlsson, Sara Saellstrom, Henrik Ronnberg, Ingrid Ljungvall, Jens Haggstrom, Tomas F. Bergstrom, Ake Hedhammar, Jennifer R. S. Meadows, Kerstin Lindblad-Toh

Summary: The GSD_1.0 domestic dog reference genome shows a 55-fold increase in contiguity over CanFam3.1 and uncovers previously hidden functional elements. Sequencing of 27 dogs revealed millions of genetic variations, some of which could directly impact gene products.

COMMUNICATIONS BIOLOGY (2021)

Article Rheumatology

Allele frequency spectrum of known ankylosing spondylitis associated variants in a Swedish population

A. Mathioudaki, J. Nordin, A. Kastbom, P. Soderkvist, P. Eriksson, J. Cedergren, K. Lindblad-Toh, J. R. S. Meadows

Summary: This study aimed to characterize known ankylosing spondylitis (AS) susceptibility variants in a homogeneous Swedish data set and successfully replicated major histocompatibility complex (MHC) and non-MHC loci associated with AS in the Swedish population. The study showed a different replication pattern compared to discovery data sets, possibly due to differences in population demographics.

SCANDINAVIAN JOURNAL OF RHEUMATOLOGY (2022)

Article Genetics & Heredity

Genome assemblies of three closely related leaf beetle species (Galerucella spp.)

Xuyue Yang, Tanja Slotte, Jacques Dainat, Peter A. Hamback

Summary: This study reported the genome assemblies and annotations of three closely related Galerucella species with varying genome sizes and scaffold numbers. A large number of protein-coding genes were identified, contributing to future population genomics studies.

G3-GENES GENOMES GENETICS (2021)

Article Rheumatology

Contribution of Rare Genetic Variation to Disease Susceptibility in a Large Scandinavian Myositis Cohort

Matteo Bianchi, Sergey V. Kozyrev, Antonella Notarnicola, Lina Hultin Rosenberg, Asa Karlsson, Pascal Pucholt, Simon Rothwell, Andrei Alexsson, Johanna K. Sandling, Helena Andersson, Robert G. Cooper, Leonid Padyukov, Anna Tjarnlund, Maryam Dastmalchi, Jennifer R. S. Meadows, Louise Pyndt Diederichsen, Oyvind Molberg, Hector Chinoy, Janine Lamb, Lars Ronnblom, Kerstin Lindblad-Toh, Ingrid E. Lundberg

Summary: By conducting targeted DNA sequencing of immune-related genes in IIM patients and healthy controls, this study identified IFI35 as a potential genetic risk locus for IIMs, highlighting a genetic signature of type I IFN pathway activation. Genetic associations with AGER and PSMB8 in the major histocompatibility complex locus were detected in the antisynthetase syndrome subgroup, suggesting a less marked genetic signature of the type I IFN pathway. Enrichment analyses revealed a burden of synonymous and noncoding rare variants in IIM patients, indicating increased disease predisposition associated with these classes of rare variants.

ARTHRITIS & RHEUMATOLOGY (2022)

Article Genetics & Heredity

Association of Protective HLA-A With HLA-B*27 Positive Ankylosing Spondylitis

Jessika Nordin, Mats Pettersson, Lina Hultin Rosenberg, Argyri Mathioudaki, Asa Karlsson, Eva Muren, Karolina Tandre, Lars Ronnblom, Alf Kastbom, Jan Cedergren, Per Eriksson, Peter Soderkvist, Kerstin Lindblad-Toh, Jennifer R. S. Meadows

Summary: This study investigated the role of MHC in ankylosing spondylitis through typing 17 genes, identifying 25 HLA protein-coding variants associated with the disease. Novel protective associations were found in a HLA-B*27 positive population, and unique risk variants were identified in different sexes, highlighting the impact of population composition on disease associations.

FRONTIERS IN GENETICS (2021)

Letter Plant Sciences

TEsorter: An accurate and fast method to classify LTR-retrotransposons in plant genomes

Ren-Gang Zhang, Guang-Yuan Li, Xiao-Ling Wang, Jacques Dainat, Zhao-Xuan Wang, Shujun Ou, Yongpeng Ma

HORTICULTURE RESEARCH (2022)

Article Multidisciplinary Sciences

Metatranscriptomics captures dynamic shifts in mycorrhizal coordination in boreal forests

Simon R. Law, Alonso R. Serrano, Yohann Daguerre, John Sundh, Andreas N. Schneider, Zsofia R. Stangl, David Castro, Manfred Grabherr, Torgny Nasholm, Nathaniel R. Street, Vaughan Hurry

Summary: Carbon storage and cycling in boreal forests, which are the largest terrestrial carbon store, are influenced by the complex interactions between trees and soil microorganisms. This study developed a metatranscriptomic approach to examine the impact of nutrient enrichment on Norway spruce fine roots and the fungal community structure and function. The findings demonstrate the important role of host-microbe dynamics in mediating the carbon storage potential of boreal soils under changing nutrient conditions.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Article Chemistry, Analytical

Flexible Machine Learning Algorithms for Clinical Gait Assessment Tools

Christian Greve, Hobey Tam, Manfred Grabherr, Aditya Ramesh, Bart Scheerder, Juha M. Hijmans

Summary: This study developed a novel machine learning algorithm and tested its validity for automated gait partitioning of laboratory-based and sensor-based gait data, showing low mean errors. By combining reinforcement learning with deep neural networks, significant reduction in the size of training datasets was achieved, providing high flexibility for end-users.

SENSORS (2022)

Article Biochemistry & Molecular Biology

Genomic analyses of the Linum distyly supergene reveal convergent evolution at the molecular level

Juanita Gutierrez-Valencia, Marco Fracassetti, Emma L. Berdan, Ignas Bunikis, Lucile Soler, Jacques Dainat, Verena E. Kutschera, Aleksandra Losvik, Aurelie Desamore, P. William Hughes, Alireza Foroozani, Benjamin Laenen, Edouard Pesquet, Mohamed Abdelaziz, Olga Vinnere Pettersson, Bjorn Nystedt, Adrian C. Brennan, Juan Arroyo, Tanja Slotte

Summary: This study characterized the genetic architecture and evolution of the distyly supergene in Linum, showing that hemizygosity and thrum-specific expression of S-linked genes are major features. Structural variation plays a key role in recombination suppression, and S-linked genes are under purifying selection. These findings provide insights into the origin and maintenance of floral polymorphism.

CURRENT BIOLOGY (2022)

Article Multidisciplinary Sciences

Targeted sequencing reveals the somatic mutation landscape in a Swedish breast cancer cohort

Argyri Mathioudaki, Viktor Ljungstrom, Malin Melin, Maja Louise Arendt, Jessika Nordin, Asa Karlsson, Eva Muren, Pushpa Saksena, Jennifer R. S. Meadows, Voichita D. Marinescu, Tobias Sjoblom, Kerstin Lindblad-Toh

SCIENTIFIC REPORTS (2020)

Article Multidisciplinary Sciences

A comparative genomics multitool for scientific discovery and conservation

Diane P. Genereux, Aitor Serres, Joel Armstrong, Jeremy Johnson, Voichita D. Marinescu, Eva Muren, David Juan, Gill Bejerano, Nicholas R. Casewell, Leona G. Chemnick, Joana Damas, Federica Di Palma, Mark Diekhans, Ian T. Fiddes, Manuel Garber, Vadim N. Gladyshev, Linda Goodman, Wilfried Haerty, Marlys L. Houck, Robert Hubley, Teemu Kivioja, Klaus-Peter Koepfli, Lukas F. K. Kuderna, Eric S. Lander, Jennifer R. S. Meadows, William J. Murphy, Will Nash, Hyun Ji Noh, Martin Nweeia, Andreas R. Pfenning, Katherine S. Pollard, David A. Ray, Beth Shapiro, Arian F. A. Smit, Mark S. Springer, Cynthia C. Steiner, Ross Swofford, Jussi Taipale, Emma C. Teeling, Jason Turner-Maier, Jessica Alfoldi, Bruce Birren, Oliver A. Ryder, Harris A. Lewin, Benedict Paten, Tomas Marques-Bonet, Kerstin Lindblad-Toh, Elinor K. Karlsson

NATURE (2020)

Article Rheumatology

Genetic and clinical basis for two distinct subtypes of primary Sjogren's syndrome

Gudny Ella Thorlacius, Lina Hultin-Rosenberg, Johanna K. Sandling, Matteo Bianchi, Juliana Imgenberg-Kreuz, Pascal Pucholt, Elke Theander, Marika Kvarnstrom, Helena Forsblad-d'Elia, Sara Magnusson Bucher, Katrine B. Norheim, Svein Joar Auglaend Johnsen, Daniel Hammenfors, Kathrine Skarstein, Malin V. Jonsson, Eva Baecklund, Lara A. Aqrawi, Janicke Liaaen Jensen, Oyvind Palm, Andrew P. Morris, Jennifer R. S. Meadows, Solbritt Rantapaa-Dahlqvist, Thomas Mandl, Per Eriksson, Lars Lind, Roald Omdal, Roland Jonsson, Kerstin Lindblad-Toh, Lars Ronnblom, Marie Wahren-Herlenius, Gunnel Nordmark

Summary: The clinical presentation of primary Sjogren's syndrome (pSS) varies widely. A multicentre study identified two patient subgroups based on the presence of SSA/SSB antibodies and genetic markers in the HLA region, with strong association signals found in HLA-DQA1 locus for the antibody-positive subgroup. Replication confirmed the association with MHC class I and II variants, indicating distinct clinical manifestations in pSS subgroups.

RHEUMATOLOGY (2021)
