Article

Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 8, Pages 2676-2688

Publisher

WILEY
DOI: 10.1111/1755-0998.13355

Keywords

admixture; convolutional neural networks; deep learning; gene flow; hybridization; model selection

Funding

  1. National Institute of General Medical Sciences [R01GM127348]
  2. National Science Foundation [IOS-1811784]


To understand the process of speciation and uncover reticulate phylogenetic patterns, the authors use convolutional neural networks (CNNs), a deep learning method, to infer the frequency and mode of hybridization among closely related organisms. By analyzing genealogical discordance and selecting among candidate hybridization scenarios, the approach clarifies patterns of admixture, particularly for closely linked data whose nonindependence must be taken into account.
Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs to select among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (d_XY) calculated in windows across the genome. Training and independently testing a neural network on coalescent simulations showed that our method, HyDe-CNN, was able to accurately perform model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.
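The input representation described above, a matrix of pairwise divergence (d_XY) computed in genomic windows for the four taxa in (((P1, P2), P3), Out), can be sketched as follows. This is a minimal illustration, not the HyDe-CNN code: it assumes aligned haplotype strings per taxon, non-overlapping windows of a fixed size, and d_XY taken as the mean proportion of differing sites over all cross-taxon haplotype pairs. The function names `dxy` and `dxy_matrix` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_diff(a: str, b: str) -> int:
    """Count aligned sites at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def dxy(pop_x: List[str], pop_y: List[str], start: int, end: int) -> float:
    """Mean per-site divergence between every haplotype of X and every haplotype of Y
    within the window [start, end). Assumes equal-length, gap-free alignments."""
    length = end - start
    total = 0
    for hx in pop_x:
        for hy in pop_y:
            total += pairwise_diff(hx[start:end], hy[start:end])
    return total / (len(pop_x) * len(pop_y) * length)


def dxy_matrix(
    pops: Dict[str, List[str]], window: int
) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Build a (taxon pairs x windows) matrix of d_XY values.

    Rows follow the order of itertools.combinations over the taxon names;
    columns are non-overlapping windows (a trailing partial window is dropped).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    names = list(pops)
    seq_len = len(next(iter(pops.values()))[0])
    starts = range(0, seq_len - window + 1, window)
    pairs = list(combinations(names, 2))
    matrix = [
        [dxy(pops[a], pops[b], s, s + window) for s in starts] for a, b in pairs
    ]
    return pairs, matrix


# Toy alignment with one haplotype per taxon (illustrative data only).
example = {
    "P1": ["AAAAAAAA"],
    "P2": ["AAAATTTT"],
    "P3": ["ATATATAT"],
    "Out": ["TTTTTTTT"],
}
pairs, matrix = dxy_matrix(example, window=4)
for pair, row in zip(pairs, matrix):
    print(pair, row)
```

With four taxa there are six pairs, so each genome yields a 6 × W matrix (W windows) that a CNN can treat as a single-channel image. Real data would also need handling for gaps, missing bases and overlapping or variable-size windows, which this sketch omits.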

Recommended

Review Plant Sciences

Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants

Gane Ka-Shu Wong, Douglas E. Soltis, Jim Leebens-Mack, Norman J. Wickett, Michael S. Barker, Yves Van de Peer, Sean W. Graham, Michael Melkonian

ANNUAL REVIEW OF PLANT BIOLOGY, VOL 71, 2020 (2020)

Article Plant Sciences

The Chimonanthus salicifolius genome provides insight into magnoliid evolution and flavonoid biosynthesis

Qundan Lv, Jie Qiu, Jie Liu, Zheng Li, Wenting Zhang, Qin Wang, Jie Fang, Junjie Pan, Zhengdao Chen, Wenliang Cheng, Michael S. Barker, Xuehui Huang, Xin Wei, Kejun Cheng

PLANT JOURNAL (2020)

Article Plant Sciences

Genes derived from ancient polyploidy have higher genetic diversity and are associated with domestication in Brassica rapa

Xinshuai Qi, Hong An, Tara E. Hall, Chenlu Di, Paul D. Blischak, Michael T. W. McKibben, Yue Hao, Gavin C. Conant, J. Chris Pires, Michael S. Barker

Summary: The study found a close relationship between domestication and polyploidy in Brassica rapa crops, with genetic diversity derived from ancient polyploidy playing a key role in the domestication of B. rapa and supporting its importance in the success of modern agriculture.

NEW PHYTOLOGIST (2021)

Review Plant Sciences

Patterns and Processes of Diploidization in Land Plants

Zheng Li, Michael T. W. McKibben, Geoffrey S. Finch, Paul D. Blischak, Brittany L. Sutherland, Michael S. Barker

Summary: This review discusses the impact of polyploidy on chromosome pairing behavior in land plants, as well as the two major processes of diploidization: cytological diploidization and genic diploidization/fractionation. It also compares gene fractionation variation across land plants and highlights differences in diploidization between plants and animals.

ANNUAL REVIEW OF PLANT BIOLOGY, VOL 72, 2021 (2021)

Article Plant Sciences

Pilot RNA-seq data from 24 species of vascular plants at Harvard Forest

Hannah E. Marx, Stacy A. Jorgensen, Eldridge Wisely, Zheng Li, Katrina M. Dlugosch, Michael S. Barker

Summary: This study involved generating and analyzing RNA-seq data for 24 vascular plant species, highlighting the challenges of collecting RNA data from diverse plant communities and revealing no significant differences in transcriptome quality between diploid and polyploid species. The findings provide opportunities for future large-scale studies at the intersection of ecology and genomics.

APPLICATIONS IN PLANT SCIENCES (2021)

Article Biochemistry & Molecular Biology

The contributions from the progenitor genomes of the mesopolyploid Brassiceae are evolutionarily distinct but functionally compatible

Yue Hao, Makenzie E. Mabry, Patrick P. Edger, Michael Freeling, Chunfang Zheng, Lingling Jin, Robert VanBuren, Marivi Colle, Hong An, R. Shawn Abrahams, Jacob D. Washburn, Xinshuai Qi, Kerrie Barry, Christopher Daum, Shengqiang Shu, Jeremy Schmutz, David Sankoff, Michael S. Barker, Eric Lyons, J. Chris Pires, Gavin C. Conant

Summary: The study investigates the gene loss history after whole-genome triplication (WGT) in Brassiceae tribe members, confirming a two-step formation model with significant temporal gaps. It highlights distinguishable homoeolog loss rates among subgenomes and proposes a mix and match model of allopolyploidy where genes from different subgenomes function together without difficulty.

GENOME RESEARCH (2021)

Article Ecology

Animal chromosome counts reveal a similar range of chromosome numbers but with less polyploidy in animals compared to flowering plants

Cristian Roman-Palacios, Cesar A. Medina, Shing H. Zhan, Michael S. Barker

Summary: Understanding the mechanisms behind chromosome evolution can provide insights into lineage origin, persistence, and evolutionary tempo. A database of chromosome counts for animals was presented, showing similarities in distribution with flowering plants, though driven by different factors. Animals and plants exhibit similar frequencies of speciation-related changes in chromosome number, but plant speciation is more often associated with changes in ploidy.

JOURNAL OF EVOLUTIONARY BIOLOGY (2021)

Article Biotechnology & Applied Microbiology

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids

Philipp E. Bayer, Armin Scheben, Agnieszka A. Golicz, Yuxuan Yuan, Sebastien Faure, HueyTyng Lee, Harmeet Singh Chawla, Robyn Anderson, Ian Bancroft, Harsh Raman, Yong Pyo Lim, Steven Robbens, Lixi Jiang, Shengyi Liu, Michael S. Barker, M. Eric Schranz, Xiaowu Wang, Graham J. King, J. Chris Pires, Boulos Chalhoub, Rod J. Snowdon, Jacqueline Batley, David Edwards

Summary: Plant genomes show significant presence/absence variation (PAV) within a species, with different causes of gene loss between diploids and polyploids. In diploids, gene loss propensity is primarily associated with transposable elements, while in polyploids like B. napus, gene loss propensity is linked to homoeologous recombination. These findings provide insights into the underlying biological and physical factors of gene presence/absence, paving the way for the application of machine learning methods in the field.

PLANT BIOTECHNOLOGY JOURNAL (2021)

Article Multidisciplinary Sciences

Analysis of the Coptis chinensis genome reveals the diversification of protoberberine-type alkaloids

Yifei Liu, Bo Wang, Shaohua Shu, Zheng Li, Chi Song, Di Liu, Yan Niu, Jinxin Liu, Jingjing Zhang, Heping Liu, Zhigang Hu, Bisheng Huang, Xiuyu Liu, Wei Liu, Liping Jiang, Mohammad Murtaza Alami, Yuxin Zhou, Yutao Ma, Xiangxiang He, Yicheng Yang, Tianyuan Zhang, Hui Hu, Michael S. Barker, Shilin Chen, Xuekui Wang, Jing Nie

Summary: Chinese goldthread (Coptis chinensis) is an early-diverging eudicot plant with diverse medicinal applications. The high-quality genome assembly and annotation of C. chinensis revealed a single ancient whole-genome duplication event shared by the Ranunculaceae family. The study also highlighted the functional importance of CYP719 gene in diversifying protoberberine-type alkaloids.

NATURE COMMUNICATIONS (2021)

Article Evolutionary Biology

Chromosome-Scale Genome Assembly of Gilia yorkii Enables Genetic Mapping of Floral Traits in an Interspecies Cross

David E. Jarvis, Peter J. Maughan, Joseph DeTemple, Veronica Mosquera, Zheng Li, Michael S. Barker, Leigh A. Johnson, Clinton J. Whipple

Summary: This study used the chromosome-scale reference genome of Gilia yorkii to investigate genome evolution in the Polemoniaceae and identified important genes related to inflorescence architecture and flower color variation through quantitative trait loci mapping. The results demonstrate that Gilia can serve as a genetic model for studying the evolution of development in plants.

GENOME BIOLOGY AND EVOLUTION (2022)

Article Multidisciplinary Sciences

Underwater CAM photosynthesis elucidated by Isoetes genome

David Wickell, Li-Yaung Kuo, Hsiao-Pei Yang, Amra Dhabalia Ashok, Iker Irisarri, Armin Dadras, Sophie de Vries, Jan de Vries, Yao-Moan Huang, Zheng Li, Michael S. Barker, Nolan T. Hartwick, Todd P. Michael, Fay-Wei Li

Summary: Despite the extensive characterization of crassulacean acid metabolism (CAM) in terrestrial angiosperms, little attention has been given to aquatics and early diverging land plants. Here, the authors assemble the genome of Isoetes taiwanensis and investigate the genetic factors driving CAM in this aquatic lycophyte. Despite broad similarities between CAM in Isoetes and terrestrial angiosperms, several key differences are identified, including the recruitment of 'bacterial-type' PEPC and diverged circadian control of key CAM pathway genes in Isoetes.

NATURE COMMUNICATIONS (2021)

Article Biology

Genome size evolution in the diverse insect order Trichoptera

Jacqueline Heckenhauer, Paul B. Frandsen, John S. Sproul, Zheng Li, Juraj Paule, Amanda M. Larracuente, Peter J. Maughan, Michael S. Barker, Julio Schneider, Russell J. Stewart, Steffen U. Pauls

Summary: The size of genomes in caddisflies varies greatly, and the expansion of repetitive elements, particularly transposable elements, is identified as a major driver of larger genome sizes. The association between transposable elements and genome size shows a linear relationship. Moreover, expanded genomes are more likely to occur in caddisfly lineages with higher ecological diversity.

GIGASCIENCE (2022)
