Article
Biochemistry & Molecular Biology
Renjie Tan, Yufeng Shen
Summary: Exome sequencing is widely used in genetic studies and clinical diagnosis, but the data is noisy and existing methods cannot achieve high precision and high recall simultaneously. To address this, researchers developed a transfer learning method called CNV-espresso, which encodes candidate CNVs as images and uses pretrained convolutional neural networks to classify copy number states. CNV-espresso outperforms manual inspection in large-scale exome sequencing studies.
NUCLEIC ACIDS RESEARCH
(2022)
Article
Multidisciplinary Sciences
David Laehnemann, Johannes Koester, Ute Fischer, Arndt Borkhardt, Alice C. McHardy, Alexander Schoenhuth
Summary: ProSolo is a tool for calling single nucleotide variants from MDA single cell DNA sequencing data, which integrates all relevant MDA biases to achieve higher accuracy and reliability in single cells compared to existing tools.
NATURE COMMUNICATIONS
(2021)
Article
Multidisciplinary Sciences
Kiran Krishnamachari, Dylan Lu, Alexander Swift-Scott, Anuar Yeraliyev, Kayla Lee, Weitai Huang, Sim Ngak Leng, Anders Jacobsen Skanderup
Summary: The authors develop VarNet, a weakly supervised deep learning model for somatic variant calling in cancer with robust performance across multiple cancer genomics datasets.
NATURE COMMUNICATIONS
(2022)
Article
Biochemical Research Methods
Alex Chklovski, Donovan H. Parks, Ben J. J. Woodcroft, Gene W. Tyson
Summary: This work presents CheckM2, a machine learning-based tool for predicting the genome quality of isolated, single-cell, and metagenome-assembled genomes. CheckM2 outperforms existing tools in accuracy and computational speed, as demonstrated by synthetic and experimental data. CheckM2's database can be rapidly updated with new high-quality reference genomes, even for taxa represented by only a single genome. It accurately predicts the genome quality of MAGs from novel lineages, including those with reduced genome size.
Article
Computer Science, Artificial Intelligence
Michael Franklin Mbouopda, Engelbert Mephu Nguifo
Summary: This paper proposes a time series classification method using shapelets, which exploits the characteristics shared among members of the same class to improve computational efficiency. Experimental results show that the proposed method achieves higher accuracy and scalability than the state-of-the-art Shapelet Transform algorithm.
PATTERN RECOGNITION
(2024)
Article
Biochemical Research Methods
Vladimir Smirnov
Summary: The paper introduces a new method called MAGUS for aligning large numbers of sequences, with enhancements that allow for faster alignment of larger datasets compared to other methods. Results demonstrate the advantages of MAGUS in both accuracy and speed over other alignment software.
PLOS COMPUTATIONAL BIOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Aljo Jose, Sujala D. Shetty
Summary: This study proposes an accurate and scalable click-through rate (CTR) prediction model for real-time recommendations, which uses an ensemble method and knowledge distillation to distill multiple CTR models into a more accurate and scalable deep neural network (DNN). The low latency of the distilled model makes it suitable for deployment in real-time recommender systems.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Plant Sciences
Sven E. Weber, Harmeet Singh Chawla, Lennard Ehrig, Lee T. Hickey, Matthias Frisch, Rod J. Snowdon
Summary: Genomic selection is widely used in modern plant breeding, but SNP arrays used in this process are prone to technical errors. This study demonstrates the potential value of failed allele calls in genomic prediction, especially when the failure is caused by biological reasons. The study also presents statistical pipelines to filter out failed SNP calls caused by biological reasons, improving prediction accuracy.
FRONTIERS IN PLANT SCIENCE
(2023)
Article
Medicine, General & Internal
Sudhir Jadhao, Candice L. Davison, Eileen Roulis, Elizna M. Schoeman, Mayur Divate, Mitchel Haring, Chris Williams, Arvind Jaya Shankar, Simon Lee, Natalie M. Pecheniuk, David O. Irving, Catherine A. Hyland, Robert L. Flower, Shivashankar H. Nagaraj
Summary: The study developed a novel genetic blood typing algorithm, RBCeq, to accurately identify 36 blood group systems, predict complex blood types, and report variants with potential clinical relevance. RBCeq can assist blood banks and laboratories in overcoming methodological limitations in multi-ethnic populations.
Article
Genetics & Heredity
Xiao Du, Lili Li, Fan Liang, Sanyang Liu, Wenxin Zhang, Shuai Sun, Yuhui Sun, Fei Fan, Linying Wang, Xinming Liang, Weijin Qiu, Guangyi Fan, Ou Wang, Weifei Yang, Jiezhong Zhang, Yuhui Xiao, Yang Wang, Depeng Wang, Shoufang Qu, Fang Chen, Jie Huang
Summary: Establishing high-confidence SV calls for a benchmark sample that has been characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical research.
GENOMICS PROTEOMICS & BIOINFORMATICS
(2022)
Article
Multidisciplinary Sciences
Guanhua Xun, Stephan Thomas Lane, Vassily Andrew Petrov, Brandon Elliott Pepa, Huimin Zhao
Summary: The SPOT system is a rapid, sensitive, and accurate COVID-19 diagnostic system that can detect virus samples quickly with high sensitivity and specificity. Its portability enables high-volume, low-cost testing capabilities in areas in urgent need of COVID-19 testing.
NATURE COMMUNICATIONS
(2021)
Article
Robotics
Thomas George Thuruthel, Josie Hughes, Antonia Georgopoulou, Frank Clemens, Fumiya Iida
Summary: Traditional soft robotic sensors are limited by their highly nonlinear, time-variant behavior. Current research focuses on improving mechano-electrical properties or modeling algorithms. This study presents a method for combining multi-material soft strain sensors to obtain higher-quality sensors, allowing for accurate estimation of the strain state of a system.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2021)
Article
Biochemistry & Molecular Biology
Furkan Ozden, Can Alkan, A. Ercument Cicek
Summary: Accurate and efficient detection of copy number variants (CNVs) is crucial for studying complex genetic diseases. However, copy number detection on whole-exome sequencing (WES) data is less accurate than on whole-genome sequencing (WGS) data. This study introduces a novel deep learning model, DECoNT, which improves the precision of CNV detection on WES data sets, regardless of sequencing technology, exome capture kit, and CNV caller.
Article
Chemistry, Analytical
Bin Yang, Xiaowei Zeng, Jin Zhang, Jilie Kong, Xueen Fang
Summary: A highly selective and sensitive detection method has been developed to accurately identify the SARS-CoV-2 Delta variant. The assay has the advantages of rapidity, high sensitivity, good reproducibility, and the ability to differentiate between different SARS-CoV-2 variants, which has been validated in clinical samples.
Article
Biochemical Research Methods
Matthew The, Patroklos Samaras, Bernhard Kuster, Mathias Wilhelm
Summary: This article presents an extension to the Picked Protein FDR method that can handle protein groups, and introduces new strategies to obtain accurate FDR estimates. The validation analysis shows that the new method produces reliable protein group-level FDR estimates regardless of the dataset size.
MOLECULAR & CELLULAR PROTEOMICS
(2022)
Article
Mathematics, Applied
Suho Oh, Hwanchul Yoo, Taedong Yun
SIAM JOURNAL ON DISCRETE MATHEMATICS
(2013)
Article
Biochemical Research Methods
Lesley M. Chapman, Noah Spies, Patrick Pai, Chun Shen Lim, Andrew Carroll, Giuseppe Narzisi, Christopher M. Watson, Christos Proukakis, Wayne E. Clarke, Naoki Nariai, Eric Dawson, Garan Jones, Daniel Blankenberg, Christian Brueffer, Chunlin Xiao, Rohit Raj Kolora, Noah Alexander, Paul Wolujewicz, Azza E. Ahmed, Graeme Smith, Saadlee Shehreen, Aaron M. Wenger, Marc Salit, Justin M. Zook
PLOS COMPUTATIONAL BIOLOGY
(2020)
Article
Multidisciplinary Sciences
Jouni Siren, Jean Monlong, Xian Chang, Adam M. Novak, Jordan M. Eizenga, Charles Markello, Jonas A. Sibbesen, Glenn Hickey, Pi-Chuan Chang, Andrew Carroll, Namrata Gupta, Stacey Gabriel, Thomas W. Blackwell, Aakrosh Ratan, Kent D. Taylor, Stephen S. Rich, Jerome Rotter, David Haussler, Erik Garrison, Benedict Paten
Summary: Giraffe is a pangenome short-read mapper that efficiently maps to a collection of haplotypes threaded through a sequence graph. It speeds up mapping to thousands of human genomes and enables improved accuracy in genome-wide genotyping, ultimately enhancing genomic analyses. This tool facilitates a more comprehensive characterization of variation and has the potential to benefit various genomic studies.
Article
Multidisciplinary Sciences
Zachary R. McCaw, Thomas Colthurst, Taedong Yun, Nicholas A. Furlotte, Andrew Carroll, Babak Alipanahi, Cory Y. McLean, Farhad Hormozdiari
Summary: DeepNull is a method that uses deep learning to identify and adjust for non-linear relationships, improving statistical power and phenotypic prediction in genome-wide association studies (GWAS).
NATURE COMMUNICATIONS
(2022)
Article
Biology
Jared O'Connell, Taedong Yun, Meghan Moreno, Helen Li, Nadia Litterman, Alexey Kolesnikov, Elizabeth Noblin, Pi-Chuan Chang, Anjali Shastri, Elizabeth H. Dorfman, Suyash Shringarpure, Adam Auton, Andrew Carroll, Cory Y. McLean, Stella Aslibekyan, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Katarzyna Bryc, Emily Bullis, Daniella Coker, Gabriel Cuellar Partida, Devika Dhamija, Sayantan Das, Sarah L. Elson, Teresa Filshtein, Kipper Fletez-Brant, Pierre Fontanillas, Will Freyman, Pooja M. Gandhi, Karl Heilbron, Alejandro Hernandez, Barry Hicks, David A. Hinds, Ethan M. Jewett, Yunxuan Jiang, Katelyn Kukar, Keng-Han Lin, Maya Lowe, Jey McCreight, Matthew H. McIntyre, Steven J. Micheletti, Joanna L. Mountain, Priyanka Nandakumar, Aaron A. Petrakovitz, G. David Poznik, Morgan Schumacher, Janie F. Shelton, Jingchunzi Shi, Christophe Toukam Tchakoute, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, Peter Wilton, Corinna Wong
Summary: A new genome-wide imputation reference panel comprising 2,269 individuals of Sub-Saharan African ancestries was constructed by O'Connell et al. Using DeepVariant, they created best practices for reference panel development and generated a high-quality resource that will empower high-resolution genome-wide imputation efforts for individuals with African ancestries. The raw sequencing data, variant calls, and imputation panel for this cohort are freely available via dbGaP, serving as an invaluable resource for further study of admixed African genetics.
COMMUNICATIONS BIOLOGY
(2021)
Article
Biochemistry & Molecular Biology
Charles Markello, Charles Huang, Alex Rodriguez, Andrew Carroll, Pi-Chuan Chang, Jordan Eizenga, Thomas Markello, David Haussler, Benedict Paten
Summary: This study introduces a pedigree-aware workflow based on pangenome graphs to improve the accuracy of genome mapping and variant calling. The workflow shows significant improvements in single-nucleotide variants and insertion/deletion variants compared to linear-reference mapping and pangenome graph mapping. Additionally, the study adapts and upgrades deleterious-variant detecting methods for streamlined application in undiagnosed diseases.
Article
Biotechnology & Applied Microbiology
Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-Lopez, Quentin Berthet, Anastasiya Belyaeva, Armin Topfer, Aaron M. Wenger, William J. Rowell, Howard Yang, Alexey Kolesnikov, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Maria Nattestad, Pi-Chuan Chang, Andrew Carroll
Summary: DeepConsensus improves the accuracy and quality of PacBio HiFi reads by significantly reducing read errors through sequence correction.
NATURE BIOTECHNOLOGY
(2023)
Article
Biochemical Research Methods
Nae-Chyun Chen, Alexey Kolesnikov, Sidharth Goel, Taedong Yun, Pi-Chuan Chang, Andrew Carroll
Summary: In this study, population-aware DeepVariant models were developed to improve the accuracy and recall of variant calling in single samples. By using allele frequencies from the 1000 Genomes Project, these models reduced variant calling errors and improved the precision of rare homozygous and pathogenic ClinVar calls. The study also found that diverse reference panels were more accurate than population-specific panels, even when the sample ancestry matched the population.
BMC BIOINFORMATICS
(2023)
Article
Genetics & Heredity
Justin Cosentino, Babak Behsaz, Babak Alipanahi, Zachary R. McCaw, Davin Hill, Tae-Hwi Schwantes-An, Dongbing Lai, Andrew Carroll, Brian D. Hobbs, Michael H. Cho, Cory Y. McLean, Farhad Hormozdiari
Summary: A deep convolutional neural network is utilized to predict COPD case-control status using raw spirograms and noisy medical-record-based labels. The machine-learning-based liability score accurately distinguishes COPD cases and controls, predicts COPD-related hospitalization, and is associated with overall survival and exacerbation events. The genome-wide association study on the liability score replicates known COPD and lung function loci and identifies new loci.