Article

Genotype imputation via matrix completion

GENOME RESEARCH (2013)

Journal

GENOME RESEARCH

Volume 23, Issue 3, Pages 509-518

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT

DOI: 10.1101/gr.145821.112

Keywords

-

Categories

Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Genetics & Heredity

Funding

United States Public Health Service [GM53275, HG006139]
NCSU FRPD
UC MEXUS-CONACYT doctoral fellowship [213627]
Direct For Mathematical & Physical Scien
Division Of Mathematical Sciences [1310319] Funding Source: National Science Foundation

Abstract

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency.
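To make the idea of imputation by matrix completion concrete, here is a minimal Python sketch of a generic low-rank completion loop (iterative soft-thresholded SVD, in the style of Soft-Impute). It is not the MENDEL-IMPUTE algorithm, whose details are not given in this abstract. The function name `impute_genotypes`, the 0/1/2 genotype coding, and the parameters `lam`, `n_iter`, and `tol` are assumptions made for illustration.

```python
import numpy as np


def impute_genotypes(X, mask, lam=0.1, n_iter=500, tol=1e-6):
    """Fill missing genotypes with a generic low-rank matrix completion.

    X    : (individuals x SNPs) array coded 0/1/2; values where mask is
           False are ignored (they may be NaN).
    mask : boolean array, True where a genotype was observed.
    lam  : soft-threshold applied to the singular values (assumed tuning
           parameter; a real analysis would choose it by cross-validation).
    """
    mask = np.asarray(mask, dtype=bool)
    X = np.asarray(X, dtype=float)

    # Start from the observed entries, with zeros in the missing positions.
    Z = np.where(mask, X, 0.0)
    low_rank = Z
    for _ in range(n_iter):
        # Shrink the singular values to encourage a low-rank estimate.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        low_rank = (U * s) @ Vt

        # Keep observed entries fixed; update only the missing ones.
        Z_new = np.where(mask, X, low_rank)
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change <= tol * max(np.linalg.norm(Z), 1.0):
            break

    # Round the continuous estimates back to valid genotype codes.
    rounded = np.clip(np.rint(low_rank), 0, 2)
    return np.where(mask, X, rounded).astype(int)
```

Calling `impute_genotypes(X, mask)` returns an integer matrix in which observed genotypes are unchanged and missing ones are replaced by rounded low-rank estimates. Practical tools typically also work in windows along the chromosome and tune the penalty, which this sketch omits.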

Recommended

Editorial Material Biochemistry & Molecular Biology

Paving the path toward genomic privacy with secure imputation

Maxwell A. Sherman

Summary: It is crucial for the biomedical community to protect the privacy of participants in genomic studies, and the accurate and efficient implementation of secure genotype imputation offers practical ways to safeguard sensitive genomic data for various bioinformatics applications.

CELL SYSTEMS (2021)

Article Multidisciplinary Sciences

Evaluation of the genetic risk for COVID-19 outcomes in COPD and differences among worldwide populations

Rui Marcalo, Sonya Neto, Miguel Pinheiro, Ana J. Rodrigues, Nuno Sousa, Manuel A. S. Santos, Paula Simao, Carla Valente, Lilia Andrade, Alda Marques, Gabriela R. Moura

Summary: This study reveals a high genetic heterogeneity for COVID-19 susceptibility and severity across global populations, and it suggests that the prognosis of patients with COPD is not related to genetic risk.

PLOS ONE (2022)

Article Biotechnology & Applied Microbiology

ScLRTC: imputation for single-cell RNA-seq data via low-rank tensor completion

Xiutao Pan, Zhong Li, Shengwei Qin, Minzhe Yu, Hang Hu

Summary: The novel method scLRTC, based on low-rank tensor completion, shows superior performance in imputing dropout values in scRNA-seq data compared to state-of-the-art tools. It excels in restoring gene expression levels and achieving accurate cell classification results on both simulated and real datasets.

BMC GENOMICS (2021)

Article Multidisciplinary Sciences

A five-safes approach to a secure and scalable genomics data repository

Chih Chuan Shih, Jieqi Chen, Ai Shan Lee, Nicolas Bertin, Maxime Hebrard, Chiea Chuen Khor, Zheng Li, Joanna Hui Juan Tan, Wee Yang Meah, Su Qin Peh, Shi Qi Mok, Kar Seng Sim, Jianjun Liu, Ling Wang, Eleanor Wong, Jingmei Li, Aung Tin, Ching-Yu Cheng, Chew-Kiat Heng, Jian-Min Yuan, Woon-Puay Koh, Seang Mei Saw, Yechiel Friedlander, Xueling Sim, Jin Fang Chai, Yap Seng Chong, Sonia Davila, Liuh Ling Goh, Eng Sing Lee, Tien Yin Wong, Neerja Karnani, Khai Pang Leong, Khung Keong Yeo, John C. Chambers, Su Chi Lim, Rick Siow Mong Goh, Patrick Tan, Rajkumar Dorajoo

Summary: Genomic researchers are increasingly using commercial cloud service providers (CSPs) to manage data and analytics needs. However, without adequate security controls, the risk of unauthorized access to cloud-stored data may be higher. The Research Assets Provisioning and Tracking Online Repository (RAPTOR) by the Genome Institute of Singapore is a cloud-native genomics data repository and analytics platform that implements a five-safes framework to provide security and governance controls to data contributors and users, ensuring compliance with regulations.

ISCIENCE (2023)

Article Biochemical Research Methods

Achieving improved accuracy for imputation of ancient DNA

Kristiina Ausmees, Carl Nettelblad

Summary: The study investigates the benefits of an imputation method based on haplotype frequencies in ancient DNA analysis, showing improved accuracy and ability to capture rare variation at lower coverages. The software prophaser is optimized for parallel processing on GPUs, offering reasonable runtimes in experiments.

BIOINFORMATICS (2023)

Article Engineering, Electrical & Electronic

Rail transit OD-matrix completion via manifold regularized tensor factorisation

Hanxuan Dong, Fan Ding, Huachun Tan, Yuankai Wu, Qin Li, Bin Ran

Summary: A novel tensor completion method is proposed in this paper for imputing missing data in the origin-destination matrices of rail transit. By establishing an OD-matrix tensor and extracting similarity matrices, the method successfully achieves accurate imputation of missing data.

IET INTELLIGENT TRANSPORT SYSTEMS (2021)

Article Genetics & Heredity

Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses

Alison R. Barton, Maxwell A. Sherman, Ronen E. Mukamel, Po-Ru Loh

Summary: This study leveraged haplotype sharing in the UK Biobank to impute exome-wide variants and identified significant associations involving rare protein-altering variants. The research revealed significant associations in multiple genes and proposed allelic series containing multiple "likely-causal" variants.

NATURE GENETICS (2021)

Article Computer Science, Information Systems

Quaternion-based color image completion via logarithmic approximation

Liqiao Yang, Jifei Miao, Kit Ian Kou

Summary: In this paper, a method is proposed to apply quaternion matrix framework to image completion, which approximates rank with new quaternion matrix logarithmic norm. Unlike traditional methods that handle RGB channels separately and may destroy the image structure, this method uses a pure quaternion matrix to preserve the image structure. The logarithmic norm is used to improve the accuracy of rank estimation, and experimental results show that this approach achieves superior performance in color image completion.

INFORMATION SCIENCES (2022)

Article Biochemistry & Molecular Biology

CMDB: the comprehensive population genome variation database of China

Zhichao Li, Xiaosen Jiang, Mingyan Fang, Yong Bai, Siyang Liu, Shujia Huang, Xin Jin

Summary: The Chinese Millionome Database (CMDB) is a database that contains low-coverage whole-genome sequencing (WGS) data from 141,431 unrelated healthy Chinese individuals, covering 9.04 million single nucleotide variants (SNV) with allele frequency information. The CMDB is the most representative and comprehensive Chinese population genome database to date, housing data from a multi-ethnic Chinese population with wide geographical distribution.

NUCLEIC ACIDS RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Matrix Completion via Schatten Capped p Norm

Guorui Li, Guang Guo, Sancheng Peng, Cong Wang, Shui Yu, Jianwei Niu, Jianli Mo

Summary: This paper introduces a new approach to solve the low-rank matrix completion problem. By designing a new non-convex Schatten capped p norm, which balances between the rank and nuclear norm of the matrix, a matrix completion method is proposed. Through extensive experiments in image inpainting, the proposed method is shown to improve the accuracy of matrix completion compared with existing methods.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Article Multidisciplinary Sciences

Simple, fast, and flexible framework for matrix completion with infinite width neural networks

Adityanarayanan Radhakrishnan, George Stefanakis, Mikhail Belkin, Caroline Uhler

Summary: This paper proposes an infinite width neural network framework for matrix completion, which is simple, fast, and flexible. The effectiveness of the framework is demonstrated through competitive results in applications such as virtual drug screening and image inpainting/reconstruction.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Article Psychiatry

HLA-DQB1 6672G>C (rs113332494) is associated with clozapine-induced neutropenia and agranulocytosis in individuals of European ancestry

Bettina Konte, James T. R. Walters, Dan Rujescu, Sophie E. Legge, Antonio F. Pardinas, Dan Cohen, Munir Pirmohamed, Jari Tiihonen, Annette M. Hartmann, Jan P. Bogers, Jan van der Weide, Karen van der Weide, Anu Putkonen, Eila Repo-Tiihonen, Tero Hallikainen, Ed Silva, Oddur Ingimarsson, Engilbert Sigurdsson, James L. Kennedy, Patrick F. Sullivan, Marcella Rietschel, Gerome Breen, Hreinn Stefansson, Kari Stefansson, David A. Collier, Michael C. O'Donovan, Ina Giegling

Summary: The study suggests that the HLA-DQB1 gene plays a significant role in clozapine-induced agranulocytosis and neutropenia, indicating the involvement of immune system factors. Using local ancestry estimates can help identify risk variants and improve the prediction of hematological adverse effects.

TRANSLATIONAL PSYCHIATRY (2021)

Article Radiology, Nuclear Medicine & Medical Imaging

A deep matrix completion method for imputing missing histological data in breast cancer by integrating DCE-MRI radiomics

Ming Fan, You Zhang, Zhenyu Fu, Maosheng Xu, Shiwei Wang, Sangma Xie, Xin Gao, Yue Wang, Lihua Li

Summary: The DMC method proposed in this study significantly improved the imputation performance by integrating tumor histological and radiomics data. It showed better prediction performance compared to other methods, indicating its potential in tumor characterization and patient management.

MEDICAL PHYSICS (2021)

Article Environmental Sciences

Cooperative Electromagnetic Data Annotation via Low-Rank Matrix Completion

Wei Zhang, Jian Yang, Qiang Li, Jingran Lin, Huaizong Shao, Guomin Sun

Summary: Cooperative electromagnetic data annotation is a crucial step in signal processing applications. This study proposes a low-rank matrix recovery approach for cooperative annotation, which effectively exploits the correlation of electromagnetic signals and observations from multiple receivers.

REMOTE SENSING (2023)

Article Multidisciplinary Sciences

Genomic prediction for testes weight of the tiger pufferfish, Takifugu rubripes, using medium to low density SNPs

Sho Hosoya, Sota Yoshikawa, Mana Sato, Kiyoshi Kikuchi

Summary: The study found that the prediction accuracy of genomic selection for standard length, body weight, and testes weight in tiger pufferfish was within an acceptable range when using 4000 or 1200 SNPs. However, predictive abilities decreased with less than 1200 SNPs due to reduced accuracy in estimating genetic relationships among individuals.

SCIENTIFIC REPORTS (2021)

Article Biology

WiSER: Robust and scalable estimation and inference of within-subject variances from intensive longitudinal data

Christopher A. German, Janet S. Sinsheimer, Jin Zhou, Hua Zhou

Summary: The availability of longitudinal data from electronic health records and wearable devices has opened up new research questions. In many studies, individual variability of a longitudinal outcome is as important as the mean. This article proposes a scalable method, WiSER, for estimating and inferring the effects of predictors on within-subject variance. It is robust and computationally efficient.

BIOMETRICS (2022)

Article Mathematical & Computational Biology

Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risk Data: With Applications to Massive Biobank Data

Shanpeng Li, Ning Li, Hong Wang, Jin Zhou, Hua Zhou, Gang Li

Summary: This paper addresses the computational barriers in semiparametric joint models for longitudinal and competing risk survival data, and proposes customized linear scan algorithms to reduce computational complexities and significantly speed up the existing methods.

COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE (2022)

Article Statistics & Probability

A User-Friendly Computational Framework for Robust Structured Regression with the L2 Criterion

Jocelyn T. Chi, Eric C. Chi

Summary: We introduce a user-friendly computational framework for implementing robust versions of structured regression methods. The framework allows robust regression with the L2 criterion for additional structural constraints, without requiring complex tuning procedures. It can be used to identify heterogeneous subpopulations and can incorporate nonrobust structured regression solvers. We provide convergence guarantees for the framework and demonstrate its flexibility with examples.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2022)

Article Computer Science, Artificial Intelligence

Revisiting convexity-preserving signal recovery with the linearly involved GMC penalty

Xiaoqian Liu, Eric C. Chi

Summary: The paper introduces a newly proposed regularizer called the generalized minimax concave (GMC) penalty, which maintains the convexity of the objective function. The paper focuses on signal recovery with the linearly involved GMC penalty and presents a new method for setting the matrix parameter and solving the penalty. The paper also analyzes the desirable properties of the solution path and applies the linearly involved GMC penalty to 1-D signal recovery and matrix regression, demonstrating its superior performance compared to the total variation (TV) regularizer.

PATTERN RECOGNITION LETTERS (2022)

Article Computer Science, Artificial Intelligence

Multi-scale affinities with missing data: Estimation and applications

Min Zhang, Gal Mishne, Eric C. Chi

Summary: The paper introduces a new method for constructing row and column affinities even when data are missing by leveraging a co-clustering technique. It exploits solving the optimization problem for multiple pairs of cost parameters and filling in missing values with increasingly smooth estimates. This approach takes advantage of the coupled similarity structure among both the rows and columns of a data matrix.

STATISTICAL ANALYSIS AND DATA MINING (2022)

Article Computer Science, Artificial Intelligence

Bag of little bootstraps for massive and distributed longitudinal data

Xinkai Zhou, Jin J. Zhou, Hua Zhou

Summary: The study introduces a highly efficient statistical method for analyzing very large longitudinal datasets, showing significant advantages over traditional methods.

STATISTICAL ANALYSIS AND DATA MINING (2022)

Article Statistics & Probability

A Legacy of EM Algorithms

Kenneth Lange, Hua Zhou

Summary: Nan Laird has made a significant impact on computational statistics, particularly in the areas of the expectation-maximisation algorithm and longitudinal modelling. This article revisits the derivation of some of her most useful algorithms, using the perspective of the minorisation-maximisation principle. The MM principle allows for a more straightforward implementation of the classical EM algorithm and suggests the potential for faster convergence in entirely new algorithms, particularly in high-dimensional settings.

INTERNATIONAL STATISTICAL REVIEW (2022)

Article Plant Sciences

Sequential hybridization may have facilitated ecological transitions in the Southwestern pinyon pine syngameon

Ryan Buck, Diego Ortega-Del Vecchyo, Catherine Gehring, Rhett Michelson, Dulce Flores-Renteria, Barbara Klein, Amy V. Whipple, Lluvia Flores-Renteria

Summary: This study evaluates the formation, structure, and maintenance of a multispecies interbreeding network, and finds that gene flow in syngameons can increase genetic diversity, facilitate colonization of new environments, and contribute to hybrid speciation. The study also demonstrates that participation in syngameons can maintain morphological and genetic distinctiveness at species boundaries, while allowing for extensive gene flow in sympatric areas.

NEW PHYTOLOGIST (2023)

Article Mathematics, Applied

ORTHOGONAL TRACE-SUM MAXIMIZATION: TIGHTNESS OF THE SEMIDEFINITE RELAXATION AND GUARANTEE OF LOCALLY OPTIMAL SOLUTIONS

Joong-Ho Won, Teng Zhang, Hua Zhou

Summary: This paper studies an optimization problem on the sum of traces of matrix quadratic forms in m semiorthogonal matrices, which can be considered as a generalization of the synchronization of rotations. The paper shows that its semidefinite programming relaxation solves the original nonconvex problems exactly with high probability under an additive noise model with small noise in the order of O(m^(1/4)). In addition, it shows that the sufficient condition for global optimality considered in a previous paper is also necessary under a similar small noise condition.

SIAM JOURNAL ON OPTIMIZATION (2022)

Article Statistics & Probability

A Sharper Computational Tool for L2E Regression

Xiaoqian Liu, Eric C. Chi, Kenneth Lange

Summary: Building on previous research, this article focuses on estimation in robust structured regression under the L2E criterion. The authors propose a new algorithm for updating the regression coefficients using the majorization-minimization (MM) principle, which achieves faster convergence compared to the existing method. They also simplify and accelerate the estimation process by reparameterizing the model and estimating precision using a modified Newton's method. Additionally, the authors introduce distance-to-set penalties for constrained estimation, resulting in improved performance in coefficient estimation and structure recovery. The proposed tactics are validated through simulation examples and a real data application.

TECHNOMETRICS (2023)

Article Statistics & Probability

Bayesian Trend Filtering via Proximal Markov Chain Monte Carlo

Qiang Heng, Hua Zhou, Eric C. Chi

Summary: Proximal Markov chain Monte Carlo is a novel approach that combines Bayesian computation with convex optimization to popularize the use of nondifferentiable priors in Bayesian statistics. This article extends the paradigm of proximal MCMC by introducing a new class of nondifferentiable priors called epigraph priors. The proposed method enables automated regularization parameter selection and achieves simultaneous calibration of mean, scale, and regularization parameters in a fully Bayesian framework.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Article Statistics & Probability

Bayesian Inference Using the Proximal Mapping: Uncertainty Quantification Under Varying Dimensionality

Maoran Xu, Hua Zhou, Yujie Hu, Leo L. Duan

Summary: In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. To avoid this issue, we propose a new generative process for the prior: starting from a continuous random variable, we transform it into a varying-dimensional space using the proximal mapping. This allows us to directly exploit popular frequentist regularizations and algorithms, while providing a principled and probabilistic uncertainty estimation.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2023)

Article Statistics & Probability

Robust Low-Rank Tensor Decomposition with the L2 Criterion

Qiang Heng, Eric C. Chi, Yufeng Liu

Summary: In this article, a robust Tucker decomposition estimator called Tucker-L2E, based on the L2 criterion, is presented to enhance the robustness against outliers. Numerical experiments demonstrate that Tucker-L2E has stronger recovery performance in challenging high-rank scenarios compared to existing alternatives. The appropriate Tucker-rank can be selected in a data-driven manner using cross-validation or hold-out validation. The practical effectiveness of Tucker-L2E is validated on real data applications in fMRI tensor denoising, PARAFAC analysis of fluorescence data, and feature extraction for classification of corrupted images.

TECHNOMETRICS (2023)

Article Genetics & Heredity

Demographic modeling of admixed Latin American populations from whole genomes

Santiago G. Medina-Munoz, Diego Ortega-Del Vecchyo, Luis Pablo Cruz-Hervert, Leticia Ferreyra-Reyes, Lourdes Garcia-Garcia, Andres Moreno-Estrada, Aaron P. Ragsdale

Summary: This study used high-coverage whole-genome data and existing genomes from Latin America to infer the complex evolutionary history of Latin American populations. The models developed in this study provide a more accurate prediction of genetic variation in admixed populations and can be a valuable resource for future studies.

AMERICAN JOURNAL OF HUMAN GENETICS (2023)
