Article

Genotype imputation via matrix completion

Journal

GENOME RESEARCH
Volume 23, Issue 3, Pages 509-518

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.145821.112

Funding

  1. United States Public Health Service [GM53275, HG006139]
  2. NCSU FRPD
  3. UC MEXUS-CONACYT doctoral fellowship [213627]
  4. Direct For Mathematical & Physical Scien
  5. Division Of Mathematical Sciences [1310319] Funding Source: National Science Foundation

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency.
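The abstract does not spell out the algorithm, so the following is only a generic illustration of what "imputation by matrix completion" can look like, not the MENDEL-IMPUTE method itself. It assumes genotypes are coded as dosages 0, 1, or 2 in a samples-by-SNPs matrix with missing calls stored as NaN, and it uses a soft-thresholded SVD iteration (in the style of soft-impute) as the completion step. The function names `soft_impute` and `impute_genotypes` and the parameter `lam` are hypothetical choices made for this sketch.

```python
import numpy as np

def soft_impute(X, lam=1.0, n_iter=200, tol=1e-6):
    """Fill NaN entries of X by iterating a soft-thresholded SVD (generic sketch).

    Assumes every column has at least one observed entry.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Start from the per-column (per-SNP) mean of the observed entries.
    col_means = np.nanmean(X, axis=0)
    Z = np.where(observed, X, col_means)
    for _ in range(n_iter):
        # Low-rank step: shrink every singular value toward zero by lam.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U * np.maximum(s - lam, 0.0)) @ Vt
        # Keep the observed genotypes and update only the missing ones.
        Z_new = np.where(observed, X, low_rank)
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

def impute_genotypes(X, **kwargs):
    """Round the completed dosages to the nearest genotype code 0, 1, or 2."""
    completed = soft_impute(X, **kwargs)
    return np.clip(np.rint(completed), 0, 2).astype(int)
```

Calling `impute_genotypes(X, lam=0.1)` on such a matrix returns an integer matrix of the same shape in which observed genotypes are unchanged and missing ones are filled in. A practical tool would also need a way to choose `lam` and to handle chromosome-scale data in windows; neither is attempted here.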

Recommended

Article Biology

WiSER: Robust and scalable estimation and inference of within-subject variances from intensive longitudinal data

Christopher A. German, Janet S. Sinsheimer, Jin Zhou, Hua Zhou

Summary: The availability of longitudinal data from electronic health records and wearable devices has opened up new research questions. In many studies, individual variability of a longitudinal outcome is as important as the mean. This article proposes a scalable method, WiSER, for estimating and inferring the effects of predictors on within-subject variance. It is robust and computationally efficient.

BIOMETRICS (2022)

Article Mathematical & Computational Biology

Efficient Algorithms and Implementation of a Semiparametric Joint Model for Longitudinal and Competing Risk Data: With Applications to Massive Biobank Data

Shanpeng Li, Ning Li, Hong Wang, Jin Zhou, Hua Zhou, Gang Li

Summary: This paper addresses the computational barriers in semiparametric joint models for longitudinal and competing risk survival data, and proposes customized linear scan algorithms to reduce computational complexities and significantly speed up the existing methods.

COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE (2022)

Article Statistics & Probability

A User-Friendly Computational Framework for Robust Structured Regression with the L-2 Criterion

Jocelyn T. Chi, Eric C. Chi

Summary: We introduce a user-friendly computational framework for implementing robust versions of structured regression methods. The framework allows robust regression with the L-2 criterion for additional structural constraints, without requiring complex tuning procedures. It can be used to identify heterogeneous subpopulations and can incorporate nonrobust structured regression solvers. We provide convergence guarantees for the framework and demonstrate its flexibility with examples.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2022)

Article Computer Science, Artificial Intelligence

Revisiting convexity-preserving signal recovery with the linearly involved GMC penalty

Xiaoqian Liu, Eric C. Chi

Summary: The paper introduces a newly proposed regularizer called the generalized minimax concave (GMC) penalty, which maintains the convexity of the objective function. The paper focuses on signal recovery with the linearly involved GMC penalty and presents a new method for setting the matrix parameter and solving the penalty. The paper also analyzes the desirable properties of the solution path and applies the linearly involved GMC penalty to 1-D signal recovery and matrix regression, demonstrating its superior performance compared to the total variation (TV) regularizer.

PATTERN RECOGNITION LETTERS (2022)

Article Computer Science, Artificial Intelligence

Multi-scale affinities with missing data: Estimation and applications

Min Zhang, Gal Mishne, Eric C. Chi

Summary: The paper introduces a new method for constructing row and column affinities even when data are missing by leveraging a co-clustering technique. It solves the optimization problem for multiple pairs of cost parameters and fills in missing values with increasingly smooth estimates. This approach takes advantage of the coupled similarity structure among both the rows and columns of a data matrix.

STATISTICAL ANALYSIS AND DATA MINING (2022)

Article Computer Science, Artificial Intelligence

Bag of little bootstraps for massive and distributed longitudinal data

Xinkai Zhou, Jin J. Zhou, Hua Zhou

Summary: The study introduces a highly efficient statistical method for analyzing very large longitudinal datasets, showing significant advantages over traditional methods.

STATISTICAL ANALYSIS AND DATA MINING (2022)

Article Statistics & Probability

A Legacy of EM Algorithms

Kenneth Lange, Hua Zhou

Summary: Nan Laird has made a significant impact on computational statistics, particularly in the areas of the expectation-maximisation algorithm and longitudinal modelling. This article revisits the derivation of some of her most useful algorithms, using the perspective of the minorisation-maximisation principle. The MM principle allows for a more straightforward implementation of the classical EM algorithm and suggests the potential for faster convergence in entirely new algorithms, particularly in high-dimensional settings.

INTERNATIONAL STATISTICAL REVIEW (2022)

Article Plant Sciences

Sequential hybridization may have facilitated ecological transitions in the Southwestern pinyon pine syngameon

Ryan Buck, Diego Ortega-Del Vecchyo, Catherine Gehring, Rhett Michelson, Dulce Flores-Renteria, Barbara Klein, Amy V. Whipple, Lluvia Flores-Renteria

Summary: This study evaluates the formation, structure, and maintenance of a multispecies interbreeding network, and finds that gene flow in syngameons can increase genetic diversity, facilitate colonization of new environments, and contribute to hybrid speciation. The study also demonstrates that participation in syngameons can maintain morphological and genetic distinctiveness at species boundaries, while allowing for extensive gene flow in sympatric areas.

NEW PHYTOLOGIST (2023)

Article Mathematics, Applied

ORTHOGONAL TRACE-SUM MAXIMIZATION: TIGHTNESS OF THE SEMIDEFINITE RELAXATION AND GUARANTEE OF LOCALLY OPTIMAL SOLUTIONS

Joong-Ho Won, Teng Zhang, Hua Zhou

Summary: This paper studies an optimization problem on the sum of traces of matrix quadratic forms in m semiorthogonal matrices, which can be considered as a generalization of the synchronization of rotations. The paper shows that its semidefinite programming relaxation solves the original nonconvex problems exactly with high probability under an additive noise model with small noise in the order of O(m^{1/4}). In addition, it shows that the sufficient condition for global optimality considered in a previous paper is also necessary under a similar small noise condition.

SIAM JOURNAL ON OPTIMIZATION (2022)

Article Statistics & Probability

A Sharper Computational Tool for L2E Regression

Xiaoqian Liu, Eric C. Chi, Kenneth Lange

Summary: Building on previous research, this article focuses on estimation in robust structured regression under the L2E criterion. The authors propose a new algorithm for updating the regression coefficients using the majorization-minimization (MM) principle, which achieves faster convergence compared to the existing method. They also simplify and accelerate the estimation process by reparameterizing the model and estimating precision using a modified Newton's method. Additionally, the authors introduce distance-to-set penalties for constrained estimation, resulting in improved performance in coefficient estimation and structure recovery. The proposed tactics are validated through simulation examples and a real data application.

TECHNOMETRICS (2023)

Article Statistics & Probability

Bayesian Trend Filtering via Proximal Markov Chain Monte Carlo

Qiang Heng, Hua Zhou, Eric C. Chi

Summary: Proximal Markov chain Monte Carlo is a novel approach that combines Bayesian computation with convex optimization to popularize the use of nondifferentiable priors in Bayesian statistics. This article extends the paradigm of proximal MCMC by introducing a new class of nondifferentiable priors called epigraph priors. The proposed method enables automated regularization parameter selection and achieves simultaneous calibration of mean, scale, and regularization parameters in a fully Bayesian framework.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2023)

Article Statistics & Probability

Bayesian Inference Using the Proximal Mapping: Uncertainty Quantification Under Varying Dimensionality

Maoran Xu, Hua Zhou, Yujie Hu, Leo L. Duan

Summary: In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. To avoid this issue, we propose a new generative process for the prior: starting from a continuous random variable, we transform it into a varying-dimensional space using the proximal mapping. This allows us to directly exploit popular frequentist regularizations and algorithms, while providing a principled and probabilistic uncertainty estimation.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2023)

Article Statistics & Probability

Robust Low-Rank Tensor Decomposition with the L2 Criterion

Qiang Heng, Eric C. Chi, Yufeng Liu

Summary: In this article, a robust Tucker decomposition estimator called Tucker-L2E, based on the L-2 criterion, is presented to enhance the robustness against outliers. Numerical experiments demonstrate that Tucker-L2E has stronger recovery performance in challenging high-rank scenarios compared to existing alternatives. The appropriate Tucker-rank can be selected in a data-driven manner using cross-validation or hold-out validation. The practical effectiveness of Tucker-L2E is validated on real data applications in fMRI tensor denoising, PARAFAC analysis of fluorescence data, and feature extraction for classification of corrupted images.

TECHNOMETRICS (2023)

Article Genetics & Heredity

Demographic modeling of admixed Latin American populations from whole genomes

Santiago G. Medina-Munoz, Diego Ortega-Del Vecchyo, Luis Pablo Cruz-Hervert, Leticia Ferreyra-Reyes, Lourdes Garcia-Garcia, Andres Moreno-Estrada, Aaron P. Ragsdale

Summary: This study used high-coverage whole-genome data and existing genomes from Latin America to infer the complex evolutionary history of Latin American populations. The models developed in this study provide a more accurate prediction of genetic variation in admixed populations and can be a valuable resource for future studies.

AMERICAN JOURNAL OF HUMAN GENETICS (2023)
