Journal
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Volume 5, Issue 2, Pages 313-318
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2007.1068
Keywords
SNP; genotype; haplotype; phasing; algorithm
Emerging microarray technologies allow affordable typing of very long genome sequences. A key challenge in analyzing such a huge amount of data is scalable and accurate computational inference of haplotypes (that is, splitting each genotype into a pair of corresponding haplotypes). In this paper, we first phase genotypes consisting of only two SNPs using genotype frequencies adjusted to the random mating model and then extend the phasing of two-SNP genotypes to the phasing of complete genotypes using maximum spanning trees. The runtime of the proposed 2SNP algorithm is O(nm(n + log m)), where n and m are the numbers of genotypes and SNPs, respectively, and it can handle genotypes spanning entire chromosomes in a matter of hours. On data sets across 23 chromosomal regions from HapMap [11], 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, measured by the number of correctly phased genotypes, single-site errors, and switching errors. For example, the 2SNP software phases an entire chromosome (10^5 SNPs from HapMap) for 30 individuals in 2 hours with an average switching error of 7.7 percent. We have also enhanced the 2SNP algorithm to phase family trio data and compared it with four other well-known phasing methods on simulated data from [15]. 2SNP is much faster than all of them while losing in quality only to PHASE. The 2SNP software is publicly available at http://alla.cs.gsu.edu/~software/2SNP.
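The two-stage idea in the abstract (phase pairs of SNPs first, then combine the pairwise decisions along a maximum spanning tree) can be illustrated with a minimal sketch. This is not the published 2SNP implementation. The genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous), the function names, and the pair score are all assumptions made for illustration. In particular, the score below simply compares haplotype counts taken from unambiguous genotypes, standing in for the frequency adjustment to the random mating model described in the paper.

```python
def pair_counts(genotypes, i, j):
    """Count two-SNP haplotypes at sites i, j from genotypes that are not
    heterozygous at both sites (those are the unambiguous ones)."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for g in genotypes:
        gi, gj = g[i], g[j]
        if gi == 2 and gj == 2:
            continue  # doubly heterozygous: phase is ambiguous, skip
        ai = (0, 1) if gi == 2 else (gi, gi)
        aj = (0, 1) if gj == 2 else (gj, gj)
        counts[(ai[0], aj[0])] += 1
        counts[(ai[1], aj[1])] += 1
    return counts


def pair_phase(genotypes, i, j):
    """Return (sign, weight) for sites i, j.
    sign = +1 means 'cis' (haplotypes 00/11), -1 means 'trans' (01/10).
    The weight is an illustrative confidence, not the paper's formula."""
    c = pair_counts(genotypes, i, j)
    cis = (c[(0, 0)] + 1) * (c[(1, 1)] + 1)    # +1 is a pseudo-count
    trans = (c[(0, 1)] + 1) * (c[(1, 0)] + 1)
    return (1 if cis >= trans else -1), abs(cis - trans)


def phase_genotype(genotypes, g):
    """Split genotype g into two haplotypes by growing a maximum spanning
    tree (Prim's algorithm) over its heterozygous sites and propagating
    the pairwise cis/trans decisions along the tree edges."""
    het = [k for k, x in enumerate(g) if x == 2]
    h1 = [x if x != 2 else None for x in g]
    if het:
        h1[het[0]] = 0          # fix the first heterozygous site arbitrarily
        in_tree = {het[0]}
        while len(in_tree) < len(het):
            best = None
            for u in in_tree:
                for v in het:
                    if v in in_tree:
                        continue
                    sign, w = pair_phase(genotypes, u, v)
                    if best is None or w > best[0]:
                        best = (w, u, v, sign)
            _, u, v, sign = best
            h1[v] = h1[u] if sign == 1 else 1 - h1[u]
            in_tree.add(v)
    # The second haplotype is the complement at heterozygous sites only.
    h2 = [1 - a if x == 2 else a for a, x in zip(h1, g)]
    return h1, h2


if __name__ == "__main__":
    sample = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 0], [2, 2, 2]]
    print(phase_genotype(sample, sample[4]))
```

The sketch recomputes pair scores inside the tree-growing loop for brevity, so it does not reflect the O(nm(n + log m)) runtime stated above; a practical version would cache the scores and restrict which SNP pairs are considered.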