Article

OD-seq: outlier detection in multiple sequence alignments

Journal

BMC BIOINFORMATICS
Volume 16

Publisher

BMC
DOI: 10.1186/s12859-015-0702-1

Keywords

Outlier; Multiple sequence alignment

Funding

  1. Science Foundation Ireland (SFI) [11/PI/1034]

Background: Multiple sequence alignments (MSAs) are widely used in sequence analysis for a variety of tasks. Outlier sequences can make downstream analyses unreliable, or make alignments less accurate while they are being constructed. This paper describes a simple method for automatically detecting outliers, together with accompanying software called OD-seq. The method is based on finding sequences whose average distance to the rest of the sequences in a dataset is anomalous.

Results: The software can take an MSA, a distance matrix or a set of unaligned sequences as input. Outlier sequences are found by examining the average distance of each sequence to the rest. Anomalous average distances are then identified either from the interquartile range of the distribution of average distances or by bootstrapping them. Any analysis of a full distance matrix normally has a complexity of at least O(N^2) for N sequences. This is prohibitive for large N, but is reduced here by using the mBed algorithm from Clustal Omega, which brings the complexity down to O(N log N) and makes even very large alignments easy to analyse on a single core. We tested the ability of OD-seq to detect outliers using artificial test cases built from sequences of Pfam families, seeded with sequences from other Pfam families. Using an MSA as input, OD-seq detects outliers with very high sensitivity and specificity.

Conclusion: OD-seq is a practical and simple method to detect outliers in MSAs. It can also detect outliers in sets of unaligned sequences, but with reduced accuracy. For medium-sized alignments of a few thousand sequences, it can detect outliers in a few seconds. Software available at http://www.bioinf.ucd.ie/download/od-seq.tar.gz.
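The average-distance idea can be illustrated with a minimal Python sketch, assuming an already-aligned input that uses "-" as the gap character. The p-distance below, the helper names (`p_distance`, `average_distances`, `iqr_outliers`) and the Tukey-style cutoff Q3 + k * IQR with k = 1.5 are all illustrative assumptions, not OD-seq's own code, distance measure or defaults. The sketch also computes every pairwise distance (the naive O(N^2) approach) rather than using mBed.

```python
import statistics
from typing import Dict, List

GAP = "-"  # assumed gap character in the aligned input


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched residues over columns where neither sequence has a gap.

    Illustrative stand-in only; OD-seq's own distance measures may differ.
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if x != y:
            diffs += 1
    # Treat pairs with no comparable columns as maximally distant (assumption).
    return diffs / compared if compared else 1.0


def average_distances(msa: Dict[str, str]) -> Dict[str, float]:
    """Mean distance from each sequence to all others (naive O(N^2) pairs)."""
    names = list(msa)
    n = len(names)
    totals = {name: 0.0 for name in names}
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(msa[names[i]], msa[names[j]])
            totals[names[i]] += d
            totals[names[j]] += d
    return {name: totals[name] / (n - 1) for name in names}


def iqr_outliers(avg: Dict[str, float], k: float = 1.5) -> List[str]:
    """Flag sequences whose average distance exceeds Q3 + k * IQR (k is hypothetical)."""
    q1, _, q3 = statistics.quantiles(avg.values(), n=4)
    cutoff = q3 + k * (q3 - q1)
    return [name for name, value in avg.items() if value > cutoff]


# Toy alignment: six closely related sequences plus one unrelated sequence.
toy_msa = {
    "s1": "ACDEFGHIKL",
    "s2": "ACDEFGHIKM",
    "s3": "ACDEFGHIRL",
    "s4": "ACDEYGHIKL",
    "s5": "ACDEFGHIKL",
    "s6": "SCDEFGHIKL",
    "odd": "WWWWWWWWWW",
}
print(iqr_outliers(average_distances(toy_msa)))  # -> ['odd']
```

Because this sketch compares every pair of sequences, it only suits small inputs; avoiding that quadratic cost is what the mBed step described above is for.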

