Xolik: finding cross-linked peptides with maximum paired scores in linear time

Journal

BIOINFORMATICS
Volume 35, Issue 2, Pages 251-257

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty526

Funding

  1. Research Grants Council (RGC) of the Hong Kong S.A.R. government [T12-402/13N]

Motivation: Cross-linking coupled with mass spectrometry (MS) is widely used to analyze protein structures and protein-protein interactions. Identifying cross-linked peptides from MS data requires considering all pairwise combinations of peptides, which is computationally prohibitive when the sequence database is large. To alleviate this problem, heuristic screening strategies are used to reduce the number of peptide pairs considered during identification. However, such strategies may miss some true cross-linked peptides.

Results: We tackle the combination challenge directly, without any screening strategy. Using a double-ended queue, the proposed algorithm reduces the quadratic time complexity of exhaustive search to linear time. We implement the algorithm in a tool named Xolik. The running time of Xolik is validated using databases with different numbers of proteins. Experiments on synthetic and empirical datasets show that Xolik outperforms existing tools in both running time and statistical power.
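The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a double-ended queue can turn an all-pairs search into a single linear pass. It is not Xolik's implementation. It assumes that the candidate peptides are already sorted by mass, that each peptide has an independent score against the spectrum, that a pair is valid when its summed mass falls within a tolerance of a target mass (precursor mass minus linker mass), and that a peptide may pair with itself. The names `best_pair`, `masses`, `scores`, `target` and `tol` are hypothetical.

```python
from collections import deque


def best_pair(masses, scores, target, tol):
    """Return (best_sum, i, j) maximizing scores[i] + scores[j]
    subject to |masses[i] + masses[j] - target| <= tol, or None.

    Assumes `masses` is sorted in ascending order (illustrative setup,
    not taken from the paper).
    """
    n = len(masses)
    window = deque()  # partner indices; scores decrease from front to back
    nxt = n - 1       # next partner index to admit, moving from heavy to light
    best = None
    for i in range(n):
        lo = target - masses[i] - tol  # lightest admissible partner mass
        hi = target - masses[i] + tol  # heaviest admissible partner mass
        # Admit partners that have become heavy enough. Both bounds only
        # decrease as i grows, so `nxt` never moves back to the right.
        while nxt >= 0 and masses[nxt] >= lo:
            # A newcomer stays in the window longer than anything already
            # queued, so queued entries with a lower or equal score are useless.
            while window and scores[window[-1]] <= scores[nxt]:
                window.pop()
            window.append(nxt)
            nxt -= 1
        # Evict partners that are now too heavy; the heaviest sit at the front.
        while window and masses[window[0]] > hi:
            window.popleft()
        if window:
            j = window[0]  # highest-scoring partner currently in range
            total = scores[i] + scores[j]
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


masses = [500.0, 700.0, 800.0, 1000.0, 1200.0]
scores = [3.0, 1.0, 5.0, 2.0, 4.0]
print(best_pair(masses, scores, target=1700.0, tol=1.0))  # (7.0, 0, 4)
```

Each index enters and leaves the queue at most once, so the scan costs O(n) once the peptides are sorted, compared with O(n^2) for checking every pair.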
