4.7 Article

Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 11, Issue 5, Pages 3035-3045

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/pr300072J

Keywords

Proteomics; Machine learning; Protein Inference; Protein homologues; Splice variants; Isoforms; Mass spectrometry

Funding

  1. NICTA
  2. Australian Research Council (ARC) [DP0984267]

A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
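The fraction-based idea in the abstract can be illustrated with a toy sketch. This is not the published Re-Fraction implementation: the abstract does not name the physical properties used or the learning method, so the sketch assumes molecular weight as the only property and swaps a fixed threshold rule in for the trained model. The fraction boundaries, the `Protein` class, the function names, and the accessions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical upper molecular-weight bounds (kDa) for each protein fraction,
# ordered from the lowest-mass fraction to the highest. Illustrative values only.
FRACTION_UPPER_BOUNDS_KDA = [25.0, 50.0, 100.0, float("inf")]


@dataclass(frozen=True)
class Protein:
    accession: str   # hypothetical identifier
    mass_kda: float  # assumed to be the only physical property considered


def expected_fraction(protein, bounds=FRACTION_UPPER_BOUNDS_KDA):
    """Return the index of the first fraction whose upper bound covers the mass."""
    for index, upper in enumerate(bounds):
        if protein.mass_kda <= upper:
            return index
    raise ValueError("bounds must end with an unbounded fraction")


def resolve_group(group, observed_fraction):
    """Return the single group member expected in the observed fraction, else None."""
    matches = [p for p in group if expected_fraction(p) == observed_fraction]
    return matches[0] if len(matches) == 1 else None


# Two hypothetical splice variants that share every identified peptide.
long_isoform = Protein("ISOFORM_LONG", 72.0)
short_isoform = Protein("ISOFORM_SHORT", 31.0)

# The shared peptides were observed in fraction 1 (25 to 50 kDa in this sketch).
resolved = resolve_group([long_isoform, short_isoform], observed_fraction=1)
print(resolved.accession if resolved else "group remains ambiguous")
```

In this sketch a protein group is resolved only when exactly one member is expected in the fraction where the shared peptides were observed; otherwise the group is left ambiguous rather than forced to a single protein.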

Recommended

Review Biochemical Research Methods

Computational systems approach towards phosphoproteomics and their downstream regulation

Di Xiao, Carissa Chen, Pengyi Yang

Summary: This article reviews computational methods, tools, and systems approaches that have been developed for phosphoproteomics data analysis, categorizing them into data processing, functional analysis, phosphoproteome annotation, and integration with other omics. The article highlights potential research directions that could contribute significantly to this fast-growing field.

PROTEOMICS (2023)

Article Biochemical Research Methods

Synergistic Targeting of DNA-PK and KIT Signaling Pathways in KIT Mutant Acute Myeloid Leukemia

Heather C. Murray, Kasey Miller, Joshua S. Brzozowski, Richard G. S. Kahl, Nathan D. Smith, Sean J. Humphrey, Matthew D. Dun, Nicole M. Verrills

Summary: Acute myeloid leukemia (AML) is a highly aggressive form of leukemia with a poor prognosis. Mutations in kinases, such as FLT3 and KIT, are common in AML patients and are associated with treatment resistance. This study identified DNA-PK as a potential therapeutic target in AML and demonstrated that DNA-PK inhibition sensitizes AML cells with FLT3 and KIT mutations to standard treatments. The findings suggest that targeting DNA-PK could improve the outcomes of AML patients with these mutations.

MOLECULAR & CELLULAR PROTEOMICS (2023)

Editorial Material Multidisciplinary Sciences

Motifs mapped for almost all human kinase enzymes

Sean J. Humphrey, Elise J. Needham

Summary: A computational resource can identify candidate protein targets for a major class of kinase enzymes in humans, which is important for understanding the role of cell signaling in health and disease.

NATURE (2023)

Article Biochemical Research Methods

scSTAR reveals hidden heterogeneity with a real-virtual cell pair structure across conditions in single-cell RNA sequencing data

Jie Hao, Jiawei Zou, Jiaqiang Zhang, Ke Chen, Duojiao Wu, Wei Cao, Guoguo Shang, Jean Y. H. Yang, KongFatt Wong-Lin, Hourong Sun, Zhen Zhang, Xiangdong Wang, Wantao Chen, Xin Zou

Summary: Cell-state transition analysis using single-cell RNA-sequencing can reveal additional information in time-resolved biological phenomena. However, current methods are limited to short-term evolution of cell states based on gene expression derivative. This study presents scSTAR, a method that overcomes this limitation by constructing a paired-cell projection between different biological conditions with arbitrary time spans, leading to more accurate predictions and new discoveries in aging and cancer research.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

Benchmarking of analytical combinations for COVID-19 outcome prediction using single-cell RNA sequencing data

Yue Cao, Shila Ghazanfar, Pengyi Yang, Jean Yang

Summary: The advancement of scRNA-seq technology has led to its increasing use in large-scale patient cohort studies. This study evaluates the impact of analytical choices on patient outcome prediction using scRNA-seq COVID-19 datasets. The study examines the difference between single-view and multi-view feature spaces, surveys multiple learning platforms, and compares integration approaches. The results highlight the power of ensemble learning, consistency among different learning methods, and the importance of dataset normalization.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biology

Deep multimodal graph-based network for survival prediction from highly multiplexed images and patient variables

Xiaohang Fu, Ellis Patrick, Jean Y. H. Yang, David Dagan Feng, Jinman Kim

Summary: The spatial architecture and phenotypic heterogeneity of tumor cells are associated with cancer prognosis and outcomes. Imaging mass cytometry captures high-dimensional maps of disease-relevant biomarkers at single-cell resolution, which can inform patient-specific prognosis. However, existing methods for survival prediction do not utilize spatial phenotype information at the single-cell level, and there is a lack of end-to-end methods that integrate imaging data with clinical information for improved accuracy. We propose a deep multimodal graph-based network that considers spatial phenotype information and clinical variables to enhance survival prediction, and demonstrate its effectiveness in breast cancer datasets.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biochemical Research Methods

Spatial analysis for highly multiplexed imaging data to identify tissue microenvironments

Ellis Patrick, Nicolas P. Canete, Sourish S. Iyengar, Andrew N. Harman, Greg T. Sutherland, Pengyi Yang

Summary: Highly multiplexed in situ imaging cytometry assays enable simultaneous study of spatial organization of multiple cell types. We propose a statistical method that clusters local indicators of spatial association to quantify complex multi-cellular relationships. Our approach successfully identifies distinct tissue architectures in datasets from state-of-the-art high-parameter assays, demonstrating its value in summarizing information-rich data generated from these technologies.

CYTOMETRY PART A (2023)

Article Biochemical Research Methods

Deep Proteome Profiling of White Adipose Tissue Reveals Marked Conservation and Distinct Features Between Different Anatomical Depots

Soren Madsen, Marin E. Nelson, Vinita Deshpande, Sean J. Humphrey, Kristen C. Cooke, Anna Howell, Alexis Diaz-Vegas, James G. Burchfield, Jacqueline Stockli, David E. James

Summary: White adipose tissue consists of subcutaneous adipose tissue (SAT) and abdominal/visceral adipose tissue, which have different molecular underpinnings. Through proteomics profiling, it was found that SAT adipocytes are geared toward higher catabolic activity, while visceral adipocytes are more suited for lipid storage. A Western diet caused significant changes in adipocyte proteomes, particularly in visceral adipocytes, indicating mitochondrial stress and adipocyte de-differentiation. The comparison between adipocytes and 3T3-L1 proteomes revealed overlap, supporting the utility of the 3T3-L1 adipocyte model.

MOLECULAR & CELLULAR PROTEOMICS (2023)

Article Biochemistry & Molecular Biology

Multi-task learning from multimodal single-cell omics with Matilda

Chunlei Liu, Hao Huang, Pengyi Yang

Summary: Multimodal single-cell omics technologies allow for simultaneous profiling of multiple molecular programs in individual cells, providing a new level of resolution for studying biological systems. However, integrating and analyzing multimodal single-cell omics data presents challenges due to the lack of suitable methods. In this study, we propose Matilda, a multi-task learning method that can perform data simulation, dimension reduction, cell type classification, and feature selection in a unified framework. We compare Matilda with other state-of-the-art methods using datasets from popular multimodal single-cell omics technologies, and show its utility in addressing multiple key tasks in integrative analysis.

NUCLEIC ACIDS RESEARCH (2023)

Article Multidisciplinary Sciences

Phosphoproteomics reveals rewiring of the insulin signaling network and multi-nodal defects in insulin resistance

Daniel J. Fazakerley, Julian van Gerwen, Kristen C. Cooke, Xiaowen Duan, Elise J. Needham, Alexis Diaz-Vegas, Soren Madsen, Dougall M. Norris, Amber S. Shun-Shion, James R. Krycer, James G. Burchfield, Pengyi Yang, Mark R. Wade, Joseph T. Brozinick, David E. James, Sean J. Humphrey

Summary: The failure of metabolic tissues to respond to insulin is an early marker of type 2 diabetes, and dysregulation of protein phosphorylation plays a crucial role in the adipocyte insulin response and in insulin resistance. Using global phosphoproteomics, the authors reveal a marked rewiring of the insulin signaling network and identify common dysregulated phosphosites and subnetworks that contribute to insulin resistance, including the non-canonical regulators MARK2/3 and GSK3, with GSK3 activity dysregulated. Inhibition of GSK3 partially reverses insulin resistance in cells and tissue explants.

NATURE COMMUNICATIONS (2023)

Article Biochemical Research Methods

PAD2: interactive exploration of transcription factor genomic colocalization using ChIP-seq data

Taiyun Kim, Hani Jieun Kim, Andrew J. Oldfield, Pengyi Yang

Summary: This article introduces a protocol for using PAD2, an interactive web application, to investigate the colocalization of transcription factors and chromatin-regulating proteins in mouse embryonic stem cells. The protocol includes steps for accessing and searching the PAD2 database, selecting and submitting genomic regions, and conducting protein colocalization analysis using heatmaps and ranked correlation plots.

STAR PROTOCOLS (2023)

Review Mathematical & Computational Biology

Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data

Daniel Kim, Andy Tran, Hani Jieun Kim, Yingxin Lin, Jean Yee Hwa Yang, Pengyi Yang

Summary: Inferring gene regulatory networks is crucial in biology, and recent advances in sequencing technology have led to the development of state-of-the-art methods that utilize single-cell multi-omic data for more comprehensive and precise network reconstruction.

NPJ SYSTEMS BIOLOGY AND APPLICATIONS (2023)