4.5 Article

On the Sample Complexity of Cancer Pathways Identification

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 23, Issue 1, Pages 30-41

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2015.0100

Keywords

cancer pathways; exclusivity; PAC learning; VC dimension

Funding

  1. NSF [IIS-1247581]
  2. NIH [R01-CA180776]
  3. University of Padova [CPDA121378/12]
  4. NATIONAL CANCER INSTITUTE [R01CA180776] Funding Source: NIH RePORTER

Advances in DNA sequencing technologies have enabled large cancer sequencing studies that collect somatic mutation data from many cancer patients. One of the main goals of these studies is the identification of all cancer genes, that is, genes associated with cancer. This goal is complicated by the extensive mutational heterogeneity of cancer, which arises because important mutations in cancer target combinations of genes (i.e., pathways). Recently, a pattern of mutual exclusivity among mutations in a cancer pathway has been observed, and methods that find significant combinations of cancer genes by detecting mutual exclusivity have been proposed. A key question in the analysis of mutual exclusivity is the minimum number of samples required to reliably find a meaningful set of mutually exclusive mutations in the data, or to conclude that there is no such set. More generally, the sample complexity of genomic problems, that is, the number of samples required to identify significant combinations of features, is largely unexplored. In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of identifying cancer pathways through mutual exclusivity analysis. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than those currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.
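
To make the notion of mutual exclusivity concrete, the following is a minimal sketch, not the algorithms or bounds from the paper. It assumes the mutation data is given as a mapping from sample ID to the set of mutated genes, and it counts how many samples a candidate gene set covers (at least one mutated gene) and how many it covers exclusively (exactly one mutated gene). The function name `exclusivity_stats`, the data layout, and the toy gene labels are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def exclusivity_stats(
    mutations: Dict[str, Set[str]], gene_set: Iterable[str]
) -> Tuple[int, int]:
    """Return (covered, exclusive) sample counts for a candidate gene set.

    covered:   samples with at least one mutated gene from gene_set
    exclusive: samples with exactly one mutated gene from gene_set
    """
    genes = set(gene_set)
    covered = 0
    exclusive = 0
    for mutated in mutations.values():
        hits = len(genes & mutated)  # genes of the set mutated in this sample
        if hits >= 1:
            covered += 1
        if hits == 1:
            exclusive += 1
    return covered, exclusive


# Toy data: hypothetical sample IDs and gene labels, not real cancer data.
toy = {
    "s1": {"A"},
    "s2": {"B"},
    "s3": {"C"},
    "s4": {"A", "B"},  # co-occurring mutations break exclusivity
    "s5": set(),       # no mutation in any gene of interest
}

covered, exclusive = exclusivity_stats(toy, ["A", "B", "C"])
print(f"covered={covered}, exclusive={exclusive}")
```

A gene set is perfectly mutually exclusive on a dataset when the two counts are equal. With few samples, many unrelated gene sets can look exclusive by chance, which is the kind of effect a sample complexity analysis has to account for.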

Recommended

Article Computer Science, Hardware & Architecture

Distributed agreement in dynamic peer-to-peer networks

John Augustine, Gopal Pandurangan, Peter Robinson, Eli Upfal

JOURNAL OF COMPUTER AND SYSTEM SCIENCES (2015)

Article Computer Science, Information Systems

TRIEST: Counting Local and Global Triangles in Fully Dynamic Streams with Fixed Memory Size

Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2017)

Article Computer Science, Information Systems

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

Matteo Riondato, Eli Upfal

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2018)

Article Computer Science, Artificial Intelligence

Distributed Graph Diameter Approximation

Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

ALGORITHMS (2020)

Article Computer Science, Artificial Intelligence

Efficient Approximation for Restricted Biclique Cover Problems

Alessandro Epasto, Eli Upfal

ALGORITHMS (2018)

Proceedings Paper Computer Science, Information Systems

Controlling False Discoveries During Interactive Data Exploration

Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, Tim Kraska

SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (2017)

Proceedings Paper Computer Science, Information Systems

Safe Visual Data Exploration

Zheguang Zhao, Emanuel Zgraggen, Lorenzo De Stefani, Carsten Binnig, Eli Upfal, Tim Kraska

SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (2017)

Proceedings Paper Telecommunications

Minimizing Operational Cost for Zero Information Leakage

Megumi Ando, Eli Upfal

2017 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC) (2017)

Article Computer Science, Information Systems

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

PROCEEDINGS OF THE VLDB ENDOWMENT (2017)

Proceedings Paper Computer Science, Information Systems

The k-Nearest Representatives Classifier: A Distance-Based Classifier with Strong Generalization Bounds

Cyrus Cousins, Eli Upfal

2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) (2017)

Proceedings Paper Computer Science, Hardware & Architecture

A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016) (2016)

Proceedings Paper Computer Science, Artificial Intelligence

Wiggins: Detecting Valuable Information in Dynamic Networks Using Limited Resources

Ahmad Mahmoody, Matteo Riondato, Eli Upfal

PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16) (2016)

Proceedings Paper Computer Science, Interdisciplinary Applications

Optimizing Static and Adaptive Probing Schedules for Rapid Event Detection

Ahmad Mahmoody, Evgenios M. Kornaropoulos, Eli Upfal

COMBINATORIAL OPTIMIZATION AND APPLICATIONS, (COCOA 2015) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

SPAA'15: PROCEEDINGS OF THE 27TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES (2015)
