Article

Efficient mining of the most significant patterns with permutation testing

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 34, Issue 4, Pages 1201-1234

Publisher

SPRINGER
DOI: 10.1007/s10618-020-00687-8

Keywords

Statistical pattern mining; Hypothesis testing; Top-k patterns

Funding

  1. National Science Foundation [IIS-1247581]
  2. University of Padova
  3. MIUR, the Italian Ministry of Education, University and Research [20174LF3T8]


The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We introduce and study a variant of the problem that requires to mine the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provide theoretical evidence of its effectiveness. TopKWY crucially relies on a novel strategy to explore statistically significant patterns and on several key implementation choices, which may be of independent interest. Our extensive experimental evaluation shows that TopKWY enables the extraction of the most significant patterns from large datasets which could not be analyzed by the state-of-the-art. In addition, TopKWY improves over the state-of-the-art even for the extraction of all significant patterns.


Recommended

Article Biochemical Research Methods

SPRISS: approximating frequent k-mers by sampling reads, and applications

Diego Santoro, Leonardo Pellegrina, Matteo Comin, Fabio Vandin

Summary: SPRISS is an efficient algorithm for approximating frequent k-mers and their frequencies in next-generation sequencing data. It uses a simple yet powerful reads sampling scheme to obtain comparable results in a shorter amount of time. Experimental results demonstrate its efficiency and accuracy.

BIOINFORMATICS (2022)

Article Computer Science, Information Systems

MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato

Summary: This paper presents MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for functions with poset structure. MCRapper allows finding statistically-significant functions and approximations of high-expectation functions. It achieves this by using upper bounds to efficiently explore and prune the search space. The paper also introduces TFP-R, an algorithm developed using MCRapper for True Frequent Pattern mining, which outperforms existing methods.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2022)

Article Biochemical Research Methods

Discovering significant evolutionary trajectories in cancer phylogenies

Leonardo Pellegrina, Fabio Vandin

Summary: The study presents a new algorithm, MASTRO, for discovering significantly conserved evolutionary trajectories in cancer. The algorithm is applied to lung cancer and acute myeloid leukemia data, confirming and extending previous findings.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Fast Approximation of Frequent k-Mers and Applications to Metagenomics

Leonardo Pellegrina, Cinzia Pizzi, Fabio Vandin

JOURNAL OF COMPUTATIONAL BIOLOGY (2020)

Proceedings Paper Computer Science, Information Systems

SPUMANTE: Significant Pattern Mining with Unconditional Testing

Leonardo Pellegrina, Matteo Riondato, Fabio Vandin

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (2019)

Proceedings Paper Computer Science, Information Systems

Hypothesis Testing and Statistically-sound Pattern Mining

Leonardo Pellegrina, Matteo Riondato, Fabio Vandin

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Efficient Mining of the Most Significant Patterns with Permutation Testing

Leonardo Pellegrina, Fabio Vandin

KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING (2018)

Proceedings Paper Engineering, Aerospace

Design and Test in Microgravity of a Space Tether Length and Length Rate Measurement Device

Gilberto Grassi, Mattia Pezzato, Alessia Gloder, Riccardo Mantellato, Alessandro Francesconi, Enrico Lorenzini, Alvise Rossi, Leonardo Pellegrina

2017 IEEE INTERNATIONAL WORKSHOP ON METROLOGY FOR AEROSPACE (METROAEROSPACE) (2017)
