Article

Efficient mining of the most significant patterns with permutation testing

DATA MINING AND KNOWLEDGE DISCOVERY (2020)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 34, Issue 4, Pages 1201-1234

Publisher

SPRINGER

DOI: 10.1007/s10618-020-00687-8

Keywords

Statistical pattern mining; Hypothesis testing; Top-k patterns

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

National Science Foundation [IIS-1247581]
University of Padova
MIUR, the Italian Ministry of Education, University and Research [20174LF3T8]

Abstract

The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We introduce and study a variant of the problem that requires mining the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provide theoretical evidence of its effectiveness. TopKWY crucially relies on a novel strategy to explore statistically significant patterns and on several key implementation choices, which may be of independent interest. Our extensive experimental evaluation shows that TopKWY enables the extraction of the most significant patterns from large datasets that could not be analyzed by state-of-the-art methods. In addition, TopKWY improves over the state of the art even for the extraction of all significant patterns.
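The abstract combines three ideas: permutation testing, control of the family-wise error rate (FWER), and a cap of k on the number of reported patterns. The Python sketch below illustrates how these can fit together using a generic min-p permutation procedure in brute-force form. It is not TopKWY and does not reproduce its search strategy or implementation choices. Everything specific here is an assumption made for illustration: the function names, the one-sided Fisher exact test as the association measure, the explicit list of candidate itemsets, the toy data, and the parameters `alpha`, `n_perm`, and `seed`.

```python
import random
from math import comb

def fisher_upper_tail(a, x, n1, n):
    """One-sided Fisher exact p-value P(A >= a): a pattern covering x of n
    transactions hits a of the n1 class-1 transactions (hypergeometric null)."""
    tail = sum(comb(n1, i) * comb(n - n1, x - i) for i in range(a, min(x, n1) + 1))
    return tail / comb(n, x)

def pattern_p_values(transactions, labels, patterns):
    """P-value of every candidate pattern for the given 0/1 labels."""
    n, n1 = len(labels), sum(labels)
    p_values = []
    for pattern in patterns:
        covered = [lab for t, lab in zip(transactions, labels) if pattern <= t]
        p_values.append(fisher_upper_tail(sum(covered), len(covered), n1, n))
    return p_values

def top_k_significant(transactions, labels, patterns, k,
                      alpha=0.05, n_perm=1000, seed=0):
    """Return up to k patterns with the smallest p-values that pass a
    permutation-based threshold, plus the threshold itself."""
    rng = random.Random(seed)
    observed = pattern_p_values(transactions, labels, patterns)
    shuffled = list(labels)
    min_p = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association with the class label
        min_p.append(min(pattern_p_values(transactions, shuffled, patterns)))
    min_p.sort()
    # With a strict '<' test, at most floor(alpha * n_perm) permutations
    # contain a p-value below this threshold (estimated FWER <= alpha).
    threshold = min_p[int(alpha * n_perm)]
    ranked = sorted((p, i) for i, p in enumerate(observed) if p < threshold)
    return [(patterns[i], p) for p, i in ranked[:k]], threshold

# Hypothetical toy data: item "a" occurs exactly in the class-1 transactions,
# item "b" occurs in every other transaction regardless of class.
transactions = []
for i in range(20):
    items = set()
    if i < 10:
        items.add("a")
    if i % 2 == 0:
        items.add("b")
    transactions.append(frozenset(items))
labels = [1] * 10 + [0] * 10
patterns = [frozenset("a"), frozenset("b"), frozenset("ab")]

top, threshold = top_k_significant(transactions, labels, patterns, k=1)
print(top, threshold)
```

The sketch recomputes every p-value on every permutation, so its cost grows with the number of candidate patterns times the number of permutations. That cost is the reason an efficient exploration strategy matters on large datasets.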

Recommended

Article Computer Science, Artificial Intelligence

Stat-DSM: Statistically Discriminative Sub-Trajectory Mining With Multiple Testing Correction

Vo Nguyen Le Duy, Takuto Sakuma, Taiju Ishiyama, Hiroki Toda, Kazuya Arai, Masayuki Karasuyama, Yuta Okubo, Masayuki Sunaga, Hiroyuki Hanada, Yasuo Tabei, Ichiro Takeuchi

Summary: This study proposes a novel statistical approach, called Stat-DSM, to evaluate the statistical significance of discriminative sub-trajectory mining results. The proposed method properly controls the statistical significance of the extracted sub-trajectories and addresses the computational and statistical challenges of massive trajectory datasets.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Article Computer Science, Artificial Intelligence

ITUFP: A fast method for interactive mining of Top-K frequent patterns from uncertain data

Razieh Davashi

Summary: In this paper, a fast method called ITUFP is proposed for interactive mining of Top-K UFPs. The method efficiently stores and extracts pattern information by creating UP-Lists and IMCUP-Lists, and only updates the IMCUP-Lists when the K value changes. Experimental results demonstrate that the proposed method is very efficient for interactive mining of Top-K UFPs.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

A cost-effective approach for mining near-optimal top-k patterns

Xin Wang, Zhuo Lan, Yu-Ang He, Yang Wang, Zhi-Gui Liu, Wen-Bo Xie

Summary: This article introduces a cost-effective approach for frequent pattern mining on large graphs. The approach applies a level-wise strategy to incrementally detect frequent patterns and can terminate the mining process once the top-k patterns are discovered. It also utilizes a smart traverse strategy and compact data structures to compute the lower bound of support.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Information Systems

A Fast Algorithm for Mining Top-Rank-k Erasable Closed Patterns

Ham Nguyen, Tuong Le

Summary: This study presents a robust method for mining top-rank-k erasable closed patterns (ECPs) and combines the mining and ranking phases into a single step to improve efficiency. Experimental results confirm that this method outperforms other approaches in mining top-rank-k ECPs.

CMC-COMPUTERS MATERIALS & CONTINUA (2022)

Article Computer Science, Information Systems

TKUS: Mining top-k high utility sequential patterns

Chunkai Zhang, Zilin Du, Wensheng Gan, Philip S. Yu

Summary: High-utility sequential pattern mining (HUSPM) has attracted significant research interest recently, with the main task of finding subsequences with high utility in a quantitative sequential database. The top-k HUSPM concept was introduced to address the challenge of specifying a minimum utility threshold. Existing strategies for top-k HUSPM require improvement in terms of efficiency and scalability.

INFORMATION SCIENCES (2021)

Article Computer Science, Artificial Intelligence

ROhAN: Row-order agnostic null models for statistically-sound knowledge discovery

Maryam Abuissa, Alexander Lee, Matteo Riondato

Summary: We introduce a new class of null models for statistical validation of binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), in contrast to previous Row-Order Enforcing (ROE) models. We propose the ROhAN algorithmic framework for efficient sampling of datasets from ROA models, and our experimental evaluation demonstrates the differences between ROA and ROE models, as well as the efficiency and scalability of ROhAN.

DATA MINING AND KNOWLEDGE DISCOVERY (2023)

Article Computer Science, Information Systems

TKN: An efficient approach for discovering top-k high utility itemsets with positive or negative profits

Mohamed Ashraf, Tamer Abdelkader, Sherine Rady, Tarek F. Gharib

Summary: In this paper, a TKN method is proposed to efficiently mine Top-K HUIs with positive or negative profits. This method utilizes generalized and adaptive techniques to decrease the dataset traversing cost and narrow the exploration space through pruning and threshold elevating. Experimental results demonstrate the superiority of TKN in finding the required number of patterns compared to other competing algorithms.

INFORMATION SCIENCES (2022)

Article Computer Science, Information Systems

k-PFPMiner: Top-k Periodic Frequent Patterns in Big Temporal Databases

Palla Likhitha, Penugonda Ravikumar, Deepika Saxena, Rage Uday Kiran, Yutaka Watanobe

Summary: Finding periodic-frequent patterns in temporal databases is a significant data mining problem. This paper proposes a solution to discover the top-k periodic-frequent patterns in a database.

IEEE ACCESS (2023)

Article Chemistry, Multidisciplinary

Mining Top-k High Average-Utility Sequential Patterns for Resource Transformation

Kai Cao, Yucong Duan

Summary: High-utility sequential pattern mining (HUSPM) is a method used to find high-utility subsequences in a quantitative sequential database. However, existing extensions of high-utility sequential patterns (HUSP) have high utility that increases with their length, making it difficult to obtain diverse resource patterns. To address this issue, we propose a top-k high average-utility sequential pattern mining (HAUSPM) algorithm based on average utility, which improves efficiency and thresholds through a projection mechanism and a sequence average-utility-raising strategy. Experimental results demonstrate that the proposed algorithm achieves good performance.

APPLIED SCIENCES-BASEL (2023)

Article Computer Science, Information Systems

Mining Diversified Top-r Lasting Cohesive Subgraphs on Temporal Networks

Longlong Lin, Pingpeng Yuan, Ronghua Li, Hai Jin

Summary: This paper investigates the problem of finding diversified lasting cohesive subgraphs from temporal networks and proposes a new model and solution. Empirical studies demonstrate that the proposed solutions perform efficiently and accurately, surpassing existing methods.

IEEE TRANSACTIONS ON BIG DATA (2022)

Article Nursing

Introduction to Statistical Hypothesis Testing in Nursing Research

Courtney Keeler, Alexa Colgrove Curtis

Summary: This article is part of a series that aims to provide nurses with a comprehensive understanding of the concepts and principles essential to clinical research. It covers a wide range of topics from research design to data interpretation. To access all articles in the series, visit the provided link.

AMERICAN JOURNAL OF NURSING (2023)

Article Meteorology & Atmospheric Sciences

Testing Methods of Pattern Extraction for Climate Data Using Synthetic Modes

D. James Fulton, Gabriele C. Hegerl

Summary: This study develops a Monte Carlo method to compare PCA, DMD, and SFA in extracting additive space-time modes present in climate data, showing that the alternative methods outperform PCA significantly in synthetic data and that PCA's extracted modes are not significantly better than random guesses in simple cases.

JOURNAL OF CLIMATE (2021)

Editorial Material Obstetrics & Gynecology

Current controversies: Null hypothesis significance testing

Philip M. Sedgwick, Anne Hammer, Ulrik Schioler Kesmodel, Lars Henning Pedersen

Summary: Traditional null hypothesis significance testing (NHST) is widely used in obstetric and gynecological research, but its application in inferring clinical significance is controversial. Misinterpretation of statistical significance and ignorance of NHST limitations may lead to false claims and dismissal of important factors.

ACTA OBSTETRICIA ET GYNECOLOGICA SCANDINAVICA (2022)

Article Computer Science, Information Systems

Top data mining tools for the healthcare industry

Judith Santos-Pereira, Le Gruenwald, Jorge Bernardino

Summary: This paper presents a survey of popular open-source data mining tools and proposes tool selection criteria based on healthcare application requirements. KNIME and RapidMiner are identified as the best tools for healthcare data mining.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

Sound and relatively complete belief Hoare logic for statistical hypothesis testing programs

Yusuke Kawamoto, Tetsuya Sato, Kohei Suenaga

Summary: This paper proposes a new approach for formally describing the requirement for statistical inference and checking the appropriate use of statistical methods in programs. The authors define a belief Hoare logic (BHL) for formalizing and reasoning about statistical beliefs acquired through hypothesis testing. Examples demonstrate the usefulness of BHL in reasoning about practical issues in hypothesis testing, while also discussing the importance of prior beliefs in acquiring statistical beliefs.

ARTIFICIAL INTELLIGENCE (2024)

Article Biochemical Research Methods

SPRISS: approximating frequent k-mers by sampling reads, and applications

Diego Santoro, Leonardo Pellegrina, Matteo Comin, Fabio Vandin

Summary: SPRISS is an efficient algorithm for approximating frequent k-mers and their frequencies in next-generation sequencing data. It uses a simple yet powerful reads sampling scheme to obtain comparable results in a shorter amount of time. Experimental results demonstrate its efficiency and accuracy.

BIOINFORMATICS (2022)

Article Computer Science, Information Systems

MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato

Summary: This paper presents MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for functions with poset structure. MCRapper allows finding statistically-significant functions and approximations of high-expectation functions. It achieves this by using upper bounds to efficiently explore and prune the search space. The paper also introduces TFP-R, an algorithm developed using MCRapper for True Frequent Pattern mining, which outperforms existing methods.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2022)

Article Biochemical Research Methods

Discovering significant evolutionary trajectories in cancer phylogenies

Leonardo Pellegrina, Fabio Vandin

Summary: The study presents a new algorithm, MASTRO, for discovering significantly conserved evolutionary trajectories in cancer. The algorithm is applied to lung cancer and acute myeloid leukemia data, confirming and extending previous findings.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Fast Approximation of Frequent k-Mers and Applications to Metagenomics

Leonardo Pellegrina, Cinzia Pizzi, Fabio Vandin

JOURNAL OF COMPUTATIONAL BIOLOGY (2020)

Proceedings Paper Computer Science, Information Systems

SPUMANTE: Significant Pattern Mining with Unconditional Testing

Leonardo Pellegrina, Matteo Riondato, Fabio Vandin

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (2019)

Proceedings Paper Computer Science, Information Systems

Hypothesis Testing and Statistically-sound Pattern Mining

Leonardo Pellegrina, Matteo Riondato, Fabio Vandin

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Efficient Mining of the Most Significant Patterns with Permutation Testing

Leonardo Pellegrina, Fabio Vandin

KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING (2018)

Proceedings Paper Engineering, Aerospace

Design and Test in Microgravity of a Space Tether Length and Length Rate Measurement Device

Gilberto Grassi, Mattia Pezzato, Alessia Gloder, Riccardo Mantellato, Alessandro Francesconi, Enrico Lorenzini, Alvise Rossi, Leonardo Pellegrina

2017 IEEE INTERNATIONAL WORKSHOP ON METROLOGY FOR AEROSPACE (METROAEROSPACE) (2017)
