Article

ROhAN: Row-order agnostic null models for statistically-sound knowledge discovery

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 37, Issue 4, Pages 1692-1718

Publisher

SPRINGER
DOI: 10.1007/s10618-023-00938-4

Keywords

Hypothesis testing; Pattern mining; Sequences; Transactions

We introduce a new class of null models for statistical validation of binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), in contrast to previous Row-Order Enforcing (ROE) models. We propose the ROhAN algorithmic framework for efficient sampling of datasets from ROA models, and our experimental evaluation demonstrates the differences between ROA and ROE models, as well as the efficiency and scalability of ROhAN.
We introduce a novel class of null models for the statistical validation of results obtained from binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), i.e., they do not consider the order of rows in the observed dataset to be fixed, in stark contrast with previous null models, which are Row-Order Enforcing (ROE). We present ROhAN, an algorithmic framework for efficiently sampling datasets from ROA models according to user-specified distributions, which is a necessary step for the resampling-based statistical hypothesis tests employed to validate the results. ROhAN uses Metropolis-Hastings or rejection sampling to build on top of existing or future ROE sampling procedures. Our experimental evaluation shows that ROA models are very different from ROE ones, impacting the statistical validation, and that ROhAN is efficient, mixes fast, and scales well as the dataset grows.
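The abstract says ROhAN layers Metropolis-Hastings or rejection sampling on top of ROE samplers but does not spell out the distributions involved. The Python below is therefore only a toy sketch under stated assumptions, not the ROhAN algorithm: it assumes (1) the ROE space is every 0/1 matrix with a fixed, ordered sequence of row sums and fixed column sums, sampled uniformly by some existing ROE procedure, and (2) the ROA target is the uniform distribution over row multisets, so datasets that differ only in row order count once. Under these assumptions a multiset with c valid orderings is over-represented c-fold by the base sampler, and both wrappers correct by a factor 1/c. All function names (`enumerate_roe_space`, `num_orderings`, `roa_rejection_sample`, `roa_metropolis_hastings`) are hypothetical, and the brute-force enumeration merely stands in for a real ROE sampler.

```python
import itertools
import math
import random
from collections import Counter

# Toy representation (assumption): a dataset is a tuple of 0/1 rows.
Row = tuple[int, ...]
Dataset = tuple[Row, ...]


def enumerate_roe_space(row_sums, col_sums):
    """Brute-force list of all 0/1 matrices whose i-th row sums to row_sums[i]
    and whose columns sum to col_sums. Row order is fixed, so this plays the
    role of a hypothetical ROE space; only usable for tiny inputs."""
    width = len(col_sums)
    candidates = [
        [r for r in itertools.product((0, 1), repeat=width) if sum(r) == s]
        for s in row_sums
    ]
    space = []
    for rows in itertools.product(*candidates):
        if all(sum(col) == c for col, c in zip(zip(*rows), col_sums)):
            space.append(tuple(rows))
    return space


def row_multiset(d: Dataset) -> Dataset:
    """Canonical key that forgets row order (rows sorted lexicographically)."""
    return tuple(sorted(d))


def num_orderings(d: Dataset) -> int:
    """Distinct row orderings of d that keep the row-sum sequence fixed, i.e.
    how many datasets in the toy ROE space share d's row multiset:
    prod over lengths l of n_l! divided by prod over distinct rows r of mult(r)!."""
    count = 1
    for n_l in Counter(sum(r) for r in d).values():
        count *= math.factorial(n_l)
    for k in Counter(d).values():
        count //= math.factorial(k)  # each step divides exactly
    return count


def roa_rejection_sample(base_sampler, rng: random.Random) -> Dataset:
    """Rejection sampling on top of a base sampler assumed uniform over the
    ROE space: accepting with probability 1/num_orderings(d) makes every row
    multiset equally likely (the assumed ROA target)."""
    while True:
        d = base_sampler(rng)
        if rng.random() < 1.0 / num_orderings(d):
            return d


def roa_metropolis_hastings(base_sampler, rng, steps, start):
    """Independence Metropolis-Hastings chain using the base sampler as the
    proposal, with target pi(d) proportional to 1/num_orderings(d). With a
    uniform proposal the acceptance ratio is c(current) / c(proposal)."""
    current = start
    chain = []
    for _ in range(steps):
        proposal = base_sampler(rng)
        ratio = num_orderings(current) / num_orderings(proposal)
        if rng.random() < min(1.0, ratio):
            current = proposal
        chain.append(current)
    return chain


if __name__ == "__main__":
    space = enumerate_roe_space((1, 1, 2), (1, 1, 2))

    def uniform_roe(rng):
        # Stand-in for an existing ROE sampler: uniform over the toy space.
        return rng.choice(space)

    rng = random.Random(0)
    base = Counter(row_multiset(uniform_roe(rng)) for _ in range(3000))
    roa = Counter(
        row_multiset(roa_rejection_sample(uniform_roe, rng)) for _ in range(3000)
    )
    print("base (ROE) multiset counts:", sorted(base.values()))
    print("rejection (ROA) multiset counts:", sorted(roa.values()))
```

On this toy instance there are five ROE datasets but only three row multisets: one corresponds to a single ordering and the other two to two orderings each. The base sampler therefore returns the latter two multisets about 2/5 of the time each and the first about 1/5, while both wrappers return each multiset about a third of the time, which is the kind of ROE/ROA discrepancy the abstract refers to.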
