Journal
DATA MINING AND KNOWLEDGE DISCOVERY
Volume 37, Issue 4, Pages 1692-1718
Publisher
SPRINGER
DOI: 10.1007/s10618-023-00938-4
Keywords
Hypothesis testing; Pattern mining; Sequences; Transactions
We introduce a new class of null models for statistical validation of binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), in contrast to previous Row-Order Enforcing (ROE) models. We propose the ROhAN algorithmic framework for efficient sampling of datasets from ROA models, and our experimental evaluation demonstrates the differences between ROA and ROE models, as well as the efficiency and scalability of ROhAN.
We introduce a novel class of null models for the statistical validation of results obtained from binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), i.e., they do not treat the order of rows in the observed dataset as fixed, in stark contrast to previous null models, which are Row-Order Enforcing (ROE). We present ROhAN, an algorithmic framework for efficiently sampling datasets from ROA models according to user-specified distributions, which is a necessary step for the resampling-based statistical hypothesis tests employed to validate the results. ROhAN uses Metropolis-Hastings or rejection sampling to build on top of existing or future ROE sampling procedures. Our experimental evaluation shows that ROA models are very different from ROE ones, impacting the statistical validation, and that ROhAN is efficient, mixes fast, and scales well as the dataset grows.
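The abstract does not spell out ROhAN's acceptance rule. What follows is a minimal sketch, under our own assumptions, of the general idea of using rejection sampling on top of a sampler over ordered datasets so that row order no longer matters. It assumes a toy setting where every row permutation of a valid dataset is also valid and equally likely under the base sampler. This is a simplification, not the paper's ROE models. The names `sample_row_order_agnostic`, `orderings_weight`, `bag_key`, and `toy_ordered_sampler` are hypothetical. Under that assumption, a bag of n rows with multiplicities m_r arises from n!/prod(m_r!) orderings. Accepting a candidate with probability prod(m_r!)/n! therefore makes every bag equally likely.

```python
import math
import random
from collections import Counter
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple

Row = Hashable                    # a transaction (e.g. a frozenset of items) or a sequence
Dataset = Tuple[Row, ...]         # an ORDERED dataset: row i is dataset[i]
Bag = FrozenSet[Tuple[Row, int]]  # a row-order-free dataset: {(row, multiplicity)}


def bag_key(dataset: Sequence[Row]) -> Bag:
    """Forget the row order, keeping each distinct row with its multiplicity."""
    return frozenset(Counter(dataset).items())


def orderings_weight(dataset: Sequence[Row]) -> int:
    """Return prod(m_r!) over distinct rows r with multiplicity m_r.

    A bag of n rows has n! / prod(m_r!) distinct orderings, so a sampler that is
    uniform over ordered datasets favors bags with many distinct rows.
    """
    return math.prod(math.factorial(m) for m in Counter(dataset).values())


def sample_row_order_agnostic(
    sample_ordered: Callable[[random.Random], Dataset],
    n_rows: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> Bag:
    """Rejection sampling: from a uniform sampler over ordered datasets to a
    uniform distribution over bags of rows.

    Assumption (hypothetical toy setting): every row permutation of a valid
    dataset is also valid and equally likely under `sample_ordered`, so
    P(bag) is proportional to n!/prod(m_r!). Accepting with probability
    prod(m_r!)/n! (always <= 1) cancels that factor.
    """
    bound = math.factorial(n_rows)
    for _ in range(max_tries):
        candidate = sample_ordered(rng)
        if rng.random() < orderings_weight(candidate) / bound:
            return bag_key(candidate)
    raise RuntimeError("no candidate accepted; acceptance rate too low")


def toy_ordered_sampler(rng: random.Random) -> Dataset:
    """Hypothetical toy model: 2 rows, each independently "a" or "b"."""
    return tuple(rng.choice(["a", "b"]) for _ in range(2))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 30_000
    counts = Counter(
        sample_row_order_agnostic(toy_ordered_sampler, 2, rng) for _ in range(n)
    )
    for bag, c in counts.items():
        # Each bag {a,a}, {a,b}, {b,b} should appear about 1/3 of the time,
        # whereas the raw ordered sampler gives {a,b} probability 1/2.
        print(sorted(bag), round(c / n, 3))
```

In this toy case, the bound n! is tight. In general it can make acceptance very unlikely. The abstract notes that ROhAN can also use Metropolis-Hastings, and this sketch does not cover that option.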