4.5 Article Proceedings Paper

Ranking episodes using a partition model

Journal

DATA MINING AND KNOWLEDGE DISCOVERY
Volume 29, Issue 5, Pages 1312-1342

Publisher

SPRINGER
DOI: 10.1007/s10618-015-0419-9

Keywords

Episode mining; Partition model; Pattern ranking

Ask authors/readers for more resources

One of the biggest setbacks in traditional frequent pattern mining is that overwhelmingly many of the discovered patterns are redundant. A prototypical example of such redundancy is a freerider pattern where the pattern contains a true pattern and some additional noise events. A technique for filtering freerider patterns that has proved to be efficient in ranking itemsets is to use a partition model where a pattern is divided into two subpatterns and the observed support is compared to the expected support under the assumption that these two subpatterns occur independently. In this paper we develop a partition model for episodes, patterns discovered from sequential data. An episode is essentially a set of events, with possible restrictions on the order of events. Unlike with itemset mining, computing the expected support of an episode requires surprisingly sophisticated methods. In order to construct the model, we partition the episode into two subepisodes. We then model how likely the events in each subepisode occur close to each other. If this probability is high-which is often the case if the subepisode has a high support-then we can expect that when one event from a subepisode occurs, then the remaining events occur also close by. This approach increases the expected support of the episode, and if this increase explains the observed support, then we can deem the episode uninteresting. We demonstrate in our experiments that using the partition model can effectively and efficiently reduce the redundancy in episodes.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

Article Computer Science, Information Systems

Strongly polynomial efficient approximation scheme for segmentation

Nikolaj Tatti

INFORMATION PROCESSING LETTERS (2019)

Article Computer Science, Information Systems

Density-Friendly Graph Decomposition

Nikolaj Tatti

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2019)

Article Computer Science, Artificial Intelligence

The network-untangling problem: from interactions to activity timelines

Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of determining entity activity based on interactions, proposing two formulations and efficient algorithms for untangling networks. While the sum problem is shown to be NP-hard, the max problem can be solved optimally in linear time. In cases of multiple activity intervals per entity, both formulations are proved to be inapproximable but efficient algorithms based on alternative optimization are proposed. Evaluation on synthetic and real-world datasets supports the validity of concepts and performance of algorithms.

DATA MINING AND KNOWLEDGE DISCOVERY (2021)

Article Computer Science, Artificial Intelligence

Maintaining AUC and H-measure over time

Nikolaj Tatti

Summary: The paper discusses three algorithms for maintaining performance measures of classifiers in machine learning, including AUC and H-measure based on ROC curve. AUC can be updated in O(log n) time by maintaining sorted data points in a search tree. For H-measure, the convex hull can be maintained using a modified convex hull maintenance algorithm, and the measure can be computed or estimated in varying time complexities based on certain conditions. Empirical results show that the proposed methods are significantly faster than baseline approaches.

MACHINE LEARNING (2022)

Article Computer Science, Artificial Intelligence

Ranking with submodular functions on a budget

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: Submodular maximization is fundamental in many important machine learning problems and has various applications. However, the study of maximizing submodular functions has often been limited to selecting a set of items, while many real-world applications require a ranking solution. This paper introduces a novel formulation for ranking items with submodular valuations and budget constraints, and proposes practical algorithms with approximation guarantees for different types of budget constraints. The empirical evaluation shows that the proposed algorithms outperform strong baselines.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Zoology

Quantifying mammalian diets

Kari Lintulaakso, Nikolaj Tatti, Indre Zliobaite

Summary: We propose a quantitative approach for categorising mammalian diets based on the taxonomy of food items and parts consumed. Our analysis reveals associations between dental complexity and the concentrations of certain nutrients. This study not only provides a data foundation for future comparative research, but also offers publicly available large-scale dietary data.

MAMMALIAN BIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Fast computation of distance-generalized cores using sampling

Nikolaj Tatti

Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. The (k, h)-core is a natural extension of the k-core, where each node must have at least k nodes that can be reached within a distance of h. However, the (k, h)-core decomposition has a significantly increased computational complexity compared to the standard core decomposition. In this paper, a randomized algorithm is proposed to approximate the (k, h)-core decomposition, based on sampling the neighborhoods of nodes.

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Column-coherent matrix decomposition

Nikolaj Tatti

Summary: Matrix decomposition is widely used in machine learning for dimension reduction or visualization. In this study, we focus on decomposing a matrix X of size n x m into a product WS, where S is a matrix of size n x k with consecutive ones property. We propose 5 different algorithms to solve the problem and compare them experimentally in terms of decompositon quality and computational time. The results show that our algorithms can produce interpretable results in practical time.

DATA MINING AND KNOWLEDGE DISCOVERY (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Coresets remembered and items forgotten: submodular maximization with deletions

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of robust submodular maximization against unexpected deletions and proposes a single-pass streaming algorithm and an offline algorithm, demonstrating their superior performance in real-life applications.

2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Recurrent Segmentation Meets Block Models in Temporal Networks

Chamalee Wickrama Arachchi, Nikolaj Tatti

Summary: This paper explores the method of modeling recurrent activity in temporal networks. The stochastic block model is used as a starting point and the edges are modeled with a Poisson process. Experimental results demonstrate the effectiveness of the algorithm and reveal the existence of recurrent behavior in certain real-world networks.

DISCOVERY SCIENCE (DS 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Community Detection in Edge-Labeled Graphs

Iiro Kumpulainen, Nikolaj Tatti

Summary: This paper studies the problem of finding dense subgraphs that can be explained with edge labels. It shows that greedy heuristics can efficiently find both conjunctive-induced and disjunctive-induced dense subgraphs. Experimental results demonstrate the ability to find interpretable subgraphs in synthetic graphs and real-world networks.

DISCOVERY SCIENCE (DS 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Fast computation of distance-generalized cores using sampling

Nikolaj Tatti

Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. While the (k,h)-core extension increases computational complexity, a randomized algorithm can provide an approximation of the decomposition in a shorter time. Sample-based approximation complements exact computation and is especially useful for slow network solutions.

2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Approximation Algorithms for Confidence Bands for Time Series

Nikolaj Tatti

Summary: This study discusses the calculation of confidence intervals and bands for time series, aiming to detect abnormal time series by minimizing the area enveloping k time series. Despite being NP-hard, optimal solutions for different k can be found by optimizing different band regions.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Fast Likelihood-Based Change Point Detection

Nikolaj Tatti

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Finding events in temporal networks: Segmentation meets densest-subgraph discovery

Polina Rozenshtein, Francesco Bonchi, Aristides Gionis, Mauro Sozio, Nikolaj Tatti

2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) (2018)

No Data Available