Article, Proceedings Paper

Ranking episodes using a partition model

DATA MINING AND KNOWLEDGE DISCOVERY (2015)

Volume 29, Issue 5, Pages 1312-1342

Publisher

SPRINGER

DOI: 10.1007/s10618-015-0419-9

Keywords

Episode mining; Partition model; Pattern ranking

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Abstract

One of the biggest setbacks in traditional frequent pattern mining is that overwhelmingly many of the discovered patterns are redundant. A prototypical example of such redundancy is a freerider pattern, where the pattern contains a true pattern and some additional noise events. A technique for filtering freerider patterns that has proved to be efficient in ranking itemsets is to use a partition model, where a pattern is divided into two subpatterns and the observed support is compared to the expected support under the assumption that these two subpatterns occur independently. In this paper we develop a partition model for episodes, patterns discovered from sequential data. An episode is essentially a set of events, with possible restrictions on the order of events. Unlike with itemset mining, computing the expected support of an episode requires surprisingly sophisticated methods. In order to construct the model, we partition the episode into two subepisodes. We then model how likely the events in each subepisode are to occur close to each other. If this probability is high (which is often the case if the subepisode has a high support), then we can expect that when one event from a subepisode occurs, the remaining events also occur close by. This approach increases the expected support of the episode, and if this increase explains the observed support, then we can deem the episode uninteresting. We demonstrate in our experiments that using the partition model can effectively and efficiently reduce the redundancy in episodes.
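
The itemset version of the partition idea mentioned in the abstract can be illustrated with a small sketch. This is only an assumption-laden illustration of the general technique (observed support compared with the expected support of two independently occurring subpatterns), not the episode model developed in the paper, which the abstract says requires more sophisticated methods. The function names `support` and `partition_score` and the toy data are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def partition_score(transactions, itemset):
    """Return (observed, best_expected) for an itemset with at least two items.

    `best_expected` is the largest product support(left) * support(right)
    over all splits of the itemset into two non-empty parts, i.e. the
    independence assumption that best explains the observed support.
    """
    items = sorted(itemset)
    observed = support(transactions, items)
    best_expected = 0.0
    # Keep the first item on the left side so each split is visited once.
    first, rest = items[0], items[1:]
    for r in range(len(rest)):  # r < len(rest) keeps the right side non-empty
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(items) - left
            expected = support(transactions, left) * support(transactions, right)
            best_expected = max(best_expected, expected)
    return observed, best_expected


if __name__ == "__main__":
    # Toy data (made up): "c" rides along with the pattern {"a", "b"}.
    data = [frozenset("abc"), frozenset("ab"), frozenset("abc"), frozenset("c")]
    observed, expected = partition_score(data, {"a", "b", "c"})
    print(observed, expected)
```

In this toy data the observed support of {a, b, c} does not exceed what the split {a, b} | {c} predicts under independence, so a ranking based on this comparison would place the itemset low, which matches the freerider intuition described above.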

Recommended

Review Computer Science, Artificial Intelligence

A survey of episode mining

Oualid Ouarem, Farid Nouioua, Philippe Fournier-Viger

Summary: Episode mining is a research area in data mining that aims to discover interesting episodes in an event sequence. It has been applied to various applications and shown to reveal insightful patterns. This article presents an up-to-date survey of episode mining, covering introduction, algorithms, recent developments, and future research directions.

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2023)

Article Computer Science, Information Systems

A Scalable Analytical Framework for Complex Event Episode Mining With Various Domains Applications

Jerry C. C. Tseng, Sun-Yuan Hsieh, Vincent S. Tseng

Summary: This paper proposes a framework called SAAF based on complex event pattern mining techniques, which can be efficiently and accurately applied to various real-life problems. By adopting incremental analysis and pattern merging methods, combined with the Lambda architecture and Apache Spark technology, it exhibits excellent performance.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

What to expect from a set of itemsets?

T. Delacroix, P. Lenca, S. Lallich

Summary: This paper introduces the concept of Mutual Constrained Independence (MCI) and proposes a method for computing MCI models based on algebraic geometry. It aims to address the challenge of redundancy in frequency-based data mining and itemset mining. The research also establishes the link between MCI models and a class of MaxEnt models used in pattern mining.

INFORMATION SCIENCES (2022)

Article Social Sciences, Interdisciplinary

Two in One: A New Tool to Combine Two Rankings Based on the Voronoi Diagram

Francesca Mariani, Mariateresa Ciommi, Maria Cristina Recchioni

Summary: In this paper, a novel method for ranking items based on two indices is proposed. The method involves an iterative scheme utilizing the Voronoi algorithm in a two-dimensional space. Empirical evidence shows that the proposed method performs better in capturing information from the original indices compared to traditional correlation-based ranking.

SOCIAL INDICATORS RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Analysis of the impact of DMUs on the overall efficiency in the event of a merger

A. Saavedra-Nieves, M. G. Fiestras-Janeiro

Summary: This paper discusses several mechanisms for overall ranking of Decision Making Units (DMUs) based on their contribution to the relative efficiency score of a merger. The external organization of agents in each possible merger affects the relative efficiency score, which justifies the use of games and specific ranking indices based on the Shapley value for evaluating DMUs. Computational problems arise when the number of DMUs increases, and two sampling alternatives are proposed to reduce these issues. Finally, the methods are applied to analyze the efficiency of the hotel industry in Spain.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Chemistry, Multidisciplinary

Spatial Uncertainty of Target Patterns Generated by Different Prediction Models of Landslide Susceptibility

Andrea G. Fabbri, Antonio Patera

Summary: This study exposes the relative uncertainties associated with prediction patterns of landslide susceptibility, using relationships between direct and indirect spatial evidence to analyze the prediction patterns. Five mathematical modeling functions are applied to capture and integrate evidence, resulting in prediction scores.

APPLIED SCIENCES-BASEL (2021)

Article Computer Science, Artificial Intelligence

VEPRECO: Vertical databases with pre-pruning strategies and common candidate selection policies to fasten sequential pattern mining

Natalia Mordvanyuk, Albert Bifet, Beatriz Lopez

Summary: This paper introduces a new efficient sequential pattern mining algorithm called VEPRECO, which accelerates the mining process through vertical representation of patterns, pre-pruning strategies, and common candidate selection policies. Experimental evaluation shows that the proposed algorithm significantly reduces time and memory usage.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Advanced uncertainty based approach for discovering erasable product patterns

Chanhee Lee, Yoonji Baek, Tin Truong, Unil Yun, Jerry Chun-Wei Lin

Summary: This study presents an efficient algorithm for mining erasable patterns from uncertain databases. The algorithm takes into account the probability of each item and extracts low-profit patterns.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

A framework for proposing a liquid stock portfolio using frequent itemset mining from time-series data

Majid Moghtadai, Farsad Zamani Boroujeni, Mohammadreza Soltanaghaei

Summary: Data mining offers methods for identifying frequent patterns that exceed a specific threshold, which helps business owners find items of high frequency or utility. However, in the stock markets, the challenge for buyers is to select a set of items that fit within their budget. This study proposes a framework that uses pattern mining techniques and stock market data to provide a liquid purchasing portfolio. It utilizes ranking tables to identify highly traded stocks over time and introduces a new algorithm for mining frequent items based on multivariate time series data. Additionally, it applies stock prediction techniques to identify profitable stocks based on user budget. Experimental results show that this framework provides four times more liquidity and 60% more profitability than the whole market.

APPLIED INTELLIGENCE (2023)

Article Automation & Control Systems

Mining high occupancy patterns to analyze incremental data in intelligent systems

Heonho Kim, Taewoong Ryu, Chanhee Lee, Hyeonmo Kim, Tin Truong, Philippe Fournier-Viger, Witold Pedrycz, Unil Yun

Summary: In this paper, we propose an approach called HOMI for mining high occupancy patterns on incremental databases. The experimental results demonstrate that HOMI outperforms other methods in terms of performance.

ISA TRANSACTIONS (2022)

Article Computer Science, Information Systems

Identification of adverse disease agents and risk analysis using frequent pattern mining

Shafiul Alom Ahmed, Bhabesh Nath

Summary: The paper introduces an approach to pattern mining called Improved Frequent Pattern Growth, which constructs an Improved FP-tree data structure and introduces a layout of Conditional FP-tree for efficient generation of frequent patterns. The experimental results highlight the significance of the proposed Improved FP-Growth algorithm over traditional frequent itemset mining algorithms.

INFORMATION SCIENCES (2021)

Article Computer Science, Artificial Intelligence

ITUFP: A fast method for interactive mining of Top-K frequent patterns from uncertain data

Razieh Davashi

Summary: In this paper, a fast method called ITUFP is proposed for interactive mining of Top-K UFPs. The method efficiently stores and extracts pattern information by creating UP-Lists and IMCUP-Lists, and only updates the IMCUP-Lists when the K value changes. Experimental results demonstrate that the proposed method is very efficient for interactive mining of Top-K UFPs.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

UGMINE: utility-based graph mining

Md. Tanvir Alam, Amit Roy, Chowdhury Farhan Ahmed, Md. Ashraful Islam, Carson K. Leung

Summary: This study proposes a complete framework for utility-based graph pattern mining, introducing the UGMINE algorithm and the RMU pruning technique. Experimental results demonstrate the effectiveness of this framework in extracting high utility subgraph patterns.

APPLIED INTELLIGENCE (2023)

Article Computer Science, Interdisciplinary Applications

Dimensionality analysis in machine learning failure detection models. A case study with LNG compressors

Fernando Hidalgo-Mompean, Juan Francisco Gomez Fernandez, Gonzalo Cerruela-Garcia, Adolfo Crespo Marquez

Summary: This paper explores the feature selection problem in compressor failure detection using machine learning models. It evaluates the impact of various feature selection ranking methods on the development of diagnostic models for rod drop failure.

COMPUTERS IN INDUSTRY (2021)

Article Automation & Control Systems

UP-tree & UP-Mine: A fast method based on upper bound for frequent pattern mining from uncertain data

Razieh Davashi

Summary: This study proposes an efficient method based on an upper bound approach to mine uncertain frequent patterns, reducing false positives significantly by tightening the upper bound of expected support and early pruning of infrequent 2-itemsets and their supersets.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2021)

Article Computer Science, Information Systems

Strongly polynomial efficient approximation scheme for segmentation

Nikolaj Tatti

INFORMATION PROCESSING LETTERS (2019)

Article Computer Science, Information Systems

Density-Friendly Graph Decomposition

Nikolaj Tatti

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2019)

Article Computer Science, Artificial Intelligence

The network-untangling problem: from interactions to activity timelines

Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of determining entity activity based on interactions, proposing two formulations and efficient algorithms for untangling networks. While the sum problem is shown to be NP-hard, the max problem can be solved optimally in linear time. In cases of multiple activity intervals per entity, both formulations are proved to be inapproximable but efficient algorithms based on alternative optimization are proposed. Evaluation on synthetic and real-world datasets supports the validity of concepts and performance of algorithms.

DATA MINING AND KNOWLEDGE DISCOVERY (2021)

Article Computer Science, Artificial Intelligence

Maintaining AUC and H-measure over time

Nikolaj Tatti

Summary: The paper discusses three algorithms for maintaining performance measures of classifiers in machine learning, including AUC and H-measure based on ROC curve. AUC can be updated in O(log n) time by maintaining sorted data points in a search tree. For H-measure, the convex hull can be maintained using a modified convex hull maintenance algorithm, and the measure can be computed or estimated in varying time complexities based on certain conditions. Empirical results show that the proposed methods are significantly faster than baseline approaches.

MACHINE LEARNING (2022)

Article Computer Science, Artificial Intelligence

Ranking with submodular functions on a budget

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: Submodular maximization is fundamental in many important machine learning problems and has various applications. However, the study of maximizing submodular functions has often been limited to selecting a set of items, while many real-world applications require a ranking solution. This paper introduces a novel formulation for ranking items with submodular valuations and budget constraints, and proposes practical algorithms with approximation guarantees for different types of budget constraints. The empirical evaluation shows that the proposed algorithms outperform strong baselines.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Zoology

Quantifying mammalian diets

Kari Lintulaakso, Nikolaj Tatti, Indre Zliobaite

Summary: We propose a quantitative approach for categorising mammalian diets based on the taxonomy of food items and parts consumed. Our analysis reveals associations between dental complexity and the concentrations of certain nutrients. This study not only provides a data foundation for future comparative research, but also offers publicly available large-scale dietary data.

MAMMALIAN BIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Fast computation of distance-generalized cores using sampling

Nikolaj Tatti

Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. The (k, h)-core is a natural extension of the k-core, where each node must have at least k nodes that can be reached within a distance of h. However, the (k, h)-core decomposition has a significantly increased computational complexity compared to the standard core decomposition. In this paper, a randomized algorithm is proposed to approximate the (k, h)-core decomposition, based on sampling the neighborhoods of nodes.

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Column-coherent matrix decomposition

Nikolaj Tatti

Summary: Matrix decomposition is widely used in machine learning for dimension reduction or visualization. In this study, we focus on decomposing a matrix X of size n x m into a product WS, where S is a matrix of size n x k with consecutive ones property. We propose 5 different algorithms to solve the problem and compare them experimentally in terms of decomposition quality and computational time. The results show that our algorithms can produce interpretable results in practical time.

DATA MINING AND KNOWLEDGE DISCOVERY (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Coresets remembered and items forgotten: submodular maximization with deletions

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of robust submodular maximization against unexpected deletions and proposes a single-pass streaming algorithm and an offline algorithm, demonstrating their superior performance in real-life applications.

2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Recurrent Segmentation Meets Block Models in Temporal Networks

Chamalee Wickrama Arachchi, Nikolaj Tatti

Summary: This paper explores the method of modeling recurrent activity in temporal networks. The stochastic block model is used as a starting point and the edges are modeled with a Poisson process. Experimental results demonstrate the effectiveness of the algorithm and reveal the existence of recurrent behavior in certain real-world networks.

DISCOVERY SCIENCE (DS 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Community Detection in Edge-Labeled Graphs

Iiro Kumpulainen, Nikolaj Tatti

Summary: This paper studies the problem of finding dense subgraphs that can be explained with edge labels. It shows that greedy heuristics can efficiently find both conjunctive-induced and disjunctive-induced dense subgraphs. Experimental results demonstrate the ability to find interpretable subgraphs in synthetic graphs and real-world networks.

DISCOVERY SCIENCE (DS 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Fast computation of distance-generalized cores using sampling

Nikolaj Tatti

Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. While the (k,h)-core extension increases computational complexity, a randomized algorithm can provide an approximation of the decomposition in a shorter time. Sample-based approximation complements exact computation and is especially useful for slow network solutions.

2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Approximation Algorithms for Confidence Bands for Time Series

Nikolaj Tatti

Summary: This study discusses the calculation of confidence intervals and bands for time series, aiming to detect abnormal time series by minimizing the area enveloping k time series. Despite being NP-hard, optimal solutions for different k can be found by optimizing different band regions.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Fast Likelihood-Based Change Point Detection

Nikolaj Tatti

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Finding events in temporal networks: Segmentation meets densest-subgraph discovery

Polina Rozenshtein, Francesco Bonchi, Aristides Gionis, Mauro Sozio, Nikolaj Tatti

2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) (2018)
