Review
Computer Science, Artificial Intelligence
Oualid Ouarem, Farid Nouioua, Philippe Fournier-Viger
Summary: Episode mining is a research area in data mining that aims to discover interesting episodes in an event sequence. It has been applied in various domains and shown to reveal insightful patterns. This article presents an up-to-date survey of episode mining, covering an introduction, algorithms, recent developments, and future research directions.
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
(2023)
Article
Computer Science, Information Systems
Jerry C. C. Tseng, Sun-Yuan Hsieh, Vincent S. Tseng
Summary: This paper proposes a framework called SAAF based on complex event pattern mining techniques, which can be efficiently and accurately applied to various real-life problems. By adopting incremental analysis and pattern merging methods, combined with the Lambda architecture and Apache Spark technology, it exhibits excellent performance.
Article
Computer Science, Information Systems
T. Delacroix, P. Lenca, S. Lallich
Summary: This paper introduces the concept of Mutual Constrained Independence (MCI) and proposes a method for computing MCI models based on algebraic geometry. It aims to address the challenge of redundancy in frequency-based data mining and itemset mining. The research also establishes the link between MCI models and a class of MaxEnt models used in pattern mining.
INFORMATION SCIENCES
(2022)
Article
Social Sciences, Interdisciplinary
Francesca Mariani, Mariateresa Ciommi, Maria Cristina Recchioni
Summary: In this paper, a novel method for ranking items based on two indices is proposed. The method involves an iterative scheme utilizing the Voronoi algorithm in a two-dimensional space. Empirical evidence shows that the proposed method captures information from the original indices better than traditional correlation-based ranking.
SOCIAL INDICATORS RESEARCH
(2023)
Article
Computer Science, Artificial Intelligence
A. Saavedra-Nieves, M. G. Fiestras-Janeiro
Summary: This paper discusses several mechanisms for overall ranking of Decision Making Units (DMUs) based on their contribution to the relative efficiency score of a merger. The external organization of agents in each possible merger affects the relative efficiency score, which justifies the use of games and specific ranking indices based on the Shapley value for evaluating DMUs. Computational problems arise when the number of DMUs increases, and two sampling alternatives are proposed to reduce these issues. Finally, the methods are applied to analyze the efficiency of the hotel industry in Spain.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Chemistry, Multidisciplinary
Andrea G. Fabbri, Antonio Patera
Summary: This study exposes the relative uncertainties associated with prediction patterns of landslide susceptibility, using relationships between direct and indirect spatial evidence to analyze the prediction patterns. Five mathematical modeling functions are applied to capture and integrate evidence, resulting in prediction scores.
APPLIED SCIENCES-BASEL
(2021)
Article
Computer Science, Artificial Intelligence
Natalia Mordvanyuk, Albert Bifet, Beatriz Lopez
Summary: This paper introduces a new efficient sequential pattern mining algorithm called VEPRECO, which accelerates the mining process through vertical representation of patterns, pre-pruning strategies, and common candidate selection policies. Experimental evaluation shows that the proposed algorithm significantly reduces time and memory usage.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Chanhee Lee, Yoonji Baek, Tin Truong, Unil Yun, Jerry Chun-Wei Lin
Summary: This study presents an efficient algorithm for mining erasable patterns from uncertain databases. The algorithm takes into account the probability of each item and extracts low-profit patterns.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Majid Moghtadai, Farsad Zamani Boroujeni, Mohammadreza Soltanaghaei
Summary: Data mining offers methods for identifying frequent patterns that exceed a specific threshold, which helps business owners find items of high frequency or utility. However, in stock markets, the challenge for buyers is to select a set of items that fit within their budget. This study proposes a framework that uses pattern mining techniques and stock market data to provide a liquid purchasing portfolio. It utilizes ranking tables to identify highly traded stocks over time and introduces a new algorithm for mining frequent items based on multivariate time series data. Additionally, it applies stock prediction techniques to identify profitable stocks based on user budget. Experimental results show that this framework provides four times more liquidity and 60% more profitability than the whole market.
APPLIED INTELLIGENCE
(2023)
Article
Automation & Control Systems
Heonho Kim, Taewoong Ryu, Chanhee Lee, Hyeonmo Kim, Tin Truong, Philippe Fournier-Viger, Witold Pedrycz, Unil Yun
Summary: In this paper, we propose an approach called HOMI for mining high occupancy patterns on incremental databases. The experimental results demonstrate that HOMI outperforms other methods.
Article
Computer Science, Information Systems
Shafiul Alom Ahmed, Bhabesh Nath
Summary: The paper introduces an approach to pattern mining called Improved Frequent Pattern Growth, which constructs an Improved FP-tree data structure and introduces a layout of Conditional FP-tree for efficient generation of frequent patterns. The experimental results highlight the significance of the proposed Improved FP-Growth algorithm over traditional frequent itemset mining algorithms.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
Razieh Davashi
Summary: In this paper, a fast method called ITUFP is proposed for interactive mining of Top-K uncertain frequent patterns (UFPs). The method efficiently stores and extracts pattern information by creating UP-Lists and IMCUP-Lists, and only updates the IMCUP-Lists when the K value changes. Experimental results demonstrate that the proposed method is very efficient for interactive mining of Top-K UFPs.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Md. Tanvir Alam, Amit Roy, Chowdhury Farhan Ahmed, Md. Ashraful Islam, Carson K. Leung
Summary: This study proposes a complete framework for utility-based graph pattern mining, introducing the UGMINE algorithm and the RMU pruning technique. Experimental results demonstrate the effectiveness of this framework in extracting high utility subgraph patterns.
APPLIED INTELLIGENCE
(2023)
Article
Computer Science, Interdisciplinary Applications
Fernando Hidalgo-Mompean, Juan Francisco Gomez Fernandez, Gonzalo Cerruela-Garcia, Adolfo Crespo Marquez
Summary: This paper explores the feature selection problem in compressor failure detection using machine learning models. It evaluates the impact of various feature selection ranking methods on the development of diagnostic models for rod drop failure.
COMPUTERS IN INDUSTRY
(2021)
Article
Automation & Control Systems
Razieh Davashi
Summary: This study proposes an efficient method based on an upper bound approach to mine uncertain frequent patterns, reducing false positives significantly by tightening the upper bound of expected support and early pruning of infrequent 2-itemsets and their supersets.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2021)
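The expected support mentioned in the summary above can be illustrated with a minimal sketch. It assumes the common model in which each transaction maps items to independent existence probabilities; the function name `expected_support` and the toy database are hypothetical and are not taken from the paper, and the paper's upper-bound technique is not reproduced here.

```python
from math import prod

def expected_support(database, itemset):
    """Sum, over all transactions, of the probability that the whole
    itemset occurs, assuming item probabilities are independent
    (an assumption of this sketch)."""
    return sum(
        prod(transaction.get(item, 0.0) for item in itemset)
        for transaction in database
    )

# Hypothetical uncertain database: each transaction maps item -> probability.
db = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0},
    {"a": 0.8, "b": 0.5},
]

support_a = expected_support(db, {"a"})
support_ab = expected_support(db, {"a", "b"})
```

Because every probability is at most 1, the expected support of an itemset never exceeds that of any of its subsets, which is why an infrequent 2-itemset lets all of its supersets be pruned.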
Article
Computer Science, Information Systems
Nikolaj Tatti
INFORMATION PROCESSING LETTERS
(2019)
Article
Computer Science, Information Systems
Nikolaj Tatti
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
(2019)
Article
Computer Science, Artificial Intelligence
Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis
Summary: This paper investigates the problem of determining entity activity based on interactions, proposing two formulations and efficient algorithms for untangling networks. While the sum problem is shown to be NP-hard, the max problem can be solved optimally in linear time. In cases of multiple activity intervals per entity, both formulations are proved to be inapproximable but efficient algorithms based on alternative optimization are proposed. Evaluation on synthetic and real-world datasets supports the validity of concepts and performance of algorithms.
DATA MINING AND KNOWLEDGE DISCOVERY
(2021)
Article
Computer Science, Artificial Intelligence
Nikolaj Tatti
Summary: The paper discusses three algorithms for maintaining performance measures of classifiers in machine learning, including AUC and H-measure based on ROC curve. AUC can be updated in O(log n) time by maintaining sorted data points in a search tree. For H-measure, the convex hull can be maintained using a modified convex hull maintenance algorithm, and the measure can be computed or estimated in varying time complexities based on certain conditions. Empirical results show that the proposed methods are significantly faster than baseline approaches.
Article
Computer Science, Artificial Intelligence
Guangyi Zhang, Nikolaj Tatti, Aristides Gionis
Summary: Submodular maximization is fundamental in many important machine learning problems and has various applications. However, the study of maximizing submodular functions has often been limited to selecting a set of items, while many real-world applications require a ranking solution. This paper introduces a novel formulation for ranking items with submodular valuations and budget constraints, and proposes practical algorithms with approximation guarantees for different types of budget constraints. The empirical evaluation shows that the proposed algorithms outperform strong baselines.
DATA MINING AND KNOWLEDGE DISCOVERY
(2022)
Article
Zoology
Kari Lintulaakso, Nikolaj Tatti, Indre Zliobaite
Summary: We propose a quantitative approach for categorising mammalian diets based on the taxonomy of food items and parts consumed. Our analysis reveals associations between dental complexity and the concentrations of certain nutrients. This study not only provides a data foundation for future comparative research, but also offers publicly available large-scale dietary data.
Article
Computer Science, Artificial Intelligence
Nikolaj Tatti
Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. The (k, h)-core is a natural extension of the k-core, where each node must have at least k nodes that can be reached within a distance of h. However, the (k, h)-core decomposition has a significantly increased computational complexity compared to the standard core decomposition. In this paper, a randomized algorithm is proposed to approximate the (k, h)-core decomposition, based on sampling the neighborhoods of nodes.
KNOWLEDGE AND INFORMATION SYSTEMS
(2023)
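The (k, h)-core definition in the summary above can be made concrete with a naive exact peeling sketch: repeatedly drop any node that reaches fewer than k other remaining nodes within distance h. This is only an illustration of the definition and its cost, not the randomized sampling algorithm the paper proposes; the names `h_neighborhood_size` and `kh_core` and the toy graph are hypothetical.

```python
from collections import deque

def h_neighborhood_size(adj, alive, source, h):
    """Count nodes other than `source` reachable within h hops,
    using only nodes that are still in `alive`."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == h:
            continue  # do not expand beyond distance h
        for nxt in adj[node]:
            if nxt in alive and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return len(seen) - 1

def kh_core(adj, k, h):
    """Return the node set of the (k, h)-core by naive peeling."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if h_neighborhood_size(adj, alive, node, h) < k:
                alive.discard(node)
                changed = True
    return alive

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
```

With h = 1 this reduces to the standard k-core. Each pass runs one bounded breadth-first search per remaining node, which hints at why exact computation becomes expensive for larger h.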
Article
Computer Science, Artificial Intelligence
Nikolaj Tatti
Summary: Matrix decomposition is widely used in machine learning for dimension reduction or visualization. In this study, we focus on decomposing a matrix X of size n x m into a product WS, where S is a matrix of size n x k with consecutive ones property. We propose five different algorithms to solve the problem and compare them experimentally in terms of decomposition quality and computational time. The results show that our algorithms can produce interpretable results in practical time.
DATA MINING AND KNOWLEDGE DISCOVERY
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Guangyi Zhang, Nikolaj Tatti, Aristides Gionis
Summary: This paper investigates the problem of robust submodular maximization against unexpected deletions and proposes a single-pass streaming algorithm and an offline algorithm, demonstrating their superior performance in real-life applications.
2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Chamalee Wickrama Arachchi, Nikolaj Tatti
Summary: This paper explores the method of modeling recurrent activity in temporal networks. The stochastic block model is used as a starting point and the edges are modeled with a Poisson process. Experimental results demonstrate the effectiveness of the algorithm and reveal the existence of recurrent behavior in certain real-world networks.
DISCOVERY SCIENCE (DS 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Iiro Kumpulainen, Nikolaj Tatti
Summary: This paper studies the problem of finding dense subgraphs that can be explained with edge labels. It shows that greedy heuristics can efficiently find both conjunctive-induced and disjunctive-induced dense subgraphs. Experimental results demonstrate the ability to find interpretable subgraphs in synthetic graphs and real-world networks.
DISCOVERY SCIENCE (DS 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Nikolaj Tatti
Summary: Core decomposition is a classic technique for discovering densely connected regions in a graph. While the (k,h)-core extension increases computational complexity, a randomized algorithm can provide an approximation of the decomposition in a shorter time. Sample-based approximation complements exact computation and is especially useful for networks where exact computation is slow.
2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021)
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Nikolaj Tatti
Summary: This study discusses the calculation of confidence intervals and bands for time series, aiming to detect abnormal time series by minimizing the area enveloping k time series. Despite being NP-hard, optimal solutions for different k can be found by optimizing different band regions.
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Nikolaj Tatti
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I
(2020)
Proceedings Paper
Computer Science, Artificial Intelligence
Polina Rozenshtein, Francesco Bonchi, Aristides Gionis, Mauro Sozio, Nikolaj Tatti
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)
(2018)