Article

An empirical evaluation of high utility itemset mining algorithms

EXPERT SYSTEMS WITH APPLICATIONS (2018)

Volume 101, Pages 91-115

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2018.02.008

Keywords

Itemset mining; High utility itemsets; State-of-the-art high utility itemset mining

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science

Funding

National Science Foundation of China (NSFC) [41401466]
National Key Technology R&D Program of China [2015BAK01B06]
Henan University [xxjc20140005]

Abstract

High utility itemset mining (HUIM) has emerged as an important research topic in data mining, with applications in retail-market data analysis, stock market prediction, and recommender systems. However, very few empirical studies systematically compare the performance of state-of-the-art HUIM algorithms. In this paper, we present an experimental evaluation of 10 major HUIM algorithms, using 9 real-world and 27 synthetic datasets. Our experiments show that EFIM and d2HUP are generally the top two performers in running time, while EFIM also consumes the least memory in most cases. To compare these two algorithms in depth, we use another 45 synthetic datasets with varying parameters to study how the number of transactions, the number of distinct items, and the average transaction length influence the running time and memory consumption of EFIM and d2HUP. We demonstrate that d2HUP is faster than EFIM under low minimum utility values and on large sparse datasets; although EFIM is the fastest on dense real datasets, it is among the slowest algorithms on sparse datasets. We suggest that d2HUP should be chosen when a dataset is very sparse or the average transaction length is large, and running time is favoured over memory consumption. Finally, we compare d2HUP and EFIM with two newer algorithms, mHUIMiner and ULB-Miner, and find that these two algorithms have moderate performance. This work has reference value for researchers and practitioners choosing the most appropriate HUIM algorithm for their specific applications. (C) 2018 Elsevier Ltd. All rights reserved.
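
For readers new to the task, the sketch below illustrates what a high utility itemset is and how the dataset density mentioned in the abstract can be measured. It is not taken from the paper and is not any of the evaluated algorithms: it is a brute-force enumeration under the standard definition (the utility of an itemset is the sum of quantity times unit profit over every transaction that contains all of its items). The toy transactions, the unit profits, the threshold of 19, and the function names `utility`, `brute_force_huis`, and `density` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"a": 1, "b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profits (the "external utility" of each item).
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset, db, profit):
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    total = 0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def brute_force_huis(db, profit, min_util):
    """Enumerate every itemset and keep those whose utility reaches min_util.

    Exponential in the number of distinct items, so only usable on toy data.
    """
    items = sorted({item for t in db for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, db, profit)
            if u >= min_util:
                result[itemset] = u
    return result


def density(db):
    """Average transaction length divided by the number of distinct items."""
    n_items = len({item for t in db for item in t})
    avg_len = sum(len(t) for t in db) / len(db)
    return avg_len / n_items


if __name__ == "__main__":
    print(brute_force_huis(transactions, unit_profit, min_util=19))
    print(density(transactions))
```

On this toy data the itemset {a, b} has utility 16 and misses the threshold, while its superset {a, b, c} reaches 19. Utility is therefore not anti-monotone, which is why HUIM algorithms cannot simply reuse the support-based pruning of frequent itemset mining.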

Recommended

Article Computer Science, Artificial Intelligence

Efficient algorithms for mining closed high utility itemsets in dynamic profit databases

Trinh D. D. Nguyen, Loan T. T. Nguyen, Lung Vu, Bay Vo, Witold Pedrycz

Summary: This research addresses the problem of mining high-utility itemsets in dynamic unit profit databases, introducing a novel algorithm iEFIM-Closed that outperforms state-of-the-art algorithms in sparse databases.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

Mining High Utility Itemsets Using Prefix Trees and Utility Vectors

Jun-Feng Qu, Philippe Fournier-Viger, Mengchi Liu, Bo Hang, Chunyang Hu

Summary: This paper proposes a new algorithm called Hamm for mining high utility itemsets. Hamm utilizes a TV (prefix Tree and utility Vector) structure to mine high utility itemsets in a one-phase manner without candidate generation. Experimental results show that Hamm outperforms other algorithms in terms of performance.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Efficient algorithms for mining closed and maximal high utility itemsets

Hai Duong, Tien Hoang, Thong Tran, Tin Truong, Bac Le, Philippe Fournier-Viger

Summary: Closed high utility itemsets (CHUIs) and maximal high utility itemsets (MaxHUIs) are important concise representations of HUIs. Mining these representations is crucial for generating meaningful high utility association rules. However, existing algorithms suffer from long runtimes, high memory usage, and scalability issues. To address this, this paper proposes two efficient algorithms that can mine these representations faster.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

An efficient method for mining High-Utility itemsets from unstable negative profit databases

N. T. Tung, Trinh D. D. Nguyen, Loan T. T. Nguyen, Bay Vo

Summary: The study of High-Utility Itemset Mining (HUIM) and Frequent Itemset Mining (FIM) is crucial as it explains consumer behavior and provides actionable advice for improving business results. This paper presents strategies for making database scanning more efficient and reducing the number of candidates using strict upper-bound approaches. It also introduces a novel algorithm to efficiently solve the problem.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Information Systems

H-FHAUI: Hiding frequent high average utility itemsets

Bac Le, Tin Truong, Hai Duong, Philippe Fournier-Viger, Hamido Fujita

Summary: High average-utility itemset mining aims to identify sets of items with high average utility through analyzing a quantitative customer transactional database. To address the issue of sensitive information exposure, this study investigates the problem of hiding frequent high average-utility itemsets (FHAUIs) and proposes an algorithm named H-FHAUI. Experimental results demonstrate that H-FHAUI outperforms the baseline approach in terms of performance.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

HDSHUI-miner: a novel algorithm for discovering spatial high-utility itemsets in high-dimensional spatiotemporal databases

Rage Uday Kiran, Pamalla Veena, Penugonda Ravikumar, Bathala Venus Vikranth Raj, Minh-Son Dao, Koji Zettsu, Sai Chithra Bommisetti

Summary: Spatial high-utility itemset (SHUI) mining is an important data analysis technique that aims to locate geographically interesting itemsets with high utility in a spatiotemporal database. However, the existing SHUI-Miner algorithm has performance issues when dealing with high-dimensional spatiotemporal databases. This paper proposes a novel algorithm called high-dimensional SHUI-miner (HDSHUI-Miner) that outperforms SHUI-Miner in terms of memory consumption, runtime, and scalability, as demonstrated by experimental results on seven real-world databases. Two real-world case studies are also presented to illustrate the usefulness of the proposed algorithm.

APPLIED INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

High average-utility itemsets mining: a survey

Kuldeep Singh, Rajiv Kumar, Bhaskar Biswas

Summary: HUIM and HAUIM are subdivisions of data mining that focus on obtaining promising patterns in quantitative datasets, with applications in market analysis, bioinformatics, text mining, network analysis, product recommendation, and e-learning.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

High-utility itemsets mining based on binary particle swarm optimization with multiple adjustment strategies

Wei Fang, Qiang Zhang, Hengyang Lu, Jerry Chun-Wei Lin

Summary: This study proposes an improved binary particle swarm optimization (HUIM-IBPSO) for high-utility itemset mining (HUIM), addressing the exponentially growing search space and the time-consuming process of traditional exact algorithms. The proposed approach incorporates multiple adjustment strategies to keep the same HUIs, enhance search ability, avoid premature convergence, and improve efficiency in mining HUIs.

APPLIED SOFT COMPUTING (2022)

Article Computer Science, Artificial Intelligence

FCHM-stream: fast closed high utility itemsets mining over data streams

Muhang Li, Meng Han, Zhiqiang Chen, Hongxin Wu, Xilong Zhang

Summary: The high-speed and continuous nature of data streams poses challenges in mining high utility itemsets in limited memory space. In order to overcome these challenges and provide users with concise and lossless results, a new closed high utility pattern mining algorithm over data stream is proposed, named FCHM-Stream.

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Mining fuzzy high average-utility itemsets using fuzzy utility lists and efficient pruning approach

Manijeh Hajihoseini, Mohammad Karim Sohrabi

Summary: The FHAUI mining problem considers the effect of itemset length on the calculated utility, avoiding large sets containing low utility items. The HiFAM method is proposed to efficiently explore FHAUIs by extending the MHAI method and introducing the FAUL structure. Through a depth-first exploration process, the complete set of FHAUIs can be extracted. Pruning techniques are also used to reduce memory consumption and time complexity.

SOFT COMPUTING (2022)

Article Computer Science, Software Engineering

Mining high utility itemsets with time-aware scheduling using Apache Spark

Anup Brahmavar, Harish Venkatarama, Geetha Maiya

Summary: Market Basket Analysis, considering purchase quantity and unit profit, has been boosted by the increase in revenue information. However, existing algorithms' performance degrades as databases grow, and distributed computing solutions like Apache Hadoop and Apache Spark have proven effective in solving this problem. This study develops a parallel workflow on a Spark cluster to improve algorithm efficiency, and experimental evaluation demonstrates its superiority.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Article Computer Science, Artificial Intelligence

Generalized maximal utility for mining high average-utility itemsets

Wei Song, Lu Liu, Chaomin Huang

Summary: The paper introduces an efficient HAUI mining algorithm, HAUIM-GMU, based on generalized maximal utility for mining high average-utility itemsets. The algorithm proposes a new pruning strategy that uses the concept of support to filter out unpromising itemsets effectively, and it outperforms existing state-of-the-art algorithms according to extensive experimental results.

KNOWLEDGE AND INFORMATION SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

An efficient method for mining multi-level high utility Itemsets

N. T. Tung, Loan T. T. Nguyen, Trinh D. D. Nguyen, Bay Vo

Summary: High-utility itemset mining is an effective tool for analyzing customer behavior by identifying the most beneficial itemsets in transaction databases. Traditional algorithms often overlook categorization of items, leading to a limitation in discovering important patterns at higher levels.

APPLIED INTELLIGENCE (2022)

Article Multidisciplinary Sciences

Ignoring Internal Utilities in High-Utility Itemset Mining

Damla Oguz

Summary: High-utility itemset mining aims to discover sets of items that are sold together with utility values that exceed a minimum threshold. The method considers internal and external utility values of the itemsets, and their symmetric effects in determining high-utility itemsets. A proposed asymmetric approach focuses on high external utility values while ignoring internal utility values, leading to more efficient discovery and highlighting the fundamental impact of external utility values.

SYMMETRY-BASEL (2022)

Article Computer Science, Artificial Intelligence

Parallel approaches to extract multi-level high utility itemsets from hierarchical transaction databases

Trinh D. D. Nguyen, N. T. Tung, Thiet Pham, Loan T. T. Nguyen

Summary: In the field of data mining, high utility itemset mining (HUIM) is a relevant task for analyzing customer transaction databases. Exploiting frequently purchased items that yield high profit value, HUIM provides useful insights into customer behaviors. This work introduces three new efficient strategies and proposes two new algorithms, MCML+ and MCML++, to significantly improve the performance of multi-level high utility itemset mining using multicore processing. Extensive experiments show that the proposed algorithms outperform previous approaches in terms of running time and scalability.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

An up-to-date comparison of state-of-the-art classification algorithms

Chongsheng Zhang, Changchang Liu, Xiangliang Zhang, George Almpanidis

EXPERT SYSTEMS WITH APPLICATIONS (2017)

Article Acoustics

Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations

George Almpanidis, Margarita Kotti, Constantine Kotropoulos

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2009)

Article Acoustics

Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion

George Almpanidis, Constantine Kotropoulos

SPEECH COMMUNICATION (2008)

Article Computer Science, Artificial Intelligence

On Incremental Learning for Gradient Boosting Decision Trees

Chongsheng Zhang, Yuan Zhang, Xianjin Shi, George Almpanidis, Gaojuan Fan, Xiajiong Shen

NEURAL PROCESSING LETTERS (2019)

Article Computer Science, Artificial Intelligence

An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Chongsheng Zhang, Paolo Soda, Jingjun Bi, Gaojuan Fan, George Almpanidis, Salvador Garcia, Weiping Ding

Summary: This study investigates the performance of feature selection and data resampling in two opposite imbalanced classification frameworks, and suggests that both frameworks should be considered for finding the best performing imbalanced classification model. The study also examines the impact of classifiers, IR, and SFR on the performance of imbalance classification.

APPLIED INTELLIGENCE (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Parallel High Utility Itemset Mining

Gaojuan Fan, Huaiyuan Xiao, Chongsheng Zhang, George Almpanidis, Philippe Fournier-Viger, Hamido Fujita

Summary: Association rule mining is a popular task in data mining for discovering relationships between co-occurring itemsets in a transactional database. The current algorithms for association rule mining are inefficient when dealing with large volumes of data, and high utility itemset mining (HUIM) has emerged as a popular solution. This paper investigates parallel HUIM algorithms and adapts them for parallel processing using the Apache Spark platform, resulting in improved efficiency.

ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND PRACTICES IN ARTIFICIAL INTELLIGENCE (2022)

Article Computer Science, Information Systems

Gridvoronoi: An Efficient Spatial Index for Nearest Neighbor Query Processing

Chongsheng Zhang, George Almpanidis, Faegheh Hasibi, Gaojuan Fan

IEEE ACCESS (2019)

Article Computer Science, Information Systems

Combining text and link analysis for focused crawling - An application for vertical search engines

G. Almpanidis, C. Kotropoulos, I. Pitas

INFORMATION SYSTEMS (2007)

Article Computer Science, Artificial Intelligence

Language identification in web documents using discrete HMMs

A Xafopoulos, C Kotropoulos, G Almpanidis, I Pitas

PATTERN RECOGNITION (2004)

Review Computer Science, Artificial Intelligence

A comprehensive review of slope stability analysis based on artificial intelligence methods

Wei Gao, Shuangshuang Ge

Summary: This study provides a comprehensive review of slope stability research based on artificial intelligence methods, focusing on slope stability computation and evaluation. The review covers studies using quasi-physical intelligence methods, simulated evolutionary methods, swarm intelligence methods, hybrid intelligence methods, artificial neural network methods, vector machine methods, and other intelligence methods. The merits, demerits, and state-of-the-art research advancement of these studies are analyzed, and possible research directions for slope stability investigation based on artificial intelligence methods are suggested.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Machine learning approaches for lateral strength estimation in squat shear walls: A comparative study and practical implications

Khuong Le Nguyen, Hoa Thi Trinh, Saeed Banihashemi, Thong M. Pham

Summary: This study investigated the influence of input parameters on the shear strength of RC squat walls and found that ensemble learning models, particularly XGBoost, can effectively predict the shear strength. The axial load had a greater influence than reinforcement ratio, and longitudinal reinforcement had a more significant impact compared to horizontal and vertical reinforcement. The XGBoost model outperforms traditional design models, and reducing the input features still yields reliable predictions.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

DHESN: A deep hierarchical echo state network approach for algal bloom prediction

Bo Hu, Huiyan Zhang, Xiaoyi Wang, Li Wang, Jiping Xu, Qian Sun, Zhiyao Zhao, Lei Zhang

Summary: A deep hierarchical echo state network (DHESN) is proposed to address the limitations of shallow coupled structures. By using transfer entropy, candidate variables with strong causal relationships are selected and a hierarchical reservoir structure is established to improve prediction accuracy. Simulation results demonstrate that DHESN performs well in predicting algal bloom.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Learning high-dependence Bayesian network classifier with robust topology

Limin Wang, Lingling Li, Qilong Li, Kuo Li

Summary: This paper discusses the urgency of learning complex multivariate probability distributions due to the increase in data variability and quantity. It introduces a highly scalable classifier called TAN, which utilizes maximum weighted spanning tree (MWST) for graphical modeling. The paper theoretically proves the feasibility of extending one-dependence MWST to model high-dependence relationships and proposes a heuristic search strategy to improve the fitness of the extended topology to data. Experimental results demonstrate that this algorithm achieves a good bias-variance tradeoff and competitive classification performance compared to other high-dependence or ensemble learning algorithms.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Make a song curative: A spatio-temporal therapeutic music transfer model for anxiety reduction

Zhejing Hu, Gong Chen, Yan Liu, Xiao Ma, Nianhong Guan, Xiaoying Wang

Summary: Anxiety is a prevalent issue and music therapy has been found effective in reducing anxiety. To meet the diverse needs of individuals, a novel model called the spatio-temporal therapeutic music transfer model (StTMTM) is proposed.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A modified reverse-based analysis logic mining model with Weighted Random 2 Satisfiability logic in Discrete Hopfield Neural Network and multi-objective training of Modified Niched Genetic Algorithm

Nur Ezlin Zamri, Mohd. Asyraf Mansor, Mohd Shareduwan Mohd Kasihmuddin, Siti Syatirah Sidik, Alyaa Alway, Nurul Atiqah Romli, Yueling Guo, Siti Zulaikha Mohd Jamaludin

Summary: In this study, a hybrid logic mining model was proposed by combining the logic mining approach with the Modified Niche Genetic Algorithm. This model improves the generalizability and storage capacity of the retrieved induced logic. Various modifications were made to address other issues. Experimental results demonstrate that the proposed model outperforms baseline methods in terms of accuracy, precision, specificity, and correlation coefficient.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

On taking advantage of opportunistic meta-knowledge to reduce configuration spaces for automated machine learning

David Jacob Kedziora, Tien-Dung Nguyen, Katarzyna Musial, Bogdan Gabrys

Summary: The paper addresses the problem of efficiently optimizing machine learning solutions by reducing the configuration space of ML pipelines and leveraging historical performance. The experiments conducted show that opportunistic/systematic meta-knowledge can improve ML outcomes, and configuration-space culling is optimal when balanced. The utility and impact of meta-knowledge depend on various factors and are crucial for generating informative meta-knowledge bases.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Optimal location for an EVPL and capacitors in grid for voltage profile and power loss: FHO-SNN approach

G. Sophia Jasmine, Rajasekaran Stanislaus, N. Manoj Kumar, Thangamuthu Logeswaran

Summary: In the context of a rapidly expanding electric vehicle market, this research investigates the ideal locations for EV charging stations and capacitors in power grids to enhance voltage stability and reduce power losses. A hybrid approach combining the Fire Hawk Optimizer and Spiking Neural Network is proposed, which shows promising results in improving system performance. The optimization approach has the potential to enhance the stability and efficiency of electric grids.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

NLP-based approach for automated safety requirements information retrieval from project documents

Zhijiang Wu, Guofeng Ma

Summary: This study proposes a natural language processing-based framework for requirement retrieval and document association, which can help to mine and retrieve documents related to project managers' requirements. The framework analyzes the ontology relevance and emotional preference of requirements. The results show that the framework performs well in terms of iterations and threshold, and there is a significant matching between the retrieved documents and the requirements, which has significant managerial implications for construction safety management.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Dog nose-print recognition based on the shape and spatial features of scales

Yung-Kuan Chan, Chuen-Horng Lin, Yuan-Rong Ben, Ching-Lin Wang, Shu-Chun Yang, Meng-Hsiun Tsai, Shyr-Shen Yu

Summary: This study proposes a novel method for dog identification using nose-print recognition, which can be applied to controlling stray dogs, locating lost pets, and pet insurance verification. The method achieves high recognition accuracy through two-stage segmentation and feature extraction using a genetic algorithm.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Fostering supply chain resilience for omni-channel retailers: A two-phase approach for supplier selection and demand allocation under disruption risks

Shaohua Song, Elena Tappia, Guang Song, Xianliang Shi, T. C. E. Cheng

Summary: This study aims to optimize supplier selection and demand allocation decisions for omni-channel retailers in order to achieve supply chain resilience. It proposes a two-phase approach that takes into account various factors such as supplier evaluation and demand allocation.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Accelerating Benders decomposition approach for shared parking spaces allocation considering parking unpunctuality and no-shows

Jinyan Hu, Yanping Jiang

Summary: This paper examines the allocation problem of shared parking spaces considering parking unpunctuality and no-shows. It proposes an effective approach using sample average approximation (SAA) combined with an accelerating Benders decomposition (ABD) algorithm to solve the problem. The numerical experiments demonstrate the significance of supply-demand balance for the operation and user satisfaction of the shared parking system.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Financial fraud detection using graph neural networks: A systematic review

Soroor Motie, Bijan Raahemi

Summary: Financial fraud is a persistent problem in the finance industry, but Graph Neural Networks (GNNs) have emerged as a powerful tool for detecting fraudulent activities. This systematic review provides a comprehensive overview of the current state-of-the-art technologies in using GNNs for financial fraud detection, identifies gaps and limitations in existing research, and suggests potential directions for future research.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Occluded person re-identification with deep learning: A survey and perspectives

Enhao Ning, Changshuo Wang, Huang Zhang, Xin Ning, Prayag Tiwari

Summary: This review provides a detailed overview of occluded person re-identification methods and conducts a systematic analysis and comparison of existing deep learning-based approaches. It offers important theoretical and practical references for future research in the field.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A hierarchical attention detector for bearing surface defect detection

Jiajun Ma, Songyu Hu, Jianzhong Fu, Gui Chen

Summary: The article presents a novel visual hierarchical attention detector for multi-scale defect location and classification, utilizing texture, semantic, and instance features of defects through a hierarchical attention mechanism, achieving multi-scale defect detection in bearing images with complex backgrounds.

EXPERT SYSTEMS WITH APPLICATIONS (2024)
