Article

Computing exact permutation p-values for association rules

Journal

INFORMATION SCIENCES
Volume 346, Pages 146-162

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2016.01.094

Keywords

Association rule mining; Statistical significance testing; Permutation testing; Exact permutation p-value

Funding

  1. Natural Science Foundation of China [61572094, 61501389]
  2. Fundamental Research Funds for the Central Universities of China [DUT14QY07]
  3. Hong Kong Research Grant Council [HKBU_22302815]
  4. Hong Kong Baptist University [FRG2/14-15/069]

Association rule mining is an important task in data mining, and many efficient algorithms have been proposed to address it. However, a large portion of the rules reported by these algorithms satisfy the user-defined constraints purely by accident. Such rules are not statistically meaningful and should be filtered out through statistical significance testing. In the context of association rule discovery, the permutation-based approach can achieve better performance than other competing methods, although several drawbacks limit its usability. In this paper, we analyze these drawbacks and propose an algorithm called Exact Permutation p-values for Association Rules (EPAR) to calculate the exact p-values of all tested rules. Experiments on different types of data sets demonstrate that EPAR successfully alleviates these drawbacks and outperforms the direct permutation-based method on several performance measures. (C) 2016 Elsevier Inc. All rights reserved.
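The abstract does not spell out how EPAR works, so what follows is only an illustrative sketch of the general idea of an exact permutation p-value. It is not the paper's algorithm. The sketch makes three assumptions:

- It considers a single rule X -> Y.
- It uses the number of transactions containing both X and Y as the test statistic.
- It permutes the Y column across transactions, which keeps the supports of X and Y fixed.

Under these assumptions the permutation distribution of the joint count is hypergeometric, so the exact tail probability can be summed directly rather than estimated by random shuffling. The sketch does not attempt any adjustment across all tested rules. The function names and the toy data are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def exact_permutation_p_value(n, n_x, n_y, n_xy):
    """Exact one-sided p-value P(K >= n_xy) for a single rule X -> Y.

    Permuting the Y column keeps n_x and n_y fixed, so the joint count K of
    transactions containing both X and Y is hypergeometric and the tail can
    be summed in closed form (exact rational arithmetic via Fraction).
    """
    tail = sum(
        comb(n_x, j) * comb(n - n_x, n_y - j)
        for j in range(n_xy, min(n_x, n_y) + 1)
    )
    return Fraction(tail, comb(n, n_y))


def enumerated_p_value(has_x, n_y, n_xy):
    """Brute-force check for tiny data: try every placement of the n_y
    Y-labels and count the placements whose joint count is >= n_xy."""
    hits = total = 0
    for y_rows in combinations(range(len(has_x)), n_y):
        total += 1
        if sum(has_x[i] for i in y_rows) >= n_xy:
            hits += 1
    return Fraction(hits, total)


def monte_carlo_p_value(has_x, has_y, n_perm=10_000, seed=0):
    """Direct permutation estimate: shuffle the Y column n_perm times."""
    rng = random.Random(seed)
    observed = sum(x and y for x, y in zip(has_x, has_y))
    shuffled = list(has_y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(x and y for x, y in zip(has_x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 12 transactions, X in 6, Y in 5, both in 4.
has_x = [True] * 6 + [False] * 6
has_y = [True] * 4 + [False] * 2 + [True] + [False] * 5
n_xy = sum(x and y for x, y in zip(has_x, has_y))
exact = exact_permutation_p_value(len(has_x), sum(has_x), sum(has_y), n_xy)
assert exact == enumerated_p_value(has_x, sum(has_y), n_xy)
print(exact, float(exact), monte_carlo_p_value(has_x, has_y))
```

For the toy data, the hypergeometric tail is (C(6,4)·C(6,1) + C(6,5)·C(6,0)) / C(12,5) = (90 + 6) / 792 = 4/33, about 0.121.

The enumeration function is feasible only for tiny inputs. It is included to confirm that the closed-form tail matches a literal count over all Y-label placements. The Monte Carlo function illustrates the sampling noise that comes with estimating the same quantity by random shuffling.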

Recommended

Article Biochemistry & Molecular Biology

A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies

Xingjie Shi, Xiaoran Chai, Yi Yang, Qing Cheng, Yuling Jiao, Haoyue Chen, Jian Huang, Can Yang, Jin Liu

NUCLEIC ACIDS RESEARCH (2020)

Article Genetics & Heredity

Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies

Boran Gao, Can Yang, Jin Liu, Xiang Zhou

Summary: The new computational method GECKO improves the accuracy of estimating genetic and environmental covariances in GWAS, revealing shared genetic and environmental structures between traits and aiding in the investigation of causal relationships. Compared to traditional methods, GECKO provides more accurate estimates and identifies significant genetic and environmental covariances, demonstrating a twofold power gain in analyzing trait pairs.

PLOS GENETICS (2021)

Article Biochemical Research Methods

XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis

Jiashun Xiao, Mingxuan Cai, Xianghong Hu, Xiang Wan, Gang Chen, Can Yang

Summary: This article presents a cross-population and cross-phenotype method for constructing accurate polygenic risk scores (PRSs) in under-represented populations. By leveraging datasets from European populations and genetically correlated phenotypes, this method improves the accuracy of PRSs in non-European populations and enhances disease prediction and prevention in personalized medicine.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Significance-Based Essential Protein Discovery

Yan Liu, Hao Liang, Quan Zou, Zengyou He

Summary: The identification of essential proteins is an important problem in bioinformatics. Existing methods have limitations in providing context-free and easily interpretable quantifications of centrality values, specifying proper thresholds, and controlling the quality of reported essential proteins. To overcome these limitations, this study formulates the essential protein discovery problem as a multiple hypothesis testing problem and presents a significance-based method named SigEP. Experimental results demonstrate that SigEP outperforms competing algorithms.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

Essential Protein Recognition via Community Significance

Yan Liu, Wenfang Chen, Zengyou He

Summary: The study introduces a new significance-based essential protein recognition method named EPCS, which outperforms current state-of-the-art essential protein identification methods as well as SigEP, the only previous significance-based method.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Article Multidisciplinary Sciences

On the statistical significance of communities from weighted graphs

Zengyou He, Wenfang Chen, Xiaoqi Wei, Yan Liu

Summary: Community detection is a fundamental procedure in analyzing network data and the definition of a community remains a topic of debate. This study presents a new formulation for testing the realness of communities in weighted networks by modeling edge-weights as censored observations. By conducting Logrank tests on internal and external weight sets, the method outperforms existing evaluation metrics in individual community validation.

SCIENTIFIC REPORTS (2021)

Article Mathematical & Computational Biology

scPI: A Scalable Framework for Probabilistic Inference in Single-Cell RNA-Sequencing Data Analysis

Jingsi Ming, Jia Zhao, Can Yang

Summary: The technique of single-cell RNA-sequencing has allowed researchers to explore the cellular heterogeneity of complex tissues. In this study, a scalable framework called scPI was proposed to analyze scRNA-seq data. The scPI framework utilizes amortized variational inference and a nonlinear neural network to infer the low-dimensional representations of the data. Through analysis of real datasets, it was demonstrated that scPI can effectively handle various probabilistic models for scRNA-seq data in terms of scalability, missing value imputation, and cell type clustering.

STATISTICS IN BIOSCIENCES (2023)

Article Multidisciplinary Sciences

Mendelian randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics

Xianghong Hu, Jia Zhao, Zhixiang Lin, Yang Wang, Heng Peng, Hongyu Zhao, Xiang Wan, Can Yang

Summary: Mendelian randomization (MR) is a valuable tool for inferring causal relationships among traits using summary statistics from GWASs, but existing methods often rely on strong assumptions leading to false-positive findings. Research has shown that considering pleiotropy and sample structure is crucial for reducing confounding effects.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2022)

Letter Biochemistry & Molecular Biology

Organoid-based single-cell spatiotemporal gene expression landscape of human embryonic development and hematopoiesis

Yiming Chao, Yang Xiang, Jiashun Xiao, Weizhong Zheng, Mo R. Ebrahimkhani, Can Yang, Angela Ruohao Wu, Pentao Liu, Yuanhua Huang, Ryohichi Sugimura

SIGNAL TRANSDUCTION AND TARGETED THERAPY (2023)

Article Biochemical Research Methods

PALM: a powerful and adaptive latent model for prioritizing risk variants with functional annotations

Xinyi Yu, Jiashun Xiao, Mingxuan Cai, Yuling Jiao, Xiang Wan, Jin Liu, Can Yang

Summary: The findings from genome-wide association studies have greatly helped us understand the genetic basis of human complex traits and diseases. However, several major challenges still need to be addressed, including the unknown biological functions of most GWAS hits and the identification of genetic risk variants with weak effects. To overcome these challenges, we propose a powerful and adaptive latent model (PALM) that integrates functional annotations with GWAS summary statistics.

BIOINFORMATICS (2023)

Article Biochemical Research Methods

stVAE deconvolves cell-type composition in large-scale cellular resolution spatial transcriptomics

Chen Li, Ting-Fung Chan, Can Yang, Zhixiang Lin

Summary: The study introduces a method called stVAE, based on the variational autoencoder framework, to deconvolve the cell-type composition of cellular resolution spatial transcriptomic datasets. It accurately identifies spatial patterns of cell types and their relative proportions across spots.

BIOINFORMATICS (2023)

Article Materials Science, Multidisciplinary

Ultralong mean free path phonons in HKUST-1 and their scattering by water adsorbates

Hongzhao Fan, Can Yang, Yanguang Zhou

Summary: Metal-organic frameworks (MOFs) have shown potential in energy storage and thermal management. By studying HKUST-1, a typical MOF, we found that its thermal conductivity is strongly size-dependent and decreases when water molecules are adsorbed. We also identified two thermal energy exchange pathways in HKUST-1 with adsorbed water molecules, and the thermal conductivity varies with the quantity of adsorbates because of the competition between these pathways.

PHYSICAL REVIEW B (2022)

Article Computer Science, Artificial Intelligence

Detecting Statistically Significant Communities

Zengyou He, Hao Liang, Zheng Chen, Can Zhao, Yan Liu

Summary: Community detection is a key data analysis problem, and many algorithms have been proposed for it. However, most existing work does not consider statistical significance. This article presents a tight upper bound on the p-value of a single community and a local search method for detecting statistically significant communities. Experimental results show that its performance is comparable to that of other methods.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Article Computer Science, Artificial Intelligence

A graph-traversal approach to identify influential nodes in a network

Yan Liu, Xiaoqi Wei, Wenfang Chen, Lianyu Hu, Zengyou He

Summary: This method utilizes a breadth-first search tree to generate a curve for calculating the influence score of nodes, demonstrating superiority over widely used centrality measures in various network domains.

PATTERNS (2021)

Article Computer Science, Information Systems

Instance-Based Classification Through Hypothesis Testing

Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou

Summary: This paper presents a generic framework that formulates the binary classification problem as a two-sample testing problem, which is based on instances and hypothesis testing. Experimental results show that the method achieves performance comparable to classic classifiers and outperforms existing testing-based classifiers.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

A consensus model considers managing manipulative and overconfident behaviours in large-scale group decision-making

Xia Liang, Jie Guo, Peide Liu

Summary: This paper investigates a novel consensus model based on social networks to manage manipulative and overconfident behaviors in large-scale group decision-making. A novel clustering model and improved methods are proposed to facilitate consensus reaching, and a feedback mechanism and a management approach are employed to handle decision makers' behaviors. Simulation experiments and comparative analysis demonstrate the effectiveness of the model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

CGN: Class gradient network for the construction of adversarial samples

Xiang Li, Haiwang Guo, Xinyang Deng, Wen Jiang

Summary: This paper proposes a method based on class gradient networks for generating high-quality adversarial samples. By introducing a high-level class gradient matrix and combining classification loss and perturbation loss, the method demonstrates superiority in the transferability of adversarial samples on targeted attacks.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Distinguishing latent interaction types from implicit feedbacks for recommendation

Lingyun Lu, Bang Wang, Zizhuo Zhang, Shenghao Liu

Summary: Many recommendation algorithms rely only on implicit feedback because of privacy concerns. However, the encoding of interaction types is often ignored. This paper proposes a relation-aware neural model that classifies implicit feedback by encoding edges, thereby enhancing recommendation performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Proximity-based density description with regularized reconstruction algorithm for anomaly detection

Jaehong Yu, Hyungrok Do

Summary: This study discusses unsupervised anomaly detection using one-class classification, which determines whether a new instance belongs to the target class by constructing a decision boundary. The proposed method uses a proximity-based density description and a regularized reconstruction algorithm to overcome the limitations of existing one-class classification methods. Experimental results demonstrate the superior performance of the proposed algorithm.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Non-iterative border-peeling clustering algorithm based on swap strategy

Hui Tu, Shifei Ding, Xiao Xu, Haiwei Hou, Chao Li, Ling Ding

Summary: Border-Peeling is a density-based clustering algorithm, but its complexity and its problems with unbalanced datasets restrict its application. This paper proposes a non-iterative border-peeling clustering algorithm that improves clustering performance by distinguishing and associating core points and border points.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A two-stage denoising framework for zero-shot learning with noisy labels

Long Tang, Pan Zhao, Zhigeng Pan, Xingxing Duan, Panos M. Pardalos

Summary: In this work, a two-stage denoising framework (TSDF) is proposed for zero-shot learning (ZSL) to address the issue of noisy labels. The framework includes a tailored loss function to remove suspected noisy-label instances and a ramp-style loss function to reduce the negative impact of remaining noisy labels. In addition, a dynamic screening strategy (DSS) is developed to efficiently handle the nonconvexity of the ramp-style loss.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Selection of a viable blockchain service provider for data management within the internet of medical things: An MCDM approach to Indian healthcare

Raghunathan Krishankumar, Sundararajan Dhruva, Kattur S. Ravichandran, Samarjit Kar

Summary: Health 4.0 is gaining global attention for delivering better healthcare through digital technologies. This study proposes a new decision-making framework for selecting viable blockchain service providers (BSPs) in the Internet of Medical Things (IoMT). The framework addresses limitations of previous studies and demonstrates its applicability in the Indian healthcare sector. The results show the top-ranking BSPs, the importance of various criteria, and the effectiveness of the developed model.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Q-learning with heterogeneous update strategy

Tao Tan, Hong Xie, Liang Feng

Summary: This paper proposes a heterogeneous update idea and designs the HetUp Q-learning algorithm, which enlarges the normalized gap by overestimating the Q-value of the optimal action and underestimating the Q-values of the other actions. Because the optimal action is not known in advance, a softmax strategy is applied to estimate it, resulting in HetUpSoft Q-learning and HetUpSoft DQN. Extensive experimental results show significant improvements over SOTA baselines.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Dyformer: A dynamic transformer-based architecture for multivariate time series classification

Chao Yang, Xianzhi Wang, Lina Yao, Guodong Long, Guandong Xu

Summary: This paper proposes a dynamic transformer-based architecture called Dyformer for multivariate time series classification. Dyformer captures multi-scale features through hierarchical pooling and adaptive learning strategies, and improves model performance by introducing feature-map-wise attention mechanisms and a joint loss function.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

ESSENT: an arithmetic optimization algorithm with enhanced scatter search strategy for automated test case generation

Xiguang Li, Baolu Feng, Yunhe Sun, Ammar Hawbani, Saeed Hammod Alsamhi, Liang Zhao

Summary: This paper proposes an enhanced scatter search strategy, using opposition-based learning, to solve the problem of automated test case generation based on path coverage (ATCG-PC). The proposed ESSENT algorithm selects the path with the lowest path entropy among the uncovered paths as the target path and generates new test cases to cover the target path by modifying the dimensions of existing test cases. Experimental results show that the ESSENT algorithm outperforms other state-of-the-art algorithms, achieving maximum path coverage with fewer test cases.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

An attention based approach for automated account linkage in federated identity management

Shirin Dabbaghi Varnosfaderani, Piotr Kasprzak, Aytaj Badirova, Ralph Krimmel, Christof Pohl, Ramin Yahyapour

Summary: Linking digital accounts belonging to the same user is crucial for security, user satisfaction, and next-generation service development. However, research on account linkage is mainly focused on social networks, and there is a lack of studies in other domains. To address this, we propose SmartSSO, a framework that automates the account linkage process by analyzing user routines and behavior during login processes. Our experiments on a large dataset show that SmartSSO achieves over 98% accuracy in hit-precision.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A memetic algorithm with fuzzy-based population control for the joint order batching and picker routing problem

Renchao Wu, Jianjun He, Xin Li, Zuguo Chen

Summary: This paper proposes a memetic algorithm with fuzzy-based population control (MA-FPC) to solve the joint order batching and picker routing problem (JOBPRP). The algorithm incorporates batch exchange crossover and a two-level local improvement procedure. Experimental results show that MA-FPC outperforms existing algorithms in terms of solution quality.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

Refining one-class representation: A unified transformer for unsupervised time-series anomaly detection

Guoxiang Zhong, Fagui Liu, Jun Jiang, Bin Wang, C. L. Philip Chen

Summary: In this study, we propose the AMFormer framework to address the problem of mixed normal and anomalous samples in deep unsupervised time-series anomaly detection. By refining the one-class representation and introducing a masked operation mechanism and cost-sensitive learning theory, our approach significantly improves anomaly detection performance.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A data-driven optimisation method for a class of problems with redundant variables and indefinite objective functions

Jin Zhou, Kang Zhou, Gexiang Zhang, Ferrante Neri, Wangyang Shen, Weiping Jin

Summary: In this paper, the authors focus on the issue of multi-objective optimisation problems with redundant variables and indefinite objective functions (MOPRVIF) in practical problem-solving. They propose a dual data-driven method for solving this problem, which consists of eliminating redundant variables, constructing objective functions, selecting evolution operators, and using a multi-objective evolutionary algorithm. The experiments conducted on two different problem domains demonstrate the effectiveness, practicality, and scalability of the proposed method.

INFORMATION SCIENCES (2024)

Article Computer Science, Information Systems

A Monte Carlo fuzzy logistic regression framework against imbalance and separation

Georgios Charizanos, Haydar Demirhan, Duygu Icen

Summary: This article proposes a new fuzzy logistic regression framework that addresses the problems of separation and imbalance while maintaining the interpretability of classical logistic regression. By fuzzifying binary variables and classifying subjects based on a fuzzy threshold, the framework demonstrates superior performance on imbalanced datasets.

INFORMATION SCIENCES (2024)