Article

Sample-Based Attribute Selective AnDE for Large Data

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2016.2608881

Keywords

Bayesian network classifiers; large data; classification learning; attribute selection; averaged n-dependence estimators (AnDE); leave-one-out cross validation

Funding

  1. Australian Research Council [DP140100087]
  2. Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research [FA2386-15-1-4007]
  3. National Natural Science Foundation of China [61202135]
  4. Natural Science Foundation of Jiangsu, China [BK20130735]
  5. Natural Science Foundation of Jiangsu Higher Education Institutions of China [14KJB520019, 13KJB520011, 13KJB520013]
  6. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
  7. Priority Academic Program Development of Jiangsu Higher Education Institutions
  8. Monash e-Research Center
  9. eSolutions-Research Support Services
  10. Australian Commonwealth Government

Over the past decade, more and more applications have come to involve large data sets. However, existing algorithms are not guaranteed to scale well to large data. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super parents), which makes AnDE especially well suited to learning from large data. In this paper, we propose a sample-based attribute selection technique for AnDE. It requires only one additional pass through the training data, during which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. Using a sample of the training data for this step reduces training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers performance superior or comparable to that of typical in-core Bayesian network classifiers.
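The abstract describes building many approximate AnDE models and scoring them with leave-one-out cross validation on a sample. The Python sketch below illustrates that idea for n = 1 (A1DE) on discrete attributes; it is not the authors' implementation. Several choices are assumptions made only for illustration: attributes are ranked by mutual information with the class and only nested prefixes of that ranking are evaluated, probabilities use Laplace smoothing, the loss is zero-one, there is no minimum-frequency threshold on super parents, and the names `select_attributes`, `loo_errors_by_subset_size` and `sample_size` are hypothetical. Leave-one-out stays cheap because each held-out instance's own contribution is subtracted from the counts instead of retraining.

```python
import math
import random
from collections import Counter


def mutual_information(X, y, i):
    """Empirical mutual information (in nats) between attribute i and the class."""
    n = len(y)
    c_xy = Counter((row[i], c) for row, c in zip(X, y))
    c_x = Counter(row[i] for row in X)
    c_y = Counter(y)
    return sum((cnt / n) * math.log(cnt * n / (c_x[v] * c_y[c]))
               for (v, c), cnt in c_xy.items())


def loo_errors_by_subset_size(X, y, order, n_values, classes):
    """Zero-one LOOCV errors of A1DE restricted to order[:k], for k = 1..len(order).

    Counts are collected once. Each held-out instance is "removed" by
    subtracting its own contribution from the counts, so nothing is retrained.
    """
    n, K = len(y), len(classes)
    c1 = Counter()  # (class, attr, value) -> count
    c2 = Counter()  # (class, parent, parent value, child, child value) -> count
    for row, c in zip(X, y):
        for p in order:
            c1[(c, p, row[p])] += 1
            for j in order:
                if j != p:
                    c2[(c, p, row[p], j, row[j])] += 1
    errors = [0] * len(order)
    for row, true_c in zip(X, y):
        scores = {}  # class -> [score using top-1, top-2, ... attributes]
        for c in classes:
            own = 1 if c == true_c else 0  # held-out instance's own count
            prod = {}  # parent -> product of P(child | class, parent) so far
            per_k = []
            for k, r in enumerate(order):
                # Add r as a child of every parent already in the subset.
                for p in prod:
                    n_p = c1[(c, p, row[p])] - own
                    n_pr = c2[(c, p, row[p], r, row[r])] - own
                    prod[p] *= (n_pr + 1) / (n_p + n_values[r])
                # Add r as a new super parent of the earlier attributes.
                n_r = c1[(c, r, row[r])] - own
                prod[r] = 1.0
                for j in order[:k]:
                    n_rj = c2[(c, r, row[r], j, row[j])] - own
                    prod[r] *= (n_rj + 1) / (n_r + n_values[j])
                # Sum over parents of P(class, parent) * product; the 1/#parents
                # averaging factor is the same for every class, so it is dropped.
                total = 0.0
                for p, pr in prod.items():
                    n_p = c1[(c, p, row[p])] - own
                    total += (n_p + 1) / (n - 1 + K * n_values[p]) * pr
                per_k.append(total)
            scores[c] = per_k
        for k in range(len(order)):
            if max(classes, key=lambda c: scores[c][k]) != true_c:
                errors[k] += 1
    return errors


def select_attributes(X, y, sample_size=1000, seed=0):
    """Hypothetical helper: pick the best-scoring prefix of an MI ranking."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(y)), min(sample_size, len(y)))
    Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
    a = len(X[0])
    n_values = [len({row[i] for row in X}) for i in range(a)]
    classes = sorted(set(y))
    order = sorted(range(a), key=lambda i: -mutual_information(Xs, ys, i))
    errors = loo_errors_by_subset_size(Xs, ys, order, n_values, classes)
    best_k = min(range(a), key=errors.__getitem__) + 1
    return order[:best_k], errors
```

Because the attribute subsets are nested, the per-parent products are extended one attribute at a time, so all prefix sizes are scored in a single sweep over the sample. On synthetic data where attribute 0 equals the class and the other attributes are noise, this sketch keeps only attribute 0.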

Reviews

Primary rating

4.7
Not enough ratings

Secondary ratings

Novelty
-
Significance
-
Scientific rigor
-

Recommended

Article Computer Science, Artificial Intelligence

A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping

Benjamin Lucas, Charlotte Pelletier, Daniel Schmidt, Geoffrey I. Webb, Francois Petitjean

Summary: Land cover maps are essential for environmental research and management. This paper presents Sourcerer, a semi-supervised domain adaptation technique that uses deep learning to generate land cover maps from satellite image time series data. Experimental results show that Sourcerer achieves high accuracy even with limited labeled target data.

MACHINE LEARNING (2023)

Review Biochemical Research Methods

Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Fuyi Li, Shuangyu Dong, Andre Leier, Meiya Han, Xudong Guo, Jing Xu, Xiaoyu Wang, Shirui Pan, Cangzhi Jia, Yang Zhang, Geoffrey Webb, Lachlan J. M. Coin, Chen Li, Jiangning Song

Summary: Conventional supervised binary classification algorithms have been widely used in biological and biomedical data analysis. However, labeling data can be laborious, leading to the proposal of the positive unlabeled (PU) learning scheme. This approach allows for learning from limited positive samples and a large number of unlabeled samples, contributing to the development of various PU learning algorithms for addressing biological questions.

BRIEFINGS IN BIOINFORMATICS (2022)

Review Biochemical Research Methods

Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction

Meng Zhang, Cangzhi Jia, Fuyi Li, Chen Li, Yan Zhu, Tatsuya Akutsu, Geoffrey Webb, Quan Zou, Lachlan J. M. Coin, Jiangning Song

Summary: This study provides benchmark datasets for promoter prediction in 58 different species, and finds that deep learning and traditional machine learning-based approaches generally outperform scoring function-based approaches.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Economics

An accurate and fully-automated ensemble model for weekly time series forecasting

Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Pablo Montero-Manso

Summary: Nowadays, accurate forecasts for weekly time series are needed in many businesses and industries. However, the current forecasting literature lacks easy-to-use, automatic, reproducible, and accurate approaches for this task. To address this gap, we propose a forecasting method that leverages state-of-the-art techniques, including forecast combination, meta-learning, and global modeling. Our proposed method, based on a stacking approach with lasso regression, outperforms benchmarks and state-of-the-art models in terms of accuracy and consistently produces the most accurate forecasts for the M4 weekly dataset.

INTERNATIONAL JOURNAL OF FORECASTING (2023)

Article Pharmacology & Pharmacy

COVID-19 restrictions and the incidence and prevalence of prescription opioid use in Australia - a nationwide study

Monica Jung, Dickson Lukose, Suzanne Nielsen, J. Simon Bell, Geoffrey Webb, Jenni Ilomaki

Summary: The COVID-19 pandemic disrupted healthcare seeking and delivery, and different Australian jurisdictions implemented varying restrictions. An analysis of national pharmacy dispensing data in Australia found that, after nationwide COVID-19 restrictions were introduced, the incidence and prevalence of opioid dispensing decreased in Victoria, New South Wales, and other jurisdictions. However, in Victoria both the incidence and prevalence increased after the lockdown. There were no significant changes in the initiation of long-term opioid use in any jurisdiction. More stringent restrictions were associated with greater reductions in overall opioid initiation, but not in the initiation of long-term opioid use.

BRITISH JOURNAL OF CLINICAL PHARMACOLOGY (2023)

Article Computer Science, Artificial Intelligence

Ultra-fast meta-parameter optimization for time series similarity measures with application to nearest neighbour classification

Chang Wei Tan, Matthieu Herrmann, Geoffrey I. Webb

Summary: Nearest neighbour similarity measures are widely used in time series data analysis applications. This paper proposes ULTRA-FASTMPSEARCH, a family of algorithms for learning meta-parameters for different types of time series distance measures. These algorithms are significantly faster than the previous state of the art.

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Review Cardiac & Cardiovascular Systems

Did Australia's COVID-19 Restrictions Impact Statin Incidence, Prevalence or Adherence?

Adam C. Livori, Dickson Lukose, J. Simon Bell, Geoffrey I. Webb, Jenni Ilomaki

Summary: COVID-19 restrictions did not result in significant changes in the incidence, prevalence, or adherence to statins in Australia. Adaptive interventions, such as telehealth consultations and medication delivery, successfully maintained access to cardiovascular medications.

CURRENT PROBLEMS IN CARDIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Elastic similarity and distance measures for multivariate time series

Ahmed Shifaz, Charlotte Pelletier, Francois Petitjean, Geoffrey I. Webb

Summary: This paper presents multivariate versions of seven commonly used elastic similarity and distance measures for time series data analytics. These measures can compensate for misalignments in the time axis of time series data. The paper adapts two existing strategies used in multivariate Dynamic Time Warping to these measures. Demonstrating their utility in multivariate time series classification using the nearest neighbor classifier, the paper shows that each measure achieves the highest accuracy on at least one dataset, supporting the value of developing a suite of multivariate similarity and distance measures. The paper also constructs a nearest neighbor-based ensemble of the measures, which proves to be competitive with other state-of-the-art single-strategy multivariate time series classifiers.
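The summary does not name the two multivariate DTW strategies. A commonly cited pair is the "independent" strategy (warp each dimension separately and sum the distances) and the "dependent" strategy (one warping path shared by all dimensions, with a pointwise cost summed across dimensions); the Python sketch below assumes these are the strategies meant and uses a squared pointwise cost. The function names are hypothetical, and this is an illustration of the strategies, not the paper's code.

```python
import math


def dtw(a, b, cost):
    """Classic DTW via dynamic programming, with a caller-supplied pointwise cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def sq(x, y):
    return (x - y) ** 2


def dtw_independent(a, b):
    """Each dimension is warped on its own; the per-dimension distances are summed."""
    dims = len(a[0])
    return sum(dtw([p[d] for p in a], [q[d] for q in b], sq) for d in range(dims))


def dtw_dependent(a, b):
    """One warping path shared by all dimensions; the pointwise cost spans them all."""
    return dtw(a, b, lambda p, q: sum(sq(x, y) for x, y in zip(p, q)))
```

For example, `a = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]` and `b = [(0.0, 1.0), (0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]` differ only by a repeated first point, so both functions return `0.0`. In general the independent variant is never larger than the dependent one, since each dimension is free to choose its own best path.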

KNOWLEDGE AND INFORMATION SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

SETAR-Tree: a novel and accurate tree algorithm for global time series forecasting

Rakshitha Godahewa, Geoffrey I. Webb, Daniel Schmidt, Christoph Bergmeir

Summary: This paper explores the close connections between Threshold Autoregressive (TAR) models and regression trees. It introduces a new forecasting-specific tree algorithm called SETAR-Tree, which trains global Pooled Regression (PR) models in the leaves to learn cross-series information. The proposed tree and forest models outperform state-of-the-art tree-based algorithms and forecasting benchmarks in terms of accuracy.

MACHINE LEARNING (2023)

Article Computer Science, Artificial Intelligence

Amercing: An intuitive and effective constraint for dynamic time warping

Matthieu Herrmann, Geoffrey I. Webb

Summary: Dynamic Time Warping (DTW) is a time series distance measure that allows for non-linear alignments between sequences. To address the permissiveness issue of unconstrained DTW, constraints in the form of windows and weights have been introduced. However, these approaches have limitations, such as crude step functions and relative weights. In this paper, Amerced Dynamic Time Warping (ADTW) is proposed as a new variant that penalizes warping with a fixed additive cost, providing both constraints and intuitive outcomes.
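The summary's key idea, penalizing each warping step with a fixed additive cost, can be illustrated with a small change to the standard DTW recurrence. The Python sketch below assumes univariate series, a squared pointwise cost, and the usual three-step recurrence in which only off-diagonal steps pay the penalty; `adtw` and `omega` are hypothetical names, and this is not the authors' implementation.

```python
import math


def adtw(a, b, omega):
    """Amerced DTW sketch: every off-diagonal (warping) step costs a fixed omega."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # squared pointwise cost (assumption)
            D[i][j] = d + min(D[i - 1][j - 1],      # diagonal step: no penalty
                              D[i - 1][j] + omega,  # warping step: pay omega
                              D[i][j - 1] + omega)  # warping step: pay omega
    return D[n][m]
```

With `omega = 0.0` this reduces to plain DTW, so `adtw([0, 1, 2], [0, 0, 1, 2], 0.0)` is `0.0`; with `omega = 1.0` the same pair costs `1.0`, because one warping step is unavoidable. For equal-length series and a very large `omega`, warping never pays off and the result is the squared Euclidean distance, e.g. `adtw([0, 1, 2], [1, 1, 1], 100.0)` is `2.0`.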

PATTERN RECOGNITION (2023)

Article Computer Science, Interdisciplinary Applications

EHR-QC: A streamlined pipeline for automated electronic health records standardisation and preprocessing to predict clinical outcomes

Yashpal Ramakrishnaiah, Nenad Macesic, Geoffrey I. Webb, Anton Y. Peleg, Sonika Tyagi

Summary: The adoption of electronic health records (EHRs) has created opportunities for predicting clinical outcomes and improving patient care. However, non-standardized data representations and anomalies present major challenges in digital health research. To address these challenges, we have developed EHR-QC, a tool with two modules: data standardization and preprocessing. We believe that the development and adoption of tools like EHR-QC are critical for advancing digital health.

JOURNAL OF BIOMEDICAL INFORMATICS (2023)

Article Computer Science, Artificial Intelligence

Rigorous non-disjoint discretization for naive Bayes

Huan Zhang, Liangxiao Jiang, Geoffrey I. Webb

Summary: Naive Bayes is a classical machine learning algorithm that often uses discretization to transform quantitative attributes into qualitative attributes. Non-Disjoint Discretization (NDD) is a novel method that forms overlapping intervals and always locates a value toward the middle of an interval. However, existing approaches to NDD fail to consider the effect of multiple occurrences of a single value. In this study, a new method called Rigorous Non-Disjoint Discretization (RNDD) is proposed to handle multiple occurrences of a single value in a systematic manner, and it outperforms NDD and other existing competitors.

PATTERN RECOGNITION (2023)

Article Biochemical Research Methods

PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships

Tong Pan, Chen Li, Yue Bi, Zhikang Wang, Robin B. Gasser, Anthony W. Purcell, Tatsuya Akutsu, Geoffrey Webb, Seiya Imoto, Jiangning Song

Summary: PFresGO is an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and natural language processing algorithms for functional annotation of proteins, achieving superior performance compared to existing methods.

BIOINFORMATICS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Smooth Perturbations for Time Series Adversarial Attacks

Gautier Pialla, Hassan Ismail Fawaz, Maxime Devanne, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller, Christoph Bergmeir, Daniel Schmidt, Geoffrey Webb, Germain Forestier

Summary: Adversarial attacks pose a threat to deep neural networks, especially for time series. Existing attacks on time series are few and often detectable. To address this issue, we propose a new attack method that generates smoother perturbations, and we improve model robustness through adversarial training.

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT I (2022)

Meeting Abstract Public, Environmental & Occupational Health

Incidence and prevalence of prescription opioid use during Australian COVID-19 restrictions

Monica Jung, Dickson Lukose, Suzanne Nielsen, J. Simon Bell, Geoffrey Webb, Jenni Ilomaki

PHARMACOEPIDEMIOLOGY AND DRUG SAFETY (2022)

暂无数据