Article

LEARNING DECISION TREES WITH LOG CONDITIONAL LIKELIHOOD

Publisher

WORLD SCIENTIFIC PUBL CO PTE LTD
DOI: 10.1142/S0218001410007877

Keywords

Classification; log conditional likelihood; probability estimation tree; discriminative learning; AUC

In machine learning and data mining, traditional learning models aim for high classification accuracy. However, in many practical applications, such as medical diagnosis, accurate class probability prediction is more desirable than classification accuracy. Although it is known that decision trees can be adapted into class probability estimators in a variety of ways, and the resulting models are uniformly called Probability Estimation Trees (PETs), the performance of these PETs in class probability estimation has not yet been investigated. We begin our research by empirically studying PETs in terms of class probability estimation, measured by Log Conditional Likelihood (LCL). We also compare a PET called C4.4 with other representative models, including Naive Bayes, Naive Bayes Tree, Bayesian Network, KNN, and SVM, in terms of LCL. From our experiments, we draw several valuable conclusions. First, among the various tree-based models, C4.4 is the best at yielding precise class probability predictions as measured by LCL. We provide an explanation for this and reveal the nature of LCL. Second, C4.4 also performs best compared with the non-tree-based models. Finally, LCL does not dominate another well-established, relevant metric, AUC, which suggests that different decision-tree learning models should be used for different objectives. Our experiments are conducted on 36 UCI data sets, and all models are run within the Weka machine learning platform.

We also explore an approach to improving the class probability estimation of Naive Bayes Tree. We propose a greedy and recursive learning algorithm in which, at each step, LCL is used as the scoring function to expand the decision tree. The algorithm uses Naive Bayes classifiers built at the leaves to estimate the class probabilities of test samples, so the whole tree encodes the posterior class probabilities in its structure. One benefit of improving class probability estimation is that both classification accuracy and AUC can potentially be improved as well. We call the new model LCL Tree (LCLT). Our experiments on 33 UCI data sets show that LCLT significantly outperforms state-of-the-art learning models, such as Naive Bayes Tree, in class probability prediction as measured by LCL, as well as in classification accuracy and AUC.
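Since LCL is the central metric throughout, a minimal sketch of how it is computed may help. This assumes the standard definition, LCL = sum over test samples i of log P(c_i | x_i), where P(c_i | x_i) is the probability the model assigns to sample x_i's true class c_i; the function and variable names below are illustrative, not from the paper.

```python
import math

def log_conditional_likelihood(probs, labels, eps=1e-12):
    """Sum of log-probabilities assigned to each sample's true class.

    probs  -- one dict per sample, mapping class label -> predicted probability
    labels -- the true class label of each sample
    Higher values (closer to 0) indicate better probability estimates.
    """
    # Clip probabilities away from zero so log(0) cannot occur.
    return sum(math.log(max(p.get(y, 0.0), eps)) for p, y in zip(probs, labels))
```

For example, a model that predicts {"yes": 0.9, "no": 0.1} for a true "yes" contributes log 0.9 (about -0.105), while a confident mistake contributes a large negative term, which is why LCL rewards well-calibrated probability estimates rather than mere 0/1 correctness.

The abstract's description of LCLT (greedy, recursive tree expansion scored by LCL, with Naive Bayes models at the leaves) can likewise be sketched. This is only a simplified reading of that description, not the authors' algorithm: for brevity, the leaf model here is a Laplace-smoothed class-frequency estimator standing in for Naive Bayes, and all helper names are assumptions.

```python
from collections import Counter, defaultdict

def leaf_model(labels, classes, alpha=1.0):
    """Laplace-smoothed class distribution at a leaf; a stand-in for the
    Naive Bayes leaf classifiers described in the abstract."""
    counts = Counter(labels)
    n = len(labels)
    return {c: (counts[c] + alpha) / (n + alpha * len(classes)) for c in classes}

def grow_lcl_tree(rows, labels, attrs, classes):
    """Greedy recursion: split on the attribute whose children give the
    highest total training-set LCL; stop when no split improves on the
    unsplit leaf, so the tree structure itself is chosen to maximize LCL."""
    base = leaf_model(labels, classes)
    best_score = log_conditional_likelihood([base] * len(labels), labels)
    best_attr = None
    for a in attrs:
        # Group the true labels by this attribute's value.
        groups = defaultdict(list)
        for row, y in zip(rows, labels):
            groups[row[a]].append(y)
        # Score the candidate split: LCL of each child's own leaf model.
        score = sum(
            log_conditional_likelihood([leaf_model(ys, classes)] * len(ys), ys)
            for ys in groups.values())
        if score > best_score:
            best_score, best_attr = score, a
    if best_attr is None:
        return ("leaf", base)
    # Partition the data and recurse on the remaining attributes.
    branches = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        branches[row[best_attr]][0].append(row)
        branches[row[best_attr]][1].append(y)
    rest = [a for a in attrs if a != best_attr]
    return ("split", best_attr,
            {v: grow_lcl_tree(rs, ys, rest, classes)
             for v, (rs, ys) in branches.items()})
```

Given rows as dicts of categorical attribute values, grow_lcl_tree(rows, labels, list(rows[0]), set(labels)) returns a nested ("split", ...)/("leaf", ...) structure whose leaves hold class distributions, mirroring the abstract's point that the whole tree encodes the posterior class probabilities in its structure.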
