Article

Towards UCI+: A mindful repository design

INFORMATION SCIENCES (2014)

Journal

INFORMATION SCIENCES

Volume 261, Issue -, Pages 237-262

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2013.08.059

Keywords

Data repository; Data complexity; Classification; Synthetic data set

Categories

Computer Science, Information Systems

Funding

Ministerio de Educacion y Ciencia [TIN2008-06681-C06-05]
Fundacio Credit Andorra
Govern d'Andorra


Abstract

Public repositories have contributed to the maturation of experimental methodology in machine learning. Publicly available data sets have allowed researchers to empirically assess their learners and, jointly with open source machine learning software, they have favoured the emergence of comparative analyses of learners' performance over a common framework. These studies have brought standard procedures to evaluate machine learning techniques. However, current claims such as the superiority of enhanced algorithms are biased by unsustained assumptions made throughout some praxes. In this paper, the early steps of the methodology, which refer to data set selection, are inspected. Particularly, the exploitation of the most popular data repository in machine learning, the UCI repository, is examined. We analyse the type, complexity, and use of UCI data sets. The study recommends the design of a mindful data repository, UCI+, which should include a set of properly characterised data sets consisting of a complete and representative sample of real-world problems, enriched with artificial benchmarks. The ultimate goal of the UCI+ is to lay the foundations towards a well-supported methodology for learner assessment. © 2013 Elsevier Inc. All rights reserved.
