Article

Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

Journal

JOURNAL OF SYSTEMS AND SOFTWARE
Volume 83, Issue 7, Pages 1137-1147

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jss.2010.01.002

Keywords

Classification; Cost-sensitive learning; Over-fitting

Funding

  1. Australian Research Council (ARC) [DP0985456]
  2. Natural Science Foundation (NSF) of China [90718020, 10661003]
  3. China 973 Program [2008CB317108]
  4. MOE [07JJD720044]
  5. Guangxi NSF
  6. Guangxi Colleges' Innovation Group

Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple types of cost are taken into account. Like other learning algorithms, they face a significant challenge in applied settings: over-fitting. That is, they can produce good results on training data but often fail to yield an optimal model when applied to unseen data in real-world applications; the paper refers to this as data over-fitting. This paper addresses data over-fitting with three simple and efficient strategies for the TCSDT (test cost-sensitive decision tree) method: feature selection, smoothing, and threshold pruning. Feature selection pre-processes the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting. (C) 2010 Elsevier Inc. All rights reserved.
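
To illustrate why smoothing a leaf's class probability estimate can matter for cost-sensitive prediction, the following is a minimal sketch. It assumes Laplace correction as the smoothing method and a simple misclassification cost matrix; the abstract does not state which smoothing formula the paper uses, and the function names `laplace_estimate` and `min_cost_class` are hypothetical, not taken from the TCSDT algorithm.

```python
from typing import Sequence


def raw_estimate(class_counts: Sequence[int]) -> list[float]:
    """Unsmoothed class frequencies at a leaf."""
    n = sum(class_counts)
    return [c / n for c in class_counts]


def laplace_estimate(class_counts: Sequence[int]) -> list[float]:
    """Laplace-corrected class probabilities at a leaf (assumed smoothing).

    Adds one pseudo-count per class, so a small, pure leaf no longer
    claims probability 1.0 for its majority class.
    """
    k = len(class_counts)
    n = sum(class_counts)
    return [(c + 1) / (n + k) for c in class_counts]


def min_cost_class(probs: Sequence[float],
                   cost: Sequence[Sequence[float]]) -> int:
    """Return the prediction with the lowest expected misclassification cost.

    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Illustrative leaf: 3 training examples of class 0, none of class 1.
leaf_counts = [3, 0]
# Illustrative costs: missing a true class 1 costs 10, a false alarm costs 1.
cost_matrix = [[0.0, 1.0],
               [10.0, 0.0]]

raw_choice = min_cost_class(raw_estimate(leaf_counts), cost_matrix)
smoothed_choice = min_cost_class(laplace_estimate(leaf_counts), cost_matrix)
```

With these made-up numbers, the unsmoothed leaf is fully confident in class 0 and predicts it, whereas the smoothed estimate leaves enough probability on class 1 that the expensive error dominates and the lower-cost prediction becomes class 1. This only demonstrates the general mechanism; it is not a reproduction of the paper's experiments.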

Recommended
Review Computer Science, Software Engineering

A Multi-vocal Literature Review on challenges and critical success factors of phishing education, training and awareness

Orvila Sarker, Asangi Jayatilaka, Sherif Haggag, Chelsea Liu, M. Ali Babar

Summary: This study provides a comprehensive view of the challenges and critical success factors in the design, implementation, and evaluation stages of phishing education, training, and awareness (PETA). The findings highlight the need to address human-centric issues, bridge users' knowledge gaps, and adopt personalized approaches to enhance defense against phishing attacks.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Performability evaluation of NoSQL-based storage systems

Carlos Araujo, Meuse Oliveira Jr., Bruno Nogueira, Paulo Maciel, Eduardo Tavares

Summary: This paper proposes a method based on stochastic Petri nets for evaluating the consistency levels of storage systems based on NoSQL DBMS. The method takes into account different consistency levels and redundant nodes, and estimates the system's availability, throughput, and the probability of accessing the newest data. Experimental results demonstrate the practical feasibility of this approach.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Review Computer Science, Software Engineering

Monitoring tools for DevOps and microservices: A systematic grey literature review

L. Giamattei, A. Guerriero, R. Pietrantuono, S. Russo, I. Malavolta, T. Islam, M. Dinga, A. Koziolek, S. Singh, M. Armbruster, J. M. Gutierrez-Martinez, S. Caro-Alvaro, D. Rodriguez, S. Weber, J. Henss, E. Fernandez Vogelin, F. Simon Panojo

Summary: This article presents the results of a systematic study on the available monitoring tools for DevOps and microservices. It provides a classification and analysis of these tools, aiming to be a useful reference for researchers and practitioners in this field.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Harmonizing DevOps taxonomies - A grounded theory study

Jessica Diaz, Jorge Perez, Isaque Alves, Fabio Kon, Leonardo Leite, Paulo Meirelles, Carla Rocha

Summary: This paper presents empirical research on the structure of DevOps teams in software-producing organizations to better understand the organizational structure and characteristics of teams adopting DevOps. A theory of DevOps taxonomies is built through analysis, and its consistency with other taxonomies is tested.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Managing the changing understanding of benefits in software initiatives

Sinan Sigurd Tanilkan, Jo Erskine Hannay

Summary: When deciding to develop new software, it is important to have a clear understanding of the intended benefits. However, our research shows that stakeholders' understanding of benefits often fluctuates during the development process, leading to uncertainty. Therefore, we recommend focusing on helping practitioners embrace changes in their understanding of benefits.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Detecting security vulnerabilities with vulnerability nets

Pingyan Wang, Shaoying Liu, Ai Liu, Wen Jiang

Summary: This paper presents an approach that combines static analysis tools and manual audits to effectively detect various types of security vulnerabilities. By using a special Petri net representation, the proposed method is able to assist in the detection of taint-style vulnerabilities.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Early analysis of requirements using NLP and Petri-nets

Edgar Sarmiento-Calisaya, Julio Cesar Sampaio do Prado Leite

Summary: This research introduces an automated requirements analysis approach that combines natural language processing, Petri-nets, and visualization techniques to improve the quality of scenario-based specifications, identify defects, and anticipate inconsistencies.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Trace matrix optimization for fault localization

Jian Hu

Summary: This paper proposes a two-stage trace matrix optimization method for fault localization, which addresses the challenges of coincidental correctness and data imbalance in the current trace matrix. Through extensive experiments, significant improvements in fault localization effectiveness are demonstrated.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Hierarchical features extraction and data reorganization for code search

Fan Zhang, Manman Peng, Yuanyuan Shen, Qiang Wu

Summary: This study proposes a novel method called HFEDR that utilizes the hierarchical features of Transformer models and reorganizes training data to improve code search performance. Experimental results demonstrate the effectiveness and rationality of the proposed approach.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

EsArCost: Estimating repair costs of software architecture erosion using slice technology

Tong Wang, Bixin Li

Summary: Software architecture erosion has a negative impact on software quality, performance, and evolution cost. This paper proposes an approach called EsArCost to locate the causes of architecture erosion and estimate the repair cost of each erosion problem. Experimental results show that EsArCost can effectively and efficiently estimate repair costs.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

SYNTONY: Potential-aware fuzzing with particle swarm optimization

Xiajing Wang, Rui Ma, Wei Huo, Zheng Zhang, Jinyuan He, Chaonan Zhang, Donghai Tian

Summary: This paper proposes a new potential-aware fuzzing scheme called SYNTONY that measures seed potential using multiple objectives and prioritizes promising seeds to increase the number of unique crashes and coverage. Experimental results show that SYNTONY outperforms other fuzzing tools and has high compatibility and expansibility.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

An Empirical Investigation Into the Influence of Software Communities' Cultural and Geographical Dispersion on Productivity

Stefano Lambiase, Gemma Catolino, Fabiano Pecorelli, Damian A. Tamburri, Fabio Palomba, Willem-Jan van den Heuvel, Filomena Ferrucci

Summary: This paper contributes to the existing body of knowledge on factors affecting productivity in software development by studying the cultural and geographical dispersion of a development community. The results show that cultural and geographical dispersion significantly impact productivity, suggesting that managers and practitioners should consider these aspects throughout the software development lifecycle.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

The effects of required security on software development effort

Elaine Venson, Bradford Clark, Barry Boehm

Summary: The software industry has been under pressure to adopt security practices and reduce software vulnerabilities. This study quantifies the effort required to develop secure software in increasing levels of rigor and scope and provides validated cost multipliers for practitioners to estimate proper resources for adopting security practices.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Towards an understanding of intra-defect associations: Implications for defect prediction

Yangyang Zhao, Mingyue Jiang, Yibiao Yang, Yuming Zhou, Hanjie Ma, Zuohua Ding

Summary: Previous studies have ignored the potential associations between modules involved in the same defect, and this comprehensive study explores the implications of intra-defect associations for defect prediction. The majority of defects occur across functions, with implicit dependencies between the modules. By considering intra-defect associations and merging modules, the proposed data processing approach significantly improves defect prediction performance.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)

Article Computer Science, Software Engineering

Learning to empathize with users through design thinking in hybrid mode: Insights from two educational case studies

Meira Levy, Irit Hadar

Summary: This research sheds new light on how students learn and practice hybrid work in educational settings through two educational studies. The findings show the benefits of new educational programs in fostering empathy and innovation among students, while also highlighting the challenges and opportunities in addressing real challenges.

JOURNAL OF SYSTEMS AND SOFTWARE (2024)