Article

Mining Fix Patterns for FindBugs Violations

Journal

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Volume 47, Issue 1, Pages 165-188

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TSE.2018.2884955

Keywords

Fix pattern; pattern mining; program repair; FindBugs violation; unsupervised learning

Funding

  1. Fonds National de la Recherche (FNR), Luxembourg [FIXPATTERN C15/IS/9964569, RECOMMEND C15/IS/10449467]
  2. Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) - Ministry of Science, ICT [2017M3C4A7068179]

Several static analysis tools have been proposed to detect security vulnerabilities or bad programming practices, but their adoption is hindered by high false positive rates. By analyzing distributions of violations and their fixes, an automated approach using convolutional neural networks and clustering can identify fix patterns and apply them to unresolved violations effectively.

Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may become acclimated to violation reports from these tools, causing concrete and severe bugs to be overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that violations that are recurrently fixed are likely to be true positives, and that an automated approach can learn to repair similar unseen violations. However, there is a lack of a systematic way to investigate the distributions of existing and fixed violations in the wild, which could provide insights into prioritizing violations for developers, and of an effective way to mine code and fix patterns, which could help developers easily understand what causes violations and how to fix them. In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal discrepancies between the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread, and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that uses convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers accepted and merged a majority (69/116) of the fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in Defects4J, a major benchmark for software testing and automated repair.
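
The comparison between detected and fixed violations described in the abstract can be illustrated with a minimal sketch. This is not the paper's pipeline (the convolutional feature learning and clustering steps are not shown). The violation type names are real FindBugs pattern identifiers, but the counts, the `records` list, and the helper names `fix_ratios` and `rank_by_fix_ratio` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical records: (violation_type, was_fixed_in_a_later_revision).
# The counts below are invented for illustration, not taken from the paper.
records = (
    [("SE_NO_SERIALVERSIONID", False)] * 8
    + [("SE_NO_SERIALVERSIONID", True)] * 2
    + [("NP_NULL_ON_SOME_PATH", False)] * 1
    + [("NP_NULL_ON_SOME_PATH", True)] * 3
    + [("DM_CONVERT_CASE", False)] * 5
    + [("DM_CONVERT_CASE", True)] * 1
)


def fix_ratios(records):
    """Return, per violation type, the share of detected instances that were fixed."""
    detected = Counter(vtype for vtype, _ in records)
    fixed = Counter(vtype for vtype, was_fixed in records if was_fixed)
    return {vtype: fixed[vtype] / detected[vtype] for vtype in detected}


def rank_by_fix_ratio(records):
    """Order violation types from most to least frequently fixed."""
    ratios = fix_ratios(records)
    return sorted(ratios, key=lambda vtype: ratios[vtype], reverse=True)


ranking = rank_by_fix_ratio(records)

# The acceptance figure reported in the abstract: 69 of 116 generated fixes.
acceptance_rate = 69 / 116
```

In this toy data the most frequently detected type is not the most frequently fixed one, which is the kind of discrepancy the abstract suggests could guide prioritization. The reported 69/116 acceptance corresponds to roughly 59.5% of the submitted fixes.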

Recommended

Article Computer Science, Software Engineering

Evaluating Surprise Adequacy for Deep Learning System Testing

Jinhan Kim, Robert Feldt, Shin Yoo

Summary: The rapid adoption of Deep Learning (DL) systems in safety critical domains necessitates the testing of their correctness and robustness. In this article, we propose Surprise Adequacy (SA) as a test adequacy criterion, which measures the difference between the behavior of a DL system for a given input and its behavior for training data. We demonstrate that SA can predict model behavior correctness and detect adversarial examples.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

IBIR: Bug-report-driven Fault Injection

Ahmed Khanfir, Anil Koyuncu, Mike Papadakis, Maxime Cordy, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon

Summary: This study introduces a fault injection tool called iBiR, which injects realistic faults by exploring change patterns associated with user-reported faults. Experimental results show that iBiR outperforms traditional mutation testing in terms of semantic similarity and test effectiveness.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

An In-depth Study of Java Deserialization Remote-Code Execution Exploits and Vulnerabilities

Imen Sayar, Alexandre Bartel, Eric Bodden, Yves Le Traon

Summary: Nowadays, the increasing use of deserialization in applications poses a security risk due to the potential for remote code execution attacks originating from untrusted sources. Deserialization vulnerabilities are a critical concern in web applications, often caused by development process faults and library flaws. This study explores attack gadgets in Java libraries and vulnerabilities in Java applications, identifying and understanding how these weaknesses are introduced, patched, and how long they persist. The analysis reveals that even a minor change in a class can introduce a gadget, and a significant portion of libraries remain unpatched, leaving them vulnerable to future attacks.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

CalcGraph: taming the high costs of deep learning using models

Joe Lorentz, Thomas Hartmann, Assaad Moawad, Francois Fouquet, Djamila Aouada, Yves Le Traon

Summary: This article introduces CalcGraph, a model abstraction of differential programming layers, which can simulate the usage of computational resources and automatically schedule execution according to given specifications. We propose a novel method for switching models between storage and preallocated memory zones efficiently, maximizing the number of model executions given the available resources. The efficiency of our approach is demonstrated by consuming fewer resources than state-of-the-art frameworks like TensorFlow and PyTorch for single-model and multi-model execution.

SOFTWARE AND SYSTEMS MODELING (2023)

Article Computer Science, Software Engineering

The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches

Haoye Tian, Kui Liu, Yinghua Li, Abdoul Kader Kabore, Anil Koyuncu, Andrew Habib, Li Li, Junhao Wen, Jacques Klein, Tegawende F. Bissyande

Summary: This study explores the use of learned code representations to identify correct patches. The experimental results show that deep learned embeddings can outperform existing methods that rely on dynamic information.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

Reliable Fix Patterns Inferred from Static Checkers for Automated Program Repair

Kui Liu, Jingtang Zhang, Li Li, Anil Koyuncu, Dongsun Kim, Chunpeng Ge, Zhe Liu, Jacques Klein, Tegawende F. Bissyande

Summary: Fix pattern-based patch generation is a promising direction in automated program repair (APR). The performance of pattern-based APR systems depends on the fix ingredients mined from fix changes in development histories. Collecting a reliable set of bug fixes in repositories can be challenging.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Review Computer Science, Artificial Intelligence

Decision support system for blockchain (DLT) platform selection based on ITU recommendations: A systematic literature review approach

Sylvain Kubler, Matthieu Renard, Sankalp Ghatpande, Jean-Philippe Georges, Yves Le Traon

Summary: Blockchain technologies are being explored in various applications, but selecting the right platform is challenging. This paper conducts a systematic literature review and develops a decision support tool based on recommended assessment criteria to aid in platform selection.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Software Engineering

Pre-implementation Method Name Prediction for Object-oriented Programming

Shangwen Wang, Ming Wen, Bo Lin, Yepang Liu, Tegawende F. Bissyande, Xiaoguang Mao

Summary: Method naming is a challenging task in object-oriented programming, and automated tool support has been developed to assist developers in this task. However, current approaches assume the availability of method implementation to infer its name, while methods are usually named before their implementations. This work fills the gap by developing an approach that predicts the names of all methods to be implemented within a class based on the class name. A large-scale empirical analysis is conducted to validate the approach, and a hybrid big code-driven approach, Mario, is proposed to predict method names. The experiments show promising results, outperforming existing models and baselines.

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY (2023)

Article Computer Science, Software Engineering

Learning the Relation Between Code Features and Code Transforms With Structured Prediction

Zhongxing Yu, Matias Martinez, Zimin Chen, Tegawende F. F. Bissyande, Martin Monperrus

Summary: This article presents the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs). The approach learns a probabilistic model offline and uses it to predict code transforms for new, unseen code snippets. The experimental evaluation shows that considering code structure is crucial for achieving good prediction accuracy.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2023)

Article Computer Science, Software Engineering

Syntactic Versus Semantic Similarity of Artificial and Real Faults in Mutation Testing Studies

Milos Ojdanic, Aayush Garg, Ahmed Khanfir, Renzo Degiovanni, Mike Papadakis, Yves Le Traon

Summary: Fault seeding is commonly used in empirical studies to evaluate and compare test techniques. Recent research has used machine learning techniques to seed faults that look like real ones, raising the question of whether syntactically similar faults result in semantically similar faults. By employing different fault-seeding techniques, the study demonstrates that syntactic similarity does not reflect semantic similarity.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2023)

Article Computer Science, Software Engineering

Understanding the quality and evolution of Android app build systems

Pei Liu, Li Li, Kui Liu, Shane McIntosh, John Grundy

Summary: Build systems are crucial in software development to convert source code into executable software. However, the quality and evolution of build systems for mobile apps, particularly on the Android platform, have not been extensively studied. This paper presents an empirical study of 5222 Android projects to investigate the quality and evolution of their build systems.

JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS (2023)

Proceedings Paper Computer Science, Software Engineering

Repairing DNN Architecture: Are We There Yet?

Jinhan Kim, Nargiz Humbatova, Gunel Jahangirova, Paolo Tonella, Shin Yoo

Summary: As the use of Deep Neural Networks (DNNs) in large software systems continues to grow, there is an increasing need for software developers to design, train, and deploy these models. However, little attention has been given to addressing the difficulties developers face when designing and training such models. This paper surveys and evaluates existing techniques for repairing model performance, using real-world mistakes made by developers and artificial faulty models as benchmarks. The findings suggest that a random baseline performs as well as, or even better than, existing techniques, and that for larger and more complicated models all repair techniques fail to find fixes. Further research is needed to develop more sophisticated Deep Learning repair techniques.

2023 IEEE CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION, ICST (2023)

Article Computer Science, Information Systems

A Query-Based Greedy Approach for Authentic Influencer Discovery in SIoT

Farah Batool, Abdul Rehman, Dongsun Kim, Assad Abbas, Raheel Nawaz, Tahir Mustafa Madni

Summary: The authors propose an informed search greedy approach to efficiently identify influencer nodes in the social Internet of Things that provide legitimate information. This approach minimizes network size and eliminates undesirable connections by ranking and prioritizing nodes. Nodes with ranking greater than 0.5 are considered authentic influencers, while nodes with lower rankings are discarded. The algorithm traverses the pruned network to obtain desired information from the authentic node. Experimental results demonstrate the effectiveness of the approach in terms of time consumption and network traversal.

CMC-COMPUTERS MATERIALS & CONTINUA (2023)

Article Computer Science, Information Systems

COVID-19 Outbreak Prediction by Using Machine Learning Algorithms

Tahir Sher, Abdul Rehman, Dongsun Kim

Summary: COVID-19, a contagious disease, has put pressure on various sectors, but data mining with IoT and SIoT has played a crucial role in overcoming it. This study used different machine learning algorithms to develop a model for analyzing and predicting the existence of COVID-19. The decision tree model performed the best, achieving an accuracy of 98.42%.

CMC-COMPUTERS MATERIALS & CONTINUA (2023)