Article

CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (2022)

Journal

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING

Volume 19, Issue 1, Pages 438-451

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TDSC.2020.2984505

Keywords

Software; Deep learning; Training data; Training; Predictive models; Security; Data models; Cross-domain; vulnerability detection; discovery; deep learning; machine learning; domain adaptation

Funding

Defence Science and Technology Group's Next Generation Technologies Program [DP200100886, LP180100170]
Australian Research Council [LP180100170, DP200100886]

Abstract

Many security incidents, such as cyber attacks, are rooted in software vulnerabilities, which should ideally be found and fixed before the code is deployed. Machine learning-based approaches achieve state-of-the-art performance in detecting vulnerabilities, and they are predominantly supervised: their prediction models are trained on ground-truth data under the assumption that the training data and test data are drawn from the same probability distribution. In practice, however, the test data often follows a different distribution from the training data because it comes from a different project or contains different types of vulnerabilities. In this article, we present a new system for Cross Domain Software Vulnerability Discovery (CD-VulD) using deep learning (DL) and domain adaptation (DA). We employ DL because it can automatically construct high-level, abstract feature representations of programs, which are likely to be more useful across domains than handcrafted features driven by domain knowledge. CD-VulD reduces the divergence between the source and target distributions by learning cross-domain representations. First, given software program representations, CD-VulD converts them into token sequences and learns token embeddings so that it generalizes across tokens. Next, it employs a deep feature model to build abstract, high-level representations from those sequences. Then, it applies the metric transfer learning framework (MTLF) to learn cross-domain representations by minimizing the distribution divergence between the source domain and the target domain. Finally, it uses the cross-domain representations to build a classifier for vulnerability detection. Experimental results show that CD-VulD outperforms state-of-the-art vulnerability detection approaches by a wide margin. We make the new datasets publicly available so that our work is replicable and can be further improved.
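Two steps of the pipeline described above, turning programs into token sequences and measuring how far apart the source-domain and target-domain feature distributions are, can be illustrated with a small Python sketch. It is not the authors' implementation: the regex lexer and the helper names `tokenize`, `build_vocab`, `encode`, `gaussian_kernel`, and `mmd2` are hypothetical, and the divergence measure shown is a Gaussian-kernel Maximum Mean Discrepancy (MMD), a common choice swapped in here for illustration rather than MTLF itself. The deep feature model and the final classifier are omitted.

```python
import math
import re
from typing import Dict, List, Sequence

# Hypothetical lexer for C-like code (an assumption, not the paper's tokenizer):
# identifiers/keywords, integer literals, a few multi-character operators,
# then any other single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|->|\+\+|--|&&|\|\||\S")


def tokenize(code: str) -> List[str]:
    """Split a source snippet into a flat token sequence (whitespace is dropped)."""
    return TOKEN_RE.findall(code)


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id; id 0 is reserved for unseen tokens."""
    vocab: Dict[str, int] = {"<unk>": 0}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; in a DL model these ids would index an embedding table."""
    return [vocab.get(tok, 0) for tok in seq]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    sigma: float = 1.0,
) -> float:
    """Biased empirical estimate of squared MMD between two feature samples.

    Values near 0 mean the samples look alike under the kernel; larger values
    mean the source and target distributions are further apart.
    """
    def mean_kernel(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


if __name__ == "__main__":
    tokens = tokenize("memcpy(dst, src, len);")
    print(tokens)  # ['memcpy', '(', 'dst', ',', 'src', ',', 'len', ')', ';']
    vocab = build_vocab([tokens])
    # Unseen identifiers 'buf' and 'n' fall back to the <unk> id 0.
    print(encode(tokenize("memcpy(buf, src, n);"), vocab))  # [1, 2, 0, 4, 5, 4, 0, 7, 8]

    # Toy 1-D "features": a nearby target sample versus a shifted one.
    near = mmd2([[0.0], [0.1]], [[0.05], [0.15]])
    far = mmd2([[0.0], [0.1]], [[3.0], [3.1]])
    print(near < far)  # True
```

In a setting like the one described, a quantity such as `mmd2` would be evaluated on the high-level features extracted from source-domain and target-domain programs; an adaptation step is doing its job when it lowers this value while keeping vulnerable and non-vulnerable samples separable.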