Cybersecurity Named Entity Recognition Using Bidirectional Long Short-Term Memory with Conditional Random Fields

TSINGHUA SCIENCE AND TECHNOLOGY (2021)

Journal

TSINGHUA SCIENCE AND TECHNOLOGY

Volume 26, Issue 3, Pages 259-265

Publisher

TSINGHUA UNIV PRESS

DOI: 10.26599/TST.2019.9010033

Keywords

security blogs; Long Short-Term Memory (LSTM); Named Entity Recognition (NER)

Funding

National Natural Science Foundation of China [61702508, 61802404, U1836209]
National Key Research and Development Program of China [2018YFB0803602, 2016QY06X1204]
National Social Science Foundation of China [19BSH022]
Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences
Beijing Key Laboratory of Network Security and Protection Technology

Automated Summary

Network texts play a crucial role in conveying cybersecurity information, and extracting cybersecurity entities is essential for cybersecurity applications. This paper introduces a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields, showing superior performance in cybersecurity entity extraction through experiments on various types of security-related texts.

Abstract

Network texts have become important carriers of cybersecurity information on the Internet. These texts include the latest security events such as vulnerability exploitations, attack discoveries, advanced persistent threats, and so on. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general fields, and there has been little research focusing on cybersecurity entity extraction in the security domain. To this end, in this paper, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we have named XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and concatenates X input with bidirectional LSTM output. Via extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF achieves better cybersecurity entity extraction than state-of-the-art models.
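
The abstract names the layers but gives no implementation details, so the following is only a minimal sketch of two of the steps it mentions: concatenating the per-token input with the bidirectional LSTM output, and picking a tag sequence in the CRF layer. Viterbi decoding is assumed here because it is the usual way to decode a linear-chain CRF; the function names, the `B-SW`/`I-SW`/`O` tag set, and all scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def concat_features(x_input: Sequence[Sequence[float]],
                    bilstm_output: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each token's input vector with its BiLSTM output vector."""
    if len(x_input) != len(bilstm_output):
        raise ValueError("both sequences need one vector per token")
    return [list(x) + list(h) for x, h in zip(x_input, bilstm_output)]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (hypothetical values)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n = len(tags)
    score = list(emissions[0])          # best score ending in each tag so far
    backpointers: List[List[int]] = []  # best previous tag for each step
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag to recover the path.
    best = max(range(n), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical three-token example, e.g. "Apache Struts is".
TAGS = ["O", "B-SW", "I-SW"]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # O -> I-SW is heavily penalised
    [0.0, 0.0, 1.0],    # B-SW -> I-SW is encouraged
    [0.0, 0.0, 1.0],    # I-SW -> I-SW is encouraged
]
EMISSIONS = [
    [0.1, 2.0, 0.5],
    [0.4, 0.3, 1.5],
    [2.0, 0.1, 0.2],
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

With these made-up scores, the transition penalty keeps an `I-SW` tag from following `O`, which is the kind of label-sequence constraint a CRF layer contributes on top of per-token scores.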
