Article

Extraction of temporal information from social media messages using the BERT model

Journal

EARTH SCIENCE INFORMATICS
Volume 15, Issue 1, Pages 573-584

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12145-021-00756-6

Keywords

Temporal information extraction; Temporal expression recognition; BERT; Natural language processing

Funding

  1. National Natural Science Foundation of China [42050101, U1711267, 41871311, 41871305]
  2. Open Research Project of The Hubei Key Laboratory of Intelligent Geo-Information Processing [KLIGIP-2021A01]
  3. Major scientific and technological innovation projects in Shandong Province [2019JZZY020105]
  4. China Postdoctoral Science Foundation [2021M702991]
  5. Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) [CUG2106116]

Temporal information extraction from social media messages is crucial for geographical applications. A deep learning-based algorithm, BERT-BiLSTM-CRF, was proposed for automatically extracting temporal information. Experimental results demonstrate that the proposed method outperforms the current state-of-the-art models in extracting temporal information from Chinese social media texts.
Temporal information extraction from social media messages is of critical importance to several geographical applications. Based on the characteristics of temporal information descriptions in Chinese text, the time expression patterns formed by combinations of time units are summarized. A deep learning-based information extraction algorithm, named BERT-BiLSTM-CRF, is proposed for automatically extracting temporal information from social media messages. Building on the bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model, the BERT (bidirectional encoder representations from transformers) pretrained language model is used to improve the generalization ability of the word vector model and to capture long-range contextual information; the resulting word vectors are then fed into the BiLSTM-CRF model for further training. The proposed model was evaluated on a purpose-built corpus of manually annotated Chinese social media messages. Among the models compared, BERT-BiLSTM-CRF achieved the highest average F1-score, 85%. The experimental results show that the proposed method outperforms the current state-of-the-art models.
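
To make the final stage of such a pipeline concrete, the following is a minimal sketch of how a CRF layer can decode tag scores into time-expression spans. It is not the authors' implementation: the BIO tag set (`O`, `B-TIME`, `I-TIME`), the example sentence, and all emission and transition scores are made-up assumptions. In a BERT-BiLSTM-CRF model the emission scores would come from the BiLSTM running over BERT embeddings, and the transition scores would be learned during training.

```python
# Illustrative sketch only: tag set, sentence, and scores are assumptions,
# not values from the paper.
TAGS = ["O", "B-TIME", "I-TIME"]


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at token t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j on this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Walk the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def bio_to_spans(tags):
    """Convert BIO tags to (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag != "I-TIME" and start is not None:
            spans.append((start, i))
            start = None
        if tag == "B-TIME":
            start = i
    return spans


# Character-level tokens of a hypothetical message: "meeting at 3 pm tomorrow".
tokens = list("明天下午三点开会")

# Hypothetical emission scores, columns ordered as in TAGS.
emissions = [
    [0.0, 2.0, 0.0],  # 明
    [0.0, 0.0, 2.0],  # 天
    [0.0, 0.0, 2.0],  # 下
    [0.6, 0.0, 0.5],  # 午: emissions alone slightly prefer "O"
    [0.0, 0.0, 2.0],  # 三
    [0.0, 0.0, 2.0],  # 点
    [2.0, 0.0, 0.0],  # 开
    [2.0, 0.0, 0.0],  # 会
]

# Hypothetical transition scores: "O" -> "I-TIME" is heavily penalized,
# because an inside tag cannot follow an outside tag in the BIO scheme.
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B-TIME
    [0.0, 0.0, 0.0],     # from I-TIME
]

predicted_tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
spans = bio_to_spans(predicted_tags)
expressions = ["".join(tokens[s:e]) for s, e in spans]
print(predicted_tags)
print(expressions)  # ['明天下午三点']
```

In this toy example the fourth character would be tagged `O` if each token were labeled independently, which would split the expression in two. The transition penalty makes the decoder keep it inside the span, which is the kind of label-consistency constraint a CRF layer contributes on top of per-token scores.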
