Article

Dataset-aware multi-task learning approaches for biomedical named entity recognition

BIOINFORMATICS (2020)

Journal

BIOINFORMATICS

Volume 36, Issue 15, Pages 4331-4338

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btaa515

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

Natural Science Foundation of Shenzhen City [JCYJ20180306172131515]

Abstract

Motivation: Named entity recognition is a critical and fundamental task for biomedical text mining. Recently, researchers have focused on exploiting deep neural networks for biomedical named entity recognition (Bio-NER). The performance of deep neural networks on a single dataset depends mostly on data quality and quantity, while high-quality data tends to be limited in size. To alleviate task-specific data limitations, some studies have explored multi-task learning (MTL) for Bio-NER and achieved state-of-the-art performance. However, these MTL methods did not make full use of the information available from the various Bio-NER datasets, and the performance of the state-of-the-art MTL method was significantly limited by the number of training datasets.

Results: We propose two dataset-aware MTL approaches for Bio-NER that jointly train all models for numerous Bio-NER datasets, so that each model can discriminatively exploit information from all related training datasets. Both approaches achieve substantially better performance than the state-of-the-art MTL method on 14 out of 15 Bio-NER datasets. Furthermore, we implemented our approaches by incorporating Bio-NER and biomedical part-of-speech (POS) tagging datasets. The results verify that Bio-NER and POS tagging can significantly enhance one another.

Recommended

Article Computer Science, Artificial Intelligence

Candidate region aware nested named entity recognition

Deng Jiang, Haopeng Ren, Yi Cai, Jingyun Xu, Yanxia Liu, Ho-fung Leung

Summary: Named entity recognition (NER) is crucial in NLP tasks. We propose a neural multi-task model for extracting nested entities, which performs better and more efficiently compared to feature-based models.

NEURAL NETWORKS (2021)

Article Computer Science, Artificial Intelligence

Towards Malay named entity recognition: an open-source dataset and a multi-task framework

Yingwen Fu, Nankai Lin, Zhihe Yang, Shengyi Jiang

Summary: Named entity recognition (NER) plays a crucial role in various natural language processing (NLP) applications. However, applying advanced NER research to low-resource languages like Malay has been challenging due to the lack of sufficient data. This paper presents a system for building a Malay NER dataset (MS-NER) with 20,146 sentences through labeled datasets in related languages and iterative optimization. Additionally, a Multi-Task framework (MTBR) is proposed to effectively integrate boundary information for improved NER performance.

CONNECTION SCIENCE (2023)

Article Biochemical Research Methods

Named Entity Aware Transfer Learning for Biomedical Factoid Question Answering

Keqin Peng, Chuantao Yin, Wenge Rong, Chenghua Lin, Deyu Zhou, Zhang Xiong

Summary: Biomedical factoid question answering is a crucial task in biomedical question answering applications. This study proposes a framework that fine-tunes BioBERT with a named entity dataset to improve question answering performance. BiLSTM is applied to encode the question text for sentence-level information, and bagging is used to combine question and token level information for enhanced overall performance.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Computer Science, Information Systems

TaughtNet: Learning Multi-Task Biomedical Named Entity Recognition From Single-Task Teachers

Vincenzo Moscato, Marco Postiglione, Carlo Sansone, Giancarlo Sperli

Summary: In Biomedical Named Entity Recognition (BioNER), the lack of publicly available annotated datasets hampers the use of cutting-edge deep learning-based methods like BERT and GPT-3. Annotating multiple entity types poses challenges as most existing datasets only provide annotations for a single entity type. To address this, the proposed TaughtNet framework leverages knowledge distillation to fine-tune a single multi-task student model using both ground truth and single-task teachers. Experimental results demonstrate the effectiveness of TaughtNet in recognizing mentions of diseases, chemical compounds, and genes, outperforming state-of-the-art baselines in terms of precision, recall, and F1 scores. TaughtNet also enables the training of smaller and lighter student models, making them suitable for real-world scenarios with limited-memory hardware devices and fast inference, while also showing potential for explainability. The code and multi-task model have been made publicly available on GitHub and the Hugging Face repository.

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2023)

Article Computer Science, Information Systems

BINER: A low-cost biomedical named entity recognition

Mohsen Asghari, Daniel Sierra-Sosa, Adel S. Elmaghraby

Summary: The healthcare industry aims to improve patient experience and service quality, but there is a lack of standard datasets and computational resources for biomedical natural language understanding. This paper introduces a model trained on low-tier GPU computers to address this challenge.

INFORMATION SCIENCES (2022)

Article Biochemical Research Methods

Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning

Zhiyu Zhang, Arbee L. P. Chen

Summary: This study introduces a novel fully-shared multi-task learning model based on a pre-trained language model in the biomedical domain, which achieved significant performance improvements on seven benchmark BioNER datasets compared to single-task models.

BMC BIOINFORMATICS (2022)

Article Computer Science, Artificial Intelligence

Class-Imbalanced-Aware Distantly Supervised Named Entity Recognition

Yuren Mao, Yu Hao, Weiwei Liu, Xuemin Lin, Xin Cao

Summary: This article proposes a novel PU learning method for distantly supervised NER, which can automatically handle class imbalance and does not rely on class prior estimation, resulting in state-of-the-art performance.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Chemistry, Multidisciplinary

Chinese Named Entity Recognition Model Based on Multi-Task Learning

Qin Fang, Yane Li, Hailin Feng, Yaoping Ruan

Summary: Compared to English, Chinese named entity recognition has lower performance due to the greater ambiguity in entity boundaries in Chinese text. To leverage entity boundary information, the task has been decomposed into two subtasks: boundary annotation and type annotation. A multi-task learning network (MTL-BERT) has been proposed that combines a bidirectional encoder (BERT) model, effectively improving the performance and efficiency of Chinese named entity recognition tasks.

APPLIED SCIENCES-BASEL (2023)

Review Biochemical Research Methods

Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison

Bosheng Song, Fen Li, Yuansheng Liu, Xiangxiang Zeng

Summary: Deep learning methods have achieved state-of-the-art performance in biomedical named entity recognition, and can be applied to BioNER in various domains based on dataset size and type. These methods are classified into four categories, including single neural network-based, multitask learning-based, transfer learning-based, and hybrid models.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Biochemical Research Methods

AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning

Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Robert Leaman, Qingyu Chen, Zhiyong Lu

Summary: Biomedical named entity recognition (BioNER) aims to automatically identify biomedical entities in natural language text, providing a necessary foundation for downstream text mining tasks and applications. Due to the expensive and domain-specific expertise required for manual annotation of training data, current BioNER approaches suffer from data scarcity and limitations in generalizability and entity coverage. In this paper, we propose an all-in-one (AIO) scheme that utilizes external annotated resources to enhance the accuracy and stability of BioNER models. We introduce AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO scheme, and demonstrate its effectiveness, robustness, and advantages over existing methods on 14 BioNER benchmark tasks and three independent tasks.

BIOINFORMATICS (2023)

Article Biochemical Research Methods

Transferring From Textual Entailment to Biomedical Named Entity Recognition

Tingting Liang, Congying Xia, Ziqiang Zhao, Yixuan Jiang, Yuyu Yin, Philip S. Yu

Summary: Biomedical Named Entity Recognition (BioNER) aims to identify biomedical entities such as genes, proteins, diseases, and chemical compounds in textual data. However, due to ethical and privacy issues, as well as the specialized nature of biomedical data, BioNER lacks quality labeled data, especially at the token-level. This study proposes a gazetteer-based approach to BioNER, where the task is to build a BioNER system from scratch without any token-level annotations. By formulating BioNER as a Textual Entailment problem and using Textual Entailment with Dynamic Contrastive learning (TEDC), this work addresses the noisy labeling issue and transfers knowledge from pre-trained textual entailment models. The dynamic contrastive learning framework improves the model's discrimination ability by contrasting entities and non-entities in the same sentence. Experimental results on real-world biomedical datasets demonstrate that TEDC achieves state-of-the-art performance for gazetteer-based BioNER.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Biochemical Research Methods

Hierarchical shared transfer learning for biomedical named entity recognition

Zhaoying Chai, Han Jin, Shenghui Shi, Siyan Zhan, Lin Zhuo, Yu Yang

Summary: This study proposes a hierarchical shared transfer learning method, which combines multi-task learning and fine-tuning to fuse the underlying entity features and upper data features, thereby improving the performance of biomedical named entity recognition.

BMC BIOINFORMATICS (2022)

Article Computer Science, Artificial Intelligence

Unsupervised cross-domain named entity recognition using entity-aware adversarial training

Qi Peng, Changmeng Zheng, Yi Cai, Tao Wang, Haoran Xie, Qing Li

Summary: An unsupervised cross domain model is proposed in this study, which leverages labeled data from the source domain to predict entities in the target domain by applying adversarial training and an entity-aware attention module to reduce feature discrepancy between different domains.

NEURAL NETWORKS (2021)

Article Computer Science, Artificial Intelligence

A Multi-Task BERT-BiLSTM-AM-CRF Strategy for Chinese Named Entity Recognition

Xiaoyong Tang, Yong Huang, Meng Xia, Chengfeng Long

Summary: Named entity recognition is a technology that aims to identify and mark entities with specific meanings in text. This paper proposes a multi-task intelligent processing model that utilizes machine learning and deep learning techniques, as well as context semantic information, to improve Chinese named entity recognition. The model achieves significant improvements in F1 score compared to previous single task models, demonstrating the effectiveness of multi-task learning.

NEURAL PROCESSING LETTERS (2023)

Article Computer Science, Artificial Intelligence

Assisting Multimodal Named Entity Recognition by cross-modal auxiliary tasks

Zhengjie Chen, Yu Zhang, Siya Mi

Summary: This paper introduces a method for improving the performance of Multimodal Named Entity Recognition (MNER) through cross-modal auxiliary tasks. The method utilizes cross-modal matching and cross-modal mutual information maximization to address the issue of mismatched image-text pairs, and separates the features of the main task and auxiliary tasks through a cross-modal gate-control mechanism.

PATTERN RECOGNITION LETTERS (2023)

Article Biochemical Research Methods

Discovery of G-quadruplex-forming sequences in SARS-CoV-2

Danyang Ji, Mario Juhas, Chi Man Tsang, Chun Kit Kwok, Yongshu Li, Yang Zhang

Summary: Researchers have identified potential G-quadruplex-forming sequences in the SARS-CoV-2 RNA genome and other Coronaviridae family members, with some confirmed to form RNA G-quadruplex structures in vitro. These structures were found to interact with viral helicase (nsp13), suggesting a potential target for inhibiting the virus.

BRIEFINGS IN BIOINFORMATICS (2021)

Editorial Material Genetics & Heredity

Biosensing Detection of the SARS-CoV-2 D614G Mutation

Yang Zhang, Hui Xi, Mario Juhas

Summary: The emergence of a mutant strain of SARS-CoV-2 with an amino acid change at position 614 has become dominant in the pandemic, highlighting the importance of efficient detection using biosensing technologies for pandemic control.

TRENDS IN GENETICS (2021)

Article Chemistry, Analytical

Proteomic and Transcriptome Profiling of G-Quadruplex Aptamers Developed for Cell Internalization

Yang Zhang, Yang Wu, Hongjin Zheng, Hui Xi, Taoyu Ye, Chun-Yin Chan, Chun Kit Kwok

Summary: Nucleic acid medicine is emerging as a promising next-generation therapy, and the development of cell-penetrating aptamers can enhance the cellular delivery efficiency of therapeutic nucleic acids. Characteristic CD spectral analysis revealed the G-quadruplex structures of enriched aptamers.

ANALYTICAL CHEMISTRY (2021)

Article Biochemical Research Methods

A span-based joint model for extracting entities and relations of bacteria biotopes

Mei Zuo, Yang Zhang

Summary: Information about bacteria biotopes (BB) is crucial for microbiological research and applications. The BB task at BioNLP-OST 2019 focuses on extracting microorganism locations and phenotypes from biomedical texts. The proposed span-based model, which uses a pre-trained BERT model, achieves significantly better performance in entity and relation extraction tasks for BBs than previous methods, with a 21.6% reduction in slot error rate (SER). The model is also effective at recognizing nested entities and can be applied to other related tasks with state-of-the-art performance.

BIOINFORMATICS (2022)

Editorial Material Biochemistry & Molecular Biology

Deep Learning for Imaging and Detection of Microorganisms

Yang Zhang, Hao Jiang, Taoyu Ye, Mario Juhas

Summary: Despite the significant interest in deep learning in microbiology, its full potential is yet to be realized. Deep-learning-based systems are believed to play a crucial role in monitoring and investigating microorganisms in the future.

TRENDS IN MICROBIOLOGY (2021)

Article Biology

Multi-stage malaria parasite recognition by deep learning

Sen Li, Zeyu Du, Xiangjie Meng, Yang Zhang

Summary: This article introduces a novel deep learning approach using a deep transfer graph convolutional network (DTGCN) for the recognition of malaria parasites of various stages in blood smear images. The method has shown higher accuracy and effectiveness in publicly available microscopic images of multi-stage malaria parasites compared to a wide range of state-of-the-art approaches.

GIGASCIENCE (2021)

Article Biophysics

Fabricated Metal-Organic Frameworks (MOFs) as luminescent and electrochemical biosensors for cancer biomarkers detection

Brij Mohan, Sandeep Kumar, Hui Xi, Shixuan Ma, Zhiyu Tao, Tiantian Xing, Hengzhi You, Yang Zhang, Peng Ren

Summary: The study focuses on the application of sensors made of porous metal-organic frameworks (MOFs) in cancer biomarker detection, analyzing factors such as fabrication strategies and structural properties that influence sensing performance, and proposes an innovative technique for detecting cancer biomarkers using luminescence and electrochemical sensors.

BIOSENSORS & BIOELECTRONICS (2022)

Letter Biochemistry & Molecular Biology

DRBin: metagenomic binning based on deep representation learning

Gang Mao, Yulin Wu, Yang Zhang, Xuan Wang, Yan Zhu, Bo Liu, Yadong Wang, Junyi Li

JOURNAL OF GENETICS AND GENOMICS (2022)

Review Biotechnology & Applied Microbiology

Aptamers targeting SARS-COV-2: a promising tool to fight against COVID-19

Yang Zhang, Mario Juhas, Chun Kit Kwok

Summary: SARS-CoV-2, the cause of COVID-19, is a major contributor to global mortality. Existing antigen/antibody-based immunoassays and neutralizing antibodies are often ineffective against emerging SARS-CoV-2 variants, highlighting the urgent need for new approaches. Aptamers have been successfully used for detecting and inhibiting various viruses, and hold promise in the fight against COVID-19. This review discusses recent advances and future trends in the development of aptamer-based approaches for the diagnosis and treatment of SARS-CoV-2.

TRENDS IN BIOTECHNOLOGY (2023)

Article Chemistry, Medicinal

Clinical Translation of Aptamers for COVID-19

Yang Zhang, Yongen Li

Summary: SARS-CoV-2, the virus causing COVID-19, remains a leading cause of death globally. Despite the development of effective methods for diagnosing and treating COVID-19, there is still an urgent need for new approaches to tackle SARS-CoV-2 variants and long COVID. Aptamers have shown great potential as diagnostic and therapeutic agents for COVID-19, but their translation into clinical use has been slow, posing challenges that need to be overcome.

JOURNAL OF MEDICINAL CHEMISTRY (2023)

Article Biochemical Research Methods

A knowledge-integrated deep learning framework for cellular image analysis in parasite microbiology

Ruijun Feng, Sen Li, Yang Zhang

Summary: Cellular image analysis is a crucial method employed by microbiologists for the identification and study of microbes. The article presents a knowledge-integrated deep learning framework for cellular image analysis, focusing on classification, detection, and reconstruction tasks. It provides comprehensive information on various models, datasets, computing environment setup, knowledge representation, data pre-processing, and training and tuning, as well as evaluation and visualization techniques.

STAR PROTOCOLS (2023)

Article Chemistry, Analytical

Fluorescence detection of the human angiotensinogen protein by the G-quadruplex aptamer

Hui Xi, Hanlin Jiang, Mario Juhas, Yang Zhang

Summary: Noncanonical G-quadruplex nucleic acid structures have been used as probes in biosensors for accurate and efficient detection of metal ions, proteins, and nucleic acids. In this study, a reliable and efficient fluorescent biosensor platform for G-quadruplex based detection of the human AGT protein was constructed using the magnetic bead enrichment method. This biosensor provides high accuracy, speed, and low cost, and successfully detected AGT at the cellular level.

ANALYST (2022)

Article Biochemistry & Molecular Biology

Correction of out-of-focus microscopic images by deep learning

Chi Zhang, Hao Jiang, Weihuang Liu, Junyi Li, Shiming Tang, Mario Juhas, Yang Zhang

Summary: In this study, a model based on Cycle Generative Adversarial Network (CycleGAN) and a multi-component weighted loss function was developed to address the issue of out-of-focus microscopic images. The proposed model achieved state-of-the-art performance in deblurring and demonstrated excellent generalization capabilities.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Article Biochemistry & Molecular Biology

Genomic pan-cancer classification using image-based deep learning

Taoyu Ye, Sen Li, Yang Zhang

Summary: This study introduces a novel image-based deep learning strategy for cancer classification, achieving higher accuracy compared to existing methods. The approach is not only applicable to various types of cancer, but also helps identify top-ranked tumor-specific genes and pathways through heatmaps.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2021)

Article Biochemistry & Molecular Biology

Deep learning for COVID-19 chest CT (computed tomography) image analysis: A lesson from lung cancer

Hao Jiang, Shiming Tang, Weihuang Liu, Yang Zhang

Summary: To address the urgent need for COVID-19 diagnosis, AI-based methods for analyzing chest CT images have been proposed. By synthesizing a dataset and testing various deep learning models, accurate and efficient diagnostic testing for COVID-19 can be achieved.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2021)
