Article Proceedings Paper

Identification of transcription factor contexts in literature using machine learning approaches

Journal

BMC BIOINFORMATICS
Volume 9, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/1471-2105-9-S3-S11

Funding

  1. Biotechnology and Biological Sciences Research Council [BB/C007360/1] Funding Source: Medline
  2. BBSRC [BB/C007360/1] Funding Source: UKRI

Background: Availability of information about transcription factors (TFs) is crucial for genome biology, as TFs play a central role in the regulation of gene expression. While manual literature curation is expensive and labour intensive, the development of semi-automated text mining support is hindered by the unavailability of training data. There have been no studies on how existing data sources (e.g. TF-related data from the MeSH thesaurus and GO ontology) or potentially noisy example data (e.g. protein-protein interaction, PPI) could be used to provide training data for identification of TF contexts in literature.

Results: In this paper we describe a text-classification system designed to automatically recognise contexts related to transcription factors in literature. The learning model is based on a set of biological features (e.g. protein and gene names, interaction words, other biological terms) that are deemed relevant for the task. We have exploited background knowledge from existing biological resources (MeSH and GO) to engineer such features. Weak and noisy training datasets have been collected from descriptions of TF-related concepts in MeSH and GO, PPI data and data representing non-protein-function descriptions. Three machine-learning methods are investigated, along with a vote-based merging of individual approaches and/or different training datasets. The system achieved highly encouraging results, with most classifiers achieving an F-measure above 90%.

Conclusions: The experimental results have shown that the proposed model can be used for identification of TF-related contexts (i.e. sentences) with high accuracy, with a significantly reduced set of features when compared to a traditional bag-of-words approach. The results obtained with existing PPI data suggest that TF and PPI contexts are less similar than we had expected. We have also shown that existing knowledge sources are useful both for feature engineering and for obtaining noisy positive training data.
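The vote-based merging mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two small lexicons, the feature names, and the three threshold rules standing in for trained classifiers are all hypothetical, chosen only to show how a compact set of biological features and a majority vote over several classifiers might fit together.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical lexicons (assumptions for illustration only; the paper
# derives its features from MeSH and GO, which are not reproduced here).
INTERACTION_WORDS = {"binds", "activates", "represses", "regulates"}
TF_TERMS = {"transcription", "promoter", "enhancer"}

Features = Dict[str, int]
Classifier = Callable[[Features], int]  # returns 1 (TF context) or 0


def extract_features(sentence: str) -> Features:
    """Count lexicon hits in a sentence instead of using a full bag of words."""
    tokens = [t.strip(".,;()").lower() for t in sentence.split()]
    return {
        "interaction_words": sum(t in INTERACTION_WORDS for t in tokens),
        "tf_terms": sum(t in TF_TERMS for t in tokens),
    }


def majority_vote(classifiers: Sequence[Classifier], features: Features) -> int:
    """Merge individual predictions; ties fall back to the negative class."""
    votes = Counter(clf(features) for clf in classifiers)
    return 1 if votes[1] > votes[0] else 0


# Three stand-in "classifiers" (simple rules, not trained models).
def rule_tf_terms(f: Features) -> int:
    return int(f["tf_terms"] >= 1)


def rule_interaction(f: Features) -> int:
    return int(f["interaction_words"] >= 1)


def rule_strict(f: Features) -> int:
    return int(f["tf_terms"] >= 3 and f["interaction_words"] >= 3)


if __name__ == "__main__":
    sentence = "The factor binds the promoter and activates transcription."
    feats = extract_features(sentence)
    label = majority_vote([rule_tf_terms, rule_interaction, rule_strict], feats)
    print(feats, label)
```

In practice each rule would be replaced by a trained model, possibly one per training dataset, with the voting step left unchanged.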

Recommended

Correction Health Care Sciences & Services

Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study (vol 20, e11548, 2018)

George Karystianis, Armita Adily, Peter Schofield, Lee Knight, Clara Galdon, David Greenberg, Louisa Jorm, Goran Nenadic, Tony Butler

JOURNAL OF MEDICAL INTERNET RESEARCH (2019)

Article Health Care Sciences & Services

Automated Analysis of Domestic Violence Police Reports to Explore Abuse Types and Victim Injuries: Text Mining Study

George Karystianis, Armita Adily, Peter W. Schofield, David Greenberg, Louisa Jorm, Goran Nenadic, Tony Butler

JOURNAL OF MEDICAL INTERNET RESEARCH (2019)

Article Computer Science, Information Systems

The Use of Data Mining Methods for the Prediction of Dementia: Evidence From the English Longitudinal Study of Aging

Hui Yang, Peter A. Bath

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2020)

Article Health Care Sciences & Services

Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper

Kerina H. Jones, Elizabeth M. Ford, Nathan Lea, Lucy J. Griffiths, Lamiece Hassan, Sharon Heys, Emma Squires, Goran Nenadic

JOURNAL OF MEDICAL INTERNET RESEARCH (2020)

Article Health Care Sciences & Services

Prevalence of Mental Illnesses in Domestic Violence Police Records: Text Mining Study

George Karystianis, Annabeth Simpson, Armita Adily, Peter Schofield, David Greenberg, Handan Wand, Goran Nenadic, Tony Butler

JOURNAL OF MEDICAL INTERNET RESEARCH (2020)

Article Public, Environmental & Occupational Health

Public Perspectives of Using Social Media Data to Improve Adverse Drug Reaction Reporting: A Mixed-Methods Study

Alexander Bulcock, Lamiece Hassan, Sally Giles, Caroline Sanders, Goran Nenadic, Stephen Campbell, Will Dixon

Summary: Participants in the study demonstrated low awareness of pharmacovigilance methods and ADR reporting, but showed willingness to share health-related social media data with researchers and regulators. However, they were cautious about the use of automated text mining methods to detect and report ADRs.

DRUG SAFETY (2021)

Article Health Care Sciences & Services

A Social Media Campaign (#datasaveslives) to Promote the Benefits of Using Health Data for Research Purposes: Mixed Methods Analysis

Lamiece Hassan, Goran Nenadic, Mary Patricia Tully

Summary: This study analyzed all publicly available posts on the Twitter platform containing the hashtag #datasaveslives between September 1, 2016, and August 31, 2017. The research found that this hashtag-based social media campaign effectively encouraged a wide audience of stakeholders to disseminate positive examples of health research, supporting community building and bridging practices within and between interdisciplinary sectors.

JOURNAL OF MEDICAL INTERNET RESEARCH (2021)

Article Medical Informatics

Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study

Ghada Alfattni, Maksim Belousov, Niels Peek, Goran Nenadic

Summary: This study evaluates the feasibility of using NLP and deep learning approaches for extracting drug information from clinical free-text notes and presents an extensive error analysis. Results show that deep learning methods can achieve high accuracy and exhibit different strengths in handling various relations.

JMIR MEDICAL INFORMATICS (2021)

Article Public, Environmental & Occupational Health

Take up to eight tablets per day: Incorporating free-text medication instructions into a transparent and reproducible process for preparing drug exposure data for pharmacoepidemiology

Meghna Jani, Belay Birlie Yimer, David Selby, Mark Lunt, Goran Nenadic, William G. Dixon

Summary: This study aimed to examine the impact of incorporating narrative prescribing instructions and subsequent drug preparation assumptions on adverse event rates, using a worked example of opioids and fracture risk. The results showed that assumptions made during the drug preparation process, especially for those with variability in prescription instructions, can impact subsequent risk estimates.

PHARMACOEPIDEMIOLOGY AND DRUG SAFETY (2023)

Article Medicine, Research & Experimental

An Analysis of PubMed Abstracts From 1946 to 2021 to Identify Organizational Affiliations in Epidemiological Criminology: Descriptive Study

George Karystianis, Wilson Lukmanjaya, Paul Simpson, Peter Schofield, Natasha Ginnivan, Goran Nenadic, Marina van Leeuwen, Iain Buchan, Tony Butler

Summary: This study examines the lead authors' affiliations in the field of epidemiological criminology to determine the countries and organizations responsible for the published research. It also explores the relationship between research outputs and the overall standard of a country's justice system.

INTERACTIVE JOURNAL OF MEDICAL RESEARCH (2022)

Article Health Care Sciences & Services

Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study

George Karystianis, Rina Carines Cabral, Armita Adily, Wilson Lukmanjaya, Peter Schofield, Iain Buchan, Goran Nenadic, Tony Butler

Summary: This study explores the concordance between mental illness mentions in police event narratives and mental health diagnoses from hospital records in the context of domestic violence. The findings suggest that accessing the rich information contained in police text narratives can enhance current surveillance systems for reporting and understanding domestic violence, and additional insights can be gained through linkage to other health and welfare data collections.

JMIR FORMATIVE RESEARCH (2022)

Article Medical Informatics

Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders

Natalie K. Fitzpatrick, Richard Dobson, Angus Roberts, Kerina Jones, Anoop Shah, Goran Nenadic, Elizabeth Ford

Summary: This study aimed to gather stakeholder views on the creation of a consented, donated databank of clinical free text for NLP research. All stakeholders were strongly in favor of the databank and saw great value in creating an environment for testing and training NLP tools to improve accuracy.

JMIR MEDICAL INFORMATICS (2023)

Article Health Care Sciences & Services

Automatic Extraction of Research Themes in Epidemiological Criminology From PubMed Abstracts From 1946 to 2020: Text Mining Study

George Karystianis, Paul Simpson, Wilson Lukmanjaya, Natasha Ginnivan, Goran Nenadic, Iain Buchan, Tony Butler

Summary: The field of epidemiological criminology aims to study the intersection between public health and justice systems. This study examines the gaps between published research outputs in epidemiological criminology and the research priorities identified by prison stakeholders.

JMIR FORMATIVE RESEARCH (2023)

Proceedings Paper Computer Science, Artificial Intelligence

From Web Crawled Text to Project Descriptions: Automatic Summarizing of Social Innovation Projects

Nikola Milosevic, Dimitar Marinov, Abdullah Gok, Goran Nenadic

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019) (2019)

Review Medical Informatics

Clinical Text Data in Machine Learning: Systematic Review

Irena Spasic, Goran Nenadic

JMIR MEDICAL INFORMATICS (2020)
