Article

Scaling up data curation using deep learning: An application to literature triage in genomic variation resources

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 14, Issue 8

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pcbi.1006390

Funding

  1. National Institutes of Health Intramural Research Program, National Library of Medicine
  2. Swiss Federal Government through the State Secretariat for Education, Research and Innovation (SERI)
  3. National Institutes of Health (NIH) [UniProt 5U41HG007822-02]
  4. National Human Genome Research Institute of the National Institutes of Health [U41HG007823]
  5. European Molecular Biology Laboratory

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often yields unsatisfactory precision and recall, and it is difficult to manually generate optimal queries. To address this, we propose a machine learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient, and it is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
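
The triage workflow described above (score each candidate publication, rank by score, then compare precision and recall against a query-based baseline) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical keyword scorer standing in for the trained convolutional neural network, the four documents and their relevance labels are invented, and the 0.5 threshold is arbitrary. It does not reproduce the authors' models or their reported figures.

```python
from typing import Callable, Iterable


def rank_for_triage(docs: dict[str, str],
                    score: Callable[[str], float]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least likely relevant."""
    scored = [(doc_id, score(text)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier: share of cue words present."""
    cues = ("variant", "mutation", "association")
    words = text.lower().split()
    return sum(cue in words for cue in cues) / len(cues)


# Invented toy corpus; doc1 and doc2 play the role of curated (relevant) papers.
docs = {
    "doc1": "A missense variant and mutation screen in patients",
    "doc2": "Genome-wide association of a common variant",
    "doc3": "A survey of hospital staffing levels",
    "doc4": "Protein folding without any genetic data",
}
relevant = {"doc1", "doc2"}

# Model-assisted triage: keep documents whose score passes an arbitrary threshold.
ranked = rank_for_triage(docs, toy_score)
selected = [doc_id for doc_id, s in ranked if s >= 0.5]
precision, recall = precision_recall(selected, relevant)

# Baseline: a broad query that returns every document in the corpus.
base_precision, base_recall = precision_recall(docs.keys(), relevant)

# "Times higher" precision, in the same sense as the ratios quoted in the abstract.
precision_gain = precision / base_precision
```

On this toy corpus the broad query retrieves all four documents, while the scorer keeps only the two relevant ones, so precision doubles with recall unchanged. The ratio `precision_gain` is the same kind of quantity as the 1.81 and 2.99 figures quoted above, although those come from the authors' real curation data.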

Recommended

Article Biochemistry & Molecular Biology

Ensembl 2023

Fergal J. Martin, M. Ridwan Amode, Alisha Aneja, Olanrewaju Austine-Orimoloye, Andrey G. Azov, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Alexandra Bignell, Sanjay Boddu, Paulo R. Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Mehrnaz Charkhchi, Alexander Cockburn, Luca Da Rin Fiorretto, Claire Davidson, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Gurpreet S. Ghattaoraya, Jose Gonzalez Martinez, Cristi Guijarro, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Diego Marques-Coelho, Jose Carlos Marugan, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N. Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Pilizota, Irina Prosovetskaia, Jose G. Perez-Silva, Ahamed Imran Abdul Salam, Nuno Saraiva-Agostinho, Helen Schuilenburg, Dan Sheppard, Swati Sinha, Botond Sipos, William Stark, Emily Steed, Ranjit Sukumaran, Dulika Sumathipala, Marie-Marthe Suner, Likhitha Surapaneni, Kyosti Sutinen, Michal Szpak, Francesca Floriana Tricomi, David Urbina-Gomez, Andres Veidenberg, Thomas A. Walsh, Brandon Walts, Elizabeth Wass, Natalie Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R. Ilsley, Jane E. Loveland, Benjamin Moore, Jonathan M. Mudge, John Tate, David Thybert, Stephen J. Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E. Hunt, Magali Ruffier, Fiona Cunningham, Sarah Dyer, Robert D. Finn, Kevin L. Howe, Peter W. Harrison, Andrew D. Yates, Paul Flicek

Summary: Ensembl has been providing high-quality genomic resources for vertebrates and model organisms for over 20 years. With the increase in high-quality reference genomes and the development of pangenome representations, Ensembl aims to support downstream research by creating high-quality annotations, tools, and services for species across the tree of life. This report highlights Ensembl's resources for popular reference genomes, the growing annotations, updates to the Variant Effect Predictor, protein structure predictions, and the beta release of their new website.

NUCLEIC ACIDS RESEARCH (2023)

Article Genetics & Heredity

EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders

Eva Lenassi, Ana Carvalho, Anja Thormann, Liam Abrahams, Gavin Arno, Tracy Fletcher, Claire Hardcastle, Javier Lopez, Sarah E. Hunt, Patrick Short, Panagiotis Sergouniotis, Michel Michaelides, Andrew Webster, Fiona Cunningham, Simon C. Ramsden, Dalia Kasperaviciute, David R. Fitzpatrick, Graeme C. Black, Jamie M. Ellingford

Summary: EyeG2P is a publicly available database and web application designed for efficient variant prioritization for individuals with inherited ophthalmic conditions. It significantly increases precision compared to routine diagnostic approaches and reduces the number of variants for analysis in whole genome sequencing while maintaining high diagnostic yield.

JOURNAL OF MEDICAL GENETICS (2023)

Article Biochemical Research Methods

Guiding the choice of informatics software and tools for lipidomics research applications

Zhixu Ni, Michele Wolk, Geoff Jukes, Karla Mendivelso Espinosa, Robert Ahrends, Lucila Aimo, Jorge Alvarez-Jarreta, Simon Andrews, Robert Andrews, Alan Bridge, Geremy C. Clair, Matthew J. Conroy, Eoin Fahy, Caroline Gaud, Laura Goracci, Juergen Hartler, Nils Hoffmann, Dominik Kopczynski, Ansgar Korf, Andrea F. Lopez-Clavijo, Adnan Malik, Jacobo Miranda Ackerman, Martijn R. Molenaar, Claire O'Donovan, Tomas Pluskal, Andrej Shevchenko, Denise Slenter, Gary Siuzdak, Martina Kutmon, Hiroshi Tsugawa, Egon L. Willighagen, Jianguo Xia, Valerie B. O'Donnell, Maria Fedorova

Summary: Progress in mass spectrometry lipidomics has led to a rapid increase in research in biology and biomedicine, generating large datasets that require sophisticated solutions for automated data processing. To address this issue, various software tools have been developed, but researchers often face difficulties in choosing the most suitable approach, resulting in inefficient and time-consuming ad hoc testing.

NATURE METHODS (2023)

Article Biochemistry & Molecular Biology

GENCODE: reference annotation for the human and mouse genomes in 2023

Adam Frankish, Silvia Carbonell-Sala, Mark Diekhans, Irwin Jungreis, Jane E. Loveland, Jonathan M. Mudge, Cristina Sisu, James C. Wright, Carme Arnan, If Barnes, Abhimanyu Banerjee, Ruth Bennett, Andrew Berry, Alexandra Bignell, Carles Boix, Ferriol Calvet, Daniel Cerdan-Velez, Fiona Cunningham, Claire Davidson, Sarah Donaldson, Cagatay Dursun, Reham Fatima, Stefano Giorgetti, Carlos Garcia Giron, Jose Manuel Gonzalez, Matthew Hardy, Peter W. Harrison, Thibaut Hourlier, Zoe Hollis, Toby Hunt, Benjamin James, Yunzhe Jiang, Rory Johnson, Mike Kay, Julien Lagarde, Fergal J. Martin, Laura Martinez Gomez, Surag Nair, Pengyu Ni, Fernando Pozo, Vivek Ramalingam, Magali Ruffier, Bianca M. Schmitt, Jacob M. Schreiber, Emily Steed, Marie-Marthe Suner, Dulika Sumathipala, Irina Sycheva, Barbara Uszczynska-Ratajczak, Elizabeth Wass, Yucheng T. Yang, Andrew Yates, Zahoor Zafrulla, Jyoti S. Choudhary, Mark Gerstein, Roderic Guigo, Tim J. P. Hubbard, Manolis Kellis, Anshul Kundaje, Benedict Paten, Michael L. Tress, Paul Flicek

Summary: GENCODE provides high quality gene and transcript annotation for the human and mouse genomes, supported by experimental data, serving as a reference for genome biology and clinical genomics. The consortium generates data, develops tools and carries out analyses to support the identification and annotation of transcript structures and their function.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource

Elliot Sollis, Abayomi Mosaku, Ala Abid, Annalisa Buniello, Maria Cerezo, Laurent Gil, Tudor Groza, Osman Gunes, Peggy Hall, James Hayhurst, Arwa Ibrahim, Yue Ji, Sajo John, Elizabeth Lewis, Jacqueline A. L. MacArthur, Aoife McMahon, David Osumi-Sutherland, Kalliope Panoutsopoulou, Zoe Pendlington, Santhi Ramachandran, Ray Stefancsik, Jonathan Stewart, Patricia Whetzel, Robert Wilson, Lucia Hindorff, Fiona Cunningham, Samuel A. Lambert, Michael Inouye, Helen Parkinson, Laura W. Harris

Summary: The NHGRI-EBI GWAS Catalog is a knowledgebase that provides comprehensive and standardized genome-wide association study (GWAS) data. By updating software, expanding the scope of the database, and increasing community outreach, the catalog has improved the quality and quantity of data, as well as enhanced interoperability with other resources.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

InterPro in 2022

Typhaine Paysan-Lafosse, Matthias Blum, Sara Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A. Salazar, Maxwell L. Bileschi, Peer Bork, Alan Bridge, Lucy Colwell, Julian Gough, Daniel H. Haft, Ivica Letunic, Aron Marchler-Bauer, Huaiyu Mi, Darren A. Natale, Christine A. Orengo, Arun P. Pandurangan, Catherine Rivoire, Christian J. A. Sigrist, Ian Sillitoe, Narmada Thanki, Paul D. Thomas, Silvio C. E. Tosatto, Cathy H. Wu, Alex Bateman

Summary: The InterPro database has been updated with new data content and website features, providing more user-friendly access to protein sequence classification and functional domain identification. It has also integrated features from the retiring Pfam website and developed a card game to engage the non-scientific community. Furthermore, the database explores the benefits and challenges of using artificial intelligence for protein structure prediction.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

UniProt: the Universal Protein Knowledgebase in 2023

Alex Bateman, Maria-Jesus Martin, Sandra Orchard, Michele Magrane, Shadab Ahmad, Emanuele Alpi, Emily H. Bowler-Barnett, Ramona Britto, Austra Cukura, Paul Denny, Tunca Dogan, ThankGod Ebenezer, Jun Fan, Penelope Garmiri, Leonardo Jose da Costa Gonzales, Emma Hatton-Ellis, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Swaathi Kandasaamy, Antonia Lock, Aurelien Luciani, Marija Lugaric, Jie Luo, Yvonne Lussi, Alistair MacDougall, Fabio Madeira, Mahdi Mahmoudy, Alok Mishra, Katie Moulang, Andrew Nightingale, Sangya Pundir, Guoying Qi, Shriya Raj, Pedro Raposo, Daniel L. Rice, Rabie Saidi, Rafael Santos, Elena Speretta, James Stephenson, Prabhat Totoo, Edward Turner, Nidhi Tyagi, Preethi Vasudev, Kate Warner, Xavier Watkins, Hermann Zellner, Alan J. Bridge, Lucila Aimo, Ghislaine Argoud-Puy, Andrea H. Auchincloss, Kristian B. Axelsen, Parit Bansal, Delphine Baratin, Teresa M. Batista Neto, Marie-Claude Blatter, Jerven T. Bolleman, Emmanuel Boutet, Lionel Breuza, Blanca Cabrera Gil, Cristina Casals-Casas, Kamal Chikh Echioukh, Elisabeth Coudert, Beatrice Cuche, Edouard de Castro, Anne Estreicher, Maria L. Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Pascale Gaudet, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Arnaud Kerhornou, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Venkatesh Muthukrishnan, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Lucille Pourcel, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Christian J. A. Sigrist, Karin Sonesson, Cecilia N. Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, Hongzhan Huang, Kati Laiho, Peter McGarvey, Darren A. Natale, Karen Ross, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Jian Zhang, Hema Bye-A-Jee, Rossana Zaru, Shyamala Sundaram, Cathy H. Wu

Summary: The UniProt Knowledgebase aims to provide comprehensive, high-quality, and freely accessible protein sequences annotated with functional information. The database has expanded its data processing pipeline and website to accommodate the increasing information content, with over 227 million sequences and plans to include a reference proteome for each taxonomic group. Detailed annotations are extracted from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations from automated systems. The new website, https://www.uniprot.org/, offers enhanced user experience and easy access to data, including AlphaFold structures and improved protein subcellular localization visualizations.

NUCLEIC ACIDS RESEARCH (2023)

Article Cell Biology

The Type 2 Diabetes Knowledge Portal: An open access genetic resource dedicated to type 2 diabetes and related traits

Maria C. Costanzo, Marcin von Grotthuss, Jeffrey Massung, Dongkeun Jang, Lizz Caulkins, Ryan Koesterer, Clint Gilbert, Ryan P. Welch, Parul Kudtarkar, Quy Hoang, Andrew P. Boughton, Preeti Singh, Ying Sun, Marc Duby, Annie Moriondo, Trang Nguyen, Patrick Smadbeck, Benjamin R. Alexander, MacKenzie Brandes, Mary Carmichael, Peter Dornbos, Todd Green, Kenneth C. Huellas-Bruskiewicz, Yue Ji, Alexandria Kluge, Aoife C. McMahon, Josep M. Mercader, Oliver Ruebenacker, Sebanti Sengupta, Dylan Spalding, Daniel Taliun, Philip Smith, Melissa K. Thomas, Beena Akolkar, M. Julia Brosnan, Andriy Cherkas, Audrey Y. Chu, Eric B. Fauman, Caroline S. Fox, Tania Nayak Kamphaus, Melissa R. Miller, Lynette Nguyen, Afshin Parsa, Dermot F. Reilly, Hartmut Ruetten, David Wholley, Norann A. Zaghloul, Goncalo R. Abecasis, David Altshuler, Thomas M. Keane, Mark I. McCarthy, Kyle J. Gaulton, Jose C. Florez, Michael Boehnke, Noel P. Burtt, Jason Flannick

Summary: This study aims to make the Type 2 Diabetes Knowledge Portal (T2DKP) more accessible and useful to both new and existing users. It evaluates the comprehensiveness of T2DKP by comparing its datasets with other repositories, guides researchers unfamiliar with human genetic data on how to interpret and use the data through T2DKP, and discusses the importance of democratizing access to complex disease genetic results.

CELL METABOLISM (2023)

Article Genetics & Heredity

The Gene Ontology knowledgebase in 2023

Suzi A. Aleksander, James Balhoff, Seth Carbon, J. Michael Cherry, Harold J. Drabkin, Dustin Ebert, Marc Feuermann, Pascale Gaudet, Nomi L. Harris, David P. Hill, Raymond Lee, Huaiyu Mi, Sierra Moxon, Christopher J. Mungall, Anushya Muruganugan, Tremayne Mushayahama, Paul W. Sternberg, Paul D. Thomas, Kimberly Van Auken, Jolene Ramsey, Deborah A. Siegele, Rex L. Chisholm, Petra Fey, Maria Cristina Aspromonte, Maria Victoria Nugnes, Federica Quaglia, Silvio Tosatto, Michelle Giglio, Suvarna Nadendla, Giulia Antonazzo, Helen Attrill, Gil dos Santos, Steven Marygold, Victor Strelets, Christopher J. Tabone, Jim Thurmond, Pinglei Zhou, Saadullah H. Ahmed, Praoparn Asanitthong, Diana Luna Buitrago, Meltem N. Erdol, Matthew C. Gage, Mohamed Ali Kadhum, Kan Yan Chloe Li, Miao Long, Aleksandra Michalak, Angeline Pesala, Armalya Pritazahra, Shirin C. C. Saverimuttu, Renzhi Su, Kate E. Thurlow, Ruth C. Lovering, Colin Logie, Snezhana Oliferenko, Judith Blake, Karen Christie, Lori Corbani, Mary E. Dolan, Dmitry Sitnikov, Cynthia Smith, Alayne Cuzick, James Seager, Laurel Cooper, Justin Elser, Pankaj Jaiswal, Parul Gupta, Sushma Naithani, Manuel Lera-Ramirez, Kim Rutherford, Valerie Wood, Jeffrey L. De Pons, Melinda R. Dwinell, G. Thomas Hayman, Mary L. Kaldunski, Anne E. Kwitek, Stanley J. F. Laulederkind, Marek A. Tutaj, Mahima Vedi, Shur-Jen Wang, Peter D'Eustachio, Lucila Aimo, Kristian Axelsen, Alan Bridge, Nevila Hyka-Nouspikel, Anne Morgat, Stacia R. Engel, Kalpana Karra, Stuart R. Miyasato, Robert S. Nash, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, Erika Bakker, Tanya Z. Berardini, Leonore Reiser, Andrea Auchincloss, Ghislaine Argoud-Puy, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Cristina Casals-Casas, Elisabeth Coudert, Anne Estreicher, Maria Livia Famiglietti, Arnaud Gos, Nadine Gruaz-Gumowski, Chantal Hulo, Florence Jungo, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Ivo Pedruzzi, Lucille Pourcel, Sylvain Poux, Catherine Rivoire, Shyamala Sundaram, Alex Bateman, Emily Bowler-Barnett, Hema Bye-A-Jee, Paul Denny, Alexandr Ignatchenko, Rizwan Ishtiaq, Antonia Lock, Yvonne Lussi, Michele Magrane, Maria J. Martin, Sandra Orchard, Pedro Raposo, Elena Speretta, Nidhi Tyagi, Kate Warner, Rossana Zaru, Alexander D. Diehl, Juancarlos Chan, Stavros Diamantakis, Daniela Raciti, Magdalena Zarowiecki, Malcolm Fisher, Christina James-Zorn, Virgilio Ponferrada, Aaron Zorn, Sridhar Ramachandran, Leyla Ruzicka, Monte Westerfield

Summary: The Gene Ontology (GO) knowledgebase is a comprehensive resource that provides information about the functions of genes and gene products. It covers a wide range of organisms and receives updates from a consortium of scientists. The knowledgebase consists of three components: GO, which describes gene functionality; GO annotations, which provide evidence-supported statements about gene products; and GO-CAMs, which are models of molecular pathways. The knowledgebase is continuously updated and reviewed, and guidance is provided to users on how to make the best use of the data.

GENETICS (2023)

Article Biochemistry & Molecular Biology

The SIB Swiss Institute of Bioinformatics Semantic Web of data

Adrian Altenhoff, Amos Bairoch, Parit Bansal, Delphine Baratin, Frederic Bastian, Jerven Bolleman, Alan Bridge, Frederic Burdet, Katrin Crameri, Jerome Dauvillier, Christophe Dessimoz, Sebastien Gehant, Natasha Glover, Kristin Gnodtke, Catherine Hayes, Mark Ibberson, Evgenia Kriventseva, Dmitry Kuznetsov, Frederique Lisacek, Florence Mehl, Tarcisio Mendes de Farias, Pierre-Andre Michel, Sebastien Moretti, Anne Morgat, Sabine Osterle, Marco Pagni, Nicole Redaschi, Marc Robinson-Rechavi, Kasun Samarasinghe, Ana-Claudia Sima, Damian Szklarczyk, Orlin Topalov, Vasundra Toure, Deepak Unni, Christian von Mering, Julien Wollbrett, Monique Zahn-Zabal, Evgeny Zdobnov

Summary: This paper introduces the SIB Swiss Institute of Bioinformatics and its 11 databases, which provide semantically enriched data according to the FAIR principles. It also discusses the Swiss Personalized Health Network initiative and how it uses semantic enrichment to manipulate data. Examples and the use of SPARQL query language are provided to show how the existing SIB knowledge graphs can address complex biological or clinical questions.

NUCLEIC ACIDS RESEARCH (2023)

Article Oncology

Grading of lung adenocarcinomas with simultaneous segmentation by artificial intelligence (GLASS-AI)

John H. Lockhart, Hayley D. Ackerman, Kyubum Lee, Mahmoud Abdalah, Andrew John Davis, Nicole Hackel, Theresa A. Boyle, James Saller, Aysenur Keske, Kay Hanggi, Brian Ruffell, Olya Stringfield, W. Douglas Cress, Aik Choon Tan, Elsa R. Flores

Summary: Preclinical genetically engineered mouse models (GEMMs) of lung adenocarcinoma are valuable for studying tumor formation, progression, and therapeutic resistance. To improve histological analysis in these models, researchers developed GLASS-AI, a machine learning tool for grading, segmenting, and analyzing tumors. GLASS-AI showed agreement with expert raters and revealed previously unreported intratumor heterogeneity. Integration of immunohistochemical staining with GLASS-AI analysis identified dysregulation of Mapk/Erk signaling in high-grade lung adenocarcinomas. This study demonstrates the usefulness of GLASS-AI and the power of combining machine learning and molecular biology techniques for cancer research.

NPJ PRECISION ONCOLOGY (2023)

Article Biochemical Research Methods

Annotation of biologically relevant ligands in UniProtKB using ChEBI

Elisabeth Coudert, Sebastien Gehant, Edouard de Castro, Monica Pozzato, Delphine Baratin, Teresa Neto, Christian J. A. Sigrist, Nicole Redaschi, Alan Bridge

Summary: This study aims to provide high-quality annotations of binding sites for biologically relevant ligands in UniProtKB using the ChEBI chemical ontology. The researchers developed improved search and query facilities for these binding sites and used stable unique identifiers from ChEBI as reference vocabulary for the annotations. The annotations are freely available for querying and downloading through the UniProt website, REST API, SPARQL endpoint, and FTP site.

BIOINFORMATICS (2023)

Article Biochemical Research Methods

AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning

Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Robert Leaman, Qingyu Chen, Zhiyong Lu

Summary: Biomedical named entity recognition (BioNER) aims to automatically identify biomedical entities in natural language text, providing a necessary foundation for downstream text mining tasks and applications. Due to the expensive and domain-specific expertise required for manual annotation of training data, current BioNER approaches suffer from data scarcity and limitations in generalizability and entity coverage. In this paper, we propose an all-in-one (AIO) scheme that utilizes external annotated resources to enhance the accuracy and stability of BioNER models. We introduce AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO scheme, and demonstrate its effectiveness, robustness, and advantages over existing methods on 14 BioNER benchmark tasks and three independent tasks.

BIOINFORMATICS (2023)

Meeting Abstract Endocrinology & Metabolism

Meal-induced activation of inflammatory pathways in youth with monogenic diabetes revealed by transcriptome analysis

Valerie Schwitzgebel, Ingrida Stankute, Cedric Howald, Jean-Louis Blouin, Rasa Verkauskiene, Ioannis Xenarios

HORMONE RESEARCH IN PAEDIATRICS (2023)

Article Biochemical Research Methods

GNorm2: an improved gene name recognition and normalization system

Chih-Hsuan Wei, Ling Luo, Rezarta Islamaj, Po-Ting Lai, Zhiyong Lu

Summary: Gene name normalization is a complex task in biomedical text mining research. GNorm2, an advanced tool, uses deep learning methods to achieve the highest levels of accuracy and efficiency in gene recognition and normalization.

BIOINFORMATICS (2023)
