Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: A tutorial review

Hee Jo Nam, Ryota Yamada, Hyun Seok Park

Research output: Contribution to journal; Review article; peer-review

2 Scopus citations

Abstract

The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.
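
The conversion mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It is not the tool developed at the hackathon. It assumes the GENIA tagger's tab-separated output (word, base form, POS tag, chunk tag, named-entity tag, one token per line, blank line between sentences) with IOB2 entity tags, and it targets the basic PubAnnotation JSON shape of a `text` string plus a `denotations` list of `id`/`span`/`obj` entries. The function name `genia_to_pubannotation` and the sample sentence are hypothetical, and the sketch assumes each token appears verbatim in the source text, which may not hold where the tagger normalizes characters such as quotation marks.

```python
import json


def genia_to_pubannotation(tagger_output: str, text: str) -> dict:
    """Convert GENIA-tagger-style lines into a PubAnnotation-style dict.

    Hypothetical helper: tokens are aligned to `text` by searching
    forward from the end of the previous token.
    """
    denotations = []
    cursor = 0       # character offset where the next token search starts
    current = None   # entity being built: {"begin", "end", "label"}

    def close_entity():
        nonlocal current
        if current is not None:
            denotations.append({
                "id": f"T{len(denotations) + 1}",
                "span": {"begin": current["begin"], "end": current["end"]},
                "obj": current["label"],
            })
            current = None

    for line in tagger_output.splitlines():
        if not line.strip():
            close_entity()  # a blank line ends the sentence and any open entity
            continue
        fields = line.split("\t")
        word, ne_tag = fields[0], fields[4]
        begin = text.find(word, cursor)
        if begin < 0:
            raise ValueError(f"token {word!r} not found after offset {cursor}")
        end = begin + len(word)
        cursor = end

        if ne_tag.startswith("B-"):
            close_entity()
            current = {"begin": begin, "end": end, "label": ne_tag[2:]}
        elif (ne_tag.startswith("I-") and current is not None
              and current["label"] == ne_tag[2:]):
            current["end"] = end  # extend the open entity over this token
        else:
            close_entity()        # "O" tag or inconsistent "I-" tag

    close_entity()
    return {"text": text, "denotations": denotations}


if __name__ == "__main__":
    sample_text = "IL-2 gene expression requires NF-kappa B."
    sample_tagged = "\n".join([
        "IL-2\tIL-2\tNN\tB-NP\tB-DNA",
        "gene\tgene\tNN\tI-NP\tI-DNA",
        "expression\texpression\tNN\tI-NP\tO",
        "requires\trequire\tVBZ\tB-VP\tO",
        "NF-kappa\tNF-kappa\tNN\tB-NP\tB-protein",
        "B\tB\tNN\tI-NP\tI-protein",
        ".\t.\t.\tO\tO",
    ])
    print(json.dumps(genia_to_pubannotation(sample_tagged, sample_text), indent=2))
```

Offsets are computed against the original text rather than the tokenized stream, since PubAnnotation spans refer to character positions in the document text.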

Original language: English
Article number: e13
Pages (from-to): 1-10
Number of pages: 10
Journal: Genomics and Informatics
Volume: 18
Issue number: 2
State: Published - 2020

Bibliographical note

Funding Information:
This work was supported by a National Research Foundation of Korea grant (NRF-2019R1F1A1058858) funded by the Korean government (MSIT).

Publisher Copyright:
© 2020, Korea Genome Organization.

Keywords

  • Named entity recognition
  • Natural language processing
  • Text mining
