TY - JOUR
T1 - ANNO
T2 - A General Annotation Tool for Bilingual Clinical Note Information Extraction
AU - Lee, Kye Hwa
AU - Lee, Hyunsung
AU - Park, Jin Hyeok
AU - Kim, Yi Jun
AU - Lee, Youngho
N1 - Funding Information: This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (No. HR21C0198).
N1 - Publisher Copyright: © 2022 The Korean Society of Medical Informatics.
PY - 2022/1
Y1 - 2022/1
N2 - Objectives: This study aimed to develop a generalizable annotation tool for complex bilingual clinical text, which led to the design and development of the clinical text annotation tool ANNO. Methods: We designed ANNO to help human annotators annotate information in clinical documents efficiently and accurately. First, annotations for different classes (word or phrase types) can be tagged according to the type of word using the dictionary function. In addition, differences can be evaluated and reconciled by comparing annotation results between human annotators. Moreover, if the regular expression set for a class is updated during annotation, the update is automatically reflected in new documents. The regular expression set created by human annotators is designed such that a word tagged once is automatically labeled in new documents. Results: Because ANNO is a Docker-based web application, users can run it without encountering dependency issues. Human annotators can share their annotation markups as regular expression sets with a dictionary structure, and they can cross-check their annotated corpora with each other. The main features of ANNO are the dictionary-based regular expression sharing function, the cross-check function for each annotator, and standardized input (Microsoft Excel) and output (extensible markup language [XML]) formats. Conclusions: With the growing need for large volumes of annotated clinical data to support the development of machine learning models, we expect ANNO to be helpful to many researchers.
KW - Data Mining
KW - Information Storage and Retrieval
KW - Medical Records
KW - Personal Health Records
UR - http://www.scopus.com/inward/record.url?scp=85126628215&partnerID=8YFLogxK
U2 - 10.4258/hir.2022.28.1.89
DO - 10.4258/hir.2022.28.1.89
M3 - Article
AN - SCOPUS:85126628215
SN - 2093-3681
VL - 28
SP - 89
EP - 94
JO - Healthcare Informatics Research
JF - Healthcare Informatics Research
IS - 1
ER -