TY - JOUR
T1 - Boosting Transformers
T2 - Recognizing Textual Entailment for Classification of Vaccine News Coverage
AU - Neves, Luiz
AU - Camargo, Chico Q.
AU - Massarani, Luisa
N1 - Publisher Copyright:
© The authors.
PY - 2025
Y1 - 2025
N2 - The introduction of the Transformer neural network architecture revolutionized Natural Language Processing by effectively handling long-range dependencies and context. Models like BERT and GPT are at the forefront of Large Language Models and have been used in text classification. Despite their benchmark performance, real-world applications pose challenges, including the requirement for substantial labeled data and class balance. Few-shot learning approaches, such as the Recognizing Textual Entailment (RTE) framework, have emerged to address these issues. RTE identifies relationships between a text T and a hypothesis H: T entails H if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T. This study explores an RTE framework for classifying vaccine-related headlines using 1,000 labeled data points distributed unevenly across 10 classes. We evaluate eight models and procedures, including open-source and closed-source as well as paid and free options, tested from four perspectives. The results highlight that deep transfer learning, which combines language and task knowledge as Transformers and RTE do, enables the development of text classification models with superior performance, addressing data scarcity and class imbalance. This approach provides a valuable protocol for creating classification models and delivers an automated model for classifying vaccine-related content.
AB - The introduction of the Transformer neural network architecture revolutionized Natural Language Processing by effectively handling long-range dependencies and context. Models like BERT and GPT are at the forefront of Large Language Models and have been used in text classification. Despite their benchmark performance, real-world applications pose challenges, including the requirement for substantial labeled data and class balance. Few-shot learning approaches, such as the Recognizing Textual Entailment (RTE) framework, have emerged to address these issues. RTE identifies relationships between a text T and a hypothesis H: T entails H if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T. This study explores an RTE framework for classifying vaccine-related headlines using 1,000 labeled data points distributed unevenly across 10 classes. We evaluate eight models and procedures, including open-source and closed-source as well as paid and free options, tested from four perspectives. The results highlight that deep transfer learning, which combines language and task knowledge as Transformers and RTE do, enables the development of text classification models with superior performance, addressing data scarcity and class imbalance. This approach provides a valuable protocol for creating classification models and delivers an automated model for classifying vaccine-related content.
KW - BERT
KW - GPT
KW - Natural Language Processing
KW - Recognizing Textual Entailment
KW - Transformers
UR - https://www.scopus.com/pages/publications/105014171676
U2 - 10.5117/CCR2025.1.1.NEVE
DO - 10.5117/CCR2025.1.1.NEVE
M3 - Article
AN - SCOPUS:105014171676
SN - 2665-9085
VL - 7
JO - Computational Communication Research
JF - Computational Communication Research
IS - 1
ER -