TY - GEN
T1 - SemEval-2022 Task 8: Multilingual news article similarity
T2 - 16th International Workshop on Semantic Evaluation, SemEval 2022
AU - Chen, Xi
AU - Zeynali, Ali
AU - Camargo, Chico Q.
AU - Flöck, Fabian
AU - Gaffney, Devin
AU - Grabowicz, Przemyslaw A.
AU - Hale, Scott A.
AU - Jurgens, David
AU - Samory, Mattia
N1 - Funding Information:
This research received funding from the Volkswagen Foundation. We thank Media Cloud for data access, and the Internet Archive for making it possible for all participants to access the same data. We are deeply grateful to the annotators and task participants: thank you.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Thousands of news articles appear daily in outlets in different languages. Understanding which articles refer to the same story can not only improve applications like news aggregation but also enable cross-linguistic analysis of media consumption and attention. However, assessing the similarity of stories in news articles is challenging due to the different dimensions in which a story might vary; e.g., two articles may have substantial textual overlap but describe similar events that happened years apart. To address this challenge, we introduce a new dataset of nearly 10,000 news article pairs spanning 18 language combinations annotated for seven dimensions of similarity as SemEval 2022 Task 8. Here, we present an overview of the task, the best-performing submissions, and the frontiers and challenges for measuring multilingual news article similarity. While the participants of this SemEval task contributed very strong models, achieving up to 0.818 correlation with gold standard labels across languages, human annotators are capable of reaching higher correlations, suggesting space for further progress.
AB - Thousands of news articles appear daily in outlets in different languages. Understanding which articles refer to the same story can not only improve applications like news aggregation but also enable cross-linguistic analysis of media consumption and attention. However, assessing the similarity of stories in news articles is challenging due to the different dimensions in which a story might vary; e.g., two articles may have substantial textual overlap but describe similar events that happened years apart. To address this challenge, we introduce a new dataset of nearly 10,000 news article pairs spanning 18 language combinations annotated for seven dimensions of similarity as SemEval 2022 Task 8. Here, we present an overview of the task, the best-performing submissions, and the frontiers and challenges for measuring multilingual news article similarity. While the participants of this SemEval task contributed very strong models, achieving up to 0.818 correlation with gold standard labels across languages, human annotators are capable of reaching higher correlations, suggesting space for further progress.
UR - http://www.scopus.com/inward/record.url?scp=85129313753&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85129313753
T3 - SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop
SP - 1094
EP - 1106
BT - SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop
A2 - Emerson, Guy
A2 - Schluter, Natalie
A2 - Stanovsky, Gabriel
A2 - Kumar, Ritesh
A2 - Palmer, Alexis
A2 - Schneider, Nathan
A2 - Singh, Siddharth
A2 - Ratan, Shyam
PB - Association for Computational Linguistics (ACL)
Y2 - 14 July 2022 through 15 July 2022
ER -