Quantifying the Performance of Adversarial Training on Language Models with Distribution Shifts

Marwan Omar, Soohyeon Choi, Daehun Nyang, David Mohaisen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Adversarial training has recently emerged as an important defense mechanism to robustify machine learning models in the presence of adversarial examples. Although adversarial training can boost the robustness of machine learning algorithms, little research has been conducted to determine whether adversarial training remains effective in the long term. Because deployed machine learning models are inherently dynamic, change in the underlying model is inevitable. These dynamics result from the model's evolution over time, as new training data are introduced and the model's parameters drift. In this paper, we examine the limitations of adversarial training due to the temporal changes of machine learning models. Using a natural language task, we conduct experiments on a variety of datasets to measure the impact of concept drift on the efficacy of adversarial training. Our analysis shows that certain adversarially-trained models are even more prone to drift than others. In particular, WordCNN- and LSTM-based models are shown to be more susceptible to temporal changes than others, such as BERT. We validate our findings using multiple real-world datasets on different network architectures. Our work calls for further research into the temporal aspects of adversarial training.
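
The kind of experiment the abstract describes can be illustrated with a minimal, self-contained sketch. This is not the paper's setup (no WordCNN/LSTM/BERT models or real-world text datasets): it adversarially trains a toy logistic-regression classifier with FGSM-style perturbations on synthetic Gaussian data, then compares its accuracy on in-distribution data versus a "drifted" test set whose feature means have shifted over time.

```python
# Hypothetical sketch of adversarial training vs. concept drift.
# All functions and data here are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two Gaussian classes; `shift` moves both class means to mimic drift."""
    X0 = rng.normal(-1.0 + shift, 1.0, size=(n, 2))
    X1 = rng.normal(+1.0 + shift, 1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train(X, y, eps=0.1, lr=0.1, epochs=200):
    """Gradient descent on FGSM-perturbed inputs (adversarial training)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # FGSM: nudge each input in the direction that increases the loss
        grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
        X_adv = X + eps * np.sign(grad_x)
        p = sigmoid(X_adv @ w + b)
        w -= lr * X_adv.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

X_train, y_train = make_data(500)
w, b = adv_train(X_train, y_train)

X_now, y_now = make_data(500)                  # same distribution as training
X_drift, y_drift = make_data(500, shift=1.5)   # temporally drifted data

print(f"in-distribution acc: {accuracy(w, b, X_now, y_now):.2f}")
print(f"drifted acc:         {accuracy(w, b, X_drift, y_drift):.2f}")
```

Even though the model was hardened against adversarial perturbations, its accuracy degrades once the data distribution shifts, which is the gap between adversarial robustness and temporal robustness that the paper quantifies.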

Original language: English
Title of host publication: CySSS 2022 - Proceedings of the 1st Workshop on Cybersecurity and Social Sciences
Publisher: Association for Computing Machinery, Inc
Pages: 3-9
Number of pages: 7
ISBN (Electronic): 9781450391771
DOIs
State: Published - 30 May 2022
Event: 1st International Workshop on Cybersecurity and Social Sciences, CySSS 2022 - Virtual, Online, Japan
Duration: 30 May 2022 → …

Publication series

Name: CySSS 2022 - Proceedings of the 1st Workshop on Cybersecurity and Social Sciences

Conference

Conference: 1st International Workshop on Cybersecurity and Social Sciences, CySSS 2022
Country/Territory: Japan
City: Virtual, Online
Period: 30/05/22 → …

Bibliographical note

Funding Information:
Acknowledgement. Supported by the Global Research Laboratory (GRL) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2016K1A1A2912757). The first and second authors contributed equally.

Publisher Copyright:
© 2022 ACM.

Keywords

  • adversarial training
  • concept drift
  • robustness
  • sentiment analysis
