Learn2Evade: Learning-based generative model for evading PDF malware classifiers

Ho Bae, Younghan Lee, Yohan Kim, Uiwon Hwang, Sungroh Yoon, Yunheung Paek

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Recent research has shown that a small perturbation to an input may forcibly change the prediction of a machine learning (ML) model. Such variants are commonly referred to as adversarial examples. Early studies have focused mostly on ML models for image processing and expanded to other applications, including those for malware classification. In this article, we focus on the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We deem that our problem is more challenging than those against ML models for image processing because of the highly complex data structure of PDF and of an additional constraint that the generated PDF should exhibit malicious behavior. To resolve our problem, we propose a variant of generative adversarial networks that generate evasive variant PDF malware (without any crash), which can be classified as benign by various existing classifiers yet maintaining the original malicious behavior. Our model exploits the target classifier as the second discriminator to rapidly generate an evasive variant PDF with our new feature selection process that includes unique features extracted from malicious PDF files. We evaluate our technique against three representative PDF malware classifiers (Hidost 13, Hidost 16, and PDFrate-v2) and further examine its effectiveness with AntiVirus engines from VirusTotal. To the best of our knowledge, our work is the first to analyze the performance against the commercial AntiVirus engines. Our model finds, with great speed, evasive variants for all selected seeds against state-of-the-art PDF malware classifiers and raises a serious security concern in the presence of adversaries.

Original language: English
Pages (from-to): 299-313
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Volume: 2
Issue number: 4
State: Published - 1 Aug 2021

Bibliographical note

Publisher Copyright:
© 2021 Institute of Electrical and Electronics Engineers Inc. All rights reserved.

Keywords

  • Adversarial examples (AEs)
  • Evading portable document format (PDF) classifiers
  • Generative adversarial networks
  • PDF malware
