Abstract
Recent research has shown that a small perturbation to an input may forcibly change the prediction of a machine learning (ML) model. Such variants are commonly referred to as adversarial examples. Early studies focused mostly on ML models for image processing and later expanded to other applications, including malware classification. In this article, we focus on the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We consider this problem more challenging than attacking ML models for image processing, both because of the highly complex data structure of PDF and because of the additional constraint that the generated PDF must exhibit malicious behavior. To address this problem, we propose a variant of generative adversarial networks that generates evasive PDF malware variants (without any crash), which can be classified as benign by various existing classifiers while maintaining the original malicious behavior. Our model exploits the target classifier as the second discriminator to rapidly generate an evasive variant PDF, using our new feature selection process that includes unique features extracted from malicious PDF files. We evaluate our technique against three representative PDF malware classifiers (Hidost 13, Hidost 16, and PDFrate-v2) and further examine its effectiveness with antivirus engines from VirusTotal. To the best of our knowledge, our work is the first to analyze performance against commercial antivirus engines. Our model finds, with great speed, evasive variants for all selected seeds against state-of-the-art PDF malware classifiers and raises a serious security concern in the presence of adversaries.
Original language | English |
---|---|
Pages (from-to) | 299-313 |
Number of pages | 15 |
Journal | IEEE Transactions on Artificial Intelligence |
Volume | 2 |
Issue number | 4 |
State | Published - 1 Aug 2021 |
Bibliographical note
Publisher Copyright: © 2021 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
Keywords
- Adversarial examples (AEs)
- Evading portable document format (PDF) classifiers
- Generative adversarial networks
- PDF malware