Evaluation of Malware Classification Models for Heterogeneous Data

Research output: Contribution to journal › Article › peer-review

Abstract

Machine learning (ML) is widely applied across many domains, including technology security, where numerous studies have demonstrated its potential and effectiveness. Over the years, ML methods for identifying malicious software have been developed across various security domains. However, recent research has highlighted the susceptibility of ML models to small input perturbations, known as adversarial examples, which can significantly alter model predictions. Prior studies on adversarial examples focused primarily on ML models for image processing but have progressively extended to other applications, including security, and adversarial attacks have proven particularly effective against malware classification. This study aims to explore the transparency of malware classification and to develop an explanation method for malware classifiers. The problem is harder than explainable AI for homogeneous data because malware has a more intricate data structure than traditional image datasets. The research revealed that existing explanation methods fall short in interpreting heterogeneous data. The methods employed showed that current malware detectors, despite high classification accuracy, may provide a misleading sense of security, and that measuring classification accuracy is insufficient for validating detectors.

Original language: English
Article number: 288
Journal: Sensors (Switzerland)
Volume: 24
Issue number: 1
State: Published - Jan 2024

Bibliographical note

Publisher Copyright:
© 2024 by the author.

Keywords

  • IoT
  • XAI for CTI applications
  • XAI for cybersecurity data
  • adversarial learning
  • deep learning
  • interpretability
