Untargeted Code Authorship Evasion with Seq2Seq Transformation

Soohyeon Choi, Rhongho Jang, Dae Hun Nyang, David Mohaisen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Code authorship attribution is the problem of identifying the authors of source code from the stylistic features of their code, a topic that has recently attracted significant interest, with outstanding reported performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system initially designed for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. Compared to existing work, SCAE improves efficiency at the cost of a slight degradation in accuracy. SCAE reduces processing time by ≈ 68% while maintaining an 85% transformation success rate and an evasion success rate of up to 95.77% in the untargeted setting.

Original language: English
Title of host publication: Computational Data and Social Networks - 12th International Conference, CSoNet 2023, Proceedings
Editors: Minh Hoàng Hà, Xingquan Zhu, My T. Thai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 83-92
Number of pages: 10
ISBN (Print): 9789819706686
State: Published - 2024
Event: 12th International Conference on Computational Data and Social Networks, CSoNet 2023 - Hanoi, Viet Nam
Duration: 11 Dec 2023 to 13 Dec 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14479 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Computational Data and Social Networks, CSoNet 2023
Country/Territory: Viet Nam
City: Hanoi
Period: 11/12/23 to 13/12/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.

Keywords

  • Code Authorship Evasion Attack
  • Code Authorship Identification
  • Machine Learning Identification
  • Program Stylistic Features
  • Software Forensics
