Document Type

Article

Source of Publication

Forensic Science International: Digital Investigation

Publication Date

7-1-2024

Abstract

Extracting compiler-provenance information (e.g., the source of a compiler, its version, its optimization settings, and compiler-related functions) is crucial for binary-analysis tasks such as function fingerprinting, code-clone detection, and authorship attribution. However, obfuscation techniques have complicated efforts to automate such extraction. In this paper, we propose an efficient and resilient approach to provenance identification in obfuscated binaries using advanced pre-trained computer-vision models. To achieve this, we transform the program binaries into images and apply a two-layer approach for compiler and optimization prediction. Extensive experiments on a large-scale dataset show that the proposed method can achieve an accuracy of over 98% for both obfuscated and deobfuscated binaries.

ISSN

2666-2825

Publisher

Elsevier BV

Volume

49

Disciplines

Computer Sciences

Keywords

Binary code analysis, Compiler provenance, Malware analysis, Reverse engineering

Scopus ID

85198511447

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Hybrid: This publication is openly available in a subscription-based journal/series
