Revisiting Binary Code Authorship Analysis

Document Type

Book Chapter

Source of Publication

Lecture Notes in Computer Science

Publication Date

3-14-2025

Abstract

Binary authorship analysis is a crucial step in malware reverse engineering, but the volume and complexity of the code exacerbate the challenge of this manually intensive task. Consequently, efforts have been made to develop reliable automated tools to facilitate malware authorship analysis; however, many challenges are associated with automated approaches. For instance, the compilation process may remove stylistic features present in the source code. This paper evaluates the features used in existing approaches by utilizing various datasets, including programs written for the Google Code Jam programming competition, student projects from programming courses at multiple universities, and content from GitHub repositories. Additionally, we examined the impact of statistical features on precision, recall, and the false positive rate of these methodologies. The evaluation results reveal that the accuracy of these approaches varies across different application domains and datasets, and some of the selected features appear unrelated to the author’s style, indicating that careful consideration is needed when applying this approach. Finally, using statistical features enhanced the precision and recall of existing approaches while reducing the false positive rate by 10–15%.

ISBN

978-981-96-3530-6, 978-981-96-3531-3

ISSN

0302-9743, 1611-3349

Publisher

Springer Nature Singapore

Volume

15564

First Page

428

Last Page

449

Disciplines

Computer Sciences

Indexed in Scopus

no

Open Access

no
