Revisiting Binary Code Authorship Analysis
Document Type
Book Chapter
Source of Publication
Lecture Notes in Computer Science
Publication Date
3-14-2025
Abstract
Binary authorship analysis is a crucial step in malware reverse engineering, but the volume and complexity of the code make this manually intensive task especially challenging. Consequently, efforts have been made to develop reliable automated tools to support malware authorship analysis; however, automated approaches face many challenges. For instance, the compilation process may remove stylistic features present in the source code. This paper evaluates the features used in existing approaches on various datasets, including programs written for the Google Code Jam programming competition, student projects from programming courses at multiple universities, and content from GitHub repositories. Additionally, we examine the impact of statistical features on the precision, recall, and false positive rate of these approaches. The evaluation results reveal that the accuracy of these approaches varies across application domains and datasets, and that some of the selected features appear unrelated to the author’s style, indicating that careful consideration is needed when applying these approaches. Finally, incorporating statistical features improved the precision and recall of existing approaches while reducing the false positive rate by 10–15%.
ISBN
978-981-96-3530-6, 978-981-96-3531-3
Publisher
Springer Nature Singapore
Volume
15564
First Page
428
Last Page
449
Disciplines
Computer Sciences
Recommended Citation
Alrabaee, Saed; Al-kfairy, Mousa; Taha, Mohammad Bany; Alfandi, Omar; Taher, Fatma; and Tang, Jie, "Revisiting Binary Code Authorship Analysis" (2025). All Works. 7133.
https://zuscholars.zu.ac.ae/works/7133
Indexed in Scopus
no
Open Access
no