Authorship Attribution With Few Training Samples

Farkhund Iqbal, Zayed University
Mourad Debbabi, Concordia University
Benjamin C. M. Fung, McGill University

Document Type

Book Chapter

Source of Publication

Machine Learning for Authorship Attribution and Cyber Forensics

Publication Date

12-5-2020

Abstract

This chapter discusses authorship attribution with few training samples. The problem studied here differs in two ways from the traditional authorship identification problem discussed in the earlier chapters of this book. First, traditional authorship attribution studies [63, 65] work only when large training samples, typically enough to build a classification model, are available from each candidate author. Here, the emphasis is on using only a few training samples for each suspect. In some scenarios, no training samples may exist at all, and the suspects may be asked (usually through court orders) to produce a writing sample for investigation purposes. Second, in traditional authorship studies the goal is to attribute a single anonymous document to its true author. In this chapter, we look at cases where more than one anonymous message needs to be attributed to its true author(s). A perpetrator may, for example, create a ghost e-mail account or hack an existing one and then use it to send illegitimate messages while remaining anonymous.

To address these shortfalls, the authorship attribution problem is redefined as follows: given a collection of anonymous messages potentially written by a set of suspects {S1, ..., Sn}, a cybercrime investigator first wants to identify the major groups of messages based on stylometric features; intuitively, each message group is written by one suspect. Then s/he wants to identify the author of each group of anonymous messages from among the given candidate suspects. To solve this redefined problem, the stylometric pattern-based approach of AuthorMinerl (described previously in Sect. 5.4.1) is extended into a method called AuthorMinerSmall. When applying this approach, the stylometric features are first extracted from the given anonymous message collection Ω.
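The two-step procedure in the redefined problem (group the anonymous messages by writing style, then match each group to a suspect) can be illustrated with a small sketch. It is not AuthorMinerSmall, which extends the stylometric pattern mining of AuthorMinerl; it plainly swaps in two simpler, standard techniques: k-means clustering over a handful of hypothetical stylometric features (function-word frequencies, average word length, punctuation rate), followed by nearest-centroid matching against each suspect's few writing samples. The feature set, the function names (`features`, `kmeans`, `attribute`), and the assumption that the number of groups k is known in advance are all illustrative choices, not details taken from the chapter.

```python
import math
import re
from collections import Counter

# Hypothetical feature set for illustration only; not the features used by
# AuthorMinerSmall, which mines frequent stylometric patterns instead.
FUNCTION_WORDS = ["the", "a", "of", "to", "and", "in", "is", "you", "i", "it"]


def features(text):
    """Map a message to a small stylometric vector: relative frequencies of
    a few function words, scaled average word length, punctuation rate."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    counts = Counter(words)
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec.append(sum(len(w) for w in words) / n / 10.0)
    vec.append(sum(text.count(p) for p in ",;!?") / max(len(text), 1))
    return vec


def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iters=20):
    """Plain k-means, seeded with the first k vectors for reproducibility."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of messages")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each message to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def attribute(messages, suspect_samples, k):
    """Step 1: group anonymous messages by style. Step 2: assign each group
    to the suspect whose few samples have the nearest mean feature vector."""
    vecs = [features(m) for m in messages]
    labels = kmeans(vecs, k)
    profiles = {s: mean([features(t) for t in texts])
                for s, texts in suspect_samples.items()}
    authors = {}
    for c in range(k):
        members = [v for v, lab in zip(vecs, labels) if lab == c]
        if members:
            centroid = mean(members)
            authors[c] = min(profiles, key=lambda s: dist(centroid, profiles[s]))
    return labels, authors


if __name__ == "__main__":
    msgs = [
        "the king of the north and the queen of the south.",
        "you know i think it is you, isn't it? you and i!",
        "the history of the empire of the west.",
        "is it you? i think it is you and i know it!",
    ]
    samples = {
        "alice": ["the rise of the city of the river."],
        "bob": ["i know it is you, and you know it!"],
    }
    print(attribute(msgs, samples, k=2))
```

With these toy messages, the two writing styles fall into separate groups and each group is matched to the suspect whose single sample shares its style. Seeding k-means with the first k messages keeps the sketch reproducible but can produce poor groups on real data; k-means++ seeding or several random restarts would be a more usual choice.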

DOI Link

10.1007/978-3-030-61675-5_6

ISSN

2364-9488

Publisher

Springer International Publishing

Disciplines

Computer Sciences | Social and Behavioral Sciences

Recommended Citation

Iqbal, Farkhund; Debbabi, Mourad; and Fung, Benjamin C. M., "Authorship Attribution With Few Training Samples" (2020). All Works. 624.
https://zuscholars.zu.ac.ae/works/624

Indexed in Scopus

Open Access