Criminal Information Mining

Authors

Farkhund Iqbal
Mourad Debbabi
Benjamin C. M. Fung

Document Type

Book Chapter

Source of Publication

Machine Learning for Authorship Attribution and Cyber Forensics

The previous chapters discussed different aspects of the authorship analysis problem. This chapter proposes a framework for extracting criminal information from the textual content of suspicious online messages. Archives of online messages, including chat logs, e-mails, web forums, and blogs, often contain an enormous amount of forensically relevant information about potential suspects and their illegitimate activities. Such information is usually found in either the header or the body of an online document. The IP addresses, hostnames, and sender and recipient addresses contained in the e-mail header, the user IDs used in chats, and the screen names used in web-based communication help reveal information at the user or application level. For instance, information extracted from a suspicious e-mail corpus helps us learn who the senders and recipients are, how often they communicate, and how many communities or cliques there are in a dataset. Such information also gives insight into inter- and intra-community patterns of communication. A clique or community is a group of users who have an online communication link between them. Header content, or user-level information, is easy to extract and straightforward to use in an investigation.
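As an illustration of the header-level analysis described above, the following minimal sketch counts how often each sender writes to each recipient and then groups users who are linked by any communication. It is not the chapter's framework: the function names `extract_links` and `communities`, the sample messages, and the choice to approximate a community as a connected component of the communication graph are all assumptions made for this example. Only Python's standard `email` package is used.

```python
from collections import Counter, defaultdict
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample corpus; real input would be raw RFC 822 message text.
SAMPLE = [
    "From: alice@example.com\nTo: bob@example.com\nSubject: a\n\nhi",
    "From: alice@example.com\nTo: Bob <bob@example.com>\nSubject: b\n\nhi again",
    "From: carol@example.com\nTo: dave@example.com\nSubject: c\n\nhello",
]


def extract_links(raw_messages):
    """Count (sender, recipient) pairs found in the From/To/Cc headers."""
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower()
                   for _, addr in getaddresses(msg.get_all("From", []))
                   if addr]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower()
                      for _, addr in getaddresses(recipient_headers)
                      if addr]
        for sender in senders:
            for recipient in recipients:
                links[(sender, recipient)] += 1
    return links


def communities(links):
    """Group users into connected components (an assumed stand-in for
    the communities/cliques mentioned in the abstract)."""
    neighbours = defaultdict(set)
    for sender, recipient in links:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    seen, groups = set(), []
    for user in sorted(neighbours):
        if user in seen:
            continue
        stack, group = [user], set()
        while stack:  # depth-first walk over one component
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups


links = extract_links(SAMPLE)
groups = communities(links)
print(links)
print(groups)
```

Treating a community as a connected component is the loosest reading of "a group of users who have an online communication link between them"; a stricter graph-theoretic clique would require every pair in the group to have communicated directly.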




Publisher

Springer International Publishing

Computer Sciences
