Discovering Influential Twitter Authors Via Clustering And Ranking On Apache Storm

Document Type

Conference Proceeding

Source of Publication

2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)

Publication Date

7-14-2021

Abstract

Several million people are active on social media throughout the day, and hundreds of new accounts are created daily. On Twitter, a popular micro-blogging platform, a wide variety of authors publish thousands of short posts, or tweets, creating highly diverse social content. This diversity is a remarkable strength, but it also makes it difficult to find Twitter's authoritative and influential authors. This work introduces a two-step algorithmic approach for discovering these authors. First, a set of metrics and features is extracted from the social network (e.g. friends and followers) and from the content of the tweets written by each author. Then, Twitter's most authoritative authors are discovered by two distinct approaches, one relying on probabilistic clustering and the other on fuzzy clustering. In particular, the former first employs the Gaussian Mixture Model to identify the most authoritative authors and then introduces a novel ranking technique that relies on computing the cumulative Gaussian distribution of the extracted metrics and features. The latter combines the Gaussian Mixture Model with fuzzy c-means, and the derived authors are subsequently ranked via the Borda count technique. The results indicate that the second scheme was able to find more authoritative authors in the benchmark dataset. Both approaches were designed, implemented, and executed on a local cluster of the Apache Storm framework, a cloud-based platform which supports streaming data and real-time scenarios.
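
The two ranking ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy feature values, and the choice to sum per-metric Gaussian CDF values into a single score are assumptions made only for illustration, and the clustering step (Gaussian Mixture Model, fuzzy c-means) and the Apache Storm topology are omitted.

```python
import math
from statistics import mean, pstdev


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_scores(features):
    """Score each author by summing the Gaussian CDF of each metric.

    `features` maps an author name to a list of metric values (hypothetical
    layout). Summing the per-metric CDF values is an assumption of this
    sketch, not a detail taken from the paper.
    """
    authors = list(features)
    n_metrics = len(next(iter(features.values())))
    scores = {author: 0.0 for author in authors}
    for j in range(n_metrics):
        column = [features[author][j] for author in authors]
        mu, sigma = mean(column), pstdev(column)
        for author in authors:
            if sigma > 0:
                scores[author] += gaussian_cdf(features[author][j], mu, sigma)
            else:
                # A constant metric carries no ranking information.
                scores[author] += 0.5
    return scores


def borda_count(rankings):
    """Fuse several best-first rankings of the same candidates.

    A candidate at position p in a list of n items receives n - 1 - p points;
    ties in total points are broken alphabetically.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            points[candidate] = points.get(candidate, 0) + (n - 1 - position)
    return sorted(points, key=lambda c: (-points[c], c))


# Toy data: [followers, average retweets per tweet] for three made-up authors.
toy_features = {"alice": [100, 5], "bob": [50, 3], "carol": [10, 1]}
toy_scores = cdf_scores(toy_features)

# Three made-up per-metric rankings fused into one list.
fused = borda_count([
    ["alice", "bob", "carol"],
    ["bob", "alice", "carol"],
    ["alice", "carol", "bob"],
])
```

In this sketch a higher CDF score means the author sits further above the population mean across the metrics, while the Borda count rewards authors who are placed consistently high across several independent rankings.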

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Volume

00

Disciplines

Computer Sciences

Keywords

Apache Storm, k-means, fuzzy c-means, Gaussian Mixture models, Borda count, Twitter influence, voting, ranking methods, list fusion, fuzzy clustering

Indexed in Scopus

no

Open Access

no
