Stopword detection for streaming content

Document Type

Conference Proceeding

Source of Publication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Publication Date

1-1-2018

Abstract

© Springer International Publishing AG, part of Springer Nature 2018. The removal of stopwords is an important preprocessing step in many natural language processing tasks, and it can improve task performance and reduce execution time. Many existing methods either rely on a predefined list of stopwords or compute word significance based on metrics such as tf-idf. The objective of our work in this paper is to identify stopwords, in an unsupervised way, for streaming textual corpora such as Twitter, which have a temporal nature. We propose to model the dynamics of each word within the streaming corpus in order to identify the words that are less likely to be informative or discriminative. Our work is based on the discrete wavelet transform (DWT) of word signals, from which we extract two features, namely scale and energy. We show that our proposed approach is effective in identifying stopwords and improves the quality of topics in the task of topic detection.
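To make the idea of a DWT over a word signal concrete, the following is a minimal sketch, not the authors' implementation. It assumes a word signal is a list of per-interval occurrence counts, uses the Haar wavelet (an assumption; the abstract does not name a wavelet), and defines "energy" as the sum of squared detail coefficients and "scale" as the decomposition level carrying the most energy. Both definitions, and the function names `haar_dwt` and `scale_and_energy`, are illustrative guesses rather than the paper's own.

```python
import math


def haar_dwt(signal):
    """Full Haar decomposition of a 1-D signal.

    Returns (details, approx): a list of detail-coefficient lists,
    finest level first, and the final approximation coefficients.
    Odd-length levels are padded by repeating the last value.
    """
    approx = [float(x) for x in signal]
    details = []
    while len(approx) >= 2:
        if len(approx) % 2:
            approx.append(approx[-1])
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx


def scale_and_energy(signal):
    """Hypothetical features: (dominant level, total detail energy).

    Level 1 is the finest scale. A signal with no variation has
    zero detail energy and is reported with level 0.
    """
    details, _ = haar_dwt(signal)
    energies = [sum(c * c for c in level) for level in details]
    total = sum(energies)
    if total == 0:
        return 0, total
    dominant = max(range(len(energies)), key=energies.__getitem__) + 1
    return dominant, total


# Assumed toy data: counts of a word over eight consecutive time windows.
steady_word = [5, 5, 5, 5, 5, 5, 5, 5]   # appears at a constant rate
bursty_word = [0, 0, 0, 8, 0, 0, 0, 0]   # appears in a single burst

print(scale_and_energy(steady_word))
print(scale_and_energy(bursty_word))
```

Under these assumed definitions, a word that occurs at a steady rate yields no detail energy, while a bursty word concentrates its energy at a fine scale. How the paper actually turns scale and energy into a stopword decision is not stated in the abstract.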

DOI Link

10.1007/978-3-319-76941-7_70

ISBN

9783319769400

ISSN

0302-9743

Publisher

Springer Verlag

Volume

10772 LNCS

First Page

737

Last Page

743

Disciplines

Computer Sciences

Keywords

Discrete wavelet transforms, Information retrieval, Natural language processing systems, Execution time, Pre-processing step, Topic detection, Word signals, Linguistics

Scopus ID

85044446453

Recommended Citation

Fani, Hossein; Bashari, Masoud; Zarrinkalam, Fattane; Bagheri, Ebrahim; and Al-Obeidat, Feras, "Stopword detection for streaming content" (2018). All Works. 3216.
https://zuscholars.zu.ac.ae/works/3216

Indexed in Scopus

yes

Open Access

no
