The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Abdul Munem Nerabie, British University in Dubai
Manar AlKhatib, British University in Dubai
Sujith Samuel Mathew, Zayed University
May El Barachi, University of Wollongong in Dubai
Farhad Oroumchian, University of Wollongong in Dubai

Document Type

Article

Source of Publication

Procedia Computer Science

Publication Date

1-1-2021

Abstract

Sentiment analysis relies on Natural Language Processing (NLP) techniques and is widely applied to social media content to determine people’s opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. The accuracy of sentiment analysis depends on automatic Part-of-Speech (PoS) tagging, which labels words according to their grammatical categories. Analyzing the Arabic language has attracted considerable research interest, and the challenge is now amplified by social media dialects. While numerous morphological analyzers and PoS taggers have been proposed for Modern Standard Arabic (MSA), we are now witnessing an increased interest in applying those techniques to the Arabic dialect that is prominent in social media. Indeed, social media texts (e.g., posts, comments, and replies) differ significantly from MSA texts in terms of vocabulary and grammatical structure. Such differences call for reviewing PoS tagging methods to adapt them to social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora is one of the reasons that automatic PoS tagging of social media content has rarely been studied. In this paper, we address those limitations by proposing a novel Arabic social media text corpus that is enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus constitutes the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialect, collected from 65,000 online users’ accounts. Furthermore, our proposed corpus was used to train a custom Long Short-Term Memory (LSTM) deep learning model, which showed excellent performance in terms of sentiment classification accuracy and F1-score. The obtained results demonstrate that the use of a diverse corpus that is enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door for advanced features such as opinion mining and emotion intelligence.

DOI Link

10.1016/j.procs.2021.03.026

ISSN

1877-0509

Publisher

Elsevier

Volume

184

First Page

148

Last Page

155

Disciplines

Computer Sciences

Keywords

Sentiment Analysis, Part of Speech Tagging, Arabic Language, Dialect Arabic, Neural Network

Scopus ID

85106672867

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Nerabie, Abdul Munem; AlKhatib, Manar; Mathew, Sujith Samuel; El Barachi, May; and Oroumchian, Farhad, "The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach" (2021). All Works. 4250.
https://zuscholars.zu.ac.ae/works/4250

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Gold: This publication is openly available in an open access journal/series

Included in

Computer Sciences Commons
