Document Type
Article
Source of Publication
Journal of King Saud University - Computer and Information Sciences
Publication Date
1-1-2024
Abstract
Accurate assignment of meaning to a word based on its context, known as Word Sense Disambiguation (WSD), remains challenging across languages. Extensive research aims to develop automated methods for determining word senses in different contexts. However, the literature lacks the presence of datasets generated for the Arabic language WSD. This paper presents a dataset comprising a hundred polysemous Arabic words. Each word in the dataset encompasses 3–8 distinct senses, with ten example sentences per sense. Some statistical operations are conducted to gain insights into the dataset, enlightening its characteristics and properties. Subsequently, a novel WSD approach is proposed to utilize similarity measures and find the overlap between contextual information and dictionary definitions. The proposed method uses the power of BERT, a pre-trained language model, to enable effective Arabic word disambiguation. In training, new features are integrated to improve the model's ability to differentiate between various senses of words. The proposed BERT models are combined to compose an ensemble model architecture to improve the classification performances. The performance of the WSD system outperforms state-of-the-art systems, achieving an approximate F1-score of 96 %. Statistical analyses are performed to evaluate the overall performance of the WSD approach by providing additional information on model predictions. A case study was implemented to test the effectiveness of WSD in sentiment analysis, a downstream task.
DOI Link
ISSN
Publisher
Elsevier BV
Volume
36
Issue
1
Disciplines
Computer Sciences
Keywords
Arabic natural language processing, BERT, Knowledge-based, Machine learning, Performance evaluation, Word sense disambiguation
Scopus ID
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Recommended Citation
Kaddoura, Sanaa and Nassar, Reem, "EnhancedBERT: A feature-rich ensemble model for Arabic word sense disambiguation with statistical analysis and optimized data collection" (2024). All Works. 6307.
https://zuscholars.zu.ac.ae/works/6307
Indexed in Scopus
yes
Open Access
yes
Open Access Type
Gold: This publication is openly available in an open access journal/series