The construction of a new Lexicon design for Arabic language

Business Transformation through Innovation and Knowledge Management: An Academic Perspective - Proceedings of the 14th International Business Information Management Association Conference, IBIMA 2010


Analyzing Arabic sentences is a difficult task, and the difficulties come from several sources. One is that sentences are long and complex; another is sentence structure: parts of the syntactic structure may be missing, and words and phrases may appear in different orders. This paper aims to develop and assess an Arabic lexicon. The new automatic lexicon was developed to analyze and extract the attributes of Arabic words. It was implemented as a two-step process: tokenization followed by part-of-speech tagging. The output of the lexicon can be processed by a separate parser tool, which analyzes an Arabic sentence to determine whether it follows a valid grammatical structure. An evaluation was conducted to assess the effectiveness and efficiency of the new lexicon design using real sentences selected at random. The results showed a minimum accuracy rate of 92%, which is considered highly satisfactory. The newly designed lexicon can be used widely in applications that require Arabic language analysis and processing.
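
The two-step process named in the abstract (tokenization, then part-of-speech tagging) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `TOY_LEXICON`, `tokenize`, `tag`, and `analyze`, the tag labels, and the dictionary-lookup tagging strategy are all assumptions made for illustration, and the four-word lexicon is a toy example rather than data from the study.

```python
import re

# Hypothetical toy lexicon mapping a word to a part-of-speech tag.
# The entries and tag names are illustrative assumptions, not the paper's data.
TOY_LEXICON = {
    "ذهب": "VERB",      # "went"
    "الولد": "NOUN",    # "the boy"
    "إلى": "PREP",      # "to"
    "المدرسة": "NOUN",  # "the school"
}

# Matches runs of basic Arabic letters (U+0621 to U+064A).
# Diacritics (U+064B and above) are not included, so this sketch
# assumes undiacritized input text.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def tokenize(sentence):
    """Step 1: split a sentence into word tokens, dropping punctuation and spaces."""
    return ARABIC_WORD.findall(sentence)


def tag(tokens):
    """Step 2: look each token up in the toy lexicon; unknown words get 'UNK'."""
    return [(token, TOY_LEXICON.get(token, "UNK")) for token in tokens]


def analyze(sentence):
    """Run both steps and return a list of (token, tag) pairs."""
    return tag(tokenize(sentence))


if __name__ == "__main__":
    for token, pos in analyze("ذهب الولد إلى المدرسة."):
        print(token, pos)
```

A list of (token, tag) pairs like the one returned by `analyze` is the kind of output a downstream parser could consume when checking grammatical structure. A realistic lexicon would also need morphological analysis, which this lookup-only sketch leaves out.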

