Source of Publication
The expansion and development of internet technologies has led to rapid growth in the number of electronic documents. Text document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights from the data. Feature selection is a dimensionality reduction technique that prunes the feature space, thereby lowering the computational cost and enhancing classification accuracy. This work presents a hybrid filter-wrapper method (PCA-GWO) that uses Principal Component Analysis (PCA) as the filter approach to select an appropriate and informative subset of features, and the Grey Wolf Optimizer (GWO) as the wrapper approach to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of the candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed PCA-GWO method outperforms the baseline classifiers, with and without feature selection, as well as other feature selection approaches in terms of classification accuracy.
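The wrapper stage described in the abstract can be illustrated with a minimal sketch of a binary Grey Wolf Optimizer. This is not the authors' implementation. The function names (`binary_gwo`, `toy_fitness`), the 0.5 threshold used to turn positions into feature masks, and all parameter defaults are assumptions made for illustration. The PCA filter and the LR evaluator are not implemented here: they are represented by a user-supplied `fitness` callable, which in the paper's setting would return the classification accuracy of LR trained on the selected PCA-filtered features.

```python
import math
import random


def binary_gwo(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Search for a boolean feature mask that maximizes fitness(mask).

    Assumes n_wolves >= 3 so that alpha, beta, and delta leaders exist.
    """
    rng = random.Random(seed)
    # Continuous positions in [0, 1]; feature j is selected when position > 0.5
    # (a simple thresholding choice assumed for this sketch).
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]

    def to_mask(pos):
        mask = [p > 0.5 for p in pos]
        if not any(mask):  # never return an empty subset
            mask[max(range(n_features), key=lambda j: pos[j])] = True
        return mask

    best_mask, best_score = None, -math.inf
    for it in range(n_iters + 1):
        # Rank the pack; the three fittest wolves lead the update.
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)), reverse=True)
        leaders = [w[:] for w in ranked[:3]]  # copies of alpha, beta, delta
        score = fitness(to_mask(leaders[0]))
        if score > best_score:
            best_mask, best_score = to_mask(leaders[0]), score
        if it == n_iters:
            break  # the final pass only evaluates the last positions
        a = 2.0 * (1.0 - it / n_iters)  # shrinks from 2 toward 0
        for w in wolves:
            for j in range(n_features):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    total += leader[j] - A * D
                # Average the three leader-guided moves and clip to [0, 1].
                w[j] = min(1.0, max(0.0, total / 3.0))
    return best_mask, best_score


def toy_fitness(mask):
    """Stand-in evaluator: reward features 0-2, penalize subset size."""
    return sum(mask[:3]) - 0.1 * sum(mask)
```

Calling `binary_gwo(toy_fitness, n_features=10)` returns a feature mask and its score. Replacing `toy_fitness` with a function that trains and scores a classifier on the masked features would give a wrapper in the spirit of the one the abstract describes.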
Institute of Electrical and Electronics Engineers (IEEE)
Feature extraction, Text categorization, Support vector machines, Principal component analysis, Task analysis, Logistics, Data mining, Water resources
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Alomari, Osama Ahmad; Elnagar, Ashraf; Afyouni, Imad; Shahin, Ismail; Nassif, Ali Bou; Hashem, Ibrahim Abaker; and Tubishat, Mohammad, "Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification" (2022). All Works. 5471.
Indexed in Scopus
Open Access Type
Gold: This publication is openly available in an open access journal/series