On the Integration of Similarity Measures with Machine Learning Models to Enhance Text Classification Performance

Document Type

Article

Source of Publication

Information Sciences

Publication Date

10-1-2022

Abstract

Several techniques have long been proposed to enhance text classification performance, such as classifier ensembles, feature selection, the integration of similarity measures with classifiers, and meta-heuristic algorithms. The integration of similarity measures with machine learning (ML) models, however, has not yet received thorough analysis for text classification. To investigate the impact of this integration thoroughly, this work makes three major contributions: (1) proposing newly integrated models and presenting benchmarking studies of the integration methodology over balanced and imbalanced datasets; (2) offering a detailed analysis of dozens of integrated models that are established, and experimentally shown, to significantly outperform state-of-the-art models, built from fourteen similarity measures, three knowledge representations (BoW, TF-IDF, and word embedding), and five models (Support Vector Machine, N-Centroid-based Classifier, Multinomial Naïve Bayesian, Convolutional Neural Network, and Artificial Neural Network); and (3) introducing significantly effective and highly efficient variations of these five models. The evaluation was conducted internally, comparing the integrated models against their baselines, and externally, against state-of-the-art models. While the internal evaluation consistently showed a total enhancement rate of 49.3% and 59% over the balanced and imbalanced datasets, respectively, the external evaluation attested to the superiority of the integrated models.
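
As a rough illustration of what pairing a similarity measure with a classifier can look like, the sketch below plugs cosine similarity into a simple centroid-based classifier over bag-of-words counts. It is not the authors' implementation: the function names, the toy documents, and the choice of cosine similarity are all assumptions made for illustration, and the abstract does not specify how the paper's integration is actually performed.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fit_centroids(docs, labels):
    """Sum the BoW vectors of each class into one centroid per label."""
    centroids = {}
    for text, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids

def predict(text, centroids, similarity=cosine):
    """Assign the label whose centroid is most similar to the document."""
    vec = bow(text)
    return max(centroids, key=lambda label: similarity(vec, centroids[label]))

# Toy data, invented for illustration only (not from the paper's datasets).
train_docs = [
    "the match ended in a draw",
    "the team won the match",
    "the bank raised interest rates",
    "markets fell as rates rose",
]
train_labels = ["sport", "sport", "finance", "finance"]

centroids = fit_centroids(train_docs, train_labels)
prediction = predict("interest rates and markets", centroids)
```

Because `predict` takes the similarity function as a parameter, a different measure can be swapped in without touching the classifier, which is the kind of comparison across measures that the abstract describes at a much larger scale.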

ISSN

0020-0255

Publisher

Elsevier BV

Disciplines

Computer Sciences

Indexed in Scopus

no

Open Access

no
