On the Integration of Similarity Measures with Machine Learning Models to Enhance Text Classification Performance

Authors

Hassan I. Abdalla, Zayed University
Ali A. Amer, Taiz University

Document Type


Source of Publication

Information Sciences

Publication Date



Several techniques have long been proposed to enhance text classification performance, such as classifier ensembles, feature selection, the integration of similarity measures with classifiers, and meta-heuristic algorithms. For text classification, however, the integration of similarity measures with machine learning (ML) models has not yet been analyzed thoroughly. To investigate the impact of integrating similarity measures with ML models in depth, this work makes three major contributions:

(1) It proposes newly integrated models and presents benchmarking studies of the integration methodology over balanced and imbalanced datasets.

(2) It offers a detailed analysis of dozens of integrated models that are experimentally shown to significantly outperform state-of-the-art models. These models are built from fourteen similarity measures, three knowledge representations (BoW, TF-IDF, and word embedding), and five models (Support Vector Machine, N-Centroid-based Classifier, Multinomial Naïve Bayes, Convolutional Neural Network, and Artificial Neural Network).

(3) It introduces highly effective and efficient variants of these five models.

The integrated models were evaluated internally against their baselines and externally against state-of-the-art models. The internal evaluation consistently showed total enhancement rates of 49.3% and 59% over the balanced and imbalanced datasets, respectively. The external evaluation confirmed the superiority of the integrated models.
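The following minimal Python sketch illustrates what it means to integrate a similarity measure with a classifier. It is not the authors' implementation. It builds a nearest-centroid classifier over TF-IDF vectors, and the similarity measure is passed into the decision rule as a function.

Everything specific here is an assumption made for illustration:

- the function names (`tfidf_vectors`, `vectorize`, `centroids`, `predict`);
- the smoothed IDF formula;
- the two example measures (cosine and weighted Jaccard, also known as Ruzicka similarity);
- the toy spam/ham data.

The abstract does not list the fourteen measures the paper uses or define its N-Centroid-based Classifier.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dict: term -> weight) plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed IDF, log(n/df) + 1, so every weight stays strictly positive.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [vectorize(doc, idf) for doc in docs], idf


def vectorize(doc, idf):
    """Map a tokenized document to TF-IDF weights; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def weighted_jaccard(a, b):
    """Weighted Jaccard (Ruzicka): sum of minimums over sum of maximums."""
    keys = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in keys)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in keys)
    return num / den if den else 0.0


def centroids(vecs, labels):
    """Average the training vectors of each class into one centroid per class."""
    sums, counts = {}, Counter(labels)
    for v, y in zip(vecs, labels):
        acc = sums.setdefault(y, {})
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {y: {t: w / counts[y] for t, w in acc.items()} for y, acc in sums.items()}


def predict(vec, cents, sim):
    """Assign the class whose centroid is most similar under the plugged-in measure."""
    return max(cents, key=lambda y: sim(vec, cents[y]))


# Hypothetical toy data, for illustration only.
train_docs = [s.split() for s in [
    "cheap pills buy now", "buy cheap watches now",
    "meeting agenda for monday", "project meeting notes attached",
]]
train_labels = ["spam", "spam", "ham", "ham"]

vecs, idf = tfidf_vectors(train_docs)
cents = centroids(vecs, train_labels)

for text in ["cheap watches buy", "monday meeting notes"]:
    q = vectorize(text.split(), idf)
    print(text, "->", predict(q, cents, cosine), predict(q, cents, weighted_jaccard))
```

Swapping `cosine` for `weighted_jaccard` changes only the decision rule. The representation and the training step stay the same. This mirrors, in miniature, the idea of pairing one classifier with different similarity measures.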




Elsevier BV


Computer Sciences

Indexed in Scopus


Open Access