The Impact of Data Normalization on KNN Rendering

Document Type

Book Chapter

Source of Publication

Lecture Notes on Data Engineering and Communications Technologies

Publication Date

9-18-2023

Abstract

Data normalization is a vital preprocessing technique in which data is scaled or transformed so that all features contribute equally. The success of classifiers such as the K-Nearest Neighbors (KNN) algorithm depends heavily on data quality when generalizing classification models. KNN is one of the simplest and most widely used models for machine learning tasks, including text classification, pattern recognition, plagiarism and intrusion detection, ranking models, and sentiment analysis. Because KNN is built on similarity measures, its performance is also highly contingent on the nature and representation of the data. It is commonly reported in the literature that data must be normalized for KNN to achieve competitive performance. This raises a key question: which normalization method leads to the best performance? To answer it, this work investigates data normalization for KNN, a topic that has received little attention so far. We provide a comparative study of the impact of data normalization on KNN performance using six normalization methods, namely Decimal, L2-Norm, Max/Min, Std Norm, TFIDF and BoW. Experimental results on eight publicly available datasets show that no method dominates the others. However, the L2-Norm, Decimal, and TFIDF methods obtained the best performance on most of the evaluation metrics (accuracy, precision, and recall). Moreover, runtime analysis shows that KNN runs most efficiently with BoW, followed by TFIDF.
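To illustrate why scaling matters for a distance-based classifier, the following is a minimal sketch of four of the numeric methods named in the abstract (Decimal, Max/Min, Std Norm, L2-Norm) applied before a small KNN vote. It is not the chapter's implementation: the function names, the toy data, the choice of Euclidean distance, k = 3, and the use of the population standard deviation are all assumptions made for illustration. TFIDF and BoW are text representations and are left out here.

```python
import math
from collections import Counter

def decimal_scaling(col):
    # Divide by 10^j, with j the smallest integer such that max(|v|) / 10^j < 1.
    peak = max(abs(v) for v in col)
    j = 0
    while peak / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in col]

def min_max(col):
    # Rescale a feature column to the range [0, 1].
    lo, hi = min(col), max(col)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in col]

def z_score(col):
    # Center on the mean and divide by the population standard deviation.
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / std if std else 0.0 for v in col]

def l2_norm_rows(rows):
    # Scale each sample (row) to unit Euclidean length.
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm if norm else 0.0 for v in row])
    return out

def by_column(rows, fn):
    # Apply a column-wise normalizer to every feature of a row-major table.
    cols = [fn(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def knn_predict(train_x, train_y, query, k=3):
    # Majority vote among the k training points closest to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# large-scale noise that dominates unnormalized Euclidean distances.
train_x = [[1.0, 5100.0], [1.2, 4900.0], [0.9, 5000.0],
           [3.0, 2000.0], [3.2, 8000.0], [2.9, 6500.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
query = [3.1, 5000.0]

methods = {
    "raw": lambda rows: rows,
    "decimal": lambda rows: by_column(rows, decimal_scaling),
    "min-max": lambda rows: by_column(rows, min_max),
    "z-score": lambda rows: by_column(rows, z_score),
    "l2-norm": l2_norm_rows,
}

results = {}
for name, normalize in methods.items():
    # Simplification: the query is normalized together with the training rows.
    # A real pipeline would fit the statistics on the training data only.
    scaled = normalize(train_x + [query])
    results[name] = knn_predict(scaled[:-1], train_y, scaled[-1], k=3)

for name, label in results.items():
    print(name, "->", label)
```

On this toy data the unnormalized features lead the vote to class "a", because the large-scale second feature dominates the distance, while min-max and z-score scaling both recover class "b". The outcome depends entirely on the invented data and says nothing about the chapter's reported results.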

ISBN

978-3-031-43246-0, 978-3-031-43247-7

ISSN

2367-4520

Publisher

Springer Nature Switzerland

Volume

184

First Page

176

Last Page

184

Disciplines

Computer Sciences

Keywords

KNN, Normalization, Text Classification, Machine Learning, Performance Evaluation

Indexed in Scopus

no

Open Access

no
