Enhancing data classification using locally informed weighted k-nearest neighbor algorithm

Authors

Hassan I. Abdalla, Zayed University
Ali A. Amer, Zayed University

Document Type

Article

Source of Publication

Expert Systems with Applications

Publication Date

6-1-2025

Abstract

In this work, a novel locally informed weighted kNN algorithm (LIWkNN) is presented to reduce the detrimental impact of outliers and class imbalance. LIWkNN considers the labels of both the query and its neighboring data points, focusing on the vicinity of the query point so that it can capture local patterns and variations. The algorithm updates the weights assigned to the neighbors by comparing their labels with the query's label, and these weights are then used to predict the label of the query. First, all training-point weights are set to 1. Second, predictions are made using the conventional kNN classifier and checked against the query's actual label; the neighbors' weights are updated only when the predicted label differs from the actual label, and are left unchanged otherwise. Under this weight-update process, an outlier's influence on the weighted kNN classification is kept to a minimum, particularly when it is frequently selected as a neighbor for different queries. Third, to address class imbalance, the method adjusts the weighting based on class density, ensuring that minority-class points predominantly receive neighbors from their own class. Finally, once the weight-update process is complete, the proposed kNN classifies the test points using the final weights. LIWkNN's competitive performance and straightforward architecture demonstrate the model's novelty, setting it apart from its state-of-the-art competitors. To validate LIWkNN's generalizability across a broad range of datasets, a comprehensive assessment using five evaluation measures (accuracy, F1-measure, ROC, mean absolute error (MAE), and geometric mean (GM)) is carried out over sixty datasets (balanced, imbalanced, noisy, time-series, and image) in six experimental phases. According to the results, supported by a multi-criteria analysis, LIWkNN is significantly more promising on the vast majority of the datasets considered, both overall and for specific k values.
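The sketch below illustrates one plausible reading of the weight-update scheme the abstract describes: weights start at 1, plain kNN predictions are checked against true labels, disagreeing neighbors are penalized, and a class-density adjustment up-weights minority classes before final weighted voting. The function names (liw_knn_fit, liw_knn_predict), the multiplicative penalty factor, the inverse-frequency density adjustment, and the use of leave-one-out queries over the training set (standing in for the paper's query set) are all illustrative assumptions, not the authors' exact formulation.

import numpy as np
from collections import Counter

def _neighbors(X_train, x, k):
    # Indices of the k nearest training points to query x (Euclidean).
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dists)[:k]

def liw_knn_fit(X_train, y_train, k=5, penalty=0.9):
    # Learn per-point weights: start at 1, shrink a neighbor's weight
    # whenever it takes part in a misclassified prediction, then scale
    # by inverse class frequency to counter class imbalance.
    # The penalty factor and density formula are assumptions.
    n = len(X_train)
    w = np.ones(n)
    for i in range(n):
        # Leave-one-out prediction with plain (unweighted) kNN.
        mask = np.arange(n) != i
        local = _neighbors(X_train[mask], X_train[i], k)
        idx = np.arange(n)[mask][local]  # map back to full indexing
        pred = Counter(y_train[idx]).most_common(1)[0][0]
        if pred != y_train[i]:
            # Penalize neighbors whose label disagreed with the true label.
            wrong = idx[y_train[idx] != y_train[i]]
            w[wrong] *= penalty
    # Density-based adjustment: up-weight points from rarer classes.
    counts = Counter(y_train)
    w *= np.array([n / (len(counts) * counts[c]) for c in y_train])
    return w

def liw_knn_predict(X_train, y_train, w, x, k=5):
    # Classify x by a weighted vote among its k nearest neighbors.
    idx = _neighbors(X_train, x, k)
    votes = {}
    for j in idx:
        votes[y_train[j]] = votes.get(y_train[j], 0.0) + w[j]
    return max(votes, key=votes.get)

For example, w = liw_knn_fit(X, y, k=5) followed by liw_knn_predict(X, y, w, x_new, k=5) classifies a new point, with X and y passed as NumPy arrays. The multiplicative penalty is one way to realize "keeping an outlier's influence to a minimum": a point that repeatedly appears among the neighbors of misclassified queries has its vote shrunk geometrically.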

ISSN

0957-4174

Publisher

Elsevier BV

Volume

276

Disciplines

Computer Sciences

Keywords

Artificial intelligence, Data classification, Data mining, k-nearest neighbor, kNN, Machine learning

Scopus ID

86000727559

Indexed in Scopus

yes

Open Access

no
