Towards Highly-Efficient k-Nearest Neighbor Algorithm for Big Data Classification
Source of Publication
2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5G/6G-based Interconnected Digital Worlds (NISS)
The k-nearest neighbors (kNN) algorithm is widely used to search for the nearest neighbors of a test point in a feature space. A large body of work has been developed in the literature to accelerate data classification with kNN. In line with these works, we present a novel k-nearest neighbor variation with a neighboring-calculation property, called NCP-kNN. NCP-kNN addresses both the search complexity of kNN and the difficulty of high-dimensional classification; together, these two problems cause a rapidly increasing level of complexity, particularly with big datasets and multiple k values. In NCP-kNN, each test point's distance is computed against only a limited number of training points instead of the entire dataset. Experimental results on six small datasets show that NCP-kNN performs on par with standard kNN while being highly efficient. Furthermore, and surprisingly, results on big datasets demonstrate that NCP-kNN is not just faster than standard kNN but also significantly superior. On the whole, the findings show that NCP-kNN is a promising, highly efficient kNN variation for big data classification.
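The abstract's key idea is that each test point is compared against only a small candidate subset of training points rather than the whole dataset. The paper's actual neighboring-calculation procedure is not specified here, so the following is only an illustrative sketch of that general pruning idea, using a hypothetical norm-based candidate filter (all function and parameter names are our own, not from the paper):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_pruned(train, labels, query, k=3, candidates=10):
    """Illustrative kNN sketch: distances are computed only for a
    candidate subset of training points, chosen here by proximity in
    vector norm. This is NOT the paper's NCP-kNN procedure, whose
    details appear in the full text; it only demonstrates the
    'limited number of training points' idea from the abstract."""
    # Precompute each training point's norm (could be cached offline).
    norms = [math.sqrt(sum(x * x for x in p)) for p in train]
    qn = math.sqrt(sum(x * x for x in query))
    # Keep the `candidates` training points whose norms are closest
    # to the query's norm -- a cheap pruning heuristic.
    cand = sorted(range(len(train)), key=lambda i: abs(norms[i] - qn))[:candidates]
    # Standard kNN distance ranking and majority vote, but only over
    # the reduced candidate set instead of the entire dataset.
    dists = sorted((euclidean(train[i], query), labels[i]) for i in cand)
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)
```

For example, with two well-separated clusters labeled 0 and 1, `knn_pruned` classifies a query near either cluster correctly while computing only `candidates` distances per query instead of one per training point.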
Training, Machine learning algorithms, Text categorization, Machine learning, Big Data, Complexity theory, Classification algorithms
Abdalla, Hassan I. and Amer, Ali A., "Towards Highly-Efficient k-Nearest Neighbor Algorithm for Big Data Classification" (2022). All Works. 5785.
Indexed in Scopus