Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification

Document Type

Article

Source of Publication

Applied Soft Computing

Publication Date

5-1-2024

Abstract

Imbalanced data classification presents a challenge in machine learning because it biases model learning toward the majority class. High data dimensionality poses a further challenge, as it strongly affects classifier performance. This paper proposes a new deep-learning method that combines feature selection with oversampling to address both challenges. The proposed approach, GA-SMOTE-DCNN, integrates a genetic algorithm (GA) for feature selection, SMOTE for oversampling, and a deep 1D-convolutional neural network (DCNN) for classification. The study shows that splitting the data into training and testing sets before applying SMOTE yields higher accuracy, with improvements of 1.94% to 3.98% per dataset over applying SMOTE before splitting. Compared with benchmark methods, the approach achieved accuracies of 86.81% on the Balance Scale dataset, 86.15% on the Oil Spill dataset, 89.21% on the Yeast dataset, 91.32% on the Mammography dataset, 88.23% on the Australian Credit dataset, and 89.53% on the German Credit dataset, underscoring its effectiveness on high-dimensional, imbalanced classification problems. The method also scales to high-dimensional, imbalanced classification tasks across diverse domains.
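The pipeline described in the abstract (GA-based feature selection, SMOTE applied only to the training split, then a 1D-CNN classifier) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only, not the paper's implementation: the synthetic dataset, the logistic-regression surrogate used as the GA fitness function, the GA settings, and the network architecture are placeholders.

```python
# Minimal sketch of a GA-SMOTE-DCNN-style pipeline (illustrative, not the paper's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE
from tensorflow import keras

rng = np.random.default_rng(0)

# Imbalanced toy dataset standing in for the benchmark datasets used in the paper.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

# 1) Split BEFORE oversampling; the abstract reports this gives higher accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# 2) Tiny GA over binary feature masks; fitness is cross-validated accuracy of a
#    cheap surrogate classifier on the training split (GA details are assumptions).
def fitness(mask):
    if mask.sum() < 4:                       # keep enough features for the CNN kernel
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=500),
                           X_tr[:, mask == 1], y_tr, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))
for _ in range(10):                                          # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]                  # selection
    cx = rng.integers(1, X.shape[1], size=len(parents))
    children = np.array([np.concatenate([a[:c], b[c:]])      # one-point crossover
                         for a, b, c in zip(parents, np.roll(parents, 1, 0), cx)])
    flips = rng.random(children.shape) < 0.05                # mutation
    children[flips] = 1 - children[flips]
    pop = np.vstack([parents, children])
best = pop[np.argmax([fitness(m) for m in pop])]

# 3) SMOTE applied to the training split only, on the selected features.
X_tr_sel, X_te_sel = X_tr[:, best == 1], X_te[:, best == 1]
X_tr_bal, y_tr_bal = SMOTE(random_state=0).fit_resample(X_tr_sel, y_tr)

# 4) Small 1D-CNN classifier over the selected features (architecture is illustrative).
n_feat = X_tr_bal.shape[1]
model = keras.Sequential([
    keras.layers.Input(shape=(n_feat, 1)),
    keras.layers.Conv1D(32, 3, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_tr_bal[..., None], y_tr_bal, epochs=5, batch_size=64, verbose=0)
print("test accuracy:", model.evaluate(X_te_sel[..., None], y_te, verbose=0)[1])
```

Applying SMOTE only after the split keeps synthetic minority samples out of the test set, which is consistent with the abstract's finding that pre-splitting improves measured accuracy.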

ISSN

1568-4946

Publisher

Elsevier BV

Volume

156

Disciplines

Computer Sciences

Keywords

Convolutional Neural Network, Deep Learning, Feature selection, Genetic Algorithm, Imbalanced data, SMOTE

Scopus ID

85188859519

Indexed in Scopus

yes

Open Access

no
