Cost-sensitive elimination of mislabeled training data

Document Type

Article

Source of Publication

Information Sciences

Publication Date

9-1-2017

Abstract

© 2017 Elsevier Inc. Accurately labeling training data plays a critical role in various supervised learning tasks. Since labeling in practical applications may be erroneous for various reasons, a wide range of algorithms have been developed to eliminate mislabeled data. These algorithms may make two types of errors: identifying a noise-free instance as mislabeled, or identifying a mislabeled instance as noise-free. These errors may incur different costs, depending on the training dataset and application. However, such cost variations are usually ignored, so existing methods are not optimal with respect to cost. In this work, the novel problem of cost-sensitive mislabeled data filtering is studied. By wrapping a cost-minimizing procedure, we propose a prototype cost-sensitive, ensemble-learning-based mislabeled data filtering algorithm, named CSENF. Based on CSENF, we further propose two novel algorithms: the cost-sensitive repeated majority filtering algorithm (CSRMF) and the cost-sensitive repeated consensus filtering algorithm (CSRCF). Compared to CSENF, these two algorithms can estimate the mislabeling probability of each training instance more confidently. They therefore incur lower cost than CSENF and cost-blind mislabeling filters. Empirical and theoretical evaluations on a set of benchmark datasets illustrate the superior performance of the proposed methods.
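
The general idea of weighing the two error types by cost can be illustrated with a minimal sketch. This is not the paper's CSENF, CSRMF, or CSRCF; the abstract does not give their details. The sketch assumes that the mislabeling probability of an instance is estimated as the fraction of ensemble members that disagree with its given label, and that an instance is removed when the expected cost of removing it is lower than the expected cost of keeping it. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def mislabel_probability(disagreements: Sequence[bool]) -> float:
    """Fraction of ensemble members whose prediction differs from the given label.

    Assumption: this vote fraction stands in for the mislabeling probability.
    """
    if not disagreements:
        raise ValueError("need at least one ensemble vote")
    return sum(disagreements) / len(disagreements)


def cost_sensitive_filter(
    votes: Sequence[Sequence[bool]],
    cost_remove_clean: float,
    cost_keep_noisy: float,
) -> List[int]:
    """Return indices of instances whose expected cost is lower when removed.

    cost_remove_clean: hypothetical cost of discarding a correctly labeled instance.
    cost_keep_noisy: hypothetical cost of retaining a mislabeled instance.
    """
    flagged = []
    for i, instance_votes in enumerate(votes):
        p = mislabel_probability(instance_votes)
        expected_cost_if_removed = (1.0 - p) * cost_remove_clean
        expected_cost_if_kept = p * cost_keep_noisy
        if expected_cost_if_removed < expected_cost_if_kept:
            flagged.append(i)
    return flagged


# Three instances, three ensemble members each (True = disagrees with the label).
votes = [
    [True, True, True],
    [True, False, False],
    [False, False, False],
]

# Equal costs: only the unanimously disputed instance is removed.
equal_costs = cost_sensitive_filter(votes, cost_remove_clean=1.0, cost_keep_noisy=1.0)

# Keeping noise is five times as costly: a single dissenting vote is enough.
noise_costly = cost_sensitive_filter(votes, cost_remove_clean=1.0, cost_keep_noisy=5.0)
```

With equal costs the rule behaves like a majority-style filter, while raising the cost of retained noise lowers the vote fraction needed to discard an instance.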

DOI Link

10.1016/j.ins.2017.03.034

ISSN

0020-0255

Publisher

Elsevier Inc.

Volume

402

First Page

170

Last Page

181

Disciplines

Computer Sciences

Keywords

Cost-sensitive, Ensemble learning, Mislabeled data filtering

Scopus ID

85016576498

Recommended Citation

Guan, Donghai; Yuan, Weiwei; Ma, Tinghuai; Khattak, Asad Masood; and Chow, Francis, "Cost-sensitive elimination of mislabeled training data" (2017). All Works. 1106.
https://zuscholars.zu.ac.ae/works/1106

Indexed in Scopus

yes

Open Access

no
