Analysis of imbalanced data using cost-sensitive learning

Research output: Contribution to journal › Article › peer-review

Abstract

Typically, classification algorithms strive to maximize accuracy. However, when dealing with significantly imbalanced data, accuracy may not be the most suitable metric. We believe that the most effective approach for handling imbalanced cases is to minimize the total costs. Unfortunately, precise costs for misclassification are often unavailable in real-world scenarios. To address this problem, we offer a simple and efficient search algorithm for cost-sensitive learning. We also introduce a new performance metric, imbalanced data classification performance (IDCP), which combines the F-score and the area under the curve (AUC). By utilizing the imbalance ratio (IR) as a crucial factor, we use IDCP to determine optimal weights in cost-sensitive learning. Through extensive experiments, we show that our method can find the optimal decision boundary in imbalanced datasets. Our code is available at https://github.com/sssojin/Imbalanced_Classification.
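
The weight search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which lives in the linked repository, and it rests on several assumptions. The abstract does not give the IDCP formula, so `combined_score` below is a hypothetical stand-in that simply averages the F-score and an AUC value. The candidate weights run from 1 up to the imbalance ratio on an assumed evenly spaced grid. The minority class is assumed to be labeled 1, and the cost weight is applied as a decision threshold on fixed predicted probabilities rather than by retraining a classifier. All function names are illustrative.

```python
def imbalance_ratio(y):
    """Majority-class count divided by minority-class count."""
    pos = sum(y)
    neg = len(y) - pos
    return max(pos, neg) / min(pos, neg)


def confusion(y, pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f_score(y, pred):
    """F1 score of the positive (minority) class."""
    tp, fp, fn, _ = confusion(y, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_hard(y, pred):
    """AUC of the one-point ROC curve of hard labels: (TPR + TNR) / 2."""
    tp, fp, fn, tn = confusion(y, pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def combined_score(y, pred):
    # Hypothetical stand-in for IDCP: the paper's actual formula is not
    # given in the abstract, so this just averages the two ingredients.
    return 0.5 * (f_score(y, pred) + auc_hard(y, pred))


def predict_with_weight(probs, w):
    # If missing a positive costs w and a false alarm costs 1, predicting 1
    # is cheaper whenever w * p > 1 - p, i.e. p > 1 / (1 + w).
    threshold = 1.0 / (1.0 + w)
    return [1 if p > threshold else 0 for p in probs]


def search_weight(y, probs, steps=20):
    """Grid-search the minority-class weight between 1 and the imbalance ratio."""
    ir = imbalance_ratio(y)
    best_w, best_s = 1.0, -1.0
    for i in range(steps + 1):
        w = 1.0 + (ir - 1.0) * i / steps
        s = combined_score(y, predict_with_weight(probs, w))
        if s > best_s:  # strict comparison keeps the smallest best weight
            best_w, best_s = w, s
    return best_w, best_s
```

Calling `search_weight(y, probs)` with true labels and predicted minority-class probabilities returns the selected weight and its score. Bounding the grid by the imbalance ratio reflects the abstract's use of IR as a key factor, though the exact role IR plays in the paper may differ.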

Original language: English
Pages (from-to): 7286-7300
Number of pages: 15
Journal: Communications in Statistics - Theory and Methods
Volume: 54
Issue number: 22
DOIs
State: Published - 2025

Bibliographical note

Publisher Copyright:
© 2025 Taylor & Francis Group, LLC.

Keywords

  • classification performance
  • cost-sensitive learning
  • hybrid classification
  • imbalanced classification
