Abstract
Classification algorithms typically aim to maximize accuracy. However, when the data are significantly imbalanced, accuracy may not be the most suitable metric. We believe that the most effective approach for handling imbalanced cases is to minimize the total misclassification cost. Unfortunately, precise misclassification costs are often unavailable in real-world scenarios. To address this problem, we propose a simple and efficient search algorithm for cost-sensitive learning. We also introduce a new performance metric, imbalanced data classification performance (IDCP), which combines the F-score and the area under the curve (AUC). Using the imbalance ratio (IR) as a crucial factor, we use IDCP to determine the optimal weights in cost-sensitive learning. Through extensive experiments, we show that our method can find the optimal decision boundary in imbalanced datasets. Our code is available at https://github.com/sssojin/Imbalanced_Classification.
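
The idea of searching for class weights with a combined F-score and AUC criterion can be illustrated with a minimal sketch. It is not the authors' algorithm, and it does not reproduce IDCP, whose exact definition is not given in the abstract. The function `placeholder_score` (a plain average of F1 and AUC), the candidate grid between 1 and IR, the one-feature logistic model, and the toy data are all assumptions made for illustration only.

```python
import math

def f1_score(y_true, y_pred):
    """F1 for the positive (minority) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def placeholder_score(y_true, y_pred, scores):
    # ASSUMPTION: a plain average of F1 and AUC, standing in for IDCP.
    return 0.5 * (f1_score(y_true, y_pred) + auc(y_true, scores))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logistic(x, y, w_pos, lr=0.1, epochs=500):
    """One-feature logistic regression; positive samples get loss weight w_pos."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            weight = w_pos if yi == 1 else 1.0
            err = weight * (sigmoid(a * xi + b) - yi)
            grad_a += err * xi
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def search_positive_weight(x, y, candidates):
    """Return (weight, score) for the candidate weight with the highest score."""
    best = None
    for w in candidates:
        a, b = fit_weighted_logistic(x, y, w)
        scores = [sigmoid(a * xi + b) for xi in x]
        preds = [1 if s >= 0.5 else 0 for s in scores]
        value = placeholder_score(y, preds, scores)
        if best is None or value > best[1]:
            best = (w, value)
    return best

# Hypothetical toy data: 18 majority samples and 3 minority samples.
x = [0.1 * i for i in range(18)] + [1.5, 1.9, 2.3]
y = [0] * 18 + [1] * 3
ir = y.count(0) / y.count(1)              # imbalance ratio, here 6.0
candidates = [1.0, math.sqrt(ir), ir]     # ASSUMPTION: grid between 1 and IR
best_w, best_value = search_positive_weight(x, y, candidates)
print(best_w, best_value)
```

For brevity the sketch scores each weight on the training data; a held-out validation split would normally be used. With a single feature, changing the weight mostly shifts the decision boundary, so the F-score varies across candidates far more than the AUC does.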
| Original language | English |
|---|---|
| Pages (from-to) | 7286-7300 |
| Number of pages | 15 |
| Journal | Communications in Statistics - Theory and Methods |
| Volume | 54 |
| Issue number | 22 |
| State | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2025 Taylor & Francis Group, LLC.
Keywords
- classification performance
- cost-sensitive learning
- hybrid classification
- imbalanced classification