Abstract
This study deals with a method for anomaly detection in seawater temperature data using machine learning methods with oversampling techniques. Data were acquired from 2017 to 2023 using a Conductivity–Temperature–Depth (CTD) system in the Pacific Ocean, Indian Ocean, and Sea of Korea. The seawater temperature data consist of 1414 profiles including 1218 normal and 196 abnormal profiles. This dataset has an imbalance problem in which the amount of abnormal data is insufficient compared to that of normal data. Therefore, we generated abnormal data with oversampling techniques using duplication, uniform random variable, Synthetic Minority Oversampling Technique (SMOTE), and autoencoder (AE) techniques for the balance of data class, and trained Interquartile Range (IQR)-based, one-class support vector machine (OCSVM), and Multi-Layer Perceptron (MLP) models with a balanced dataset for anomaly detection. In the experimental results, the F1 score of the MLP showed the best performance at 0.882 in the combination of learning data, consisting of 30% of the minor data generated by SMOTE. This result is a 71.4%-point improvement over the F1 score of the IQR-based model, which is the baseline of this study, and is 1.3%-point better than the best-performing model among the models without oversampling data.
Original language | English |
---|---|
Article number | 807 |
Journal | Journal of Marine Science and Engineering |
Volume | 12 |
Issue number | 5 |
DOIs | |
State | Published - May 2024 |
Bibliographical note
Publisher Copyright:© 2024 by the authors.
Keywords
- anomaly detection
- class imbalance problem
- data augmentation
- machine learning
- ocean observation
- oversampling