Machine Learning-Based Anomaly Detection on Seawater Temperature Data with Oversampling

Hangoo Kang, Dongil Kim, Sungsu Lim

Research output: Contribution to journal, Article, peer-review

Abstract

This study presents a method for detecting anomalies in seawater temperature data using machine learning combined with oversampling techniques. Data were acquired from 2017 to 2023 using a Conductivity–Temperature–Depth (CTD) system in the Pacific Ocean, the Indian Ocean, and the Sea of Korea. The seawater temperature dataset consists of 1414 profiles: 1218 normal and 196 abnormal. The dataset is therefore imbalanced, with far fewer abnormal profiles than normal ones. To balance the classes, we generated additional abnormal data with four oversampling techniques, namely duplication, uniformly distributed random values, the Synthetic Minority Oversampling Technique (SMOTE), and an autoencoder (AE), and trained Interquartile Range (IQR)-based, one-class support vector machine (OCSVM), and Multi-Layer Perceptron (MLP) models on the balanced data for anomaly detection. In the experiments, the MLP achieved the best F1 score, 0.882, when trained on data in which 30% of the minority-class samples were generated by SMOTE. This is an improvement of 71.4 percentage points over the F1 score of the IQR-based model, the baseline of this study, and of 1.3 percentage points over the best-performing model trained without oversampled data.
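The abstract does not give implementation details. As a minimal sketch of three ingredients it names, the Python below (standard library only) shows SMOTE-style interpolation between minority-class profiles, a Tukey-fence IQR outlier rule, and the F1 score used to compare models. It assumes each temperature profile is a fixed-length list of values; the function names, the neighbour count `k`, the seed, and the 1.5 whisker factor are illustrative assumptions, not the authors' settings or code.

```python
import math
import random
from statistics import quantiles

# Hypothetical helpers for illustration only; not the authors' implementation.


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic profile lies on the segment
    between a random minority profile and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class profiles")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most len - 1 other profiles exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority profiles by distance to the base profile.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight drawn uniformly from [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def iqr_flags(values, whisker=1.5):
    """Flag values outside [Q1 - w*IQR, Q3 + w*IQR] (Tukey's fences).
    The 1.5 default is an assumed, conventional choice."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v < low or v > high for v in values]


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), with 1 marking an abnormal profile."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Calling `smote([[10.0, 9.0, 8.0], [12.0, 11.0, 9.5], [11.0, 10.0, 9.0]], n_new=4, k=2)` returns four new profiles, each lying between two existing ones, and `iqr_flags([20.0, 20.1, 19.9, 20.2, 19.8, 20.0, 35.0])` flags only the 35.0 reading. Interpolating between neighbours, rather than copying profiles, is what distinguishes SMOTE from the plain duplication technique the abstract also lists.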

Original language: English
Article number: 807
Journal: Journal of Marine Science and Engineering
Volume: 12
Issue number: 5
State: Published - May 2024

Bibliographical note

Publisher Copyright: © 2024 by the authors.

Keywords

  • anomaly detection
  • class imbalance problem
  • data augmentation
  • machine learning
  • ocean observation
  • oversampling
