Enhancing machine learning models for total organic carbon prediction by integrating geospatial parameters in river watersheds

Haeseong Oh, Ho Yeon Park, Jae In Kim, Byung Joon Lee, Jung Hyun Choi, Jin Hur

Research output: Contribution to journal, Article, peer-review

Abstract

This study uses machine learning (ML) algorithms to develop a robust total organic carbon (TOC) prediction model for river waters in the Geumho River sub-basins, South Korea, considering both non-rain and rain events. The model incorporates geospatial parameters such as land use and slope, together with flow rate and basic water quality metrics including biochemical oxygen demand (BOD), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), and suspended solids (SS). A key aspect of this research is examining how land use information enhances the model's predictive accuracy. We compared two ML algorithms, extreme gradient boosting (XGBoost) and deep neural networks (DNN), with a traditional multiple linear regression (MLR) approach. XGBoost outperformed the others, achieving an R² between 0.61 and 0.68 on the test dataset and improving markedly during rain events, with an R² of 0.77 when land use data were included. In contrast, this enhancement was not observed with the MLR model. Feature importance analysis using Shapley values highlighted COD as the primary predictor for non-rain events, while during rain events, COD, TP, TN, SS, and agricultural land collectively influenced TOC levels. This study advances understanding of TOC variability across different land use scenarios in river systems and underscores the importance of integrating geospatial and water quality parameters to enhance TOC prediction, particularly during rain events. The methodology provides a valuable framework for developing river management strategies and monitoring long-term TOC trends, especially in scenarios with gaps in essential monitoring data.
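The abstract compares models by their coefficient of determination. As a minimal sketch of that metric only, assuming the standard definition R² = 1 - SS_res / SS_tot (the abstract does not state which variant the authors used), the following computes R² for two hypothetical prediction sets. The function name `r_squared` and all numbers are invented for illustration and are not data or results from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Synthetic TOC values (mg/L), invented for illustration; not from the study.
toc_observed = [2.0, 3.0, 4.0, 5.0, 6.0]
pred_without_land_use = [3.0, 3.0, 4.0, 5.0, 5.0]  # hypothetical model A
pred_with_land_use = [2.5, 3.0, 4.0, 5.0, 5.5]     # hypothetical model B

r2_without = r_squared(toc_observed, pred_without_land_use)
r2_with = r_squared(toc_observed, pred_with_land_use)
print(f"R^2 without land use: {r2_without:.2f}")
print(f"R^2 with land use:    {r2_with:.2f}")
```

Evaluating both models against the same held-out observations, as here, is what makes the two scores directly comparable.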

Original language: English
Article number: 173743
Journal: Science of the Total Environment
Volume: 943
State: Published - 15 Sep 2024

Bibliographical note

Publisher Copyright:
© 2024

Keywords

  • Feature importance
  • Land use
  • Machine learning
  • Prediction model
  • Total organic carbon
