Forecasting daily PM10 concentrations in Seoul using various data mining techniques

Ji Eun Choi, Hyesun Lee, Jongwoo Song

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


Interest in PM10 concentrations has increased greatly in Korea due to recent increases in air pollution levels. We therefore consider a forecasting model for the next-day PM10 concentration based on the principal elements of air pollution, weather information, and Beijing PM2.5. If the next-day PM10 concentration level can be forecast accurately, we believe the forecast can be useful for policy makers and the public. This paper is intended to help forecast the daily mean PM10, the daily maximum PM10, and the four stages of PM10 provided by the Ministry of Environment using various data mining techniques. We use seven models to forecast the daily PM10: five regression models (linear regression, random forest, gradient boosting, support vector machine, neural network) and two time series models (ARIMA, ARFIMA). The linear regression model performs best in the PM10 concentration forecast, and the linear regression and random forest models perform best in the PM10 class forecast. The results also indicate that PM10 in Seoul is influenced by Beijing PM2.5 and by air pollution from power stations on the west coast.
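As a rough illustration of the kind of next-day forecast the abstract describes, the sketch below fits an ordinary least-squares linear regression that predicts tomorrow's PM10 from today's PM10, Beijing PM2.5, and wind speed, then maps the predictions to four stages. It is not the authors' code or data: the series is synthetic, the choice of predictors and all function names are hypothetical, and the stage cutoffs (30, 80, 150) are assumed values for illustration only, since the abstract does not state them.

```python
import numpy as np


def make_synthetic_series(n_days=400, seed=0):
    """Generate a synthetic daily series (NOT real Seoul or Beijing data)."""
    rng = np.random.default_rng(seed)
    beijing_pm25 = rng.gamma(shape=2.0, scale=40.0, size=n_days)
    wind_speed = rng.uniform(0.5, 6.0, size=n_days)
    pm10 = np.empty(n_days)
    pm10[0] = 50.0
    for t in range(1, n_days):
        # Assumed data-generating process, chosen only for illustration.
        pm10[t] = (20.0 + 0.5 * pm10[t - 1] + 0.1 * beijing_pm25[t - 1]
                   - 2.0 * wind_speed[t - 1] + rng.normal(0.0, 5.0))
    return pm10, beijing_pm25, wind_speed


def build_design(pm10, beijing_pm25, wind_speed):
    """Pair today's predictors (rows of X) with tomorrow's PM10 (y)."""
    X = np.column_stack([
        np.ones(len(pm10) - 1),  # intercept
        pm10[:-1],
        beijing_pm25[:-1],
        wind_speed[:-1],
    ])
    y = pm10[1:]
    return X, y


def fit_linear(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def to_stage(values, cutoffs=(30.0, 80.0, 150.0)):
    """Map concentrations to stages 0..3; the cutoffs are assumed values."""
    return np.digitize(values, cutoffs, right=True)


pm10, beijing_pm25, wind_speed = make_synthetic_series()
X, y = build_design(pm10, beijing_pm25, wind_speed)

# Chronological split: fit on the first 300 days, evaluate on the rest.
n_train = 300
coef = fit_linear(X[:n_train], y[:n_train])
pred = X[n_train:] @ coef

rmse = float(np.sqrt(np.mean((pred - y[n_train:]) ** 2)))
stage_accuracy = float(np.mean(to_stage(pred) == to_stage(y[n_train:])))

print("coefficients:", coef)
print("test RMSE:", rmse)
print("stage accuracy:", stage_accuracy)
```

The split is chronological rather than random so that the model is always evaluated on days later than those it was fitted on, which is the usual precaution when forecasting a time series.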

Original language: English
Pages (from-to): 199-215
Number of pages: 17
Journal: Communications for Statistical Applications and Methods
Issue number: 2
State: Published - 1 Mar 2018

Bibliographical note

Funding Information:
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017R1D1A1B03036078).

Publisher Copyright:
© 2018 The Korean Statistical Society, and Korean International Statistical Society.


  • Gradient boosting
  • Linear regression
  • Neural network
  • PM concentration
  • Randomforest
  • Support vector machine

