The machine learning-based dropout early warning system for improving the performance of dropout prediction

Research output: Contribution to journal › Article › peer-review

97 Scopus citations

Abstract

A dropout early warning system enables schools to preemptively identify students who are at risk of dropping out, to respond to them promptly, and ultimately to help them continue their learning for a better future. However, the inherent class imbalance between dropout and non-dropout students can make it difficult to build accurate predictive models for such a system. The present study aimed to improve the performance of a dropout early warning system (a) by addressing the class imbalance issue using the synthetic minority oversampling technique (SMOTE) and ensemble methods in machine learning, and (b) by evaluating the trained classifiers with both receiver operating characteristic (ROC) and precision-recall (PR) curves. To that end, we trained four classifiers (random forest, boosted decision tree, random forest with SMOTE, and boosted decision tree with SMOTE) on a big data sample of 165,715 high school students from the National Education Information System (NEIS) in South Korea. According to our ROC and PR curve analysis, the boosted decision tree showed the best performance.
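To make the oversampling step in the abstract concrete, the following is a minimal, standard-library-only sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the study's implementation; the function name `smote`, its parameters, and the two-feature toy records are all hypothetical and do not come from the NEIS data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    k: number of nearest minority neighbours considered for each base sample.
    Simplified sketch: base samples are drawn at random rather than visited
    in a fixed order.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 dropout records vs 12 non-dropout records.
dropouts = [[0.9, 0.2], [0.8, 0.3], [1.0, 0.1], [0.7, 0.25]]
non_dropouts = [[0.1 * i, 0.9] for i in range(12)]

# Generate enough synthetic dropouts to match the majority class size.
new_points = smote(dropouts, n_synthetic=len(non_dropouts) - len(dropouts), k=3)
balanced_dropouts = dropouts + new_points
print(len(balanced_dropouts), len(non_dropouts))
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region the minority class already occupies. In practice, oversampling of this kind is applied to the training split only, so that evaluation with ROC and PR curves reflects the original class distribution.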

Original language: English
Article number: 3093
Journal: Applied Sciences (Switzerland)
Volume: 9
Issue number: 15
State: Published - 1 Aug 2019

Bibliographical note

Publisher Copyright:
© 2019 by the authors.

Keywords

  • Big data
  • Class-imbalance
  • Dropout
  • Ensemble
  • Machine learning
  • Oversampling
