A dropout early warning system enables schools to preemptively identify students who are at risk of dropping out of school, to promptly react to them, and eventually to help potential dropout students to continue their learning for a better future. However, the inherent class imbalance between dropout and non-dropout students could pose difficulty in building accurate predictive modeling for a dropout early warning system. The present study aimed to improve the performance of a dropout early warning system: (a) by addressing the class imbalance issue using the synthetic minority oversampling techniques (SMOTE) and the ensemble methods in machine learning; and (b) by evaluating the trained classifiers with both receiver operating characteristic (ROC) and precision-recall (PR) curves. To that end, we trained random forest, boosted decision tree, random forest with SMOTE, and boosted decision tree with SMOTE using the big data samples of the 165,715 high school students from the National Education Information System (NEIS) in South Korea. According to our ROC and PR curve analysis, boosted decision tree showed the optimal performance.
- Big data
- Machine learning