Abstract
As fake news spreads rapidly on social media, technologies for automatically detecting it are being actively developed. However, most existing approaches focus only on the linguistic and compositional characteristics of fake news (e.g., indication of source or author, message length, frequency of negative words). In contrast, this study proposes a machine-learning-based fake news detection model that reflects the characteristics of users, news content, and social networks, grounded in social capital. To comprehensively reflect the characteristics related to the spread of fake news, this study applied the XGBoost model to estimate the feature importance of each variable and thereby derive the priority factors that most strongly affect fake news detection. Based on the derived variables, we built SVM, RF, LR, CART, and NNET models, which are representative machine learning classifiers, and compared their fake news detection performance. To generalize the established models (i.e., to avoid overfitting or underfitting) and to compare their predictive accuracy, this study performed cross-validation. As a result, the RF model achieved the highest prediction rate at about 94%, while the NNET model had the lowest at about 92.1%. The results of this study are expected to contribute to improving fake news detection systems in preparation for the more sophisticated generation and spread of fake news.
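The cross-validated model comparison described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the data are synthetic, the helper names (`k_fold_indices`, `cross_val_accuracy`) are hypothetical, and the two toy classifiers are simple stand-ins, not the SVM, RF, LR, CART, or NNET models used in the study.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClassifier:
    """Baseline stand-in: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Stand-in model: predicts the label whose training mean is closest."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
            for x in X
        ]


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return mean(scores)


# Synthetic two-class data (an assumption, not the study's dataset).
rng = random.Random(42)
X, y = [], []
for label, center in ((0, 0.0), (1, 3.0)):
    for _ in range(50):
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)

results = {
    "majority": cross_val_accuracy(MajorityClassifier, X, y),
    "nearest_centroid": cross_val_accuracy(NearestCentroidClassifier, X, y),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Because every model is scored only on folds it was not trained on, the averaged accuracies give a like-for-like basis for ranking candidate classifiers, which is the role cross-validation plays in the comparison reported above.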
Original language | English |
---|---|
Pages (from-to) | 71517-71527 |
Number of pages | 11 |
Journal | IEEE Access |
Volume | 11 |
DOIs | |
State | Published - 2023 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Classification algorithms
- fake news
- fake news detection
- feature selection
- prediction algorithms
- predictive models
- XGBoost