Abstract
In this paper, we explore the effectiveness of the GPT-3 model in tackling imbalanced sentiment analysis, focusing on the Coursera online course review dataset that exhibits high imbalance. Training on such skewed datasets often results in a bias towards the majority class, undermining the classification performance for minority sentiments, thereby accentuating the necessity for a balanced dataset. Two primary initiatives were undertaken: (1) synthetic review generation via fine-tuning of the Davinci base model from GPT-3 and (2) sentiment classification utilizing nine models on both imbalanced and balanced datasets. The results indicate that good-quality synthetic reviews substantially enhance sentiment classification performance. Every model demonstrated an improvement in accuracy, with an average increase of approximately 12.76% on the balanced dataset. Among all the models, the Multinomial Naïve Bayes achieved the highest accuracy, registering 75.12% on the balanced dataset. This study underscores the potential of the GPT-3 model as a feasible solution for addressing data imbalance in sentiment analysis and offers significant insights for future research.
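The balancing step described in the abstract can be illustrated with a small sketch. It only counts how many synthetic reviews each minority class would need to reach the size of the majority class. The label names and counts are hypothetical, not the Coursera distribution, and the function name `synthetic_needed` is introduced here purely for illustration. Generating the reviews themselves (the fine-tuned GPT-3 step) is outside the scope of this sketch.

```python
from collections import Counter


def synthetic_needed(labels):
    """For each class, return how many extra samples would bring it
    up to the size of the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical, heavily skewed sentiment labels (assumed for illustration only).
labels = ["positive"] * 70 + ["neutral"] * 20 + ["negative"] * 10

print(synthetic_needed(labels))
# → {'positive': 0, 'neutral': 50, 'negative': 60}
print(majority_baseline_accuracy(labels))
# → 0.7
```

The second function shows why a skewed dataset is a problem for evaluation: on this toy data, always predicting the majority class already scores 0.7 accuracy while never identifying a minority sentiment.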
Field | Value |
---|---|
Original language | English |
Article number | 9766 |
Journal | Applied Sciences (Switzerland) |
Volume | 13 |
Issue number | 17 |
State | Published - Sep 2023 |
Bibliographical note
Publisher Copyright: © 2023 by the authors.
Keywords
- GPT-3
- imbalanced sentiment analysis
- sentiment analysis
- sentiment classification
- synthetic review generation
- text classification
- text generation