Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences

Cici Suhaeni, Hwan Seung Yong

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

In this paper, we explore the effectiveness of the GPT-3 model in tackling imbalanced sentiment analysis, focusing on the Coursera online course review dataset that exhibits high imbalance. Training on such skewed datasets often results in a bias towards the majority class, undermining the classification performance for minority sentiments, thereby accentuating the necessity for a balanced dataset. Two primary initiatives were undertaken: (1) synthetic review generation via fine-tuning of the Davinci base model from GPT-3 and (2) sentiment classification utilizing nine models on both imbalanced and balanced datasets. The results indicate that good-quality synthetic reviews substantially enhance sentiment classification performance. Every model demonstrated an improvement in accuracy, with an average increase of approximately 12.76% on the balanced dataset. Among all the models, the Multinomial Naïve Bayes achieved the highest accuracy, registering 75.12% on the balanced dataset. This study underscores the potential of the GPT-3 model as a feasible solution for addressing data imbalance in sentiment analysis and offers significant insights for future research.
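
The balancing step described in the abstract can be illustrated with a minimal sketch. It assumes the goal is to top up each minority class until it matches the majority class count, which the abstract does not state explicitly. The label names, the toy reviews, and the `fake_generate` function are hypothetical placeholders. In the study the synthetic text comes from a fine-tuned GPT-3 Davinci model, which is not called here.

```python
from collections import Counter


def synthetic_needed(labels):
    """Return, per class, how many extra samples are needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def balance(reviews, generate):
    """Append synthetic reviews until every class matches the largest one.

    reviews  : list of (text, label) pairs
    generate : callable(label, k) -> list of k synthetic texts (assumed interface)
    """
    needed = synthetic_needed([label for _, label in reviews])
    balanced = list(reviews)
    for label, k in needed.items():
        if k > 0:
            balanced.extend((text, label) for text in generate(label, k))
    return balanced


def fake_generate(label, k):
    """Hypothetical stand-in for a text generator; returns k dummy strings."""
    return [f"synthetic {label} review #{i}" for i in range(k)]


# Toy, made-up data: 8 positive, 2 negative, 1 neutral review.
toy = (
    [("great course", "positive")] * 8
    + [("too slow", "negative")] * 2
    + [("it was ok", "neutral")]
)

balanced = balance(toy, fake_generate)
print(Counter(label for _, label in balanced))
```

Any classifier can then be trained once on the original list and once on the balanced list to compare accuracy, which mirrors the comparison the abstract reports.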

Original language: English
Article number: 9766
Journal: Applied Sciences (Switzerland)
Volume: 13
Issue number: 17
State: Published - Sep 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Keywords

  • GPT-3
  • imbalanced sentiment analysis
  • sentiment analysis
  • sentiment classification
  • synthetic review generation
  • text classification
  • text generation
