Topic Modeling for Scientific Articles: Exploring Optimal Hyperparameter Tuning in BERT

Maresha Caroline Wijanto, Ika Widiastuti, Hwan Seung Yong

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Topic modeling has emerged as a successful approach to uncovering topics in textual data. Various topic modeling techniques have been introduced, ranging from traditional algorithms to those based on neural networks. In this research, we explore advanced topic modeling techniques, including BERT-based approaches, to enhance the analysis of scientific articles. We first investigate the widely used Latent Dirichlet Allocation (LDA) model and then explore the ability of BERT to automatically uncover latent topics within scientific papers. The goal of this study is to identify the optimal hyperparameter setting for BERT-based topic modeling of scientific articles. We conduct experiments across several scenarios involving combinations of word embedding, dimension reduction, and clustering methods. We analyze the results in terms of coherence values, average execution time, number of topics generated, visualization through the inter-topic distance map, and the top-N words of each topic. Our findings suggest that the combination of RoBERTa for word embedding, PCA for dimension reduction, and K-Means for clustering yields the best results among the tested scenarios. Further evaluation of BERT-based topic modeling is necessary to validate these findings and to explore its applications in academic and industrial contexts. These techniques could make it considerably easier to stay up to date with the scientific literature, with potential benefits for research practice across disciplines.
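One of the evaluation criteria above, the coherence value of a topic's top-N words, can be illustrated with a small sketch. The abstract does not say which coherence measure the authors used, so the example below assumes NPMI (normalized pointwise mutual information) estimated from document-level co-occurrence, which is one common choice. The function name `npmi_coherence` and the toy corpus `docs` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    ASSUMPTION: NPMI is used here only as an illustration; the paper's
    actual coherence measure is not stated in the abstract.
    Probabilities are estimated as fractions of documents.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # The pair never co-occurs: lowest possible NPMI.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words occur in every document: use the limiting value.
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["topic", "model", "bert", "embedding"],
    ["topic", "model", "lda", "dirichlet"],
    ["bert", "embedding", "cluster"],
    ["cluster", "kmeans", "pca"],
]

print(npmi_coherence(["bert", "embedding"], docs))
print(npmi_coherence(["bert", "kmeans"], docs))
```

In this toy corpus, "bert" and "embedding" always appear together, so that pair scores higher than "bert" and "kmeans", which never share a document. A full evaluation would average such scores over every topic produced by each scenario.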

Original language: English
Pages (from-to): 912-919
Number of pages: 8
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 14
Issue number: 3
State: Published - 2024

Bibliographical note

Publisher Copyright:
© IJASEIT is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Keywords

  • BERT-based
  • hyperparameter
  • scientific articles
  • topic modeling
