Enhancing Gender Prediction Model Performance through Automatic Individual Entity Extraction and Class Balance

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Building a model that predicts the gender of named entities in text requires a high-quality labeled dataset, which takes considerable manual effort and time to create. This paper makes two contributions to address this issue. First, we develop a mechanism that automatically extracts individual entities from sentences using Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. This approach automates data generation and reduces the cost of building the dataset. Second, we balance the classes in the dataset to optimize model performance. Experimental results show that the automated data generation method and the balanced dataset significantly improve the performance of the gender prediction model. This work contributes both to automated data generation and to improving model performance in Natural Language Processing (NLP) tasks.
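The abstract does not say which tagger, tag sets, extraction rule, or balancing method the authors use, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes sentences arrive already tagged with Penn Treebank PoS tags and BIO-style NER tags with a `PERSON` label. It also assumes a hypothetical rule: a PERSON span counts as an "individual" only if every token is a singular proper noun (`NNP`), so plural proper nouns (`NNPS`, e.g. "the Kennedys") are dropped. Class balance is shown with plain random undersampling. The function names `extract_individuals` and `undersample` and the labels `"F"`/`"M"` are illustrative, not taken from the paper.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed input format: (word, Penn Treebank PoS tag, BIO NER tag),
# produced by some upstream tagger (hypothetical, not from the paper).
Token = Tuple[str, str, str]
Example = Tuple[str, str]  # (entity text, gender label)


def extract_individuals(sentence: List[Token]) -> List[str]:
    """Return PERSON spans whose tokens are all singular proper nouns (NNP)."""
    spans: List[List[Token]] = []
    current: List[Token] = []
    for tok in sentence:
        ner = tok[2]
        if ner == "B-PERSON":
            # A new PERSON span starts; close any span in progress.
            if current:
                spans.append(current)
            current = [tok]
        elif ner == "I-PERSON" and current:
            current.append(tok)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    # Hypothetical PoS filter: plural proper nouns (NNPS) usually name a
    # group rather than one person, so such spans are discarded.
    return [
        " ".join(word for word, _, _ in span)
        for span in spans
        if all(pos == "NNP" for _, pos, _ in span)
    ]


def undersample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly downsample every class to the size of the smallest class."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced: List[Example] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    sentence = [
        ("Marie", "NNP", "B-PERSON"),
        ("Curie", "NNP", "I-PERSON"),
        ("met", "VBD", "O"),
        ("the", "DT", "O"),
        ("Kennedys", "NNPS", "B-PERSON"),
        (".", ".", "O"),
    ]
    print(extract_individuals(sentence))  # ['Marie Curie']

    data = [("a", "F"), ("b", "M"), ("c", "M"), ("d", "M")]
    balanced = undersample(data)
    print(sorted(Counter(label for _, label in balanced).items()))  # [('F', 1), ('M', 1)]
```

Undersampling is used here only because it is the simplest way to equalize class counts without synthesizing data. Oversampling the minority class or weighting the loss are common alternatives, and the paper's actual balancing strategy may differ.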

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 120-121
Number of pages: 2
Edition: 2025
ISBN (Electronic): 9798331529024
DOIs
State: Published - 2025
Event: 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025 - Kota Kinabalu, Malaysia
Duration: 9 Feb 2025 to 12 Feb 2025

Conference

Conference: 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025
Country/Territory: Malaysia
City: Kota Kinabalu
Period: 9/02/25 to 12/02/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Automation
  • Data generation
  • NLP
  • Named Entity Recognition
  • Part-of-Speech
