Abstract
Building a model that predicts the gender of named entities in text requires a high-quality labeled dataset, and creating such a dataset manually takes considerable time and effort. This paper addresses this problem with two contributions. First, we develop a mechanism that automatically extracts individual entities from sentences using Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging, which automates the data generation process and reduces its cost. Second, we balance the classes in the generated dataset to improve model performance. Experimental results show that the automated data generation method and the balanced dataset significantly improve the performance of the gender prediction model. This work contributes to both data generation and model performance improvement in Natural Language Processing (NLP) tasks.
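The abstract does not specify how individual entities are identified or how class balance is achieved. The following Python sketch shows one possible reading, under loudly stated assumptions: tokens arrive pre-tagged with BIO-style NER labels (using a `PER` type) and Penn Treebank PoS tags, an "individual entity" is a person span made only of singular proper nouns (`NNP`), a usable sentence mentions exactly one such entity, and balance is reached by random undersampling. The function names (`extract_individual_entities`, `single_entity`, `undersample`) are hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical token format: (word, Penn Treebank PoS tag, BIO NER tag).
# The tags are assumed to come from an upstream NER/PoS tagger.
Token = Tuple[str, str, str]
Example = Tuple[str, str]  # (entity text, gender label)


def extract_individual_entities(sentence: List[Token]) -> List[str]:
    """Group B-PER/I-PER tokens into spans, then keep spans tagged NNP only."""
    spans: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for word, pos, ner in sentence:
        if ner == "B-PER":  # a new person span starts
            if current:
                spans.append(current)
            current = [(word, pos)]
        elif ner == "I-PER" and current:  # continue the open span
            current.append((word, pos))
        else:  # any other tag closes the open span
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    # Assumed PoS filter: a span made only of singular proper nouns (NNP) is
    # treated as one individual; plural forms such as "Smiths" (NNPS) are dropped.
    return [" ".join(w for w, _ in span) for span in spans
            if all(p == "NNP" for _, p in span)]


def single_entity(sentence: List[Token]) -> Optional[str]:
    """Return the entity only if the sentence mentions exactly one individual."""
    entities = extract_individual_entities(sentence)
    return entities[0] if len(entities) == 1 else None


def undersample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly undersample every label down to the size of the rarest label."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: List[Example] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    sentence = [("Marie", "NNP", "B-PER"), ("Curie", "NNP", "I-PER"),
                ("won", "VBD", "O"), ("the", "DT", "O"), ("prize", "NN", "O")]
    print(single_entity(sentence))  # Marie Curie
    data = [("Anna", "F"), ("Maria", "F"), ("Sofia", "F"), ("David", "M")]
    print(sorted(Counter(label for _, label in undersample(data)).items()))
    # [('F', 1), ('M', 1)]
```

Random undersampling is chosen here only because it is the simplest way to equalize class counts; it discards majority-class examples, so oversampling or class-weighted training are common alternatives when data is scarce.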
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 120-121 |
| Number of pages | 2 |
| Edition | 2025 |
| ISBN (Electronic) | 9798331529024 |
| State | Published - 2025 |
| Event | 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025 - Kota Kinabalu, Malaysia (Duration: 9 Feb 2025 → 12 Feb 2025) |
Conference
| Conference | 2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025 |
|---|---|
| Country/Territory | Malaysia |
| City | Kota Kinabalu |
| Period | 9 Feb 2025 → 12 Feb 2025 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Automation
- Data generation
- NLP
- Named Entity Recognition
- Part-of-Speech
Fingerprint
The research topics of 'Enhancing Gender Prediction Model Performance through Automatic Individual Entity Extraction and Class Balance' together form a unique fingerprint.