Abstract
The rapid development and prevalence of generative AI have made it easy for people to create high-quality deepfake images and videos, but abuse of these tools has also increased exponentially. To mitigate potential social disruption, it is crucial to quickly determine the authenticity of content hidden in a sea of information. While researchers have developed deep learning-based detection methods, the deepfake datasets used in these studies fall far short of real-world quality; most popular deepfake datasets are human-distinguishable. To address this problem, we present HiDF, a novel high-quality and human-indistinguishable deepfake dataset consisting of 62K images and 8K videos. HiDF is a meticulously curated dataset that includes diverse subjects and has undergone rigorous quality checks. A quality comparison between HiDF and existing deepfake datasets demonstrates that HiDF is human-indistinguishable. Hence, it can serve as a valuable benchmark dataset for deepfake detection tasks. Data and code (https://github.com/DSAIL-SKKU/HiDF) are publicly available for future deepfake detection research.
| Original language | English |
|---|---|
| Title of host publication | KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
| Publisher | Association for Computing Machinery |
| Pages | 5527-5538 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400714542 |
| State | Published - 3 Aug 2025 |
| Event | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada; Duration: 3 Aug 2025 → 7 Aug 2025 |
Publication series
| Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
|---|---|
| Volume | 2 |
| ISSN (Print) | 2154-817X |
Conference
| Conference | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 |
|---|---|
| Country/Territory | Canada |
| City | Toronto |
| Period | 3/08/25 → 7/08/25 |
Bibliographical note
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Keywords
- ai
- deep-learning
- deepfake
- human-indistinguishable
- multimodal