HiDF: A Human-Indistinguishable Deepfake Dataset

Chaewon Kang, Seoyoon Jeong, Jonghyun Lee, Daejin Choi, Simon S. Woo, Jinyoung Han

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The rapid development and prevalence of generative AI have made it easy for people to create high-quality deepfake images and videos, but their abuses have also increased exponentially. To mitigate potential social disruption, it is crucial to quickly detect the authenticity of each deepfake content hidden in a sea of information. While researchers have worked on developing deep learning-based methods, the deepfake datasets utilized in these studies are far from the real world in terms of their qualities; most popular deepfake datasets are human-distinguishable. To address this problem, we present a novel deepfake dataset, HiDF, a high-quality and human-indistinguishable deepfake dataset consisting of 62K images and 8K videos. HiDF is a meticulously curated dataset that includes diverse subjects that have undergone rigorous quality checks. A comparison of the quality between HiDF and existing deepfake datasets demonstrates that HiDF is human-indistinguishable. Hence, it can be a valuable benchmark dataset for deepfake detection tasks. Data and code (https://github.com/DSAIL-SKKU/HiDF) are publicly available for future deepfake detection research.

Original languageEnglish
Title of host publicationKDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages5527-5538
Number of pages12
ISBN (Electronic)9798400714542
DOIs
StatePublished - 3 Aug 2025
Event31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: 3 Aug 20257 Aug 2025

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume2
ISSN (Print)2154-817X

Conference

Conference31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/TerritoryCanada
CityToronto
Period3/08/257/08/25

Bibliographical note

Publisher Copyright:
© 2025 Copyright held by the owner/author(s).

Keywords

  • ai
  • deep-learning
  • deepfake
  • human-indistinguishable
  • multimodal

Fingerprint

Dive into the research topics of 'HiDF: A Human-Indistinguishable Deepfake Dataset'. Together they form a unique fingerprint.

Cite this