
LET 2D DIFFUSION MODEL KNOW 3D-CONSISTENCY FOR ROBUST TEXT-TO-3D GENERATION

Junyoung Seo, Wooseok Jang, Min Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin Hwa Kim, Jiyoung Lee, Seungryong Kim

Research output: Contribution to conference › Paper › peer-review

17 Scopus citations

Abstract

Text-to-3D generation has shown rapid progress recently with the advent of score distillation sampling (SDS), a methodology of using pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in a zero-shot setting. However, the lack of 3D awareness in the 2D diffusion model often destabilizes previous methods from generating a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into the pretrained 2D diffusion model, enhancing the robustness and 3D consistency of score distillation-based methods. Specifically, we introduce a consistency injection module that constructs a 3D point cloud from the image generated from the text prompt and utilizes its projected depth map at a given view as a condition for the 2D diffusion model. The diffusion model, through its generative capability, robustly infers dense structure from the sparse point cloud depth map and generates a geometrically consistent and coherent 3D scene. We also introduce a new technique called semantic coding that reduces the semantic ambiguity of the text prompt for improved results. Our method can be easily adapted to various text-to-3D baselines. We experimentally demonstrate how our method notably enhances the 3D consistency of generated scenes compared to previous baselines, achieving state-of-the-art performance in geometric robustness and fidelity. The project page is available at https://ku-cvlab.github.io/3DFuse/.
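The conditioning step described in the abstract, projecting a 3D point cloud to a sparse depth map at a given camera view, can be sketched as follows. This is a minimal illustration with a standard pinhole camera model, not the paper's actual implementation; the function name `project_depth_map` and all parameters are hypothetical.

```python
import numpy as np

def project_depth_map(points, K, R, t, hw):
    """Render a sparse depth map from a point cloud at a given view.
    points: (N, 3) world-space points; K: (3, 3) camera intrinsics;
    R, t: world-to-camera rotation and translation; hw: (H, W) output size.
    (Hypothetical sketch of the kind of projection the abstract describes.)"""
    H, W = hw
    cam = points @ R.T + t                 # world -> camera coordinates
    z = cam[:, 2]
    valid = z > 1e-6                       # keep points in front of the camera
    uvw = cam[valid] @ K.T                 # pinhole projection to homogeneous pixels
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = z[valid]
    inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z = u[inb], v[inb], z[inb]
    depth = np.zeros((H, W))               # 0 marks pixels with no projected point
    order = np.argsort(-z)                 # write far points first ...
    depth[v[order], u[order]] = z[order]   # ... so the nearest point wins per pixel
    return depth
```

The resulting map is sparse (most pixels stay zero), which is exactly why, per the abstract, the diffusion model's generative prior is relied on to infer dense structure from it.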

Original language: English
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: 7 May 2024 – 11 May 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 7/05/24 – 11/05/24

Bibliographical note

Publisher Copyright:
© 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

