Abstract
Text-to-3D generation has progressed rapidly with the advent of score distillation sampling (SDS), a method that uses pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in a zero-shot setting. However, the lack of 3D awareness in the 2D diffusion model often destabilizes previous methods and prevents them from generating a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into the pretrained 2D diffusion model, enhancing the robustness and 3D consistency of score distillation-based methods. Specifically, we introduce a consistency injection module that constructs a 3D point cloud from the image generated from the text prompt and uses its projected depth map at a given view as a condition for the 2D diffusion model. The diffusion model, through its generative capability, robustly infers dense structure from the sparse point cloud depth map and generates a geometrically consistent and coherent 3D scene. We also introduce a new technique called semantic coding that reduces the semantic ambiguity of the text prompt for improved results. Our method can be easily adapted to various text-to-3D baselines. We experimentally demonstrate that our method notably enhances the 3D consistency of generated scenes compared to previous baselines, achieving state-of-the-art performance in geometric robustness and fidelity. The project page is available at https://ku-cvlab.github.io/3DFuse/.
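The projection step mentioned in the abstract, where a point cloud is rendered into a sparse depth map at a given view, can be illustrated with a minimal sketch. It assumes a pinhole camera looking along +z and uses NumPy; the function name `project_depth` and its parameters are hypothetical and are not taken from the 3DFuse code. Pixels that no point lands on are left at 0, which is what makes the map sparse.

```python
import numpy as np

def project_depth(points_world, K, world_to_cam, height, width):
    """Render a sparse depth map from a point cloud using a z-buffer.

    points_world: (N, 3) array of points in world coordinates.
    K:            (3, 3) pinhole intrinsics matrix (assumed camera model).
    world_to_cam: (4, 4) rigid transform from world to camera frame.
    Returns an (height, width) array; pixels hit by no point stay 0.
    """
    # Move the points into the camera frame via homogeneous coordinates.
    ones = np.ones((len(points_world), 1))
    homo = np.concatenate([points_world, ones], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]

    # Discard points behind (or on) the camera plane.
    cam = cam[cam[:, 2] > 1e-6]

    # Perspective projection to integer pixel coordinates.
    uvw = (K @ cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Z-buffer: when several points hit one pixel, keep the nearest depth.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

In a pipeline of the kind the abstract describes, a map like this would be passed to the diffusion model as the conditioning signal for each sampled camera view; how that conditioning is wired in is not specified here and is left out of the sketch.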
| Original language | English |
|---|---|
| State | Published - 2024 |
| Event | 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria. Duration: 7 May 2024 → 11 May 2024 |
Conference
| Conference | 12th International Conference on Learning Representations, ICLR 2024 |
|---|---|
| Country/Territory | Austria |
| City | Hybrid, Vienna |
| Period | 7/05/24 → 11/05/24 |
Bibliographical note
Publisher Copyright: © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
Title: Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation