Abstract
Segmenting three-dimensional indoor scenes with complex layouts and object arrangements remains a core challenge in computer graphics and computational photography. We propose a Transformer-based architecture designed for semantic segmentation on point clouds in complex indoor scenes. It explicitly addresses the inherent challenges of such data through a dynamic, multi-scale attention mechanism. At the core of the proposed approach is the dynamic window multi-head self-attention (DW-MSA3D) module, which adaptively fuses features captured at varying window scales. Unlike prior approaches that rely on fixed-window attention, our method dynamically adjusts the receptive field to local scene complexity, enabling expressive encoding of sparse volumes across scales. We achieve competitive performance on public datasets, validating the effectiveness of scale-adaptive attention for representing geometric detail in geometry-aware vision tasks. The source code is released at https://github.com/hyebinny/Dawin3D.
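To make the idea of fusing attention computed at several window scales more concrete, the following is a minimal illustrative sketch, not the authors' DW-MSA3D implementation (see the linked repository for that). It assumes integer voxel coordinates, a single attention head, and raw features used directly as queries, keys, and values. The function names `window_attention` and `dynamic_window_fusion`, the window sizes, and the `gate` scoring vectors are all hypothetical; a trained model would learn the projections and the gate.

```python
import math
from collections import defaultdict


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(coords, feats, window):
    """Single-head self-attention restricted to cubic windows.

    coords: integer voxel coordinates (x, y, z) of occupied voxels.
    feats:  one feature vector (list of floats) per voxel.
    window: window edge length in voxels.
    Assumption: queries, keys and values are the raw features
    (no learned projections), purely for illustration.
    """
    groups = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        groups[(x // window, y // window, z // window)].append(i)

    dim = len(feats[0])
    out = [None] * len(feats)
    for members in groups.values():
        for i in members:
            # Scaled dot-product scores against voxels in the same window only.
            scores = [
                sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(dim)
                for j in members
            ]
            attn = softmax(scores)
            out[i] = [
                sum(w * feats[j][d] for w, j in zip(attn, members))
                for d in range(dim)
            ]
    return out


def dynamic_window_fusion(coords, feats, windows=(2, 4, 8), gate=None):
    """Fuse window-attention outputs from several scales per voxel.

    gate: hypothetical scoring vectors, one per window size. Zeros by
          default, which yields uniform fusion weights.
    Returns (fused_features, fusion_weights).
    """
    dim = len(feats[0])
    if gate is None:
        gate = [[0.0] * dim for _ in windows]
    per_scale = [window_attention(coords, feats, w) for w in windows]

    fused, weights = [], []
    for i, f in enumerate(feats):
        # One score per scale, turned into weights that sum to 1.
        scores = [sum(g * v for g, v in zip(gate[s], f))
                  for s in range(len(windows))]
        alpha = softmax(scores)
        weights.append(alpha)
        fused.append([
            sum(alpha[s] * per_scale[s][i][d] for s in range(len(windows)))
            for d in range(dim)
        ])
    return fused, weights


# Toy scene: four nearby voxels and one isolated voxel.
coords = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 7, 7), (40, 40, 40)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
fused, weights = dynamic_window_fusion(coords, feats)
```

In this toy setup the isolated voxel has no neighbours at any scale, so its fused feature stays equal to its input, while the clustered voxels mix with progressively more neighbours as the window grows. Passing a non-zero `gate` shifts each voxel's fusion weights toward particular scales, which is the sense in which the receptive field adapts here.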
| Original language | English |
|---|---|
| Article number | 132746 |
| Journal | Neurocomputing |
| Volume | 671 |
| DOIs | |
| State | Published - 28 Mar 2026 |
Bibliographical note
Publisher Copyright: © 2026 Elsevier B.V.
Keywords
- Dynamic window transformer
- Scene segmentation
- Scene understanding
Title
Dynamic window transformer for three-dimensional indoor scene segmentation