Abstract
Stereo confidence estimation aims to estimate the reliability of the disparity produced by stereo matching. Unlike previous methods, which exploit only a limited set of input modalities, we present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input (matching cost, disparity, and color image) through deep networks. The proposed network, termed Locally Adaptive Fusion Networks (LAF-Net), learns locally-varying attention and scale maps to fuse the tri-modal confidence features. Moreover, we propose a knowledge distillation framework to learn more compact confidence estimation networks as student networks. By transferring knowledge from LAF-Net as the teacher network, student networks that take only a disparity as input can achieve comparable performance. To transfer more informative knowledge, we also propose a module that learns a locally-varying temperature in the softmax function. We further extend this framework to a multiview scenario. Experimental results show that LAF-Net and its variations outperform state-of-the-art stereo confidence methods on various benchmarks.
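As a rough illustration of what a locally-varying softmax temperature means, the sketch below applies a temperature-scaled softmax at each pixel with that pixel's own temperature. It is a generic example, not the authors' implementation: the function names, the two-class logits, and the hand-set temperature values are all assumptions made for illustration, whereas the abstract describes a module that learns the temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; a larger temperature gives a softer output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soften_per_pixel(logit_map, temperature_map):
    """Apply the softmax at every pixel using that pixel's own temperature.

    Both maps are hypothetical nested lists indexed as [row][column].
    """
    return [
        [softmax_with_temperature(logits, t) for logits, t in zip(logit_row, t_row)]
        for logit_row, t_row in zip(logit_map, temperature_map)
    ]


# Toy 1x2 "image": identical two-class logits at both pixels,
# but a different (hand-picked, not learned) temperature at each pixel.
logit_map = [[[2.0, 0.0], [2.0, 0.0]]]
temperature_map = [[1.0, 4.0]]
soft_targets = soften_per_pixel(logit_map, temperature_map)
```

With identical logits, the pixel with the higher temperature yields a flatter distribution, which is the usual sense in which a temperature controls how soft the targets passed from a teacher to a student are.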
| Original language | English |
|---|---|
| Pages (from-to) | 6372-6385 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 45 |
| Issue number | 5 |
| DOIs | |
| State | Published - 1 May 2023 |
Bibliographical note
Publisher Copyright: © 1979-2012 IEEE.
Keywords
- Stereo matching
- deep learning
- knowledge distillation
- stereo confidence estimation