TY - GEN
T1 - Adaptive confidence thresholding for monocular depth estimation
AU - Choi, Hyesong
AU - Lee, Hunsang
AU - Kim, Sunkyung
AU - Kim, Sunok
AU - Kim, Seungryong
AU - Sohn, Kwanghoon
AU - Min, Dongbo
N1 - Funding Information:
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00056) and the Mid-Career Researcher Program through the NRF of Korea (NRF-2021R1A2C2011624). S. Kim4 was supported in part by the MSIT under the ICT Creative Consilience Program (IITP-2021-2020-0-01819).
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
AB - Self-supervised monocular depth estimation has become an appealing solution to the lack of ground truth labels, but its reconstruction loss often produces over-smoothed results across object boundaries and cannot handle occlusion explicitly. In this paper, we propose a new approach that leverages pseudo ground truth depth maps of stereo images generated by self-supervised stereo matching methods. A confidence map of the pseudo ground truth depth map is estimated to mitigate the performance degradation caused by inaccurate pseudo depth maps. To cope with the prediction error of the confidence map itself, we also use a threshold network that learns the threshold dynamically, conditioned on the pseudo depth maps. The pseudo depth labels retained after filtering with the thresholded confidence map are used to supervise the monocular depth network. Furthermore, we propose a probabilistic framework that refines the monocular depth map with the help of its uncertainty map through a pixel-adaptive convolution (PAC) layer. Experimental results demonstrate superior performance over state-of-the-art monocular depth estimation methods. Lastly, we show that the proposed threshold learning can also be used to improve the performance of existing confidence estimation approaches.
UR - http://www.scopus.com/inward/record.url?scp=85118008191&partnerID=8YFLogxK
U2 - 10.1109/ICCV48922.2021.01257
DO - 10.1109/ICCV48922.2021.01257
M3 - Conference contribution
AN - SCOPUS:85118008191
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 12788
EP - 12798
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 October 2021 through 17 October 2021
ER -