Deep Monocular Depth Estimation via Integration of Global and Local Predictions

Youngjung Kim, Hyungjoo Jung, Dongbo Min, Kwanghoon Sohn

Research output: Contribution to journal, Article, peer-review

78 Scopus citations


Recent work in machine learning has greatly advanced the accuracy of single-image depth estimation. However, the resulting depth images are still over-smoothed and perceptually unsatisfying. This paper casts depth prediction from a single image as a parametric learning problem. Specifically, we propose a deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks (CNNs), named the global and local networks. They have contrasting network architectures and are designed to capture depth information with complementary attributes. These intermediate outputs are then combined in the integration network based on the variational framework. By unrolling the optimization steps of Split Bregman iterations in the integration network, our model can be trained in an end-to-end manner. This enables one to simultaneously learn an efficient parameterization of the CNNs and the hyper-parameters of the variational method. Finally, we offer a new data set of 0.22 million RGB-D images captured by Microsoft Kinect v2. Our model generates realistic and discontinuity-preserving depth predictions without involving any low-level segmentation or superpixels. Extensive experiments demonstrate the superiority of the proposed method on a range of RGB-D benchmarks, including both indoor and outdoor scenarios.
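To illustrate what unrolling Split Bregman iterations means in general, the following is a minimal sketch, not the integration network described in the abstract. It assumes a much simpler problem (1D anisotropic total-variation denoising), and the function names and the parameters `lam`, `mu`, and `n_steps` are hypothetical. Here the weights are fixed by hand; in an end-to-end trained model such quantities would be learned.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise shrinkage: the closed-form minimizer of the L1 subproblem."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def split_bregman_tv_1d(f, lam=1.0, mu=5.0, n_steps=50):
    """Run a fixed number of Split Bregman steps for
        min_u 0.5 * ||u - f||^2 + lam * ||D u||_1,
    where D is the forward-difference operator.

    Assumption: lam, mu and n_steps are hand-picked illustrative values.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    # Forward-difference matrix of shape (n - 1, n): (D u)[i] = u[i + 1] - u[i].
    D = np.diff(np.eye(n), axis=0)
    # System matrix of the quadratic u-subproblem; it is the same at every step.
    A = np.eye(n) + mu * D.T @ D

    d = np.zeros(n - 1)  # auxiliary variable standing in for D u
    b = np.zeros(n - 1)  # Bregman (scaled dual) variable
    u = f.copy()
    for _ in range(n_steps):  # each pass is one "unrolled" step
        # u-update: solve (I + mu D^T D) u = f + mu D^T (d - b).
        u = np.linalg.solve(A, f + mu * D.T @ (d - b))
        Du = D @ u
        # d-update: shrinkage with threshold lam / mu.
        d = soft_threshold(Du + b, lam / mu)
        # Bregman update: accumulate the constraint residual D u - d.
        b = b + Du - d
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_step = np.concatenate([np.zeros(32), np.ones(32)])
    noisy_step = noisy_step + 0.1 * rng.standard_normal(64)
    denoised = split_bregman_tv_1d(noisy_step)
    print("total variation before:", np.abs(np.diff(noisy_step)).sum())
    print("total variation after: ", np.abs(np.diff(denoised)).sum())
```

Because the loop runs for a fixed number of steps and each step consists only of a linear solve, a shrinkage, and an addition, the whole procedure is a differentiable computation graph, which is what makes treating the steps as network layers possible.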

Original language: English
Pages (from-to): 4131-4144
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Issue number: 8
State: Published - Aug 2018

Bibliographical note

Funding Information:
Manuscript received June 20, 2017; revised February 7, 2018 and May 2, 2018; accepted May 9, 2018. Date of publication May 15, 2018; date of current version May 24, 2018. This work was supported in part by the Next Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), Ministry of Science, ICT, under Grant NRF-2017M3C4A7069370, and in part by the Basic Science Research Program through the NRF under Grant NRF-2015R1D1A1A01061143. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Kalpana Seshadrinathan. (Corresponding author: Kwanghoon Sohn.) Y. Kim, H. Jung, and K. Sohn are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, South Korea.

Publisher Copyright:
© 1992-2012 IEEE.


Keywords

  • 2D-to-3D conversion
  • Depth estimation
  • RGB-D database
  • Convolutional neural networks
  • Non-parametric sampling

