Dense Cross-Modal Correspondence Estimation with the Deep Self-Correlation Descriptor

Seungryong Kim, Dongbo Min, Stephen Lin, Kwanghoon Sohn

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


We present the deep self-correlation (DSC) descriptor for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. We encode local self-similar structure in a pyramidal manner that yields both more precise localization and greater robustness to non-rigid image deformations. Specifically, DSC first computes multiple self-correlation surfaces with randomly sampled patches over a local support window, and then builds pyramidal self-correlation surfaces through average pooling on these surfaces. The feature responses on the self-correlation surfaces are then encoded through spatial pyramid pooling in a log-polar configuration. To better handle geometric variations such as scale and rotation, we additionally propose the geometry-invariant DSC (GI-DSC), which leverages multi-scale self-correlation computation and canonical orientation estimation. In contrast to descriptors based on deep convolutional neural networks (CNNs), DSC and GI-DSC are training-free (i.e., handcrafted descriptors), are robust to cross-modal appearance differences, and generalize well to various modality variations. Extensive experiments demonstrate the state-of-the-art performance of DSC and GI-DSC on challenging cross-modal image pairs having photometric and/or geometric variations.
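The pipeline outlined in the abstract (self-correlation surfaces from randomly sampled patches, average pooling into a pyramid, log-polar pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain normalized cross-correlation, the box-filter pooling radii, the max operator inside each log-polar bin, and every default parameter below are assumptions chosen for illustration only. It also omits the multi-scale and orientation steps of GI-DSC.

```python
import math
import random

def patch(img, y, x, r):
    """Flattened (2r+1)x(2r+1) patch around (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches (assumed measure)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 1e-12 else 0.0

def box_average(surf, r):
    """Average pooling with a (2r+1)x(2r+1) box, truncated at the surface edges."""
    n = len(surf)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            vals = [surf[j][i]
                    for j in range(max(y - r, 0), min(y + r, n - 1) + 1)
                    for i in range(max(x - r, 0), min(x + r, n - 1) + 1)]
            out[y][x] = sum(vals) / len(vals)
    return out

def log_polar_bin(dy, dx, radius, n_rho, n_theta):
    """Bin 0 is the center; the rest are log-spaced rings times angular sectors."""
    d = math.hypot(dy, dx)
    if d < 1.0:
        return 0
    rho = min(int(n_rho * math.log(d) / math.log(radius + 1)), n_rho - 1)
    theta = int((math.atan2(dy, dx) + math.pi) / (2 * math.pi) * n_theta) % n_theta
    return 1 + rho * n_theta + theta

def dsc_sketch(img, cy, cx, radius=4, patch_r=1, n_pairs=8,
               pool_radii=(0, 1, 2), n_rho=2, n_theta=4, seed=0):
    """Hypothetical DSC-like descriptor at pixel (cy, cx) of a 2D list of floats."""
    rng = random.Random(seed)
    # A fixed seed gives the same random offset pairs for every image,
    # so descriptors from different images remain comparable.
    pairs = [tuple(rng.randint(-radius, radius) for _ in range(4))
             for _ in range(n_pairs)]
    n_bins = 1 + n_rho * n_theta
    size = 2 * radius + 1
    desc = []
    for sy, sx, ty, tx in pairs:
        # One self-correlation surface over the support window per offset pair.
        surf = [[ncc(patch(img, cy + dy + sy, cx + dx + sx, patch_r),
                     patch(img, cy + dy + ty, cx + dx + tx, patch_r))
                 for dx in range(-radius, radius + 1)]
                for dy in range(-radius, radius + 1)]
        for r in pool_radii:  # pyramid levels via average pooling
            pooled = box_average(surf, r)
            bins = [-1.0] * n_bins  # -1.0 is the lower bound of NCC
            for y in range(size):
                for x in range(size):
                    b = log_polar_bin(y - radius, x - radius, radius, n_rho, n_theta)
                    bins[b] = max(bins[b], pooled[y][x])  # max pooling is an assumption
            desc.extend(bins)
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Because every entry is a correlation between two patches of the same image, inverting the image intensities leaves the output of this sketch unchanged up to rounding, which hints at why self-similarity is attractive for cross-modal matching. Dense matching would call `dsc_sketch` at every pixel of both images and compare the resulting vectors.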

Original language: English
Article number: 8955799
Pages (from-to): 2345-2359
Number of pages: 15
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 7
State: Published - 1 Jul 2021

Bibliographical note

Publisher Copyright:
© 1979-2012 IEEE.


  • Cross-modal correspondence
  • local self-similarity
  • non-rigid deformation
  • pyramidal structure
  • self-correlation

