Unsupervised Visual Representation Learning Based on Segmentation of Geometric Pseudo-Shapes for Transformer-Based Medical Tasks

Thanaporn Viriyasaranon, Sang Myung Woo, Jang Hwan Choi

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Recently, transformer-based architectures have been shown to outperform classic convolutional architectures and have rapidly been established as state-of-the-art models for many medical vision tasks. Their superior performance can be explained by the ability of their multi-head self-attention mechanism to capture long-range dependencies. However, they tend to overfit on small- or even medium-sized datasets because of their weak inductive bias. As a result, they require massive labeled datasets, which are expensive to obtain, especially in the medical domain. This motivated us to explore unsupervised semantic feature learning without any form of annotation. In this work, we aimed to learn semantic features in a self-supervised manner by training transformer-based models to segment the numerical signals of geometric shapes inserted into original computed tomography (CT) images. Moreover, we developed a Convolutional Pyramid vision Transformer (CPT) that leverages multi-kernel convolutional patch embedding and local spatial reduction in each of its layers to generate multi-scale features, capture local information, and reduce computational cost. Using these approaches, we were able to noticeably outperform state-of-the-art deep learning-based segmentation or classification models on liver cancer CT datasets of 5,237 patients, pancreatic cancer CT datasets of 6,063 patients, and a breast cancer MRI dataset of 127 patients.
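The pretext task described in the abstract, segmenting geometric shapes inserted into unlabeled CT images, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `insert_pseudo_shapes`, the choice of circles and squares, the constant fill intensity, and all parameter ranges are assumptions made for illustration only. The sketch shows how an unlabeled 2D slice could be turned into an (input, target mask) pair without any manual annotation.

```python
import numpy as np


def insert_pseudo_shapes(image, num_shapes=3, min_radius=4, max_radius=12, rng=None):
    """Paste random geometric shapes into a 2D image and return the
    augmented image together with its segmentation mask.

    Hypothetical helper: shape types, sizes, and intensities are
    illustrative assumptions, not values from the paper.
    Mask labels: 0 = background, 1 = circle, 2 = square.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape
    augmented = image.astype(np.float32).copy()
    mask = np.zeros((height, width), dtype=np.uint8)
    yy, xx = np.mgrid[0:height, 0:width]

    lo, hi = float(image.min()), float(image.max())
    for _ in range(num_shapes):
        # Random centre inside the image, so every shape covers at least one pixel.
        cy = int(rng.integers(0, height))
        cx = int(rng.integers(0, width))
        radius = int(rng.integers(min_radius, max_radius + 1))
        kind = int(rng.integers(0, 2))  # 0 -> circle, 1 -> square

        if kind == 0:
            region = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        else:
            region = (np.abs(yy - cy) <= radius) & (np.abs(xx - cx) <= radius)

        # Assumption: the shape's "numerical signal" is a constant intensity
        # drawn from the image's own value range.
        augmented[region] = rng.uniform(lo, hi)
        mask[region] = kind + 1  # later shapes overwrite earlier ones

    return augmented, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_slice = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a CT slice
    x, y = insert_pseudo_shapes(fake_slice, rng=rng)
    print(x.shape, y.shape, np.unique(y))
```

Under these assumptions, a segmentation model would be pretrained to predict the mask from the augmented image, and its encoder weights would then be reused for the downstream cancer segmentation or classification task.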

Original language: English
Pages (from-to): 2003-2014
Number of pages: 12
Journal: IEEE Journal of Biomedical and Health Informatics
Issue number: 4
State: Published - 1 Apr 2023

Bibliographical note

Publisher Copyright:
© 2013 IEEE.


  • CT images
  • MRI images
  • breast cancer
  • cancer classification
  • cancer segmentation
  • liver cancer
  • pancreatic cancer
  • self-supervised pretraining
  • vision transformer

