Video summarization by learning relationships between action and scene

  • Jungin Park
  • Jiyoung Lee
  • Sangryul Jeon
  • Kwanghoon Sohn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

We propose a novel deep architecture for video summarization in untrimmed videos that simultaneously recognizes action and scene classes for every video segment. Our networks accomplish this through a multi-task fusion approach based on two types of attention modules to explore semantic correlations between action and scene in the videos. The proposed networks consist of feature embedding networks and attention inference networks to stochastically leverage the inferred action and scene feature representations. Additionally, we design a new center loss function that learns the feature representations by minimizing intra-class variations and maximizing inter-class variations. Our model achieves a score of 0.8409 for summarization and an accuracy of 0.7294 for action and scene recognition on the test set of the CoVieW'19 dataset, which ranked 3rd.
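The abstract does not give the form of the proposed center loss, so the following is only a minimal sketch of the general idea it describes: pull each feature toward its class center (intra-class term) and push class centers apart (inter-class term). The function name `center_style_loss`, the hinge formulation, and the `margin` parameter are illustrative assumptions, not the authors' definition.

```python
import numpy as np


def center_style_loss(features, labels, centers, margin=1.0):
    """Illustrative loss: intra-class pull plus inter-class push.

    features: (N, D) float array of embedded segment features.
    labels:   (N,) int array of class indices.
    centers:  (C, D) float array of class centers.
    margin:   assumed minimum desired distance between centers
              (hypothetical; not taken from the paper).
    """
    # Intra-class term: mean squared distance from each feature
    # to the center of its own class.
    diffs = features - centers[labels]
    intra = np.mean(np.sum(diffs ** 2, axis=1))

    # Inter-class term: hinge penalty on every pair of distinct
    # centers that lie closer together than `margin`.
    pair_diffs = centers[:, None, :] - centers[None, :, :]
    pair_dists = np.sqrt(np.sum(pair_diffs ** 2, axis=2))
    upper = np.triu_indices(len(centers), k=1)
    if len(upper[0]) > 0:
        inter = np.mean(np.maximum(0.0, margin - pair_dists[upper]))
    else:
        inter = 0.0  # a single class has no pairs to separate

    return float(intra + inter)
```

Under these assumptions, the loss is zero when every feature sits on its class center and all centers are at least `margin` apart; it grows as features drift from their centers or as centers crowd together.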

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1545-1552
Number of pages: 8
ISBN (Electronic): 9781728150239
DOIs
State: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 2019 → 28 Oct 2019

Publication series

Name: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/10/19 → 28/10/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Action recognition
  • Deep learning
  • Multi task learning
  • Scene recognition
  • Video summarization
  • Video understanding
