Utilizing a Dense Video Captioning Technique for Generating Image Descriptions of Comics for People with Visual Impairments

Suhyun Kim, Semin Lee, Kyungok Kim, Uran Oh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To improve the accessibility of visual figures, automatic generation of text descriptions for individual images has been studied. However, this approach cannot be directly applied to comics, because the descriptions become redundant when similar scenes appear in a row. To address this issue, we propose generating descriptions per group of related images, and we demonstrate how a dense captioning technique for videos can be utilized for this purpose, along with ways to improve its performance. To assess the effectiveness of our approach and to identify factors affecting the quality of text descriptions of comics, we conducted a preliminary study with 3 sighted evaluators and a main user study with 12 participants with visual impairments. The results show that, when the annotator is human, text descriptions generated per group of images are perceived to be better than those generated per image in terms of accuracy, clarity, understandability, length, informativeness, and preference for the sighted group. Under the same conditions, when the annotator is AI, the group descriptions performed better in terms of length. In addition, people with visual impairments prefer group descriptions because of their conciseness, the smooth connection between sentences, and their lack of repetition. Based on these findings, we provide design recommendations for generating accessible comic descriptions at scale for blind users.

Original language: English
Title of host publication: Proceedings of 2024 29th Annual Conference on Intelligent User Interfaces, IUI 2024
Publisher: Association for Computing Machinery
Pages: 750-760
Number of pages: 11
ISBN (Electronic): 9798400705083
State: Published - 18 Mar 2024
Event: 29th Annual Conference on Intelligent User Interfaces, IUI 2024 - Greenville, United States
Duration: 18 Mar 2024 → 21 Mar 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 29th Annual Conference on Intelligent User Interfaces, IUI 2024
Country/Territory: United States
City: Greenville
Period: 18/03/24 → 21/03/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • comics
  • dense video captioning
  • image description
  • people with visual impairment
