Understanding the Use of AI-Based Audio Generation Models by End-Users

Jiyeon Han, Eunseo Yang, Uran Oh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


With the growing popularity of video platforms, the demand for copyright-free audio sources for adding background music to videos is also expected to increase. While text-to-audio generation models can be useful for this purpose, little is known about how people perceive and use these models. To understand how audio generation models are used and to identify their strengths and weaknesses compared to typical audio search engines, we conducted a user study with 16 participants, who were asked to choose matching background music after watching muted videos. Findings show that participants appreciated the search engine for recommending search keywords and displaying multiple results, although the outcomes did not fully reflect their intent. In contrast, the generation model posed challenges in choosing proper prompts but excelled at finding the desired music. Based on these results, we suggest design considerations to improve the usability of audio generation models for end-users.

Original language: English
Title of host publication: CHI 2024 - Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400703317
State: Published - 11 May 2024
Event: 2024 CHI Conference on Human Factors in Computing Systems, CHI EA 2024 - Hybrid, Honolulu, United States
Duration: 11 May 2024 - 16 May 2024

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings


Conference: 2024 CHI Conference on Human Factors in Computing Systems, CHI EA 2024
Country/Territory: United States
City: Hybrid, Honolulu

Bibliographical note

Publisher Copyright:
© 2024 Association for Computing Machinery. All rights reserved.


Keywords

  • empirical study
  • search engine
  • text-to-audio
  • user experience


