Abstract
With the growing popularity of video platforms, demand for copyright-free audio sources for adding background music to videos is also expected to increase. While text-to-audio generation models can be useful for this purpose, little is known about how people perceive and use them. To understand how audio generation models are used, and to identify their strengths and weaknesses compared to typical audio search engines, we conducted a user study with 16 participants, who were asked to choose matching background music after watching muted videos. Findings show that participants appreciated the search engine for recommending search keywords and displaying multiple results, even though the results did not fully reflect their intent. In contrast, the generation model posed challenges in choosing proper prompts but excelled at producing the desired music. Based on these results, we suggest design considerations to improve the usability of audio generation models for end-users.
Original language | English |
---|---|
Title of host publication | CHI 2024 - Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9798400703317 |
DOIs | |
State | Published - 11 May 2024 |
Event | 2024 CHI Conference on Human Factors in Computing Systems, CHI EA 2024 - Hybrid, Honolulu, United States Duration: 11 May 2024 → 16 May 2024 |
Publication series
Name | Conference on Human Factors in Computing Systems - Proceedings |
---|
Conference
Conference | 2024 CHI Conference on Human Factors in Computing Systems, CHI EA 2024 |
---|---|
Country/Territory | United States |
City | Hybrid, Honolulu |
Period | 11/05/24 → 16/05/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computing Machinery. All rights reserved.
Keywords
- empirical study
- search engine
- text-to-audio
- user experience