TY - GEN
T1 - Photo-based question answering
AU - Yeh, Tom
AU - Lee, John J.
AU - Darrell, Trevor
PY - 2008
Y1 - 2008
AB - Photo-based question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. A photo-based QA system allows direct use of a photo to refer to the object. We develop a three-layer system architecture for photo-based QA that brings together recent technical achievements in question answering and image matching. The first, template-based QA layer matches a query photo to online images and extracts structured data from multimedia databases to answer questions about the photo. To simplify image matching, it exploits the question text to filter images based on categories and keywords. The second, information retrieval QA layer searches an internal repository of resolved photo-based questions to retrieve relevant answers. The third, human-computation QA layer leverages community experts to handle the most difficult cases. A series of experiments performed on a pilot dataset of 30,000 images of books, movie DVD covers, grocery items, and landmarks demonstrates the technical feasibility of this architecture. We present three prototypes to show how photo-based QA can be built into an online album, a text-based QA system, and a mobile application.
KW - Algorithms
KW - Design
KW - Human factors
UR - http://www.scopus.com/inward/record.url?scp=64949131497&partnerID=8YFLogxK
U2 - 10.1145/1459359.1459412
DO - 10.1145/1459359.1459412
M3 - Conference contribution
AN - SCOPUS:64949131497
SN - 9781605583037
T3 - MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
SP - 389
EP - 398
BT - MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
T2 - 16th ACM International Conference on Multimedia, MM '08
Y2 - 26 October 2008 through 31 October 2008
ER -