Photo-based question answering

Tom Yeh, John J. Lee, Trevor Darrell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations


Photo-based question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. A photo-based QA system allows direct use of a photo to refer to the object. We develop a three-layer system architecture for photo-based QA that brings together recent technical achievements in question answering and image matching. The first, template-based QA layer matches a query photo to online images and extracts structured data from multimedia databases to answer questions about the photo. To simplify image matching, it exploits the question text to filter images based on categories and keywords. The second, information retrieval QA layer searches an internal repository of resolved photo-based questions to retrieve relevant answers. The third, human-computation QA layer leverages community experts to handle the most difficult cases. A series of experiments performed on a pilot dataset of 30,000 images of books, movie DVD covers, grocery items, and landmarks demonstrates the technical feasibility of this architecture. We present three prototypes to show how photo-based QA can be built into an online album, a text-based QA system, and a mobile application.
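
The three-layer architecture described in the abstract can be read as a fallback cascade: each layer either answers or passes the question on to the next. The Python sketch below illustrates only that control flow and is not the authors' implementation. Every name in it (`PhotoQuestion`, `answer`, the three layer functions, and the toy lookup tables) is hypothetical, and a photo ID stands in for real image matching.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PhotoQuestion:
    """A question about a photo. photo_id is a stand-in for image data."""
    photo_id: str
    text: str


# A layer returns an answer string, or None when it cannot answer.
Layer = Callable[[PhotoQuestion], Optional[str]]

# Toy data (assumed for illustration only).
STRUCTURED_RECORDS = {"photo-001": {"author": "A. Author", "title": "Example Book"}}
RESOLVED_QUESTIONS = {("photo-002", "where is this?"): "Example Landmark"}


def template_layer(q: PhotoQuestion) -> Optional[str]:
    """Layer 1: look up structured data for the matched photo."""
    record = STRUCTURED_RECORDS.get(q.photo_id)
    if record is None:
        return None
    # Use the question text to pick a field, loosely mirroring keyword filtering.
    for field, value in record.items():
        if field in q.text.lower():
            return value
    return None


def retrieval_layer(q: PhotoQuestion) -> Optional[str]:
    """Layer 2: search previously resolved photo-based questions."""
    return RESOLVED_QUESTIONS.get((q.photo_id, q.text.lower()))


def human_layer(q: PhotoQuestion) -> Optional[str]:
    """Layer 3: hand the hardest cases to community experts."""
    return "queued for community experts"


LAYERS: Sequence[Tuple[str, Layer]] = (
    ("template", template_layer),
    ("retrieval", retrieval_layer),
    ("human", human_layer),
)


def answer(q: PhotoQuestion, layers: Sequence[Tuple[str, Layer]] = LAYERS):
    """Try each layer in order and return (layer_name, answer) from the first hit."""
    for name, layer in layers:
        result = layer(q)
        if result is not None:
            return name, result
    return None
```

Calling `answer(PhotoQuestion("photo-001", "Who is the author?"))` is handled by the first layer, while a question about an unknown photo falls through to the human layer. Ordering the layers from cheapest to most expensive keeps expert effort for the cases automation cannot resolve.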

Original language: English
Title of host publication: MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
Number of pages: 10
State: Published - 2008
Event: 16th ACM International Conference on Multimedia, MM '08 - Vancouver, BC, Canada
Duration: 26 Oct 2008 → 31 Oct 2008

Publication series

Name: MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops


Conference: 16th ACM International Conference on Multimedia, MM '08
City: Vancouver, BC


  • Algorithms
  • Design
  • Human factors

