Searching documentation using text, OCR, and image

Tom Yeh, Boris Katz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

We describe a mixed-modality method to index and search software documentation in three ways: plain text, OCR text of embedded figures, and visual features of these figures. Using a corpus of 102 computer books with a total of 62,943 pages and 75,800 figures, we empirically demonstrate that our method achieves better precision/recall than do alternatives based on single modalities.
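
To illustrate the idea of combining the three modalities at retrieval time, here is a minimal sketch (not the authors' code): each documentation page is scored against a query using its body text, the OCR text of its embedded figures, and a precomputed visual match score for those figures, and the three scores are fused with weights. The class, function names, scoring functions, and weights below are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (hypothetical): late fusion of three per-page scores --
# body text, OCR'd figure text, and a visual-feature match for embedded figures.
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Page:
    body_text: str
    figure_ocr_text: str
    figure_visual_score: float = 0.0  # e.g., precomputed match against a query screenshot


def cosine_tf(query: str, text: str) -> float:
    """Simple term-frequency cosine similarity between a query and a text field."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0


def mixed_modality_score(query: str, page: Page,
                         w_text: float = 0.5, w_ocr: float = 0.3, w_img: float = 0.2) -> float:
    """Weighted combination of the three modality scores (weights are illustrative)."""
    return (w_text * cosine_tf(query, page.body_text)
            + w_ocr * cosine_tf(query, page.figure_ocr_text)
            + w_img * page.figure_visual_score)


# Example: rank two pages for a query about a dialog box shown in a screenshot.
pages = [
    Page("Configuring the proxy settings dialog", "Proxy Settings  OK  Cancel", 0.8),
    Page("Installing the compiler toolchain", "gcc --version output", 0.1),
]
query = "proxy settings dialog"
ranked = sorted(pages, key=lambda p: mixed_modality_score(query, p), reverse=True)
print([p.body_text for p in ranked])
```

The fusion weights and similarity functions here stand in for whatever ranking model the paper actually evaluates; the point is only to show how a single query can be matched against three differently indexed representations of the same page.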

Original language: English
Title of host publication: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Pages: 776-777
Number of pages: 2
DOIs
State: Published - 2009
Event: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009 - Boston, MA, United States
Duration: 19 Jul 2009 - 23 Jul 2009

Publication series

Name: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009

Conference

Conference: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Country/Territory: United States
City: Boston, MA
Period: 19/07/09 - 23/07/09

Keywords

  • Computer vision
  • Content-based image retrieval
  • Multimodal search
