Learning from Descriptive Text
Tamara Berg
University of North Carolina at Chapel Hill, Computer Science Department
People communicate using language, whether spoken, written, or typed. A significant amount of this language describes the world around us, especially the visual world, whether in our immediate environment or depicted in images and video. In addition, billions of photographs with associated text are available on the web; examples include web pages, captioned or tagged photographs, and video with scripts or speech. Such visually descriptive language is potentially a rich source of
1) information about the world, especially the visual world,
2) training data for how people construct natural language to describe imagery, and
3) guidance for where computational visual recognition algorithms should focus efforts.
In this talk I will describe several projects relating images and descriptive text, including our recent approaches to automatically generating natural language descriptions, naming objects, and creating referring expressions for objects in images. In addition, I will introduce our new work on collecting data for fill-in-the-blank image description and question answering.
All papers, created datasets, and demos are available on my webpage at: http://tamaraberg.com/
If you would like to meet with the speaker, please contact Thomas Kleinbauer.