
Andrei Barbu and Daniel Harari; CBMM Thrust 3 – Visual Intelligence
Progress on the CBMM challenge questions: What is there? and Who is there?
Title: Thrust 3: Vision and language
Abstract: A fundamental human ability is to communicate with others about what we are perceiving and to change our understanding of the world when others are communicating with us. To understand this capability, we (Thrust 3) take the CBMM Turing test literally by developing approaches to answering queries about images and videos.
Answering questions requires that we transfer knowledge between modalities. We start with a question posed in natural language, connect our understanding of the question with the scene we are perceiving, discover the answer, and then encode that perceptual knowledge back into language. The uncanny human ability to transfer knowledge between modalities so easily suggests that human representations may be modality independent, and that searching for such representations will yield approaches that are more cognitively plausible.
This week’s talk will focus on vision and how it connects to language. We will describe how current representations in both the language and vision communities fall short of bridging the two, and we will propose a new representation that helps close part of that gap. We will also present our research ideas on question answering, language learning, and connections to other vision and language tasks that are key to human cognition.
Details
52 Oxford Street, Harvard University Northwest Building, Cambridge, 02138