Andrei Barbu, Research Scientist at MIT, discusses using language to understand vision and vision to understand language. He shows how the simple ability to compare an English sentence with a video clip can form the basis for many tasks, including recognition, image and video retrieval, generation of video captions, question answering, language disambiguation, language learning, paraphrasing, translation between languages, and planning.
- Andrei Barbu’s website
- Berzak, Y., Barbu, A., Harari, D., Katz, B. & Ullman, S. (2015) Do you see what I mean? Visual resolution of linguistic ambiguities. Conference on Empirical Methods in Natural Language Processing, September.
- Yu, H., Siddharth, N., Barbu, A. & Siskind, J. M. (2015) A compositional framework for grounding language inference, generation, and acquisition in video. Journal of Artificial Intelligence Research 52:601-713.