We have developed techniques for describing videos with natural-language sentences. Building on this work, we are moving beyond description to answering questions about videos, such as "What is the person on the left doing with the blue object?" This work takes a natural-language question as input and produces a natural-language answer. Rather than building a separate system for each question type (who is there?, what are they doing?, where are they?, etc.), we are striving to create a single approach that enables a system to understand and answer a wide variety of questions.