Language and Vision Ambiguities (LAVA) Corpus
Human language understanding relies critically on the ability to obtain unambiguous representations of linguistic content. While some ambiguities can be resolved using intra-linguistic contextual cues, the disambiguation of many linguistic constructions requires the integration of world knowledge and perceptual information obtained from other modalities. The LAVA corpus was created to support research on the problem of grounding language in the visual modality, using a novel task for visual and linguistic understanding that requires resolving linguistic ambiguities using the visual context of the utterance.

The LAVA corpus contains over 200 ambiguous sentences, each annotated with syntactic and semantic parses and paired with videos and images of scenes that depict its different interpretations, enabling disambiguation. The corpus uses a limited lexicon and includes sentences with syntactic ambiguities (prepositional phrase attachment, verb phrase attachment, conjunction), semantic ambiguities (logical form), and discourse ambiguities (anaphora, ellipsis).

URL: http://web.mit.edu/lavacorpus/


Additional Resources:

  • The LAVA Corpus website provides a few examples from the corpus and a download link for the corpus files: folders of video clips (.avi format), static images, the language parses (JSON format), and a documentation file. The videos can also be downloaded in .mp4 format.
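
Once the download has been unpacked, a first step is usually to check what it contains. The following is a minimal sketch, not part of the corpus distribution. The folder name lava_corpus, the helper names inventory and load_parses, and the image extensions (.jpg, .png) are assumptions made for illustration; only the .avi, .mp4, and JSON formats are named in the description above. The JSON files are loaded as-is, since their schema is described in the corpus documentation file rather than here.

```python
import json
from collections import defaultdict
from pathlib import Path

# Extension-to-kind mapping. Only .avi, .mp4 and .json come from the corpus
# description; the image extensions are assumptions.
KINDS = {
    ".avi": "video",
    ".mp4": "video",
    ".json": "parses",
    ".jpg": "image",  # assumed image format
    ".png": "image",  # assumed image format
}


def inventory(root):
    """Group every file under `root` by kind, based on its extension."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            kind = KINDS.get(path.suffix.lower(), "other")
            groups[kind].append(path)
    return dict(groups)


def load_parses(paths):
    """Load each JSON file unchanged; no particular schema is assumed."""
    return {p.name: json.loads(p.read_text(encoding="utf-8")) for p in paths}


if __name__ == "__main__":
    root = Path("lava_corpus")  # hypothetical local folder name
    if root.is_dir():
        files = inventory(root)
        for kind, paths in files.items():
            print(f"{kind}: {len(paths)} file(s)")
        parses = load_parses(files.get("parses", []))
```

Files with unrecognized extensions, such as the documentation file, are grouped under "other" rather than dropped, so nothing in the download goes unreported.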