Human language understanding relies critically on the ability to obtain unambiguous representations of linguistic content. While some ambiguities can be resolved using intra-linguistic contextual cues, disambiguating many linguistic constructions requires integrating world knowledge and perceptual information from other modalities. The LAVA corpus was created to support research on grounding language in the visual modality through a novel task of joint visual and linguistic understanding: resolving linguistic ambiguities using the visual context in which a sentence is uttered.
The LAVA corpus contains over 200 ambiguous sentences, annotated with syntactic and semantic parses and paired with videos and images of scenes that depict each sentence's different interpretations, so that the visual context can be used to disambiguate them. The corpus uses a limited lexicon and includes sentences with syntactic ambiguities (prepositional phrase attachment, verb phrase attachment, conjunction), semantic ambiguities (logical form), and discourse ambiguities (anaphora, ellipsis).
- Berzak, Y., Barbu, A., Harari, D. & Ullman, S. (2015). Do you see what I mean? Visual resolution of linguistic ambiguities. Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.
- Yevgeni Berzak, Disambiguating language through vision (3:05)
- Andrei Barbu, From language to vision and back again (1:05:05)