Grounding language acquisition by training semantic parsers using captioned videos

Publication Type: Conference Paper
Year of Publication: 2018
Authors: Ross, C, Barbu, A, Berzak, Y, Myanganbayar, B, Katz, B
Conference Name: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
Date Published: 10/2018
Conference Location: Brussels, Belgium
ISBN Number: 978-1-948087-84-1

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children where they observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser (turning sentences into logical forms using captioned videos) can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.

