%0 Conference Paper
%B Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
%D 2022
%T The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank
%A Adam Yaari
%A Jan DeWitt
%A Henry Hu
%A Bennett Stankovits
%A Sue Felshin
%A Yevgeni Berzak
%A Helena Aparicio
%A Boris Katz
%A Ignacio Cases
%A Andrei Barbu
%X

Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English-language treebank derived from dialog in Hollywood movies, which includes transcriptions of the audio-visual streams with word-level alignment, as well as part-of-speech tags and dependency parses in the Universal Dependencies formalism. AMMT consists of 31,264 sentences and 218,090 words, making it the third-largest UD English treebank and the only multimodal treebank in UD. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation process.

%B Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
%G eng
%0 Conference Paper
%B Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS)
%D 2019
%T Learning Language from Vision.
%A Candace Ross
%A Yevgeni Berzak
%A Boris Katz
%A Andrei Barbu
%B Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS)
%C Vancouver Convention Center, Vancouver, Canada
%8 12/2019
%G eng
%0 Conference Proceedings
%B 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%D 2018
%T Assessing Language Proficiency from Eye Movements in Reading
%A Yevgeni Berzak
%A Boris Katz
%A Roger Levy
%K Computation
%K language
%B 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%C New Orleans
%8 06/2018
%G eng
%U http://naacl2018.org/
%0 Conference Paper
%B Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
%D 2018
%T Grounding language acquisition by training semantic parsers using captioned videos
%A Candace Ross
%A Andrei Barbu
%A Yevgeni Berzak
%A Battushig Myanganbayar
%A Boris Katz
%X

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser (turning sentences into logical forms using captioned videos) can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.

%B Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
%C Brussels, Belgium
%8 10/2018
%@ 978-1-948087-84-1
%G eng
%U http://aclweb.org/anthology/D18-1285
%0 Conference Paper
%B Annual Meeting of the Association for Computational Linguistics (ACL 2017)
%D 2017
%T Predicting Native Language from Gaze
%A Yevgeni Berzak
%A Chie Nakamura
%A Suzanne Flynn
%A Boris Katz
%B Annual Meeting of the Association for Computational Linguistics (ACL 2017)
%G eng
%0 Generic
%D 2016
%T Anchoring and Agreement in Syntactic Annotations
%A Yevgeni Berzak
%A Yan Huang
%A Andrei Barbu
%A Anna Korhonen
%A Boris Katz
%X

Published in the Proceedings of EMNLP 2016

We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well-known cognitive bias in human decision making, where judgments are drawn towards preexisting values. We study the influence of anchoring on a standard approach to creation of syntactic resources where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations. Using sentences from the Penn Treebank WSJ, we also report systematically obtained inter-annotator agreement estimates for English dependency parsing. Our agreement results control for parser bias, and are consequential in that they are on par with state of the art parsing performance for English newswire. We discuss the impact of our findings on strategies for future annotation efforts and parser evaluations.

%8 09/2016
%1 https://arxiv.org/abs/1605.04481
%2 http://hdl.handle.net/1721.1/104453

%0 Generic
%D 2016
%T Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL
%A Yevgeni Berzak
%A Roi Reichart
%A Boris Katz
%X

This work examines the impact of cross-linguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language-specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology-driven model enables us to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy for adjusting our method to low-resource languages that lack typological documentation, using a bootstrapping approach which approximates native language typology from ESL texts. Finally, we show that our framework is instrumental for linguistic inquiry seeking to identify first language factors that contribute to a wide range of difficulties in second language acquisition.

%8 07/2015
%1 arXiv:1603.07609v1 [cs.CL]
%2 http://hdl.handle.net/1721.1/103398

%0 Generic
%D 2016
%T Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
%A Yevgeni Berzak
%A Andrei Barbu
%A Daniel Harari
%A Boris Katz
%A Shimon Ullman
%X

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing it to disambiguate sentences in a unified fashion across the different ambiguity types.

%8 09/2016
%1 arXiv:1603.08079v1 [cs.CV]
%2 http://hdl.handle.net/1721.1/103400

%0 Generic
%D 2016
%T Language and Vision Ambiguities (LAVA) Corpus
%A Yevgeni Berzak
%A Andrei Barbu
%A Daniel Harari
%A Boris Katz
%A Shimon Ullman
%X

Ambiguity is one of the defining characteristics of human languages, and language understanding crucially relies on the ability to obtain unambiguous representations of linguistic content. While some ambiguities can be resolved using intra-linguistic contextual cues, the disambiguation of many linguistic constructions requires integration of world knowledge and perceptual information obtained from other modalities. In this work, we focus on the problem of grounding language in the visual modality, and introduce a novel task for visual and linguistic understanding which requires resolving linguistic ambiguities by utilizing the visual context of the utterance.

To address this challenge, we release the Language and Vision Ambiguities (LAVA) corpus. LAVA contains ambiguous sentences coupled with visual scenes that depict the different interpretations of each sentence. The sentences in the corpus are annotated with syntactic and semantic parses, and cover a wide range of linguistic ambiguities, including PP and VP attachment, conjunctions, logical forms, anaphora and ellipsis. In addition to the sentence disambiguation challenge, the corpus will support a variety of related tasks which use natural language as a medium for expressing visual understanding.

Reference:
Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, and Shimon Ullman (2015). Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.


%8 01/2016
%U http://web.mit.edu/lavacorpus/
%0 Generic
%D 2016
%T Treebank of Learner English (TLE)
%A Yevgeni Berzak
%A Jessica Kenney
%A Carolyn Spadine
%A Jing Xian Wang
%A Lucia Lam
%A Keiko Sophie Mori
%A Sebastian Garza
%A Boris Katz
%X

The majority of the English text available worldwide is generated by non-native speakers. Learner language introduces a variety of challenges and is of paramount importance for the scientific study of language acquisition as well as for Natural Language Processing. Despite the ubiquity of non-native English, there has been no publicly available syntactic treebank for English as a Second Language (ESL). To address this shortcoming, we released the Treebank of Learner English (TLE), a first-of-its-kind resource for non-native English containing 5,124 sentences manually annotated with Part of Speech (POS) tags and syntactic dependency trees. Full syntactic analyses are provided for both the original and error-corrected versions of each sentence. We also introduced annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. We envision the treebank supporting a wide range of linguistic and computational research on language learning as well as automatic processing of ungrammatical language.

%8 08/2016
%U http://esltreebank.org/
%0 Generic
%D 2016
%T Universal Dependencies for Learner English
%A Yevgeni Berzak
%A Jessica Kenney
%A Carolyn Spadine
%A Jing Xian Wang
%A Lucia Lam
%A Keiko Sophie Mori
%A Sebastian Garza
%A Boris Katz
%X

We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error-corrected versions of each sentence. Furthermore, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank supporting a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language.

%8 06/2016
%1 arXiv:1605.04278v2 [cs.CL]
%2 http://hdl.handle.net/1721.1/103401

%0 Conference Paper
%B Nineteenth Conference on Computational Natural Language Learning (CoNLL), Beijing, China
%D 2015
%T Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL
%A Yevgeni Berzak
%A Roi Reichart
%A Boris Katz
%X

This work examines the impact of cross-linguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language-specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology-driven model enables us to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy for adjusting our method to low-resource languages that lack typological documentation, using a bootstrapping approach which approximates native language typology from ESL texts. Finally, we show that our framework is instrumental for linguistic inquiry seeking to identify first language factors that contribute to a wide range of difficulties in second language acquisition.

%B Nineteenth Conference on Computational Natural Language Learning (CoNLL), Beijing, China
%8 07/31/2015
%0 Conference Paper
%B Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal
%D 2015
%T Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
%A Yevgeni Berzak
%A Andrei Barbu
%A Daniel Harari
%A Boris Katz
%A Shimon Ullman
%B Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal
%8 09/2015
%G eng
%0 Generic
%D 2014
%T Reconstructing Native Language Typology from Foreign Language Usage.
%A Yevgeni Berzak
%A Roi Reichart
%A Boris Katz
%K language
%K linguistics
%K Visual Intelligence
%X

Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as Second Language (ESL) texts and equivalent similarities obtained directly from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources.

%8 04/2014
%1 arXiv:1404.6312v1
%2 http://hdl.handle.net/1721.1/100171