%0 Conference Paper %B Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing %D 2022 %T The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank %A Adam Yaari %A Jan DeWitt %A Henry Hu %A Bennett Stankovits %A Sue Felshin %A Yevgeni Berzak %A Helena Aparicio %A Boris Katz %A Ignacio Cases %A Andrei Barbu %X

Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English language treebank derived from dialog in Hollywood movies which includes transcriptions of the audio-visual streams with word-level alignment, as well as part-of-speech tags and dependency parses in the Universal Dependencies formalism. AMMT consists of 31,264 sentences and 218,090 words, making it the 3rd-largest UD English treebank and the only multimodal treebank in UD. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation processes.

%B Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing %G eng %0 Book Section %B The Wiley Handbook of Human Computer Interaction %D 2018 %T A Natural Language Interface for Mobile Devices %A Boris Katz %A Gary Borchardt %A Sue Felshin %A Federico Mora %X

This chapter discusses some of the primary issues related to the design and construction of natural language interfaces, and in particular, interfaces to mobile devices. It describes two systems in this space: the START information access system and the StartMobile natural language interface to mobile devices. The chapter also discusses recently deployed commercial systems and future directions. The use of natural language annotations, and in particular, parameterized natural language annotations, enables START to respond to user requests in a wide variety of ways. StartMobile uses the START system as a first stage in the processing of user requests. Current commercial systems such as Apple's Siri, IBM's Watson, Google's “Google Now”, Microsoft's Cortana, and Amazon's Alexa employ technology of the sort contained in START and StartMobile in combination with statistical ...

%B The Wiley Handbook of Human Computer Interaction %7 First %I John Wiley & Sons %V 2 %P 539-559 %8 02/2018 %G eng %R 10.1002/9781118976005.ch23 %0 Conference Paper %B Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017) %D 2017 %T Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context %A Rohan Paul %A Andrei Barbu %A Sue Felshin %A Boris Katz %A Nicholas Roy %X

A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects, relations, and actions that the utterance refers to, the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace into its inference about the meaning and grounding of natural-language utterances.

%B Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017) %C Melbourne, Australia %8 08/2017 %G eng %0 Conference Paper %B The 2016 Conference on Empirical Methods on Natural Language Processing (EMNLP 2016) %D 2016 %T Learning to Answer Questions from Wikipedia Infoboxes %A Alvaro Morales %A Varot Premtoon %A Cordelia Avery %A Sue Felshin %A Boris Katz %X

A natural language interface to answers on the Web can help us access information more efficiently. We start with an interesting source of information—infoboxes in Wikipedia that summarize factoid knowledge—and develop a comprehensive approach to answering questions with high precision. We first build a system to access data in infoboxes in a structured manner. We use our system to construct a crowdsourced dataset of over 15,000 high-quality, diverse questions. With these questions, we train a convolutional neural network model that outperforms models that achieve top results in similar answer selection tasks.

%B The 2016 Conference on Empirical Methods on Natural Language Processing (EMNLP 2016) %G eng