C. Ross, Berzak, Y., Katz, B., and Barbu, A., “Learning Language from Vision,” in Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver Convention Center, Vancouver, Canada, 2019.