Publication
Learning Language from Vision. Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS) (2019).
Grounding language acquisition by training semantic parsers using captioned videos. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) (2018). at <http://aclweb.org/anthology/D18-1285>
Ross-et-al_ACL2018_Grounding language acquisition by training semantic parsing using caption videos.pdf (3.5 MB)
Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2020). doi:10.1109/IROS45743.2020.9341325
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. (2015).
Deep sequential models for sampling-based planning. The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018) (2018). doi:10.1109/IROS.2018.8593947
kuo2018planning.pdf (637.67 KB)
Deep compositional robotic planners that follow natural language commands. International Conference on Robotics and Automation (ICRA) (2020).
The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022).
Trajectory Prediction with Linguistic Representations. (2022).
CBMM-Memo-132.pdf (1.15 MB)
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset. (2021).
CBMM-Memo-128.pdf (2.91 MB)
Social Interactions as Recursive MDPs. (2021).
CBMM-Memo-130.pdf (1.52 MB)
Seeing What You’re Told: Sentence-Guided Activity Recognition In Video. (2014).
CBMM-Memo-006.pdf (1.2 MB)
Seeing is Worse than Believing: Reading People’s Minds Better than Computer-Vision Methods Recognize Actions. (2014).
CBMM Memo 012.pdf (678.95 KB)
PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception. (2021).
CBMM-Memo-123.pdf (3.08 MB)
Partially Occluded Hands: A challenging new dataset for single-image hand pose estimation. (2018).
CBMM-Memo-097.pdf (8.53 MB)
Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex. (2021).
CBMM-Memo-131.pdf (9.37 MB)
Measuring Social Biases in Grounded Vision and Language Embeddings. (2021).
CBMM-Memo-126.pdf (1.32 MB)
Learning a natural-language to LTL executable semantic parser for grounded robotics. (2020). doi:10.48550/arXiv.2008.03277
CBMM-Memo-122.pdf (1.03 MB)
Incorporating Rich Social Interactions Into MDPs. (2022).
CBMM-Memo-133.pdf (1.68 MB)
Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas. (2020).
CBMM-Memo-125.pdf (2.12 MB)
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. (2016).
memo-51.pdf (2.74 MB)
Deep compositional robotic planners that follow natural language commands. (2020).
CBMM-Memo-124.pdf (1.03 MB)
Compositional RL Agents That Follow Language Commands in Temporal Logic. (2021).
CBMM-Memo-127.pdf (2.12 MB)
Compositional Networks Enable Systematic Generalization for Grounded Language Understanding. (2021).
CBMM-Memo-129.pdf (1.2 MB)
The Compositional Nature of Event Representations in the Human Brain. (2014).
CBMM Memo 011.pdf (3.95 MB)
Anchoring and Agreement in Syntactic Annotations. (2016).
CBMM-Memo-055.pdf (768.54 KB)