Publication
Grounding language acquisition by training semantic parsers using captioned videos. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) (2018). at <http://aclweb.org/anthology/D18-1285>
Ross-et-al_ACL2018_Grounding language acquisition by training semantic parsing using caption videos.pdf (3.5 MB)
Learning Language from Vision. Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS) (2019).
Modeling Visual Impairments with Artificial Neural Networks: a Review. International Conference on Computer Vision 2023 (2023). at <https://openaccess.thecvf.com/content/ICCV2023W/ACVR/html/Schiatti_Modeling_Visual_Impairments_with_Artificial_Neural_Networks_a_Review_ICCVW_2023_paper.html>
Multi-resolution modeling of a discrete stochastic process identifies causes of cancer. International Conference on Learning Representations (2021). at <https://openreview.net/forum?id=KtH8W3S_RE>
Partially Occluded Hands: A challenging new dataset for single-image hand pose estimation. The 14th Asian Conference on Computer Vision (ACCV 2018) (2018). at <http://accv2018.net/>
partially-occluded-hands-6.pdf (8.29 MB)
PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception. Shared Visual Representations in Human and Machine Intelligence (SVRHM) workshop at NeurIPS 2020 (2020). at <https://openreview.net/forum?id=_bokm801zhx>
phase_physically_grounded_abstract_social_events_for_machine_social_perception.pdf (2.49 MB)
Seeing What You’re Told: Sentence-Guided Activity Recognition In Video. CVPR (IEEE, 2014).
Publication (453.54 KB)
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset. Interspeech 2021 (2021). doi:10.21437/Interspeech.2021
Spontaneous sign emergence in humans and machines through an embodied communication game. JCoLE Workshop (2022).
Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017) (2017). at <c>
On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation. Shared Visual Representations in Human and Machine Intelligence (SVRHM) Workshop at NeurIPS (2021). at <https://openreview.net/forum?id=Rpazl253IHb>
Using Multimodal DNNs to Study Vision-Language Integration in the Brain. ICLR 2023 (2023). at <https://openreview.net/pdf?id=OQQ1p0pFP4>
Zero-shot linear combinations of grounded social interactions with Linear Social MDPs. Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI) (2023).
Deep Compositional Robotic Planners that Follow Natural Language Commands. Workshop on Visually Grounded Interaction and Language (ViGIL) at the Thirty-third Annual Conference on Neural Information Processing Systems (NeurIPS) (2019). at <https://vigilworkshop.github.io/>
BrainBERT: Self-supervised representation learning for Intracranial Electrodes. International Conference on Learning Representations (2023). at <https://openreview.net/forum?id=xmcYx_reUn6>
985_brainbert_self_supervised_repr.pdf (9.71 MB)
Measuring Social Biases in Grounded Vision and Language Embeddings. NAACL (Annual Conference of the North American Chapter of the Association for Computational Linguistics) (2021).
Seeing what you're told, sentence guided activity recognition in video. Appeared at CVPR (2014).
poster-1701.pdf (4.61 MB)
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. Neural Information Processing Systems (NeurIPS 2019) (2019).
9142-objectnet-a-large-scale-bias-controlled-dataset-for-pushing-the-limits-of-object-recognition-models.pdf (16.31 MB)
Language and Vision Ambiguities (LAVA) Corpus. (2016). at <http://web.mit.edu/lavacorpus/>
D15-1172.pdf (2.42 MB)
A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video. (2015). doi:10.1613/jair.4556
Compositional RL Agents That Follow Language Commands in Temporal Logic. Frontiers in Robotics and AI 8, (2021).
frobt-08-689550.pdf (1.57 MB)
Deep video-to-video transformations for accessibility with an application to photosensitivity. Pattern Recognition Letters (2019). doi:10.1016/j.patrec.2019.01.019