From Motor Control to Scene Perception: Using Machine Learning to Study Human Behavior and Cognition

Date Posted:  December 7, 2017
Date Recorded:  December 1, 2017
Speaker(s):  Akram Bayat

Akram Bayat, UMass Boston


This presentation covers two lines of my work at UMass Boston, each applying machine learning algorithms to an important real-world problem. In the first part, we model human eye movements in order to identify individuals while they read. As a key part of our pattern recognition process, we extract multiple low-level features from the scan path, including fixation features, saccadic features, pupillary response features, and spatial reading features.
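As an illustration of what low-level scan-path features can look like, the sketch below labels each interval between gaze samples as saccade or fixation with a simple velocity threshold and then summarizes fixation durations and saccade amplitudes. The function name `scanpath_features`, the 50 deg/s threshold, and the choice of features are assumptions made for this example; the abstract does not specify how the features in the talk were computed.

```python
from math import hypot


def scanpath_features(x, y, t, velocity_threshold=50.0):
    """Summarize a gaze recording with basic fixation and saccade features.

    x, y: gaze positions (assumed here to be in degrees of visual angle).
    t: strictly increasing timestamps in seconds.
    velocity_threshold: hypothetical cutoff in deg/s; faster intervals
        are treated as saccadic.
    """
    # Label each inter-sample interval: True means saccade, False means fixation.
    labels = [
        hypot(x[i + 1] - x[i], y[i + 1] - y[i]) / (t[i + 1] - t[i])
        > velocity_threshold
        for i in range(len(t) - 1)
    ]

    fixation_durations = []
    saccade_amplitudes = []
    start = 0
    # Walk over runs of identical labels; a run of intervals [start, i)
    # spans the samples start..i.
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start]:
                saccade_amplitudes.append(
                    hypot(x[i] - x[start], y[i] - y[start])
                )
            else:
                fixation_durations.append(t[i] - t[start])
            start = i

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "n_fixations": len(fixation_durations),
        "mean_fixation_duration": mean(fixation_durations),
        "n_saccades": len(saccade_amplitudes),
        "mean_saccade_amplitude": mean(saccade_amplitudes),
    }
```

A feature vector of this kind, computed per reading trial, could then be passed to any standard classifier; pupillary and spatial reading features would need additional inputs that this sketch does not model.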

Capturing eye movements during reading is attractive because reading is a very common task. However, the text content influences the reading process, which makes it very challenging to obtain invariant features from eye-movement data. We address this issue with a novel user identification algorithm that extracts high-level features combining eye movements with the syntactic and semantic relationships between words in a text. The promising results of our identification method make eye-movement-based identification an excellent approach for various applications such as personalized user interfaces.

The second part of my work focuses on scene perception and object recognition using deep convolutional neural networks. We investigate to what extent computer-vision systems for scene classification and object recognition resemble human mechanisms for scene perception. We evaluate three methodologies in both humans and deep convolutional networks: the use of global properties for scene classification, scene grammar, and top-down control of visual attention for object detection. We also evaluate the performance of deep object recognition networks (e.g., Faster R-CNN) under various conditions of image filtering in the frequency domain and compare it with the human visual system in terms of internal representation. We then show that fine-tuning Faster R-CNN on filtered data improves network performance over a range of spatial frequencies.
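To make "image filtering in the frequency domain" concrete, here is a minimal sketch that keeps only a chosen band of spatial frequencies in a grayscale image using NumPy's FFT routines. The function name `radial_frequency_filter`, the hard-edged radial mask, and the cutoffs in cycles per pixel are assumptions for this example; the abstract does not state which filters or cutoffs were used in the experiments.

```python
import numpy as np


def radial_frequency_filter(image, low=0.0, high=0.1):
    """Keep spatial frequencies whose radius lies in [low, high].

    image: 2-D grayscale array.
    low, high: hypothetical band limits in cycles per pixel
        (0 is the mean level, 0.5 is the Nyquist limit along one axis).
    """
    image = np.asarray(image, dtype=float)

    # Frequency coordinates of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.hypot(fx, fy)

    # Hard-edged band mask: 1 inside the band, 0 outside.
    mask = (radius >= low) & (radius <= high)

    # Filter in the frequency domain and return to the image domain.
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * mask))
```

With `low=0.0` this acts as a low-pass filter and with `high` set above the largest bin radius it acts as a high-pass filter. Applying such a filter to every image in a dataset would give the kind of filtered data on which a detector could be evaluated or fine-tuned.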