The computer vision system consists of a feature computation module and a classification module.
Feature computation module
(a) A background subtraction procedure is first applied to the input video to compute a foreground mask separating pixels belonging to the animal from those belonging to the cage.
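The caption does not specify the subtraction algorithm itself; one minimal sketch, assuming a static cage and a temporal-median background model (the function name and threshold are illustrative, not the paper's):

```python
import numpy as np

def foreground_mask(frames, threshold=25):
    """Illustrative background subtraction: model the static cage as the
    per-pixel temporal median over the clip, then flag pixels that differ
    from that background by more than `threshold` grey levels."""
    frames = np.asarray(frames, dtype=np.float32)   # shape (T, H, W)
    background = np.median(frames, axis=0)          # static-cage estimate
    return np.abs(frames - background) > threshold  # boolean mask (T, H, W)
```

Any background model that yields a per-frame binary animal mask would serve the same role here.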
(b) A subwindow centred on the animal is cropped from each video frame based on the location of the mouse. To speed up the computation, motion features are extracted from this subwindow only.
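The cropping step can be sketched as follows; the fixed window size is an assumption for illustration, and the window is clipped so it never extends past the frame borders:

```python
import numpy as np

def crop_subwindow(frame, centre, size=120):
    """Illustrative crop of a fixed-size subwindow centred on the animal.
    `centre` is the (row, col) location of the mouse; the centre is clipped
    inward so the window always lies fully inside the frame."""
    h, w = frame.shape[:2]
    half = size // 2
    r = int(np.clip(centre[0], half, h - half))
    c = int(np.clip(centre[1], half, w - half))
    return frame[r - half:r + half, c - half:c + half]
```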
(c) Space-time motion features are derived from combinations of the responses of afferent units tuned to different directions of motion, as found in the primate primary visual cortex (d, e). This approach builds on the work of 'A Biologically Inspired System for Action Recognition' and is closely related to a hierarchical model for object recognition.
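The direction-tuned units above can be approximated, in a much-simplified form, by projecting local image motion onto a set of preferred directions and half-rectifying, so each response map fires only for motion in its preferred direction. This sketch stands in for the biologically inspired filter bank; it is not the paper's actual model:

```python
import numpy as np

def direction_tuned_responses(prev, curr, n_directions=4):
    """Simplified stand-in for V1-like direction-tuned motion units:
    approximate local motion from spatial and temporal derivatives,
    project it onto n_directions preferred directions, and half-rectify."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    dt = curr - prev                       # temporal derivative
    dy, dx = np.gradient(curr)             # spatial derivatives
    responses = []
    for k in range(n_directions):
        theta = 2 * np.pi * k / n_directions
        # drive is positive when brightness motion matches direction theta
        drive = -dt * (np.cos(theta) * dx + np.sin(theta) * dy)
        responses.append(np.maximum(drive, 0.0))
    return np.stack(responses)             # (n_directions, H, W)
```

The real system uses spatio-temporal filters with far richer tuning; the point here is only the structure: one rectified response map per motion direction, later combined into frame-level features.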
(f) Position- and velocity-based features are derived from the instantaneous location of the animal in the cage. These features are computed from a bounding box tightly surrounding the animal in the foreground mask.
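A minimal sketch of such features, assuming the foreground mask from step (a): take the tight bounding box of the mask, then derive the centroid, the box dimensions, and the centroid velocity between consecutive frames. The exact feature list is an assumption for illustration:

```python
import numpy as np

def position_velocity_features(mask_t, mask_prev, fps=30.0):
    """Illustrative position/velocity features from two consecutive
    foreground masks: bounding-box centre, box height/width, and
    centre velocity in pixels per second."""
    def bbox(mask):
        rows, cols = np.nonzero(mask)
        r0, r1 = rows.min(), rows.max()
        c0, c1 = cols.min(), cols.max()
        centre = np.array([(r0 + r1) / 2.0, (c0 + c1) / 2.0])
        return centre, (r1 - r0 + 1, c1 - c0 + 1)
    centre, (h, w) = bbox(mask_t)
    prev_centre, _ = bbox(mask_prev)
    velocity = (centre - prev_centre) * fps    # pixels per second
    return np.concatenate([centre, [h, w], velocity])
```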
Classification module
(g) The output of the feature computation module consists of 310 features per frame, which are then passed to a statistical classifier, SVMHMM (hidden Markov model support vector machine), to reliably classify every frame of a video sequence into one of the behaviours of interest.
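This sketch illustrates the idea behind SVMHMM rather than the package's actual API: per-frame discriminative scores are combined with transition scores between behaviour labels, and the globally best label sequence is recovered with Viterbi dynamic programming, which smooths away isolated misclassified frames:

```python
import numpy as np

def viterbi_decode(frame_scores, transition):
    """Sketch of HMM-style decoding over per-frame classifier scores.
    frame_scores: (T, K) score of each behaviour class at each frame.
    transition:   (K, K) score for moving from class i to class j.
    Returns the label sequence maximising the total path score."""
    T, K = frame_scores.shape
    best = np.zeros((T, K))             # best score ending in each class
    back = np.zeros((T, K), dtype=int)  # backpointers for path recovery
    best[0] = frame_scores[0]
    for t in range(1, T):
        cand = best[t - 1][:, None] + transition   # (K, K) candidate paths
        back[t] = cand.argmax(axis=0)
        best[t] = cand.max(axis=0) + frame_scores[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = best[-1].argmax()
    for t in range(T - 2, -1, -1):                 # backtrack
        labels[t] = back[t + 1][labels[t + 1]]
    return labels
```

With a strong self-transition bonus, a single noisy frame whose frame-level argmax disagrees with its neighbours is relabelled to the surrounding behaviour, which is exactly the temporal smoothing that motivates using SVMHMM over a per-frame SVM.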
(h) An ethogram of the sequence of labels predicted by the system from a 24-h continuous recording session for one of the CAST/EiJ mice. The red panel shows the ethogram for the full 24 h, and the light blue panel shows a zoomed view of the first 30 min of recording. The animal is highly active, as it had just been placed in a new cage before the recording started. Its behaviour alternates between 'walking', 'rearing' and 'hanging' as it explores the new cage.