Spatio-temporal convolutional networks explain neural representations of human actions