One simple approach is to process the video one frame at a time: outer MATLAB code loads each frame into the model, calls cns('run') to process it, and then retrieves the result. The CNS model itself does not need to store multiple frames or even "know" about time.
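A minimal sketch of this frame-at-a-time pattern follows. The layer numbers z_in and z_out and the 'val' field borrow the naming used in the full example script later in this section; acquire_frame is a placeholder for your own frame source, and the exact non-temporal get/set call forms may differ for your model.

    % Frame-at-a-time processing with a model that holds only a single frame.
    % z_in, z_out and the 'val' field follow the example later in this section;
    % acquire_frame is a placeholder for your own frame source.
    m = ...;                               % build a single-frame model
    cns('init', m);
    while true
        frame = acquire_frame();           % placeholder: get the next video frame
        if isempty(frame), break; end
        cns('set', z_in, 'val', frame);    % load the frame
        cns('run');                        % process it
        output = cns('get', z_out, 'val'); % retrieve the result
        ...                                % do something with the output
    end
    cns('done');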
In principle, one could instead treat time just like any other dimension over which convolution is performed. The problem with this is that the time dimension is often very large: an entire video may not fit into GPU memory, so video data typically needs to be handled in a streaming manner.
A seemingly simple way to do streaming would be to define a CNS model that holds a fixed, manageable number of frames in GPU memory, then write outer-loop MATLAB code that breaks the video into blocks of frames and pushes the blocks through CNS one block at a time. However, this turns out to be nontrivial once you get into the details.
CNS's temporal shift feature takes care of all these problems for you.
Time is measured in "units": by definition, input frames are one unit apart. Before processing starts, the current time is zero, and all frames of all layers are marked "invalid" -- no input frames have been loaded, and nothing has been computed yet.
As time advances, the number of frames in each layer remains the same, but frames representing the oldest times are dropped, and new frames are loaded (or computed) for newer times. Consider layer 6, which holds eight frames, indexed 1-8 (see the table below). At time now=28, frame index #8 contains a newly computed frame centered at t=24.
As time advances further, the frame centered at t=24 gets shifted to the left within the layer, into progressively smaller indices, finally winding up in frame index #1 at time now=42. Two time steps later, the frame has been dropped entirely.
Layer 6: temporal center of each frame index

Current time | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
now=28 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
now=30 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 |
now=32 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 |
now=34 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 |
now=36 | 18 | 20 | 22 | 24 | 26 | 28 | 30 | 32 |
now=38 | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 |
now=40 | 22 | 24 | 26 | 28 | 30 | 32 | 34 | 36 |
now=42 | 24 | 26 | 28 | 30 | 32 | 34 | 36 | 38 |
now=44 | 26 | 28 | 30 | 32 | 34 | 36 | 38 | 40 |
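To make the pattern concrete, the following sketch reproduces the temporal centers in the table above. The frame spacing (2 units) and lag (newest frame centered 4 units behind now) are read off the table and are properties of this particular example layer, not fixed constants of CNS.

    % Reproduce the temporal centers shown above for layer 6.
    nFrames = 8;    % frames held in the layer
    spacing = 2;    % time units between successive frames of this layer
    lag     = 4;    % the newest frame is centered at now - lag
    for now = 28 : 2 : 44
        centers = (now - lag) - spacing * (nFrames - (1 : nFrames));
        fprintf('now=%d:%s\n', now, sprintf(' %d', centers));
    end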
Now consider the model as a whole. Note that for different layers, the most recent frame is often centered at a different time. Higher layers tend to lag behind lower ones. This occurs for two reasons.
A given input frame never needs to be loaded twice.
Loading multiple frames. In the table above, only a single frame is loaded at a time. In practice, you specify the maximum number of frames you wish to load at once, and CNS automatically determines the required size of every layer such that the above-mentioned constraints are all satisfied. Loading several input frames at once allows several output frames to be computed at once, as sketched below.
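For instance, a block of frames might be acquired with MATLAB's VideoReader. The sketch below is only an illustration: the file name and maxLoad value are placeholders, and it assumes a single-feature grayscale model using the {'f' 't' 'y' 'x'} dimension order shown in the next example, so that time is array dimension 2.

    % Hypothetical acquisition of a block of up to maxLoad frames for the
    % streaming loop shown later in this section.  Assumes dimension order
    % {'f' 't' 'y' 'x'} (time = array dimension 2) and one grayscale feature.
    v = VideoReader('movie.avi');    % placeholder file name
    maxLoad = 4;                     % placeholder: max frames to load at once
    block = {};
    while hasFrame(v) && numel(block) < maxLoad
        img = single(mean(double(readFrame(v)), 3) / 255);   % [ny x nx]
        block{end + 1} = shiftdim(img, -2);                   % [1 x 1 x ny x nx]
    end
    frames = cat(2, block{:});       % [1 x n x ny x nx], n = number of new frames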
The temporal dimension is identified in the model definition via the dmap property: for the temporal dimension, set dmap = 2. For example:

    p.dnames = {'f' 't' 'y' 'x'};
    p.dims   = {1 2 1 2};
    p.dparts = {2 2 1 1};
    p.dmap   = [0 2 1 1];

Note: the temporal dimension must have dims = 2 and have the highest value of dparts for internal dimension 2.
The temporal dimension of each layer is sized and mapped using the cns_mapdim function, which has two modes designed specifically for temporal mapping:
Mode | Usage |
'temp1' | Used for input layers, i.e., layers you are going to load data into, not ones that get computed. |
'temp2' | Used for layers that compute their results from other layers, i.e., probably most of the layers in any model. Its additional arguments include pzs, rfSizes, and rfStep: together, pzs(1), rfSizes(1), and rfStep determine the temporal mapping of layer z, and cns_mapdim checks any additional elements pzs(2:end) and rfSizes(2:end) to ensure that those layers will always retain enough frames in memory to compute frames in layer z. |
In the example below, each layer has a 'val' field that is either loaded as input or computed.
New cns commands and options specific to temporal shifting are described after the example script.
    m = ...;       % build model, including mapping the time dimension
    z_in  = ...;   % input layer number
    z_out = ...;   % output layer number

    cns('init', m);

    while true

        frames = ...;   % Acquire new frame(s).
        if isempty(frames), break; end
        n = size(frames, 2);

        % Shift the temporal dimension forward by n units.  This will drop some
        % frames and create free space for new frames in some (possibly all) layers.
        cns('shift', n);

        % Load the new frames into the free space we just created in the input layer.
        cns('set', z_in, 'val', [], 0, [], [], frames);

        % Compute new frames for non-input layers.
        cns('run');

        % Retrieve new frame(s) we just computed for the output layer (if any).
        output = cns('get', z_out, 'val', [], 0, [], []);

        if ~isempty(output)
            % Do something with the output frame(s).
            ...
        end

    end

    cns('done');
The new cns commands (or new options for existing commands) used for temporal shifting are as follows.
Command | Description |
'shift' | This is a new command which shifts all layers forward along their time dimension, dropping old frames and freeing space for new ones, as illustrated above. Syntax: cns('shift', n), where n is the number of time units (i.e., input frames) to shift forward, as in the example script above. Performance note: the shift command does not actually move the contents of GPU memory; an efficient circular buffer technique, transparent to the CNS programmer, is employed. |
'run', 'step' | The syntax for the cns('run') and cns('step') commands is unchanged, but when there is a temporal dimension, only the new frames for each layer are computed. Note that this means if you call cns('run') again without an intervening cns('shift'), no new frames will be computed. |
'get', 'set' | The cns('get') and cns('set') commands have only one new option relating to temporal shifting: the special index 0 for the temporal dimension means "all new frames". For example, if a layer has 8 frames, and the most recent cns('shift') caused it to shift by 3 frames, then asking for temporal index 0 for this layer is the same as asking for temporal indices 6:8 (see the sketch after this table). |
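As a small illustration of the special index 0, the following two calls should return the same data; the layer number z and the shift amount are hypothetical, and the argument pattern mirrors the cns('get') call in the example script above.

    % The special temporal index 0 versus explicit indices, for a hypothetical
    % layer z with 8 frames whose most recent cns('shift') was by 3 frames.
    a = cns('get', z, 'val', [], 0,   [], []);   % all new frames
    b = cns('get', z, 'val', [], 6:8, [], []);   % the same frames, explicitly
    isequal(a, b)                                % expected: true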