
Notes on MatConvNet(II):vl_simplenn


Foreword

This blog is the second in the series of Notes on MatConvNet.

Notes on MatConvNet(I) – Overview

Here I will mainly introduce the core of MatConvNet: vl_simplenn.

This function plays a quite important role in forward-propagation and backward-propagation.

PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing in English. Yet it would be a sad thing if I cared too much about such vain trifles.

Something you should know beforehand

I only introduce back-propagation (BP) here. How do the derivatives propagate backward? I sincerely recommend reading the BP Algorithm post first; it is a good starting point for BP. Once you have finished that, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes (I).

Computation Structure



I make some default rules here.

y always represents the output of a certain layer. That is to say, when we are at layer i, y is the output of layer i.

z always represents the output of the whole net, or rather, it represents the output of the final layer n.

x represents the input of a certain layer.

To make things easier, MatConvNet treats each simple function as a "layer". This means that when the input goes through a computation structure (no matter whether it is a conv structure or just a ReLU structure), it performs a computation like the following:

$$\frac{dz}{dx} = f'(x)\cdot\frac{dz}{dy} \tag{1}$$

$$\frac{dz}{dw} = f'_w(x)\cdot\frac{dz}{dy} \tag{2}$$ (condition)

Note:

- condition means that formula (2) is only computed when the layer involves weights.

- f′(x) means the derivative of the output with respect to the input x.

- f′w(x) means the derivative of the output with respect to the weights w.

- This is a little different from the BP Algorithm post, which only treats conv or fully connected layers as computation structures. However, once you also include activations (such as sigmoid, ReLU, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, you will find it quite easy, because every computation part is only responsible for its own input and output. Each time it receives a dzdy together with its input x, it computes dzdx using formula (1). If it represents a conv layer, a fully connected layer, or another layer that involves weights, you also have to apply formula (2), since you need the gradients with respect to the weights in order to update them. In fact, that is exactly our goal; the small sketch after this list makes both formulas concrete.
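To make formulas (1) and (2) concrete, here is a minimal sketch of a fully connected layer y = W*x + b treated as a single computation structure. This is illustrative MATLAB of my own, not MatConvNet code; W, b, x and dzdy are hypothetical variables (x and y are column vectors).

% Forward pass: the layer only maps its own input x to its output y.
y = W * x + b ;

% Backward pass: given dzdy from the layer above, apply (1) and (2).
dzdx = W' * dzdy ;     % formula (1): derivative of z w.r.t. the input x
dzdW = dzdy * x' ;     % formula (2): derivative of z w.r.t. the weights W
dzdb = dzdy ;          % formula (2) for the bias b

Whatever the layer type, the pattern is the same: take dzdy and your own input, and hand dzdx down to the layer below.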

Taking a look at vl_simplenn

The result format

- res(i+1).x : the output of layer i. Hence res(1).x is the network input.

- res(i+1).dzdx : the derivative of the network output relative to the output of layer i. In particular res(1).dzdx is the derivative of the network output with respect to the network input.

- res(i+1).dzdw : a cell array containing the derivatives of the network output relative to the parameters of layer i. It can be a cell array for multiple parameters.

Note: when we are at layer i, y means res(i+1).x, because y is the output of layer i and res(i+1).x is at the same time the input of layer i+1; that is why it is stored with this index. Likewise, for layer i, res(i+1).dzdx has the same meaning as dzdy.

Main calling forms you may use

res = vl_simplenn(net, x) ;                        (1)

res = vl_simplenn(net, x, dzdy) ;                  (2)

res = vl_simplenn(net, x, dzdy, res, opt, val)     (3)

(1) is just the forward computation.

(2) is used for back-propagation. It mainly computes the derivatives of the input and of the weights with respect to the net's output z.

(3) is used in cnn_train. It adds some opts, but I do not introduce them here.
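For illustration, here is a minimal sketch of forms (1) and (2). It assumes net is an already loaded SimpleNN network and im is an input image preprocessed to the size the network expects; both names are placeholders of my own.

% Form (1): forward only.
res = vl_simplenn(net, im) ;
z = res(end).x ;                    % the network output

% Form (2): forward plus backward.
dzdy = single(1) ;                  % seed derivative (assuming the net ends in a scalar loss)
res = vl_simplenn(net, im, dzdy) ;
dzdinput = res(1).dzdx ;            % derivative of z w.r.t. the network input
% the derivatives w.r.t. the weights end up in the dzdw fields,
% as the backward pass code below will show

Now let us look inside vl_simplenn itself.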

...
% the code before 'Forward pass' is easy and deserves no explanation.
% -------------------------------------------------------------------------
%                                                              Forward pass
% -------------------------------------------------------------------------

for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
                             'pad', l.pad, ...
                             'stride', l.stride, ...
                             l.opts{:}, ...
                             cudnn{:}) ;

    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
                              'crop', l.crop, ...
                              'upsample', l.upsample, ...
                              'numGroups', l.numGroups, ...
                              l.opts{:}, ...
                              cudnn{:}) ;

    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
                             'pad', l.pad, 'stride', l.stride, ...
                             'method', l.method, ...
                             l.opts{:}, ...
                             cudnn{:}) ;

    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;

    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;

    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;

    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;

    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x, [], leak{:}) ;

    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;

    ...

    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end


The code above shows the main idea of the forward propagation. The rest of the loop body, below, handles memory conservation and timing:

  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % 'forget' decides whether the intermediate result res(i).x is discarded;
    % when net.layers{i-1}.precious is true, the intermediate result is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget  % if not kept, set this layer's input to empty
    res(i).x = [] ;
  end

  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end
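A practical side note on the conserveMemory logic above: if you need an intermediate activation later (for example for feature extraction), mark that layer as precious so its output is not cleared. A one-line sketch, where the layer index k is hypothetical:

net.layers{k}.precious = true ;   % res(k+1).x will then be kept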


Backward pass

It seems that no explanation is the best explanation here, because the code is quite straightforward: every case takes its own input and res(i+1).dzdx and applies formulas (1) and (2).

% -------------------------------------------------------------------------
%                                                             Backward pass
% -------------------------------------------------------------------------

if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type

      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
            vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                      'pad', l.pad, ...
                      'stride', l.stride, ...
                      l.opts{:}, ...
                      cudnn{:}) ;

      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
            vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                       'crop', l.crop, ...
                       'upsample', l.upsample, ...
                       'numGroups', l.numGroups, ...
                       l.opts{:}, ...
                       cudnn{:}) ;

      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
                                'pad', l.pad, 'stride', l.stride, ...
                                'method', l.method, ...
                                l.opts{:}, ...
                                cudnn{:}) ;

      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;

      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;

      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;

      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;

      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end

      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;

      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;

      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;

      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
                                     'mask', res(i+1).aux) ;
        end

      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
            vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;

      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
                                 l.p, res(i+1).dzdx, ...
                                 'noRoot', l.noRoot, ...
                                 'epsilon', l.epsilon, ...
                                 'aggregate', l.aggregate) ;

      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;

    end % layers

    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end
    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
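The weight derivatives collected in res(i).dzdw are exactly what a training loop needs. As a preview of cnn_train, here is a minimal sketch of a plain SGD update; the learning rate lr is a hypothetical variable of mine, and real cnn_train also handles momentum, weight decay, per-layer learning rates, and the special bnorm moments mentioned in the comment above.

lr = 0.01 ;                          % hypothetical learning rate
for i = 1:numel(net.layers)
  if isfield(net.layers{i}, 'weights')
    for j = 1:numel(res(i).dzdw)
      net.layers{i}.weights{j} = net.layers{i}.weights{j} - lr * res(i).dzdw{j} ;
    end
  end
end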


Looking into the functions directly under the matlab folder

Here I mean the vl_nnxx functions. I will just take vl_nnsigmoid and vl_nnrelu as examples.

function out = vl_nnsigmoid(x,dzdy)
y = 1 ./ (1 + exp(-x));

if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end


When the backward pass reaches this layer, formula (1) is executed as out = dzdy .* (y .* (1 - y)) ; the factor (y .* (1 - y)) is just f'(x), the derivative of the sigmoid with respect to its input.
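You can convince yourself that this really implements formula (1) with a quick numerical check. This is a minimal sketch of my own; it assumes MatConvNet is already on the MATLAB path (run vl_setupnn first).

x    = randn(3, 3) ;
dzdy = randn(3, 3) ;                            % pretend derivative from the layer above
dzdx = vl_nnsigmoid(x, dzdy) ;                  % analytic backward pass

% numerical derivative of z = sum(dzdy .* sigmoid(x)) with respect to x(1)
delta = 1e-6 ;
xp = x ; xp(1) = xp(1) + delta ;
zfun = @(xx) sum(sum(dzdy .* vl_nnsigmoid(xx))) ;
numeric = (zfun(xp) - zfun(x)) / delta ;
disp([dzdx(1) numeric])                         % the two numbers should agree closely

vl_nnrelu below follows exactly the same forward/backward pattern: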

function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;

if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;   % formula (1) is applied here
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end


Why don't I show the code of the structures that involve weight computation? Because they are written in CUDA C for speed. You can find them in matlab/src. That is why we have to compile MatConvNet at the very beginning: functions like vl_nnconv need to be compiled into mex files so that they can be called from MATLAB.
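For reference, the compilation itself is done with vl_compilenn. A minimal sketch, assuming your current MATLAB folder is the MatConvNet root (the GPU line is optional and needs a working CUDA toolkit):

run('matlab/vl_setupnn.m') ;            % add MatConvNet to the MATLAB path
vl_compilenn ;                          % CPU-only build of the mex files
% vl_compilenn('enableGpu', true) ;     % optional GPU build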

The third post of the series will mainly introduce cnn_train, which is quite interesting.