Notes on MatConvNet (II): vl_simplenn
2016-04-15 13:53
Before we begin
This blog is the second of the series Notes on MatConvNet; the first is Notes on MatConvNet (I) – Overview.
Here I will mainly introduce the core of MatConvNet: vl_simplenn.
This function plays a key role in both forward propagation and backward propagation.
PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing them in English. Yet if I cared too much about such vain trifles, it would be a sad thing.
Things you should know first
I only cover BP here. How do the derivatives propagate backward? I sincerely recommend reading BP Algorithm first; that blog is a good introduction to BP. Once you have finished it, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes (I).
Computation Structure
I use a few default conventions here.
y always represents the output of a certain layer. That is to say, when it comes to layer i, y is the output of layer i.
z always represents the output of the whole net, or rather, the output of the final layer n.
x represents the input of a certain layer.
To make things easier, MatConvNet treats each simple function as a “layer”. This means that when the input goes through a computation structure (no matter whether it is a conv structure or just a relu structure), it computes the following:
dzdx = f′(x) ∗ dzdy      (1)
dzdw = f′w(x) ∗ dzdy     (2, conditional)
Note:
- "conditional" means that formula (2) is only computed when the layer involves weight computation.
- f′(x) means the derivative of the layer output with respect to the input x.
- f′w(x) means the derivative of the layer output with respect to the weights w.
- This is a little different from BP Algorithm, which only treats conv or fully connected layers as computation structures. However, once you also include activations (sigmoid, relu, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, you will find it quite easy, because every computation part is responsible only for its own input and output. Each time it receives dzdy and its input x, it computes dzdx using (1). If it is a conv or fully connected layer, or any other layer that involves weights, it also has to apply formula (2), because you need the new gradients to update the weights; in fact, that is the whole goal. A small numeric sketch follows this list.
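To make formulas (1) and (2) concrete, here is a tiny hand-worked sketch in plain MATLAB (not MatConvNet code); the layer and the numbers are made up purely for illustration:

```matlab
% a toy "layer" that computes y = w * x, so f'(x) = w and f'_w(x) = x
x    = 2 ;            % input of this layer
w    = 3 ;            % weight of this layer
dzdy = 0.5 ;          % derivative of the net output z w.r.t. this layer's output y

dzdx = w * dzdy ;     % formula (1): propagated back to the previous layer -> 1.5
dzdw = x * dzdy ;     % formula (2): used later to update w                -> 1.0
```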
Taking a look at vl_simplenn
The result format
- res(i+1).x: the output of layer i. Hence res(1).x is the network input.
- res(i+1).dzdx: the derivative of the network output relative to the output of layer i. In particular, res(1).dzdx is the derivative of the network output with respect to the network input.
- res(i+1).dzdw: a cell array containing the derivatives of the network output relative to the parameters of layer i. It can be a cell array for multiple parameters.
Note: for layer i, y means res(i+1).x, because y is the output of layer i and res(i+1).x is at the same time the input of layer i+1; that is why it is stored this way. Likewise, for layer i, res(i+1).dzdx has the same meaning as dzdy.
Main calling forms you may use
```matlab
res = vl_simplenn(net, x) ;                        % (1)
res = vl_simplenn(net, x, dzdy) ;                  % (2)
res = vl_simplenn(net, x, dzdy, res, opt, val) ;   % (3)
```
(1) is just the forward computation.
(2) is used for back-propagation; it is mainly used to compute the derivatives of the input and of the weights with respect to the network output z.
(3) is used in cnn_train; it adds some options that I will not introduce here. A small usage sketch is given right after this.
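Here is a minimal usage sketch of the calling forms. The toy network below is made up for illustration, and I assume vl_setupnn has been run and that vl_simplenn_tidy is available to fill in default layer fields (pad, stride, opts, etc.):

```matlab
% a toy network in the SimpleNN format: one conv layer followed by a relu
net.layers = {} ;
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{randn(5,5,1,2,'single'), zeros(1,2,'single')}}, ...
                           'pad', 0, 'stride', 1) ;
net.layers{end+1} = struct('type', 'relu') ;
net = vl_simplenn_tidy(net) ;              % fill in the default fields

x = randn(28,28,1,1,'single') ;            % one 28x28 single-channel image

res  = vl_simplenn(net, x) ;               % form (1): forward only
dzdy = ones(size(res(end).x), 'single') ;  % seed derivative of the "output" z
res  = vl_simplenn(net, x, dzdy) ;         % form (2): forward + backward

size(res(1).x)          % the network input
size(res(3).x)          % output of layer 2 (the relu), i.e. the network output
size(res(2).dzdw{1})    % derivative of z w.r.t. the conv filters of layer 1
```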
```matlab
...  % the code before 'Forward pass' is easy and deserves no explanation
% -------------------------------------------------------------------------
% Forward pass
% -------------------------------------------------------------------------
for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
                             'pad', l.pad, ...
                             'stride', l.stride, ...
                             l.opts{:}, ...
                             cudnn{:}) ;
    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
                              'crop', l.crop, ...
                              'upsample', l.upsample, ...
                              'numGroups', l.numGroups, ...
                              l.opts{:}, ...
                              cudnn{:}) ;
    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
                             'pad', l.pad, 'stride', l.stride, ...
                             'method', l.method, ...
                             l.opts{:}, ...
                             cudnn{:}) ;
    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;
    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;
    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;
    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;
    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x, [], leak{:}) ;
    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;
    ...
    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end
```
The code above shows the main idea of forward propagation.
```matlab
  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % 'forget' indicates whether to discard the intermediate result res{i+1}.x;
    % if net.layers.precious is true, the intermediate result is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget
    % if not kept, set this layer's input to empty
    res(i).x = [] ;
  end
  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end
```
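As a side note on the conserveMemory logic above: if you need to inspect the output of a particular layer after the pass, you can mark that layer as precious so it is not cleared. This is a small sketch based only on the fields appearing in the code above (the option name is assumed to follow opts.conserveMemory, and the layer index 3 is just an example):

```matlab
net.layers{3}.precious = true ;                               % keep the output of layer 3
res = vl_simplenn(net, x, [], [], 'conserveMemory', true) ;   % forward pass, memory conserved
size(res(4).x)                                                % output of layer 3, still available
```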
Backward pass
It seems that no explanation is the best explanation here, because the code is quite straightforward.

```matlab
% -------------------------------------------------------------------------
% Backward pass
% -------------------------------------------------------------------------
if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type
      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                    'pad', l.pad, ...
                    'stride', l.stride, ...
                    l.opts{:}, ...
                    cudnn{:}) ;
      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                     'crop', l.crop, ...
                     'upsample', l.upsample, ...
                     'numGroups', l.numGroups, ...
                     l.opts{:}, ...
                     cudnn{:}) ;
      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
                                'pad', l.pad, 'stride', l.stride, ...
                                'method', l.method, ...
                                l.opts{:}, ...
                                cudnn{:}) ;
      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;
      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;
      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end
      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;
      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;
      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;
      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
                                     'mask', res(i+1).aux) ;
        end
      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
          vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;
      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
                                 l.p, res(i+1).dzdx, ...
                                 'noRoot', l.noRoot, ...
                                 'epsilon', l.epsilon, ...
                                 'aggregate', l.aggregate) ;
      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;
    end % layers

    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end

    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
```
Looking into the functions directly under the matlab folder
Here I mean the vl_nnxx functions. I will just take vl_nnsigmoid and vl_nnrelu as examples.

```matlab
function out = vl_nnsigmoid(x,dzdy)

y = 1 ./ (1 + exp(-x));

if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end
```
When back-propagation reaches this layer, formula (1) is bound to be executed:
out = dzdy .* (y .* (1 - y)) ;
The latter part, (y .* (1 - y)), is just f′(x), since the derivative of the sigmoid y = 1/(1+exp(-x)) with respect to x is y(1-y).
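If you want to convince yourself that this backward computation really implements formula (1), a quick finite-difference check works well (a minimal sketch; the test values and step size are arbitrary):

```matlab
x    = randn(3,3,'single') ;
dzdy = randn(3,3,'single') ;
h    = 1e-3 ;

dzdx = vl_nnsigmoid(x, dzdy) ;             % analytic backward, formula (1)
y1   = vl_nnsigmoid(x + h) ;               % forward at x + h
y0   = vl_nnsigmoid(x - h) ;               % forward at x - h
dzdx_num = dzdy .* (y1 - y0) / (2*h) ;     % numeric f'(x) times dzdy

max(abs(dzdx(:) - dzdx_num(:)))            % should be close to zero
```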
```matlab
function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;

if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;   % formula (1) is used here
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end
```
Why don't I show the code of the structures that involve weight computation? Because they are implemented in CUDA C for speed. You can find them in matlab\src. That is why we have to compile MatConvNet at the very beginning: functions like vl_nnconv must be compiled into MEX files so they can be called from MATLAB.
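For reference, compilation is done with vl_compilenn from the MatConvNet root directory; the GPU options below are assumptions based on my own setup, and the CUDA path is only an example:

```matlab
addpath matlab ;                       % make vl_compilenn visible
vl_compilenn ;                         % CPU-only build
% optional GPU build (the cudaRoot path is an example; adapt it to your machine)
vl_compilenn('enableGpu', true, 'cudaRoot', '/usr/local/cuda') ;
```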
The third post of the series will mainly introduce cnn_train, which is quite interesting.