Estimating GPU/CPU memory usage of convolutional neural networks: convolution, torch, finput

2017-07-07 08:31

How to estimate the GPU/CPU memory consumption of a deep convolutional neural network

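Torch7 can print the memory footprint of each layer in a deep neural network, i.e., the shape of every Tensor the layer holds. For example, with a batch size of 16 (a 16x3x32x32 input):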

nn.SpatialConvolution(3,4,4,4,2,2,1,1)
-- cpu
{
padW : 1
nInputPlane : 3
output : FloatTensor - size: 16x4x16x16
gradInput : FloatTensor - size: 16x3x32x32
_type : "torch.FloatTensor"
dH : 2
dW : 2
nOutputPlane : 4
padH : 1
kH : 4
finput : FloatTensor - size: 16x48x256
weight : FloatTensor - size: 4x3x4x4
gradWeight : FloatTensor - size: 4x3x4x4
fgradInput : FloatTensor - size: 16x48x256
kW : 4
bias : FloatTensor - size: 4
gradBias : FloatTensor - size: 4
}
-- gpu
{
padW : 1
nInputPlane : 3
output : CudaTensor - size: 16x4x16x16
gradInput : CudaTensor - size: 16x3x32x32
_type : "torch.CudaTensor"
dH : 2
dW : 2
nOutputPlane : 4
padH : 1
kH : 4
finput : CudaTensor - size: 48x256
weight : CudaTensor - size: 4x3x4x4
gradWeight : CudaTensor - size: 4x3x4x4
fgradInput : CudaTensor - size: 16x16
kW : 4
bias : CudaTensor - size: 4
gradBias : CudaTensor - size: 4
}
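The shapes in the printout follow from the usual convolution arithmetic. The sketch below is a minimal check, assuming a 16x3x32x32 input (inferred from the gradInput size) and Torch's constructor order nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH); the helper names are hypothetical.

```python
def conv_output_size(in_size, k, stride, pad):
    """Spatial output size of a zero-padded, strided convolution."""
    return (in_size + 2 * pad - k) // stride + 1


def spatial_conv_shapes(batch, n_in, n_out, kW, kH, dW, dH, padW, padH, iH, iW):
    """Shapes of the non-buffer tensors shown in the printout above."""
    oH = conv_output_size(iH, kH, dH, padH)
    oW = conv_output_size(iW, kW, dW, padW)
    return {
        "output": (batch, n_out, oH, oW),
        "gradInput": (batch, n_in, iH, iW),  # same shape as the input
        "weight": (n_out, n_in, kH, kW),     # gradWeight has the same shape
        "bias": (n_out,),                    # gradBias has the same shape
    }


# nn.SpatialConvolution(3,4,4,4,2,2,1,1) applied to a 16x3x32x32 batch
for name, shape in spatial_conv_shapes(16, 3, 4, 4, 4, 2, 2, 1, 1, 32, 32).items():
    print(name, "x".join(map(str, shape)))
```

With stride 2 and padding 1, a 4x4 kernel halves the 32x32 input to 16x16, which matches the output size Torch reports.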

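As the output shows, the CPU and GPU versions are almost identical. The notable difference is in finput and fgradInput: on the CPU their size depends on the batch size, while on the GPU it does not, which is why Torch7 consumes a lot of memory on the CPU but is much lighter on the GPU. These two tensors are temporary buffers that the convolution layer allocates during computation to speed it up. How their sizes are computed is hard to find; there is no direct explanation online, and I only worked it out by reading the C source: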

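nn.SpatialConvolution
finput = (kW*kH*nInputPlane) x (outputHeight*outputWidth)   -- per sample; the CPU version adds a leading batch dimension
fgradInput = same as finput on the CPU; outputHeight x outputWidth on the GPU (16x16 in the printout above)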
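To see why the CPU version is so much hungrier, the sketch below turns these sizes into bytes for the layer above (16x16 output) at two batch sizes. It assumes 4-byte floats (FloatTensor/CudaTensor) and takes the GPU shapes from the printout (finput 48x256, fgradInput 16x16); the function name is illustrative.

```python
FLOAT_BYTES = 4  # FloatTensor and CudaTensor hold 32-bit floats


def conv_buffer_bytes(batch, n_in, kW, kH, oH, oW, on_gpu):
    """Bytes used by finput + fgradInput of nn.SpatialConvolution."""
    cols = kW * kH * n_in * oH * oW  # one (kW*kH*nIn) x (oH*oW) matrix
    if on_gpu:
        finput = cols        # GPU: a single matrix, independent of batch size
        fgradInput = oH * oW  # GPU: oH x oW, as in the printout (16x16)
    else:
        finput = batch * cols      # CPU: one matrix per sample
        fgradInput = batch * cols  # CPU: same size as finput
    return (finput + fgradInput) * FLOAT_BYTES


for batch in (16, 128):
    cpu = conv_buffer_bytes(batch, 3, 4, 4, 16, 16, on_gpu=False)
    gpu = conv_buffer_bytes(batch, 3, 4, 4, 16, 16, on_gpu=True)
    print(batch, cpu, gpu)
# prints:
# 16 1572864 50176
# 128 12582912 50176
```

The CPU buffers grow linearly with the batch size, while the GPU buffers stay fixed at about 49 KB for this layer.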

nn.SpatialFullConvolution
finput = (kW*kH*nOutputPlane) x (inputHeight*inputWidth)
fgradInput = outputHeight x outputWidth   -- this one is my inference; I did not find code in the source that sets it directly, though I may simply have missed it.
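Applying these formulas to a hypothetical full convolution that upsamples 16x16 back to 32x32 (nOutputPlane=3, 4x4 kernel, stride 2, padding 1) gives the shapes below. The output-size formula is the one documented for nn.SpatialFullConvolution, (in - 1)*stride - 2*pad + k + adj; the fgradInput shape only follows the inference stated above and is not verified against the source.

```python
def full_conv_output_size(in_size, k, stride, pad, adj=0):
    """Output size of nn.SpatialFullConvolution (transposed convolution)."""
    return (in_size - 1) * stride - 2 * pad + k + adj


def full_conv_buffer_shapes(n_out, kW, kH, dW, dH, padW, padH, iH, iW):
    """finput/fgradInput shapes per the formulas in the text (hypothetical helper)."""
    oH = full_conv_output_size(iH, kH, dH, padH)
    oW = full_conv_output_size(iW, kW, dW, padW)
    finput = (kW * kH * n_out, iH * iW)  # (kW*kH*nOutputPlane) x (iH*iW)
    fgradInput = (oH, oW)                # inferred: outputHeight x outputWidth
    return finput, fgradInput


print(full_conv_buffer_shapes(3, 4, 4, 2, 2, 1, 1, 16, 16))
# ((48, 256), (32, 32))
```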

The sizes of the other Tensors can be worked out by comparing against Torch's printout and applying basic knowledge of convolutional networks.
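As an example of that bookkeeping, the sketch below adds up every tensor in the printout for this one layer, assuming 4-byte floats. The input tensor is left out because it belongs to the previous layer (as that layer's output). Since tensors may share storage, the result is a per-layer upper bound.

```python
from math import prod

FLOAT_BYTES = 4  # 32-bit floats


def layer_bytes(shapes):
    """Sum the sizes of all tensors a layer holds."""
    return sum(prod(shape) for shape in shapes.values()) * FLOAT_BYTES


# Shapes copied from the CPU printout of nn.SpatialConvolution(3,4,4,4,2,2,1,1)
cpu_layer = {
    "output": (16, 4, 16, 16), "gradInput": (16, 3, 32, 32),
    "finput": (16, 48, 256), "fgradInput": (16, 48, 256),
    "weight": (4, 3, 4, 4), "gradWeight": (4, 3, 4, 4),
    "bias": (4,), "gradBias": (4,),
}
# The GPU printout differs only in the two temporary buffers
gpu_layer = dict(cpu_layer, finput=(48, 256), fgradInput=(16, 16))

print(layer_bytes(cpu_layer), layer_bytes(gpu_layer))
# 1836576 313888
```

For this layer the two temporary buffers account for most of the CPU total, which matches the observation about CPU memory use above.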

Actual GPU memory usage

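Each Tensor above is only a reference to a block of memory, and several Tensors may share the same block; temporary buffers in particular are almost certain to be reused. The usage estimated this way is therefore an upper bound on the real value, and for large networks the gap is noticeable. For example, a network estimated at 9 MB actually used about 7 MB of GPU memory, and one estimated at 69 MB used about 30~50 MB.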

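In addition, CUDA loads a lot of other runtime state, so in Torch the first CudaTensor you create causes a large jump in GPU memory usage, e.g., an extra 100~200 MB. After that, each new Tensor increases usage by roughly its actual size.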

Reusing finput and fgradInput

An experienced user recommended the following code, which makes all the layers in a network share the same temporary buffers:

https://groups.google.com/forum/#!topic/torch7/BmP_RJ-yxlU
@Thomas you could share all the temporary buffers this way:

-- Buffers taken from the first convolution layer visited
local finput, fgradInput

model:apply(function(m)
  if torch.type(m) == 'nn.SpatialConvolution' or torch.type(m) == 'nn.SpatialConvolutionMM' then
    -- keep the first layer's buffers, then point every later layer at them
    finput = finput or m.finput
    fgradInput = fgradInput or m.fgradInput
    m.finput = finput
    m.fgradInput = fgradInput
  end
end)

This will share the temporary buffer among all convolution layers in your network.
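A rough sketch of what this sharing saves, assuming that a shared Torch tensor grows to the largest size any layer requests and does not shrink afterwards, so the shared storage ends up as large as the biggest single buffer rather than the sum of all of them. The per-layer element counts are hypothetical GPU-style finput sizes (no batch dimension); the same reasoning applies to fgradInput.

```python
FLOAT_BYTES = 4  # 32-bit floats


def buffer_bytes(elem_counts, shared):
    """Total finput bytes across conv layers, with or without one shared buffer."""
    sizes = [n * FLOAT_BYTES for n in elem_counts]
    # Shared: one storage resized to the largest request.
    # Not shared: every layer keeps its own buffer.
    return max(sizes) if shared else sum(sizes)


# Hypothetical finput element counts for three convolution layers
finput_elems = [48 * 256, 576 * 64, 1152 * 16]
print(buffer_bytes(finput_elems, shared=False), buffer_bytes(finput_elems, shared=True))
# 270336 147456
```

The deeper the network, the larger the saving, since the unshared total grows with every convolution layer while the shared buffer is bounded by the largest one.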

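The code above reportedly cannot be used in training mode. I also found code with a similar purpose in the ImageNet training example for Torch7, but it was commented out, which suggests the claim is quite likely true.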
