READING NOTE: Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
2016-12-07 20:17
TITLE: Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
AUTHOR: Zifeng Wu, Chunhua Shen, Anton van den Hengel
ASSOCIATION: The University of Adelaide
FROM: arXiv:1611.10080
CONTRIBUTIONS
- A further developed, intuitive view of ResNets is introduced, which helps to understand their behaviour and to find possible directions for further improvement.
- A group of relatively shallow convolutional networks is proposed based on this new understanding. Some of them achieve state-of-the-art results on the ImageNet classification dataset.
- The impact of using different networks on the performance of semantic image segmentation is evaluated; as pre-trained features, these networks can considerably boost existing algorithms.
SUMMARY
For residual unit $i$, let $y_{i-1}$ be its input, and let $f_i(\cdot)$ be its trainable non-linear mapping, also named Block $i$. The output of unit $i$ is recursively defined as

$$y_i = f_i(y_{i-1}, \omega_i) + y_{i-1}$$

where $\omega_i$ denotes the trainable parameters, and $f_i(\cdot)$ is often two or three stacked convolution stages in a ResNet building block. The top-left network in the paper's figure can then be formulated as

$$y_2 = y_1 + f_2(y_1, \omega_2) = y_0 + f_1(y_0, \omega_1) + f_2\bigl(y_0 + f_1(y_0, \omega_1), \omega_2\bigr)$$
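As a concrete illustration (my own sketch, not the paper's exact configuration), a residual unit of this form can be written in PyTorch as below; the class name, channel count, and BN/ReLU ordering are assumptions made for the example.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Sketch of a residual unit: y_i = f_i(y_{i-1}, w_i) + y_{i-1}."""
    def __init__(self, channels):
        super().__init__()
        # f_i(.): two stacked convolution stages, as mentioned above
        self.f = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, y_prev):
        # identity shortcut plus the trainable mapping
        return y_prev + self.f(y_prev)
```

Stacking two such units and expanding the forward pass gives exactly the unrolled expression for $y_2$ above: one term from the shortcuts and one from each residual mapping.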
Thus, in an SGD iteration, the backward gradients are

$$\Delta\omega_2 = \frac{\mathrm{d}f_2}{\mathrm{d}\omega_2} \cdot \Delta y_2$$

$$\Delta y_1 = \Delta y_2 + f_2' \cdot \Delta y_2$$

$$\Delta\omega_1 = \frac{\mathrm{d}f_1}{\mathrm{d}\omega_1} \cdot \Delta y_2 + \frac{\mathrm{d}f_1}{\mathrm{d}\omega_1} \cdot f_2' \cdot \Delta y_2$$
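To make the two-term structure of $\Delta\omega_1$ tangible, here is a toy scalar check with torch.autograd (a hypothetical example, not from the paper): taking $f_i(y, \omega_i) = \omega_i y$, the gradient of $\omega_1$ picks up one contribution through the shortcut around $f_2$ and one through $f_2$ itself.

```python
import torch

# Toy scalar model: f_i(y, w_i) = w_i * y, so y1 = y0 + w1*y0 and y2 = y1 + w2*y1.
y0 = torch.tensor(1.0)
w1 = torch.tensor(0.5, requires_grad=True)
w2 = torch.tensor(0.25, requires_grad=True)

y1 = y0 + w1 * y0
y2 = y1 + w2 * y1
y2.backward()

# The derivation above predicts two terms for dy2/dw1:
#   y0        -- via the shortcut that skips f2
#   w2 * y0   -- via f2
manual = y0 + w2 * y0
print(w1.grad.item(), manual.item())  # both 1.25
```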
Ideally, when the effective depth $l \geq 2$, both terms of $\Delta\omega_1$ are non-zero, as illustrated by the bottom-left case in the paper's figure. However, when the effective depth $l = 1$, the second term goes to zero, as illustrated by the bottom-right case. When this happens, we say that the ResNet is over-deepened, and that it cannot be trained in a fully end-to-end manner, even with the shortcut connections.
To summarize, shortcut connections enable us to train wider and deeper networks. As they grow to some point, we face a dilemma between width and depth: from that point on, going deeper, we actually get a wider network with extra features that are not completely end-to-end trained; going wider, we literally get a wider network without changing its end-to-end characteristic.
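As a rough back-of-the-envelope illustration of the two options (my own numbers, not an architecture from the paper): with 3x3 convolutions, parameters grow linearly when going deeper but quadratically when going wider, so a wider-but-shallower stack can easily match or exceed a deeper one in capacity.

```python
def conv3x3_params(channels):
    # a 3x3 convolution mapping `channels` feature maps to `channels` feature maps (no bias)
    return 3 * 3 * channels * channels

def stack_params(channels, units, convs_per_unit=2):
    # total parameters of `units` residual units, each containing `convs_per_unit` 3x3 convs
    return units * convs_per_unit * conv3x3_params(channels)

deeper = stack_params(channels=64, units=8)   # go deeper: more units
wider = stack_params(channels=128, units=4)   # go wider: more channels, fewer units
print(deeper, wider)  # 589824 vs. 1179648
```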
The authors designed three kinds of network structures, as illustrated in the following figure, and the classification performance on the ImageNet validation set is shown below.