DeepLearning.ai Code Notes 2: Hyperparameter Tuning, Regularization, and Optimization
2018-04-03 16:39
1. L2 Regularization
The cost without regularization:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1-y^{(i)}) \log\left(1-a^{[L](i)}\right) \right) \tag{1}$$
The cost with regularization:

$$J_{regularized} = \underbrace{-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1-y^{(i)}) \log\left(1-a^{[L](i)}\right) \right)}_{\text{cross-entropy cost}} + \underbrace{\frac{1}{m} \frac{\lambda}{2} \sum_l \sum_k \sum_j \left(W_{k,j}^{[l]}\right)^2}_{\text{L2 regularization cost}} \tag{2}$$
```python
cross_entropy_cost = compute_cost(A3, Y)  # This gives you the cross-entropy part of the cost
L2_regularization_cost = 1./m * lambd/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
cost = cross_entropy_cost + L2_regularization_cost
```
```python
dZ3 = A3 - Y
dW3 = 1./m * np.dot(dZ3, A2.T) + lambd / m * W3   # extra L2 gradient term: (lambda/m) * W3
db3 = 1./m * np.sum(dZ3, axis=1, keepdims=True)

dA2 = np.dot(W3.T, dZ3)
dZ2 = np.multiply(dA2, np.int64(A2 > 0))          # ReLU derivative
dW2 = 1./m * np.dot(dZ2, A1.T) + lambd / m * W2
db2 = 1./m * np.sum(dZ2, axis=1, keepdims=True)

dA1 = np.dot(W2.T, dZ2)
dZ1 = np.multiply(dA1, np.int64(A1 > 0))
dW1 = 1./m * np.dot(dZ1, X.T) + lambd / m * W1
db1 = 1./m * np.sum(dZ1, axis=1, keepdims=True)
```
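The only change regularization makes to backpropagation is the extra $\frac{\lambda}{m} W^{[l]}$ term added to each $dW^{[l]}$, which is exactly the derivative of the $\frac{1}{m}\frac{\lambda}{2}\sum W^2$ penalty. As a sanity check, here is a minimal self-contained sketch (the shapes and values are made up) comparing this analytic gradient against a finite-difference estimate of the penalty:

```python
import numpy as np

np.random.seed(0)
m, lambd = 5, 0.7               # hypothetical batch size and regularization strength
W = np.random.randn(3, 4)       # hypothetical weight matrix

def l2_cost(W):
    # The L2 penalty term alone: (1/m) * (lambda/2) * sum of squared weights
    return 1./m * lambd/2 * np.sum(np.square(W))

analytic = lambd / m * W        # the gradient term used in backprop above

# Central finite differences, entry by entry
eps = 1e-7
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (l2_cost(Wp) - l2_cost(Wm)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # ~1e-10: the two gradients agree
```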
2. He Initialization
```python
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2.0/layers_dims[l-1])
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
```
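For context, the two lines above sit inside a loop over layers. A minimal self-contained version might look like the following sketch (the function name and seed are illustrative, not necessarily the notebook's exact code):

```python
import numpy as np

def initialize_parameters_he(layers_dims):
    """He initialization for an L-layer network.
    layers_dims: list of layer sizes, e.g. [n_x, n_h1, n_y]."""
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1  # number of weight layers
    for l in range(1, L + 1):
        # Scale by sqrt(2 / fan_in): keeps activation variance stable for ReLU units
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2.0/layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

parameters = initialize_parameters_he([2, 4, 1])
print(parameters['W1'].shape, parameters['b1'].shape)  # (4, 2) (4, 1)
```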
3. Adam Gradient Descent
How does Adam work?

1. It calculates an exponentially weighted average of past gradients, and stores it in variables $v$ (before bias correction) and $v^{corrected}$ (with bias correction).
2. It calculates an exponentially weighted average of the squares of the past gradients, and stores it in variables $s$ (before bias correction) and $s^{corrected}$ (with bias correction).
3. It updates parameters in a direction based on combining information from "1" and "2".
The update rule is, for $l = 1, ..., L$:

$$\begin{cases}
v_{dW^{[l]}} = \beta_1 v_{dW^{[l]}} + (1-\beta_1) \dfrac{\partial J}{\partial W^{[l]}} \\[4pt]
v^{corrected}_{dW^{[l]}} = \dfrac{v_{dW^{[l]}}}{1-(\beta_1)^t} \\[4pt]
s_{dW^{[l]}} = \beta_2 s_{dW^{[l]}} + (1-\beta_2) \left(\dfrac{\partial J}{\partial W^{[l]}}\right)^2 \\[4pt]
s^{corrected}_{dW^{[l]}} = \dfrac{s_{dW^{[l]}}}{1-(\beta_2)^t} \\[4pt]
W^{[l]} = W^{[l]} - \alpha \dfrac{v^{corrected}_{dW^{[l]}}}{\sqrt{s^{corrected}_{dW^{[l]}}} + \varepsilon}
\end{cases}$$
where:
- $t$ counts the number of steps taken by Adam
- $L$ is the number of layers
- $\beta_1$ and $\beta_2$ are hyperparameters that control the two exponentially weighted averages
- $\alpha$ is the learning rate
- $\varepsilon$ is a very small number to avoid division by zero
In plain terms: steps 1 and 2 use momentum gradient descent and RMSprop, respectively. Both rely on the idea of exponentially weighted averages to tie the current update to past gradients, so that the update either keeps some of its previous momentum (trend) or uses a smoother average that avoids overcorrecting. "corrected" means a bias correction has been applied.
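To see what the bias correction fixes: since $v$ starts at 0, the early averages are biased toward 0. With $\beta_1 = 0.9$, after one step

$$v_1 = \beta_1 \cdot 0 + (1-\beta_1)\,dW = 0.1\,dW, \qquad v_1^{corrected} = \frac{0.1\,dW}{1-0.9^1} = dW,$$

so dividing by $1-(\beta_1)^t$ restores the right scale; as $t$ grows, $1-(\beta_1)^t \to 1$ and the correction fades out.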
```python
for l in range(L):
    # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
    v["dW" + str(l+1)] = beta1 * v["dW" + str(l+1)] + (1-beta1) * grads['dW' + str(l+1)]
    v["db" + str(l+1)] = beta1 * v["db" + str(l+1)] + (1-beta1) * grads['db' + str(l+1)]

    # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
    v_corrected["dW" + str(l+1)] = v["dW" + str(l+1)] / (1-np.power(beta1,t))
    v_corrected["db" + str(l+1)] = v["db" + str(l+1)] / (1-np.power(beta1,t))

    # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
    s["dW" + str(l+1)] = beta2 * s["dW" + str(l+1)] + (1-beta2) * grads['dW' + str(l+1)]**2
    s["db" + str(l+1)] = beta2 * s["db" + str(l+1)] + (1-beta2) * grads['db' + str(l+1)]**2

    # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
    s_corrected["dW" + str(l+1)] = s["dW" + str(l+1)] / (1-np.power(beta2,t))
    s_corrected["db" + str(l+1)] = s["db" + str(l+1)] / (1-np.power(beta2,t))
```
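The snippet above stops at the corrected moments; the actual parameter update, applying the last line of the update rule, continues inside the same loop. A sketch, assuming `parameters`, `learning_rate` ($\alpha$), and `epsilon` ($\varepsilon$) are in scope:

```python
    # Update parameters: W := W - alpha * v_corrected / (sqrt(s_corrected) + epsilon)
    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v_corrected["dW" + str(l+1)] / (np.sqrt(s_corrected["dW" + str(l+1)]) + epsilon)
    parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v_corrected["db" + str(l+1)] / (np.sqrt(s_corrected["db" + str(l+1)]) + epsilon)
```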