Machine Learning Notes: Gradient Descent (Part 2)
2017-07-07 15:37
Gradient Descent Intuition
In this video we explored the scenario where we used one parameter θ1 and plotted its cost function to implement gradient descent. Our formula for a single parameter was:
Repeat until convergence:
$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$
Regardless of the sign of the slope $\frac{d}{d\theta_1} J(\theta_1)$, θ1 eventually converges to its minimum value. The following graph shows that when the slope is negative, the value of θ1 increases,
and when it is positive, the value of θ1 decreases.
![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/SMSIxKGUEeav5QpTGIv-Pg_ad3404010579ac16068105cfdc8e950a_Screenshot-2016-11-03-00.05.06.png?expiry=1499558400000&hmac=3kIZccG0qg24zsepULiFSQdvJbQ-2OGbyc4hFRq2rbE)
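To make the update rule concrete, here is a minimal sketch in Python. The cost function J(θ1) = (θ1 − 3)² is a hypothetical example (not from the course), chosen because its derivative 2(θ1 − 3) is easy to compute by hand; the update line itself is exactly the rule above.

```python
# Minimal sketch of the single-parameter update rule on a hypothetical
# convex cost J(theta1) = (theta1 - 3)**2, whose minimum is at theta1 = 3
# and whose derivative is 2 * (theta1 - 3).

def gradient_descent_1d(theta1, alpha=0.1, iterations=50):
    for _ in range(iterations):
        derivative = 2 * (theta1 - 3)          # d/d(theta1) J(theta1)
        theta1 = theta1 - alpha * derivative   # theta1 := theta1 - alpha * dJ/dtheta1
    return theta1

# Starting left of the minimum (negative slope), theta1 increases toward 3;
# starting right of it (positive slope), theta1 decreases toward 3.
print(gradient_descent_1d(theta1=-5.0))   # approaches 3 from below
print(gradient_descent_1d(theta1=10.0))   # approaches 3 from above
```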
On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time. Failing to converge, or taking too long to reach the minimum, implies that our step size is wrong.
![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/UJpiD6GWEeai9RKvXdDYag_3c3ad6625a2a4ec8456f421a2f4daf2e_Screenshot-2016-11-03-00.05.27.png?expiry=1499558400000&hmac=-xJgAaLJkWI3uDIF6YZeKyv7Z1EOmnGfZYf4cLiId5s)
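The following sketch (using the same hypothetical cost as above) illustrates the point about the step size: a very small α converges slowly, a moderate α reaches the minimum quickly, and a too-large α overshoots and diverges. The specific values of α are illustrative assumptions, not values from the course.

```python
# How the step size alpha affects convergence on J(theta1) = (theta1 - 3)**2:
# too small is slow, too large overshoots the minimum and diverges.

def run(alpha, theta1=10.0, iterations=30):
    for _ in range(iterations):
        theta1 = theta1 - alpha * 2 * (theta1 - 3)
    return theta1

for alpha in (0.001, 0.1, 1.2):
    print(alpha, run(alpha))
# alpha = 0.001 : still far from 3 after 30 steps (too slow)
# alpha = 0.1   : very close to 3 (reasonable step size)
# alpha = 1.2   : moves farther from 3 on every step (diverges)
```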
How does gradient descent converge with a fixed step size α?
The intuition behind the convergence is that $\frac{d}{d\theta_1} J(\theta_1)$ approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative is always 0, and thus we get:

$$\theta_1 := \theta_1 - \alpha \cdot 0 = \theta_1$$
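A short sketch (again on the hypothetical cost J(θ1) = (θ1 − 3)²) shows why a fixed α is enough: the step α · dJ/dθ1 shrinks on its own as the derivative approaches 0, and at the minimum the update leaves θ1 unchanged.

```python
# With a fixed alpha, the step alpha * dJ/dtheta1 shrinks automatically
# as theta1 nears the minimum of J(theta1) = (theta1 - 3)**2.

theta1, alpha = 10.0, 0.1
for i in range(5):
    derivative = 2 * (theta1 - 3)
    step = alpha * derivative
    print(f"iteration {i}: theta1 = {theta1:.4f}, step = {step:.4f}")
    theta1 = theta1 - step
# At the minimum the derivative is 0, so theta1 := theta1 - alpha * 0 = theta1
# and the parameter stops changing.
```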