
Stochastic Gradient Descent vs. Batch Gradient Descent

Reposted from: http://www.cnblogs.com/rcfeng/p/3958926.html

For the detailed theory and formulas, see note1 of Andrew Ng's Stanford open-course lecture notes.
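As a quick reference, the code below implements a linear hypothesis with parameters theta0, theta1, theta2, a squared-error cost, and the two gradient descent update rules. This summary simply restates the formulas from note1 in the notation used by the code:

h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2

J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2

Stochastic gradient descent, one update per sample i:
\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

Batch gradient descent, one update per pass over all m samples:
\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}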

Understanding the algorithms through code

1. Python implementation of stochastic gradient descent:
#!/usr/bin/python
# coding=utf-8

'''
Created on 2014-09-06

@author: Ryan C. F.

'''

# Training data set
# each element in x represents (x0, x1, x2)
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

# convergence threshold
epsilon = 0.0001
# learning rate
alpha = 0.01
diff = [0, 0]
error1 = 0
error0 = 0
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:

    # stochastic gradient descent: update every theta after each single sample
    for i in range(m):
        diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        theta0 = theta0 + alpha * diff[0] * x[i][0]
        theta1 = theta1 + alpha * diff[0] * x[i][1]
        theta2 = theta2 + alpha * diff[0] * x[i][2]

    # calculate the cost function over the whole training set
    error1 = 0
    for lp in range(len(x)):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))

2. Python implementation of batch gradient descent:
#!/usr/bin/python
# coding=utf-8

'''
Created on 2014-09-06

@author: Ryan C. F.

'''

# Training data set
# each element in x represents (x0, x1, x2)
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

# convergence threshold
epsilon = 0.000001
# learning rate
alpha = 0.001
diff = [0, 0]
error1 = 0
error0 = 0
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:

    # accumulated update terms for the current pass over the training set
    sum0 = 0
    sum1 = 0
    sum2 = 0

    # begin batch gradient descent
    for i in range(m):
        # the theta values used here are the ones from the previous pass;
        # they are not updated inside this loop
        diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        sum0 = sum0 + alpha * diff[0] * x[i][0]
        sum1 = sum1 + alpha * diff[0] * x[i][1]
        sum2 = sum2 + alpha * diff[0] * x[i][2]
    # end batch gradient descent: update theta only once per full pass
    theta0 = theta0 + sum0
    theta1 = theta1 + sum1
    theta2 = theta2 + sum2

    # calculate the cost function over the whole training set
    error1 = 0
    for lp in range(len(x)):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
Comparing the batch gradient descent code with the stochastic gradient descent code above, the difference between the two is easy to see:

1. Stochastic gradient descent updates all of the theta parameters once for every single training sample it processes.

for i in range(m):
    diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])

    theta0 = theta0 + alpha * diff[0] * x[i][0]
    theta1 = theta1 + alpha * diff[0] * x[i][1]
    theta2 = theta2 + alpha * diff[0] * x[i][2]


2. Batch gradient descent updates the theta parameters only once per pass, after it has iterated over all of the samples:
# calculate the parameters
for i in range(m):
    # begin batch gradient descent
    diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
    sum0 = sum0 + alpha * diff[0] * x[i][0]
    sum1 = sum1 + alpha * diff[0] * x[i][1]
    sum2 = sum2 + alpha * diff[0] * x[i][2]
# end batch gradient descent
theta0 = theta0 + sum0
theta1 = theta1 + sum1
theta2 = theta2 + sum2
Therefore, when the number of samples is very large, batch gradient descent has to finish the computation over all of the samples before it can update theta even once, so each update takes far longer than in stochastic gradient descent. On the other hand, stochastic gradient descent may stop iterating too early, so the parameters it obtains are only close to a local optimum, rather than the local optimum that batch gradient descent converges to.

In my view, this is the most essential difference between batch gradient descent and stochastic gradient descent.
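As a side note, once the sample size is large the batch update above is usually written in vectorized form. Below is a minimal NumPy sketch of the same batch gradient descent on the same training data; NumPy and the variable names X and theta are my own additions here, not part of the original post.

# coding=utf-8
# Minimal vectorized sketch of the same batch gradient descent (assumes NumPy).
import numpy as np

# same training data as above: each row is (x0, x1, x2)
X = np.array([(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)])
y = np.array([95.364, 97.217205, 75.195834, 60.105519, 49.342380])

alpha = 0.001       # learning rate
epsilon = 0.000001  # convergence threshold
theta = np.zeros(3)
error0 = 0.0

while True:
    diff = y - X.dot(theta)                # residuals for the whole batch
    theta = theta + alpha * X.T.dot(diff)  # one theta update per full pass
    error1 = 0.5 * np.sum((y - X.dot(theta)) ** 2)  # cost after the update
    if abs(error1 - error0) < epsilon:
        break
    error0 = error1

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % tuple(theta))

Per update this does the same arithmetic as the sum0/sum1/sum2 loop, just expressed as matrix operations, so the cost of one pass still grows with the number of samples.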