Machine Learning Notes: The Normal Equation
Normal Equation
Note: [8:00 to 8:44 - The design matrix X (in the bottom right side of the slide) given in the example should have elements x with subscript 1 and superscripts varying from 1 to m, because for all m training examples there are only 2 features, x0 and x1. 12:56 - The X matrix is m by (n+1) and NOT n by n.]
Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm.
In the "Normal Equation" method, we will minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero. This allows us to find the optimal θ without iteration. The normal equation formula is given below:
θ = (XᵀX)⁻¹Xᵀy
There is no need to do feature scaling with the normal equation.
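To make the formula concrete, here is a minimal NumPy sketch (my own illustration, not from the lecture notes; the toy data X and y and the name theta are assumptions):

```python
import numpy as np

# Toy training set: m = 4 examples, n = 1 feature.
# The first column of X is all ones (the intercept term x0).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2x, so theta should be ~[0, 2]

# Normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is numerically safer than forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # -> approximately [0. 2.]
```

Note the use of np.linalg.solve rather than explicitly inverting XᵀX; both match the formula, but the solve is the more stable way to evaluate it.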
The following is a comparison of gradient descent and the normal equation:

| Gradient Descent | Normal Equation |
|---|---|
| Need to choose α | No need to choose α |
| Needs many iterations | No need to iterate |
| O(kn²) | O(n³); need to calculate the inverse of XᵀX |
| Works well when n is large | Slow if n is very large |

With the normal equation, computing the inversion has complexity O(n³). So if we have a very large number of features, the normal equation will be slow. In practice, when n exceeds 10,000 it might be a good time to go from a normal solution to an iterative process.
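To see the trade-off in code, the sketch below (again my own illustration, reusing the toy X and y from the sketch above; the values of alpha and the iteration count are arbitrary choices) fits the same model both ways:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
m = len(y)

# Gradient descent: requires choosing alpha and iterating many times.
alpha, iterations = 0.1, 1000
theta_gd = np.zeros(2)
for _ in range(iterations):
    gradient = X.T @ (X @ theta_gd - y) / m  # gradient of the cost J(theta)
    theta_gd -= alpha * gradient

# Normal equation: one direct solve, no alpha, no iterations.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(theta_gd)  # ~[0. 2.]
print(theta_ne)  # ~[0. 2.]
```

Here n is tiny, so the direct solve wins easily; the table above is about what happens as n grows and the O(n³) solve starts to dominate.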