The Problem of Overfitting
2017-03-01 15:17
Underfitting, or high bias: the hypothesis function h maps poorly to the trend of the data.
It is usually caused by a function that is too simple or uses too few features.
Overfitting, or high variance: the hypothesis fits the available data but does not generalize well to predict new data.
It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
There are two main ways to address overfitting:
1) Reduce the number of features:
1. Manually select which features to keep.
2. Use a model selection algorithm.
2) Regularization
1. Keep all the features, but reduce the magnitude of the parameters θj.
2. Regularization works well when we have a lot of slightly useful features.
Regularization:
1. Regularized linear regression
Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:
$$\min_\theta \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
Here λ is the regularization parameter. It controls the trade-off between fitting the training data and keeping the parameters θj small.
If λ is chosen to be too large, it may smooth out the function too much and cause underfitting.
As a result, the new hypothesis (the pink curve in the lecture figure) looks like a quadratic function but fits the data better thanks to the extra terms with small θ values.
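The regularized cost function above can be sketched in a few lines of plain Python. This is only an illustration: the function name `regularized_cost`, the list-based data layout (each row of X starts with the constant 1 for the intercept), and the toy data are assumptions, not part of the original notes.

```python
def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost for linear regression.

    theta: list of n+1 parameters (theta[0] is the intercept).
    X:     list of m rows, each holding n+1 values (first value is 1).
    y:     list of m target values.
    lam:   the regularization parameter lambda.
    """
    m = len(y)
    # Sum of squared prediction errors over all m examples.
    squared_error = sum(
        (sum(t * x for t, x in zip(theta, row)) - target) ** 2
        for row, target in zip(X, y)
    )
    # The penalty skips theta[0], matching the sum from j = 1 to n.
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_error + penalty) / (2 * m)


# Toy data (made up for illustration): y = 2 * x exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta = [0.0, 2.0]

print(regularized_cost(theta, X, y, lam=0.0))  # 0.0 (perfect fit, no penalty)
print(regularized_cost(theta, X, y, lam=3.0))  # 2.0 (same fit, penalty 3 * 2^2 / 6)
```

Even with a perfect fit, a positive λ makes a large θ1 costly, which is what pushes the optimizer toward smaller parameters.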
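In the regularized gradient descent update for θj (j ≥ 1), the factor (1 − αλ/m) is always less than 1 when α and λ are positive.
So each iteration shrinks the parameter a little before applying the same gradient step as in the unregularized case.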
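The shrinking factor comes from rearranging the gradient descent update. The notes do not spell the update out, so the following is a sketch that assumes the standard rule obtained by differentiating the regularized cost function with respect to each θj, with θ0 left unregularized:

```latex
\begin{align*}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j \right], \quad j \in \{1, \dots, n\} \\
&\;= \theta_j \left(1 - \alpha\frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}
\end{align*}
```

With illustrative values α = 0.1, λ = 1 and m = 100, the factor is 1 − 0.001 = 0.999, so θj is scaled down by 0.1% on every step before the usual gradient term is subtracted.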
Using regularization also takes care of any non-invertibility issues of the X transpose X matrix as well.
If m ≤ n, then XᵀX is non-invertible. However, when we add the term λ⋅L with λ > 0, where L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, XᵀX + λ⋅L becomes invertible.
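A small worked example makes the invertibility claim concrete. It assumes a made-up data set with a single training example (m = 1) and a single feature (n = 1), with x0 = 1 and x1 = 2, and takes L to be the identity matrix with its top-left entry replaced by 0, so that θ0 is not penalized:

```latex
\begin{align*}
X &= \begin{bmatrix} 1 & 2 \end{bmatrix}, \qquad
X^T X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \qquad
\det(X^T X) = 1 \cdot 4 - 2 \cdot 2 = 0 \\
L &= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
X^T X + \lambda L = \begin{bmatrix} 1 & 2 \\ 2 & 4 + \lambda \end{bmatrix}, \qquad
\det(X^T X + \lambda L) = (4 + \lambda) - 4 = \lambda
\end{align*}
```

The determinant changes from 0 to λ, so for any λ > 0 the regularized matrix can be inverted in the normal equation θ = (XᵀX + λ⋅L)⁻¹Xᵀy.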
2. Regularized logistic regression
The θ vector is indexed from 0 to n (holding n+1 values, θ0 through θn), and the regularization sum runs from j = 1 to n, explicitly skipping θ0.
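The notes do not reproduce the regularized logistic regression cost, so the sketch below assumes the standard form: the average cross-entropy loss plus (λ/2m) times the sum of θj² for j = 1 to n. The function names, the list-based data layout, and the toy data are hypothetical and only meant to show how θ0 is left out of the penalty.

```python
import math


def sigmoid(z):
    """Logistic function used as the hypothesis h."""
    return 1.0 / (1.0 + math.exp(-z))


def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost with an L2 penalty on theta[1:] only.

    theta: list of n+1 parameters (theta[0] is the intercept).
    X:     list of m rows, each holding n+1 values (first value is 1).
    y:     list of m labels, each 0 or 1.
    lam:   the regularization parameter lambda.
    """
    m = len(y)
    total = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        total += -label * math.log(h) - (1 - label) * math.log(1 - h)
    # theta[0] is excluded, so the intercept is never penalized.
    penalty = (lam / (2 * m)) * sum(t ** 2 for t in theta[1:])
    return total / m + penalty


# Toy data (made up for illustration).
X = [[1.0, 0.5], [1.0, -1.5]]
y = [1, 0]

unpenalized = regularized_logistic_cost([3.0, 1.0], X, y, lam=0.0)
penalized = regularized_logistic_cost([3.0, 1.0], X, y, lam=2.0)
# The difference depends only on theta[1], not on theta[0].
print(penalized - unpenalized)
```

Changing θ0 alters the cross-entropy part of the cost but leaves the penalty term untouched, which is the behaviour described above.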
Note that adding the regularization term keeps J(θ) convex, so gradient descent still converges to the global minimum (when λ > 0 and when using an appropriate learning rate α).