
Coursera Machine Learning - Logistic Regression - 0x02


Cost Function

Training set:

\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}

m training examples

x \in
\begin{bmatrix}
x_0\\
x_1\\
\vdots\\
x_n\\
\end{bmatrix}, \quad x_0 = 1, \quad y \in \{0,1\}

h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}
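As a quick aside (not part of the original notes), a minimal Octave sketch of this hypothesis, assuming theta and x are (n+1) x 1 column vectors with x(1) = 1:

function h = hypothesis(theta, x)
  % h_theta(x) = 1 / (1 + e^(-theta' * x)), the sigmoid of theta' * x
  h = 1 / (1 + exp(-(theta' * x)));
end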

How do we choose the parameters \theta?

Cost function

Linear regression:

J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}\frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2

Cost(h_\theta(x^{(i)}), y^{(i)}) = \frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2

Logistic regression:

Cost(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if}\space y = 1\\
-\log(1 - h_\theta(x)) & \text{if}\space y = 0
\end{cases}

Note: y = 0 or 1 always.

This is easier to understand by looking at the plots of the two cost curves.
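To spell out what those plots show (my own summary, consistent with the formula above):

y = 1:\quad Cost \to 0 \text{ as } h_\theta(x) \to 1, \qquad Cost \to \infty \text{ as } h_\theta(x) \to 0

y = 0:\quad Cost \to 0 \text{ as } h_\theta(x) \to 0, \qquad Cost \to \infty \text{ as } h_\theta(x) \to 1

So a confident wrong prediction is penalized very heavily, and a correct confident prediction costs nothing.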

Simplified cost function and gradient descent

Cost(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))

J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}Cost(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum\limits_{i = 1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]
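A minimal vectorized Octave sketch of this cost, under the assumption that X is the m x (n+1) design matrix, y the m x 1 label vector, and theta the (n+1) x 1 parameter vector (these variable names are mine, not from the notes):

function J = logisticCost(theta, X, y)
  % J(theta) = -(1/m) * [ y'*log(h) + (1-y)'*log(1-h) ]
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                     % h_theta(x^(i)) for every example
  J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));
end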

To fit the parameters \theta:

\min\limits_{\theta}J(\theta)

To predict the output for a new x:

Output: h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}

Want \min\limits_{\theta}J(\theta):

Repeat {

\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)

}

\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}
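Putting the update rule and this partial derivative together, a hedged Octave sketch of batch gradient descent for logistic regression; alpha and num_iters are illustrative inputs, not values from the notes:

function theta = gradientDescentLogistic(X, y, theta, alpha, num_iters)
  % Repeat: theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
  m = length(y);
  for iter = 1:num_iters
    h = 1 ./ (1 + exp(-(X * theta)));    % m x 1 vector of predictions
    grad = (1/m) * (X' * (h - y));       % (n+1) x 1 gradient, all j at once
    theta = theta - alpha * grad;        % simultaneous update of every theta_j
  end
end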

Advanced Optimization

Optimization algorithm

Gradient descent

Conjugate gradient

BFGS

L-BFGS

Advantages of the last three algorithms:

No need to manually pick the learning rate

Usually converge faster than gradient descent

Disadvantage: more complex

Example:

\theta =
\begin{bmatrix}
\theta_1\\
\theta_2\\
\end{bmatrix}

J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2

\frac{\partial}{\partial\theta_1}J(\theta) = 2(\theta_1 - 5)

\frac{\partial}{\partial\theta_2}J(\theta) = 2(\theta_2 - 5)

function [jVal, gradient] = costFunction(theta)
  % Cost J(theta) and its gradient for the example above
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = zeros(2, 1);
  gradient(1) = 2*(theta(1) - 5);
  gradient(2) = 2*(theta(2) - 5);
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
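Since J(\theta) is minimized at \theta_1 = \theta_2 = 5, optTheta should come back close to [5; 5] and functionVal close to 0.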


Multiclass Classification: One-vs-all

One-vs-all(one-vs-rest)

h_\theta^{(i)}(x) = P(y = i \mid x; \theta) \quad (i = 1, 2, 3)

Given a new input x, pick the class i that maximizes:

\max\limits_{i} h_\theta^{(i)}(x)
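A minimal Octave sketch of this prediction step, assuming all_theta is a K x (n+1) matrix whose i-th row holds the fitted parameters of the classifier for class i, and x is an (n+1) x 1 feature vector with x(1) = 1 (the names all_theta and predictOneVsAll are assumptions, not from the notes):

function pred = predictOneVsAll(all_theta, x)
  % h^(i)(x) = sigmoid(theta^(i)' * x) for every class i, giving a K x 1 vector
  h = 1 ./ (1 + exp(-(all_theta * x)));
  [maxProb, pred] = max(h);   % pred is the index i with the largest h^(i)(x)
end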