
[Machine Learning Rambles] 1. Minimum Error vs. Maximum Probability (1)

2015-12-28 21:51
What machine learning most often does is fitting and classification. Linear regression is the simplest machine learning problem, so that is where we will start. This post looks at two interesting approaches to it: minimum error vs. maximum probability. Sorry, you've been tricked — despite the "rambles" in the title, this is absolutely not a casual chat; the formulas start piling up right away, haha.

Problem Statement

Suppose we have a sample set $\left\{ (x_i, y_i) \right\}_{i=1}^m$, where:

Input: $x_i \in \mathbb{R}^n$, $x_i = [x_{i1}, x_{i2}, \cdots, x_{in}]^T$, with $n$ the dimension of the input vector

Output: $y_i \in \mathbb{R}$, a real number

Sample index: $i = 1, 2, \cdots, m$

Model parameters: $\theta \in \mathbb{R}^n$, $\theta = [\theta_1, \theta_2, \cdots, \theta_n]^T$

Our goal is to solve for the model parameters such that $y_i = \theta^T x_i$.

Sometimes a modeling error is taken into account, giving the form $y_i = \theta^T x_i + \varepsilon_i$, $\varepsilon_i \in \mathbb{R}$.

For convenience of computation, this is often written in matrix form:

$$Y = \theta^T X + \varepsilon$$

where

$$Y = \left[ \begin{array}{cccc} y_1 & y_2 & \cdots & y_m \end{array} \right], \quad X = \left[ \begin{array}{cccc} x_1 & x_2 & \cdots & x_m \end{array} \right], \quad \varepsilon = \left[ \begin{array}{cccc} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_m \end{array} \right]$$

so $Y$ and $\varepsilon$ are $1 \times m$ row vectors and $X$ is an $n \times m$ matrix whose columns are the samples.

There are two main lines of attack for solving for $\theta$: one solves by minimizing the error, the other by maximizing the probability (maximum likelihood estimation).

Minimum-Error Solution

If we ignore the error, then $Y = \theta^T X$ admits a direct analytic solution when $X$ is square ($m = n$) and invertible: $\hat\theta^T = Y X^{-1}$, i.e. $\hat\theta = (X^T)^{-1} Y^T$.

If the error is taken into account, the model's error is:

$$h_\theta = \frac{1}{2}\sum\limits_{i=1}^m \left( y_i - \theta^T x_i \right)^2 = \frac{1}{2}\left( Y - \theta^T X \right)\left( Y - \theta^T X \right)^T$$
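As a quick sanity check, the sum form and the matrix form of $h_\theta$ can be compared numerically. The sketch below uses small hypothetical random data (the sizes `n`, `m` and the generator seed are arbitrary choices, not from the article):

```python
import numpy as np

# Hypothetical small problem: n = 3 features, m = 5 samples.
rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.standard_normal((n, m))      # columns are the samples x_i
Y = rng.standard_normal((1, m))      # 1 x m row vector of outputs y_i
theta = rng.standard_normal((n, 1))  # parameter vector

# Sum form: (1/2) * sum_i (y_i - theta^T x_i)^2
h_sum = 0.5 * sum(float(Y[0, i] - theta.T @ X[:, i]) ** 2 for i in range(m))

# Matrix form: (1/2) (Y - theta^T X)(Y - theta^T X)^T
R = Y - theta.T @ X
h_mat = float(0.5 * (R @ R.T))

print(np.allclose(h_sum, h_mat))  # prints True: the two forms agree
```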

The objective function is therefore:

$$\hat\theta = \arg \mathop{\min}\limits_\theta h_\theta$$

This is, without question, an optimization problem. The simplest approach is to take the partial derivative, set it to zero, and see whether an analytic solution can be obtained.

$$\begin{aligned}
\frac{\partial h_\theta}{\partial \theta} &= \frac{\partial}{\partial \theta}\left[ \frac{1}{2}\left( Y - \theta^T X \right)\left( Y - \theta^T X \right)^T \right] \\
&= \frac{\partial}{\partial \theta}\left[ \frac{1}{2}\left( Y - \theta^T X \right)\left( Y^T - X^T \theta \right) \right] \\
&= \frac{\partial}{\partial \theta}\left[ \frac{1}{2}\left( Y Y^T - Y X^T \theta - \theta^T X Y^T + \theta^T X X^T \theta \right) \right] \\
&= \frac{\partial}{\partial \theta}\,\mathrm{tr}\left[ \frac{1}{2}\left( Y Y^T - Y X^T \theta - \theta^T X Y^T + \theta^T X X^T \theta \right) \right] \\
&= \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( Y Y^T \right) - \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( Y X^T \theta \right) - \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( \theta^T X Y^T \right) + \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( \theta^T X X^T \theta \right) \\
&= \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( \theta^T X X^T \theta \right) - \frac{\partial}{\partial \theta}\mathrm{tr}\left( Y X^T \theta \right)
\end{aligned}$$

(The trace can be inserted because the expression is a scalar; the last step uses $\mathrm{tr}(Y X^T \theta) = \mathrm{tr}(\theta^T X Y^T)$ and the fact that $Y Y^T$ does not depend on $\theta$.)

The next step uses two properties of the matrix trace:

$$\frac{\partial\,\mathrm{tr}\left( A B \right)}{\partial A} = B^T, \qquad \frac{\partial\,\mathrm{tr}\left( A B A^T C \right)}{\partial A} = C A B + C^T A B^T$$
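Both identities can be checked numerically with a finite-difference gradient. This is just an illustrative sketch; the shapes, the helper `num_grad`, and the random test matrices are all hypothetical choices, not part of the article:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3
A = rng.standard_normal((p, q))

def num_grad(f, A, eps=1e-6):
    """Central-difference gradient of the scalar function f with respect to A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

# Identity 1: d tr(AB)/dA = B^T
B = rng.standard_normal((q, p))
g1 = num_grad(lambda A_: np.trace(A_ @ B), A)
print(np.allclose(g1, B.T, atol=1e-5))  # prints True

# Identity 2: d tr(A B A^T C)/dA = C A B + C^T A B^T  (B square here)
Bq = rng.standard_normal((q, q))
C = rng.standard_normal((p, p))
g2 = num_grad(lambda A_: np.trace(A_ @ Bq @ A_.T @ C), A)
print(np.allclose(g2, C @ A @ Bq + C.T @ A @ Bq.T, atol=1e-5))  # prints True
```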

$$\begin{aligned}
&\frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( \theta^T X X^T \theta \right) - \frac{\partial}{\partial \theta}\mathrm{tr}\left( Y X^T \theta \right) \\
&= \frac{1}{2}\frac{\partial}{\partial \theta}\mathrm{tr}\left( \theta \theta^T X X^T \right) - \frac{\partial}{\partial \theta}\mathrm{tr}\left( Y X^T \theta \right) \\
&= \frac{1}{2} X X^T \theta + \frac{1}{2}\left( X X^T \right)^T \theta - X Y^T \\
&= X X^T \theta - X Y^T
\end{aligned}$$

Setting $\frac{\partial h_\theta}{\partial \theta} = 0$ gives $X X^T \theta - X Y^T = 0$.

It follows directly that $\hat\theta = \left( X X^T \right)^{-1} X Y^T$.
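This closed form is easy to verify against NumPy's built-in least-squares solver. The sketch below uses hypothetical data with a known $\theta$ and a little noise; note that `np.linalg.lstsq` expects the rows-as-samples convention, so the article's $X$ must be transposed for it:

```python
import numpy as np

# Hypothetical data: n = 3 parameters, m = 50 samples, small noise.
rng = np.random.default_rng(2)
n, m = 3, 50
theta_true = np.array([[1.0], [-2.0], [0.5]])
X = rng.standard_normal((n, m))                             # columns are samples
Y = theta_true.T @ X + 0.01 * rng.standard_normal((1, m))   # 1 x m outputs

# Normal equations in the article's columns-as-samples convention:
theta_hat = np.linalg.inv(X @ X.T) @ X @ Y.T

# The same problem posed in NumPy's rows-as-samples convention:
theta_lstsq, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)

print(np.allclose(theta_hat, theta_lstsq))  # prints True
```

In practice one would solve the normal equations with `np.linalg.solve(X @ X.T, X @ Y.T)` (or use `lstsq` directly) rather than forming the explicit inverse, which is slower and less numerically stable.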

This is the famous least-squares method. With the analytic solution in hand, what result do we get if we instead approach the problem from the probabilistic angle, via maximum likelihood estimation?