
Least Squares and Nearest Neighbors

2015-12-18 05:25
1. Least squares and nearest neighbors
  1.1 Least squares in linear regression
  1.2 Nearest neighbors
2. Rationale and differences of least squares and nearest neighbors
  Rationale of least squares and nearest neighbors
  Extensions of these simple procedures

1. Least squares and nearest neighbors

1.1 Least squares in linear regression

Assume we have a data set $\{(X^{(i)}, y^{(i)})\}_{i=1}^N$, and we will fit a linear regression $y = X^T\beta + b$ on this data set (the intercept $b$ can be absorbed into $\beta$ by appending a constant 1 to $X$). Notation to be used:

$$Y = (y^{(1)}, y^{(2)}, \cdots, y^{(N)})^T$$

$$X = (X^{(1)T}, X^{(2)T}, \cdots, X^{(N)T})^T$$

Firstly, we need to choose a loss function. Here we choose least squares, which minimizes the following quadratic function of the parameter $\beta$, the residual sum of squares (RSS):

$$\mathrm{RSS}(\beta) = \|Y - X\beta\|_F^2$$

which leads to the solution $\hat{\beta} = (X^TX)^{-1}X^TY$ if $X$ has full column rank. The prediction at any input $X$ is then given by $\hat{y} = X^T\hat{\beta}$.
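
As a concrete illustration, here is a minimal NumPy sketch of the closed-form fit and prediction; the synthetic data and names (X_train, X_design, beta_hat) are made up for the example.

```python
# Least-squares fit and prediction -- illustrative sketch on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X_train = rng.normal(size=(N, p))                    # rows are X^{(i)T}
beta_true = np.array([1.0, -2.0, 0.5])
y_train = X_train @ beta_true + 0.1 * rng.normal(size=N)

# Absorb the intercept b into beta by appending a constant column of 1s.
X_design = np.hstack([X_train, np.ones((N, 1))])

# Closed-form solution beta_hat = (X^T X)^{-1} X^T Y; lstsq computes it
# in a numerically stable way.
beta_hat, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)

# Prediction at a new input x: y_hat = x^T beta_hat.
x_new = np.append(rng.normal(size=p), 1.0)
y_hat = x_new @ beta_hat
```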

1.2 Nearest neighbors

In regression, the nearest-neighbors method averages the outputs of the $k$ nearest points to $X$ as the prediction at $X$, which can be formulated as

$$\hat{y}(X) = \frac{1}{k}\sum_{X^{(i)} \in N_k(X)} y^{(i)}$$

where $N_k(X)$ is the neighborhood of $X$ consisting of the $k$ closest training points.

In classification, the nearest-neighbors method takes a majority vote among the labels of the $k$ nearest points to $X$ as the class for $X$, which can be formulated as

$$\hat{g}(X) = \arg\max_g \sum_{X^{(i)} \in N_k(X)} \mathbf{1}\{y^{(i)} = g\}$$
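
A brute-force sketch of both uses might look like this; the helper names (knn_indices, knn_regress, knn_classify) are my own and no library implementation is assumed.

```python
# k-nearest-neighbors regression and classification -- brute-force sketch.
import numpy as np

def knn_indices(X_train, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dists)[:k]

def knn_regress(X_train, y_train, x, k):
    """Regression: average of the outputs of the k nearest neighbors of x."""
    return y_train[knn_indices(X_train, x, k)].mean()

def knn_classify(X_train, labels, x, k):
    """Classification: majority vote among the labels of the k nearest neighbors."""
    neighbor_labels = labels[knn_indices(X_train, x, k)]
    values, counts = np.unique(neighbor_labels, return_counts=True)
    return values[np.argmax(counts)]
```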

2. Rationale and differences of least squares and nearest neighbors

Least squares makes huge assumptions about structure, while nearest neighbors makes almost none.

Least squares yields stable but possibly inaccurate predictions, while the predictions of nearest neighbors are often accurate but can be unstable.

Note that:

stable means low variance

accurate means low bias

Rationale of least squares and nearest neighbors

Suppose that we have random variables $(X, Y)$ with joint distribution $\Pr(X, Y)$, and we want to find a function $f(X)$ to approximate $Y$. If we use the squared loss as the criterion for choosing $f(X)$, we measure its quality by the expected prediction error (EPE),

EPEepxect prediction error(f)=E∥Y−f(X)∥2F=EXEY|X[∥Y−f(X)∥2F|X]

Since the outer expectation is taken over $X$, it suffices to minimize EPE pointwise:

$$f(x) = \arg\min_c E_{Y|X}\left[\|Y - c\|_F^2 \mid X = x\right]$$

The solution is

$$f(x) = E[Y \mid X = x]$$
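
To see why the conditional mean is the minimizer, a standard expansion around $m = E[Y \mid X = x]$, sketched here for scalar $Y$, gives

$$
\begin{aligned}
E_{Y|X}\left[(Y - c)^2 \mid X = x\right]
  &= E\left[(Y - m)^2 \mid X = x\right]
     + 2(m - c)\,\underbrace{E\left[\,Y - m \mid X = x\,\right]}_{=0}
     + (m - c)^2 \\
  &= \operatorname{Var}(Y \mid X = x) + (m - c)^2 ,
\end{aligned}
$$

which is minimized exactly at $c = m = E[Y \mid X = x]$.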

Both least squares and nearest neighbors aim to approximate this conditional expectation by averaging.

Least squares assumes a linear structure and approximates the expectation under the squared loss by averaging over all the training data:

$$\hat{\mathrm{EPE}}(\beta) = \frac{1}{N}\sum_{i=1}^N \left\|y^{(i)} - X^{(i)T}\beta\right\|_F^2$$

Nearest neighbors approximates the conditional expectation directly by averaging the outputs near the target point $x$:

$$\hat{Y} = \mathrm{ave}\left(y^{(i)} \mid X^{(i)} \in N_k(x)\right)$$

So, two approximations are happening in both least squares and nearest neighbors (compare the sketch after this list).

Least squares:

1. assume a model structure (linearity);

2. average over all the training data in the empirical EPE.

Nearest neighbors:

1. condition on a small region around the target point $x$ instead of conditioning on $x$ itself;

2. average the outputs that are close to $x$.
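
As a rough illustration of the stability/accuracy trade-off, here is an assumed simulation comparing the two estimates at a single query point; the data-generating function sin(x) and all variable names are made up for the example.

```python
# Compare the two approximations on simulated 1-D data: least squares
# averages over all points under a linear assumption, k-NN averages only
# the outputs near the query x0.
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = np.sort(rng.uniform(-3, 3, size=N))
y = np.sin(X) + 0.3 * rng.normal(size=N)   # nonlinear truth + noise

x0 = 1.5  # query point

# Least squares: one global linear fit, then evaluate at x0.
A = np.column_stack([X, np.ones(N)])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_ls = beta_hat[0] * x0 + beta_hat[1]      # stable, but biased when the truth is nonlinear

# k-NN: average the outputs of the k points closest to x0.
k = 15
nearest = np.argsort(np.abs(X - x0))[:k]
y_knn = y[nearest].mean()                  # less biased, but noisier (higher variance)

print(f"true f(x0) = {np.sin(x0):.3f}, least squares = {y_ls:.3f}, {k}-NN = {y_knn:.3f}")
```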

Extensions of these simple procedures

Many more complex algorithms have been developed from these two simple procedures:

Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by k-nearest neighbors (see the sketch after this list).

In high-dimensional spaces the distance kernels are modified to emphasize some variables more than others.

Local regression fits linear models by locally weighted least squares, rather than fitting constants locally.

Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.

Projection pursuit and neural network models consist of sums of nonlinearly transformed linear models.
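
As promised above, here is a minimal sketch of such a kernel-weighted average (a Nadaraya-Watson style estimate with a Gaussian kernel); the bandwidth h and the function name are my own choices for illustration.

```python
# Kernel-weighted averaging: a smooth alternative to the 0/1 weights of k-NN.
import numpy as np

def kernel_regress(X_train, y_train, x, h=0.5):
    """Weighted average of outputs; weights decay smoothly with distance to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    weights = np.exp(-0.5 * (dists / h) ** 2)   # Gaussian kernel weights
    return np.sum(weights * y_train) / np.sum(weights)
```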