
CS231n Notes 1: Softmax Loss and Multiclass SVM Loss

2016-05-13 17:17


Softmax Loss

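Given a pair $(x_i, y_i)$, where $x_i$ is an image and $y_i$ is its class label (an integer), let $s = f(x_i, W)$ denote the network's output scores. The loss is then defined as follows:

$$P(Y = k \mid X = x_i) = \dfrac{e^{s_k}}{\sum_j e^{s_j}}$$
$$L_i = -\log P(Y = y_i \mid X = x_i)$$

For example, if $s = [3.2, 5.1, -1.7]$ and the correct class is the first one, then $p = [0.13, 0.87, 0.00]$ and $L_i = -\log(0.13) \approx 2.04$ with the natural logarithm (the value 0.89 corresponds to the base-10 logarithm, $-\log_{10}(0.13) \approx 0.89$).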
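As a quick check of the numbers in this example, here is a minimal sketch that recomputes the probabilities and the loss with NumPy. It assumes the correct class is index 0 (implied by the use of 0.13); the helper name `softmax_probs` is illustrative and not part of the original notes. Both the natural-log and base-10 values are computed, since `np.log` is the natural logarithm.

```python
import numpy as np

def softmax_probs(s):
    """Return softmax probabilities for a 1-D score vector s."""
    shifted = s - np.max(s)          # subtract the max for numerical stability
    exp_s = np.exp(shifted)
    return exp_s / np.sum(exp_s)

s = np.array([3.2, 5.1, -1.7])
y_i = 0                              # assumption: the correct class is the first one

p = softmax_probs(s)
loss_nat = -np.log(p[y_i])           # natural logarithm
loss_log10 = -np.log10(p[y_i])       # base-10 logarithm, for comparison

print(np.round(p, 2))
print(round(float(loss_nat), 2), round(float(loss_log10), 2))
```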

Vectorized Python code

import numpy as np


def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    # Subtract the row-wise max before exponentiating for numerical stability.
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    # Average negative log-probability of the correct class.
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    # Gradient w.r.t. the scores: probabilities minus one-hot labels, averaged.
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx
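
The gradient lines in the code (`dx = probs.copy()` followed by subtracting 1 at the correct class) follow from differentiating $L_i$ with respect to each score. A short derivation, writing $p_k = P(Y = k \mid X = x_i)$ as a shorthand introduced here:

```latex
\begin{aligned}
L_i &= -\log p_{y_i} = -s_{y_i} + \log \sum_j e^{s_j}, \\
\frac{\partial L_i}{\partial s_k}
    &= -\mathbb{1}[k = y_i] + \frac{e^{s_k}}{\sum_j e^{s_j}}
     = p_k - \mathbb{1}[k = y_i].
\end{aligned}
```

Averaging over the N samples accounts for the final `dx /= N`.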

Multiclass SVM Loss

Given a pair $(x_i, y_i)$, where $x_i$ is an image and $y_i$ is its class label (an integer), let $s = f(x_i, W)$ denote the network's output scores. The loss is then defined as follows:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$

For example, if $s = [3, 2, 5]$ and $y_i = 0$, then $L_i = \max(0, 2 - 3 + 1) + \max(0, 5 - 3 + 1) = 0 + 3 = 3$.
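The same arithmetic can be reproduced with a few lines of NumPy. This is only a sketch for a single example; the function name `svm_loss_single` and the `delta` parameter (the margin, fixed to 1 in the formula above) are illustrative assumptions rather than part of the original notes.

```python
import numpy as np

def svm_loss_single(s, y_i, delta=1.0):
    """Multiclass SVM (hinge) loss for one example with scores s and label y_i."""
    margins = np.maximum(0.0, s - s[y_i] + delta)
    margins[y_i] = 0.0               # the sum excludes j == y_i
    return float(np.sum(margins))

s = np.array([3.0, 2.0, 5.0])
loss = svm_loss_single(s, y_i=0)
print(loss)
```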

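Questions to think about:

Question 1: What happens if we allow $j = y_i$? And what if we use the mean instead of the sum?

Answer 1: Allowing $j = y_i$ adds the term $\max(0, s_{y_i} - s_{y_i} + 1) = 1$, so the loss increases by the constant 1. Using the mean instead of the sum multiplies the loss by a positive constant. Both variants change the value of the loss, but each is a monotonically increasing function of the original loss, so the scores that minimize it are the same and the model we end up with is not affected. We can exploit this property to simplify our code.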
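To make the two variants concrete, the sketch below evaluates them on the example $s = [3, 2, 5]$, $y_i = 0$ used earlier. The helper `hinge_terms` is a hypothetical name introduced for illustration only.

```python
import numpy as np

def hinge_terms(s, y_i):
    """All terms max(0, s_j - s_{y_i} + 1), including the j == y_i term."""
    return np.maximum(0.0, s - s[y_i] + 1.0)

s = np.array([3.0, 2.0, 5.0])
y_i = 0
terms = hinge_terms(s, y_i)

loss_sum = terms.sum() - terms[y_i]      # original definition (j != y_i)
loss_with_correct = terms.sum()          # also counts j == y_i, adds exactly 1
loss_mean = loss_sum / (len(s) - 1)      # mean over the C - 1 incorrect classes

print(loss_sum, loss_with_correct, loss_mean)
```

The second value differs from the first by the constant 1, and the third is the first scaled by $1/(C-1)$.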

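Question 2: Early in training we initialize the weights close to zero, so $s$ is also close to 0. What is the loss in that case?

Answer 2: Since $s \approx 0$, we have $s_{y_i} \approx s_j$, so every term $\max(0, s_j - s_{y_i} + 1) \approx 1$ and the loss is roughly the number of classes minus 1. This is a useful sanity check early on to detect whether our implementation has a bug.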
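A minimal sketch of this sanity check, assuming a linear score function `X @ W` and made-up sizes (`N`, `D`, `C` and the random data are all hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 5, 4, 10                           # hypothetical sizes
X = rng.standard_normal((N, D))
W = 1e-4 * rng.standard_normal((D, C))       # weights close to zero
y = rng.integers(0, C, size=N)

scores = X @ W                               # all scores are close to zero
correct = scores[np.arange(N), y][:, np.newaxis]
margins = np.maximum(0.0, scores - correct + 1.0)
margins[np.arange(N), y] = 0.0               # exclude j == y_i
loss = margins.sum() / N

print(loss, C - 1)                           # the two values should be close
```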

Vectorized Python code

import numpy as np


def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    # Hinge margins for every class; broadcasting subtracts each row's correct score.
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0  # the sum excludes j == y_i
    loss = np.sum(margins) / N
    # Each violated margin contributes +1 to its own class and -1 to the correct class.
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
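
One way to gain confidence in an analytic gradient like this is to compare it against a centered finite-difference estimate. The sketch below is an illustration under stated assumptions: `numerical_gradient` is a hypothetical helper, the compact `svm_loss` repeats the computation above only so the block runs on its own, and the test scores are chosen so that no margin sits exactly at the hinge's kink (where the loss is not differentiable).

```python
import numpy as np

def svm_loss(x, y):
    """Compact restatement of the vectorized routine above, so this sketch is self-contained."""
    N = x.shape[0]
    correct = x[np.arange(N), y][:, np.newaxis]
    margins = np.maximum(0.0, x - correct + 1.0)
    margins[np.arange(N), y] = 0.0
    loss = margins.sum() / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = (margins > 0).astype(x.dtype)
    dx[np.arange(N), y] -= num_pos
    return loss, dx / N

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of a scalar-valued f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old                          # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

x = np.array([[3.0, 2.5, 5.0],
              [1.0, 4.0, 0.5]])
y = np.array([0, 1])

loss, dx = svm_loss(x, y)
num_dx = numerical_gradient(lambda scores: svm_loss(scores, y)[0], x)
print(loss)
print(np.max(np.abs(dx - num_dx)))           # should be very small
```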

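Comparison

Everything here, nothing there

From the definitions we can see that the Softmax loss takes the scores of all classes into account. Making the Softmax loss as small as possible therefore pushes the scores of the other classes as low as possible. The SVM loss, by contrast, only considers the classes whose scores come within the margin of the correct class's score or exceed it; all other classes contribute 0. The SVM loss thus emphasizes robustness and only cares about the points that could actually affect the decision. These influential points are the so-called support vectors (which should make clear where the name "support vector machine" comes from). In classification problems, however, the two losses usually lead to the same results, so there is no need to worry too much about the choice.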
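
The difference can be seen on a small example. In the sketch below (score vectors and helper names are made up for illustration), the correct class already beats the others by more than the margin in both cases, so the SVM loss is zero for both, while the Softmax loss still decreases when the other scores are pushed further down.

```python
import numpy as np

def softmax_loss_single(s, y_i):
    """Softmax (cross-entropy) loss for one example, natural logarithm."""
    shifted = s - np.max(s)                   # numerical stability
    return float(np.log(np.sum(np.exp(shifted))) - shifted[y_i])

def svm_loss_single(s, y_i):
    """Multiclass SVM loss for one example, margin 1."""
    margins = np.maximum(0.0, s - s[y_i] + 1.0)
    margins[y_i] = 0.0
    return float(margins.sum())

y_i = 0
close_scores = np.array([10.0, 8.0, 8.0])       # other classes just outside the margin
far_scores = np.array([10.0, -100.0, -100.0])   # other classes pushed far down

svm_close = svm_loss_single(close_scores, y_i)
svm_far = svm_loss_single(far_scores, y_i)
soft_close = softmax_loss_single(close_scores, y_i)
soft_far = softmax_loss_single(far_scores, y_i)

print(svm_close, svm_far)      # SVM loss is indifferent between the two cases
print(soft_close, soft_far)    # Softmax loss still prefers the second case
```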
