Kaggle Evaluation Metrics, Part 2: Error Metrics for Classification Problems
2017-01-10 14:49
Essentially required reading; a rough translation is included along the way. (Kept here for now; continuously updated.)
Error Metrics for Classification Problems
Logarithmic Loss
The negative logarithm of the likelihood function for a Bernoulli random variable.
In plain English, this error metric is used where contestants have to predict whether something is true or false, with a probability (likelihood) ranging from definitely true (1) through equally likely (0.5) to definitely false (0).
Taking the log of the error imposes extreme punishment for being both confident and wrong. In the worst possible case, a single prediction that something is definitely true (1) when it is actually false adds infinity to your error score and makes every other entry pointless. In Kaggle competitions, predictions are bounded away from the extremes by a small value in order to prevent this.
$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log(p_{ij})$
where N is the number of examples, M is the number of classes, and $y_{ij}$ is a binary variable indicating whether class j was correct for example i. In the case where the number of classes is 2 (M=2), the formula simplifies to:
$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\left(y_i\log(p_i) + (1-y_i)\log(1-p_i)\right)$
Python Code
import numpy as np

def logloss(act, pred):
    epsilon = 1e-15
    pred = np.clip(pred, epsilon, 1 - epsilon)   # bound away from 0 and 1
    act = np.asarray(act, dtype=float)
    ll = np.sum(act * np.log(pred) + (1 - act) * np.log(1 - pred))
    return -ll / len(act)
R Code
MultiLogLoss <- function(act, pred){
  eps <- 1e-15
  pred <- pmin(pmax(pred, eps), 1 - eps)
  sum(act * log(pred) + (1 - act) * log(1 - pred)) * -1 / NROW(act)
}
Sample usage Example in Python
pred = [1, 0, 1, 0]
act = [1, 0, 1, 0]
print(logloss(act, pred))  # ~0: clipping keeps the log arguments finite
Sample usage Example in R
pred1 <- c(0.8, 0.2)
pred2 <- c(0.6, 0.4)
pred <- rbind(pred1, pred2)
act1 <- c(1, 0)
act2 <- c(1, 0)
act <- rbind(act1, act2)
MultiLogLoss(act, pred)
Mean Consequential Error (MCE)
The mean/average of the "Consequential Error", where all errors are equally bad (1) and the only value that matters is an exact prediction (0).
$\text{MCE} = \frac{1}{n}\sum_{y_i \neq \hat{y}_i} 1$
Matlab code:
MCE= mean(logical(y-y_pred));
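The same computation in Python, as a minimal sketch (the function name is my own):

```python
def mce(y_true, y_pred):
    # Mean Consequential Error: each wrong prediction counts 1, each
    # exact prediction counts 0; the score is the fraction of errors.
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `mce([1, 0, 1, 1], [1, 1, 1, 0])` returns 0.5, since two of the four predictions are wrong.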
Mean Average Precision@n
Suppose there are m missing outbound edges from a user in a social graph, and you can predict up to
n other nodes that the user is likely to follow. Then, by adapting the definition of average precision in IR (http://en.wikipedia.org/wiki/Information_retrieval,
http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf), the average precision at n for this user is:
$\text{ap@}n = \sum_{k=1}^{n} P(k)/\min(m, n)$
where P(k) is the precision at cut-off k in the item list, i.e., the ratio of the number of recommended nodes followed, up to position k, over k; P(k) equals 0 when the k-th item is not followed upon recommendation; m is the number of relevant nodes; and n is the number of predicted nodes. If the denominator is zero, P(k)/min(m,n) is set to zero.
(1) If the user follows recommended nodes #1 and #3 along with another node that wasn't recommended, then ap@10 = (1/1 + 2/3)/3 ≈ 0.56
(2) If the user follows recommended nodes #1 and #2 along with another node that wasn't recommended, then ap@10 = (1/1 + 2/2)/3 ≈ 0.67
(3) If the user follows recommended nodes #1 and #3 and has no other missing nodes, then ap@10 = (1/1 + 2/3)/2 ≈ 0.83
The mean average precision for N users at position n is the average of the average precision of each user, i.e.,
$\text{MAP@}n = \sum_{i=1}^{N} \text{ap@}n_i / N$
Note that order matters, but only when there is at least one incorrect prediction: if all predictions are correct, it doesn't matter in which order they are given.
Thus, if you recommend two nodes A & B in that order and a user follows node A and not node B, your MAP@2 score will be higher (better) than if you recommended B and then A. This makes sense - you want the most relevant results to show up first. Consider
the following examples:
(1) The user follows recommended nodes #1 and #2 and has no other missing nodes, then ap@2 = (1/1 + 1/1)/2 = 1.0
(2) The user follows recommended nodes #2 and #1 and has no other missing nodes, then ap@2 = (1/1 + 1/1)/2 = 1.0
(3) The user follows node #1 and it was recommended first along with another node that wasn't recommended, then ap@2 = (1/1 + 0)/2 = 0.5
(4) The user follows node #1 but it was recommended second along with another node that wasn't recommended, then ap@2 = (0 + 1/2)/2 = 0.25
So, it is better to submit more certain recommendations first. AP score reflects this.
Here's an easy intro to MAP: http://fastml.com/what-you-wanted-to-know-about-mean-average-precision/
Here's another intro to MAP from our forums.
Sample Implementations
- C# (our production implementation)
- R (with test cases)
- Haskell (with test cases)
- MATLAB / Octave (with test cases)
- Python (with test cases)
Contests that used MAP@K
- MAP@500: https://www.kaggle.com/c/msdchallenge/details/Evaluation
- MAP@200: https://www.kaggle.com/c/event-recommendation-engine-challenge
- MAP@12: https://www.kaggle.com/c/outbrain-click-prediction/details/evaluation
- MAP@10: https://www.kaggle.com/c/FacebookRecruiting
- MAP@10: https://www.kaggle.com/c/coupon-purchase-prediction/details/evaluation
- MAP@7: https://www.kaggle.com/c/santander-product-recommendation
- MAP@5: https://www.kaggle.com/c/expedia-hotel-recommendations
- MAP@3: https://www.kddcup2012.org/c/kddcup2012-track1/details/Evaluation
Multi Class Log Loss
Only an R code example is provided.
R Code
library(dplyr)  # for select()

multiloss <- function(predicted, actual){
  # to add: reorder the rows
  predicted_m <- as.matrix(select(predicted, -device_id))
  # bound predicted away from 0 and 1
  predicted_m <- apply(predicted_m, c(1, 2),
                       function(x) max(min(x, 1 - 10^(-15)), 10^(-15)))
  actual_m <- as.matrix(select(actual, -device_id))
  score <- -sum(actual_m * log(predicted_m)) / nrow(predicted_m)
  return(score)
}
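For comparison, the same multi-class log loss as a NumPy sketch (my own minimal version, without the device_id bookkeeping of the R code):

```python
import numpy as np

def multi_log_loss(actual, predicted, eps=1e-15):
    # actual: (N, M) one-hot matrix; predicted: (N, M) class probabilities.
    predicted = np.clip(predicted, eps, 1 - eps)  # bound away from 0 and 1
    return -np.sum(actual * np.log(predicted)) / actual.shape[0]
```

Only the log-probabilities assigned to the true classes contribute, since the one-hot `actual` matrix zeroes out everything else.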
Supplement from scikit-learn:
Log loss, aka logistic loss or cross-entropy loss.
This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions. The log loss is only defined for two or more labels. For a single sample with true label yt in {0,1} and estimated probability yp that yt = 1, the log loss is
-log P(yt|yp) = -(yt log(yp) + (1 - yt) log(1 - yp))
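That single-sample formula, written out in Python (a direct transcription, not library code):

```python
import math

def sample_log_loss(yt, yp):
    # -log P(yt | yp) for true label yt in {0, 1} and yp = P(yt = 1)
    return -(yt * math.log(yp) + (1 - yt) * math.log(1 - yp))
```

A maximally uncertain prediction yp = 0.5 costs log 2 ≈ 0.693 regardless of the true label.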
Hamming Loss
The Hamming Loss measures accuracy in a multi-label classification task. The formula is given by:
$\text{HammingLoss}(x_i, y_i) = \frac{1}{|D|}\sum_{i=1}^{|D|}\frac{\text{xor}(x_i, y_i)}{|L|},$
where |D| is the number of samples, |L| is the number of labels, yi is the ground truth, and xi is the prediction.
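A small Python sketch of this formula, representing each sample's labels as a set so that xor becomes a symmetric difference (the function name and representation are my choices):

```python
def hamming_loss(predictions, truths, n_labels):
    # predictions/truths: one set of label indices per sample.
    # xor(x_i, y_i) counts labels present in exactly one of the two sets.
    total = sum(len(x ^ y) for x, y in zip(predictions, truths))
    return total / (len(truths) * n_labels)
```

For example, with predictions [{0, 1}, {2}], ground truth [{1}, {2, 3}], and 4 labels, two label slots disagree out of 8, so the loss is 0.25.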
Mean Utility
Mean Utility is the weighted sum of true positives, true negatives, false positives, and false negatives. There are four parameters: one weight for each count.
The Mean Utility score is given by
$\text{MeanUtility} = w_{tp}\cdot tp + w_{tn}\cdot tn + w_{fp}\cdot fp + w_{fn}\cdot fn, \quad\text{where } w_{tp}, w_{tn} > 0 \text{ and } w_{fp}, w_{fn} < 0$
Kaggle's implementation of Mean Utility is directional, which means higher values are better.
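As a formula this is trivial to code; the default weights below are placeholder values (positive for correct counts, negative for errors), since the actual weights are set per competition:

```python
def mean_utility(tp, tn, fp, fn, w_tp=1.0, w_tn=1.0, w_fp=-1.0, w_fn=-1.0):
    # Weighted sum of confusion-matrix counts; higher is better.
    return w_tp * tp + w_tn * tn + w_fp * fp + w_fn * fn
```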
Matthews Correlation Coefficient
Not defined on Kaggle.