
Measuring performance of classifiers

2016-02-24 21:02

A confusion matrix is a common way to describe the performance of a classifier. It is a simple cross-tabulation of predicted classes against observed classes.
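As a minimal sketch of the cross-tabulation described above, the following builds a confusion matrix from two label lists using only the standard library. The function name `confusion_matrix` and the toy spam/good labels are illustrative assumptions, not part of the original post.

```python
from collections import Counter

def confusion_matrix(observed, predicted):
    """Cross-tabulate predicted vs. observed labels.

    Returns (labels, table), where table[i][j] counts the samples whose
    predicted class is labels[i] and whose observed class is labels[j].
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    labels = sorted(set(observed) | set(predicted))
    counts = Counter(zip(predicted, observed))
    table = [[counts[(p, o)] for o in labels] for p in labels]
    return labels, table

# Hypothetical toy data, for illustration only.
observed  = ["spam", "good", "good", "spam", "good", "good"]
predicted = ["spam", "good", "spam", "good", "good", "good"]

labels, table = confusion_matrix(observed, predicted)
print("rows = predicted, columns = observed:", labels)
for label, row in zip(labels, table):
    print(label, row)
```

The diagonal cells hold the correctly classified samples; every off-diagonal cell is one kind of misclassification.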

Overall Accuracy and Kappa Statistic

The simplest measure of a model's accuracy is Overall Accuracy: the percentage of samples for which the model predicted the correct class.
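The definition can be sketched in a few lines. The function name `overall_accuracy` and the sample labels below are hypothetical and only meant to illustrate the calculation.

```python
def overall_accuracy(observed, predicted):
    """Fraction of samples whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("need at least one sample")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)

# Hypothetical toy data: 4 of the 6 predictions match the observed class.
observed  = ["spam", "good", "good", "spam", "good", "good"]
predicted = ["spam", "good", "spam", "good", "good", "good"]

acc = overall_accuracy(observed, predicted)
print(f"overall accuracy: {acc:.1%}")
```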

Although Overall Accuracy is simple and interpretable, there are at least two major problems with this measure:

The Overall Accuracy measure does not take the natural frequencies of the classes into account. For example, if we build a model to classify credit card transactions as fraudulent or good, we can probably achieve very high Overall Accuracy simply by predicting that all transactions are good, because fraudulent transactions account for only a small part of all transactions.

The Overall Accuracy measure treats all classes the same. Consider a scenario such as classifying emails as Spam or Good. Here, classifying a good email as spam and deleting it hurts the user experience and costs more than misclassifying a spam email as good. The Overall Accuracy measure does not distinguish between a model that misclassifies good emails and one that misclassifies spam emails.

The Overall Accuracy measure still helps us check whether a model meets the minimum requirements: it needs to be higher than the no-information rate for the model to be even considered. For example, in simple binary classification with balanced classes, the no-information rate based on pure randomness is 50%: if we randomly assign a class to each observation, with a large enough sample we will probably get about 50% accuracy. So any model with overall accuracy below 50% in binary classification, or below 1/C when there are C balanced classes, is unacceptable. When the classes are imbalanced, the no-information rate is usually taken to be the proportion of the largest class, which is the accuracy obtained by always predicting that class.
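To make the comparison concrete, here is a small sketch that assumes the no-information rate is taken as the share of the most frequent class (a common convention; with balanced classes it reduces to 1/C). The helper names and the 990/10 split of transactions are hypothetical.

```python
from collections import Counter

def no_information_rate(observed):
    """Share of the most frequent class, i.e. the accuracy of always predicting it."""
    if not observed:
        raise ValueError("need at least one sample")
    return Counter(observed).most_common(1)[0][1] / len(observed)

def accuracy(observed, predicted):
    """Fraction of predictions that match the observed class."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical data: 990 good transactions and 10 fraudulent ones.
observed = ["good"] * 990 + ["fraud"] * 10
always_good = ["good"] * len(observed)   # a "model" that never flags fraud

nir = no_information_rate(observed)
acc = accuracy(observed, always_good)
print(f"no-information rate:       {nir:.3f}")
print(f"accuracy of 'always good': {acc:.3f}")
print("beats the no-information rate:", acc > nir)
```

In this sketch the trivial model reaches 99% accuracy yet does not exceed the no-information rate, which is why high accuracy alone says little on imbalanced data.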

An alternative to the no-information rate is the Kappa statistic, which measures the agreement between two raters (here, the predicted and the observed classes) after correcting for the agreement expected by chance. It takes values between -1 and 1. A value of 1 indicates complete agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance. As a rough rule of thumb, Kappa values above 0.3 to 0.5 are considered acceptable.
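The post does not give a formula, so the sketch below assumes the usual definition of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the class frequencies of each rater. The function name and the data are illustrative.

```python
from collections import Counter

def cohen_kappa(observed, predicted):
    """Cohen's Kappa between observed and predicted labels."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: fraction of samples where both "raters" agree.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    obs_freq, pred_freq = Counter(observed), Counter(predicted)
    p_e = sum(obs_freq[c] * pred_freq[c] for c in obs_freq) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical data: a model that labels every transaction as good.
observed = ["good"] * 990 + ["fraud"] * 10
always_good = ["good"] * len(observed)
print(f"kappa of 'always good': {cohen_kappa(observed, always_good):.3f}")

# Hypothetical data: a model that agrees with the observations on 4 of 6 emails.
obs  = ["spam", "good", "good", "spam", "good", "good"]
pred = ["spam", "good", "spam", "good", "good", "good"]
print(f"kappa on toy emails:    {cohen_kappa(obs, pred):.3f}")
```

Unlike Overall Accuracy, Kappa gives the 99%-accurate "always good" model no credit, because all of its agreement is what chance alone would produce.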

Sensitivity and Specificity

Sensitivity (a.k.a. True Positive Rate, TPR, or Recall): measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).

Specificity (a.k.a. True Negative Rate, TNR): measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
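Both definitions can be computed directly from the four cells of a binary confusion matrix. The following is a sketch under that reading; the function name, the `positive` argument, and the sick/healthy data are assumptions made for illustration.

```python
def sensitivity_specificity(observed, predicted, positive):
    """Return (sensitivity, specificity) treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for o, p in zip(observed, predicted):
        if o == positive:
            if p == positive:
                tp += 1   # sick and flagged as sick
            else:
                fn += 1   # sick but missed
        else:
            if p == positive:
                fp += 1   # healthy but flagged as sick
            else:
                tn += 1   # healthy and cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: 4 sick and 6 healthy people.
observed  = ["sick"] * 4 + ["healthy"] * 6
predicted = ["sick", "sick", "sick", "healthy"] + ["healthy"] * 5 + ["sick"]

sens, spec = sensitivity_specificity(observed, predicted, positive="sick")
print(f"sensitivity: {sens:.3f}")   # 3 of 4 sick people detected
print(f"specificity: {spec:.3f}")   # 5 of 6 healthy people cleared
```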

Youden's Index:

J = Sensitivity+Specificity - 1

For a test that performs at least as well as chance, its value ranges from 0 to 1 (negative values, down to -1, arise only when the test does worse than chance). It is zero when a diagnostic test gives the same proportion of positive results for groups with and without the disease, i.e. the test is useless. A value of 1 indicates that there are no false positives or false negatives, i.e. the test is perfect. The index gives equal weight to false positive and false negative values, so all tests with the same value of the index give the same proportion of total misclassified results.
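The two boundary cases mentioned above can be checked with a short worked example. The function name `youden_j` and the input values are illustrative only.

```python
def youden_j(sensitivity, specificity):
    """Youden's index: J = sensitivity + specificity - 1."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sensitivity + specificity - 1.0

# A useless test: 25% positive results among the sick (sensitivity = 0.25)
# and 25% positive results among the healthy (specificity = 0.75).
useless = youden_j(0.25, 0.75)

# A perfect test: no false negatives and no false positives.
perfect = youden_j(1.0, 1.0)

print("useless test J:", useless)
print("perfect test J:", perfect)
```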

This index, like other measures such as the F-score, is used in conjunction with ROC curves to identify the best cut-off threshold on the predicted probabilities for assigning classes.
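One way to read this is as a search over candidate thresholds for the one that maximizes J. The sketch below assumes that a sample is predicted positive when its score is at or above the threshold and that labels are coded 1 (positive) and 0 (negative); the function name and the scores are hypothetical.

```python
def best_threshold_by_youden(scores, labels):
    """Return (threshold, J) for the candidate threshold with the largest Youden's J.

    Assumes labels are 1 (positive) or 0 (negative) and that a sample is
    predicted positive when score >= threshold. Ties keep the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical predicted probabilities and true classes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,   1,   1]

threshold, j = best_threshold_by_youden(scores, labels)
print(f"best threshold: {threshold}, J = {j:.2f}")
```

Maximizing J weights false positives and false negatives equally; when their costs differ, as in the spam example, a different criterion may be a better fit.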

Calibration Plot:

One approach to creating a calibration plot is to partition the predicted probabilities for the test samples into bins, then calculate the ratio of observed events among the samples that fall in each bin. Finally, plot the midpoint of each bin against the ratio of events among the samples in that bin; for well-calibrated probabilities the points should lie along the 45-degree line.
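The binning steps can be sketched as follows. This version assumes equal-width bins on [0, 1] and outcomes coded 1 (event) and 0 (no event); it returns the points to plot rather than drawing them. The function name, the bin count, and the data are illustrative choices.

```python
def calibration_table(probabilities, outcomes, n_bins=5):
    """Return (bin_midpoint, observed_event_rate, count) for each non-empty bin.

    Assumes equal-width bins on [0, 1] and outcomes coded 1 (event) or 0.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    totals = [0] * n_bins
    events = [0] * n_bins
    for p, y in zip(probabilities, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        totals[idx] += 1
        events[idx] += y
    rows = []
    for i in range(n_bins):
        if totals[i]:                            # skip empty bins
            rows.append(((i + 0.5) / n_bins, events[i] / totals[i], totals[i]))
    return rows

# Hypothetical predicted probabilities and observed outcomes on a test set.
probs    = [0.05, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
outcomes = [0,    0,    0,    1,    1,    0,    1,    1,    1,    1]

rows = calibration_table(probs, outcomes, n_bins=5)
for midpoint, rate, count in rows:
    print(f"bin midpoint {midpoint:.1f}: observed event rate {rate:.2f} (n={count})")
```

Plotting the midpoints against the observed rates, with the diagonal drawn for reference, gives the calibration plot; with so few samples per bin the rates are noisy, so real use would need far more data.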

Receiver Operating Characteristic (ROC) Curves:

ROC curves were designed as a general method that, given a collection of continuous data points, determines an effective threshold such that values above the threshold are indicative of a specific event.
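The post does not say how the curve is built. A common construction, assumed here, sweeps the threshold over the observed scores and records the false positive rate (1 - specificity) and the true positive rate (sensitivity) at each step. The function name and the data are hypothetical.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (false_positive_rate, true_positive_rate).

    Assumes labels are 1 (event) or 0 (no event) and that a sample is called
    an event when score >= threshold. Thresholds are swept from high to low.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]   # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical continuous scores and true classes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,   1,   1]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Each point corresponds to one candidate threshold, so picking a threshold amounts to picking a point on this curve, for example the one farthest above the diagonal.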
