
The effect of parameter class_weight on linear SVM classifier

2017-07-17 19:28
Best way to handle unbalanced dataset with SVM

I'm trying to build a prediction model with SVMs on fairly unbalanced data.
My labels/output have two classes, positive and negative. Positive examples make up about 30% of my data and negative examples about 70%. I'm trying to balance out the classes because the cost associated with incorrect predictions is not the same for the two classes. One method was resampling the training data to produce an equally balanced dataset, which was larger than the original.

Having different penalties for the margin slack variables of each class is a better approach than resampling the data. It is asymptotically
equivalent to resampling anyway, but it is easier to implement and continuous rather than discrete, so you have more control.
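
For reference (a standard textbook formulation, not quoted from the answer), the class-weighted soft-margin objective replaces the single penalty C with one constant per class:

\min_{w,\,b,\,\xi}\;\frac{1}{2}\|w\|^2 \;+\; C_{+}\sum_{i:\,y_i=+1}\xi_i \;+\; C_{-}\sum_{i:\,y_i=-1}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \;\ge\; 1-\xi_i,\qquad \xi_i \;\ge\; 0.

Raising C_+ relative to C_- makes slack on positive examples more expensive, which pushes the decision boundary away from the positive class.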

However, choosing the weights is not straightforward. In principle you can work out a theoretical weighting that takes into account the misclassification costs and the differences between the training-set and operational prior class probabilities, but it will not
give optimal performance. The best thing to do is to select the penalties/weights for each class by minimising the loss (taking the misclassification costs into account) via cross-validation.
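
One concrete way to do this with the OpenCV 2.x API used later in this post is a simple grid search over candidate positive-class weights, scored by a cost-weighted error on a held-out set. The sketch below is illustrative only: the helper name, the candidate grid, and the two cost values are assumptions, and the data Mats are taken to be prepared elsewhere (labels -1/+1, all matrices CV_32FC1).

#include <opencv2/opencv.hpp>  // OpenCV 2.x legacy API (CvSVM, CvSVMParams)
#include <iostream>
#include <limits>
#include <vector>

using namespace cv;
using namespace std;

// Hypothetical helper: choose the positive-class weight that minimises the
// cost-weighted error on a held-out validation set. Labels are assumed to be
// -1 (negative) and +1 (positive); the grid and costs are purely illustrative.
float selectPositiveWeight(const Mat& trainData, const Mat& trainLabels,
                           const Mat& valData, const Mat& valLabels)
{
    const double costFP = 1.0;   // cost of misclassifying a negative example
    const double costFN = 3.0;   // cost of misclassifying a positive example
    const vector<float> grid = {1.0f, 1.5f, 2.0f, 2.5f, 3.0f, 4.0f};

    float bestWeight = 1.0f;
    double bestLoss = numeric_limits<double>::max();

    for (float wPos : grid) {
        vector<float> v = {1.0f, wPos};     // {negative weight, positive weight}
        Mat weights = Mat(v).reshape(1, 1); // 1xN CV_32FC1 row matrix
        CvMat cvWeights = weights;          // legacy C header for CvSVMParams

        CvSVMParams params;
        params.svm_type      = CvSVM::C_SVC;
        params.kernel_type   = CvSVM::LINEAR;
        params.C             = 10;
        params.class_weights = &cvWeights;
        params.term_crit     = cvTermCriteria(CV_TERMCRIT_ITER, 5000, 1e-6);

        CvSVM svm;
        svm.train(trainData, trainLabels, Mat(), Mat(), params);

        // Accumulate the cost-weighted validation loss for this candidate.
        double loss = 0.0;
        for (int i = 0; i < valData.rows; ++i) {
            float predicted = svm.predict(valData.row(i));
            float truth = valLabels.at<float>(i, 0);
            if (predicted != truth)
                loss += (truth > 0.0f) ? costFN : costFP;
        }
        if (loss < bestLoss) {
            bestLoss = loss;
            bestWeight = wPos;
        }
    }
    return bestWeight;
}

A full k-fold cross-validation would average this loss over folds instead of using a single split, but the selection logic is the same.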

(OpenCV) How should we set the class weights in the OpenCV SVM implementation?

Initialise a 1D OpenCV floating-point matrix (pointed to by a CvMat* of type CV_32FC1) containing as many elements (columns) as there are classes in the learning problem, i.e. a 1xN matrix where N is the number of classes. Set each entry of this matrix to the corresponding class weight for classes 1 to N. From the OpenCV manual: "class_weights - Optional weights, assigned to particular classes. They are multiplied by C and thus affect the misclassification penalty for different classes. The larger weight, the larger penalty on misclassification of data from the corresponding class."
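
For example, with C = 10 and weights {1, 3} as in the snippet below, the effective penalties become 10 × 1 = 10 for the negative class and 10 × 3 = 30 for the positive class.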

#include <opencv2/opencv.hpp>  // OpenCV 2.x legacy API (CvSVM, CvSVMParams)
#include <iostream>
#include <vector>

using namespace cv;
using namespace std;

// Train a linear C-SVM with per-class weights; traindata / trainlabel are
// assumed to be CV_32FC1 Mats prepared by the caller.
void trainWeightedSVM(const Mat& traindata, const Mat& trainlabel, const string& param_file)
{
    vector<float> v = {1.0f, 3.0f};     // 1 for the negative class, 3 for the positive class
    Mat weights = Mat(v).reshape(1, 1); // 1xN CV_32FC1 row matrix, as the manual describes
    CvMat cvWeights = weights;          // legacy C header expected by CvSVMParams

    CvSVMParams params;
    params.svm_type      = CvSVM::C_SVC;
    params.kernel_type   = CvSVM::LINEAR;
    params.C             = 10;
    params.class_weights = &cvWeights;  // each class's penalty becomes weight * C
    params.term_crit     = cvTermCriteria(CV_TERMCRIT_ITER, 5000, 1e-6);

    // Train the SVM and save the model to disk
    CvSVM SVM;
    cout << "Training begins..." << endl;
    SVM.train(traindata, trainlabel, Mat(), Mat(), params);
    SVM.save(param_file.c_str());
    cout << "Training ends." << endl;
}

Regarding the weights, I ran some small tests to check their influence. Here are my results:

No class weights:

[Figure: decision boundary trained without class weights]

With weights [0.9, 0.1] (0.9 for the larger class, 0.1 for the smaller class):

[Figure: decision boundary trained with class weights 0.9/0.1]

You can see the effect of the weights clearly in these pictures. I hope this clears things up a bit.

(Image source: http://answers.opencv.org/question/26818/svm-bias-on-weights-of-positives-and-negatives/)