

Logistic Regression: Basic Usage in sklearn

penalty : str, ‘l1’ or ‘l2’, default: ‘l2’ 

The regularization penalty, l1 or l2. l1 drives the weights toward sparsity, while l2 keeps the weights small and evenly distributed. When solver is ‘newton-cg’, ‘sag’ or ‘lbfgs’, only l2 can be used. 
C : float, default: 1.0 

Inverse of regularization strength: smaller values of C mean stronger regularization. 
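As a quick illustration (my own sketch, not from the original post): with penalty='l1' and a small C (strong regularization), many coefficients are driven exactly to zero, while l2 keeps them small but nonzero.

# Minimal sketch (assumed setup): comparing l1 and l2 regularization on iris
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

lr_l1 = LogisticRegression(penalty='l1', C=0.1, solver='liblinear').fit(X, y)
lr_l2 = LogisticRegression(penalty='l2', C=0.1, solver='liblinear').fit(X, y)

print(lr_l1.coef_)  # sparse: several entries are exactly 0
print(lr_l2.coef_)  # small but mostly nonzero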
fit_intercept : bool, default: True 

Whether to fit an intercept term, i.e. the b in y = ax + b. Defaults to True. 
intercept_scaling : float, default 1. 

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept
becomes intercept_scaling * synthetic_feature_weight. 

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased. 
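This only matters with the ‘liblinear’ solver; a minimal sketch (my own illustration):

# Sketch: raising intercept_scaling so liblinear's regularization
# affects the intercept less
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(solver='liblinear', fit_intercept=True,
                         intercept_scaling=10.0)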
class_weight : dict or ‘balanced’, default: None 

Defaults to None, which gives every class weight 1 (equivalent to {0:1, 1:1}); with ‘balanced’, weights are set inversely proportional to class frequencies. If class 0 is more important, you can pass e.g. {0:2, 1:1}. This is cost-sensitive learning, typically used with imbalanced samples.
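For example, a hypothetical sketch (the data here is made up for illustration) of cost-sensitive weighting:

# Hypothetical sketch: weight class 0 twice as heavily as class 1
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # imbalanced labels, roughly 3:1

clf = LogisticRegression(class_weight={0: 2, 1: 1}).fit(X, y)
# class_weight='balanced' would instead use n_samples / (n_classes * np.bincount(y))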

solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’}, default: ‘liblinear’ 

The optimization algorithm. 

For small datasets, ‘liblinear’ is a good choice; for large datasets, ‘sag’ is faster. 

For multiclass problems, only ‘newton-cg’, ‘sag’ and ‘lbfgs’ can handle the multinomial loss; ‘liblinear’ is limited to one-vs-rest (ovr). 

‘newton-cg’, ‘lbfgs’ and ‘sag’ support only the l2 penalty.
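A small sketch of these solver/penalty constraints (illustrative only):

# liblinear supports both penalties; the other solvers are l2-only
from sklearn.linear_model import LogisticRegression

LogisticRegression(solver='liblinear', penalty='l1')  # OK
LogisticRegression(solver='lbfgs', penalty='l2')      # OK
# LogisticRegression(solver='lbfgs', penalty='l1')    # would raise ValueError at fit time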

multi_class : str, {‘ovr’, ‘multinomial’}, default: ‘ovr’ 

If the option chosen is ‘ovr’, then a binary problem is fit for each label. Else the loss minimised is the multinomial loss fit across the entire probability distribution. Works only for the ‘newton-cg’, ‘sag’ and ‘lbfgs’ solver.
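To make the difference concrete, a minimal sketch (my own, on the 3-class iris data): ovr fits one binary classifier per class, while multinomial minimizes a single softmax loss over all classes.

# Sketch: ovr vs. multinomial fitting on iris
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

ovr = LogisticRegression(multi_class='ovr', solver='lbfgs').fit(X, y)
mnl = LogisticRegression(multi_class='multinomial', solver='lbfgs').fit(X, y)

# Both expose one coefficient row per class, but were fit differently:
print(ovr.coef_.shape, mnl.coef_.shape)  # (3, 4) (3, 4)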

n_jobs : int, default: 1 

Number of CPU cores used to parallelize over classes when multi_class='ovr'. Defaults to 1; it can be increased according to your machine.
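For instance (sketch; -1 is the sklearn convention for using all available cores):

clf = LogisticRegression(multi_class='ovr', n_jobs=-1)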


Attributes

coef_ : array, shape (n_classes, n_features) 

The coefficient (weight) of each feature, one row per class. 
intercept_ : array, shape (n_classes,) 

The intercept, i.e. the bias, one per class.
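A quick self-contained check of these attributes (my own sketch) before the full demo below:

# Sketch: inspecting coef_ and intercept_ on a fitted 3-class model
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression().fit(X, y)
print(clf.coef_.shape)       # (3, 4): one weight vector per class
print(clf.intercept_.shape)  # (3,): one bias per class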
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 12 21:28:40 2017

@author: 大帆
"""

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.family']='sans-serif'
plt.rcParams['axes.unicode_minus'] = False

iris=load_iris()
iris_data=iris.data
iris_target=iris.target

print(iris_data.shape)

# Reduce the 4-D iris features to 2-D for visualization
pca=PCA(n_components=2)

X=pca.fit_transform(iris_data)
print(X.shape)

# Scatter plot of the three iris classes in the 2-D PCA space
f=plt.figure()
ax=f.add_subplot(111)
ax.scatter(X[:,0][iris_target==0],X[:,1][iris_target==0],c='b')
ax.scatter(X[:,0][iris_target==1],X[:,1][iris_target==1],c='r')
ax.scatter(X[:,0][iris_target==2],X[:,1][iris_target==2],c='y')
ax.set_title('Data distribution')
plt.show()

# One-vs-rest logistic regression (the explicit class_weight here equals the default)
clf=LogisticRegression(multi_class='ovr',solver='lbfgs',class_weight={0:1,1:1,2:1})
clf.fit(X,iris_target)
score=clf.score(X,iris_target)

# Predict on a dense grid over the feature space to visualize the decision regions
x0min,x0max=X[:,0].min(),X[:,0].max()
x1min,x1max=X[:,1].min(),X[:,1].max()
h=0.05
xx,yy=np.meshgrid(np.arange(x0min-1,x0max+1,h),np.arange(x1min-1,x1max+1,h))
test_x=np.c_[xx.ravel(),yy.ravel()]

test_predict=clf.predict(test_x)
z=test_predict.reshape(xx.shape)
plt.contourf(xx,yy,z,cmap=plt.cm.Paired)
plt.axis('tight')
colors='bry'
# Overlay the training points, one color per class
for i,color in zip(clf.classes_,colors):
    idx=np.where(iris_target==i)
    plt.scatter(X[idx,0],X[idx,1],c=color,cmap=plt.cm.Paired)

xmin,xmax=plt.xlim()
coef=clf.coef_
intercept=clf.intercept_

# Each one-vs-rest boundary is the line coef[c,0]*x0 + coef[c,1]*x1 + intercept[c] = 0,
# solved here for x1 as a function of x0
def line(c,x0):
    return (-coef[c,0]*x0-intercept[c])/coef[c,1]

for i,color in zip(clf.classes_,colors):
    plt.plot([xmin,xmax],[line(i,xmin),line(i,xmax)],color=color,linestyle='--')
plt.title("score:{0}".format(score))
plt.show()





See the next post, which reimplements logistic regression in TensorFlow: http://blog.csdn.net/daxiaofan/article/details/70156357