Using the sklearn.pipeline.Pipeline class
2015-11-02 10:13
In this post I summarize sklearn.pipeline.Pipeline.
1. The sklearn.pipeline.Pipeline class
First, the link to the official documentation: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
class sklearn.pipeline.Pipeline(steps)
The official site describes it as follows:
Pipeline of transforms with a final estimator.
That is, a chain of transforms that ends in a final estimator.
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
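Explanation: the point of a pipeline is to bundle several steps so that they can be cross-validated together as a single estimator while their parameters are varied. To make that possible, a parameter of any step is addressed as the step's name, then a double underscore (__), then the parameter's name, for example anova__k.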
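To make the cross-validation point concrete, here is a minimal sketch that tunes two steps of a pipeline at once with GridSearchCV. The step names ("select", "svc"), the data sizes, the candidate values, and the use of f_classif as the scoring function are all assumptions chosen for illustration; they do not come from the official example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data; sizes and seed are arbitrary choices for illustration.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# Step names "select" and "svc" are hypothetical; any unique names work.
pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("svc", SVC(kernel="linear"))])

# Grid keys follow the <step name>__<parameter name> convention.
param_grid = {"select__k": [5, 10], "svc__C": [0.1, 1.0]}

# The whole pipeline is refit on every training fold, so the feature
# selection never sees the held-out fold.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

Because the selector is refit inside each fold, this avoids the leakage that would occur if features were selected on the full data before cross-validating the classifier alone.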
Parameters:
steps : list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
Note: steps is a list of (name, transform) tuples. The last tuple holds the estimator, that is, the type of model we train. The tuples before it are the transforms, which are applied to the data in the order listed.
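The requirement that intermediate steps implement fit and transform can be illustrated with a hand-written transformer. This is only a sketch: ColumnCenterer is a hypothetical class invented for this illustration, the tiny dataset is made up, and LogisticRegression merely stands in for any final estimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ColumnCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the column means learned in fit."""

    def fit(self, X, y=None):
        # Learn the statistics from the training data only.
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the learned statistics to any data passed in.
        return np.asarray(X) - self.mean_


# Made-up toy data for illustration.
X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
y = np.array([0, 0, 1, 1])

# Each entry is a (name, object) tuple; the last object is the estimator.
pipe = Pipeline([("center", ColumnCenterer()), ("clf", LogisticRegression())])
pipe.fit(X, y)

step_names = [name for name, _ in pipe.steps]
centered = pipe.named_steps["center"].transform(X)
print(step_names)
print(centered.mean(axis=0))
```

The final step only needs fit (plus predict or score if those are called on the pipeline), which is why the classifier here has no transform method of its own.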
Here is an example from the official site:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
print("X", X)
print("Y", y)
# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
print("anova_filter:", anova_filter)
clf = svm.SVC(kernel='linear')  # choose the model
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
# You can set the parameters using the names issued
# For instance, fit using a k of 10 in the SelectKBest
# and a parameter 'C' of the svm
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)  # '__' joins a step name to one of its parameters
print("anova_svm.named_steps:", anova_svm.named_steps)  # behaves like a dictionary
print("type(anova_svm)=", type(anova_svm))
prediction = anova_svm.predict(X)
score = anova_svm.score(X, y)
print("prediction=", prediction, type(prediction))
print("score=", score)
The output, recorded under Python 2 with an older scikit-learn release, is as follows:
X [[-2.70323229 0.67787532 -0.65407568 ..., 0.18958162 0.50109417
2.41185611]
[-0.30777823 0.21915033 0.24938368 ..., 0.64548418 0.74625357
1.33408391]
[-0.25737654 -1.66858407 0.39922312 ..., 0.61351797 0.12003133
-0.22989455]
...,
[-0.01530985 0.5792915 0.11958037 ..., -1.47891157 0.39180401
0.21434039]
[-1.33123295 -1.83620537 0.50799133 ..., 0.95670232 0.70810868
-2.14387014]
[-1.31183623 -1.06511366 -0.3052247 ..., 0.55781031 1.39020755
-1.58909265]]
Y [1 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1
0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1
0 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 0 1 1]
anova_filter: SelectKBest(k=5, score_func=<function f_regression at 0xaa05e9c>)
anova_svm.named_steps: {'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='linear', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False), 'anova': SelectKBest(k=10, score_func=<function f_regression at 0xaa05e9c>)}
type(anova_svm)= <class 'sklearn.pipeline.Pipeline'>
prediction= [0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1
0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 0 0
1 0 1 1 0 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1] <type 'numpy.ndarray'>
score= 0.77
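The example above uses several methods:
set_params(**params) Sets parameters of the steps, each addressed as <step name>__<parameter name>.
predict(*args, **kwargs) Applies transforms to the data, and the predict method of the final estimator. In other words, it returns the predictions.
score(*args, **kwargs) Applies transforms to the data, and the score method of the final estimator. In other words, it scores the final result.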
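What predict and score delegate to can be checked by running the steps by hand. The sketch below assumes a pipeline shaped like the one in the example, but with f_classif as the scoring function and the parameters passed directly to the constructors; those choices are mine, not the article's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([("anova", SelectKBest(f_classif, k=10)),
                 ("svc", SVC(kernel="linear", C=0.1))])
pipe.fit(X, y)

# By hand: transform with the intermediate step, then call predict
# on the final estimator.
X_selected = pipe.named_steps["anova"].transform(X)
manual_prediction = pipe.named_steps["svc"].predict(X_selected)

# The pipeline does the same thing in one call.
pipeline_prediction = pipe.predict(X)
same = np.array_equal(manual_prediction, pipeline_prediction)

# score delegates likewise; for a classifier it is the mean accuracy.
manual_score = np.mean(pipeline_prediction == y)
print(same, np.isclose(manual_score, pipe.score(X, y)))
```

Note that scoring on the training data, as here and in the example above, only shows how the calls chain together; it is not an estimate of generalization performance.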