Python interface of LIBLINEAR
2015-09-30 10:48
555 views
Copied from the LIBLINEAR README
Table of Contents
Introduction
Installation
Quick Start
Design Description
Data Structures
Utility Functions
Additional Information
Introduction
Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBLINEAR, a library
for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/liblinear). The
interface is very easy to use as the usage is the same as that of LIBLINEAR. The
interface is developed with the built-in Python library “ctypes.”
Installation
On Unix systems, type
make
The interface needs only LIBLINEAR shared library, which is generated by
the above command. We assume that the shared library is on the LIBLINEAR
main directory or in the system path.
For Windows, the shared library liblinear.dll is ready in the directory
`..\windows'. You can also copy it to the system directory (e.g.,
`C:\WINDOWS\system32\' for Windows XP). To regenerate the shared library,
please follow the instruction of building windows binaries in LIBLINEAR README.
Quick Start
There are two levels of usage. The high-level one uses utility functions
in liblinearutil.py and the usage is the same as the LIBLINEAR MATLAB interface.
from liblinearutil import *
Read data in LIBSVM format
Default input format: the leftmost column is the label; the remaining
columns are features, where the number before each colon is the feature
index and the number after it is the feature value.
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1 8:0.358779 9:-1 10:-0.483871 12:-1 13:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962 5:-0.383562 6:-1 7:-1 8:0.0687023 9:-1 10:-0.903226 11:-1 12:-1 13:1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1 8:-0.480916 9:1 10:-0.935484 12:-0.333333 13:1
-1 1:0.875 2:-1 3:-0.333333 4:-0.509434 5:-0.347032 6:-1 7:1 8:-0.236641 9:1 10:-0.935484 11:-1 12:-0.333333 13:-1
-1 1:0.5 2:1 3:1 4:-0.509434 5:-0.767123 6:-1 7:-1 8:0.0534351 9:-1 10:-0.870968 11:-1 12:-1 13:1
y, x = svm_read_problem('../heart_scale')
m = train(y[:200], x[:200], '-c 4')
p_label, p_acc, p_val = predict(y[200:], x[200:], m)
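For reference, the LIBSVM line format shown above can be parsed with plain Python. The following is a simplified, illustrative sketch of what svm_read_problem does per line, not the library's actual implementation:

```python
# Simplified, illustrative parser for one line of LIBSVM-format data.
# Each line is: <label> <index>:<value> <index>:<value> ...
def read_libsvm_line(line):
    tokens = line.split()
    label = float(tokens[0])          # leftmost column is the label
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":")     # feature index before the colon
        features[int(idx)] = float(val)
    return label, features

label, feats = read_libsvm_line("+1 1:0.708333 2:1 12:1 13:-1")
# label == 1.0, feats == {1: 0.708333, 2: 1.0, 12: 1.0, 13: -1.0}
```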
Construct a problem in Python format
Dense data:
y, x = [1,-1], [[1,0,1], [-1,0,-1]]
Sparse data:
y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
prob = problem(y, x)
param = parameter('-s 0 -c 4 -B 1')
m = train(prob, param)
Other utility functions (saving and loading models)
save_model('heart_scale.model', m)
m = load_model('heart_scale.model')
p_label, p_acc, p_val = predict(y, x, m, '-b 1')
ACC, MSE, SCC = evaluations(y, p_label)
Getting online help
help(train)
The low-level use directly calls C interfaces imported by liblinear.py. Note that
all arguments and return values are in ctypes format. You need to handle them
carefully.
from liblinear import *
prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
param = parameter('-c 4')
m = liblinear.train(prob, param) # m is a ctypes pointer to a model
Convert a Python-format instance to feature_nodearray, a ctypes structure
x0, max_idx = gen_feature_nodearray({1:1, 3:1})
label = liblinear.predict(m, x0)
Design Description
There are two files, liblinear.py and liblinearutil.py, which respectively correspond to
low-level and high-level use of the interface.
In liblinear.py, we adopt the Python built-in library “ctypes,” so that
Python can directly access C structures and interface functions defined
in linear.h.
While advanced users can use structures/functions in liblinear.py, to
avoid handling ctypes structures, in liblinearutil.py we provide some easy-to-use
functions. The usage is similar to LIBLINEAR MATLAB interface.
Data Structures
Three data structures derived from linear.h are feature_node, problem, and
parameter. They all contain fields with the same names in
linear.h. Access these fields carefully because you directly use a C structure
instead of a Python object. The following description introduces additional
fields and methods.
Before using the data structures, execute the following command to load the
LIBLINEAR shared library:
>>> from liblinear import *
class feature_node:
Construct a feature_node.
node = feature_node(idx, val)
idx: an integer indicating the feature index.
val: a float indicating the feature value.
Show the index and the value of a node.
print(node)
Function: gen_feature_nodearray(xi [,feature_max=None [,issparse=True]])
Generate a feature vector from a Python list/tuple or a dictionary:
xi, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})
xi: the returned feature_nodearray (a ctypes structure)
max_idx: the maximal feature index of xi
>>> xi
<liblinear.feature_node_Array_5 object at 0x7f468b74e050>
>>> max_idx
5
issparse: if issparse == True, zero feature values are removed. The default
value is True for sparsity.
feature_max: if feature_max is assigned, features with indices larger than
feature_max are removed.
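To make these two options concrete, here is a hypothetical pure-Python sketch of the filtering behavior just described. gen_nodes is an illustrative helper, not the real gen_feature_nodearray, which returns a ctypes array:

```python
# Illustrative sketch of gen_feature_nodearray's filtering semantics
# (assumed behavior; the real function builds a ctypes structure).
def gen_nodes(xi, feature_max=None, issparse=True):
    # xi: a dict mapping feature index -> value
    items = sorted(xi.items())
    if feature_max is not None:
        # drop features with indices larger than feature_max
        items = [(i, v) for i, v in items if i <= feature_max]
    if issparse:
        # drop zero-valued features
        items = [(i, v) for i, v in items if v != 0]
    max_idx = max((i for i, _ in items), default=0)
    return items, max_idx

nodes, max_idx = gen_nodes({1: 1, 3: 0, 5: -2})
# nodes == [(1, 1), (5, -2)], max_idx == 5
```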
class problem:
Construct a problem instance
prob = problem(y, x [,bias=-1])
y: a Python list/tuple of l labels (type must be int/double).
x: a Python list/tuple of l data instances. Each element of x must be
an instance of list/tuple/dictionary type.
bias: if bias >= 0, instance x becomes [x; bias]; if bias < 0, no bias term
is added (default -1).
You can also modify the bias value by
prob.set_bias(1)
Note that if your x contains sparse data (i.e., dictionary), the internal
ctypes data format is still sparse.
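The bias semantics above can be sketched in plain Python. This is an illustrative helper under the assumption that the bias is appended as one extra feature after the last feature index; the real problem class builds ctypes structures instead:

```python
# Illustrative sketch (not liblinear code): with bias >= 0, each sparse
# instance x effectively becomes [x; bias], i.e. one extra feature whose
# value is the bias, at index n_features + 1 (assumed placement).
def append_bias(x, n_features, bias):
    # x: dict of feature index -> value (sparse instance)
    if bias < 0:
        return dict(x)                # no bias term added
    augmented = dict(x)
    augmented[n_features + 1] = bias  # bias appended as an extra feature
    return augmented

aug = append_bias({1: 0.5, 3: -1.0}, n_features=3, bias=1)
# aug == {1: 0.5, 3: -1.0, 4: 1}
```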
class parameter:
Construct a parameter instance
param = parameter('training_options')
If 'training_options' is empty, LIBLINEAR default values are applied.
Set param to LIBLINEAR default values.
param.set_to_default_values()
Parse a string of options.
param.parse_options('training_options')
Show values of parameters.
print(param)
class model:
There are two ways to obtain an instance of model:
model_ = train(y, x)
model_ = load_model('model_file_name')
Note that the returned structure of the interface functions
liblinear.train and liblinear.load_model is a ctypes pointer to a
model, which is different from the model object returned
by train and load_model in liblinearutil.py. We provide a
function toPyModel for the conversion:
model_ptr = liblinear.train(prob, param)
model_ = toPyModel(model_ptr)
If you obtain a model in a way other than the above approaches,
handle it carefully to avoid memory leak or segmentation fault.
Some interface functions to access LIBLINEAR models are wrapped as
members of the class model:
nr_feature = model_.get_nr_feature()
nr_class = model_.get_nr_class()
class_labels = model_.get_labels()
is_prob_model = model_.is_probability_model()
is_regression_model = model_.is_regression_model()
The decision function is W*x + b, where
W is an nr_class-by-nr_feature matrix, and
b is a vector of size nr_class.
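The decision rule just described can be sketched in plain Python as follows. W, b, and x here are made-up dense values for illustration; the actual coefficients live inside the ctypes model structure and are reached through the accessors below:

```python
# Sketch of the decision function W*x + b: W is nr_class-by-nr_feature,
# b has nr_class entries, and the predicted class is the one with the
# largest decision value.
def decision_values(W, b, x):
    return [sum(w_kj * x_j for w_kj, x_j in zip(w_k, x)) + b_k
            for w_k, b_k in zip(W, b)]

W = [[1.0, -1.0], [-0.5, 2.0]]   # 2 classes, 2 features (made-up numbers)
b = [0.1, -0.2]
vals = decision_values(W, b, [1.0, 1.0])
pred = max(range(len(vals)), key=vals.__getitem__)
# vals ≈ [0.1, 1.3]; pred == 1
```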
To access W_kj (i.e., coefficient for the k-th class and the j-th feature)
and b_k (i.e., bias for the k-th class), use the following functions.
W_kj = model_.get_decfun_coef(feat_idx=j, label_idx=k)
b_k = model_.get_decfun_bias(label_idx=k)
We also provide a function to extract w_k (i.e., the k-th row of W) and
b_k directly as follows.
[w_k, b_k] = model_.get_decfun(label_idx=k)
Note that w_k is a Python list of length nr_feature, which means that
w_k[0] = W_k1.
For regression models, W is just a vector of length nr_feature. Either
set label_idx=0 or omit the label_idx parameter to access the coefficients.
W_j = model_.get_decfun_coef(feat_idx=j)
b = model_.get_decfun_bias()
[W, b] = model_.get_decfun()
Note that in get_decfun_coef, get_decfun_bias, and get_decfun, feat_idx
starts from 1, while label_idx starts from 0. If label_idx is not in the
valid range (0 to nr_class-1), then a NaN will be returned; and if feat_idx
is not in the valid range (1 to nr_feature), then a zero value will be
returned. For regression models, label_idx is ignored.
Utility Functions
To use utility functions, type
>>> from liblinearutil import *
The above command loads
train() : train a linear model
predict() : predict testing data
svm_read_problem() : read the data from a LIBSVM-format file.
load_model() : load a LIBLINEAR model.
save_model() : save model to a file.
evaluations() : evaluate prediction results.
Function: train
There are three ways to call train()
model = train(y, x [, 'training_options'])
model = train(prob [, 'training_options'])
model = train(prob, param)
y: a list/tuple of l training labels (type must be int/double).
x: a list/tuple of l training instances. The feature vector of
each training instance is an instance of list/tuple or dictionary.
training_options: a string in the same form as that for LIBLINEAR command
mode.
prob: a problem instance generated by calling
problem(y, x).
param: a parameter instance generated by calling
parameter('training_options')
model: the returned model instance. See linear.h for details of this
structure. If '-v' is specified, cross validation is
conducted and the returned model is just a scalar: cross-validation
accuracy for classification and mean squared error for regression.
If the '-C' option is specified, the best parameter C is found
by cross validation. The returned model is a tuple of the best C
and the corresponding cross-validation accuracy. The parameter
selection utility is supported only by -s 0 and -s 2.
To train the same data many times with different
parameters, the second and third ways should be faster.
Examples:
y, x = svm_read_problem('../heart_scale')
prob = problem(y, x)
param = parameter('-s 3 -c 5 -q')
m = train(y, x, '-c 5')
m = train(prob, '-w1 5 -c 5')
m = train(prob, param)
CV_ACC = train(y, x, '-v 3')
best_C, best_rate = train(y, x, '-C -s 0')
m = train(y, x, '-c {0} -s 0'.format(best_C)) # use the same solver: -s 0
Function: predict
To predict testing data with a model, use
p_labs, p_acc, p_vals = predict(y, x, model [, 'predicting_options'])
y: a list/tuple of l true labels (type must be int/double). It is used
for calculating the accuracy. Use [] if true labels are
unavailable.
x: a list/tuple of l predicting instances. The feature vector of
each predicting instance is an instance of list/tuple or dictionary.
predicting_options: a string of predicting options in the same format as
that of LIBLINEAR.
model: a model instance.
p_labels: a list of predicted labels
p_acc: a tuple including accuracy (for classification), mean
squared error, and squared correlation coefficient (for
regression).
p_vals: a list of decision values or probability estimates (if '-b 1'
is specified). If k is the number of classes, for decision values,
each element includes results of predicting k binary-class
SVMs. If k = 2 and the solver is not MCSVM_CS, only one decision value
is returned. For probabilities, each element contains k values
indicating the probability that the testing instance is in each class.
Note that the order of classes here is the same as the 'model.label'
field in the model structure.
Example:
m = train(y, x, '-c 5')
p_labels, p_acc, p_vals = predict(y, x, m)
Functions: svm_read_problem/load_model/save_model
See the usage by examples:
y, x = svm_read_problem('data.txt')
m = load_model('model_file')
save_model('model_file', m)
Function: evaluations
Calculate some evaluations using the true values (ty) and predicted
values (pv):
(ACC, MSE, SCC) = evaluations(ty, pv)
ty: a list of true values.
pv: a list of predicted values.
ACC: accuracy.
MSE: mean squared error.
SCC: squared correlation coefficient.
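For clarity, the three quantities above can be computed directly in Python. This is an illustrative re-implementation following the definitions just given, not the library's evaluations function; accuracy is assumed to be reported as a percentage, matching LIBLINEAR's convention:

```python
# Illustrative sketch of evaluations(ty, pv): accuracy (as a percentage),
# mean squared error, and squared correlation coefficient (Pearson r^2).
def evaluations_sketch(ty, pv):
    n = len(ty)
    ACC = 100.0 * sum(t == p for t, p in zip(ty, pv)) / n
    MSE = sum((t - p) ** 2 for t, p in zip(ty, pv)) / n
    st, sp = sum(ty), sum(pv)
    stt = sum(t * t for t in ty)
    spp = sum(p * p for p in pv)
    stp = sum(t * p for t, p in zip(ty, pv))
    SCC = ((n * stp - st * sp) ** 2) / ((n * stt - st * st) * (n * spp - sp * sp))
    return ACC, MSE, SCC

ACC, MSE, SCC = evaluations_sketch([1, -1, 1, -1], [1, 1, 1, -1])
# ACC == 75.0, MSE == 1.0, SCC == 1/3
```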
Additional Information
This interface was written by Hsiang-Fu Yu from the Department of Computer
Science, National Taiwan University. If you find this tool useful, please
cite LIBLINEAR as follows:
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear
For any question, please contact Chih-Jen Lin cjlin@csie.ntu.edu.tw,
or check the FAQ page:
http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html