Simple libFM example, part 1 (how to use libFM)
2015-05-02 12:24
I often get emails from people who ask how to run libFM and have trouble understanding the whole pipeline. If you are versed in machine learning, a good first step is to take a look at Steffen Rendle’s paper ‘Factorization Machines’ and this one too: ‘Factorization Machines with libFM’.
I’ll try to explain how to train different kinds of models with the 4 learning algorithms that libFM provides, and how to use libFM features such as grouping and relations.
But first, here is a toy example of what each file should look like. (It was originally posted in the libFM Google group.)
A simple example with 2 users and 3 items: the training set contains 2 users and 3 items, and we want to test on the same 2 users but with 4 items (the same 3 from training plus one new item).
Each user has a categorical feature age [“18-25”, “26-40”, “40-60”] and each item has a numerical feature price.
I one-hot encoded the users:
0 is User1
1 is User2
Same thing for items,
2 is Item1,
3 is Item2,
4 is Item3,
5 is Item4
The categorical feature age needs to be one-hot encoded too:
6 is the category “18-25”,
7 is the category “26-40”,
8 is the category “40-60”
And finally the numerical feature price for the item:
9 will represent the price feature
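The index assignment above can be sketched in a few lines of Python. This is only an illustration: the helper name build_index and the order (users, then items, then age categories, then price) are my assumptions, chosen to reproduce the numbering listed above. libFM itself only sees the integers.

```python
# Hypothetical helper: hand out consecutive feature indices, starting at 0,
# in the order users, items, age categories, price.
users = ["User1", "User2"]
items = ["Item1", "Item2", "Item3", "Item4"]
ages = ["18-25", "26-40", "40-60"]


def build_index(users, items, ages):
    index = {}
    for name in [*users, *items, *ages, "price"]:
        # The next free index is simply the number of names seen so far.
        index[name] = len(index)
    return index


feature_index = build_index(users, items, ages)
for name, idx in feature_index.items():
    print(idx, "is", name)
```

Keeping this mapping around is useful later, because the same indices have to be used for the training file and the test file.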
One sample can be:
5 0:1 3:1 6:1 9:20
# User1, who is 23 years old, gives a rating of 5 to Item2, which costs 20 euros
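As a rough sketch, a sample like this one can be produced from a rating and a dictionary of active features. The function name to_libfm_line is hypothetical, and sorting the indices is my own choice for readability.

```python
def to_libfm_line(rating, features):
    """Format one sample as '<target> <index>:<value> ...'.

    `features` maps a feature index to its value; features that are
    zero are simply left out (the format is sparse).
    """
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items())]
    return " ".join([f"{rating:g}", *parts])


# User1 (index 0), Item2 (index 3), age 18-25 (index 6), price 20 (index 9)
line = to_libfm_line(5, {0: 1, 3: 1, 6: 1, 9: 20})
print(line)  # 5 0:1 3:1 6:1 9:20
```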
We can then construct more examples and create a training set and a test set.
train.libfm
5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20
num_features = 10 # the highest feature index (here 9, the item price) + 1, because indices start at 0
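If you want to double-check that number, here is a minimal sketch that scans the lines of a file and returns the highest index plus one. The function name num_features is hypothetical and the lines are the training samples from this post.

```python
train_lines = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
]


def num_features(lines):
    """Return (highest feature index + 1) over all samples."""
    highest = -1
    for line in lines:
        # The first token is the target; the rest are index:value pairs.
        for token in line.split()[1:]:
            idx = int(token.split(":")[0])
            highest = max(highest, idx)
    return highest + 1


print(num_features(train_lines))  # 10
```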
test.libfm
0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1
num_features = 10 # the highest feature index (here 9, the item price) + 1, because indices start at 0
For the test set, I have two samples for which I want predictions. The 0 target has no real effect at test time: it is only useful if you know the true value, in which case libFM reports the RMSE on the test set, but it is never used to train the model.
Just to be sure, here is the meaning of those two test samples:
0 1:1 4:1 8:1 9:78
# Here User2, who is 41 years old, is rating Item3, which costs 78 euros; we put a rating of 0 because we don’t know the real rating yet
0 0:1 5:1 6:1
# We want to know which rating User1, who is 23 years old, will give to the not-yet-seen Item4, whose price we don’t know
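To read such lines back, a small decoder can help. This is just a sketch: the NAMES list and the decode function are my own, and they assume the index assignment used in this post.

```python
# Position in this list == feature index used in this post.
NAMES = [
    "User1", "User2",
    "Item1", "Item2", "Item3", "Item4",
    "age 18-25", "age 26-40", "age 40-60",
    "price",
]


def decode(line):
    """Turn '<target> idx:val ...' into (target, {feature name: value})."""
    target, *tokens = line.split()
    features = {}
    for token in tokens:
        idx, val = token.split(":")
        features[NAMES[int(idx)]] = float(val)
    return float(target), features


print(decode("0 1:1 4:1 8:1 9:78"))
print(decode("0 0:1 5:1 6:1"))
```

Note that the second sample has no price entry at all: a feature that is left out of the line is treated as 0.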
This is the same format that libSVM uses.
At this point you have two files: train.libfm and test.libfm (the extension doesn’t matter).
You can then run libFM like this for regression (predicting ratings):
./libfm -task r -method mcmc -train train.libfm -test test.libfm -iter 10 -dim '1,1,2' -out output.libfm
So the model was trained using MCMC (-method mcmc) for 10 iterations (-iter 10), with a global bias, linear (one-way) terms, and pairwise interactions factorized with 2 latent factors (-dim '1,1,2').
libFM will print some output to the command line, and the predictions will be written to the file ‘output.libfm’.
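If you prefer to drive this from Python, here is a sketch. The flags are exactly the ones from the command above; the helper names libfm_command and read_predictions are hypothetical, and I am assuming the output file holds one prediction per line, in the same order as the test file.

```python
import shlex


def libfm_command(train, test, out, iterations=10, dim="1,1,2",
                  method="mcmc", task="r", binary="./libfm"):
    """Build the argument list for the libFM call shown above."""
    return [
        binary,
        "-task", task,
        "-method", method,
        "-train", train,
        "-test", test,
        "-iter", str(iterations),
        "-dim", dim,
        "-out", out,
    ]


def read_predictions(path):
    """Assumption: one predicted value per line, in test-file order."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


cmd = libfm_command("train.libfm", "test.libfm", "output.libfm")
print(shlex.join(cmd))
# To actually run it (needs the libfm binary in the current directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
#   predictions = read_predictions("output.libfm")
```

Passing the arguments as a list avoids any shell quoting problems with the -dim value.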
Discussions
This is of course a toy example, but it shows what you can use in libFM to train your model.
I wouldn’t recommend using the price feature as-is; a transformation such as a log would avoid having a feature with large values. But I hope you get the point.
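One possible way to do that transformation is sketched below. Using log(1 + price) rather than a plain log is my own choice here (it keeps a price of 0 well defined), not something libFM requires, and the function name log_price is hypothetical.

```python
import math


def log_price(price):
    """Compress large prices; log1p(x) = log(1 + x), so a price of 0 maps to 0."""
    return math.log1p(price)


# Price feature (index 9) for the three prices used in the toy data.
for price in (12.5, 20, 78):
    print(f"9:{log_price(price):.4f}")
```

The same transformation has to be applied to both the training file and the test file.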