Deal with relational data using libFM with blocks
Original: https://thierrysilbermann.wordpress.com/2015/09/17/deal-with-relational-data-using-libfm-with-blocks/
September 17, 2015, ThierryS
This post answers the question "[Example] Files for Block Structure".
There is a short explanation in the README doc (libFM 1.42 Manual).
Here is a quick explanation, in case you don't want to read this whole blog post.
I'll reuse the toy dataset from my previous blog post. Look at that post for the meaning of each feature.
train.libfm
5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20
and test.libfm
0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1
I'll merge them into a single file, which makes the whole process easier:
dataset.libfm
5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20
0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1
To use the block structure, we first need these 5 files:
rel_user.libfm (features 0-1 and 6-8 are user features)
0 0:1 6:1
0 1:1 8:1
There is no need to keep the feature ids split into two ranges (0-1 and 6-8). We can renumber them so that they are contiguous (0-1 stays 0-1, and 6-8 becomes 2-4):
0 0:1 2:1
0 1:1 4:1
rel_product.libfm (features 2-5 and 9 are product features). In the same way, we can renumber from:
0 2:1 9:12.5
0 3:1 9:20
0 4:1 9:78
0 5:1
to
0 0:1 4:12.5
0 1:1 4:20
0 2:1 4:78
0 3:1
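The renumbering above can be sketched in a few lines of Python. This is only an illustration, not code from libFM: the helper name build_block_rows is hypothetical, and it assumes that block rows are kept in order of first appearance in dataset.libfm.

```python
def build_block_rows(lines, feature_ids):
    """Return the unique block rows for feature_ids, with ids renumbered from 0."""
    # Map each original feature id to its position in the sorted id list.
    new_id = {old: new for new, old in enumerate(sorted(feature_ids))}
    rows = []
    for line in lines:
        pairs = [p.split(":") for p in line.split()[1:]]  # drop the target column
        kept = sorted((new_id[int(i)], v) for i, v in pairs if int(i) in new_id)
        # The target of a block row is a dummy 0, as in the files above.
        row = " ".join(["0"] + [f"{i}:{v}" for i, v in kept])
        if row not in rows:  # keep one row per distinct user (or product)
            rows.append(row)
    return rows


dataset = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
    "0 1:1 4:1 8:1 9:78",
    "0 0:1 5:1 6:1",
]
user_rows = build_block_rows(dataset, {0, 1, 6, 7, 8})
product_rows = build_block_rows(dataset, {2, 3, 4, 5, 9})
print("\n".join(user_rows))
print("\n".join(product_rows))
```

The feature values are kept as text, so 12.5 is written back exactly as it was read.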
rel_user.train (the mapping from each training row to a line of rel_user.libfm: the first 3 rows all point to the first line of rel_user.libfm. /!\ Indexing is 0-based.)
0
0
0
1
1
rel_product.train (the same kind of mapping, for the products)
0
1
2
0
1
y.train, which contains only the ratings
5
5
4
1
1
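As a rough sketch, the three training files can be derived from dataset.libfm like this. The helper name block_mapping is hypothetical, and the indices only match the block files if their rows are listed in order of first appearance in the merged dataset, which is the case for the files shown above.

```python
def block_mapping(lines, feature_ids):
    """For each line, return the 0-based index of its row in the block file."""
    seen = {}  # block feature signature -> row index, in order of first appearance
    mapping = []
    for line in lines:
        pairs = line.split()[1:]  # drop the target column
        key = tuple(p for p in pairs if int(p.split(":")[0]) in feature_ids)
        mapping.append(seen.setdefault(key, len(seen)))
    return mapping


dataset = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
    "0 1:1 4:1 8:1 9:78",
    "0 0:1 5:1 6:1",
]
n_train = 5  # the first 5 rows of the merged file are the training rows

# Index over the merged dataset, then keep only the training part.
rel_user_train = block_mapping(dataset, {0, 1, 6, 7, 8})[:n_train]
rel_product_train = block_mapping(dataset, {2, 3, 4, 5, 9})[:n_train]
y_train = [line.split()[0] for line in dataset[:n_train]]

print(rel_user_train, rel_product_train, y_train)
```

Writing each list to its file, one value per line, gives rel_user.train, rel_product.train and y.train.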
Almost done…
Now you need to create the .x and .xt files for the user block and the product block. For this, use the tools that ship with libFM; they are in /bin/ once you have compiled libFM.
./bin/convert --ifile rel_user.libfm --ofilex rel_user.x --ofiley rel_user.y
You have to pass the --ofiley flag even though rel_user.y is never used. You can delete that file afterwards.
and then
./bin/transpose --ifile rel_user.x --ofile rel_user.xt
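If you have several blocks, the two commands can be generated instead of typed by hand. The sketch below only uses the flags shown above; the function names block_commands and prepare_blocks and the default ./bin location are assumptions for illustration.

```python
import subprocess


def block_commands(relation, bin_dir="./bin"):
    """Return the convert and transpose command lines for one relation file."""
    convert = [f"{bin_dir}/convert", "--ifile", f"{relation}.libfm",
               "--ofilex", f"{relation}.x", "--ofiley", f"{relation}.y"]
    transpose = [f"{bin_dir}/transpose", "--ifile", f"{relation}.x",
                 "--ofile", f"{relation}.xt"]
    return [convert, transpose]


def prepare_blocks(relations, bin_dir="./bin"):
    """Run convert, then transpose, for every relation; stop at the first failure."""
    for relation in relations:
        for cmd in block_commands(relation, bin_dir):
            subprocess.run(cmd, check=True)


# Show the commands without running them.
for cmd in block_commands("rel_product"):
    print(" ".join(cmd))
```

Calling prepare_blocks(["rel_user", "rel_product"]) would run all four commands, provided the libFM binaries have been compiled.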
Now do the same thing for the test set. Because we merged the train and test datasets at the beginning, the block files already cover the test rows, so we only need to generate rel_user.test, rel_product.test and y.test.
At this point, you will have a lot of files: (rel_user.train, rel_user.test, rel_user.x, rel_user.xt, rel_product.train, rel_product.test, rel_product.x, rel_product.xt, y.train, y.test)
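With that many files it is easy to forget one or to end up with mapping files of the wrong length. Here is a small sanity check, as a sketch only: the function names are hypothetical, and the check assumes one value per line in the mapping and target files.

```python
from pathlib import Path


def expected_files(relations):
    """List the file names used in this walkthrough for the given relation names."""
    names = ["y.train", "y.test"]
    for rel in relations:
        names += [f"{rel}.{ext}" for ext in ("train", "test", "x", "xt")]
    return names


def check_block_files(directory, relations):
    """Return a list of problems; an empty list means the layout looks consistent."""
    directory = Path(directory)
    problems = [f"missing {name}" for name in expected_files(relations)
                if not (directory / name).exists()]
    if problems:
        return problems
    for split in ("train", "test"):
        n_targets = len((directory / f"y.{split}").read_text().split())
        for rel in relations:
            # Each mapping file must have exactly one index per target.
            n_rows = len((directory / f"{rel}.{split}").read_text().split())
            if n_rows != n_targets:
                problems.append(
                    f"{rel}.{split} has {n_rows} lines, y.{split} has {n_targets}")
    return problems
```

For example, check_block_files(".", ["rel_user", "rel_product"]) should return an empty list before you launch libFM.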
And run the whole thing:
./bin/libFM -task r -train y.train -test y.test --relation rel_user,rel_product -out output
It’s a bit overkill for this problem but I hope you get the point.
Now a real example
For this example, I'll use the MovieLens ml-1m.zip dataset (1 million ratings), which you can download from the GroupLens website.
ratings.dat (sample) / Format: UserID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
movies.dat (sample) / Format: MovieID::Title::Genres
1::Toy Story (1995)::Animation|Children’s|Comedy
2::Jumanji (1995)::Adventure|Children’s|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
users.dat (sample) / Format: UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
I'll create 3 different models:
Model 1: the simplest libFM files, trained without blocks. Features: UserID, MovieID
Model 2: regular libFM files, trained without blocks. Features: UserID, MovieID, Gender, Age, Occupation, Genre (of movie)
Model 3: libFM files trained with blocks. Same features as model 2: UserID, MovieID, Gender, Age, Occupation, Genre (of movie)
Models 1 and 2 can be created with a short script.
You end up with two files, model1.libfm and model2.libfm. You then just need to split each of them in two to create a training set and a test set, which I'll call train_m1.libfm and test_m1.libfm (and likewise train_m2.libfm and test_m2.libfm for model 2).
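As an illustration of the model 1 step, here is a minimal sketch, not the script used for this post. It assumes one-hot features with movie columns placed after user columns, and a random 80/20 split; the function names, the test fraction and the seed are all assumptions.

```python
import random


def ratings_to_libfm(lines):
    """Convert 'UserID::MovieID::Rating::Timestamp' lines to libFM lines.

    Each user and each movie gets one indicator feature; movie columns
    start right after the last user column.
    """
    rows = [line.strip().split("::") for line in lines if line.strip()]
    users = sorted({r[0] for r in rows}, key=int)
    movies = sorted({r[1] for r in rows}, key=int)
    user_col = {u: i for i, u in enumerate(users)}
    movie_col = {m: len(users) + i for i, m in enumerate(movies)}
    return [f"{rating} {user_col[u]}:1 {movie_col[m]}:1" for u, m, rating, _ in rows]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


sample = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "1::914::3::978301968",
    "1::3408::4::978300275",
    "1::2355::5::978824291",
]
model1 = ratings_to_libfm(sample)
train_m1, test_m1 = split_train_test(model1)
print("\n".join(model1))
```

On the real data you would read the lines from ratings.dat and write train_m1 and test_m1 to train_m1.libfm and test_m1.libfm, one row per line. Model 2 would add the user and movie attributes as extra columns in the same way.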
Then you just run libFM like this:
./libFM -train train_m1.libfm -test test_m1.libfm -task r -iter 20 -method mcmc -dim '1,1,8' -out output_m1
But I guess you already know how to do that.
Now the interesting part, using blocks.
[TODO]