
Deal with relational data using libFM with blocks

2016-01-04 17:41, 716 views

September 17, 2015, ThierryS

An answer to this question: [Example] Files for Block Structure

There is a quick explanation in the README doc here: libFM1.42 Manual

Here is a quick explanation, in case you don’t want to read this whole blog post.

I’ll reuse the toy dataset from this previous blog post; refer to it for the meaning of each feature. First, train.libfm:


5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20

and test.libfm

0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1

I’ll merge them into a single file, which makes the whole process easier:


5 0:1 2:1 6:1 9:12.5
5 0:1 3:1 6:1 9:20
4 0:1 4:1 6:1 9:78
1 1:1 2:1 8:1 9:12.5
1 1:1 3:1 8:1 9:20
0 1:1 4:1 8:1 9:78
0 0:1 5:1 6:1

To use the block structure, we first need these 5 files:

rel_user.libfm (features 0-1 and 6-8 are user features)

0 0:1 6:1
0 1:1 8:1

In fact, the feature ids do not have to stay split across ranges like that (0-1, 6-8). We can renumber them contiguously (0-1 -> 0-1 and 6-8 -> 2-4):

0 0:1 2:1
0 1:1 4:1
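The renumbering above can be sketched in a few lines of Python. This is only an illustration, not part of libFM: the helper names `parse_libfm_line` and `compress_ids` are made up for this post, and the list of ids belonging to the block is assumed to be known in advance (here 0, 1, 6, 7, 8 for the user block).

```python
def parse_libfm_line(line):
    """Split 'target id:value id:value ...' into (target, [(id, value), ...])."""
    target, *pairs = line.split()
    features = []
    for pair in pairs:
        fid, value = pair.split(":")
        features.append((int(fid), value))  # value stays text, so 12.5 is not reformatted
    return target, features


def compress_ids(lines, block_ids):
    """Renumber the feature ids of one block to 0..n-1, keeping their order."""
    mapping = {old: new for new, old in enumerate(sorted(block_ids))}
    out = []
    for line in lines:
        target, features = parse_libfm_line(line)
        pairs = " ".join(f"{mapping[fid]}:{value}" for fid, value in features)
        out.append(f"{target} {pairs}".strip())
    return out


# User block of the toy dataset, with the original (non-contiguous) ids.
user_rows = ["0 0:1 6:1", "0 1:1 8:1"]
compressed_users = compress_ids(user_rows, [0, 1, 6, 7, 8])
print("\n".join(compressed_users))
```

Note that id 7 is passed in even though no row uses it; that is what makes 8 become 4, as in the listing above, rather than 3.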

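rel_product.libfm (features 2-5 and 9 are product features). The same renumbering applies (2-5 -> 0-3 and 9 -> 4); the first listing below uses the original ids and the second the renumbered ones: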

0 2:1 9:12.5
0 3:1 9:20
0 4:1 9:78
0 5:1


0 0:1 4:12.5
0 1:1 4:20
0 2:1 4:78
0 3:1

rel_user.train (this is now the mapping: the first 3 lines point to the first line of rel_user.libfm. /!\ Indexing is 0-based)


rel_product.train (which is now the mapping)


y.train, which contains only the ratings
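The block rows, the mappings and the targets can all be derived mechanically from the merged file. The sketch below does this in memory for the toy dataset; it is an illustration under a few assumptions: the helper names (`block_part`, `build_block`) are invented here, block rows are numbered in order of first appearance, and the first 5 merged rows are taken as the training cases.

```python
USER_IDS = {0, 1, 6, 7, 8}      # feature ids that describe the user
PRODUCT_IDS = {2, 3, 4, 5, 9}   # feature ids that describe the product
N_TRAIN = 5                     # the first 5 merged rows are the training cases

merged = [
    "5 0:1 2:1 6:1 9:12.5",
    "5 0:1 3:1 6:1 9:20",
    "4 0:1 4:1 6:1 9:78",
    "1 1:1 2:1 8:1 9:12.5",
    "1 1:1 3:1 8:1 9:20",
    "0 1:1 4:1 8:1 9:78",
    "0 0:1 5:1 6:1",
]


def block_part(line, block_ids):
    """Keep only the id:value pairs of a libFM row that belong to one block."""
    pairs = line.split()[1:]  # drop the target
    return " ".join(p for p in pairs if int(p.split(":")[0]) in block_ids)


def build_block(lines, block_ids):
    """Return (unique block rows, 0-based block row index for every case)."""
    rows, index_of, mapping = [], {}, []
    for line in lines:
        part = block_part(line, block_ids)
        if part not in index_of:
            index_of[part] = len(rows)
            rows.append(part)
        mapping.append(index_of[part])
    return rows, mapping


user_rows, user_map = build_block(merged, USER_IDS)
product_rows, product_map = build_block(merged, PRODUCT_IDS)
targets = [line.split()[0] for line in merged]

print("rel_user.train   ", user_map[:N_TRAIN])
print("rel_product.train", product_map[:N_TRAIN])
print("y.train          ", targets[:N_TRAIN])
print("rel_user.test    ", user_map[N_TRAIN:])
print("rel_product.test ", product_map[N_TRAIN:])
print("y.test           ", targets[N_TRAIN:])
```

Each list would then be written to its file with one value per line, and each entry of `user_rows` and `product_rows` prefixed with a dummy target of 0 to form the block .libfm files (still with the original feature ids).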


Almost done…

Now you need to create the .x and .xt files for the user block and the product block. For this you need the tools that come with libFM and end up in /bin/ once you compile it.

./bin/convert --ifile rel_user.libfm --ofilex rel_user.x --ofiley rel_user.y

You are forced to pass the --ofiley flag even though rel_user.y will never be used. You can delete that file every time.

and then

./bin/transpose --ifile rel_user.x --ofile rel_user.xt
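Since the same two commands have to be repeated for every block, a small Python wrapper can help. The sketch below only builds and prints the command lines; `block_commands` and `run_block` are names invented for this post, and it assumes the binaries live in ./bin and take the double-dash flags shown above.

```python
import shlex
import subprocess


def block_commands(block, bin_dir="./bin"):
    """Build the convert and transpose command lines for one block."""
    convert = [f"{bin_dir}/convert", "--ifile", f"{block}.libfm",
               "--ofilex", f"{block}.x", "--ofiley", f"{block}.y"]
    transpose = [f"{bin_dir}/transpose", "--ifile", f"{block}.x",
                 "--ofile", f"{block}.xt"]
    return [convert, transpose]


def run_block(block, bin_dir="./bin"):
    """Run both tools for one block; raises if either exits with an error."""
    for cmd in block_commands(block, bin_dir):
        subprocess.run(cmd, check=True)


# Print what would be executed for the two blocks of the toy example.
for block in ("rel_user", "rel_product"):
    for cmd in block_commands(block):
        print(shlex.join(cmd))
```

Calling `run_block("rel_user")` and `run_block("rel_product")` would then execute the commands, provided libFM has been compiled.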

Now do the same for the test set. Because we merged the train and test datasets at the beginning, the block files are shared, so we only need to generate rel_user.test, rel_product.test and y.test.

At this point, you will have a lot of files: rel_user.train, rel_user.test, rel_user.x, rel_user.xt, rel_product.train, rel_product.test, rel_product.x, rel_product.xt, y.train, y.test
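With this many files it is easy to get one out of sync. A quick sanity check, assuming (as the mappings above suggest) that every mapping file must have exactly one 0-based index per line of the matching y file, could look like this in Python. The function name `check_split` is made up for this post, and the demo writes the toy training files into a temporary directory.

```python
import tempfile
from pathlib import Path


def read_values(path):
    """Read a one-value-per-line file into a list of strings."""
    return Path(path).read_text().split()


def check_split(directory, split, blocks):
    """Raise ValueError if a mapping file does not line up with y.<split>."""
    directory = Path(directory)
    n_cases = len(read_values(directory / f"y.{split}"))
    for block in blocks:
        mapping = read_values(directory / f"{block}.{split}")
        if len(mapping) != n_cases:
            raise ValueError(
                f"{block}.{split} has {len(mapping)} lines, y.{split} has {n_cases}")
        if not all(value.isdigit() for value in mapping):
            raise ValueError(f"{block}.{split} must contain non-negative integers")
    return n_cases


# Demo on the toy training files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "y.train").write_text("5\n5\n4\n1\n1\n")
    (tmp / "rel_user.train").write_text("0\n0\n0\n1\n1\n")
    (tmp / "rel_product.train").write_text("0\n1\n2\n0\n1\n")
    n_train = check_split(tmp, "train", ["rel_user", "rel_product"])

print(n_train, "training cases, mappings are consistent")
```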

And run the whole thing:

./bin/libFM -task r -train y.train -test y.test --relation rel_user,rel_product -out output

It’s a bit overkill for this problem but I hope you get the point.

Now a real example

For this example, I’ll use the ml-1m.zip MovieLens dataset that you can get from here (1 million ratings)

ratings.dat (sample) / Format: UserID::MovieID::Rating::Timestamp


movies.dat (sample) / Format: MovieID::Title::Genres

1::Toy Story (1995)::Animation|Children’s|Comedy
2::Jumanji (1995)::Adventure|Children’s|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama

users.dat (sample) / Format: UserID::Gender::Age::Occupation::Zip-code


I’ll create 3 different models.

1. The simplest libFM files, trained without blocks. Features: UserID, MovieID

2. Regular libFM files, trained without blocks. Features: UserID, MovieID, Gender, Age, Occupation, Genre (of movie)

3. libFM files trained with blocks. Same features: UserID, MovieID, Gender, Age, Occupation, Genre (of movie)

Model 1 and 2 can be created using the following code:

So you end up with two files, model1.libfm and model2.libfm. You just need to split each of them in two to create a training set and a test set, which I’ll call train_m1.libfm and test_m1.libfm (same thing for model 2: train_m2.libfm and test_m2.libfm).
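As a rough sketch of what this involves for model 1 (an illustration, not the original script), the Python below turns lines in the ratings.dat format into libFM lines with one indicator feature per user and per movie, then splits them. The function names, the 80/20 random split and the three sample lines are assumptions made for the example.

```python
import random


def to_libfm(rating_lines):
    """Convert 'UserID::MovieID::Rating::Timestamp' lines to libFM lines.

    Feature ids are handed out in order of first appearance, shared between
    users and movies, so no two entities ever get the same id.
    """
    feature_id = {}

    def fid(kind, raw):
        return feature_id.setdefault((kind, raw), len(feature_id))

    out = []
    for line in rating_lines:
        user, movie, rating, _timestamp = line.strip().split("::")
        out.append(f"{rating} {fid('user', user)}:1 {fid('movie', movie)}:1")
    return out, feature_id


def split_train_test(lines, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and cut off the first part as the test set."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative lines in the ratings.dat format.
sample = ["1::1193::5::978300760", "1::661::3::978302109", "2::1193::4::978298413"]
rows, feature_id = to_libfm(sample)
train_rows, test_rows = split_train_test(rows)
print("\n".join(rows))
```

On the real file you would read ratings.dat line by line, write `train_rows` to train_m1.libfm and `test_rows` to test_m1.libfm. Model 2 follows the same pattern with extra id:value pairs for gender, age, occupation and genres looked up from users.dat and movies.dat.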

Then you just run libFM like this:

./libFM -train train_m1.libfm -test test_m1.libfm -task r -iter 20 -method mcmc -dim '1,1,8' -out output_m1

But I guess you already know how to do those.

Now the interesting part, using blocks.
