python-recsys 3: Data model

2017-04-02 20:09
The pyrecsys data model consists of users, items, and the interactions between them.

3.0 Items

Create an item from an Id plus some metadata:

from recsys.datamodel.item import Item

ITEMID = 'a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432'
item = Item(ITEMID)
# ...plus any other info you'd like to add
name = 'U2'
genres = ['rock', 'irish', '80s']
popularity = 8.89
item.add_data({'name': name, 'genres': genres, 'popularity': popularity})
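Conceptually, an item is an identifier plus free-form metadata. The plain-Python stand-in below is a hypothetical sketch of that idea only. The class name SimpleItem, its attributes, and the assumption that repeated add_data calls merge keys are illustrative and do not reflect python-recsys's actual implementation.

```python
class SimpleItem:
    """Hypothetical stand-in for an item: an id plus a metadata dict.

    NOT python-recsys's Item class; it only mirrors the usage shown above.
    """

    def __init__(self, item_id):
        self.item_id = item_id
        self.data = {}

    def add_data(self, data):
        # Assumption for this sketch: new metadata is merged into what is stored
        self.data.update(data)

    def get_data(self):
        return self.data


item = SimpleItem('a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432')
item.add_data({'name': 'U2', 'genres': ['rock', 'irish', '80s']})
item.add_data({'popularity': 8.89})
print(item.get_data()['popularity'])  # 8.89
```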

Read a file and build a dictionary of items. The example below reads the MovieLens 1M movies file (movies.dat):

from recsys.datamodel.item import Item

# Read movie info
def read_items(filename):
    items = dict()
    with open(filename) as f:
        for line in f:
            # Line format: 1::Toy Story (1995)::Animation|Children's|Comedy
            data = line.strip('\r\n').split('::')
            item_id = int(data[0])
            item_name = data[1]
            genres = data[2].split('|')
            item = Item(item_id)
            item.add_data({'name': item_name, 'genres': genres})
            items[item_id] = item
    return items

# Call it!
filename = './data/movielens/movies.dat'
items = read_items(filename)
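The parsing step inside read_items can be checked without downloading the dataset. This sketch applies the same "::" and "|" splitting to the sample line quoted in the comment. The helper name parse_movie_line is hypothetical. When reading the real movies.dat under Python 3, passing encoding='latin-1' to open() may be needed, since the file is generally reported not to be UTF-8.

```python
def parse_movie_line(line):
    """Split one movies.dat line of the form MovieID::Title::Genre1|Genre2|..."""
    movie_id, title, genres = line.strip('\r\n').split('::')
    return int(movie_id), title, genres.split('|')


# The sample line quoted in the read_items comment
sample_lines = ["1::Toy Story (1995)::Animation|Children's|Comedy\n"]

movies = {}
for line in sample_lines:
    movie_id, title, genres = parse_movie_line(line)
    movies[movie_id] = {'name': title, 'genres': genres}

print(movies[1]['genres'])  # ['Animation', "Children's", 'Comedy']
```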

3.1 Users

Create a user with a given Id:

from recsys.datamodel.user import User

USERID = 'ocelma'
user = User(USERID)

Link a user to an item, together with the interaction value (rating, number of plays, views, etc.):

from recsys.datamodel.user import User
from recsys.datamodel.item import Item

ITEMID = 'a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432'
item = Item(ITEMID)
# ...plus any other info you'd like to add
name = 'U2'
item.add_data({'name': name})

USERID = 'ocelma'
PLAYS = 256
user = User(USERID)
user.add_item(ITEMID, PLAYS) #Instead of PLAYS, one can add the classical [1..5] stars (rating)

3.2 Data

The Data class manages the "users rate items" information.

Load a user's data by adding tuples to a Data object:

from recsys.datamodel.data import Data

data = Data()
for PLAYS, ITEMID in user.get_items():
    data.add_tuple((PLAYS, ITEMID, user.get_id()))  # Tuple format is: <value, row, column>
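One way to picture <value, row, column> tuples: each one is a single non-zero cell of a sparse item-by-user matrix. The sketch below stores such tuples in a plain dict keyed by (row, column). Apart from the 256 plays taken from the example above, the sample tuples are made up, and this is not how the Data class stores them internally.

```python
from collections import defaultdict

# Hypothetical tuples in the <value, row, column> layout used above:
# value = play count, row = item id, column = user id.
tuples = [
    (256, 'u2', 'ocelma'),
    (12, 'radiohead', 'ocelma'),
    (40, 'u2', 'alice'),
]

matrix = {}                     # (row, column) -> value, i.e. a sparse matrix
rows_of_col = defaultdict(set)  # column (user) -> rows (items) that user touched
for value, row, col in tuples:
    matrix[(row, col)] = value  # in this sketch a repeated cell keeps the last value
    rows_of_col[col].add(row)

print(matrix[('u2', 'alice')])        # 40
print(sorted(rows_of_col['ocelma']))  # ['radiohead', 'u2']
```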

Load a training/test dataset from a file. The example below reads the MovieLens 1M ratings file (ratings.dat):

from recsys.datamodel.data import Data

filename = './data/movielens/ratings.dat'

data = Data()
format = {'col':0, 'row':1, 'value':2, 'ids': 'int'}
# About format parameter:
#   'row': 1 -> Rows in matrix come from column 1 in ratings.dat file
#   'col': 0 -> Cols in matrix come from column 0 in ratings.dat file
#   'value': 2 -> Values (Mij) in matrix come from column 2 in ratings.dat file
#   'ids': 'int' -> Ids (row and col ids) are integers (not strings)
data.load(filename, sep='::', format=format)
train, test = data.split_train_test(percent=80) # 80% train, 20% test
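The format dictionary only says which field of each line goes where. This stdlib sketch parses a few made-up lines in the ratings.dat layout (UserID::MovieID::Rating::Timestamp) into <value, row, column> tuples using the same mapping, then performs an 80/20 split. The helpers parse_rating_line and split_train_test are hypothetical. In particular, shuffling with a fixed seed before cutting is an assumption, not necessarily what Data.split_train_test does.

```python
import random

FORMAT = {'col': 0, 'row': 1, 'value': 2, 'ids': 'int'}


def parse_rating_line(line, fmt, sep='::'):
    """Turn one ratings.dat-style line into a (value, row, col) tuple using fmt."""
    fields = line.strip('\r\n').split(sep)
    row, col = fields[fmt['row']], fields[fmt['col']]
    if fmt.get('ids') == 'int':
        row, col = int(row), int(col)
    return float(fields[fmt['value']]), row, col  # value stored as float here


def split_train_test(tuples, percent=80, seed=0):
    """Shuffle a copy of the tuples, then cut it into percent% train and the rest test."""
    shuffled = list(tuples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * percent // 100
    return shuffled[:cut], shuffled[cut:]


# Made-up lines in the UserID::MovieID::Rating::Timestamp layout
lines = [
    '1::10::5::0\n',
    '1::20::3::0\n',
    '2::10::4::0\n',
    '2::30::1::0\n',
    '3::20::2::0\n',
]
data = [parse_rating_line(line, FORMAT) for line in lines]
train, test = split_train_test(data, percent=80)
print(data[0])                # (5.0, 10, 1)
print(len(train), len(test))  # 4 1
```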


Iterate over the test dataset:

for rating, item_id, user_id in test:
    pass  # Do something, like evaluating how well we can predict the ratings in this test dataset
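A common way to fill in that loop is to compare predicted and actual ratings with root-mean-square error (RMSE). The sketch below uses a deliberately naive placeholder predictor, the mean training rating, together with made-up tuples. It is not a python-recsys model, only an illustration of the evaluation pattern.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Hypothetical (rating, item_id, user_id) tuples, same layout as the loop above
train = [(5.0, 10, 1), (3.0, 20, 1), (4.0, 10, 2), (2.0, 20, 3)]
test = [(4.0, 30, 2), (2.0, 10, 3)]

# Placeholder predictor: always predict the mean training rating
global_mean = sum(r for r, _, _ in train) / len(train)

pairs = []
for rating, item_id, user_id in test:
    # A real model would use item_id and user_id to make its prediction
    pairs.append((global_mean, rating))

print(global_mean)            # 3.5
print(round(rmse(pairs), 3))  # 1.118
```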

The test dataset can also be indexed as if it were a list:

test[3]

The Data class can also save its contents to disk:

data.save(FILENAME)

It can even load and save the data in pickle format:

data.load_pickle(FILENAME)
data.save_pickle(FILENAME)
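The on-disk layout used by save and save_pickle is not described here. As a generic illustration of a pickle round trip, this stdlib-only sketch writes a list of made-up tuples to a temporary file and reads it back. It does not reproduce the Data class's own file format. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical (value, row, col) tuples to persist
tuples = [(5.0, 10, 1), (3.0, 20, 1), (4.0, 10, 2)]

# Write to a temporary file in binary mode, read it back, then clean up
fd, path = tempfile.mkstemp(suffix='.pkl')
try:
    with os.fdopen(fd, 'wb') as f:
        pickle.dump(tuples, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)
finally:
    os.remove(path)

print(restored == tuples)  # True
```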