
[Programming Collective Intelligence] Making Recommendations

2016-04-18 00:00
1. Collaborative Filtering


2. Finding Similar Users

Dataset

critics = {
'Lisa Rose':
{'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},

'Gene Seymour':
{'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},

'Michael Phillips':
{'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0, 'Superman Returns': 3.5, 'The Night Listener': 4.0},

'Claudia Puig':
{'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},

'Mick LaSalle':
{'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},

'Jack Matthews':
{'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},

'Toby':
{'Snakes on a Plane':4.5, 'You, Me and Dupree':1.0, 'Superman Returns':4.0}
}


Critic     Lady in the Water   Snakes on a Plane   Just My Luck   Superman Returns   You, Me and Dupree   The Night Listener
Rose       2.5                 3.5                 3.0            3.5                2.5                  3.0
Seymour    3.0                 3.5                 1.5            5.0                3.5                  3.0
Phillips   2.5                 3.0                 -              3.5                -                    4.0
Puig       -                   3.5                 3.0            4.0                2.5                  4.5
LaSalle    3.0                 4.0                 2.0            3.0                2.0                  3.0
Matthews   3.0                 4.0                 -              5.0                3.5                  3.0
Toby       ?                   4.5                 ?              4.0                1.0                  ?

(- = not rated; ? = rating to be predicted for Toby)
Euclidean Distance

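>>> from math import sqrt

>>> sqrt(pow(x1 - x2, 2) + pow(y1 - y2, 2))   # distance between points (x1, y1) and (x2, y2)

>>> 1 / (1 + sqrt(pow(x1 - x2, 2) + pow(y1 - y2, 2)))   # normalized to the range (0, 1]; 1 means identical ratings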
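As a quick check of the normalization, here is a minimal sketch that scores two critics plotted on two movies. The helper name distance_score and the choice of axes (Snakes on a Plane, You, Me and Dupree) are illustrative assumptions, not part of the original listing; the ratings come from the dataset above.

```python
from math import sqrt

def distance_score(p, q):
    """Map the Euclidean distance between two rating points to a score in (0, 1]."""
    dist = sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return 1 / (1 + dist)

# Ratings as (Snakes on a Plane, You, Me and Dupree)
toby = (4.5, 1.0)
lasalle = (4.0, 2.0)

print(distance_score(toby, lasalle))  # closer to 1 means more similar tastes
print(distance_score(toby, toby))     # identical ratings give exactly 1.0
```

Adding 1 to the distance before inverting avoids a division by zero when two people give identical ratings.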

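def sim_distance(prefs, person1, person2):
    # Find the movies that both person1 and person2 have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1

    # No movies in common: similarity is 0
    if len(si) == 0:
        return 0

    # Sum the squared rating differences over the shared movies
    sum_of_squares = 0.0
    for item in prefs[person1]:
        if item in prefs[person2]:
            sum_of_squares += pow(prefs[person1][item] - prefs[person2][item], 2)

    # Normalize to (0, 1]. Note: this uses the squared distance as written;
    # wrap sum_of_squares in sqrt() to match the formula above exactly.
    return 1 / (1 + sum_of_squares)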
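Note that sim_distance as listed divides by 1 + sum_of_squares, while the formula above divides by 1 plus the square root of that sum. The sketch below computes both variants for two critics from the dataset so the difference is visible. The helper name euclidean_scores and the two-critic subset are assumptions made for illustration.

```python
from math import sqrt

ratings = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                  'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
                     'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

def euclidean_scores(prefs, a, b):
    """Return (score without sqrt, score with sqrt) over the movies both people rated."""
    shared = prefs[a].keys() & prefs[b].keys()
    if not shared:
        return 0.0, 0.0
    sum_sq = sum((prefs[a][m] - prefs[b][m]) ** 2 for m in shared)
    return 1 / (1 + sum_sq), 1 / (1 + sqrt(sum_sq))

squared, rooted = euclidean_scores(ratings, 'Lisa Rose', 'Gene Seymour')
print(round(squared, 4), round(rooted, 4))
```

Both variants rank pairs of critics in the same order, since each decreases as the distance grows; only the absolute scores differ.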

Pearson Correlation

http://lobert.iteye.com/blog/2024999

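from math import sqrt

def sim_pearson(prefs, p1, p2):
    # Find the movies that both p1 and p2 have rated
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1

    # No movies in common: no evidence of similarity, so return 0
    n = len(si)
    if n == 0:
        return 0

    # Sums, sums of squares, and sum of products over the shared movies
    sum1 = 0.0
    sum2 = 0.0
    sum1Sq = 0.0
    sum2Sq = 0.0
    pSum = 0.0
    for it in si:
        sum1 += prefs[p1][it]
        sum2 += prefs[p2][it]
        sum1Sq += pow(prefs[p1][it], 2)
        sum2Sq += pow(prefs[p2][it], 2)
        pSum += prefs[p1][it] * prefs[p2][it]

    # Pearson correlation coefficient
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))

    # Zero variance in either set of ratings: correlation is undefined, return 0
    if den == 0:
        return 0

    return num / den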
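The sum-based formula in sim_pearson is algebraically the same as dividing the covariance of the two rating lists by the product of their spreads. A minimal sketch of that mean-centered form follows; the function name pearson and the flat lists are assumptions for illustration, with ratings taken from Lisa Rose and Gene Seymour in the dataset (same movie order in both lists).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists (mean-centered form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    return cov / den if den else 0.0

# Order: Lady, Snakes, Luck, Superman, Dupree, Night
rose = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
seymour = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]

r = pearson(rose, seymour)
# A critic who rates everything one point higher is still equally correlated
r_inflated = pearson(rose, [y + 1.0 for y in seymour])
print(round(r, 4), round(r_inflated, 4))
```

Unlike the Euclidean score, the correlation is unchanged when one critic is consistently harsher or more generous than the other, which is the usual reason to prefer it for rating data.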

Recommending Items

Recommendations for Toby:

Compute the similarity between Toby and every other user (using sim_distance or sim_pearson).

def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simSums = {}

    for other in prefs:
        # Don't compare the person to themselves
        if other == person:
            continue

        # Similarity between person and each other user
        sim = similarity(prefs, person, other)

        # Ignore users with zero or negative similarity
        if sim <= 0:
            continue

        for item in prefs[other]:
            # Only score movies the person has not rated yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Running sum of rating * similarity
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim

                # Running sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Weighted average (total / similarity sum), sorted by predicted score, highest first
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    rankings.sort()
    rankings.reverse()

    return rankings
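The first step inside getRecommendations, scoring every other critic against Toby, can be inspected on its own. The sketch below is an assumption-laden illustration: top_matches and the compact pearson helper are hypothetical names, not functions from the listing above, and the dataset is repeated so the block runs by itself.

```python
from math import sqrt

critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                  'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
                     'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
                     'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
}

def pearson(prefs, p1, p2):
    """Pearson correlation over the movies both people rated (0.0 if undefined)."""
    shared = [m for m in prefs[p1] if m in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[p1][m] for m in shared]
    ys = [prefs[p2][m] for m in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0

def top_matches(prefs, person, n=3):
    """Rank everyone else by similarity to person, most similar first."""
    scores = [(pearson(prefs, person, other), other) for other in prefs if other != person]
    return sorted(scores, reverse=True)[:n]

matches = top_matches(critics, 'Toby', n=6)
for score, name in matches:
    print(f'{name:18s} {score:5.2f}')
```

With this data the three closest critics are Lisa Rose, Mick LaSalle and Claudia Puig. Michael Phillips comes out at -1, so the sim <= 0 check in getRecommendations skips him entirely.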

Critic     Similarity   Night   sim * Night   Lady   sim * Lady   Luck   sim * Luck
Rose       0.99         3.0     0.99 * 3.0    2.5    0.99 * 2.5   3.0    0.99 * 3.0
Seymour    0.38         3.0     0.38 * 3.0    3.0    0.38 * 3.0   1.5    0.38 * 1.5
Puig       0.89         4.5     0.89 * 4.5    -      -            3.0    0.89 * 3.0
LaSalle    0.92         3.0     0.92 * 3.0    3.0    0.92 * 3.0   2.0    0.92 * 2.0
Matthews   0.66         3.0     0.66 * 3.0    3.0    0.66 * 3.0   -      -
Total                           12.89                8.38                8.07

Similarity sum (Night): 0.99 + 0.38 + 0.89 + 0.92 + 0.66 = 3.84
Similarity sum (Lady):  0.99 + 0.38 + 0.92 + 0.66 = 2.95
Similarity sum (Luck):  0.99 + 0.38 + 0.89 + 0.92 = 3.18

Total / similarity sum: Night 3.35, Lady 2.83, Luck 2.53
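The arithmetic in the table can be reproduced with a few lines. This is a minimal sketch that uses the rounded similarities from the table rather than calling the similarity function, so the names similarity, ratings and predicted_score are illustrative assumptions.

```python
# Rounded Pearson similarities to Toby, taken from the table
similarity = {'Rose': 0.99, 'Seymour': 0.38, 'Puig': 0.89, 'LaSalle': 0.92, 'Matthews': 0.66}

# Ratings for the three movies Toby has not seen (a missing critic means "not rated")
ratings = {
    'Night': {'Rose': 3.0, 'Seymour': 3.0, 'Puig': 4.5, 'LaSalle': 3.0, 'Matthews': 3.0},
    'Lady': {'Rose': 2.5, 'Seymour': 3.0, 'LaSalle': 3.0, 'Matthews': 3.0},
    'Luck': {'Rose': 3.0, 'Seymour': 1.5, 'Puig': 3.0, 'LaSalle': 2.0},
}

def predicted_score(movie):
    """Similarity-weighted average of the ratings from critics who rated the movie."""
    total = sum(similarity[critic] * rating for critic, rating in ratings[movie].items())
    sim_sum = sum(similarity[critic] for critic in ratings[movie])
    return total / sim_sum

for movie in ratings:
    print(movie, round(predicted_score(movie), 2))
```

Only critics who rated a movie contribute to its denominator, so Lady in the Water divides by 2.95 rather than 3.84. Dividing by the similarity sum keeps a movie from scoring higher simply because more critics reviewed it.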
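3. Item-Based Filtering

User-based collaborative filtering requires building the dataset from every user's complete set of ratings. This works fine for a few thousand users or items, but on a large site with millions of customers and products, comparing one user against every other user, and then comparing every product each of those users has rated, can be unacceptably slow. Likewise, on a site that sells millions of products, users' preferences may rarely overlap, which makes it hard to judge how similar two users are.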

With a large dataset, item-based collaborative filtering can produce better results, and it lets us perform much of the computation ahead of time.
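One way to picture the precomputation is to flip the dataset so it is keyed by movie, then build a table of similar items once, offline. The sketch below is an illustration under assumptions: transform_prefs, distance_score and build_item_table are hypothetical names, the score is the squared-distance variant, and only three critics and three movies from the dataset are used.

```python
def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    flipped = {}
    for person, person_ratings in prefs.items():
        for item, rating in person_ratings.items():
            flipped.setdefault(item, {})[person] = rating
    return flipped

def distance_score(prefs, a, b):
    """Score in (0, 1] from squared rating differences over shared keys; 0.0 if none."""
    shared = prefs[a].keys() & prefs[b].keys()
    if not shared:
        return 0.0
    sum_sq = sum((prefs[a][k] - prefs[b][k]) ** 2 for k in shared)
    return 1 / (1 + sum_sq)

def build_item_table(prefs):
    """For every item, list the other items sorted from most to least similar."""
    by_item = transform_prefs(prefs)
    table = {}
    for item in by_item:
        scores = [(distance_score(by_item, item, other), other)
                  for other in by_item if other != item]
        table[item] = sorted(scores, reverse=True)
    return table

sample = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5, 'Just My Luck': 3.0},
    'Gene Seymour': {'Snakes on a Plane': 3.5, 'Superman Returns': 5.0, 'Just My Luck': 1.5},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0},
}

item_table = build_item_table(sample)
print(item_table['Snakes on a Plane'])
```

Because relationships between items tend to change more slowly than relationships between users, a table like this could be rebuilt periodically and reused for many recommendation requests.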