Simple Product Recommendation

2017-08-21 23:52

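This simple product recommendation example only considers pairs of products. The goal is to find rules of the form: if a person buys product A, they are also likely to buy product B.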

First, load the dataset and look at the kind of data we will train on.

from __future__ import division  # must come before any other import
import numpy as np

dataSet_fileName = "affinity_dataset.txt"
X = np.loadtxt(dataSet_fileName)  # one row per transaction, one column per product
print(X[:5])

Taking the first five rows gives the following array.

[[ 0.  1.  0.  0.  0.]
 [ 1.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  1.]
 [ 1.  1.  0.  0.  0.]
 [ 0.  0.  1.  1.  1.]]

The columns of the data correspond to the following features.

features = ["bread", "milk", "cheese", "apples", "bananas"]

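Each row is one transaction record: a 1 means the item in that column was purchased and a 0 means it was not. Our goal is to use the known purchase combinations to infer, when a customer buys one item, which other items they are also likely to buy.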
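To make the encoding concrete, here is a minimal sketch that turns each row back into item names. It uses only the five rows printed above; `X_sample` and the helper `items_bought` are names assumed for illustration and are not part of the original code.

```python
import numpy as np

# The first five rows of the dataset, copied from the output shown above.
X_sample = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

features = ["bread", "milk", "cheese", "apples", "bananas"]


def items_bought(row):
    """Hypothetical helper: names of the items whose column holds a 1."""
    return [name for name, flag in zip(features, row) if flag == 1]


baskets = [items_bought(row) for row in X_sample]
for basket in baskets:
    print(", ".join(basket))
```

The first transaction contains only milk, while the last one contains cheese, apples and bananas.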

num_apple_purchases = 0
for example in X:
    if example[3] == 1:  # column 3 is "apples"
        num_apple_purchases += 1
print('{0} people bought apples'.format(num_apple_purchases))

We first initialize a counter for apple purchases, loop over every transaction, and then print how many people bought apples.
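The same count can also be obtained without an explicit loop. The sketch below is one possible NumPy formulation, run on the five sample rows shown earlier rather than on the full file; `X_sample` and `per_item_counts` are assumed names.

```python
import numpy as np

# The first five rows of the dataset, copied from the output shown earlier.
X_sample = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

# Column 3 is "apples": compare it with 1 and count the matches.
num_apple_purchases = int((X_sample[:, 3] == 1).sum())

# Summing down each column counts the purchases of every item at once.
per_item_counts = X_sample.sum(axis=0)

print('{0} people bought apples'.format(num_apple_purchases))
print(per_item_counts)
```

On these five rows only one transaction contains apples, so the counter is 1; on the full dataset the loop and the vectorized version should agree.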

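from collections import defaultdict as dfd

valid_rules = dfd(int)    # (premise, conclusion) -> times the rule held
invalid_rules = dfd(int)  # (premise, conclusion) -> times the rule failed
num_same = dfd(int)       # premise -> times the premise item was bought

features = ["bread", "milk", "cheese", "apples", "bananas"]
n_example, n_features = X.shape  # rows (transactions), columns (features)

for example in X:
    # Note: range(4) only uses the first four features as premises,
    # so no rule with bananas as the premise is produced.
    for premise in range(4):
        if example[premise] == 0:
            continue  # the premise item was not bought in this transaction
        num_same[premise] += 1

        for conclusion in range(n_features):
            if premise == conclusion:
                continue  # skip trivial rules such as bananas -> bananas

            if example[conclusion] == 1:
                valid_rules[(premise, conclusion)] += 1
            else:
                invalid_rules[(premise, conclusion)] += 1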

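The first argument to defaultdict is a factory function that supplies the default value for any key that does not exist yet. With int as the factory, a missing key starts at 0. The three dictionaries above count, respectively, how often a rule held (valid_rules), how often it failed (invalid_rules), and how often each premise occurred (num_same). The keys of the first two are (premise, conclusion) tuples, where each element is the index of a feature in the features list; num_same is keyed by the premise index alone.
Two nested loops iterate over each transaction and over each feature as a candidate premise. If the premise holds (its value is 1), the count for that premise is incremented by 1. X.shape returns the number of rows (transactions) and the number of columns (features).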

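To keep the results meaningful, we skip cases where the premise and the conclusion are the same, such as "customers who bought bananas also bought bananas".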

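If the conclusion also holds for the transaction, the valid count for the rule is incremented by 1; otherwise the invalid count is incremented by 1.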
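The role of the factory function is easiest to see next to a plain dict. This is a small standalone sketch; the keys are arbitrary (premise, conclusion) index pairs chosen for illustration, and `plain` and `raised` are assumed names.

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0, so missing keys start at 0
counts[(0, 1)] += 1        # no KeyError: the key is created as 0, then incremented
counts[(0, 1)] += 1
counts[(3, 4)] += 1

# A plain dict raises KeyError on the same operation.
plain = {}
try:
    plain[(0, 1)] += 1
    raised = False
except KeyError:
    raised = True

print(dict(counts))
print("plain dict raised KeyError: {0}".format(raised))
```

Because of this behavior, the counting loop never needs to check whether a rule has been seen before.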

support = valid_rules
confidence = dfd(float)
for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)
    # share of the transactions containing the premise that also contain the conclusion
    confidence[rule] = valid_rules[rule] / num_same[premise]

for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")

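The support of a rule is the number of times it held, so it is simply valid_rules. We then initialize a confidence dictionary. The keys of valid_rules are (premise, conclusion) tuples, so we use keys() to loop over every premise and conclusion pair. The confidence of a rule is the number of times it held divided by the number of transactions in which its premise occurred.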
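As a worked check of these two definitions, the sketch below computes support and confidence for a single rule on the five sample rows shown earlier. The helper `rule_stats` and the name `X_sample` are assumptions for illustration; the original code computes all rules at once with dictionaries instead.

```python
import numpy as np

# The first five rows of the dataset, copied from the output shown earlier.
X_sample = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
], dtype=float)


def rule_stats(X, premise, conclusion):
    """Hypothetical helper: (support, confidence) for premise -> conclusion."""
    # Keep only the transactions in which the premise item was bought.
    premise_rows = X[X[:, premise] == 1]
    # Support: how many of those transactions also contain the conclusion.
    support = int((premise_rows[:, conclusion] == 1).sum())
    if len(premise_rows) == 0:
        return 0, 0.0
    # Confidence: support divided by the number of premise occurrences.
    confidence = support / float(len(premise_rows))
    return support, confidence


# cheese (index 2) -> bananas (index 4)
print(rule_stats(X_sample, 2, 4))
# milk (index 1) -> bread (index 0)
print(rule_stats(X_sample, 1, 0))
```

In this sample, both cheese transactions also contain bananas, giving a support of 2 and a confidence of 1.0. Milk appears in three transactions and bread in two of them, giving a support of 2 and a confidence of 2/3.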

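Note: computing the confidence under Python 2 requires from __future__ import division, which enables true division. Without the import, / between two integers performs floor division: 1/4 evaluates to 0 without the import and to 0.25 with it.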
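The two kinds of division can be compared in a way that does not depend on the import. In this sketch `//` is always floor division, and converting one operand to float forces true division under both Python 2 and Python 3; the variable names are assumed for illustration.

```python
valid = 1
total = 4

floor_result = valid // total        # floor division, regardless of Python version
true_result = valid / float(total)   # true division, regardless of Python version

print(floor_result)
print(true_result)
```

The explicit float conversion is an alternative to the import when only one or two divisions are involved.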

The final results are as follows:

Rule: If a person buys bread they will also buy milk
- Confidence: 0.464
- Support: 13

Rule: If a person buys milk they will also buy cheese
- Confidence: 0.212
- Support: 11

Rule: If a person buys apples they will also buy cheese
- Confidence: 0.512
- Support: 22

Rule: If a person buys milk they will also buy apples
- Confidence: 0.346
- Support: 18

Rule: If a person buys apples they will also buy bread
- Confidence: 0.209
- Support: 9

Rule: If a person buys apples they will also buy milk
- Confidence: 0.419
- Support: 18

Rule: If a person buys milk they will also buy bananas
- Confidence: 0.519
- Support: 27

Rule: If a person buys cheese they will also buy bananas
- Confidence: 0.513
- Support: 20

Rule: If a person buys cheese they will also buy bread
- Confidence: 0.128
- Support: 5

Rule: If a person buys cheese they will also buy apples
- Confidence: 0.564
- Support: 22

Rule: If a person buys cheese they will also buy milk
- Confidence: 0.282
- Support: 11

Rule: If a person buys bread they will also buy bananas
- Confidence: 0.571
- Support: 16

Rule: If a person buys milk they will also buy bread
- Confidence: 0.250
- Support: 13

Rule: If a person buys bread they will also buy apples
- Confidence: 0.321
- Support: 9

Rule: If a person buys apples they will also buy bananas
- Confidence: 0.628
- Support: 27

Rule: If a person buys bread they will also buy cheese
- Confidence: 0.179
- Support: 5
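The rules above are printed in dictionary order, which makes the strongest ones hard to spot. One possible follow-up, sketched here as an assumption rather than as part of the original code, is to sort the rules by confidence. The dictionary below holds only four of the values copied from the output above, not the full result.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apples", "bananas"]

# A few (premise, conclusion) -> confidence pairs copied from the output above.
confidence = {
    (0, 1): 0.464,  # bread -> milk
    (3, 4): 0.628,  # apples -> bananas
    (2, 3): 0.564,  # cheese -> apples
    (2, 0): 0.128,  # cheese -> bread
}

# Sort the (rule, confidence) pairs by confidence, highest first.
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

for (premise, conclusion), value in sorted_confidence:
    print("Rule: If a person buys {0} they will also buy {1}".format(
        features[premise], features[conclusion]))
    print(" - Confidence: {0:.3f}".format(value))
```

Among these four rules, apples -> bananas ranks first and cheese -> bread ranks last.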
