Learning the Apriori Algorithm in Python
2016-03-02 09:10
"Machine Learning in Action" is a good book, and I have learned the Apriori algorithm from it. Here is a working piece of Python 3 code:
```python
# Load data (the toy data set is hard-coded; `path` is not used yet)
def loadDataSet(path=None):
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

''' ======== Frequent Set Searching ======== '''

# Create the size-1 candidate sets
def createC1(dataSet):
    C1 = []  # TODO: a set might be good enough here
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))

# Prune out all sets with support < minSupport
#   D - dataset (a list of transaction sets)
#   Ck - candidate sets
#   minSupport - threshold
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    # Measure support and prune
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData

# Create Ck by merging pairs of (k-1)-itemsets whose first k-2 items match
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # Sort before slicing: frozenset iteration order is not guaranteed
            L1 = sorted(Lk[i])[:k-2]  # [0,1] | [0,2] -> [0,1,2]
            L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList

def apriori(dataSet, minSupport=0.5):
    # Start from size 1
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k-2]) > 0:
        print('=Debug= Apriori Size of Last Level', len(L[k-2]))
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData

''' ======== Association Rule Searching ========
H: a list of items that could be on the right-hand side of a rule
'''

def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(set(freqSet - conseq), '-->', set(conseq), 'conf:', conf * 100, '%')
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > (m + 1):
        Hmp1 = aprioriGen(H, m + 1)  # Generate the consequents for the next level
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)  # Pruning: keep qualified rules
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)  # Continue to the next level

# L: list of frequent itemsets grouped by length (L[0] holds the size-1 sets)
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):  # start from itemsets of length 2
        for freqSet in L[i]:
            # Build from size 1 on the right-hand side: {0,1,2} -> [{0},{1},{2}]
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:
                # Length > 2: check single-item consequents first, then go level by level
                H1 = calcConf(freqSet, H1, supportData, bigRuleList, minConf)
                if len(H1) > 1:
                    rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                # Only 2 items: just prune (the base case)
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

if __name__ == '__main__':
    L, supportData = apriori(loadDataSet(), minSupport=0.5)
    rules = generateRules(L, supportData, minConf=0.7)
```
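The `aprioriGen` function above only performs the join step (merging two (k-1)-itemsets that share their first k-2 items). Textbook Apriori candidate generation usually adds a prune step as well: a candidate is dropped before counting if any of its (k-1)-subsets is infrequent, since a superset of an infrequent set cannot be frequent. The following is a minimal, self-contained sketch of the level-wise search with both steps, assuming the same four-transaction toy data. `apriori_frequent` is a hypothetical name used only for illustration, not part of the book's code.

```python
from itertools import combinations


def apriori_frequent(transactions, min_support=0.5):
    """Level-wise frequent-itemset search with join and prune steps.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    Items are assumed to be mutually comparable so they can be sorted.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        # Keep only the candidates whose support reaches min_support.
        result = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                result[cand] = sup
        return result

    # Level 1: every single item is a candidate.
    items = {item for t in transactions for item in t}
    level = count(frozenset([i]) for i in items)
    frequent = dict(level)
    k = 2
    while level:
        prev = sorted(sorted(s) for s in level)  # each itemset as a sorted list
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join step: merge two (k-1)-itemsets sharing the first k-2 items.
                if prev[i][:k - 2] == prev[j][:k - 2]:
                    cand = frozenset(prev[i]) | frozenset(prev[j])
                    # Prune step: every (k-1)-subset must itself be frequent.
                    if all(frozenset(sub) in level
                           for sub in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


if __name__ == "__main__":
    data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    found = apriori_frequent(data, 0.5)
    for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), found[itemset])
```

On the toy data with a minimum support of 0.5 this finds nine frequent itemsets, the largest being {2, 3, 5} with support 0.5. The prune step does not change the result, because counting discards infrequent candidates anyway; it only avoids scanning the data for candidates that are already known to fail.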
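`calcConf` scores a rule `A --> B` taken from a frequent set `F` (with `A = F - B`) as `supportData[F] / supportData[A]`. To make those numbers checkable by hand, here is a small sketch that recomputes support and confidence by direct counting on the same toy data and evaluates every single-item consequent of the frequent set {2, 3, 5}. The helpers `support` and `confidence` are illustrative names, not functions from the book.

```python
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A --> B) = support(A | B) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


if __name__ == "__main__":
    freq_set = {2, 3, 5}
    for conseq in sorted(freq_set):
        ante = freq_set - {conseq}
        conf = confidence(ante, {conseq}, transactions)
        print(sorted(ante), "-->", [conseq], round(conf, 3))
```

Both {3, 5} --> {2} and {2, 3} --> {5} come out at confidence 1.0, while {2, 5} --> {3} gives 0.5 / 0.75 ≈ 0.667, so with the default `minConf = 0.7` only the first two rules are kept.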