Big Data, MapReduce, Hadoop, and Spark with Python
2016-11-10 09:49
639 查看
此书不错,很短,且想打通PYTHON和大数据架构的关系。
先看一次,计划把这个文档作个翻译。
先来一个模拟MAPREDUCE的东东。。。
mapper.py
reducer.py
main.py
先看一次,计划把这个文档作个翻译。
先来一个模拟MAPREDUCE的东东。。。
mapper.py
class Mapper: def map(self, data): returnval = [] counts = {} for line in data: words = line.split() for w in words: counts[w] = counts.get(w, 0) + 1 for w, c in counts.iteritems(): returnval.append((w, c)) print "Mapper result:" print returnval return returnval
reducer.py
class Reducer: def reduce(self, d): returnval = [] for k, v in d.iteritems(): returnval.append("%s\t%s"%(k, sum(v))) print "Reducer result:" print returnval return returnval
main.py
from mapper import Mapper from reducer import Reducer class JobRunner: def run(self, Mapper, Reducer, data): # map mapper = Mapper() tuples = mapper.map(data) # combine combined = {} for k, v in tuples: if k not in combined: combined[k] = [] combined[k].append(v) print "combined result:" print combined # reduce reducer = Reducer() output = reducer.reduce(combined) # do something with output for line in output: print line runner = JobRunner() runner.run(Mapper, Reducer, open("input.txt"))
相关文章推荐
- Big Data(1): Hadoop, MapReduce and Python in Ubuntu
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 NonUniqueList
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 8 Common Friends
- MapReduce with MongoDB and Python
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter4 LeftOuterJoin
- 【问题】spark运行python写的mapreduce任务,hadoop平台报错,java.net.ConnectException: 连接超时
- 小书翻译完成,分享啦--《用Python操作大数据[MapReduceHadoop和Spark]》
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter1 Secondary Sort
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter6 MovingAverage
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter5 Order Inversion Pattern
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation Items
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 7 Market Basket Analysis
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 11 Smarter Email Marketing wit
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 13 k-Nearest Neighbors
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 12. K-Means Clustering
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 10 Content-Based Recommend
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation People
- MapReduce with MongoDB and Python[ZT]
- 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 List
- Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2