MapReduce patterns
2016-03-15 21:15
Having modified and run a job in the last post, we can now look at the patterns that come up most often in MapReduce programming.
Although there are many of them, I think that the most important ones are:
Summarization
Filtering
Structural
Let's examine them in detail.
Summarization
By summarization we mean all the jobs that perform a numerical computation over a set of data, such as:
indexing
mean (or other statistical functions) computation
min/max computation
count (we've seen the WordCount example)
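To make the summarization idea concrete, here is a minimal sketch in plain Python that imitates the map, shuffle and reduce steps in memory. It does not use the Hadoop API; the function names and the (station, temperature) sample data are assumptions made up for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs; the key is a hypothetical station name."""
    for station, temperature in records:
        yield station, temperature

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all the values of one key into summary statistics."""
    return key, {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

def summarize(records):
    """Run the three steps in sequence and return {key: statistics}."""
    groups = shuffle(map_phase(records))
    return dict(reduce_phase(key, values) for key, values in groups.items())

# Hypothetical sample data: (station, temperature) readings.
readings = [("rome", 20.0), ("oslo", 4.0), ("rome", 24.0), ("oslo", 6.0)]
summary = summarize(readings)
```

In a real job the reducer would receive the grouped values from the framework; the shuffle function here only stands in for that step.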
Filtering
Filtering is the act of retrieving only a subset of a larger dataset. The most common cases are retrieving all the data belonging to a single user, or the top-N elements of the dataset by some criterion. Another frequent use of filtering is sampling: when we're dealing with a lot of data, it is usually a good idea to pick a random subset of the original data and use it to verify the behaviour of our job.
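The three filtering cases mentioned above can be sketched in plain Python as follows. This is only an in-memory illustration, not Hadoop code: the record layout (a dict with "user" and "score"), the function names and the partitions list standing in for mapper input splits are all assumptions.

```python
import heapq
import random

def filter_by_user(records, user):
    """Map-only filter: keep a record only if it belongs to the given user."""
    return [r for r in records if r["user"] == user]

def top_n(partitions, n, key):
    """Each mapper keeps its local top N; a single reducer merges them."""
    local_tops = [heapq.nlargest(n, part, key=key) for part in partitions]
    merged = [r for top in local_tops for r in top]
    return heapq.nlargest(n, merged, key=key)

def sample(records, fraction, seed=0):
    """Map-only sampling: keep each record with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

# Hypothetical input, already split in two partitions (one per mapper).
partitions = [
    [{"user": "ann", "score": 5}, {"user": "bob", "score": 9}],
    [{"user": "ann", "score": 7}, {"user": "cid", "score": 1}],
]
all_records = [r for part in partitions for r in part]

ann_records = filter_by_user(all_records, "ann")
best_two = top_n(partitions, 2, key=lambda r: r["score"])
subset = sample(all_records, 0.5, seed=42)
```

Keeping a local top N in every mapper means the single reducer only has to look at N elements per mapper instead of the whole dataset.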
Structural
Structural patterns apply when you need to operate on the structure of the data; the most common case is a join across different datasets, like the ones we're used to in an RDBMS.
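As an illustration of the join case, here is a minimal sketch of a reduce-side inner join, again in plain Python rather than with the Hadoop API. The two datasets (users and orders), the tags and all function names are assumptions chosen for the example.

```python
from collections import defaultdict

def map_users(users):
    """Tag every user record so the reducer knows which dataset it is from."""
    for user_id, name in users:
        yield user_id, ("user", name)

def map_orders(orders):
    """Tag every order record in the same way, using the same join key."""
    for user_id, item in orders:
        yield user_id, ("order", item)

def reduce_join(tagged_values):
    """For one key, pair every user name with every order item."""
    names = [value for tag, value in tagged_values if tag == "user"]
    items = [value for tag, value in tagged_values if tag == "order"]
    return [(name, item) for name in names for item in items]

def join(users, orders):
    """Group the tagged records by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for key, tagged in list(map_users(users)) + list(map_orders(orders)):
        groups[key].append(tagged)
    result = []
    for key in sorted(groups):
        result.extend(reduce_join(groups[key]))
    return result

# Hypothetical sample data: (user_id, name) and (user_id, item).
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "cup")]
joined = join(users, orders)
```

The order for user 3 has no matching user, so it is dropped, as in an SQL inner join.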
In the next posts, we'll see in more detail how to deal with these patterns.
from: http://andreaiacono.blogspot.com/2014/03/mapreduce-patterns.html