Cloud Computing(3)_Basic MapReduce Algorithm Design_Pairs&Stripes
2017-03-09 22:28
How do we aggregate partial counts efficiently?
Pairs
The pairs algorithm illustrates the use of complex keys to coordinate distributed computations.
Each mapper takes a sentence
Reducers sum up counts associated with these pairs
// "pairs" approach
class MAPPER
    method MAP(docid a, doc d)
        for all term w ∈ doc d do
            for all term u ∈ NEIGHBORS(w) do
                EMIT(pair (w, u), count 1)    // emit count for each co-occurrence

class REDUCER
    method REDUCE(pair p, counts [c1, c2, ...])
        s = 0
        for all count c ∈ counts [c1, c2, ...] do
            s = s + c
        EMIT(pair p, count s)
For each term, emit pairs: ((a, b), 1). The key is the pair (a, b).
Pairs analysis (each record is short, but there are many of them)
Advantages
Easy to implement, easy to understand: map finds the pairs, reduce counts them
Disadvantages
Lots of pairs to sort and shuffle around; upper bound on distinct keys = O(n²) (with n distinct words there are at most n² distinct ordered pairs)
Not many opportunities for combiners to work
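The pairs pseudocode above can be sketched in plain Python to see what the reducers end up with. This is only an in-memory simulation, not Hadoop code: the names `pairs_map`, `pairs_reduce` and `run_pairs` are made up for illustration, and it assumes that the neighbors of a word are all the other words in the same sentence.

```python
from collections import defaultdict


def pairs_map(sentence):
    """Emit ((w, u), 1) for every co-occurrence in one sentence."""
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        # Assumption: NEIGHBORS(w) = every other token in the same sentence.
        for u in tokens[:i] + tokens[i + 1:]:
            yield (w, u), 1


def pairs_reduce(pair, counts):
    """Sum all partial counts that arrived for one pair."""
    return pair, sum(counts)


def run_pairs(sentences):
    # Simulated shuffle: group every emitted value under its (w, u) key.
    grouped = defaultdict(list)
    for sentence in sentences:
        for key, value in pairs_map(sentence):
            grouped[key].append(value)
    # One reduce call per distinct pair.
    return dict(pairs_reduce(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for pair, count in sorted(run_pairs(["a b c", "a b"]).items()):
        print(pair, count)
```

Every co-occurrence becomes its own record in `grouped`, which is the sort-and-shuffle cost listed under the disadvantages.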
Stripes
Co-occurrence information is first stored in an associative array, denoted H. The mapper emits key-value pairs with words as keys and corresponding associative arrays as values, where each associative array encodes the co-occurrence counts of the neighbors of a particular word.
Each mapper takes a sentence
Reducers perform element-wise sum of associative arrays
// "stripes" approach
class MAPPER
    method MAP(docid a, doc d)
        for all term w ∈ doc d do
            H = new ASSOCIATIVEARRAY
            for all term u ∈ NEIGHBORS(w) do
                H{u} = H{u} + 1    // tally words co-occurring with w
            EMIT(term w, stripe H)

class REDUCER
    method REDUCE(term w, stripes [H1, H2, H3, ...])
        Hf = new ASSOCIATIVEARRAY
        for all stripe H ∈ stripes [H1, H2, H3, ...] do
            SUM(Hf, H)
        EMIT(term w, stripe Hf)
For each term, emit stripes: a -> {b: 1, c: 2, d: 2, ….}. The key is the word "a".
Stripes analysis (each record is long, but there are few of them)
Advantages
Far less sorting and shuffling of key-value pairs
Can make better use of combiners
Disadvantages
More difficult to implement
Underlying object more heavyweight
Fundamental limitation in terms of size of event space
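A matching in-memory sketch of the stripes pseudocode, again in plain Python rather than Hadoop. The function names are hypothetical, `collections.Counter` stands in for the associative array H, and the neighbors of a word are assumed to be all the other words in the same sentence.

```python
from collections import Counter, defaultdict


def stripes_map(sentence):
    """Emit (w, H) once per term, where H tallies the neighbors of w."""
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        # Assumption: NEIGHBORS(w) = every other token in the same sentence.
        yield w, Counter(tokens[:i] + tokens[i + 1:])


def stripes_reduce(word, stripes):
    """Element-wise sum of all stripes that arrived for one word."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts key by key
    return word, total


def run_stripes(sentences):
    # Simulated shuffle: group the emitted stripes by their word key.
    grouped = defaultdict(list)
    for sentence in sentences:
        for word, stripe in stripes_map(sentence):
            grouped[word].append(stripe)
    reduced = (stripes_reduce(word, stripes) for word, stripes in grouped.items())
    return {word: dict(total) for word, total in reduced}


if __name__ == "__main__":
    for word, stripe in sorted(run_stripes(["a b c", "a b"]).items()):
        print(word, stripe)
```

Because `stripes_reduce` takes stripes and returns a stripe, the same element-wise sum could in principle also serve as a combiner, which is one way to read the "better use of combiners" advantage. The whole stripe for a word must fit in memory, which is the size limitation listed above.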
Pairs vs. Stripes
When the data volume is small and few processing resources are available, use pairs; otherwise, stripes is the better choice.
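To make the trade-off a bit more concrete, here is a rough sketch that counts how many intermediate records each approach would emit for the same input, plus the size of the largest stripe. The helper `count_intermediate_records` is hypothetical, and it assumes the same sentence-level neighbor definition as the pseudocode above; real numbers depend on the corpus and on the neighbor window.

```python
def count_intermediate_records(sentences):
    """Return (pairs_records, stripes_records, largest_stripe) for the input."""
    pairs_records = 0    # one ((w, u), 1) record per co-occurrence
    stripes_records = 0  # one (w, H) record per term occurrence
    largest_stripe = 0   # most distinct neighbors held in a single stripe
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens)):
            # Assumption: neighbors = every other token in the same sentence.
            neighbors = tokens[:i] + tokens[i + 1:]
            pairs_records += len(neighbors)
            stripes_records += 1
            largest_stripe = max(largest_stripe, len(set(neighbors)))
    return pairs_records, stripes_records, largest_stripe


if __name__ == "__main__":
    print(count_intermediate_records(["a b c", "a b"]))
```

For the two sentences "a b c" and "a b" this reports 8 pair records against 5 stripes, with at most 2 entries in any stripe. Pairs records grow with the square of the sentence length while stripes grow linearly, but each stripe gets heavier as the vocabulary grows.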