[LinkedIn] Find top 10 URLs / shared links (MapReduce)
2015-03-31 14:56
Given a large network of computers, each keeping log files of visited URLs, find the top ten most visited URLs.
Ans:
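We will mimic the phases of MapReduce:
1. Pre-processing: let R be the number of servers in the cluster, and give each server a unique id from 0, 1, 2, …, R-1.
2. (map) For each URL string in a server's log, send it to the server whose id is hash(string) % R, using the same hash function on every server. Every occurrence of a given URL therefore ends up on the same server.
3. (reduce) Once step 2 is done (detected with simple control communication), each server counts the strings it received and produces the (string, count) pairs of its top 10 strings. Note that the tuples counted here are the ones sent to this particular server in step 2, so each count is that URL's total across the whole network.
4. (map) Each server sends its top 10 to a single server (say, server 0). This is cheap: there are only 10*R such records.
5. (reduce) Server 0 yields the top 10 across the network. This is correct because every URL is counted on exactly one server, so any URL in the global top 10 must also be in the top 10 of the server that owns it.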
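The five steps above can be simulated in a single process to check the reasoning. This is only a minimal sketch, not a distributed implementation: the names `partition` and `top_urls` are made up for illustration, each "server" is just a Python list or `Counter`, and `zlib.crc32` stands in for the unspecified hash function (Python's built-in `hash()` for strings is randomized per process, so it would not agree across real machines).

```python
import heapq
import zlib
from collections import Counter


def partition(url, num_servers):
    """Step 2 routing rule: hash(string) % R, with a hash that is stable across processes."""
    return zlib.crc32(url.encode("utf-8")) % num_servers


def top_urls(logs, k=10):
    """logs[i] is the list of URLs recorded by server i (hypothetical input layout)."""
    num_servers = len(logs)  # step 1: servers are identified by index 0..R-1

    # Step 2 (map): send every logged URL to the server that owns it.
    inboxes = [Counter() for _ in range(num_servers)]
    for log in logs:
        for url in log:
            inboxes[partition(url, num_servers)][url] += 1

    # Step 3 (reduce): each server keeps only its local top k (url, count) pairs.
    local_tops = [
        heapq.nlargest(k, inbox.items(), key=lambda pair: pair[1])
        for inbox in inboxes
    ]

    # Step 4 (map): all servers send their local top k to one server (at most k*R records).
    candidates = [pair for top in local_tops for pair in top]

    # Step 5 (reduce): that server picks the global top k.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    logs = [["a", "b", "a"], ["b", "c", "a"], ["a", "b", "d"]]
    print(top_urls(logs, k=2))
```

In a real cluster the `inboxes` would be filled by network messages, and the counting in step 3 would have to cope with data larger than memory; the sketch only shows why partitioning by hash makes the per-server counts globally exact.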