
[LinkedIn]Find top 10 urls / shared links map reduce

2015-03-31 14:56

Given a large network of computers, each keeping log files of visited URLs, find the top ten most visited URLs.

Ans:

We will just mimic the actions of map-reduce:

1. Pre-processing: let R be the number of servers in the cluster, and give each server a unique id from 0, 1, 2, …, R-1.

2. (map) For each (string, id) tuple, send it to the server whose id is hash(string) % R, so that every occurrence of the same string ends up on the same server.

3. (reduce) Once step 2 is done (detected with simple control communication), each server produces the (string, count) pairs of its top 10 strings. Note that the tuples counted here are the ones sent to this particular server in step 2.

4. (map) Each server sends its top 10 to a single server (let it be server 0). This is fine, since there are only 10*R such records.

5. (reduce) Server 0 will yield the top 10 across the network.
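
The five steps above can be sketched in a single process, which may help check the reasoning. This is only a simulation under stated assumptions, not a real cluster implementation: each server's log is assumed to be a plain list of URL strings, the names `partition` and `top_urls` are hypothetical, and `zlib.crc32` stands in for hash(string) because Python's built-in `hash` for strings is randomized per process, so separate machines would not agree on it. The final merge is valid because each URL's full count lives on exactly one server, so any URL in the global top 10 must also be in that server's local top 10.

```python
import heapq
import zlib
from collections import Counter


def partition(url, num_servers):
    # Assumption: crc32 plays the role of hash(string); it is stable across
    # processes, so every server computes the same destination id.
    return zlib.crc32(url.encode("utf-8")) % num_servers


def top_urls(server_logs, k=10):
    # Step 1: R servers, identified by their index 0..R-1 in server_logs.
    num_servers = len(server_logs)

    # Step 2 (map): route every visited URL to server hash(url) % R.
    # inboxes[i] simulates what server i receives and counts.
    inboxes = [Counter() for _ in range(num_servers)]
    for log in server_logs:
        for url in log:
            inboxes[partition(url, num_servers)][url] += 1

    # Step 3 (reduce): each server keeps only its local top k (url, count) pairs.
    local_tops = [
        heapq.nlargest(k, inbox.items(), key=lambda pair: pair[1])
        for inbox in inboxes
    ]

    # Step 4 (map): all local tops are sent to server 0, at most k * R records.
    gathered = [pair for top in local_tops for pair in top]

    # Step 5 (reduce): server 0 picks the global top k.
    return heapq.nlargest(k, gathered, key=lambda pair: pair[1])


logs = [
    ["a.com", "b.com", "a.com"],
    ["a.com", "c.com", "b.com"],
    ["a.com"],
]
print(top_urls(logs, k=2))  # [('a.com', 4), ('b.com', 2)]
```

In a real deployment the inner loops of step 2 would be network sends, and each server would typically pre-aggregate its own log into (url, count) pairs before sending, to cut traffic; that optimization is left out here to keep the sketch close to the steps as written.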