
CareerCup: How to find the median of 1 billion numbers across N distributed machines efficiently?

2014-03-08 20:21 (429 views)
How to find the median of 1 billion numbers across N distributed machines efficiently?

----------------------------------------------------------------------------------

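1) Each machine sorts its own elements.

Complexity: O(n log n), where n is the number of elements held by that machine.

Time: the machines sort in parallel, so the elapsed time is that of the slowest machine.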

2) The leader machine builds a min-heap of m elements (m being the number of machines).

Each heap node holds a number and the machine that number came from.

3) The leader machine asks each machine for its smallest element and inserts it into the heap.

Complexity: O(m log m) with one insertion per machine.

4) The leader machine removes the smallest element from the heap (O(log m); only reading the minimum is O(1)) and asks the machine that number came from for its next smallest number.

5) Insert that number into the heap and repeat from step 4 until the kth smallest number is found.
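
The steps above can be sketched as a single-process simulation. This is only an illustration under stated assumptions: each "machine" is modelled as an in-memory list, the "messages" are plain list lookups, and the names kth_smallest and median are hypothetical, not part of the original answer.

```python
import heapq
from typing import List


def kth_smallest(machines: List[List[int]], k: int) -> int:
    """Return the k-th smallest value (1-indexed) across all machines."""
    total = sum(len(chunk) for chunk in machines)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total number of elements")

    # Step 1: each machine sorts its own elements
    # (done in parallel in a real system, sequentially in this simulation).
    sorted_chunks = [sorted(chunk) for chunk in machines]

    # Steps 2-3: the leader takes the smallest element of every non-empty
    # machine and builds a min-heap of (value, machine_id, position) nodes.
    heap = [(chunk[0], mid, 0) for mid, chunk in enumerate(sorted_chunks) if chunk]
    heapq.heapify(heap)

    # Steps 4-5: remove the minimum, then "ask" the machine it came from
    # for its next smallest element and insert that into the heap.
    for _ in range(k - 1):
        _, mid, pos = heapq.heappop(heap)
        if pos + 1 < len(sorted_chunks[mid]):
            heapq.heappush(heap, (sorted_chunks[mid][pos + 1], mid, pos + 1))

    # After k - 1 removals the heap minimum is the k-th smallest overall.
    return heap[0][0]


def median(machines: List[List[int]]) -> float:
    """Median across all machines; averages the two middle values if the count is even."""
    n = sum(len(chunk) for chunk in machines)
    lower = kth_smallest(machines, (n + 1) // 2)
    if n % 2 == 1:
        return float(lower)
    # For simplicity the merge is repeated; a real leader would just pop once more.
    upper = kth_smallest(machines, n // 2 + 1)
    return (lower + upper) / 2


if __name__ == "__main__":
    print(median([[9, 1, 5], [3, 7], [8, 2, 6, 4]]))
```

In a real deployment every heappush in the loop corresponds to one request to a remote machine, which is where the message count discussed below comes from.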

Total time complexity:

If h is the largest chunk of data held by any one machine: O(h log h) for sorting.

If m is the number of machines:

O(m log m) for building the heap.

If k is half of the billion numbers, finding the kth element costs:

O(k log m)

Total messages passed:

About k requests (half a billion), each with one reply.
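
As a rough worked estimate, the three terms can be put together as follows. The machine count m = 1000 is a hypothetical value chosen only for illustration; it does not come from the original question.

```latex
T_{\text{total}} = O(h \log h) + O(m \log m) + O(k \log m),
\qquad k = \frac{10^{9}}{2} = 5 \times 10^{8}

% Assumed m = 1000, so \log_2 m \approx 10:
k \log_2 m \approx 5 \times 10^{8} \times 10 = 5 \times 10^{9}
\quad \text{(heap comparisons, order of magnitude)}
```

With these assumed numbers the merge phase, which also needs roughly k sequential round trips through the leader, outweighs both the parallel sort and the heap build.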

I am wondering if I could do the heap part in parallel.