CareerCup How to find the median of 1 billion numbers across N distributed machines efficiently?
2014-03-08 20:21
How to find the median of 1 billion numbers across N distributed machines efficiently?
----------------------------------------------------------------------------------
1) Each machine sorts its own elements.
Complexity: O(n log n), where n is the number of elements on that machine.
Time: set by the slowest machine, since the machines sort in parallel.
2) The leader machine builds a min-heap of m elements (m being the number of machines).
Each heap node holds a number and the machine that number came from.
3) To fill the heap, the leader machine asks each machine for its smallest element.
Complexity: O(m log m) when the m elements are inserted one at a time.
4) The leader machine removes the smallest element from the heap (O(log m); only peeking at the minimum is O(1)) and asks the machine that element came from for its next smallest number.
5) Insert that number into the heap and repeat from step 4 until the kth smallest number is found.
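The five steps above can be sketched on a single process as follows. This is only an illustration under stated assumptions: the name `kth_smallest` is hypothetical, each "machine" is a plain Python list, and indexing into a list stands in for a network request to that machine.

```python
import heapq


def kth_smallest(machines, k):
    """Return the k-th smallest value (1-indexed) across all machines.

    `machines` is a list of lists standing in for remote nodes (an
    assumption of this sketch); reading chunk[pos] plays the role of
    "ask that machine for its next smallest number".
    """
    total = sum(len(chunk) for chunk in machines)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total element count")

    # Step 1: every machine sorts its own data (in parallel in a real system).
    sorted_chunks = [sorted(chunk) for chunk in machines]

    # Steps 2-3: the leader collects each machine's smallest element as a
    # (value, machine_id, position) tuple. heapify builds the heap in O(m).
    heap = [(chunk[0], mid, 0) for mid, chunk in enumerate(sorted_chunks) if chunk]
    heapq.heapify(heap)

    # Steps 4-5: pop the minimum k-1 times, refilling from the machine the
    # popped element came from. Each pop and push costs O(log m).
    for _ in range(k - 1):
        _, mid, pos = heapq.heappop(heap)
        if pos + 1 < len(sorted_chunks[mid]):
            heapq.heappush(heap, (sorted_chunks[mid][pos + 1], mid, pos + 1))

    # After k-1 removals the heap's minimum is the k-th smallest overall.
    return heap[0][0]
```

For an odd total count the median is `kth_smallest(machines, (total + 1) // 2)`; for an even count such as one billion, it lies between the elements at positions total/2 and total/2 + 1.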
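Total time complexity:
If h is the largest chunk of data held by any one machine: O(h log h) for sorting.
If m is the number of machines: O(m log m) for building the heap.
If k is half of the billion numbers, finding the kth element costs O(k log m).
Total messages passed:
About k (half a billion) requests from the leader, each with one reply, plus the m initial requests that fill the heap.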
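To put rough numbers on these terms, the sketch below plugs in a machine count. The function name `merge_cost` and the value m = 100 are assumptions for illustration only; the post does not say how many machines there are.

```python
import math


def merge_cost(total_numbers, machines):
    """Estimate the leader's work for the heap-based merge.

    Returns (k, heap_steps, requests), where k is half the element count,
    heap_steps approximates k * log2(m) using the rounded-up heap depth,
    and requests counts the m initial requests plus the k-1 refills.
    """
    k = total_numbers // 2
    heap_depth = math.ceil(math.log2(machines))
    heap_steps = k * heap_depth
    requests = machines + (k - 1)
    return k, heap_steps, requests


# Hypothetical cluster of 100 machines holding one billion numbers.
k, heap_steps, requests = merge_cost(1_000_000_000, 100)
print(k, heap_steps, requests)
```

With these assumed inputs the leader issues roughly half a billion sequential requests, which suggests the message round trips, not the heap operations, are the likely bottleneck.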
I am wondering if I could do the heap part in parallel.
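One possible direction, which is not from the original post, is to replace the heap merge with a binary search on the value itself. For a candidate pivot, each machine counts how many of its elements are less than or equal to the pivot, and those counts can be computed on all machines at the same time. The sketch below assumes integer values and already sorted chunks; the name `kth_smallest_by_counting` is hypothetical, and the chunks are again local lists standing in for machines.

```python
from bisect import bisect_right


def kth_smallest_by_counting(sorted_chunks, k):
    """Find the k-th smallest integer (1-indexed) by binary search on value.

    Assumes every chunk is sorted and holds integers. Each round, every
    machine answers one independent count query, so the m queries of a
    round could run in parallel in a real system.
    """
    non_empty = [chunk for chunk in sorted_chunks if chunk]
    total = sum(len(chunk) for chunk in non_empty)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total element count")

    lo = min(chunk[0] for chunk in non_empty)
    hi = max(chunk[-1] for chunk in non_empty)
    while lo < hi:
        pivot = (lo + hi) // 2
        # Count of elements <= pivot across all machines.
        count = sum(bisect_right(chunk, pivot) for chunk in non_empty)
        if count >= k:
            hi = pivot       # the answer is at most pivot
        else:
            lo = pivot + 1   # the answer is strictly greater than pivot
    return lo
```

Under these assumptions the number of rounds is about log2 of the value range (around 32 for 32-bit integers), with m messages per round, compared with roughly k sequential requests in the heap approach.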