
#EPI#Find running median from a stream of integers

2015-07-29 11:01
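To take the median of a continuous stream when the data volume is not large, maintain two heaps: a max heap that stores the smaller half of the values and a min heap that stores the larger half. The median can then be read from the roots of the two heaps alone. For the first two elements, add the smaller one to the maxHeap on the left and the bigger one to the minHeap on the right. Then process the stream data one by one: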
Step 1: Add next item to one of the heaps

if next item is smaller than maxHeap root add it to maxHeap,
else add it to minHeap

Step 2: Balance the heaps (after this step heaps will be either balanced or
one of them will contain 1 more item)

if number of elements in one of the heaps is greater than the other by
more than 1, remove the root element from the one containing more elements and
add to the other one
Then at any given time you can calculate the median like this:
If the heaps contain an equal number of elements:
median = (root of maxHeap + root of minHeap) / 2.0
Else:
median = root of the heap with more elements
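The add, balance, and read steps above can be sketched in Java roughly as follows. This is a minimal illustration rather than the referenced implementation: the class name `BalancedHeapMedian` and its method names are hypothetical, and it assumes the very first item goes to the maxHeap.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Sketch of the add / balance / read-median steps.
// Class and method names are hypothetical.
class BalancedHeapMedian {
    // maxHeap holds the smaller half; its root is the largest small value.
    private final PriorityQueue<Integer> maxHeap =
            new PriorityQueue<>(Collections.reverseOrder());
    // minHeap holds the larger half; its root is the smallest large value.
    private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();

    void add(int value) {
        // Step 1: route the item by comparing it with the maxHeap root.
        // Assumption: an empty maxHeap accepts the first item.
        if (maxHeap.isEmpty() || value < maxHeap.peek()) {
            maxHeap.add(value);
        } else {
            minHeap.add(value);
        }
        // Step 2: if the sizes differ by more than 1, move one root across.
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.add(maxHeap.poll());
        } else if (minHeap.size() > maxHeap.size() + 1) {
            maxHeap.add(minHeap.poll());
        }
    }

    double median() {
        if (maxHeap.isEmpty() && minHeap.isEmpty()) {
            throw new IllegalStateException("no elements yet");
        }
        if (maxHeap.size() == minHeap.size()) {
            // Halve each root separately so the int sum cannot overflow.
            return maxHeap.peek() / 2.0 + minHeap.peek() / 2.0;
        }
        return maxHeap.size() > minHeap.size() ? maxHeap.peek() : minHeap.peek();
    }
}
```

Each `add` costs O(log n) because of the heap operations, while `median` only peeks at the roots.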
If the data volume is very large, use counting sort instead.

If you can't hold all the items in memory at once, this problem becomes much harder. The heap solution requires you to hold all the elements in memory at once, which may not be possible when the stream is very large or unbounded.

Instead, as you see numbers, keep a count of how many times you have seen each integer. Assuming 4-byte integers, that's 2^32 buckets, or at most 2^33 integers (a key and a count for each int), which is 2^35 bytes or 32GB. It will likely be much less than this because you don't need to store the key or count for entries whose count is 0 (i.e. like a defaultdict in Python). Inserting each new integer takes constant time.

Then at any point, to find the median, just use the counts to determine which integer is the middle element. This takes constant time (albeit with a large constant, since the scan may touch up to 2^32 buckets, but constant nonetheless).

reference:
http://stackoverflow.com/questions/10657503/find-running-median-from-a-stream-of-integers
https://gist.github.com/Vedrana/3675434

    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Queue;

    // Given a stream of unsorted integers, find the median element in sorted order at any given time.
    // http://www.ardendertat.com/2011/11/03/programming-interview-questions-13-median-of-integer-stream/
    public class MedianOfIntegerStream {

        public Queue<Integer> minHeap; // larger half, smallest value at the root
        public Queue<Integer> maxHeap; // smaller half, largest value at the root
        public int numOfElements;

        public MedianOfIntegerStream() {
            minHeap = new PriorityQueue<Integer>();
            maxHeap = new PriorityQueue<Integer>(10, new MaxHeapComparator());
            numOfElements = 0;
        }

        public void addNumberToStream(Integer num) {
            maxHeap.add(num);
            if (numOfElements % 2 == 0) {
                // Even count so far: maxHeap keeps the extra element.
                if (minHeap.isEmpty()) {
                    numOfElements++;
                    return;
                } else if (maxHeap.peek() > minHeap.peek()) {
                    // The roots are out of order, so swap them.
                    Integer maxHeapRoot = maxHeap.poll();
                    Integer minHeapRoot = minHeap.poll();
                    maxHeap.add(minHeapRoot);
                    minHeap.add(maxHeapRoot);
                }
            } else {
                // Odd count so far: move the largest small value to minHeap.
                minHeap.add(maxHeap.poll());
            }
            numOfElements++;
        }

        public Double getMedian() {
            if (numOfElements % 2 != 0)
                return Double.valueOf(maxHeap.peek());
            else
                // Halve each root separately so the int sum cannot overflow.
                return maxHeap.peek() / 2.0 + minHeap.peek() / 2.0;
        }

        private class MaxHeapComparator implements Comparator<Integer> {
            @Override
            public int compare(Integer o1, Integer o2) {
                // Integer.compare avoids the overflow that o2 - o1 can cause.
                return Integer.compare(o2, o1);
            }
        }

        public static void main(String[] args) {
            MedianOfIntegerStream streamMedian = new MedianOfIntegerStream();

            streamMedian.addNumberToStream(1);
            System.out.println(streamMedian.getMedian()); // should be 1

            streamMedian.addNumberToStream(5);
            streamMedian.addNumberToStream(10);
            streamMedian.addNumberToStream(12);
            streamMedian.addNumberToStream(2);
            System.out.println(streamMedian.getMedian()); // should be 5

            streamMedian.addNumberToStream(3);
            streamMedian.addNumberToStream(8);
            streamMedian.addNumberToStream(9);
            System.out.println(streamMedian.getMedian()); // should be 6.5
        }
    }
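For comparison, the counting approach for very large streams might be sketched as follows. This is an illustration under stated assumptions, not code from the referenced sources: the class name `CountingMedian` is hypothetical, and it swaps the flat table of 2^32 counters for a sorted `TreeMap` that only stores values actually seen. That choice makes insertion O(log k) in the number of distinct values k rather than constant time.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the counting approach: one counter per distinct value
// instead of one stored entry per element. Names are hypothetical.
class CountingMedian {
    // Sorted map from value to occurrence count; zero counts are never stored.
    private final TreeMap<Integer, Long> counts = new TreeMap<>();
    private long total = 0;

    void add(int value) {
        counts.merge(value, 1L, Long::sum);
        total++;
    }

    double median() {
        if (total == 0) {
            throw new IllegalStateException("no elements yet");
        }
        // 0-based ranks of the two middle elements (equal when total is odd).
        long lowerRank = (total - 1) / 2;
        long upperRank = total / 2;
        Integer lower = null;
        long seen = 0;
        // Walk the values in sorted order, accumulating counts.
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (lower == null && seen > lowerRank) {
                lower = e.getKey();
            }
            if (seen > upperRank) {
                // Halve each value separately so the int sum cannot overflow.
                return lower / 2.0 + e.getKey() / 2.0;
            }
        }
        throw new AssertionError("unreachable: ranks are always below total");
    }
}
```

Memory grows with the number of distinct values rather than the length of the stream, which is the property the counting idea relies on.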
                                            
Tags: EPI