[LeetCode] Median 系列

发布网友发布时间：2023-05-18 07:08

我来回答

共1个回答

热心网友时间：2023-09-26 13:01

输入一个数据流，求这个数据流的中位数median。中位数的意思就是对数据进行排序，如果总数是奇数个，那么就是最中间的数，如果是偶数个，那么就是中间两个数的平均值。

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2

Simple Sorting

Insertion Sort

[Best] Two Heaps
前两种方法都保证了整体序列有序，但其实我们只想要Median，不一定要整体有序。

Multiset and Two Pointers
方法4本质上和3是一样的。只不过是用了Self-balancing Binary Search Trees（比如AVL Tree）去实现。
实现起来有点复杂，在这里就不展开了。个人觉得方法3简单容易理解。

Others
Buckets如果stream里面的数是 statistically distributed，那么很容易通过bucket去获得中位数。一旦知道了正确的bucket，sort这个bucket就能知道中位数。如果bucket大小 << stream的大小，就大大节省了时间。
Reservoir Sampling（蓄水池）.Following along the lines of using buckets: if the stream is statistically distributed, you can rely on Reservoir Sampling. Basically, if you could maintain just one good bucket (or reservoir ) which could hold a representative sample of the entire stream, you could estimate the median of the entire stream from just this one bucket. This means good time and memory performance. Reservoir Sampling lets you do just that. Determining a "good"size for your reservoir? Now, that's a whole other challenge. A good explanation for this can be found in this StackOverflow answer.
Segment Trees如果想对有限范围内的数据进行大量的读取或者大量的插入操作，Segment Tree是个很好的数据结构。They allow us to do all such operations fast and in roughly the same amount of time , always.缺点是不容易实现（写起来麻烦）。参考 introctory article on Segment Trees 。
Order Statistic Treesare data structures which seem to be tailor-made for this problem. They have all the nice features of a BST, but also let you find the kth order element stored in the tree. 缺点是不容易实现（写起来麻烦），面试时几乎不会考到。但如果这个结构已经提供，用起来会很有趣。