Introduction

Finding the top elements from a finite set or a live stream is a straightforward process. But things become challenging when the calculation depends on a sliding time interval. Finding the “top K frequent words from the Twitter stream for the last 10 minutes” is a good example of such a problem.
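To make the problem concrete, here is a minimal Python sketch of a naive baseline. It is not the solution developed in this post. The class name `NaiveWindowTopK` and its methods are hypothetical, and the sketch assumes timestamps are given in seconds and arrive in non-decreasing order.

```python
from collections import Counter, deque


class NaiveWindowTopK:
    """Naive baseline: exact word counts over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, word) pairs, oldest first
        self.counts = Counter()  # word -> occurrences inside the window

    def add(self, word, timestamp):
        # Assumption: timestamps arrive in non-decreasing order.
        self.events.append((timestamp, word))
        self.counts[word] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that are `window_seconds` old or older.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_word = self.events.popleft()
            self.counts[old_word] -= 1
            if self.counts[old_word] == 0:
                del self.counts[old_word]

    def top_k(self, k, now):
        self._evict(now)
        # Has to look at every distinct word currently in the window.
        return self.counts.most_common(k)


window = NaiveWindowTopK(window_seconds=600)  # 10 minutes
window.add("python", 0)
window.add("java", 100)
window.add("python", 200)
print(window.top_k(1, now=300))  # [('python', 2)]
```

Insertion and expiry are cheap in this sketch, but every `top_k` call scans all distinct words in the window, and that selection cost is the part worth improving.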

Can we find an algorithm that provides fast insertion (O(1)), fast deletion (O(1)) and fast selection (O(1)) within a data stream? At first, I decided to use a heap for fast insertion and selection, but realised that heaps are not suitable for frequent delete operations, as each deletion from the heap requires a “heapify” operation, which has O(log n) cost.

I did some research on the topic, dug into a couple of data-stream algorithms and came up with a solution that can perform all operations efficiently. I assume that the sliding part of the data can fit into memory.