Sliding Bloom Filter

A Sliding Bloom filter (or Rolling Bloom filter) is a type of Bloom filter that stores the last n values (a sliding window). Old values are evicted based on time or fill ratio.

It uses two (or more) sub Bloom filters.

At any given time, one of the sub Bloom Filters is recognized as the master filter, and the other sub filters are called slave filters.

filter, and the other sub filters are called filters. The role of master rotates among all sub filters.

among all sub filters. New elements are inserted only in the master filter.

filter. An element is in the combined filter if it's in any of the sub filters.

of the sub filters. When some amount of time has passed or the master becomes "full" (number of set bits reaches some threshold), the oldest slave is cleared out and master / slave roles rotate.

Illustration

Here's an illustration with two sub filters:

Currently in combined filter: 17 8 4 12 55 7 31 72 2 To insert: 17 8 4 12 55 7 31 72 2

Applications

Sliding Bloom filters can be used when a time retention policy must be implemented, for example due to contractual, legislative or security reasons.

Example: The owner of a webshop that accepts credit card payments would like to track the number of returning customers. To see if a credit card numbers has been used before, credit card numbers are stored in a Bloom filter. Due to data retention policies, old credit card numbers must be discarded, thus a Sliding Bloom filter is a good fit.

If a Bloom filter needs to keep track of an ever growing set of elements (an infinite stream), a Sliding Bloom filters can be used as an approximation.

Example: Content delivery networks (CDNs) usually cache responses for previous requests. Many requests however are so called "one hit wonders"—requests made once then never again. To avoid caching such requests, it's common to implement a cache-on-second-hit rule: Responses are cached the second time around. To determine if a request has been seen before, a sliding Bloom filter is used. This is suitable since there is a never ending stream of requests. (Source: Algorithmic Nuggets in Content Delivery)