Bloom Filter Calculator

Here's an interactive tool to help you tune the parameters for a Bloom filter. If you want to learn more about how Bloom filters work, see separate article here.





Number of bits (m) m = In bytes: Number of hash functions (k) k = This in the optimal choice for n = Not the optimal choice for any number of elements. Acceptable false positive rate (p) p = % Allows for up to elements. Allows no elements to be added.

1 0 p n 1 2 3 4 5 6 7 8 9

Probability of false positives

Assuming uniform and independent hashing, the probability of a specific bit not being set to 1 after a single use of a hash function is

1 − 1 m

After inserting n elements using k hashes each, the probability of a specific bit still being 0 is thus

⎛

⎝ 1 − 1 m ⎞

⎠ kn

Conversely, the probability of a specific bit being set to 1, is

1 − ⎛

⎝ 1 − 1 m ⎞

⎠ kn

Since a false positive requires k random bits to be set to 1, the probability of a false positive after inserting n elements is

⎛

⎝ 1 − ⎛

⎝ 1 − 1 m ⎞

⎠ kn ⎞

⎠ k

Using the formula for calculating e…

e = lim x → ∞ ⎛

⎝ 1 − 1 x ⎞

⎠ −x

…we can approximate the probability of false positives as

( 1 − e −kn / m ) k

Optimal value for k

By fixing the n / m ratio in the previous equation, we can solve for k, and find the optimal value for it. That is, if you for example know you can spare 1 MB for the bit vector, and you know that you will insert ~10,000 elements, you can calculate the number of hash functions that minimizes the false positives in subsequent lookups.

Given n and m we want to minimize

= ( 1 − e −kn / m ) k = e ln ( 1 − e −kn / m ) k = e k ln ( 1 − e −kn / m )

In other words, we want to minimize

k ln ( 1 − e −kn / m )

By simple rules of powers and logarithms, we know that k = −m / n ln(e−kn / m), so we can rewrite the above as follows:

− m n ln ( e −kn / m ) ln ( 1 − e −kn / m )

By symmetry, we see that this is minimized when e−kn / m = 1 − e−kn / m, i.e. when e−kn / m = 0.5. Intuitively this means that we should chose a k such that after insterting n elements, half of the m bits are set to true. Solving for k gives us

k = m ln 2 n

With this value for k the optimal false positive rate is approximately 0.6185m / n.