If we assume that the range always contains exactly 2^n numbers (a power of 2, with n >= 2), then exclusive-or will work (as shown by another poster). As for why, let's prove it:

The Theory

Given any 0-based range of 2^n integers (n >= 2) with one element missing, xor-ing the known values together yields the missing number.

The Proof

Let's look at n = 2. For n=2, we can represent 4 unique integers: 0, 1, 2, 3. They have a bit pattern of:

0 - 00

1 - 01

2 - 10

3 - 11

Now, if we look, each and every bit is set exactly twice. Therefore, since each bit is set an even number of times, an exclusive-or of all the numbers will yield 0. If a single number is missing, the exclusive-or of the rest will yield a number that, when exclusive-ored with the missing number, results in 0. Therefore, the missing number and the resulting exclusive-ored number are exactly the same. If we remove 2, the resulting xor will be 10 in binary (that is, 2).

Now, let's look at n+1. Call the number of times each bit is set across the n-bit range x, and the number of times it is set across the (n+1)-bit range y. The (n+1)-bit range is two copies of the n-bit range: one with the new top bit set to 0 and one with it set to 1. Each lower bit is therefore set y = x * 2 times, and 2x is always even. The new top bit is set once per element of the second copy, which is 2^n times and also even. So in the (n+1)-bit range every bit is set an even number of times.

Therefore, since n=2 works, and the step from n to n+1 preserves the property, the xor method will work for all values of n >= 2.
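The induction can also be checked by brute force for small n. The following is only a sketch: `xor_of_range` and its `skip` parameter are names made up for this illustration, not something from the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* XOR of every integer in [0, 2^n_bits), optionally leaving one value out.
   Pass skip = UINT64_MAX to leave nothing out.
   Hypothetical helper, for illustration only. */
uint64_t xor_of_range(unsigned n_bits, uint64_t skip)
{
    assert(n_bits < 64);
    uint64_t limit = (uint64_t)1 << n_bits;
    uint64_t acc = 0;
    for (uint64_t v = 0; v < limit; v++) {
        if (v != skip) {
            acc ^= v;
        }
    }
    return acc;
}
```

With n_bits = 2 the full range xors to 0, and leaving out 2 gives 2, matching the table above. With n_bits = 1 the full range xors to 1 (0 ^ 1), which is why the base case has to start at n = 2.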

The Algorithm For 0 Based Ranges

This is quite simple. It uses 2*n bits of memory, so for any n <= 32, two 32-bit integers will suffice (ignoring any memory consumed by the file descriptor). And it makes a single pass over the file.

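    long supplied = 0;
    long result = 0;
    /* Assumption: read_int_from_file() stores the next value in supplied
       and returns 0 only at end of file, so a stored value of 0 does not
       end the loop early. */
    while (read_int_from_file(&supplied)) {
        result = result ^ supplied;
    }
    return result;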
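For anyone who wants to run the loop above without a file, here is a minimal sketch that reads from an array instead. The function name and the array-plus-count interface are my own assumptions; only the xor accumulation comes from the answer.

```c
#include <assert.h>
#include <stddef.h>

/* values holds the 2^n - 1 integers that are present from [0, 2^n),
   in any order. Returns the one that is absent. */
long find_missing_zero_based(const long *values, size_t count)
{
    assert(values != NULL || count == 0);
    long result = 0;
    for (size_t i = 0; i < count; i++) {
        result ^= values[i];   /* pairs of set bits cancel out */
    }
    return result;
}
```

Because the loop is driven by an explicit count, a 0 in the middle of the input is handled like any other value.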

The Algorithm For Arbitrary Based Ranges

This algorithm works for a range with any starting number, as long as the range contains exactly 2^n numbers. It re-bases the range so that the minimum is at 0. It does require 2 passes through the file (the first to grab the minimum, the second to compute the missing int).

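    long supplied = 0;
    long result = 0;
    long offset = LONG_MAX;   /* from <limits.h>; matches the long values */
    /* Assumption: read_int_from_file() stores the next value in supplied
       and returns 0 only at end of file. */
    /* Pass 1: find the minimum. */
    while (read_int_from_file(&supplied)) {
        if (supplied < offset) {
            offset = supplied;
        }
    }
    reset_file_pointer();
    /* Pass 2: xor the re-based values. */
    while (read_int_from_file(&supplied)) {
        result = result ^ (supplied - offset);
    }
    return result + offset;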
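The same two-pass idea can be sketched over an in-memory array. The function name and signature are hypothetical, and the sketch assumes that the minimum itself is present and that `values[i] - offset` does not overflow a long.

```c
#include <assert.h>
#include <stddef.h>

/* values holds 2^n - 1 of the 2^n consecutive integers starting at the
   smallest value present. Returns the absent one. */
long find_missing_rebased(const long *values, size_t count)
{
    assert(count > 0);

    /* Pass 1: the minimum becomes the offset. */
    long offset = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < offset) {
            offset = values[i];
        }
    }

    /* Pass 2: xor the values shifted down so the range starts at 0. */
    long result = 0;
    for (size_t i = 0; i < count; i++) {
        result ^= values[i] - offset;
    }
    return result + offset;   /* shift back to the original base */
}
```

For example, the input 10, 11, 13 re-bases to 0, 1, 3, which xors to 2, so the function returns 12.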

Arbitrary Ranges

We can apply this modified method to ranges of arbitrary size, since any re-based range can be padded up to the next power of 2. This works only if there is a single missing number. It takes 2 passes over an unsorted file, but it will find the single missing number every time:

    long supplied = 0;
    long result = 0;
    long offset = LONG_MAX;   /* from <limits.h>; matches the long values */
    long n = 0;
    /* Assumption: read_int_from_file() stores the next value in supplied
       and returns 0 only at end of file. */
    while (read_int_from_file(&supplied)) {
        if (supplied < offset) {
            offset = supplied;
        }
    }
    reset_file_pointer();
    while (read_int_from_file(&supplied)) {
        n++;
        result = result ^ (supplied - offset);
    }
    // We need to increment n by one so that we take care of the
    // missing int value
    n++;
    // Pad the range up to the next power of 2
    while (n == 1 || 0 != (n & (n - 1))) {
        result = result ^ (n++);
    }
    return result + offset;

Basically, this re-bases the range around 0. While it computes the exclusive-or, it counts the values it reads. Then it adds 1 to that count to account for the missing value. Then it keeps xor-ing in the value of n, incrementing n by 1 each time, until n is a power of 2, which pads the range up to the next power of 2. The result is then re-based back to the original base. Done.

Here's the algorithm I tested in PHP (using an array instead of a file, but same concept):

    function find($array) {
        $offset = min($array);
        $n = 0;
        $result = 0;
        foreach ($array as $value) {
            $result = $result ^ ($value - $offset);
            $n++;
        }
        $n++; // This takes care of the missing value
        while ($n == 1 || 0 != ($n & ($n - 1))) {
            $result = $result ^ ($n++);
        }
        return $result + $offset;
    }

Fed an array covering any range of values (I tested with negatives too), with one value inside that range missing, it found the correct value each time.
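To repeat that kind of check outside PHP, here is a sketch of a C port of find() plus a small exhaustive checker. This is not the original test: `find_missing`, `check_range`, and the 64-element buffer limit are all assumptions made for this example. It only removes values strictly inside the range, since the minimum is needed to re-base.

```c
#include <assert.h>
#include <stddef.h>

/* C port of the PHP find() above (illustrative names). */
long find_missing(const long *values, size_t count)
{
    assert(count > 0);
    long offset = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < offset) {
            offset = values[i];
        }
    }

    long result = 0;
    long n = 0;
    for (size_t i = 0; i < count; i++) {
        result ^= values[i] - offset;
        n++;
    }
    n++;   /* count the missing value as well */
    /* Pad with n, n+1, ... until n reaches a power of 2. */
    while (n == 1 || (n & (n - 1)) != 0) {
        result ^= n++;
    }
    return result + offset;
}

/* Returns 1 if find_missing recovers every interior value that is
   removed, one at a time, from the range [lo, hi]; otherwise 0. */
int check_range(long lo, long hi)
{
    long buf[64];
    assert(hi - lo + 1 <= 64);
    for (long missing = lo + 1; missing < hi; missing++) {
        size_t count = 0;
        /* Fill in descending order so the input is not sorted ascending. */
        for (long v = hi; v >= lo; v--) {
            if (v != missing) {
                buf[count++] = v;
            }
        }
        if (find_missing(buf, count) != missing) {
            return 0;
        }
    }
    return 1;
}
```

A call such as check_range(-5, 7) exercises a range that starts below zero and whose size (13) is not a power of 2.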

Another Approach

Since we can use external sorting, why not just check for a gap? If we assume the file is sorted prior to the running of this algorithm: