Initializing large arrays of data before use is always cumbersome but it seems to be unavoidable.

The first types of solutions fill the array with a value that denotes an empty entry. While this sounds straightforward, the choice of that value is not always easy. For floating points numbers, zero may or may not be a good candidate. If convenient (as a zero floating point value is encoded as binary zeroes filling the variable) it may be difficult in some contexts to use because zero may be valid. Fortunately, there’s always the possibility to initialize the array with NaNs, which can be tested and used correctly (but you need the POSIX functions to do that).

For integers, the situation seems more complicated as there are no reserved encoding for NaN: all the possible values represent a valid integer. Zero is favored as a NaN, but there are other solutions, which we will explore.

The second types of solutions are based on methods of marking array entries as occupied or freed.

The simplest solution is a bitmap. A bitmap (or bit arrays) will use a bit for each entry in the array, usually in a storage format that uses one bit of memory per logical bits in the bitmap. A zero bit determines that the corresponding entry is free, a one bit that it is used. This might be faster since filling a bitmap with zeroes takes a fraction of the time to fill the array. For starter, it’s at least eight times smaller, and grows even smaller as the size of the entries grows. It also has the nice side effect to manipulate less memory, and doesn’t mess up the caches if the bitmap’s not too big.

If you don’t care much about the order in which an array’s contents are stored, you can use a stack-like structure. To add an item, you just push it on top. Deleting the last item proceeds just like with a stack, you pop it by decrementing the top pointer.

#include <iostream> #include <algorithm> template <typename T> class lazy_array { private: int stack_size; T * stack; int stack_ptr; public: T & operator[](int x) { if ((x>-1) && (x<=stack_ptr)) return stack[x]; else ; // throw } void add(const T & x) { if (stack_ptr<stack_size-1) { ++stack_ptr; stack[stack_ptr]=x; } else ; // throw } int size() const { return stack_ptr+1; } void del(int offset) { if (offset>-1) { if (offset<stack_ptr) std::swap(stack[offset],stack[stack_ptr]); else ; // nothing to swap (since we // needn't swap the top with // the top) --stack_ptr; } else ; // throw } lazy_array(int stack_size_) : stack_size(stack_size_), stack(new T[stack_size]), stack_ptr(-1) { } ~lazy_array() { delete[] stack; } };

So of course this solution asks for little extra data (a stack_ptr and the stack size stored in the class), but does adding, access, and deletion in constant time but you sacrifice ordering. You can’t search for an item, find it at position i and expect the i th entry to contain the same data next time you access it.

A solution that preserves ordering was proposed by Aho, Hopcroft, and Ullman [1] and found in Programming Pearls, 2nd ed. [2]. It does preserve ordering, but at the cost of two extra array, plus a pointer.

The two extra array serve as an interlock to determine whether or not an array element is used. The initialization (in addition to allocating memory) consists only in initializing a pointer, top , to zero. The two extra arrays are named to and from (taking the names found in [2]) and are left with whatever they may contain after allocation.

If the i th data element was previously accessed, then we have that from[i]<top and that to[from[i]]=i , that is, from[i] points to an entry that points back to it. If from[i] contains a random integer, then it doesn’t point to an entry in to that points back on itself and therefore the entry is unallocated. Even if both to and from agree, they still have to be so that from[i]<top , that is, already allocated by the algorithm. The above figure shows what such an array might look like.

The proposed scheme entirely safe, provided that we first check if from[i] points somewhere inside to (below top , in fact). If it doesn’t, then the array is clearly not initialized and this entry contains garbage left over by memory allocation. An implementation is proposed by Eli Bendersky on his blog.

While this method is clever, it does require two extra array, containing pointer-sized integers, which may mean a lot of extra memory depending on the original array’s size. If the array will be used sparsely, then a very small fraction of the arrays are accessed, which may have beneficial side effects such as being cache-friendly.

The bitmap approach is therefore not too bad it seems; while it does require to be zeroed initially, it allocates a lot less memory than Aho et al.‘s solution. Like everything else, it’s about trade-offs.

[2] Jon Louis Bentley — Programming Pearls — 2nd ed., Addison-Wesley (1999)

Share this: Reddit

Twitter

More

Facebook

Email



Like this: Like Loading... Related