Hölzle’s two-instruction write barrier [PDF] for garbage collectors looks like this:

    addr = destination
    offset = addr>>k
    (cards-(heap_base>>k))[offset] = 1 ; mark one byte
    write to addr

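Some SBCL users allocate Lisp object lookalikes in the C heap, and we also have stack-allocated values, so the destination of a write is not always inside the GC-managed heap. I therefore have to either test whether the address is in range or mask the offset, to avoid writing past the bounds of the card table.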
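To make the trade-off concrete, here is a minimal C sketch of the two options. It is not SBCL's barrier: the card size (`CARD_SHIFT`), the heap size, `heap_base`, and the function names are all hypothetical, and the C code computes the card index with a subtraction where the assembly version folds it into the biased table base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters: 512-byte cards over a 1MB heap starting at heap_base. */
#define CARD_SHIFT 9
#define HEAP_SIZE ((uintptr_t)1 << 20)
#define N_CARDS (HEAP_SIZE >> CARD_SHIFT)

/* The mask variant below relies on a power-of-two card count. */
static_assert((N_CARDS & (N_CARDS - 1)) == 0, "N_CARDS must be a power of two");

static uint8_t cards[N_CARDS];
static uintptr_t heap_base; /* assumed to be card-aligned */

/* Card index relative to the heap start. In assembly, the subtraction is
   folded into a biased base, (cards - (heap_base >> k)), leaving shift + store. */
static uintptr_t card_index(uintptr_t addr)
{
    return (addr >> CARD_SHIFT) - (heap_base >> CARD_SHIFT);
}

/* Range check: writes outside the heap (C heap, stack) mark nothing.
   Unsigned wraparound makes addresses below heap_base fail the test too. */
static void mark_checked(uintptr_t addr)
{
    uintptr_t idx = card_index(addr);
    if (idx < N_CARDS)
        cards[idx] = 1;
}

/* Mask: never writes outside the table, but an out-of-heap address
   dirties some unrelated card instead. */
static void mark_masked(uintptr_t addr)
{
    cards[card_index(addr) & (N_CARDS - 1)] = 1;
}
```

The range check adds a compare and branch to every barrier; the mask is branch-free, and a spuriously dirtied card should only cost the collector an unnecessary scan.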

Or, we could exploit X86’s bit-addressing instructions:

    addr = destination
    bts cards, addr
    write to addr

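where cards is a vector of 256 or 512MB (there’s some trickery to handle negative offsets). bts indexes into that vector of 4G bits and sets the corresponding bit to 1. On X86-64, we can force cards to be in the lower 4GB and stick to 32-bit addressing: the instruction then also implicitly masks out the upper 32 bits of addr before indexing into cards. Too bad it’s around two or three times as slow as a shift and a byte write (or even a shift, mask and byte write), and it really sucks with SMP: without a lock prefix, bts on memory is a non-atomic read-modify-write, so two threads setting different bits in the same word can lose each other’s marks, and with the prefix it is slower still. Plain byte stores of 1 don’t have that problem.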
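A portable model of what bts does here may help. This C sketch assumes a 32-bit register bit offset, which bts treats as signed (-2^31 to 2^31-1 bits), so the table pointer it receives (the hypothetical `middle` below) can sit in the middle of a 512MB vector, with 256MB reachable on either side. That is one way to handle negative offsets, not necessarily the trickery meant above, and the sketch models only the semantics, not the speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `bts [middle], r32`: set the bit at signed offset
   (int32_t)addr relative to `middle`. A real table would be 512MB with
   `middle` at its midpoint. */
static void bts32_mark(uint8_t *middle, uint64_t addr)
{
    /* 32-bit operand: the upper half of addr is ignored. */
    uint32_t low = (uint32_t)addr;
    /* Sign-extend the low 32 bits, as bts does with a register bit offset. */
    int64_t off = low < 0x80000000u ? (int64_t)low
                                    : (int64_t)low - ((int64_t)1 << 32);
    /* Floor division by 8 selects the byte; the remainder selects the bit. */
    int64_t byte = off >= 0 ? off / 8 : -((-off + 7) / 8);
    unsigned bit = (unsigned)(off - byte * 8);
    assert(bit < 8);
    /* Non-atomic read-modify-write, like bts without a lock prefix. */
    middle[byte] |= (uint8_t)(1u << bit);
}
```

For example, address 0xFFFFFFFF becomes bit offset -1, which is bit 7 of the byte just below `middle`, while 0x100000005 aliases address 5.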

There are also hacks [PDF] that abuse alignment checking to get hardware lowtag checks (lowtags are type tags stored in the low bits of pointers). Who says that contemporary machines don’t support safe languages well? (:
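The arithmetic behind such alignment tricks fits in a few lines. In this C sketch the tag width, `CONS_LOWTAG`, and the field offsets are hypothetical values chosen for illustration: code that assumes a cons subtracts the cons tag when loading a field, and the resulting address is 8-byte aligned only if the pointer actually carried that tag. On x86, setting the AC flag (with the OS enabling CR0.AM) makes misaligned user-mode accesses fault, so the hardware would perform this test for free; the function below merely computes it in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagging scheme: objects are 8-byte aligned and the low
   3 bits of a pointer hold a type tag. CONS_LOWTAG is illustrative. */
#define LOWTAG_BITS 3
#define LOWTAG_MASK (((uintptr_t)1 << LOWTAG_BITS) - 1)
#define CONS_LOWTAG ((uintptr_t)7)

static_assert(CONS_LOWTAG <= LOWTAG_MASK, "tag must fit in the lowtag bits");

/* A load of a cons field reads tagged - CONS_LOWTAG + field_offset, where
   field_offset is a multiple of 8. That address is 8-byte aligned exactly
   when `tagged` carries CONS_LOWTAG; with alignment checking enabled, an
   8-byte load from any other address would trap. */
static bool cons_load_is_aligned(uintptr_t tagged, uintptr_t field_offset)
{
    uintptr_t ea = tagged - CONS_LOWTAG + field_offset;
    return (ea & LOWTAG_MASK) == 0;
}
```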