Databases come in all shapes and sizes. From blisteringly fast in-memory datasets intended to hold megabytes, to massive distributed systems designed to push terabytes around like Lego. Is it even worth looking into a database engine these days?

Let me start by saying: NO. If you need a native database written in Object Pascal, there are several awesome engines available. Engines that have been in production for ages, that have been tried and tested by time, with excellent support, speed and track records. My personal favorite is ElevateDB, which is the successor to DBISAM, an engine I used in pretty much all my projects before 64 bit became the norm. ElevateDB handles both 32 and 64 bit and is by far my database of choice.

The purpose of this article is not to create an alternative or anything along those lines, quite the opposite. It’s purely to demonstrate some of the ideas behind a database – and just how far we have come from the old “file of record” that is synonymous with the first databases. Like C/C++, Delphi has been around for a while, so you can still find support for these older features, which I think is great because there are still places where such a “file of record” can be extremely handy. But again, that is not what we are going to do here.

The reason I mentioned “file of record” is because, even though we don’t use that any more, it does summarize the most fundamental idea of a database. What exactly is a database anyway? Is it really this black box?

A database file (to deal with that first) is supposed to contain the following:

One or more table definitions

One or more tables (as in data)

Indexes and lookup tables (word indexes)

Stored procedures

Views and triggers

So the question becomes, how exactly do we stuff all these different things into a single file? Does it have a filesystem? Are each of these things defined in a record of sorts? And what about this mysterious “page size” thingy, what is that about?

A file of blocks

All databases face the same problems, but each solves them differently. Most major databases are in fact not a single file anymore; some even place the items above in completely separate files. Personally I don’t see any gain in having a single file with everything stuffed into it – but for the sake of argument, that is what we will be looking at in this article. It has some educational value.

The way databases are organized is directly linked to the problem above, namely that we need to store different types of data – of arbitrary length – in a single file. In other words, you can’t use a fixed record, because you cannot possibly know how many fields a table will have, nor can you predict the number of tables or procedures. So we have to create a system that can take data of any length and “somehow” place it inside the file.

So ok, what if we divide the file into blocks, each capable of storing a piece of data? So if a single block is not enough, the data will be spread out over multiple blocks?

Indeed. And that is also where the page-size value comes in. The page size defines the data capacity of a block, and with it how much of the file each block occupies. Pretty cool right?

But conceptually dividing a file into blocks doesn’t solve all our problems. How exactly will we know which blocks represent a record or a table definition? I mean, if some piece of data is spread over 10 blocks or 20 blocks, how do we know that these represent a single piece of data? How do we navigate from block #1 in a sequence to block #2?

Well, each block in the file can house raw data; we have to remember that the whole “block” idea is just conceptual, a way that we approach the file in our code. When we write blocks to the file, we have to do that in a particular way – it’s not just a raw slice of a stream or array of bytes.

We need a block header so we can recognize that this is indeed a known block; we need the block number of the next block that holds more data – and it’s probably wise to sign each block with the block number of the first piece as well.

So from a pseudo-code point of view, we end up with something like:
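
A minimal sketch of such a block descriptor in Delphi could look like this. The field names follow the descriptions below; the exact types, the page-size constant and the “no next block” sentinel are my assumptions for illustration:

```pascal
const
  // Assumed page size for this sketch; real engines typically use 512..8192 bytes
  CNT_PAGESIZE = 1024;

type
  TBlockData = packed array [0 .. CNT_PAGESIZE - 1] of byte;

  TBlock = packed record
    fbHeader: longword;   // magic number identifying a valid block, e.g. $CAFEBABE
    fbFirst:  longword;   // block number of the first block in the sequence
    fbNext:   longword;   // block number of the next block (0 = end of sequence)
    fbUsed:   integer;    // number of bytes actually used in fbData
    fbData:   TBlockData; // the payload itself
  end;
```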

Since our blocks will be hosted inside a stream (so no “file of record”, like I said), we have to use the “packed” keyword. Delphi, like most compiled languages, tries to align record fields to a boundary, so even if a record is just 4 bytes long (for example), it may end up occupying eight (you can control the boundary via the compiler options). That would throw off the calculation of the offset to our blocks, so we mark both the record and its data array as “packed”, which tells the compiler to drop alignment.

Let’s look at what each field in the record (descriptor) means:

fbHeader, a unique number that we check for before reading a block. If this number is missing or doesn’t match, something is wrong and the data is not initialized (or it’s not a db file). Let’s use $CAFEBABE since it’s both unique and fun at the same time!

fbFirst, the block number of the first block in a sequence (piece of coherent data)

fbNext, the next block in the sequence after the current

fbUsed: the number of bytes used in the fbData array. The last block of a sequence might use only half of what the block can store – so we make sure we know how much to extract from fbData in each segment

fbData: a packed byte array housing “pagesize” number of bytes

A sequence of blocks

The block system solves the problem of storing variable length data by dividing the data into suitable chunks. As shown above we store the data with just enough information to find the next block, and next block, until we have read back the whole sequence.
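
Reading a sequence back can be sketched as a simple loop that follows fbNext until it hits the end marker. This assumes a packed TBlock record with the fbHeader, fbNext, fbUsed and fbData fields described above; ReadBlock is a hypothetical helper that fetches block number N from the database stream, and 0 is assumed to mean “no next block”:

```pascal
// Sketch only: follows a chain of blocks and writes the payload to Target.
procedure ReadSequence(FirstBlock: longword; const Target: TStream);
var
  LBlock:   TBlock;
  LCurrent: longword;
begin
  LCurrent := FirstBlock;
  repeat
    if not ReadBlock(LCurrent, LBlock) then
      raise Exception.Create('Broken sequence, block not found');
    if LBlock.fbHeader <> $CAFEBABE then
      raise Exception.Create('Invalid block header');
    Target.Write(LBlock.fbData, LBlock.fbUsed);
    LCurrent := LBlock.fbNext;
  until LCurrent = 0;
end;
```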

So the block-sequence is pretty much a file. It’s actually very much like a file, because this is incidentally how hard disks and floppy disks organize files. They have a more complex layout of course, but the idea is the same. A “track-disk-device” stores blocks of data organized in tracks and sectors (sequences). Not a 1:1 comparison, but neat nonetheless.

But ok, we now have some idea about storing larger pieces of data as a chunk of blocks. But why not just store the data directly? Why not just append each record to a file and to hell with block chunks – wouldn’t that be faster?

Well yes, but how would you recycle the space? Let’s say you have a database with 10,000 records, each with a different size, and you want to delete record number 5500. If you just append stuff, how would you recycle that space? There is no way of predicting the size of the next sequence, so you could end up with large segments of empty space that can never be recycled.

By conceptually dividing the file into predictable blocks, and then storing data in chunks where each block “knows” its next of kin – and holds a reference to its root (the fbFirst field), we can suddenly solve the problem of recycling!

Ok, let’s sum up what we have so far:

We have solved storing arbitrary-length data by dividing the data into conceptual blocks. These blocks don’t have to be next to each other.

We have solved reading X number of blocks to re-create the initial data

We call the collection of blocks that make up a piece of data a “sequence”

The immediate benefit of block-by-block storage is that space can be recycled. Blocks don’t have to be next to each other; a record can be made up of blocks scattered all over the place and still remain coherent.

Not bad! But we are still not home free; there is another challenge looming ahead, namely: how can we know if a block is available or occupied?

Keeping track of blocks

This is actually a pretty cool place in the buildup of an engine, because the way we read, write and figure out which blocks can be recycled very much impacts the speed of high-level functions like inserts and navigation. This is where I would normally introduce memory-mapped files before moving on, but like I mentioned earlier – we will skip memory mapping, because the article would quickly morph into a small essay. I don’t want memory mapping and MMU theory to overshadow the fundamental principles that I’m trying to pass on.

We have now reached the point where we ask the question “how do we quickly keep track of available free blocks?”, which is an excellent question with more than a single answer. Some database vendors use a separate file where each byte represents a block. My first experiment was to use a text file, thinking that functions like Pos() would help me locate blocks faster. It was a nice idea but we need more grunt than that.

What I landed on after some experimentation, which is a good compromise between size and speed, was to use a bit buffer to keep track of things. A single bit is either taken (1) or available (0). You can quickly search for available bits, because if a byte has any value other than $FF (255) you know there is a free bit in it. It’s also very modest with regards to size: you can keep track of 10,000 blocks with only 1,250 bytes.
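
The byte-level trick can be sketched as a standalone function (a simplified illustration; the full bit-buffer class later in the article does the same with more bookkeeping):

```pascal
// Returns the index of the first free (0) bit in Buffer, or -1 if all bits
// are taken. A byte with any value other than $FF must contain a free bit.
function FindFreeBit(const Buffer: array of byte): integer;
var
  x, LBit: integer;
begin
  Result := -1;
  for x := low(Buffer) to high(Buffer) do
    if Buffer[x] <> $FF then
      for LBit := 0 to 7 do
        if (Buffer[x] and (1 shl LBit)) = 0 then
          exit((x * 8) + LBit);
end;
```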

The code for the bit-buffer was written to be easy to follow. In a high-end engine you would not waste CPU time by isolating small calculations in separate methods, but inline as much as possible. For educational purposes, however, my simple implementation will be more than adequate.

Note: I will be setting up a github folder for this project, so for our next article you can just fork that. WordPress has a tendency to mess up Delphi code, so if the code looks weird don’t worry, it will all be neatly organized into a project shortly.

The file header

Before we look at the bit code to keep track of blocks, you might be thinking “what good is it to keep track of blocks if we have nowhere to store that information?“. You are quite right, and this is where the file header comes in.

The file header is the only fixed part of a database file. Like I mentioned earlier, there are engines that stuff everything into a single file, but in most cases where performance is the highest priority you want to use several files and avoid mixing apples and oranges. I would store the block-map (bit buffer) in a separate file – because that maps easily into memory under normal use. I would also store the table definitions, indexes and more as separate files, if nothing else than to make repairing and compacting a database sane. But I promised to do a single-file model (me and my big fingers), so we will be storing the meta-data inside the database file. Let’s do just that.

The file-header is just a segment of the database-file (the start of the file) that contains some vital information. When we calculate the stream offset to each block (for either reading or writing), we simply add the size of the header to that. We don’t want to accidentally overwrite that part of the file.
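
The offset calculation itself is trivial. The constants here are assumptions for the sake of the sketch (the real values follow from your header record and block size):

```pascal
const
  CNT_HEADERSIZE = 64;    // assumed size of the fixed file header, in bytes
  CNT_BLOCKSIZE  = 1040;  // assumed size of one on-disk block (descriptor + payload)

// Byte offset into the database stream where block "BlockNo" lives.
// Adding the header size ensures we never touch the header region.
function BlockOffset(const BlockNo: int64): int64;
begin
  Result := CNT_HEADERSIZE + (BlockNo * CNT_BLOCKSIZE);
end;
```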

Depending on how we evolve the reading and writing of data sequences, the header doesn’t have to contain that much data. You probably want to keep track of the page-size, followed by the start block for the table definitions. So when you open a database, you would immediately start by reading the block-sequence containing all the definitions the file contains. How we organize the data internally is irrelevant for the block-file and IO scaffolding. Its job is simple: read or write blocks, calculate offsets, avoid killing the header, pay attention to identifiers.
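
A header along those lines could be sketched like this (the field names and types are hypothetical):

```pascal
type
  TDbFileHeader = packed record
    dhSignature: longword;  // magic number identifying our file format
    dhVersion:   longword;  // format version, for future changes
    dhPageSize:  integer;   // payload capacity of each block
    dhRootBlock: longword;  // first block of the table-definition sequence
  end;
```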

Some coders store the db schemas etc. at the end of the file, so when you close the DB the information is appended to the file stream – and the offset is stored in the header. This is less messy, but it also opens the door to corruption. If the DB is not properly closed, you risk the DB information never being streamed out. Which is yet another nail in the coffin for single-file databases (at least from my personal view; there are many roads to Rome, and database theory can drive you nuts at times).

So I end this first article of our series with that. Hopefully I have given you enough ideas to form a mental image of how the underlying structure of a database file is organized. There are always exceptions to the rule and a wealth of different database models exists. So please keep in mind that this article has just scratched the surface on a slab of concrete.

unit qtx.utils.bits;

interface

uses
  System.SysUtils, System.Classes;

type

  (* Exception classes *)
  EQTXBitBuffer = class(Exception);

  TBitOffsetArray = packed array of NativeUInt;

  (* About TQTXBitBuffer:
     This class allows you to manage a large number of bits, much like
     TBits in the VCL and LCL. However, it is not limited by the
     shortcomings of the initial TBits.

     - The bitbuffer can be saved
     - The bitbuffer can be loaded
     - The class exposes a linear memory model
     - The class exposes methods (class functions) that allow you to
       perform operations on pre-allocated memory (memory you manage
       in your application).

     Uses of TQTXBitBuffer:
     Bit-buffers are typically used to represent something else, like
     records in a database file. A bit-map is often used in DB engines
     to represent which pages are in use (bit set to 1), and which pages
     can be re-cycled or compacted away later. *)
  TQTXBitBuffer = class(TObject)
  private
    FData:      PByte;
    FDataLng:   NativeInt;
    FDataLen:   NativeInt;
    FBitsMax:   NativeUInt;
    FReadyByte: NativeUInt;
    BitOfs:     0..255;
    FByte:      byte;
    function  GetByte(const Index: NativeInt): byte;
    procedure SetByte(const Index: NativeInt; const Value: byte);
    function  GetBit(const Index: NativeUInt): boolean;
    procedure SetBit(const Index: NativeUInt; const Value: boolean);
  public
    property Data: PByte read FData;
    property Size: NativeInt read FDataLen;
    property Count: NativeUInt read FBitsMax;
    property Bytes[const Index: NativeInt]: byte read GetByte write SetByte;
    property Bits[const Index: NativeUInt]: boolean read GetBit write SetBit; default;

    procedure Allocate(MaxBits: NativeUInt);
    procedure Release;
    function  Empty: boolean;
    procedure Zero;
    function  ToString(const Boundary: integer = 16): string; reintroduce;

    class function BitsOf(const aBytes: NativeInt): NativeUInt;
    class function BytesOf(aBits: NativeUInt): NativeUInt;
    class function BitsSetInByte(const Value: byte): NativeInt; inline;
    class function BitGet(const Index: NativeInt; const Buffer): boolean;
    class procedure BitSet(const Index: NativeInt; var Buffer; const Value: boolean);

    procedure SaveToStream(const stream: TStream); virtual;
    procedure LoadFromStream(const stream: TStream); virtual;

    procedure SetBitRange(First, Last: NativeUInt; const Bitvalue: boolean);
    procedure SetBits(const Value: TBitOffsetArray; const Bitvalue: boolean);
    function  FindIdleBit(var Value: NativeUInt; const FromStart: boolean = false): boolean;

    destructor Destroy; override;
  end;

implementation

resourcestring
  ERR_BitBuffer_InvalidBitIndex  = 'Invalid bit index, expected 0..%d not %d';
  ERR_BitBuffer_InvalidByteIndex = 'Invalid byte index, expected 0..%d not %d';
  ERR_BitBuffer_BitBufferEmpty   = 'Bitbuffer is empty error';
  ERR_BitBuffer_InvalidOffset    = 'Invalid bit offset, expected 0..%d, not %d';

var
  (* Lookup table: number of bits set in each possible byte value *)
  CNT_BitBuffer_ByteTable: array [0..255] of NativeInt =
    (0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
     1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
     1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
     1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
     2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
     3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
     3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
     4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8);

function QTXToNearest(const Value, Factor: NativeUInt): NativeUInt;
var
  LTemp: NativeUInt;
begin
  Result := Value;
  LTemp := Value mod Factor;
  if LTemp > 0 then
    inc(Result, Factor - LTemp);
end;

// ##########################################################################
// TQTXBitBuffer
// ##########################################################################

destructor TQTXBitBuffer.Destroy;
begin
  if not Empty then
    Release;
  inherited;
end;

function TQTXBitBuffer.ToString(const Boundary: integer = 16): string;
const
  CNT_SYM: array [boolean] of string = ('0', '1');
var
  x: NativeUInt;
  LCount: NativeUInt;
begin
  Result := '';
  LCount := Count;
  if LCount > 0 then
  begin
    (* pad the output up to the nearest boundary *)
    LCount := QTXToNearest(LCount, Boundary);
    x := 0;
    while x < LCount do
    begin
      if x < Count then
        Result := Result + CNT_SYM[GetBit(x)]
      else
        Result := Result + CNT_SYM[false];
      inc(x);
      if (x mod NativeUInt(Boundary)) = 0 then
        Result := Result + sLineBreak;
    end;
  end;
end;

class function TQTXBitBuffer.BitsOf(const aBytes: NativeInt): NativeUInt;
begin
  Result := NativeUInt(aBytes) * 8;
end;

class function TQTXBitBuffer.BytesOf(aBits: NativeUInt): NativeUInt;
begin
  (* round up to the nearest whole byte *)
  Result := QTXToNearest(aBits, 8) div 8;
end;

class function TQTXBitBuffer.BitsSetInByte(const Value: byte): NativeInt;
begin
  Result := CNT_BitBuffer_ByteTable[Value];
end;

class function TQTXBitBuffer.BitGet(const Index: NativeInt; const Buffer): boolean;
var
  LAddr: PByte;
begin
  if Index >= 0 then
  begin
    LAddr := @Buffer;
    inc(LAddr, Index shr 3);
    Result := (LAddr^ and (1 shl (Index mod 8))) <> 0;
  end else
    raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidOffset, [maxint - 1, Index]);
end;

class procedure TQTXBitBuffer.BitSet(const Index: NativeInt; var Buffer;
  const Value: boolean);
var
  LAddr: PByte;
  LByte: byte;
  BitOfs: byte;
  LCurrent: boolean;
begin
  if Index >= 0 then
  begin
    LAddr := @Buffer;
    inc(LAddr, Index shr 3);
    LByte := LAddr^;
    BitOfs := Index mod 8;
    LCurrent := (LByte and (1 shl BitOfs)) <> 0;
    case Value of
      true:
        begin
          (* set bit if not already set *)
          if not LCurrent then
            LByte := LByte or (1 shl BitOfs);
          LAddr^ := LByte;
        end;
      false:
        begin
          (* clear bit if already set *)
          if LCurrent then
            LByte := LByte and not (1 shl BitOfs);
          LAddr^ := LByte;
        end;
    end;
  end else
    raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidOffset, [maxint - 1, Index]);
end;

procedure TQTXBitBuffer.SaveToStream(const stream: TStream);
var
  LWriter: TWriter;
begin
  LWriter := TWriter.Create(stream, 1024);
  try
    LWriter.WriteInteger(FDataLen);
    LWriter.Write(FData^, FDataLen);
  finally
    LWriter.FlushBuffer;
    LWriter.Free;
  end;
end;

procedure TQTXBitBuffer.LoadFromStream(const stream: TStream);
var
  LReader: TReader;
  LLen: NativeInt;
begin
  Release;
  LReader := TReader.Create(stream, 1024);
  try
    LLen := LReader.ReadInteger;
    if LLen > 0 then
    begin
      Allocate(BitsOf(LLen));
      LReader.Read(FData^, LLen);
    end;
  finally
    LReader.Free;
  end;
end;

function TQTXBitBuffer.Empty: boolean;
begin
  Result := FData = nil;
end;

function TQTXBitBuffer.GetByte(const Index: NativeInt): byte;
begin
  if FData <> nil then
  begin
    if (Index >= 0) and (Index < FDataLen) then
      Result := FData[Index]
    else
      raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidByteIndex,
        [FDataLen - 1, Index]);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

procedure TQTXBitBuffer.SetByte(const Index: NativeInt; const Value: byte);
begin
  if FData <> nil then
  begin
    if (Index >= 0) and (Index < FDataLen) then
      FData[Index] := Value
    else
      raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidByteIndex,
        [FDataLen - 1, Index]);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

procedure TQTXBitBuffer.SetBitRange(First, Last: NativeUInt;
  const Bitvalue: boolean);
var
  x: NativeUInt;
  LCount: NativeUInt;
  LLongs: NativeInt;
  LSingles: NativeInt;

  function GetDistance(const Primary, Secondary: NativeUInt;
    const Exclusive: boolean): NativeUInt;
  begin
    if Primary <> Secondary then
    begin
      if Primary > Secondary then
        Result := Primary - Secondary
      else
        Result := Secondary - Primary;
      if Exclusive then
        if (Primary < 1) or (Secondary < 1) then
          inc(Result);
    end else
      Result := 0;
  end;

begin
  if FData <> nil then
  begin
    if First < FBitsMax then
    begin
      if Last < FBitsMax then
      begin
        if First <> Last then
        begin
          LCount := GetDistance(First, Last, true);
          x := First;

          (* process 8 bits at a time *)
          LLongs := NativeInt(LCount div 8);
          while LLongs > 0 do
          begin
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            SetBit(x, Bitvalue); inc(x);
            dec(LLongs);
          end;

          (* process singles *)
          LSingles := NativeInt(LCount mod 8);
          while LSingles > 0 do
          begin
            SetBit(x, Bitvalue);
            inc(x);
            dec(LSingles);
          end;
        end else
          SetBit(First, Bitvalue);
      end else
        raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidBitIndex,
          [FBitsMax, Last]);
    end else
      raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidBitIndex,
        [FBitsMax, First]);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

procedure TQTXBitBuffer.SetBits(const Value: TBitOffsetArray;
  const Bitvalue: boolean);
var
  x: NativeInt;
  FCount: NativeInt;
begin
  if FData <> nil then
  begin
    FCount := length(Value);
    if FCount > 0 then
      for x := low(Value) to high(Value) do
        SetBit(Value[x], Bitvalue);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

function TQTXBitBuffer.FindIdleBit(var Value: NativeUInt;
  const FromStart: boolean = false): boolean;
var
  FOffset: NativeUInt;
  FBit: NativeUInt;
  FAddr: PByte;
  x: NativeInt;
begin
  Result := FData <> nil;
  if Result then
  begin
    (* Initialize *)
    FAddr := FData;
    FOffset := 0;

    if FromStart then
      FReadyByte := 0;

    if FReadyByte < 1 then
    begin
      (* find a byte with an idle bit *)
      while FOffset < NativeUInt(FDataLen) do
      begin
        if BitsSetInByte(FAddr^) = 8 then
        begin
          inc(FOffset);
          inc(FAddr);
        end else
          break;
      end;
    end else
      inc(FOffset, FReadyByte);

    (* Last byte exhausted? *)
    Result := FOffset < NativeUInt(FDataLen);
    if Result then
    begin
      (* scan the byte for its first idle bit *)
      FBit := FOffset * 8;
      for x := 1 to 8 do
      begin
        if not GetBit(FBit) then
        begin
          Value := FBit;
          (* cache the byte position if it still has room *)
          if BitsSetInByte(FData[FOffset]) > 7 then
            FReadyByte := 0
          else
            FReadyByte := FOffset;
          break;
        end;
        inc(FBit);
      end;
    end;
  end;
end;

function TQTXBitBuffer.GetBit(const Index: NativeUInt): boolean;
begin
  if FData <> nil then
  begin
    if Index < FBitsMax then
      Result := BitGet(Index, FData^)
    else
      raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidBitIndex,
        [Count - 1, Index]);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

procedure TQTXBitBuffer.SetBit(const Index: NativeUInt; const Value: boolean);
begin
  if FData <> nil then
  begin
    if Index < FBitsMax then
    begin
      FByte := PByte(FDataLng + NativeInt(Index shr 3))^;
      BitOfs := Index mod 8;
      if Value then
      begin
        (* set bit if not already set *)
        if (FByte and (1 shl BitOfs)) = 0 then
        begin
          FByte := FByte or (1 shl BitOfs);
          PByte(FDataLng + NativeInt(Index shr 3))^ := FByte;
          (* if the byte is now full, forget it as a source of free bits *)
          if BitsSetInByte(FByte) > 7 then
            FReadyByte := 0;
        end;
      end else
      begin
        (* clear bit if not already clear *)
        if (FByte and (1 shl BitOfs)) <> 0 then
        begin
          FByte := FByte and not (1 shl BitOfs);
          PByte(FDataLng + NativeInt(Index shr 3))^ := FByte;
          (* remember this byte pos *)
          FReadyByte := Index shr 3;
        end;
      end;
    end else
      raise EQTXBitBuffer.CreateFmt(ERR_BitBuffer_InvalidBitIndex,
        [Count - 1, Index]);
  end else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

procedure TQTXBitBuffer.Allocate(MaxBits: NativeUInt);
begin
  (* release buffer if not empty *)
  if FData <> nil then
    Release;

  if MaxBits > 0 then
  begin
    (* Allocate new buffer *)
    try
      FReadyByte := 0;
      FDataLen := BytesOf(MaxBits);
      FData := AllocMem(FDataLen);
      FDataLng := NativeInt(FData);
      FBitsMax := BitsOf(FDataLen);
    except
      on e: Exception do
      begin
        FData := nil;
        FDataLen := 0;
        FBitsMax := 0;
        FDataLng := 0;
        raise;
      end;
    end;
  end;
end;

procedure TQTXBitBuffer.Release;
begin
  if FData <> nil then
  begin
    try
      FreeMem(FData);
    finally
      FReadyByte := 0;
      FData := nil;
      FDataLen := 0;
      FBitsMax := 0;
      FDataLng := 0;
    end;
  end;
end;

procedure TQTXBitBuffer.Zero;
begin
  if FData <> nil then
    FillChar(FData^, FDataLen, byte(0))
  else
    raise EQTXBitBuffer.Create(ERR_BitBuffer_BitBufferEmpty);
end;

end.