If you missed the initial two episodes in this tutorial, click here for the first article and here for the second. Even if you feel databases is something you master I urge you to read the articles from scratch, because the database theory we explore here is not the high-level concepts we learnt at school. Having said that it’s not too complex and with a bit of exploring you should find it easy to follow.

Update your git fork

In the previous episode you were asked to fork the project code from bitbucket. As we progress I will continue to add units and refactor code, so if you forked the code last time, you should fetch the latest revision before you continue.

Since GIT is incremental and awesome you can just revert to previous commits for the examples as we move along. I will mark each commit so you can see what is what and easily jump between examples.

Quick recap from our previous episode

Our last episode ended with a fidelity test on the storage medium. We implemented a class that would operate on blocks (or parts) of a defined size, rather than single bytes like a TStream does. The idea being that the class can create, read and write the blocks that contains the data; also known as a database pagesize.

I also introduced the notion of buffer classes, which are similar to Delphi streams. The buffer classes are actually Delphi implementations of the buffers found in node.js, which adds two very important benefits: the ability to inject and remove data from any point. So if you want to inject 1 gigabyte into the middle of a file, that can be done in a single call. If you want to remove 1 gigabyte from somewhere in the file, that is likewise reduced to a single call.

Buffers are designed to give you easier access to the data inside the buffer, while streams are great for normal programming tasks, buffers are much faster and ad-hoc, and thus better suited for database work.

From blocks to sequences

In this article’s code the focus is on sequences and putting together the essentials for our database class. Which means the ability to store sequences and being able to recover them (record persistency).

A sequence is, like I explained in our previous episode, file-parts that are linked together through offsets.

Whenever you want to store data that exceeds the page-size defined for a database, the only solution is to divide that data into parts and then spread the data over many blocks. We daisy-chain them together by setting the block number for the next block in the sequence in each of the parts (a block or part is ultimately just a record we write to the file), terminated by the value -1 ($FFFFFFFF as an unsigned longword). So the last block in the chain has the “next” field set to -1, which tells the system that there is no more data to fetch for the sequence.

If you are unsure about what I mean, please read up on sequences in the previous two articles.

Sequence persistency

What we are doing in this episode is to merge all the things we have talked about so far into a single class. The bit buffer is done, the block-file class is done, and now we need to make sure we can read and write full sequences.

This brings us to the problem of scaling. An empty database will not have any free blocks. So a part of the sequence reading and writing mechanism is the ability to expand the file on demand.

We also need to have some support functionality for mapping a sequence, otherwise we would have to store an awful lot of information for each record. We only really need to know the first block number in a sequence (because that will have a reference to any subsequent blocks used by a sequence). With mapping I mean a function that can investigate a sequence and return an array of block numbers, so that we know where each part of the sequence has ended up.

This might not be immediately useful, but when we reach the point of compacting a database, it becomes vital that we can quickly map what blocks a sequence occupy – and then organize them in a linear fashion so that the data can be read faster, and we can release excess empty space.

Changes to the header and part records

Suddenly our metadata needs have grown considerably. We need to store the bit-buffer that holds the map of the whole database (all the blocks are either used [1] or available [0]). We need to store a list of longwords that represents the records in a database. Please note that on this level, the database doesn’t care what record belongs to what table. And last but not least we need a file-header and block record that can keep track of the initial values.

Let’s have a look at the records first:

// Custom data structures TDbVersion = packed record bvMajor: byte; bvMinor: byte; bvRevision: word; end; TDbLibFileHeader = packed record dhSignature: longword; dhVersion: TDbVersion; dhName: shortstring; dhMetadata: longword; dhRecList: longword; dhBitBuffer: longword; class function Create: TDbLibFileHeader; static; end; TDbPartData = packed record ddSignature: longword; ddRoot: longword; ddPrevious: longword; ddNext: longword; ddBytes: integer; ddData: packed array [0 .. CNT_DATABASEFILE_PAGESIZE - 1] of byte; class function Create: TDbPartData; static; end;

Notice that the part-data (block) has fields to keep track of the root (the first part in the sequence), the previous part in the sequence amd the next part in the sequence. This gives us a lot of freedom when we want to do some tooling for the engine later.

If you look at the file-header this has also been expanded. It has a field to hold the first part in the meta-data sequence, the record list (the total list of sequences stored in the file) and the bit buffer.

Metadata

There are quite a few classes we need to introduce, so let’s start at the top, namely the meta-data classes. In our case the metadata is (first and foremost):

Tables Fields

Database name

The reason I have isolated each of these as classes, is to make it easier to expand the system later. So let’s have a look at the classes:

IExtendedPersistence = interface ['{282CC310-CD3B-47BF-8EB0-017C1EDF0BFC}'] procedure ObjectFrom(const Reader: TDbLibReader); procedure ObjectFromStream(const Stream: TStream; const Disposable: boolean); procedure ObjectFromData(const Data: TDbLibBuffer; const Disposable: boolean); procedure ObjectFromFile(const Filename: string); procedure ObjectTo(const Writer: TDbLibWriter); function ObjectToStream: TStream; overload; procedure ObjectToStream(const Stream: TStream); overload; function ObjectToData: TDbLibBuffer; overload; procedure ObjectToData(const Data: TDbLibBuffer); overload; procedure ObjectToFile(const Filename: string); end; IDbLibFields = interface ['{0D6A9FE2-24D2-42AE-A343-E65F18409FA2}'] function IndexOf(FieldName: string): integer; function ObjectOf(FieldName: string): TDbLibRecordField; function Add(const FieldName: string; const FieldClass: TDbLibRecordFieldClass): TDbLibRecordField; function Addinteger(const FieldName: string): TDbLibFieldInteger; function AddStr(const FieldName: string): TDbLibFieldString; function Addbyte(const FieldName: string): TDbLibFieldbyte; function AddBool(const FieldName: string): TDbLibFieldboolean; function AddCurrency(const FieldName: string): TDbLibFieldCurrency; function AddData(const FieldName: string): TDbLibFieldData; function AddDateTime(const FieldName: string): TDbLibFieldDateTime; function AddDouble(const FieldName: string): TDbLibFieldDouble; function AddGUID(const FieldName: string): TDbLibFieldGUID; function AddInt64(const FieldName: string): TDbLibFieldInt64; function AddLong(const FieldName: string): TDbLibFieldLong; end; TDbLibPersistent = class(TDbLibObject, IExtendedPersistence) private FObjId: Longword; FUpdCount: integer; strict protected // Implements:: IExtendedPersistence procedure ObjectTo(const Writer: TDbLibWriter); procedure ObjectFrom(const Reader: TDbLibReader); procedure ObjectFromStream(const Stream: TStream; const Disposable: boolean); function ObjectToStream: TStream; overload; procedure ObjectToStream(const Stream: TStream); overload; procedure ObjectFromData(const Binary: TDbLibBuffer; const Disposable: boolean); function ObjectToData: TDbLibBuffer; overload; procedure ObjectToData(const Binary: TDbLibBuffer); overload; procedure ObjectFromFile(const Filename: string); procedure ObjectToFile(const Filename: string); protected procedure BeforeUpdate; virtual; procedure AfterUpdate; virtual; strict protected // Persistency Read/Write methods procedure BeforeReadObject; virtual; procedure AfterReadObject; virtual; procedure BeforeWriteObject; virtual; procedure AfterWriteObject; virtual; procedure WriteObject(const Writer: TDbLibWriter); virtual; procedure ReadObject(const Reader: TDbLibReader); virtual; strict protected // Standard persistence function ObjectHasData: boolean; virtual; procedure ReadObjBin(Stream: TStream); virtual; procedure WriteObjBin(Stream: TStream); virtual; procedure DefineProperties(Filer: TFiler); override; public property UpdateCount: integer read FUpdCount; procedure Assign(Source: TPersistent); override; function ObjectIdentifier: longword; function BeginUpdate: boolean; procedure EndUpdate; class function classIdentifier: longword; constructor Create; override; end; [ComponentPlatformsAttribute(CNT_ALL_PLATFORMS)] TDbLibRecordField = class(TDbLibBufferMemory) private FName: string; FNameHash: Int64; FOnRead: TNotifyEvent; FOnWrite: TNotifyEvent; FOnRelease: TNotifyEvent; procedure SetRecordName(NewName: string); protected function GetDisplayName: string; virtual; procedure BeforeReadObject; override; procedure ReadObject(Reader:TReader); override; procedure WriteObject(Writer:TWriter); override; procedure DoReleaseData; override; protected procedure SignalWrite; procedure SignalRead; procedure SignalRelease; public function AsString: string; virtual;abstract; property DisplayName: string read GetDisplayName; property FieldSignature:Int64 read FNameHash; published property OnValueRead: TNotifyEvent read FOnRead write FOnRead; property OnValueWrite: TNotifyEvent read FOnWrite write FOnWrite; property OnValueRelease: TNotifyEvent read FOnRelease write FOnRelease; property FieldName: string read FName write SetRecordName; end; TDbLibFieldboolean = class(TDbLibRecordField) private function GetValue: boolean; procedure SetValue(const NewValue: boolean); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value: boolean read GetValue write SetValue; end; TDbLibFieldbyte = class(TDbLibRecordField) private function GetValue:byte; procedure SetValue(const NewValue:byte); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:byte read GetValue write SetValue; end; TDbLibFieldCurrency = class(TDbLibRecordField) private function GetValue:Currency; procedure SetValue(const NewValue:Currency); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:Currency read GetValue write SetValue; end; TDbLibFieldData = class(TDbLibRecordField) protected function GetDisplayName: string; override; public function AsString: string; override; end; TDbLibFieldDateTime = class(TDbLibRecordField) private function GetValue:TDateTime; procedure SetValue(const NewValue:TDateTime); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:TDateTime read GetValue write SetValue; end; TDbLibFieldDouble = class(TDbLibRecordField) private function GetValue:Double; procedure SetValue(const NewValue:Double); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:Double read GetValue write SetValue; end; TDbLibFieldGUID = class(TDbLibRecordField) private function GetValue:TGUID; procedure SetValue(const NewValue:TGUID); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:TGUID read GetValue write SetValue; end; TDbLibFieldInteger = class(TDbLibRecordField) private function GetValue: integer; procedure SetValue(const NewValue:integer); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:integer read GetValue write SetValue; end; TDbLibFieldInt64 = class(TDbLibRecordField) private function GetValue:Int64; procedure SetValue(const NewValue:Int64); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value:Int64 read GetValue write SetValue; end; TDbLibFieldString = class(TDbLibRecordField) private FLength: integer; FExplicit: boolean; function GetValue: string; procedure SetValue(NewValue: string); procedure SetFieldLength(Value:integer); protected function GetDisplayName: string; override; public function AsString: string; override; constructor Create(AOwner: TComponent); override; published property Value: string read GetValue write SetValue; property Length: integer read FLength write SetFieldLength; property Explicit: boolean read FExplicit write FExplicit; end; TDbLibFieldLong = class(TDbLibRecordField) private function GetValue:Longword; procedure SetValue(const NewValue:Longword); protected function GetDisplayName: string; override; public function AsString: string; override; published property Value: Longword read GetValue write SetValue; end; TDbLibCustomRecord = class(TComponent, IDbLibFields, IStreamPersist) strict private FObjects: TObjectList; strict protected function GetItem(const Index:integer): TDbLibRecordField; procedure SetItem(const Index: integer; const Value: TDbLibRecordField); function GetField(const FieldName: string): TDbLibRecordField; procedure SetField(const FieldName: string; const Value: TDbLibRecordField); function GetCount: integer; property Fields[const FieldName: string]: TDbLibRecordField read GetField write SetField; property Items[const Index: integer]: TDbLibRecordField read GetItem write SetItem; property Count: integer read GetCount; public function Add(const FieldName: string; const Fieldclass: TDbLibRecordFieldclass): TDbLibRecordField; function Addinteger(const FieldName: string): TDbLibFieldInteger; function AddStr(const FieldName: string): TDbLibFieldString; function Addbyte(const FieldName: string): TDbLibFieldbyte; function AddBool(const FieldName: string): TDbLibFieldboolean; function AddCurrency(const FieldName: string): TDbLibFieldCurrency; function AddData(const FieldName: string): TDbLibFieldData; function AddDateTime(const FieldName: string): TDbLibFieldDateTime; function AddDouble(const FieldName: string): TDbLibFieldDouble; function AddGUID(const FieldName: string): TDbLibFieldGUID; function AddInt64(const FieldName: string): TDbLibFieldInt64; function AddLong(const FieldName: string): TDbLibFieldLong; procedure WriteInt(const FieldName: string;const Value: integer); procedure WriteStr(const FieldName: string;const Value: string); procedure Writebyte(const FieldName: string;const Value: byte); procedure WriteBool(const FieldName: string;const Value: boolean); procedure WriteCurrency(const FieldName: string;const Value: currency); procedure WriteData(const FieldName: string;const Value: TStream); procedure WriteDateTime(const FieldName: string;const Value: TDateTime); procedure WriteDouble(const FieldName: string;const Value: double); procedure WriteGUID(const FieldName: string;const Value: TGUID); procedure WriteInt64(const FieldName: string; const Value: int64); procedure WriteLong(const FieldName: string; const Value: longword); procedure Clear; virtual; procedure Assign(source: TPersistent); override; function ToStream: TStream; virtual; function ToBuffer: TDbLibBuffer; virtual; procedure FromStream(const Stream: TStream; const Disposable: boolean = true); // Implements:: IStreamPersist procedure LoadFromStream(Stream: TStream); virtual; procedure SaveToStream(Stream: TStream); virtual; function IndexOf(FieldName: string): integer; function ObjectOf(FieldName: string): TDbLibRecordField; constructor Create(AOwner: TComponent); override; destructor Destroy; override; end; [ComponentPlatformsAttribute(CNT_ALL_PLATFORMS)] TDbLibRecord = class(TDbLibCustomRecord) public property Fields; property Items; property Count; end; TDbLibTable = class(TDbLibPersistent) strict private FName: string; FParent: TDbLibMetaData; FPrototype: TDbLibRecord; FFieldDefs: IDbLibFields; strict protected procedure BeforeReadObject; override; procedure WriteObject(const Writer: TDbLibWriter); override; procedure ReadObject(const Reader: TDbLibReader); override; public constructor Create(const MetaData: TDbLibMetaData); reintroduce; virtual; destructor Destroy; override; published property Parent: TDbLibMetaData read FParent; property FieldDefs: IDbLibFields read FFieldDefs; property TableName: string read FName write FName; end; TDbLibMetaData = class(TDbLibPersistent, IDbLibTables) strict private [Weak] FParent: TDbLibDatabase; FDbName: string; FTables: TObjectList; function GetTable(index: integer): TDbLibTable; function GetTableCount: integer; function AddTable(TableName: string): TDbLibTable; strict protected procedure SetDatabaseName(NewName: string); procedure BeforeReadObject; override; procedure WriteObject(const Writer: TDbLibWriter); override; procedure ReadObject(const Reader: TDbLibReader); override; public property Database: TDbLibDatabase read FParent; property DatabaseName: string read FDbName write SetDatabaseName; property Tables[index: integer]: TDbLibTable read GetTable; default; property TableCount: integer read GetTableCount; function GetTableByName(TableName: string; var Table: TDbLibTable): boolean; constructor Create(const Database: TDbLibDatabase) ; reintroduce; virtual; destructor Destroy; override; end;

Some might not like that I made some alterations to the standard Delphi persistency system. But if you look closely at the class TDbLibPersistent you will see that it makes it easier to handle hosted data. You have methods like BeforeRead() and BeforeWrite() that makes house-keeping a lot easier.

With the metadata out of the way, lets have a peek at the actual database class. Or should I say, the sequence persistency layer, because at the moment it has the capacity to save sequences, remember them when you re-open the file, keep track of the database name and defined tables – but we have not yet turned the metadata into physical data.

TDbLibRecordList = class(TDbLibPersistent) strict private FDbRecList: TList; function GetCount: integer; inline; function GetItem(const index: integer): longword; inline; procedure SetItem(const index: integer; const Value: longword); inline; strict protected procedure BeforeReadObject; override; procedure WriteObject(const Writer: TDbLibWriter); override; procedure ReadObject(const Reader: TDbLibReader); override; public property RecordList: TList read FDbRecList; property Items[const index: integer]: longword read GetItem write SetItem; default; property Count: integer read GetCount; constructor Create; override; destructor Destroy; override; end; TDbLibDatabase = class(TComponent) strict private FHeader: TDbLibFileHeader; FBitbuffer: TDBLibBitBuffer; FDbFile: TDbLibBuffer; FDbFileAccess: TDbLibPartAccess; FDbRecList: TDbLibRecordList; FMetaData: TDbLibMetaData; FTableAccess: IDbLibTables; FActive: boolean; FFilename: string; FAccessMode: TDbLibAccessMode; function GetActive: boolean; procedure SetActive(const NewActive: boolean); function GetFileName: string; procedure SetFileName(NewFileName: string); function GetAccessMode: TDbLibAccessMode; procedure SetAccessMode(const NewMode: TDbLibAccessMode); function GetDbName: string; procedure SetDbName(NewDbName: string); strict protected procedure ResetHeader; procedure ReadHeader; procedure WriteHeader; public property MetaData: TDbLibMetaData read FMetaData; property Tables: IDbLibTables read FTableAccess; procedure Open; procedure Close; procedure GrowBy(Parts: longword); procedure ShrinkBy(Parts: longword); function GetRecordCount: integer; inline; function GetRecordSequence(const RecNo: integer): TDbLibSequence; inline; function GetSequenceBits(NumberOfBits: longword; var Bits: TDbLibSequence): boolean; procedure WriteSequence(const Sequence: TDbLibSequence; const Data: TStream); inline; function WriteDataAsSequence(const Data: TStream; var Sequence: TDbLibSequence): boolean; procedure MapSequence(StartPageIndex: longword; var Sequence: TDbLibSequence); overload; inline; function MapSequence(const StartPageIndex: longword): TDbLibSequence; overload; inline; function ReadDataFromSequence(SeqStart: longword): TStream; inline; procedure BeforeDestruction; override; constructor Create(AOwner: TComponent); override; destructor Destroy; override; public property Active: boolean read GetActive write SetActive; property DatabaseName: string read GetDbName write SetDbName; property Filename: string read GetFileName write SetFileName; property AccessMode: TDbLibAccessMode read GetAccessMode write SetAccessMode; end;

Our previous fidelity test was somewhat meager. This time we give the system a bit more to work with. So our initialization code now looks more akin to proper database work:

var LTable: TDbLibTable; begin memoOut.Lines.Clear(); memoOut.Lines.BeginUpdate(); screen.Cursor := crHourGlass; try try FDatabase.Filename := 'c:\temp\mydb.db'; //'::memory::'; FDatabase.AccessMode := TDbLibAccessMode.amCreate; FDatabase.DatabaseName := 'temp'; LTable := FDatabase.Tables.AddTable('main'); LTable.FieldDefs.Addinteger('id'); LTable.FieldDefs.AddStr('name'); LTable.FieldDefs.Add('data', TDbLibFieldData); FDatabase.Open(); finally if FDatabase.Active then FDatabase.Close(); end; finally memoOut.lines.EndUpdate(); screen.Cursor := crDefault; end;

Fork the code luke

The latest code has been uploaded to the BitBucket repository (same as last), so simply update the fork you did earlier. Make sure you set plenty of breakpoints and also download a hex-editor and examine the resulting databasefile.

Alright! I think that is enough for the third episode. In our next episode we have more than enough to explain and explore, but now the fundamental parts are in place!

Until next time!