High Performance Btree Module Now Available

AllegroCache is a high-performance, dynamic object caching database system. It is described on this page. One feature of AllegroCache is Btree support, which is now directly available to programmers. Btrees are useful for applications that need a simple, efficient way to store vast amounts of data on disk and retrieve it, without paying the overhead of transactions and CLOS. In this note, we discuss Btrees and AllegroCache's support for them.

The AllegroCache Btree documentation is here. Symbols naming Btree functionality are in the db.btree package.

AllegroCache has always provided Btree support internally. The functionality is now available to AllegroCache users. Note that the internal Btree structure changed after the early releases, so Btrees created before the change cannot be used in later AllegroCache versions.

The general theory of Btrees is discussed at http://en.wikipedia.org/wiki/Btree; what we have implemented is closer to B+trees. The most important feature is that stores and accesses take on the order of log(N) time, where N is the number of nodes in the btree. This means that store and access times remain reasonable even for very large btrees.
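To make the logarithmic growth concrete, the following sketch estimates how many levels a tree needs to hold a given number of entries. The fan-out of 100 keys per node is a purely illustrative assumption, not a figure taken from AllegroCache, and levels-needed is a hypothetical helper name.

```lisp
;; Smallest number of levels L such that FAN-OUT^L >= N-ENTRIES.
;; FAN-OUT (keys per node) is an assumed, illustrative value.
(defun levels-needed (n-entries &optional (fan-out 100))
  (let ((levels 1)
        (capacity fan-out))
    (loop while (< capacity n-entries)
          do (setq capacity (* capacity fan-out))
             (incf levels))
    levels))

(levels-needed 1000)          ; => 2
(levels-needed 1000000)       ; => 3
(levels-needed 1000000000000) ; => 6
```

Under that assumption, going from a thousand entries to a trillion only triples the number of levels that a lookup has to visit.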

In Allegro CL, keys (which identify btree entries) are stored in order so you can scan the btree in order.

The Allegro implementation of btrees has these properties in addition:

The code is written completely in Lisp in order to get the best performance possible and the best integration into a Lisp program.

Keys and values are simple vectors of type (unsigned-byte 8). (You can use other types if you write encoder/decoders to/from (unsigned-byte 8) vectors.)

There is extensive support for caching disk blocks to avoid disk I/O.
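Since keys are (unsigned-byte 8) vectors and are kept in order, the choice of encoder affects the order in which entries are scanned. The sketch below assumes that keys are compared byte by byte, which is consistent with the example session in this note but is not stated here explicitly. The names fixed-width-octets and octets-lessp are hypothetical and written in portable Common Lisp; they are not part of db.btree.

```lisp
;; Big-endian encoding of a non-negative integer in exactly WIDTH octets.
(defun fixed-width-octets (n width)
  (let ((out (make-array width :element-type '(unsigned-byte 8))))
    (dotimes (i width out)
      (setf (aref out (- width i 1)) (ldb (byte 8 (* 8 i)) n)))))

;; Bytewise (lexicographic) "less than" on two octet vectors. A vector
;; that is a proper prefix of a longer one sorts first.
(defun octets-lessp (a b)
  (let ((m (mismatch a b)))
    (cond ((null m) nil)             ; identical vectors
          ((= m (length a)) t)       ; A is a proper prefix of B
          ((= m (length b)) nil)     ; B is a proper prefix of A
          (t (< (aref a m) (aref b m))))))

(fixed-width-octets 625 4)                      ; => #(0 0 2 113)

;; With a fixed width, byte order matches numeric order: 255 < 256.
(octets-lessp (fixed-width-octets 255 4)
              (fixed-width-octets 256 4))       ; => t

;; With minimal widths it does not: #(1 0) sorts before #(255).
(octets-lessp (fixed-width-octets 256 2)
              (fixed-width-octets 255 1))       ; => t
```

This is one reason to encode integer keys with a fixed number of bytes when an in-order scan is expected to follow numeric order.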

Btree Example

Allegro CL also supports Btree cursors. A cursor is an object which points to a Btree entry, and which can be moved (using the functionality supplied) to point to other entries.

This simple example is taken from the documentation. It is intended to show how Btrees are created and manipulated. Cursors are also illustrated. Note you cannot run this example unless you have loaded AllegroCache into Allegro CL 8.0. We do a (use-package :db.btree) so we do not have to package-qualify symbols.

cl-user(5): (require :acache)
;; t
cl-user(6): (use-package :db.btree)
t

We define several auxiliary encoding functions which we will use in this example. (These are discussed in the Encoding section of the Btree documentation.)

;; Encode a Lisp string as a UTF-8 octet vector (no trailing null byte).
(defun enc-string (str)
  (string-to-octets str :external-format :utf-8 :null-terminate nil))

;; Encode a Lisp value by way of its printed representation.
(defun enc-value (value)
  (enc-string (write-to-string value)))

;; Inverse of enc-value: read the value back from its printed form.
(defun dec-value (encoded)
  (read-from-string (octets-to-string encoded :external-format :utf-8)))

;; Encode a non-negative integer big-endian, zero-padded to at least
;; BYTES octets. If ARR is supplied and long enough, fill and return it.
(defun enc-pos-int (value bytes &optional arr)
  (let ((res nil))
    (loop
      (if* (zerop value)
         then (return)
         else (push (logand #xff value) res)
              (setq value (ash value -8))))
    (if* (> bytes (length res))
       then (dotimes (i (- bytes (length res)))
              (push 0 res)))
    (if* (and arr (>= (length arr) (length res)))
       then (let ((i -1))
              (dolist (val res)
                (setf (aref arr (incf i)) val))
              arr)
       else (make-array (length res)
                        :element-type '(unsigned-byte 8)
                        :initial-contents res))))

;; Decode the big-endian integer stored in VAL between START and END.
(defun dec-pos-int (val &optional (start 0) (end (length val)))
  (let ((res 0))
    (dotimes (i (- end start))
      (setq res (+ (aref val (+ start i)) (ash res 8))))
    res))

Here we create a btree that maps each integer from 0 to 999 to its square:

cl-user(7): (setq bt (create-btree "foo.bt"))
#<db.btree::btree [1] foo.bt @ #x1001257ff2>
cl-user(8): (dotimes (i 1000)
              (setf (get-btree bt (enc-pos-int i 4))
                    (enc-pos-int (* i i) 5)))
nil

Now we use the btree to find the square of 25:

cl-user(9): (get-btree bt (enc-pos-int 25 4))
#(0 0 0 2 113)
cl-user(10): (dec-pos-int *)
625

A cursor is an object that can move through a btree allowing you to retrieve keys and values and to delete keys and values.

When you create a cursor you specify the btree that it will scan.

cl-user(12): (setq cur (create-cursor bt))
#<db.btree::cursor @ #x71c5d62a>

When a cursor is first created it doesn't point anywhere in the btree. Thus the operations on the cursor just return nil.

cl-user(13): (cursor-get cur)
nil
cl-user(14): (cursor-next cur)
nil

You can specify where a cursor should point in a number of ways. Here we tell the cursor to point at the first element in the btree (and since keys are sorted, this will be a pointer to the lowest key in the key sorting order).

cl-user(15): (position-cursor cur nil :kind :first)
nil

We can retrieve the key and value at the cursor with cursor-get. It returns the key and value as two values:

cl-user(16): (cursor-get cur)
#(0 0 0 0)
#(0 0 0 0 0)

We can tell even without decoding these (unsigned-byte 8) arrays that the key and value are both 0.

To advance the cursor to the next value and retrieve it you use cursor-next:

cl-user(17): (cursor-next cur)
#(0 0 0 1)
#(0 0 0 0 1)
cl-user(18): (cursor-next cur)
#(0 0 0 2)
#(0 0 0 0 4)
cl-user(19): (cursor-next cur)
#(0 0 0 3)
#(0 0 0 0 9)
cl-user(20): (cursor-next cur)
#(0 0 0 4)
#(0 0 0 0 16)

You'll note that after positioning the cursor we used cursor-get to retrieve the first value and cursor-next to retrieve subsequent ones. If you're writing a loop to retrieve values, it's undesirable to have to call one function the first time through the loop and another function on subsequent iterations. Cursors can therefore be primed: a primed cursor is in a special state in which the next call to cursor-next retrieves the value at the current position without first moving the cursor. That call to cursor-next also un-primes the cursor.

Here we once again position the cursor at the first item but this time we prime it as well. Then we can just use cursor-next to retrieve all the values:

cl-user(21): (position-cursor cur nil :kind :first :prime t)
nil
cl-user(22): (cursor-next cur)
#(0 0 0 0)
#(0 0 0 0 0)
cl-user(23): (cursor-next cur)
#(0 0 0 1)
#(0 0 0 0 1)
cl-user(24): (cursor-next cur)
#(0 0 0 2)
#(0 0 0 0 4)
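Priming is what makes a uniform retrieval loop possible. The following is a minimal sketch of such a loop, assuming the db.btree symbols and the dec-pos-int helper from this example are loaded, and assuming that cursor-next returns nil once the cursor moves past the last entry (as cursor-get does elsewhere in this session). The function name collect-entries is hypothetical.

```lisp
;; Collect every entry of BT as a (key . value) pair of decoded
;; integers, in key order, using a primed cursor so that the loop
;; body only ever calls cursor-next.
(defun collect-entries (bt)
  (let ((cur (create-cursor bt))
        (result '()))
    (position-cursor cur nil :kind :first :prime t)
    (loop
      (multiple-value-bind (key value) (cursor-next cur)
        ;; Assumption: a nil key means there are no more entries.
        (unless key (return))
        (push (cons (dec-pos-int key) (dec-pos-int value)) result)))
    ;; Release the cursor's hold on the btree when done.
    (unbind-cursor cur)
    (nreverse result)))
```

Without priming, the same loop would need a separate cursor-get call before the first cursor-next.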

You can position the cursor at the last item in the tree and then scan backwards with cursor-previous.

cl-user(25): (position-cursor cur nil :kind :last :prime t)
nil
cl-user(26): (cursor-previous cur)
#(0 0 3 231)
#(0 0 15 58 113)
cl-user(27): (dec-pos-int *)
999
cl-user(28): (cursor-previous cur)
#(0 0 3 230)
#(0 0 15 50 164)
cl-user(29): (dec-pos-int *)
998

You can position the cursor at a particular key. Note that in this case position-cursor returns t indicating that it found the key in the table:

cl-user(36): (position-cursor cur (enc-pos-int 385 4))
t
cl-user(37): (cursor-get cur)
#(0 0 1 129)
#(0 0 2 67 1)
cl-user(38): (dec-pos-int *)
385

If you specify a key not in the btree then position-cursor returns nil and the cursor is set on the next item after the location where the key would have been found. Here we choose a key value 2000 which is bigger than the biggest key in the table: 999. Thus position-cursor returns nil and sets the cursor after the end of the table. cursor-previous brings the cursor to the last key in the table: 999.

cl-user(39): (position-cursor cur (enc-pos-int 2000 4))
nil
cl-user(40): (cursor-get cur)
nil
cl-user(41): (cursor-previous cur)
#(0 0 3 231)
#(0 0 15 58 113)
cl-user(13): (dec-pos-int *)
999
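Because position-cursor leaves the cursor on the next entry when the requested key is absent, it can serve as the starting point of a range scan. The sketch below relies only on the behavior described above, and assumes the 4-byte key encoding and the enc-pos-int and dec-pos-int helpers from this example. The function name values-in-range is hypothetical.

```lisp
;; Return the decoded values for all keys K with LO <= K <= HI.
(defun values-in-range (bt lo hi)
  (let ((cur (create-cursor bt))
        (result '()))
    ;; Whether or not LO itself is present, the cursor ends up on the
    ;; first entry whose key is >= LO, or past the end of the btree.
    (position-cursor cur (enc-pos-int lo 4))
    (multiple-value-bind (key value) (cursor-get cur)
      ;; A nil key means the cursor is past the last entry.
      (loop while (and key (<= (dec-pos-int key) hi))
            do (push (dec-pos-int value) result)
               (multiple-value-setq (key value) (cursor-next cur))))
    (unbind-cursor cur)
    (nreverse result)))
```

The return value of position-cursor is deliberately ignored here, since the scan proceeds the same way whether or not the lower bound is itself a key in the btree.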

You can position the cursor at a particular key and value as well and position-cursor will return t if it found the pair and nil if it did not:

cl-user(41): (position-cursor cur (enc-pos-int 385 4)
                              :value (enc-pos-int (* 385 385) 5))
t
cl-user(42): (position-cursor cur (enc-pos-int 385 4)
                              :value (enc-pos-int (* 200 200) 5))
nil

You use a cursor to specify which values to delete from the btree. In the example below we position the cursor at key 5. We delete that key, at which point the cursor moves to the next key, 6. We then move the cursor back one key and end up at key 4, since key 5 was deleted.

cl-user(47): (position-cursor cur (enc-pos-int 5 4))
t
cl-user(48): (cursor-get cur)
#(0 0 0 5)
#(0 0 0 0 25)
cl-user(49): (cursor-delete cur)
t
cl-user(50): (cursor-get cur)
#(0 0 0 6)
#(0 0 0 0 36)
cl-user(51): (cursor-previous cur)
#(0 0 0 4)
#(0 0 0 0 16)

When you're finished using a cursor for a while it's best to unbind it. This disassociates it from any block in the btree and this allows the cursor to be garbage collected should no references exist to the cursor from the heap.

cl-user(52): (unbind-cursor cur)
#<db.btree::cursor @ #x71b06c82>

So far all the btree operations have been done on the copy of the btree in memory. In order to write the btree to the disk you must call sync-btree:

cl-user(53): (sync-btree bt)
nil

When you're finished with the btree you should close it. The close-btree function will call sync-btree to ensure that the btree is completely written to the disk before the btree file is closed.

cl-user(54): (close-btree bt)
#<db.btree::btree [1] foo.bt @ #x1001139632>
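In a program, as opposed to an interactive session, one way to make sure the btree is always closed is to wrap the work in unwind-protect. The following is only a sketch: call-with-new-btree is a hypothetical helper, the file name "squares.bt" is illustrative, and only create-btree and close-btree as shown in this note are used.

```lisp
;; Create a btree in FILENAME, call FN with it, and close the btree
;; afterwards even if FN signals an error or exits non-locally.
(defun call-with-new-btree (filename fn)
  (let ((bt (create-btree filename)))
    (unwind-protect
         (funcall fn bt)
      ;; close-btree syncs the btree to disk before closing the file.
      (close-btree bt))))

;; Example use, with the encoders defined earlier in this note:
;; (call-with-new-btree "squares.bt"
;;   (lambda (bt)
;;     (setf (get-btree bt (enc-pos-int 7 4)) (enc-pos-int 49 5))))
```

Since close-btree calls sync-btree itself, no separate sync is needed on this path.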