

Lightweight Database Strategies for Perl

Several years ago I got what I thought was a great idea for a three-hour conference tutorial: lightweight data storage techniques. When you don't have enough data to be bothered using a high-performance database, or when your data is simple enough that you don't want to bother with a relational database, you stick it in a flat file and hack up some file code to read it.

This is the sort of thing that people do all the time in Perl, and I thought it would be a big seller. I was wrong. I don't know why. I tried giving the class a snappier title, but that didn't help. I'm really bad at titles.

Maybe people are embarrassed to think about all the lightweight data storage hackery they do in Perl, and feel that they "should" be using a relational database, and don't want to commit more resources to lightweight database techniques. Or maybe they just don't think there is very much to know about it.

But there is a lot to know; with a little bit of technique you can postpone the day when you need to go to an RDB, often for quite a long time, and often forever. Many of the techniques fall into the why-didn't-I-think-of-that category, stuff that isn't too weird to write or maintain, but that you might not have thought to try.

I think it's a good class, but since it never sold well, I've decided it would do more good (for me and for everyone else) if I just gave away the materials for free.

Table of Contents

The class is in three sections. The first section is about using plain text files and talks about a bunch of useful techniques, such as how to do binary search on sorted text files (this is nontrivial) and how to replace records in-place, when they might not fit. The second section is about the Tie::File module, which associates a flat text file with a Perl array. The third section is about DBM files, with a comparison of the five major implementations. It finishes up with a discussion of some of Berkeley DB's lesser-known useful features, such as its DB_BTREE file type, which offers fast access like a hash but keeps the records in sorted order.

Text Files
  Rotating log file; deleting a user
  Copy the File
  -i.bak
  Using -i inside a program
  Problems with -i
  Atomicity issues
  Essential problem with files; fundamental operations; seeking
  Sorted files
  In-place modification of records
  Overwriting records
  Bytes vs. positions
  Gappy Files
  Fixed-length records
  Numeric indices
  Case study: lastlog
  Indexing
  Void fields
  Generic text indices
  Packed offsets
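To give a flavor of why binary search on a sorted text file is nontrivial: seeking to the middle of a file usually lands in the middle of a line, so the partial line has to be thrown away before a whole record can be read. The following is a minimal sketch of one way to handle that, not code taken from the class materials. The function names (find_record, line_at_or_after, key_of) and the colon-separated record format are assumptions made for illustration, and the file is assumed to be sorted by key in plain string order.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Key of a record: everything before the first colon (assumed format).
sub key_of {
    my ($line) = @_;
    chomp $line;
    my ($key) = split /:/, $line, 2;
    return $key // '';
}

# Return the first line that starts at byte offset >= $pos, or undef at EOF.
sub line_at_or_after {
    my ($fh, $pos) = @_;
    if ($pos == 0) {
        seek $fh, 0, 0 or die "seek: $!";
    } else {
        # Back up one byte and discard through the next newline, so that
        # we end up positioned at the start of a whole line.
        seek $fh, $pos - 1, 0 or die "seek: $!";
        my $discard = <$fh>;
    }
    my $line = <$fh>;
    return $line;
}

# Binary search over byte offsets for the record whose key is $key.
sub find_record {
    my ($fh, $key) = @_;
    my ($lo, $hi) = (0, -s $fh);
    while ($lo < $hi) {
        my $mid  = int(($lo + $hi) / 2);
        my $line = line_at_or_after($fh, $mid);
        if (!defined $line or key_of($line) ge $key) { $hi = $mid }
        else                                         { $lo = $mid + 1 }
    }
    my $line = line_at_or_after($fh, $lo);
    return undef unless defined $line;
    chomp $line;
    return key_of($line) eq $key ? $line : undef;
}

# Demonstration on a small temporary file that is already sorted by key.
my ($out, $path) = tempfile();
print {$out} "$_\n" for qw(alice:100 bob:101 carol:102 dave:103 erin:104);
close $out or die "close: $!";

open my $fh, '<', $path or die "open: $!";
for my $name (qw(carol zed)) {
    my $rec = find_record($fh, $name);
    print defined($rec) ? "$name => $rec\n" : "$name not found\n";
}
```

The search runs over byte offsets rather than line numbers, because a plain text file offers no way to jump straight to line N without reading everything before it.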

Tie::File
  Tie::File Examples
  delete_user revisited
  uppercase_username revisited
  Rotating log file revisited
  Most important thing to know about Tie::File
  Indexing with Tie::File
  Tie::File Internals
  Caching
  Record modification
  Immediate vs. Deferred Writing
  Autodeferring
  Miscellaneous Features
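For readers who have not used it, here is a minimal sketch of what "associating a flat text file with a Perl array" looks like with Tie::File. It is an illustration only, not the class's own delete_user or uppercase_username code; the file contents and user names are made up.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small file of usernames, one per line (made-up data).
my ($out, $path) = tempfile();
print {$out} "alice\nbob\ncarol\n";
close $out or die "close: $!";

# Each element of @lines is now one line of the file, without its newline.
tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

# Uppercase one username; the change is written to the file.
$lines[1] = uc $lines[1];

# Delete a user by removing the matching record.
for my $i (reverse 0 .. $#lines) {
    splice @lines, $i, 1 if $lines[$i] eq 'carol';
}

# Append a new record to the end of the file.
push @lines, 'dave';

untie @lines;

# Read the file back with ordinary I/O to see the result.
open my $in, '<', $path or die "open: $!";
my $contents = do { local $/; <$in> };
close $in;
print $contents;
```

The file is modified through ordinary array operations, and the program never has to read the whole file into memory or write a temporary copy itself.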

DBM
  Common DBM Implementations
  What DBM Does
  Small DBMs: ODBM, NDBM, and SDBM
  GDBM
  DB_File
  Indexing revisited
  Ordered hashes
  Partial matching
  Sequential access
  Multiple values
  Filters
  BerkeleyDB
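As an illustration of the ordered-hash and partial-matching topics, here is a minimal sketch using the DB_File module with its $DB_BTREE file type. It assumes DB_File (and the underlying Berkeley DB library) is installed; the file name and the data are made up, and this is not code from the class materials.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;                    # $DB_BTREE, R_CURSOR
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/fruit.db";     # hypothetical database file

# Tie a hash to a B-tree database; keep the object for cursor operations.
my $db = tie my %h, 'DB_File', $path, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot open $path: $!";

# Insert keys in no particular order.
$h{$_} = length $_ for qw(pear apple cherry banana);

# With the default comparison, a B-tree hands keys back in lexical order.
my @keys = keys %h;
print "@keys\n";                # apple banana cherry pear

# Partial matching: R_CURSOR positions at the smallest key >= the one given.
my ($k, $v) = ('b', '');
my $status = $db->seq($k, $v, R_CURSOR);
print "$k => $v\n" if $status == 0;

undef $db;                      # drop the extra reference before untie
untie %h;
```

Lookups by exact key work as with any tied hash, and the sorted order additionally allows "first key starting with this prefix" queries that an ordinary hash cannot answer without scanning every key.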

Online materials

Class slides:

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

PDF files: 1-up, 2-up, 4-up
Browse slides online

Sample source code referred to in the class:

Example source code from the Lightweight Databases class is licensed under a Creative Commons Public Domain License.

Browse the directory
TGZ file

People sometimes ask what use Tie::File is when Berkeley DB has a DB_RECNO option that appears to be the same thing. This document explains why.


