SQLite Keynote

D. Richard Hipp, author of SQLite presented this years keynote. A fascinating jaunt from the civilized realms into barbarian territory and back. Here are some of the notes I took—

(Incidentally I noted that Richard pronounces SQLite S-Q-L-ite, not Sequel-Lite.)

Non-Comment Source Lines in SQLite 3.3.7 tree

Tcl 54.28%

Tcl Bindings 8.55%

Core C 35.71%

Other 1.46%

Some of the major users of SQLite that he's allowed to talk about

GE

Apple

AOL

Sun (Solaris 10)

Subversion

Yum

Philips

Monotone

General Dynamics

Toshiba

Symbian

Firefox

Federal Aviation Administration

Microsoft (xbox)

Possibly the most widely distributed Tcl extension in the world.

Uses for SQLite

Replacement for client-server RDBMS Stand-in for enterprise RDBMS during testing and demos Local cache of enterprise RDBMS data Persistence of objects, configuration Complex data structures ...

Complex Data Structures

Class <-> Table

Object <-> Row

Instance Variable <-> Column

Automated undo/redo using triggers.

Application File Format

File/Open reads and parses the entire application file

Error prone

Lots of code to maintain

Application crash causes data lost

File/Save requires the user to remember to save, possible to overwrite of independent changes, corruption if file generator and parser do not match

What if ...

No need to read and parse file content

updates are atomic, consistent, isolated and durable

automatic concurrency control

Changes are written to disk immediately

No data loss after unexpected power failure

The save and save as options are obsolete -- remove them

Key concept: SQLite wants to replace fopen()/[open] not Oracle/Oratcl.

Useful example code/utilities

SQLite Console for Tcl/Tk

Email Client that uses SQLite as a back-end store

Database Editor (define a new SQL function in Tcl, allowing you to write update changelog set message = edit(message) where change=2782; and be able to edit the message in a text widget that'll pop up instead of copying and pasting or re-typing manually)

Crossing into barbarian lands now...

Q: What is Full Text Search? A: In brief: what Google does. Q: Why is full text search important? A: Internet search enginges have users spoiled. Modern applications need to support full text search in order to be competitive. Q: Is it difficult to implement? A: It is tricky for large data sets to get it right.

Basic approach

tokenize into words case folding stemming (convert each word into its root form; porter stemmer very popular stemmer for English; recommended -> recommend; books -> book) remove stop words for each word left create posting list per word

OR queries do an intersection. AND queries do a union.

Phrase Queries: naive method: do an AND query, then examine every document in the result set in a second pass and eliminate those that lack the phrase. Very bad performance in pathalogical cases. Instead update posting list to store document:position (multiple occurances included multiple times).

Basic Operations

Insert a new document: Break the document into words; Append the document ID to the posting list for each word

Perform a query: Lookup the posting list for each word in the query; Combine posting lists

Keeping the Working Set Small

Limit size of the lexicon (stemming, stop words)

Compress the posting list aggressively

Spill stationary parts of posting list into a separate table that is not cached

Full Text Search (version 1) built into SQLite 3.3.8 released Monday

As of SQLite 3.3.8 (released Monday!) full text search support in SQLite. Ricahrd's authorized to announce help from engineers at Google. Later question elicited that roughly half of the FTS code was written by him & Dan, the other half by four engineers [didn't have time to write down their names] from Google. He isn't able/can't comment on their motivations/plans/internal usage. (Obviouslyy won't be replacing their search engine with SQLite.)

Some example usage

CREATE VIRTUAL TABLE email USING fts1(content);

CREATE VIRTUAL TABLE email USING fts1("from" TEXT, "to" TEXT, subject TEXT, body TEXT, tokenize porter);

SELECT rowid, subject FROM email WHERE email MATCH 'wyrick sqlite';

SELECT rowid, subject FROM email WHERE email MATCH $::querytext;

MATCH: when left operand is the name of the table match against any column can specify a particular column $::querytext can specify columns i.e., "from:wyrick"

Built in snippet generator: SELECR rowid, snippet(email) FROM email WHERE email MATCH ...

FTS1 comes standard with Tcl bindings for SQLite (>= 3.3.8).

FTS1 permanent-beta

FTS2 will ship before Christmas

Potential uses:

Search for private websites

Email clients

Online documentation search

Searchable history in web browsers (like a Google search but instead of searching the whole corpus of the web search only those sites I've browsed to in the last x months)

months) Chatroom archive search

Search version diffs in a configuration management system

Text editors and IDEs

Pervasive Full-Text Search (users want it; made easy using Tcl and SQLite).

—Michael A. Cleverly

