Fulltext search in SQLite and Django app

Since version about 3.5.9 SQLite contains full-text search module called FTS3 (older releases may have FTS1, FTS2). Using this module we can easily add fast full text search to a Django application (or any other using SQLite). During my tests I got FTS3 only on Python 2.6. On Python 2.5 and using pysqlite may be hard to get FTS3 (try recompiling with -DSQLITE_ENABLE_FTS3=1 flag).

Here is an example table:

CREATE VIRTUAL TABLE my_search using FTS3(slug, body);

FTS1 and FTS2 can be used exactly in the same way as FTS3. More on sqlite.org.

We can start with importing data from existing table. With this script (executed from Django project folder) we can import the data:

# -*- coding: utf-8 -*- import sys import urllib2 from os import environ environ [ 'DJANGO_SETTINGS_MODULE' ] = 'settings' from settings import * from django.contrib.sessions.models import * from django.db import connection , transaction from MYAPP.models import * cursor = connection . cursor () j = MY_SOME_MODEL . objects . all () iterr = 1 for i in j : print iterr txt = i . some_txt + ' ' + i . more_txt + ' ' + i . city_of_something # txt should be stripped from HTML, stop words etc. to get smaller size of the database cursor . execute ( "INSERT INTO my_search (slug, body) VALUES ( %s , %s )" , ( i . slug , txt )) iterr += 1 transaction . commit_unless_managed ()

Indexing new entries we can handle in Django with signals. For example in models.py add:

from django.db.models import signals #.... def update_index ( sender , instance , created , ** kwargs ): cursor = connection . cursor () txt = instance . some_txt + ' ' + instance . more_txt + ' ' + instance . city_of_something # txt should be stripped from HTML, stop words etc. to get smaller size of the database txt = clean_to_search ( txt ) if created : # add if object is created, not updated cursor . execute ( "INSERT INTO my_search (slug, body) VALUES ( %s , %s )" , ( instance . slug , txt )) transaction . commit_unless_managed () signals . post_save . connect ( update_index , sender = MY_SOME_MODEL )

The query looks like this:

SELECT slug FROM my_search WHERE body MATCH 'search term';

cursor = connection . cursor () cursor . execute ( "SELECT slug FROM my_search WHERE body MATCH %s " , ( query ,)) results = cursor . fetchall ()

I'm testing this search solution on my JobMaster - job offer searcher hosted on megiteam.pl. As for now no problems occurred with this (except switch from Python 2.5 to Python 2.6). It doesn't leak memory (Nginx+FastCGI), and it doesn't seem to be slow (it index entries faster than Whoosh, but no number on it). It's simple, easy to setup, and no whoosh. solr, xapian needed, so it's cool way to add full text search to SQLite powered websites.

RkBlog