Python’s dictionary is possibly the most useful construct in the language. And I argue that for some purposes, mapping it to a file (in real-time) can be even more useful.

*** Update ***

There’s a newer and better version of FileDict, containing bugfixes and corrections, many of which are due to comments on this page.

You can read about it (with explanations) in https://erezsh.wordpress.com/2009/05/31/filedict-bug-fixes-and-updates/

Why?

The dictionary resides in memory, and so has three main “faults”:

It only lasts as long as your program does.

It occupies memory that might be useful for other, more commonly accessed, data.

It is limited to how much memory your machine has.

The first can be solved by pickling and unpickling the dictionary, but any changes made since the last pickling will not survive an unexpected shutdown (even putting the pickling in a try-finally block won’t protect it against all errors).
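To make the weakness concrete, here is a minimal sketch of the pickle-and-try-finally approach, written for Python 3. The helper names `load_dict` and `save_dict` and the file name `state.pickle` are made up for illustration.

```python
import os
import pickle
import tempfile


def load_dict(path):
    """Return the dictionary pickled at path, or an empty dict if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_dict(data, path):
    """Pickle the whole dictionary to path."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


# Hypothetical location, placed in a temp directory so the sketch is harmless.
path = os.path.join(tempfile.mkdtemp(), "state.pickle")

data = load_dict(path)
try:
    data["bla"] = 10  # at this point the change exists only in memory
finally:
    # Runs after ordinary exceptions, but not if the process is killed
    # outright (power loss, SIGKILL, os._exit), so the change would be lost.
    save_dict(data, path)

restored = load_dict(path)
```

Note that the whole dictionary is rewritten on every save, which also gets slow as the dictionary grows.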

FileDict

FileDict is a dictionary interface I wrote that saves and loads its data from a file, entry by entry, using keys. The current version uses SQLite (through sqlite3) to provide consistency and, as a by-product, ACID guarantees.

The result is a dictionary which at all times exists as a file, has virtually no size limit, and can be accessed by several processes concurrently.

It is meant as a quick-and-simple general-purpose solution. It is rarely the best solution, but it is usually good enough.

Performance obviously cannot compare to the builtin dictionary, but it is reasonable and of low complexity (refer to sqlite for more details on that).
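The general idea can be sketched in a few lines of Python 3. This is not the actual FileDict source: the class name `TinyFileDict`, the table name `items`, and the choice to match keys by their pickled bytes are all assumptions made for illustration.

```python
import os
import pickle
import sqlite3
import tempfile


class TinyFileDict:
    """Illustrative sketch of a dict stored in SQLite; not the real FileDict."""

    def __init__(self, filename):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items (key BLOB PRIMARY KEY, value BLOB)"
        )
        self._conn.commit()

    def __setitem__(self, key, value):
        hash(key)  # raises TypeError for unhashable keys, like a normal dict
        self._conn.execute(
            "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)",
            (pickle.dumps(key), pickle.dumps(value)),
        )
        self._conn.commit()  # every assignment reaches the file immediately

    def __getitem__(self, key):
        # Simplification: keys are matched by their pickled bytes.
        row = self._conn.execute(
            "SELECT value FROM items WHERE key = ?", (pickle.dumps(key),)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


path = os.path.join(tempfile.mkdtemp(), "example.dict")
d = TinyFileDict(path)
d["bla"] = 10
d[(2, 1)] = ["hello", (1, 2)]

d2 = TinyFileDict(path)  # a second handle on the same file sees the same data
```

Because each assignment is its own committed transaction, the file is never left half-written, at the cost of one commit per change.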

Uses

FileDict can be used for many purposes, including:

Saving important data in a convenient manner

Managing large amounts of data in dictionary form, without the mess of implementing paging or other complex solutions

Communication between processes (sqlite supports multiple connections and implements ACID)
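As a rough illustration of the last point, the sketch below uses plain sqlite3 rather than FileDict itself: a child process writes an entry to a shared file and the parent reads it back. The file name `shared.db`, the table name `items`, and the `status` key are invented for the example.

```python
import os
import pickle
import sqlite3
import subprocess
import sys
import tempfile

# Hypothetical shared file, placed in a temp directory.
path = os.path.join(tempfile.mkdtemp(), "shared.db")

# The parent creates the table through its own connection.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE IF NOT EXISTS items (key BLOB PRIMARY KEY, value BLOB)")
conn.commit()

# The child is a separate Python process with its own connection.
child_code = """
import pickle, sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?)",
             (pickle.dumps("status"), pickle.dumps({"done": True})))
conn.commit()
conn.close()
"""
subprocess.run([sys.executable, "-c", child_code, path], check=True)

# Once the child has committed, the parent can see the new entry.
row = conn.execute(
    "SELECT value FROM items WHERE key = ?", (pickle.dumps("status"),)
).fetchone()
message = pickle.loads(row[0])
```

Here the parent waits for the child to finish before reading. Truly concurrent writers would additionally have to cope with SQLite's file locking, for example by retrying when the database is busy.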

Examples

$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> d['bla'] = 10
>>> d[(2,1)] = ['hello', (1,2) ]
-- exit --
$ python
>>> import filedict
>>> d=filedict.FileDict(filename="example.dict")
>>> print d['bla']
10
>>> print d.items()
[['bla', 10], [(2, 1), ['hello', (1, 2)]]]
>>> print dict(d)
{'bla': 10, (2, 1): ['hello', (1, 2)]}

>>> d=filedict.FileDict(filename="try.dict")
>>> with d.batch: # using .batch suspends commits, making a batch of changes quicker
...     for i in range(100000):
...         d[i] = i**2
...
(takes about 8 seconds on my comp)
>>> print len(d)
100000
>>> del d[103]
>>> print len(d)
99999
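For readers curious how something like `.batch` could work, here is one possible sketch in Python 3. It is a guess at the mechanism, not FileDict's actual code: the class `BatchingStore`, its `squares` table, and the `commits` counter exist only for this illustration.

```python
import contextlib
import sqlite3


class BatchingStore:
    """Hypothetical sketch of commit suspension; not FileDict's actual code."""

    def __init__(self, filename=":memory:"):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS squares (n INTEGER PRIMARY KEY, sq INTEGER)"
        )
        self._conn.commit()
        self._in_batch = False
        self.commits = 0  # counted only to make the effect visible

    def _commit(self):
        # Skip the commit while a batch is active.
        if not self._in_batch:
            self._conn.commit()
            self.commits += 1

    def __setitem__(self, n, sq):
        self._conn.execute("INSERT OR REPLACE INTO squares VALUES (?, ?)", (n, sq))
        self._commit()

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM squares").fetchone()[0]

    @property
    @contextlib.contextmanager
    def batch(self):
        """Context manager, used as `with store.batch:`, that defers commits."""
        self._in_batch = True
        try:
            yield self
        finally:
            self._in_batch = False
            self._commit()  # a single commit covers the whole batch


store = BatchingStore()
with store.batch:
    for i in range(1000):
        store[i] = i ** 2
```

The speed-up comes from paying the cost of a commit once instead of once per assignment. In this simplified version, an exception inside the block still commits whatever was written so far; a more careful version might roll back instead.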

Limitations

All data (keys and values) must be pickle-able

Keys must be hashable (perhaps this restriction could be lifted by hashing the pickled key instead)
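Hashing the pickled key might look something like the sketch below (Python 3). The function name `key_digest` and the choice of SHA-1 are assumptions, and nothing here is part of FileDict.

```python
import hashlib
import pickle


def key_digest(key):
    """Return a hex digest that identifies key by its pickled bytes."""
    return hashlib.sha1(pickle.dumps(key, protocol=2)).hexdigest()


unhashable_key = [1, 2, 3]                   # a list cannot be a normal dict key
digest_a = key_digest(unhashable_key)
digest_b = key_digest(list(unhashable_key))  # an equal copy of the same list

# Caveat: equal objects do not always pickle to the same bytes.
int_digest = key_digest(1)
float_digest = key_digest(1.0)               # 1 == 1.0, yet the pickles differ
```

The caveat matters: a builtin dictionary treats `1` and `1.0` as the same key, while a digest of the pickled form treats them as different, so this would change lookup semantics slightly.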

Keys and values are stored as a copy, so changing them after assignment will not update the dictionary.
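The last limitation follows directly from storing pickled copies. The sketch below uses pickle alone (no FileDict) to show the effect.

```python
import pickle

values = ["hello"]
stored = pickle.dumps(values)  # what a pickle-backed store would write to the file

values.append("world")         # mutate the original after the "assignment"
loaded = pickle.loads(stored)  # what a later lookup would return
```

Here `loaded` is still `['hello']`. With a pickle-backed dictionary, the way to record such a change is presumably to assign the modified value to the key again.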

Source Code

It is available here.

Future

Additions in the future may include:

An LRU-cache for fetching entries

A storage strategy other than SQLite
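As an illustration of the first idea, an LRU cache placed in front of a slow lookup could be sketched like this (Python 3). The class `LRUCache` and the `slow_fetch` function are hypothetical stand-ins; a real version would wrap the database query.

```python
from collections import OrderedDict


class LRUCache:
    """Keep the most recently used entries in memory, up to maxsize."""

    def __init__(self, fetch, maxsize=128):
        self._fetch = fetch            # called on a cache miss
        self._maxsize = maxsize
        self._entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict the least recently used
        return value


backing = {i: i ** 2 for i in range(10)}  # stands in for the file
calls = []


def slow_fetch(key):
    calls.append(key)  # record every trip to the backing store
    return backing[key]


cache = LRUCache(slow_fetch, maxsize=2)
results = [cache.get(k) for k in (1, 2, 1, 3, 2)]
```

With `maxsize=2`, the third lookup (key 1) is served from memory, while key 2 has been evicted by the time it is requested again. A cache like this would also need invalidating when another process changes the file, which this sketch ignores.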

Other suggestions?
