Project description

DSE - Simplified "bulk" insert/update for Django

================================================



Version : 1.0.0

Author : Thomas Weholt <thomas@weholt.org>

License : GPL v3.0

Status : Beta

Url : https://bitbucket.org/weholt/dse

Docs at http://readthedocs.org/docs/dse/en/latest/index.html



==Background==



* DSE is available for one reason - to insert/update lots of data as fast as possible.



* DSE vs Django ORM: typical speed gain is around 10X.



* DSE is aware of default values specified in your Django models and will use

those if no value is given for a field in an insert statement.



* DSE caches SQL-statements, both inserts and updates, and executes them when

a specified number of statements has been prepared or when it`s told manually to flush

cached statements to the database. The actual sql execution is done using

DB API cursor.executemany and this is much faster than executing SQL-statements

in sequence and way faster than using the Django ORM.



* DSE uses a dictionary to specify what fields to update. It differs between

update and insert statements by looking for a key similar to the primary key for

a given model.



* DSE will try to execute as many statements as possible, but will detect update

statements where only a few sql fields/columns are updated and execute those

statements individually.



* It has been designed to be used outside Django as well, but the main focus

is good Django integration.



* My hope is to see something like this in the django core.



==Plans for the future==



* Refactoring of the SQL-builder code.

* Profiling and performance tuning.

* More focus on thread-safety and if needed locking.



==Installation==



pip install dse



or



hg clone https://bitbucket.org/weholt/dse



==Example usage==



You got a model like:



gender = (('M', 'Male'), ('F', 'Female'))



class Person(models.Model):

name = models.CharField(max_length = 30)

age = models.IntegerField(default = 30)

sex = models.CharField(max_length = 1, choices = gender, default = 'M')



Using dse:



import dse

dse.patch_models() # Monkey patch all your models and expose dse for all models:



with Person.dse as d:

for name, age, sex in (('Thomas', 36, 'M'), ('Joe', 40, 'M'), ('Jane', 28, 'F')):

d.add_item(dict(name = name, age = age, sex = sex))



Nothing will be inserted into the database before the loop is done ( or you

insert 1000 items ). Then the items will be inserted using cursor.executemany,

using plain SQL - no ORM in sight.



DSE features singletons (NB! very experimental, no locking/thread support yet!):



import dse.singleton



p1 = dse.singleton.Models.Person()

p2 = dse.singleton.Models.Person()

print p1 is p2 # should print True

p1.add_item(dict(name = 'Joe'))

p2.flush()

print Person.objects.all().count() # should print 1



Singletons makes it possible to cache entries across sections of code and cache

even more data, hitting the db less.



DSE using default values defined in your model:



with Person.dse as d:

# Adding an item, just defining a name and using the default values from the model:

d.add_item({'name': 'John'})



# Overriding the default values? Just specify a valid value

d.add_item({'name': 'Thomas', 'age': 36, 'sex': 'M'})



# Update record with id = 1 and set its name to John. This will trigger

# a SQL-statement for this update alone, since not all columns are specified:

d.add_item({'id': 1, 'name': 'John'})



To use delayed execution of SQL statements:



Person.dse.add_item({'id': 2, 'name': 'Al Capone'}) # will NOT trigger anything

Person.dse.add_item({'id': 3, 'name': 'John Dillinger'}) # will NOT trigger anything

Person.dse.add_item({'name': 'Scarface'}) # will NOT trigger anything

Person.dse.flush() # will execute both update statements and insert a record for "Scarface"



Say you want to update all records with some calculated value, something you

couldn`t find a way to do in SQL. Using dse this is easy and fast:



with Person.dse as d:

# Use Djangos ORM to generate dictionaries to use in DSE; objects.all().values().

for item in Person.objects.all().values():

item['somevar'] = calculated_value

d.add_item(item)



I`ve recieved some questions about transaction handling. Below is an simple example,

but I`m looking into other ways of handling transactions as well:



from django.db import transaction

import dse



@transaction.commit_manually

def some_method():

with SomeModel.dse as d:

for item in somelist:

SomeModel.dse.add_item(

{'some_column': item.some_value,

'another_column': item.another_value})

transaction.commit()



==Release notes==



1.0.0 : Version bump. Added unittest for issue #8.



1.0.0-RC1 : updated README.txt.



0.9.4 : - PEP8 and pyflake.



0.9.3 : - Fixed issue #7: dse causes django-debug-toolbar to crash. Thanks to ringemup for pointing that out. Added some docstrings.



0.9.2 : - Corrected type in usage.rst and README.txt.



0.9.1 : - Refactored code even more, added usage.rst, singleton support in the singleton-package and some performance tests. Models not monkey patched be default anymore, must call dse.patch_models().



0.9.0 : - Refactored code and cleaned up tests folder. Focus on getting singleton support in before 1.0.0. And more tests.



0.8.2 : - added 'pysqlite2' to _DBMAP. Thanks to David Marble for 0.8.1 and 0.8.2.



0.8.1 : - attempt to fix quoting problems with fields on postgresql.



0.8.0 : - fixed crash when more than one database connection has been configured. No ModelFactory will be triggered.



0.7.0 : - don`t remember.



0.6.0 : - added support for the with-statement.

- added an ModelDelayedExecutor-instance to each model, so you can do Model.dse.add_item

instead of dse.ModelFactory.Model.add_item.

- renamed dse.modelfactory to dse.ModelFactory to be more style-compliant.



0.5.1 : just some notes on transaction handling.



0.5.0 :

- added modelfactory. Upon first import a modelfactory will be created in the DSE module. It`s basically just a

helper-class containing ModelDelayedExecutor-instances for all models in all apps found in INSTALLED_APPS in

settings.py.

- to change the default item limit before automatic execution of cached SQL statements to 10000 instead of the default 1000::



import dse

dse.ITEM_LIMIT = 10000



0.4.0 :

- fixed serious bug when using mass updates. Using cursor.executemany is only possible when values

for all columns are specified. If only values for a subset of the columns is specified that will be

executed as a seperate SQL-call. NOTE! Using dex.get_items() or Djangos Model.objects.values() will give you

all the fields.

- code clean-up.

- added custom exceptions; UpdateManyException, UpdateOneException and InsertManyException.