I always used Django at work and for my personal projects. But on the recent project at work I had an opportunity to use Flask with SQLAlchemy. So I had to learn something new.

Flask is easy peasy because it’s a microframework and it doesn’t have a lot inside. However, understanding SQLAlchemy and how to work with it was more difficult than I expected.

In this article I tried to show the main differences between Django ORM and SQLAlchemy with examples of how to do something in Django and how to do the same in SQLAlchemy. If you’re trying to switch to SQLAlchemy for whatever reasons, I hope it will be useful for you.

Before we dive into examples and differences between Django ORM and SQLAlchemy, let’s start from understanding transactions because it’s a very important concept if you want to write anything using SQLAlchemy.

Transactions

Think of transactions as a way to ensure that multiple database operations (insert, delete, …) succeed or fail as a group.

When we start a transaction, we record the current state of our database and then execute SQL statements. If all of these statements succeed, we commit the transaction. After commit, all changes will be persisted to the database and visible for other transactions.

However, if one or more of those statements fail, we catch the exception and roll back any statements that succeed.

Transactions in Django and SQLAlchemy

The way how we handle transactions in Django and SQLAlchemy is different.

In Django we rarely think about transactions at all. Because Django’s default behavior is to run in autocommit mode which means that each SQL statement is wrapped into its own transaction and this transaction will be automatically committed or rolled back depending on whether the SQL statement succeeds or fails.

>>> u = User.objects.create(username='apirobot', email='apirobot@example.com') INSERT INTO "auth_user" ("username", "email") VALUES ('apirobot', 'apirobot@example.com') >>> Article.objects.create( ... title='Distributed systems with RabbitMQ', ... body='Distributed systems', ... author=u ... ) INSERT INTO "blog_article" ("title", "body", "author_id") VALUES ('Distributed systems with RabbitMQ', 'Distributed systems', 1) >>> Article.objects.create( ... title='Tale about Redis', ... body='Tale about Redis', ... author=u ... ) INSERT INTO "blog_article" ("title", "body", "author_id") VALUES ('Tale about Redis', 'Tale about Redis', 1)

In SQLAlchemy though, we have the Session object. The session is the way SQLAlchemy interacts with the database. It allows you to accumulate multiple changes and then issue commit command, which will write all the changes to the database automatically as one unit. This pattern is also known as Unit of Work:

>>> u = User(username='apirobot', email='apirobot@example.com') >>> a1 = Article( ... title='Distributed systems with RabbitMQ', ... body='Distributed systems' ... ) >>> a2 = Article( ... title='Tale about Redis', ... body='Tale about Redis' ... ) >>> u.articles.append(a1) >>> u.articles.append(a2) >>> db.session.add_all([u, a1, a2]) >>> db.session.commit() 2019-05-09 09:59:16,890 INFO sqlalchemy.engine.base.Engine INSERT INTO user (username, email) VALUES (?, ?) 2019-05-09 09:59:16,890 INFO sqlalchemy.engine.base.Engine ('apirobot', 'apirobot@example.com') 2019-05-09 09:59:16,894 INFO sqlalchemy.engine.base.Engine INSERT INTO article (title, body, author_id) VALUES (?, ?, ?) 2019-05-09 09:59:16,894 INFO sqlalchemy.engine.base.Engine ('Distributed systems with RabbitMQ', 'Distributed systems', 1) 2019-05-09 09:59:16,895 INFO sqlalchemy.engine.base.Engine INSERT INTO article (title, body, author_id) VALUES (?, ?, ?) 2019-05-09 09:59:16,895 INFO sqlalchemy.engine.base.Engine ('Talcommite about Redis', 'Tale about Redis', 1) 2019-05-09 09:59:16,896 INFO sqlalchemy.engine.base.Engine COMMIT

When commit raises an exception and you want to handle it, you should roll back transaction manually so that your application will continue to function normally:

>>> u = User() >>> db.session.add(u) >>> try: ... # Will raise an exception because ... # username and email cannot be NULL ... db.session.commit() ... except: ... db.session.rollback() 2019-05-10 09:01:57,141 INFO sqlalchemy.engine.base.Engine BEGIN (implicit) 2019-05-10 09:01:57,141 INFO sqlalchemy.engine.base.Engine INSERT INTO user (username, email) VALUES (?, ?) 2019-05-10 09:01:57,141 INFO sqlalchemy.engine.base.Engine (None, None) 2019-05-10 09:01:57,142 INFO sqlalchemy.engine.base.Engine ROLLBACK

Atomicity in Django and SQLAlchemy

Now when you understand the difference between how Django and SQLAlchemy handle transactions, you should see the pros and cons of both approaches.

The advantage of running in autocommit mode is that it makes it easier to understand and write code using this ORM. The disadvantage is that if you have multiple queries and one of them succeeds and another fails, then your database is at risk of corruption:

def make_transfer(source, destination, amount): Transfer.objects.create( account=source, amount=-amount ) Transfer.objects.create( # WHAT IF IT FAILS?! account=destination, amount=amount )

The way to resolve the risk is to make queries to the database atomic. Atomicity means that the things you do within a transaction proceed or fail as a single unit. If the block of code is successfully completed, the changes are committed to the database. If there is an exception, the changes are rolled back.

That’s basically what session does in SQLAlchemy. In Django we can achieve atomicity using atomic function:

from django.db import transaction def make_transfer(source, destination, amount): with transaction.atomic(): Transfer.objects.create( account=source, amount=-amount ) Transfer.objects.create( account=destination, amount=amount )

Either both transfers will be added to the database or none of them.

Models in Django and SQLAlchemy

When you define models in Django and SQLAlchemy, the main difference that you will immediately see is that in SQLAlchemy you have to be explicit. In Django though, a lot of stuff is done under the hood.

For example, let’s take a look at how you can define models with different relationships in Django and SQLAlchemy.

Starting from Django:

# models.py from django.db import models from django.conf import settings class Chat(models.Model): participants = models.ManyToManyField( settings.AUTH_USER_MODEL, related_name='chats' ) class Message(models.Model): chat = models.ForeignKey( 'Chat', on_delete=models.CASCADE, related_name='messages' ) sender = models.ForeignKey( settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name='messages', ) is_read = models.BooleanField(default=False) text = models.TextField()

The same thing in SQLAlchemy:

# models.py from app import db chat_participation = db.Table( 'chat_participation', db.Column( 'chat_id', db.Integer, db.ForeignKey('chat.id'), primary_key=True ), db.Column( 'user_id', db.Integer, db.ForeignKey('user.id'), primary_key=True ) ) class Chat(db.Model): id = db.Column(db.Integer, primary_key=True) participants = db.relationship( 'User', secondary='chat_participation', backref='chats', ) class Message(db.Model): id = db.Column(db.Integer, primary_key=True) chat_id = db.Column( db.Integer, db.ForeignKey('chat.id', ondelete='CASCADE'), nullable=False ) sender_id = db.Column( db.Integer, db.ForeignKey('user.id', ondelete='CASCADE'), nullable=False ) is_read = db.Column(db.Boolean, default=False, nullable=False) text = db.Column(db.String, nullable=False) chat = db.relationship( 'Chat', backref=db.backref('messages', lazy='dynamic') ) sender = db.relationship( 'User', backref=db.backref('messages', lazy='dynamic') )

Let’s discuss the main differences.

Django’s models.Model class by default creates an autoincrement integer primary key. In SQLAlchemy we have to be explicit about it:

id = db.Column(db.Integer, primary_key=True)

Another difference is that in SQLAlchemy we have to model relationships like one-to-many, many-to-many and one-to-one ourselves. In Django it’s easier because it handles relationships for you.

For example, if we want create a one-to-many relationship in SQLAlchemy, we should define a column first:

class Message(db.Model): ... chat_id = db.Column( db.Integer, db.ForeignKey('chat.id', ondelete='CASCADE'), nullable=False )

And then declare relationship between two models:

class Message(db.Model): ... chat = db.relationship( 'Chat', backref=db.backref('messages', lazy='dynamic') )

Notice that when we define a foreign key, we point to the id column of the chat table. But when we define a relationship between models, we don’t point to the table, we point to the Chat model instead.

The backref argument automatically declares the reverse relationship. The lazy=’dynamic’ creates a dynamic relationship, which means that when we access chat.messages, SQLAlchemy will return a query object that we can further filter:

>>> chat.messages.filter(Message.is_read == True).all() [<Message 1>, ...]

If we don’t specify lazy argument, the default value will be ‘select’. And it works differently. When we access chat.messages the first time, SQLAlchemy will send a query to the database, fetch all of the related messages, and return a list:

>>> chat.messages [<Message 1>, ...]

Queries in Django and SQLAlchemy

When we want to filter a query in Django, we use keyword arguments in a format like column=value or column__operation=value:

# unread messages >>> Message.objects.filter(is_read=True).all() <QuerySet [<Message: Message object (1)>, <Message: Message object (2)>]> # messages that contain word hello >>> Message.objects.filter(text__contains='hello').all() <QuerySet [<Message: Message object (1)>]> # unread messages AND messages that contain word hello >>> Message.objects.filter(is_read=True, text__contains='hello').all() <QuerySet [<Message: Message object (1)>]>

In SQLAlchemy we use model expressions for filtering:

# unread messages >>> Message.query.filter(Message.is_read == False).all() [<Message 1>, <Message 2>] # messages that contain word hello >>> Message.query.filter(Message.text.contains('hello')).all() [<Message 1>] # unread messages AND messages that contain word hello >>> Message.query.filter(Message.is_read == False, Message.text.contains('hello')).all() [<Message 1>]

When it comes to joins, in Django it’s easier because it handles joins for us:

# chats that have unread messages >>> Chat.objects.filter(messages__is_read=False) <ChatQuerySet [<Chat: Chat object (1)>]>

In SQLAlchemy we have to be explicit about it:

# chats that have unread messages >>> Chat.query.join(Message).filter(Message.is_read == False).all() [<Chat 1>]

SQLAlchemy automatically tries to find what to join on, but we can specify it explicitly as the second argument to join:

>>> Chat.query\\ ... .join(Message, Message.chat_id == Chat.id)\\ ... .filter(Message.is_read == False)\\ ... .all() [<Chat 1>]

Most of the times it works without specifying the second argument. But if you have a lot of relations between models, sometimes it can perform automatic joins not the way you want to. So be careful.

Now let’s take a look at how we can reuse queries. For example, what if we need to get dialogs (chats with 2 participants) that have unread messages. Or if we want to check if a particular chat is a dialog and whether it has unread messages or not. We don’t want to copy and paste the code all over again in different places. Remember DRY (Don’t Repeat Yourself).

In Django, what we can do is create a custom QuerySet and properties for Chat model:

class ChatQuerySet(models.QuerySet): def dialogs(self): return self\\ .annotate( participants_count=models.Count('participants') )\\ .filter(participants_count=2) def unread(self): return self.filter(messages__is_read=False) class Chat(models.Model): ... objects = ChatQuerySet.as_manager() @property def is_dialog(self): return self.participants.count() == 2 @property def is_unread(self): return self.messages.filter(is_read=False).exists()

How to use them:

# get dialogs >>> Chat.objects.dialogs() ChatQuerySet [<Chat: Chat object (1)>]> # check whether a chat is a dialog or not >>> chat = Chat.objects.get(id=1) >>> chat.is_dialog True # get chats that have unread messages >>> Chat.objects.unread() ChatQuerySet [<Chat: Chat object (2)>]> # check whether a chat has unread messages or not >>> chat = Chat.objects.get(id=2) >>> chat.is_unread True

In SQLAlchemy, we can make the code reusable with hybrid properties or methods:

from sqlalchemy.ext.hybrid import hybrid_property ... class Chat(db.Model): ... @hybrid_property def is_dialog(self): return len(self.participants) == 2 @is_dialog.expression def is_dialog(cls): dialog = cls.query\ .join(cls.participants)\ .with_entities(cls.id)\ .group_by(cls.id)\ .having(func.count(User.id) == 2)\ .subquery() return cls.id == dialog.c.id @hybrid_property def is_unread(self): return any(not message.is_read for message in self.messages) @is_unread.expression def is_unread(cls): return Message.is_read == False

Hybrid property means that the property can be used at the class level as well as at the instance level:

# get dialogs >>> Chat.query.filter(Chat.is_dialog).all() [<Chat 1>] # check whether a chat is a dialog or not >>> chat = Chat.query.get(1) >>> chat.is_dialog True # get chats that have unread messages >>> Chat.query.join(Message).filter(Chat.is_unread).all() [<Chat 2>] # check whether a chat has unread messages or not >>> chat = Chat.query.get(2) >>> chat.is_unread True

Hybrid methods also can be used at the instance and class levels. The difference is that we can pass parameters to methods if it’s necessary.

N+1 Problem

Let’s discuss a problem that you will face no matter which framework you use: Django or Flask. Imagine that you need to pull messages for each chat from the database and then show message text and information about message sender:

chats = Chat.objects.all() for chat in chats: for message in chat.messages.all(): print(message.sender.username) print(message.text)

How many queries to the database will we send in order to show all of this information? Because Django is lazy, when you write Chat.objects.all() , it only pulls information about chats into memory. It doesn’t pull information about messages, and of course, about message senders.

So, If we have 2 chats and each chat has 10 messages, then it will take 23 queries. First we pull all chats from the database (1 db request). For each chat we pull all messages (another 2 db requests). And finally, for each message we pull information about its sender (2 chats * 10 messages). 1 + 2 + 2 * 10 = 23

What we can do instead is pre-load required information into memory by using prefetch_related function:

chats = Chat.objects\\ .prefetch_related('messages')\\ .prefetch_related('messages__sender')\\ .all() for chat in chats: for message in chat.messages.all(): print(message.sender.username) print(message.text)

We will have only 3 queries in total. One for chats, one for messages and one for users (senders).

We can do a similar thing in SQLAlchemy:

from sqlalchemy.orm import jonedload ... chats = Chat.query.join(Message, Message.chat_id == Chat.id) for chat in chats: messages = chat.messages\\ .join(User, User.id == Message.sender_id)\\ .options(joinedload('sender')) for message in messages: print(message.sender.username) print(message.text)

We pre-loaded sender into memory by using joinedload. As you can see, we didn’t pre-load messages for chat. The problem is that you can’t pre-load lazy=’dynamic’ relationship, because it produces queries, not collections (lists). So, be careful when you define and use lazy=’dynamic’ relationship for your models because it can cause a lot of queries to your database. However, nothing stops you from defining two relationships to the same model:

class Chat(db.Model): ... messages = db.relationship( 'Message', backref=db.backref('chat') ) messages__dynamic = db.relationship( 'Message', lazy='dynamic' )

I would recommend you to use relationship that returns list by default, but when you really need dynamic relationship for filtering nested data, you can easily add it to your model.

Also, keep in mind that you can filter nested relationships even without using dynamic relationship. contains_eager to the rescue:

from sqlalchemy.orm import contains_eager ... chats = Chat.query\\ .join(Chat.messages)\\ .filter(Message.is_read == False)\\ .options(contains_eager(Chat.messages))\\ .all() for chat in chats: for message in chat.messages: # only unread messages ...

First, we load chats that have unread messages. Then we pre-load Chat.messages relationship. After pre-loading, this relationship produces a list of unread messages.

Conclusion

We’ve looked at the most popular Python ORMs: SQLAlchemy and Django ORM. From my experience, Django ORM is easier to learn and use, but SQLAlchemy gives you more flexibility and, maybe, it more suitable for large applications.

If you’re a beginner and you’re trying to choose what to use for your next project, Django or Flask + SQLAlchemy, I would highly recommend to stick with Django. Flask may seem simple at the beginning, but when you start building applications that solve real world problems, it becomes more challenging to use in comparison with Django. Also, Django’s ecosystem is bigger. It has more open source libraries and tutorials.

If you’re not a beginner and you’re trying to build microservices, then Flask + SQLAlchemy is probably a better choice. But if you’ve come to that point, you already should know what and when to use without me.