(tl;dr - scroll down and look at the pretty pictures !)

Object relational mappers give us a way to automate the translation of domain model concepts into SQL and relational database concepts. The SQLAlchemy ORM's level of automation is fairly high, in that it tracks changes in state along a domain model to determine the appropriate set of data to be persisted and when it should occur, synchronizes changes in state from the database back to the domain model along transaction boundaries, and handles the persistence and in-memory restoration of entity relationships and collections, represented as inter-table and inter-row dependencies on the database side. It also can interpret domain-specific concepts into SQL queries that take great advantage of the relational nature of the backend (which in non-hand-wavy speak generally means, SQLAlchemy is fairly sophisticated in the realm of generating joins, subqueries, and combinations thereof).

These are complex, time consuming tasks, especially in an interpreted language like Python. But we use them for the advantage that we can work with fully realized relational models, mapped to domain models that match our way of looking at the problem, with little to no persistence-specific code required outside of configuration. We can optimize the behavior of persistence and loading with little to no changes needed to business or application logic. We save potentially dozens of lines of semi-boilerplate code that would otherwise be needed within each use case to handle the details of loading data, optimizing the loads as needed, marshalling data to and from the domain model, and persisting changes in state from domain to database in the correct order. When we don't need to code persistence details in each use case, there's less code overall, as well as a lower burden of coverage, testing, and maintenance.

The functionality of the ORM is above and beyond what we gain from a so-called "data abstraction layer". The data abstraction layer provides a system of representing SQL statements, bound values, statement execution, and result sets in a way that is agnostic of database backend as well as DBAPI. A data abstraction layer may also automate supplementary tasks such as dealing with primary key generation and sequence usage, generation and introspection of database schemas, management of functions that vary across backends such as savepoints, two phase transactions, complex datatypes.

The data access layer is of course critical, as an ORM cannot exist without one. SQLAlchemy's own data abstraction system forms the basis of the object relational mapper. But the distinction between the two is so critical that we have always kept the two systems completely separate - in our current documentation the systems are known as SQLAlchemy Core and SQLAlchemy ORM.

Keeping these systems explicitly separate helps the developer be aware of a conscious choice - which is that he or she is deciding that the higher level, but decidedly more expensive automation of the ORM layer is worth it. It's always an option to drop down into direct activity with the data abstraction layer, where the work of deciding what SQL statements to create, how they are constructed, when to execute them, and what to do with their results are up to the developer, but the in-Python overhead is fixed and relatively small. Lots of developers forego the ORM entirely for either performance or preferential reasons, and some developers build their own simple object layers on top of the data abstraction layer.

In my own case, my current project involves finance and the ability to persist sets of data across two dozen tables along the order of 500K rows per dataset (where the "dataset" is a set of analytics and historical information generated daily), loading them back later to generate reports, with heavy emphasis on Decimal formatting and calculations. As my project managers suddenly started handing me test files that consisted not of the few hundred records we tested with but more along the order of 20000-30000 records (where each record in turn requires about 20 rows, hence 500K), rewriting some of the ORM code to use direct data abtraction in those places where we need to blaze through all the records seemed like it might be necessary. But each component that's rewritten using the SQL Expression Language directly would grow in size as explicit persistence logic is added, requiring more tests to ensure correct database interaction, reducing reusability, and increasing maintenance load. It would also mean large chunks of existing domain logic, consisting of business rules and calculations that are shared among many components, would have to be refactored in some way to support the receipt and mutation of simple named-tuple result rows (or something similarly rudimentary, else overhead goes right up again), as well as working as they do now associated with rich domain objects. It would most likely add a large chunk of nuts-and-bolts code to what is already a large application with lots of source files, source that is trying very hard to stick to business rules and away from tinkering.

We'd like to delay having to rewrite intricate sections of batch loading and reporting code into hand-tailored SQL operations for as long as possible. This is a new application still under development; if certain components have been running for months or years without any real changes, the burden of rewriting them as inlined SQL is less than when it is during early development, when rules are changing almost daily and the shape of the data is still a moving target.

So to buy us some time, SQLAlchemy 0.7, like all major releases, gets another boost of performance tuning. This tuning is always the same thing - use profiling to identify high-percentage areas for certain operations, remove any obvious inefficiencies within method bodies, then find ways to reduce callcounts and inline functionality into fewer calls. It's very rare these days for there to be any large bottlenecks with "obvious" solutions - we've been cutting down on overhead for years while maintaining our fairly extensive behavioral contract. The overhead comes from just having a large number of pieces that coordinate to execute operations. Each piece, in most cases, is small and gets its work done using the best Python idioms money can buy. It's the sheer number of components, combined with the plainly crushing overhead each function call in Python produces, that add up to slowness. If we do things the way a tool like Hibernate does them, that is using deep call stacks that abstract each decision into its own method, each of those decisions in turn stowing each of its sub-decisions into more methods, building a huge chain of polymorphic nesting for each column written or read, we'd grind to a halt (note that this is fine for Hibernate since Java has unbelievably fast method invocation).