The nested loops join is the most fundamental join algorithm. It works like using two nested queries: the outer or driving query to fetch the results from one table and a second query for each row from the driving query to fetch the corresponding data from the other table.

You can actually use “nested selects” to implement the nested loops algorithm on your own. Nevertheless that is a troublesome approach because network latencies occur on top of disk latencies—making the overall response time even worse. “Nested selects” are still very common because it is easy to implement them without being aware of it. Object-relational mapping (ORM) tools are particularly “helpful” in this respect…to the extent that the so-called N+1 selects problem has gained a sad notoriety in the field.

On my Own Behalf I make my living from training, other SQL related services and selling my book. Learn more at https://winand.at/.

The following examples show these “accidental nested select” joins produced with different ORM tools. The examples search for employees whose last name starts with 'WIN' and fetches all SALES for these employees.

Java The JPA example uses the CriteriaBuilder interface. CriteriaBuilder queryBuilder = em.getCriteriaBuilder(); CriteriaQuery<Employees> query = queryBuilder.createQuery(Employees.class); Root<Employees> r = query.from(Employees.class); query.where( queryBuilder.like( queryBuilder.upper(r.get(Employees_.lastName)), "WIN%" ) ); List<Employees> emp = em.createQuery(query).getResultList(); for (Employees e: emp) { // process Employee for (Sales s: e.getSales()) { // process sale for Employee } } Hibernate JPA 3.6.0 generates N+1 select queries: select employees0_.subsidiary_id as subsidiary1_0_ -- MORE COLUMNS from employees employees0_ where upper(employees0_.last_name) like ? select sales0_.subsidiary_id as subsidiary4_0_1_ -- MORE COLUMNS from sales sales0_ where sales0_.subsidiary_id=? and sales0_.employee_id=? select sales0_.subsidiary_id as subsidiary4_0_1_ -- MORE COLUMNS from sales sales0_ where sales0_.subsidiary_id=? and sales0_.employee_id=? Perl The following sample demonstrates Perl’s DBIx::Class framework: my @employees = $schema->resultset('Employees') ->search({'UPPER(last_name)' => {-like=>'WIN%'}}); foreach my $employee (@employees) { # process Employee foreach my $sale ($employee->sales) { # process Sale for Employee } } DBIx::Class 0.08192 generates N+1 select queries: SELECT me.employee_id, me.subsidiary_id , me.last_name, me.first_name, me.date_of_birth FROM employees me WHERE ( UPPER(last_name) LIKE ? ) SELECT me.sale_id, me.employee_id, me.subsidiary_id , me.sale_date, me.eur_value FROM sales me WHERE ( ( me.employee_id = ? AND me.subsidiary_id = ? ) ) SELECT me.sale_id, me.employee_id, me.subsidiary_id , me.sale_date, me.eur_value FROM sales me WHERE ( ( me.employee_id = ? AND me.subsidiary_id = ? ) ) PHP The Doctrine sample uses the query builder interface: $qb = $em->createQueryBuilder(); $qb->select('e') ->from('Employees', 'e') ->where("upper(e.last_name) like :last_name") ->setParameter('last_name', 'WIN%'); $r = $qb->getQuery()->getResult(); foreach ($r as $row) { // process Employee foreach ($row->getSales() as $sale) { // process Sale for Employee } } Doctrine 2.0.5 generates N+1 select queries: SELECT e0_.employee_id AS employee_id0 -- MORE COLUMNS FROM employees e0_ WHERE UPPER(e0_.last_name) LIKE ? SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS FROM sales t0 WHERE t0.subsidiary_id = ? AND t0.employee_id = ? SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS FROM sales t0 WHERE t0.subsidiary_id = ? AND t0.employee_id = ?

SALES

Enabling SQL Logging Enable SQL logging during development and review the generated SQL statements. DBIx::Class export DBIC_TRACE=1 in your shell. Doctrine Only on source code level—don’t forget to disable this for production. Consider building your own configurable logger. $logger = new \Doctrine\DBAL\Logging\EchoSqlLogger; $config->setSQLLogger($logger); Hibernate (native) <property name="show_sql">true</property> in App.config or hibernate.cfg.xml JPA In persistence.xml but depending on the JPA provider—e.g., for eclipselink, Hibernate and OpenJPA: <property name="eclipselink.logging.level" value="FINE"/> <property name="hibernate.show_sql" value="TRUE"/> <property name="openjpa.Log" value="SQL=TRACE"/> Most ORMs offer a programmatic way to enable SQL logging as well. That involves the risk of accidentally deploying the setting in production.

The ORMs don’t generate SQL joins—instead they query thetable with nested selects. This effect is known as the “N+1 selects problem” or shorter the “N+1 problem” because it executes N+1 selects in total if the driving query returns N rows.

Even though the “nested selects” approach is an anti-pattern, it still explains the nested loops join pretty well. The database executes the join exactly as the ORM tools above. Indexing for a nested loops join is therefore like indexing for the above shown select statements. That is a function-based index on the table EMPLOYEES and a concatenated index for the join predicates on the SALES table:

CREATE INDEX emp_up_name ON employees (UPPER(last_name))

CREATE INDEX sales_emp ON sales (subsidiary_id, employee_id)

An SQL join is still more efficient than the nested selects approach—even though it performs the same index lookups—because it avoids a lot of network communication. It is even faster if the total amount of transferred data is bigger because of the duplication of employee attributes for each sale. That is because of the two dimensions of performance: response time and throughput; in computer networks we call them latency and bandwidth. Bandwidth has only a minor impact on the response time but latencies have a huge impact. That means that the number of database round trips is more important for the response time than the amount of data transferred.

Tip Execute joins in the database.

Most ORM tools offer some way to create SQL joins. The so-called eager fetching mode is probably the most important one. It is typically configured at the property level in the entity mappings—e.g., for the employees property in the Sales class. The ORM tool will then always join the EMPLOYEES table when accessing the SALES table. Configuring eager fetching in the entity mappings only makes sense if you always need the employee details along with the sales data.

Eager fetching is counterproductive if you do not need the child records every time you access the parent object. For a telephone directory application, it does not make sense to load the SALES records when showing employee details. You might need the related sales data in other cases—but not always. A static configuration is no solution.

For optimal performance, you need to gain full control over joins. The following examples show how to get the greatest flexibility by controlling the join behavior at runtime.

Java The JPA CriteriaBuilder interface provides the Root<>.fetch() method for controlling joins. It allows you to specify when and how to join referred objects to the main query. In this example we use a left join to retrieve all employees even if some of them do not have sales. Warning JPA and Hibernate return the employees for each sale. That means that an employee with 30 sales will appear 30 times. Although it is very disturbing, it is the specified behavior (EJB 3.0 persistency, paragraph 4.4.5.3 “Fetch Joins”). You can either manually de-duplicate the parent relation e.g. using a LinkedHashSet or use the function distinct() as shown in the example. CriteriaBuilder qb = em.getCriteriaBuilder(); CriteriaQuery<Employees> q = qb.createQuery(Employees.class); Root<Employees> r = q.from(Employees.class); q.where(queryBuilder.like( queryBuilder.upper(r.get(Employees_.lastName)), "WIN%") ); r.fetch("sales", JoinType.LEFT); // needed to avoid duplication of Employee records query.distinct(true); List<Employees> emp = em.createQuery(query).getResultList(); Hibernate 3.6.0 generates the following SQL statement: select distinct employees0_.subsidiary_id as subsidiary1_0_0_ , employees0_.employee_id as employee2_0_0_ -- MORE COLUMNS , sales1_.sale_id as sale1_0__ from employees employees0_ left outer join sales sales1_ on employees0_.subsidiary_id=sales1_.subsidiary_id and employees0_.employee_id=sales1_.employee_id where upper(employees0_.last_name) like ? The query has the expected left join but also an unnecessary distinct keyword. Unfortunately, JPA does not provide separate API calls to filter duplicated parent entries without de-duplicating the child records as well. The distinct keyword in the SQL query is alarming because most databases will actually filter duplicate records. Only a few databases recognize that the primary keys guarantees uniqueness in that case anyway. The native Hibernate API solves the problem on the client side using a result set transformer: Criteria c = session.createCriteria(Employees.class); c.add(Restrictions.ilike("lastName", 'Win%')); c.setFetchMode("sales", FetchMode.JOIN); c.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY); List<Employees> result = c.list(); It generates the following query: select this_.subsidiary_id as subsidiary1_0_1_ , this_.employee_id as employee2_0_1_ -- MORE this_ columns on employees , sales2_.sale_id as sale1_3_ -- MORE sales2_ columns on sales from employees this_ left outer join sales sales2_ on this_.subsidiary_id=sales2_.subsidiary_id and this_.employee_id=sales2_.employee_id where lower(this_.last_name) like ? This method produces straight SQL without unintended clauses. Note that Hibernate uses lower() for case-insensitive queries—an important detail for function-based indexing. Perl The following example uses Perl’s DBIx::Class framework: my @employees = $schema->resultset('Employees') ->search({ 'UPPER(last_name)' => {-like => 'WIN%'} , {prefetch => ['sales']} }); DBIx::Class 0.08192 generates the following SQL statement: SELECT me.employee_id, me.subsidiary_id, me.last_name -- MORE COLUMNS FROM employees me LEFT JOIN sales sales ON (sales.employee_id = me.employee_id AND sales.subsidiary_id = me.subsidiary_id) WHERE ( UPPER(last_name) LIKE ? ) ORDER BY sales.employee_id, sales.subsidiary_id Note the order by clause—it was not requested by the application. The database has to sort the result set accordingly, and that might take a while. PHP The following example uses PHP’s Doctrine framework: $qb = $em->createQueryBuilder(); $qb->select('e,s') ->from('Employees', 'e') ->leftJoin('e.sales', 's') ->where("upper(e.last_name) like :last_name") ->setParameter('last_name', 'WIN%'); $r = $qb->getQuery()->getResult(); Doctrine 2.0.5 generates the following SQL statement: SELECT e0_.employee_id AS employee_id0 -- MORE COLUMNS FROM employees e0_ LEFT JOIN sales s1_ ON e0_.subsidiary_id = s1_.subsidiary_id AND e0_.employee_id = s1_.employee_id WHERE UPPER(e0_.last_name) LIKE ?

The execution plan shows the NESTED LOOPS OUTER operation:

DB2 Explain Plan --------------------------------------------------------------- ID | Operation | Rows | Cost 1 | RETURN | | 10501 2 | NLJOIN (LEFT) | 5745 of 57 | 10501 3 | FETCH EMPLOYEES | 57 of 57 (100.00%) | 49 4 | RIDSCN | 57 of 57 (100.00%) | 6 5 | SORT (UNIQUE) | 57 of 57 (100.00%) | 6 6 | IXSCAN EMP_NAME | 57 of 10000 ( .57%) | 6 7 | FETCH SALES | 101 of 101 (100.00%) | 183 8 | IXSCAN SALES_SUB_EMP | 101 of 1011118 ( .01%) | 13 Predicate Information 2 - JOIN (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID) JOIN (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) 3 - SARG (Q1.LAST_NAME LIKE ?) 6 - START ($INTERNAL_FUNC$() <= Q1.LAST_NAME) STOP (Q1.LAST_NAME <= $INTERNAL_FUNC$()) SARG (Q1.LAST_NAME LIKE ?) 8 - START (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) START (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID) STOP (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) STOP (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID) The UPPER was removed from the where -clause to get the expected result. Oracle --------------------------------------------------------------- |Id |Operation | Name | Rows | Cost | --------------------------------------------------------------- | 0 |SELECT STATEMENT | | 822 | 38 | | 1 | NESTED LOOPS OUTER | | 822 | 38 | | 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 4 | |*3 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | | | 4 | TABLE ACCESS BY INDEX ROWID| SALES | 821 | 34 | |*5 | INDEX RANGE SCAN | SALES_EMP | 31 | | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(UPPER("LAST_NAME") LIKE 'WIN%') filter(UPPER("LAST_NAME") LIKE 'WIN%') 5 - access("E0_"."SUBSIDIARY_ID"="S1_"."SUBSIDIARY_ID"(+) AND "E0_"."EMPLOYEE_ID" ="S1_"."EMPLOYEE_ID"(+))

The database retrieves the result from the EMPLOYEES table via EMP_UP_NAME first and fetches the corresponding records from the SALES table for each employee afterwards.

Tip Get to know your ORM and take control of joins. Different ORM tools offer different ways to control join behavior. Eager fetching is just one example that is not even provided by every object-relational mapper. It is good practice to implement a small set of samples that explores the capabilities of your ORM. That’s not only a good exercise, it can also serve as reference during development, and will show you unexpected behavior—like duplication of parent records as a side effect of using joins. Download the samples to get started.

The nested loops join delivers good performance if the driving query returns a small result set. Otherwise, the optimizer might choose an entirely different join algorithm—like the hash join described in the next section, but this is only possible if the application uses a join to tell the database what data it actually needs.