When working with large databases, SQL structure effects query performance dramatically. I recently refactored a query that looked like this:

</p> <pre> SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 where t2.c = 1)</pre> <p>



Refactored into:

</p> <pre> SELECT * FROM t1 INNER JOIN t2 ON t1.a = t2.b AND t2.c = 1</pre> <p>



The query went from around 45 seconds down to 3. Digging into the MySQL docs a little more I found this:

The optimizer rewrites the statement to a correlated subquery:

SELECT … FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(MxN), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list)operator that lists the same values that the subquery would return.