Dynamic Search Conditions in T‑SQL

Version for SQL 2005 and Earlier

An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2014-12-27.

Copyright applies to this text. See here for font conventions used in this article. An older version of this article is available in German, in translation by SQL Server MVP Frank Kalis.

Introduction

Note: This is an old article in which no content has been added or updated since 2009. (The update in December 2014 was the addition of this note and a formatting overhaul.) There is a newer version of this article which applies to SQL 2008 and later versions. The article you are looking at right now is for users who are still on SQL 2000 or SQL 2005, where OPTION (RECOMPILE) is not available or does not work as one would like.

A very common requirement in an information system is to have a function (or several functions) where the users are able to search the data by selecting freely among many possible criteria. This is a tough challenge, because not only must you produce the desired output, but you must also keep the response time within acceptable limits, at least for common searches. And on top of it all, the code must be maintainable, so that you can meet new needs and requirements.

In this text I will look at various techniques to solve this problem. There are two main alternatives: dynamic SQL and static SQL. There are also hybrid solutions that use both. When you are on SQL 2005 and earlier, as soon as the number of possible search conditions is more than just a handful, dynamic SQL is the most effective solution, in terms of performance, development and maintenance. On SQL 2000 and earlier, to use dynamic SQL in its pure form you need to give the users direct SELECT permissions to the involved tables, and this is far from always permissible. SQL 2005 provides ways around this issue.

I will first look at using dynamic SQL and try to point out some traps that you should avoid. I will then cover techniques for static SQL, to give you a set of methods and tricks that you can combine to implement your search functions. Finally, I will present two hybrid solutions that use both dynamic and static SQL and in this way evade the permissions problem.

The Case Study: Searching Orders

Throughout this text, we will work with implementing a stored procedure that retrieves information about orders in the Northwind and Northgale databases, as well as the customers and the products of the orders. Northwind is a sample database that comes with SQL 7 and SQL 2000. Northwind does not ship with SQL 2005, but you can retrieve it from Microsoft's web site. Northgale is a bigger database that I have derived from Northwind, more about it below.

This is the interface that we expose to the user (or more probably to a GUI or middle-layer programmer):

CREATE PROCEDURE search_orders
       @orderid   int          = NULL,
       @fromdate  datetime     = NULL,
       @todate    datetime     = NULL,
       @minprice  money        = NULL,
       @maxprice  money        = NULL,
       @custid    nchar(5)     = NULL,
       @custname  nvarchar(40) = NULL,
       @city      nvarchar(15) = NULL,
       @region    nvarchar(15) = NULL,
       @country   nvarchar(15) = NULL,
       @prodid    int          = NULL,
       @prodname  nvarchar(40) = NULL AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City,
       c.Region, c.PostalCode, c.Country, c.Phone,
       p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  ???
ORDER  BY o.OrderID

You see in the SELECT list what information the user gets. Here is a specification of the parameters:

Parameter    Function
@orderid     Retrieve this order only.
@fromdate    Retrieve orders made on @fromdate or later.
@todate      Retrieve orders made on @todate or earlier.
@minprice    Retrieve only order details that cost at least @minprice.
@maxprice    Retrieve only order details that cost at most @maxprice.
@custid      Retrieve only orders from this customer.
@custname    Retrieve only orders from customers whose name starts with @custname.
@city        Retrieve only orders from customers in this city.
@region      Retrieve only orders from customers in this region.
@country     Retrieve only orders from customers in this country.
@prodid      Retrieve only order details with this product.
@prodname    Retrieve only order details with a product starting with @prodname.

If the user leaves out a search condition, that search condition should not apply to the search. Thus a plain EXEC search_orders should return all orders in the database.
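For instance, a caller might combine parameters like this (the values are ordinary Northwind data, picked here only for illustration):

```sql
EXEC search_orders                                -- no criteria: all orders
EXEC search_orders @custid = N'ALFKI'             -- one customer only
EXEC search_orders @country = N'Germany',         -- German customers with
                   @fromdate = '19970101'         -- orders from 1997 onwards
```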

This example is fairly simple, since each condition can be implemented with a single condition using =, <=, >= or LIKE. In many real-life situations you have more conditions that affect which tables or columns you need to access. I have purposely left such conditions out of search_orders, in order to keep the focus on the main issues, and I only touch on more complex conditions in a few places. My hope is that when you have read this article, you will have a better ground to stand on, so that you will be able to tackle the more complex conditions you run into on your own.

Sometimes you may want to search for a list of values, like @prodid = '14,56,76'. The way to deal with this is to unpack the list with a table-valued function. This is not something I cover here, as I have an article, Arrays and Lists in SQL Server, on my web site that is devoted to this topic.
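Just to hint at the pattern: assuming a list-to-table function of the kind described in that article (the name intlist_to_tbl and its column val are hypothetical; nothing like it ships with SQL Server), the dynamic SQL could grow a condition like this:

```sql
-- Hypothetical sketch: dbo.intlist_to_tbl is assumed to unpack a
-- comma-separated string such as '14,56,76' into a one-column table.
IF @prodids IS NOT NULL
   SELECT @sql = @sql +
      ' AND od.ProductID IN (SELECT val FROM dbo.intlist_to_tbl(@xprodids))'
```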

It is not uncommon that there is a requirement for the user to be able to specify how the output is to be sorted. Since this article is long enough, I will not dwell on this topic more than this brief note. If you are using dynamic SQL, you would of course build the ORDER BY clause dynamically as well. If you are using static SQL, see this section in the general version of this article.
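To give just a flavour of the dynamic variant: assuming an extra parameter @sortcol (an assumption for this sketch, not part of search_orders), mapping it through CASE keeps the user's input out of the SQL string, so there is no risk of SQL injection through the sort column:

```sql
-- Sketch only: @sortcol nvarchar(30) = N'OrderID' is an assumed extra
-- parameter. Only names matched by the CASE ever reach the query string.
SELECT @sql = @sql + ' ORDER BY ' +
       CASE @sortcol WHEN N'OrderID'    THEN 'o.OrderID'
                     WHEN N'OrderDate'  THEN 'o.OrderDate'
                     WHEN N'CustomerID' THEN 'c.CustomerID'
                     ELSE 'o.OrderID'
       END
```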

In the text I discuss a number of different implementations of search_orders, unimaginatively named search_orders_1 etc. Some of them are included in full in this text, others only in part. All are available in the dynsearch directory on my web site. (The numbering of the procedures is somewhat out of order with the text. Originally there were seven of them, but as the article has evolved over time I have added more, and I have not renumbered the existing procedures.)

To know what I'm talking about when it comes to performance, I have benchmarked all my search procedures for six different search conditions. While I refer to my results in this article, I am not disclosing any numbers, because they are not generally applicable. How good or bad a solution is in a certain situation depends on available indexes, data distribution etc. Throughout this article I will stress that if you care about performance, it is necessary to run your own benchmarks with production-size data. This applies not least if you are constrained to static SQL only.

The Northgale Database

The Northwind database is very small, so it is very difficult to draw conclusions about performance from queries in this database. A query that one would expect to use an index may in fact use a scan, simply because at such small volumes the scan is cheaper.

For this reason I have composed the Northgale database. Northgale has the same tables and indexes as Northwind, but I have exploded the data so that instead of 830 orders, there are 344035 of them. To install Northgale, run Northgale.sql. (If you are on SQL 2000, use Northgale-2000.sql instead.) The data in Northgale is taken from Northwind, so this database must be in place. To install Northgale, you need 4.6 GB of disk space on SQL 2005 and 2.6 GB on SQL 2000. Once installed, the database takes up 520 MB. (Log space needed when inserting the data is reclaimed at the end of the script.) By default, the database is installed in the same directory as the master database, but you can edit the script to change that.

I've exploded the data by cross-joining the tables. For IDs, I've composed combinations, so for instance there are now 6640 customers rather than 91 as in the original Northwind. I have also generated new customer and product names by permuting the parts of the names. However, I have not created new cities, countries or regions, so a search on a city alone can yield very many hits.

Keep in mind that Northgale, too, is a small database by today's standards. For instance, it easily fits entirely into cache on a reasonably equipped server. A poorly written query that requires a scan of, say, the Orders table still returns within a few seconds. The database is hopefully big enough to give a sense of how good or bad different solutions are, but I would advise you not to draw any far-reaching conclusions. It is also worth pointing out that, the way the database was composed, the distribution of data is a bit skewed.

When you implement a dynamic search function, you should always benchmark your solution against your production database, as well as inspect the query plans for common search cases.

Dynamic SQL

Introduction

Performance

You might have been told: don't use dynamic SQL, it's slow! Yes, used wrongly and in the wrong place, dynamic SQL can degrade performance, sometimes severely. But if you use sp_executesql and always refer to your tables in two-part notation, the penalty for dynamic SQL is negligible. On the other hand, if you use EXEC() and refer to tables with only the table name, there can be a cost: not for running the query itself, but for building the query plan. With stored procedures and correctly written dynamic SQL, SQL Server can find a query plan in cache that it can reuse.

For a dynamic search like our search_orders, there is no single plan that fits all combinations of input parameters. One plan that gives sub-second response when the user specifies the product may take two minutes if the user specifies the customer, or vice versa. In this case, an extra 100 ms or so to build a query plan for each new combination of search parameters is a price well worth paying, particularly if the plan stays in cache and can be reused by other users.

Thus, from a performance perspective, dynamic SQL is often a superior solution for this sort of problem.

Security

On SQL 2000, there is an important security aspect you must be aware of. Recall that with stored procedures, users do not need to have direct access rights to the tables and views the procedures refer to; they only need permission to execute the procedures. When a stored procedure is running, the rights of the procedure owner apply. However, dynamic SQL created in a stored procedure is not part of that procedure, and when the dynamic SQL code executes, the permissions of the current user apply. Thus, to use dynamic SQL you need to grant your users SELECT access directly to the tables. If this is not acceptable in your system, you cannot use dynamic SQL.

...well, it is not 100% true. There are hybrid solutions where you put the static SQL in a view or an inline table function. Particularly the latter can give you almost all the power of dynamic SQL, without having to expose the tables directly to the users. We will look into this in the section Hybrid Solutions – Using Both Static and Dynamic SQL.

On SQL 2005, there are two more ways to avoid the permissions issue: you can sign the procedure with a certificate and then associate the certificate with a virtual user to which you grant the necessary permissions. You can also use the EXECUTE AS clause to impersonate such a virtual user. The latter solution is less hassle to implement, but it has side effects that can break row-level security schemes and make system monitoring more difficult. I describe both these methods in detail in my article Granting Permissions through Stored Procedures.
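A minimal sketch of the EXECUTE AS alternative on SQL 2005. The loginless user search_user and the procedure name are assumptions for this sketch; the user exists only to carry the permissions:

```sql
CREATE USER search_user WITHOUT LOGIN
GRANT SELECT ON dbo.Orders          TO search_user
GRANT SELECT ON dbo.[Order Details] TO search_user
GRANT SELECT ON dbo.Customers       TO search_user
GRANT SELECT ON dbo.Products        TO search_user
go
-- With EXECUTE AS, the dynamic SQL inside the procedure runs with
-- search_user's permissions; callers need only EXECUTE on the procedure.
CREATE PROCEDURE search_orders_impersonate @custid nchar(5) = NULL
WITH EXECUTE AS 'search_user' AS
   EXEC sp_executesql
        N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @custid',
        N'@custid nchar(5)', @custid
```

Note that inside such a procedure, functions that report the current user reflect the impersonated context, which is the kind of side effect on monitoring and row-level security that the article warns about.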

Further Reading

There are more things to say about dynamic SQL as such, that I will not go into here. Rather I refer you to my web article The Curse and Blessings of Dynamic SQL, which discusses the use of dynamic SQL in general. In this article, I also cover the performance and security topics in more detail.

Testing is Necessary!

Because of the way you generate the code, it is very important that you test all input parameters, and preferably some combinations of them too. If you are not careful, the users can experience syntax errors from your code when you build queries dynamically.

You should also test that you get the desired performance, using data volumes in parity to what you can expect in production.

The Methods to Do Dynamic SQL

There are three ways to go:

1. A T‑SQL procedure that builds a query string and executes it with sp_executesql.
2. A CLR procedure that builds a parameterised query string and executes it. (SQL 2005 only.)
3. A T‑SQL procedure that builds a query string and executes it with EXEC().

Of these three, the first two are both good choices; use whichever you find the most convenient. EXEC() is an inferior method that is more difficult to work with, and your prospects for query-plan reuse are slim. There are, however, a few special situations where EXEC() and the techniques around it come into play.

I will discuss in detail an implementation of search_orders that uses sp_executesql, and I will also present two CLR implementations. I also show an implementation of search_orders that uses EXEC(), but this can be considered extra reading that you can skip if you like. Before I move on to static SQL, I will look a little at situations where cached query plans hamper your performance. This, too, you can regard as extra reading and skim at first.

Using sp_executesql

sp_executesql is a system procedure that takes an SQL statement as its first parameter, and a declaration of parameters as the second parameter, and the remaining parameters are determined by that parameter list. Here is the procedure search_orders_1, which uses sp_executesql:

CREATE PROCEDURE search_orders_1                                 -- 1
       @orderid   int          = NULL,                           -- 2
       @fromdate  datetime     = NULL,                           -- 3
       @todate    datetime     = NULL,                           -- 4
       @minprice  money        = NULL,                           -- 5
       @maxprice  money        = NULL,                           -- 6
       @custid    nchar(5)     = NULL,                           -- 7
       @custname  nvarchar(40) = NULL,                           -- 8
       @city      nvarchar(15) = NULL,                           -- 9
       @region    nvarchar(15) = NULL,                           -- 10
       @country   nvarchar(15) = NULL,                           -- 11
       @prodid    int          = NULL,                           -- 12
       @prodname  nvarchar(40) = NULL,                           -- 13
       @debug     bit          = 0 AS                            -- 14
                                                                 -- 15
DECLARE @sql       nvarchar(4000),                               -- 16
        @paramlist nvarchar(4000)                                -- 17
                                                                 -- 18
SELECT @sql =                                                    -- 19
    'SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,   -- 20
            c.CustomerID, c.CompanyName, c.Address, c.City,      -- 21
            c.Region, c.PostalCode, c.Country, c.Phone,          -- 22
            p.ProductID, p.ProductName, p.UnitsInStock,          -- 23
            p.UnitsOnOrder                                       -- 24
     FROM   dbo.Orders o                                         -- 25
     JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID     -- 26
     JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID       -- 27
     JOIN   dbo.Products p ON p.ProductID = od.ProductID         -- 28
     WHERE  1 = 1'                                               -- 29
                                                                 -- 30
IF @orderid IS NOT NULL                                          -- 31
   SELECT @sql = @sql + ' AND o.OrderID = @xorderid' +           -- 32
                        ' AND od.OrderID = @xorderid'            -- 33
                                                                 -- 34
IF @fromdate IS NOT NULL                                         -- 35
   SELECT @sql = @sql + ' AND o.OrderDate >= @xfromdate'         -- 36
                                                                 -- 37
IF @todate IS NOT NULL                                           -- 38
   SELECT @sql = @sql + ' AND o.OrderDate <= @xtodate'           -- 39
                                                                 -- 40
IF @minprice IS NOT NULL                                         -- 41
   SELECT @sql = @sql + ' AND od.UnitPrice >= @xminprice'        -- 42
                                                                 -- 43
IF @maxprice IS NOT NULL                                         -- 44
   SELECT @sql = @sql + ' AND od.UnitPrice <= @xmaxprice'        -- 45
                                                                 -- 46
IF @custid IS NOT NULL                                           -- 47
   SELECT @sql = @sql + ' AND o.CustomerID = @xcustid' +         -- 48
                        ' AND c.CustomerID = @xcustid'           -- 49
                                                                 -- 50
IF @custname IS NOT NULL                                         -- 51
   SELECT @sql = @sql + ' AND c.CompanyName LIKE @xcustname + ''%'''  -- 52
                                                                 -- 53
IF @city IS NOT NULL                                             -- 54
   SELECT @sql = @sql + ' AND c.City = @xcity'                   -- 55
                                                                 -- 56
IF @region IS NOT NULL                                           -- 57
   SELECT @sql = @sql + ' AND c.Region = @xregion'               -- 58
                                                                 -- 59
IF @country IS NOT NULL                                          -- 60
   SELECT @sql = @sql + ' AND c.Country = @xcountry'             -- 61
                                                                 -- 62
IF @prodid IS NOT NULL                                           -- 63
   SELECT @sql = @sql + ' AND od.ProductID = @xprodid' +         -- 64
                        ' AND p.ProductID = @xprodid'            -- 65
                                                                 -- 66
IF @prodname IS NOT NULL                                         -- 67
   SELECT @sql = @sql + ' AND p.ProductName LIKE @xprodname + ''%'''  -- 68
                                                                 -- 69
SELECT @sql = @sql + ' ORDER BY o.OrderID'                       -- 70
                                                                 -- 71
IF @debug = 1                                                    -- 72
   PRINT @sql                                                    -- 73
                                                                 -- 74
SELECT @paramlist = '@xorderid   int,                            -- 75
                     @xfromdate  datetime,                       -- 76
                     @xtodate    datetime,                       -- 77
                     @xminprice  money,                          -- 78
                     @xmaxprice  money,                          -- 79
                     @xcustid    nchar(5),                       -- 80
                     @xcustname  nvarchar(40),                   -- 81
                     @xcity      nvarchar(15),                   -- 82
                     @xregion    nvarchar(15),                   -- 83
                     @xcountry   nvarchar(15),                   -- 84
                     @xprodid    int,                            -- 85
                     @xprodname  nvarchar(40)'                   -- 86
                                                                 -- 87
EXEC sp_executesql @sql, @paramlist,                             -- 88
     @orderid, @fromdate, @todate, @minprice,                    -- 89
     @maxprice, @custid, @custname, @city, @region,              -- 90
     @country, @prodid, @prodname                                -- 91

In case you are reading this with a narrow browser window, I should point out that there are line numbers to the right that I will refer to in the following text.

Overall Flow

On lines 19-29, I compose the basic SQL string. The condition WHERE 1 = 1 on line 29 is there to permit the users to call the procedure without specifying any parameters at all.

Then on lines 31-68, I check all parameters (save @debug), and if a parameter is non-NULL, I add a condition for the corresponding column to the SQL string. Finally on line 70, I add the ORDER BY clause.

On line 72, I test the @debug parameter. If it is 1, I print the SQL string. This is handy if the dynamic SQL yields a compilation error that I don't understand; once I see the SQL code, the error is often apparent. A typical error is a missing space, leading to code that reads:

WHERE 1 = 1 AND o.OrderDate <= @xtodateAND p.ProductName LIKE @xprodname

On lines 75-86 I declare the parameter list for my dynamic SQL statement, and on lines 88-91 I finally execute it.

A Little More in Detail on sp_executesql

sp_executesql is a system procedure with a very special parameter list. The first parameter is a parameterised SQL statement. The second parameter is a parameter-list declaration, very similar to the parameter list to a stored procedure. And the remaining parameters are simply the parameters defined by that parameter-list parameter.

The parameterised query is placed in cache, so if a second user makes a search using the same input parameters (for instance @city and @prodname), the query plan will be reused, even if he uses different values in his search.
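For instance, these two calls generate exactly the same query string and parameter list, and thus share one cached plan (the values are ordinary Northwind data, used only for illustration):

```sql
EXEC search_orders_1 @city = N'London', @prodname = N'Chai'
EXEC search_orders_1 @city = N'Paris',  @prodname = N'Ipoh Coffee'
```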

The parameter-list parameter that I pass to sp_executesql is basically the same as the parameter list to the procedure itself. Here, I have called the parameters @xorderid and so on. There is no technical reason for this, and normally I would have reused the names in the parameter list of the stored procedure. But I wanted to make you aware that the parameters inside the dynamic SQL have no relation to the parameters and variables in the surrounding stored procedure. The dynamic SQL constitutes a scope of its own. Think of it as a stored procedure that you create on the fly.

The SQL statement and the parameter list must be Unicode strings, so the @sql and @paramlist variables must be nvarchar. And if you pass the SQL statement or the parameter list as literals, you must use N (for National) to denote a Unicode literal:

EXEC sp_executesql N'SELECT * FROM Orders WHERE OrderID = @orderid', N'@orderid int', @orderid = 10872

You can use OUTPUT parameters too with sp_executesql. I'm not showing any example of this here, but see the section on sp_executesql in Curse and Blessings...

Declaring the SQL String

You can see that on line 16, I have this declaration:

DECLARE @sql nvarchar(4000)

In SQL 7 and SQL 2000, there is no way to work with variables longer than 8000 bytes, which imposes a practical limit when working with sp_executesql, even though the statement parameter to sp_executesql is ntext on these versions of SQL Server. It is still possible to work with longer query strings if you use EXEC(), since you can say EXEC(@sql1 + @sql2 + @sql3).
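A trivial sketch of that concatenation technique. EXEC() accepts a concatenation of variables and literals, so the total query length is unbounded even though each variable is capped at 4000 nvarchar characters:

```sql
-- In a real procedure each variable would hold a much longer fragment;
-- note that the break between variables must not split a keyword.
DECLARE @sql1 nvarchar(4000), @sql2 nvarchar(4000)
SELECT @sql1 = 'SELECT OrderID, OrderDate FROM dbo.Orders ',
       @sql2 = 'WHERE OrderDate >= ''19980101'' ORDER BY OrderID'
EXEC (@sql1 + @sql2)
```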

SQL 2005 introduced a new data type, nvarchar(MAX) which is unlimited just like ntext, but without all the restrictions. You can declare your @sql variable to be nvarchar(MAX) and leave worries behind about outgrowing the variable. (I did not use nvarchar(MAX) in search_orders_1 so that it would run on SQL 7 and SQL 2000 as well.)

The dbo Prefix

On lines 25-28 there is something very important:

FROM   dbo.Orders o
JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID
JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID
JOIN   dbo.Products p ON p.ProductID = od.ProductID

As you can see, I refer to all tables in two-part notation. That is, I also specify the schema (which in SQL 7/2000 parlance normally is referred to as the owner). If I left out the schema, each user would get his own private version of the query plan, so if there are ten users searching for orders, there would be ten instances of the plan in cache. (As discussed in Curse and Blessings..., it may work without specifying the schema on SQL 2005, but you should not rely on that.)

(The author must shamefully confess that earlier versions of this article did not prefix the tables with dbo, and yet I said that query plans would be reused by other users.)
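If you want to verify the reuse for yourself, a crude but serviceable check is to run the procedure as two different users and then look in the plan cache; syscacheobjects exists on SQL 7, SQL 2000 and SQL 2005 (where sys.dm_exec_cached_plans offers the same information):

```sql
-- usecounts > 1 on a single "Compiled Plan" row indicates reuse;
-- several rows with the same query text indicate per-user plans.
SELECT cacheobjtype, objtype, usecounts, sql
FROM   master.dbo.syscacheobjects
WHERE  sql LIKE '%dbo.Orders%'
```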

Double Feature

You may wonder why there are two conditions on the order id, customer id and product id (lines 32-33, 48-49 and 64-65). This is because I once learnt that you should always specify all conditions you know of to help the optimizer out. That was very long ago, and the rule may no longer apply. But I included it here nevertheless.

Picking Different Tables

Assume for the example, that there had been a table HistoricOrders in Northwind for orders that are shipped, invoiced and paid for, and users should be given the choice to search either current or historic orders. (For simplicity, I'm overlooking Order Details here.) How would you solve this?

You may be tempted to try something like SELECT ... FROM @ordertable and then add @ordertable as a parameter. However, T‑SQL does not permit you to parameterise the table name.

In fact, you should not pass the table name as a parameter to the procedure at all; once you have started to use stored procedures, all references to table and column names should be in the procedures themselves. The correct way would be to add another parameter to search_orders:

@historic_data bit = 0

and then line 25 would read:

FROM dbo.' + CASE @historic_data
                  WHEN 0 THEN 'Orders'
                  WHEN 1 THEN 'HistoricOrders'
             END + ' o

Trying it out

Here are some test cases:

EXEC search_orders_1 @orderid = 11000
EXEC search_orders_1 @custid = 'ALFKI'
EXEC search_orders_1 @prodid = 76
EXEC search_orders_1 @prodid = 76, @custid = 'RATTC'
EXEC search_orders_1 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_1 @city = N'Bräcke', @prodid = 76

If you try these and inspect the query plans, you will see that the available indexes on the search columns are used in Northgale, with one exception: the index on Customers.City is not used for the last case, but then this column is not very selective. If you run the queries in Northwind, you will see more scans, but this is due to the small size of that database.

Using the CLR

SQL 2005 adds the possibility to write stored procedures in languages that use the CLR (Common Language Runtime), such as Visual Basic .Net or C#. A dynamic search can be implemented in the CLR just as well as in T‑SQL. After all, search_orders_1 is all about string manipulation until it invokes sp_executesql.

If you are more comfortable working in VB or C#, you may prefer to implement your searches in the CLR rather than T‑SQL. The reverse also applies: if you feel more comfortable with T‑SQL, there is little reason to use the CLR to implement dynamic searches.

I've written two CLR procedures, search_orders_vb and search_orders_cs, that I will discuss in this section. As the code is fairly repetitive, I'm not including either of them in full here; I only highlight some important points. Beware that I will not go into any details on writing CLR stored procedures as such. If you have never worked with the CLR before, but are curious, I refer you to Books Online. At the end of this section there are instructions on how to create these two procedures in SQL Server.

Setting up the Statement

This is how search_orders_cs starts off:

string     Query;
SqlCommand Command = new SqlCommand();

Query = @"SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
                 c.CustomerID, c.CompanyName, c.Address, c.City,
                 c.Region, c.PostalCode, c.Country, c.Phone,
                 p.ProductID, p.ProductName, p.UnitsInStock,
                 p.UnitsOnOrder
          FROM   dbo.Orders o
          JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID
          JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID
          JOIN   dbo.Products p ON p.ProductID = od.ProductID
          WHERE  1 = 1 ";

As you can see this is very similar to search_orders_1, including the dbo prefix. The rule that you should use two-part notation to maximise query-plan reuse applies to CLR procedures as well.

Defining the Parameters

This is the very important part. Far too often on the newsgroups, I see posters who inline the parameter values into their query strings. This is bad for several reasons. One is that you get very little chance for query-plan reuse. The second and even more important reason is a security risk known as SQL injection, which I discuss in Curse and Blessings...

What you should do is to build parameterised statements. Here is how the @custid parameter is added in search_orders_cs:

if (! Custid.IsNull) {
   Query += " AND o.CustomerID = @custid" +
            " AND c.CustomerID = @custid";
   Command.Parameters.Add("@custid", SqlDbType.NChar, 5);
   Command.Parameters["@custid"].Value = Custid;
   Command.Parameters["@custid"].Direction = ParameterDirection.Input;
}

As in the T‑SQL example, the query string is extended with the conditions for the parameter in both Orders and Customers.

What is different from T‑SQL is how we define the parameter list and supply the value. In T‑SQL the parameter list is a string, which includes all possible parameters. When working with the CLR, we only define the parameters that actually are in use. We define a parameter by adding it to the Parameters collection of the SqlCommand object. There are a number of ways to do this, and I refer you to the MSDN Library for a complete reference. The example shows a pattern that works for the most commonly used data types. The first argument is the parameter name, while the second is the type indicator from the SqlDbType enumeration. The last argument is the length, which I discuss in more detail in the next section. Note that for decimal/numeric parameters, you need to add them in some other way, as this flavour of Add has no means to specify scale and precision.

Once the parameter is defined, I assign the value separately. I also explicitly specify the direction, although this is hardly necessary.

Here is the above in Visual Basic .Net:

If Not Custid.IsNull Then
   Query &= " AND o.CustomerID = @custid" & _
            " AND c.CustomerID = @custid" & VbCrLf
   Command.Parameters.Add("@custid", SqlDbType.NChar, 5)
   Command.Parameters("@custid").Value = Custid
   Command.Parameters("@custid").Direction = ParameterDirection.Input
End If

It's very similar to the C# example. Different operator for string concatenation, parentheses to address elements in the collection and no semicolons.

Don't Forget to Specify the Length!

When the parameter has any of the data types char, varchar, nchar, nvarchar, binary or varbinary, you need to specify the length of the parameter, whereas for fixed-length data types you leave it out.

Well, if you try, you may find that you can always leave out the length. You may also get the idea to use the method AddWithValue, which permits you to define the parameter and the value in a single call. Do not fall into the trap of using these shortcuts! When you don't specify the length explicitly, SqlClient will use the length of the actual value when it builds the parameter list for sp_executesql. Thus, if one user enters Alfred, the parameter will be declared as:

@custname nvarchar(6)

But if another user enters Berglund, the parameter will be declared as:

@custname nvarchar(8)

When SQL Server looks up a query in the cache, it hashes the query text and the parameter list and performs a lookup on that hash value. That is, differences in the parameter list will result in different cache entries. The net effect is that you will have less benefit from the cache, and the risk that you contribute to cache bloat increases. Under extreme circumstances, the sheer numbers of plans in the cache can lead to degraded performance.

If you feel that you don't want to hardcode the length of the column in case it could change in the future, rather than leaving out the length, use the maximum length for the type, that is 8000 for char and varchar and 4000 for nchar and nvarchar.

And, again to clarify, this issue applies to char, varchar, nchar, nvarchar, binary or varbinary only. For other types you should never specify the length.

Running the Query

This is how this looks like in C#:

using (SqlConnection Connection =
       new SqlConnection("context connection=true"))
{
   Connection.Open();

   if (Debug) {
      SqlContext.Pipe.Send(Query);
   }

   Command.CommandType = CommandType.Text;
   Command.CommandText = Query;
   Command.Connection  = Connection;
   SqlContext.Pipe.ExecuteAndSend(Command);
}

This is very much the standard way to run a query from a CLR procedure. SqlContext.Pipe.Send is how you say PRINT in the CLR.

Loading the Examples

If you have any flavour of Visual Studio 2005 (including the Express editions), you can deploy search_orders_cs and search_orders_vb from Visual Studio. (But please don't ask me how to do it, Visual Studio just leaves me in a maze.)

Since the .Net Framework comes with SQL Server and includes compilers for the most common .Net languages, you can also load them without Visual Studio. First make sure that C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 (or corresponding) is in your path. Then run from a command-line window:

csc /target:library search_orders_cs.cs
vbc /target:library search_orders_vb.vb

To load the DLLs into SQL Server, you can use load_clr_sp.sql. You will have to change the path in the CREATE ASSEMBLY command to where you placed the DLLs. Note that paths are as seen from SQL Server, so if you don't have SQL Server on your local machine, you will have to copy the DLLs to the SQL Server box, or specify a UNC path to your machine.

Trying it out

The same test cases as for search_orders_1:

EXEC search_orders_cs @orderid = 11000
EXEC search_orders_cs @custid = 'ALFKI'
EXEC search_orders_cs @prodid = 76
EXEC search_orders_cs @prodid = 76, @custid = 'RATTC'
EXEC search_orders_cs @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_cs @city = N'Bräcke', @prodid = 76

The query plans should be identical to those for search_orders_1, since the queries are the same. Hint: if you want to look at the query plans, you need to use Profiler and get the Showplan XML event. For some reason, you don't see query plans for queries submitted from CLR code in Management Studio or Query Analyzer.

Using EXEC()

As I've already mentioned, EXEC() is an inferior solution to sp_executesql and the CLR. The reason for this is that since EXEC() does not take parameters, you have to build a query string with parameter values inlined. This precludes query plans from being reused. It also makes the programming more difficult.

Nevertheless, there are three situations where you need to use EXEC() or at least inline parameter values:

You are on SQL 7 or SQL 2000, and the query string could exceed 4000 characters. Since you can say EXEC(@sql1 + @sql2 + @sql3 ...), there is no practical limit to the length of the query string with EXEC(). On SQL 2005 you can use nvarchar(MAX) with sp_executesql, so this issue does not exist there.

You use a hybrid solution with an inline table function, a solution I will discuss in detail later in this article.

You opt to inline the value of a certain parameter into the query. We will look into why you want to this in the section When Caching Is Not Really What You Want.

So while mastering EXEC() is not as essential as mastering sp_executesql for building dynamic search functions, it can still be worthwhile to study the technique so that you can avoid the pitfalls should you need to use it. But feel free to skip this section on a first reading if you are in a hurry, and come back when you actually need it.

search_orders_2

In contrast to sp_executesql, EXEC() only takes a string as a parameter, so you cannot parameterise the query. Instead you have to put the values into the SQL string. At first glance this may seem simpler, but as we shall see, it is in fact more complicated. Here is the procedure search_orders_2:

CREATE PROCEDURE search_orders_2                             -- 1
                 @orderid   int          = NULL,             -- 2
                 @fromdate  datetime     = NULL,             -- 3
                 @todate    datetime     = NULL,             -- 4
                 @minprice  money        = NULL,             -- 5
                 @maxprice  money        = NULL,             -- 6
                 @custid    nchar(5)     = NULL,             -- 7
                 @custname  nvarchar(40) = NULL,             -- 8
                 @city      nvarchar(15) = NULL,             -- 9
                 @region    nvarchar(15) = NULL,             -- 10
                 @country   nvarchar(15) = NULL,             -- 11
                 @prodid    int          = NULL,             -- 12
                 @prodname  nvarchar(40) = NULL,             -- 13
                 @debug     bit          = 0 AS              -- 14
                                                             -- 15
DECLARE @sql1        nvarchar(4000),                         -- 16
        @sql2        nvarchar(4000),                         -- 17
        @fromdatestr char(23),                               -- 18
        @todatestr   char(23),                               -- 19
        @minpricestr varchar(25),                            -- 20
        @maxpricestr varchar(25)                             -- 21
                                                             -- 22
SELECT @fromdatestr = convert(char(23), @fromdate, 126),     -- 23
       @todatestr   = convert(char(23), @todate, 126),       -- 24
       @minpricestr = convert(varchar(25), @minprice),       -- 25
       @maxpricestr = convert(varchar(25), @maxprice)        -- 26
                                                             -- 27
SELECT @sql1 =                                               -- 28
 'SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,  -- 29
         c.CustomerID, c.CompanyName, c.Address, c.City,     -- 30
         c.Region, c.PostalCode, c.Country, c.Phone,         -- 31
         p.ProductID, p.ProductName, p.UnitsInStock,         -- 32
         p.UnitsOnOrder                                      -- 33
  FROM   dbo.Orders o                                        -- 34
  JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID    -- 35
  JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID      -- 36
  JOIN   dbo.Products p ON p.ProductID = od.ProductID        -- 37
  WHERE  1 = 1'                                              -- 38
                                                             -- 39
IF @orderid IS NOT NULL                                      -- 40
   SELECT @sql1 = @sql1 + ' AND o.OrderID = ' + str(@orderid) +  -- 41
                          ' AND od.OrderID = ' + str(@orderid)   -- 42
                                                             -- 43
IF @fromdate IS NOT NULL                                     -- 44
   SELECT @sql1 = @sql1 + ' AND o.OrderDate >= ' +           -- 45
                          quotename(@fromdatestr, '''')      -- 46
                                                             -- 47
IF @todate IS NOT NULL                                       -- 48
   SELECT @sql1 = @sql1 + ' AND o.OrderDate <= ' +           -- 49
                          quotename(@todatestr, '''')        -- 50
                                                             -- 51
IF @minprice IS NOT NULL                                     -- 52
   SELECT @sql1 = @sql1 + ' AND od.UnitPrice >= ' + @minpricestr -- 53
                                                             -- 54
IF @maxprice IS NOT NULL                                     -- 55
   SELECT @sql1 = @sql1 + ' AND od.UnitPrice <= ' + @maxpricestr -- 56
                                                             -- 57
SELECT @sql2 = ''                                            -- 58
                                                             -- 59
IF @custid IS NOT NULL                                       -- 60
   SELECT @sql2 = @sql2 + ' AND o.CustomerID = ' +           -- 61
                          quotename(@custid, '''') +         -- 62
                          ' AND c.CustomerID = ' +           -- 63
                          quotename(@custid, '''')           -- 64
                                                             -- 65
IF @custname IS NOT NULL                                     -- 66
   SELECT @sql2 = @sql2 + ' AND c.CompanyName LIKE ' +       -- 67
                          quotename(@custname + '%', '''')   -- 68
                                                             -- 69
IF @city IS NOT NULL                                         -- 70
   SELECT @sql2 = @sql2 + ' AND c.City = ' +                 -- 71
                          quotename(@city, '''')             -- 72
                                                             -- 73
IF @region IS NOT NULL                                       -- 74
   SELECT @sql2 = @sql2 + ' AND c.Region = ' +               -- 75
                          quotename(@region, '''')           -- 76
                                                             -- 77
IF @country IS NOT NULL                                      -- 78
   SELECT @sql2 = @sql2 + ' AND c.Country = ' +              -- 79
                          quotename(@country, '''')          -- 80
                                                             -- 81
IF @prodid IS NOT NULL                                       -- 82
   SELECT @sql2 = @sql2 + ' AND od.ProductID = ' + str(@prodid) + -- 83
                          ' AND p.ProductID = ' + str(@prodid)    -- 84
                                                             -- 85
IF @prodname IS NOT NULL                                     -- 86
   SELECT @sql2 = @sql2 + ' AND p.ProductName LIKE ' +       -- 87
                          quotename(@prodname + '%', '''')   -- 88
                                                             -- 89
IF @debug = 1                                                -- 90
   PRINT @sql1 + @sql2                                       -- 91
                                                             -- 92
EXEC(@sql1 + @sql2 + ' ORDER BY o.OrderID')                  -- 93

General Notes

When building a non-parameterised query with the values included, you need to be disciplined when you write your code. It's very easy to get lost in a maze of nested quotes. I often see people on the newsgroups posting code like:

EXEC('SELECT col1, col2, ... FROM ... WHERE ' + CASE @par1 IS NULL THEN ' + col = ''' + @par + ''' + ...)

This is difficult to read and maintain, and if it goes wrong, you have no idea what SQL you are actually generating. search_orders_2 aggregates the SQL code in two variables, and there is a @debug parameter so I can see the SQL code, if I need to verify that I am generating the right thing. And by using the function quotename(), I have tried to reduce the need for nested quotes to a minimum. (More about this function just below.)

Keep in mind that EXEC() opens a new scope. The SQL you generate can not refer to variables declared in the surrounding procedure, and if the dynamic SQL creates a temp table, that table goes away with the scope.
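To illustrate these scoping rules, here is a minimal sketch (not one of the article's demo files); both SELECTs on the outer level fail:

```sql
DECLARE @x int
SELECT @x = 1
EXEC('SELECT @x')                      -- Fails: @x is not visible in the inner scope.

EXEC('CREATE TABLE #temp (a int)
      INSERT #temp (a) VALUES (1)')    -- Works inside the EXEC() scope...
SELECT a FROM #temp                    -- ...but fails here: #temp is already gone.
```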

The Use of Quotename()

As you can see, starting from line 46, I make frequent use of the built-in function quotename(), an idea I got from SQL Server MVP Steve Kass. quotename() takes two parameters: a string and a delimiter character. If the delimiter is (, [, < or {, the corresponding bracket is assumed to be the closing delimiter. The return value is the string enclosed by the delimiters, and any closing delimiter in the string is doubled. The default delimiter is square brackets. Examples:

SELECT quotename('Order Details')                     -- => [Order Details]
SELECT quotename('Two o' + char(39) + 'clock', '''')  -- => 'Two o''clock'
SELECT quotename('abc{}def', '{')                     -- => {abc{}}def}

Of these examples, the interesting one in this context is the second. In T‑SQL you use single quotes to delimit strings. If you need to include the string delimiter in a literal, you double it, so those four single quotes in a row are a one-character literal with the value of a single quote ('). Alternatively, you can express this value as char(39), like I do when I form the string Two o'clock in the example above.

Now, why all this? There are three important reasons:

It makes the code much easier to write. If you don't use quotename(), you need to use nested quotes, and the code becomes really messy.

It permits the user to enter values such as Let's Stop N Shop for the company name (this is an actual customer in the Northwind database) without getting a syntax error.

It increases your protection against something known as SQL injection, a technique whereby a malicious intruder enters the string delimiter and then types in an SQL command where you as a naïve developer only expected data, and thereby manages to get SQL Server to execute that statement. See Curse and Blessings... for a longer discussion on SQL injection.

Note: The input parameter to quotename() is nvarchar(128), so if your input can be longer than 128 characters, you cannot use quotename(). You still need to be able to handle single quotes in the string. You can easily do this with the replace() function, or use the quotestring() function from Curse and Blessings...
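For longer strings, a minimal sketch with replace() could look like this; the doubling of the single quotes is what quotename() would otherwise have done for you. (The @comment parameter and the Comment column are hypothetical, invented for the example.)

```sql
-- Hypothetical: @comment is assumed to be nvarchar(500), too long for quotename().
SELECT @sql = @sql + ' AND o.Comment LIKE N''' +
              replace(@comment, '''', '''''') + '%'''
```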

Datetime Parameters

All non-string parameters must be converted to string literals and on lines 23-24 I take care of the datetime parameters. The expression I use:

convert(char(23), @fromdate, 126)

is not something I picked at whim. The resulting format looks like this: 2003-04-06T21:14:26.627, which is the precise format mandated by the standard ISO 8601, and which is commonly used in XML. More importantly, it is one of the three formats for datetime literals in SQL Server whose interpretation does not depend on the settings for date format and language. If you don't understand what I'm talking about, try these statements:

SET DATEFORMAT mdy
SELECT convert(datetime, '02/07/09')
SET DATEFORMAT dmy
SELECT convert(datetime, '02/07/09')
go
SELECT convert(datetime, '2002-12-19')  -- Fails!
go
SET LANGUAGE Swedish
SELECT convert(datetime, 'Oct 12 2003') -- Fails! (It's "Okt" in Swedish.)

Besides the SET commands, the language setting (which also controls the date format) can be set per user with the system procedure sp_defaultlanguage.

The style parameter 126, as well as the format with the T between date and time, is available only on SQL 2000 and later. On SQL 7 you should instead use 112 for the style parameter, which gives the format 20030406, a second date format that is independent of date-format and language settings. The drawback with 112 is that you do not get hours and minutes. For the procedure search_orders_2 this would be OK, since users only provide dates, but if you need to include the time portion on SQL 7, you need to combine style parameters:

SELECT convert(char(8), getdate(), 112) + ' ' + convert(char(8), getdate(), 108)

For further information about the various style parameters you can give to convert(), look at the topic CAST and CONVERT in the T‑SQL Reference in Books Online.

Numeric Parameters

On lines 25-26 I take care of the two money parameters, and on lines 41-42 and 83-84, I handle the int parameters @orderid and @prodid. As you see, I apply different strategies. Of course, I could have used convert() for @orderid and @prodid too. It is just that I find str() a little more convenient to use for integer values. str() works with money too, but with no further parameters, the value is rounded to the nearest whole number, whereas convert() by default retains two decimals, which is why I preferred convert() for the @min/maxprice parameters.
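To see the difference between the two strategies for money values, you can run a sketch like this:

```sql
DECLARE @price money
SELECT @price = 123.45
SELECT str(@price)                   -- Rounded to the nearest whole number.
SELECT convert(varchar(25), @price)  -- The two decimals are retained.
```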

I'm not discussing the float and decimal data types here. Refer to the topics CAST and CONVERT and STR in Books Online.

@sql1 and @sql2

As you see, to build the string I use two variables, @sql1 and @sql2. In this example there is not really any reason for this, but if you have a huge query and a large number of parameters, there is a risk on SQL 2000 and SQL 7 that your SQL string will exceed the maximum of 4000 characters for an nvarchar value (or even 8000 for varchar). So to demonstrate the technique, there are two variables in search_orders_2. As you see, EXEC() permits you to pass an expression as a parameter. Note that this is not an arbitrary expression; the only permitted operator is string concatenation.

On SQL 2005 you should use nvarchar(MAX), and you would have no reason to use more than one @sql variable.

varchar and nvarchar

The Northwind database consistently uses the nvarchar data type, but I've failed to handle this in my procedure. The data in Northwind appears to be restricted to the characters in Windows Latin-1, which covers languages such as English, Spanish, French and German, so if you have a system collation based on this character set, you would never get any incorrect response from search_orders_2. However, a Russian user trying:

EXEC search_orders_2 @city = N'Bräcke'

would not get the orders from Folk och Fä HB in Bräcke, because Bräcke would be converted to varchar as Bracke. This is how my procedure should have read (lines 71-72):

SELECT @sql2 = @sql2 + ' AND c.City = N' + quotename(@city, '''')

That is, I should have added an N to make the resulting SQL read c.City = N'Bräcke' . That N makes the literal a Unicode literal of the nvarchar data type, and it should appear with all nvarchar parameters. (The N stands for National, and is taken from the ANSI standard SQL-92.)

So why did I not give you the proper code? Well, I figured that many readers would take my code as a template for their own code, and I don't want you to include that N by routine. Because, if you use an nvarchar literal together with an indexed varchar column, the varchar column will be converted to nvarchar, and that means that the index will be used less efficiently or not at all. If you use a Windows collation, SQL Server will still seek the index, but in a more roundabout way, resulting in doubled or tripled execution times. If the indexed column has an SQL collation, the index is completely useless, and you will get a less efficient plan, and the execution time can increase a hundred- or thousand-fold.

Caching

As I've already mentioned, you get poor use of the cache with EXEC(), since the exact query string is placed in cache. This means that searches on the customers ALFKI and VINET generate two different plans in the cache, so reuse only happens when two users perform exactly the same search.
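To make the point concrete, here is a sketch of what ends up in the cache. The two EXEC() strings hash to different cache entries, while the two sp_executesql calls share one:

```sql
-- Two different cache entries:
EXEC('SELECT OrderID FROM dbo.Orders WHERE CustomerID = N''ALFKI''')
EXEC('SELECT OrderID FROM dbo.Orders WHERE CustomerID = N''VINET''')

-- One shared cache entry, reused for both calls:
EXEC sp_executesql N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @custid',
                   N'@custid nchar(5)', @custid = N'ALFKI'
EXEC sp_executesql N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @custid',
                   N'@custid nchar(5)', @custid = N'VINET'
```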

Two qualifications here:

Sometimes SQL Server does not cache unparameterised queries at all, so there will be no reuse, even when two users search for the same thing.

SQL Server can auto-parameterise queries, but this happens only with the simplest queries, and is nothing you should rely on. On SQL 2005, there is something called forced parameterisation, where all queries are auto-parameterised, but again, you should not rely on this setting being active.

Summary

Just to make the point once more, here is a summary of the advantages of using sp_executesql over EXEC():

Minimises the need to use nested quotes.

Minimises the risk for SQL injection.

No need to worry about the formatting of datetime and numeric parameters.

No need to think about whether to use N before string literals or not. (But of course, in the parameter list to sp_executesql you need to make the correct choice of data type.)

Query plans for the same set of input parameters can be reused from cache.

When Caching Is Not Really What You Want

Before we move on to static SQL, I would like to discuss a problematic situation that you may encounter. Thus far I have preached the virtue of caching. If two users search on the same criteria, they get the same plan with search procedures like search_orders_1 or search_orders_cs that use parameterised queries. But consider these two invocations:

EXEC search_orders_1 @custid = 'ERNTC', @fromdate = '19980205', @todate = '19980205'
EXEC search_orders_1 @custid = 'BOLSR', @fromdate = '19980101', @todate = '19981231'

In Northgale, ERNTC is the top customer with 572 orders, whereas there is one single order for BOLSR. As you may guess, the best query plans for these two invocations are not the same. When searching for orders from a frequent customer in a short time span, it's probably better to use the index on OrderDate, but when searching for orders from an infrequent customer in a broad time span, the index on CustomerID is likely to be the winner. How to deal with this? I will discuss a couple of tricks that you can employ.

Inlining Some Values

While inlining all parameter values into the string is a bad idea, it can sometimes make sense to include some parameter values directly. Say that the search had included a @status parameter, and there had only been four possible values for Orders.Status, of which Completed accounted for 95 % of the rows. A typical search may be for new orders, less than 1 % of the rows. In this case, it would be a very good idea to include the value of the @status parameter directly into the query string, since different status values could call for different query plans. This applies particularly if there is an index on the Status column.
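In code, such a mix could look like this sketch, where the hypothetical @status is inlined while other conditions stay parameterised. (Orders has no Status column in Northwind; both the column and the parameter are invented for the example.)

```sql
-- Hypothetical Status column and @status parameter: inline the value,
-- so that each status value gets its own plan.
IF @status IS NOT NULL
   SELECT @sql = @sql + ' AND o.Status = ' + quotename(@status, '''')

-- Other conditions are still added as parameterised expressions, e.g.:
IF @custid IS NOT NULL
   SELECT @sql = @sql + ' AND o.CustomerID = @custid'
```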

What about the situation above, the combination of customer and date interval? Inlining the customer id into the string is probably a bad idea. That's one query plan per customer, so you would reduce the cache reuse drastically. The date interval is a better candidate, assuming that users would tend to use the same intervals during a day, for instance for the last seven days. You would still add quite a few more query plans to the cache though. But old plans for last week would age out by time, so unless the usage pattern is extremely diverse, this could work out.

Changing the Query Text

As I discussed previously, SQL Server looks up queries in the cache by hashing the query text. This means that two queries with different text are different entries in the cache, even if they are logically equivalent. There are many ways to alter the query text, for instance you could do something like:

IF @fromdate IS NOT NULL AND @todate IS NOT NULL
BEGIN
   SELECT @sql = @sql +
          CASE WHEN @fromdate = @todate THEN ''
               WHEN datediff(DAY, @fromdate, @todate) <= 7  THEN ' AND 2 = 2 '
               WHEN datediff(DAY, @fromdate, @todate) <= 30 THEN ' AND 3 = 3 '
               ...

The advantage of changing the query text depending on the parameter values, over inlining them into the query string, is that you get fewer plans in the cache, at the price that in some cases you will not run with the best plan. To wit, you can only guess where the breakpoint between two plans lies. With some luck, the damage from this is limited. Even if the plan for a single day is not the best for the span of a week, it may still be decently useful. But you will have to know your data, and possibly tune as time goes by.

Could this technique also be used to deal with the fact that different customers can have a very different number of orders? Probably not. Counting the number of orders for a customer before we construct the query is taking it too far in my opinion, and it could be more expensive than what you save in the other end.

Above, I altered the query text by adding extra conditions, but it works equally well with extra spacing or extra comments, since SQL Server looks up the query before it parses it. And since the parameter list is part of the query, you can play with that one too, if you want to show off and make your code really obscure.
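For instance, this sketch varies the query text with a comment rather than a dummy condition; the effect on the cache lookup is the same:

```sql
-- Different comment text => different query hash => separate cache entry.
IF @fromdate IS NOT NULL AND @todate IS NOT NULL
   SELECT @sql = @sql +
          CASE WHEN datediff(DAY, @fromdate, @todate) <= 7
               THEN ' /* short span */ '
               ELSE ' /* long span */ '
          END
```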

Index Hints

I added this here, because I figured that sooner or later someone would ask Could you not use an index hint? Surely, index hints can be useful in some situations. For instance, if the user provides an order id, we may want the optimizer to always use PK_Orders. But force an index because of a certain input value? I find it difficult to see that there are very many situations where this would be a fruitful strategy. As always, only add index hints if you have identified a situation where the optimizer picks the wrong plan, and no other workaround is available.

OPTION (RECOMPILE)

In SQL 2005 (not in SQL 2000 and earlier) you can add this hint at the end of the query. This instructs SQL Server to recompile the query each time, and in this case SQL Server will not put the plan into cache.

If you feel that the input values can vary so wildly that you want a recompile each time, and yet want to use parameterised queries to be safe from SQL injection, then this option is a good choice. You could also add it selectively, so that, for instance, you would not add it when an order id is given, but would in all other cases.
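A sketch of adding the hint selectively could look like this, assuming @sql and @paramlist are built up as in search_orders_1:

```sql
-- Skip the hint for the highly selective order-id search, where one
-- cached plan serves everyone; recompile in all other cases.
IF @orderid IS NULL
   SELECT @sql = @sql + ' OPTION (RECOMPILE)'

EXEC sp_executesql @sql, @paramlist,
     @orderid, @fromdate, @todate, @custid
```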

Static SQL

Introduction

For a search routine with many parameters like our search_orders example, static SQL is not really an appetising choice – you will have to make trade-offs between performance and maintainability. To always get the best performance, you would have to use IF statements to only include the conditions you need:

IF @orderid IS NOT NULL
   SELECT ... WHERE o.OrderID = @orderid
ELSE IF @fromdate IS NOT NULL AND @todate IS NULL AND @custid IS NULL ...
   SELECT ... WHERE o.OrderDate >= @fromdate
ELSE IF @fromdate IS NOT NULL AND @todate IS NOT NULL AND @custid IS NULL ...

As you realise, this code would be impossible to maintain (and very boring to write).

If you are on SQL 7 or SQL 2000 and cannot give users direct SELECT permissions to the tables you don't have much choice. You must use static SQL, although it's possible to use static SQL together with dynamic SQL in a hybrid solution that we will look at later. In the following sections, I will look at different methods that you can employ in static SQL. These methods are not mutually exclusive, and you can often get the best result by combining two or more of them.

Even if you are on SQL 2005, you should not dismiss the static solutions out of hand. Keep in mind that not all search problems are as complex as search_orders. If you have a very small set of search options, you may be able to produce a search solution with static SQL that performs perfectly well and still is manageable – more manageable than dynamic SQL. (After all, dynamic SQL in itself adds a level of complexity, with the risk of syntax errors not appearing until run-time.) Sometimes your "dynamic" search is only a lookup with alternate keys. This is a special case for which we will look at a perfectly good static solution in the section x = @x AND @x IS NOT NULL.

As with dynamic SQL: always be careful to test all input parameters to your procedure, both for correct results and for performance. To test for performance, you need tables with sizes resembling what you can expect in production.

x = @x OR @x IS NULL

This is the starting point for dynamic searches in static SQL and this is also what the hybrid solutions build on. This simple technique yields one compact query. However performance for large data sets will be poor. Here is search_orders_3 to demonstrate the idea:

CREATE PROCEDURE search_orders_3
                 @orderid   int          = NULL,
                 @fromdate  datetime     = NULL,
                 @todate    datetime     = NULL,
                 @minprice  money        = NULL,
                 @maxprice  money        = NULL,
                 @custid    nchar(5)     = NULL,
                 @custname  nvarchar(40) = NULL,
                 @city      nvarchar(15) = NULL,
                 @region    nvarchar(15) = NULL,
                 @country   nvarchar(15) = NULL,
                 @prodid    int          = NULL,
                 @prodname  nvarchar(40) = NULL AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City,
       c.Region, c.PostalCode, c.Country, c.Phone,
       p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
  AND  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
  AND  (o.OrderDate <= @todate OR @todate IS NULL)
  AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
  AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
  AND  (o.CustomerID = @custid OR @custid IS NULL)
  AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
  AND  (c.City = @city OR @city IS NULL)
  AND  (c.Region = @region OR @region IS NULL)
  AND  (c.Country = @country OR @country IS NULL)
  AND  (od.ProductID = @prodid OR @prodid IS NULL)
  AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
ORDER  BY o.OrderID

The effect of all the @x IS NULL clauses is that if that input parameter is NULL, then that AND-condition is always true. Thus, the only conditions that are in effect are those where the search parameter has a non-NULL value.

For maintainability this is a good choice, as there is no extra complexity. But for performance it is a poor solution. This procedure will never use any selective index; rather, you will get a query plan as if you had not supplied any search parameters at all.

This may surprise readers who are acquainted with parameter sniffing. That is, when SQL Server builds a query plan for a stored procedure, it looks at the values of the input parameters and uses them as guidance for the plan. Thus, for a call like:

EXEC search_orders_3 @orderid = 11000

it would be reasonable to expect a plan that looks up the order id through the primary-key index on Orders. So why does this not happen? Keep in mind that the optimizer does not perform any flow analysis of the code. It knows that @orderid is 11000 on input, but it cannot exclude the possibility that, at the point the query is executed, @orderid has been set to NULL along the way, in which case a plan with a key lookup would yield an incorrect result – something which is completely unacceptable to the optimizer.

There is a second problem with parameter sniffing that the optimizer must take into account. Assume that SQL Server had in fact looked at the input parameters and opted to use the primary key on Orders. Recall that query plans are cached, so when the next user submits a search on product id instead, the cached query plan would be flat wrong.

Forcing Recompilation Each Time

Since there is no good plan that fits all searches, it can be a good idea to force recompilation each time. This can be done in three different ways:

Adding WITH RECOMPILE to the procedure definition.

Adding WITH RECOMPILE when calling the procedure.

On SQL 2005, using the query hint OPTION (RECOMPILE).

I tried this test batch:

exec search_orders_3 @orderid = 11000 WITH RECOMPILE
exec search_orders_3 @custid = 'ALFKI' WITH RECOMPILE
exec search_orders_3 @prodid = 76 WITH RECOMPILE
exec search_orders_3 @prodid = 76, @custid = 'RATTC' WITH RECOMPILE
exec search_orders_3 @fromdate = '19980205', @todate = '19980209' WITH RECOMPILE
exec search_orders_3 @city = N'Bräcke', @prodid = 76 WITH RECOMPILE

I found that on SQL 2000, I got the same plan for all cases but the last when I ran it in Northgale.

On SQL 2005, I got a different plan for each invocation, and the total execution time for the batch was considerably lower than when I ran the batch without WITH RECOMPILE. So forcing recompilation was certainly worthwhile. However, the plans were still not particularly effective, and compared to the solutions with dynamic SQL, the execution time for the entire test batch was about four times longer.

As I discussed above, SQL Server does not perform flow analysis of the procedure, so it cannot assume that the input parameters actually retain their values until execution reaches the SQL statement. Thus, this poor result is not very surprising. But what if we instead use the hint OPTION (RECOMPILE)? This query hint, added in SQL 2005, instructs SQL Server to recompile the plan for the statement each time. (In SQL 2000, stored procedures are compiled as a whole; SQL 2005 is able to recompile on statement level.) That would mean that SQL Server should be able to look at the actual parameter values, take all possible shortcuts, and arrive at a plan which is just as good as what you get with dynamic SQL.

Alas, this does not happen in SQL 2005. I found that when I added OPTION (RECOMPILE) to the query, I got exactly the same plan as when I added WITH RECOMPILE to the EXEC statement. As for why, I guess it was an oversight. This has been addressed in SQL 2008, so that OPTION (RECOMPILE) works as expected. This radically changes the presumptions for this problem, which is why there is a more modern version of this article for SQL 2008. That version does not include many of the tricks covered here, since it can focus on two main alternatives: dynamic SQL, and static SQL with OPTION (RECOMPILE).

This is not to say that OPTION (RECOMPILE) is useless on SQL 2005. If you have a long stored procedure with a multi-criteria query in the middle of it, OPTION (RECOMPILE) can be very useful. Also, if the statement includes local variables, or the input parameters have changed values, SQL Server is able to use the current values when building the plan.

Of course, if you force recompilation each time, there is no cache reuse at all. (In fact, the plan is not even put into cache.) But that is a trade-off you have to make. Cache reuse is good as long as it takes load off the server, but it's not good if it leads to queries running with suboptimal plans.

A Useless Method?

So if this method always yields a table scan, is it worthless and should never be used? Not really. Sometimes you can reasonably assume that the amount of data you are to search will not exceed, say, 10 000 rows. As long as the response times are acceptable for the users, and there are no locking issues, there is no reason to make your programming more complex than necessary. And there are plenty of cases where you need to add an extra optional filter on a non-indexed column to an already efficient query, a case where this method is an excellent choice. I would go as far as to say that this technique should be a staple in your T‑SQL programming. You just need to know when it's not the right thing to use.
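As an example of the latter case, here is a sketch where a hypothetical optional @shipvia parameter rides on a query that is already selective through the indexed CustomerID column:

```sql
-- @shipvia is a hypothetical optional parameter; the OR @shipvia IS NULL
-- part is harmless here, since CustomerID already narrows the search.
SELECT o.OrderID, o.OrderDate
FROM   dbo.Orders o
WHERE  o.CustomerID = @custid
  AND  (o.ShipVia = @shipvia OR @shipvia IS NULL)
```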

In the section x = @x AND @x IS NOT NULL, I will come back to a "trick" where you actually can make this method to use indexes effectively under some circumstances.

Variation on this Theme

Rather than using OR like above, some people write:

o.OrderID = coalesce(@orderid, o.OrderID)

coalesce() is a function that takes a list of values as argument, and returns the first non-NULL value in the list, or NULL if there is no non-NULL value in the list. Thus, if @orderid is NULL, you get o.OrderID = o.OrderID, a complete no-op. You can see a full example of this in search_orders_3a.

This yields even more compact code than using OR, but it has the same problems with regards to performance: SQL Server will not be able to use indexes efficiently. Rather, this sort of condition can make it even more difficult for the optimizer, even with recompilation hints. (I ran the six test searches in Northgale. On SQL 2005, search_orders_3a ran considerably slower than search_orders_3, whereas on SQL 2000, it was considerably faster!)

I would like to put up a warning sign for coalesce, because there is a trap you can fall into. Run this:

EXEC search_orders_3  @orderid = 10654
EXEC search_orders_3a @orderid = 10654

The first call returns three rows, but the second returns no rows at all! Why? Because for this customer, the column Region is NULL. When @region is NULL, the condition

c.Region = coalesce(@region, c.Region)

becomes in essence c.Region = NULL. But in SQL, NULL is not equal to NULL. NULL stands for "unknown value", and whether two unknown values are the same or different is of course unknown.

Thus, in a correct implementation of search_orders_3a, I should have written the condition on Customers.Region differently. Since this is an accident waiting to happen, and the solution with coalesce gives no significant advantage, my verdict is that

x = @x OR @x IS NULL

is preferable to coalesce. But if you want to cram maximum performance out of this solution, you may want to experiment, and even mix. (And to increase your options, you could try isnull() as well!)
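If you do stay with coalesce, one way to repair the condition for a nullable column such as Region is sketched below; the extra branch keeps the rows where both the column and the parameter are NULL:

```sql
-- Sketch of a NULL-safe variant of the coalesce condition:
(c.Region = coalesce(@region, c.Region) OR
 (c.Region IS NULL AND @region IS NULL))
```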

Using IF statements

While using IF for each and every combination is not feasible, you could still use IF statements to single out the most important search cases where search columns are indexed and for which users expect snap response. After all, a condition like c.Country = @country OR @country IS NULL is fairly harmless, because there is no index on Customers.Country anyway.

To demonstrate this technique, I wrote search_orders_4, which I'm including in abbreviated form only. In this procedure, there are three different code paths: one if @orderid has been given, one if @custid is present, and one if neither of them is provided:

IF @orderid IS NOT NULL
BEGIN
   SELECT ...
   WHERE  o.OrderID = @orderid
     AND  od.OrderID = @orderid
     AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
     AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
     AND  (od.ProductID = @prodid OR @prodid IS NULL)
     AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
   ORDER  BY o.OrderID
END
ELSE IF @custid IS NOT NULL
BEGIN
   SELECT ...
   WHERE  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
     AND  (o.OrderDate <= @todate OR @todate IS NULL)
     AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
     AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
     AND  c.CustomerID = @custid
     AND  o.CustomerID = @custid
     AND  (od.ProductID = @prodid OR @prodid IS NULL)
     AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
   ORDER  BY o.OrderID
END
ELSE
BEGIN
   SELECT ...
   WHERE  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
     AND  (o.OrderDate <= @todate OR @todate IS NULL)
     AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
     AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
     AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
     AND  (c.City = @city OR @city IS NULL)
     AND  (c.Region = @region OR @region IS NULL)
     AND  (c.Country = @country OR @country IS NULL)
     AND  (od.ProductID = @prodid OR @prodid IS NULL)
     AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
   ORDER  BY o.OrderID
END

You can see that I have removed some of the conditions that are redundant if an order id or a customer id is given. You can also see that I use the "double feature" of specifying the conditions on @orderid and @custid, in case this helps the optimizer.

In this example, I chose to give preference to @orderid and @custid, as they appeared to be the most important searches. Users certainly expect immediate response when they enter an order id, and often they need to see orders for a certain customer. On the other hand, a search on price is not likely to be common.

Could I have composed paths for all indexed columns? Nah, Northwind appears to be a bit over-indexed. In fact there are only two search columns in our case study that are not indexed: Customers.Country and Order Details.UnitPrice. As you realise, the complexity of the IF statements rises rapidly as you involve more conditions. For instance, if I also want to have separate branches for OrderDate, I would have to say something like:

ELSE IF @custid IS NOT NULL AND (@fromdate IS NOT NULL OR @todate IS NOT NULL)
   ...
ELSE IF (@fromdate IS NOT NULL OR @todate IS NOT NULL)
   ...
ELSE IF @custid IS NOT NULL

That is, I would need two extra branches. Without the condition on the date parameters in the first of these branches, a user who enters a customer with a small number of orders, and also enters a meaningless @todate far in the future, would end up in the wrong branch. With the separate branches, the optimizer can make a good decision for each case (provided that there is a WITH RECOMPILE somewhere).

Using Subprocedures

There is a problem with search_orders_4. If you first call it with an @orderid, and then with a @custid, the cached plan for the branch with the search on CustomerID will be based on the NULL value passed the first time, which may not be the best plan.

You can circumvent this by forcing recompilation, but if you have many IF branches in the procedure, each branch is recompiled every time – when you only need one to be recompiled. And recompiling a lengthy procedure can be costly.

On SQL 2005, you can address this by adding OPTION (RECOMPILE) after each SELECT statement, so that only the SELECT statement for the search at hand gets recompiled.
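As a sketch of what this could look like (the column list here is cut down from the examples above), the hint goes at the end of the statement, so that only this branch is recompiled when it actually runs:

```sql
-- Sketch only: on SQL 2005, OPTION (RECOMPILE) applies to this
-- single statement, not to the whole procedure.
IF @orderid IS NOT NULL
   SELECT o.OrderID, o.OrderDate
   FROM   Orders o
   WHERE  o.OrderID = @orderid
   OPTION (RECOMPILE)
```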

Another solution is to split up the different SELECT statements into subprocedures, a suggestion I originally got from SQL Server MVP Simon Sabin. You can see this in search_orders_4a, which in very abbreviated form runs as:

IF @orderid IS NOT NULL
BEGIN
   EXEC search_orders_4a_sub1 @orderid = @orderid, ...
END
ELSE IF @custid IS NOT NULL
BEGIN
   EXEC search_orders_4a_sub2 @fromdate = @fromdate, ...
                              @custid = @custid, ...
END
ELSE
BEGIN
   EXEC search_orders_4a_sub3 @fromdate = @fromdate, ...

Each subprocedure has its own plan in the cache, and for search_orders_4a_sub1 and sub2 that is a plan that is based on good input values from the first call. The catch-all search_orders_4a_sub3 still has a WITH RECOMPILE, as it serves a mix of conditions.

The apparent downside is that the logic is scattered over several procedures, making maintenance more difficult.

Umachandar's Bag of Tricks

Here are some tricks that I have learnt from Umachandar Jayachandran, a former SQL Server MVP who now works in the SQL Server team at Microsoft.

Using BETWEEN

He suggested that rather than saying

o.OrderID = @orderID OR @orderID IS NULL

you should use:

o.OrderID BETWEEN coalesce(@orderID, @MinInt) AND coalesce(@orderID, @MaxInt)

Here @MinInt and @MaxInt are the smallest and biggest integer values possible. What you gain here is that the optimizer can use the index on OrderID, because if @orderid is NULL, the plan still yields the correct result.

Note that this trick does not work with nullable columns. If you were to use this in a query:

c.Region BETWEEN coalesce(@region, @MinRegion) AND coalesce(@region, @MaxRegion)

then the query:

EXEC search_orders @orderid = 10654

would not return any rows, because the customer on this order does not belong to a region.

How to Get Min and Max Values

How would you get the @MinInt and @MaxInt values? For some data types, you can hard-code the boundaries. This works for integer, binary and datetime columns, and with some care also for float/real, numeric/decimal and money columns.

However, for character data you cannot safely rely on hard values, because they are collation dependent. One way is to retrieve them from the table with SELECT MAX() and SELECT MIN(). But only do this on indexed columns!

If you decide to skip the variables and put the MIN/MAX queries into the main query, there is a thing to watch out for:

o.CustomerID BETWEEN isnull(@custid, (SELECT MIN(CustomerID) FROM Customers))
                 AND isnull(@custid, (SELECT MAX(CustomerID) FROM Customers))

isnull is an older function, similar to coalesce, that accepts only two parameters. Why isnull and not coalesce here? Umachandar discovered that when he used coalesce, he got table scans and bad performance, but it worked well with isnull. Since this is something that can depend on subtle behaviour in the optimizer, I would encourage you to evaluate both, if you use this in your search routines.

Using LIKE

As an alternative, Umachandar suggested that for character data, you could do:

c.City LIKE coalesce(@city, '%')

Again, this is a trick that only works for columns that are not nullable. (Customers.City is in fact nullable, but there are no NULL values for this column in Northwind.) Also be careful with the fixed-length data types char/nchar. I tried this:

c.Customer LIKE coalesce(@custid, '%')

but since @custid is nchar(5), the '%' was padded with four spaces, and no customer id matches that.
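One way around the padding, should you want to use the trick for a fixed-length column anyway, is to make sure coalesce works with a varying-length type. This is my own workaround, not one of the article's test procedures:

```sql
-- Cast the fixed-length nchar(5) to nvarchar(5) first, so that the
-- '%' returned by coalesce is not blank-padded to '%    '.
c.CustomerID LIKE coalesce(cast(@custid AS nvarchar(5)), '%')
```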

Another potential problem is that you now permit users to enter %York which may or may not be a good thing. The index on City is not very good for this search string.

search_orders_5

Using Umachandar's tricks, I wrote the procedure search_orders_5, which reads like this in abbreviated form:

SELECT @minint = convert(int, 0x80000000),
       @maxint = convert(int, 0x7FFFFFFF)

IF @custid IS NULL
   SELECT @mincustid = MIN(CustomerID), @maxcustid = MAX(CustomerID)
   FROM   Customers

SELECT ...
WHERE  o.OrderID  BETWEEN coalesce(@orderid, @minint) AND coalesce(@orderid, @maxint)
  AND  od.OrderID BETWEEN coalesce(@orderid, @minint) AND coalesce(@orderid, @maxint)
  AND  o.OrderDate >= coalesce(@fromdate, '17530101')
  AND  o.OrderDate <= coalesce(@todate, '99991231')
  AND  od.UnitPrice >= coalesce(@minprice, 0)
  AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
  AND  o.CustomerID BETWEEN coalesce(@custid, @mincustid) AND coalesce(@custid, @maxcustid)
  AND  c.CustomerID BETWEEN coalesce(@custid, @mincustid) AND coalesce(@custid, @maxcustid)
  AND  c.CompanyName LIKE coalesce(@custname + '%', '%')
  AND  c.City LIKE coalesce(@city, '%')
  AND  (c.Region = @region OR @region IS NULL)
  AND  (c.Country = @country OR @country IS NULL)
  AND  od.ProductID BETWEEN coalesce(@prodid, @minint) AND coalesce(@prodid, @maxint)
  AND  p.ProductID  BETWEEN coalesce(@prodid, @minint) AND coalesce(@prodid, @maxint)
  AND  p.ProductName LIKE coalesce(@prodname + '%', '%')
ORDER  BY o.OrderID

To get the smallest and biggest possible integer values, I work from the bit patterns, which are easier to remember than the actual numbers. To make a small optimization, I only read the min and max values for the customer id, if @custid is NULL.

As you can see, I do not use BETWEEN or LIKE for all columns. As we have already noted, we cannot use this trick for Customers.Region, since there are NULL values in this column. And for Customers.Country there is no index anyway. (And like City, it permits NULL, even if there are no NULL values in Northwind.)

For the filters on Orders.OrderDate and Order Details.UnitPrice, I don't need to use BETWEEN, since we filter for a range anyway. The values I have used for the second parameter to coalesce are the earliest and latest dates supported by the datetime data type. For the money column I just assumed "reasonable values" for max and min.

To give the optimizer more to work with, I have included tests against both tables for the joining columns OrderID, CustomerID and ProductID.

Did this Perform Well?

The result was very mixed. When I ran this batch in Northgale:

EXEC search_orders_5 @orderid = 11000 WITH RECOMPILE
EXEC search_orders_5 @custid = 'ALFKI' WITH RECOMPILE
EXEC search_orders_5 @prodid = 76 WITH RECOMPILE
EXEC search_orders_5 @prodid = 76, @custid = 'RATTC' WITH RECOMPILE
EXEC search_orders_5 @fromdate = '19980205', @todate = '19980209' WITH RECOMPILE
EXEC search_orders_5 @city = N'Bräcke', @prodid = 76 WITH RECOMPILE

the searches on customer id alone and customer id together with product id had instant response time, a lot better than what I got with search_orders_3. On the other hand, the search for product id alone took over five minutes in Northgale on SQL 2000 and almost 80 seconds on SQL 2005, which is far slower than search_orders_3. Confusingly, when I reran the test six months later, the search on product id took only 20 seconds on SQL 2000, which is better than search_orders_3 on this version.

What's more disturbing is that despite the WITH RECOMPILE, the query plan was always the same for a given combination of server/database. What you want for a search routine of this kind, is to pick different plans depending on input, whence the WITH RECOMPILE.

So are Umachandar's tricks no good then? It would be jumping to conclusions to say so. They could work well in a different context. Maybe I went overboard when I wrote search_orders_5 and used his tricks for almost all conditions. When we come to search_orders_7 we will see a case where BETWEEN does help us to get a better plan.

In any case, you should always benchmark the different methods you are contemplating with your tables and data. And don't forget that the right answer may be to combine methods.

Using Temp Tables

Sometimes you can make your code simpler by using a temp table or two. We shall here look at two possibilities, of which the first is really miserable in terms of performance, while the second yields a search_orders procedure which performs quite well for being static SQL.

Deleting the Misfits

This is a very simplistic method which may be OK for small data sets, but it will never perform well, and you should never consider it as your main solution. It may occasionally be defensible to use this method to implement search conditions that would otherwise be difficult to handle, for instance conditions that require access to tables that would otherwise not be involved.

The idea is that you make a basic selection into a temp table, and then you delete the rows that do not match the user's search conditions from the temp table. Here is an extract of search_orders_6 where I use this technique:

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
INTO   #temp
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID

IF @orderid IS NOT NULL
   DELETE #temp WHERE OrderID <> @orderid

IF @fromdate IS NOT NULL
   DELETE #temp WHERE OrderDate < @fromdate
...
IF @region IS NOT NULL
   DELETE #temp WHERE Region <> @region OR Region IS NULL
...
SELECT * FROM #temp ORDER BY OrderID

The code is so simple to write that you could do it in your sleep. The only thing to watch out for is nullable columns like Region.

If you decide to use this method, it is essential that you make sure that the basic selection that you put in the temp table is moderate in size. search_orders_6 is OK in Northwind, but absolutely not in Northgale, where the initial SELECT will generate 2.3 million rows in the temp table.

There is a second performance issue to consider that particularly applies to SQL 7 and SQL 2000: recompilations. SQL Server may decide to recompile a procedure if the statistics of a table involved in a query have changed. This is particularly likely to happen with temp tables. Sometimes this is a good thing, but for a procedure like search_orders_6 it is likely to be a waste of time, since the temp table does not have any indexes. But since the procedure is long and has many queries, the recompilation itself could degrade performance. Since SQL 2005 has statement recompilation, the cost for recompilation is smaller there.

One way to avoid these recompilations on SQL 2000 and SQL 2005 is to use table variables instead. Table variables do not have statistics, and therefore cannot cause recompilation. Then again, if you have a DELETE that involves another table, the statistics on the temp table can be useful.
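A minimal sketch of the same delete-the-misfits idea with a table variable instead (the column list and the set of DELETE statements are cut down here for brevity):

```sql
-- Table variables have no statistics, so they cannot trigger recompilation.
DECLARE @fromdate datetime
DECLARE @temp TABLE (OrderID   int      NOT NULL,
                     OrderDate datetime NOT NULL)

INSERT @temp (OrderID, OrderDate)
   SELECT OrderID, OrderDate FROM Orders

IF @fromdate IS NOT NULL
   DELETE @temp WHERE OrderDate < @fromdate

SELECT * FROM @temp ORDER BY OrderID
```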

In this example I gathered all the output columns in the temp table. But I could have let it suffice with OrderID, OrderDate, UnitPrice, CustomerID and ProductID, that is, the search columns from the big tables Orders and Order Details. Then for deletions on city or product name, I would have joined with Customers or Products. The advantage of this is that I would have kept the table down in size. It would still perform unacceptably slowly in Northgale, though.

Filtering from Bottom and Up

This example uses a mix of strategies. It is more a demonstration that there is no clear-cut solution to implementing these searches in static code. Rather, you have to use your imagination, and even more importantly, your knowledge about the data at hand.

The procedure in the spotlight this time is search_orders_7. This is how the procedure starts:

IF @orderid IS NULL
BEGIN
   INSERT #custids (custid)
      SELECT CustomerID
      FROM   Customers c
      WHERE  (c.CustomerID = @custid OR @custid IS NULL)
        AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
        AND  (c.City = @city OR @city IS NULL)
        AND  (c.Region = @region OR @region IS NULL)
        AND  (c.Country = @country OR @country IS NULL)
END

INSERT #products (ProductID, ProductName, UnitsInStock, UnitsOnOrder)
   SELECT ProductID, ProductName, UnitsInStock, UnitsOnOrder
   FROM   Products
   WHERE  (ProductID = @prodid OR @prodid IS NULL)
     AND  ProductName LIKE coalesce(@prodname + '%', '%')

That is, I fill two temp tables: one for customers and one for products. To show you different strategies, for customers I save only the customer id in that temp table, but for products I include the lot. By only including CustomerID, I keep the temp table smaller, but I need to join against Customers for the final query too, whereas for Products, the temp table has all I need.

To simplify, I make no attempt to use indexes here. It depends on your business, but tables for customers and products are typically considerably smaller than order tables.

As you see, I don't search Customers if the user has provided an order id, since in this case the customer is given by the order. (Note that the same does not apply to products!)

The next statement in this procedure reads:

EXEC search_orders_7_sub @orderid, @fromdate, @todate, @minprice, @maxprice

This procedure (available in the same file as search_orders_7) features the WITH RECOMPILE option and looks like this in abbreviated form:

IF @orderid IS NULL
BEGIN
   SELECT ...
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   #custids t ON o.CustomerID = t.custid
   JOIN   Customers c ON o.CustomerID = c.CustomerID
                     AND t.custid = c.CustomerID
   JOIN   #products p ON p.ProductID = od.ProductID
   WHERE  o.OrderDate >= coalesce(@fromdate, '17530101')
     AND  o.OrderDate <= coalesce(@todate, '99991231')
     AND  od.UnitPrice >= coalesce(@minprice, 0)
     AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
   ORDER  BY o.OrderID
END
ELSE
BEGIN
   SELECT ...
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   #products p ON p.ProductID = od.ProductID
   WHERE  o.OrderID = @orderid
     AND  od.OrderID = @orderid
     AND  od.UnitPrice >= coalesce(@minprice, 0)
     AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
END

As in search_orders_4, I use different code paths for when an order id is given and when it is not. You can see that only when I do not have an order id do I include both #custids and Customers in the join. On the other hand, I always use #products, but never Products, as I have all the data I need in the temp table.

Why are there two procedures? I wanted to make sure that there is a recompilation for the final query, since the best plan will depend on the contents of the temp tables, which is why I put the queries against the temp tables in a subprocedure marked WITH RECOMPILE. (And as with search_orders_4a, I could have had one procedure for each of the two final SELECT statements, to minimise how much is recompiled each time.)

Of course, on SQL 2005 I could keep all in one procedure and use OPTION (RECOMPILE) instead to achieve the same effect.

When I ran my standard test batch in Northgale:

EXEC search_orders_7 @orderid = 11000
EXEC search_orders_7 @custid = 'ALFKI'
EXEC search_orders_7 @prodid = 76
EXEC search_orders_7 @prodid = 76, @custid = 'RATTC'
EXEC search_orders_7 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_7 @city = N'Bräcke', @prodid = 76

Compared to search_orders_3, the result was better in all cases on SQL 2000, and in most cases significantly better. On SQL 2005, the searches on order id and date interval were up to 50-70 % slower, whereas the search on city and product was about three times faster. (Recall here that running search_orders_3 WITH RECOMPILE gave much more effect on SQL 2005 than on SQL 2000.) On both versions, almost every invocation had a different query plan.

An interesting observation is the condition on UnitPrice. As you can see, I use the max/min method from Umachandar's Bag of Tricks. When I had the condition as:

AND (od.UnitPrice >= @minprice OR @minprice IS NULL)
AND (od.UnitPrice <= @maxprice OR @maxprice IS NULL)

The query plan for the fifth search (@fromdate = '19980205', @todate = '19980209') was different on SQL 2000 and much slower; it took 65 seconds to run the procedure with these parameters. There is no apparent reason why it should be like that. It only goes to show that you have to try different methods and turn every stone in your endeavour to get the best performance, when you use static SQL.

Note: the "good" plan I got for the search on the date interval was a very complex plan which included parallelism. My workstation has a single hyper-threaded CPU, and thus parallelism is possible. I guess that you can see different plans on machines with fewer or more CPUs.

x = @x AND @x IS NOT NULL

Alternate Key Lookup

Not all search routines are as complex as our search_orders. Sometimes the dynamic search is confined to alternate keys, of which the user always supplies one. In this case, it is not too painful to use IF statements to handle the different cases. But maybe the column list in the SELECT statement contains complex expressions that you don't want to repeat. You know already that this won't do:

WHERE (key1 = @key1 OR @key1 IS NULL)
  AND (key2 = @key2 OR @key2 IS NULL)
  AND (key3 = @key3 OR @key3 IS NULL)

As you have seen this will yield a table scan. But what do you think about this:

WHERE (key1 = @key1 AND @key1 IS NOT NULL)
   OR (key2 = @key2 AND @key2 IS NOT NULL)
   OR (key3 = @key3 AND @key3 IS NOT NULL)

The logic here is the reverse: give me all rows that match any of the given conditions. From a performance point of view, this may look like the same thing, but this can in fact perform very well. The odds are very good that SQL Server will generate a plan which seeks the three key indexes and then merges the result, either by index concatenation or some other method. Here is the real big scoop: thanks to the condition @x IS NOT NULL, SQL Server adds a filter with a startup expression to the plan, so that if the corresponding variable is NULL, SQL Server will not access that index at all.

Searching Orders

This method is directly applicable when users want to get back rows that match any of the search conditions. Alas, most search functions are like our search_orders: users want to get back rows that match all conditions. Nevertheless, to improve performance for our dynamic search, we can use the above as a starting point for such a search, under some circumstances:

The user must supply at least one search condition; the alternative "return all" is not available.

All search columns are in the same table.

All search columns are indexed.

This leads to search_orders_10, which due to these restrictions has a different interface from the other search procedures:

CREATE PROCEDURE search_orders_10
                 @orderid  int      = NULL,
                 @custid   nchar(5) = NULL,
                 @fromdate datetime = NULL,
                 @todate   datetime = NULL,
                 @empid    smallint = NULL,
                 @shipdate datetime = NULL AS

IF @fromdate IS NOT NULL AND @todate IS NULL
   SELECT @todate = MAX(OrderDate) FROM Orders
ELSE IF @fromdate IS NULL AND @todate IS NOT NULL
   SELECT @fromdate = MIN(OrderDate) FROM Orders

SELECT o.OrderID, o.OrderDate, o.EmployeeID, o.ShippedDate,
       od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone,
       p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   (SELECT TOP 99.9999999 PERCENT
               o.OrderID, o.OrderDate, o.EmployeeID,
               o.ShippedDate, o.CustomerID
        FROM   Orders o
        WHERE  (o.OrderID = @orderid AND @orderid IS NOT NULL)
           OR  (o.CustomerID = @custid AND @custid IS NOT NULL)
           OR  (o.EmployeeID = @empid AND @empid IS NOT NULL)
           OR  (o.ShippedDate = @shipdate AND @shipdate IS NOT NULL)
           OR  (o.OrderDate BETWEEN @fromdate AND @todate AND
                @fromdate IS NOT NULL AND @todate IS NOT NULL)) AS o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
  AND  (o.CustomerID = @custid OR @custid IS NULL)
  AND  (o.EmployeeID = @empid OR @empid IS NULL)
  AND  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
  AND  (o.OrderDate <= @todate OR @todate IS NULL)
ORDER  BY o.OrderID, od.ProductID

The way search_orders_10 works is that in the derived table we run a query where we build the union of the search conditions, and then we filter the output from the derived table in the regular way. The point with this complex arrangement is that we know that SQL Server is able to run the query for the derived table efficiently with help of indexes. As for @fromdate and @todate, I will come back to that.

In case you've never seen a derived table before: a derived table is a virtual temp table within the query, but it is not necessarily materialised and SQL Server may recast the computation order as long the result is the same.
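If the concept is new to you, here is a small standalone example (not taken from the search procedures) where the derived table d computes per-customer order counts that the outer query then filters:

```sql
SELECT d.CustomerID, d.ordercount
FROM   (SELECT CustomerID, COUNT(*) AS ordercount   -- the derived table
        FROM   Orders
        GROUP  BY CustomerID) AS d
WHERE  d.ordercount > 10
```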

However, in this particular case, we really want SQL Server to compute the derived table, and that is the reason for the TOP 99.9999999 PERCENT. To wit, when I ran the procedure in Northgale on SQL 2000 without the TOP clause, the optimizer was able to see through my intentions and generated the plan as if the derived table wasn't there, with poor performance as the result. Interestingly enough, SQL 2005 was less smart and computed the derived table even without the TOP clause.

If you run for instance:

exec search_orders_10 @custid = 'ALFKI'

and look at the query plan, you will see that all indexes appear in the plan, but if you let the mouse hover over the Filter operators, you will see that they include a startup expression, meaning that if the filter condition is not satisfied, that part of the plan is not executed for this query.

When I compared the execution times for search_orders_10 to the other methods, I got execution times that were on par with those for dynamic SQL – for the three test cases that search_orders_10 supports, that is.

Since the same plan is good for all search conditions, there should not be any need for WITH RECOMPILE – or at least so one likes to think. But when I ran:

exec search_orders_10 @custid = 'ALFKI', @shipdate = '19970825', @empid = 501

and added WITH RECOMPILE to force a new plan, I got a different plan which involved parallelism. It is not that surprising – the more conditions the user adds, the more rows that derived table will hit, and at some point a table scan will seem more efficient to the optimizer. This can be problematic, if the first user that performs a search specifies this many parameters. His plan will be left in the cache for everybody else. I don't really have any good standard suggestion to handle this case. But you need to consider the possibility that the first user enters many search conditions, and then use your own good judgement to determine whether to use WITH RECOMPILE, take a gamble that it does not happen often or implement some other creative solution.

Why the Restrictions?

But why all these restrictions? The restriction that the user must specify at least one search condition follows from the logic of the method. Without any search conditions at all, the derived table will not return any rows.

The other two restrictions are based on performance. If not all search columns are indexed, SQL Server will not use any startup expression and merge different indexes. Since at least one of the conditions will require a scan, the optimizer will pick a scan for everything.

If the search conditions are in different tables, the optimizer does not try to merge the result from different seeks in different tables with startup expressions, but you get a one-size-fits-all plan. The file for search_orders_10 includes search_orders_10a and search_orders_10b that add a @prodid parameter and a @custname respectively. None of them has the performance of search_orders_10. (And without the performance, it is difficult to see the point with the increased complexity of this method.)

Note: this is not something I "know" – it's only based on my own observations when testing with Northgale. Maybe you have better luck than I have. But don't stay up all night trying to get the optimizer to use startup expressions for search conditions in different tables!

Searching on Ranges

As you can see in the code above, I have special handling of @fromdate and @todate. If only one is given, I set the other to a boundary value to make the range closed. The reason for this is that a condition like:

@orderdate >= @fromdate AND @fromdate IS NOT NULL

would hit too many rows if @fromdate is not NULL, so SQL Server would probably try to scan the table. I say "probably", because I did in fact not even have the guts to try it...

Handling Complex Conditions

In the example I'm using in this article, we always need to access all tables. But in many situations, there might be tables you only need to access if a certain input parameter is given. Or depending on some input parameter, you might need to access different tables.

With dynamic SQL, these conditions are fairly straightforward to handle. When you build the query, you include only the tables needed to satisfy the current values of the input parameters.

Some of the methods for static SQL can easily be extended to deal with this case. I'm mainly thinking of using IF statements and using temp tables. But there might also be additional tricks, depending on the situation. Here I will only cover one case, for which Phillipp Sumi pointed out a good solution.

Selection Based on Existence in Another Table

Let's add one more parameter to our procedure: @suppl_country. If this parameter is provided, the procedure should only return information about products of which the supplier comes from the given country. You could implement this by joining to the Suppliers table, but Phillipp Sumi's suggestion was that you should use an EXISTS clause in this way:

@suppl_country IS NULL OR
EXISTS (SELECT *
        FROM   Suppliers s
        WHERE  s.SupplierID = p.SupplierID
          AND  s.Country = @suppl_country)

The point is that if @suppl_country is NULL, SQL Server will not access the Suppliers table at all. To illustrate this I wrote search_orders_9, which for the sake of simplicity is an extension of search_orders_3. To see what is going on, run this in Northwind:

SET STATISTICS IO ON
go
EXEC search_orders_9 @country = 'Sweden'
EXEC search_orders_9 @suppl_country = 'Sweden'

If you look at the query plans, you will find that these are identical. But if you look at the output from SET STATISTICS IO ON, you will find something interesting:

Table 'Suppliers'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0. Table 'Products'. ... ...

Table 'Suppliers'. Scan count 2075, logical reads 4150, physical reads 0, read-ahead reads 0. Table 'Products'. ... ...

(This output is for SQL 2000. It looks somewhat different, and somewhat confusing, on SQL 2005.) Note the first line for the first call: SQL Server is not touching the Suppliers table at all. This is because the operator the optimizer uses, the Nested Loops/Left Semi Join, permits SQL Server to first evaluate the variable, and thus find that there is no need to access the table.

This might remind you of shortcutting in a language like C++, but T‑SQL does not have any shortcut mechanism. In fact you could change the order of the conditions in the OR expression, and SQL Server would still not access Suppliers.

The standard word of caution applies here as well: you need to test that you actually get the query plan you expect. SQL Server may opt to handle the EXISTS clause in some different way and always access the table.

Hybrid Solutions – Using both Static and Dynamic SQL

In this section we will look at solutions that use both static and dynamic SQL. The idea is that you put all references to the tables inside a view or a table-valued function, and then use dynamic SQL to access the view or function. In this case users do not need SELECT permissions on the tables, only on the view or the function.

Of these two, using an inline-function is probably the most useful, but I will start with views, because... I'll tell you later.

Using Views

This is based on a suggestion I got from .Net MVP Eric Garza. The idea is that you define a view that exposes the data that the users are permitted to see, and then grant them SELECT access on the view instead.

From this, I defined the view search_order_view:

CREATE VIEW search_order_view AS
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
          c.PostalCode, c.Country, c.Phone, p.ProductID,
          p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID
go

Then I wrote search_orders_8, which is very similar to search_orders_1, but where I changed lines 18-28 to read:

SELECT @sql = 'SELECT * FROM dbo.search_order_view WHERE 1 = 1 '

This does not change performance, because SQL Server does not save any query plan for a view. Rather, SQL Server expands the view into the query as if it were a macro, so basically you get the original query from search_orders_1.

Rather than defining a view, I could have defined an inline table function. Just like views, inline table functions are macros that are expanded into the query plan. You could think of them as parameterised views.
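As a hypothetical sketch (the function name here is my own invention, not part of the example scripts), a parameterised counterpart to a view could look like this:

```sql
-- An inline table function: expanded into the calling query like a view,
-- but taking a parameter.
CREATE FUNCTION OrdersForCustomer (@custid nchar(5))
RETURNS TABLE AS
RETURN (SELECT OrderID, OrderDate
        FROM   Orders
        WHERE  CustomerID = @custid)
```

You would then query it as SELECT * FROM OrdersForCustomer(N'ALFKI').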

In our example scenario, search_order_view is kind of pointless, because even if you deprive the users of SELECT access to Orders, Order Details, Customers and Products, the users would still have access to all data. However, consider the case that you cannot give SELECT access to the users, because users are only allowed to see data for certain customers, and that this is defined in a table called UserAccessCustomers:

CREATE TABLE UserAccessCustomers
       (UserName   sysname  NOT NULL,
        CustomerID nchar(5) NOT NULL,
        CONSTRAINT pk_uca PRIMARY KEY (UserName, CustomerID))

UserName here is simply the same as the user name in sysusers in the database. In such a case you can write the view as:

CREATE VIEW search_order_view AS
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
          c.PostalCode, c.Country, c.Phone, p.ProductID,
          p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   UserAccessCustomers u ON u.CustomerID = c.CustomerID
                               AND u.CustomerID = o.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID
   WHERE  u.UserName = USER

USER on the last line is a built-in T‑SQL function that returns the name of the current user in the current database.
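For completeness, here is a sketch of the permissions setup this scheme implies. The role name search_users and the sample row are assumptions of mine; the CustomerID value is a customer from the Northwind demo database:

```sql
-- Users may query the view, but get no permissions on the base tables,
-- so the view's join against UserAccessCustomers cannot be bypassed.
GRANT SELECT ON search_order_view TO search_users

-- Map a database user to a customer he is entitled to see:
INSERT UserAccessCustomers (UserName, CustomerID)
   VALUES (N'some_user', N'ALFKI')
```

A user not listed in UserAccessCustomers simply gets an empty result set from the view, since the WHERE u.UserName = USER condition filters out all rows.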

Note: the above is how you normally implement row-level security in SQL Server. SQL Server MVP Steve Kass has found that it still is possible for a user to find information about data he is not supposed to see. As I mentioned above, a view is just a kind of macro, and the query optimizer is free to evaluate the tables in the expanded query in any order, something a non-privileged user can use to his advantage. It is not the case that he can access the data directly, but he can draw conclusions by looking at query plans, error messages etc. The attack requires a bit of skill, but if it is critical that users do not get the smallest glimpse of data they have no rights to see, you should not grant them SELECT permissions to views or inline functions like the one above. (A multi-step function has a fixed query plan of its own, so it should be safe from this attack. But then SQL Server would also have to materialise, if only in a table variable, the complete set of data that the user is entitled to see, which is likely to affect performance. And it would be pointless to combine a multi-step function with dynamic SQL.)

Eric Garza arrived at this solution from another angle: he wanted to deny users access to big tables so they could not submit queries that would scan the tables and block other users. Instead he provided a number of user-defined functions, where the users are required to specify certain input parameters, so that whatever queries they submitted, he could be sure that SQL Server would use indexes.
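A sketch of what such a function might look like (my own example, not Garza's actual code): an inline function that forces the caller to name a customer, so the query can seek on the index on Orders.CustomerID rather than scan the table:

```sql
CREATE FUNCTION orders_for_customer (@custid nchar(5))
RETURNS TABLE AS
RETURN(
   SELECT o.OrderID, o.OrderDate, o.ShippedDate
   FROM   dbo.Orders o
   WHERE  o.CustomerID = @custid)
go
-- Callers have no SELECT permission on Orders itself;
-- they must supply a customer id to get any data at all:
SELECT * FROM orders_for_customer(N'VINET')
```

Since @custid is a required argument with no default, there is no way for a user to formulate a query through the function that touches more than one customer's orders.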

Using Inline Table Functions

We now arrive at the solution that is the most challenging for me as an author, because while this method provides a solution to using dynamic SQL without direct SELECT permissions on the tables, it also requires you to have a full understanding of how it works and why it works. Otherwise you could end up implementing it in a way that is no more effective than purely static solutions. And to get the full power from this solution, you will need to break the rules.

This solution works on SQL 2000 and SQL 2005, but not SQL 7 which does not have user-defined functions.

This method was discovered by SQL Server MVP Marcello Poletti. He found that if he put a query like the one in search_orders_3 in an inline table function, and then ran various SELECT statements against it, he would always get the optimal plan, as if he had used dynamic SQL. To put this into our order-search perspective, let's look at this function:

CREATE FUNCTION search_orders_11fun (@orderid   int          = NULL,
                                     @fromdate  datetime     = NULL,
                                     @todate    datetime     = NULL,
                                     @minprice  money        = NULL,
                                     @maxprice  money        = NULL,
                                     @custid    nchar(5)     = NULL,
                                     @custname  nvarchar(40) = NULL,
                                     @city      nvarchar(15) = NULL,
                                     @region    nvarchar(15) = NULL,
                                     @country   nvarchar(15) = NULL,
                                     @prodid    int          = NULL,
                                     @prodname  nvarchar(40) = NULL)
RETURNS TABLE AS
RETURN(
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City,
          c.Region, c.PostalCode, c.Country, c.Phone,
          p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID