The Curse and Blessings of Dynamic SQL

An SQL text by Erland Sommarskog, SQL Server MVP. Last revision: 2020-08-21.

Copyright applies to this text. See here for font conventions used in this article.

Introduction

This is a text about dynamic SQL. Used in the right place, dynamic SQL is a tremendous asset, but dynamic SQL is also one of the most abused features in SQL Server. I frequently see posts in SQL forums that use dynamic SQL or ask for it, when there is no need to, or when the desire to use dynamic SQL is due to an earlier mistake in the design. Quite often these posts are from inexperienced SQL users. It is important to understand that dynamic SQL is an advanced feature, and preferably you should not start using dynamic SQL until you have mastered writing static queries.

How much you should use dynamic SQL depends on your role. If you are an application developer, you should be restrictive with your use of dynamic SQL. Dynamic SQL certainly has its place in application code, but, as we shall see in this text, it also introduces complexity, so there is all reason to hold back. On the other hand, if you are a DBA, dynamic SQL is your best friend, because there are many DBA operations that can be automated with the help of dynamic SQL. (And at the same time as you use dynamic SQL all day long, you should attempt to prevent the devs from sticking dynamic SQL into the application code!)

Some readers may ask: what is dynamic SQL? Dynamic SQL is when you write SQL code into a string variable and then execute the contents of that variable. This can be done in client code or in a stored procedure. With a strict definition, static SQL can only occur in stored procedures and SQL scripts, since SQL in client code is always embedded into string literals in the client-side language. However, you could argue that a client that only has fixed SQL strings is using static SQL, and one of the aims of the article is to demonstrate how client code should be written to achieve this.

The first chapter after this introduction gives the basic commands to run dynamic SQL in a proper way and discusses traps you could encounter. This chapter also demonstrates how you should submit SQL queries from client code in a proper way. The next chapter is a very important one: it discusses SQL injection, a threat you must always protect yourself against when you work with dynamic SQL, no matter whether you are using dynamic SQL in stored procedures or writing client code. This is followed by a short chapter on the performance aspects of dynamic SQL, which highlights a few rules you should follow. This chapter, too, is largely applicable both to stored procedures and client code.

From this point, however, the article leaves the client code aside and discusses only dynamic SQL in stored procedures and SQL scripts. The longest chapter in the article focuses on one of the big problems with dynamic SQL: if you don't have the discipline, code that generates dynamic SQL can easily become unreadable and difficult to maintain. This chapter is a style guide with tips to help you to write code that generates dynamic SQL in a better way. The penultimate chapter of this article discusses some good use cases for dynamic SQL. This chapter includes a section on how to run a dynamic pivot – something I see questions about daily in SQL forums. The last chapter covers situations where I often see dynamic SQL being abused, and which often are due to a bad design choice earlier in the process.

This article is intended for a broad range of SQL workers. Particularly the first chapters are written with the inexperienced SQL developer in mind, and seasoned DBAs and lead programmers may find the material overly basic. However, you may still be interested to read this material to get ammunition to keep your devs in check so that they don't abuse dynamic SQL. Experienced programmers may discover things in the style-guide chapter that you were not aware of or had not thought of. Conversely, the less experienced programmers may find the latter part of this chapter a little difficult to digest, and you may prefer to drop out and move on to the section on, say, dynamic pivot, if this is the problem you have at hand.

Applicable SQL Versions and Demo Database

This article applies to all versions of SQL Server from SQL 2005 and up. There are no major differences between different SQL versions with regards to dynamic SQL since this release. There are a few smaller enhancements that have been added along the way, and I will call them out as we arrive at them.

However, this does not mean that all examples in the article run on SQL 2005 as-is; I have taken the liberty to embrace newer syntax where this helps to make the code shorter or more concise. These are small things, and you can easily modify the code if you want to run the examples on an older version of SQL Server. Some of these newer features appear only in a place or two, and for these I point out the differences as they appear. There are also a few features that I use over and over again, which I list below and generally don't mention further.

CREATE OR ALTER. Throughout the article, I use CREATE OR ALTER PROCEDURE. This syntax was introduced in SQL 2016 SP1. On earlier versions you will have to use CREATE the first time and then change to ALTER if the same procedure is modified.

IIF. IIF(condition, X, Y) is short for CASE WHEN condition THEN X ELSE Y END. IIF was introduced in SQL 2012, so with older versions you will need to replace IIF with the corresponding CASE expression.

INSERT VALUES for multiple rows. This syntax was introduced in SQL 2008, so on SQL 2005 you will need to rewrite this with multiple INSERT VALUES statements.

DECLARE with initialisation. This was added in SQL 2008, and with SQL 2005 you will need to split this into DECLARE + SET.
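To make the last two items concrete, here is a small sketch; the variable and the values are made up just for illustration. With the newer syntax:

```sql
DECLARE @a int = 99
SELECT IIF(@a > 0, N'positive', N'not positive')
```

On SQL 2005, the same logic would read:

```sql
DECLARE @a int
SET @a = 99
SELECT CASE WHEN @a > 0 THEN N'positive' ELSE N'not positive' END
```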

The demo database for this article is NorthDynamic, which you can create by running the script NorthDynamic.sql. The database is very small, less than 10 MB. The script runs on all versions of SQL Server from SQL 2005 and up. Not all examples build on this database; in those cases the article has CREATE TABLE statements where needed. I recommend that you work in tempdb for these examples.

NorthDynamic is a slightly modified version of Microsoft's old demo database Northwind, based on the fictive company Northwind Traders which flourished between 1996 and 1998; this explains why all orders are from this time frame. (If you are familiar with Northwind and wonder what the differences are: I have replaced deprecated data types so that the database can be installed with any collation. I have also made some modifications to schema and data for the benefit of some examples that I wanted to include. I gave the database a new name in case I want another variation of Northwind for another article.)

How to Run Dynamic SQL

In this chapter, we will look at how to run dynamic SQL in a civilised way. I like to point out that there is nothing in the examples in this chapter that calls for dynamic SQL, and from that point of view they are bad examples, because you should not use dynamic SQL when there is no reason to. However, I opted to do it this way, because I wanted to focus on the essentials and keep the examples simple. Actual use cases will come in the later chapters.

sp_executesql

sp_executesql is the system procedure that you invoke to run dynamic SQL. A way to describe this procedure is that it creates a nameless stored procedure which is saved to the plan cache (but not to disk) and then runs the procedure directly. On subsequent calls to this nameless procedure, the cached plan is reused. At this point however, we will not bother too much about the ramifications on the plan cache, but we will save that to the performance chapter. For the moment, just think of it as "creates a procedure and runs it directly".

And this is a very important thing: a batch of dynamic SQL is very similar to a stored procedure. Just about everything you can do in a stored procedure, you can do in a batch of dynamic SQL. (There is one important difference with regards to permissions that I will cover in the next section, and a few more very marginal items that I will cover in this text.) And maybe the most important of all: you can use parameters. This is essential, and something you should make good use of. The converse also applies: there is no syntax that is valid inside dynamic SQL which is not valid in a stored procedure. All that is special is that the dynamic SQL is stored in a string and then executed.

sp_executesql has two fixed parameters: @stmt and @params. @stmt is the text of the SQL code to execute. The fact that the name is in singular is a little misleading: the SQL code does not have to be a single statement, but can be a batch of any length with any number of statements. As you might guess, @stmt is a mandatory parameter. @params holds the parameter list for the code in @stmt, and its format is exactly like the format of the parameter list to a stored procedure. @params is not mandatory, but just like most of your stored procedures take parameters, most of your batches of dynamic SQL do. Indeed, every time you see a call to sp_executesql without parameters when reviewing code, there is all reason to check that the programmer is not making the mistake of concatenating the parameter values into the string, which is a very bad thing to do. After @params follow the actual values to pass to the parameter list in @params, in the same manner as you pass parameters to a regular stored procedure.

Here is an example script:

DECLARE @sql nvarchar(MAX),
        @params nvarchar(4000)

SELECT @sql =
   N'SELECT @cnt = COUNT(*)
     FROM   dbo.Orders
     WHERE  OrderDate >= @date
       AND  OrderDate < dateadd(MONTH, 1, @date)

     SELECT CompanyName FROM dbo.Customers WHERE CustomerID = @custid'

SELECT @params = N'@date date, @cnt int OUTPUT, @custid nchar(5) = N''ALFKI'''

DECLARE @cnt int,
        @when date = '19980201'

EXEC sp_executesql @sql, @params, @when, @cnt OUTPUT
SELECT @cnt AS [Orders in Feb 1998]

EXEC sp_executesql @sql, @params, @date = '19970701', @cnt = @cnt OUTPUT,
                   @custid = 'VINET'
SELECT @cnt AS [Orders in July 1997]

The script starts with declaring two variables which I later pass to sp_executesql. @sql is the variable I will pass to the @stmt parameter; I use the name @sql as this seems more natural to me. Observe that the datatype of the variable is nvarchar(MAX). You should always use this data type for your batches of dynamic SQL, without any exception. You must use nvarchar, and cannot use varchar, for the simple reason that sp_executesql only accepts nvarchar and barfs if you attempt to use varchar. MAX, on the other hand, is not a requirement for sp_executesql, but it serves to protect your sanity. That is, if you use a limited type like nvarchar(1000), there is the risk that the batch you compose exceeds the length you specify. This leads to silent truncation and errors that can drive you crazy as you are trying to understand what is going on.
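Here is a contrived sketch of the truncation trap; the length of 60 is deliberately chosen far too small, so that the problem shows at once:

```sql
DECLARE @sql nvarchar(60)
SELECT @sql = N'SELECT CompanyName FROM dbo.Customers ' +
              N'WHERE CustomerID = N''ALFKI'' AND Country = N''Germany'''
PRINT @sql                -- The batch has been silently cut off after 60 characters.
EXEC sp_executesql @sql   -- Fails, since the truncated batch is no longer valid SQL.
```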

For @params, I use nvarchar(4000). As with @stmt, the data type must be nvarchar; varchar is not accepted. As for the length of 4000, that is mainly out of habit. You would have to have very many parameters to fill up that limit, but again, there is little reason to take the risk of guessing at, say, 100, and then get errors because it was too low.

Next, I set up my SQL batch by assigning it to @sql, and since it is a static piece of code, there is little to remark on it. More interesting is the assignment to @params where I set up the parameter list. There are three parameters. The first is a "plain" parameter, whereas the second is an OUTPUT parameter and the third has a default value. Default values are not commonly used with dynamic SQL, but I included one here to emphasise that the parameter list is exactly like the parameter list to a stored procedure. On the other hand, using OUTPUT parameters with dynamic SQL is very common, since you often want to get scalar data back from your batch of dynamic SQL.

Before I invoke my dynamic SQL, I declare two local variables. Let's look at the call to sp_executesql again:

EXEC sp_executesql @sql, @params, @when, @cnt OUTPUT

The first parameter is @sql, that is, my SQL text, and the second is @params, that is, my parameter list. Next comes @when, my local variable that I initialised to 1998-02-01. This value is passed to the @date parameter in the SQL batch. Last comes @cnt which is to receive the value of the parameter @cnt in the dynamic SQL. Although they have the same name, they are really different entities; more about that later. Just as when I call a stored procedure, I need to specify OUTPUT also for the actual parameter. I don't pass a value for the parameter @custid, so the default for this parameter applies, and the second query returns the name Alfreds Futterkiste which is the full name for the customer with the ID ALFKI. (As for why customer IDs in NorthDynamic are codes rather than numeric IDs, I have no idea of what went on at Northwind Traders in the nineties, but it is kind of cute.)

There is a second call to sp_executesql for the same SQL batch and parameters:

EXEC sp_executesql @sql, @params, @date = '19970701', @cnt = @cnt OUTPUT, @custid = 'VINET'

As you can see, in this call I use named parameters rather than positional parameters, and for the input parameters @date and @custid I use literals for the actual parameters. This is entirely in alignment with calls to regular stored procedures. Note particularly the value passed to the @custid parameter. This is a varchar literal (there is no N before it), while @custid is nchar(5). That is, when we come to this part of the parameter list, we are free to mix (var)char and n(var)char. The restriction that we must pass nvarchar applies only to the fixed parameters @stmt and @params.

To drive home the point that this is just like defining a stored procedure and running it at the same time, here is the same code logically, but with a "real" stored procedure:

CREATE OR ALTER PROCEDURE my_sp @date   date,
                                @cnt    int OUTPUT,
                                @custid nchar(5) = N'ALFKI' AS
   SELECT @cnt = COUNT(*)
   FROM   dbo.Orders
   WHERE  OrderDate >= @date
     AND  OrderDate < dateadd(MONTH, 1, @date)

   SELECT CompanyName FROM dbo.Customers WHERE CustomerID = @custid
go
DECLARE @cnt int,
        @when date = '19980201'

EXEC my_sp @when, @cnt OUTPUT
SELECT @cnt AS [Orders in Feb 1998]

EXEC my_sp @date = '19970701', @cnt = @cnt OUTPUT, @custid = 'VINET'
SELECT @cnt AS [Orders in July 1997]

Here is one more example to highlight one detail:

DECLARE @sql nvarchar(MAX) =
   'SELECT CompanyName FROM dbo.Customers WHERE CustomerID = @custid'
EXEC sp_executesql @sql, N'@custid nchar(5)', 'VINET'

As you see, in this example I don't have any @params variable, but I define the parameter list directly in the call to sp_executesql. Observe the N preceding this parameter value; this is required to make it an nvarchar literal. I tend to use this pattern if I have a short parameter list, but the longer the list is, the more likely it is that I will put the definition in a variable to make the code more readable. It is also possible to pass a literal value to the @stmt parameter, but since the SQL batch is almost always built up from different parts, you will rarely have use for that.

Traps and Pitfalls

You have now learnt to use sp_executesql to run dynamic SQL. We will now look at some pitfalls that can surprise you, if your expectations are not in line with the actual functionality.

Scope of Variables

Consider this fairly nonsensical stored procedure:

CREATE OR ALTER PROCEDURE mishap_sp AS
   DECLARE @tbl TABLE (a int NOT NULL)
   DECLARE @a int = 99
   EXEC sp_executesql N'SELECT a FROM @tbl SELECT @a'
go
EXEC mishap_sp

When we run this, we get two error messages:

Msg 1087, Level 15, State 2, Line 1
Must declare the table variable "@tbl".
Msg 137, Level 15, State 2, Line 1
Must declare the scalar variable "@a".

If you have been paying attention, you already understand why we get these errors: sp_executesql invokes a nameless stored procedure. That is, the code

SELECT a FROM @tbl SELECT @a

is not part of the procedure mishap_sp but of an inner (and nameless) stored procedure. As a consequence, the SELECT statement cannot access the variables in mishap_sp. In T‑SQL, variables are only visible in the scope they are declared; never in inner scopes.

Note: scope is a word commonly used in programming to denote where variables (and other items) are visible. In T‑SQL, a scope can be a stored procedure, trigger, function or simply an SQL script. A batch of dynamic SQL is also a scope.

If you want to pass variables from the surrounding procedure to the dynamic SQL, you must always pass them as parameters. For table variables, this requires that you have defined a table type, as in this example:

CREATE TYPE dbo.mytbltype AS TABLE (a int NOT NULL)
go
CREATE OR ALTER PROCEDURE hap_sp AS
   DECLARE @mytbl dbo.mytbltype,
           @a     int = 99
   EXEC sp_executesql N'SELECT a FROM @tbl SELECT @a',
                      N'@tbl dbo.mytbltype READONLY, @a int',
                      @mytbl, @a
go
EXEC hap_sp

Note: If you are still on SQL 2005: table-valued parameters were introduced in SQL 2008.

What Can Be Parameters?

Some people might try:

EXEC sp_executesql N'SELECT * FROM @tblname', N'@tblname sysname', 'Employees'

That is, they expect to be able to send in the table name as a parameter. (As I will discuss in the section Dynamic Table Names in Application Code in the last chapter, the urge to do so is often due to an earlier design mistake.) But the error message is:

Msg 1087, Level 16, State 1, Line 1
Must declare the table variable "@tblname".

There is nothing magic about dynamic SQL. You cannot put the name of a table in a variable and use it in a FROM clause. Nor can you put the name of a column in a variable and have it interpreted as such. And the same applies to almost all object names, with one exception: the name of a procedure you use in an EXEC statement. You will see examples of this as the article moves on.
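As a first minimal sketch of this exception: if you created my_sp from the example earlier in this chapter, you can put its name in a variable and run it with EXEC:

```sql
DECLARE @spname sysname = N'dbo.my_sp',
        @cnt    int
EXEC @spname @date = '19980201', @cnt = @cnt OUTPUT
SELECT @cnt AS [Orders in Feb 1998]
```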

Temp Tables and Dynamic SQL

In contrast to the table-variable example earlier, this works:

CREATE OR ALTER PROCEDURE hap_sp AS
   CREATE TABLE #temp(b int NOT NULL)
   EXEC sp_executesql N'SELECT b FROM #temp'
go
EXEC hap_sp

This is because a temp table is visible in all inner scopes invoked by the module that created it.

On the other hand, this fails:

CREATE OR ALTER PROCEDURE mishap_sp AS
   EXEC sp_executesql N'SELECT * INTO #temp FROM dbo.Customers'
   SELECT * FROM #temp
go
EXEC mishap_sp

The error message is:

Msg 208, Level 16, State 0, Procedure mishap_sp, Line 3
Invalid object name '#temp'.

A local temp table is automatically dropped when the scope where it was created exits. That is, in this example #temp is dropped when the batch of dynamic SQL exits and therefore you cannot access it in the surrounding stored procedure. You need to create the table with CREATE TABLE before you invoke your SQL batch. ("But I don't know the schema, because it is dynamic!" some reader may object. If so, you have probably made a design mistake. We will return to that in the last chapter of this article.)
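Here is a sketch of the working pattern: create the temp table up front, and let the dynamic SQL only fill it. (The procedure name hap2_sp is made up for this example; the column types follow the Customers table in NorthDynamic.)

```sql
CREATE OR ALTER PROCEDURE hap2_sp AS
   CREATE TABLE #temp (CustomerID  nchar(5)     NOT NULL PRIMARY KEY,
                       CompanyName nvarchar(40) NOT NULL)
   -- The dynamic SQL fills the table; the table itself outlives the batch.
   EXEC sp_executesql N'INSERT #temp (CustomerID, CompanyName)
                        SELECT CustomerID, CompanyName FROM dbo.Customers'
   SELECT CustomerID, CompanyName FROM #temp
go
EXEC hap2_sp
```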

Dynamic SQL and Functions

You cannot use dynamic SQL in user-defined functions, full stop. The reason is simple: a function in SQL Server is not permitted to change database state, and obviously SQL Server cannot check beforehand what your dynamic SQL is up to. It also follows from the rule that you cannot call stored procedures in user-defined functions, and as we have learnt, a batch of dynamic SQL is a nameless stored procedure.
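If you want to see for yourself, you can try this nonsense function. SQL Server stops you with an error about a side-effecting operator within a function; the exact message, and whether it comes when you create the function or when you call it, can depend on the form and version:

```sql
CREATE OR ALTER FUNCTION dyn_fn() RETURNS int AS
BEGIN
   EXEC sp_executesql N'SELECT 1'   -- Not permitted inside a function.
   RETURN 1
END
```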

Permissions

This is one thing you need to be aware of, because here dynamic SQL behaves differently from normal stored procedures. Consider this script where we create a test user which we impersonate to test permissions:

CREATE OR ALTER PROCEDURE mishap_sp AS
   SELECT COUNT(*) AS custcnt FROM dbo.Customers
   EXEC sp_executesql N'SELECT COUNT(*) AS empcnt FROM dbo.Employees'
go
CREATE USER testuser WITHOUT LOGIN
GRANT EXECUTE ON mishap_sp TO testuser
go
EXECUTE AS USER = 'testuser'
go
EXEC mishap_sp
go
REVERT

The output (with output set to text in SSMS):

custcnt
-----------
91

(1 row affected)

Msg 229, Level 14, State 5, Line 1
The SELECT permission was denied on the object 'Employees', database 'NorthDynamic', schema 'dbo'.

When testuser runs mishap_sp, the SELECT against Customers succeeds thanks to something known as ownership chaining. Because the procedure and the table have the same owner, SQL Server does not check if testuser has permission on the Customers table. However, the SELECT that is in the batch of dynamic SQL fails. This is because the batch of dynamic SQL is not considered to have an owner. Alternatively, the owner is considered to be the current user, which is testuser. Either way, ownership chaining does not apply, and therefore SQL Server checks whether testuser has SELECT permission on Employees, and since we never granted any permission to testuser, this fails.

Thus, if you consider using dynamic SQL, you need to talk with your DBA or whoever is in charge of security in the database to verify that this is acceptable. Maybe the policy is that users should only have permission to run stored procedures, but they should not be granted any direct access to the tables. It is still possible to use dynamic SQL with such a policy in force, because you can bundle the permission with the stored procedure, if you sign it with a certificate and then grant a user created from that certificate the permissions needed. This is a technique that I describe in a lot more detail in my article Packaging Permissions in Stored Procedures. While this is a perfectly possible and valid solution, it does increase the complexity of your system a little bit, and thus raises the hurdle for using dynamic SQL.

You Are Hiding Your References

If you want to know where a certain table is referenced, SQL Server offers a couple of ways to determine this. There are the system procedure sp_depends and the catalog views sys.sql_dependencies and sys.sql_expression_dependencies. You can also use View Dependencies from the context menu of a table in the Object Explorer in SSMS; under the hood SSMS uses sys.sql_expression_dependencies. However, none of these will show you any references made in dynamic SQL, since when SQL Server parses a stored procedure and saves the dependencies, the dynamic SQL is just a string literal which SQL Server has no reason to care about at that point.

Note: In my SQL Short Story, Where Is that Table Used?, I present a solution which reads all SQL code in a database into a full-text indexed table which permits you to search the code using full-text operators such as CONTAINS and thereby you can find references also in dynamic SQL.

Chintak Chhapia pointed out a problem of a similar nature: when you are looking into upgrading your SQL Server instance to a newer version, you may want to use the Upgrade Advisor to find out if there are any compatibility issues in your code or whether you are using any deprecated features. However, the Upgrade Advisor is entirely blind to what you do in dynamic SQL.

The RETURN statement

Besides the difference with permissions, there are a few more small differences between dynamic SQL and stored procedures. None of them are very significant, but here is one that Jonathan Van Houtte ran into. He wanted to use a RETURN statement with a specific return value in his dynamic SQL, but this is only permitted in regular stored procedures. That is, this fails:

EXEC sp_executesql N'IF @x = 0 RETURN 122', N'@x int', 0

The error message is:

Msg 178, Level 15, State 1, Line 1
A RETURN statement with a return value cannot be used in this context.

You can still use RETURN without a return value in dynamic SQL. This is legal:

EXEC sp_executesql N'IF @x = 0 RETURN', N'@x int', 0

EXEC()

EXEC() is an alternate way to run dynamic SQL. It exists for historic reasons; see the note at the end of this section. While there is rarely any compelling reason to use this form, there are plenty of examples of it out in the wild, so it would be wrong to be silent on it.

EXEC() does not support parameters, so all values have to be inlined. Here is an example, similar to the one we looked at previously:

DECLARE @sql    varchar(MAX),
        @when   date = '1998-02-01',
        @custid nchar(5) = N'ALFKI',
        @empid  int = 3

SELECT @sql = 'SELECT COUNT(*)
               FROM   dbo.Orders
               WHERE  OrderDate >= ''' + convert(char(8), @when, 112) + '''
                 AND  OrderDate < ''' +
                      convert(char(8), dateadd(MONTH, 1, @when), 112) + '''
                 AND  EmployeeID = ' + convert(varchar(10), @empid) + '

               SELECT CompanyName FROM dbo.Customers
               WHERE  CustomerID = N''' + @custid + ''''
PRINT @sql
EXEC(@sql)

If you look at this code and compare it with the original example with sp_executesql, what do you think? Don't you get the impression that this looks a lot more cumbersome? The logic of the queries is more difficult to follow when the SQL is broken up by all those single quotes, plusses and calls to convert. The reason for this cascade of single quotes is that to include a single quote in a string literal, you need to double it, and typically the single quotes appear at the beginning or the end of a string fragment so the double single quote inside the string is followed by the single quote that terminates the string. (If that sentence got your head spinning, you sort of got my point. :-)
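The doubling rule itself is easy to verify in isolation:

```sql
PRINT 'It''s here'   -- prints: It's here
```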

When it comes to the convert, date and time values constitute a special challenge, since you need to pick a good format code, so that the constructed date string is not misinterpreted. 112 is a good code for dates only, 126 for values with both date and time.
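You can check what the two format codes produce with a small test like this one:

```sql
DECLARE @d datetime = '19980201 13:30'
SELECT convert(char(8), @d, 112)    -- 19980201
SELECT convert(char(23), @d, 126)   -- 1998-02-01T13:30:00.000
```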

You may note here that @sql is declared as varchar(MAX); unlike sp_executesql, EXEC() accepts both varchar and nvarchar.

As you can see, I have added a diagnostic PRINT, so that you can see the actual SQL generated. This is the output:

SELECT COUNT(*)
FROM   dbo.Orders
WHERE  OrderDate >= '19980201'
  AND  OrderDate < '19980301'
  AND  EmployeeID = 3

SELECT CompanyName FROM dbo.Customers
WHERE  CustomerID = N'ALFKI'

This is something I will return to in the style guide, but it is worth making the point already now: when you work with dynamic SQL and build a query string, you always need a diagnostic PRINT, so that you can see what you have generated in case you get an error message or an unexpected result.

The input to EXEC() does not have to be a single variable; it can be a concatenation of simple terms, that is, variables and literals, as in this example:

DECLARE @custid nchar(5) = N'ALFKI'
EXEC('SELECT CompanyName FROM dbo.Customers WHERE CustomerID = N''' + @custid + '''')

Function calls are not permitted, so this fails with a syntax error:

DECLARE @when  date = '1998-02-01',
        @empid int = 3
EXEC('SELECT COUNT(*)
      FROM   dbo.Orders
      WHERE  OrderDate >= ''' + convert(char(8), @when, 112) + '''
        AND  OrderDate < ''' + convert(char(8), dateadd(MONTH, 1, @when), 112) + '''
        AND  EmployeeID = ' + convert(varchar(10), @empid))

In any case, there is little reason to use this pattern. If things go wrong, there is no way to splice in a diagnostic PRINT, so it is much better to store the SQL string in an @sql variable.

Overall, there is little reason to use EXEC(), when sp_executesql permits you to use parameters and thereby makes the code a lot more readable. And, as we shall see later in this article, readability and complexity are only one reason to stay away from string concatenation. So I could make it short and say Don't use EXEC(). Ever. But that would not be honest, because I tend to use EXEC() myself when the SQL string takes no parameters. That is, rather than saying

EXEC sp_executesql @sql

I most often write the shorter:

EXEC(@sql)

You will see this more than once in this article. But you could certainly argue that this is just a bad habit of mine. If you decide to always use sp_executesql, no matter whether your dynamic SQL takes parameters or not, you would not be doing anything wrong.

Before I close this section, I should add that there is actually one thing you can do with EXEC() that you cannot do with sp_executesql: you can impersonate a user or a login when you run your SQL batch. So instead of the impersonation we did above when we tested permissions for testuser, we could have done:

EXEC ('EXEC mishap_sp') AS USER = 'testuser'

This has the advantage that you know for sure that you will revert from the impersonated context, even if there is an execution error. But personally, I don't think this outweighs the increased complexity that dynamic SQL incurs. In any case, this has to be considered as an advanced feature.

Note: EXEC() may be of use on very old versions of SQL Server. sp_executesql was introduced in SQL 7, so if you find yourself on SQL 6.x, EXEC() is your sole option. You may also have to use EXEC() on SQL 7 and SQL 2000, if your dynamic SQL can exceed 4000 characters, since nvarchar(MAX) is not available on these versions. EXEC() permits you to work around this, since you can say EXEC(@sql1 + @sql2 + @sql3) and it accepts that the resulting string exceeds 8000 bytes.

EXEC() AT linked server

This is a special form of EXEC() which permits you to run a batch of commands on a linked server. This form of EXEC() permits you to use parameters, albeit in a different way from sp_executesql.

To play with this, you can set up a linked server that goes back to your current instance like this:

EXEC sp_addlinkedserver LOOPBACK, '', 'SQLNCLI', @datasrc = @@servername
EXEC sp_serveroption LOOPBACK, 'rpc out', 'true'

You can replace @@servername with the name of another instance you have access to if you like. Just make sure that you have NorthDynamic on that instance as well.

Here is how you could run the batch of dynamic SQL above:

DECLARE @sql    varchar(MAX),
        @when   date = '19980201',
        @custid nchar(5) = 'VINET',
        @cnt    int

SELECT @sql = 'SELECT ? = COUNT(*)
               FROM   NorthDynamic.dbo.Orders
               WHERE  OrderDate >= ?
                 AND  OrderDate < dateadd(MONTH, 1, ?)

               SELECT CompanyName FROM NorthDynamic.dbo.Customers
               WHERE  CustomerID = ?'
PRINT @sql
EXEC(@sql, @cnt OUTPUT, @when, @when, @custid) AT LOOPBACK
SELECT @cnt AS [Orders in Feb 1998]

In contrast to sp_executesql, the parameters do not have names; instead the question mark serves as a parameter marker. SQL Server will pass the parameters given to EXEC() AT in the order the question marks appear. That is, @cnt goes to the first question mark, and as you see we can use the OUTPUT keyword here as well. Next, @when comes twice in the parameter list, simply because it is used twice in the SQL code.

There is a good reason why EXEC() AT does not use parameter names like sp_executesql: the linked server may be something other than SQL Server, where names starting with @ would make little sense. The ? parameter marker, on the other hand, is a standard thing in ODBC and OLE DB, and the OLE DB provider for the remote data source will change the ? into whatever makes sense on the other data source.

For simplicity, I used EXEC() AT with a linked server that is another SQL Server instance. However, in this case it is a lot easier to use sp_executesql with four-part notation as in this example:

DECLARE @sql nvarchar(MAX),
        @params nvarchar(4000)

SELECT @sql =
   N'SELECT @cnt = COUNT(*)
     FROM   dbo.Orders
     WHERE  OrderDate >= @date
       AND  OrderDate < dateadd(MONTH, 1, @date)

     SELECT CompanyName FROM dbo.Customers WHERE CustomerID = @custid'

SELECT @params = N'@date date, @cnt int OUTPUT, @custid nchar(5) = N''ALFKI'''

DECLARE @cnt int,
        @when date = '19980201'

EXEC LOOPBACK.NorthDynamic.sys.sp_executesql @sql, @params, @when, @cnt OUTPUT
SELECT @cnt AS [Orders in Feb 1998]

You may note that in the example with EXEC() AT, I included the database name in the SQL string, but I could avoid it in the version with sp_executesql. When you invoke sp_executesql (as any other system procedure) with a given database name, sp_executesql will run in the context of that database. Thus, there is all reason to use this pattern when you want to run a parameterised SQL statement on a remote SQL Server instance. But when your linked server is not running SQL Server, EXEC() AT is definitely your friend.

Keep in mind that there are some restrictions in what data types you can use on linked servers. For instance, you cannot use the xml data type. You can pass LOB types (e.g. nvarchar(MAX)), but only as input parameters; you cannot receive them as OUTPUT parameters.

Running Dynamic SQL From Client Code

The difference between stored procedures and client code in this regard is that in stored procedures you only use dynamic SQL occasionally, but in client code you could say that you do it all the time, since the SQL code is always inside a string literal in the client-side language. However, that does not mean that you always will have to use string concatenation to build the string. Au contraire, this is an exceptional case. As long as you are only working with plain-vanilla queries against specific tables, your queries should be contained in a single string literal that holds the full query and where the input values from the user are passed as parameters. Or put in a different way: no matter whether you are doing dynamic SQL in a stored procedure or in client code, anything that can be a variable as permitted by the SQL syntax should be passed as a parameter and not be inlined into the query string by means of concatenation. In this section we will learn how to do this in .NET, and we will also start looking at the many problems with inlining parameter values.

Note: If you are working in a different environment than .NET, you should still read this section. The pattern I introduce here applies to all client environments for SQL Server, although the names of the objects and methods are different. And while .NET supports named parameters in the @param style, some environments only support parameter markers, typically ?, as we saw examples of in the previous section. I should also point out that this section is aimed at client code where you actually write SQL code. If you are using an ORM such as Entity Framework, you may be using a language like Linq that generates the SQL code. For reasons irrelevant to this article, I am not a friend of ORMs, but at least they keep users away from writing bad dynamic SQL, and I will not cover Linq and its likes in this article.

For this section, we will not use the tables in NorthDynamic, but instead work with this table:

CREATE TABLE typetbl (ident   int           NOT NULL IDENTITY,
                      intcol  int           NOT NULL,
                      deccol  decimal(8,2)  NOT NULL,
                      fltcol  float         NOT NULL,
                      strcol  nvarchar(25)  NOT NULL,
                      dtcol   datetime      NOT NULL,
                      CONSTRAINT pk_datatyptbl PRIMARY KEY (ident)
)

While you could create it in NorthDynamic, I recommend that you create the table in tempdb, and this is what I will assume in the text that follows.

Say that we want to insert a row into this table and we have the values in variables intval, decval etc and we want to get back the value for ident for the inserted row. Here is a method to do this:

public static void InsertTbl () {
   using(SqlConnection cn = new SqlConnection(strConn)) {
   using(SqlCommand cmd = new SqlCommand()) {
      cmd.CommandType = CommandType.Text;
      cmd.CommandText = @"INSERT dbo.typetbl(intcol, strcol, fltcol, deccol, dtcol)
                          VALUES(@intval, @strval, @fltval, @decval, @dtval)
                          SELECT @ident = scope_identity()";

      cmd.Parameters.Add("@intval", SqlDbType.Int).Value = intval;
      cmd.Parameters.Add("@strval", SqlDbType.NVarChar, 25).Value = strval;
      cmd.Parameters.Add("@fltval", SqlDbType.Float).Value = fltval;
      cmd.Parameters.Add("@decval", SqlDbType.Decimal).Value = decval;
      cmd.Parameters["@decval"].Precision = 8;
      cmd.Parameters["@decval"].Scale = 2;
      cmd.Parameters.Add("@dtval", SqlDbType.DateTime).Value = dtval;

      cmd.Parameters.Add("@ident", SqlDbType.Int);
      cmd.Parameters["@ident"].Direction = ParameterDirection.Output;

      cmd.Connection = cn;
      cn.Open();
      cmd.ExecuteNonQuery();

      int identval = Convert.ToInt32(cmd.Parameters["@ident"].Value);
      Console.WriteLine("The inserted row has id " + identval.ToString());
   }}
}

We first create connection and command objects (inside using clauses to make sure that resources are released as soon as they go out of scope) and set the command type to be text. Next, in adherence to what I said in the beginning of this section, we define the command text as a single static string which includes a couple of T‑SQL variables. Or more precisely, they are parameters.

In the suite of statements that follows, we define these parameters with the Add method of the SqlParameterCollection class. There are a couple of overloads of this method, of which the two most commonly used appear in this example. The first parameter is the name of the parameter. (It appears that the @ can be left out, but I recommend that you always include it.) The second parameter is a value from the enumeration SqlDbType (which is in the System.Data namespace). This enumeration basically has one entry for each data type in SQL Server, with the name being the same as the SQL Server data type, but with initial uppercase and some middle uppercase as well to spice it up. There are a few deviations with some lesser-used types that I will not cover here. For a complete list, see the .NET Framework reference. For types that go with a length, that is, string and binary types, we use an overload that takes a third parameter which is the length. (For LOB types such as nvarchar(MAX), you specify -1 for the length.) The example includes a decimal value. There is no overload to define a decimal parameter with precision and scale, so we need to set these properties separately as seen in the code sample above.

The Add method returns the newly created SqlParameter object which permits us to set the Value property directly, and thus for all but @decval we can define a parameter in a single line of code. I did not include this in the example, but to explicitly pass a NULL value, set the Value property to System.DBNull.Value.

For @ident there is no value to set, since this is an output parameter, which we indicate by setting the Direction property. (This property defaults to ParameterDirection.Input.)

Once the parameters have been defined, we open the connection, run the SQL batch and then we retrieve the generated IDENTITY value from the output parameter and print it out.

At this point you may want to try the code. The file clientcode.cs includes InsertTbl as well as a method BadInsert that we will look at later. There is also a Main that parses the command line and has a simple exception handler. In the top of the file you see this:

// Connection string, change server and database as needed!
private static string strConn = @"Integrated Security=SSPI;" +
                                @"Data Source=(local);Initial Catalog=tempdb;";

// Constants for the demo.
private static int      intval = 123456;
private static string   strval = "alpha (α) and beta (β)";
private static double   fltval = 17.0/3.0;
private static decimal  decval = 456.76M;
private static DateTime dtval  = DateTime.Now;

Save and compile the program. (See here if you need assistance on compilation.) Open a command prompt, move to the folder where you have the executable, and run it as:

clientcode

You should see an output like this:

The inserted row has id 1

You can also run a SELECT against typetbl to see that the values were inserted.

Let's now investigate what happens in terms of T‑SQL. You can use Profiler to capture the command that is sent to SQL Server. (You can also use the XEvent Profiler in recent versions of SSMS.) You will see something like this (I have reformatted the output for legibility):

declare @p8 int
set @p8=2
exec sp_executesql N'INSERT dbo.typetbl(intcol, strcol, fltcol, deccol, dtcol)
                     VALUES(@intval, @strval, @fltval, @decval, @dtval)
                     SELECT @ident = scope_identity()',
                   N'@intval int, @strval nvarchar(25), @fltval float,
                     @decval decimal(8,2), @dtval datetime, @ident int output',
                   @intval=123456, @strval=N'alpha (α) and beta (β)',
                   @fltval=5.666666666666667, @decval=456.76,
                   @dtval='2018-09-22 00:07:35.193', @ident=@p8 output
select @p8

So that was the heart of the matter! The @ parameters are not just an abstraction in .NET, but the batch is passed to sp_executesql exactly as it was entered and all the parameters are part of the parameter list in @params.

The full story is a little different, though. .NET does not actually compose the code above; what you see is a textual representation of what largely is a binary stream of data. What appears in the trace is an RPC event (RPC = Remote Procedure Call). That is, .NET tells SQL Server to run a stored procedure, this time sp_executesql, with so and so many parameters, and then it sends the parameters in binary form. If you were to eavesdrop on the wire with Wireshark or similar, you would see the INSERT statement, but not the surrounding quotes. If you were to look at the bytes preceding the INSERT keyword, you would find the length of the string. The same goes for the parameters @params and @strval. You would not be able to discern the values of @intval, @fltval, @decval and @dtval, unless you are familiar with the binary format.

Note: The variable @p8 in the Trace output is a very funny way that Profiler uses to indicate the return value of output parameters. In reality, @p8 does not exist at all. And particularly, the value 2 is never passed to the @ident parameter.

You may find this to be a little too much nitty-gritty detail to really catch your attention, so let me summarise the important points so far:

When you define one or more parameters for your SQL batch, this results in a call to sp_executesql.

The parameter values are passed in a binary format without any SQL syntax around them. (Why this matters will be apparent later.)

And most of all:

When you pass values to an SQL batch from your client code, be that .NET or anything else, you should always use parameterised statements.

The last point should really be a superfluous one to make, because what you have seen is the norm that any professional software developer should follow. But unfortunately, I see far too many examples in user forums and elsewhere which indicate that this is not always the case. Therefore, we need to look at what far too many people do, and why you should not do that. In clientcode.cs there is also a method BadInsert which looks like this:

public static void BadInsert () {
   using(SqlConnection cn = new SqlConnection(strConn)) {
   using(SqlCommand cmd = new SqlCommand()) {
      cmd.CommandType = CommandType.Text;
      cmd.CommandText = @"INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
                          VALUES(" + intval.ToString() + @",
                                 '" + strval + @"',
                                 " + fltval.ToString() + @",
                                 " + decval.ToString() + @",
                                 '" + dtval.ToString() + @"')
                          SELECT scope_identity()";
      Console.WriteLine(cmd.CommandText);

      cmd.Connection = cn;
      cn.Open();
      Object res = cmd.ExecuteScalar();
      int identval = Convert.ToInt32(res);
      Console.WriteLine("The inserted row has id " + identval.ToString());
   }}
}

At first glance it may seem shorter, and some readers may even find it simpler than InsertTbl, as there are no calls to extra methods. "We just build the query string, let's stick to the KISS principle and skip the fancy stuff". (KISS = Keep It Simple, Stupid.) But, as we shall see, this is only stupid – it is not simple at all. The fact that I felt compelled to add a diagnostic write of the resulting command text may be a first indication of this.

If you are keen to try it, you can do so immediately by running this in the command-line window:

clientcode BAD

This will invoke BadInsert. However, I should warn you that for a completely successful result all the points below must be true:

In your regional settings in Windows (or locale on Linux), the decimal separator must be a period and not a comma.
The date format in your regional settings must match the date-format setting in SQL Server.
Your database must have a Greek collation.

When I ran the program on my computer, I got this output:

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'alpha (a) and beta (ß)',
       5,66666666666667,
       456,76,
       '2018-09-22 23:25:57')
SELECT scope_identity()

EXCEPTION THROWN: There are fewer columns in the INSERT statement than values specified in the VALUES clause. The number of values in the VALUES clause must match the number of columns specified in the INSERT statement.

As you see, an exception was thrown and no data was inserted. Furthermore, the error message seems mysterious at first. When we go back to the .NET code and count columns and values, we can find no mismatch. But if you look more closely at the generated command, you can find commas in the values for deccol and fltcol, which mess up the T‑SQL syntax. These commas appeared because ToString respects the regional settings, and I have my regional settings in Windows set to Swedish, where the decimal delimiter is a comma and not a period. There is obviously a simple fix: use something more sophisticated than ToString that permits you to control the format. However, imagine a naïve programmer living in, say, Australia, where period is the decimal delimiter, who writes the above. The code works, passes all tests and is shipped. Then at some point it reaches a part of the world, for instance continental Europe, where comma is the decimal delimiter. And European users find that the software crashes as soon as they start it. This does not exactly build confidence in the software vendor.

To be able to run the program, I changed my regional settings to English (Australia). Now I got a different error:

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'alpha (a) and beta (ß)',
       5.66666666666667,
       456.76,
       '22/09/2018 11:33:51 PM')
SELECT scope_identity()

EXCEPTION THROWN: The conversion of a varchar data type to a datetime data type resulted in an out-of-range value. The statement has been terminated.

This time, the decimal and float values are good, but instead the value for dtcol causes a problem. My language setting in SQL Server is us_english and the date-format setting is mdy. That is, SQL Server expects the value to be month/day/year. When I ran this on the 22nd of the month, I got the error. For the first twelve days of the month, the value would have been inserted – but for eleven of those days it would have been inserted wrongly! Again, this is something that can be addressed by formatting: the date should be in a format that is safe in SQL Server, for instance YYYY-MM-DDThh:mm:ss.fff. (The T in this format stands for itself.) But a naïve developer in the US will never notice the problem, because he has the same setting in Windows and SQL Server. (And it is not any better with a naïve Swedish developer for that matter. The Swedish format YYYY-MM-DD is correctly interpreted with the mdy setting.) In this case, the software may even work at sites in countries like Germany or Australia where dates are written as day/month/year, if they have the date format set to dmy in SQL Server. But sooner or later there will be a mismatch, and there will be errors. Or even worse, incorrect dates.

Note: for a more detailed discussion on date formats, see the article The ultimate guide to the datetime datatypes by SQL Server MVP Tibor Karaszi.
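You can observe the effect of the dateformat setting for yourself with a snippet like this in a query window (the exact outcome depends on your own language and dateformat settings; the comments describe what happens under us_english defaults):

```sql
SET DATEFORMAT mdy
SELECT convert(datetime, '09/22/2018')           -- Interpreted as September 22.
SET DATEFORMAT dmy
SELECT convert(datetime, '09/22/2018')           -- Error: out-of-range value.
SELECT convert(datetime, '2018-09-22T23:25:57')  -- The format with T is safe
                                                 -- under any setting.
```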

I made one more attempt to run the program. This time I set my regional settings to English (United States). I got this output:

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'alpha (a) and beta (ß)',
       5.66666666666667,
       456.76,
       '9/22/2018 11:50:47 PM')
SELECT scope_identity()

The inserted row has id 4

Success! The row was inserted. However, when I looked at the data, I noticed a snag. In strcol I saw this:

alpha (a) and beta (ß)

If you look closely, you can see that the Greek alpha has been replaced by a regular Latin lowercase a, and instead of a lowercase beta, there is a German "scharfes s". This is because there is no N before the string literal in the SQL statement. Thus, this is a varchar literal, and for varchar, the character repertoire is defined by the code page for the collation. My collation for the test was a Finnish_Swedish collation, which uses code page 1252 for varchar, and this code page does not include alpha and beta. Therefore, SQL Server replaced them with fallback characters it deemed suitable.

Note: the observant reader may notice that the same replacement occurred also in the output in the command-line window when running the program. This is not because of the missing N, but because the command-line environment has no Unicode support at all.
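You can reproduce the mutilation directly in a query window, assuming your database collation uses a code page without Greek, for instance a Finnish_Swedish collation:

```sql
-- The first literal is varchar, so alpha and beta are replaced by
-- fallback characters; the second is nvarchar and survives intact.
SELECT 'alpha (α) and beta (β)'  AS varchar_literal,
       N'alpha (α) and beta (β)' AS nvarchar_literal
```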

There are more problems with BadInsert. The clientcode program permits you to override the value for strval on the command line. Run this:

clientcode BAD Brian O'Brien

I had changed back my regional settings to Swedish when I ran this, but it does not really matter.

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'Brian O'Brien',
       5,66666666666667,
       456,76,
       '2018-09-23 00:07:36')
SELECT scope_identity()

EXCEPTION THROWN: Incorrect syntax near 'Brien'.
Unclosed quotation mark after the character string ') SELECT scope_identity()'.

Because of the single quote in Brian's name, this time I got a syntax error. Now look at this:

clientcode GOOD Brian O'Brien

GOOD invokes InsertTbl and the row is inserted without any pain.

Yes, we can fix BadInsert so that it doubles the quotes in strval. But is that going to make the SQL code easier to read?

And that brings us to the gist of this exercise. InsertTbl shows a very structured way of writing data-access code. You define the SQL statement in a single string, which is free from any disruption of .NET syntax, save for the surrounding double quotes. Then it is followed by a definition of the parameters in a simple but structured way. In BadInsert, the SQL statement is mixed with .NET syntax, which means that the more variables there are, the more difficult it is to see the SQL forest for all the .NET trees.

With BadInsert you can get a clash between the regional settings in the client and the interpretation in SQL Server which has its own settings and rules, different from .NET and Windows. On the other hand, with InsertTbl, all values are interpreted consistently in the client program according to the regional settings on the user's computer and they are transformed to an unambiguous binary format that SQL Server always interprets in the same way.

When it comes to the final mistake, leaving out the N, this is an error you could commit with InsertTbl as well, as you could mistakenly use SqlDbType.VarChar instead of SqlDbType.NVarChar which would lead to the same mutilation of the Greek characters. Still, the .NET interface sort of encourages you to be exact with the data types.

The observant reader may have noted that there is one more difference between the two methods. In InsertTbl, I use an output parameter to get the id value of the inserted row, whereas in BadInsert I return the id with a SELECT statement and use ExecuteScalar to run the batch and get the value that way. Both of these approaches are perfectly valid, and I wanted to show examples of both. Which one you use is a matter of taste and what fits best in the situation. (Obviously, if you have more than one return value, ExecuteScalar is not an option.)

Some readers may feel that having to specify the exact data type and details like length of strings and precision/scale for decimal is a bit of a burden. Surely .NET, which is aimed at increasing developer productivity, could do better? Indeed. If you don't specify the length for a string or binary parameter, .NET will use the length of the value you supply, and similarly for decimal values. Furthermore, the SqlParameterCollection class provides the method AddWithValue which infers the type from the value you pass in. NEVER USE ANY OF THIS, PERIOD! You see, while this is a boost to developer productivity, it is a setback for the productivity of the DBA who is likely to be the one who has to figure out why the database access is slow. Why this is so is something I will return to in the performance chapter. But if you want to know more details here and now, you can read the blog post AddWithValue is Evil from SQL Server MVP Dan Guzman.

You may argue that mirroring the length of strings in the .NET application causes a maintenance problem, and you don't want to change the code if the column is changed from nvarchar(25) to nvarchar(50). If that is a concern, you can use 100 as a standard value for all "short" columns, 4000 for all longer nvarchar columns and 8000 for long varchar. (For MAX parameters, you should use -1 as noted above.) There are still situations where this can affect performance, but this is something a lot more subtle, and it is nothing I will discuss in this article. And the impact is nowhere as far-reaching as when you don't specify any length at all.

In this section, I only looked at .NET, but whichever environment you are using, building an SQL string by concatenating user input is totally unacceptable, if you are writing code in a professional context as an employee or a consultant. You must always use parameters where the SQL syntax permits. This may seem like a very harsh thing to say, but in this section we only looked at it from the point of view of developer ease. Writing and maintaining code with fixed parameterised SQL strings is so much easier than inlining parameter values. We will look at an even more important reason in the next chapter from which it will be clear why I use such strong words. Note that we will keep working with typetbl and the clientcode program, so don't drop them now.

Beware of SQL Injection

When you work with dynamic SQL you need to be aware of SQL injection. Well, "aware" is a bit of an understatement. Every time you build code dynamically from some input, you should ask yourself these questions:

Can the input be crafted so that the generated SQL code performs something else than intended?
Can this be utilised by a malicious user?
Can I protect myself against it?

In this chapter, we will look at how the first two things can happen and what your means of protection are. The first section discusses SQL injection in client code, but this does not mean that you can skip it if you only do T‑SQL programming, because the lessons in this chapter are of a general nature and apply to everyone. The remaining sections in this chapter cover aspects specific to T‑SQL.

SQL Injection in Client Code

What is SQL injection? SQL injection is when a user enters SQL syntax in an input field or similar and causes the application to perform something else than intended. Here is a classic example, Exploits of a Mom, from the comic strip xkcd.

For the reader who has never heard of the concept of SQL injection, this may not register right away, but let's try it with our clientcode program. Let's first create a Students table:

CREATE TABLE Students (a int NOT NULL)

(Since we are going to try to drop it, the columns do not really matter.)

To see if xkcd might be pulling our legs, we try the name as given in the strip:

clientcode BAD Robert'); DROP TABLE Students; --

No, this did not work out:

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'Robert'); DROP TABLE Students; --',
       5,66666666666667,
       456,76,
       '2018-09-23 21:47:00')
SELECT scope_identity()

EXCEPTION THROWN: Incorrect syntax near '5'

But if you look more closely at this, you may see that xkcd is on to something. That single quote after Robert seems to fit into the syntax. The single quote that originally was in BadInsert is still there – but it is hidden behind the two hyphens that start a comment. As it happens, I introduced newlines in the generated statement to make it easier to read, and therefore the number on the next line causes a syntax error. Many developers would not bother about adding newlines, and with the statement on a single line, the generated command would have been syntactically correct as such. It would still have failed due to the mismatch in the number of columns between the INSERT and VALUES lists.

None of those are obstacles that cannot be overcome. Next try this:

clientcode BAD Robert', 1, 1, 1); DROP TABLE Students; SELECT 1 WHERE 1 IN ('1

It may look like a big piece of nonsense, but look at the output:

INSERT typetbl (intcol, strcol, fltcol, deccol, dtcol)
VALUES(123456,
       'Robert', 1, 1, 1); DROP TABLE Students; SELECT 1 WHERE 1 IN ('1',
       5,66666666666667,
       456,76,
       '2018-09-23 21:51:48')
SELECT scope_identity()

The inserted row has id 1

The batch completed without errors and if you run SELECT * FROM Students in a query window, you find that the table is not there.

To understand how BadInsert was exploited, let's look at the techniques used. To get the original INSERT statement to work, we added the digit 1 as many times as needed to make the VALUES list match the INSERT column list. And we were happy to see that 1 was accepted for all values (there could have been conversion errors or constraint violations). To get the rest of the syntax to work, we had to incorporate that list of values in a new statement, and a SELECT with IN seemed like the easiest pick. We don't bother about a table in the SELECT, because there is nothing that says that a column or a variable must come before IN – it can just as well be a literal. Finally, we had to consider the quote that is intended to close the value in strval and the comma that follows. As you can see, this was achieved by the final '1.

Just as a contrast: recreate the Students table and run the same again, but replacing BAD with GOOD. This also completes successfully, but when you check the Students table, it is still there. If you look in typetbl, you find the first 25 characters of the injection string, and in fltcol, deccol and dtcol you find the constant values from the clientcode.cs file, that is, the intended values. On the other hand, in the row inserted in the injection attack we see something else. (The result of all those 1s.) You may also note that the ident column for the row added with BadInsert has a different value than was reported in the output above. This is because ExecuteScalar picked up that 1 from the extra SELECT we injected.

It was easy for us to compose the injection string, because we had access to the source code. You may argue that it would be a lot more difficult for an intruder who doesn't see the code to find out how the injection attack should be crafted, but that is not really the case. Imagine that BadInsert is actually inside a web application which is open to everyone on the Internet and strval gets its value from a URL. Or for that matter, a cookie. The evil user may first attempt a simple exploit of the type in the xkcd strip. If the user is lucky, the web server displays the error message from SQL Server, and in such case it's a relatively simple task to figure it out. That would be bad practice in itself; best practice is to configure the web server to show a generic error message and only write the actual error message to a local log file. But that is still not a roadblock, only a small speed bump that may stop someone who is trying to find an injection hole manually. In real life, intruders use automated tools to find SQL injection holes. A site which is written in the style of BadInsert will of course have many injection holes to try. An MVP colleague demoed such a program to me. He directed it at a couple of web sites, and it took the program maybe ten seconds to determine that a site was vulnerable to SQL injection.

It is very important to understand that the people who engage in this are not lonesome teenage boys who want to show off. No, the reason people spend their time on finding SQL injection holes spells m-o-n-e-y. If the database handles money by itself, the money can be made directly in the system. The money can also be made by stealing the information in the database and selling it. Or the intruder can use the SQL Server machine as an entrance to the rest of the network to infect all PCs in the organisation, in preparation for a DDOS attack against a third party that the intruder is paid to perform. Or the intruder may install ransomware and blackmail the site. Or...

Just to take one example from real life: Marriott announced in 2018 that almost 500 million customers' data had been leaked in a major data breach. And one of the means in the attack was, yes, SQL injection. You can search for Marriott and sql injection on your favourite search engine to find articles about the incident. Here is also a direct link to one of the articles I found.

While the biggest dangers are with web applications exposed on the Internet, this does not mean that you can ignore the risk for SQL injection in intranet applications or Windows clients. There may be malicious or disgruntled employees who want to steal data or just cause a mess in general. And there can be computers in the organisation that have been infected with malware that attempt to attack all machines they find.

So how do you protect yourself against SQL injection? In the xkcd strip, the mom talks about sanitising database inputs. What she presumably is alluding to is that the single quotes in the input string should be doubled. But that alone does not protect you against SQL injection; there are injection attacks that do not employ the single quote. Therefore, it is not uncommon to see suggestions that some characters should not be permitted in input at all, but that is not a very fruitful way to go, since these characters may be part of perfectly legit input values. Just imagine that Brian O'Brien finds that he can't enter his name, but is told that it includes an illegal character!
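To see why quote-doubling alone is insufficient, consider a value that is inlined in a numeric context; no single quote is needed at all. This is a hypothetical sketch – dbo.Users is an imagined table – but the pattern is exactly what automated injection tools probe for:

```sql
-- Hypothetical sketch: @input arrives from a user and is inlined into a
-- numeric context, so quote-doubling would not stop it.
DECLARE @input nvarchar(100) = N'0 OR 1 = 1',
        @sql   nvarchar(MAX)
SELECT @sql = N'SELECT * FROM dbo.Users WHERE UserID = ' + @input
PRINT @sql
-- Prints: SELECT * FROM dbo.Users WHERE UserID = 0 OR 1 = 1
-- That is, a condition which is true for every row in the table.
```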

No, there is a much better way to stop SQL injection and you have already seen it: parameterised statements. When input values are passed as parameters through SqlParameterCollection and ultimately to sp_executesql, SQL injection cannot happen, because there is no place to inject anything. That said, there is still one more precaution to take: the application should be running with a limited set of permissions. It is far too common for web applications to run with elevated permissions or even as sa. And that is very bad. After all, even if you have learnt by now how to avoid SQL injection, it only takes one junior programmer who has not, and there is an SQL injection hole. Therefore, an application should never be granted more than membership in db_datareader and db_datawriter. That will limit the damage of an SQL injection hole. Even better, the application should only use stored procedures throughout. In that case, all you need to grant is EXECUTE permission. (You can do this on schema or database level, no need to do it per procedure.)

Note: some readers may come with objections like "our application creates tables dynamically" etc., so it must have elevated permissions. In that case, you should package these permissions inside stored procedures, and I describe this in detail in my article Packaging Permissions in Stored Procedures. Applications should never run with elevated permissions, period!

SQL Injection in Dynamic SQL in Stored Procedures

When you work with dynamic SQL in a stored procedure, you should of course use sp_executesql with a parameterised statement to protect yourself against SQL injection in plain input values like search parameters etc. However, in most cases when you build an SQL string dynamically inside a stored procedure, it is because there is some input value that you cannot pass as a parameter, but which must be inserted into the SQL string, for instance a table name.

Here is an example. Charlie Brown is the DBA at a university where each student has their own little database where they can do whatever they like. Charlie wants to be helpful, so he schedules a job that loops through all tables in the student databases to defragment them. Here is the core of that script:

DECLARE @sql varchar(MAX)

DECLARE table_cur CURSOR STATIC LOCAL FOR
   SELECT 'ALTER INDEX ALL ON [' + name + '] REBUILD ' FROM sys.tables

OPEN table_cur

WHILE 1 = 1
BEGIN
   FETCH table_cur INTO @sql
   IF @@fetch_status <> 0
      BREAK

   PRINT @sql
   EXEC(@sql)
END

DEALLOCATE table_cur

This is not a very good script. Charlie has added brackets around the table name, so the script will not fail if a student has created tables with spaces or other funny characters in the name. One flaw is that he has overlooked that tables do not have to be in the dbo schema, but that is the small problem. One of the students at the university is Joe Cool. Here is a script to create his database:

CREATE LOGIN JoeCool WITH PASSWORD = 'CollegeKid'
CREATE DATABASE JoeCool
ALTER AUTHORIZATION ON DATABASE::JoeCool TO JoeCool
SELECT is_srvrolemember('sysadmin', 'JoeCool')

The final SELECT informs you that Joe is not a member of sysadmin. But that is exactly Joe's ambition, so in his database, he creates this:

USE JoeCool
go
CREATE TABLE tbl (a int NOT NULL)
CREATE TABLE [tbl]] REBUILD; ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] (a int NOT NULL)

Yes, that last table has a funny name. Just as funny as Bobby Tables in the xkcd strip above. At night, Charlie's Agent job runs, with sysadmin permissions of course. Run the cursor batch above in the database JoeCool, and look at the statements (I've added a line break for legibility):

ALTER INDEX ALL ON [tbl] REBUILD

ALTER INDEX ALL ON [tbl] REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

Now run this again:

SELECT is_srvrolemember('sysadmin', 'JoeCool')

Joe is now sysadmin.

Look at how Joe constructed his table name, which I print out here for clarity: tbl] REBUILD; ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --. (You can also run a query over sys.tables to see it.) He started with the name of another table in the database. He then added a right bracket, which he doubled in the CREATE TABLE script, since it was inside a bracketed name. The purpose of this bracket is to balance the left bracket in Charlie's script. Joe also added REBUILD to make the command syntactically complete. He then added the command he wanted to run, and at the end he added comment characters to kill the last part of Charlie's command.

Note: the command ALTER SERVER ROLE that Joe uses was introduced in SQL 2012. You can use sp_addsrvrolemember for the same trick on SQL 2008 and earlier.

There is a very simple change that Charlie can make to his script to stop Joe's attack. Before we look at it, let's first deprive Joe of his membership in sysadmin:

ALTER SERVER ROLE sysadmin DROP MEMBER JoeCool

Now change the SELECT statement in the cursor to read:

SELECT 'ALTER INDEX ALL ON ' + quotename(name) + ' REBUILD ' FROM sys.tables

Run the cursor again. This is the output (again with a line break to fit the page width):

ALTER INDEX ALL ON [tbl] REBUILD

ALTER INDEX ALL ON [tbl]] REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

At first glance, it may seem identical to the above. However:

SELECT is_srvrolemember('sysadmin', 'JoeCool')

now returns 0. Joe was not promoted to sysadmin. If you look closer, you can see why: the right bracket in Joe's table name is now doubled, so the statement now achieves Charlie's goal: to rebuild the index on the table with the funny name. That is the purpose of the quotename function. Before we look closer at this important function, I would like to make a comment on the example as such. Normally, you may think of a dynamic table name as something which comes in through a procedure parameter, and which ultimately may come from outside the server. But as you saw in this example, the attacker can also come from within the SQL Server instance.

Quotename and quotestring

Whenever you build dynamic SQL statements where a variable holds the name of something, you should use quotename. quotename returns its input surrounded by square brackets, and any right brackets are doubled as seen here:

SELECT quotename('a'), quotename('[a]')

The output is:

[a] [[a]]]

In this way, you protect yourself from SQL injection in metadata names. When you receive a table name as a parameter in a stored procedure, there are some more precautions that are more related to interface design than SQL injection as such, so we will look at this later in the style-guide chapter.

quotename accepts a second parameter that permits you to specify an alternative delimiter. Here are some examples to illustrate this:

SELECT quotename('Brian O''Brien', '''')   -- 'Brian O''Brien'
SELECT quotename('double (") quote', '"')  -- "double ("") quote"
SELECT quotename('(right) paren', '(')     -- ((right)) paren)
SELECT quotename('{brace} position', '}')  -- {{brace}} position}
SELECT quotename('right <angle>', '<')     -- <right <angle>>>
SELECT quotename('back`tick', '`')         -- `back``tick`
SELECT quotename('no $ money', '$')        -- NULL

In all cases, the closing delimiter is doubled. As you can see, for the paired delimiters, you can pass either the left or the right delimiter, but it is always the closing delimiter that is doubled. As the last example suggests, you cannot pass any character as the delimiter, but all the supported ones are shown above. Of these, the most important is the first one, which you need to use if you want to incorporate a string value in your statement. As an example, here is a batch that creates a database with the name set dynamically:

DECLARE @dbname  sysname = 'My New Database',
        @datadir nvarchar(128),
        @logdir  nvarchar(128),
        @sql     nvarchar(MAX)

SET @datadir = convert(nvarchar(128), serverproperty('InstanceDefaultDataPath'))
SET @logdir  = convert(nvarchar(128), serverproperty('InstanceDefaultLogPath'))

SELECT @sql = 'CREATE DATABASE ' + quotename(@dbname, '"') + '
   ON (NAME = ' + quotename(@dbname, '''') + ',
       FILENAME = ' + quotename(@datadir + @dbname + '.mdf', '''') + ')
   LOG ON (NAME = ' + quotename(@dbname + '_log', '''') + ',
           FILENAME = ' + quotename(@logdir + @dbname + '.ldf', '''') + ')'

PRINT @sql
EXEC(@sql)

This was the statement generated on my machine:

CREATE DATABASE "My New Database"
   ON (NAME = 'My New Database',
       FILENAME = 'S:\MSSQL\SQLXI\My New Database.mdf')
   LOG ON (NAME = 'My New Database_log',
           FILENAME = 'S:\MSSQL\SQLXI\My New Database.ldf')

Note: this example does not run on SQL 2008 and earlier. These two parameters to serverproperty were introduced in SQL 2012.

Observe that what I pass to quotename is the full path to the two database files; that is, string concatenation occurs inside quotename. For the sake of the example, I used double quotes to delimit the database name, rather than the more common square brackets. (The latter is Microsoft-specific, whereas the double quote is the ANSI standard). As for the other delimiters you can use with quotename, the situations where you get to use them are few and far between.

quotename is a very convenient function, but there is a trap that you need to beware of. As the name suggests, quotename is intended to be used with object names. Its input is limited to 128 characters (which is the maximum length of an identifier in SQL Server), as seen in this example:

DECLARE @x nvarchar(255) = replicate('''', 128)
SELECT quotename(@x, '''')
SET @x = replicate('''', 129)
SELECT quotename(@x, '''')

The first SELECT returns a long list of single quotes (more precisely 258 of them!), but the second returns NULL. Thus, as long as you only use quotename to delimit identifiers in brackets, you are safe. But when you use quotename to delimit string values in single quotes, you need to consider that the value may exceed 128 characters in length. The CREATE DATABASE example above is certainly vulnerable to this risk. The return type for the two serverproperty parameters is nvarchar(128) and this is also the definition of sysname. Thus, the full file name can be up to 260 characters long.

If you want to eliminate this risk, you can use the user-defined functions quotestring and quotestring_n. Both accept nvarchar(MAX) as input and return the same data type. They wrap their input in single quotes and double any single quotes within the string. quotestring_n also tacks on an N to make the returned string an nvarchar literal. With quotestring, you need to cater for this yourself. They are not built-in functions, but are listed below:

CREATE FUNCTION quotestring(@str nvarchar(MAX)) RETURNS nvarchar(MAX) AS
BEGIN
   DECLARE @ret nvarchar(MAX),
           @sq  nchar(1) = ''''
   SELECT @ret = replace(@str, @sq, @sq + @sq)
   RETURN(@sq + @ret + @sq)
END
go
CREATE FUNCTION quotestring_n(@str nvarchar(MAX)) RETURNS nvarchar(MAX) AS
BEGIN
   DECLARE @ret nvarchar(MAX),
           @sq  nchar(1) = ''''
   SELECT @ret = replace(@str, @sq, @sq + @sq)
   RETURN('N' + @sq + @ret + @sq)
END

We will see examples of using these functions throughout this article.
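For instance, here is a usage sketch (assuming the functions were created in the dbo schema):

```sql
-- Wraps the value in single quotes and doubles the quote in O'Brien.
DECLARE @name nvarchar(MAX) = N'Brian O''Brien';
SELECT dbo.quotestring(@name);     -- 'Brian O''Brien'
SELECT dbo.quotestring_n(@name);   -- N'Brian O''Brien'
```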

You may ask: when should I use quotename (or quotestring)? Every time you insert an object name or a string value from a variable into your SQL string, without any exception. Take the database example above: the two paths are not likely attack vectors, since you need to be sysadmin or a Windows administrator to change them. But the database name may come from an untrusted source (like our college kid Joe Cool) and contain something malicious. And even if all this is hypothetical to you, there is still the argument of general robustness: if any of the names happens to include a single quote, why should the command fail with a syntax error?

The Importance of nvarchar

This is an SQL injection trap that I was unaware of myself, until I by chance was looking at Remus Rusanu's blog for something else while I was working on this article.

There is one more flaw in Charlie Brown's script above, which Joe Cool exploits by creating this table:

CREATE TABLE [tbl〛 REBUILD; ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] (a int NOT NULL)

That funny character after tbl is the Unicode character U+301B, Right White Square Bracket. Night comes, Charlie's enhanced script using quotename runs, and this is printed (with line breaks added):

ALTER INDEX ALL ON [tbl] REBUILD

ALTER INDEX ALL ON [tbl]] REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

ALTER INDEX ALL ON [tbl] REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

Note that the funny double-bracket character is now a regular right bracket. If you run

SELECT is_srvrolemember('sysadmin', 'JoeCool')

you see that Joe Cool is again sysadmin.

The reason this happened is that Charlie was sloppy in his script when he declared the @sql variable:

DECLARE @sql varchar(MAX)

It's varchar, not nvarchar. When data is converted from nvarchar to varchar, any character in the Unicode string that is not available in the smaller character set used for varchar is replaced with a fallback character. And in this case, the fallback character is a plain right bracket. quotename did not help here, because quotename worked on the input from sys.tables which is nvarchar and the conversion to varchar happened after quotename had done its job.
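You can observe this fallback behaviour directly with convert. This is only a sketch, as the exact replacement character depends on the code page of the varchar collation in effect:

```sql
-- U+301B and U+02BC have no representation in (for instance) code page 1252,
-- so they may fall back to ] and ' respectively.
SELECT convert(varchar(10), nchar(0x301B)) AS white_bracket,
       convert(varchar(10), nchar(0x02BC)) AS apostrophe;
```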

Fix the declaration of @sql and take Joe out of sysadmin:

ALTER SERVER ROLE sysadmin DROP MEMBER JoeCool

Change the variable declaration and run Charlie's script again. This time the output is:

ALTER INDEX ALL ON [tbl] REBUILD

ALTER INDEX ALL ON [tbl]] REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

ALTER INDEX ALL ON [tbl〛 REBUILD;
ALTER SERVER ROLE sysadmin ADD MEMBER JoeCool --] REBUILD

The right white square bracket is still there and is_srvrolemember informs us that Joe is not sysadmin.

This character is not the only one; there are more Unicode characters that can be used for such exploits. For instance, in his blog Remus mentions Modifier Letter Apostrophe, U+02BC, which is replaced by a single quote when the string is cast to varchar.

We can also use this example to chalk one up for sp_executesql. Since sp_executesql requires you to use nvarchar for your @sql variable, you cannot open up this type of hole; it can only happen to you if you use EXEC().

Note: While the main purpose of these examples is to demonstrate that SQL injection can sneak in where you may not expect it, the major mistake of Charlie is another one: he is running with elevated permissions when he shouldn't. Whenever you as a server-level DBA run code in a user database you cannot trust, you should bracket your commands in EXECUTE AS USER = 'dbo' and REVERT. This sandboxes you into the current database: you no longer have any permissions on server level, and a user like Joe cannot exploit them to become sysadmin, or do anything else he cannot already do himself, assuming that he has db_owner rights in the database. The SQL injection attacks we have seen are not the only type of attack that Joe can perform. For instance, he could add a DDL trigger that is fired on ALTER INDEX and which adds him to sysadmin when Charlie's script runs.
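The sandboxing pattern in the note above could be sketched like this (the database name StudentDB and the command itself are assumptions for the example):

```sql
-- Run a maintenance command in an untrusted database with database-level
-- permissions only, so that injected code cannot reach server level.
USE StudentDB;
DECLARE @sql nvarchar(MAX) = N'ALTER INDEX ALL ON dbo.tbl REBUILD';
EXECUTE AS USER = 'dbo';   -- sandboxed into the current database
EXEC(@sql);
REVERT;                    -- restore your own server-level context
```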

Other Injection Holes

In all examples so far, the attack vector has been where a variable which is supposed to hold a single token has been concatenated to the SQL string. Since it is a single token, protection with parameterisation and quotename is simple.

If you have variables that are intended to hold multiple tokens, it can be very difficult to protect yourself against SQL injection. Here is one example, which I still see far too often:

SELECT @list = '1,2,3,4'
SELECT @sql = 'SELECT * FROM tbl WHERE id IN (' + @list + ')'

How do you ensure that @list really is a comma-separated list of values and not something like '0) DROP DATABASE abc --'? The answer in this particular case is that you don't use dynamic SQL at all; instead you should use a list-to-table function. See my short article Arrays and Lists in SQL Server for alternatives.
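As a sketch of the static alternative (string_split requires SQL 2016 or later; tbl and id are taken from the example above):

```sql
-- No dynamic SQL: the list is unpacked by a list-to-table function,
-- so nothing from @list ever becomes part of the query text.
DECLARE @list nvarchar(MAX) = N'1,2,3,4';
SELECT t.*
FROM   tbl t
WHERE  t.id IN (SELECT try_convert(int, value)
                FROM   string_split(@list, ','));
```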

Another example is this: say that you have a stored procedure that returns a result set and you are already using dynamic SQL to produce that result set. You decide to beef it up by adding a parameter to specify the sort order. It may seem as simple as:

CREATE OR ALTER PROCEDURE my_dynamic_sp .....
                          @sortcols nvarchar(200) = NULL AS
...
IF @sortcols IS NOT NULL
   SELECT @sql += ' ORDER BY ' + @sortcols
...

But as you understand by now, this procedure is wide open to SQL injection. In this case, you may argue that you have it all under control: you are showing the column names to the user in a dropdown or some other UI device, and there is no place where the user can type anything. That may be true today, but it may change two years from now, when it is decided to add a web interface, and all of a sudden the sort columns come as parameters in a URL (which the user of course has all the powers to manipulate). When you are writing a stored procedure for application use, you should never make any assumptions about the client, even if you write the client code as well. Least of all when it comes to serious security risks such as SQL injection.

There are several ways to skin this cat. You could parse @sortcols and then apply quotename on the individual columns. Personally, I think I would prefer to use a three-column table-valued parameter (one column for the position, one for the column name and a bit column for ASC/DESC). Or I might prefer to entirely decouple the client from the column names and do something like this:

SELECT @sql += CASE @sortcol1
                  WHEN 'col1' THEN 'col1'
                  WHEN 'col2' THEN 'col2'
                  ...
                  ELSE 'col1'   -- Need to have a default.
               END + ' ' +
               CASE @isdesc1 WHEN 0 THEN 'ASC' ELSE 'DESC' END

If col2 were to be renamed to something else, the client would not be affected.

And to generalise this: whenever you think of passing variables with multi-token SQL text to your stored procedure in an application, you need to work out a solution for your stored procedure so that it is not open to SQL injection. This is by no means simple, since you have to parse fragments of SQL code, something which T‑SQL is not well-equipped for. There can be all reason to reconsider your approach entirely. For instance, if you have the idea of passing a full-blown WHERE clause, you should probably not use a stored procedure at all, but construct all SQL in the client. (Passing a WHERE clause from client to server is not only bad because of the risk for SQL injection, but also because it introduces a tight coupling between client and server that will be a maintenance headache in the long run.) There is no reason to use stored procedures for the sake of it. There are scenarios which are very dynamic, where you want to give users the ability to retrieve data with a high degree of flexibility. Such scenarios are best solved client-side, and in such solutions, the SQL code is likely to be chopped up in small fragments that are pieced together. Obviously, you should still protect yourself against SQL injection, and you should still use parameters for values in WHERE clauses. Names of columns and tables should never come from text fields or URLs, but from sources you have full control over.

So much for application code. For a pure DBA utility, you can permit yourself to be a little more liberal. To take one example, in my article Packaging Permissions in Stored Procedures, I present a stored procedure GrantPermsToSP of which one parameter is a TVP with permissions to be inserted into a GRANT statement. I make no validation of this parameter, and the procedure is indeed open for SQL injection. However, as it is intended to be called from SSMS by persons who are in the db_owner role, there is nothing they can gain by exploiting the injection hole – they cannot run more powerful commands through the procedure than they can run directly themselves. And that is the quintessence: SQL injection is a threat when it permits users to elevate their permissions beyond what they can do on their own from SSMS or similar. (Keep in mind that for web users who have no direct access to SQL Server at all, this means that any command they can inject is an elevation.) From this follows that if you sign your utility procedure with a certificate to package permissions with the procedure to permit low-privileged users to run some privileged command in a controlled way (and that is exactly what you use GrantPermsToSP for), you must absolutely make sure that you don't have any SQL injection hole.

Performance

You may ask if there are any performance differences between using dynamic SQL and static SQL in stored procedures. The answer is that as long as you observe a few guidelines when you compose your dynamic SQL, there is hardly any difference at all. (Well, if you produce a 2000-line beast in your client and pass it to SQL Server, you should probably consider moving it to a stored procedure, as there is a cost for passing all that code across the wire.)

When you construct your dynamic SQL there are two things you need to observe to avoid wreaking havoc with performance. One of them you have already learnt. That is, you should always parameterise your statements and not inline simple parameter values for several reasons:

It makes the code easier to read.

It makes it easier to deal with dates.

It makes it easier to deal with input like Brian O'Brien.

You protect yourself against SQL injection.

Now you will learn one more reason: it is better for performance. Not so much for your own individual query, but for the server as a whole. This is related to how SQL Server caches query plans. For a stored procedure, SQL Server looks up the query plan in the cache by the name of the stored procedure. A batch of dynamic SQL does not have a name. Instead, SQL Server computes a hash value from the query text and makes a cache lookup on that hash. The hash is computed on the query text as-is without any normalisation of spaces, upper/lower, comments etc. And most importantly, not of parameter values inlined into the string. Say that you have something like this:

cmd.CommandText = @"SELECT ... FROM ... JOIN ... WHERE col1 = " + IntValue.ToString();

Say now that this command text is invoked for IntValue = 7, 89 and 2454. This results in three different cache entries. Now imagine that the query has several search parameters and there is a multitude of users passing different values. There will be a lot of cache entries, and this is not good for SQL Server. The more entries there are, the bigger the risk for hash collisions. (That is, two different query texts have the same hash value, whereupon SQL Server needs to compare the query texts directly.) On top of that, the cache starts to consume a lot of memory that could be better used elsewhere. Eventually, this can lead to plans being kicked out of the cache, so that recurring queries need to be recompiled. And last but not least: all this compilation will take a big toll on the CPU.

I would like to make you aware that when SQL Server computes the hash value, this includes the parameter list. Thus, these two calls will result in two cache entries:

EXEC sp_executesql N'SELECT COUNT(*) FROM Customers WHERE City = @city',
                   N'@city nvarchar(6)', N'London'
EXEC sp_executesql N'SELECT COUNT(*) FROM Customers WHERE City = @city',
                   N'@city nvarchar(7)', N'Abidjan'

Note the difference in length of the declaration of the @city parameter.

And this is what you get if you define the parameter in .NET in this way:

cmd.Parameters.Add("@strval", SqlDbType.NVarChar).Value = city;

When you leave out the length, .NET will set the length from the actual value of city. This is why you should always set the length explicitly for string and binary parameters. And stay away from AddWithValue.
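The same principle applies when you call sp_executesql yourself: use one fixed declared length (the length 40 here is an arbitrary assumption), so that both calls below produce identical query text and can share one cache entry:

```sql
EXEC sp_executesql N'SELECT COUNT(*) FROM Customers WHERE City = @city',
                   N'@city nvarchar(40)', N'London';
EXEC sp_executesql N'SELECT COUNT(*) FROM Customers WHERE City = @city',
                   N'@city nvarchar(40)', N'Abidjan';
```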

Note: Being an honest person, I need to confess that I have lied a bit here. For very simple queries, SQL Server will always employ auto-parameterisation, so that constants are replaced by parameter placeholders. The INSERT statement in the clientcode program is an example of such a query. There is also a database setting, forced parameterisation, under which SQL Server replaces about any constant in a query with a parameter. This setting is indeed intended to be a cover-up for poorly written applications that inline parameter values into SQL strings. But you should never assume that auto-parameterisation will save the day; you should always use parameterised statements. In this note I should also point out that there is a configuration option optimize for ad hoc workloads. When this is set, the plan for an unparameterised query string is not cached the first time it appears; only a shell query is stored in the cache. The plan is cached only if the exact same query string appears a second time. This reduces the memory footprint, but the hash collisions will still be there. It's generally considered best practice to have this option turned on, but with a well-written application, it should not really matter.

Note that this applies to values that come as input from the user or some other source. A query might have a condition like

AND customers.isactive = 1

That is, the query is hard-wired to return only active customers. In this case, there is absolutely no reason to make the 1 a parameter.

There is one more measure you need to take to reduce cache-littering: always schema-qualify your tables, even if all your tables are in the dbo schema. That is, you should write:

cmd.CommandText = @"SELECT ...
                    FROM dbo.sometable
                    JOIN dbo.someothertable ON ...
                    WHERE col1 = @id"

The reason for this is as follows: Assume two users Jack and Jill. Jack has the default schema Jack and Jill has the default schema Jill. If you leave out dbo and say only FROM sometable , SQL Server needs to consider that at any point in time there may appear a table named Jack.sometable or Jill.sometable, in which case Jack's and Jill's queries should go against these respective tables. For this reason, Jack and Jill cannot share the same cache entry, but there has to be one cache entry each for them. If there are many users, this can lead to many cache entries for the same query.

Observe that this applies to all objects that belong to a schema, not only tables. If you miss a single object, the cache entry cannot be shared by users with different default schemas. Here is a list of object types for which you need to specify the schema:

Tables.

Views.

User-defined functions of all sorts.

Stored procedures.

Synonyms.

Service Broker queues.

Sequences.

User-defined types of all sorts. (Table types, CLR types etc).

XML Schema collections.

Keep in mind that this not only applies to the statements in the dynamic SQL itself, but also to the parameter list which also is part of what is cached. As one example, we had this statement earlier in the text:

EXEC sp_executesql N'SELECT a FROM @tbl SELECT @a',
                   N'@tbl dbo.mytbltype READONLY, @a int', @mytbl, @a

Note: When you create a user with CREATE USER, the default schema is set to dbo, so there is a fair chance that all users have the same default schema, and thus can share cached query plans even if you leave out the schema. But there is no reason to assume that this is always true. There are older commands to create users, and they will also create a schema for the user and make that the default schema.

In this section I have only used examples in client code, but everything I have discussed applies to dynamic SQL created in a stored procedure as well.

Working with Dynamic SQL – a Style Guide

You have learnt the basic commands for dynamic SQL, and you have been warned about SQL injection. Now is the time to learn more about dynamic SQL from a practical standpoint. You may think that writing dynamic SQL is a simple task: build strings, how difficult can it be? From experience, I can tell you that it is more difficult to get it right than it may seem initially.

Writing good dynamic SQL is not only about skill. No, maybe the most important virtue is discipline. If you are consistent and careful when you write your dynamic SQL, you will have fewer bugs and your code will be easier to read and maintain. On the other hand, poorly written code that generates dynamic SQL tends to be very difficult, not to say impossible, to read and understand. The generated code can easily get buried in the T‑SQL code that generates it, with all its single quotes and plusses. Generally, the fewer of these you have, the more readable your code will be, and the lower the risk of errors. You have already learnt the most important device: use parameterised statements. But since you cannot parameterise everything, that is not enough. In this chapter I will give some tips on how to improve your dynamic SQL from different angles: how to make it more readable, easier to troubleshoot, more robust and more secure.

Start Out Static

Before you start writing dynamic SQL, you need to have a good understanding of what the query you are aiming to generate is going to look like. The best way is to start with writing a fully static query, where you temporarily hardcode the parts you want to be dynamic. Once you have the static query working, you can use it as a starting point to write the code that builds the dynamic query. If you try to solve everything at once, you may find yourself in a situation where you don't know if it is the query logic as such that is wrong, or whether you have built the query in the wrong way.

The Necessity of Debug Prints

If you are new to dynamic SQL, the first thing you need to learn is to use debug prints. All stored procedures that deal with dynamic SQL should include a final parameter @debug:

@debug bit = 0 AS

And everywhere in the code where you are about to run some dynamic SQL, you should have:

IF @debug = 1
   PRINT @sql

EXEC sp_executesql @sql, @params, @par1, ...

This is absolutely essential, because every once in a while your SQL string will not do what you intended, and it may even fail to compile. If you get an error message and you don't see what the string looks like, you are in a completely dark room. On the other hand, if you see the SQL code, the error may be immediately apparent.

If your SQL string is very long, the statement may come out as truncated. The PRINT statement prints at most 8000 bytes, so with nvarchar(MAX) for your @sql variable, you will only see up to 4000 characters.
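You can verify the truncation with a quick sketch:

```sql
-- PRINT cuts an nvarchar(MAX) string after 4000 characters.
DECLARE @sql nvarchar(MAX) = replicate(convert(nvarchar(MAX), N'x'), 5000);
PRINT len(@sql);   -- 5000
PRINT @sql;        -- only the first 4000 x characters are printed
```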

One way to avoid this is to use SELECT instead of PRINT. That gives you the string in a grid cell in SSMS, and when you copy it to a text window, you may find that it is all one line. Thankfully, modern versions of SSMS have a setting to control this. In SSMS, go to Tools -> Options and check the setting Retain CR/LF on copy or save, highlighted as in this picture:

You can also see that I have increased the maximum for Non XML Data to two million (the default is 65536) to avoid truncation with very long strings. Observe that if you change these settings, they do not affect currently open query windows, but only windows you open after the change.

Note: To see the max above 65536, you need to have SSMS 18.2 or later. The checkbox for retaining CR/LF was introduced in SSMS 16 and it is unchecked by default. If you don't see this setting, you have an older version of SSMS. In SSMS 2012 and 2014, CR/LF are always retained, and in SSMS 2005 and 2008, they are always replaced by a single space. In any case, since SSMS is a free download, there is little reason to use an old version.

While this works, it is not without issues:

If you frequently copy data to Excel, you may prefer to keep the Retain setting unchecked. (Because if there are line breaks in the data, the mapping to cells in Excel will be messed up.)

There are situations where extra result sets from your stored procedure could mess things up, for instance if you use INSERT-EXEC, so you would prefer to get the code printed to the Messages tab in SSMS.

If your procedure produces several batches of dynamic SQL, it may be more practical to have them all collected in the Messages tab, rather than having to copy multiple result sets from the query grid.

I got a suggestion from Jonathan Van Houtte that addresses the first point (but not the other two), which makes use of the XML capabilities in SQL Server and SSMS:

SELECT @sql AS [processing-instruction(x)] FOR XML PATH('')

In the grid cell, you get a clickable XML string, and when you click on it, it opens in a new window where you see the SQL text with the CR/LF retained. The code is wrapped in a <?x tag, but apart from that, there is no mutilation due to the XML. Only if the SQL string happens to include the sequence ?> could you get problems.

My co-worker Jan Swedin came up with the ultimate solution to this problem: a stored procedure that breaks up the text in chunks of 8000 bytes, taking care that each chunk ends with a line break, and then he runs a PRINT on every chunk. Here is an adapted version of his procedure:

CREATE OR ALTER PROCEDURE SystemPrint @str nvarchar(max) AS
SET XACT_ABORT, NOCOUNT ON;
DECLARE @noofchunks int,
        @chunkno    int = 0,
        @pos        int = 0,   -- substring start position
        @len        int;       -- substring length
SET @str = replace(@str, nchar(13), '');
SET @noofchunks = ceiling(len(@str) / 4000.0);
WHILE @chunkno < @noofchunks
BEGIN
   SET @len = 4000 - charindex(nchar(10) COLLATE Latin1_General_BIN2,
                               reverse(substring(@str, @pos, 4000)));
   PRINT substring(@str, @pos, @len);
   SET @chunkno += 1;
   SET @pos += @len + 1;   -- accumulation + LF
END

Note: there is a known issue with this procedure. If the length of the SQL string is close to a full chunk, for instance 7998 characters, the last few characters in the string may not print. It is left as an exercise to the reader to understand how this can happen and fix the issue.

No matter which of these methods you go for, always have a debug print of your dynamic SQL. And, yes, I mean ALWAYS!

Error Messages and Line Numbers

When you work with dynamic SQL, you will ever so often encounter error messages. Sometimes the error is related to the code that generates the dynamic SQL, sometimes it comes from the dynamic SQL itself. In either case, it can be a compilation error or a run-time error. It goes without saying that you need to understand the origin of the error, so that you start looking at the right piece of code. Unfortunately, this is something which is more complicated than it has to be, due to a feature in recent versions of SSMS.

For this section we will work with the task of creating an SQL login, a database user for that login and add the user to the role PlainUsers. Input for the operation is the login name and the password. We will first look at the situation where the dynamic SQL is generated inside a stored procedure. Copy this code, which has a number of errors in it, into an empty query window.

USE tempdb
go
IF NOT EXISTS (SELECT * FROM sys.database_principals WHERE name = 'PlainUsers')
   CREATE ROLE PlainUsers
go
CREATE OR ALTER PROCEDURE add_new_user @name     sysname,
                                       @password nvarchar(50),
                                       @debug    bit = 0 AS
DECLARE @sql nvarchar(MAX)
SELECT @sql = 'CREATE LOGIN ' + quotename(@name) + '
               WITH PASSWORD = ' + quotename(@password, '''') + ' MUST_CHANGE,
                    CHECK_EXPIRATION
'
               CREATE USER ' + quotename(@name) + '
               ALTER ROLE PlainUsers ADD MEMBER ' + quotename(@name)
IF @debug = 1 PRINT @sql
EXEC sp_exectesql @sql
go
EXEC add_new_user 'nisse', 'TotaL1y#and0m', 1,

Note: If you are running these examples on SQL 2008 or earlier, you may observe that the code uses the syntax ALTER ROLE ADD MEMBER which was added in SQL 2012. However, since we are exploring error messages, this should not be of much practical importance to you.
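If you actually want to run the example on SQL 2008 or earlier, you can replace the ALTER ROLE statement in the dynamic SQL with a call to the older (and nowadays deprecated) system procedure sp_addrolemember, which performs the same task:

```sql
-- Pre-SQL 2012 alternative to ALTER ROLE PlainUsers ADD MEMBER.
EXEC sp_addrolemember 'PlainUsers', @name
```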

The first error message we encounter when we try this piece of code has no relation to dynamic SQL as such, but serves to highlight a behaviour in modern versions of SSMS that is important to understand for this section. This is the output when you run the script in SSMS 2014 or later.

The module 'add_new_user' depends on the missing object 'sp_exectesql'. The module will still be created; however, it cannot run successfully until the object exists.
Msg 102, Level 15, State 1, Line 19
Incorrect syntax near ','.

If you double-click the error message, you will find that you end up on the line with the EXEC statement after the procedure, and indeed there is a trailing comma that should not be there. And if you look in the status bar, you will find that it says Ln 19, which agrees with the error message. However, this is not the line number that SQL Server reports. Keep in mind that SSMS sends one batch, as defined by the go separator, at a time to SQL Server, which has no knowledge of your query window. The call to add_new_user is a single-row batch, so the actual line number reported by SQL Server is Line 1. This is also the line number you would see if you were to run this in SSMS 2012 or earlier. You can convince yourself of this fact by saving the script to a file and running it through SQLCMD, which makes no adjustments. Why this is important to understand will become apparent in just a few moments.

Remove the offending comma (or replace it with a semicolon) and try 