Problem

You need to insert large number of records into one or more tables in a SQL Server database. By large, I mean several hundred thousands or even millions. The source of the data can be another database, an Excel spreadsheet, CSV files, XML and literally anything. Writing one record at a time using a SqlCommand along with a INSERT INTO statement is very costly and slow. You need an efficient solution to insert large number of records quickly.

Solution

In .NET, we have a class called SqlBulkCopy to achieve this. You give it a DataTable, or an array of DataRows, or simply an instance of a class that implements IDataReader interface. Next, you tell it the name of the destination table in the database, and let it do its magic.

I used this in one of my projects were I had to migrate data from one of our legacy models into a new model. First, I used a plain SqlCommand to insert one record at a time (totally about 400,000 records). The process took 12 minutes to run. With SqlBulkCopy, I reduced the data migration time to 6 seconds!

How to use it?

Here is the most basic way to use SqlBulkCopy. Read the code first and then I’ll explain it line by line.

var dt = new DataTable(); dt.Columns.Add("EmployeeID"); dt.Columns.Add("Name"); for (var i = 1; i < 1000000; i++) dt.Rows.Add(i + 1, "Name " + i + 1); using (var sqlBulk = new SqlBulkCopy(_connectionString)) { sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dt); }

In this example I’m assuming we have a table in the database called Employees with two columns: EmployeeID and Name. I’m also assuming that the EmployeeID column is marked as IDENTITY.

In the first part, we simply create a DataTable that resembles the structure of the target table. That’s the reason this DataTable has two columns. So, to keep things simple, I’m assuming that the source DataTable and the target table in the database have identical schema. Later in this post I’ll show you how to use mappings if your source DataTable has a different schema.

The second part is purely for demonstration. We populate this DataTable with a million record. In your project, you get data from somewhere and put it into a DataTable. Or you might use an IDataReader for more efficient reads.

And finally, we create a SqlBulkCopy and use it to write the content of our DataTable to the Employees table in the database.

Pretty simple, right? Let’s take this to the next level.

Using the identity values from the source

In the above example, I assumed that the EmployeeID column is marked as IDENTITY, hence the values are generated by the Employees table. What if you need to use the identity values in the source? It’s pretty simple. You need to use the KeepIdentity option when instantiating your SqlBulkCopy.

using (var sqlBulk = new SqlBulkCopy(_connectionString, SqlBulkCopyOptions.KeepIdentity))

With this option, the EmployeeID in our DataTable will be used.

Transactions

You can wrap all your inserts in a transaction, so either all will succeed or all will fail. This way you won’t leave your database in an inconsistent state. To use a transaction, you need to use a different constructor of SqlBulkCopy that takes a SqlConnection, options (as above) and a SqlTransaction object.

using (var connection = new SqlConnection(_connectionString)) { connection.Open(); var transaction = connection.BeginTransaction(); using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction)) { sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dt); } }

Note that here we have to explicitly create a SqlConnection object in order to create a SqlTransaction. That’s why this example is slightly more complicated than the previous ones where we simply passed a string (connection string) to SqlBulkCopy. So here we need to manually create the connection, open it, create a transaction, and then pass the connection and the transaction objects to our SqlBulkCopy.

Batch size

By default, all the records in the source will be written to the target table in one batch. This means, as the number of records in the source increases, the memory consumed by SqlBulkCopy will increase. If you have memory limitations, you can reduce the number of records written in each batch. This way, SqlBulkCopy will write smaller batches to the database, hence it will consume less memory. Since there are multiple conversations with the database, this will have a negative impact on the performance. So, based on your circumstances, you may need to try a few different batch sizes and find a number that works for you.

To set the batch size:

using (var sqlBulk = new SqlBulkCopy(_connectionString)) { sqlBulk.BatchSize = 5000; sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dt); }

This example writes 5000 records in each batch.

Notifications

You might need to write a message in the console or a log as records are written to the database. SqlBulkCopy supports notifications. So it has an event to which you can subscribe. You set the number of records to be processed before a notification event is generated.

using (var sqlBulk = new SqlBulkCopy(_connectionString)) { sqlBulk.NotifyAfter = 1000; sqlBulk.SqlRowsCopied += (sender, eventArgs) => Console.WriteLine("Wrote " + eventArgs.RowsCopied + " records."); sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dt); }

In this example, once every 1000 records are processed, we get notified and display a message on the Console. Note that SqlRowsCopied is the event name and here we use a lambda expression to create an anonymous method as the event handler. If you’re not familiar with lambda expressions, delegates and event handlers, check out my C# Advanced course.

Column mappings

In all the examples so far, I assumed our DataTable has the exact same schema as the target table. What if the “Name” column in the source is actually called “FullName”. Here is how we can create a mapping between the columns in the source and target table.

using (var sqlBulk = new SqlBulkCopy(_connectionString)) { sqlBulk.ColumnMappings.Add("FullName", "Name"); sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dt); }

How about multiple tables?

So far we’ve only inserted records into one table: the Employees table. What if we wanted to populate Employees and their Timesheet? With our DataTables as the data source, we can have two data tables, one for Employees, one for Timesheets. Then, we use our SqlBulkCopy to populate one table at a time:

using (var sqlBulk = new SqlBulkCopy(_connectionString)) { sqlBulk.DestinationTableName = "Employees"; sqlBulk.WriteToServer(dtEmployees); sqlBulk.DestinationTableName = "Timesheets"; sqlBulk.WriteToServer(dtTimesheets); }

I hope you enjoyed this post and learned something new. If you enjoy my teaching style and like to learn more from me, subscribe to my blog. Also, check out my courses for more substantial learning.

Related

Tags: .net