Total: 12 Average: 3.7

When adding or modifying a large number of records (10³ and more), the Entity Framework performance is far from perfect. The reasons are architectural peculiarities of the framework, and non-optimality of the generated SQL. Leaping ahead, I can reveal that saving data through a bypass of the context significantly minimizes the execution time.

Contents

Insert/Update with the Standard Means of Entity Framework

Let’s start with Insert. A standard way of adding new records to a DB is adding them to the context with the subsequent saving:

context.Orders.Add(order); context.SaveChanges();

Each call of the Add method leads to an ‘expensive’ (in terms of execution) call of the DetectChanges internal algorithm. This algorithm scans all entities in the context, compares the current value of each property with a source value that is stored in the context, and updates links between entities, etc. Until the release of EF6, disabling DetectChanges for the time of adding entities to the content was the most popular way to improve performance:

context.Orders.Add(order)); context.Configuration.AutoDetectChangesEnabled = true; context.SaveChanges();

Also, it is not recommended to store dozens of thousands of objects in the context, as well as to save data in blocks with saving context and creating a new context for each N objects, as it is shown here. Finally, the optimized AddRange method has been introduced in EF 6. The method improves performance up to the level of the Add+AutoDetectChangesEnabled binding:

context.Orders.AddRange(orders); context.SaveChanges();

Unfortunately, the methods listed above do not solve the core problem, that is: during saving data into DB, a separate INSERT query is being generated for each new record!

INSERT [dbo].[Order]([Date], [Number], [Text]) VALUES (@0, @1, NULL)

The situation with Update is the same. The following code:

var orders = context.Orders.ToList(); //.. writing new data context.SaveChanges();

leads to execution of a separate SQL query for each modified object:

UPDATE [dbo].[Order] SET [Text] = @0 WHERE ([Id] = @1)

In the most simple cases, EntityFramework.Extended can help:

//update all tasks with status of 1 to status of 2 context.Tasks.Update( t => t.StatusId == 1, t2 => new Task { StatusId = 2 });

This code is executed through a bypass of the context and generates 1 SQL query. Obviously, this solution is not universal and serves only for the recording the same value into all target rows.



In Search of Solution for the Problem

Since I hate reinventing the wheel, I started searching for the best practices of the bulk insert with EF. It seemed to be a typical task, but I failed to find a suitable solution ‘out of the box’. At the same time, SQL Server offers a range of quick insert techniques, such as the bcp utility and the SqlBulkCopy class. I will further consider the latter class in detail.

System.Data.SqlClient.SqlBulkCopy is an ADO.NET class for witting large volumes of data into SQL Server tables. It can use DataRow[], DataTable, or implementation of IdataReader as a data source.

It can:

send data to the server block-by-block with support for transactions;

execute column mapping from DataTable against the DB table;

ignore constraints, foreign keys during insert (optional).

Cons:

insert atomicity (optional);

failure to continue work after exception;

weak error processing options.

Let’s go back to our problem – absence of the SqlBulkCopy and EF integration. There is no conventional approach to solving this task, but there are several projects, such as:

EntityFramework.BulkInsert

It turned out practically not working. When I was studying Issues, I stumbled upon a discussion with Julie Lerman who described a problem similar to mine that remained unanswered.

EntityFramework.Utilities

Alive project, active community. No support for Database First, but developers promise to implement it.

Entity Framework Extensions

$300.



Integration of Entity Framework and SqlBulkCopy

Let’s try to do everything on our own. In the most simple case, the data insert from the object collection with SqlBulkCopy looks in the following way:

//entities - entity collection EntityFramework using (IDataReader reader = entities.GetDataReader()) using (SqlConnection connection = new SqlConnection(connectionString)) using (SqlBulkCopy bcp = new SqlBulkCopy(connection)) { connection.Open(); bcp.DestinationTableName = "[Order]"; bcp.ColumnMappings.Add("Date", "Date"); bcp.ColumnMappings.Add("Number", "Number"); bcp.ColumnMappings.Add("Text", "Text"); bcp.WriteToServer(reader); }

The task to implement IdataReader on the basis of the object collection is rather trivial, so I put here a link and switch to the ways of error handling during insert with SqlBulkCopy. By default, data insert is handled in its own transaction. When exception occurs, SqlException is thrown and rollback takes place. That is, data in DB will not be written at all, and ‘native’ error notifications of this class are far from being informative. For example, SqlException.AdditionalInformation can contain the following:

The given value of type String from the data source cannot be converted to type nvarchar of the specified target column. or:

SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM.

Unfortunately, SqlBulkCopy does not provide information allowing to detect a row/entity that caused the error. There in one more con – during my attempt to insert a row duplicate by the primary key, SqlBulkCopy throws exception and stops working without an option to process the situation and continue execution.

Mapping

In the case of the correctly generated instances and DB, the type correspondence checks and table filed length checks (like here) become obsolete. Instead, let’s look into column mapping handled with the SqlBulkCopy.ColumnMappings property.

If the data source and the destination table have the same number of columns, and the ordinal position of each source column within the data source matches the ordinal position of the corresponding destination column, the P:System.Data.SqlClient.SqlBulkCopy.ColumnMappings collection is unnecessary. However, if the column counts differ, or the ordinal positions are not consistent, you must use ColumnMappings to make sure that data is copied into the correct columns.

In 99% of cases, you will need to explicitly set ColumnMappings for EF (because of navigation properties and any additional properties.). The navigation properties can be eliminated with Reflection:

var columns = typeof(Order).GetProperties() .Where(property => property.PropertyType.IsValueType || property.PropertyType.Name.ToLower() == "string") .Select(property => property.Name) .ToList();

Such code is good for the POCO class without additional properties. Otherwise, we have to switch to ‘manual control’. Getting the table schema is simple as well:

private static List GetColumns(SqlConnection connection) { string[] restrictions = { null, null, "", null }; var columns = connection.GetSchema("Columns", restrictions) .AsEnumerable() .Select(s => s.Field("Column_Name")) .ToList(); return columns; }

It allows us to map source class and target table manually.

Usage of the SqlBulkCopy.BatchSize Propety and SqlBulkCopyOptions Class

SqlBulkCopy.BatchSize:

BatchSize Number of rows in each package. In the end of each package, the number of rows it contains is being sent to the server.

qlBulkCopyOptions — enumeration:

Member name Description CheckConstraints Check constraints while data is being inserted. By default, constraints are not checked. Default Use the default values for all options. FireTriggers When specified, cause the server to fire the insert triggers for the rows being inserted into the database. KeepIdentity Preserve source identity values. When not specified, identity values are assigned by the destination. KeepNulls Preserve null values in the destination table regardless of the settings for default values. When not specified, null values are replaced by default values where applicable. TableLock Obtain a bulk update lock for the duration of the bulk copy operation. When not specified, row locks are used. UseInternalTransaction When specified, each batch of the bulk-copy operation will occur within a transaction. If you indicate this option and also provide a SqlTransaction object to the constructor, an ArgumentException occurs.

Optionally, we can enable the check for triggers and limitations on the DB side (disabled by default). When BatchSize and UseInternalTransaction are specified, data is sent to the server by blocks in separate transactions. Therefore, all successful blocks will be being saved to DB until the first erratic block.



Advanced Insert with MERGE

SqlBulkCopy can only add records to a table, and does not provide any functionality for modification of existing records. Nevertheless, we can speed up execution of the Update operations! How? – We can insert data into an empty table, and then synchronize tables with help of the MERGE statement, that was initially introduced in SQL Server 2008:

MERGE (Transact-SQL) Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.

With MERGE, we can easily implement various duplicate processing logic: updating data in the target table, or ignoring (even deleting) identical records. Therefore, we can save data from the EF object collection into DB, according to the following algorithm:

Creation/deletion of a temporary table that is fully identical to the source table. Inserting data with SqlBulkCopy into the temporary table. Using MERGE to add records from temporary table to the target table.

Let’s consider steps 1 and 3 in detail.

Temporary table

We need to create a table in DB that would repeat schema of a table for data insert. Creating copies manually is the worst option, since the further comparison and synchronization of table schemas will fall on your shoulders. More reliable option is to copy schema programmatically right before the insert. For example with help of SQL Server Management Objects (SMO):

Server server = new Server(); //SQL auth server.ConnectionContext.LoginSecure = false; server.ConnectionContext.Login = "login"; server.ConnectionContext.Password = "password"; server.ConnectionContext.ServerInstance = "server"; Database database = server.Databases["database name"]; Table table = database.Tables["Order"]; ScriptingOptions options = new ScriptingOptions(); options.Default = true; options.DriAll = true; StringCollection script = table.Script(options);

It’s worth paying attention to the ScriptingOptions class containing several dozens of parameters for a profound setup of the generated SQL. We will deploy the resulting StringCollection in String. Unfortunately, I haven’t found a better solution than replacing name of the source table with name of the temporary table, like String.Replace(«Order», «Order_TEMP»). I will be grateful for suggesting a better solution for creating a table copy within a single DB. Let’s execute the scrip in any convenient way. Table copy has been created!

Nuances of SMO usage in .NET 4+

Note, that call of Database.ExecuteNonQuery in .NET 4+, throws an exception, like:

Mixed mode assembly is built against version ‘v2.0.50727’ of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

The reason is that a great SMO library exists only for .NET 2 Runtime. Fortunately, there is a workaround:

...

The

other option is to use Database.ExecuteWithResults .

Copying data from temporary table to target table

The only thing left is to execute the MERGE statement that compares contents of the temporary and target tables and performs update or insert (if required). For instance, for the [Order] table, the code may look in the following way:

MERGE INTO [Order] AS [Target] USING [Order_TEMP] AS [Source] ON Target.Id = Source.Id WHEN MATCHED THEN UPDATE SET Target.Date = Source.Date, Target.Number = Source.Number, Target.Text = Source.Text WHEN NOT MATCHED THEN INSERT (Date, Number, Text) VALUES (Source.Date, Source.Number, Source.Text);

This SQL query compares records from the [Order_TEMP] temporary table with the records of the [Order] target table, and executes Update, if a record with identical value in the Id field is found, or Insert, if such record was not found. Let’s execute the code in any convenient way, and that’s it! Don’t forget to clean/delete temporary table.



Performance Comparison

Execution environment: Visual Studio 2013, Entity Framework 6.1.1. (Database First), SQL Server 2012. The [Order] table was used for testing (the table schema was provided above). Execution time for approaches to saving data to DB described in this article has been measured. The results are provided below (time is specified in seconds):

Insert

Method of changes fixation in database Number of records 1000 10000 100000 Add + SaveChanges 7,3 101 6344 Add + (AutoDetectChangesEnabled=false) + SaveChanges 6,5 64 801 Add + separate context + SaveChanges 8,4 77 953 AddRange + SaveChanges 7,2 64 711 SqlBulkCopy 0,01 0,07 0,42

Wow! If we use the Add method for adding to the context and SaveChanges for saving, the saving of 100000 records to DB will take almost 2 hours! Meanwhile, SqlBulkCopy copes with the same task for less than a second!

Update

Method of changes fixation in database Number of records 1000 10000 100000 SaveChanges 6,2 60 590 SqlBulkCopy + MERGE 0,04 0,2 1,5

Again, SqlBulkCopy is a hands-down leader. The source code of the test application is available at GitHub.



Conclusion

When working with the context, containing large number of objects (10³ and more), refusal of the Entity Framework infrastructure (adding to context + saving the context) and switch to SqlBulkCopy for adding records to DB can improve performance in dozens, and even hundreds of times. However, IMO, ubiquitous usage of the EF+SqlBulkCopy binding is an obvious signal that there is something wrong with the architecture of your application. The approach described in this article should be considered as a simple mean for improving performance in bottlenecks of the already written systems, if changing architecture/technology is for some reason is challenging. Any developer who uses Entity Framework must know strong and weak points of this tool. Good luck!

References:

EntityFramework: Add, AddRange

Secrets of DetectChanges

Performance Considerations for Entity Framework 4, 5, and 6

Entity Framework Performance

Entity Framework and slow bulk INSERTs

SqlBulkCopy

Using SqlBulkCopy To Perform Efficient Bulk SQL Operations

SqlBulkCopy+data reader

Creating a Generic List DataReader for SqlBulkCopy

SqlBulkCopy for Generic List

SqlBulkCopy+ MERGE

C# Bulk Upsert to SQL Server Tutorial