UPDATE: Dapper is now open source

A few weeks ago we started investigating some performance issues at Stack Overflow.

Our web tier was running hot, often hitting 100% CPU. This was caused by a combination of factors; we made a few mistakes here and there, allowed a few expensive operations to happen a bit too concurrently and … there were some framework issues.

Google’s monster spider machine

Google loves Stack Overflow, it loves it so much that it will sometimes crawl 10 pages a second. This is totally by design; we told Google: “crawl us as fast as you can” and Google did its best to comply.

The trouble was that crawling our question pages was expensive business.

We maintain a list of related questions on every question page. This list is rebuilt once a month in an on-demand background process. The process is usually pretty fast; however, when it is happening 100,000 times a day in huge bursts … it can get a bit costly.

So, we fixed that up. We changed some stuff so Google triggers less background activity. However, performance problems persisted.

In frameworks we trust

Our question show page does a fair bit of database work. It needs to do a bunch of primary key lookups to pick up all the questions, answers, comments and participating users. This all adds up. I spent a fair bit of time profiling both production and dev; something kept showing up in my CPU analyzer traces:

System.Reflection.Emit.DynamicMethod.CreateDelegate
System.Data.Linq.SqlClient.ObjectReaderCompiler.Compile
System.Data.Linq.SqlClient.SqlProvider.GetReaderFactory
System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Compile
System.Data.Linq.CommonDataServices+DeferredSourceFactory`1.ExecuteKeyQuery
System.Data.Linq.CommonDataServices+DeferredSourceFactory`1.Execute
System.Linq.Enumerable.SingleOrDefault
System.Data.Linq.EntityRef`1.get_Entity

Much of our work at Stack Overflow depended on the assumption that LINQ-2-SQL is fast enough. Turns out it was not fast enough for us.

In the trace above you can see that EntityRef&lt;T&gt; is baking a method, which is not a problem, unless it is happening hundreds of times a second.

Build your own ORM 101

There are tons of object-relational mappers for the .NET Framework. There is an antique question on SO discussing which is best, as well as probably tens if not hundreds of similar questions.

When we started off Stack Overflow we chose LINQ-2-SQL. Both LINQ-2-SQL and the more recent magic unicorn Entity Framework 4.1 are built using a similar pattern.

You give the ORM a query, it builds SQL out of it, then it constructs a DynamicMethod to materialize the data coming back from SQL into your business objects. I am not privy to the exact implementation; it may include more or less work in these dynamic methods.
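To make the materialization step concrete, here is a minimal sketch of how such a baked method might be emitted with System.Reflection.Emit. Everything here is illustrative, not the actual LINQ-2-SQL implementation: it assumes a `Post` class with `Id` and `Text` properties and a result set whose first two columns are an int and a string.

```csharp
using System;
using System.Data;
using System.Reflection.Emit;

class Post
{
    public int Id { get; set; }
    public string Text { get; set; }
}

static class MaterializerDemo
{
    // Bake a delegate that does: new Post { Id = r.GetInt32(0), Text = r.GetString(1) }
    public static Func<IDataRecord, Post> BuildMaterializer()
    {
        var dm = new DynamicMethod("MaterializePost", typeof(Post),
                                   new[] { typeof(IDataRecord) }, true);
        var il = dm.GetILGenerator();

        il.Emit(OpCodes.Newobj, typeof(Post).GetConstructor(Type.EmptyTypes)); // new Post()
        il.Emit(OpCodes.Dup);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Callvirt, typeof(IDataRecord).GetMethod("GetInt32"));  // record.GetInt32(0)
        il.Emit(OpCodes.Callvirt, typeof(Post).GetProperty("Id").GetSetMethod());
        il.Emit(OpCodes.Dup);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Callvirt, typeof(IDataRecord).GetMethod("GetString")); // record.GetString(1)
        il.Emit(OpCodes.Callvirt, typeof(Post).GetProperty("Text").GetSetMethod());
        il.Emit(OpCodes.Ret);

        return (Func<IDataRecord, Post>)dm.CreateDelegate(typeof(Func<IDataRecord, Post>));
    }
}
```

The delegate itself is as fast as hand-written code; the expense is in building it, which is exactly the cost that shows up in the trace above when it happens per query.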

This implementation is an abstraction, and it leaks. It leaks performance, you give up some control over the queries you can run, and you have to deal with a LEFT JOIN syntax that is so crazy it can make grown men cry.
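For the uninitiated, the LEFT JOIN dance in question looks roughly like this; the `Comments` table and its `PostId` column are hypothetical here, but the `join … into … DefaultIfEmpty()` shape is the standard pattern:

```csharp
// A plain SQL "left join" becomes a group join plus DefaultIfEmpty():
var q = from p in db.Posts
        join c in db.Comments on p.Id equals c.PostId into comments
        from c in comments.DefaultIfEmpty()  // c is null when a post has no comments
        select new { p.Id, CommentText = c != null ? c.Text : null };
```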

The promise of CompiledQuery

Microsoft has been aware that in some cases you do not want your ORM regenerating the same method millions of times just to pull the same Post objects out of your DB. To overcome this there is a mechanism in place that allows you to elect to reuse the generated method across executions. Much has been written about the performance you can gain.

The trouble is that the syntax is kind of clunky, and there are certain places you cannot use compiled queries, in particular EntityRef and EntitySet.
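For reference, the compiled query shape looks roughly like this (the `MyDataContext` subclass name is illustrative; `CompiledQuery.Compile` is the real API):

```csharp
using System;
using System.Data.Linq;
using System.Linq;

// Compile once, stash in a static, reuse across requests.
static readonly Func<MyDataContext, int, Post> getPost =
    CompiledQuery.Compile((MyDataContext db, int id) =>
        db.Posts.Single(p => p.Id == id));

// usage:
// var post = getPost(Current.DB, 1);
```

You have to hoist every query into a static delegate and thread the DataContext through by hand, which is where the clunkiness comes from.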

My solution

Due to various limitations, much of our SQL is hand coded in LINQ-2-SQL ExecuteQuery blocks. This means we have lots of inline SQL similar to the following:

var stuff = Current.DB.ExecuteQuery&lt;Foo&gt;("select a, b, c from Foo where a = {0}", x);

This is the code we would always fall back to when performance was critical or we needed better control. We are all comfortable writing SQL and this feels pretty natural.

The trouble is that LINQ-2-SQL is still a bit slow when it comes to ExecuteQuery. So we started converting stuff to compiled queries.

I decided to experiment and see if I could outperform ExecuteQuery by introducing a smarter caching mechanism. Once we had a replacement we could easily convert our existing code base to use it.
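The idea, roughly (names here are illustrative, not the actual implementation): key a cache on the SQL text plus the target type, pay the method-baking cost once, and reuse the materializer on every subsequent call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;

static class CachedMapper
{
    // one baked materializer per (sql, type) pair, built lazily
    static readonly ConcurrentDictionary<string, Func<IDataRecord, object>> Cache =
        new ConcurrentDictionary<string, Func<IDataRecord, object>>();

    public static IEnumerable<T> ExecuteQuery<T>(IDbConnection cnn, string sql)
    {
        var key = sql + "|" + typeof(T).FullName;
        // BuildMaterializer<T> is the hypothetical IL-emitting step;
        // GetOrAdd means the expensive emit happens at most once per key
        var deserializer = Cache.GetOrAdd(key, _ => BuildMaterializer<T>());

        using (var cmd = cnn.CreateCommand())
        {
            cmd.CommandText = sql;
            // parameter binding from an anonymous object elided for brevity
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    yield return (T)deserializer(reader);
        }
    }

    static Func<IDataRecord, object> BuildMaterializer<T>()
    {
        // placeholder for the DynamicMethod emit sketched earlier
        throw new NotImplementedException();
    }
}
```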

So, I wrote a tiny proof-of-concept ORM (gist of rev 1). IL weaving does get tricky.

It can act as a fast replacement for both compiled queries and ExecuteQuery methods. In future, I hope to release a more complete mapper so people can reuse it.

Some simple benchmarks

Our workload at Stack Overflow is heavily skewed toward very, very cheap queries that do a simple clustered index lookup. The DB returns the data instantly.

Take this for example:

create table Posts
(
    Id int identity primary key,
    [Text] varchar(max) not null,
    CreationDate datetime not null,
    LastChangeDate datetime not null,
    Counter1 int,
    Counter2 int,
    Counter3 int,
    Counter4 int,
    Counter5 int,
    Counter6 int,
    Counter7 int,
    Counter8 int,
    Counter9 int
)

On my machine the cost of pulling out 100 random posts and turning them into Post objects is:

LINQ 2 SQL: 335 ms
LINQ 2 SQL compiled: 207 ms
LINQ 2 SQL ExecuteQuery: 242 ms
Sam’s ORM: 174 ms
Entity Framework 4.1: 550 ms
Hand coded: 164 ms

So, LINQ-2-SQL can take double the amount of time to pull out our poor posts, but that is not all. The trouble is that the extra 160ms is CPU time on the web server. The web server could simply be idle, waiting for SQL Server to generate data; instead it is busy rebuilding the same methods over and over.

You can make stuff a fair bit faster with compiled queries; however, as expected, they are still slower than hand coding. In fact there is quite a gap between hand coding and compiled queries.

Hand coding, which is fastest, is full of pain, bugs and general sadness.

post.Id = reader.GetInt32(0);
post.Text = reader.GetString(1);
post.CreationDate = reader.GetDateTime(2);
post.Counter1 = reader.IsDBNull(4) ? (int?)null : reader.GetInt32(4); // this makes me want to cry

I would take this over it any day:

var post = connection.ExecuteQuery&lt;Post&gt;("select * from Posts where Id = @Id", new {Id = 1}).First();

Another interesting fact is that Entity Framework is the slowest, and also the most awkward to use. For one, its context cannot reuse an open connection.

Other benefits

You are probably thinking: oh no, yet another ORM. I thought the world had more ORMs than blogging engines. But our use case is very narrow, and I think it makes sense.

We are using our new ORM for a specific problem: mapping parameterized SQL to business objects. We are not using it as a full-blown ORM. It does not do relationships or other bells and whistles. This allows us to continue using LINQ-2-SQL where performance does not matter, and to port all our inline SQL to use our mapper, since it is faster and more flexible.

This control means we can continue tweaking performance and adding features we have long waited for.

For example:

var awesome = connection.ExecuteQuery&lt;Post&gt;("select * from Posts where Id in (select * from @Ids)", new {Ids = new int[] {1,2,3}});

SQL Server 2008 has awesome new Table-Valued Parameters which we would love to use. Many ORMs shy away from supporting them because you need to define a table type in SQL Server. However, we are in control here, so we can make it happen.
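For a sense of what that takes with plain ADO.NET, here is a hedged sketch. It assumes a SqlConnection and a user-defined table type, here called `dbo.IntList` with a single int column `Id`, already created on the server; the type name is illustrative.

```csharp
using System.Data;
using System.Data.SqlClient;

// Build a DataTable matching the shape of the dbo.IntList table type.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in new[] { 1, 2, 3 })
    table.Rows.Add(id);

using (var cmd = connection.CreateCommand())  // connection : SqlConnection
{
    cmd.CommandText = "select * from Posts where Id in (select Id from @Ids)";
    var p = cmd.Parameters.Add(new SqlParameter("@Ids", SqlDbType.Structured));
    p.TypeName = "dbo.IntList";  // the table type must exist in the database
    p.Value = table;
    // cmd.ExecuteReader() ...
}
```

A mapper that owns its parameter binding can hide all of this behind the `new {Ids = ...}` syntax above.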

Recap

The end result of this round of optimization, which involved a massive amount of rewriting, hardware tuning and such, is that the question show page now takes only 50ms of server time to render. At its worst it was up at 200ms of server time. This performance boost was totally worth it. Optimizing LINQ-2-SQL played a key part in achieving it.

However, I would not recommend people go crazy and dump LINQ-2-SQL just because of this. My recommendation would be: measure, find bottlenecks and fix the ones that matter.