It is, I hope, well known that naïve string concatenation in a loop is a quadratic “Schlemiel” algorithm. (If you haven’t read Joel’s essay, stop reading this and go read it now.)

Even though strings in .NET are length-prefixed as Joel discusses in his essay, it’s still the case that naïve string concatenation is slow in C#. If you have:

    string s = "";
    foreach (var item in list)
        s = s + M(item);

then we have a Schlemiel algorithm. If each item produces, say, 10 chars, then the first time through the loop we copy 10 chars. The second time we copy the original 10 and the new 10, for 20. The third time we copy the last 20 and the new 10, for 30. The fourth time, we copy 40. The fifth time, 50. Add those all up: 10 + 20 + 30 + 40 + 50 … and in order to get a 1000 char string, you end up copying over 50000 chars. That’s a lot of copying and a lot of pressure on the garbage collector. The right way to do this is of course:

    StringBuilder sb = new StringBuilder();
    foreach (var item in list)
        sb.Append(M(item));
    string s = sb.ToString();

(Alternatively, string.Concat(list.Select(item => M(item))) would also work, but that uses similar techniques behind the scenes anyway.)
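To spell out the arithmetic for the naïve loop: with 10 chars per item, building a 1000 char string takes 100 iterations, and iteration i copies 10·i chars. The symbols n (number of items) and k (chars per item) in the general form are my own notation, introduced only for illustration.

```latex
\sum_{i=1}^{100} 10\,i \;=\; 10 \cdot \frac{100 \cdot 101}{2} \;=\; 50\,500
\qquad\text{and in general}\qquad
\sum_{i=1}^{n} k\,i \;=\; \frac{k\,n\,(n+1)}{2} \;=\; O(n^{2})
```

So the naïve loop copies 50500 chars to produce 1000, and the cost grows with the square of the number of items, whereas writing each char into its final place once would cost only k·n.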

The StringBuilder object is carefully designed to have linear, not quadratic, performance in this scenario. (How it achieves this has changed over the years; in the past it has used a double-when-full strategy. More recently I believe it uses a linked list of large blocks. This is a nice illustration of how a clean API design allows you to radically change implementation details without breaking users.) The question then is: why does the C# compiler not simply transform the first code into the second code automatically for you, if that’s so much more efficient?
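To make the double-when-full idea concrete, here is a minimal sketch of a builder that grows its buffer by doubling. It is emphatically not the real StringBuilder implementation; the class name DoublingBuilder, the starting capacity of 16 and the CharsCopiedOnGrow counter are all hypothetical, chosen only to show why the total copying stays linear.

```csharp
using System;

// Hypothetical illustration of a double-when-full buffer.
// This is NOT how System.Text.StringBuilder is actually implemented.
public sealed class DoublingBuilder
{
    private char[] buffer = new char[16]; // assumed starting capacity
    private int length = 0;

    // Chars moved while resizing; tracked only to make the cost visible.
    public long CharsCopiedOnGrow { get; private set; }

    public DoublingBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value)) return this;

        int required = length + value.Length;
        if (required > buffer.Length)
        {
            // Double the capacity until the new text fits.
            int newCapacity = buffer.Length;
            while (newCapacity < required) newCapacity *= 2;

            char[] bigger = new char[newCapacity];
            Array.Copy(buffer, bigger, length);
            CharsCopiedOnGrow += length;
            buffer = bigger;
        }

        // Write the new chars directly into their final position.
        value.CopyTo(0, buffer, length, value.Length);
        length = required;
        return this;
    }

    public override string ToString() => new string(buffer, 0, length);
}
```

Because each resize at least doubles the capacity, the resizes form a geometric series: in this sketch, once the text has outgrown the starting buffer, the chars moved by resizing should total less than twice the final length, instead of growing with the square of the number of appends.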

That’s a good question. When I was on the JScript.NET team back in the day, we added this optimization to that language. But in that case we had the expectation that JScript.NET programmers, coming from a JavaScript background, would not be familiar with the .NET framework. We have the expectation that C# programmers will be more familiar with the tools in the .NET framework, which is a point against the optimization. Also, it’s not at all clear when the optimization is actually an optimization. Both string concatenation and string builders are very fast, and there definitely are scenarios where one is faster than the other. Making that choice on behalf of the user seems wrong; C# developers make the reasonable assumption that the code they write is the code that runs. On yet a third hand, some Java compilers apparently perform a similar optimization by default, so maybe there are good reasons for this to happen in C# as well. In any event, the optimization is not there today, so use StringBuilder if you have a lot of concatenation to do.