It all started with the question of which approach would be faster: encrypting just the pieces of data in a file that need to be encrypted, or encrypting the entire file?

Take, for example, a books.xml file where the <author> element values must be encrypted. Is it faster to encrypt the entire file or just the individual author elements, or will it not matter because disk I/O is a huge bottleneck?

These are the types of scenarios that are easy to hypothesize over, but it’s also easy to whip up some code to produce quantitative answers and put those answers in colorful charts that can be printed on glossy paper and hung on an office wall, where a visitor’s natural reaction will be to ask “what’s this?”. At that point you can bombard your visitor with minutiae about symmetric encryption initialization vectors until, 10 minutes later, they want to leave without remembering that they had first come into your office to ask about that nasty work item #9368, which you still haven’t fixed.

That’s called victory.

But as for the code, first we need a method to time some arbitrary action over a number of iterations.

private static void Time(Action action, string description)
{
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < IterationCount; i++)
    {
        action();
    }
    stopwatch.Stop();
    Console.WriteLine("{0} took {1}ms", description, stopwatch.ElapsedMilliseconds);
}
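For reference, the harness can be exercised on its own. This is a minimal sketch, not the post's actual driver; the IterationCount value and the measured action are made up for illustration.

```csharp
using System;
using System.Diagnostics;

class TimingDemo
{
    // Placeholder repetition count; the real value isn't shown in the post.
    public const int IterationCount = 1000;

    public static void Time(Action action, string description)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < IterationCount; i++)
        {
            action();
        }
        stopwatch.Stop();
        Console.WriteLine("{0} took {1}ms", description, stopwatch.ElapsedMilliseconds);
    }

    static void Main()
    {
        // Any Action can be timed; clearing a buffer is just a stand-in workload.
        var buffer = new byte[64 * 1024];
        Time(() => Array.Clear(buffer, 0, buffer.Length), "Array.Clear");
    }
}
```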

Then some code to encrypt an entire file.

private static void EncryptFileTest()
{
    var provider = new AesCryptoServiceProvider();
    var encryptor = provider.CreateEncryptor(provider.Key, provider.IV);
    using (var destination = File.Create("..\\..\\temp.dat"))
    using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
    {
        var data = File.ReadAllBytes(FileName);
        cryptoStream.Write(data, 0, data.Length);
    }
}

Then some code to encrypt just the author fields.

private static void EncryptFieldsTest()
{
    var provider = new AesCryptoServiceProvider();
    var encryptor = provider.CreateEncryptor(provider.Key, provider.IV);
    var document = XDocument.Load(FileName);
    var names = document.Descendants("author");
    foreach (var element in names)
    {
        using (var destination = new MemoryStream())
        {
            using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
            using (var cryptoWriter = new StreamWriter(cryptoStream))
            {
                cryptoWriter.Write(element.Value);
            }
            element.Value = Convert.ToBase64String(destination.ToArray());
        }
    }
    document.Save("..\\..\\temp.xml");
}

The results of the benchmark on the small books.xml file (28 KB) showed that encrypting the individual fields generally came out 3-25% faster than encrypting the entire file.

Such wide variance made me suspect that disk I/O was too unpredictable, so I also ran tests that timed only the in-memory operations, with all disk I/O happening before the encryption work, as in the following code.

var bytes = File.ReadAllBytes(FileName);
var document = XDocument.Load(FileName);
var elements = document.Descendants("author").ToList();

Time(() => EncryptFileTest(bytes), "EncryptFile");
Time(() => EncryptFieldsTest(elements), "EncryptFields");
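The parameterized overloads aren't shown above, so here is a sketch of what they plausibly look like: the same logic as before, but with all parsing done up front and the file version writing to a MemoryStream (and returning the ciphertext so the result can be inspected). The exact shapes are my guesses based on the original methods.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Xml.Linq;

static class InMemoryTests
{
    // In-memory counterpart of EncryptFileTest: no File.Create, no ReadAllBytes
    // inside the timed region.
    public static byte[] EncryptFileTest(byte[] bytes)
    {
        using (var provider = new AesCryptoServiceProvider())
        using (var encryptor = provider.CreateEncryptor(provider.Key, provider.IV))
        using (var destination = new MemoryStream())
        {
            using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
            {
                cryptoStream.Write(bytes, 0, bytes.Length);
            }
            return destination.ToArray();
        }
    }

    // In-memory counterpart of EncryptFieldsTest, operating on pre-parsed elements.
    public static void EncryptFieldsTest(List<XElement> elements)
    {
        using (var provider = new AesCryptoServiceProvider())
        using (var encryptor = provider.CreateEncryptor(provider.Key, provider.IV))
        {
            foreach (var element in elements)
            {
                using (var destination = new MemoryStream())
                {
                    using (var cryptoStream = new CryptoStream(destination, encryptor, CryptoStreamMode.Write))
                    using (var cryptoWriter = new StreamWriter(cryptoStream))
                    {
                        cryptoWriter.Write(element.Value);
                    }
                    element.Value = Convert.ToBase64String(destination.ToArray());
                }
            }
        }
    }

    static void Main()
    {
        var xml = XDocument.Parse("<catalog><book><author>Gambardella, Matthew</author></book></catalog>");
        var elements = new List<XElement>(xml.Descendants("author"));
        EncryptFieldsTest(elements);
        Console.WriteLine(xml); // the author value is now base64 ciphertext
    }
}
```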

Now the results started to show that encrypting one big thing was regularly 20% faster than encrypting lots of little things.

The larger the input data, the faster it became to encrypt all at once.

Then, after playing with various parameters, like different provider modes, an amazing thing happened. I switched from AesCryptoServiceProvider (which provides an interface to the native CAPI libraries) to AesManaged (a managed implementation of AES that is not FIPS compliant, but that’s a topic for another post). Encrypting the entire file was 6x slower with managed code than with CAPI, which wasn’t the surprising part. The surprising part was that encrypting fields with AesManaged was much faster than encrypting the entire file with AesManaged; in fact, encrypting fields with AesManaged was almost twice as fast as encrypting fields with AesCryptoServiceProvider, and almost as fast as encrypting the entire file with a CSP.
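Both implementations derive from SymmetricAlgorithm, so the swap is a one-line change at the construction site. The sketch below is my code, not the post's; it just shows the two providers driven through the same call site.

```csharp
using System;
using System.Security.Cryptography;

class ProviderSwap
{
    // Encrypts a buffer with whichever AES implementation is passed in;
    // TransformFinalBlock applies PKCS7 padding, so output is a full block longer.
    public static byte[] Encrypt(SymmetricAlgorithm provider, byte[] data)
    {
        using (var encryptor = provider.CreateEncryptor(provider.Key, provider.IV))
        {
            return encryptor.TransformFinalBlock(data, 0, data.Length);
        }
    }

    static void Main()
    {
        var data = new byte[1024];
        // Same call sites, different implementations underneath.
        using (var capi = new AesCryptoServiceProvider())
        using (var managed = new AesManaged())
        {
            Console.WriteLine("CAPI:    {0} ciphertext bytes", Encrypt(capi, data).Length);
            Console.WriteLine("Managed: {0} ciphertext bytes", Encrypt(managed, data).Length);
        }
    }
}
```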

After double checking to make sure this wasn’t a fluke, I came to three conclusions.

1. Once again, benchmarks prove more useful than a hypothesis, because the numbers are often counterintuitive.

2. It must be much more efficient to reuse an AesManaged provider to create multiple crypto streams than to reuse an AES CSP.

3. There is still enough variability that testing against sample data like books.xml won’t cut it; I’ll need to test against real files (which might easily hit 500MB, maybe 1GB, but I hope not).
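Conclusion 2 can at least be probed directly: ICryptoTransform exposes a CanReuseTransform property, and the field-encryption loops above depend on the transform being reusable across multiple CryptoStreams. A quick probe (again my code, not the post's) prints what each implementation reports:

```csharp
using System;
using System.Security.Cryptography;

class ReuseProbe
{
    static void Main()
    {
        // Check whether each AES implementation's transform may serve
        // more than one CryptoStream, as the field-encryption loop assumes.
        foreach (SymmetricAlgorithm provider in new SymmetricAlgorithm[]
                 { new AesCryptoServiceProvider(), new AesManaged() })
        {
            using (provider)
            using (var encryptor = provider.CreateEncryptor(provider.Key, provider.IV))
            {
                Console.WriteLine("{0}: CanReuseTransform = {1}",
                    provider.GetType().Name, encryptor.CanReuseTransform);
            }
        }
    }
}
```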

This is the point where people smarter than me will tell me everything I’ve done wrong.