How to boost application performance by choosing the right serialization

Improving the performance of a .NET application with different serialization formats

Pretty much any serious application nowadays needs to interact with some storage medium, such as a disk or a network. This is where serialization comes in. It is the process of transforming your data into a stream, which is essentially an array of bytes. Streams are a sort of wrapper around byte arrays that enables reading and writing.

Depending on how heavily your application relies on IO operations, the way you serialize data can have a significant impact on its performance. You can find more about serialization in .NET specifically at https://docs.microsoft.com/en-us/dotnet/standard/serialization/

I will focus on concrete implementations of serialization and a brief performance comparison. This should help you decide which type of serialization to rely on.

Note: The code used for testing and demonstration is available on GitHub at https://github.com/dejanstojanovic/dotnet-serialization

What are the options

.NET comes with a few serialization types built in, and of course there are third-party ones. Each of them has its pros and cons. The types of serialization I am going to cover here are:

Binary

Xml

JSON

Avro

Protobuf

I am going to use a simple project that wraps and abstracts the different serialization formats behind a simple interface, and we'll use a unit test project to check the speed of the process and the size of the serialized data.

All serializers implement the following simple interface in the sample project:

    namespace Serialization.Samples.Serializers
    {
        public interface ISerializer<T> where T : class
        {
            byte[] Serialize(T objectToSerialize);
            T Deserialize(byte[] arrayToDeserialize);
        }
    }

To get results that are as accurate as possible, we'll serialize and deserialize an instance of the same model class with each serialization format:

    using System;
    using System.Runtime.Serialization;
    using ProtoBuf;

    namespace Serialization.Samples.Tests.Models
    {
        [Serializable]
        [DataContract(IsReference = true, Name = "Person", Namespace = "Serialization.Samples")]
        [ProtoContract]
        public class Person
        {
            [DataMember(Name = "Id", IsRequired = false)]
            [ProtoMember(1)]
            public Guid Id { get; set; }

            [DataMember(Name = "FirstName", IsRequired = false)]
            [ProtoMember(2)]
            public String FirstName { get; set; }

            [DataMember(Name = "LastName", IsRequired = false)]
            [ProtoMember(3)]
            public String LastName { get; set; }

            [DataMember(Name = "DOB", IsRequired = false)]
            [ProtoMember(4)]
            public DateTime DOB { get; set; }
        }
    }

So let's start with the built-in ones, since you can just add references to your project and you are good to go.

Binary

Binary serialization is the fastest serialization in .NET, but it is not the lightest one. The byte array produced by binary serialization is a lot larger than with the other formats. It is also not cross-platform compatible. If you care more about the speed of serializing data than about the cost of IO writes such as network transfer, you should pick this one.

If both ends of the communication are written in .NET and share common contract classes, you are good to go with binary serialization.

    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    namespace Serialization.Samples.Serializers
    {
        // Note: BinaryFormatter is marked obsolete in .NET 5 and later
        // and is not safe to use with untrusted input.
        public class SampleBinarySerializer<T> : ISerializer<T> where T : class
        {
            public T Deserialize(byte[] arrayToDeserialize)
            {
                BinaryFormatter binaryFormatter = new BinaryFormatter();
                using (MemoryStream memoryStream = new MemoryStream(arrayToDeserialize))
                {
                    return binaryFormatter.Deserialize(memoryStream) as T;
                }
            }

            public byte[] Serialize(T objectToSerialize)
            {
                BinaryFormatter binaryFormatter = new BinaryFormatter();
                using (MemoryStream memoryStream = new MemoryStream())
                {
                    binaryFormatter.Serialize(memoryStream, objectToSerialize);
                    return memoryStream.ToArray();
                }
            }
        }
    }

Xml

The second out-of-the-box serializer is the XML serializer. As the name says, it generates an XML string from your class instance. The XML format is a bit dated nowadays, since it has been replaced in many areas by JSON as a lighter and more readable format.

    using System.IO;
    using System.Xml.Serialization;

    namespace Serialization.Samples.Serializers
    {
        public class SampleXmlSerializer<T> : ISerializer<T> where T : class
        {
            public T Deserialize(byte[] arrayToDeserialize)
            {
                using (var memStream = new MemoryStream(arrayToDeserialize))
                {
                    return new XmlSerializer(typeof(T)).Deserialize(memStream) as T;
                }
            }

            public byte[] Serialize(T objectToSerialize)
            {
                using (var memStream = new MemoryStream())
                {
                    new XmlSerializer(typeof(T)).Serialize(memStream, objectToSerialize);
                    return memStream.ToArray();
                }
            }
        }
    }

JSON

JSON serialization is available in .NET out of the box as part of the .NET Framework, but it is rarely used. It sits in the System.Runtime.Serialization.Json namespace. You can find more about it in the Microsoft documentation at https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.json.datacontractjsonserializer Because of its poor performance, more and more projects rely on the Newtonsoft.Json NuGet package instead.

The following is a simple implementation around this package:

    using System;
    using System.Text;
    using Newtonsoft.Json;

    namespace Serialization.Samples.Serializers
    {
        public class SampleJsonSerializer<T> : ISerializer<T> where T : class
        {
            // UTF-8 is used explicitly: Encoding.Default differs between
            // platforms and can lose characters on .NET Framework.
            public T Deserialize(byte[] arrayToDeserialize)
            {
                return JsonConvert.DeserializeObject<T>(Encoding.UTF8.GetString(arrayToDeserialize));
            }

            public byte[] Serialize(T objectToSerialize)
            {
                return Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(objectToSerialize));
            }
        }
    }
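For comparison, the built-in DataContractJsonSerializer mentioned above can be wrapped behind the same interface. This is only a minimal sketch and not code from the sample repository: the class name SampleDataContractJsonSerializer and the Greeting model are made up for illustration, and the interface is repeated so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

public interface ISerializer<T> where T : class
{
    byte[] Serialize(T objectToSerialize);
    T Deserialize(byte[] arrayToDeserialize);
}

// Hypothetical wrapper around the built-in JSON serializer.
public class SampleDataContractJsonSerializer<T> : ISerializer<T> where T : class
{
    // One serializer instance per closed generic type is reused for all calls.
    private readonly DataContractJsonSerializer serializer =
        new DataContractJsonSerializer(typeof(T));

    public T Deserialize(byte[] arrayToDeserialize)
    {
        using (var memStream = new MemoryStream(arrayToDeserialize))
        {
            return serializer.ReadObject(memStream) as T;
        }
    }

    public byte[] Serialize(T objectToSerialize)
    {
        using (var memStream = new MemoryStream())
        {
            serializer.WriteObject(memStream, objectToSerialize);
            return memStream.ToArray();
        }
    }
}

// Hypothetical model used only to exercise the wrapper.
[DataContract]
public class Greeting
{
    [DataMember] public string Text { get; set; }
}

public static class DataContractJsonDemo
{
    public static void Main()
    {
        var serializer = new SampleDataContractJsonSerializer<Greeting>();
        byte[] bytes = serializer.Serialize(new Greeting { Text = "hello" });
        Console.WriteLine(bytes.Length + " bytes");
        Console.WriteLine(serializer.Deserialize(bytes).Text); // prints "hello"
    }
}
```

Because the wrapper implements the same ISerializer<T> interface, it could be dropped into the same unit tests as the other serializers to see how it compares.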

These serializers are either built into .NET or, in the case of Newtonsoft.Json, very commonly used with it. If you have special requirements, for example a small size for data that will be transferred over the wire, you should consider some of the solutions that do not come with the .NET Framework.

The following graph shows a size comparison of the serialization formats mentioned so far with Apache Hadoop Avro and Google Protocol Buffers (Protobuf) serialization.
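To get a feel for how such a size comparison can be made, the sketch below serializes the same object with two built-in serializers and prints the length of each byte array. It is not the code behind the graph: the PersonSample class and the helper names are assumptions for illustration, and only XML and the built-in JSON serializer are covered so that no NuGet packages are needed.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Xml.Serialization;

// Hypothetical stand-in for the Person model used in the article.
[DataContract]
public class PersonSample
{
    [DataMember] public Guid Id { get; set; }
    [DataMember] public string FirstName { get; set; }
    [DataMember] public string LastName { get; set; }
    [DataMember] public DateTime DOB { get; set; }
}

public static class SizeComparisonSketch
{
    // Serializes the value to XML and returns the raw bytes.
    public static byte[] ToXml<T>(T value)
    {
        using (var stream = new MemoryStream())
        {
            new XmlSerializer(typeof(T)).Serialize(stream, value);
            return stream.ToArray();
        }
    }

    // Serializes the value to JSON with the built-in data contract serializer.
    public static byte[] ToJson<T>(T value)
    {
        using (var stream = new MemoryStream())
        {
            new DataContractJsonSerializer(typeof(T)).WriteObject(stream, value);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        var person = new PersonSample
        {
            Id = Guid.NewGuid(),
            FirstName = "John",
            LastName = "Doe",
            DOB = new DateTime(1985, 3, 14)
        };

        // The payload size is simply the length of the produced byte array.
        Console.WriteLine("XML:  " + ToXml(person).Length + " bytes");
        Console.WriteLine("JSON: " + ToJson(person).Length + " bytes");
    }
}
```

The exact numbers depend on the model and its values, so it is worth running a comparison like this against your own contract classes rather than relying on a single sample.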

Google protobuf

This serialization format was developed by Google, and most of their platforms use it for communication over the wire. It comes with the protoc tool, which can be used to build model classes from a schema for various platforms such as Java and .NET. You can even use it without a schema if you take care to keep the model structure the same on both the sending and the receiving end. This is not a problem for simple models like the one we use in this sample, but for more complex models it is easier to rely on the schema and use protoc to build the models for the target platform.

Note: You can find more on how to deal with .proto files in Visual Studio and .NET Core in this article: http://bit.ly/2tDqZWQ

    using ProtoBuf;
    using System.IO;

    namespace Serialization.Samples.Serializers
    {
        public class SampleProtobufSerializer<T> : ISerializer<T> where T : class
        {
            public T Deserialize(byte[] arrayToDeserialize)
            {
                using (var memStream = new MemoryStream(arrayToDeserialize))
                {
                    return Serializer.Deserialize<T>(memStream);
                }
            }

            public byte[] Serialize(T objectToSerialize)
            {
                using (var memStream = new MemoryStream())
                {
                    Serializer.Serialize<T>(memStream, objectToSerialize);
                    return memStream.ToArray();
                }
            }
        }
    }

To use Protobuf in .NET you need to install a third-party NuGet package. I used the protobuf-net package.
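One reason Protobuf payloads tend to be small is that the wire format writes the numbers from the [ProtoMember(n)] attributes instead of property names, and it packs integers into variable-length "varints". The sketch below illustrates that encoding as described in the Protocol Buffers encoding documentation. It is a hand-written illustration only, not how protobuf-net is implemented, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;

public static class VarintSketch
{
    // Encodes an unsigned integer as a base-128 varint:
    // 7 payload bits per byte, least significant group first,
    // with the high bit set on every byte except the last one.
    public static byte[] EncodeVarint(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // A field key is a varint of (fieldNumber << 3) | wireType.
    // Wire type 2 means "length-delimited", which is used for strings.
    public static byte[] EncodeFieldKey(int fieldNumber, int wireType)
    {
        return EncodeVarint(((ulong)fieldNumber << 3) | (uint)wireType);
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(EncodeVarint(1)));      // 01
        Console.WriteLine(BitConverter.ToString(EncodeVarint(300)));    // AC-02
        Console.WriteLine(BitConverter.ToString(EncodeFieldKey(2, 2))); // 12
    }
}
```

In this scheme the key for a string field numbered 2, such as FirstName in the model above, takes a single byte, whereas JSON and XML repeat the full property name in every serialized object.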

Apache Hadoop Avro

Another serializer that does not come out of the box is Apache Hadoop Avro from the Apache Software Foundation. For this one, Microsoft provides the Microsoft.Hadoop.Avro NuGet package.

It is similar to Protobuf in that a schema can be used to generate classes for different platforms, but it can also be used without a schema if the model is simple and you use the same contract class on both sides of the wire.

    using Microsoft.Hadoop.Avro;
    using System.IO;

    namespace Serialization.Samples.Serializers
    {
        public class SampleAvroSerializer<T> : ISerializer<T> where T : class
        {
            private readonly IAvroSerializer<T> avroSerializer;

            public SampleAvroSerializer()
            {
                avroSerializer = AvroSerializer.Create<T>();
            }

            public T Deserialize(byte[] arrayToDeserialize)
            {
                using (var memStream = new MemoryStream(arrayToDeserialize))
                {
                    return avroSerializer.Deserialize(memStream);
                }
            }

            public byte[] Serialize(T objectToSerialize)
            {
                using (var memStream = new MemoryStream())
                {
                    avroSerializer.Serialize(memStream, objectToSerialize);
                    return memStream.ToArray();
                }
            }
        }
    }

Finally, I put the speeds of the unit tests from this solution in a graph.

Note: These are the speeds of the implementations/NuGet packages used for each serialization type and should not be taken as the actual speed of the serialization algorithm itself.
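If you want to take similar measurements yourself, a rough approach is to run the operation a few times as a warm-up and then average the elapsed time over many iterations with Stopwatch. The helper below is a minimal sketch under that assumption, not the measurement code from the sample repository. The names are made up, and the workload in Main is only a placeholder for a call such as serializer.Serialize(person).

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action a few times untimed (warm-up), then returns the
    // average elapsed milliseconds per call over the timed iterations.
    public static double AverageMilliseconds(Action action, int iterations, int warmup = 10)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        for (int i = 0; i < warmup; i++)
        {
            action();
        }

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        // Placeholder workload; replace with a serializer call to compare formats.
        double ms = AverageMilliseconds(() => Guid.NewGuid().ToByteArray(), 10000);
        Console.WriteLine("average: " + ms.ToString("F6") + " ms per call");
    }
}
```

The warm-up matters because the first calls typically include one-time costs such as JIT compilation, which would otherwise distort the average for whichever serializer happens to run first.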

Disclaimer

The code contained in snippets or available for download in this article is intended solely for learning and demo purposes. The author will not be held responsible for any failure or damage caused by any other usage.