When it comes to logging in .NET, all you generally need to do is choose between log4net and NLog and start logging. There are times, however, when logging becomes a bottleneck, which can be a big problem, especially if you are dealing with a low-latency component. This was the case with one of our applications.

The problem

We have a multi-threaded server component which produces a large number of events, most of which need to be logged. It also needs to process those events within a specified SLA, so very little latency can be tolerated. Recently, after adding a bunch of new features, the number of log entries increased by around 60%, and that was enough to introduce big latencies which made a lot of people unhappy! :-(

The app uses the RollingFileAppender, which persists the log events to disk synchronously and one by one. After some profiling sessions it was clear to me that the threads producing the log events were waiting for the appender to persist every entry to the disk, one after another.

How bad can it be?

I created a project to be able to track the progress of the solution in different scenarios. [link at the end]

The most important scenario to address was throughput. Here we have a single thread (main) producing as many log events as it possibly can within a specified duration (10 sec):

...
var sw = Stopwatch.StartNew();
long counter = 0;
while (sw.Elapsed < Duration)
{
    counter++;
    _logger.DebugFormat("Counter is: {0}", counter);
}
...

The machine running the benchmark has the following spec:

CPU 2.2GHz 4 cores

RAM 8GB

SSD Read: 282(MB/s) Write: 271(MB/s)

Before running each benchmark the Windows File Cache was cleared using the awesome RAMMap by SysInternals.

Using the RollingFileAppender :

<?xml version="1.0" encoding="utf-8" ?>
<log4net>
  <root>
    <level value="ALL"/>
    <appender-ref ref="RollingFile" />
  </root>
  <appender name="RollingFile" type="log4net.Appender.RollingFileAppender">
    <file type="log4net.Util.PatternString" value="Benchmarker-%date{yyyy-MM-dd}.log" />
    <appendToFile value="false"/>
    <rollingStyle value="Composite"/>
    <maxSizeRollBackups value="-1"/>
    <maximumFileSize value="50MB"/>
    <staticLogFileName value="true"/>
    <datePattern value="yyyy-MM-dd"/>
    <preserveLogFileNameExtension value="true"/>
    <countDirection value="1"/>
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date{ISO8601} [%-5level] [%2thread] %logger{1} - %message%newline%exception"/>
    </layout>
  </appender>
</log4net>

results in:

-------------------Testing Throughput-------------------
Counter reached: 701,017, Time Taken: 00:00:10.0000111
Gen 0: 72
Gen 1: 1
Gen 2: 0

As the numbers show, the main thread managed to log around 700k entries, and the GC behaves as expected, with many short-lived objects (most of which are of type String) collected in Gen 0. Note there is only 1 Gen 1 collection, which makes sense: the main thread is blocked until every log event it creates is persisted by the appender, one by one, so the events do not stay alive long enough to be promoted to Gen 1 and trigger more collections.

What can we do?

So it was clear that I needed some mechanism to do both asynchronous and batch logging. If logging to a file was not a requirement and we had a listener, I could simply use the UdpAppender, spend the rest of the day at the bar and maybe worry about the occasional packet drop! However, I was not that lucky :-(
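For reference, wiring up the UdpAppender is just a few lines of config. This is a hypothetical sketch; the address and port values are placeholders you would replace with your listener's endpoint:

```xml
<!-- Hypothetical sketch: RemoteAddress/RemotePort are placeholders for your listener. -->
<appender name="Udp" type="log4net.Appender.UdpAppender">
  <remoteAddress value="127.0.0.1" />
  <remotePort value="8080" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
  </layout>
</appender>
```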

Options...

The first option would be to use the BufferingForwardingAppender. I like the log4net forwarders in general, as they can be very powerful and help with all kinds of composition, filtering and logging to different destinations. To try this option, all I had to do was edit the log4net.config to look like:

<?xml version="1.0" encoding="utf-8" ?>
<log4net>
  <root>
    <level value="ALL"/>
    <appender-ref ref="BufferingForwarder" />
  </root>
  <!-- http://stackoverflow.com/a/11351400/1226568 -->
  <appender name="BufferingForwarder" type="log4net.Appender.BufferingForwardingAppender">
    <bufferSize value="512" />
    <lossy value="false" />
    <Fix value="268" />
    <evaluator type="log4net.Core.LevelEvaluator">
      <threshold value="WARN"/>
    </evaluator>
    <appender-ref ref="RollingFile" />
    <!-- or any additional appenders or other forwarders -->
  </appender>
  <appender name="RollingFile" type="log4net.Appender.RollingFileAppender">
    <file type="log4net.Util.PatternString" value="Benchmarker-%date{yyyy-MM-dd}.log" />
    <appendToFile value="false"/>
    <rollingStyle value="Composite"/>
    <maxSizeRollBackups value="-1"/>
    <maximumFileSize value="50MB"/>
    <staticLogFileName value="true"/>
    <datePattern value="yyyy-MM-dd"/>
    <preserveLogFileNameExtension value="true"/>
    <countDirection value="1"/>
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date{ISO8601} [%-5level] [%2thread] %logger{1} - %message%newline%exception"/>
    </layout>
  </appender>
</log4net>

As you may guess from the available config properties of the BufferingForwardingAppender, log entries are batched into a buffer of size 512. We also set the lossy flag to false; otherwise the forwarder would only log the last 512 log entries, and only when it sees a log entry with a threshold of WARN, as specified by:

<evaluator type="log4net.Core.LevelEvaluator">
  <threshold value="WARN"/>
</evaluator>

Note we are not using this feature in our benchmark, and we could remove the <evaluator>, but I am including it here as something you might want to try.

Finally, since this forwarder uses batching, the log events are persisted beyond the normal scope of log4net's internal Append() method, as opposed to being persisted one by one. We therefore need to fix those properties of the events that contain volatile data, such as the thread name (id), message, exception, etc. In fact, the full list of the properties that can be fixed is:

[Flags]
public enum FixFlags
{
    [Obsolete("Replaced by composite Properties")]
    Mdc = 1,
    Ndc = 2,
    Message = 4,
    ThreadName = 8,
    LocationInfo = 16,
    UserName = 32,
    Domain = 64,
    Identity = 128,
    Exception = 256,
    Properties = 512,
    None = 0,
    All = 268435455,
    Partial = Properties | Exception | Domain | ThreadName | Message,
}

It is important to note that the more properties you fix, the slower the logging will be. I have found that Partial works best in most cases, even though we almost never include Domain or Properties in our layout pattern (see HERE for the full list). For this test I am fixing Message, ThreadName and Exception, which is 4 + 8 + 256 = 268. If you find these numbers ambiguous, you can simply set the value to Message, ThreadName, Exception and it will be correctly parsed as 268 (another .NET hidden gem!).
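The flag arithmetic above is easy to verify. Here is a minimal stand-alone sketch (the enum is re-declared locally with the relevant values copied from log4net's FixFlags, so it compiles without the log4net package):

```csharp
using System;

// Values copied from log4net.Core.FixFlags for illustration only.
[Flags]
enum FixFlags { Message = 4, ThreadName = 8, Exception = 256 }

class Program
{
    static void Main()
    {
        // Combining the flags yields 4 + 8 + 256 = 268.
        var fix = FixFlags.Message | FixFlags.ThreadName | FixFlags.Exception;
        Console.WriteLine((int)fix); // 268

        // Enum.Parse accepts the comma-separated names, which is what
        // log4net relies on when parsing the <Fix> config value.
        var parsed = (FixFlags)Enum.Parse(typeof(FixFlags), "Message, ThreadName, Exception");
        Console.WriteLine((int)parsed); // 268
    }
}
```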

Okay, now that we have the config updated, let's run the same benchmark with the BufferingForwarder:

-------------------Testing Throughput-------------------
Counter reached: 2,218,724, Time Taken: 00:00:10.0015448
Gen 0: 227
Gen 1: 1
Gen 2: 0

Not bad! We managed to log around 2.2m entries, still with only 1 Gen 1 collection. So far so good; however, there are some subtleties here...

Problems with BufferingForwardingAppender

The BufferingForwardingAppender only flushes its events once the buffer is full (unless you set lossy to true), which means that if you have an application producing a single log event every second, you would have to wait 512 seconds for the batch to be flushed out. This may not be a problem for you, and if it is not, then I highly recommend using this forwarder, as it has great GC and CPU performance; but if you need real-time monitoring of every log entry and a higher throughput, then continue reading.

Another point to be aware of is that this forwarder is not asynchronous; it merely batches the events, so the application thread(s) producing the log events are still blocked while the buffer is being flushed out.

The solution

Having considered the points above, I thought I could do better so say hello to my little friend AsyncBufferingForwardingAppender .

This buffering forwarder uses a worker thread which batches up and flushes the log events in the background. It also detects when it has been idle: if there have not been enough log events to cause the buffer to be flushed, it invokes a manual flush, which addresses the problem mentioned above. And since it is developed as a forwarder, you can attach additional appenders to it; for example, it can forward the log events to the Console, a database or any other appender you might have, all at the same time.
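The core idea can be sketched roughly as follows. This is a minimal, hypothetical sketch of the producer/worker pattern, not the actual Easy.Logger implementation; the AsyncBatcher name and the 500ms idle timeout are my own assumptions:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class AsyncBatcher<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Action<IReadOnlyList<T>> _flush;
    private readonly int _batchSize;
    private readonly Thread _worker;

    public AsyncBatcher(Action<IReadOnlyList<T>> flush, int batchSize = 512)
    {
        _flush = flush;
        _batchSize = batchSize;
        _worker = new Thread(Drain) { IsBackground = true };
        _worker.Start();
    }

    // Producers only enqueue; they never block on I/O.
    public void Add(T item) => _queue.Add(item);

    private void Drain()
    {
        var batch = new List<T>(_batchSize);
        while (!_queue.IsCompleted)
        {
            // Wait up to 500ms for an item; a timeout means the producers are idle.
            if (_queue.TryTake(out var item, 500)) { batch.Add(item); }

            // Flush when the batch is full, or when idle/drained with pending events.
            if (batch.Count >= _batchSize || (batch.Count > 0 && _queue.Count == 0))
            {
                _flush(batch);
                batch.Clear();
            }
        }
        if (batch.Count > 0) { _flush(batch); } // flush any remainder on shutdown
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
    }
}
```

The point of the design is that the expensive disk write happens on the worker thread in batches, while the idle-timeout path guarantees an event never sits in the buffer for long when traffic is low.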

I will let you look at the code to find out how this forwarder works, but let's get down to the numbers.

To test this forwarder I updated the log4net.config to be:

<?xml version="1.0" encoding="utf-8" ?>
<log4net>
  <root>
    <level value="ALL"/>
    <appender-ref ref="AsyncBufferingForwarder"/>
  </root>
  <appender name="AsyncBufferingForwarder" type="Easy.Logger.AsyncBufferingForwardingAppender">
    <lossy value="false" />
    <bufferSize value="512" />
    <Fix value="268" />
    <appender-ref ref="RollingFile"/>
    <!--Any other appender or forwarder...-->
  </appender>
  <appender name="RollingFile" type="log4net.Appender.RollingFileAppender">
    <file type="log4net.Util.PatternString" value="Benchmarker-%date{yyyy-MM-dd}.log" />
    <appendToFile value="false"/>
    <rollingStyle value="Composite"/>
    <maxSizeRollBackups value="-1"/>
    <maximumFileSize value="50MB"/>
    <staticLogFileName value="true"/>
    <datePattern value="yyyy-MM-dd"/>
    <preserveLogFileNameExtension value="true"/>
    <countDirection value="1"/>
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date{ISO8601} [%-5level] [%2thread] %logger{1} - %message%newline%exception"/>
    </layout>
  </appender>
</log4net>

And the result is:

-------------------Testing Throughput-------------------
Counter reached: 3,146,474, Time Taken: 00:00:10.0000011
Gen 0: 503
Gen 1: 172
Gen 2: 0

This time we managed to log 3.1m entries, at the cost of 503 Gen 0 and 172 Gen 1 collections. The high number of Gen 1 collections is due to the asynchronous logging of this forwarder: the log events are now kept alive long enough to be promoted to Gen 1, but not long enough to end up in Gen 2. The good news is that we do not have any Gen 2 collections, which would be really costly.

Bonus point

Given that our component lives on the server, we can leverage <gcServer> to optimize the overall GC across all generations, provided we have enough RAM and more than one core; otherwise there will be no noticeable gains. To do this we need to update our app.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>

This time the benchmark says:

-------------------Testing Throughput-------------------
Counter reached: 3,379,161, Time Taken: 00:00:10.0000011
Gen 0: 9
Gen 1: 4
Gen 2: 0

Summary

In this article we saw how we progressed from the non-batching RollingFileAppender producing ~700k log events to the synchronous batching BufferingForwardingAppender producing ~2.2m log events; we then took this further by using the custom AsyncBufferingForwardingAppender to achieve 3.1 million log events in 10 seconds.

We also looked at the effect of <gcServer> in optimizing the overall collections across both Gen 0 and Gen 1, achieving a throughput of ~3.4m.

In writing this forwarder my focus was purely on throughput, in both single-threaded and multi-threaded scenarios. All of this comes at the cost of slightly more CPU usage due to Gen 1 collections. So, is the extra 1 million log events worth the extra CPU cost? That is for you to decide, but at least we now have the option :-)

Source HERE, NuGet package HERE.