Key Takeaways

Significant performance improvements have come to .NET since 4.0, making it worthwhile to revisit assumptions based on older versions of the .NET Framework.

Garbage collection is a recurring theme in high performance scenarios, which led to many CLR and language improvements such as ref returns and ValueTask.

Profiling APIs that give more granular metrics on memory allocations would be a major improvement over the current APIs.

.NET contains a rich set of parallel programming APIs and libraries such as the Task Parallel Library, Rx.NET and Akka.NET. The challenge in this area is educating users and making abstractions easier to use for the wider community.

.NET as an ecosystem has the required pieces to support IoT/small devices scenarios: .NET Native, .NET Standard and portable CLR components.

Interest in .NET performance has increased recently with the advent of .NET Standard and the new platforms now open to .NET applications. .NET is not traditionally recognized as a platform for writing high performance applications; it is instead generally associated with enterprise applications.

There are several trends over the past years changing the landscape significantly. First and foremost, .NET Standard opens the door for a wide range of platforms such as mobile phones and IoT devices. These devices offer new challenges, being much more resource constrained than desktops and servers.

Meanwhile, being able to run server applications on Linux represents both a significant opportunity and uncharted territory for developers. .NET Core has yet to accumulate success stories for running high performance applications. Several new trends are also picking up steam in the .NET ecosystem: microservices, containers and serverless. Each brings its own set of performance requirements.

Panelists

Ben Watson - author of the books Writing High-Performance .NET Code and C# 4.0 How-To

Sasha Goldshtein - the CTO of Sela Group, a Microsoft C# MVP and Regional Director

Matt Warren - C# dev who loves finding and fixing performance issues

Maoni Stephens - the main developer of the .NET GC

Aaron Stannard - Founder and CTO at Petabridge

InfoQ: From your experience, where is .NET at on the performance aspect? How does it compare to the other mainstream platforms?

Ben Watson: .NET is in a strong position, and getting stronger. The CLR team takes performance seriously and has made tremendous gains in the last few years in improving the performance of many aspects of .NET, such as JIT and the garbage collector. My product in Microsoft has driven some of those changes, and it is gratifying seeing these improvements benefit the whole world, not just the largest applications in Microsoft. There are some weaknesses, of course. In an online world, where every request matters, things like JIT and GC can get in the way of extreme performance. There are solutions--but it can take significant effort, depending on how important that last bit of performance is to you. Performance is always work, and any platform is about tradeoffs. .NET gives you some incredible features in the runtime, but you have to play by its rules to get the most out of it. The highest levels of performance require the highest levels of engineering. Other platforms will have different tradeoffs, but the engineering effort will have to be there regardless.

Sasha Goldshtein: Traditionally, .NET has had the reputation of being a platform for LOB and web applications, where performance isn't as critical as it is for games, embedded systems, and real-time message processing. I think the focus has been shifting over the last few years, as more people are starting to realize that the convenience of C# and F#, together with the power of the .NET Framework and the potential ability to use the same codebase on non-Microsoft operating systems, makes it worthwhile to invest in developing high-performance systems with .NET. Microsoft has made quite a few investments in platform performance. To cite some examples: .NET Native was introduced a few years ago to improve startup times and reduce memory usage for client apps; .NET 4.5 and 4.6 saw important improvements to the scalability of the garbage collector; .NET 4.6 had a revamped JIT compiler with support for vectorization; C# 7 introduced ref locals and ref returns, which are features designed to allow for better performance on the language level. All in all, it would probably be easier for me personally to write a small high-performance application in a lower-level language like C or C++. However, introducing unmanaged code into an existing system, or developing a large codebase with these languages, is not a decision to be taken lightly. It makes sense to bet on .NET for certain kinds of high-performance systems, as long as you are aware of the challenges and have the tools to solve them on the code level and when facing issues in the production environment.

Matt Warren: .NET is often compared to Java as they are very similar runtimes, i.e. they both have a JIT compiler, a Garbage Collector (GC) and support similar object-orientated languages (C# v. Java). I would say that overall the 2 runtimes have similar performance characteristics, but there are of course areas that each excels in. It's often quoted that the Java Hotspot Compiler (JIT) performs more advanced optimisations than the .NET JIT. It also has additional features that help performance: for instance, it will initially interpret a method (to improve start-up time), then at a later point in time it may compile it to machine code. However, on the flip side, for a long time now .NET has supported custom Value Types (structs), which allow you greater control over memory layout and can help reduce the effects of the GC. Java might be getting its own version of Value Types in the near future, but it doesn't have them yet. I think that .NET is a solid platform, which performs well across the board. There will always be benchmarks that show it's slower than Platform X, but I think that for the types of workloads it is used for, it performs well.

Maoni Stephens: GC is without a doubt one of the most mentioned areas when talking about performance. Since I work on it, my answers on this panel will be mostly focused around the context of the GC. While some folks might have the impression that .NET was associated with high productivity and not high performance, there have been many products written on .NET that have very high performance requirements. To make sure our GC can handle these requirements, before we ship a significant feature we always test it with our internal partner teams that handle some of the most stressful workloads in the world like Bing or Exchange. This way we don’t only rely on micro/macro benchmarks which, while very useful, can be seriously lacking in properly showing the performance in real world scenarios. The .NET GC has been highly tuned for more than a decade and is a very competitive GC compared to the ones in other mainstream platforms. Of course, GC is only part of the framework. When someone talks about GC performance, they really mean "How this GC handles the workload I give to it". Naturally, if a workload puts a lot of stress on a GC, it means longer and/or more frequent GC pauses will likely occur than another workload that makes it not so stressful for that GC. And historically the way the .NET framework was implemented and how folks tend to write code in .NET can pretty easily put very much stress on our GC. While we always keep making improvements to our GC so it can handle more types of workloads and more stressful ones, folks who have high performance requirements always need to measure and understand what factors are affecting performance in their applications. This is the case no matter what framework/libraries you use; in other words, you do need to understand how your workload behaves if you want to achieve and maintain high performance. This blog entry explains comparing only the GC performance if you are interested in that. 
The fact that the GC is part of a mature framework does present challenges that newer frameworks may not have. We already have a lot of libraries written and in use for a long time, and we generally try very hard not to break existing scenarios with new features. And if we must because we are providing something that’s used only by a small percentage of our users who are willing to put in more effort for high performance, we still want to make it as safe as possible as that’s one of the main qualities of our framework. I’ve seen some of the newer frameworks advocating features that sound performant but really put an unrealistic reliability burden on the users. That’s not the approach we want to take. Our customers are always asking for more performance from us because they are pushing the limit on their side. I’ve heard many folks who used to use .NET say that they really missed it when they had to switch to another framework for reasons like job changes. So I am very excited that now .NET is running on more OSs. As mentioned above, due to the way .NET libraries and applications were/are written, they do tend to generate a lot of work for the GC so I believe we need to keep improving both the GC performance and tooling that helps our users in measuring perf in the foreseeable future. In the meantime, we are also looking at investing in more features that can reduce pressure on the GC.

Aaron Stannard: Most of the “performance” discrepancies come down to frameworks implemented on top of it. JavaScript is not an inherently “fast” language for a multitude of reasons, yet Node.JS smokes ASP.NET in all of the TechEmpower benchmarks. Why? Largely because ASP.NET is built on technology that is decades old, specific to Windows, and originally implemented using a synchronous design. Node.JS, on the other hand, was designed to be asynchronous from the start and incorporated many other performance-first choices from the beginning. There’s nothing technically stopping .NET from doing the same, which is the raison d’etre for Kestrel and ASP.NET Core. The C# language and the CLR itself are immensely fast, even though ASP.NET might be slow. The CLR supports value types, which are a huge advantage over the JVM when it comes to performance since we can have essentially free allocations for those types (up to a certain extent…) The CLR has had many improvements made to it over the past several years that make it much smarter at runtime; for instance, the garbage collector has become very good about defragmenting memory as it collects garbage which helps speed up subsequent memory access significantly. The CLR JIT can also use profiling information to make bets about how best to compile a local function at runtime, such as whether to use a jump table or a simple if statement on a C# interface that has only a small number of implementations. The C# language itself also has the benefit of being able to give the developer more control over how the compiler behaves - you can provide directives with respect to inlining functions, you can pin memory off heap and out of range of the garbage collector, and you can even treat CLR allocated objects as unsafe and go back to C-style pointer arithmetic if needed. The point being, with C# and the CLR you get more flexibility with respect to performance tuning than any other managed language.

InfoQ: What challenges did you face writing high performance applications in .NET?

Ben Watson: The biggest challenge was, and in some respects, still is garbage collection. When most people first start programming in .NET, they really take it for granted and don't worry about it. For most apps, it isn't really relevant unless you have a truly terrible allocation pattern. However, the GC problem shouldn't be overlooked, especially in high-performance systems that never stop processing. Programs like these need to be written quite differently than standard applications. For example, when your default coding pattern results in allocating gigabytes of memory per second, how do you change things so that your program isn't primarily a driver for the garbage collection system? When you count every millisecond, when will it have time to collect? It's not about avoiding GC completely, but making it cost little enough so that it doesn't matter, so that it's not the long pole in performance analysis. It can take very significant engineering efforts to achieve this state, but the results are amazing. It really is just as possible to achieve amazing performance in managed code as it is in the unmanaged world--the tricks are different, but they're still there. Doing this requires a fundamental shift in how many people think of writing in managed code--they have to live up to their engineering title. In some ways, teaching this engineering mindset is the hardest part. We often take our toolsets for granted, and you can't afford to do so when performance really matters.

Sasha Goldshtein: As with any managed platform, the garbage collector is a source of slowdowns and pauses for many .NET applications. When heaps grow larger, reaching into the tens and hundreds of gigabytes, it becomes more difficult to manage GC-induced pauses, considering that .NET doesn't have a fully concurrent collector (so some collections require a complete suspension of the application's threads). The result is that many people reach for using unmanaged memory, pooling objects rather than freeing them, or very carefully tuning the system so that full GCs do not occur. This is unfortunate; in my opinion, the GC should work for the application developers, but it is sometimes the other way around. Another source of performance challenges is the variety of landmines lurking in various C# language constructs and .NET library APIs. For example, it is often very hard to predict which library method is going to allocate memory, or how to use a language construct in such a way that will minimize memory allocations. Finally, certain kinds of operations are still easier to do in C/C++ -- pointer manipulations, advanced vector instructions (not currently covered by System.Numerics.Vectors), generic numeric code. Getting the same result in C# often requires unnatural constructs like unsafe code, code generation from templates, or even emitting IL/assembly dynamically.

Matt Warren: I've worked on several projects where managing the effects of the Garbage Collector (GC) was an issue. Systems that were running for hours or days at a time allocated a lot of memory, and at some point the GC had to clean it all up. These pauses became a problem because the application had quite tight latency requirements (sub 40 msecs). So learning how to detect, measure and then mitigate the effect of the GC was our biggest performance challenge.

Maoni Stephens: GC is driven by allocations and survivals. And we want to help our framework assembly authors and customers find out info on these so they can in turn help make the GC’s job easier. Over the years we talked about various ideas of presenting allocations to our customers. I remember at one point we talked about giving APIs “allocation scores” but that didn’t go anywhere simply because when you take different parameters or code paths in an API the allocations it does can be drastically different. We definitely have developed tools that help you with allocations (we have ETW events and tools that present those events in a meaningful way) but it would be really nice to be able to see how many bytes I’ve allocated when I make an API call during development; or from point A to point B how many bytes I’ve allocated myself. The runtime does provide this info but I have not seen an IDE exposing it. Another thing we worked on was a view in PerfView to help with seeing how older generation objects hold onto objects in younger generations, i.e., make them survive. This plays a big part in ephemeral GC cost. I don’t think we exposed this externally yet (but the code is in PerfView, search for CROSS_GENERATION_LIVENESS) as we didn’t have time to make the implementation solid enough. This would allow you to find out fields of objects that you can set to null to let the GC know you are done with holding something alive when a GC happens naturally (i.e., not when you induce a GC yourself which doesn’t reflect what inter-generational references GC sees accurately).
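The "bytes allocated from point A to point B" idea can be approximated in code today. The following is a minimal sketch, assuming a runtime that exposes GC.GetAllocatedBytesForCurrentThread (to my knowledge, .NET Framework 4.8 and recent .NET Core versions); the AllocationProbe class and its Measure helper are hypothetical names used only for illustration, and this is not necessarily the mechanism the panelist had in mind.

```csharp
using System;

// Hypothetical helper: reports how many bytes the current thread
// allocated while running a delegate.
public static class AllocationProbe
{
    public static long Measure(Action action)
    {
        // Point A: allocation counter for this thread before the work.
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        // Point B: allocation counter for this thread after the work.
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }

    public static void Main()
    {
        long bytes = Measure(() =>
        {
            // A 64 KB array, kept alive so the allocation is not optimized away.
            var buffer = new byte[64 * 1024];
            GC.KeepAlive(buffer);
        });

        Console.WriteLine($"Allocated at least {bytes} bytes between A and B");
    }
}
```

The counter is per thread, so allocations made by other threads (for example inside Task.Run) are not included; that limitation is one reason a tool such as PerfView remains the better choice for whole-process analysis.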

Aaron Stannard: In general I’ve found that the underlying CLR itself is further behind on performance instrumentation than other platforms. This is one of the reasons why I wrote NBench, a performance testing framework for .NET applications. One example: the Java Runtime Environment exposes the Java Management Extensions (JMX) toolchain which makes it very easy for developers to access detailed performance statistics at runtime either via a GUI or via the JMX API. Many of the big management and monitoring tools for large scale JVM applications are built directly on top of this; DataStax OpsCenter for Apache Cassandra springs to mind as an example. You can’t even have a conversation about “good,” “bad,” or “better” performance without quantifying it and my experience doing this on the CLR has been painful, to say the least. The current built-in solutions for performance on .NET, Event Tracing for Windows (ETW) and Performance Counters, are a bit awkward to implement correctly and reliably at scale compared to JMX. And you won’t be able to use either of those on .NET Core as they’re Windows-specific technologies, so as far as I know there’s not much in the way of built-in options for monitoring .NET Core apps today. The other big challenge I had with building high-performance systems in .NET was understanding the performance of the other .NET libraries and components I depended on; since most developers don’t know how to performance test or even performance profile their code, you’re taking a big risk anytime you include a third party library in your application. I remember installing a StatsD monitoring component in one of my large scale .NET applications (100s of millions of transactions per day) and the library spiked my CPU utilization by about 30%. Turns out it was doing some weird stuff with the underlying UDP socket and I had to patch it to bring the performance back in line with what we deemed acceptable.

InfoQ: How well does .NET handle parallelism and concurrency? Is there room for major improvements?

Ben Watson: The Task Parallel Library was a huge leap forward for the platform. It completely transformed concurrent programming in .NET. We use it in the online platform team to create a massively scalable application to handle real-time queries. A correctly written TPL app can easily scale for better performance just by adding more processors, something we've seen in practice multiple times--no code changes needed. That said, writing massively scalable systems is still a very hard problem. It requires a deep understanding of the hardware, the operating system, and the internals of the CLR. The higher your expectations, the more you have to scrutinize each layer. The abstractions are getting higher and easier to develop with, but they're not yet at the level that the average beginning programmer can make good use of them with efficiency and correctness. From a practical point of view, achieving perfect parallelism is often about avoiding any possibility of making a blocking call. In this area, the tools can use some improvement. We need better ways to highlight problematic areas in code and suggest ways to remove these performance roadblocks. Unless you are coding in a highly structured framework that limits what you can do, it is quite easy to ruin your performance by injudicious use of common APIs. If we can combine these types of tools with a platform that does the heavy lifting, concurrency-wise, for you, I think it will go a long way towards making concurrency easy and effective for the masses.

Sasha Goldshtein: The .NET Framework has a rich set of APIs for parallelism and concurrency: the Task Parallel Library, TPL DataFlow, and the async/await language features. These cover data parallelism, when you want to take a large collection of items and apply a certain transformation in parallel, as well as less-structured task parallelism, where you have a set of tasks you want to arrange in a pipeline and execute concurrently. Finally, there's also good coverage of asynchronous I/O operations, which are critically important for high-performance server code. At the same time, there hasn't been a lot of progress recently on the GPGPU end, to allow .NET applications to offload generic computation to the GPU; on the hybrid front, with internal processing cards like Xeon Phi; and on the vectorization front, with more advanced vector instructions than what's currently available in System.Numerics.Vectors. Writing highly-parallel CPU-bound applications using .NET is still restricted to mainstream processors (limited to a relatively small number of sockets and cores) and becomes very difficult when you add hybrid compute engines to the system.

Matt Warren: We're pretty fortunate in .NET: there are quite a lot of language and runtime features that help us out, from the async/await keywords and Task/Task<T> right down to the highly-tuned thread pool. However, it seems like the major improvements are now coming in higher level libraries, for instance Akka.NET, which implements the 'Actor' concurrency model.

Maoni Stephens: In V1.0 we shipped Server GC flavor that was a parallel GC, and Workstation GC flavor that was not parallel but was concurrent to a degree. In V4.0 we evolved Workstation GC to make it a lot more concurrent and in V4.5 we made that available for Server GC (ie, Background Server GC) so it became both parallel and concurrent which gave us fantastic improvements in large Server workloads – full GC collection pauses on large heaps went from seconds for blocking gen2’s to generally 10s of ms for background gen2’s. We are always working on tuning our existing GC and we will be introducing major improvements to help us break some barriers like reducing the frequency of full blocking GCs to be extremely low (to the point most people don’t need to worry about it) and making the pause time really short without sacrificing the throughput too much.

Aaron Stannard: .NET has some of the best tools on the planet for concurrency and parallelism: the Task Parallel Library (TPL), async / await, asynchronous garbage collection, Reactive Extensions (Rx), and lots of OSS libraries like Akka.NET. There is always room for improvement; the recent addition of the ValueTask type to the TPL, for instance, is a huge boon as it allows some tasks to be returned allocation-free. Which reminds me: one of the major parts of concurrency and parallelism that can be improved from a performance perspective is educating our end-users on how best to use these tools. There’s a cargo cult mythology around async / await in particular right now that leads some developers to believe that async is a keyword for “turbo mode: engage!” and it results in code like this being written:

    await Task.Run(() =>
    {
        // code that doesn't benefit from concurrency
    });

    await Task.Run(() =>
    {
        // more code that doesn't benefit from concurrency
    });

Concurrency and parallelism aren’t magic; they’re concepts that have to be applied in the right places in order for them to be effective. .NET as a community has not emphasized learning what these places are, which is why many of us build slow and wasteful applications despite having access to these amazing tools. So in other words, the documentation and human language around these high performance and concurrent technologies is a feature that is sorely missing from the conversation today.
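To illustrate the ValueTask point, here is a minimal sketch of a method that completes synchronously on a cache hit and therefore does not need to allocate a Task for that path. The PriceCache class, its method names and the fake price calculation are all hypothetical; ValueTask<T> is assumed to be available (in-box on recent .NET Core, via the System.Threading.Tasks.Extensions package on older targets), and async Main assumes C# 7.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical cache used only to show where ValueTask<T> helps.
public sealed class PriceCache
{
    // Note: Dictionary is not thread-safe; a real service would likely
    // use ConcurrentDictionary or external synchronization.
    private readonly Dictionary<string, decimal> _cache = new Dictionary<string, decimal>();

    public ValueTask<decimal> GetPriceAsync(string symbol)
    {
        // Hot path: the value is already known, so wrap it directly.
        // No Task object is allocated here.
        if (_cache.TryGetValue(symbol, out decimal cached))
            return new ValueTask<decimal>(cached);

        // Cold path: fall back to a genuinely asynchronous operation.
        return new ValueTask<decimal>(LoadAndCacheAsync(symbol));
    }

    private async Task<decimal> LoadAndCacheAsync(string symbol)
    {
        await Task.Delay(10);                 // stand-in for real I/O
        decimal price = symbol.Length * 1.5m; // fake result for the sketch
        _cache[symbol] = price;
        return price;
    }

    public static async Task Main()
    {
        var cache = new PriceCache();

        decimal first = await cache.GetPriceAsync("ABCD");       // miss: async path
        ValueTask<decimal> second = cache.GetPriceAsync("ABCD"); // hit: sync path

        Console.WriteLine($"first={first}, hit completed synchronously={second.IsCompletedSuccessfully}");
    }
}
```

The design choice here is the opposite of wrapping everything in Task.Run: work that is already cheap and synchronous stays synchronous, and only the real I/O path pays for asynchrony.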

InfoQ: Does .NET Core bring any advantage over the desktop .NET framework at the performance level?

Ben Watson: The fundamental parts of the CLR that matter for performance are largely the same. The biggest advantage I see is that .NET Core is where the love is right now--that's where the mindshare is focused. Desktop .NET isn't an afterthought, exactly, but there's definitely a desire to see more people move to .NET Core. If there are bugs to be fixed, or improvements to be made, it is more likely to happen in .NET Core first before making its way into the relatively slow .NET update release timeline.

Sasha Goldshtein: Although .NET Core and desktop .NET are derived from a shared codebase, there are more opportunities for innovation and optimization in .NET Core because of its faster release cadence. However, if you compare two point-in-time releases of .NET Core and desktop .NET, derived from the same codebase, I don't believe you should expect a performance difference: it is the same C# compiler, the same JIT, and the same GC engine under the hood.

Matt Warren: Fortunately Microsoft have just written a blog post titled 'Performance Improvements in .NET Core' that shows some of the current advantages. Some, if not all, of these improvements may find their way back into the .NET Framework (depending on backwards compatibility concerns), but it shows that currently .NET Core is ahead. It appears that the combination of community help (due to it being Open Source), plus the ability to iterate quicker than the .NET Framework, means more improvements can be implemented in .NET Core. I would imagine that going forwards this difference could get larger, i.e. more and more performance improvements will only exist in .NET Core, especially if they involve the addition of new APIs.

Maoni Stephens: All .NET SKUs share the same GC. The biggest advantage from GC’s POV is actually how fast you can make a feature available for people to try in Core because it’s open source, instead of waiting till the next Desktop release to make it available to all customers.

Aaron Stannard: Big time. Spans, System.IO.Pipelines, and tons of improvements to the way fundamental .NET types like strings are implemented are all major improvements over how these classes work in the traditional .NET 4.* framework today. There have also been fundamental improvements to the way things like ConcurrentQueue are implemented, which should help significantly speed up concurrent applications as well.

InfoQ: What part does .NET Native play in the high performance area? In what scenarios can it outperform its managed counterpart?

Ben Watson: .NET Native has a lot of potential. It is very useful for the mobile device scenario and Windows store apps in general, where you don't want to use the user's limited battery life to JIT. However, I'm interested in seeing this kind of technology spread to traditional desktop and server apps, where it is a massive waste in electricity, downtime, latency, and availability to JIT the same things over and over again on every machine, especially in data centers. NGEN has severe limitations, something that .NET Native could eventually solve.

Sasha Goldshtein: .NET Native is an interesting project because it brings full AOT (ahead-of-time) compilation to the .NET world. I'm saying "full AOT" because NGen has been available as part of the .NET Framework since the beginning, but it still requires a full .NET Framework installation, requires the original IL modules for metadata, and can still kick-in the JIT if needed. The .NET Native solution gets rid of the JIT completely, does not depend on a global .NET Framework installation, and shakes off any unneeded dependencies, which shrinks the application down considerably. Most importantly, though, the optimization steps that go into a .NET Native build are considerably more advanced than the standard JIT or NGen (and in fact use the same optimizing backend as the C++ compiler). This is hardly surprising, because the JIT compiler operates under extremely restrictive time constraints, whereas .NET Native compilation can take literally minutes for a complex project. The end result can be quite impressive: for CPU-bound applications, certain parts can be sped up considerably by using better optimizations; and startup times can be improved by virtue of having a smaller binary with fewer dependencies that need to be loaded from disk, and JIT-compiled.

Matt Warren: One of the benefits of .NET Native is that your code is compiled ahead-of-time (AOT), rather than at run-time by a Just-in-time (JIT) compiler. That has 2 benefits. The first is that start-up time is quicker, because in the JIT scenario a method has to be compiled the first time it is called, which adds overhead. The 2nd benefit is that the AOT compiler can do more costly and wide-ranging optimisations that wouldn't be possible for a JIT. A JIT compiler is constrained in what it can attempt to do because it has limited time to do its work; AOT compilers don't have this constraint.

Maoni Stephens: .NET Native reduces startup time and size on disk. And because it uses a compiler that does global analysis it’s able to perform more optimizations. I see it as being a significant player in the container scenarios.

Aaron Stannard: .NET Native will be convenient from a deployment perspective as it will allow us to deploy .NET applications onto machines that have never installed the .NET runtime whatsoever, kind of like a statically compiled C++ application. But I think applications compiled this way will be at a performance disadvantage compared to their managed counterparts. A natively compiled application can’t take advantage of a runtime-profiling JIT, which allows the application to performance-tune itself based on its actual usage at runtime. A .NET Native application should have a lower startup time than an application that still requires the .NET JIT, but I’d expect the runtime performance of the JITed application to be better on average.

InfoQ: Several performance related features were recently added to the .NET languages, such as value tuples and ref returns. What feature would you like to see next?

Ben Watson: I would love to see some efforts around pooling of objects to reduce memory allocation rate in high-throughput scenarios. I know that many .NET APIs pool internal data structures to reduce memory load, and that pooling can be very application specific, but even some official guidance on design would be useful. Some CLR help could go a long way, though, to making simple reusable objects easier and less of a burden on the garbage collector. I would also like to see a more scalable JIT. It's not slow, but there are bottlenecks when JITting code on multiple cores that prevent you from achieving the kind of minimal initialization time you want when executing a large application.
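As one illustration of the pooling pattern discussed above, the sketch below rents a temporary buffer from ArrayPool<byte>.Shared (from System.Buffers) instead of allocating a new array on every call. This is an existing library-level option, not the CLR-level support the panelist is asking for; the PooledReader class and SumFirstChunk method are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical example: reuse a pooled buffer rather than allocating
// a fresh byte[] for every call.
public static class PooledReader
{
    private const int ChunkSize = 4096;

    public static int SumFirstChunk(Stream source)
    {
        // Rent may hand back an array larger than requested,
        // so only the first ChunkSize bytes are used.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(ChunkSize);
        try
        {
            int read = source.Read(buffer, 0, ChunkSize);
            int sum = 0;
            for (int i = 0; i < read; i++)
                sum += buffer[i];
            return sum;
        }
        finally
        {
            // Always return the buffer, even if reading throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        using (var stream = new MemoryStream(new byte[] { 1, 2, 3 }))
        {
            Console.WriteLine(SumFirstChunk(stream));
        }
    }
}
```

The try/finally is the important part of the design: a rented buffer that is never returned silently degrades the pool back into ordinary allocation.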

Sasha Goldshtein: I would like C# to make it possible to use truly generic code over numeric types, or any other family of types that do not share an interface. C++ templates come to mind because you can author a function template that operates on any numeric type, or family of types, without specifying the constraints up-front. Unfortunately it looks like introducing this feature would require a breaking change in the CLR, something the language designers have been very carefully trying to avoid since generics were introduced in CLR 2.0. Another feature I would like to see, although I can understand the reservations, is inline IL or even inline assembly in C# programs. This would open the door to advanced optimizations, vector instructions, and other hardware-specific optimizations.

Matt Warren: Spans, although I know they're coming, so maybe that's cheating! But having the ability to access "contiguous regions of arbitrary memory" with performance that is close to or matches arrays is a major benefit. Once they are implemented in the runtime, we should see Spans used widely across Framework and 3rd party libraries, which will then give a much wider benefit.
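A short sketch of what a span over a "contiguous region" looks like in practice, assuming the Span<T> and ReadOnlySpan<T> types as they eventually shipped (in-box on recent .NET Core, via the System.Memory package elsewhere). The SpanDemo class and SumRange method are hypothetical names for illustration.

```csharp
using System;

public static class SpanDemo
{
    // Accepts any contiguous run of ints: a whole array, a slice of one,
    // or stack memory, without copying the elements.
    public static int SumRange(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int value in values)
            total += value;
        return total;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6 };

        // A view over three elements starting at index 2; no new array is created.
        Span<int> middle = data.AsSpan(2, 3);

        // Writes go through to the underlying array.
        middle[0] = 30;

        Console.WriteLine($"data[2]={data[2]}, sum of slice={SumRange(middle)}");
    }
}
```

Because a slice is only a view, APIs written against spans can work on part of a buffer without the substring or sub-array copies that older APIs tend to force.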

Maoni Stephens: Our philosophy on the GC has been "knobless" because we don’t believe users should be burdened with intimate GC knowledge in order to tune the memory behavior – that’s our job. We provided a few configs for our users to tell us what kinds of applications they have, e.g., Server or Client. And we are keeping this philosophy. We do want to offer more configs that let customers tell us how we can help them in the application performance context, e.g., if their application wants to trade a larger heap for more predictable pauses, or if there’s 64GB of memory on the machine but they would prefer to use only 32GB for a particular process. We are adding new mechanisms and policies to react better to bad situations and make the GC more robust, so we can meet the asks from these configs. This is one of the next performance-related features the GC team will be delivering in the near future.

Aaron Stannard: I’d like to see some of the base collections support ref return natively; that’d open up some really interesting possibilities from a performance perspective, being able to pass around a reference to the same value type or iterate over an array of them. I have no idea what the performance gains would be exactly but I’d like to be able to experiment and find out. Biggest feature I’d like to see though: cross-platform performance instrumentation for .NET Core. I need to be able to get runtime performance figures on the garbage collector and other core parts of the CLR at runtime on Linux too!
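To make the ref return idea concrete, here is a minimal sketch of a collection that hands out references to its value-type elements, using the ref returns and ref locals from C# 7.0. The `Particle` struct and the `ParticleBuffer` class with its `ItemRef` method are hypothetical and exist only for this example; the base collections did not expose such an API at the time of this discussion.

```csharp
using System;

public struct Particle
{
    public double X;
    public double Velocity;
}

// Hypothetical collection that exposes its elements by reference.
public sealed class ParticleBuffer
{
    private readonly Particle[] _items;

    public ParticleBuffer(int count)
    {
        _items = new Particle[count];
    }

    public int Count => _items.Length;

    // Ref return: the caller gets an alias to the array slot, not a copy.
    public ref Particle ItemRef(int index)
    {
        return ref _items[index];
    }
}

public static class RefReturnDemo
{
    public static void Main()
    {
        var buffer = new ParticleBuffer(3);

        for (int i = 0; i < buffer.Count; i++)
        {
            // Ref local: mutations write through to the underlying array.
            ref Particle p = ref buffer.ItemRef(i);
            p.Velocity = 2.0;
            p.X += p.Velocity;
        }

        Console.WriteLine(buffer.ItemRef(0).X);
    }
}
```

With a conventional indexer, each access would copy the struct and the updates above would be lost unless the copy was written back. Returning a reference avoids both the copy and the write-back, which is where any performance gain would come from.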

InfoQ: While high performance is usually associated with server applications, other platforms such as mobile and IoT devices also exhibit unique performance requirements. Can .NET meet the requirements of these non-traditional platforms?

Ben Watson: I'm optimistic that we'll see .NET as a great platform for smaller and more unique device footprints. We're starting to see more of a modular approach to .NET, where the various pieces can come and go, or be replaced, as needed. As this matures, I think we'll see some very appropriate architectures for different device footprints. In many ways, I think the future of high-performance applications (in .NET and elsewhere) lies in the creation of appropriate, reusable frameworks with constraints that lead to high-performance patterns.

Sasha Goldshtein: On mobile and IoT, there is a very strong focus on fast startup times, smaller memory footprints, and deterministic performance in the face of garbage collection. There is work in this field, such as .NET Native, but generally there hasn't been much progress around considerably reducing GC pauses or shrinking .NET applications' memory size (e.g. by using runtime optimizations like compact strings). I'm worried that today, it would be hard to build an IoT app in C# if it can hit occasional 5-second GC pauses; or a high-performance mobile app with the same issue. Future performance work on .NET Core and CoreCLR would have to bear in mind these non-traditional requirements.

Matt Warren: There is a lot of work being done to make .NET run on other platforms, such as the Raspberry Pi and the Tizen OS from Samsung, and so far it seems like the .NET runtime is adapting well. What's interesting is that some of the changes made for more constrained devices (such as reducing memory overhead) are now included in .NET Core on other platforms, so we all benefit!

Maoni Stephens: The mechanisms mostly already exist; we do need to do some performance tuning to make them work better for other platforms. As a simple example, we needed to make the segment size smaller for Workstation GC when phones started to use Core. UWP applications tend to use more GC handles, which means we might need to make our GC handle code paths more optimized. These are things that we need to gather data for, and then it’s a matter of deciding where they go on our priority list.

Aaron Stannard: At our company, we already have customers using .NET for high-throughput IoT applications. It’s doable today even without .NET Core. I expect that story to get even better as the .NET Core runtime matures and more libraries begin to support it. On the mobile side of things, I expect to see the performance of Unity3D applications increase significantly as they move onto a newer version of C# and the Mono runtime (which, by the way, is also getting much faster, nearly on par with the Windows desktop CLR according to some of our benchmarks).

Conclusion

.NET and its languages have evolved significantly performance-wise since .NET 4.0, making long-standing assumptions obsolete. Aside from improvements to the GC, JIT and BCL, new additions such as .NET Native and the TPL widen the scenarios handled by .NET. Many more improvements on all fronts are planned or being implemented, so it is worth keeping an eye on updates and new benchmarks.

About the Panelists

Ben Watson has been a software engineer at Microsoft since 2008. As a core developer of the Bing platform, he has been integral in building one of the world’s leading .NET-based, high-performance server applications, handling high-volume, low-latency requests across tens of thousands of machines for millions of customers. He is passionate about performance and spends much of his time educating teams on best practices in high-performance .NET. In his spare time, he enjoys geocaching, LEGO, reading, classical music, and spending time with his wife and children outdoors in the beautiful Pacific Northwest. He is the author of the books Writing High-Performance .NET Code and C# 4.0 How-To.

Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Regional Director, a Pluralsight author, and an international consultant and trainer. Sasha is the author of "Introducing Windows 7 for Developers" (Microsoft Press, 2009) and "Pro .NET Performance" (Apress, 2012), a prolific blogger and open source contributor, and author of numerous training courses including .NET Debugging, .NET Performance, Android Application Development, and Modern C++. His consulting work revolves mainly around distributed architecture, production debugging and performance diagnostics, and mobile application development.

Matt Warren is a C# dev who loves nothing more than finding and fixing performance issues. He's worked with Azure, ASP.NET MVC and WinForms on projects such as a website for storing government weather data, medical monitoring devices and an inspection system that ensured kegs of beer didn't leak! He’s an open source contributor to BenchmarkDotNet and the CoreCLR. Matt currently works on the C# production profiler at CA Technologies and blogs at www.mattwarren.org.

Maoni Stephens is the main developer of the .NET GC. During her spare time she enjoys reading GC papers and watching anime/animals. In general, she likes to make things she cares about more efficient.

Aaron Stannard is Founder and CTO at Petabridge, a .NET data platform company. Aaron is one of the original contributors to Akka.NET, authoring the Akka.Remote module and Helios, the underlying socket server. Prior to working at Petabridge, Aaron founded MarkedUp Analytics, a Cassandra and Akka.NET-powered real-time app analytics and marketing automation company for app developers.