The history of the GC configs

Maoni

November 2nd, 2019

Recently, Nick from Stack Overflow tweeted about his experience of using the .NET Core GC configs – he seemed quite happy with them (minus the fact they are not documented well which is something I’m talking to our doc folks about). I thought it’d be fun to tell you about the history of the GC configs ‘cause it’s almost the weekend and I want to contribute to your fun weekend reading.

I started working on the GC toward the end of .NET 2.0. At that time we had really, really few public GC configs. “Public” in this context means “officially supported”. I’m talking just gcServer and gcConcurrent, which you could specify as application configs, and I think the retain VM one, which was exposed as a startup flag (STARTUP_HOARD_GC_VM), not an app config. Those of you who have only worked with .NET Core may not have come across “application configs” – that’s a concept that only exists on .NET, not .NET Core. If you have an .exe called a.exe, and you have a file called a.exe.config in the same directory, then .NET will look in this file, under the runtime element, for things like gcServer or some other non-GC configs.
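For anyone who has never seen one, here is a minimal sketch of what such a file can look like. The XML is embedded in a short Python script only so the example can be parsed and checked. The helper name `read_runtime_flags` is made up for illustration and says nothing about how the CLR itself reads the file.

```python
import xml.etree.ElementTree as ET

# Contents of a hypothetical a.exe.config sitting next to a.exe.
# gcServer and gcConcurrent go under the <runtime> element.
APP_CONFIG = """\
<configuration>
  <runtime>
    <gcServer enabled="true"/>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
"""


def read_runtime_flags(xml_text):
    """Return {setting name: bool} for the children of <runtime>.

    Illustrative helper only; the real runtime has its own config reader.
    """
    root = ET.fromstring(xml_text)
    runtime = root.find("runtime")
    if runtime is None:
        return {}
    return {child.tag: child.get("enabled") == "true" for child in runtime}


if __name__ == "__main__":
    print(read_runtime_flags(APP_CONFIG))
```

The part that matters is the XML itself: both settings are children of the runtime element, each with an enabled attribute.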

At that time the official ways to configure the CLR were:

App configs, under the runtime element

Startup flags when you load the CLR via the hosting API. Strictly speaking, you could also use some of the hosting APIs to customize other aspects of the GC, like providing memory for the GC instead of having the GC acquire memory via the OS VirtualAlloc/VirtualFree APIs, but I will not get into those – the only customer of those was SQLCLR AFAIK.

(There were actually also other ways like the machine config but I will also not get into those as they have the same idea as the app configs)

Of course there have always been the (now (fairly) famous) COMPlus environment variables. We used them only for internal testing – it’s easy to specify env vars and our testing framework read them, set them and unset them as needed. There were actually not many of those either – one example was GCSegmentSize that was heavily used in testing (to test the expand heap scenario) but not officially supported so we never documented them as app configs.

I was told that env vars were not a good customer facing way to config things because people tend to set them and forget to unset them and then later they wonder why they were seeing some unexpected behavior. And I did see that happen with some internal teams so this seemed like a reasonable reason.

Startup flags are just a hosting API thing, and the hosting API was something few people had heard of and way fewer used. You could say things like you want to start the runtime with Server GC and domain neutral. It’s a native API and most of our customers refused to use it when we recommended they try it. Today I’m aware of only one team who’s actively using it – not surprisingly many people on that team used to work on SQLCLR 😛

For things you could specify as app configs you could also specify them with env vars or even registry values, because on .NET our internal API to read these configs always checked all 3 places. While we had a different kind of attitude toward configs you could specify via app config, which were considered officially supported, implementation wise this was great because devs didn’t need to worry about which place the config would be read from – they knew if they added a new config in clrconfigvalues.h it could be specified via any of the 3 ways automatically.

During the .NET 4.x timeframe we needed to add public configs, due to customer requests, for things like CPU groups (we started seeing machines with > 64 procs) or creating objects of > 2GB. Very few customers used these configs, so they could be thought of as special case configs. In other words, the majority of the scenarios were run with no configs aside from the gcServer/gcConcurrent ones.

I was pretty wary of adding new public configs. Adding internal ones was one thing but actually telling folks about them means we’d basically be saying we are supporting them forever – in the older versions of .NET the compatibility bar was ultra high. And tooling was of course not as advanced then so perf analysis was harder to do (most of the GC configs were for perf).

For a long time folks used the 2 major flavors of the GC, Server and Workstation, mostly according to the way they were designed. But you know how the rest of this story goes – folks didn’t exactly use them “as designed originally” anymore. And as the GC diagnostic space also advanced customers were able to debug and understand GC perf better and also used .NET on larger, more stressful and more diverse scenarios. So there was more and more desire from them to do more configuration on their own.

The good thing was that Microsoft internally had plenty of customers with very stressful workloads that called for configuration, so I was able to test on these stressful real world scenarios. Around the time of .NET 4.6 I started adding configs more aggressively. One of our 1st party customers was running a scenario with many managed processes. They had configured some to use Server GC and others to use Workstation, but there was nothing in between. This was when configs like GCHeapCount/GCNoAffinitize/GCHeapAffinitizeMask were added.

Around that time we also open sourced coreclr. The distinction of “officially supported configs” vs internal only configs was still there – in theory that line had become a little blurry because our customers could see what internal configs we had 🙂 but it also took time for Core adoption so I wasn’t aware of really anyone who was using internal only configs. Also we changed the way config values were read – we no longer had the “one API reads them all” so today on Core where the “official configs” are specified via the runtimeconfig.json, you’d need to use a different API and specify the name in the json and the name for the env var if you want to read from both.
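To make the “two names” point concrete, here is a rough Python sketch of a lookup that consults both sources. The json keys `System.GC.Server` and `System.GC.Concurrent` live under `runtimeOptions.configProperties` in runtimeconfig.json. Everything else is an assumption for illustration: the function name, the order in which the two sources are checked, and treating an env var value of 1 as “on”. This is not a description of the runtime’s own reader.

```python
import json
import os

# Contents of a hypothetical a.runtimeconfig.json.
RUNTIMECONFIG_JSON = """\
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.Concurrent": false
    }
  }
}
"""


def read_bool_config(json_text, json_name, env_name, env=None):
    """Look a setting up under both of its names.

    Assumption for this sketch: the env var wins when both are present.
    """
    env = os.environ if env is None else env
    if env_name in env:
        return env[env_name].strip() == "1"
    props = json.loads(json_text)["runtimeOptions"]["configProperties"]
    return bool(props.get(json_name, False))


if __name__ == "__main__":
    print(read_bool_config(RUNTIMECONFIG_JSON, "System.GC.Server", "COMPlus_gcServer"))
```

The point of the sketch is only that each setting now has to be declared with two names, one per source, instead of one name that works everywhere.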

My development was still on CLR mostly, just because we had very few workloads on CoreCLR at that time and being able to try things on large/stressful workloads was a very valuable thing. Around this time I added a few more configs for various scenarios – notable ones are GCLOHThreshold and GCHighMemPercent. A team had their product running in a process coexisting with a much larger process on the same machine, which had a lot of memory. So the default memory load that the GC considered a “high memory load situation”, which was 90%, worked well for the much larger process but not for them. When there was 10% physical memory left, that was still a huge amount for their process, so I added this config for them to specify a higher value (they specified 97 or 98), which meant their process didn’t need to do full compacting GCs nearly as often.
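A quick back-of-the-envelope calculation shows why the default was too conservative for the smaller process. The 512 GiB machine size below is a made-up number for illustration, not the actual machine from this story, and `free_memory_at_threshold` is a hypothetical helper.

```python
def free_memory_at_threshold(total_gib, high_mem_percent):
    """Physical memory still free when memory load reaches the threshold."""
    return total_gib * (100 - high_mem_percent) / 100


if __name__ == "__main__":
    total_gib = 512  # hypothetical machine size, not from the original story
    for percent in (90, 97, 98):
        free = free_memory_at_threshold(total_gib, percent)
        print(f"{percent}% threshold -> {free:.1f} GiB still free")
```

On a machine that size, the 90% default treats memory as tight while tens of GiB are still free, which is plenty of headroom for a small process.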

Core 3.0 was when I unified the source between .NET and .NET Core, so all the configs (“internal” or not) from .NET were made available on Core as well. The json way is obviously the official way to specify a config, but it appeared that specifying configs via env vars was becoming more common, especially with folks who work on scenarios with high perf requirements. I know quite a few internal and external customers use them (and have yet to hear of any incidents that involved setting an env var in an undesirable fashion). A few more GC configs were added during Core 3.0 – GCHeapHardLimit, GCLargePages, GCHeapAffinitizeRanges, etc.

One thing that took folks (who used env vars) by surprise was that the number you specify for a config in an env var is interpreted as a hex number, not decimal. As for why it was this way, it completely predates my time on the runtime team… since everyone remembered this for sure after they used it wrong the first time 😛 and it was an internal only thing, no one bothered to change it.
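Here is a small Python sketch of the gotcha. It only mimics the hex interpretation described above; the two helper names are made up, and the env var name in the comment assumes the usual COMPlus_ prefix in front of the config name.

```python
def parse_complus_number(text):
    """Interpret an env var value the way described above: as hex."""
    return int(text, 16)


def to_complus_number(value):
    """Format a decimal value so it survives the hex interpretation."""
    return format(value, "x")


if __name__ == "__main__":
    # Setting COMPlus_GCHeapCount=10 while meaning "ten heaps"
    # would be read as 0x10 instead.
    print(parse_complus_number("10"))
    # What to write in the env var to ask for a decimal 97.
    print(to_complus_number(97))
```

So a value typed as 10 is read as sixteen, and a decimal 97 has to be written as 61.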

I am still of the opinion that the officially supported configs should not require you to have internal GC knowledge. Of course internal is up for interpretation – some people might view anything beyond gcServer as internal knowledge. I’m interpreting “not having internal GC knowledge” in this context as “only having general perf knowledge to influence the GC”. For example, GCHeapHardLimit tells the GC how much memory it’s allowed to use; GCHeapCount tells the GC how many cores it’s allowed to use. Memory/CPU usage are general perf knowledge that one already needs to have if they work on perf. GCLOHThreshold is actually violating this policy somewhat so it’s something we’d like to dynamically tune in GC instead of having users specify a number. But that’s work we haven’t done yet.

I don’t want to have configs that would need users to config things like “if this generation’s free list ratio or survival rate is > some threshold I would choose this particular GC to handle collections on that generation; but use this other GC to collect other generations”. That to me is definitely “requiring GC internal knowledge”.

So there you have it – the history of the GC configs in .NET/.NET Core.