Each CPU supports a different set of counters. A counter represents something measurable on the CPU. For example, one can measure the total number of cycles, uops, or retired conditional branch instructions. These counters are broken into two groups:

- Architectural Counters
- Micro-architectural Counters

Anything in the first group is part of the standard instruction set architecture (ISA) that the CPU supports, which means these counters stay the same between CPU generations. However, the vast majority of counters fall into the second group: micro-architectural. These deal with the actual design and implementation of the processor and therefore change from processor to processor. Practically, this means that the counters and their meaning change from generation to generation. What existed and was used on a Haswell processor may not resemble what exists on a Cascade Lake processor at all. And the only guarantee one can make is that almost everything is different between AMD and Intel CPUs.

While there are hundreds of counters, only a few of them can be activated at any given time. On x86 CPUs you need to associate a counter with a specific unit. Each CPU core has a limited number of these units, generally called performance monitoring units. On Intel systems, you generally only get four of them per thread, and on AMD Ryzen/EPYC CPUs you get six! So we’ve gone from hundreds of counters to really only having four to six active at any given time.

Because these are a finite resource, the operating system usually virtualizes them to some degree and creates an abstraction for enabling and controlling them. Exactly how this looks varies by operating system, but generally the tools ship as part of the OS. For example, on illumos you can use cpustat(1M) or even DTrace’s CPC provider. On Linux, tools such as perf can be used to access the counters. There are also various tools on Windows, and others that Intel themselves write.
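To make the idea of virtualizing a handful of hardware units a bit more concrete, here is a minimal sketch of one approach an OS can take: time-multiplexing. Events take turns on the available units, and each raw count is scaled by how long the event was actually live. Linux's perf reports counts scaled in roughly this way, but the function names, the round-robin policy, and the event names below are assumptions made for illustration, not any particular kernel's implementation.

```python
def schedule_round_robin(events, num_units, num_slices):
    """Return, for each time slice, the events that are live on hardware.

    Hypothetical policy: rotate through the requested events so that each
    one gets a turn on the limited number of units.
    """
    live_count = min(num_units, len(events))
    schedule = []
    start = 0
    for _ in range(num_slices):
        live = [events[(start + i) % len(events)] for i in range(live_count)]
        schedule.append(live)
        start = (start + num_units) % len(events)
    return schedule


def scale_count(raw_count, time_enabled, time_running):
    """Estimate the full count for an event that was only live part-time.

    time_enabled: how long the event was requested
    time_running: how long it actually occupied a hardware unit
    """
    if time_running == 0:
        return 0.0  # never scheduled, so there is nothing to extrapolate
    return raw_count * time_enabled / time_running


# Six requested events (hypothetical names) sharing four units.
events = ["cycles", "uops", "branches", "l1_miss", "l2_miss", "tlb_miss"]
plan = schedule_round_robin(events, num_units=4, num_slices=3)
```

With six events and four units, every event in this plan is live for two of the three slices, so its raw count would be scaled by 3/2. The scaled value is an estimate: it assumes the workload behaved the same while the event was not being counted.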

On both Intel and AMD, the performance monitor counters are managed with MSRs (model-specific registers). In essence, you write the ID number of a counter you care about and then read back its value at some later time. If you write an invalid counter ID to a register, that generally results in a #GP, the x86 general protection fault, which is often used as a catch-all exception. There are different strategies for dealing with this.
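As a rough illustration of what "writing the ID number" means, the sketch below builds the value that would go into an Intel event-select MSR (IA32_PERFEVTSELx) for a few of the architectural events. It only computes numbers and never touches hardware. The bit positions and MSR addresses follow Intel's architectural performance monitoring layout; AMD uses different registers and a somewhat different encoding. The helper names are made up for this example, and checking names against a table of known events is just one conceivable way to avoid the #GP, not necessarily what any given OS does.

```python
# Intel architectural perfmon: base MSR addresses.
IA32_PERFEVTSEL0 = 0x186  # first event-select register
IA32_PMC0 = 0xC1          # first counter register

# Selected control bits in IA32_PERFEVTSELx.
USR = 1 << 16  # count while running in user mode
OS = 1 << 17   # count while running in kernel mode
EN = 1 << 22   # enable this counter

# A few architectural events as (event select, unit mask) pairs.
ARCH_EVENTS = {
    "unhalted_core_cycles": (0x3C, 0x00),
    "instructions_retired": (0xC0, 0x00),
    "branch_instructions_retired": (0xC4, 0x00),
}


def encode_event_select(event, umask, user=True, kernel=True):
    """Build the event-select value: event in bits 0-7, umask in bits 8-15."""
    if not 0 <= event <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event and umask must each fit in 8 bits")
    value = event | (umask << 8) | EN
    if user:
        value |= USR
    if kernel:
        value |= OS
    return value


def encode_named_event(name):
    """Hypothetical guard: refuse names that are not in the known table."""
    if name not in ARCH_EVENTS:
        raise ValueError(f"unknown event: {name}")
    event, umask = ARCH_EVENTS[name]
    return encode_event_select(event, umask)


def msrs_for_unit(index):
    """Return (event-select MSR, counter MSR) addresses for a given unit."""
    return IA32_PERFEVTSEL0 + index, IA32_PMC0 + index


cycles = encode_named_event("unhalted_core_cycles")
select_msr, counter_msr = msrs_for_unit(1)
```

Privileged code would write `cycles` to `select_msr` and later read the running total from `counter_msr`. On newer processors a global control register also has to enable the counter; that step is left out of this sketch.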