How many uops are there?

Author: Wojciech Muła
Added on: 2018-11-18

Current Intel CPUs translate instructions into so-called uops (micro-ops), which form a kind of internal ISA. For simple operations, like addition or bitwise operations, the translation is one-to-one, i.e. there's exactly one uop for a given instruction. When an instruction gets a memory argument, we usually get two uops: one for the load, another for the actual operation. Please note that most instructions have several forms, usually reg, reg and reg, mem.

I was curious how this looks for SIMD instructions. I used data from uops.info and picked the recent Skylake-X architecture; the results are from IACA 3.0.
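To make the data-collection step concrete, here is a minimal Python sketch of pulling (mnemonic, uops) pairs out of an XML dump. It is not the author's script. The element and attribute names (`instruction`, `asm`, `architecture`, `name`, `IACA`, `version`, `uops`) are assumptions about the file layout and may not match the real uops.info file exactly. The embedded sample is a made-up stand-in, not real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for the real data file.
# The schema below is an assumption made for this sketch.
SAMPLE = """
<root>
  <instruction asm="VPADDD">
    <architecture name="SKL"><IACA version="2.3" uops="1"/></architecture>
    <architecture name="SKX"><IACA version="3.0" uops="1"/></architecture>
  </instruction>
  <instruction asm="VPCONFLICTD">
    <architecture name="SKX"><IACA version="3.0" uops="14"/></architecture>
  </instruction>
</root>
"""

def collect_uops(xml_text, arch="SKX", iaca_version="3.0"):
    """Return (mnemonic, uops) pairs for one architecture and IACA version."""
    pairs = []
    root = ET.fromstring(xml_text)
    for instr in root.iter("instruction"):
        for arch_node in instr.findall("architecture"):
            if arch_node.get("name") != arch:
                continue
            for iaca in arch_node.findall("IACA"):
                uops = iaca.get("uops")
                # Skip entries from other IACA versions or without a uops count.
                if iaca.get("version") == iaca_version and uops is not None:
                    pairs.append((instr.get("asm").lower(), int(uops)))
    return pairs

print(collect_uops(SAMPLE))
```

Each form of an instruction is a separate entry, so the same mnemonic can appear several times in the result.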

Observations:

About 90% of SIMD instructions translate directly (or almost directly) into one or two uops. This suggests they're backed by dedicated circuits.

AVX512 scatter, gather and conflict instructions seem not to be backed by hardware.

STNI (the SSE4.2 string instructions, e.g. pcmpestri and pcmpestrm) is very dead.
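The 90% figure can be checked against the per-uops counts from the table in this article. The short Python sketch below does that arithmetic; the `share` helper is a name invented for this illustration.

```python
# uops -> number of instruction forms, copied from the table in this article
COUNTS = {
    0: 8, 1: 1752, 2: 2616, 3: 234, 4: 140, 5: 38, 7: 4, 8: 10, 9: 8,
    10: 4, 11: 6, 12: 2, 14: 2, 15: 2, 16: 1, 19: 4, 20: 2, 21: 2,
    22: 4, 35: 1, 36: 4,
}

def share(counts, min_uops, max_uops):
    """Percentage of instruction forms whose uops count is in [min_uops, max_uops]."""
    total = sum(counts.values())
    selected = sum(n for uops, n in counts.items() if min_uops <= uops <= max_uops)
    return 100.0 * selected / total

print(sum(COUNTS.values()))            # 4844 instruction forms in total
print(round(share(COUNTS, 1, 2), 2))   # 90.17 (one or two uops)
print(round(share(COUNTS, 5, 36), 2))  # 1.94 (five or more uops)
```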

| uops | number of instructions | % of instructions | instructions |
|------|------------------------|-------------------|--------------|
| 0 | 8 | 0.17 | vgatherdps, vgatherdps, vgatherqps, vpgatherdd, vpgatherdd, vpgatherqq, vpscatterqd, vscatterqps |
| 1 | 1752 | 36.17 | too many, omitted |
| 2 | 2616 | 54.00 | too many, omitted |
| 3 | 234 | 4.83 | too many, omitted |
| 4 | 140 | 2.89 | too many, omitted |
| 5 | 38 | 0.78 | dpps, vdpps, vdpps, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdps, vgatherdps, vgatherdps, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqps, vgatherqps, vgatherqps, vgatherqps, vmovdqu8, vpgatherdd, vpgatherdd, vpgatherdd, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqq, vpgatherqq, vpgatherqq, vpgatherqq |
| 7 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 8 | 10 | 0.21 | pcmpestri, rex.w pcmpestri, rex.w vpcmpestri, vpcmpestri, vpconflictd, vpconflictd, vpscatterqd, vpscatterqd, vscatterqps, vscatterqps |
| 9 | 8 | 0.17 | pcmpestri, pcmpestrm, rex.w pcmpestri, rex.w pcmpestrm, rex.w vpcmpestri, rex.w vpcmpestrm, vpcmpestri, vpcmpestrm |
| 10 | 4 | 0.08 | pcmpestrm, rex.w pcmpestrm, rex.w vpcmpestrm, vpcmpestrm |
| 11 | 6 | 0.12 | vaeskeygenassist, vaeskeygenassist, vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 12 | 2 | 0.04 | vpscatterdd, vscatterdps |
| 14 | 2 | 0.04 | vpconflictd, vpconflictq |
| 15 | 2 | 0.04 | vpconflictq, vpconflictq |
| 16 | 1 | 0.02 | vzeroall |
| 19 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 20 | 2 | 0.04 | vpscatterdd, vscatterdps |
| 21 | 2 | 0.04 | vpconflictd, vpconflictq |
| 22 | 4 | 0.08 | vpconflictd, vpconflictd, vpconflictq, vpconflictq |
| 35 | 1 | 0.02 | vpconflictd |
| 36 | 4 | 0.08 | vpconflictd, vpconflictd, vpscatterdd, vscatterdps |
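A table of this shape can be produced by grouping (mnemonic, uops) pairs by their uops count. The Python sketch below shows one possible way. It is not the author's script, and the function name `uops_histogram` and the tiny input list are invented for illustration.

```python
from collections import defaultdict

def uops_histogram(pairs):
    """Group (mnemonic, uops) pairs into rows: (uops, count, percent, names)."""
    by_uops = defaultdict(list)
    for name, uops in pairs:
        by_uops[uops].append(name)

    total = len(pairs)
    rows = []
    for uops in sorted(by_uops):
        names = sorted(by_uops[uops])
        percent = round(100.0 * len(names) / total, 2)
        rows.append((uops, len(names), percent, names))
    return rows

# Made-up input: one entry per instruction form, so a mnemonic may repeat.
example = [("paddd", 1), ("paddd", 2), ("pxor", 1), ("vpconflictd", 14)]
for uops, count, percent, names in uops_histogram(example):
    print(uops, count, f"{percent:.2f}", ", ".join(names))
```

Duplicates are kept on purpose: as in the table above, the same mnemonic is counted once for each of its forms.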

Scripts used to collect the data are available.