Notes on Intel Microcode Updates

Ben Hawkes <hawkes@inertiawar.com>

December, 2012 - March, 2013



html - pdf



Introduction



All modern CPU vendors have a history of design and implementation defects, ranging from relatively benign stability

issues to potential security vulnerabilities. The latest CPU errata release for second generation Intel Core processors

describes a total of 120 "erratums", or hardware bugs. Although most of these errata bugs are listed as "No Fix", Intel

has supported the ability to apply stability and security updates to the CPU in the form of microcode updates for well

over a decade*.



Unfortunately, the microcode update format is undocumented. Researchers are currently prevented from gaining any

sort of detailed understanding of the microcode format, which means that it is impossible to study the updates to clearly

establish whether any security issues are being fixed by microcode patches. The following document is a summary of

notes I gathered while investigating the Intel microcode update mechanism.



* The earliest Intel microcode release appears to be from January 29, 2000. Since that date, a further 29 distinct

microcode DAT files have been released.



Acknowledgements



The initial idea to study Intel's microcode update mechanism was inspired directly from Tavis Ormandy's exploratory

work on this subject in 2011. Furthermore, I'd like to thank Emilia Kasper, Tavis Ormandy, Gynvael Coldwind and

Thomas Dullien for their outstanding technical assistance and encouragement.



How does the microcode update mechanism work?



Microcode updates are applied to a CPU by writing the virtual address of the Intel-supplied undocumented binary blob

to a model-specific register (MSR) called IA32_UCODE_WRITE. This is a privileged operation that is normally performed

by the system BIOS at boot time, but modern operating system kernels also include support for applying microcode

updates.



The BIOS (or operating system) should verify that the supplied update correctly matches the running hardware before

attempting the WRMSR operation. In order to do so, each microcode update comes packaged with a short header

containing various update metadata. The header is documented by Intel in Volume 3 of the Developer's Manual. It

contains three pieces of information required for validation: the microcode revision, processor signature, and processor

flags.



The microcode revision is an incremental version number - you can only successfully apply an update if the current

microcode revision is less than the revision supplied. The BIOS will typically extract the current microcode revision by

issuing a RDMSR called IA32_UCODE_REV and then compare this value against the revision contained in the new

microcode update's header.



The processor signature is a unique representative of the hardware model that the microcode will apply to. The

signature of the running hardware can be retrieved using the CPUID instruction, and then compared against the value

supplied in the microcode header. According to Intel, "each microcode update is designed specifically for a given

extended family, extended model, type, family, model, and stepping of the processor.". The processor flags field is

similar, Intel says: "the BIOS uses the processor flags field in conjunction with the platform Id bits in MSR (17H) to

determine whether or not an update is appropriate to load on a processor."



Once a microcode update has been applied using IA32_UCODE_WRITE, the BIOS will typically issue a CPUID

instruction and then read the IA32_UCODE_REV MSR again. If the revision number has increased, the update was

applied successfully.



Observation #1 - What does a microcode update look like?



Since 2008 Intel has regularly released DAT files containing the most up to date microcode revisions for each

processor. Prior to this, microcode update data was shipped as part of the open source tool microcode_ctl. An archive

of all microcode DAT releases can be found here.



So what does the undocumented blob portion of the microcode update look like? It appears that there is at least two

different formats to the undocumented blob, the old format being used up until Pentium 4 and certain early models of the

Intel Core 2, and the new format used from that point onwards. This article covers the new style format only.



The follow graphic shows a microcode update for an Intel Core i5 M460 (i.e. with the documented microcode header

stripped):







It is immediately clear that there is a plaintext structure (96 bytes in length) at the start of the undocumented blob. Some

easily identifiable fields are colorized:



- Microcode revision number.

- Release date (note that this date is sometimes one day prior to the microcode header date).

- Real length of microcode update (counted in 4-byte words).

- Processor signature.



And some less easily identifiable fields that appear to be in common usage are marked in grey:



- Possible flags field? May not be in use in recent hardware types.

- Possible loader version?

- Possible length field (when non-zero)? Not consistently used.



Observation #2 - Is there any structure in the microcode update after the 96 byte header?



Most of the data located after the 96 byte header appears to be random and without structure. However, performing a

longest common substring analysis on an archive of every unique microcode update (available in binary format here)

showed that different revisions for the same (or similar) processor signatures will share some common byte strings:







In this figure, two distinct strings have been identified:



- In green, a 2048-bit string that is constant between microcode revisions.

- In red, a 32-bit string that is constant for all microcode updates using the new style format.



In total, 12 unique 2048-bit strings were found to be shared across 24 processor signatures. The extracted data is

available here (in the format <2048-bit string> ).



Note that 2048-bits is a commonly used length for an RSA modulus, and that 0x00000011 (decimal 17) is a commonly

used value for an RSA exponent. This suggests that these common strings may be an RSA public key. Further

evidence to support this claim is that:



- Each of the values are strictly 2048 bit in length, i.e. the most significant bit is always set.

- None of the values are trivially factorable by 2, i.e. the values are all odd numbered.

- None of the values are factorable by any value between 2 and 2^32.



Observation #3 - Can the length of the microcode update be verified?



The length field of the 96-byte microcode header (shaded in green in fig 1) can be verified using a fault injection analysis.

The idea is to sequentially mutate each byte of a valid microcode update, attempt to apply the update, and record

whether the update was applied successfully or not.



The underlying assumption here is that the CPU should validate the integrity of the microcode update, but may not

validate the integrity of padding (since microcode updates must be a multiple of 1024, it is assumed that padding is

normally required).



Testing on an Intel Core i5 M460 (sig 0x20655, pf 0x800), the expected length of the microcode update (in revision 3) is

1668 bytes (0x1a1 * 4). Sequentially flipping a bit in each byte from offset 0 to 2000 and waiting for the first successfully

applied update gives the following results:







This result was observed on Intel Core 2 Duo P9500, Intel Core i5 M460 and Intel Core i5 2520M chips. For all other

experiments below, results were reproduced on Intel Core i5 M460, Intel Core i5 2520M, and Intel Xeon W3690 chips.



Observation #4 - How many cycles does an update take to be applied successfully?



To collect the average number of cycles the CPU took to successfully apply a microcode update, a specialized system

was setup that would:



Boot the system with an initial microcode revision. Install a Linux kernel module that: Invalidate caches (wbinvd) Stop instruction prefetch (sync_core) Disable interrupts for the running core (local_irq_disable) Record time stamp counter (rdtsc) Apply the next microcode update revision (wrmsr MSR_IA32_UCODE_WRITE) Record time stamp counter Record the rdtsc delta in syslog Reboot The cache invalidation and interrupt disable were intended to reduce variance in the timing delta. Rebooting is required

to reset the system to the original microcode revision, as successfully applied revisions must be strictly incremental.



The exact cycle value will vary significantly between different types of hardware (older hardware was observed to take

significantly more cycles), however a baseline value can be used in further timing analysis on the same hardware. For

example, the baseline average time delta across 2000 applications of microcode revision 3 for an Intel Core i5 M460 is:



Average: 488953 cycles

Sample standard deviation: 12270 cycles



The high variation in the sample deltas collected is presumed to be caused by multi-core systems. If the microcode

update mechanism has to achieve a consistent state across all available instruction pipelines (including consistency

across hyperthreads, prefetched instructions, instruction caches on all cores), this could result in a high level of

variance, as the collection mechanism used here only "cleans" internal state for the running core.



Observation #5 - Do the number of cycles change depending on the location of a fault?



Using the baseline timing delta above, it is possible to find deviations by flipping every possible bit position in the

microcode update and attempting to apply the malformed update. All of these update attempts will fail, but the idea is

that certain fields may be treated differently by the microcode update mechanism, and that this may show up in the

cycle delta.



Running this test on an Intel Core i5 M460 gives the following results:







This chart shows the results of the first 1000 bit positions being flipped. Three distinct areas of interest can be seen. All

other bit position above 1000 return a cycle count matching the failure case seen above.



The first area of interest, between bit offsets 32 and 63, corresponds to an unknown word the in the 96-byte header that

always has value 0x000000a1. This may serve as a magic value, checked when the microcode is first loaded to ensure

that an expected format has been received.



The second area of interest is a single bit at offset 64, which appears to correspond to a flags field. In the original

analysis, this bit was set. However, clearing the bit and repeating the analysis shows identical results to figure 4, except

with a significantly lower average count of cycles for the "normal" failure case. The decrease in cycle count appears to

be proportional to the number of physical cores on the system, which may suggest this bit is used to decide whether the

update will be iteratively applied to all cores, or only applied on a single core.



The third area of interest, between bit offsets 233 and 253, corresponds to the microcode size field.



Observation #6 - What happens of the microcode size field is modified?



Modifying each bit position results in an incrementally higher cycle count. To investigate this further, a second analysis

was run that records the cycle count for each size value between 0 and 10000. The following shows the results of this

analysis on an Intel Core i5 2520M:







In this chart we can see a clear correlation between an increasing size value and an increasing cycle count. This chart

appears to also show artifacts from running this system on a multi-core system (note that the i5 2520M is a quad core

processor, and that four main trend lines can be seen).



Running the size modification analysis with an incorrect magic value (i.e. replacing 0x000000a1 with a different value)

results in a flat chart with no correlation between value and cycle count. This suggests that the magic value is checked

prior to the size value being used.



Due to the high level of noise while running this analysis on a multi-core system, the analysis was rerun with symmetric

multiprocessing (SMP) and HyperThreading disabled. A clear linear correlation between length value and cycle count is

seen. The follow data is taken from an Intel Core i5 2520M:







With this cleaner data, it is possible to observe new timing behavior. By displaying a smaller sample, clear timing

shelves are seen as the size value increases:







By observing the individual points of the timing shelves, it is clear that each timing shelf has 16 points. Since each single

increase in size value corresponds to a 4-byte increase in microcode data, 16 points represents 512 bits of data.







512-bits is the standard message block size for popular cryptographic hash functions such as MD5, SHA1 and SHA2.

The timing shelves observed match what we would expect from a Merkle-Damgard hash function, as each new shelf

represents the increased number of cycles required to process a new message block.



In public-key signature schemes, it is normal to sign a hash of the data message instead of signing the entire message

contents. This means that a hash operation being observed in the early stages of the microcode loader process is an

expected result.



The lack of timing artifacts corresponding to symmetric key algorithm block sizes (i.e. 128-bits) may also indicate that

authentication of the microcode contents is occurring prior to decryption of the microcode contents (i.e. the cipher-text is

authenticated). Given the space constraints of a modern CPU architecture, this design is not entirely unexpected, as it

allows the processor to load the decrypted content directly, without having to store the plaintext for authentication

purposes.



Observation #7 - What other data is in the first 704 bytes of a microcode update?



Note that the first shelf is observed after supplying a size value of 176 (or 704 bytes of microde data), and that supplying

a size value of 704 bytes or less results in a constant timing shelf. This would suggest that there exists a minimum

length of non-variable-length data that will be hashed regardless of the supplied microcode size field. This data includes

the undocumented microcode header and the RSA public key that has been discussed above.



If we assume that the presence of an RSA public key suggests the usage of RSA as a digital signature algorithm, then it

stands to reason that an RSA signature will be found in the microcode update. If this signature value is calculated using

the public key embedded in the microcode update, then we would expect to find a 2048-bit value that is strictly less than

the modulus value (since the signature is calculated using this modulus).



Examining the 2048-bits that are contiguously after the public key exponent value (0x00000011), we find a valid

candidate for an RSA signature. In every case, the 2048-bit value after the exponent is strictly less than the 2048-bits

prior to the exponent (the presumed RSA modulus).



We can attempt to recover the originally signed data by raising the signature value to the power of 0x00000011 and then

using the modulus value. The results of this operation can be found here. The format of this file is <processor signature>

<microcode version> <result>.



The result appears to use PKCS#1 v1.5 padding, with a private-key operation set for the block type. It is also clear that

earlier processor models used a 160-bit digest for the signature hash, which is consistent with SHA1. Later processor

models use a 256-bit digest, which is consistent with SHA2.







All attempts at recreating these hash values using standard SHA implementations have failed. Several non-standard

variations of Merkle-Damgard strengthening were also attempted. This may indicate that a non-standard initial vector or

some other non-standard structural variation is used when calculating the signed hash value.



Attempts to insert a new public key and signature for the same PKCS#1 signed data into the microcode also failed,

which suggests that the public key is part of the authenticated data, or that a hash of the expected/official public key is

stored in factory embedded memory and verified after authentication.



Interestingly, it was observed that setting the most significant word of the public key modulus to zero results in a

hardware reset (in the case of a single core system, this manifests as a hardware halt/freeze, not a system restart).

This may suggest a "division by zero" error exists in the microcode authentication routine.



Conclusion



Studying the Intel microcode update mechanism through data analysis and timing analysis has revealed properties

about the cryptographic design of this system:



Several previously undocumented header fields have been identified and described.

The results suggest that microcode updates are authenticated using a 2048-bit RSA signature.

The RSA signature operation appears to be constant-time (i.e. unaffected by changes to the supplied exponent,

modulus or signature value).

modulus or signature value). Timing analysis reveals 512-bit steps correlating to supplied microcode length. This is a common message

block size for cryptographic hash functions such as SHA1 and SHA2

block size for cryptographic hash functions such as SHA1 and SHA2 The RSA signature was located, and the signed data is a PKCS#1 1.5 encoded hash value. Older processor

models use a 160-bit digest (SHA1), and newer processor models use a 256-bit digest (SHA2).