The decryption function

If you are just joining this story you may want to start at part 1.

In part 2, we discovered that a embedded controller update is performed by uploading a small ‘flasher’ program to the EC. This flasher program is then responsible for programming a new firmware image to the EC’s internal flash memory. However, both the flasher program and part of the firmware image are encrypted: the old (currently running) EC firmware decrypts the flasher program, and the flasher program then decrypts the new firmware update. This creates a bit of a chicken-and-egg problem that prevents discovering the encryption algorithm from firmware update files alone.

We managed, however, to find a decrypted version of the EC firmware online, dumped directly from EC flash memory via JTAG. Let’s dive right in and disassemble the decryption function that can be found in that flash image. The core of it looks like this:

2854: 30 25 8d 1f 00 00 48 14 ld r13,[r13,0x1448] 285c: 30 21 81 0f 00 00 48 10 ld r1,[r1,0x1048] 2864: cf 7e extb_s r14,r14 2866: 02 be asl_s r14,r14,2 2868: b9 61 add_s r1,r1,r13 286a: 30 26 8d 1f 00 00 48 18 ld r13,[r14,0x1848] 2872: 4f 7f extb_s r15,r2 2874: 02 bf asl_s r15,r15,2 2876: a7 79 xor_s r1,r1,r13 2878: 30 27 8d 1f 00 00 48 1c ld r13,[r15,0x1c48] 2880: b9 61 add_s r1,r1,r13

Here each input byte is transformed through a lookup table (in cryptography terminology, a substitution box or ‘S-box’) and the results are combined with an add/xor/add structure. This is the Blowfish cipher, as becomes evident from one glance at the diagram in the Wikipedia article on Blowfish:

[Diagram by Decrypt3/DnetSvg via Wikimedia, CC BY-SA 3.0]

Now normally the first stage of Blowfish, like most ciphers, would be to expand a user-provided key – a password or passphrase or some other secret data – to produce a set of 18 round keys (called the ‘P array’ in Blowfish terminology) and the four 256-entry S boxes depicted above. In cryptography this expansion step is called a key schedule.

In the case of the Lenovo firmware, it turns out that the keys are stored in pre-expanded form, i.e. the values of the P array and S boxes are stored in flash memory rather than the original key string. We can extract the P array and S boxes from the dumped flash image and use them for encryption/decryption.

(I do not believe there is any easy way – where in cryptography easy means ‘significantly better than trying all possibilities’ – to recover the original key string that was used to generate the P array and S boxes. This would be an interesting challenge, but is of purely academic interest: the P array and S boxes are all that is needed for both encryption and decryption. Each round of Blowfish involves taking the data, mixing in [exclusive or] one of the round keys from P and then performing the function depicted above using the S boxes; this is repeated 16 times; apart from some minor details you now understand how Blowfish works.)

The checksum function

Having the encryption algorithm and keys, we can now decrypt the flasher program that is uploaded to the embedded controller when performing a firmware update.

Analysis of the flasher program allows us to determine the checksum algorithm used to validate the firmware image. Even without a detailed analysis of the disassembly, we can notice that it uses the following table:

uint16_t csum_table[256] = { 0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF, 0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6, 0x9339, 0x8318, 0xB37B, 0xA35A, 0xD3BD, 0xC39C, 0xF3FF, 0xE3DE, ... };

A quick Google for these numbers gives this secret away easily: this is a lookup table for a CRC-16 function with a generator polynomial of 0x1021. At this point we can be pretty sure that the algorithm is CRC-16, with only a few details to determine like initialisation value, but let me indulge myself and make a brief digression into what a CRC is and why the table looks like this. You can skip to the next section when this gets too dense for your liking.

The first thing to note is that when people refer to a CRC-16 generator polynomial of 0x1021, what they actually mean is 0x11021, i.e. binary 10001000000100001, i.e. x^16 + x^12 + x^5 + 1. (The x^n term of a CRC-n generator polynomial is always assumed to be 1, otherwise the following algorithm would not produce an n-bit CRC.)

Now the best way to imagine calculating a CRC is, for every 1 bit that is set in the input message, you XOR in the polynomial (towards the right, if working left to right). This clears that bit and flips some subsequent bits: for the above polynomial, it flips the three bits at N+4,N+11,N+16. If those bits end up as 1 (when you come to them), they will in turn propagate to other bits. Once you reach the end of the message, you’ve cleared all the bits in the message, but you have some bits hanging off the end: these are the CRC. You can intuitively see from this that the CRC has error propagating properties; an erroneous bit will likely cascade to many other bits. (Engineers might be reminded of a linear feedback shift register which has many similarities. The CRC can also be expressed mathematically as the remainder of a long division in carry-less binary arithmetic [GF(2)], the similarity to long division will become apparent in the examples below.)

As an example, if the input message to our CRC is one byte with value 5 (00000101), then the CRC calculation would proceed like this, where I have highlighted in red the leading 1 at each step:

(5) 00000 1 01| xor 1 00|01000000100001 (0x11021) ----------------------- 0000000 1 |01000000100001 xor 1 |0001000000100001 (0x11021) ------------------------- 00000000|0101000010100101 (0x50a5)

Here is another example for an input byte with value 16 (00010000):

(16) 000 1 0000| xor 1 0001|000000100001 (0x11021) --------------------- 0000000 1 |000000100001 xor 1 |0001000000100001 (0x11021) ------------------------- 00000000|0001001000110001 (0x1231)

For larger messages doing this bit by bit would be slow, of course. To do it more efficiently we can pre-calculate the effect of a byte on the next sixteen bits (this will be some pattern of bit flips), for each of the 256 possible values of that byte. We then look up the first byte, XOR that in to the next sixteen bits, move to the next byte and repeat. Actually the “effect of the first byte on the next sixteen bits” is exactly equivalent to the CRC of that byte – the CRC is the pattern of bit flips that ends up past the end of a message – so the lookup table is in effect a table of CRCs of single bytes.

Looking at the above examples you can verify that the CRC of 5 is indeed the entry with index 5 in the table above, and the CRC of 16 is indeed the entry with index 16. If you stare at the first example for a moment, you might see why the first sixteen entries of the table are multiples of 0x1021: all the 1 bits in 0x11021 are far enough apart that the input bits propagate independently into the CRC. Things change at entry 16 because the next 1 in the polynomial crosses over into the message.

Now, back to your regularly scheduled programming

Returning to the firmware, we can verify that the flasher checksum is in fact CRC-16. It’s important to note that the flasher program calculates this checksum after decrypting the firmware image, so the CRC-16 must be calculated on the decrypted version of the image. (It turns out that the firmware image is encrypted using the same key as the flasher program, so at this point we already have everything we need to know.)

I mentioned in part 2 that a simple 16-bit checksum is also present at the very end of the firmware image. In fact, there is also a third type of checksum that we discover when we disassemble the EC boot code: four 32-bit checksums, each performed on a different area of the flash memory.

Summarising now, the three EC checksums that must be correct are:

The outer checksum : BIOS checks the checksum of the (encrypted) image before uploading it to the flasher. The last two bytes of the image are calculated such that the total sum of the 16-bit words of the image is zero. If this fails, flashing doesn’t proceed, so this is the least dangerous checksum to fail.

: BIOS checks the checksum of the (encrypted) image before uploading it to the flasher. The last two bytes of the image are calculated such that the total sum of the 16-bit words of the image is zero. If this fails, flashing doesn’t proceed, so this is the least dangerous checksum to fail. The flasher checksum : The flasher calculates the CRC-16 checksum of all the decrypted bytes except the last four bytes. The penultimate two bytes of the image contain this checksum. If this checksum fails, the flasher sets a flag in flash that causes the EC firmware to fail to boot.

: The flasher calculates the CRC-16 checksum of all the decrypted bytes except the last four bytes. The penultimate two bytes of the image contain this checksum. If this checksum fails, the flasher sets a flag in flash that causes the EC firmware to fail to boot. The boot checksums: The EC firmware, at boot, calculates 32-bit checksums of four different areas of the flash. If any of these fail the EC firmware fails to boot. These checksums must, of course, also be calculated on the decrypted image.

Building a modified firmware

We now have everything we need to know to successfully modify the embedded controller firmware. Returning to my original goal, here is the change I made to the battery authentication check (before/after, with some comments added):

; call validation function on battery response 1d168: ee 0f af f2 bl.d 0x2954 ; validate 1d16c: f8 16 02 11 ldw r2,[r14,248] - ; branch to failure path if return value not equal to 1 - 1d170: 0b 08 51 00 brne r0,1,0x1d17a + ; branch replaced with no-operation instruction + 1d170: 4a 26 00 70 nop ; success path - set bit 3 in battery status 1d174: 00 86 ld_s r0,[r14,0] 1d176: 83 b8 bset_s r0,r0,3 1d178: 24 f0 b_s 0x1d1c0 ; failure path - try a different validation function, ; else retry up to 8 times, else abort 1d17a: 10 10 80 20 ldb r0,[r16,16] ...

Previously, if the return value from the validation function was not equal to 1, the code would branch to a failure path. I replaced this conditional branch instruction with a no-operation instruction so that, regardless of the validation result, execution would fall through to the success path. (This technique of replacing jumps with nops is a common trick when you need to modify the behaviour of binary code.)

(At this point my focus was on the first authentication sequence – state 12 in the firmware state machine that I listed in part 2.)

Also, for fun and so I could track my modified firmware version, I changed the embedded firmware version from “GCHT25WW” to “GCHT25MC”. I thought putting my initials in the geographic region field would be appropriate: the original suffix, WW, presumably means worldwide, while this firmware had… somewhat more limited geographic reach.

Having made my changes, I wrote some small utilities to regenerate the three types of checksums – I have published these utilities on GitHub – and finally I re-inserted the modified embedded controller firmware into the BIOS update .FL2 file at offset 0x500000.

If you are attempting this, make sure you check and double-check that you’ve modified exactly what you intended to modify. For Linux users, process substitution can be handy here, e.g. to get a binary diff of two files you can do: diff -u <(xxd file1.bin) <(xxd file2.bin).

The moment of truth?

I ran the BIOS update program and was greeted with:

An update is not necessary at this time. The process has been canceled.

Thwarted. I was already running the latest BIOS version and the update program did not offer an option to force an update of the BIOS or EC. Downgrading and upgrading might usually be a workaround, but I wasn’t sure that this would be possible as the Lenovo release notes mention that, “if the UEFI BIOS has been updated to version 2.63 or higher, it is no longer able to roll back to the version before 2.63 for security improvement”.

Fortunately, it turns out it’s possible to run the update program manually, bypassing the limited user interface.

Manually updating the EC

A note: As I do not have Windows on this laptop, I am running the BIOS update program from a USB flash drive following some instructions I found online (in short: download the bootable CD image; run geteltorito -o bios.img gcuj23us.iso; write the bios.img file to the USB stick). I suspect the below process is even easier from Windows, where you can directly run winflash32.exe or winflash64.exe in place of dosflash.exe.

The Lenovo BIOS update CD is just a DOS boot disk at heart. If you’ve written it to rewritable media like a USB flash drive, you can edit autoexec.bat to stop it starting the user interface, in which case it will drop you to a DOS prompt.

Updating the EC firmware can be performed using dosflash.exe as follows (/sd is skip date check to allow downgrading, /ipf ec targets the EC area):

C:\FLASH> dosflash /sd /ipf ec /file GCETA3WW\$01DA000.FL2

SCT Flash Utility for Lenovo

for Shell V1.0.1.3

Copyright (c) 2011-2012 Phoenix Technologies Ltd.

Copyright (C) 2011-2012 Lenovo Group Limited. Read BIOS image from file.

Initialize Flash module.

Read current BIOS.

Oem check Prepare to flash “ec” Do not turn off computer during the update!!! Begin Flashing……

Total blocks of the image = 48.

|—+—-+—-+—-+—-+—-+—-+—-+—-+—-|

…………………………………………..

Image flashing done. Flashing finished. BIOS is updated successfully. WARNING: System will shutdown or reboot in 5 seconds!

A note here on the FL1 and FL2 files since I couldn’t find any explanation about this on the Internet: the FL1 file is a UEFI capsule file that contains the BIOS. The FL2 file contains embedded controller firmware at offset 0x500000-0x530000, the rest of it can be ignored. Why then is the FL2 file so large and why does it contain bits of an old BIOS version pasted after the EC firmware? I think partly it may be to appease the non-Lenovo-specific parts of dosflash.exe. I noticed that even though ultimately it only uses the 48 4KB blocks from 0x500000-0x530000, if I pass a file that ends there, dosflash.exe does not recognise it as a valid BIOS update file.

(While the command shown above updates the EC, I will note here that it is also possible to update the BIOS in a similar way, by omitting /ipf ec and by specifying the FL1 file instead of the FL2 file: dosflash /sd /file GCETA3WW\$01DA000.FL1

Of course I recommend using the the normal manufacturer-recommanded BIOS upgrade/downgrade process when possible, but this may be useful if you are in a bind.)

Note that, despite what the output of dosflash.exe says, the actual EC firmware is not updated yet: at this point it has just written the update to an area where it can be picked up by BIOS at boot. Now after reboot the screen displays:

Flashing Embedded Controller…

Please do not power off!

(I appreciate the ‘please’, but really that should read “DO NOT UNDER ANY CIRCUMSTANCES POWER OFF, EVEN IF YOUR HOUSE IS ON FIRE AND SOMEONE HAS STOLEN YOUR PANTS”.)

A few skipped heartbeats later, the firmware update completes.

The moment of truth?

Sure enough, the system is now running my modified embedded controller firmware.

But my battery still isn’t charging. I hook up the SMBus signals to my logic analyser again and the communication looks like this:

... START 16 (Control Byte: Slave Address B Write) 3C 04 NACK STOP START 16 (Control Byte: Slave Address B Write) 3C 04 NACK STOP START 16 (Control Byte: Slave Address B Write) 3C 04 NACK STOP START 16 (Control Byte: Slave Address B Write) 3C 04 NACK STOP

It turns out that it isn’t even getting to the authentication check that I had modified, because the earlier command that sends the challenge to the battery is failing: as soon as the laptop sends command 3C and the data length indication 04, the battery is signalling NACK – not acknowledged – go away. So now I modify the state machine so that it proceeds whether or not that write command succeeds (again I do this by replacing a jump by a nop, this time in state 8).

Revision two deployed. Now the system boots with:

The battery installed is not supported by this system and will not charge. Please replace the battery with the correct Lenovo battery for this system. Press the ESC key to continue.

Well, that’s some sort of progress: at least it is no longer displaying the original unauthorised battery message.

I look in BIOS to see where these messages are coming from. Both this message and the original unauthorised battery message are displayed by LenovoVideoInitDxe.efi: don’t ask me why this code is in this module rather than somewhere more relevant (may I suggest LenovoAnnoyingBatteryMessageDxe.efi?), but it might have been convenient to put it in the video initialisation module as the message is displayed when the screen is cleared post-POST.

Anyway, in LenovoVideoInitDxe.efi it reads the battery status from the EC (register 0x38, which we came across in part 2 when decoding the ACPI tables, as well as register 0xd1 which has some additional battery status flags). Depending on certain bits in those registers, it may print one or other message.

Tracing through the EC code to find where those bits are set proves difficult, so I again hook the battery up to my logic analyser. It becomes evident that the sticking point is now the second authentication sequence.

I now make the same changes to the second authentication sequence as I did for the first: I prevent the write command from failing even if the battery NACKs it (state 15), and force the check of the response to succeed (state 19). This is now four changes in total, for each of which I’ve replaced a jump with a nop.

After booting to this final EC firmware revision, my saga comes to an end, almost anticlimactically. My replacement battery works, and I’m getting a good few hours out of it (and no, it hasn’t burst into flames).

Postscript

There is still one very curious open question that I haven’t managed to figure out. There are 10 random-looking bytes at offset 0x200 of the firmware image – the start of what looks like a firmware information block – which are different in every EC revision. So far I haven’t found anything that accesses those bytes, and indeed my EC update works fine even when I leave them unchanged. Probably this is a red herring, but what these bytes are for is still a mystery.

I have uploaded my utilities to GitHub so that it is possible to replicate my work and possibly make other changes to EC firmware. (Edit, Jan 2017: Hamish Coleman has built some additional utilities around my code that may also be useful, see hamishcoleman/thinkpad-ec. If you are only interested in the battery patch you will need to disable his keyboard patches.) The exact changes I made to my X230T firmware are also available as a bspatch file here. However a large disclaimer applies to all of this: do not attempt EC firmware modification at home unless you understand what you are doing. If something goes wrong with an EC update, there is a high likelihood of bricking your laptop, the only recourse being connecting to the EC via JTAG. I will not be held responsible for this. You should also understand that poor quality lithium ion cells can cause fires, as has been seen in the recent spate of hoverboard fires. I will also not be held responsible for this.

I have since also torn down my old Lenovo battery, and I plan to write another post soon with some information and photos. The option of replacing the cells in a genuine battery may be worth considering as an alternative to modifying the EC firmware, the advantage being is that you can choose your own high quality Li-Ion cells versus whatever you might happen to get in a replacement battery. The disadvantage is that it inevitably results in some damage to the battery casing, and as I mentioned before the controller will remember data about the old cells which might affect function of the new cells (I will see what I can figure out on that front).

To be fair, buying a genuine Lenovo battery is probably the best option for most people, at least while Lenovo is still making replacement batteries for this model. Primarily this was an exercise in ‘because I can’: I and a great many readers have enjoyed this process of diving deep into the system architecture of a modern laptop, at a level that few people are normally exposed to, and removing an annoying limitation that I feel I should be entitled to remove from my laptop. I do not have anything against Lenovo specifically – they are certainly not the only vendor who implements vendor restrictions – so please be nice in your comments.