There's nothing worse than having to recall your product because of a firmware bug. This is why bootloaders are an important part of modern "smart" electronic devices. This post aims to serve as a deep-dive intro to the art of bootloader design.

What is a bootloader anyway?

When an embedded device is turned on, it starts executing the application code (aka. firmware) stored in its non-volatile memory, typically flash memory.

This firmware is usually programmed into the device at the factory by the manufacturer. However, if the firmware is updated or changed after the device has left the factory, the new version needs to be programmed into the flash memory again.

This could be done by sending the product back to the manufacturer, but for a better user experience, the firmware can be re-programmed by the device itself. There is a problem with firmware updating itself, though: the code doing the updating would eventually overwrite itself and render the device inoperable. This is where the bootloader comes in.

The bootloader is a very minimal, self-contained piece of firmware that's programmed by the manufacturer (usually only once), and its main job is to receive new firmware and write it to where the old firmware used to be. This definition differs somewhat from that of a computer bootloader, so while the two share a name, they shouldn't be confused as the same thing.

Since the bootloader is used to erase and write new firmware, it must be designed carefully so that it can never end up in an unrecoverable state. Otherwise the bootloader has failed at its job and the device must be sent back to the manufacturer.

Throughout this post I use the term bootloader to refer to the small piece of firmware used to update the application, and application to refer to the larger firmware that runs while the device is in use, until it is replaced by the bootloader.

Design Considerations

The design of a bootloader has a big impact on the robustness and safety of the update process. However, the most resilient and robust bootloader doesn't always lead to the best user experience, so design trade-offs must be made and risks evaluated.

Triggering the bootloader (aka. entry)

During normal operation of a device, the running application will need some way of switching to the bootloader when a new application is ready to be programmed.

Software triggers

Sending an "enter bootloader" message to the device is a popular way of entering the bootloader, but it can be dangerous to rely on this as the only entry method. For example, what would happen if you released a new application that unknowingly broke the "enter bootloader" message handling, or caused a lock-up in another part of the code? There would then be no way of entering the bootloader, and the device would lose the ability to update itself ever again.
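One common way to implement a software trigger is for the application to write a "magic" value into a RAM word and then reset; the bootloader checks (and clears) that word early in start-up. The sketch below assumes the word lives in RAM that the start-up code doesn't zero (on a real target that means a toolchain-specific no-init linker section, which is not shown here). `BOOT_MAGIC`, `app_request_bootloader()` and `bootloader_was_requested()` are hypothetical names, and the reset call is left as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value; any unlikely 32-bit pattern will do. */
#define BOOT_MAGIC 0xB007B007u

/* On target this must sit in RAM that start-up code does NOT clear
 * (assumed no-init section). On a host it is just a global. */
static volatile uint32_t boot_request;

/* Application side: called when an "enter bootloader" message arrives. */
void app_request_bootloader(void)
{
    boot_request = BOOT_MAGIC;
    /* system_reset();  hypothetical, target-specific, never returns */
}

/* Bootloader side: checked once, early in start-up. */
bool bootloader_was_requested(void)
{
    bool requested = (boot_request == BOOT_MAGIC);
    boot_request = 0u; /* clear it so the next reset boots normally */
    return requested;
}
```

RAM contents are undefined after a cold power-on, which is why a full 32-bit pattern is used rather than a single flag: an accidental match is very unlikely. Note that this trigger still depends on the application working, which is exactly the weakness described above.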

Hardware triggers

Holding down a button is another popular way of entering the bootloader. The button is usually read by the bootloader when the device first powers on, so a bad application update can't break this trigger. However, holding a button down isn't the greatest user experience, and physical interaction isn't an option for some products. If you use a software trigger, having a hardware trigger as a backup can be very useful.
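As a rough sketch of how the two triggers can back each other up, the bootloader can gather its inputs at power-on and make a single decision. The three inputs are assumed to come from elsewhere (a button read, a software flag like the one described above, and whatever validity check is in use); the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    BOOT_APPLICATION,
    STAY_IN_BOOTLOADER
} boot_action_t;

/* button_held: hardware trigger, read by the bootloader itself
 * sw_request:  software trigger left behind by the application
 * app_valid:   result of the bootloader's validity check */
boot_action_t decide_boot_action(bool button_held, bool sw_request, bool app_valid)
{
    /* The button is read by the bootloader alone, so no application
     * bug can disable this path. */
    if (button_held || sw_request) {
        return STAY_IN_BOOTLOADER;
    }
    /* Never jump into an image that failed validation. */
    if (!app_valid) {
        return STAY_IN_BOOTLOADER;
    }
    return BOOT_APPLICATION;
}
```

Because the button is checked before the application ever runs, a broken "enter bootloader" handler in a new release still leaves a way back in.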

No application trigger

This option is a little counter-intuitive: the bootloader can avoid relying on either the application or external hardware for entry, provided a matching exit strategy is used (see Timeouts and Triggered exits below).

Leaving the bootloader (aka. exit)

Since the bootloader is the first piece of firmware to run every time the device powers on, it must have a way of "jumping" to the application to run the real firmware. Jumping too soon, or jumping to bad firmware, can be catastrophic, so this must be carefully considered.

Application validity

Possibly the most common exit option is to simply check for a valid firmware image on boot. This allows for the fastest start-up time with zero user interaction. If this option is used, software triggers should not be relied upon as the only bootloader entry trigger; otherwise, if buggy firmware passes the validity check, it won't be possible to enter the bootloader.
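On ARM Cortex-M parts, a cheap first-pass validity check is to sanity-check the first two words of the application's vector table: the initial stack pointer and the reset handler address. The memory map below is hypothetical (substitute your part's values), and the function takes a pointer so it can be exercised off-target. This check alone can pass on a half-written image, so treat it as a complement to a checksum over the whole image, not a replacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory map; substitute your part's values. */
#define APP_START 0x08004000u
#define FLASH_END 0x08040000u /* one past the last flash byte */
#define RAM_START 0x20000000u
#define RAM_END   0x20010000u /* one past the last RAM byte */

/* vt points at the application's vector table; on target this would be
 * (const uint32_t *)APP_START. Word 0 is the initial stack pointer and
 * word 1 the reset handler address. */
bool app_vectors_look_valid(const uint32_t *vt)
{
    uint32_t sp = vt[0];
    uint32_t reset = vt[1];

    /* Erased flash reads as all 1s, so an empty slot fails here. */
    if (sp == 0xFFFFFFFFu || reset == 0xFFFFFFFFu) {
        return false;
    }
    /* The stack grows down, so the initial SP may equal RAM_END. */
    if (sp < RAM_START || sp > RAM_END) {
        return false;
    }
    /* Reset handler: Thumb bit (bit 0) set, target inside the app area. */
    uint32_t target = reset & ~1u;
    if ((reset & 1u) == 0u || target < APP_START || target >= FLASH_END) {
        return false;
    }
    return true;
}
```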

Timeouts

Timeouts are another popular way to exit the bootloader. This is usually implemented by waiting a set number of seconds before jumping to the application. The timeout window gives the bootloader a chance to start receiving a new application on every start-up, regardless of whether the current firmware has a buggy or broken entry trigger. However, waiting 10 or 30 seconds on every start-up can be a poor user experience for many products, and it can also cause a nasty timing bug. Consider a bootloader timeout of 10 seconds, where connecting to the device (perhaps wirelessly) can take anywhere from 5 to 15 seconds: in the worst case, the bootloader gives up and jumps to the application before the host has finished connecting.
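A minimal timeout sketch, assuming a free-running millisecond counter whose current value the caller passes in (the 10 second window and all names are assumptions). Once any valid host traffic has been seen, the clock stops, so a long transfer can't be cut off half-way. This does not fix the connection-time race above; the window still has to be longer than the worst-case connection time.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_TIMEOUT_MS 10000u /* assumed 10 s window */

typedef struct {
    uint32_t start_ms; /* counter value when the bootloader started waiting */
    bool host_seen;    /* set once any valid packet has arrived */
} boot_timeout_t;

void timeout_start(boot_timeout_t *t, uint32_t now_ms)
{
    t->start_ms = now_ms;
    t->host_seen = false;
}

/* Call this whenever a valid packet is received from the host. */
void timeout_host_activity(boot_timeout_t *t)
{
    t->host_seen = true;
}

bool timeout_should_jump(const boot_timeout_t *t, uint32_t now_ms)
{
    if (t->host_seen) {
        return false; /* a host is talking to us: never time out now */
    }
    /* Unsigned subtraction stays correct if the counter wraps. */
    return (uint32_t)(now_ms - t->start_ms) >= BOOT_TIMEOUT_MS;
}
```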

Triggered exits

This method is the most robust, but it possibly has the worst end-user experience for some products. A triggered exit works by having the bootloader wait, on every start-up, for a "boot now" message from another device. This strategy is often possible for industrial or commercial devices that are part of a larger system, but it's usually not possible in consumer electronics.
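The heart of a triggered exit is a command handler where the only way out is an explicit "boot now". The command codes below are a hypothetical one-byte protocol, and refusing to boot an invalid image is a design choice of this sketch rather than part of the method itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-byte command codes. */
enum {
    CMD_PING     = 0x01,
    CMD_WRITE    = 0x02,
    CMD_BOOT_NOW = 0x03
};

typedef enum {
    BL_WAITING,    /* keep running the bootloader */
    BL_JUMP_TO_APP /* the host told us to start the application */
} bl_state_t;

/* Handle one received command. The bootloader never leaves on its own;
 * only CMD_BOOT_NOW ends the loop, and only for a valid image. */
bl_state_t handle_command(uint8_t cmd, bool app_valid)
{
    switch (cmd) {
    case CMD_BOOT_NOW:
        return app_valid ? BL_JUMP_TO_APP : BL_WAITING;
    case CMD_PING:
    case CMD_WRITE:
    default:
        /* Real handlers (reply, write a chunk, ...) would go here. */
        return BL_WAITING;
    }
}
```

The main loop would simply receive commands and call `handle_command()` until it returns `BL_JUMP_TO_APP`, so the host decides exactly when the application starts.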

Interruption tolerance

Nearly all microcontrollers store their firmware in non-volatile flash memory. The erased state of a flash bit is 1, and "writing" to flash either leaves a bit at 1 or clears it to 0. The only way to get a 1 back into a bit is to "erase" it, and erases work on a whole page or sector at a time, not on individual bits.
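A small simulation helps make this concrete. The functions below only model typical NOR flash behaviour on a tiny pretend page (they are not any vendor's driver API): erasing sets every bit to 1, and programming can only clear bits, so each stored byte becomes the old value ANDed with the new data. Some parts (for example, those with ECC) forbid programming the same location twice at all, so check your reference manual.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 8u /* tiny page, just for illustration */

static uint8_t sim_page[SIM_PAGE_SIZE];

/* Erase: every bit in the whole page goes back to 1. */
void flash_erase_page(void)
{
    memset(sim_page, 0xFF, sizeof sim_page);
}

/* Program: bits can only go from 1 to 0 (stored = old AND new). */
void flash_program(size_t offset, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len && offset + i < SIM_PAGE_SIZE; i++) {
        sim_page[offset + i] &= data[i];
    }
}

uint8_t flash_read(size_t offset)
{
    return sim_page[offset];
}
```

Programming `0x0F` into an erased byte reads back `0x0F`, but programming `0xF0` over it afterwards reads back `0x00`, not `0xF0`; only an erase of the whole page gets the 1s back.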

This "erase-then-write" characteristic of flash means that for the bootloader to write new firmware, it must first erase the old firmware. If the bootloader is interrupted during this erase-then-write cycle, the result can be flash that has been erased but not yet (fully) rewritten.

Bootloaders need to be tolerant of interruptions, whether from power loss or communication failures. Powering off a device mid-update should not render it inoperable. How this safety is implemented comes back to how the bootloader exit is triggered. If only application validity is used, care must be taken when checking validity to ensure that a complete, valid application is present, not a half-written one.
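One common way to tell a complete image from a half-written one is a commit marker that is programmed last, only after every chunk has been written and verified. Since erased flash reads as `0xFFFFFFFF`, an interrupted update leaves the marker unprogrammed. The marker value, the header type and the simulated update below are hypothetical; a real bootloader would write actual flash and check a CRC before committing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker, programmed only once the image is complete. */
#define APP_COMMITTED 0xA5A5C3C3u

typedef struct {
    uint32_t commit; /* other header fields (size, CRC, version) omitted */
} app_header_t;

bool app_is_committed(const app_header_t *h)
{
    return h->commit == APP_COMMITTED;
}

/* Simulated update: `chunks` chunks are written, then the marker.
 * `power_fail_after` models losing power after that many chunks. */
void simulate_update(app_header_t *h, unsigned chunks, unsigned power_fail_after)
{
    h->commit = 0xFFFFFFFFu; /* the slot (and its header) is erased first */
    for (unsigned i = 0; i < chunks; i++) {
        if (i == power_fail_after) {
            return; /* power lost mid-update: marker stays erased */
        }
        /* write_chunk(i);  hypothetical flash write */
    }
    /* verify the image checksum here before committing (omitted) */
    h->commit = APP_COMMITTED; /* always the very last write */
}
```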

Failed updates can and do happen, so handling them should be part of the design. Because of the erase-then-write characteristic described above, a failed update leaves the device without a functioning application. This may be acceptable for some devices, but for others it's a poor user experience.

One strategy for overcoming this issue is to reserve space in flash for two applications. The bootloader can then erase and overwrite the older application, while keeping the most recent application intact and untouched.

With two applications in flash, the bootloader needs some logic to decide which application to start on boot, and that logic must handle failed updates correctly. The application images also need to account for potentially running from two different memory locations. These are implementation-specific details that are too in-depth to cover in this post.
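As a high-level sketch of the slot-selection logic only (relocation is left out, as above), each slot could carry a validity flag and a sequence number that increases with every image written. The bootloader boots the newest valid slot and writes the next update into the other one. The `slot_info_t` type and field names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool valid;        /* image complete and checksum verified */
    uint32_t sequence; /* incremented for every new image written */
} slot_info_t;

/* Returns the slot (0 or 1) to boot, or -1 if neither is usable. */
int select_boot_slot(const slot_info_t slots[2])
{
    bool v0 = slots[0].valid;
    bool v1 = slots[1].valid;

    if (v0 && v1) {
        /* Both are good: pick the newer image. */
        return (slots[1].sequence > slots[0].sequence) ? 1 : 0;
    }
    if (v0) {
        return 0; /* slot 1 empty or its update failed: fall back */
    }
    if (v1) {
        return 1;
    }
    return -1; /* stay in the bootloader and wait for a new image */
}
```

A failed update simply leaves its slot invalid, so the previous image keeps booting.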

Shared information

It's often useful or even necessary to share some information between the bootloader and the application. In practice this is usually application version information, or some saved configuration options.

Keeping this information at a fixed memory location that the bootloader never touches is important, because it's difficult or even impossible to update the bootloader later if the shared location needs to change.

This usually looks something like the memory layout below, with the config location fixed and known to both the bootloader and the application firmware.

+------------+--------+-------------+
| BOOTLOADER | CONFIG | APPLICATION |
+------------+--------+-------------+
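In C, the CONFIG region is often described by a single struct that both the bootloader and the application compile from the same header. The layout below is hypothetical (field names, magic value, reserved size and the fixed address are all assumptions). The important parts are the reserved space for future fields and the compile-time checks that stop anyone from accidentally moving a field after devices have shipped.

```c
#include <stddef.h>
#include <stdint.h>

#define SHARED_CONFIG_MAGIC   0x43464731u /* hypothetical marker */
#define SHARED_CONFIG_VERSION 1u

typedef struct {
    uint32_t magic;        /* identifies an initialised config block */
    uint32_t version;      /* layout version of this struct */
    uint32_t app_version;  /* e.g. major << 16 | minor << 8 | patch */
    uint32_t app_size;     /* bytes in the application image */
    uint32_t app_crc32;    /* checksum over the application image */
    uint32_t reserved[11]; /* room to grow without moving anything */
} shared_config_t;

/* Catch accidental layout changes at compile time (C11). */
_Static_assert(sizeof(shared_config_t) == 64, "shared_config_t must stay 64 bytes");
_Static_assert(offsetof(shared_config_t, app_crc32) == 16, "app_crc32 offset changed");

/* On target the block sits at an address known to both images, e.g.
 *   #define SHARED_CONFIG ((const shared_config_t *)0x08003C00u)  (assumed)
 * usually pinned there with a linker-script section. */
```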

Security

Exposing an interface to application firmware updates through a bootloader can also expose the device to potential security issues, so if this could be a problem then the security of firmware updates should be considered carefully.

It's not within the scope of this post to cover exactly how this should be implemented, but it's usually accomplished by cryptographically signing the application with a secure private key known only to the people authorized to release new application firmware.

Storing private keys on the end-user device (and, by extension, in distributed firmware) is not recommended, since attackers can very often extract the contents of a microcontroller's flash. This is why some type of asymmetric cryptography should be used: the device only needs the public key to verify a signature, and extracting that key doesn't let an attacker sign new firmware. It's also worth considering how to handle revocation of keys if they become compromised. This can get very complicated very quickly, so if in doubt, the design and implementation is best left to a security expert.

When it comes to embedded device security, it's important to understand what kinds of attackers you want to protect your hardware against. Protecting against attacks from a nation state is very different from stopping a lone user from easily loading an unauthorized application.

Validation

Validation of the application has been mentioned a few times already, but it's important to consider how the bootloader will verify that a valid and uncorrupted firmware image has been received.

One common approach is to use a CRC checksum for each chunk of firmware the bootloader receives, plus a checksum over the entire application, which the bootloader can verify once it has finished writing the new application. Cryptographic hashes could also be used, but a simple 32-bit CRC is usually enough to detect corruption.
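A bitwise CRC-32 is small enough for almost any bootloader. The sketch below is the common CRC-32 variant used by zlib and Ethernet (reflected polynomial `0xEDB88320`, initial value and final XOR `0xFFFFFFFF`), so a host-side tool can compute the expected value with, for example, Python's `zlib.crc32`. Passing the previous result back in lets the same function check individual chunks and also accumulate a checksum over the whole image as chunks arrive.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (zlib/IEEE 802.3 variant). Slow but tiny; a table-driven
 * version trades 1 KiB of flash for speed. Start with crc = 0, and pass the
 * previous return value back in to continue over the next chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = -(crc & 1u); /* all 1s if the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

For the standard check string `"123456789"` this returns `0xCBF43926`, whether it is processed in one call or split across several.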

How the application checksum is communicated and stored is important, as the bootloader may need to check it on every boot. This is another kind of information that can be stored using the Shared information strategy described above.

Resource usage

Flash and RAM usage usually isn't a huge concern in bootloader design, although if you're working with a small microcontroller with little flash, every feature added to the bootloader directly reduces the amount of flash available to the application.

Including a large TCP/IP stack in a bootloader will not only bloat its size but also introduce a lot of code that increases the surface area for possible bugs. This is why bootloader designers and implementers strive for the smallest and leanest implementations possible.

Updating the bootloader

It's every firmware engineer's worst nightmare, but bootloader bugs can and do happen. Considering how to resolve them is a good exercise in understanding why getting the bootloader right the first time is so important.

It is possible to update a bootloader "in-the-field", although it's almost always a bad idea due to the erase-then-write nature of non-volatile flash memory: if the bootloader update fails, the result is usually an inoperable device. When you see a device with the warning "do not disconnect from power during update", it's a good sign that the update process isn't well designed.

Below are two methods that I've used in the past, although doing either of these is usually a very bad idea and not at all recommended.

Bootloader updating application

One option is to build a one-off bootloader updater into an application update, which effectively turns the application into a meta-bootloader.

Patching

If the bug in the bootloader is small, it's sometimes possible to "patch" just a few bytes of the bootloader's flash from the application to fix it. This has the same erase-then-write pitfalls, although depending on where the patch is located, it can sometimes still be possible to recover if the patch isn't completely written. It takes a very lucky situation for this to work.
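One thing that decides how risky a patch is: whether it can be programmed without erasing the sector at all. Because programming can only turn 1 bits into 0 bits, a patch whose new bytes only clear bits of the existing bytes can, on many parts, be written in place. The helper below is a hypothetical sketch of that check. Parts with ECC or multi-word programming units often forbid re-programming already-written locations even in this case, so confirm this against the reference manual before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `patch` could be programmed over `current` without an
 * erase, i.e. no bit needs to change from 0 back to 1. */
bool patch_needs_no_erase(const uint8_t *current, const uint8_t *patch, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((current[i] & patch[i]) != patch[i]) {
            return false; /* some bit would have to go from 0 to 1 */
        }
    }
    return true;
}
```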

Hardware

With the software design considerations out of the way, there are some hardware considerations to be made when it comes to bootloaders.

Bootloader primitives support

There are a few microcontroller features which are required for a bootloading system to be possible, and some features which just make things a lot easier.

In application programming

Also known as IAP on some architectures, this is the ability to erase and write a microcontroller's internal flash memory from the firmware running on that microcontroller's own core. This is a hard requirement for writing a bootloader, and is commonplace on most modern microcontroller architectures.

This can also be used to store settings or data in an EEPROM-like way, which makes it a useful feature to have.

Interrupt vector table relocation

This feature allows the interrupt vector table to be relocated from the beginning of flash to another location, such as the start of the application.

This isn't a hard requirement, but it does make implementing the bootloader easier.

+-------------+-------------------+
| BOOTLOADER  | APPLICATION       |
+-------------+-------------------+
^             ^- Relocated location
|- Original vector table location
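On Cortex-M3/M4, relocation is done by writing the application's table address to the VTOR register (`SCB->VTOR` in CMSIS), but not every address is allowed. As I understand the ARMv7-M rule, the table must be aligned to a power of two at least as large as the table itself, with a minimum of 128 bytes; check your core's reference manual. The sketch below checks a candidate application start address against that rule. `num_vectors` (16 system entries plus the part's external interrupts) is an input you'd take from the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest power of two >= n (n > 0). */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1u;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* True if app_addr is a legal vector table address for a table with
 * num_vectors 4-byte entries under the Cortex-M3/M4 alignment rule. */
bool vtor_address_ok(uint32_t app_addr, uint32_t num_vectors)
{
    uint32_t align = next_pow2(num_vectors * 4u);
    if (align < 128u) {
        align = 128u;
    }
    return (app_addr & (align - 1u)) == 0u;
}
```

For a part with 98 vectors (392 bytes, rounded up to 512), `0x08004000` is a usable application start address but `0x08004100` is not. Placing the application on a sector boundary usually satisfies this anyway.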

System reset

Most microcontrollers have some ability to reset themselves from software. If a method for resetting the microcontroller is not present, this will affect the bootloader entry trigger design.

On an ARM Cortex-M3/M4 microcontroller, there is a register in the System Control Block which can be written to reset the ARM core and its peripherals. See this Cortex M3 implementation from libopencm3.
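The register in question is the Application Interrupt and Reset Control Register (AIRCR) at `0xE000ED0C`. A write only takes effect if the upper half contains the key `0x05FA`, and setting SYSRESETREQ (bit 2) requests the reset. The sketch below computes the value to write while preserving the PRIGROUP priority-grouping bits, which is the same approach CMSIS takes in `NVIC_SystemReset()`. The helper name is hypothetical, and the actual register write is shown only as a comment, since it can't run on a host.

```c
#include <stdint.h>

#define AIRCR_ADDR          0xE000ED0Cu    /* SCB->AIRCR on Cortex-M3/M4 */
#define AIRCR_VECTKEY       (0x05FAu << 16) /* write key, required */
#define AIRCR_PRIGROUP_MASK (7u << 8)
#define AIRCR_SYSRESETREQ   (1u << 2)

/* Value to write to AIRCR to request a system reset, keeping the current
 * priority grouping. */
uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return AIRCR_VECTKEY | (current_aircr & AIRCR_PRIGROUP_MASK) | AIRCR_SYSRESETREQ;
}

/* On target (not run on a host):
 *   volatile uint32_t *aircr = (volatile uint32_t *)AIRCR_ADDR;
 *   __asm volatile ("dsb");
 *   *aircr = aircr_reset_value(*aircr);
 *   __asm volatile ("dsb");
 *   for (;;) { }   (wait for the reset to take effect)
 */
```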

On other microcontrollers without a similar system reset register, a watchdog timer can be used instead. This is as straightforward as enabling the watchdog and then entering an infinite loop that never refreshes it.

This usually looks something like this:

enable_watchdog();
for (;;) {
    /* Never refresh the watchdog: it resets the device, so this loop never exits. */
}

Bootloader flash protection

Some microcontrollers have the ability to lock out sections of their flash memory from IAP access. This is a nice feature to ensure that the bootloader can never be over-written or corrupted, although it's not a required feature.

Built-in bootloaders

Some microcontrollers have built-in bootloaders, which can make implementation an easy task.

The trade-off here is that these built-in bootloaders usually can't be configured or altered: they are programmed into the microcontroller by the manufacturer and can't be edited. If you want to add a flashing LED to indicate to the user that the bootloader is working, for example, this likely won't be possible.

Bootloader interface

There are many ways to interface with a bootloader: UART, USB (DFU), Ethernet (TCP/IP or UDP), CAN, I2C, SPI, Bluetooth (OTA) and RS-485 are just some examples of bootloader interfaces.

The transport used affects both the code size of the bootloader firmware and the speed at which new firmware can be transferred. For example, a TCP/IP bootloader will be much faster than an I2C bootloader, but the TCP/IP stack will take up many more kilobytes of flash than the I2C bootloader by comparison.
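A quick back-of-the-envelope calculation helps when comparing transports. The helper below gives only the raw line-time lower bound and ignores protocol overhead, retries and flash erase/write time, which often dominate in practice. The function name and the example figures are illustrative assumptions.

```c
#include <stdint.h>

/* Lower bound on transfer time: image size times bits on the wire per
 * byte, divided by the line rate. Ignores framing overhead, retries and
 * flash programming time. */
double transfer_seconds(uint32_t image_bytes, uint32_t line_bits_per_s, uint32_t bits_per_byte)
{
    return (double)image_bytes * bits_per_byte / line_bits_per_s;
}
```

For a 256 KiB (262144 byte) image, a UART at 115200 baud with 10 bits per byte (start, 8 data, stop) needs at least about 22.8 s, while I2C at 400 kHz with 9 clocks per byte (8 data plus ACK) needs about 5.9 s before addressing overhead. On a 100 Mbit/s Ethernet link the raw line time drops to around 0.02 s, so the link stops being the bottleneck.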

What's next?

This post outlines the design decisions and trade-offs that should be considered when designing a custom microcontroller bootloader.

In future posts I will dive further into the implementation of different types of bootloaders, along with tips, tricks and common design patterns seen in the various types of bootloaders.

Stay tuned!

~ Gus