When working with peripherals, we need to be able to read and write to the device’s internal registers. How we achieve this in C depends on whether we’re working with memory-mapped IO or port-mapped IO. Port-mapped IO typically requires compiler/language extensions, whereas memory-mapped IO can be accommodated with the standard C syntax.

Embedded “Hello, World!”

We all know the embedded equivalent of the “Hello, world!” program is flashing the LED, so true to form I’m going to use that as an example.

The examples are based on an STM32F407 chip using the GNU Arm Embedded Toolchain.

The STM32F4 uses a port-based GPIO (General Purpose Input Output) model, where each port can manage 16 physical pins. The LEDs are connected to external pins 55-58, which map internally onto GPIO Port D pins 8-11.

Flashing the LEDs

Flashing the LEDs is fairly straightforward; at the port level there are only two registers we are interested in.

Mode Register – this defines, on a pin-by-pin basis, what each pin’s function is, e.g. we want this pin to behave as an output pin.

Output Data Register – writing a ‘1’ to a pin’s bit drives that pin high (generates a voltage) and writing a ‘0’ drives it low (grounds the pin).

Mode Register (MODER)

Each port pin has four modes of operation, thus requiring two configuration bits per pin (pin 0 is configured using mode bits 0-1, pin 1 uses mode bits 2-3, and so on):

00 Input
01 Output
10 Alternative function (details configured via other registers)
11 Analogue

So, for example, to configure pin 8 for output, we must write the value 01 into bits 16 and 17 in the MODER register (that is, bit 16 => 1, bit 17 => 0).
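The read-modify-write on bits 16 and 17 generalises to any pin: clear the two-bit field at bit 2n, then insert the new mode. The sketch below is a hypothetical helper (`moder_with_mode` and the `gpio_mode` enum are illustrative names, not from any vendor header); it works on a plain `uint32_t` value so it can be unit tested off-target.

```c
#include <assert.h>
#include <stdint.h>

/* MODER encodings from the table above (2 bits per pin). */
enum gpio_mode {
    GPIO_MODE_INPUT     = 0u, /* 00 */
    GPIO_MODE_OUTPUT    = 1u, /* 01 */
    GPIO_MODE_ALTERNATE = 2u, /* 10 */
    GPIO_MODE_ANALOGUE  = 3u  /* 11 */
};

/* Return a copy of 'moder' with the 2-bit field for 'pin' replaced by 'mode'.
 * Pin n occupies bits 2n (low bit) and 2n+1 (high bit). */
static uint32_t moder_with_mode(uint32_t moder, unsigned pin, enum gpio_mode mode)
{
    assert(pin < 16u);                        /* a port has 16 pins */
    const unsigned shift = 2u * pin;
    moder &= ~(UINT32_C(3) << shift);         /* clear both mode bits */
    moder |= ((uint32_t)mode & 3u) << shift;  /* insert the new mode */
    return moder;
}
```

On target it would be used as `*portd_moder = moder_with_mode(*portd_moder, 8, GPIO_MODE_OUTPUT);`, which still performs one register read and one register write.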

Output Data Register (ODR)

In the Output Data Register (ODR) each bit represents an I/O pin on the port. The bit number matches the pin number.

If a pin is set to output (in the MODER register) then writing a 1 into the appropriate bit will drive the I/O pin high. Writing 0 into the appropriate bit will drive the I/O pin low.
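A minimal sketch of driving a single ODR bit, assuming the register is reached through a `volatile uint32_t *` as in the pointer examples that follow. The helper names are hypothetical; because the register address is a parameter, the same functions can be exercised on a host against an ordinary variable standing in for the register.

```c
#include <assert.h>
#include <stdint.h>

/* Drive 'pin' high: read-modify-write setting only that pin's bit. */
static void odr_set_pin(volatile uint32_t *odr, unsigned pin)
{
    assert(pin < 16u);  /* only bits 0-15 correspond to pins */
    *odr |= (UINT32_C(1) << pin);
}

/* Drive 'pin' low: read-modify-write clearing only that pin's bit. */
static void odr_clear_pin(volatile uint32_t *odr, unsigned pin)
{
    assert(pin < 16u);
    *odr &= ~(UINT32_C(1) << pin);
}
```

Note that each call is a separate read followed by a write, so other bits in the register are preserved but the update is not atomic.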

There are 16 IO pins, but the register is 32 bits wide. The reserved bits (16-31) read as ‘0’.

Port D Addresses

The absolute addresses for the MODER and ODR of Port D are:

MODER – 0x40020C00

ODR – 0x40020C14

Pointer access to registers

Typically, when we access memory-mapped registers in C, we use pointer notation to ‘trick’ the compiler into generating the correct load/store operations at the absolute address needed.

So for Port D we might see something along the lines of the following (for simplicity, I’ll keep the code brief and use magic numbers):

#include <stdint.h>

volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr   = (uint32_t*) 0x40020C14;

extern void sleep(uint32_t ms); // use systick to busy-wait

int main(void)
{
  uint32_t moder = *portd_moder;
  moder |= (1 << 16);
  moder &= ~(1 << 17);
  *portd_moder = moder;

  while(1) {
    *portd_odr |= (1 << 8);  // led-on
    sleep(500);
    *portd_odr &= ~(1 << 8); // led-off
    sleep(500);
  }
}

Alternatively, we may see the registers defined using the pre-processor, e.g.

#include <stdint.h>

#define PORTD_MODER (*((volatile uint32_t*) 0x40020C00))
#define PORTD_ODR   (*((volatile uint32_t*) 0x40020C14))

extern void sleep(uint32_t ms); // use systick to busy-wait

int main(void)
{
  uint32_t moder = PORTD_MODER;
  moder |= (1 << 16);
  moder &= ~(1 << 17);
  PORTD_MODER = moder;

  while(1) {
    PORTD_ODR |= (1 << 8);  // led-on
    sleep(500);
    PORTD_ODR &= ~(1 << 8); // led-off
    sleep(500);
  }
}

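There is a misconception among many C programmers that the pointer model is less efficient than the #define model. With C99 and modern optimising compilers this is not the case: because each pointer is itself const and initialised with a constant address, the compiler can fold that address straight into the load/store and generate identical code (C99 allows the compiler to optimise away const objects).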

Enabling Port D

We are missing one final step: each peripheral on the STM32F407 is clock gated. The clock signal does not reach a peripheral until we enable it by setting a bit in a specific register. By default, clock signals do not reach peripherals that are not in use, thus saving power.

To enable the clock for GPIO Port D, the GPIODEN (GPIO D Enable) bit (bit 3) of the AHB1ENR (AMBA High-performance Bus 1 Enable) register in the RCC (Reset and Clock Control) peripheral needs to be set.

#include <stdint.h>

volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr   = (uint32_t*) 0x40020C14;
volatile uint32_t* const rcc_ahb1enr = (uint32_t*) 0x40023830;

extern void sleep(uint32_t ms); // use systick to busy-wait

int main(void)
{
  *rcc_ahb1enr |= (1 << 3); // enable PortD's clock

  uint32_t moder = *portd_moder;
  moder |= (1 << 16);
  moder &= ~(1 << 17);
  *portd_moder = moder;

  while(1) {
    *portd_odr |= (1 << 8);  // led-on
    sleep(500);
    *portd_odr &= ~(1 << 8); // led-off
    sleep(500);
  }
}
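Instead of hard-coding 0x40023830, the address can be built from the peripheral base plus the register offset, with the enable bit given a name. This sketch assumes the STM32F407 RCC base address of 0x40023800 (from the reference manual memory map) and the AHB1ENR offset of 0x30; all macro and function names are hypothetical. The helper takes the register address as a parameter so it can be run on a host against an ordinary variable.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed STM32F407 values: RCC base and AHB1ENR offset. */
#define RCC_BASE             UINT32_C(0x40023800)
#define RCC_AHB1ENR_OFFSET   UINT32_C(0x30)
#define RCC_AHB1ENR_ADDR     (RCC_BASE + RCC_AHB1ENR_OFFSET)
#define RCC_AHB1ENR_GPIODEN  (UINT32_C(1) << 3)  /* GPIO D clock enable */

/* Compile-time check (C11) that the derived address matches the magic number. */
static_assert(RCC_AHB1ENR_ADDR == UINT32_C(0x40023830), "unexpected AHB1ENR address");

/* Set GPIODEN without disturbing the other clock-enable bits.
 * On target: rcc_enable_gpiod((volatile uint32_t *)RCC_AHB1ENR_ADDR); */
static void rcc_enable_gpiod(volatile uint32_t *ahb1enr)
{
    assert(ahb1enr != 0);
    *ahb1enr |= RCC_AHB1ENR_GPIODEN;
}
```

Using `|=` rather than `=` matters here, since AHB1ENR also holds the clock enables for other ports and peripherals.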

Using structs

The code so far works just fine, but has a number of shortcomings.

First, to support multiple IO ports we would have to define a separate pointer for every register of every port, e.g.:

volatile uint32_t* const porta_moder = (uint32_t*) 0x40020000;
volatile uint32_t* const porta_odr   = (uint32_t*) 0x40020014;
volatile uint32_t* const portb_moder = (uint32_t*) 0x40020400;
volatile uint32_t* const portb_odr   = (uint32_t*) 0x40020414;
volatile uint32_t* const portc_moder = (uint32_t*) 0x40020800;
volatile uint32_t* const portc_odr   = (uint32_t*) 0x40020014;
volatile uint32_t* const portd_moder = (uint32_t*) 0x40020C00;
volatile uint32_t* const portd_odr   = (uint32_t*) 0x40020C14;
volatile uint32_t* const porte_moder = (uint32_t*) 0x40021000;
volatile uint32_t* const porte_odr   = (uint32_t*) 0x40021014;

Considering that each port actually has 10 different registers we may want to access, this involves a lot of repetition. Where there is repetition, bugs that are simple to make but difficult to track down can creep in (did you spot the deliberate mistake?).

In addition, and more significantly, we can see that the port’s ODR is always at an offset of 0x14 bytes from the MODER. The MODER is always at offset 0x00 from the port address (thus the MODER address is also the port’s base address).
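Since every register sits at a fixed offset from its port base, and the port bases listed above are spaced 0x400 apart starting at 0x40020000, each address can be computed rather than typed. The sketch below is hypothetical (the macro and function names are illustrative) and is limited to ports A to E, the ones listed above; it also makes the Port C ODR typo easy to catch, since the computed address is 0x40020814, not 0x40020014.

```c
#include <assert.h>
#include <stdint.h>

/* Port A base is 0x40020000; each subsequent port is 0x400 further on. */
#define GPIO_PORT_BASE(n)  (UINT32_C(0x40020000) + UINT32_C(0x400) * (uint32_t)(n))
#define GPIO_MODER_OFFSET  UINT32_C(0x00)
#define GPIO_ODR_OFFSET    UINT32_C(0x14)

/* Absolute address of a register: port base plus register offset.
 * 'port' is 0 for A, 1 for B, ... 4 for E. */
static uint32_t gpio_reg_addr(unsigned port, uint32_t offset)
{
    assert(port < 5u);  /* ports A to E only */
    return GPIO_PORT_BASE(port) + offset;
}
```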

In Software Engineering terms we’d view this separate declaration of related pointers as a lack of cohesion in the code. One of our goals is to strive for high cohesion, grouping together things that naturally belong together (as a change affects them all).

struct Overlay

The full register layout for the STM32F4 GPIO port is shown below:

By using a struct to define the relative memory offsets, we can get the compiler to generate all the correct address accesses relative to the base address.

#include <stdint.h>

typedef struct {
  uint32_t MODER;   // mode register, offset: 0x00
  uint32_t OTYPER;  // output type register, offset: 0x04
  uint32_t OSPEEDR; // output speed register, offset: 0x08
  uint32_t PUPDR;   // pull-up/pull-down register, offset: 0x0C
  uint32_t IDR;     // input data register, offset: 0x10
  uint32_t ODR;     // output data register, offset: 0x14
  uint32_t BSRR;    // bit set/reset register, offset: 0x18
  uint32_t LCKR;    // configuration lock register, offset: 0x1C
  uint32_t AFRL;    // GPIO alternate function registers, offset: 0x20
  uint32_t AFRH;    // GPIO alternate function registers, offset: 0x24
} GPIO_t;

Now we define the pointer as before, but this time using the struct type rather than a uint32_t:

volatile GPIO_t* const portd = (GPIO_t*)0x40020C00;

Finally we can use it as before, but this time use struct-pointer dereferencing to access the individual registers:

int main(void)
{
  *rcc_ahb1enr |= (1 << 3); // enable PortD's clock

  uint32_t moder = portd->MODER;
  moder |= (1 << 16);
  moder &= ~(1 << 17);
  portd->MODER = moder;

  while (1) {
    portd->ODR |= (1 << 8);  // led-on
    sleep(500);
    portd->ODR &= ~(1 << 8); // led-off
    sleep(500);
  }
}

Now when we access the ODR via the statement:

portd->ODR |= (1 << 8); // led-on

the compiler can calculate the relative offset (0x14) of the ODR member relative to the base address of the pointer (0x40020C00).
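The offset the compiler adds can be observed on a host with an ordinary GPIO_t object: measure the byte distance from the start of the object to its ODR member and compare it with `offsetof`. This is a sketch with a hypothetical helper name; the struct has the same ten-member layout as above, just declared compactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the GPIO_t overlay above, written compactly. */
typedef struct {
    uint32_t MODER, OTYPER, OSPEEDR, PUPDR, IDR;
    uint32_t ODR, BSRR, LCKR, AFRL, AFRH;
} GPIO_t;

/* Byte distance from the start of a GPIO_t object to its ODR member,
 * i.e. the offset the compiler adds to the base pointer for 'p->ODR'. */
static size_t odr_offset_of(const volatile GPIO_t *port)
{
    assert(port != NULL);
    return (size_t)((const volatile char *)&port->ODR -
                    (const volatile char *)port);
}
```

With a base of 0x40020C00, adding this offset gives 0x40020C14, the absolute ODR address listed earlier.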

This means that we only need one pointer per port rather than 10, e.g.

volatile GPIO_t* const porta = (GPIO_t*)0x40020000;
volatile GPIO_t* const portb = (GPIO_t*)0x40020400;
volatile GPIO_t* const portc = (GPIO_t*)0x40020800;
volatile GPIO_t* const portd = (GPIO_t*)0x40020C00;
volatile GPIO_t* const porte = (GPIO_t*)0x40021000;

Alternatively, we could do the same with #defines:

#define PORTA ((volatile GPIO_t*) 0x40020000)
#define PORTB ((volatile GPIO_t*) 0x40020400)
#define PORTC ((volatile GPIO_t*) 0x40020800)
#define PORTD ((volatile GPIO_t*) 0x40020C00)
#define PORTE ((volatile GPIO_t*) 0x40021000)

Note that in these #defines the leading ‘*’ dereference has been dropped, so access to the register is coded thus:

PORTD->ODR |= (1 << 8); // led-on

If we left the dereference in:

#define PORTD (*((volatile GPIO_t*) 0x40020C00))

the code would be:

PORTD.ODR |= (1 << 8); // led-on

It’s a matter of style; the generated instructions are the same.
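Both macro styles name the same register block, which can be checked on a host by pointing them at a stand-in object instead of 0x40020C00. In this sketch `fake_portd`, `PORTD_PTR`, `PORTD_OBJ`, `led_on` and `led_off` are hypothetical names used only for illustration; one function uses `->`, the other `.`, and both modify the same ODR.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t MODER, OTYPER, OSPEEDR, PUPDR, IDR;
    uint32_t ODR, BSRR, LCKR, AFRL, AFRH;
} GPIO_t;

/* Host stand-in for the Port D register block (0x40020C00 on target). */
static volatile GPIO_t fake_portd;

#define PORTD_PTR ((volatile GPIO_t *)&fake_portd)     /* pointer style: use -> */
#define PORTD_OBJ (*((volatile GPIO_t *)&fake_portd))  /* dereferenced style: use . */

static void led_on(void)  { PORTD_PTR->ODR |= (UINT32_C(1) << 8); }
static void led_off(void) { PORTD_OBJ.ODR &= ~(UINT32_C(1) << 8); }
```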

Code Comparison

So how does the struct code compare to our original pointer code (compiled with optimisation flag -Og)?

Original code

$ arm-none-eabi-objdump -d -S main.o
...
    *portd_odr |= (1 << 8); // led-on
  1a:   4c0b        ldr     r4, [pc, #44]   ; (48 <main+0x48>)
  1c:   6823        ldr     r3, [r4, #0]
  1e:   f443 7380   orr.w   r3, r3, #256    ; 0x100
  22:   6023        str     r3, [r4, #0]
...

The assembler code does the following:

1. Load the value 0x40020C14 into r4
2. Read the contents of 0x40020C14 ([r4 + 0]) as a 32-bit value into r3
3. OR 0x100 with the contents of r3 (set bit 8)
4. Store r3 as a 32-bit value at address 0x40020C14

Comparing this to the struct access:

$ arm-none-eabi-objdump -d -S main.o
...
    portd->ODR |= (1 << 8); // led-on
  1a:   4c0a        ldr     r4, [pc, #40]   ; (44 <main+0x44>)
  1c:   6963        ldr     r3, [r4, #20]
  1e:   f443 7380   orr.w   r3, r3, #256    ; 0x100
  22:   6163        str     r3, [r4, #20]
...

So how does this differ? Only in the use of an offset load:

1. Load the value 0x40020C00 into r4
2. Read the contents of 0x40020C14 ([r4 + 20], i.e. [r4 + 0x14]) as a 32-bit value into r3
3. OR the value 0x100 with the contents of r3 (set bit 8)
4. Store r3 as a 32-bit value at address 0x40020C14 ([r4 + 0x14])

This code demonstrates that, from a size and performance perspective, there is no difference between the two approaches (at least on Arm).

Note: an Arm load (ldr) instruction takes 2 cycles with or without an immediate offset.

Caveats

Before you rush off and refactor legacy code to use structs, note that there are a couple of factors we are relying on, which may vary from compiler to compiler.

First, what can we be sure of?

The offset of the first struct member from the object’s address is always 0x0 (in C++ this is only guaranteed for standard-layout types, but it is usually the case).

The compiler cannot reorder the members, so OTYPER will always come at a higher address in memory than MODER and at a lower address than OSPEEDR.

However, we cannot guarantee that the compiler will not introduce padding between members, as the standard states:

There may be unnamed padding within a structure object, but not at its beginning.

So we cannot guarantee that the address of OTYPER is equal to the address of MODER + 4 bytes.

That said, in practical terms, with modern compilers it is unlikely to be a problem (for this code). Padding tends to occur when a data member would otherwise not sit on its natural boundary (i.e. a 32-bit type would not be word aligned), e.g.

typedef struct {
  int a;
  char b;
  int c;
} Padding_t;

would most likely give a result of 12 from sizeof(Padding_t) (with a 4-byte int), because 3 padding bytes are added after char b to align int c.
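To see where the padding actually lands on a given compiler and target, `offsetof` and `sizeof` can be printed directly. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int a;
    char b;
    int c;
} Padding_t;

/* Print the struct size, each member's offset, and the gap after 'b'. */
static void show_padding(void)
{
    assert(offsetof(Padding_t, a) == 0);  /* guaranteed: no padding at the start */
    printf("sizeof(Padding_t) = %zu\n", sizeof(Padding_t));
    printf("offsets: a=%zu b=%zu c=%zu\n",
           offsetof(Padding_t, a), offsetof(Padding_t, b), offsetof(Padding_t, c));
    printf("padding after b = %zu bytes\n",
           offsetof(Padding_t, c) - (offsetof(Padding_t, b) + sizeof(char)));
}
```

On a target where int is 4 bytes with 4-byte alignment (as on both Arm EABI and x86-64), this reports a size of 12, offsets 0, 4 and 8, and 3 bytes of padding after b.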

Mitigating the risk

The obvious, and most straightforward, approach is to ensure you have a unit test that checks the size of the generated structure, e.g.

void test_GPIO_t_struct_size(void)
{
  TEST_ASSERT_EQUAL(40, sizeof(GPIO_t));
}

Alternatively, one of the compelling reasons to use C11 is the introduction of static_assert (provided as a macro by <assert.h>), e.g.

#include <assert.h> // provides the C11 static_assert macro

int main(void)
{
  static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
}

This is a compile-time check; if padding was present, then the following compiler error is generated:

src/main.c: In function 'main':
src/main.c:87:3: error: static assertion failed: "padding in GPIO_t present"
   static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
   ^
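A size check catches padding in total, but it cannot tell you which member moved. A slightly stronger variant, sketched below, pins every member to the offset given in the register map using `offsetof`. The asserts are placed at file scope, which C11 allows, so no function is needed to host them.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

typedef struct {
    uint32_t MODER;   /* 0x00 */
    uint32_t OTYPER;  /* 0x04 */
    uint32_t OSPEEDR; /* 0x08 */
    uint32_t PUPDR;   /* 0x0C */
    uint32_t IDR;     /* 0x10 */
    uint32_t ODR;     /* 0x14 */
    uint32_t BSRR;    /* 0x18 */
    uint32_t LCKR;    /* 0x1C */
    uint32_t AFRL;    /* 0x20 */
    uint32_t AFRH;    /* 0x24 */
} GPIO_t;

/* Each check fails to compile if padding shifts that member. */
static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
static_assert(offsetof(GPIO_t, MODER)   == 0x00, "MODER offset");
static_assert(offsetof(GPIO_t, OTYPER)  == 0x04, "OTYPER offset");
static_assert(offsetof(GPIO_t, OSPEEDR) == 0x08, "OSPEEDR offset");
static_assert(offsetof(GPIO_t, PUPDR)   == 0x0C, "PUPDR offset");
static_assert(offsetof(GPIO_t, IDR)     == 0x10, "IDR offset");
static_assert(offsetof(GPIO_t, ODR)     == 0x14, "ODR offset");
static_assert(offsetof(GPIO_t, BSRR)    == 0x18, "BSRR offset");
static_assert(offsetof(GPIO_t, LCKR)    == 0x1C, "LCKR offset");
static_assert(offsetof(GPIO_t, AFRL)    == 0x20, "AFRL offset");
static_assert(offsetof(GPIO_t, AFRH)    == 0x24, "AFRH offset");
```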

If you’re not using C11 (I’ve yet to come across an embedded C project using it), then a final approach is to try to ensure no padding is present by asking the compiler to ‘pack’ the struct, i.e. lay out its members with no padding between them.

This is always a compiler-specific request, which many compilers support through #pragmas. GCC also accepts #pragma pack, but its native mechanism is its own ‘attribute’ syntax.

Defining the structure with the attribute ‘packed’ will normally remove any potential padding, e.g.

#include <assert.h> // static_assert (C11)
#include <stdint.h>

typedef struct {
  uint32_t MODER;   // mode register, offset: 0x00
  uint32_t OTYPER;  // output type register, offset: 0x04
  uint32_t OSPEEDR; // output speed register, offset: 0x08
  uint32_t PUPDR;   // pull-up/pull-down register, offset: 0x0C
  uint32_t IDR;     // input data register, offset: 0x10
  uint32_t ODR;     // output data register, offset: 0x14
  uint32_t BSRR;    // bit set/reset register, offset: 0x18
  uint32_t LCKR;    // configuration lock register, offset: 0x1C
  uint32_t AFRL;    // alternate function registers, offset: 0x20
  uint32_t AFRH;    // alternate function registers, offset: 0x24
} __attribute__((packed)) GPIO_t;

typedef struct {
  int a;
  char b;
  int c;
} __attribute__((packed)) Padding_t;

int main(void)
{
  static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
  static_assert(sizeof(Padding_t) == 9, "padding in Padding_t present");
}

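Unaligned access can cause a whole host of problems and performance issues, so be extremely careful using packing. Packing also lowers the struct’s alignment requirement to 1, so the compiler can no longer assume its members are word aligned; depending on the target and compiler options, it may then split a single 32-bit register access into several byte accesses, which many peripherals will not accept.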

Vendor Supplied Headers

On most modern microcontrollers you are likely to find headers supplied with the register definitions already written. Many years ago Arm introduced the Cortex Microcontroller Software Interface Standard (CMSIS). As part of the standard, it is expected that Arm and the silicon vendor between them supply the register definitions.

For example, ST supply a series of headers for their STM32 family of microcontrollers. If you seek out the ST-provided file stm32f407xx.h, you will find definitions for all the peripherals included in the 407 variant.

On line 544 of this header file (based on version V2.1.0) you will find the following definition:

typedef struct
{
  __IO uint32_t MODER;    /*!< GPIO port mode register,               Address offset: 0x00      */
  __IO uint32_t OTYPER;   /*!< GPIO port output type register,        Address offset: 0x04      */
  __IO uint32_t OSPEEDR;  /*!< GPIO port output speed register,       Address offset: 0x08      */
  __IO uint32_t PUPDR;    /*!< GPIO port pull-up/pull-down register,  Address offset: 0x0C      */
  __IO uint32_t IDR;      /*!< GPIO port input data register,         Address offset: 0x10      */
  __IO uint32_t ODR;      /*!< GPIO port output data register,        Address offset: 0x14      */
  __IO uint16_t BSRRL;    /*!< GPIO port bit set/reset low register,  Address offset: 0x18      */
  __IO uint16_t BSRRH;    /*!< GPIO port bit set/reset high register, Address offset: 0x1A      */
  __IO uint32_t LCKR;     /*!< GPIO port configuration lock register, Address offset: 0x1C      */
  __IO uint32_t AFR[2];   /*!< GPIO alternate function registers,     Address offset: 0x20-0x24 */
} GPIO_TypeDef;

This is a slightly different interpretation of the register layout from earlier, notably:

The BSRR has been split into two 16-bit registers (BSRRL and BSRRH).

The AFR has been combined into an array of two elements (rather than a High and Low).

In principle there is a risk of padding between BSRRL and BSRRH, but it is unlikely and does not occur here.

The __IO macro simply maps onto volatile. There is also an __I macro (volatile const) to define ‘read-only’ access, and an __O macro (volatile) to indicate ‘write-only’ access, although write-only access can’t be enforced in C.
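The effect of a read-only qualifier can be sketched with stand-in macros. To avoid reserved double-underscore identifiers, this sketch uses hypothetical names `REG_RO`, `REG_RW` and `REG_WO` modelled on `__I`, `__IO` and `__O`, and a cut-down three-register struct that is purely illustrative (it does not match the real GPIO layout). Reading the read-only member works; assigning to it is rejected by the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins modelled on CMSIS __I / __IO / __O. */
#define REG_RO volatile const  /* read-only: writes fail to compile */
#define REG_RW volatile        /* read-write */
#define REG_WO volatile        /* 'write-only' is documentation only */

/* Illustrative register block, not the real GPIO layout. */
typedef struct {
    REG_RW uint32_t MODER;
    REG_RO uint32_t IDR;
    REG_WO uint32_t BSRR;
} DemoPort_t;

/* Reading a read-only register is fine. */
static uint32_t read_idr(const volatile DemoPort_t *p)
{
    assert(p != 0);
    return p->IDR;
}

/* By contrast, 'p->IDR = 0;' would not compile: IDR is const-qualified. */
```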

Further down in the file (line 1130):

#define GPIOD ((GPIO_TypeDef *) GPIOD_BASE)

Again, another slight difference in the code is the choice to put the volatile qualifier in the struct rather than in the pointer definition.

The RCC struct definition is on line 615 with the #define on line 1137.

The CMSIS code to drive the LED is:

#include "stm32f407xx.h"
#include "timer.h"

int main(void)
{
  RCC->AHB1ENR |= (1 << 3); // enable PortD's clock without disturbing other enables

  uint32_t moder = GPIOD->MODER;
  moder |= (1 << 16);
  moder &= ~(1 << 17);
  GPIOD->MODER = moder;

  while (1) {
    GPIOD->ODR |= (1 << 8);  // led-on
    sleep(500);
    GPIOD->ODR &= ~(1 << 8); // led-off
    sleep(500);
  }
}

In summary

Programs can be decomposed into modules in several ways, one of which is chosen during the design process (assuming design happens!). The choice of decomposition has a critical effect on the architecture, and thus on the product’s quality attributes, such as the maintainability, reliability, modifiability, and testability of the final system.

Cohesion is one of the most important concepts in software decomposition. High cohesion is central to good design principles and patterns, guiding separation of concerns and maintainability.

Using a struct-based model for device access improves cohesion through good abstraction models, making code easier to understand and maintain.

In the next article I shall start to compare the relative merits and consequences of using the #define model versus the pointer model.