By Adam Taylor

As engineers we cannot assume the systems we design will operate as intended 100% of the time. Unexpected events and failures do occur. Depending upon the application, the protections implemented against these unexpected failures vary. For example, a safety-critical system used for an industrial or automotive application requires considerably more failure-mode analysis and much better protection mechanisms than a consumer application. One simple protection mechanism that can be implemented quickly and simply in any application is the use of a watchdog and this blog post describes the three watchdogs in a Zynq UltraScale+ MPSoC.

Watchdogs are intended to protect the processor against the software application crashing and becoming unresponsive. A watchdog is essentially a counter. In normal operation, the application software prevents this counter from reaching its terminal count by resetting it periodically. Should the application software crash and allow the watchdog to reach its terminal count by failing to reset the counter, the watchdog is said to have expired.

Upon expiration, the watchdog generates a processor reset or non-maskable interrupt to enable the system to recover. The watchdog is designed to protect against software failures so it must be implemented physically in silicon. It cannot be a software counter for reasons that should be obvious.

Preventing the watchdog’s expiration can be complicated. We do not want crashes to be masked resulting in the watchdog’s failure to trigger. It is good practice for the application software to restart the watchdog in the main body of application. Using an interrupt service routine to restart the watchdog opens the possibility of the main application crashing but the ISR still being serviced. In that situation, the watchdog will restart without a crash recovery.

The Zynq UltraScale+ MPSoC provides three watchdogs in its processing system (PS):

Full Power Domain (FPD) Watchdog protecting the APU and its interconnect

Low Power Domain (LPD) Watchdog protecting the RPU and its interconnect

Configuration and Security Unit (CSU) Watchdog protecting the CSU and its interconnect

The FPD and LPD watchdogs can be configured to generate a reset, an interrupt, or both should a timeout occur. The CSU can only generate an interrupt, which can then be actioned by the APU, RPU, or the PMU. We use the PMU to manage the effects of a watchdog timeout, configuring it via to act via its global registers.

The FPD and LPD watchdogs are clocked either from an internal 100MHz clock or from an external source connected to the MIO or EMIO. The FPD and LPD watchdogs can output a reset signal via MIO or EMIO. This is helpful if we wish to alert functions in the PL that a watchdog timeout has occurred.

Each watchdog is controlled by four registers:

Zero Mode Register – This control register enables the watchdog and enables generation of reset and interrupt signals along with the ability to define the reset and interrupt pulse duration.

Counter Control Register – This counter configuration register sets the counter reset value and clock pre-scaler.

Restart Register – This write-only register takes a specific key to restart the watchdog.

Status Register – This read-only register indicates if the watchdog has expired.

To ensure that writes to the Zero Mode, Counter Control, and Restart registers are intentional and not the result of an incorrect software operation, write accesses to these registers require that specific keys, different for each register, must be included in the written data word for each write take effect.

Zynq UltraScale+ MPSoC Timer and Watchdog Architecture

To include the FPD or LPD watchdogs in our design, we need to enable them in Vivado. You do so using the I/O configuration tab of the MPSoC customization dialog.

Enabling the SWDT (System Watchdog Timer) in the design

For my example in this blog post, I enabled the external resets and connected them to an ILA within the PL so that I can capture the reset signal when it’s generated.

Simple Test Design for the Zynq UltraScale+ MPSoC watchdog

To configure and use the watchdog, we use SDK and the API defined in xwdtps.h. This API allows us to configure, initialize, start, restart, and stop the selected watchdog with ease.

To use the watchdog to its fullest extent, we also need to configure the PMU to respond to the watchdog error. This is simple and requires writes to the PMU Error Registers (ERROR_SRST_EN_1 and ERROR_EN_1) enabling the PMU to respond to watchdog timeout. This will also cause the PS to assert its Error OUT signal and will result in LED D3 illuminating on the UltraZed SOM when the timeout occurs.

For this example, I also used the PMU persistent global registers, which are cleared only by a power-on reset, to keep a record of the fact that a watchdog event has occurred. This count increments each time a watchdog timeout occurs. After the example design has reset 6 times, the code finally begins to restart the watchdog and stays alive.

Because the watchdog causes a reset event each time the processor reboots, we must take care to clear the previously recorded error. Failure to do so will result in a permanent reset cycle because the timeout error is only cleared by a power-on reset or a software action that clears the error. To prevent this from happening, we need to clear the watchdog fault indication in the PMU’s global ERROR_STATUS_1 register at the start of the program.

Reset signal being issued to the programmable logic

Observing the terminal output after I created my example application, it was obvious that the timeout was occurring and the number of occurrences was incrementing. The screen shot below shows the initial error status register and the error mask register. The error status register is shown twice. The first instance shows the watchdog error in the system and the second confirms that it has been cleared. The reset reason also indicates the cause of the reset. In this case it’s a PS reset.

Looping Watchdog and incrementing count

We have touched lightly on the PMU’s error-handling capabilities. We will explore these capabilities more in a future blog.

Meanwhile, you can find the example source code on the GitHub.

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

First Year E Book here

First Year Hardback here.

Second Year E Book here

Second Year Hardback here