As FPGA developers, we do not always have the luxury of implementing our applications in a device with ample resources.

For some applications, typically those which are cost, footprint or power sensitive, we may find ourselves having to work with smaller devices. We can also find ourselves in this position if we fall victim to the dreaded scope creep.

MiniZed (7Z007S) and Ultra96 (ZU3EG): examples of 7 series and UltraScale+ devices where SIMD can be useful

One way we can better utilize the resources available in our device when it comes to DSP elements is to leverage Single Instruction Multiple Data, or SIMD for short.

In an FPGA or SoC, SIMD functions much like SIMD in the software world: it allows us to use a single resource to calculate multiple results at the same time.

In this case for our 7 series and UltraScale+ designs, the resource in question would be either a DSP48E1 (7 series) or DSP48E2 (UltraScale+). Both of these are complex DSP elements that enable the implementation of high performance calculations in our FPGA or SoC.

DSP48E2 context diagram (📷: Xilinx UG579)

Examining both the DSP48E1 and DSP48E2, you will see that inputs A, B and C have the same widths in each: 30, 18 and 48 bits, respectively. The output P is also the same, at 48 bits.

For SIMD operations, we can use these inputs to perform multiple add/sub/accumulate operations in parallel. Depending upon the size of the vectors we wish to work with, we can have up to four 12-bit operations or two 24-bit operations.
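To make the lane arrangement concrete, the behavioral Verilog sketch below models what the four-lane mode computes on the 48-bit datapath. It assumes the two operands arrive as the A:B concatenation (30 + 18 = 48 bits) and the C input. The module and signal names are purely illustrative, and this is a model for understanding the arithmetic rather than code intended for DSP inference.

```verilog
// Behavioral model of four independent 12-bit lanes on a 48-bit datapath.
// Names (simd_four12_model, ab, c, p) are hypothetical, chosen for illustration.
module simd_four12_model (
    input  wire [47:0] ab,  // assumed: {A[29:0], B[17:0]} concatenation
    input  wire [47:0] c,   // assumed: C input
    output wire [47:0] p    // four 12-bit sums packed side by side
);
    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : lane
            // Each slice is 12 bits wide, so in this model the carry out of
            // one lane is dropped rather than rippling into the next lane.
            assign p[12*i +: 12] = ab[12*i +: 12] + c[12*i +: 12];
        end
    endgenerate
endmodule
```

For example, if lane 0 of `ab` holds 12'hFFF and lane 0 of `c` holds 12'h001, lane 0 of `p` wraps to 12'h000 and lane 1 is unaffected. The two-lane case follows the same pattern with 24-bit slices.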

Developing RTL to take advantage of the SIMD capabilities of the DSP48 is straightforward.

We do this by setting an attribute in our RTL that the synthesis tool detects, allowing it to infer the correct DSP48 function and mode / OPCODE.

In our source code, the attributes are defined as shown below.

VHDL

attribute use_dsp : string;
attribute use_dsp of arch : architecture is "simd";

Verilog

(* use_dsp = "simd" *)
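As a minimal sketch of how the Verilog attribute might sit in a complete design, the module below places it directly ahead of the module declaration and describes four independent 12-bit additions. The module name, port names and the decision to register both inputs and outputs are assumptions made for illustration. Whether the tool packs all four adders into a single DSP48 depends on the Vivado version and the surrounding logic, so the synthesis report remains the place to confirm it.

```verilog
// Hypothetical example: four independent 12-bit adders intended for one DSP48.
(* use_dsp = "simd" *)
module simd_add4x12 (
    input  wire        clk,
    input  wire [11:0] a0, a1, a2, a3,
    input  wire [11:0] b0, b1, b2, b3,
    output reg  [11:0] s0, s1, s2, s3
);
    // Input registers (an assumption of this sketch, not a stated requirement).
    reg [11:0] a0_r, a1_r, a2_r, a3_r;
    reg [11:0] b0_r, b1_r, b2_r, b3_r;

    always @(posedge clk) begin
        a0_r <= a0;  b0_r <= b0;
        a1_r <= a1;  b1_r <= b1;
        a2_r <= a2;  b2_r <= b2;
        a3_r <= a3;  b3_r <= b3;

        // Four same-width additions; each result is truncated to 12 bits,
        // so no carry passes from one sum to another.
        s0 <= a0_r + b0_r;
        s1 <= a1_r + b1_r;
        s2 <= a2_r + b2_r;
        s3 <= a3_r + b3_r;
    end
endmodule
```

Keeping all four operations the same type and width, as here, matches the four 12-bit configuration described earlier; a two-lane variant would use two 24-bit additions instead.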

If you are unsure how to implement SIMD in your RTL, the simplest way to check its correctness is to refer to the language templates provided by Vivado.

Vivado SIMD VHDL language template for Quad DSP48 operation

Once we have written the code and included the attribute in our design, we can run synthesis prior to implementation. When synthesis completes, we need to check the synthesis report to ensure the synthesis engine has correctly implemented SIMD.

Within the synthesis report, you will find the preliminary DSP mapping, showing how the A, B, and C ports are used.

Synthesis results for four 12-bit additions targeting a DSP48E2

Synthesis results for two 24-bit additions targeting a DSP48E2

We should also check the DSP utilization in the synthesis report to ensure it aligns with our expectations.

Using DSP48s in this manner offers several benefits compared to implementing add/sub/accumulate functionality in logic. We gain not only a reduction in the area required to implement the functionality, but also an implementation that is more power efficient.

As outlined in the introduction, SIMD gives us a way to use device resources efficiently when we face resource or power dissipation challenges in our target device.

See My FPGA / SoC Projects: Adam Taylor on Hackster.io

Get the Code: ATaylorCEngFIET (Adam Taylor)

Additional Information on Xilinx FPGA / SoC Development can be found weekly on MicroZed Chronicles.