I have recently been evaluating the SLX FPGA tool from Silexica. If you are not familiar with SLX FPGA, it is a tool designed to work with both Vivado HLS and SDSoC.

SLX FPGA was created to help address the challenges software engineers face when deploying C/C++ applications onto programmable logic targets using High Level Synthesis (HLS). These challenges include detecting parallelism within the design and optimizing the design to exploit that parallelism for increased performance.

This optimization is achieved by identifying and inserting the appropriate pragmas to provide the level of optimization desired.

To perform the optimizations, SLX FPGA analyses the user's HLS C design and identifies Data Level Parallelism (DLP) and Pipeline Level Parallelism (PLP) structures within it. SLX FPGA also determines the software call graph and the read and write accesses to local, heap and global variables, providing a holistic view of memory accesses.
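To make the two kinds of structure concrete, here is a minimal sketch in plain C of what a data-parallel loop and a pipeline-style loop can look like. The function names, the array size and the arithmetic are my own illustrative assumptions, not code from the AES example, and the sketch makes no claim about how SLX FPGA would classify any particular loop.

```c
#include <stdint.h>

#define N 16

/* Data-level parallelism: out[i] depends only on a[i] and b[i], so no
 * iteration needs a result produced by any other iteration. */
void add_vectors(const uint8_t a[N], const uint8_t b[N], uint8_t out[N]) {
    for (int i = 0; i < N; i++) {
        out[i] = (uint8_t)(a[i] + b[i]);
    }
}

/* Pipeline-level parallelism: each iteration has two steps. Step 2 carries
 * a dependency from one iteration to the next, but step 1 for element i+1
 * does not need it, so the two steps can overlap across iterations. */
void chain_blocks(const uint8_t in[N], uint8_t out[N]) {
    uint8_t prev = 0;
    for (int i = 0; i < N; i++) {
        uint8_t t = (uint8_t)(in[i] * 3u + 1u); /* step 1: independent of prev */
        out[i] = (uint8_t)(t ^ prev);           /* step 2: needs previous output */
        prev = out[i];
    }
}
```

In the first loop all sixteen iterations could be executed side by side. In the second, the iterations cannot simply be run in parallel, yet hardware can still start working on the next element before the current one has finished.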

To generate the suggested optimizations, SLX FPGA works in three stages:

1. Check for synthesizability errors and guide the user through fixing them (see the synthesizability check and recommendation in figure 1 above).

2. Determine the Data Level and Pipeline Level Parallelism structures in the design.

3. Add the appropriate pragmas for optimal latency, throughput, or area.

Let’s take a look at a simple example: an implementation of an AES-256 encryption algorithm. The initial implementation of the HLS code, without any pragmas or other optimizations, resulted in the following latency and resource utilization.

This original code is therefore capable of processing a new AES encryption every 1888 clock cycles when implemented in programmable logic. However, there are several potential optimizations that could be made in the code, e.g. unrolling loops and partitioning block RAMs.
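As a rough illustration of what loop unrolling and block RAM partitioning look like in source form, the sketch below applies the Vivado HLS UNROLL and ARRAY_PARTITION pragmas to a hypothetical AddRoundKey-style step. The function is my own stand-in, not the code used in this evaluation, and I am not claiming these are the pragmas SLX FPGA chose for the AES design.

```c
#include <stdint.h>

#define STATE_BYTES 16

/* Hypothetical AddRoundKey-style step: XOR a 16-byte state with a round key.
 * An ordinary C compiler ignores the HLS pragmas, so this also runs on a PC. */
void add_round_key(uint8_t state[STATE_BYTES],
                   const uint8_t round_key[STATE_BYTES]) {
/* Split each array into individual registers so that all 16 elements can be
 * accessed in the same cycle, instead of going through the limited number
 * of ports on a single block RAM. */
#pragma HLS ARRAY_PARTITION variable=state complete dim=1
#pragma HLS ARRAY_PARTITION variable=round_key complete dim=1
    for (int i = 0; i < STATE_BYTES; i++) {
/* Ask the tool to replicate the loop body once per iteration. */
#pragma HLS UNROLL
        state[i] ^= round_key[i];
    }
}
```

The two pragmas tend to go together: unrolling creates sixteen XOR operations that want their operands at once, and partitioning removes the memory port bottleneck that would otherwise serialize them.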

Running the analysis in SLX FPGA on the AES Code (input code is here) produces the following results, identifying several potential optimizations.

This analysis, shown above as SLX hints, also breaks down the execution percentage of each element within the design. For example, in the above analysis you can see that the main loop in the AES encryption function accounts for 66% of the execution time, while the loop that can be accelerated by pragma insertion accounts for 60%. Making improvements to that loop should therefore yield big rewards in performance.

With the analysis completed, we can then proceed to the hardware optimization and partitioning, which is where we can enable or disable pragma suggestions. We can also change parameters at this stage, such as unroll factors, if we are willing to trade off additional logic resources for performance.
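To show what an unroll factor trades, here is a small sketch of a loop carrying the Vivado HLS UNROLL pragma with factor=4, next to a hand-written version of roughly what that factor asks the tool to build. The function names, the array length and the choice of factor are assumptions made for illustration only.

```c
#include <stdint.h>

#define LEN 64

/* Loop with a partial unroll request; a normal C compiler ignores the pragma. */
uint32_t sum_rolled(const uint16_t data[LEN]) {
    uint32_t total = 0;
    for (int i = 0; i < LEN; i++) {
#pragma HLS UNROLL factor=4
        total += data[i];
    }
    return total;
}

/* Roughly the structure factor=4 describes, written out by hand: a quarter
 * of the loop iterations, but four copies of the body in each one. */
uint32_t sum_unrolled_by_4(const uint16_t data[LEN]) {
    uint32_t total = 0;
    for (int i = 0; i < LEN; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    return total;
}
```

Both functions compute the same result. The difference in hardware is that a larger factor replicates more of the loop body, which costs logic resources, in exchange for fewer trips around the loop.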

Once we are happy with the optimizations, we can generate the code. As the generation progresses, we see both the original source code and the generated code side by side. This view is really useful as it shows exactly where the pragmas have been inserted.

With this optimized design completed, we see a significant increase in performance.

Compared to the original, the new design requires only 19% of the non-optimized latency, roughly a fivefold reduction.

While the resource requirements have increased to provide this performance, the logic footprint of the design is still very compact.

What is interesting to me, having worked considerably with HLS over the years, is how easy the insertion of pragmas was with SLX FPGA. HLS optimization can be a challenging, iterative and time-consuming process; SLX FPGA made this much simpler.

This is especially true for engineers new to HLS or those from a non-FPGA development background, who may not understand the internal device structures well enough to optimize an HLS design. Using SLX FPGA, this analysis and optimization process comes down to clicking a few buttons and reviewing the recommended optimizations.

For advanced HLS engineers, SLX FPGA can be used as a double check that no performance has been missed, or simply to move an HLS design into the FPGA faster when they are not familiar with the algorithm.

I am going to keep experimenting with this tool across a few more use cases to learn a little more about it and its capabilities!