It's my pleasure to introduce guest blogger Kiran Kintali. Kiran is the product development lead for HDL Coder at MathWorks. In this post, Kiran introduces a new capability in HDL Coder™ that generates synthesizable VHDL/Verilog code directly from MATLAB and highlights some of the key features of this new MATLAB-based workflow.

Introduction to HDL Code Generation from MATLAB

If you are using MATLAB to model digital signal processing (DSP) or video and image processing algorithms that eventually end up in FPGAs or ASICs, read on...

FPGAs provide a good compromise between general purpose processors (GPPs) and application specific integrated circuits (ASICs). GPPs are fully programmable but are less efficient in terms of power and performance; ASICs implement dedicated functionality and show the best power and performance characteristics, but require extremely expensive design validation and implementation cycles. FPGAs are also used for prototyping in ASIC workflows for hardware verification and early software development.

Due to the order of magnitude performance improvement when running high-throughput, high-performance applications, algorithm designers are increasingly using FPGAs to prototype and validate their innovations instead of using traditional processors. However, many of the algorithms are implemented in MATLAB due to the simple-to-use programming model and rich analysis and visualization capabilities. When targeting FPGAs or ASICs these MATLAB algorithms have to be manually translated to HDL.

For many algorithm developers who are well-versed with software programming paradigms, mastering the FPGA design workflow is a challenge. Unlike software algorithm development, hardware development requires them to think parallel. Other obstacles include: learning the VHDL or Verilog language, mastering IDEs from FPGA vendors, and understanding esoteric terms like "multi-cycle path" and "delay balancing".

In this post, I describe an easier path from MATLAB to FPGAs. I will show how you can automatically generate HDL code from your MATLAB algorithm, implement the HDL code on an FPGA, and use MATLAB to verify your HDL code.

MATLAB to Hardware Workflow

The process of translating MATLAB designs to hardware consists of the following steps:

Model your algorithm in MATLAB - use MATLAB to simulate, debug, and iteratively test and optimize the design.

Generate HDL code - automatically create HDL code for FPGA prototyping.

Verify HDL code - reuse your MATLAB test bench to verify the generated HDL code.

Create and verify FPGA prototype - implement and verify your design on FPGAs.

There are some unique challenges in translating MATLAB to hardware. MATLAB code is procedural and can be highly abstract; it can use floating-point data and has no notion of time. Complex loops can be inferred from matrix operations and toolbox functions.

Implementing MATLAB code in hardware involves:

Converting floating-point MATLAB code to fixed-point MATLAB code with optimized bit widths suitable for efficient hardware generation.

Identifying and mapping procedural constructs to concurrent area- and speed-optimized hardware operations.

Introducing the concept of time by adding clocks and clock rates to schedule the operations in hardware.

Creating resource-shared architectures to implement expensive operators like multipliers and for-loop bodies.

Mapping large persistent arrays to block RAM in hardware.

HDL Coder™ simplifies the above tasks through workflow automation.

Example MATLAB Algorithm

Let’s take a MATLAB function implementing histogram equalization and go through this workflow. This algorithm, implemented in MATLAB, enhances image contrast by transforming the values in an intensity image so that the histogram of the output image is approximately flat.

type mlhdlc_heq.m

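function [pixel_out] = mlhdlc_heq(x_in, y_in, pixel_in, width, height)

% State that is kept between calls (one call per pixel clock)
persistent histogram
persistent transferFunc
persistent histInd
persistent cumSum

if isempty(histogram)
    histogram = zeros(1, 2^8);
    transferFunc = zeros(1, 2^8);
    histInd = 0;
    cumSum = 0;
end

% Select the histogram bin to access on this call
if y_in < height && x_in < width
    % Active video: index by pixel intensity
    histInd = pixel_in + 1;
elseif y_in == height && x_in == 0
    % First call of vertical blanking: start at the first bin
    histInd = 1;
elseif y_in >= height
    % Vertical blanking: step through the bins, saturating at the last one
    histInd = min(histInd + 1, 2^8);
elseif y_in < height
    % Horizontal blanking
    histInd = 1;
end

% Read the histogram and the transfer function
histValRead = histogram(histInd);
transValRead = transferFunc(histInd);

if y_in < height && x_in < width
    % Active video: count the pixel in its bin
    histValWrite = histValRead + 1;
    transValWrite = transValRead;
    cumSum = 0;
elseif y_in >= height
    % Vertical blanking: accumulate the histogram into the transfer
    % function and clear the bin for the next frame
    histValWrite = 0;
    transValWrite = cumSum + histValRead;
    cumSum = transValWrite;
else
    % Horizontal blanking: leave both tables unchanged
    histValWrite = histValRead;
    transValWrite = transValRead;
end

% Write the histogram and the transfer function
histogram(histInd) = histValWrite;
transferFunc(histInd) = transValWrite;

pixel_out = transValRead;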
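For comparison, the same idea can be written in a frame-based style that works on the whole image at once. The following is a minimal sketch, not part of the HDL Coder example: the small test image `img` and the names `counts`, `cdf`, and `lut` are assumptions made for illustration. It builds a 256-bin histogram, accumulates it into a transfer function, and applies that function as a lookup table. Note one difference from mlhdlc_heq: the streaming design reads the transfer function accumulated during the previous frame's blanking interval, while this sketch applies it to the same frame.

```matlab
% Hypothetical 8-bit intensity image (values 0..255), for illustration only.
img = uint8([ 10  10  50  50;
              10 200 200  50;
             120 120 200 255;
             120  10  50 255]);

% 256-bin histogram: bin k counts the pixels with intensity k-1.
counts = accumarray(double(img(:)) + 1, 1, [256 1]);

% The cumulative sum of the histogram is the unnormalized transfer function.
cdf = cumsum(counts);

% Scale to the 0..255 output range and use the result as a lookup table.
lut = uint8(round(255 * cdf / cdf(end)));
imgEq = lut(double(img) + 1);
```

This form is convenient for checking results in MATLAB, but it needs the whole frame at once, which is exactly what the streaming version avoids.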

Example MATLAB Test Bench

Here is the test bench that verifies that the algorithm works with an example image. (Note that this testbench uses Image Processing Toolbox functions for reading the original image and plotting the transformed image after equalization.)

type mlhdlc_heq_tb.m

clear mlhdlc_heq;
testFile = 'office.png';
RGB = imread(testFile);

YCBCR = rgb2ycbcr(RGB);
imgOrig = YCBCR(:,:,1);

[height, width] = size(imgOrig);
imgOut = zeros(height,width);
hBlank = 20;
vBlank = ceil(2^14/(width+hBlank));

for frame = 1:2
    disp(['working on frame: ', num2str(frame)]);
    for y_in = 0:height+vBlank-1
        for x_in = 0:width+hBlank-1
            if x_in < width && y_in < height
                pixel_in = double(imgOrig(y_in+1, x_in+1));
            else
                pixel_in = 0;
            end

            [pixel_out] = mlhdlc_heq(x_in, y_in, pixel_in, width, height);

            if x_in < width && y_in < height
                imgOut(y_in+1,x_in+1) = pixel_out;
            end
        end
    end
end

imgOut = double(imgOut);
imgOut(:) = imgOut/max(imgOut(:));
imgOut = uint8(imgOut*255);

YCBCR(:,:,1) = imgOut;
RGBOut = ycbcr2rgb(YCBCR);

figure(1)
subplot(2,2,1); imshow(RGB, []);
title('Original Image');
subplot(2,2,2); imshow(RGBOut, []);
title('Equalized Image');
subplot(2,2,3); hist(double(imgOrig(:)),2^14-1);
title('Histogram of original Image');
subplot(2,2,4); hist(double(imgOut(:)),2^14-1);
title('Histogram of equalized Image');

Let's simulate this algorithm to see the results.

mlhdlc_heq_tb

working on frame: 1
working on frame: 2

HDL Workflow Advisor

The HDL Workflow Advisor (see the snapshot below) helps automate the steps and provides a guided path from MATLAB to hardware. You can see the following key steps of the workflow in the left pane of the workflow advisor:

Fixed-Point Conversion

HDL Code Generation

HDL Verification

HDL Synthesis and Analysis

Let's look at each workflow step in detail.

Fixed-Point Conversion

Signal processing applications are typically implemented using floating-point operations in MATLAB. However, for power, cost, and performance reasons, these algorithms need to be converted to use fixed-point operations when targeting hardware. Fixed-point conversion can be very challenging and time-consuming, typically demanding 25 to 50 percent of the total design and implementation time. The automatic floating-point to fixed-point conversion workflow in HDL Coder™ can greatly simplify and accelerate this conversion process.

The floating-point to fixed-point conversion workflow consists of the following steps:

Verify that the floating-point design is compatible with code generation.

Propose fixed-point types based on computed ranges, either through the simulation of the test bench or through static analysis that propagates design ranges to compute derived ranges for all the variables.

Generate fixed-point MATLAB code by applying proposed fixed-point types.

Verify the generated fixed-point code and compare the numerical accuracy of the generated fixed-point code with the original floating-point code.

Note that this step is optional. You can skip this step if your MATLAB design is already implemented in fixed-point.
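To make the "propose types from ranges" step more concrete, here is a minimal sketch in plain MATLAB. It is a simplified rule chosen for illustration, not the algorithm HDL Coder™ uses: the logged range (`simMin`, `simMax`), the 16-bit `wordLength`, and the sample value `x` are all assumptions. The sketch picks enough integer bits to cover the largest observed magnitude, gives the remaining bits to the fraction, and then measures the rounding error for one value.

```matlab
% Hypothetical logged range for one variable, e.g. collected by running
% the test bench; the numbers are assumptions for illustration.
simMin = -3.2;
simMax = 11.7;
wordLength = 16;             % assumed total word length in bits

% A sign bit is needed only if the variable takes negative values.
isSigned = simMin < 0;

% Integer bits needed to cover the largest magnitude without overflow.
maxMag = max(abs(simMin), abs(simMax));
intBits = max(0, floor(log2(maxMag)) + 1);

% The remaining bits go to the fraction.
fracBits = wordLength - intBits - double(isSigned);

% Quantize a sample value to the proposed type (round to nearest).
x = 2.71828;
xq = round(x * 2^fracBits) / 2^fracBits;
quantErr = abs(x - xq);

fprintf('signed=%d, word=%d, fraction=%d, error=%g\n', ...
    double(isSigned), wordLength, fracBits, quantErr);
```

A rule like this is only as good as the ranges it is given, which is why the test bench should exercise the design thoroughly before types are proposed.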

HDL Code Generation

The HDL Code Generation step generates HDL code from the fixed-point MATLAB code. You can generate either VHDL or Verilog code that implements your MATLAB design. In addition to generating synthesizable HDL code, HDL Coder™ also generates various reports, including a traceability report that helps you navigate between your MATLAB code and the generated HDL code, and a resource utilization report that shows you, at the algorithm level, approximately what hardware resources are needed to implement the design, in terms of adders, multipliers, and RAMs.

During code generation, you can specify various optimization options to explore the design space without having to modify your algorithm. In the Design Space Exploration and Optimization Options section below, you can see how you can modify code generation options and optimize your design for speed or area.
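The same steps can also be scripted instead of driven from the Workflow Advisor. The sketch below is based on the command-line interface documented for more recent HDL Coder™ releases, so treat the property names as assumptions to check against the release you are using. It requires HDL Coder™ and its fixed-point prerequisites, and it assumes that mlhdlc_heq.m and mlhdlc_heq_tb.m are on the MATLAB path.

```matlab
% Sketch only: property names follow recent HDL Coder documentation and
% may differ in other releases.

% Fixed-point conversion settings, driven by the MATLAB test bench.
fixptcfg = coder.config('fixpt');
fixptcfg.TestBenchName = 'mlhdlc_heq_tb';

% HDL code generation settings.
hdlcfg = coder.config('hdl');
hdlcfg.TestBenchName = 'mlhdlc_heq_tb';
hdlcfg.TargetLanguage = 'Verilog';      % or 'VHDL'
hdlcfg.GenerateHDLTestBench = true;

% Convert to fixed point, then generate HDL code and an HDL test bench.
codegen -float2fixed fixptcfg -config hdlcfg mlhdlc_heq
```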

HDL Verification

Standalone HDL test bench generation:

HDL Coder™ generates VHDL and Verilog test benches from your MATLAB scripts for rapid verification of generated HDL code. You can customize an HDL test bench using a variety of options that apply stimuli to the HDL code. You can also generate script files to automate the process of compiling and simulating your code in HDL simulators. These steps help to ensure the results of MATLAB simulation match the results of HDL simulation.

HDL Coder™ also works with HDL Verifier to automatically generate two types of cosimulation testbenches:

HDL cosimulation-based verification works with Mentor Graphics® ModelSim® and QuestaSim®, where MATLAB and HDL simulation happen in lockstep.

FPGA-in-the-Loop simulation allows you to run a MATLAB simulation with an FPGA board in strict synchronization. You can use MATLAB to feed real world data into your design on the FPGA, and ensure that the algorithm will behave as expected when implemented in hardware.

HDL Synthesis

Apart from the language-related challenges, programming for FPGAs requires the use of complex EDA tools. Generating a bitstream from the HDL design and programming the FPGA can be daunting tasks. HDL Coder™ provides automation here, by creating project files for Xilinx® and Altera® that are configured with the generated HDL code. You can use the workflow steps to synthesize the HDL code within the MATLAB environment, see the results of synthesis, and iterate on the MATLAB design to improve synthesis results.

Design Space Exploration and Optimization Options

HDL Coder™ provides the following optimizations to help you explore the design space trade-offs between area and speed. You can use these options to explore various architectures and trade-offs without having to manually rewrite your algorithm.

Speed Optimizations

Pipelining: To improve the design’s clock frequency, HDL Coder enables you to insert pipeline registers in various locations within your design. For example, you can insert registers at the design inputs and outputs, and also at the output of a given MATLAB variable in your algorithm.

Distributed Pipelining: HDL Coder also provides an optimization based on retiming to automatically move pipeline registers you have inserted to maximize clock frequency, by minimizing the delay through combinational paths in your design.

Area Optimizations

RAM mapping: HDL Coder™ maps matrices to wires or registers in hardware. If persistent matrix variables are mapped to registers, they can take up a large amount of FPGA area. HDL Coder™ automatically maps persistent matrices to block RAM to improve area efficiency. The challenge in mapping MATLAB matrices to block RAM is that block RAM in hardware typically has a limited set of read and write ports. HDL Coder™ solves this problem by automatically partitioning and scheduling the matrix reads and writes to honor the block RAM’s port constraints, while still honoring the other control- and data-dependencies in the design.

Resource sharing: This optimization identifies functionally equivalent multiplier operations in MATLAB code and shares them. You can control the amount of multiplier sharing in the design.

Loop streaming: A MATLAB for-loop creates a FOR_GENERATE loop in VHDL. The body of the loop is replicated as many times in hardware as the number of loop iterations. This results in an inefficient use of area. The loop streaming optimization creates a single hardware instance of the loop body that is time-multiplexed across loop iterations.

Constant multiplier optimization: This design level optimization converts constant multipliers into shift and add operations using canonical signed digit (CSD) techniques.
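To illustrate the idea behind the constant multiplier optimization, the sketch below recodes a constant into canonical signed digits and multiplies using only shifts, adds, and subtracts. This is a textbook conversion written for illustration, not the implementation used inside HDL Coder™; the constant `c` and the input `x` are arbitrary assumed values.

```matlab
c = 231;                 % hypothetical constant multiplier
x = 13;                  % hypothetical input sample

% Build the CSD digits of c (each -1, 0, or +1), least significant first.
digits = [];
n = c;
while n > 0
    if mod(n, 2) == 1
        d = 2 - mod(n, 4);   % +1 if n mod 4 == 1, -1 if n mod 4 == 3
        n = n - d;
    else
        d = 0;
    end
    digits(end+1) = d; %#ok<SAGROW>
    n = n / 2;
end

% Multiply using only shifts (powers of two) and adds or subtracts.
y = 0;
for k = 1:numel(digits)
    if digits(k) ~= 0
        y = y + digits(k) * x * 2^(k-1);   % x shifted left by k-1 bits
    end
end

% Compare the number of nonzero terms with the plain binary form.
nBinaryOnes = sum(dec2bin(c) == '1');
nCsdTerms = nnz(digits);
fprintf('%d*%d = %d using %d add/subtract terms (binary form needs %d)\n', ...
    c, x, y, nCsdTerms, nBinaryOnes);
```

Because no two adjacent CSD digits are nonzero, the recoded constant never needs more add or subtract terms than the plain binary form, and often needs fewer.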

Best Practices

Now, let's look at a few best practices related to writing MATLAB code when targeting FPGAs.

When writing a MATLAB design:

Use the code generation subset of MATLAB supported for HDL code generation.

Keep the top-level interface as simple as possible. The top-level function size, types, and complexity determine the interface of the chip implemented in hardware.

Do not pass a large chunk of parallel data into the design. Parallel data requires a large number of I/O pins on the chip, and would probably not be synthesizable. In a typical image processing design, you should serialize the pixels as inputs and buffer them internally in the algorithm.
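The last point can be sketched in a few lines of plain MATLAB. This is an illustration under stated assumptions, not code from the product examples: the small test image, the three-pixel `window`, and the running-sum operation are all hypothetical. One scalar pixel enters per iteration, and the design keeps only a short internal buffer; in a real HDL design, `window` would be a persistent variable inside the design function.

```matlab
% Hypothetical 4x6 test image; values chosen only for illustration.
img = magic(6);
img = img(1:4, :);
[height, width] = size(img);

% Internal buffer of the last three pixels (a shift register in hardware).
window = zeros(1, 3);
out = zeros(height, width);

for y = 1:height
    for x = 1:width
        % One scalar pixel enters per iteration, in raster-scan order.
        pixel_in = img(y, x);

        % Shift the new pixel into the buffer and compute the output.
        window = [window(2:3), pixel_in];
        out(y, x) = sum(window);
    end
end
```

The top-level interface stays a single scalar input and a single scalar output, no matter how large the image is.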

When writing a MATLAB test bench:

Call the design from the testbench function.

Exercise the design thoroughly. This is particularly important for floating-point to fixed-point conversion, where HDL Coder™ determines the ranges of the variables in the algorithm based on the values the testbench assigns to the variables. You can reuse this testbench to generate an HDL testbench for testing the generated hardware.

Simulate the design with the testbench prior to code generation to make sure there are no simulation errors, and to make sure all the required files are on the path.

Conclusion

HDL Coder™ provides a seamless workflow when you want to implement your algorithm in an FPGA. In this post, I have shown you how to take an image processing algorithm written in MATLAB, convert it to fixed-point, generate HDL code, verify the generated HDL code using the test bench, and finally, synthesize the design and implement it in hardware.

See this article about how FLIR, an HDL Coder customer, used the MATLAB-to-HDL workflow to achieve good results. You can also learn more about this workflow using the product examples located here.

We hope this brief introduction to HDL Coder™ and the MATLAB-to-HDL code generation and verification framework has shown how you can quickly get started on implementing your MATLAB designs and targeting FPGAs. Please let us know in the comments for this post how you might use this new functionality. Or, if you've already tried using HDL Coder™, let us know about your experiences here.