Introduction

High Level Synthesis (HLS) allows us to work at a higher level of abstraction when we develop our FPGA application, hopefully saving time and reducing the non-recurring engineering cost if it is a commercial project.

One of the great applications for HLS is image or signal processing, where we may have created a high level model in C or C++, or where we wish to use an open source, industry standard framework such as OpenCV.

In this project we are going to look at how we can build a Sobel edge detection IP core using HLS and then include it within the Xilinx FPGA of our choice.

The selected device could be a traditional FPGA such as a Spartan-7 or Artix. Alternatively, the core could be implemented within the programmable logic of a heterogeneous SoC like the Zynq-7000 or Zynq MPSoC.

The Theory

Before we jump into the application I should first touch lightly on how the Sobel algorithm works. The Sobel algorithm identifies edges in an image and emphasizes them so that they can be easily picked out. Typically this creates a grey scale image in which the edges appear as shades of grey or white.

Sobel edge detection works by estimating the gradient of the image intensity in both the horizontal and vertical directions. To do this, two convolution filters are applied to the original image, and their results are then combined to determine the magnitude of the gradient.

Sobel Convolution Filters
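
As a rough software reference, the two filters and the magnitude calculation can be sketched in plain C++. The function name sobel_reference, the zeroed border pixels and the saturation to 8 bits are assumptions made for this sketch. It is not the HLS library implementation, just a model of the arithmetic described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference for the Sobel operator on an 8-bit grey scale image
// stored row-major. Border pixels are left at zero (an assumption of this
// sketch, not necessarily how the HLS video library treats borders).
std::vector<uint8_t> sobel_reference(const std::vector<uint8_t>& img,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Horizontal (Gx) and vertical (Gy) 3x3 Sobel kernels.
    static const int KX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int KY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    std::vector<uint8_t> out(img.size(), 0);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int gx = 0;
            int gy = 0;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    const int p = img[(y + ky) * width + (x + kx)];
                    gx += KX[ky + 1][kx + 1] * p;
                    gy += KY[ky + 1][kx + 1] * p;
                }
            }
            // Gradient magnitude, saturated to the 8-bit range.
            const double mag = std::sqrt(static_cast<double>(gx * gx + gy * gy));
            out[y * width + x] = static_cast<uint8_t>(mag > 255.0 ? 255.0 : mag);
        }
    }
    return out;
}
```

Calling sobel_reference on a flat image gives zero everywhere, while an image with a sharp vertical edge gives saturated white pixels along that edge.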

Implementation

If we were to implement this within an FPGA using a traditional VHDL / Verilog RTL approach, the development time would not be trivial, as we would need to create line buffers for the convolution and then implement the magnitude calculation. We would also need to create a test bench so we could ensure our code was working as intended before we progressed to implementation.

Luckily, when we use HLS we can skip over a lot of the heavy lifting and let Vivado HLS generate the lower level Verilog / VHDL RTL implementation.
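
To give a feel for what those line buffers involve, here is a minimal software model of two line buffers feeding a 3x3 window, written in plain C++. The class name Window3x3 and its interface are hypothetical and chosen only for this sketch. A real RTL design would also need to handle image borders and pixel valid signals, which are left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two line buffers plus a 3x3 window register, fed one pixel per call in
// raster order. The window is only meaningful once at least two full rows
// and three pixels of the current row have been pushed.
class Window3x3 {
public:
    explicit Window3x3(int width)
        : width_(width), line0_(width, 0), line1_(width, 0), col_(0) {
        assert(width >= 3);
        for (auto& row : win_) {
            for (auto& v : row) v = 0;
        }
    }

    // Push the next pixel. Afterwards win(r, c) holds the 3x3 neighbourhood
    // whose bottom-right corner is the pixel just pushed.
    void push(uint8_t pixel) {
        // Shift the window one column to the left.
        for (int r = 0; r < 3; ++r) {
            win_[r][0] = win_[r][1];
            win_[r][1] = win_[r][2];
        }
        // New right-hand column: two rows above, one row above, current row.
        win_[0][2] = line0_[col_];
        win_[1][2] = line1_[col_];
        win_[2][2] = pixel;
        // Age the line buffers at this column.
        line0_[col_] = line1_[col_];
        line1_[col_] = pixel;
        col_ = (col_ + 1) % width_;
    }

    uint8_t win(int r, int c) const { return win_[r][c]; }

private:
    int width_;
    std::vector<uint8_t> line0_;  // pixels from two rows above
    std::vector<uint8_t> line1_;  // pixels from one row above
    int col_;                     // current column within the row
    uint8_t win_[3][3];
};
```

Each new pixel costs only a handful of register moves and two memory accesses, which is why this structure maps well onto block RAM and registers, but it is exactly the kind of plumbing HLS saves us from writing by hand.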

To work at this higher level of abstraction we will be using Vivado HLS and its HLS_OpenCV and HLS_Video libraries.

The first library, HLS_OpenCV, allows us to work with the very popular OpenCV framework, which we will use in our test bench. The HLS video library provides a number of image processing functions which can be accelerated into programmable logic.

Rather helpfully, the HLS video library includes everything we need to create a Sobel IP core, including:

hls::CvtColor - This converts between color and gray scale, depending upon its configuration.

hls::GaussianBlur - This performs a Gaussian blur on the image to reduce the noise present in it.

hls::Sobel - This performs the Sobel convolution in either the vertical or horizontal direction, depending upon its configuration. We will need to use two instances of this in our IP core.

hls::AddWeighted - This computes a weighted sum of two images, which we use to combine the results of the vertical and horizontal Sobel operators into the resulting magnitude.

These are not the only HLS functions we will use. We also need a few additional functions to interface more easily with our Vivado design and to enable the use of HLS optimizations.

Interfacing

The best way to move image data about within programmable logic is to use an AXI Stream.

This allows the creation of a high performance image processing path where elements can be easily added or created as needed.

The Vivado IP library contains several IP blocks which convert between video input or output and AXI Stream, along with other image processing functions such as mixers and color space converters.

As such, we want our Sobel IP core to accept an AXI Stream input and generate its output in the same AXI Stream format. To do this we use the functions below, which convert between AXI Stream and the hls::Mat format used by the HLS functions.

hls::AXIvideo2Mat - Converts from an AXI Stream to the hls::Mat format. This is used for the AXI Stream input.

hls::Mat2AXIvideo - Converts from the hls::Mat format to the AXI Stream format. This is used for the AXI Stream output.
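
To illustrate what these conversions have to keep track of, the sketch below models video framing on an AXI Stream in plain C++, following the usual AXI4-Stream video convention in which TUSER marks the first pixel of a frame and TLAST marks the last pixel of each line. The VideoBeat struct and the two function names are hypothetical stand-ins for this example. They are not part of the HLS video library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one AXI Stream beat carrying a 24-bit pixel.
struct VideoBeat {
    uint32_t data;  // pixel value, lower 24 bits used
    bool user;      // TUSER: set on the first pixel of a frame
    bool last;      // TLAST: set on the last pixel of each line
};

// Pack a row-major frame into a sequence of stream beats.
std::vector<VideoBeat> frame_to_stream(const std::vector<uint32_t>& pixels,
                                       int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    std::vector<VideoBeat> beats;
    beats.reserve(pixels.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            VideoBeat b;
            b.data = pixels[y * width + x] & 0xFFFFFFu;
            b.user = (x == 0 && y == 0);
            b.last = (x == width - 1);
            beats.push_back(b);
        }
    }
    return beats;
}

// Unpack a stream back into a frame. Returns false if the sideband
// signals do not match the expected frame geometry.
bool stream_to_frame(const std::vector<VideoBeat>& beats, int width,
                     int height, std::vector<uint32_t>& pixels) {
    pixels.clear();
    if (width <= 0 || height <= 0) return false;
    const std::size_t w = static_cast<std::size_t>(width);
    if (beats.size() != w * static_cast<std::size_t>(height)) return false;
    for (std::size_t i = 0; i < beats.size(); ++i) {
        const bool expect_user = (i == 0);
        const bool expect_last = (i % w == w - 1);
        if (beats[i].user != expect_user || beats[i].last != expect_last) {
            return false;
        }
        pixels.push_back(beats[i].data);
    }
    return true;
}
```

Because the frame geometry travels with the data in this way, blocks in the processing chain can be added or removed without any separate synchronization signals.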

C Synthesis and Optimizations

The high level languages we use to describe our designs are untimed, unlike Verilog and VHDL designs. This means that when the HLS tool converts C into Verilog or VHDL it must go through a number of stages to create the output RTL:

Scheduling - Determines the operations and the sequence in which they occur.

Binding - Assigns the operations to logic resources available within the device.

Control Logic Extraction - Extracts the control logic and creates control structures e.g. state machines to control the behavior of the module.

Scheduling and Binding Example

Control Logic Extraction Example

As the HLS tool has to trade off performance against logic resources when it runs through synthesis, it follows a number of default rules during implementation. These may affect the performance of the resulting IP core. For instance, loops (a common structure in coding for HLS) are kept rolled.

Of course, we may want to change the decisions the HLS tool takes during C Synthesis to obtain better performance. We can do this using #pragma directives in our C code, and there are several we can use.
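
As a small illustration of how such pragmas are placed, the sketch below shows the same loop written three ways: left rolled, fully unrolled with the UNROLL pragma, and pipelined with the PIPELINE pragma. The function names and the vector length N are assumptions made for this example and do not come from the Sobel design. An ordinary C++ compiler simply ignores the HLS pragmas, so the three functions return the same result. Only the generated hardware differs.

```cpp
#include <cassert>
#include <cstdint>

const int N = 8;  // illustrative vector length, chosen for this sketch

// Default behaviour: the loop stays rolled, so the same multiply-accumulate
// hardware is reused on every iteration.
int32_t dot_rolled(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Ask the tool to fully unroll the loop: more logic, potentially fewer cycles.
int32_t dot_unrolled(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL
        acc += a[i] * b[i];
    }
    return acc;
}

// Ask the tool to pipeline the loop, aiming to start one iteration per clock.
int32_t dot_pipelined(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
    }
    return acc;
}

// The pragmas guide the hardware schedule only; the C behaviour is unchanged.
void check_variants_agree(const int16_t a[N], const int16_t b[N]) {
    assert(dot_rolled(a, b) == dot_unrolled(a, b));
    assert(dot_rolled(a, b) == dot_pipelined(a, b));
}
```

Whether the tool actually achieves the requested unrolling or initiation interval depends on the design, for example on how many memory ports are available, so the synthesis report is the place to confirm the outcome.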

For this implementation we are going to use the Dataflow pragma to ensure we can achieve the highest possible frame rate.

To get the benefit of this pragma we need to structure the code so that the HLS synthesis tool can perform both Sobel operations in parallel. Dataflow optimization lets each function start work as soon as data is available from the function before it, rather than waiting for that function to finish the whole frame. In effect, dataflow optimization is coarse grained pipelining.

Dataflow Pipelining

If we performed one Sobel operation and then the other sequentially we would not be able to apply this optimization.

As such, we need to split the result of the Gaussian blur into two parallel paths, which we recombine at the AddWeighted stage. To do this we use the function:

hls::Duplicate - This duplicates an input image into two separate output images which we can process in parallel.
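
The need for a duplicate step is easier to see if we think of each image as a stream of pixels that is consumed as it is read. The sketch below models this in plain C++ with std::queue standing in for a streaming image. The PixelFifo alias and the duplicate function are hypothetical names for this illustration, not the library implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for a streaming image: once a pixel has been read
// it is gone, so a single stream can only feed a single consumer.
using PixelFifo = std::queue<uint8_t>;

// Read every pixel from src exactly once and write a copy to each output,
// giving two independent streams that two consumers can process in parallel.
void duplicate(PixelFifo& src, PixelFifo& dst_a, PixelFifo& dst_b) {
    assert(dst_a.empty() && dst_b.empty());  // outputs are expected to start empty
    while (!src.empty()) {
        const uint8_t p = src.front();
        src.pop();  // reading consumes the pixel from the source
        dst_a.push(p);
        dst_b.push(p);
    }
}
```

After the call the source is empty and each output holds its own full copy of the data, one for each of the two Sobel paths.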

Putting It All Together

Understanding all of this, we can then write the code we will use for our Sobel IP core:

    #include "cvt_colour.hpp"

    void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM)//, int rows, int cols)
    {
    #pragma HLS INTERFACE axis port=INPUT_STREAM
    #pragma HLS INTERFACE axis port=OUTPUT_STREAM

        // Intermediate images passed between the processing stages
        RGB_IMAGE  img_0(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_1(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_2(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_2a(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_2b(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_3(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_4(MAX_HEIGHT, MAX_WIDTH);
        GRAY_IMAGE img_5(MAX_HEIGHT, MAX_WIDTH);
        RGB_IMAGE  img_6(MAX_HEIGHT, MAX_WIDTH);

    #pragma HLS dataflow
        hls::AXIvideo2Mat(INPUT_STREAM, img_0);              // AXI Stream in
        hls::CvtColor<HLS_BGR2GRAY>(img_0, img_1);           // color to gray scale
        hls::GaussianBlur<3,3>(img_1, img_2);                // noise reduction
        hls::Duplicate(img_2, img_2a, img_2b);               // split into two paths
        hls::Sobel<1,0,3>(img_2a, img_3);                    // first Sobel direction
        hls::Sobel<0,1,3>(img_2b, img_4);                    // second Sobel direction
        hls::AddWeighted(img_4, 0.5, img_3, 0.5, 0.0, img_5); // combine the two results
        hls::CvtColor<HLS_GRAY2RGB>(img_5, img_6);           // back to three channels
        hls::Mat2AXIvideo(img_6, OUTPUT_STREAM);             // AXI Stream out
    }

The accompanying header, cvt_colour.hpp, defines the image size, the stream and image types, and the function prototype:

    #include "hls_video.h"
    #include <ap_fixed.h>

    #define MAX_WIDTH  1280
    #define MAX_HEIGHT 720

    typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;
    typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
    typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMAGE;

    void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM);//int rows, int cols);

Of course, we want to be able to run both C Simulation and Co Simulation, so we need a test bench with which to test the algorithm.

    #include <hls_opencv.h>
    #include "cvt_colour.hpp"
    #include <iostream>
    using namespace std;

    int main (int argc, char** argv) {
        IplImage* src;
        IplImage* dst;
        AXI_STREAM src_axi, dst_axi;

        // Load the test image and stop early if it cannot be found
        src = cvLoadImage("test.bmp");
        if (!src) {
            cerr << "Could not load test.bmp" << endl;
            return 1;
        }
        dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

        // Convert to an AXI Stream, run the filter, and convert back
        IplImage2AXIvideo(src, src_axi);
        image_filter(src_axi, dst_axi);//src->height,src->width);
        AXIvideo2IplImage(dst_axi, dst);

        cvSaveImage("op.bmp", dst);
        cvReleaseImage(&src);
        cvReleaseImage(&dst);
        return 0;
    }

When we run the C Simulation we can see the result as below for a test input image.

Input Test Image for C and Co Simulation

Co Simulation Sobel Result

With both C Simulation and Co Simulation results as expected, we can export the core and add it into a Vivado hardware design.

Before we do that, however, you might want to check the Analysis view within Vivado HLS and confirm that the two Sobel functions are operating in parallel.

Analysis View showing parallel Sobel operations

We can export the IP core using the Export RTL option within Vivado HLS. If we wish, we can further configure the IP core parameters.

Exporting the HLS core

Implementing the Core

With the core exported, you will find a zip file under the <project>/solutionX/impl/ip directory. This zip file contains all the information needed to add the newly created Sobel IP core to our Vivado design.

This file can be added to our Vivado IP repository and then included in the Vivado block diagram.

IP Core in the Vivado IP Library

Integrating the core within the image processing chain (note the HLS Symbol)

With this all integrated, you can build the application and target your development board of choice.

For the demo video below I used a Zybo Z7, with its HDMI input and HDMI output, to apply video to the Sobel IP core and display the results.

You can find the files associated with this project here:

https://github.com/ATaylorCEngFIET/Hackster

See previous projects - which you may find helpful:

Conversion between color schemes explained

Creating a image processing platform for your FPGA or SoC

More on FPGA development using Xilinx devices weekly at MicroZed Chronicles