Fig 1. Can hardware designs be recycled?

When I first started doing digital design, I had a strong software background. One of the first lessons you learn in software is to reuse your own code as much as possible. The sooner you learn this lesson, the faster you’ll be able to finish your next homework problem.

The lesson goes well beyond school, but into industry as well. Consider the various operating systems and how often they are reused. Are you reading this article from a device running Linux, MacOS, Windows, or something else? Just being able to list the number of major operating systems on one hand is a testament itself to software reuse.

The same lesson applies to compilers and system libraries. How is it, for example, that Vivado, Quartus, Yosys, Verilator or any other EDA tool can run on so many platforms? Software reuse. It’s real. It works.

But what about hardware? Specifically, what about reusing digital design components?

Here, in this field, reuse becomes a bit more of a challenge.

The first and biggest challenge is hardware licensing. The licenses that worked so well for software don’t apply as well to hardware. While I personally love the GPLv3 license, conveying a hardware design that uses a GPLv3 component to someone else requires also conveying to them the ability to rebuild the rest of the entire design. This isn’t so easy, since many of the popular major design components–ARM cores, SERDES cores, I/O components, and so forth–are still very proprietary.

Within a company, however, design reuse shouldn’t be a problem. The company owns all of their own designs, so they should be able to use them freely from one product to another, right?

This is the case here, within Gisselquist Technology, LLC, and yet even in this optimal reuse environment hardware design reuse is still a long way from achieving the goals that have been achieved by software reuse.

Let’s take some time today to look at several experiences I’ve had with design reuse since I started with digital design over a decade ago. (Wow, has it actually been that long?) We’ll start by looking over standardization problems I’ve had across tools, and then work our way from the bottom of a design all the way up through some components, through bus slaves, and on to bus master interoperability.

The first problem with design reuse is that the various tools tend to be vendor and often even platform centric. This makes it a challenge to reuse designs from one platform to the next. For example, design constraint files (XDC, UCF, SDC, PCF, etc.) differ in format and content from one vendor to the next. This means that I/O timing constraints and false path constraints all need to be rewritten when attempting to reuse a design across different vendors.

Well, at least the HDL languages are standard among vendors and tools, right? How about just the subset of Verilog that I like to use?

Well, no. Not even Verilog is standard across vendor tools.

One of the first things I teach anyone who will listen is to place a default_nettype none declaration at the top of every Verilog source file. Doing this prevents the synthesis tool from turning a spelling mistake into a new signal within your design. It has helped me catch a lot of mistakes over the years.

The problem is that placing this line in a Quartus DE-10 Nano design will cause Quartus to fail to build the design. Why? Because the default_nettype setting isn’t applied to a single file, but rather to every design file following, if not to the entire design. Worse, it seems as though Altera’s engineers used this language “feature” to avoid declaring signals within their designs. Hence, what makes my design better breaks their design components.

The problem isn’t limited to Quartus. Yosys handles the default_nettype statement on a file by file basis. This means that if I change default_nettype back to its original wire setting at the end of the file, the design will now work with Quartus, but it will no longer get the default_nettype benefit from Yosys.

There is one more annoying detail associated with this directive: input ports need to be declared as input wire, rather than just input, once you set default_nettype to none. The Verilog standard requires this, yet neither Yosys nor Verilator enforce it. This means that designs that pass a verilator -Wall -cc topmodule.v check might still fail to build under another tool.

HDL designs don’t build without warnings

If there’s one thing that frustrates me, it’s the inconsistency of warnings across tools. Coming from the software world, I’m used to programs that can be compiled without warnings. Here in the hardware world, this is a challenge. Consider, for example, the following code.

	parameter	W = 5;
	reg	[W-1:0]	A;

	always @(posedge clk)
		A <= A + 1;

This will generate a warning that a 32-bit number, the 1 , is being added to a 5-bit number, and so there might be a loss of precision. While I might rewrite this to get rid of the warning,

	always @(posedge clk)
		A <= A + 5'h1;

the warning will return whenever I change the width, W , to something other than five.

If I then try to change the design to

	always @(posedge clk)
		A <= A + 1'b1;

I then get rid of the warning when using Verific based front ends, only for it to return with Verilator.
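One width-exact workaround is to build the increment constant to exactly W bits, so neither front end has a width mismatch to complain about. This is a sketch of the technique, not necessarily the best answer in every case:

```verilog
	// Build a W-bit constant one, so both sides of the addition have
	// the same width.  (Assumes W > 1, since a zero-count replication
	// is not legal Verilog.)
	always @(posedge clk)
		A <= A + { {(W-1){1'b0}}, 1'b1 };
```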

My solution has been to build my designs so that they generate no warnings under verilator -Wall , and then to ignore any warnings generated by the Verific parser used by Vivado, ISE, and Quartus.

Still, it’s annoying to have a design build without warnings in one environment, but not in another.
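Pulling the portability quirks above together, here’s a sketch of the file-level boilerplate that results. The module itself is a trivial placeholder; the points of interest are the default_nettype bracketing and the explicit input wire declarations:

```verilog
`default_nettype none
// With default_nettype set to none, a misspelled signal name becomes a
// build error rather than a silently created one-bit wire.
module	example(
		// Ports must now be declared "input wire", not just
		// "input", as the Verilog standard requires
		input	wire		i_clk,
		input	wire	[7:0]	i_data,
		output	reg	[7:0]	o_data);

	always @(posedge i_clk)
		o_data <= i_data;
endmodule
// Restore the default, so that any vendor files compiled after this one
// still build under tools (such as Quartus) that apply the setting
// beyond the current file--at the cost of losing the benefit under
// tools (such as Yosys) that apply it file by file
`default_nettype wire
```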

Unused values

Many interfaces have signals that aren’t used by all cores. In order to keep the cores generic, I pass those unused signals along with the rest of the interface anyway. Verilator generates a warning when I do this. Verific (i.e. the front-end language parser used by Vivado, ISE, and Quartus) also generates a warning. However, I can turn the Verilator warnings off on a case by case basis by simply using,

	// Verilator lint_off UNUSED
	wire	unused;
	assign	unused = &{ 1'b0, unused_signal_one, unused_signal_two, etc };
	// Verilator lint_on  UNUSED

While this doesn’t get rid of the warnings when using the commercial vendor tools, at least those warnings are now about the wire named unused being unused, making them easy to work through.

Of course, the problem with ignoring synthesis warnings like this is what happens when a design mysteriously stops working. In that case, I find myself digging through all of the useless warnings generated in the logs of the various tools and looking for any evidence of what might’ve happened.

Generate loop block names

Much to my surprise, a design that worked in Yosys, Verilator, Vivado, and ISE failed to synthesize under Quartus for the simple reason that the for loops within my generate blocks weren’t named.

	generate
		genvar	k;

		for(k=0; k<NADC; k=k+1)
		begin : BLOCK_NAME_NEEDED_HERE
			assign	adc_data[k] = raw_adc_data[k*ADCBITS +: ADCBITS];
		end
	endgenerate

My point here is simply that seemingly useless differences between vendor tools can become quite annoying in practice and a hindrance to design reuse. All of a sudden, you find that a design component that worked under one vendor’s tools mysteriously causes build failures under another vendor’s tools.

This problem was solved in software by an open source compiler: gcc. Verilog has an open source synthesizer, Yosys, which comes close. It can synthesize designs for ASICs, the iCE40, the ECP5, Xilinx 7-series parts, and some Intel devices. In many ways this is halfway to nirvana. Unfortunately, there’s no open source synthesis tool for VHDL, nor is there any open source tool for SystemVerilog–although there is a Yosys plugin, called ghdl-synth, that I’m told is getting close to offering VHDL support in Yosys.

Why not reuse FIFOs?

Fig 2. Surely common components can be reused?

Once you get past the tool issues, the next biggest question is why can’t I reuse some of my most common components? The most obvious of these common components is a FIFO. FIFOs are perhaps the most common core used across designs. I use FIFOs in my bus bridges, my ADC cores, a microphone core I’ve built, my UART cores, and even in my debugging bus. Surely one simple FIFO design can be used across all architectures?

Fig 3. Common FIFO ports

The good news, at least for me, is that after writing many (dissimilar) FIFO implementations, I’m now starting to coalesce around a single synchronous FIFO implementation. Even with this implementation, there are still a lot of per-design configuration choices that need to be made.

The data width changes from one application to the next, as does the necessary FIFO depth (RAM size). Thankfully, these changes are easily parameterized–making the FIFO (mostly) generic.

Should the empty/full flags be registered? Do they need to be? It costs extra LUTs to calculate these values one clock earlier, but doing so can also keep any FIFO users off the critical timing path.

Some FPGAs have distributed RAM, others don’t–something I discuss in my tutorial lesson on FIFOs. On an iCE40, all RAM reads must first be registered before the data can be used, whereas Xilinx architectures support “distributed RAM” reads on the same clock cycle they are requested.

Handshake signaling differs from one implementation to another. My current FIFO implementation uses a READY/VALID type of handshake for reading ( i_rd and !o_empty ) from and writing ( i_wr and !o_full ) to the FIFO. The problem is that this interface isn’t necessarily appropriate for all applications. In some data centric applications, such as coming from an A/D or a video source where the data comes in at a fixed speed, the source will write to the FIFO regardless of whether or not the FIFO is ready. Doing this properly really requires generating an error signal, which my one-size-fits-most FIFO implementation doesn’t (yet) have.

Some applications, such as a UART, require being able to know how much data is in the FIFO. They want to read the FIFO’s fill level back out. This can be useful for waking up a processor only when the FIFO is half full or half empty, for example, or reading until it is empty following an interrupt. Other applications don’t care about the fill. Leaving a port unused and dangling, however, is likely to cause a tool warning and get in the way of building a warning-less design.
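To make the discussion concrete, here’s a minimal sketch of the kind of (mostly) generic synchronous FIFO described above: parameterized width and depth, i_wr / o_full and i_rd / o_empty handshakes, and a fill-level output. This is illustrative only, not my production core–it has combinational flags, a same-cycle read, and no error signal:

```verilog
module	sfifo #(
		parameter	BW = 8,		// Data width
		parameter	LGFLEN = 4	// log2(FIFO depth)
	) (
		input	wire			i_clk, i_reset,
		input	wire			i_wr,
		input	wire	[BW-1:0]	i_data,
		output	wire			o_full,
		input	wire			i_rd,
		output	wire	[BW-1:0]	o_data,
		output	wire			o_empty,
		output	wire	[LGFLEN:0]	o_fill
	);

	reg	[BW-1:0]	mem	[0:(1<<LGFLEN)-1];
	// One extra address bit, so full and empty can be distinguished
	reg	[LGFLEN:0]	wr_addr, rd_addr;

	initial	{ wr_addr, rd_addr } = 0;
	always @(posedge i_clk)
	if (i_reset)
		wr_addr <= 0;
	else if (i_wr && !o_full)
		wr_addr <= wr_addr + 1;

	always @(posedge i_clk)
	if (i_wr && !o_full)
		mem[wr_addr[LGFLEN-1:0]] <= i_data;

	always @(posedge i_clk)
	if (i_reset)
		rd_addr <= 0;
	else if (i_rd && !o_empty)
		rd_addr <= rd_addr + 1;

	assign	o_fill  = wr_addr - rd_addr;
	assign	o_full  = (o_fill == { 1'b1, {(LGFLEN){1'b0}} });
	assign	o_empty = (o_fill == 0);

	// Combinational ("distributed RAM") read.  On an iCE40, this
	// read would need to be registered instead, as noted above.
	assign	o_data  = mem[rd_addr[LGFLEN-1:0]];
endmodule
```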

Other applications, such as stream to memory bridges, might want a trigger threshold implemented within the FIFO. Such a trigger, in the case of a stream to memory component, might cause the FIFO to empty into memory like a flushing toilet empties its tank into the bowl.

Can one FIFO work in all applications? I haven’t managed it (yet). In addition to reuse, there is something to be said for keeping things simple. Of course, the problem then comes when I fix a bug in one FIFO while the same bug still remains in one of my other implementations.

Xilinx’s solution appears to be to use a FIFO generator that will then generate the logic for a FIFO that can be used across many Xilinx hardware platforms. At the same time, this (proprietary) FIFO generator has given me no end of hassles when trying to formally verify what little they have published about their interconnect. Tell me, for example, why does a FIFO require nearly 100 parameters and just as many ports? Hence, while configurability in the name of reuse is a good thing, this generator appears to be taking configuration to an extreme.

Can we reuse serial ports?

Fig 4. A common serial port interface

So let’s move up the ladder, from FIFOs to full level design components. How about serial ports? What can we learn about hardware reuse from serial ports?

A fellow open source designer, Olof Kindgren, is known for his strong opinion that we should stop building new serial ports. Surely among all design components serial ports should be prime candidates for reuse! The communications standard hasn’t changed in years, so why ever build a new serial port?

To put it in his own words,

I use the UART as a pathological example because it’s a function so simple that many people feel it’s easier to write a new one rather than reuse an existing one. But in practice this leads to another implementation with bugs but without proper docs, tests, and drivers. (Twitter)

There are a lot of things you can learn from serial ports.

Building a serial port is a good beginner’s design exercise. If you’ve never built a serial port before, go ahead and build one. It’s a fun design to learn from, especially since you can typically “see” your design working when you are done. Indeed, serial ports are one of the many designs I work through in my beginners tutorial.

The UART16550 interface has long since outlived its time. The classic serial port interface goes back to the UART16550 chip built by National Semiconductor. Much of the industry has standardized around its software interface, and it’s not hard to find software drivers that can communicate with it, so why not just reuse it? Sadly, this chip appears to have been built back in the days of 8-bit buses. In order to set the baud rate of this chip, you need to set two different registers, and you’ll need to adjust a paging register in the meantime just to get access to those registers. Worse, the UART16550 only supports a 16-element FIFO. Why not increase the size of the FIFO? That should be easy, right? Well, yes, it is fairly easy to do–it’s just that you now need to adjust all of the software that depends on the size of this FIFO.

From my own perspective, I only came across the UART16550 after building my own serial port core. Using my own serial port, I can completely configure the baud rate, the number of stop bits, the number of bits per byte, the parity bit, and even whether or not flow control will be used, all by writing one 32-bit value to a 32-bit bus-based interface. Pretty cool, huh? Sure, you could reuse the older core, but a core with a more modern interface (such as my own …) is easier to configure, reconfigure, and use. Of course, it doesn’t help that the open source UART16550 core has a (formal-verification discovered) bug within it that might cause it to send arbitrary data across the channel ….
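To illustrate the one-register approach, here’s a hypothetical sketch of how such a 32-bit setup word might be unpacked. The field layout here is invented for illustration–my actual core’s register map differs, so treat every field position below as an assumption:

```verilog
	// Hypothetical one-write serial port configuration word.  The
	// whole configuration arrives in a single 32-bit bus write,
	// rather than being spread across several 8-bit paged registers.
	wire	[31:0]	i_setup;		// The value written by software
	wire		setup_no_flow;		// 1'b1 disables flow control
	wire	[1:0]	setup_nbits;		// 2'b00 = 8 data bits, down to 5
	wire		setup_nstop;		// Extra stop bit when set
	wire	[1:0]	setup_parity;		// None / odd / even selection
	wire	[23:0]	setup_baud_clocks;	// System clocks per baud interval

	assign	{ setup_no_flow, setup_nbits, setup_nstop,
			setup_parity, setup_baud_clocks } = i_setup[29:0];
```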

Fig 5. How much does a serial port require?

So why reinvent the wheel when it comes to serial ports? Because 1) the UART16550 interface hasn’t aged well, and 2) my “ultimate” serial port cost me too much to use.

If you compare these problems to software, wouldn’t these be problems where one might learn lessons from software reuse? Not really. Unlike hardware, software bloat doesn’t cost nearly as much. Just a kB here and a kB there, and no one will notice that a piece of software carries a lot of unnecessary functionality. The fact that Internet Explorer was once declared an integral part of the Windows operating system should prove my point about software bloat.

What about Olof’s advice? In hindsight, he has a strong point. Several latent bugs existed in the core prior to formal verification. Despite the fact that the full service core had so much functionality, barely any of it was properly verified prior to that time. Further, the software driver had to be rewritten multiple times over. Still, the core components have been used over and over again in many projects with great success.

Reusing an SD-Card component

Fig 6. Reusing an SD-Card Controller

What about other components? For example, what about SD-cards? Why can’t we reuse SD card controllers from one design to another? Can reuse finally be achieved here?

This illustrates another big problem with reuse: just because a design “works” in one bus/interconnect environment doesn’t mean it will work in practice in your environment. This leaves the individual reusing the core with the unenviable task of debugging the design well enough to convince the author of any subcore within it that a bug remains within that component, rather than within the context in which it is being used.

Doesn’t software also have the same problem? I suppose you might argue that it does. The difference, however, is the difficulty associated with debugging “broken” hardware components. Debugging software is fairly easy. Debugging hardware, that’s much harder.

The good news is that by using a formal property file, you can verify that a core will function in all bus interconnect and usage environments–something you don’t get from either a bench test, nor an integrated simulation environment.
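As a sketch of what such a formal property file contains, consider the stall-stability rule found in most bus protocols. The signal names below are AXI-Lite flavored and chosen for illustration; when verifying a slave these properties are assumptions about the master, and when verifying a master they become assertions:

```verilog
	// Standard f_past_valid idiom: $past() is meaningless on the
	// very first clock of the proof, so gate on this register.
	reg	f_past_valid;
	initial	f_past_valid = 1'b0;
	always @(posedge i_clk)
		f_past_valid <= 1'b1;

	// If a request was stalled on the last clock (VALID && !READY),
	// then on this clock the request must still be present, and its
	// payload must be unchanged.
	always @(posedge i_clk)
	if (f_past_valid && $past(S_AXI_AWVALID && !S_AXI_AWREADY))
	begin
		assume(S_AXI_AWVALID);
		assume($stable(S_AXI_AWADDR));
	end
```

Because these properties describe the interface alone, they apply in every interconnect environment the core might later be dropped into–that’s the whole point.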

Reusing an I2C controller

Fig 7. Reusing an I2C Controller

Some time ago, I built an I2C master and separate slave controller. They were built to support an HDMI based pass-through design, and so one controller would read the EDID information from a downstream monitor, and then that information would be used to populate the EDID information used by the upstream HDMI source–in this case a Raspberry Pi.

Did the design work? Beautifully. No, it wasn’t automatic, but it was still quite general purpose. (It required a ZipCPU program to forward the information from the EDID master to the slave.)

Recently, however, someone gave me an I2C chip to work with that doesn’t follow the single byte address, multi-byte data protocol. Try as I might, I can’t seem to figure out any way to control this new device with my older I2C controller.

Why not reuse? Because even though the lower-level protocol remained the same, the upper-level protocol changed, and the cores I might’ve reused combined the two protocol layers.
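One way to avoid this trap is to keep the reusable bit/byte engine separate from the device-specific transaction layer. The sketch below shows only a hypothetical interface to such an engine–these module and port names are invented for illustration, not those of my actual controller:

```verilog
// A low-level I2C byte engine, driven by a small command stream.  The
// upper-level protocol (single-byte addressing, multi-byte addressing,
// or anything else) then lives in a separate, replaceable layer that
// merely issues START / BYTE / STOP commands.
module	i2cbyte(
		input	wire		i_clk,
		// Command stream from the (replaceable) upper layer
		input	wire		i_cmd_valid,
		input	wire	[1:0]	i_cmd,	// 2'b00=START, 01=BYTE, 10=STOP
		input	wire	[7:0]	i_wr_byte,
		output	wire		o_cmd_ready,
		output	wire	[7:0]	o_rd_byte,
		output	wire		o_ack,
		// The physical (open-drain) pins
		input	wire		i_scl, i_sda,
		output	wire		o_scl, o_sda
	);

	// ... bit-level timing, clock stretching, and arbitration logic
	// would go here, and could now be reused unchanged across devices
endmodule
```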

Reuse at the interconnect level

Connecting components like a serial port and/or an I2C controller together within a design tends to require some sort of glueware–an interconnect–that holds the components together while allowing them to talk to each other. Many modern designs are composed of some kind of system level bus, or even a hierarchical bus structure, that connects many components together. Components to be connected include bus masters–those that want to drive an interaction, bus bridges, and bus slaves–those that actually perform some resulting action.

Fig 8. Can the interconnect be reused?

This may be the one level at which I have seen the least reuse between designs crossing multiple vendors. There just aren’t that many well-known interconnect generators that will work cross platform.

What keeps interconnects from being reused?

Fig 9. Xilinx's Area Optimized N:1:M AXI Crossbar

This is all fine and good until you switch a design component from the N:1:M crossbar interconnect to the full N:M crossbar. Chances are, if you do that, you’ll discover that your design no longer works. Both Xilinx’s demonstration IP cores and their AXI Ethernet-Lite core would break–if not other Xilinx cores as well.

Fig 10. Whose core do you blame when something goes wrong?

Tell me, what would you do? If you reconfigured your crossbar and suddenly your design stopped working, where would you look for the bug? Would you try to find a bug in Xilinx’s interconnect? That’s where I would look! Worse, I’d get all frustrated that their crossbar was closed source, and then likely blame them for the bug–even if it was in one of my own design components! This is what you’ll suffer from when your own core can’t handle backpressure properly.
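The backpressure rule itself is simple to state: once VALID is asserted, both it and the payload must hold steady until the cycle READY is also high. A minimal sketch of the register logic that honors it follows; the signal names are generic AXI-stream style, and start_new_beat / next_data stand in for whatever source logic surrounds this fragment:

```verilog
	reg		M_VALID;
	reg	[31:0]	M_DATA;

	initial	M_VALID = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
		M_VALID <= 1'b0;
	else if (!M_VALID || M_READY)
	begin
		// VALID and DATA may only change when the channel isn't
		// stalled--either nothing is outstanding, or the
		// downstream side just accepted the current beat
		M_VALID <= start_new_beat;
		if (start_new_beat)
			M_DATA <= next_data;
	end
	// else: M_VALID && !M_READY -- the channel is stalled, so both
	// M_VALID and M_DATA hold their values until READY arrives
```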

Did I mention changing standards? Some of the earlier ARM based SOCs, Zynqs included, supported only AXI3–even though most FPGA designs today use AXI4.

It doesn’t help that vendor-based interconnects can’t be simulated with third-party tools like Verilator, or verified with tools like SymbiYosys, simply because the designs are proprietary. As I mentioned above, the proprietary nature of most interconnect generators just hides the bugs within them, and obscures any bugs hidden elsewhere in the design.

This is perhaps the biggest place where good open source reuse might improve designs.

Reusing the ZipCPU

Let’s now turn our attention to a place where all the stars should align to make reuse easy: within IP cores generated by a single company, owned by a single entity, and all using the same bus standard.

In this ideal environment, reuse should be easy. Right?

So have I managed to achieve reuse nirvana then? Let’s take a look at several ZipCPU designs and see what might be learned from reusing the ZipCPU across multiple designs.

Still, the fact that the ZipCPU has been successfully used across so many architectures is by itself a reuse success story. Nirvana? Perhaps not, but still quite valuable.

Reusing the design across CPUs

Okay, so I’ve now got a design framework I like using. Can it be reused?

Specifically, one customer wanted me to reuse my framework to build a platform containing a RISC-V CPU instead of the ZipCPU. Surely reuse would work here, right?

Let’s see: I owned all of the submodule and component designs except for the PicoRV32 CPU I chose to use, so licensing wasn’t a problem. I used AutoFPGA to compose the component cores together, so building the interconnect wasn’t a problem. I could reuse my CPU loader to load the (flash) memory within the design, so that wouldn’t be a problem. The PicoRV32, like the ZipCPU, was highly configurable: it could be configured to start from the memory address provided by AutoFPGA, it could be configured for the number of interrupts AutoFPGA assigned to it, it had support for 32x32-bit multiplies … what could possibly go wrong?

Since it looked so easy, I made a big mistake: I underestimated the amount of time the project would take. Since it was all reuse, it should’ve all been easy. Again, what could’ve gone wrong?

Fig 11. Endianness: Which byte of a word is byte zero?
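The figure hints at the culprit: the ZipCPU is big-endian, while RISC-V cores such as the PicoRV32 are little-endian. Bridging the two on a shared 32-bit bus forces byte-lane fix-ups like the sketch below (signal names are hypothetical, for illustration only):

```verilog
	// Hypothetical byte-lane swap between a big-endian bus master and
	// little-endian peripherals sharing 32-bit words.
	assign	le_data = { be_data[7:0], be_data[15:8],
				be_data[23:16], be_data[31:24] };
	// The data lanes are the easy part; more subtly, the byte-select
	// (strobe) lines must be reversed to match.
	assign	le_sel  = { be_sel[0], be_sel[1], be_sel[2], be_sel[3] };
```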

What’s the lesson here? Did reuse work? Well, yes and no. I did manage to reuse most of the design across both CPUs. I did manage to reuse the bus interconnect framework across both CPUs. No, I wasn’t able to reuse the ZipCPU’s debugger with the PicoRV32–but then again I wasn’t expecting to. That said, it wasn’t all that hard to issue halt or reset commands to the PicoRV32 from the debugging bus interface over the Wishbone bus like I would’ve done with the ZipCPU.

Conclusion

Hardware is not software.

Let me say that again: hardware is not software. What’s easy to do in software can be ten times harder in hardware, where it’s that much harder to “see” your bugs.

What else might we conclude?

There’s a large portion of digital design that isn’t covered by any HDL standard, but that is instead vendor and even device dependent. This includes clocks, PLLs, I/O primitives, sometimes RAM structures, and definitely hardware multiplies. To be reusable across platforms, a design needs to take these differences into account. If that’s not enough, the differences between the “standard” languages the tools accept can also be really annoying. Don’t expect a design that hasn’t been used across vendor tools before to immediately work when switching tools.

Software bloat costs little more than memory, whereas hardware bloat costs actual dollars in terms of scarce hardware resources on an FPGA or area on an ASIC. As a result, hardware designs take more work to become reusable across a large variety of needs.

Bus standards are awesome–when they are truly standard. AXI USER or Wishbone tag signals aren’t really standard. Similarly, the bus bridges necessary to cross standards have a cost in both area and performance that can’t always be ignored during reuse. Making sure bus standards stay standard is one of the reasons why I maintain a series of formal bus protocol checkers in my Wishbone to AXI (pipelined) repository: AXI-Lite, Wishbone (pipelined), Wishbone (classic), Avalon, and even APB.

The worst reuse stories, not necessarily those captured above, are reserved for trying to reuse a core that was never formally verified in the first place. It’s in these cases that I most often find myself mis-estimating the time and energy required to get a design “working”, leaving me burning the midnight oil to get the design done by the deadline.

Can reuse happen? Yes, it can.

Do be prepared for all kinds of unexpected issues along the way.