This week, I attended the ACM FPGA 2019 conference in Seaside (near Monterey), California, the annual premier ACM event on FPGAs and associated technology. I’ve been involved in this conference for many years, as author, TPC member, TPC and general chair, and now steering committee member. Fashions have come and gone over this time, including in the applications of FPGA technology, but the programme at FPGA is always interesting and high quality. This year, particular thanks should go to Steve Neuendorffer for organising the conference programme and to Kia Bazargan in his role as General Chair.

Below, I summarise my personal highlights of the conference. These are by no means my view of the “best” papers – they are all good – but rather those that interested me the most.

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity, a collaboration between Tsinghua, Beihang, Harbin Institute of Technology, and Microsoft Research, tackled the problem of sparsifying an LSTM inference implementation in a way that balances the load across the various memory banks. The idea is simple but effective, and leads to an interesting tradeoff between the quality of LSTM output and performance. I think it would be interesting to try to design a training method / regulariser that encourages this kind of structured sparsity in the first place.
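
To make the load-balancing idea concrete, here is a minimal sketch of one way to prune a weight row so that every bank ends up with the same number of non-zeros. It assumes each row is split into equal-sized contiguous banks and that pruning keeps the largest-magnitude weights within each bank; the function names, the toy row and the bank sizes are illustrative assumptions, not taken from the paper's implementation.

```python
def bank_balanced_prune(row, num_banks, keep_per_bank):
    """Zero all but the `keep_per_bank` largest-magnitude weights in each bank.

    Illustrative sketch only: banks are assumed to be equal-sized,
    contiguous slices of the row.
    """
    if len(row) % num_banks != 0:
        raise ValueError("row length must be divisible by num_banks")
    bank_size = len(row) // num_banks
    pruned = []
    for b in range(num_banks):
        bank = row[b * bank_size:(b + 1) * bank_size]
        # Rank positions by magnitude within this bank only, so that
        # no bank can end up holding more survivors than another.
        ranked = sorted(range(bank_size), key=lambda i: abs(bank[i]), reverse=True)
        keep = set(ranked[:keep_per_bank])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(bank))
    return pruned


def nonzeros_per_bank(row, num_banks):
    """Count the non-zero weights that each bank would have to process."""
    bank_size = len(row) // num_banks
    return [
        sum(1 for w in row[b * bank_size:(b + 1) * bank_size] if w != 0)
        for b in range(num_banks)
    ]


row = [0.8, -0.1, 0.2, 1.5, 0.4, -0.7, 0.9, 0.1]
pruned = bank_balanced_prune(row, num_banks=4, keep_per_bank=1)
print(pruned)
print(nonzeros_per_bank(pruned, num_banks=4))
```

By contrast, a global magnitude threshold on the same row could leave some banks with several survivors and others with none, which is the imbalance the paper sets out to avoid. The price is that a large weight may be dropped simply because its bank is already full, hence the tradeoff with output quality.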

Kees Vissers from Xilinx presented a keynote talk summarising their new Versal architecture, which the Imperial team had previously had the pleasure of hearing about from our alumnus Sam Bayliss. This is a really very different architecture to standard FPGA fare, and readers might well be interested in taking a look at Kees’s slides to learn more.

Vaughn Betz presented a paper from the University of Toronto, Math Doesn’t Have to be Hard: Logic Block Architectures to Enhance Low Precision Multiply-Accumulate on FPGAs. This work proposed a number of relatively minor tweaks to Intel FPGA architectures which might have a significant impact on low-precision MAC performance. Vaughn began by pointing out that in this application, very general LUTs often get wasted by being used as very simple gates – he gave the example of AND gates in partial product generation, and even as buffers. A number of architectural proposals were made to avoid this issue. I find this particularly interesting at the moment, because together with my PhD student Erwei Wang and others, I have proposed a new neural network architecture called LUTNet, motivated by exactly the same concern. However, our approach is the dual of that presented by Vaughn – we keep the FPGA architecture constant but modify the basic computations performed by the neural network to be better tuned to the underlying architecture. Expect a future blog post on our approach!
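
For readers less familiar with multiplier structure, the AND-gate example can be illustrated with a small sketch of unsigned partial product generation. It is a software model only, assuming a plain array multiplier with no Booth encoding, and the function names are hypothetical. Each partial product bit is a single two-input AND, which is exactly the kind of trivial function that occupies a whole LUT when mapped naively.

```python
def partial_products(a, b, width):
    """Return the partial product rows of an unsigned width-bit multiply.

    rows[j][i] is bit i of `a` ANDed with bit j of `b`: one two-input
    AND gate per entry, so width * width gates in total.
    """
    rows = []
    for j in range(width):
        b_j = (b >> j) & 1
        rows.append([((a >> i) & 1) & b_j for i in range(width)])
    return rows


def multiply_from_partial_products(a, b, width):
    """Sum the partial product rows, each shifted left by its row index."""
    total = 0
    for j, row in enumerate(partial_products(a, b, width)):
        row_value = sum(bit << i for i, bit in enumerate(row))
        total += row_value << j
    return total


rows = partial_products(13, 11, width=4)
and_gate_count = sum(len(r) for r in rows)
print(and_gate_count)
print(multiply_from_partial_products(13, 11, width=4))
```

Even at 4 bits the multiply needs 16 such AND gates before any addition happens, which gives a feel for why low-precision MAC-heavy designs spend so much general-purpose logic on very simple functions.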

Lana Josipović presented the most recent work on the dynamically scheduled HLS tool from Paolo Ienne’s group at EPFL, which they first presented at last year’s conference – see my blog post from last year. This time they have added speculative execution to their armoury. This is a very interesting line of work as HLS moves to encompass more and more complex algorithms, and Lana did a great job illustrating how it works.

Yi-Hsiang Lai presented HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing, an interesting collaboration between Zhiru Zhang’s group at Cornell and Jason Cong’s group. This work proposed separating functionality from implementation / optimisation concerns, such as datapath, precision and memory customisation, providing a cleaner level of abstraction. The approach seems very interesting, and reminded me of the aspect-oriented HLS work I contributed to in the REFLECT European project, about which João Cardoso and others have since written a book. I think it’s a promising approach, and I’d be interested to explore the potential and challenges of their tool-flow. This paper won the best paper prize of the conference – congratulations to the authors!

My PhD student Jianyi Cheng presented our own paper, EASY: Efficient Arbiter SYnthesis from Multi-Threaded Code, and did an excellent job. Our paper is described in more detail in an earlier blog post.

Other papers I found particularly interesting include Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs, Microsemi’s contribution on analytic placement, ETH Zürich’s paper on an FPGA implementation of an approximate maximum graph matching algorithm, and U. Waterloo’s paper on a lightweight NoC making use of traffic injection regulation to avoid stalls. Unfortunately I had to miss the talks after noon on Tuesday, so there may well be more of interest in that part of the programme too.

The panel discussion – chaired by Deming Chen – was on the topic of whether FPGAs have a role to play in Supercomputing. As I pointed out in the discussion, to answer this question scientifically we need to have a working definition of “FPGA” and of “Supercomputing” – both seem to be on shifting sands at the moment, and we need to resist reducing a question like this to “does LINPACK run well on a Virtex or Stratix device?”

We also had the pleasure of congratulating Deming Chen and Paul Chow on their recently awarded fellowships, awarding a best paper prize, recognising several historical FPGA papers of significance, and last but by no means least welcoming the new baby of two of the stalwarts of the FPGA community – baby complete with “I am into FPGA” T-shirt! All this led to an excellent community feeling, which we should continue to nurture.