Christopher Celio

EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2018-151

December 1, 2018

General-purpose serial-thread performance gains have become more difficult for industry to realize due to the slowing down of process improvements. In this new regime of poor process scaling, continued performance improvement relies on a number of small-scale micro- architectural enhancements. However, software simulator-based models, which computer architecture research has largely relied upon, may not be well-suited for evaluating ideas at the necessary fidelity.

To facilitate architecture research during this fallow period of Moore’s Law, we propose using processor simulators built from synthesizable processor designs. This thesis describes the design of a synthesizable, industry-competitive processor built on recent advancements in open-source hardware: we leverage the new open-source RISC-V instruction set architecture, the new Chisel hardware construction language, and the Rocket-chip processor generator.

Our processor generator is called BOOM, and it designed for use in education, research, and industry. Like most contemporary high-performance cores, BOOM is superscalar (able to execute multiple instructions per cycle) and out-of-order (able to execute instructions as their dependencies are resolved and not restricted to their program order). The BOOM generator was implemented using the Chisel hardware construction language, allowing for the rapid implementation of parameterized designs. The Chisel description generates synthesizable implementations of BOOM that can target both FPGAs and ASIC tool-flows. The BOOM effort culminated in a test chip that was fabricated in the TSMC 28 nm HPM process (high performance mobile) using the foundry-provided standard-cell library and memory compiler.

This thesis highlights two aspects of the BOOM design: its industry-competitive branch prediction and its configurable execution datapath. The remainder of the thesis discusses the BOOM tape-out, which was performed by two graduate students and demonstrated the ability to quickly adapt the design to the physical design issues that arose.

Advisor: David A. Patterson

BibTeX citation:

@phdthesis{Celio:EECS-2018-151, Author = {Celio, Christopher}, Title = {A Highly Productive Implementation of an Out-of-Order Processor Generator}, School = {EECS Department, University of California, Berkeley}, Year = {2018}, Month = {Dec}, URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-151.html}, Number = {UCB/EECS-2018-151}, Abstract = {General-purpose serial-thread performance gains have become more difficult for industry to realize due to the slowing down of process improvements. In this new regime of poor process scaling, continued performance improvement relies on a number of small-scale micro- architectural enhancements. However, software simulator-based models, which computer architecture research has largely relied upon, may not be well-suited for evaluating ideas at the necessary fidelity. To facilitate architecture research during this fallow period of Moore’s Law, we propose using processor simulators built from synthesizable processor designs. This thesis describes the design of a synthesizable, industry-competitive processor built on recent advancements in open-source hardware: we leverage the new open-source RISC-V instruction set architecture, the new Chisel hardware construction language, and the Rocket-chip processor generator. Our processor generator is called BOOM, and it designed for use in education, research, and industry. Like most contemporary high-performance cores, BOOM is superscalar (able to execute multiple instructions per cycle) and out-of-order (able to execute instructions as their dependencies are resolved and not restricted to their program order). The BOOM generator was implemented using the Chisel hardware construction language, allowing for the rapid implementation of parameterized designs. The Chisel description generates synthesizable implementations of BOOM that can target both FPGAs and ASIC tool-flows. The BOOM effort culminated in a test chip that was fabricated in the TSMC 28 nm HPM process (high performance mobile) using the foundry-provided standard-cell library and memory compiler. This thesis highlights two aspects of the BOOM design: its industry-competitive branch prediction and its configurable execution datapath. The remainder of the thesis discusses the BOOM tape-out, which was performed by two graduate students and demonstrated the ability to quickly adapt the design to the physical design issues that arose.} }

EndNote citation:

%0 Thesis %A Celio, Christopher %T A Highly Productive Implementation of an Out-of-Order Processor Generator %I EECS Department, University of California, Berkeley %D 2018 %8 December 1 %@ UCB/EECS-2018-151 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-151.html %F Celio:EECS-2018-151