UCB CS294-113: Virtual Machines and Managed Runtimes

In early 2015 I was honored to be invited to develop and present a graduate course on Virtual Machines at UC Berkeley. The result is CS294-113: Virtual Machines and Managed Runtimes, which was presented in the Fall of 2015.

This page contains the materials from that course. All materials are Copyright © Oracle and Mario Wolczko, 2015-6, except as noted. The materials can be used non-commercially under the following Creative Commons license:



CS294-113: Virtual Machines and Managed Runtimes by Mario Wolczko is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Caveat: a few items are still to be completed.

Update, July 2016

Overview

The original web page advertising the course can be found here (of historical interest only, as it was written before the course was prepared and hence is not totally accurate).

Prerequisites

Students are required to have a strong background in systems programming in C and machine-level operation (assembly and machine code), and a working knowledge of Java. Basic knowledge of compiler internals is recommended but not required.

Description

The widespread adoption of FORTRAN in the 1950s and 1960s led to a plethora of high-level programming languages compiled directly to machine code, some of which still thrive (e.g. FORTRAN itself, as well as C and C++). However, in the 1970s and 1980s a different approach to execution gained popularity, in which a layer of software continually intermediates between the high-level program and the machine. Such layers are most often called Virtual Machines. The approach first became popular with the Pascal P-machines and Smalltalk-80's bytecode machine, and received a huge boost with the emergence of Java and the JVM in the mid-1990s. Though Virtual Machines now dominate high-level language implementation, they have a reputation for being much slower than traditionally compiled languages.

However, when coupled with dynamic compilation techniques, Virtual Machines can provide performance comparable to direct compilation while offering machine-independent binary distribution, advanced memory management, better security, interactive program development and many other advantages. The objective of this seminar is to explore the design and construction of virtual machines by studying the history of the field, analyzing landmark systems, and building and modifying virtual machines hands-on.

The presentation will take a mostly chronological approach, starting with early techniques and progressing through to the state of the art. Each week we will learn about the preeminent problems of a given era and how those problems were overcome. In the labs we will reprise some of these accomplishments through a graded series of exercises in which we build components of a virtual machine for an invented language. The initial exercises will implement basic techniques in C; later we will switch to a virtual machine framework known as Truffle, which provides sophisticated components that we can assemble and customize into a larger system.
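As a rough illustration of the kind of basic technique the early exercises cover, the following is a minimal sketch of a bytecode interpreter loop in C. It is not taken from the course labs: the opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), the function name run and the fixed stack size are all assumptions invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for an invented stack machine
   (an assumption for illustration, not the course's bytecode). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Interpret `code` until OP_HALT and return the value on top of the stack. */
long run(const long *code) {
    long stack[STACK_MAX];
    size_t sp = 0;  /* number of values currently on the stack */
    size_t pc = 0;  /* index of the next instruction to fetch */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  /* the operand follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:   /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:   /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

For example, calling run on the sequence OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT evaluates (2 + 3) * 4. The switch inside a loop is the "layer of software" standing between the program and the machine; techniques such as threaded code and just-in-time compilation exist largely to reduce the cost of this dispatch.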

Lectures

Virtual machines and managed runtimes: taxonomy, characteristics, history

Interpretation of abstract syntax trees and bytecodes

Threaded code and other performance techniques

Automatic memory management (reference counting, tracing collection)

Basic just-in-time compilation

Advanced garbage collection (generational and concurrent techniques)

Dynamic optimizing compilation

Adaptive feedback-driven techniques

Dynamic deoptimization

Trace compilation; metatracing

Metacircular VMs

Readings

Labs

Source code -- TBD

Videos

The Berkeley lectures were recorded and the videos uploaded to YouTube. Note that these are raw, unedited amateur videos -- don't expect professional production standards.

The neat boundaries between topics are not reflected in the videos: sometimes a topic ran over the allotted time and was finished the following week, so you may have to hunt in the videos to find a specific topic.

In the following table I have listed the lectures with links to the videos, the accompanying slides and the lab exercises. The exercises are placed next to the relevant material; in the actual course, they sometimes lagged by a week or two (typically, exercises were handed out weekly; students were given two weeks for #5 and #9, and three weeks for #7).

The Whole Shebang

Other courses

Gregor Richards' Waterloo courses on VMs for Dynamic Languages and Automatic memory management and GC.

Phil McGachey's Harvard course on Managed Environments.

Michael Haupt and Andreas Sewe taught a course on Virtual Machines at TU Darmstadt in 2010. Michael sent me a copy of his slides; I do not believe they are available on the web.

Lars Bak taught a VM course in Aarhus in 2002 (Design and Implementation of Object-Oriented Virtual Machines; thanks to Alan Harkreader for finding this archive).

Copyright details

Acknowledgements

I'd like to express my thanks to the following:

Patrick Li, my T.A. for the course. Patrick devised the Feeny language used in the exercises, wrote the lab exercises and the model answers, and did all the grading,

Prof. Jonathan Bachrach for the invitation to give the course,

The guest speakers (in order of appearance): Peter Deutsch, Allan Schiffman, David Ungar, Cliff Click, Lars Bak, Carl Friedrich Bolz, Thomas Würthinger and Michael Van De Vanter,

My management at Oracle Labs for supporting this effort,

Michael Haupt for sharing his VM course material,

Christian Wimmer for assistance with Truffle, and

All the students who participated, for their patience, enthusiasm, attention, questions, efforts and feedback.

Mario Wolczko

March 2016