Ivan Godard, CTO of Mill Computing, Inc., will be giving a talk at Facebook.

The particulars:

Monday, July 14, 2014

Doors open at 10:30 AM, Talk is from 11 AM to 12:30 PM

1 Hacker Way, Bldg 10

Menlo Park, CA 94025

Enter via the left lane of the Willow Road entrance. There is visitor parking along the front of Building 10, but if all of those are taken there is overflow parking across the street. Guests should come to Building 10 and sign in. They will then be escorted to Room 11.2. Our hosts at Facebook are Edwin Smith and Jason Evans.

This will be the ninth topic publicly presented on the Mill general-purpose CPU architecture. It will cover the methods used to perform software pipelining on the Mill Architecture. The talk will assume some general familiarity with software pipelining.

Software pipelining on the Mill CPU:

Instant pipeline: add loop, no stirring needed

The Mill CPU architecture is very wide, able to issue and execute 30+independent MIMD operations per cycle. Non-looping open code often cannot use this raw compute capacity, but fortunately >80% of cycles are in loops. Loops potentially have unbounded instruction-level parallelism and can absorb all the capacity available – if the loop can be pipelined.

This talk addresses how loops are pipelined on the Mill architecture. On a conventional machine, pipelining requires lengthy prelude and postlude instruction sequences to get the pipeline started and wound down, frequently destroying the benefit of pipelining the main body. Conventional pipelining can be of negative benefit on short loops, especially “while” type loops whose length is unknown and data dependent. Not so on a Mill: Mill pipelines have neither prelude nor postlude, and early conditional exit has no added cost.

Pipelines on conventional machines also have problems with loop-carried data, values produced by one iteration but consumed by another. Conventional code must resort to bucket-brigade register copies, or fail to pipeline altogether. Even architectures like the Itanium, which have special hardware to support pipelining, provide it only for the innermost loop. In contrast, the Mill needs no copies and can pipeline outer as well as inner loops.

Familiarity with prior talks in this series, especially the Belt and Metadata talks will be helpful but not essential.