2013 LLVM Developers' Meeting

The meeting serves as a forum for LLVM, Clang, LLDB and other LLVM project developers and users to get acquainted, learn how LLVM is used, and exchange ideas about LLVM and its (potential) applications. More broadly, we believe the event will be of particular interest to the following people:

Active developers of projects in the LLVM Umbrella (LLVM core, Clang, LLDB, libc++, compiler-rt, klee, dragonegg, lld, etc.).

Anyone interested in using these as part of another project.

Compiler, programming language, and runtime enthusiasts.

Those interested in using compiler and toolchain technology in novel and interesting ways.

We also invite you to sign up for the official Developer Meeting mailing list to be kept informed of updates concerning the meeting.

November 7 - Meeting Agenda

Talk Abstracts

LLVM: 10 years and going strong

Chris Lattner - Apple, Vikram Adve - University of Illinois, Urbana-Champaign

Slides[1] Slides[2]

Video (Computer) Video (Mobile)

Keynote talk celebrating the 10th anniversary of LLVM 1.0.

Emscripten: Compiling LLVM bitcode to JavaScript

Alon Zakai - Mozilla

Slides

Video (Computer) Video (Mobile)

Emscripten is an open source compiler that converts LLVM bitcode to JavaScript. JavaScript is a fairly unusual target for compilation, being a high-level dynamic language instead of a low-level CPU assembly, but efficient compilation to JavaScript is useful because of the ubiquity of web browsers which use it as their standard language. This talk will detail how Emscripten utilizes LLVM and clang to convert C/C++ into JavaScript, and cover the specific challenges that compiling to JavaScript entails, such as the lack of goto statements, while on the other hand making other aspects of compilation simpler, for example having native exception handling support. Some such issues are general and have to do with JavaScript itself, but specific challenges with Emscripten's interaction with LLVM will also be described, as well as opportunities for better integration between the projects in the future.

Code Size Reduction using Similar Function Merging

Tobias Edler von Koch - University of Edinburgh / QuIC, Pranav Bhandarkar - QuIC

Slides

Video (Computer) Video (Mobile)

Code size reduction is a critical goal for compiler optimizations targeting embedded applications. While LLVM continues to improve its performance optimization capabilities, it is currently still lacking a robust set of optimizations specifically targeting code size. In our talk, we will describe an optimization pass that aims to reduce code size by merging similar functions at the IR level. Significantly extending the existing MergeFunctions optimization, the pass is capable of merging multiple functions even if there are minor differences between them. A number of heuristics are used to determine when merging of functions is profitable. Alongside hash tables, these also ensure that compilation time remains at an acceptable level. We will describe our experience of using this new optimization pass to reduce the code size of a significant embedded application at Qualcomm Innovation Center by 2%.
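The idea of bucketing functions with a hash table and then applying a profitability heuristic can be sketched as follows. This is only an illustration under stated assumptions, not the pass described in the talk: the toy IR (a list of `(opcode, operand)` pairs), the helper names, and the `max_diffs` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy IR: a function body is a list of (opcode, operand) pairs.

def shape_key(body):
    """Key on the opcode sequence only, so near-identical bodies share a bucket."""
    return tuple(opcode for opcode, _ in body)

def differing_operands(a, b):
    """Count positions where two same-shaped bodies use different operands."""
    return sum(1 for (_, x), (_, y) in zip(a, b) if x != y)

def merge_candidates(functions, max_diffs=1):
    """Return pairs of function names that differ in at most max_diffs operands."""
    buckets = defaultdict(list)
    for name, body in functions.items():
        buckets[shape_key(body)].append(name)  # hash-table lookup avoids all-pairs comparison
    pairs = []
    for names in buckets.values():
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                diffs = differing_operands(functions[first], functions[second])
                if diffs <= max_diffs:  # assumed stand-in for a profitability heuristic
                    pairs.append((first, second))
    return pairs

functions = {
    "scale_by_2": [("load", "x"), ("mul", 2), ("ret", None)],
    "scale_by_4": [("load", "x"), ("mul", 4), ("ret", None)],
    "negate": [("load", "x"), ("neg", None), ("ret", None)],
}
candidates = merge_candidates(functions)
```

Here only `scale_by_2` and `scale_by_4` share a bucket; a merged function would take the differing constant as an extra parameter.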

Julia: An LLVM-based approach to scientific computing

Keno Fischer - Harvard College/MIT CSAIL

Slides

Video (Computer) Video (Mobile)

Julia is a new high-level dynamic programming language specifically designed for scientific and technical computing, while at the same time not ignoring the need for the expressiveness and the power of a modern general purpose programming language.

Thanks to LLVM's JIT compilation capabilities, for which Julia was written from the ground up, Julia can achieve a level of performance usually reserved for compiled programs written in C, C++ or other compiled languages. It thus manages to bridge the gap between very high level languages such as MATLAB, R or Python usually used for algorithm prototyping and those languages used when performance is of the essence, reducing development time and the possibility for subtle differences between the prototype and the production algorithms.

Verifying optimizations using SMT solvers

Nuno Lopes - INESC-ID / U. Lisboa

Slides

Video (Computer) Video (Mobile)

Instcombine and Selection DAG optimizations, although usually simple, can easily hide bugs. We've had many cases in the past where these optimizers were producing wrong code in certain corner cases. In this talk I'll describe a way to prove the correctness of such optimizations using an off-the-shelf SMT solver (bit-vector theory). I'll give examples of past bugs found in these optimizations, how to encode them into SMT-Lib 2 format, and how to spot the bugs. The encoding to the SMT format, although manual, is straightforward and consumes little time. The verification is then automatic.
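The kind of question being asked ("does the rewritten expression agree with the original for every bit-vector input?") can be illustrated without a solver. The sketch below deliberately swaps the SMT solver for brute-force enumeration over 8-bit values, which is only feasible for tiny widths; the two rewrites checked are illustrative examples chosen here, not bugs taken from the talk.

```python
BITS = 8
MASK = (1 << BITS) - 1

def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

def find_counterexample(original, optimized):
    """Return the first input where the two expressions disagree, else None."""
    for x in range(1 << BITS):
        if original(x) != optimized(x):
            return x
    return None

# Rewrite 1: x * 2  ->  x << 1 (equivalent for wrapping arithmetic).
sound = find_counterexample(
    lambda x: (x * 2) & MASK,
    lambda x: (x << 1) & MASK,
)

# Rewrite 2: (x + 1) >s x  ->  true (wrong when x + 1 wraps around).
unsound = find_counterexample(
    lambda x: to_signed(x + 1) > to_signed(x),
    lambda x: True,
)
```

An SMT solver answers the same question symbolically, so it scales to 32- and 64-bit widths where enumeration does not.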

New Address Sanitizer Features

Kostya Serebryany - Google, Alexey Samsonov - Google

Slides

Video (Computer) Video (Mobile)

AddressSanitizer is a fast memory error detector that uses LLVM for compile-time instrumentation. In this talk we will present several new features in AddressSanitizer.

Initialization order checker finds bugs where the program behavior depends on the order in which global variables from different modules are initialized.

Stack-use-after-scope detector finds uses of stack-allocated objects outside of the scope where they are defined.

Similarly, stack-use-after-return detector finds uses of stack variables after the functions they are defined in have exited.

LeakSanitizer finds heap memory leaks; it is built on top of AddressSanitizer memory allocator.

We will also give an update on AddressSanitizer for Linux kernel.

A Detailed Look at the R600 Backend

Tom Stellard - Advanced Micro Devices Inc.

Slides

Video (Computer) Video (Mobile)

The R600 backend, which targets AMD GPUs, was merged into LLVM prior to the 3.3 release. It is one component of AMD's open source GPU drivers which provide support for several popular graphics and compute APIs. The backend supports two different generations of GPUs, the older VLIW4/VLIW5 architecture and the more recent GCN architecture. In this talk, I will discuss the history of the R600 backend, how it is used, and why we chose to use LLVM for our open source drivers. Additionally, I'll give an in-depth look at the backend and its features and present an overview of the unique architecture of supported GPUs. I will describe the challenges this architecture presented in writing an LLVM backend and the approaches we have taken for instruction selection and scheduling. I will also look at the future goals for this backend and areas for improvement in the backend as well as core LLVM.

Developer Toolchain for the PlayStation®4

Paul T. Robinson - Sony Computer Entertainment America

Slides

Video (Computer) Video (Mobile)

The PlayStation®4 has a developer toolchain centered on Clang as the CPU compiler. We describe how Clang/LLVM fits into Sony Computer Entertainment's (mostly proprietary) toolchain, focusing on customizations, game-developer experience, and working with the open-source community.

Annotations for Safe Parallelism in Clang

Alexandros Tzannes - University of Illinois, Urbana-Champaign

Slides

Video (Computer) Video (Mobile)

The Annotations for Safe Parallelism (ASaP) project at UIUC is implementing a static checker in Clang to allow writing provably safe parallel code. ASaP is inspired by DPJ (Deterministic Parallel Java) but, unlike DPJ, it does not extend the base language. Instead, we rely on the rich C++11 attribute system to enrich C++ types and to pass information to our ASaP checker. The ASaP checker gives strong guarantees such as race-freedom, *strong* atomicity, and deadlock freedom for commonly used parallelism patterns, and it is at the prototyping stage where we can prove the parallel safety of simple TBB programs. We are evolving ASaP in collaboration with our Autodesk partners who help guide its design in order to solve increasingly complex problems faced by real software teams in industry. In this presentation, I will present an overview of how the checker works, what is currently supported, what we have "in the works", and some discussion about incorporating some of the ideas of the thread safety annotations to assist our analysis.

Vectorization in LLVM

Nadav Rotem - Apple, Arnold Schwaighofer - Apple

Slides

Video (Computer) Video (Mobile)

Vectorization is a powerful optimization that can accelerate programs in multiple domains. Over the last year two new vectorization passes were added to LLVM: the Loop-vectorizer, which vectorizes loops, and the SLP-vectorizer, which combines independent scalar calculations into a vector. Both of these optimizations together show a significant performance increase on many applications. In this talk we’ll present our work on the vectorizers in the past year. We’ll discuss the overall architecture of these passes, the cost model for deciding when vectorization is profitable, and describe some interesting design tradeoffs. Finally, we want to talk about some ideas to further improve the vectorization infrastructure.

Bringing clang and LLVM to Visual C++ users

Reid Kleckner - Google

Slides

Video (Computer) Video (Mobile)

This talk covers the work we've been doing to help make clang and LLVM more compatible with Microsoft's Visual C++ toolchain. With a compatible toolchain, we can deliver all of the features that clang and LLVM have to offer, such as AddressSanitizer. Perhaps the most important point of compatibility is the C++ ABI, which is a huge and complicated beast that covers name mangling, calling conventions, record layout, vtable layout, virtual inheritance, and more. This talk will go into detail about some of the more interesting parts of the ABI.

Building a Modern Database with LLVM

Skye Wanderman-Milne - Cloudera

Slides

Video (Computer) Video (Mobile)

Cloudera Impala is a low-latency SQL query engine for Apache Hadoop. In order to achieve optimal CPU efficiency and query execution times, Impala uses LLVM to perform JIT code generation to take advantage of query-specific information unavailable at compile time. For example, code generation allows us to remove many conditionals (and the associated branch misprediction overhead) necessary for handling multiple types, operators, functions, etc.; inline what would otherwise be virtual function calls; and propagate query-specific constants. These optimizations can reduce overall query time by almost 300%.

In this talk, I'll outline the motivation for using LLVM within Impala and go over some examples and results of JIT optimizations we currently perform, as well as ones we'd like to implement in the future.
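The benefit of specializing on query-specific information can be sketched in a few lines. This is a minimal illustration, not Impala's implementation: Impala generates LLVM IR, whereas the sketch below only uses Python closures to show the operator dispatch moving out of the per-row loop. The function names and the row format are hypothetical.

```python
import operator

OPERATORS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

def interpret_filter(rows, column, op, constant):
    """Generic path: re-examines the operator for every single row."""
    kept = []
    for row in rows:
        if op == "<":
            keep = row[column] < constant
        elif op == ">":
            keep = row[column] > constant
        elif op == "==":
            keep = row[column] == constant
        else:
            raise ValueError(f"unsupported operator: {op}")
        if keep:
            kept.append(row)
    return kept

def specialize_filter(column, op, constant):
    """Specialized path: resolve the operator once, when the query is known."""
    compare = OPERATORS[op]  # dispatch happens here, not per row

    def run(rows):
        return [row for row in rows if compare(row[column], constant)]

    return run

rows = [{"price": 5}, {"price": 12}, {"price": 30}]
cheap = specialize_filter("price", "<", 10)
generic_result = interpret_filter(rows, "price", "<", 10)
specialized_result = cheap(rows)
```

Both paths return the same rows; the specialized one simply has no operator branches left in its inner loop, which is the effect that JIT code generation achieves at the machine-code level.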

Adapting LLDB for your hardware: Remote Debugging the Hexagon DSP

Colin Riley - Codeplay

Slides

Video (Computer) Video (Mobile)

LLDB is at the stage of development where support is being added for a wide range of hardware devices. Its modular approach means adapting it to debug a new system has a well-defined step-by-step process, which can progress fairly quickly. Presented is a guide of what implementation steps are required to get your hardware supported via LLDB using Remote Debugging, giving examples from work we are doing to support the Hexagon DSP within LLDB.

PGO in LLVM: Status and Current Work

Bob Wilson - Apple, Chandler Carruth - Google, Diego Novillo - Google

Slides

Video (Computer) Video (Mobile)

Profile Guided Optimization (PGO) is one of the most fundamental weaknesses in the LLVM optimization portfolio. We have had several attempts to build it, and to this day we still lack a holistic platform for driving optimizations through profiling. This talk will consist of three light-speed crash courses on where PGO is in LLVM, where it needs to be, and how several of us are working to get it there.

First, we will present some motivational background on what PGO is good for and what it isn't. We will cover exactly how profile information interacts with the LLVM optimizations, the strategies we use at a high level to organize and use profile information, and the specific optimizations that are in turn driven by it. Much of this will cover infrastructure as it exists today, with some forward-looking information added into the mix.

Next, we will cover one planned technique for getting profile information into LLVM: AutoProfile. This technique simplifies the use and deployment of PGO by using external profile sources such as Linux perf events or other sample-based external profilers. When available, it has some key advantages: no instrumentation build mode, reduced instrumentation overhead, and more predictable application behavior by using hardware to assist the profiling.

Finally, we will cover an alternate strategy to provide more traditional and detailed profiling through compiler inserted instrumentation. This approach will also strive toward two fundamental goals: resilience of the profile to both source code and compiler changes, and visualization of the profile by developers to understand how their code is being exercised. The second draws obvious parallels with code coverage tools, and the design tries to unify these two use cases in a way that the same infrastructure can drive both.

Poster Abstracts

Finding a few needles in some large haystacks: Identifying missing target optimizations using a superoptimizer

Hal Finkel - Argonne National Laboratory

Poster

So you're developing an LLVM backend, and you've added a bunch of TableGen patterns, custom DAG combines and other lowering code; are you done? This poster describes the development of a specialized superoptimizer, applied to the output of the compiler on large codebases, to look for missing optimizations in the PowerPC backend. This superoptimizer extracts potentially-interesting instruction sequences from assembly code, and then uses the open-source CVC4 SMT solver to search for provably-correct shorter alternatives.

Intel® AVX-512 Architecture. Comprehensive vector extension for HPC and enterprise

Elena Demikhovsky, Intel® Software and Services Group - Israel

Poster

Knights Landing (KNL) is the second generation of the Intel® MIC architecture-based products. KNL will support the Intel® Advanced Vector Extensions 512 instruction set architecture, a significant leap in SIMD support. This new ISA, designed with an unprecedented level of richness, offers a new level of support and opportunities for vectorizing compilers to target efficiently. The poster presents the Intel® AVX-512 ISA and shows how the new capabilities may be used in the LLVM compiler.

Fracture: Inverting the Target Independent Code Generator

Richard T. Carback III – Charles Stark Draper Laboratories

Poster

Fracture is a TableGen backend and associated library that ingests a basic block of target instructions and emits a DAG which resembles the post-legalization phase of LLVM’s SelectionDAG instruction selection process. It leverages the pre-existing target TableGen definitions, without modification, to provide a generic way to abstract LLVM IR efficiently from different target instruction sets. Fracture can speed up a variety of applications and also enable generic implementations of a number of static and dynamic analysis tools. Examples include interactive debuggers or disassemblers that provide LLVM IR representations to users unfamiliar with the instruction set, static analysis algorithms that solve indirect control transfer (ICT) problems modified for IR to use KLEE or other LLVM technologies, and IR-based decompilers or emulators extended to work on machine binaries.

Automatic generation of LLVM backends from LISA

Jeroen Dobbelaere - Synopsys

Poster

LISA (language for instruction-set architectures) allows for the efficient specification of processor architectures, including non-standard, customized architectures. Using a LISA input specification, designers can automatically generate an instruction-set simulator, assembler, linker, and debugger interface, as well as RTL.

We have extended LISA to allow for the generation of an LLVM compiler backend tailored to the custom architecture. This work includes the development of a new scheduler that is able to handle hazards with high latency and delay slots, expanding the applicability of LLVM to a wider range of architectures. The LISA-based design flow allows for rapid architectural explorations, profiling dozens of different processor architectures within hours, with the automatic generation of an LLVM compiler being a key enabler of this design methodology.

clad - Automatic Differentiation with Clang

Violeta Ilieva (Princeton University), CERN; Vassil Vassilev, CERN

Poster

Automatic differentiation (AD) evaluates the derivative of a function specified in a computer program by applying a set of techniques to change the semantics of that function. Unlike other methods for differentiation, such as numerical and symbolic, AD yields machine-precision derivatives even of complicated functions at relatively low processing and storage costs. We would like to present our AD tool, clad - a clang plugin that differentiates C++ functions through implementing source code transformation and employing the chain rule of differential calculus in its forward mode. That is, clad decomposes the original functions into elementary statements and generates their derivatives with respect to the user-defined independent variables. The combination of these intermediate expressions forms additional source code, built through modifying clang’s abstract syntax tree (AST) along the control flow. Compared to other tools, clad has the advantage of relying on clang and llvm modules for parsing the original program. It uses clang's plugin mechanism for constructing the derivative's AST representation, for generating executable code, and for performing global analysis. Thus it results in low maintenance, high compatibility, and excellent performance.
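Forward-mode AD itself is easy to demonstrate on a small scale. The sketch below is not how clad works: clad transforms source code through the clang AST, while this example swaps in operator overloading on dual numbers, a different and simpler realization of the same chain-rule bookkeeping. The `Dual` class and `derivative` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__

def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv

def f(x):
    return 3 * x * x + 2 * x + 1

slope = derivative(f, 2.0)  # analytically f'(x) = 6x + 2
```

Each elementary operation propagates a derivative alongside its value, which mirrors the decomposition into elementary statements described above.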

Lightning Talk Abstracts

Fixing MC for ARM v7-A: Just a few corner cases – how hard can it be?

Mihail Popa - ARM

Slides

Video (Computer) Video (Mobile)

In 2012, MC Hammer was presented as a testing infrastructure to exhaustively verify the MC layer implementation for the ARM backend. Within ARM we have been working to fix any bugs and we have reached the point where only one problem remains unsolved. Some of the issues discovered in this process have proven to be excessively difficult to fix. The purpose of the presentation is to give a brief rundown of the major headaches and to suggest possible courses of action for improving LLVM infrastructure.

VLIW Support in the MC Layer

Mario Guerra - Qualcomm Innovation Center, Incorporated

Slides

Video (Computer) Video (Mobile)

Modern DSP architectures such as Hexagon use VLIW instruction packets, which are not well suited to the single instruction streaming model of the LLVM MC layer. Developing an assembler for Hexagon presents unique challenges in the MC layer, especially since Hexagon leverages an optimizing assembler to achieve maximum performance. It is possible to support VLIW within the MC layer by treating every MC instruction as a bundle, and adding all instructions in a packet as sub-instruction operands. Furthermore, subclassing MCInst to create a target-specific type of MCInst allows us to capture packet information that will be used to make optimization decisions prior to emitting the code to object format.
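The bundle-with-sub-instruction-operands idea can be pictured with a small data-structure sketch. Nothing here is LLVM's actual MCInst API: the `Inst` class, the `BUNDLE` opcode name, the `max_slots` limit, and the sample opcodes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BUNDLE = "BUNDLE"  # hypothetical pseudo-opcode marking a VLIW packet

@dataclass
class Inst:
    """Toy stand-in for an MC-layer instruction."""
    opcode: str
    operands: list = field(default_factory=list)

def make_bundle(insts, max_slots=4):
    """Wrap a packet's instructions as operands of one bundle instruction."""
    insts = list(insts)
    if not 0 < len(insts) <= max_slots:  # max_slots is an assumed packet width
        raise ValueError("packet must hold between 1 and max_slots instructions")
    return Inst(BUNDLE, insts)

def flatten(stream):
    """Yield the underlying instructions, unpacking any bundles in order."""
    for inst in stream:
        if inst.opcode == BUNDLE:
            yield from inst.operands
        else:
            yield inst

packet = make_bundle([Inst("add", ["r0", "r1", "r2"]), Inst("load", ["r3", "r4"])])
stream = [packet, Inst("jump", ["lr"])]
opcodes = [inst.opcode for inst in flatten(stream)]
```

Because the streamer still sees one instruction per packet, packet-level information stays together until the code is emitted.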

Link-Time Optimization without Linker Support

Yunzhong Gao - Sony Computer Entertainment America

Slides

Video (Computer) Video (Mobile)

LLVM's plugin for the Gold linker enables link-time optimization (LTO). But the toolchain for PlayStation®4 does not include Gold. Here's how we achieved LTO without a bitcode-aware linker.

A comparison of the DWARF debugging information produced by LLVM and GCC

Keith Walker - ARM

Slides

Video (Computer) Video (Mobile)

This talk explores the quality of the DWARF debugging information generated by LLVM by comparing it with that produced by GCC for ARM/AArch64 based targets. It highlights where LLVM's debugging information is superior to that generated by GCC and also where there are deficiencies and scope for further development. I will also explain how these differences translate into good or bad debug experiences for users of LLVM.

aarch64 neon work

Ana Pazos - QuIC, Jiangning Liu - ARM

Video (Computer) Video (Mobile)

Slides

ARM and Qualcomm are implementing the aarch64 Advanced SIMD (NEON) instruction set. As a joint team, we will be implementing all 25 classes of NEON instructions in the MC layer, as well as all of the ACLE (ARM C Language Extensions) intrinsics at the C level. Our talk will highlight the design choice of a single arm_neon.h for both ARM (aarch32) and aarch64, the choice of appropriate value types in LLVM IR for generating SISD instruction classes, improving the quality of the patterns in .td files by reducing the number of LLVM IR intrinsics, and all of the test categories used to build a robust back-end. Finally, we'd like to mention some future plans, such as enabling the machine-instruction-based scheduler and performance tuning.

JavaScript JIT with LLVM

Filip Pizlo - Apple Inc.

Slides

Video (Computer) Video (Mobile)

Dynamic languages present unique challenges for compilation, such as the need for type speculation and self-modifying code. This talk shows how to add support for these features to LLVM and use them to implement a JIT for JavaScript.

Debug Info Quick Update

Eric Christopher - Google Inc.

Slides

Video (Computer) Video (Mobile)

A quick update on what's been going on in debug info support since the Euro meeting.

lld: a linker framework

Shankar Easwaran - Qualcomm Innovation Center

Slides

Video (Computer) Video (Mobile)

The lld project is working towards becoming a production-quality linker targeting the PECOFF, Darwin, and ELF formats, and is under heavy development. The talk discusses how lld achieves universal linking and how it is moving towards becoming a linker framework that could be an integral part of LLVM. It then goes on to cover the new opportunities this opens up for linking, such as lld APIs, symbol resolution improvements, link-time optimization (LTO), and an improved user experience through diagnostics and user-driven inputs that drive linker behavior.

BoF Abstracts

BOF: Performance Tracking & Benchmarking Infrastructure

Kristof Beyls - ARM

We lack a good public infrastructure for efficiently tracking performance improvements and regressions. As a small step to improve on the current situation, I propose to organize a BoF to discuss mainly the following topics:

(a) What advantages do we want the performance tracking and benchmarking infrastructure to give us?

(b) What are the main technical and non-technical challenges we expect for setting up an infrastructure?

BOF: TableNextGen

Mihail Popa - ARM

TableGen is an essential component of the LLVM ecosystem and the time has come to consider its evolution. The largest issues are the lack of a formal specification, the mixing of logical concepts, and the unsuitability for automated generation. The aim of this BoF is to gather ideas toward an improved specification language which follows the generally accepted criteria for domain-specific languages: well-defined domain meta-models, formally defined semantics, simplicity, expressiveness, and lack of redundancy.

BOF: Debug Info

Eric Christopher - Google



BOF: Extending the Sanitizer tools and porting them to other platforms

Kostya Serebryany - Google, Alexey Samsonov - Google, Evgeniy Stepanov - Google



BOF: High Level Loop Optimization / Polly

Tobias Grosser - INRIA, Sebastian Pop - QuIC, Zino Benaissa - QuIC

Discussions about Loop Optimizations, both generic ones as well as polyhedral Loop Optimizations as implemented in Polly. Topics include the pass order for high level loop optimizations, scalar evolution, dependence analysis, high level loop optimizations in core LLVM, the polyhedral infrastructure of Polly as well as the isl polyhedral support library.

BOF: Optimizations using LTO

Zino Benaissa - QuIC



BOF: JIT & MCJIT

Andy Kaylor - Intel Corporation

