Debunking the Myth of High-level Languages

By David Chisnall

Date: Jul 14, 2006


High-level languages are not intrinsically slow! David Chisnall explains how programming in a low-level language can make a compiler's job harder, essentially wasting effort and slowing down your program's processing.

The closer to the metal you can get while programming, the faster your program will run — or so conventional wisdom would have you believe. In this article, I will show you how high-level languages like Java aren't slow by nature, and how low-level languages can actually make it harder for a compiler to generate efficient code.

What Is a ‘High-Level’ Language?

A computer language is a way of representing a program that can be translated into something that a computer can execute. A language is described as low-level if it’s close to instructions executed by the hardware.

The lowest-level language that can be used is raw machine code—a string of numbers representing instructions and operands understood by the CPU. Note that in most modern microprocessors this is still one layer of abstraction away from the "real" instructions. A modern x86 CPU, for example, will split each of these instructions into a series of micro-operations (μOps) and then execute the μOps individually.

The next layer up is assembly languages, which are semantically equivalent to the raw machine code; one assembly language statement translates directly to one machine instruction. Assembly languages are slightly easier to read than machine code, because they substitute mnemonics for numbers. They often have some syntactic sugar, such as the ability to define macros—code segments that are reused frequently—and insert them by name. They also have the ability to define jump targets symbolically, rather than having to change an address everywhere in your program when you insert an instruction before a jump.

Moving slightly further up, we get to languages like C. Early versions of UNIX were written in Assembler, but this proved to be a hindrance when porting it to new platforms, because assembly languages are machine-specific. Existing high-level languages, such as LISP, provided too much abstraction for implementing an operating system, so a new language was created. This language, C, was a very slightly abstracted form of PDP-11 assembly language. There is almost a 1:1 mapping between C semantics and PDP-11 machine code, making it very easy to compile C for the PDP-11 (the target machine of UNIX at the time).

For a long time, LISP was the archetypal high-level language. It provides a very flexible syntax in which many complex design patterns can be represented in a reusable fashion. LISP is generally categorized as a functional language, but this isn’t entirely accurate (although it does support functional programming).

The definition of a high-level language is a moving target. Languages that were considered high-level when I learned to program are now considered low-level. In general, a programming language provides a midway point between how you think about a program and how a computer executes the program. Languages that are closer to you than to the computer are considered high-level, while others are considered low-level.

Portable Abstractions

When C was created, it was very fast because it was almost trivial to turn C code into equivalent machine code. But this was only a short-term benefit; in the 30 years since C was created, processors have changed a lot, and the task of mapping C code to a modern microprocessor has become increasingly difficult. Since a lot of legacy C code is still around, however, a huge amount of research effort (and money) has been applied to the problem, so we can still get good performance from the language.

As a simple example, consider the vector unit found in most desktop processors (for example, MMX, SSE, or 3DNow! in x86 chips; AltiVec in PowerPC). These are also known as Single Instruction Multiple Data (SIMD) units because they perform the same operation on multiple inputs simultaneously. One instruction might take four integers, add them to four other integers, and provide the result as a set of four more integers. Now imagine some code that could be adapted to take advantage of this capability.

In C, the usual representation of a vector is as an array. Unfortunately, you can’t define operations on arrays, so adding the values in two arrays would be done as a loop in C, with scalar operations in the loop body. A C compiler needs to be able to spot the following facts:

There are no dependencies between loop iterations.

The operations in the loop are capable of being mapped to vector operations.

This is non-trivial. Now consider a slightly higher-level language, FORTRAN (which predates C by more than a decade, incidentally). FORTRAN has a vector datatype, and operations on it. The only difference between FORTRAN vectors and those understood by the CPU is that FORTRAN vectors can be of arbitrary length. All the compiler needs to do is split the vectors into chunks of the correct length.

Virtual Machines and Other Overhead

A lot of criticisms are leveled at Java as a language. In general, the critics fall into two categories:

People complaining that Java is not Smalltalk

People complaining that Java is not C++

A lot of people complain about the overhead of Java bytecode being interpreted. This argument isn’t entirely fair, for two reasons. The first is that Java doesn’t have to be interpreted at all—implementations such as GCJ compile it directly to machine code, and use a runtime library in the same way that languages such as Objective-C work. The other reason is that using a virtual machine doesn’t necessarily translate to overhead.

Research on dynamic recompilers such as Dynamo in 2000 and earlier showed that a just-in-time recompiler, running MIPS code on a MIPS machine, could give a 10–20% speed increase over running the same code on the raw hardware. The reason for this improvement is that a virtual machine could perform some categories of optimization that weren’t available to a static compiler.

Consider the case of function inlining. Although SPARC is something of an exception to this rule, function calls on most architectures are relatively expensive; you have to save the current register set on the stack and then do the same operation in reverse later. For small functions, the time spent performing the function call can be greater than the time spent inside the function. Due to the way C works, it’s impossible for the compiler to inline a function defined in another source file. Both source files are compiled to binary object files independently, and these are linked.

A just-in-time compiler doesn’t have these limitations. It can even inline functions defined in the standard library, because the inlining won’t be preserved between program runs, allowing the library to be updated without rebuilding the application.

To give some idea of how important this one optimization can be, I increased speed in a C program by more than 25% last year simply by moving some commonly used functions into header files where the compiler could inline them.

The Wrong Abstraction

Another thing that’s slow in Java is array accesses. Let’s look at arrays for a second. An array is nothing more than a blob of memory with fixed-size elements. This is actually a very low-level construct. Arrays in Java try to mimic a low-level concept with high-level semantics. This is fairly obviously a bad idea.

Unlike arrays in C, arrays in Java are bounds-checked. Every time you access an element in the array, the index must be checked against the bounds before the access is performed. A slightly higher-level abstraction could allow much more of this bounds-checking to be done at compile time. Some research at IBM uses set-theoretic concepts to do this, and provides a significant performance boost. The problem isn’t that Java is a high-level language; it’s that it chose the wrong abstractions.

Other data structures work significantly better in high-level languages. A dictionary or associative array, for example, can be implemented transparently by a tree or a hash table (or some combination of the two) in a high-level language; the runtime can even decide which, based on the amount and type of data fed to it. This kind of dynamic optimization is simply impossible in a low-level language without building higher-level semantics on top and meta-programming—at which point, you would be better off simply selecting a high-level language and letting someone else do the optimization.

The Process of Programming

The process of programming involves the translation of a mental representation of an algorithm into something a computer can understand. In the early days of programming, all of the conversion process had to be performed in the programmers’ brains. Gradually, tools were developed to help the programmer; intermediate representations allowed some of the drudgery of programming to be done automatically. Simple tasks such as register allocation were taken over by automated tools.

Gradually, more and more work was done by machine. A modern Pentium 4, for example, can have about 150 instructions in-flight at once. To write optimal assembly code for this machine, the developer must track a window of 150 instructions at any point to reduce dependencies. Such work is much easier for a machine than for a human, so it was increasingly delegated to the compiler.

As compilers improved, one fact became clear: the more information you can give to your optimizer, the better the job it can do. When you program in a low-level language, you throw away a lot of the semantics before you get to the compilation stage, making it much harder for the compiler to do its job.