Imagine you have a hello world program in C like this:

You can compile it to LLVM bitcode, and run it on GraalVM.

Note: All examples in this blog post assume you have a copy of GraalVM EE (19.3.0 or later) downloaded, with the $GRAALVM_HOME environment variable pointing to where it is extracted. You also need to install the llvm-toolchain component:

$ $GRAALVM_HOME/bin/gu install llvm-toolchain

To compile our example to LLVM bitcode, we use the clang that ships with the llvm-toolchain component (the lli --print-toolchain-path command prints the path to the toolchain):

$ TOOL_PATH=$($GRAALVM_HOME/bin/lli --print-toolchain-path)

$ $TOOL_PATH/clang hello.c -o hello

$ $GRAALVM_HOME/bin/lli hello

Hello, World!

But of course, C being an unsafe language, you can also do things like this:

We compile this code:

$ $TOOL_PATH/clang hello-bug.c -o hello-bug

hello-bug.c:4:39: warning: format specifies type 'int *' but the argument has type 'int' [-Wformat]

  printf("number of arguments: %n\n", argc);

                               ~~     ^~~~

1 warning generated.

Let’s pretend we didn’t see the compiler warning (or maybe it just got lost in pages of other build output, or it was even turned off), and run the code (without GraalVM for now):

$ ./hello-bug

Segmentation fault (core dumped)

Whoops. What happened?

The problem is the mistyped format specifier: %n instead of %d. The %n specifier does not print anything. Instead, it stores the number of characters written so far into the int that the corresponding pointer argument points to. We're passing an int instead of a pointer. An int is not a valid pointer, but printf tries to write through it anyway, and the program crashes. Since C is an unsafe language, the compiler will happily generate code that passes an int where a pointer should go. At runtime, values are not typed, and a pointer is just another number, so the code inside printf has no way to detect the problem.

In this particular case, the compiler gave us a warning. However, this is only because printf is a well-known library function and the compiler knows which argument types each format specifier expects. For your own variadic functions, the compiler generally has no such knowledge, and you're on your own.

Now let’s try the same program with GraalVM.

$ $GRAALVM_HOME/bin/lli hello-bug

Segmentation fault (core dumped)

It still crashes the VM. Why is that?

When GraalVM runs LLVM bitcode, the bitcode is executed as-is. Memory is still allocated on the native heap and external functions are called as they are. If the function is not available as LLVM bitcode, e.g. printf from the system libc, it is just called through the native interface.

GraalVM has some mechanisms for polyglot interop, so if a pointer refers to a managed object (e.g. a JavaScript object), the access can’t crash the VM. However, a native pointer is still just a number, which refers to some memory location on the native heap. GraalVM can’t tell in a reliable way whether a pointer to native memory is valid. And, of course, once you start calling real native code outside of GraalVM, like the printf function in libc, all bets are off.

Because of these issues, GraalVM can’t really make any additional security guarantees in this case.