TL;DR

We look at the basic structure of a JavaScript object in memory and learn about the butterfly.

watch on YouTube

Introduction

Let's learn more about JavaScriptCore internals by looking at the structure of JSObjects in memory. This is well explained in saelo's Phrack paper Attacking JavaScript Engines, but I hope I can add to it by playing with the debugger from the last post. In the last part we already had a quick peek into the memory of an array [1, 2, 3, 4] and found those numbers, but we also saw that the upper bits of each value were set to 0xffff. In this post we will also learn what's up with that.

The Sources for JSValue

A very important class that handles many kinds of values in JavaScript is JSValue. We can find the class definition in the file JSCJSValue.h, where we can also see that it handles many different types, such as integers, doubles and booleans.

//[...]
bool isInt32() const;
bool isUInt32() const;
bool isDouble() const;
bool isTrue() const;
bool isFalse() const;
int32_t asInt32() const;
uint32_t asUInt32() const;
int64_t asAnyInt() const;
uint32_t asUInt32AsAnyInt() const;
int32_t asInt32AsAnyInt() const;
double asDouble() const;
bool asBoolean() const;
double asNumber() const;
//[...]

This class also has a preprocessor switch that selects different implementations for 32-bit and 64-bit platforms, but because virtually everything is 64-bit nowadays, we focus on that. This section also contains a large comment explaining what a JSValue is. Look at it carefully, because we will revisit this text many times.

//[...]
#elif USE(JSVALUE64)
    /*
     * On 64-bit platforms USE(JSVALUE64) should be defined, and we use a NaN-encoded
     * form for immediates.
     *
     * The encoding makes use of unused NaN space in the IEEE754 representation. Any value
     * with the top 13 bits set represents a QNaN (with the sign bit set). QNaN values
     * can encode a 51-bit payload. Hardware produced and C-library payloads typically
     * have a payload of zero. We assume that non-zero payloads are available to encode
     * pointer and integer values. Since any 64-bit bit pattern where the top 15 bits are
     * all set represents a NaN with a non-zero payload, we can use this space in the NaN
     * ranges to encode other values (however there are also other ranges of NaN space that
     * could have been selected).
     *
     * This range of NaN space is represented by 64-bit numbers begining with the 16-bit
     * hex patterns 0xFFFE and 0xFFFF - we rely on the fact that no valid double-precision
     * numbers will fall in these ranges.
     *
     * The top 16-bits denote the type of the encoded JSValue:
     *
     *     Pointer {  0000:PPPP:PPPP:PPPP
     *              / 0001:****:****:****
     *     Double  {         ...
     *              \ FFFE:****:****:****
     *     Integer {  FFFF:0000:IIII:IIII
     *
     *
     * The scheme we have implemented encodes double precision values by performing a
     * 64-bit integer addition of the value 2^48 to the number. After this manipulation
     * no encoded double-precision value will begin with the pattern 0x0000 or 0xFFFF.
     * Values must be decoded by reversing this operation before subsequent floating point
     * operations may be peformed.
     *
     * 32-bit signed integers are marked with the 16-bit tag 0xFFFF.
     *
     * The tag 0x0000 denotes a pointer, or another form of tagged immediate. Boolean,
     * null and undefined values are represented by specific, invalid pointer values:
     *
     *     False:     0x06
     *     True:      0x07
     *     Undefined: 0x0a
     *     Null:      0x02
     *
     * These values have the following properties:
     * - Bit 1 (TagBitTypeOther) is set for all four values, allowing real pointers to be
     *   quickly distinguished from all immediate values, including these invalid pointers.
     * - With bit 3 is masked out (TagBitUndefined) Undefined and Null share the
     *   same value, allowing null & undefined to be quickly detected.
     *
     * No valid JSValue will have the bit pattern 0x0, this is used to represent array
     * holes, and as a C++ 'no value' result (e.g. JSValue() has an internal value of 0).
     */

If you have read this comment carefully, you might have noticed the encoding table for JSValues, and that it explains the 0xffff we saw in the array. A JSValue can hold different types, and the upper 16 bits determine which one it is. Integers are encoded by setting the upper 32 bits to 0xffff0000 and storing the value in the lower 32 bits. This is why JavaScriptCore keeps integers as 32-bit values even on a 64-bit system; numbers that don't fit are stored as doubles instead. If the upper 16 bits are 0x0000 it's a pointer, and anything in between is a double.

Pointer {  0000:PPPP:PPPP:PPPP
         / 0001:****:****:****
Double  {         ...
         \ FFFE:****:****:****
Integer {  FFFF:0000:IIII:IIII
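To make the table concrete, here is a minimal JavaScript sketch that boxes a 32-bit integer the way the comment describes and sorts a raw 64-bit word by its top 16 bits. It assumes the layout quoted above (integer tag 0xFFFF); other JavaScriptCore versions may use different tags. The helper names encodeInt32, decodeInt32 and classify are our own, not JavaScriptCore functions. BigInt is used because a plain JavaScript number cannot hold all 64 bits exactly.

```javascript
// Sketch of the JSVALUE64 integer tagging described above (assumes the 0xFFFF tag).
// Helper names are hypothetical, not JavaScriptCore functions.
const INT32_TAG = 0xffff000000000000n;

// Box a 32-bit signed integer: tag in the top 16 bits, the int in the low 32 bits.
// Assumes i is already a valid int32; larger numbers would be stored as doubles.
function encodeInt32(i) {
  return INT32_TAG | BigInt.asUintN(32, BigInt(i));
}

// Unbox: reinterpret the low 32 bits as a signed integer.
function decodeInt32(bits) {
  return Number(BigInt.asIntN(32, bits));
}

// Classify a raw 64-bit word by its top 16 bits, following the table.
function classify(bits) {
  const top = bits >> 48n;
  if (top === 0xffffn) return "int32";
  if (top === 0x0000n) return "pointer or immediate (false/true/undefined/null)";
  return "double";
}

console.log(encodeInt32(0x41424344).toString(16)); // prints ffff000041424344
console.log(classify(encodeInt32(-1)));             // prints int32
```

The last element of the test array in the next section, 0x41424344, is exactly such an int32, so its boxed form is what you would look for in memory.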

In addition, a few constants that you might be familiar with from JavaScript are encoded as special JSValues. For example, false is 0x06 and null is 0x02.

False:     0x06
True:      0x07
Undefined: 0x0a
Null:      0x02
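These constants can be recognized with a few bit tests. The sketch below uses the values and the two bit names from the comment (TagBitTypeOther, TagBitUndefined); the boolean test is our own observation that 0x06 and 0x07 differ only in bit 0, and the helper names are hypothetical. The tests only make sense for words whose top 16 bits are 0x0000.

```javascript
// Immediate constants from the JSVALUE64 comment.
const VALUE_FALSE = 0x06n;
const VALUE_TRUE = 0x07n;
const VALUE_UNDEFINED = 0x0an;
const VALUE_NULL = 0x02n;

// Bit 1 (TagBitTypeOther) and bit 3 (TagBitUndefined), as named in the comment.
const TAG_BIT_TYPE_OTHER = 0x2n;
const TAG_BIT_UNDEFINED = 0x8n;

// All four constants have bit 1 set, which aligned object pointers do not.
function isOtherImmediate(bits) {
  return (bits & TAG_BIT_TYPE_OTHER) !== 0n;
}

// Masking out bit 3 maps undefined (0x0a) onto null (0x02).
function isUndefinedOrNull(bits) {
  return (bits & ~TAG_BIT_UNDEFINED) === VALUE_NULL;
}

// false and true differ only in bit 0, so masking it out leaves 0x06 for both.
function isBoolean(bits) {
  return (bits & ~1n) === VALUE_FALSE;
}

for (const v of [VALUE_FALSE, VALUE_TRUE, VALUE_UNDEFINED, VALUE_NULL]) {
  console.log(v, isOtherImmediate(v), isUndefinedOrNull(v), isBoolean(v));
}
```

Checks like these are cheap single AND-and-compare operations, which is the point the comment makes about choosing these particular bit patterns.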

But let's look at this in memory.

Playing with the Debugger

As a first test, we can create a weird array that contains values of various types and then look at it in memory. Right away we notice something odd: the first element should be an integer, but in memory it turns out to be a float. What's going on?

[0x1337,13.37,false,undefined,true,null,{},0x41424344]

Let's take it slowly and build up the array element by element. By doing so, we can observe that the internal type of the whole array keeps changing, and that the first element is sometimes converted to a float.

This shows us that JavaScriptCore does quite a lot of magic in the background. But let's take a closer look.
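To recognize the first element once it has become a float, it helps to compute its possible bit patterns by hand. This sketch assumes the scheme from the JSValue comment (a double is boxed by a 64-bit integer addition of 2^48) and prints three candidates for 0x1337: the raw IEEE 754 bits, the boxed double and the boxed int32. Which one you actually see in lldb depends on how JavaScriptCore is storing the array at that moment. The helper names are our own.

```javascript
const DOUBLE_ENCODE_OFFSET = 1n << 48n;

// Raw IEEE 754 bits of a double as an unsigned 64-bit BigInt.
function doubleToBits(d) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, d);
  return view.getBigUint64(0);
}

function bitsToDouble(bits) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

// Boxing: 64-bit integer addition of 2^48, wrapping around at 2^64.
function encodeDouble(d) {
  return BigInt.asUintN(64, doubleToBits(d) + DOUBLE_ENCODE_OFFSET);
}

// Unboxing reverses the addition before the value is used as a double again.
function decodeDouble(bits) {
  return bitsToDouble(BigInt.asUintN(64, bits - DOUBLE_ENCODE_OFFSET));
}

const hex = (b) => "0x" + b.toString(16).padStart(16, "0");
console.log("raw double  ", hex(doubleToBits(0x1337)));          // 0x40b3370000000000
console.log("boxed double", hex(encodeDouble(0x1337)));          // 0x40b4370000000000
console.log("boxed int32 ", hex(0xffff000000000000n | 0x1337n)); // 0xffff000000001337
```

Comparing the first two lines shows the effect of the 2^48 offset: only the top 16 bits change, from 0x40b3 to 0x40b4, which keeps boxed doubles out of the 0x0000 and 0xFFFF ranges used for pointers and integers.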

Identifying JSValues in Memory

By comparing the values of the array in memory with the information about JSValues, we can easily identify the constants like undefined or false.

JSValue constants: false, undefined, true, null

The empty JavaScript object we have created shows up as a pointer. So this is an address and the actual object is stored somewhere else.

JSValue: Object/Pointer

And of course we can also identify integers again by their 0xffff0000 prefix.

JSValue: Integer

The Butterfly

When looking at the output of the describe() function, we also see a so-called butterfly address. From playing around with the memory, we already know that it points to the elements of an array. But why is it called a butterfly?

The reason becomes clear when we look at where this address points. Usually a pointer points to the start of a structure, but this one points into the middle. To the right of the pointer are the array elements, and to the left of it are the array length and the values of other object properties. The data spreads out to both sides of the pointer, like the wings of a butterfly.
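The layout can be summarized as byte offsets relative to the butterfly pointer. The sketch below is a toy model, not JavaScriptCore code: it assumes 8-byte slots, an 8-byte length header directly before element 0, and named property values stored further to the left. That matches the description above but glosses over details, such as properties that are stored inline in the object itself. The function names are hypothetical.

```javascript
// Toy model of a butterfly; offsets are in bytes relative to the butterfly pointer.
const SLOT = 8;
const LENGTH_OFFSET = -SLOT;                        // length header, just left of element 0
const elementOffset = (i) => i * SLOT;              // elements grow to the right
const propertyOffset = (j) => -2 * SLOT - j * SLOT; // property values grow to the left

// List the slots from the lowest address to the highest.
function butterflyMap(nElements, nProperties) {
  const rows = [];
  for (let j = nProperties - 1; j >= 0; j--) rows.push([propertyOffset(j), `property ${j}`]);
  rows.push([LENGTH_OFFSET, "length"]);
  for (let i = 0; i < nElements; i++) rows.push([elementOffset(i), `element ${i}`]);
  return rows.map(([off, what]) => `butterfly${off < 0 ? "-" : "+"}${Math.abs(off)}: ${what}`);
}

console.log(butterflyMap(4, 2).join("\n"));
```

Reading memory at negative offsets from the butterfly address in lldb is a quick way to check this model against a real object.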

The Structure ID

Besides the butterfly, we have also seen other values in memory that are part of the object. The first 8 bytes contain flags that describe some internal properties, as well as the very important StructureID. This number identifies the structure of this particular object, i.e. its shape: which properties it has and where they are stored.

StructureID in Memory
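To pull the StructureID out of a dumped header programmatically, a small decoder helps. This is a sketch under an assumption this post does not verify: that the first 8 bytes are, in order, a 32-bit StructureID followed by one byte each for indexing type, cell type, type-info flags and cell state, read as one little-endian 64-bit word. Check the JSCell class of your own build before relying on this field order. The function name and the header value below are made up for illustration.

```javascript
// Hypothetical decoder for the first 8 bytes of a JSCell, given as one 64-bit word
// (little-endian, as shown by a 64-bit memory read). Field order is an assumption.
function decodeCellHeader(word) {
  const byteAt = (n) => Number((word >> BigInt(8 * n)) & 0xffn);
  return {
    structureID: Number(word & 0xffffffffn), // low 4 bytes
    indexingType: byteAt(4),
    cellType: byteAt(5),
    typeInfoFlags: byteAt(6),
    cellState: byteAt(7),
  };
}

// Made-up example value, not taken from a real memory dump.
const header = decodeCellHeader(0x0108230700000060n);
console.log(header.structureID.toString(16)); // prints 60
```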

When playing around with objects and changing various things, we can see when the StructureID changes and when it doesn't. For example, when we add a property to an object with a.x = 1, the StructureID changes. If no object with such a structure exists yet, a new StructureID is created, and we can see that it is simply incremented.
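The observed behaviour can be mimicked with a toy model. The sketch below is not how JavaScriptCore is implemented. It only illustrates the idea, assuming that each structure remembers which structure you get by adding a given property, and that a brand-new structure simply takes the next free ID. ToyStructureTable and its methods are hypothetical names.

```javascript
// Toy model of structure transitions, not JavaScriptCore's implementation.
class ToyStructureTable {
  constructor() {
    this.nextID = 1;
    this.transitions = new Map(); // "fromID:property" -> resulting structure ID
    this.emptyID = this.nextID++; // shared structure of every fresh {}
  }

  // Adding a property either reuses an existing transition or creates a new ID.
  addProperty(fromID, name) {
    const key = `${fromID}:${name}`;
    if (!this.transitions.has(key)) this.transitions.set(key, this.nextID++);
    return this.transitions.get(key);
  }
}

const table = new ToyStructureTable();
let a = table.emptyID;          // a = {}
a = table.addProperty(a, "x");  // a.x = 1 creates a new structure
let b = table.emptyID;          // b = {}
b = table.addProperty(b, "x");  // b.x = 2 reuses a's structure
const c = table.addProperty(table.emptyID, "y"); // {} then .y: yet another structure
console.log(a, b, c); // prints 2 2 3
```

Objects that gain the same properties in the same order end up sharing a structure, which is why building a second object the same way does not bump the ID again.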

Final Words

In the video you can find some more examples playing around with the objects and examining them in memory. But I highly encourage you to use the debugger and play around with this on your own. By doing so you should be able to get comfortable with the internal structure of the basic JavaScript objects and types, because later we have to build on that.

And one last tip: you can also use lldb's print command to make use of the symbol information in the debug build. For example, the screenshot below shows that the StructureID and the other flags belong to the so-called JSCell header, and that the butterfly is part of the JSObject class.

References