Lately, I’ve been working on a Flash ActionScript 3 decompiler, and I noticed an interesting pattern. Normally, if you work with a piece of well-known software and something goes wrong, it’s your fault. But with Flash it’s not anything like that! If it doesn’t work, then it’s probably a bug in the compiler which was preserved for compatibility. Or the specification is plain wrong. Or it’s a bug in the compiler which no one noticed, or which was attributed to cosmic rays instead.

I’ll give a few examples.

Specification is wrong

The official AVM2 specification is often plain incorrect. Apart from the examples already covered in the semi-official Mozilla-authored errata, there are a few subtle mistakes, like mixing up the sign bit and sign extension: section 4.1 of the spec says that signed integers are stored with sign extension, whereas in reality they’re stored with the 31st bit set when the value is negative.
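To make the difference concrete, here is a minimal Python sketch of the two readings. It assumes the variable-length layout the spec describes for u32 (seven payload bits per byte, the high bit as a continuation flag, at most five bytes). The function names are mine and do not come from any real library, and the "sign-extended" variant is only my literal interpretation of the spec wording.

```python
def read_u32(data, pos=0):
    """Decode a variable-length unsigned integer; returns (value, next_pos)."""
    result = 0
    for i in range(5):                      # at most five bytes
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)  # low seven bits are payload
        if not byte & 0x80:                 # high bit clear: this was the last byte
            break
    return result & 0xFFFFFFFF, pos + i + 1


def read_s32(data, pos=0):
    """Observed behaviour: a u32 reinterpreted as 32-bit two's complement."""
    value, pos = read_u32(data, pos)
    if value & 0x80000000:                  # bit 31 set means negative
        value -= 1 << 32
    return value, pos


def read_s32_sign_extended(data, pos=0):
    """Literal reading of 'sign extension': extend the top payload bit."""
    result = 0
    for i in range(5):
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    if byte & 0x40:                         # top payload bit of the last byte
        result -= 1 << (7 * (i + 1))
    return result, pos + i + 1


print(read_s32(bytes([0x7F])))                # (127, 1)
print(read_s32_sign_extended(bytes([0x7F])))  # (-1, 1)
```

Under the first reading a negative number always needs all five bytes, because only then can bit 31 be set; under the second, the single byte 0x7F would already decode to -1.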

There are some other ones (e.g. the pushliteral opcodes are screwed up in the spec), but they’re not worth explaining.

Compiler generates dangerously invalid code

When working on support for the lookupswitch opcode, I wrote a small snippet to test my code with. Disassembling it yielded strange results; the code was seemingly invalid. I scratched my head over it for half an hour and then just went and tried to execute it. And you know what? It actually was invalid.

    function propel_switch(q : int) : Boolean {
      switch (q) {
      case 1:
        print("hoge");
        break;
      case 2:
        print("fuga");
        break;
      case 3:
        print("piyo");
        break;
      case 5:
        print("bar");
        break;
      default:
        print("baz");
        break;
      }

      return false;
    }

    //                    expected  actual
    propel_switch(0);  // baz       baz
    propel_switch(1);  // hoge      hoge
    propel_switch(2);  // fuga      fuga
    propel_switch(3);  // piyo      <nothing printed>
    propel_switch(4);  // baz       <infinite loop>
    propel_switch(5);  // bar       bar

(The “actual” results are derived from assembler listings. The Tamarin shell refused to execute the code due to verification errors.)

Update: As one Reddit commenter corrects me, recent versions of ASC no longer have this problem.

No optimization ever

The ActionScript compiler does not optimize, period. This produces a lot of weird code and some pieces of modern art.

Consider this switch statement (taken from the abcdump.as utility):

    switch (version) {
    case 46<<16|14:
    case 46<<16|15:
    case 46<<16|16:
        var abc:Abc = new Abc(data)
        abc.dump()
        break
    case 67|87<<8|83<<16|10<<24: // SWC10
    case 67|87<<8|83<<16|9<<24: // SWC9
    case 67|87<<8|83<<16|8<<24: // SWC8
    case 67|87<<8|83<<16|7<<24: // SWC7
    case 67|87<<8|83<<16|6<<24: // SWC6
        var udata:ByteArray = new ByteArray
        udata.endian = "littleEndian"
        data.position = 8
        data.readBytes(udata, 0, data.length - data.position)
        var csize:int = udata.length
        udata.uncompress()
        infoPrint("decompressed swf " + csize + " -> " + udata.length)
        udata.position = 0
        /*var swf:Swf =*/ new Swf(udata)
        break
    case 70|87<<8|83<<16|10<<24: // SWF10
    case 70|87<<8|83<<16|9<<24: // SWF9
    case 70|87<<8|83<<16|8<<24: // SWF8
    case 70|87<<8|83<<16|7<<24: // SWF7
    case 70|87<<8|83<<16|6<<24: // SWF6
    case 70|87<<8|83<<16|5<<24: // SWF5
    case 70|87<<8|83<<16|4<<24: // SWF4
        data.position = 8 // skip header and length
        /*var swf:Swf =*/ new Swf(data)
        break
    default:
        print('unknown format ' + version)
        break
    }

Not only does it generate a piece of modern art in an IR dump, but it also contains a statement so beautifully useless it should be preserved for future generations:

    (ternary (false) (integer 15) (integer 15))

For those unfamiliar with s-expressions and Lisp: not only does this conditional always take the same branch, but its result wouldn’t be any different even if the other branch were taken.
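For a sense of how little work it would take to get rid of such a node, here is a minimal constant-folding sketch in Python over nested tuples. It is an illustration only, not what ASC or my decompiler does. The node names `ternary`, `false` and `integer` come from the dump above; `true`, `lshift` and `bit_or` are hypothetical names I made up so the same pass can fold case labels like `46<<16|14`.

```python
import operator

# Hypothetical node names for binary operators; only `ternary`, `false`
# and `integer` appear in the dump quoted above.
BINARY_OPS = {"lshift": operator.lshift, "bit_or": operator.or_}


def is_integer(node):
    """True for nodes shaped like ("integer", 15)."""
    return isinstance(node, tuple) and node[0] == "integer"


def fold(node):
    """Recursively replace constant subtrees with their value."""
    if not isinstance(node, tuple):
        return node                          # node name or literal payload
    node = tuple(fold(child) for child in node)
    head = node[0]
    if head == "ternary":
        _, cond, if_true, if_false = node
        if cond == ("true",):                # condition known at compile time
            return if_true
        if cond == ("false",):
            return if_false
    elif head in BINARY_OPS:
        _, left, right = node
        if is_integer(left) and is_integer(right):
            return ("integer", BINARY_OPS[head](left[1], right[1]))
    return node


print(fold(("ternary", ("false",), ("integer", 15), ("integer", 15))))
# ('integer', 15)
print(fold(("bit_or", ("lshift", ("integer", 46), ("integer", 16)), ("integer", 14))))
# ('integer', 3014670)
```

The sketch ignores 32-bit overflow and anything with side effects; a real pass would have to handle both.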

For extra horror, the “piece of modern art” above is executed from scratch each time the VM encounters it, including the constant expressions. Any doubt left why Flash is so slow and power-hungry?

Compiler intentionally generates invalid code

As I’ve already shown, ASC contains enough stupid errors (see this similar bug) to accidentally generate invalid code in not-so-rare cases. But it also intentionally generates invalid code in one very frequent case: a finally block.

Let’s compile this function:

    function c() {
      try {
        hoge();
      } finally {
        piyo();
      }
    }

The compiler will emit a shitload of bytecode (including two catch and two throw statements), but the relevant part is here:

    ; This is an exception handler. Stack is empty upon jump to an
    ; exception handler.
    ; Address Opcode     Args   Stack state, comments
      0016    GetLocal0         ; [local0]
      0017    PushScope         ; []
      0018    GetLocal1         ; [local1]
      0019    PushScope         ; []
      0020    NewCatch          ; [catch]
      0022    Dup               ; [catch catch_dup]
      0023    SetLocal2         ; [catch]
      0024    PushScope         ; []
      0025    Throw             ; I want an object to throw! Ouch!
      0026    PopScope
      0027    Kill       2
      0029    PushByte   -1
      0031    Jump       +32    ; Jump to rethrow

As you can see, the opcode at address 0025 is invalid because it tries to pop an object from an empty stack. The virtual machine actually recognizes the finally clause by encountering these invalid opcodes. Think about it a little longer, and you’ll go insane.
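The stack comments in the listing can be reproduced mechanically. Below is a small Python sketch that walks the handler and reports the first opcode that would pop from an empty stack. The (pops, pushes) table covers only the opcodes up to Throw and reflects my understanding of their stack effects; `find_underflow` and `entry_depth` are names I invented for the illustration.

```python
# (pops, pushes) for the opcodes in the handler, up to and including Throw.
STACK_EFFECT = {
    "GetLocal0": (0, 1),
    "GetLocal1": (0, 1),
    "PushScope": (1, 0),
    "NewCatch":  (0, 1),
    "Dup":       (1, 2),
    "SetLocal2": (1, 0),
    "Throw":     (1, 0),
}

# Addresses as printed in the listing (they are decimal: 0019 is followed by 0020).
HANDLER = [
    (16, "GetLocal0"), (17, "PushScope"),
    (18, "GetLocal1"), (19, "PushScope"),
    (20, "NewCatch"),  (22, "Dup"),
    (23, "SetLocal2"), (24, "PushScope"),
    (25, "Throw"),
]


def find_underflow(code, entry_depth=0):
    """Return the address of the first opcode that underflows, or None."""
    depth = entry_depth
    for address, opcode in code:
        pops, pushes = STACK_EFFECT[opcode]
        if depth < pops:
            return address                   # would pop from too shallow a stack
        depth += pushes - pops
    return None


print(find_underflow(HANDLER))               # 25
```

Note that the result hinges on the entry depth stated in the listing's first comment: starting from one value instead of zero, `find_underflow(HANDLER, entry_depth=1)` returns `None`.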

Also, the recommended way to handle control flow after a finally statement is… using the lookupswitch opcode. The PushByte -1 is actually a marker for that lookupswitch trampoline, which makes it jump to a rethrow entry point.
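Here is how I understand the dispatch, as a hedged Python sketch. It assumes lookupswitch pops an index, uses it to pick from a table of offsets, falls back to the default offset when the index is out of range, and measures offsets from the address of the lookupswitch instruction itself. All addresses and offsets below are made up, and treating the default target as the rethrow entry point is my inference from the -1 marker, not something the listing shows.

```python
def lookupswitch_target(base, default_offset, case_offsets, index):
    """Return the address a lookupswitch at `base` jumps to for `index`."""
    if 0 <= index < len(case_offsets):
        return base + case_offsets[index]    # in range: use the table
    return base + default_offset             # out of range: default target


# Hypothetical trampoline at address 100 with two continuation points.
BASE = 100
DEFAULT_OFFSET = -20                         # assumed to lead to the rethrow code
CASE_OFFSETS = [8, 16]

print(lookupswitch_target(BASE, DEFAULT_OFFSET, CASE_OFFSETS, 0))   # 108
print(lookupswitch_target(BASE, DEFAULT_OFFSET, CASE_OFFSETS, -1))  # 80
```

Since -1 can never be a valid table index, pushing it is a cheap way to force the default branch.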