June 06, 2014

nullprogram.com/blog/2014/06/06/

The April 5th, 2014 draft of the ECMA-262 6th Edition specification, a.k.a. the next major version of JavaScript/ECMAScript, contained a subtle, though very significant, change to the semantics of the for loop (13.6.3.3). Loop variables are now fresh bindings for each iteration of the loop: a per-iteration binding. Previously loop variables were established once for the entire loop, a per-loop binding. The purpose is to fix an old gotcha that affects many languages.

If you couldn’t already tell, this is going to be another language lawyer post!

Back Up to C

To try to explain what this all means in plain English, let’s step back a moment and discuss what a for loop really is. I can’t find a source for this, but I’m pretty confident the three-part for loop originated in K&R C.

for (INITIALIZATION; CONDITION; ITERATION) { BODY; }

1. Evaluate INITIALIZATION.
2. Evaluate CONDITION. If zero (false), exit the for.
3. Evaluate BODY.
4. Evaluate ITERATION and go to 2.
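As a rough sketch, those steps can be spelled out with an explicit loop. I’m using JavaScript here since that’s where this post is headed; the function name runThreePartLoop and the visited array are made up for illustration. It also ignores continue, which in a real for loop jumps to ITERATION rather than straight back to CONDITION.

```javascript
// Hypothetical helper: runs the equivalent of
//   for (i = 0; i < count; i++) { visited.push(i); }
// with each of the four steps written out by hand.
function runThreePartLoop(count) {
    const visited = [];
    let i = 0;                      // 1. INITIALIZATION
    while (true) {
        if (!(i < count)) break;    // 2. CONDITION: exit when false
        visited.push(i);            // 3. BODY
        i++;                        // 4. ITERATION, then back to step 2
    }
    return visited;
}
```

Calling runThreePartLoop(3) visits 0, 1, and 2, the same as the for loop it stands in for.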

In the original C, and all the way up to C89, no variable declarations were allowed in the initialization expression. I can understand why: there’s a subtle complication, though it’s harmless in C. We’ll get to that soon. Here’s a typical C89 for loop.

int count = 10;
/* ... */
int i;
for (i = 0; i < count; i++) {
    double foo;
    /* ... */
}

The variable i is established independent of the loop, in the scope outside the for loop, alongside count. This isn’t even a per-loop binding. As far as the language is concerned, it’s just a variable that the loop happens to access and mutate. It’s very assembly-language-like. Because C has block scoping, the body of the for loop is another nested scope. The variable foo is in this scope, reestablished on each iteration of the loop (per-iteration).

As an implementation detail, foo will reside at the same location on the stack each time around the loop. If it’s accessed before being initialized, it will probably hold the value from the previous iteration, but, as far as the language is concerned, this is just a happy, though undefined, coincidence.

C99 Loops

Fast forward to the end of the 20th century. At this point, other languages have allowed variable declarations in the initialization part for years, so it’s time for C to catch up with C99.

int count = 10;
/* ... */
for (int i = 0; i < count; i++) {
    double foo;
    /* ... */
}

Now consider this: in what scope is the variable i? The outer scope as before? The iteration scope with foo? The answer is neither. In order to make this work, a whole new loop scope is established in between: a per-loop binding. This scope holds for the entire duration of the loop.

The variable i is constrained to the for loop without being limited to the iteration scope. This is important because i is what keeps track of the loop’s progress. The semantic equivalent in C89 makes the additional scope explicit with a block.

int count = 10;
/* ... */
{
    int i;
    for (i = 0; i < count; i++) {
        double foo;
        /* ... */
    }
}

This, ladies and gentlemen, is the C-style 3-part for loop. Every language that has this statement, and has block scope, follows these semantics. This included JavaScript up until two months ago, where the draft now gives it its own unique behavior.

JavaScript’s Let

As it exists today in its practical form, little of the above is relevant to JavaScript. JavaScript has no block scope, just function scope. A three-part for-loop doesn’t establish all these scopes, because scopes like these are absent from the language.

An important change coming with 6th edition is the introduction of let declarations. Variables declared with let will have block scope.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let foo;
    // ...
}
console.log(foo); // error
console.log(i);   // error

If these variables had been declared with var, the last two lines wouldn’t be errors (or worse, global references). count, i, and foo would all be in the same function-level scope. Block scope is really great! I look forward to using let exclusively someday.
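To make the difference concrete, here’s a small side-by-side sketch. The function names withVar and withLet are hypothetical, and I’m assuming an engine that implements let with block scope as described above.

```javascript
// With var, count, i, and foo all live in the function scope,
// so they are still readable after the loop has finished.
function withVar() {
    var count = 3;
    for (var i = 0; i < count; i++) {
        var foo = i * 2;
    }
    return [i, foo]; // the loop's final values leak out
}

// With let, i belongs to the loop and foo to the loop body,
// so reading i after the loop is a ReferenceError.
function withLet() {
    let count = 3;
    for (let i = 0; i < count; i++) {
        let foo = i * 2;
    }
    try {
        return i;
    } catch (e) {
        return e.name;
    }
}
```

withVar() hands back the leftover values of i and foo, while withLet() reports the name of the error thrown by the out-of-scope reference.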

The Closure Trap

I mentioned a subtle complication. Most of the time programmers don’t need to consider or even be aware of this middle scope. However, when combined with closures it suddenly becomes an issue. Here’s an example with Perl,

my @closures;
for (my $i = 0; $i < 2; $i++) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 2
$closures[1]();  # => 2

Here’s one with Python. Python lacks a three-part for loop, but its standard for loop has similar semantics.

closures = []
for i in xrange(2):
    closures.append(lambda: i)
closures[0]()  # => 1
closures[1]()  # => 1

And now Ruby.

closures = []
for i in (0..1)
  closures << lambda { i }
end
closures[0].call  # => 1
closures[1].call  # => 1

In all three cases, one closure is created per iteration. Each closure captures the loop variable i. It’s easy to make the mistake of thinking each closure will return a unique value. However, as pointed out above, this is a per-loop variable, existing in a middle scope. The closures all capture the same variable, merely bound to different values at the time of capture. The solution is to establish a new variable in the iteration scope and capture that instead. Below, I’ve established a $value variable for this.

my @closures;
for (my $i = 0; $i < 2; $i++) {
    my $value = $i;
    push(@closures, sub { return $value; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1
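For comparison, here’s a sketch of the same trap and workaround in today’s var-only JavaScript. Since var has no block scope, a var value = i inside the body wouldn’t create a new variable per iteration, so the usual trick is to call a function, which does get a fresh scope each time. The names trapped and fixed are mine, purely for illustration.

```javascript
// Every closure captures the single function-scoped i.
function trapped() {
    var closures = [];
    for (var i = 0; i < 2; i++) {
        closures.push(function () { return i; });
    }
    return closures.map(function (f) { return f(); });
}

// An immediately-invoked function gives each iteration its own value.
function fixed() {
    var closures = [];
    for (var i = 0; i < 2; i++) {
        (function (value) {
            closures.push(function () { return value; });
        })(i);
    }
    return closures.map(function (f) { return f(); });
}
```

trapped() yields 2 for both closures, just like the first Perl example, while fixed() yields 0 and 1.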

This is something that newbies easily get tripped up on. Because they’re still trying to wrap their heads around the closure concept, this looks like some crazy bug in the interpreter/compiler. I can understand why the ECMA-262 draft was changed to accommodate this situation.

The JavaScript Workaround

The language in the new draft has two items called perIterationBindings and CreatePerIterationEnvironment (in case you’re searching for the relevant part of the spec). Like the $value example above, for loops in JavaScript with “lexical” (i.e. let) loop bindings will implicitly mask the loop variable with a variable of the same name in the iteration scope.

let closures = [];
for (let i = 0; i < 2; i++) {
    closures.push(function () { return i; });
}

/* Before the change: */
closures[0](); // => 2
closures[1](); // => 2

/* After the change: */
closures[0](); // => 0
closures[1](); // => 1

Note: If you try to run this yourself, be aware that at the time of this writing, the only JavaScript implementation I could find that updated to the latest draft was Traceur. You’ll probably see the “before” behavior for now.

You can’t see it (I said it’s implicit!), but under an updated JavaScript implementation there are actually two i variables here. The closures capture the innermost i, the per-iteration version of i. Let’s go back to the original example, JavaScript-style.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let foo;
    // ...
}

Here’s what the scope looks like for the latest draft. Notice the second i in the iteration scope. The inner i is initially assigned to the value of the outer i.

We could emulate this in an older edition. Imagine writing a macro to do this.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i; // (possible name collision)
    {
        let i = __i;
        let foo;
        // ...
    }
}

I have to use __i to smuggle the value across scopes without having i reference itself. Unlike Lisp’s let, the assignment value for var and let is evaluated in the nested scope, not the outer scope.
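Here’s a small sketch of why the direct version doesn’t work. I’m assuming an engine where reading a let variable before its declaration has finished throws a ReferenceError (the “temporal dead zone”); the function names selfReference and smuggled are hypothetical.

```javascript
// The initializer is evaluated in the inner scope, where the new i
// already shadows the outer one but has no value yet.
function selfReference() {
    let i = 1;
    try {
        {
            let i = i + 1; // refers to the inner, uninitialized i
            return i;
        }
    } catch (e) {
        return e.name;
    }
}

// Going through a differently named variable avoids the self-reference.
function smuggled() {
    let i = 1;
    let __i = i;
    {
        let i = __i + 1;
        return i;
    }
}
```

selfReference() reports the error instead of a number, while smuggled() gets the outer value across and returns 2.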

Each iteration gets its own i. But what happens when the loop modifies i? Simple, it’s copied back out at the end of the body.

let count = 10;
// ...
for (let i = 0; i < count; i++) {
    let __i = i;
    {
        let i = __i;
        let foo;
        // ...
        __i = i;
    }
    i = __i;
}

Now all the expected for semantics work — the body can also update the loop variable — but we still get the closure-friendly per-iteration variables.
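As a sanity check on that emulation, here’s a sketch that runs a loop whose body modifies i, once natively and once by hand. It assumes an engine that implements the per-iteration semantics from the draft. To keep the hand-written version honest on such an engine, I’ve replaced the for with a while and a single outer variable; the names nativeLoop, emulatedLoop, and outer are all mine.

```javascript
// Relies on the engine's built-in per-iteration bindings.
function nativeLoop() {
    const closures = [];
    for (let i = 0; i < 4; i++) {
        closures.push(function () { return i; });
        if (i === 1) i++; // body modifies the loop variable
    }
    return closures.map(function (f) { return f(); });
}

// Same loop with the per-loop variable and the copies made explicit.
function emulatedLoop() {
    const closures = [];
    let outer = 0;                 // the per-loop binding
    while (outer < 4) {
        let i = outer;             // fresh per-iteration binding
        closures.push(function () { return i; });
        if (i === 1) i++;          // body modifies its own i
        outer = i;                 // copy back out at the end of the body
        outer++;                   // ITERATION runs on the per-loop binding
    }
    return closures.map(function (f) { return f(); });
}
```

Both functions should produce the same list: each closure sees its own iteration’s i, including the body’s modification, but not the increment that follows.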

Conclusion

I’m still not sure if I really like this change. It’s a clean fix, but the gotcha hasn’t been eliminated. Instead it’s been inverted. Someday someone will have the unusual circumstance of wanting to capture the loop variable itself, and he will run into some surprising behavior. Because the semantics are a lot more complicated, it’s hard to reason about what’s not working unless you already know JavaScript has magical for loops.
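Here’s a contrived sketch of that inverted gotcha, assuming an engine with the new per-iteration semantics. A closure created on the first iteration tries to watch the loop’s progress. The function names and the report variable are hypothetical.

```javascript
// The closure captures the first iteration's private i, which is never
// incremented, so it can't observe the loop's progress.
function capturePerIteration() {
    let report;
    for (let i = 0; i < 3; i++) {
        if (i === 0) report = function () { return i; };
    }
    return report();
}

// Declaring i outside the loop restores a single shared binding.
function capturePerLoop() {
    let report;
    let i;
    for (i = 0; i < 3; i++) {
        if (i === 0) report = function () { return i; };
    }
    return report();
}
```

capturePerIteration() returns 0, while capturePerLoop() returns 3, the variable’s value once the loop has finished. Hoisting the declaration out of the for is the escape hatch when the old behavior is what you actually want.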

Perl and C# each also gained per-iteration bindings in their history, but rather than complicate or change their standard for loops, they instead introduced it as a new syntactic construction: foreach.

my @closures;
foreach my $i (0, 1) {
    push(@closures, sub { return $i; });
}
$closures[0]();  # => 0
$closures[1]();  # => 1

In this case, per-iteration bindings definitely make sense. The variable $i is established and bound to each value in turn. As far as control flow goes, it’s very functional. The binding is never actually mutated.