by Sean M. Burke

A JavaScriptish companion to Mark-Jason Dominus's Higher-Order Perl http://hop.perl.plover.com/

~ Under Construction, Obviously ~

For a general review of JavaScript, I think that the best work available is the first third or so of the book Javascript: The Definitive Guide http://www.oreilly.com/catalog/jscript4/. (The rest of the book is a detailed function reference.)

And you really should get a copy of The JavaScript Anthology http://www.powells.com/biblio/2-0975240269-0. It's a mixed bag, but in a good way -- sort of like The Perl Cookbook.

Also, very handy JavaScript quick-reference cards are available from Visibone: http://visibone.com/ And I say this as someone who's never bothered with quick-reference cards until now.

Perl and JavaScript have rather different syntaxes, but surprisingly similar semantics. They vary most notably in their OO systems, but a discussion of that is outside the scope of this document. In the following subsections, I'll note the greatest differences between JavaScript and Perl at the functional level.

JavaScript has only two levels of scope -- there's no block scope.

However, you can fake it with:

(function () { var x = ...; })();

Also, there's no JavaScript equivalent of Perl's do { ... } . But you can fake that too with:

v = (function () { ... return expr; })();

Perl uses the last value in a function as an implicit return value, as in:

sub square { $_[0] ** 2 }

However, JavaScript requires an explicit return ...; statement. The above would have to be this in JavaScript:

sub square (x) { return Math.pow(x,2); }

because if you did just this:

sub square (x) { Math.pow(x,2) }

then the return value, for lack of an actual return...; statament, would be undefined (i.e., the JavaScript equivalent of Perl's undef ).

Perl functions can return no values, one value, or several; but all JavaScript functions return exactly one. However, that one value can be an array, or any other aggregate type. In other words, there is this rough equivalency:

Perl: return($junk,@stuff,\%thing); JavaScript: return [junk, stuff, thing];

Note that return(1,2,3): is valid JavaScript -- but it's not what you think. The comma operator there is the "use only the last value" operator, just as if you had done this in Perl:

Perl: return scalar(1,2,3);

JavaScript also lacks some of the handier list-friendly constructs like Perl's list-assignment:

($x,$y,@z) = stuff();

Also, one can't normally use negative indexes ( $x[-2] ) to access arrays counting from the end. (But the splice method is exceptional in understanding negative indexes for that purpose.)

Moreover, JavaScript's use of arrays where Perl uses lists can lead to other minor caveats. For example, in Perl, there are these two ways of moving the first element of @things to the end of @wad :

push @wad, shift(@things ); push @wad, splice(@things, 0, 1);

You can translate those to JavaScript:

wad.push( things.shift() ); wad.push( things.splice(0,1) );

...but they'll end up doing two different things. The shift one, like both Perl examples, makes the first item of things into the last item of wad . But JavaScript's somearray.splice() necessarily returns not a list, but an array, and so the last item of wad isn't the former things[0] , but instead [things[0]] .

Since JavaScript doesn't allow functions to return a list, there is no concept akin to Perl's scalar calling context versus list calling context.

(For a worst-case-scenario of language designers considering how to return multiple values from a function, see section 3.5.2 of Steele & Gabriel's excellent article Evolution of Lisp, http://interglacial.com/~sburke/pub/Evolution-of-Lisp.pdf.)

The full rules for return value semantics and syntax in JavaScript seem to be as follows. (I say "seem" because in some cases I've had to infer based on experimentation instead of actual reference to the spec.)

If the function doesn't end in a return statement, then it acts as if it ended in a "return;" statement. I'll call that an implied return statement.

return statement. A "return;" statement (whether explicit or implied) is equivalent to "return undefined;" in a function called normally (i.e., not as a constructor).

A "return;" statement (whether explicit or implied) is equivalent to "return this;" in a function called as a constructor.

If a function contains "return;" statements (whether explicit or implied) and "return somevalue;" statements, then a JavaScript interpreter working in strict mode will issue a warning like "function funcname does not always return a value." For example: function flipsign (x) { if( isNaN(x) ) return; return x * -1; } This will issue a warning because the function has a "return;" statement (an explicit one, in fact) and a "return somevalue;" statement. In this case, the author probably does mean to return undefined, as the "return;" does. But to defeat the warning, it has to be changed to the clearer "return undefined;".

Here's how I turn the above rules into practice:

Every JavaScript function should conform to one of the following patterns:

Pattern 1: Having no return statements at all. Example: function wobble_all () { for(var i = 0; i < weebles.length; i++) { weebles[i].wobble(); } } Pattern 2: Having only "return;" statements, including one at the end. (But never have a "return somevalue;" statement.) Example: function wobble_all () { if(!(weebles && weebles.length)) return; for(var i = 0; i < weebles.length; i++) { weebles[i].wobble(); } return; } Pattern 3: Having only "return somevalue;" statements, including one at the end. (But never have a "return;" statement.) Example: function all_weight () { if(!(weebles && weebles.length)) return 0; var w = 0; for(var i = 0; i < weebles.length; i++) { w += weebles[i].weight; } return w; }

Making each function follow one of these three patterns will silence any warnings, as well as keeping you clear of some confusing corner-cases in the semantics and syntax of return values.

(Incidentally, my pattern for constructors is to have no return statement at all. That's equivalent to ending in "return this;")

On these points, JavaScript's syntax is a bit stricter than Perl's:

In Perl: JavaScript Workaround: unless($x) ... if(!x)... => JavaScript has no 'unless' or 'until' $x = funcname $y; x = funcname(y); => You can't leave the parens off of a function call; but note that "throw" and "return" aren't functions, so their parens are optional. dostuff() || return 123; if(! dostuff() ) return "Ugh"; dostuff() || die "Ugh"; if(! dostuff() ) throw "Ugh"; => "throw" is a statement, and so can't be a component of an expression, like the operands of "x || y" are. Ditto for "return". $x = "Your name is $name." x = "Your name is " + name + "."; => JavaScript doublequoted string-literals don't have variable interpolation. But see the format() function I write in section 1.3.

When I talk about JavaScript, I generally mean JavaScript as embedded in a sane modern browser (i.e., Firefox). However, JavaScript implementations can be embedded elsewhere. Notably for Linux users, KDE applications can embed it, and you can write simple little scriptlets in JavaScript for KDE. See http://xmelegance.org/kjsembed/ for details.

MJD's Perl code for decimal to binary conversion translates directly, if oddly verbosely:

function binary (n) { // optional: n = Math.floor(n); // optional: if(n<0) throw "Usage: binary(nonnegative)"; if(n == 0 || n == 1) return n.toString(); var k = Math.floor(n/2); var b = n % 2; var E = binary(k); return E.toString() + b.toString(); } equate( binary(123) , "1111011");

Notes:

* We have all those .toString()s because "+" in JavaScript concatenates strings but adds numbers. This, in the case of JavaScript, is now universally recognized as a bad language-design decision. Mercifully, it's easy to work around.

* JavaScript has no single function corresponding exactly to Perl's int(). Moreover, if you try using int(), there will be great confusion, as "int" is a reserved word in JavaScript that currently does nothing. JavaScript has a pretty large set of reserved words, only some of which do anything:

abstract boolean break byte case catch char class const continue debugger default delete do double else enum export extends false final finally float for function goto if implements import in instanceof int interface long native new null package private protected public return short static super switch synchronized this throw throws transient true try typeof var void volatile while with

That's as of the JS1.5 spec, at least. (Appendix A: "The reserved words in this list cannot be used as JavaScript variables, functions, methods, or object names.")

In any case, I use Math.floor() instead of an int() here. But if ever you really need a work-alike for Perl's int(), then this should work fine, covering all bases, returning the integer version of the input wherever possible, otherwise returning undefined.

function _int (s) { return( (isNaN(s) || !isFinite(s)) ? undefined : (s == 0) ? 0 : (s > 0) ? Math.floor(s) : (s < 0) ? Math.ceil( s) : undefined ); };

This translation is direct and simple:

function factorial (n) { if(n==0) return 1; return factorial(n-1) * n; } equate( factorial(1), 1 ); equate( factorial(7), 5040 );

And the "1.2.1: Why Private Variables are Important" is just as relevent for JavaScript as for Perl.

As an aside, I'll note that of the various languages I've run into, the one where factorial is most interestingly computed is the stack-based language PostScript. To compute the factorial of the integer N is just 1 N -1 2 { mul } for .

That means:

1 % put a 1 on the stack N -1 2 { % Loop from N to 2, decrementing by -1 each time. % The side-effect of "for" is that is it puts the current % counter value onto the stack before executing the block. mul } for % Once we get here, the computed factorial is the item on the top % of the stack.

So, if N is 5:

1 % Stack now: 1 % As we loop from 5 to 1 by -1's: 5 % "for" pushes that. % Stack now: 1 5 mul % replaces topmost two items with their product % Stack now: 5 4 % "for" pushes that % Stack now: 5 4 mul % Stack now: 20 3 % "for" pushes that % Stack now: 20 3 mul % Stack now: 60 2 % "for" pushes that % Stack now: 60 2 mul % Stack now: 120 % End of looping. We return 120.

Phrasing this as a function that does type checking and whatnot gives us this:

/factorial { % takes one number, returns one integer dup 2 lt { pop 1 } { 1 exch cvi -1 2 { mul } for } ifelse } bind def

The hanoi program, like most Perl programs, has lines like this:

print "Move disk #1 from $start to $end.

";

There are two problems with translating this to JavaScript: variables don't interpolate in strings (so "blah $start blah" doesn't drop in the value of $start); and JavaScript doesn't provide a print function -- and if you forget and try it, you'll normally get the window.print() method, which brings up the browser's Print dialog!

But we'll just do defined a print() function of our own, and a format() method, to drop in values at indicated points, so that we can translate this

print "Move disk #1 from $start to $end.

";

as:

print(format("Move disk #1 from \f to \f.

", start, end));

Once we have those functions (to be shown later), we can translate the hanoi function directly as just this:

/* hanoi(N, start, end, extra) Solve Tower of Hanoi problem for a tower of N disks, of which the largest is disk #N. Move the entire tower from peg `start' to peg `end', using peg `extra' as a work space */ function hanoi (n, start, end, extra) { if (n == 1) { print(format("Move disk #1 from \f to \f.

", start, end)); //Step 1 } else { hanoi(n-1, start, extra, end); //Step 2 print(format("Move disk #\f from \f to \f.

", n, start, end)); //Step 3 hanoi(n-1, extra, end, start); //Step 4 } }

Now, as for a print() function -- we could do just this:

function print (s) { alert(s) }

However, it's often handy to call print() on a list of several things, so we'll need to use the arguments object:

function print () { alert( arguments.join('') ) }

However, the arguments object isn't a real Array, and so it can't be relied on to provide a join method that you'd expect from a proper Array object, nor a concat method that you could use to easily get an Array copy of it. But we do know (from the JavaScript specs!) that arguments does provide a length method, as well as understanding arguments[i] indexing. So we'll break down and use a for loop:

function print () { var Screen = ""; for(var i = 0; i < arguments.length; i++) { Screen += (arguments[i] == undefined) ? "" : arguments[i].toString() } alert(Screen); return true; }

And this is quite enough for a basic print function. Our console.html (at http://interglacial.com/hoj/console.html) provides a handier print function -- it just appends stuff to the variable Screen , which the console framework then copies into the current document later.

Now it's just a matter of creating a format function, which we will make do roughly the same work as a C/Perl sprintf function -- namely, the work of plugging values into a template. However, I think there's little need for the variety of formats that C/Perl sprintf provides; nor do I think "%stuff" is a particularly good escape sequence format. Instead, I'll use "\f" to mean "interpolate a value here". That "\f" is a standard JavaScript escape for the ASCII character \x03, FormFeed, which (I think it's very safe to say) is not widely used in JavaScript strings.

As a bonus, we'll make it so that if you really need to pad a string to a minimum size (say, padding 12 to "0012", you can follow the \f with that many \0 characters -- so \f\0\0\0 means "given a number here, pad it with as many as three leading zeroes, otherwise it's a string, and pad it with as many as three leading spaces".

(Internally, "\0", is just a standard JavaScript escape that's equivalent to "\000" or "\x00" or "\u0000", namely: ASCII character 0 -- another character that is not going to be widely used. This is a hack, but a pretty good one as hacks go.)

So, finally:

function format () { if(!arguments.length) return ""; var _ = [], out = "", m, i; for(i = 0; i < arguments.length; i++) { _.push(arguments[i]) } var f = _.shift().split( /(\f\x00*)/ ); while(f.length > _.length) { _.push("") } // a hack to keep us from ever running out of data items for(i = 0; i < f.length; i++) { m = f[i].match( /^\f(\x00*)$/ ); if(m) { // if it's a pattern if(_[0] == undefined) _[0] = ""; if(_[0].constructor == Number) out += pad_zeroes( m[1].length, _.shift() ); else out += pad_spaces( m[1].length, _.shift().toString() ); } else { // not a pattern out += f[i]; } } if(_.length) out += _.join(""); return out; } function pad_zeroes (digits, n) { // n is a string or number if(n.constructor == Number) n = n.toString(); while(n.length < digits) n = "0" + n; return n; } function pad_spaces (digits, n) { // n is a string or number if(n.constructor == Number) n = n.toString(); while(n.length < digits) n = " " + n; return n; }

So, with all this infrastructure (eminently reusable!) in place, we return to the hanoi() code, and run it, and happily get:

hanoi(3, "A", "B", "C"); Move disk #1 from A to B. Move disk #2 from A to C. Move disk #1 from B to C. Move disk #3 from A to B. Move disk #1 from C to A. Move disk #2 from C to B. Move disk #1 from A to B.

A second version:

function hanoi (n, start, end, extra, move_disk) { if (n == 1) { move_disk(1,start,end); } else { hanoi(n-1, start, extra, end, move_disk); move_disk(n,start,end); hanoi(n-1, extra, end, start, move_disk); } } function print_instruction (n,start,end) { print(format("Move disk #\f from \f to \f.

", n, start, end)); //Step 3 }

Callable with:

hanoi(3, "A", "B", "C", print_instruction);

var position = ['',"A","A","A"]; function check_move (disk, start, end) { var i; if(disk < 1 || disk > (position.length - 1)) throw format( "Bad disk number \f. Should be 1..\f.", disk, position.length - 1 ); if(position[disk] != start) throw format( "Tried to move disk \f from \f, but it is on peg \f.", disk, start, position[disk] ); for(var i = 1; i < disk; i++) { if(position[i] == start) { throw format( "Can't move disk \f from \f because \f is on top of it.", disk, start, i ); } else if(position[i] == end) { throw format( "Can't move disk \f to \f because \f is already there.", disk, end, i ); } } print(format("Moving disk \f from \f to \f.

", disk, start, end)); position[disk] = end; }

Callable with:

hanoi(3, "A", "B", "C", check_move);

Note that JavaScript lacks an array.lastIndex or anything, so we instead use (position.length - 1). More idiomatic would have been: disk >= position.length

Note also that we use "throw thing;" as the JavaScript translation of Perl "die". There is a notable difference that we should remember: JavaScript "throw" is a statement, not a function or operator, so that it doesn't require parens around its argument (whereas all function-calls do); and it can't be a part of a statement. That is, you can do this in Perl:

$x = $thingy || die "What, no thingy?!";

But if you try this in JavaScript, you'll find that it's a syntax error:

x = thingy || throw "What' no thingy?"; // a syntax error

Instead, you'll have to do:

x = thingy; if(!x) throw "What, no thingy?";

Also, a pragmatic consideration: the message in the throw() might appear only in a line in the Firefox JavaScript console, or in some other browser component that's not part of the current window. As a workaround for this, I normally defined a function "complaining", to alert() the user with the throw message, before passing it back to be thrown:

function complaining (s) { alert(s); return s; }

and elsewhere:

if(!x) throw complaining("What, no thingy?");

Our console.html happens to render this unnecessary, since it reports, as part of the current window, any exceptions that you throw().

Optionally, you can change "complaining" to observe the convention that the argument to throw() should be an Error object:

function complaining (s) { alert(s); return new Error(s); }

The points of this section are 1) recursive functions normally shouldn't deal with global variables, and 2) the common Perl idioms opendir(X, ...) and open(Y, ...) deal with the variables X and Y, and so opendir(my $x,...) and open(my $y,...) are to be used instead. The second point is just an idiosyncracy matter of Perl usage, but the first is very germane to JavaScript.

This section makes these points by using functions that access the filesystem. There are functions for such things in JavaScript (see http://kb.mozillazine.org/Dev_:_Extensions_:_Example_Code_:_File_IO for example), but for obvious security reasons, such functions are inaccessible to JavaScript that is run thru web pages. So for sake of argument, we'll emulate these functions with our own functions that serve up mock data:

var _mock_item_sizes = { "." : 32, "./a" : 32, "./a/d" : 32, "./a/d/j" : 1157, "./a/d/k" : 32, "./a/e" : 32, "./a/e/l" : 91, "./a/f" : 9324, "./b" : 32, "./b/g" : 62004, "./c" : 32, "./c/h" : 597, "./c/i" : 3633 }; var _mock_dir_contents = { '.' : ['a','b','c'], './a' : ['d','e','f'], './b' : ['g'], './c' : ['h', 'i'], './a/d' : ['j','k'], './a/d/k' : [], // suppose k is an empty directory, just for fun './a/e' : ['l'] }; function complaining (s) { alert("~Error~

" + s); return s; } function sizeOf (s) { if(!(s in _mock_item_sizes)) throw complaining( "No mock data for " + s + "?!?!"); return _mock_item_sizes[s]; } function isDir (s) { return ( s in _mock_dir_contents); } function isFile (s) { return ! isDir(s); } function opendir (s) { return new MockDirHandle(s); } function MockDirHandle (s) { if(!( s in _mock_dir_contents )) return undefined; var subfiles = _mock_dir_contents[s].concat(); subfiles.unshift('..'); subfiles.unshift( '.'); this.readdir = function () { return subfiles.shift() }; return this; }

With that in place, we can cook up our JavaScript code:

function total_size (top) { var total = sizeOf(top); if(isFile(top)) return total; var dirhandle = opendir(top); if(!dirhandle) throw complaining( "Couldn't opendir on " + top ); var file; while( file = dirhandle.readdir() ) { if( file == '.' || file == '..' ) continue; total += total_size( top + "/" + file ); } return total; }

And it works, because this:

total_size('.')

returns:

77030

...which is, indeed, the sum of all the items' sizes in our mock filesystem.

function dir_walk (top, code) { code(top); if(isDir(top)) { var dirhandle = opendir(top); if(!dirhandle) throw complaining( "Couldn't opendir on " + top ); var file; while( file = dirhandle.readdir() ) { if( file == '.' || file == '..' ) continue; dir_walk( top + "/" + file, code ); } } } function print_dir (s) { print(s, "

") }

That is callable with:

dir_walk('.', print_dir);

Or, just as well with an anonymous function:

dir_walk('.', function (s) { print(s, "

") });

function dir_walk (top, filefunc, dirfunc) { if(isDir(top)) { var dirhandle = opendir(top); if(!dirhandle) throw complaining( "Couldn't opendir on " + top ); var file, results = []; while( file = dirhandle.readdir() ) { if( file == '.' || file == '..' ) continue; results.push( dir_walk( top + "/" + file, filefunc, dirfunc ) ); } return dirfunc(top,results); } else { return filefunc(top); } } function file_size (filespec) { return sizeOf(filespec); } function dir_size (filespec, results) { var total = sizeOf(filespec) + results.sum(); print(format("\f\0\0\0\0\0\0 \f

", total, filespec)); return total; } // A utility function, er, method! Array.prototype.sum = function () { var total = 0; for(var i = 0; i < this.length; i++) { total += this[i]; } return total; }

Note that we use our print(format(...)) from section 1.3.

Note also: this "Array.prototype.sum" business is how we define a new sum() method to be available for all Array objects, as we use for results.sum().

Then:

var total_size = dir_walk('.', file_size, dir_size); 000032 ./a/d/k 001221 ./a/d 000123 ./a/e 010700 ./a 062036 ./b 004262 ./c 077030 .

Note that we can get space-padding out of our format() function by feeding a string instead of a number:

function dir_size (filespec, results) { var total = sizeOf(filespec) + results.sum(); print(format("\f\0\0\0\0\0\0 \f

", total.toString(), filespec)); return total; } var total_size = dir_walk('.', file_size, dir_size); 32 ./a/d/k 1221 ./a/d 123 ./a/e 10700 ./a 62036 ./b 4262 ./c 77030 .

Tadaah!

function file (f) { return [ basename(f), sizeOf(f) ]; } function basename (path) { return path.replace( /.*\//, "" ); } // note that we can't call it "short", // because that's a reserved word in JS. function dir (d, subdirs) { var new_hash = {}; for(var i = 0; i < subdirs.length; i++) { var subdir_name = subdirs[i][0], subdir_structure = subdirs[i][1]; new_hash[ subdir_name ] = subdir_structure; } return [ basename(d), new_hash ]; }

So:

dir_walk('.', file, dir).toSource()

gives:

".", {a:{d:{j:1157, k:{}}, e:{l:91}, f:9324}, b:{g:62004}, c:{h:597, i:3633}}]

Then to just list everything (files and dirs both), would be just this:

function print_filename (s) { print(s, "

") } dir_walk('.', print_filename, print_filename);

I leave implementing dangles() to your fecund imagination, but I'll just note that the null function in JavaScript would be:

function () {}

function dir_walk (top, filefunc, dirfunc) { if(isDir(top)) { var dirhandle = opendir(top); if(!dirhandle) throw complaining( "Couldn't opendir on " + top ); var file, results = []; while( file = dirhandle.readdir() ) { if( file == '.' || file == '..' ) continue; results.push( dir_walk( top + "/" + file, filefunc, dirfunc ) ); } return dirfunc ? dirfunc(top,results) : []; } else { return filefunc ? filefunc(top) : []; } } var all_plain_files = dir_walk('.', function (path) { return path; }, function (path,results) { return results; } ); all_plain_files.toSource();

But note that this doesn't quite work. It produces this:

[[["./a/d/j", []], ["./a/e/l"], "./a/f"], ["./b/g"], ["./c/h", "./c/i"]]

The reason is that this Perl code...

sub x { my @q = (1,2,3); return @q } push @b, x();

and this JavaScript...

function x () { var q = [1,2,3]; return q; } b.push( x() );

do different things. In the Perl, we add the three items 1, 2, and 3 to the array @b. In the JavaScript, we're adding exactly one element to b -- the Array object [1,2,3].

Superficially, this is a matter of what "push" does in each language. More fundamentally, it's also a matter of Perl functions being able to return a list of values, whereas in JavaScript, a function always returns exactly one value.

If we change our results.push(...) to a results = results.concat(...), then we get:

function dir_walk (top, filefunc, dirfunc) { if(isDir(top)) { var dirhandle = opendir(top); if(!dirhandle) throw complaining( "Couldn't opendir on " + top ); var file, results = []; while( file = dirhandle.readdir() ) { if( file == '.' || file == '..' ) continue; results = results.concat( dir_walk( top + "/" + file, filefunc, dirfunc ) ); } return dirfunc ? dirfunc(top,results) : []; } else { return filefunc ? filefunc(top) : []; } }

Then it all works:

var all_plain_files = dir_walk('.', function (path) { return path; }, function (path,results) { return results; } ); all_plain_files.toSource();

giving:

["./a/d/j", "./a/e/l", "./a/f", "./b/g", "./c/h", "./c/i"]

MJD's discussion of object-orientation assumes (quite reasonably) a class-based system, instead of a classless system like JavaScript has. However, I think his points still apply just as well -- an OO approach necessarily involves thinking of a conceptual hierarchy (whether it's a class hierarchy, or the levels between instance and prototype and proto-prototype that you get in JavaScript.) It's all just different ways of allowing for future code reuse -- the functional approach is about code being versatile, and the OO approach is about code being generic.

This section of Higher-Order Perl involves dealing with hash-objects (of the class HTML::Element) that the CPAN module HTML::TreeBuilder creates from parsing HTML source.

JavaScript that runs under modern browsers have a roughly corresponding object system, in the form of the Document Object Model.

As one of the designers of the HTML::Element system, and one of the users of the DOM, I assure you that the two interfaces are similar in the substance, but very different in style. Specifically, DOM is basically built with such generally restrictive languages as Java in mind, whereas HTML::Element is more minimalist in a way that fits Perl fine, and would fit JavaScript fine too -- except that DOM got to JavaScript first.

The interfaces' styles are different in dozens of ways not worth detailing here, but here is a point of comparison that I hope will give you an idea of them in general:

~~DOM in JavaScript~~ ~~HTML::Element in Perl~~ element.getAttribute('align'); $element->attr('align'); element.setAttribute('align','right') $element->attr('align', 'right'); element.removeAttribute('align'); $element->attr('align', undef);

I stress that the interfaces are equally expressive, and that it is trivial to emulate either interface in terms of other, on this and most other points.

Interfaces aside, the only significant point of difference between DOM and HTML::Element/HTML::TreeBuilder is that the trees they produce are actually a bit different. DOM trees that a JavaScript program sees have a document object that is above the root element of the document; HTML::TreeBuilder has no such object. Also, by default, HTML::TreeBuilder constructs trees of just objects for elements and text strings for text nodes, whereas a DOM tree can have text objects, comment objects, and various other oddities. And so our walk_html function has to do a bit more work to present the same interface to its callback functions:

function walk_html (node, textfunc, elementfunc) { if(node.nodeType == document.DOCUMENT_NODE) return walk_html(node.documentElement,textfunc,elementfunc); var results = [], children = node.childNodes; for(var i = 0; i < children.length; i++) { var item = children.item(i); if( item.nodeType == document.TEXT_NODE ) results = results.concat( textfunc(item.data, item) ); else if(item.nodeType == document.ELEMENT_NODE ) results = results.concat( walk_html(item, textfunc, elementfunc ) ); // Otherwise it's some crazy node type we don't care about. } return elementfunc(node, results); }

Unless you are (or want to be) particularly familiar with the DOM, you needn't worry about the finer details of the above function and its differences from the Perl function -- as those differences are not part of the differences between JavaScript and Perl, but merely differences between HTML::Element and DOM. The three lines of interest are really just these:

results = results.concat( textfunc(item.data, item) ); ... results = results.concat( walk_html(item, textfunc, elementfunc ) ); ... return elementfunc(node, results);

You may, correctly, begin to anticipate the same problem we saw with "dir-walk-cd-def" in section 1.5, where we puzzled over how to translate push @thing, somefunc() . With the above use of concat , we seem to have dodged any trouble, because if one of the functions wants to return nothing, it can just return []; , and if it wants to return one value, it may just return somevalue; or return [somevalue]; , and if it wants to return several values, it can just return [foo, bar, ...] . And with that agreement in place, all seems well. We can apparently use the above walk_html with JavaScript callbacks just as with the Perl ones...

function promote_if_h1tag (element, underlings) { if(element.tagName == 'H1') { return [ 'KEEPER', ...something meaning join('', map $_->[1], @_) ]; // (We'll worry about map and grep later) } else { return underlings; } } tagged_texts = walk_html( document, function (text) { return ['MAYBE', text] }, promote_if_h1tag );

However, this fails. Even though our walk_html in Perl and JavaScript provide basically the same interface, things go quite wrong. The problem is this: in Perl, return list and return [list] are basically different, whereas they fold together in JavaScript in a way whose ramifications we still haven't totally dealt with.

The specific problem is this: our particular callbacks for walk_HTML (promote_if_h1tag, and the anonymous function that is the second parameter to walk_html ) conspire to uses arrayrefs as shorthand for particular data items: ['KEEPER', somestring] for a bit of text that is definitely heading-text, and ['MAYBE',somestring] for a bit of text that whose status we don't yet know. However, our larger plan has been to use arrayrefs to express return value lists, so that when we return ['KEEPER', somestring] or ['MAYBE',somestring] , this is mistaken for a pair of return values: one the string "KEEPER", and other the string that we got from the text nodes so far.

We could salvage our approach by carefully considering our return statements and changing them like this:

return ['MAYBE', text]; => return [ ['MAYBE', text] ]; return ['KEEPER', ...]; => return [ ['KEEPER', ...] ]; return underlings; => return underlings;

And this would work -- but it is the start of madness.

Consider: In return [['MAYBE',text]] , we have four objects. Starting from the inside out: We have a bit of text that we've pulled out of our document so far. We need to signal its status as Maybe or Keeper, so we signal this with another string, "MAYBE" or "KEEPER". We need some object to be able to carry both values, so we use an array, and we decide that in our system, array[0] will be the status signal, and array[1] will be the content. And then we need an array to wrap the zero/one/many such nodes that we could be returning. Each of these steps is clear, but as a group they are confusing, because we are using the same data type, array, for two very different things -- bundling return values, and representing a status-tagged text node.

An "OO fundamentalist" approach would say our four kinds of values (array, array, string, string) should be represented as at least two new classes: a ReturnValues class, and a TextNode class (whose objects have a field for whether is a Maybe or a Keeper node, and another field to contain the actual text data), and so we would have changed our return statements something like this:

return ['MAYBE', text]; => return new ReturnValues().addValue( new Texty(text).isKeeper(false) ); return ['KEEPER',...]; => return new ReturnValues( new Texty(text).isKeeper(true) ); return underlings; => return new ReturnValues().addValues( underlings );

...with other work elsewhere to define these classes, and then to retrofit our earlier results = results.concat(...); so that when given a ReturnValues object, it will pick it apart, whereas it should do no such thing with any other type of object, notably a Texty object.

According to some notion of rigorously distinguishing type/class/kind, this is all perfect, compared to our awful return [ ['MAYBE', text] ]; approach from before. But our ReturnValues/Texty/etc approach is even crazier, as it is vastly more verbose, and it requires us to remember the interfaces of two new classes. It is the realization of every dark rumor ever perpetuated about Java, UML charts, and long design meetings. It has a kind of clarity of its own, but its design is excessive in every way.

But we can arrive at the solution by moderating that design. Instead of a ReturnValues class, we can keep using simple arrays for multiple return values; and instead of using two-item arrays for text nodes, we can use two-field objects (or, in Perl terms, hashrefs), like so:

return ['MAYBE', text]; => return { data: text }; return ['KEEPER', ...]; => return { isKeeper: 1, data:... }; return underlings; => return underlings;

Semantically, this is basically the same as our excessive ReturnValues/Texty/etc solution, except that it manages to be very brief by using data types (Array and Object) for which JavaScript provides abbreviations for construction and population (namely, [...] and {...} ). And implementationally, all is well, because somearray.concat(thingy) does what we want when thingy is an Object or is an Array containing zero/one/several Objects.

Sanity is restored; and now we build on our walk_html to make our JavaScript h1-content-gatherer, which is complete except for the parts corresponding to Perl map and grep expressions:

function promote_if_h1tag (element, underlings) { if(element.tagName == 'H1') { return { isKeeper: 1, data: (underlings.map( get each one's 'data' ).join('')) }; } else { return underlings; } } function extract_headers (tree) { var tagged_texts = walk_html(document, function (text) { return { data: text } }, promote_if_h1tag ); var keepers = tagged_texts.grep( ...get only keepers... ); var keeper_text = keepers.map( get each keeper's data ); var header_text = keeper_text.join(''); return header_text; } extract_headers(document);

Perl map and grep aren't just plain functions -- you don't just evaluate all the arguments and then operate on them. In other words, these aren't the same:

@x = map($_ * 2, @things) vs $n = $_ * 2; @x = map($n, @things);

While we can use function prototypes in Perl to create new special forms (imagine sub first (&@) {...} ), there is no such facility in JavaScript. However, we can easily use anonymous functions to do the same thing, as in this minimal attempt at a map :

function map (f, inarray) { var out = []; for(var i = 0; i < inarray.length; i++) { out.push( f(inarray[i]) ) } return out; }

Given that, we can translate

@x = map($_ * 2, @things);

as this:

x = map function(_){ return _ * 2 }, things;

(Incidentally, _ is just another valid symbol name in JavaScript. I sometimes use it in one-liner functions, but there's nothing special about it; we could just as easily use i or someNumber or whatever.)

It is unavoidably regrettable that

function(_){ return _ * 2 } is so much more verbose than $_ * 2 . But I think this can be helped slightly by making our map accept something other than functions as the first parameter. My current favorite approach is to make passing a string thingy synonymous with passing a callback consisting of

function (_) { return _.thingy }

So in our extract_headers code above, we have:

var keeper_text = map( get each keeper's data );

We could implement this with our new map function as:

var keeper_text = map( function (_){ return _.data }, keepers );

or just abbreviate it as:

var keeper_text = map( 'data', keepers );

Finally, as a final syntactic flourish, I like to make map not a function, but a method of Array objects, so that Perl code like this:

join '/', map $_*2, split ':', $thing

will produce consistently flipped JavaScript code like this:

thing .split( ':' ) .map( function(_){return $_*2} ) .join('/')

Instead of the less pretty:

map( function(_){return $_*2}, thing .split( ':' ) ) .join('/')

Doing this is just a matter of assigning a new method to Array.prototype object:

Array.prototype.map = function(f) { if(!f.apply) { var propname = f; f = function(_) { return _[propname] } } var out = []; for(var i = 0; i < this.length; i++) { out.push( f( this[i], this, i) ); } return out; };

The only drawback is the now familiar problem that our callback functions can (must!) return only one value, so that one can't use it to write the equivalent of this Perl code:

map { $_->blorp ? ($_->shunk, $_->zorp) : () } @thingies

But we can accommodate allow for this by following in the Lispish tradition of having many variants of map , like so:

Array.prototype.mapc = function(f) { if(!f.apply) { var propname = f; f = function(_) { return _[propname] } } var out = []; var gotten; for(var i = 0; i < this.length; i++) { gotten = f( this[i], this, i); if( gotten != undefined ) out = out.concat( gotten ); } return out; };

In that case, Perl code such as

map { $_->blorp ? ($_->shunk, $_->zorp) : () } @thingies

could be translated as:

thingies.mapc( function(_) { return _.blorp ? [_.shunk, _.zorp] : undefined )

and then what will be appended to out will be not the single item [_.shunk, _.zorp] or undefined , but instead either the two items in _.shunk and _.zorp , or nothing at all.

With map and its concatty twin mapc done, it's simple to also produce a grep ...

Array.prototype.grep = function(f) { if(!f.apply) { var propname = f; f = function(_) { return _[propname] } } var out = []; for(var i = 0; i < this.length; i++) { if( f( this[i], this, i)) out.push(this[i]); } return out; };

...and, while we're at it, a foreach :

Array.prototype.foreach = function(f) { if(!f.apply) { var propname = f; f = function(_,x,i) { x[i] = _[propname] } } for(var i = 0; i < this.length; i++) { f( this[i], this, i ); } return; };

So, for example, to get a copy of words with every item uppercase, one would say:

var loudwords = words.map( function(_){ return _.toUpperCase(); } );

To find all the uppercased words in words , one would say:

function isUpperCase (_) { return _ == _.toUpperCase(); } var already_loud = words.grep( isUpperCase);

And to change words in-place, one would say:

words.foreach( function(item, arr, i){ arr[i] = item.toUpperCase(); });

More importantly, we can now finish our promote_if_h1tag and extract_headers functions:

function promote_if_h1tag (element, underlings) { if(element.tagName == 'H1') { return { isKeeper: 1, data: (underlings.map('data').join('')) }; } else { return underlings; } } function extract_headers (tree) { var tagged_texts = walk_html(document, function (text) { return { data: text } }, promote_if_h1tag ); var keepers = tagged_texts.grep( 'isKeeper' ); var keeper_text = keepers.map( 'data' ); var header_text = keeper_text.join(''); return header_text; } extract_headers(document);

Abstracting out the element.tagName == 'H1' ,

function promote_if (is_interesting, element, underlings) { if(is_interesting( element.tagName )) { return { isKeeper: 1, data: (underlings.map('data').join('')) }; } else { return underlings; } } ... var tagged_texts = walk_html(document, function (text) { return { data: text } }, function (el, underlings) { return promote_if( function(tag) { return tag == "H1" }, el, underlings ); } ); ...

If we sneak a peek to section 7.1, we can replace our callback that calls promote_if , with a promote_if that manufactures functions like our promote_if_h1tag function:

function promote_if (is_interesting) { // return a function that promotes based on our given criterion return function (element, underlings) { if(is_interesting( element.tagName )) { return { isKeeper: 1, data: (underlings.map('data').join('')) }; } else { return underlings; } }; } ... var tagged_texts = walk_html(document, function (text { return { data: text } }, promote_if( function(tag) { return tag == "H1"; } ) ); ...

But the full ramifications of that will wait until then.

We see in chapter 3 (with Memoize) how to fix the redundant computation that the following algorithms produce. But for the moment, ignore their inefficiency. (They do work!)

To compute a Fibonacci number is simple in both Perl and JavaScript:

sub fib { my ($month) = @_; if ($month < 2) { 1 } else { fib($month-1) + fib($month-2); } }

becomes...

function fib (month) { if(month < 2) return 1; return fib(month-1) + fib(month-2); }

The partitioning code translates quite tidily. Here's the Perl original:

sub find_share { my ($target, $treasures) = @_; return [] if $target == 0; return if $target < 0 || @$treasures == 0; my ($first, @rest) = @$treasures; my $solution = find_share($target-$first, \@rest); return [$first, @$solution] if $solution; return find_share($target , \@rest); }

And here's the JavaScript translation:

function find_share (target, treasures) { if(target == 0) return []; if(target < 0 || treasures.length == 0) return undefined; var rest = treasures.concat(); var first = rest.shift(); var solution = find_share( target-first, rest ); if(solution) return [].concat(first, solution); return find_share(target, rest); }

The basic algorithm is just as clear/unclear in JavaScript as in Perl -- that's to say, it has that particularly austere terseness that recursion can sometimes bring. For a discussion of what it all means, you'll have to look in Higher-Order Perl.

As to just its translation into JavaScript, the only point that is not totally obvious involves JavaScript's lack of a list-assignment operator, as in Perl's ($first, @rest) = @$treasures; . But in this particular case, we get the same thing done by copying treasures to rest and then shifting the first element to first :

var rest = treasures.concat(); // concat = array-copy var first = rest.shift();

In any case, the function works fine:

find_share(5, [1,2,4,8]) => [1,4] find_share(7, [1,2,3,4,5,6,7]) => [1,2,4]

Now, the convenience function partition is mostly straightforward to translate. Here's the original Perl:

sub partition { my $total = 0; for my $treasure (@_) { $total += $treasure; } my $share_1 = find_share($total/2, [@_]); return unless defined $share_1; my %in_share_1; for my $treasure (@$share_1) { ++$in_share_1{$treasure}; } for my $treasure (@_) { if ($in_share_1{$treasure}) { --$in_share_1{$treasure}; } else { push @$share_2, $treasure; } } return ($share_1, $share_2); }

And here's the resulting JavaScript:

function partition (treasures) { var total = 0; var i; for(i = 0; i < treasures.length; i++) { total += treasures[i] } var share_1 = find_share(total / 2, treasures); if(!defined(share_1)) return undefined; // Now figure out what's in share1 or in share2 var in_share_1 = {}, share_2 = []; for(i = 0; i < share_1 .length; i++) { // We can't just say "in_share_1[ share_1[i] ] = ++;", // because ++ on undef is NaN! in_share_1[ share_1[i] ] = 1 + (in_share_1[ share_1[i] ] || 0); } for(i = 0; i < treasures.length; i++) { if( in_share_1[ treasures[i] ] ) { --in_share_1[ treasures[i] ]; } else { share_2.push( treasures[i] ); } } return [share_1, share_2]; } function defined (x) { return typeof(x) != 'undefined'; }

And it runs happily:

partition( [1,2,4,8] ) partition( [9,12,14,17,23,32,34,40,38,49] ) partition( [5,7,10, 8] ).toSource() => [[5, 10], [7, 8]] partition( [1,2,3,4,5,6,7] ).toSource() => [[1, 2, 4, 7], [3, 5, 6]]

There are two points of interest there, one involving what we do with our %in_share_1 , and the other involving how to most clearly translate "return unless defined $share_1;".

Consider our code here:

my %in_share_1; ... for my $treasure (@$share_1) { ++$in_share_1{$treasure}; }

This is a common Perl idiom, which I often myself implement with a hash called %seen . But a problem arises in JavaScript: the JavaScript "++" operator doesn't forgive the case where it's applying to an undefined value. That is, imagine the first iteration of that loop: suppose $treasure is 6, and %in_share_ is empty. That means that ++$in_share_1{$treasure} means:

in_share_1[treasure] = in_share_1[treasure] + 1;

But (in the view of the JavaScript implementors), that's just equivalent to:

in_share_1[treasure] = undefined + 1;

and undefined + 1 is NaN , the Not-A-Number object, such as JavaScript often produces when you try doing performing a mathy operation on an unmathy (or otherwise unacceptable) value. (To wit: Math.sqrt(-1) is NaN . But 1/0 isn't the NaN object, it's the Infinity object!)

The workaround is to implement x[y]++ in JavaScript as x = (x[y] || 0) + 1 , and so:

in_share_1[ share_1[i] ] = 1 + (in_share_1[ share_1[i] ] || 0);

The stylistic problem that arises is: this middling idiom in Perl has become a big mess in JavaScript.

An idealistic approach would be to say that in tidy Object-Oriented style, there should be no clunky idioms sticking out like that -- instead, they should be embodied as objects. We can imagine a JavaScript class corresponding to what we use a Perl %seen...$seen{thing}++ hash-and-idiom for. We'd do something like var seen = new BunchCounters(); and then call seen.incr(thing) and seen.decr(thing) and then be able to ask what counters in seen have positive values, and so on... And then the ugliness of x = (x || 0 ) + 1 is appropriately buried deep in the internals of the BunchCounters class.

But a less dogmatic approach would be to simply say: if it's an unwieldy idiom, you don't need to objectify it -- just refactor it! And so we isolate (quarantine? entomb?) all that into a function which, after some behoovy variable renaming, looks like this;

function missing_from (mainset, subset) { // Says what's in mainset but not in subset. var i, in_subset = {}, missings = []; for(i = 0; i < subset .length; i++) { in_subset[ subset[i] ] = 1 + (in_subset[ subset[i] ] || 0); } for(i = 0; i < mainset.length; i++) { if( in_subset[ mainset[i] ] ) { --in_subset[ mainset[i] ]; } else { missings.push( mainset[i] ); } } return missings; }

That leaves our partition function as just this:

function partition (treasures) { var total = 0; for(var i = 0; i < treasures.length; i++) { total += treasures[i] } var share_1 = find_share(total / 2, treasures); if(!defined(share_1)) return undefined; return [share_1, missing_from(treasures, share_1)]; }

Even more refactoring would move out that total figuring. We could make an anyarray.sum() method that would sum all the items of any array:

Array.prototype.sum = function () { var x = 0; for(var i = 0; i < this.length; i++) { x += this[i];} return x; };

That leaves our partition function as just this:

function partition (treasures) { var total = treasures.sum() var share_1 = find_share( total / 2, treasures); if(!defined(share_1)) return undefined; return [share_1, missing_from(treasures, share_1)]; }

or if we don't mind losing the total variable (useful only really at providing a label for sake of readability), we can simplify even further:

function partition (treasures) { var share_1 = find_share( treasures.sum() / 2, treasures); if(!defined(share_1)) return undefined; return [share_1, missing_from(treasures, share_1)]; }

Incidentally: We considered the alternatives of objectifying versus refactoring our JavaScript x[y]++ idiom, but there is a third possibility: making it neither a function nor a class, but a macro, so that one could do something like incr(x[y]) and have it expand, at compilation, to x[y] = 1+(x[y]||0) . However, JavaScript has no macro system, neither in the style of the C preprocessor, nor in the style of Lisp code-tree manipulation. (Whether a JavaScript macro system in either style could, or should, be implemented, is another question altogether.)

The second (minor) point of interest in our partition function is this:

return unless defined $share_1;

But JavaScript has no unless . Now, I really miss unless in exactly these kinds of pseudoassertions, as I call them -- namely, things that basically mean to bail out unless some necessary condition is true. Normally it's simple to follow this model:

~~Perl~~ ~~JavaScript~~ return... unless whatever; => if(!whatever) return...; die "enh!" unless whatever; => if(!whatever) throw "enh!"; whatever or die "enh!" => if(!whatever) throw "enh!";

However, we run into a minor problem with clarity, with:

return unless defined $share_1;

Namely, JavaScript has no one defined function. It's common to instead do something like to test undefinedness with thing == undefined (or the basically synonymous thing == null ), or typeof(thing) == "undefined" . But since our Perl says "unless defined", we start wanting to do this:

if(! ... ) return undefined;

and then when we want to translate "defined", we negate one of our idioms for testing undefinedness, giving thing != undefined , and thus:

if( !( share_1 != undefined )) return;

This is obviously pointlessly complex. Now, we can simplify it to:

if( share_1 == undefined ) return;

And that's fine. Or we can do as I prefer: write a utility function for testing the definedness of something, and use that in our condition:

if( !defined(share_1) ) return; function defined (x) { return typeof(x) != 'undefined'; }

And finally, mixing "return;" and "return somevalue" in the same function causes a warning in JavaScript strict mode (see "HOJ.0.2.1: Implicit Return Values"), so we usually have to change our "return;" to the synonymous "return undefined;". So:

if( !defined(share_1) ) return undefined;

MJD's Perl source for the dispatch-table-based Reverse Polish Notation calculator translates tidily into JavaScript:

var stack = []; var actions = { '+': function () { stack.push( stack.pop() + stack.pop() ); }, '*': function () { stack.push( stack.pop() * stack.pop() ); }, '-': function () { var s = stack.pop(); stack.push( stack.pop() - s); }, '/': function () { var s = stack.pop(); stack.push( stack.pop() / s); }, 'NUMBER': function(n) { stack.push(n) }, '_DEFAULT_': function(n) { throw( "Unrecognized token `" +n.toString() + "'; aborting" ); } }; function evaluate (expr, actions) { var tokens = expr.split(/\s+/); for(var i = 0; i < tokens.length; i++) { var token = tokens[i]; if(!token.length) continue; var type = 'NONNUMBER'; if (token.match( /^\d+$/ )){ // It's a number type = 'NUMBER'; token = token * 1; // note the numerification } var action = actions[type] || actions[token] || actions['_DEFAULT_']; action(token, type, actions); } return stack.pop(); }

There is one gotcha here. In the Perl code, MJD has:

if ($token =~ /^\d+$/) { # It's a number $type = 'NUMBER'; }

...but our Javascript has to have:

if (token.match( /^\d+$/ )){ // It's a number type = 'NUMBER'; token = token * 1; // note the numerification }

If not for that "token = token * 1", we would discover that "2 3 +" would give "23" -- because the number tokens "2" and "3" would be still just strings; and for strings, "+" means "concatenate" -- and our comment "It's a number" and our "type = 'NUMBER';" would be mere wishful thinking. The short story is that we turn the string value into a number value by doing "token = token * 1". (The long story is in the next section.)

And so the caculator works happily; if we paste the above code into our console.html (http://interglacial.com/hoj/console.html), and then add this:

evaluate('2 3 + 4 *', actions)

then we correctly get back the value 20.

Addition is about the most common operation that programmers perform on numbers, and concatenation is the most common for strings; so this gotcha with JavaScript "+" does create the impression that JavaScript doesn't coerce numbers to/from strings as transparently as Perl does.

But in fact, JavaScript's clumsiness with numbers is really just limited to the "+" operator. In all other practical respects, JavaScript and Perl automatically coerce numbers to and from strings just as fluidly. For example, each language's join knows to stringify numbers in the list/array being joined:

join("", 123, "456") => "123456" [123, "456"].join("") => "123456"

And even "++" does the expected coercion:

$x = "123"; $x++; $x => 124 x = "123"; x++; x => 124

But then JavaScript makes "+" be annoyingly unusual: "123"+"1" is "1231" while 123+1 is 124. And if force of habit from Perl makes us think that adding zero will numerify a string, we find that 0+"123" is not 123, but "0123"; and even worse, "123"+0 is not 123, but "1230"!

But the "*" operator has no such problem: 1*"123" and "123"*1 are both 123. So we use "token = token * 1" as our numerifier.

Incidentally, "+" is not the only JavaScript operator that is polymorphous. The others are:

JavaScript "==" subsumes Perl "==" and "eq".

JavaScript "<" subsumes Perl "<" and "lt".

JavaScript ">" subsumes ">" and "gt".

So while JavaScript "+" is not totally unique, it's certainly the only really annoying one. To mix Freudian jargon with semantics jargon, those other operators may be polymorphous, but only "+" manages to be polymorphously perverse.

(Perl has its own oddity, well hidden, on the bitwise-xor "^" operator: "123" ^ "456" is "\x05\x07\x05" while 123 ^ 456 is 435. Moreover, the "++" operator behaves specially on non-numeric alphabetic strings (so $x = "abc"; $x++ leaves $x with the value "abd"), but it is not bizarre like "^" is. That is, incrementing "123" and 123 both give 124, as we saw above.)

We can change our "actions" dispatch-table to the following to get not an evaluation of the input, but an abstract syntax tree of it:

var actions = { 'NUMBER': function(n) { stack.push(n*1) }, '_DEFAULT_': function(n) { var s = stack.pop(); stack.push( [n, stack.pop(), s] ); } };

And so

evaluate('2 3 + 4 *', actions).toSource() then returns: ["*", ["+", 2, 3], 4]

MJD's "AST_to_string" function translates just as easily into JavaScript too:

function AST_to_string(tree) { if(tree.push) { // a cheapo way to test arrayness var op = tree[0], s1 = AST_to_string( tree[1] ), s2 = AST_to_string( tree[2] ); return ['(', s1, op, s2, ')'].join(' '); // join autostringifies } else { return tree; } }

And so this:

AST_to_string( evaluate('2 3 + 4 *', actions) )

returns this:

( ( 2 + 3 ) * 4 )

And this is where MJD's section 2.2 on the dispatch-table-driven RPN calculator stops. But for JavaScript, this is a good point to continue a bit.

The above section's "actions" dispatch-table gives us an AST, an Abstract Syntax Tree. An AST is, as its name suggests, a tree.

Also consider: The typical JavaScript interpreter is embedded in a browser, those indispensable programs that are happiest when showing us documents that consist of tree-shaped data structures, which are gotten from parsing HTML, and which are styled according to CSS rules.

Thru the DOM (Document Object Model) interface, JavaScript programs can liberally interact with those document trees: they can read the document's structure and content, can alter attribute values, move nodes around, and even create new text and element nodes.

So, it would be useful and (theoretically) easy to take an AST from the calculator program, and to make a tree of DOM nodes that reiterates that structure, for displaying to the user.

However, the particulars of node construction in DOM are quite typical of the verbosity that I complained about in section 1.7. Merely making this structure and attaching it to the "body" node...

<div class="foo"><div class="bar">Baz!</div></div>

...requires all these calls:

var x = document.createElement( "div" ); x.setAttribute( "class", "foo" ); document.body.appendChild( x ); y = document.createElement( "div" ); y.setAttribute( "class", "bar" ); x.appendChild( y ); y.appendChild( document.createTextNode( "Baz" ) );

Of all the strategies I've found for making JavaScript tolerable, my first, best, and most-used is: avoid construction with DOM, by wrapping it up in a function I call "graft", which replaces those DOM calls with this:

graft( document.body, ['div.foo', ['div.bar', "Baz!" ] ] );

That is, graft turns a tree of JavaScript Array objects into a DOM tree (and attaches it to document.body, or wherever else you specify), according to a set of rules:

An array in input means to create an element (whose tagname, and possibly also class, come from the first item in the array, as with 'div.foo' and 'div.bar')

Not-first strings in arrays (like the "Baz!") mean to create text nodes.

Hashes (as we see will below) will be used to add attributes to the array/element that they're inside -- so we could rewrite the above as: graft( document.body, ['div', {'class':'foo'}, ['div', {'class':'bar'}, "Baz!" ] ] );

In my experience, this is extremely useful as a sort of shorthand in JavaScript programs, where I want to create and pass tree-shaped data that I want to put into the HTML document window, without all the verbosity of DOM calls.

But even more relevant to our task at hand, it means we can write a simple recursive function to transform the calculator's AST (made of JavaScript Array objects) into a data structure (made of JavaScript Array objects) that we can feed to graft:

function AST_to_graftable (tree) { if(tree.push) { // a cheapo way to test arrayness var op = tree[0], s1 = AST_to_graftable( tree[1] ), s2 = AST_to_graftable( tree[2] ); return ['div.expr', s1, ['div.op', op], s2 ]; } else { return ['div.terminal', tree]; } }

When we feed that in, along with the graft source that I'll show at the end of this section, and call this:

graft(document.body, ['p.ast', AST_to_graftable( evaluate('2 3 + 4 *', actions)) ] );

...we get a representation of the AST added to the bottom of the current document, as if it had come from HTML source like this:

<p class="ast"> <div class="expr"> <div class="expr"> <div class="terminal">2</div> <div class="op">+</div> <div class="terminal">3</div> </div> <div class="op">*</div> <div class="terminal">4</div> </div> </p>

It looks a bit unimpressive at first...

2

+

3

*

4

But that's just because we don't have styles defined for .expr or these other classes. That's easily fixed:

graft(document.body, ['style',{type:'text/css'}, 'div { display: block; }

', '.expr { border: 1px cyan solid; padding: .2em; }

', '.op { color: #f0f; }

', '.ast { margin: 1ex; padding: 1ex; border: 2px grey solid;}' ]);

And then it looks like this:

2 + 3 * 4

The benefit of this sort of diagram is made even clearer with an even longer RPN expression like '2 3 + 4 88 + * 19 /':

graft(document.body, ['p.ast', AST_to_graftable( evaluate('2 3 + 4 88 + * 19 /', actions)) ] );

2 + 3 * 4 + 88 / 19

Here is the complete source for my graft function:

function graft (parent, t, doc) { // Usage: graft( somenode, [ "I like ", ['em', // { 'class':"stuff" },"stuff"], " oboy!"] ) doc = (doc || parent.ownerDocument || document); var e; if(t == undefined) { throw complaining( "Can't graft an undefined value"); } else if(t.constructor == String) { e = doc.createTextNode( t ); } else if(t.length == 0) { e = doc.createElement( "span" ); e.setAttribute( "class", "fromEmptyLOL" ); } else { for(var i = 0; i < t.length; i++) { if( i == 0 && t[i].constructor == String ) { var snared; snared = t[i].match( /^([a-z][a-z0-9]*)\.([^\s\.]+)$/i ); if( snared ) { e = doc.createElement( snared[1] ); e.setAttribute( 'class', snared[2] ); continue; } snared = t[i].match( /^([a-z][a-z0-9]*)$/i ); if( snared ) { e = doc.createElement( snared[1] ); // but no class continue; } // Otherwise: e = doc.createElement( "span" ); e.setAttribute( "class", "namelessFromLOL" ); } if( t[i] == undefined ) { throw complaining( "Can't graft an undefined value in a list!"); } else if( t[i].constructor == String || t[i].constructor == Array ) { graft( e, t[i], doc ); } else if( t[i].constructor == Number ) { graft( e, t[i].toString(), doc ); } else if( t[i].constructor == Object ) { // hash's properties => element's attributes for(var k in t[i]) e.setAttribute( k, t[i][k] ); } else { throw complaining( "Object " + t[i] + " is inscrutable as an graft arglet." ); } } } parent.appendChild( e ); return e; // return the topmost created node } function complaining (s) { alert(s); return new Error(s); }

It's long, but most of its code is edge cases and error-handling.

Here is a handy-dandy memoizer in JavaScript:

function memoize (methodname, inobject) { if(!inobject) inobject = window; // default to using the "global object" if(!(methodname in inobject)) throw new Error("No '" + methodname + "' method in object " + inobject); var f = inobject[methodname]; var cachename = "_memz_cache_" + methodname; inobject[cachename] = {}; inobject[methodname] = function () { // the stub var key = []; for(var i = 0; i < arguments.length; i++) { key.push(arguments[i]) } key = key.toSource(); if(!( key in this[cachename] )) this[cachename][key] = f.apply(this, arguments); return this[cachename][key]; }; return cachename; // in case you want to access it }

To memoize a 'foo' function, call it like so:

memoize('foo');

To memoize the 'foo' method of the object Bar:

memoize('foo', Bar);

The memoize function works for methods as well as functions because in JavaScript they are the same thing -- a function call is just an abbreviated way of writing a method call to window.functionname(...) .

So to memoize this:

function fib (month) { if(month < 2) return 1; return fib(month-1) + fib(month-2); }

We need only call:

memoize('fib');

Memoize looks up 'fib' in window and replaces it with a function that first checks to see if window._memz_cache_fib[ arguments.toSource() ] is present; if so, it is returned. Otherwise we call the original function (using function.apply(this,arguments) , which is the Perlish equivalent of $this->$function(@_) ), and save that in this._memz_cache_fib[ arguments.toSource() ] , and then return that.

This document is distributed under the GNU Free Documentation License (http://www.gnu.org/licenses/fdl.html).

In this section, I'll note only nontrivial changes to HOJ.

02006-12-16 Making the license explicit (GFDL) 02006-03-21 Sorry, no more new real content, but I've added links to The JavaScript Anthology http://www.powells.com/biblio/2-0975240269-0 and KJSEmbed stuff http://xmelegance.org/kjsembed/ 02005-08-06 Added "HOJ.0.2.1: Implicit Return Values". I go back and change some bits of "1.8.2: Partitioning" to conform with my three return-value patterns for functions. But the changes are just a matter of changing some "return;"s to "return undefined;"s, to avoid strict-mode compiler warnings. 02005-08-05 I add the section "2.2: Calculator", containing the important "graft" function. I think this is the first section I've written that has the distinction of being longer than the corresponding section in MJD's book. 02005-05 to 02005-08 Lots of changes not carefully noted. 02005-05-12 I first post the beginnings of HOJ.

~End~ (for now)