Last month, I had the pleasure of meeting tons of excellent and intelligent front-end engineers at JSConf 2009 — the conference for Javascript in Washington DC. If you are a front-end engineer — or even if you write programs in Javascript for back ends, mobile devices, or desktops — you absolutely have to go to JSConf 2010. It’s apparent that Javascript is quickly becoming one of the hottest languages for all environments and applications, and JSConf is the first (and only) conference that deals with Javascript exclusively.

JSConf 2009

Track A presentations from JSConf 2009

One of the Track A speakers was Stoyan Stefanov, a really smart guy from Yahoo. His presentation was about high-performance web apps, but on one slide in particular, I was struck by his implementation of memoization (slide 79 of 87) because he used the function instance to cache the results of the function itself.

If you’re not familiar with memoization, here’s a great overview:

In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.

Memoization (Wikipeida)

In other words, it’s a way to cache the results of function calls so that repeated expensive computations or lengthy lookups can be avoided.

Here’s Stoyan’s example:

function myFunc ( param ) { if ( ! myFunc. cache ) { myFunc. cache = { } ; } if ( ! myFunc. cache [ param ] ) { var result = { } ; // ... myFunc. cache [ param ] = result ; } return myFunc. cache [ param ] ; } function myFunc(param){ if (!myFunc.cache) { myFunc.cache = {}; } if (!myFunc.cache[param]) { var result = {}; // ... myFunc.cache[param] = result; } return myFunc.cache[param]; }

(Use your imagination to fill in some complex or lengthy task in place of the line with the “…”!)

Again, what struck me was his use of the function instance to cache the results. In Javascript, functions are first-class objects and can be assigned properties just like any other object. But should we do this? It certainly seems like this could cause intractable problems if used widely. Imagine if everybody started decorating functions and methods with properties? (Actually, debuggers such as Drosera and Firebug now use properties on function instances to help identify them in logs and traces. It seems to me we should try not to invade on that space!)

So, of course, I set out to answer the question, “Do we really need to use the function instance to hold the result cache?”

The answer is “No”. It’s really quite easy to avoid, too.



Keep reading if you’re interested in the steps I took to come to a general purpose solution. Otherwise, click here to go straight to it.

A simple example

Here’s the simplest form: a constant-valued function with one parameter.

// constant-valued function. not very interesting var memoizedFunc0 = ( function ( ) { var result ; return function ( ) { // in a real-world example this next line would do some // complex math instead of just returning a date object return result || ( result = new Date ( ) ) ; } } ) ( ) ; // constant-valued function. not very interesting var memoizedFunc0 = (function () { var result; return function () { // in a real-world example this next line would do some // complex math instead of just returning a date object return result || (result = new Date()); } })();

Notice that we use an auto-execute, anonymous function (the outer function) solely to create a closure around the result variable. Also notice that the variable, memoizedFunc0 , is assigned a reference to the inner anonymous function because the outer function executes immediately and returns it. The inner function is the one that gets executed when memoizedFunc0 is called later in our program.

The memoization happens in the inner function’s return statement. We’re using the short-circuit Boolean OR operator ( || ) to return result if it is assigned. Of course, during the first execution, the second half of the statement runs ( (result = new Date()) ), assigning a value to result .

Admittedly, this is a very boring example. Stoyan’s was much more interesting since the memoized function took a parameter and returned a different result for each possible parameter. So let’s take a look at a multi-valued memoized function:

// multi-valued function with string-serializable param var memoizedFunc1 = ( function ( ) { var cache = { } ; return function ( param1 ) { return param1 in cache ? cache [ param1 ] : ( cache [ param1 ] = new Date ( param1 ) ) ; } } ) ( ) ; // multi-valued function with string-serializable param var memoizedFunc1 = (function () { var cache = {}; return function (param1) { return param1 in cache ? cache[param1] : (cache[param1] = new Date(param1)); } })();

This is similar, but we use a native Javascript object to cache function results. As long as the parameter, param1 , is primitive* (or is serializable to a unique string), this will work great. Property names must be serialized to strings. Objects that do not have a toString() method to generate a unique string will generally not work. In most situations, though, you’ll have an id or key that you can use, anyway.

The inner function simply checks if it already has a value cached and returns it. Otherwise, it generates a value, caches it, and returns it.

Success! A memoized function that does not pollute the function instance (and doesn’t use global variables like some other implementations I’ve seen).

A more interesting case: two parameters

But let’s not stop there. What if we have two parameters? I guess we could generate some sort of string key out of the parameters and use the single-parameter memoized function. But why? Generating a unique hash code is expensive and bulky. Let’s try to build a 2-parameter memoized function without that.

// multi-valued function with two params // this time we use two levels of closures and two cache objects var memoizedFunc2 = ( function ( ) { var cache1 = { } ; return function ( param1 , param2 ) { if ( ! ( param1 in cache1 ) ) { cache1 [ param1 ] = ( function ( ) { var cache2 = { } ; return function ( param2 ) { return param2 in cache2 ? cache2 [ param2 ] : ( cache2 [ param2 ] = new Date ( param1 , param2 ) ) ; } } ) ( ) ; } return cache1 [ param1 ] ( param2 ) ; } } ) ( ) ; // multi-valued function with two params // this time we use two levels of closures and two cache objects var memoizedFunc2 = (function () { var cache1 = {}; return function (param1, param2) { if (!(param1 in cache1)) { cache1[param1] = (function () { var cache2 = {}; return function (param2) { return param2 in cache2 ? cache2[param2] : (cache2[param2] = new Date(param1, param2)); } })(); } return cache1[param1](param2); } })();

Heh. The solution is to build two levels of closures! The first level creates and caches the second level as needed! The first-level cache ends up holding a reference to an inner anonymous function for each unique param2 (and each inner function has it’s own unique closure).

A general purpose “memoizer”

The two-parameter case is quite ugly and unwieldy, in my opinion. However, it shed some light on a general-purpose solution, a “memoizer” I’d like to call it. With a little bit of work, I was able to devise a memoizer for any function with explicitly-declared parameters. Here it is:

// memoize: a general-purpose function to enable a function to use memoization // func: the function to be memoized // context: the context for the memoized function to execute within // Note: the function must use explicit, primitive parameters (or objects // that generate unique strings in a toString() method) function memoize ( func , context ) { function memoizeArg ( argPos ) { var cache = { } ; return function ( ) { if ( argPos == 0 ) { if ( ! ( arguments [ argPos ] in cache ) ) { cache [ arguments [ argPos ] ] = func. apply ( context , arguments ) ; } return cache [ arguments [ argPos ] ] ; } else { if ( ! ( arguments [ argPos ] in cache ) ) { cache [ arguments [ argPos ] ] = memoizeArg ( argPos - 1 ) ; } return cache [ arguments [ argPos ] ] . apply ( this , arguments ) ; } } } // JScript doesn't grok the arity property, but uses length instead var arity = func. arity || func. length ; return memoizeArg ( arity - 1 ) ; } // memoize: a general-purpose function to enable a function to use memoization // func: the function to be memoized // context: the context for the memoized function to execute within // Note: the function must use explicit, primitive parameters (or objects // that generate unique strings in a toString() method) function memoize (func, context) { function memoizeArg (argPos) { var cache = {}; return function () { if (argPos == 0) { if (!(arguments[argPos] in cache)) { cache[arguments[argPos]] = func.apply(context, arguments); } return cache[arguments[argPos]]; } else { if (!(arguments[argPos] in cache)) { cache[arguments[argPos]] = memoizeArg(argPos - 1); } return cache[arguments[argPos]].apply(this, arguments); } } } // JScript doesn't grok the arity property, but uses length instead var arity = func.arity || func.length; return memoizeArg(arity - 1); }

This still might look a little crazy, but it’s really quite simple. To satisfy an infinite number of parameters, we simply use recursion. Every time we recurse, we create another closure. This just happens in Javascript. It’s a natural side-effect of recursion in the language. All we have to do is use the closures to cache our results.

Our memoize function takes two arguments. The first is a reference to the function we want to memoize. The second is the context the memoized function should run in. For instance, if we want to memoize an object method, we would pass the method in as the first parameter and the object itself as the second parameter.

When the memoize function executes, it first discerns how many formal (explicit) parameters the function has. Javascript uses the arity property for this purpose. Of course, no interesting Javascript project would be complete without a Microsoft caveat of some sort. For some idiotic reason, JScript uses the length property instead of arity . Other Javascript engines followed suit and deprecated the arity property in favor of the length property.

Next, memoize calls its inner function memoizer on the last parameter. memoizer constructs a local cache and returns an inner function that either executes and caches our original function if we’re at the first parameter — or else it caches and executes a call to memoizer with the previous parameter (argPos - 1) . Each time the function, memoizer , is recursively called it creates a closure around its own cache variable.

Notice that we’re using the apply method when we finally call our original function. This allows us to bind to our original context. We’re also using apply when we recursively call memoizer . This allows us to pass the arguments through without knowing how many arguments there are. The this passed into these calls is arbitrary and could easily have been anything else, including null .

I don’t know why I chose to start at the last parameter and go backwards. This routine could go in either direction. I also have no idea whether this memoizer is any faster or leaner than Stoyan’s. I imagine it doesn’t matter when you consider that both should be much, much faster than the lengthy XHR call or computationally intensive calculation they are meant to prevent.

OK. Cool. But … um … why?

Yes, memoization is a neat concept. But why use it rather than just hand-coded caching mechanisms? It’s easy enough to write a caching routine, right? Here are a few good reasons:

hand-coded caching mechanisms obfuscate your code

multi-variate caching routines are bulky in Javascript

fewer lines of code means fewer bugs

Java programmers will think more highly of you*

*Javascript kicks a$$. Show those stodgy back-end guys what we’re capable of!

Let’s try it out

So, let’s test it. First, let’s create a sample object to work with. This object has an “expensive” method that uses an XHR request to fetch server-side resources.

// myObj constructor function myObj ( ) { } ; myObj. prototype = { expensiveAjaxLookup : function ( y , mo , d , h , mn , s ) { // OK. No XHR. Just pretend it does something on the server! // Instead we're just going to construct a date string. Lame, I know. return this . prop + new Date ( y || 0 , mo || 0 , d || 0 , h || 0 , mn || 0 , s || 0 ) ; } , // a public property to prove that "this" is really "this" in our memoized methods prop : 'my date: ' } // myObj constructor function myObj () { }; myObj.prototype = { expensiveAjaxLookup: function (y, mo, d, h, mn, s) { // OK. No XHR. Just pretend it does something on the server! // Instead we're just going to construct a date string. Lame, I know. return this.prop + new Date(y || 0, mo || 0, d || 0, h || 0, mn || 0, s || 0); }, // a public property to prove that "this" is really "this" in our memoized methods prop: 'my date: ' }

In order to prove that our memoized methods didn’t mangle the context of the original method, I added a public property and used it in the expensiveAjaxLookup method. If our memoized methods lost their context, then this property would be undefined.

Tests:

var o = new myObj ( ) ; o. memoizedLookup = memoize ( o. expensiveAjaxLookup , o ) ; // Note: 8 == September, not August in Javascript console. log ( o. memoizedLookup ( 2009 , 8 , 17 ) ) ; console. log ( o. memoizedLookup ( 2009 , 8 , 16 ) ) ; console. log ( o. memoizedLookup ( 2009 , 9 , 16 ) ) ; console. log ( o. memoizedLookup ( 2009 , 8 , 16 ) ) ; console. log ( o. memoizedLookup ( 2009 , 8 , 21 , 13 , 26 , 17 ) ) ; console. log ( o. memoizedLookup ( 2009 , 8 , 21 , 13 , 26 , 17 ) ) ; var o = new myObj(); o.memoizedLookup = memoize(o.expensiveAjaxLookup, o); // Note: 8 == September, not August in Javascript console.log(o.memoizedLookup(2009, 8, 17)); console.log(o.memoizedLookup(2009, 8, 16)); console.log(o.memoizedLookup(2009, 9, 16)); console.log(o.memoizedLookup(2009, 8, 16)); console.log(o.memoizedLookup(2009, 8, 21, 13, 26, 17)); console.log(o.memoizedLookup(2009, 8, 21, 13, 26, 17));

Output:

my date: Thu Sep 17 2009 00:00:00 GMT-0400 (EDT) my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT) my date: Fri Oct 16 2009 00:00:00 GMT-0400 (EDT) ***** cache hit! my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT) my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT) ***** cache hit! my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT) my date: Thu Sep 17 2009 00:00:00 GMT-0400 (EDT) my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT) my date: Fri Oct 16 2009 00:00:00 GMT-0400 (EDT) ***** cache hit! my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT) my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT) ***** cache hit! my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT)

As you can see, the context was correctly bound. Otherwise, we’d see ” undefinedThu Sep 17 2009 00:00:00 GMT-0400 (EDT) “. The lines that say “ ***** cache hit! ” are some debugging statements I threw in. I sprinkled some console.log() calls all over the memoize function to ensure that the caches were being used as expected. They are, of course. I’ll leave it as an exercise for you to study the behavior of the function, if you are so inclined. I found the output very interesting and revealing. Go ahead. Have fun with it!

So, what uses can you conceive for memoization in web apps? Is it only good for caching XHR requests or summing Fibonacci series? Please leave a comment or let me know on Twitter!

[Update] I just came across two interesting Javascript memoization articles. In one, the author included an “unmemoize” function. In the other, the “unmemoification” happened automatically at a specified time delay. In the first implementation, though, the function instances are used to hold the unmemoize method. In the second, a hash key is generated to handle multiple parameters.

It’s certainly possible to add both features without either of those sacrifices. I have some ideas how it could be done. Do you have an implementation that works? If so, please leave a comment!

Memoization in JavaScript

Timed Memoization

In this article, memoization is used to speed up Bezier curve calculations: One-Line JavaScript Memoization

I updated the post to remove references to the word “serializable” since I really meant primitives. In Javascript, all objects are serializable. However, I really meant uniquely serializable. All primitives in Javascript are uniquely serializable ( String(37) will always be “37”). Objects, Functions, and Arrays are not always unique when serialized. In most engines, objects, for instance, serialize as “[Object object]”. Dates, despite not being primitive, do have unique serializations (but are locale-dependent).