Hash Tables in Javascript

Javascript is a prototype-based object-oriented language. In JavaScript, all non-scalar objects behave as associative arrays, a mapping from property keys to values. The keys and values can be scalars, objects or functions. This tutorial demonstrates how to provide a hash table wrapper around these native objects that avoids stepping on the built-in properties of a JavaScript object and thus makes it more consistent.

Introduction

A hash table is a permutation of an associative array (i.e. name => value pairs). If you use PHP, then you are very familiar with this type of data structure already since all PHP arrays are associative. (That's how I became familiar with them).

In JavaScript, all non-scalar objects behave as associative arrays. The object can use other objects as keys. However, the length of the associative array is not tracked (like with index-based arrays) and there is the potential that the keys can conflict with built-in members (such as those added to the object prototype) and custom members such as the length property or convenience methods. We'll explore an approach that can avoid these shortcomings.

A short example of an associative array (hash table) in JavaScript is as follows:

var h = new Object(); // or just {} h['one'] = 1; h['two'] = 2; h['three'] = 3; // show the values stored for (var k in h) { // use hasOwnProperty to filter out keys from the Object.prototype if (h.hasOwnProperty(k)) { alert('key is: ' + k + ', value is: ' + h[k]); } }

TIP: Feel free to replace alert() with console.log() if it's supported by your browser (Chrome, Firefox with FireBug, etc).

Just as in PHP, the 'foreach' contruct is used to run through the array, doing something for each key => value pair. But what if we wanted to know the size?

alert('size of hash table ' + h.length);

Hmm. This reference to the length property gives the somewhat unexpected value of 'undefined'. That's because in Javascript the length property of an object is not incremented when new properties (keys) are added (meaning it doesn't really behave like a hash table).

Fundamentals

After a brief discussion of fundamentals we will begin to focus on the core problem. In JavaScript, every non-scalar variable is an object. Okay, so what does this mean? Well, essentially, this means it has a constructor, methods and properties. A property is just a variable that is owned by the object and thus local to that object. A property is accessed using the syntax:

h.one

where one is the property and the '.' symbol signifies we are talking about the property of the object obj. It can also be assigned in the same way:

h.one = 1;

The above example could be alternatively executed as:

for (var k in h) { if (h.hasOwnProperty(k)) { alert('key is: ' + k + ', value is: ' + eval('h.' + k)); } }

WARNING: We are using eval() here only for demonstration purposes to show that direct property reference (obj.one) produces the same result at bracket notation (obj['one']).

It's also possible to add properties to the object whose type is an array, which mixes assignment to the indexed values of the array and the members of the object.

var a = new Array(); // or just [] a[0] = 0 a['one'] = 1; a['two'] = 2; a['three'] = 3; for (var k in a) { if (a.hasOwnProperty(k)) { alert('key is: ' + k + ', value is: ' + a[k]); } } alert(a.length);

Oddly, the length is reported to be 1, yet four keys are printed. That's because we are manipulating both the array elements and the underlying object. That's just the flexibility of JavaScript shining through (and confusing us). The square bracket notation (i.e., []) is overloaded. When the key is numeric, we are assigning elements to the array. Otherwise, we are assigning members to the object.

Since each object has default properties that are accessed using this very same syntax, such as length and constructor, consider the case where the key in the hash is the same as one of these properties. This situation highlights the fundamental problem with associative arrays in JavaScript. It should be clear now why the length property is not set when we make an associative array data structure. By treating the object as an associative array by adding non-integer keys, you are manipulating the underlying JavaScript object, which is not tracked by the length (the length is for the indexed key values only).

Constructing a HashTable Class

In Javascript, we can create our own classes. So what we are going to do is create a HashTable() class that can maintain an associative array, but isolate API methods from data keys (no conflicts).

You might thinking, "okay, so we make a class, but how do we get around the conflicting properties problem?". Easy, we make a property which itself is an associative array and call it items. Then, we can use any key we want, and store the data about the array in other properties. The trick is to move the data part of the associative array inside of a property of the class. The following listing is the HashTable() object definition:

function HashTable(obj) { this.length = 0; this.items = {}; for (var p in obj) { if (obj.hasOwnProperty(p)) { this.items[p] = obj[p]; this.length++; } } }

NOTE: You should select another name for this class if you are using a JavaScript library that already defines this function.

Let's break this down a bit. Right off the bat, we create a length property, which will just be 0 to start with. Additionally, we accept an object in our constructor. Next we populate the internal items with the key => value pairs of the object passed to the constructor and increment the length coorespondingly. A typical call to create a HashTable() object would use the following syntax:

var h = new HashTable({one: 1, two: 2, three: 3, "i'm no 4": 4});

Already it must be nice to see a contructor...so much easier to add data to the structure!

Now, as you may recall before, we couldn't have any properties or methods in our associative array (because they could conflict with the keys of the data), so besides a 'foreach' construct, there was not much we could do with our associative array. Now that they are separated, we have the ability to add methods and properties, let's get started!

function HashTable(obj) { this.length = 0; this.items = {}; for (var p in obj) { if (obj.hasOwnProperty(p)) { this.items[p] = obj[p]; this.length++; } } this.setItem = function(key, value) { var previous = undefined; if (this.hasItem(key)) { previous = this.items[key]; } else { this.length++; } this.items[key] = value; return previous; } this.getItem = function(key) { return this.hasItem(key) ? this.items[key] : undefined; } this.hasItem = function(key) { return this.items.hasOwnProperty(key); } this.removeItem = function(key) { if (this.hasItem(key)) { previous = this.items[key]; this.length--; delete this.items[key]; return previous; } else { return undefined; } } this.keys = function() { var keys = []; for (var k in this.items) { if (this.hasItem(k)) { keys.push(k); } } return keys; } this.values = function() { var values = []; for (var k in this.items) { if (this.hasItem(k)) { values.push(this.items[k]); } } return values; } this.each = function(fn) { for (var k in this.items) { if (this.hasItem(k)) { fn(k, this.items[k]); } } } this.clear = function() { this.items = {} this.length = 0; } }

NOTE: Alternatively, you could refer directly to the constructor argument, obj, rather than copying it to the member property, items. It's up to you.

Let's put the HashTable() through some exercise.

var h = new HashTable({one: 1, two: 2, three: 3, "i'm no 4": 4}); alert('original length: ' + h.length); alert('value of key "one": ' + h.getItem('one')); alert('has key "foo"? ' + h.hasItem('foo')); alert('previous value of key "foo": ' + h.setItem('foo', 'bar')); alert('length after setItem: ' + h.length); alert('value of key "foo": ' + h.getItem('foo')); alert('value of key "i'm no 4": ' + h.getItem("i'm no 4")); h.clear(); alert('length after clear: ' + h.length);

These calls should produce the following output:

original length: 4

value of key "one": 1

has key "foo"? false

previous value of key "foo": undefined

length after setItem: 5

value of key "foo": bar

value of key "i'm no 4": 4

length after clear: 0

Let's find out how this all works.

Understanding the Implementation

We now have lots of useful methods! In JavaScript, any variable can be a reference to a function, so to add methods to the class, the easiest way to do it is to just write the function and then assign it to a property in the class. Okay, so you may be thinking, "But I can't have the same property name as a method name." That's right, another limitation of JavaScript objects is that methods are properties. However, in most cases, it won't be a problem because method names should be 'behavior' names and properties should be 'state' names.

In order to access the underlying items, we added the methods 'setItem', 'getItem', 'hasItem', 'keys', 'values' to set and retrieve data and 'remoteItem' and 'clear' methods to flush out the data. For now we will refer to each key => value pair as an item. Not that 'getItem' is added merely for completeness. You could just acccess the items property direct to retrieve values by key (which would be slightly faster anyway). The 'hasItem' method uses the 'hasOwnProperty' method to check that a key belong to the items object and isn't a function that has been added to the prototype of Object (which is done by libraries such as prototype.js).

The most important role of our methods is to keep the length property up to date. As we can see, it takes a lot of work out of our job and we can use these nice methods to work easily with our hash. Just like a HashTable in Java, the return value is a reference to the item in the HashTable() that was replaced:

alert("Previous value: " + h.setItem('foobar', 'hey'));

If you now want to iterate through the HashTable() like we did the object in the very beginning, you may do so using several different approaches:

Iterating the items, filtering out members inherited from the Object.prototype:

for (var k in h.items) { if (h.hasItem(k)) { alert('key is: ' + k + ', value is: ' + h.items[k]); } }

Iterating the entries using each: (notice we don't have to use hasOwnProperty in this case)

h.each(function(k, v) { alert('key is: ' + k + ', value is: ' + v); });

Iterating the collection of keys:

for (var i = 0, keys = h.keys(), len = keys.length; i < len; i++) { alert('key is: ' + keys[i] + ', value is: ' + h.getItem(keys[i])); }

Iterating the collection of values:

for (var i = 0, v = h.values(), len = v.length; i < len; i++) { alert('value is: ' + v[i]); }

You can also find out the size of the hash table:

alert('size of hash table: ' + h.length);

Summary

Now you should go home and start using this in every JavaScript code you write because it finally makes associative arrays in JavaScript useful. It's great for storing configuration data, returning multiple values from function, and the list goes on.

While this article remains useful as background material, I strongly recommend adopting a more mature JavaScript library such as jQuery, which offers many more well-tested convenience APIs, including jQuery.each(), which is a generic iterator function, which can be used to seamlessly iterate over both objects and arrays.

Attributions

Thanks to Peter Bailey for pointing out errors in this original article and suggesting how to improve it. I've included his feedback below verbatim. I've done my best to integrate his recommended changes into the article. He is correct, though, this article is really just about adding a length property to an associative array in JavaScript and letting you add additional methods that don't conflict with key names, plain and simple.

You mislead the reader in the very first example by starting with a new Array() as if that has any significance to what comes next. That first line might as well read "var myArray = new Boolean();" for how much irrelevance arrays have to the example. <- I've changed the example to use Object().

You get object iteration all wrong. for..in loops need a check against obj.hasOwnProperty(). See the "Objects" section here for a synopsis http://javascript.crockford.com/survey.html. <- I've included the use of hasOwnProperty() and explained why it's necessary.

Also, the usage of eval() is completely unwarranted since javascript objects support bracket notation. myArray[i] will work just fine. I find this especially troubling since your note on the page suggests to the reader that they "have" to use eval(), arguably the most dangerous function in the javascript core. Not to mention you use this exact syntax in the previous example! (myArray['one'] = 1 is the same as myArray.one = 1). <- I added a note that this is purely for demonstration purposes.

Not all variables are objects! There are actual scalar types. It only feels like they're all objects because they're essentially implemented as generics. <- I clarify that I'm talking about non-scalar objects.

Probably my biggest issue is that article almost doesn't need to exist at all. JavaScript objects *are* hashtables! The only real utility your Hash class gives is size determination. Everything else (set, get, has, delete) is available natively.

Thanks Peter!