I talk a lot about static site generators, but always about using them. In most cases, the generator itself may seem like a black box. I create a template and some Markdown and out comes a fully formed HTML page. Magic!

But what exactly is a static site generator? What goes on inside that black box? What kind of voodoo is this?

In this post, I want to explore all of the parts that make up a static site generator. First, we’ll discuss these in a general fashion, but then we’ll take a closer look at some actual code by delving deep inside HarpJS. So, put your adventurer’s cap on and let’s start exploring.

Why Harp? For two reasons. The first is that HarpJS is, by design, a very simple static site generator. It doesn’t have a lot of the features that might cause us to get lost exploring a more full-featured static site generator (Jekyll, for instance). The second, much more practical, reason is that I know JavaScript and don’t know Ruby very well.

The Basics of a Static Site Generator

The truth is, a static site generator is a pretty simple concept. The key ingredients to a static site generator are typically:

One or more template languages for creating page/post templates

A lightweight markup language (typically Markdown) for authoring content

A structure and markup (often YAML) for providing configuration and metadata (e.g. “front matter”)

A set of rules or structure for organizing and naming files: which files are exported/compiled, which are not, and how each is handled (e.g. prefacing a file or folder with an underscore frequently means that it is not exported into the final site files, or all posts go in a posts folder)

A means of compiling templates and markup into HTML (frequently support for CSS or JavaScript preprocessors is also included)

A local server for testing.

That’s it. If you’re thinking, “Hey… I could build that!” you are probably correct. Things start to get complicated though when you start to expand the functionality, as most static site generators do.
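To make the "I could build that" claim a little more concrete, here is a minimal sketch of the core loop: split metadata from content, turn the content into HTML, and pour the result into a template. Everything in it is made up for illustration (the function names, the `{{ name }}` placeholder syntax, and the two-rule stand-in for a Markdown compiler), and it is not how Harp or any other real generator is implemented.

```javascript
// Toy generator: front matter + body + template -> HTML.
// All names and syntax here are illustrative assumptions.

// Split "key: value" front matter (between --- lines) from the body.
function parseSource(source) {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source };
  const meta = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}

// Stand-in for a Markdown compiler: "# " headings and plain paragraphs only.
function renderBody(body) {
  return body
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter(Boolean)
    .map((block) =>
      block.startsWith("# ") ? `<h1>${block.slice(2)}</h1>` : `<p>${block}</p>`
    )
    .join("\n");
}

// Stand-in for a template language: replace {{ name }} placeholders.
function applyTemplate(template, locals) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) => locals[key] ?? "");
}

function buildPage(source, template) {
  const { meta, body } = parseSource(source);
  return applyTemplate(template, { ...meta, content: renderBody(body) });
}

const page = buildPage(
  "---\ntitle: Hello\n---\n# Hello\n\nFirst post.",
  "<title>{{ title }}</title>\n<main>{{ content }}</main>"
);
console.log(page);
```

A real generator swaps each stand-in for a proper library and adds file walking, which is exactly where the complexity starts to creep in.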

So, let’s look at how Harp handles this.

Getting to the Harp of the Matter

Let’s look at how Harp handles the key ingredients described above. Harp offers more than this handful of features, but, for the sake of our examination, we’ll stick to those items, starting with the basics.

Harp Basics

Harp supports Jade and EJS (for templating) and Markdown as its lightweight markup language (for content). Note that while Jade is now called Pug, Harp has not officially transitioned in their documentation or code, so we’ll stick with Jade here. Harp also offers support for other preprocessing such as Less, Sass, and Stylus for CSS and CoffeeScript for JavaScript.

By default Harp does not require much in the way of configuration or metadata. It tends to favor convention over configuration. However, it allows for specific metadata and configuration using JSON. It differs from many other static site generators in that file metadata is contained outside of the actual file within a `_data.json` file.
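As a rough sketch of what "metadata outside the file" looks like in practice, assume a `_data.json` that sits next to about.md and contact.md and is keyed by file name without the extension. The sample entries and the `metadataFor` helper are hypothetical, and this is not Harp's own lookup code.

```javascript
// Hypothetical contents of a _data.json file sitting next to about.md and
// contact.md. Keys are assumed to be file names without their extension.
const dataJson = `{
  "about":   { "title": "About us", "layout": false },
  "contact": { "title": "Get in touch" }
}`;

// Look up the metadata for one content file by its base name.
function metadataFor(fileName, rawJson) {
  const data = JSON.parse(rawJson);
  const slug = fileName.replace(/\.[^.]+$/, ""); // "about.md" -> "about"
  return data[slug] || {};
}

console.log(metadataFor("about.md", dataJson));
console.log(metadataFor("missing.md", dataJson));
```

The content file itself stays pure Markdown, and anything a template needs to know about it lives in the JSON next to it.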

While it is configurable to a degree, Harp has certain established guidelines for how to structure files. For example, in a typical application, the files that are served fall within a public directory. Also, any file or folder prefaced by an underscore will not be served.
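The underscore rule is simple enough to sketch in a few lines. `isHidden` is a name I made up for illustration. Harp has its own helper for this, and I'm not claiming this version matches it exactly.

```javascript
// Returns true when any segment of a relative path starts with an underscore.
// Hypothetical helper, written only to illustrate the convention.
function isHidden(relativePath) {
  return relativePath
    .split(/[\\/]/)
    .some((segment) => segment.startsWith("_"));
}

console.log(isHidden("_layout.jade"));       // underscore file
console.log(isHidden("blog/_drafts/post.md")); // file inside an underscore folder
console.log(isHidden("css/main_style.css")); // underscore elsewhere in the name
```

Note that only a leading underscore on a file or folder name counts. An underscore in the middle of a name does not hide anything.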

Lastly, Harp offers a basic local web server for testing that includes some configurable options. And, of course, it will compile the finished HTML, CSS and JavaScript files for deployment.

Let’s Look at Harp’s Actual Source Code

Since much of what makes up a static site generator is rules and conventions, the code centers (for the most part) on the actual serving and compiling. Let’s dig in.

The Server Function

In Harp, serving your project is usually done by executing harp server from the command line. Let’s look at the code for that function:

    exports.server = function(dirPath, options, callback){
      var app = connect()
      app.use(middleware.regProjectFinder(dirPath))
      app.use(middleware.setup)
      app.use(middleware.basicAuth)
      app.use(middleware.underscore)
      app.use(middleware.mwl)
      app.use(middleware.static)
      app.use(middleware.poly)
      app.use(middleware.process)
      app.use(middleware.fallback)

      return app.listen(options.port || 9966, options.ip, function(){
        app.projectPath = dirPath
        callback.apply(app, arguments)
      })
    }

While the function looks simple, obviously there is a ton going on within middleware that isn’t illustrated here.
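If the middleware pattern itself is unfamiliar, here is a minimal sketch of the idea with no dependencies: each middleware receives the request, the response, and a `next` function, and either finishes the response or hands off to the next one in line. The `runChain` runner and the two middleware functions are hypothetical stand-ins of my own, not Connect's or Harp's actual code.

```javascript
// Run middleware functions in order. Each one receives (req, res, next) and
// either ends the response or calls next() to hand off to the following one.
function runChain(middlewares, req, res) {
  let index = 0;
  function next() {
    const middleware = middlewares[index++];
    if (middleware) middleware(req, res, next);
  }
  next();
}

// Hypothetical stand-in: refuse any URL with a segment starting with "_".
const blockUnderscore = (req, res, next) => {
  if (req.url.split("/").some((part) => part.startsWith("_"))) {
    res.statusCode = 404;
    res.body = "Not found";
    return; // stop the chain here
  }
  next();
};

// Hypothetical stand-in: pretend to serve whatever was asked for.
const serveFile = (req, res) => {
  res.statusCode = 200;
  res.body = `served ${req.url}`;
};

function handle(url) {
  const res = {};
  runChain([blockUnderscore, serveFile], { url }, res);
  return res;
}

console.log(handle("/about"));
console.log(handle("/_layout.jade"));
```

The order of the `app.use` calls matters for the same reason it does here: an early middleware can end the request before the later ones ever see it.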

The rest of this function opens up a server with the options you specify (if any). Those options include a port, an IP to bind to and a directory. By default the port is 9000 (not 9966 as you might guess from the code), the directory is the current one (i.e. the one Harp is running in) and the IP is 0.0.0.0.

The details for these defaults are in the command line application source.
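Here is a rough sketch of how two layers of defaults like this can interact. The `cliOptions` function is hypothetical. I'm only assuming that the command line layer fills in 9000 and 0.0.0.0 before the server function ever sees the options, which would explain why the 9966 fallback only matters when the function is called without a port.

```javascript
// Library-level fallback, mirroring `options.port || 9966` in the server function.
function resolvePort(options) {
  return options.port || 9966;
}

// Hypothetical command line layer: applies its own defaults first.
function cliOptions(flags) {
  return { port: flags.port || 9000, ip: flags.ip || "0.0.0.0" };
}

console.log(resolvePort(cliOptions({})));               // through the CLI, no flags
console.log(resolvePort(cliOptions({ port: 8080 })));   // through the CLI, explicit port
console.log(resolvePort({}));                           // called directly, no port
```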

The Compiler Function

Staying within index.js, let’s take a look at the compile function next.

    exports.compile = function(projectPath, outputPath, callback){

      /**
       * Both projectPath and outputPath are optional
       */
      if(!callback){
        callback = outputPath
        outputPath = "www"
      }

      if(!outputPath){
        outputPath = "www"
      }

      /**
       * Setup all the paths and collect all the data
       */
      try{
        outputPath = path.resolve(projectPath, outputPath)
        var setup = helpers.setup(projectPath, "production")
        var terra = terraform.root(setup.publicPath, setup.config.globals)
      }catch(err){
        return callback(err)
      }

      /**
       * Protect the user (as much as possible) from compiling up the tree
       * resulting in the project deleting its own source code.
       */
      if(!helpers.willAllow(projectPath, outputPath)){
        return callback({
          type: "Invalid Output Path",
          message: "Output path cannot be greater then one level up from project path and must be in directory starting with `_` (underscore).",
          projectPath: projectPath,
          outputPath: outputPath
        })
      }

      /**
       * Compile and save file
       */
      var compileFile = function(file, done){
        process.nextTick(function () {
          terra.render(file, function(error, body){
            if(error){
              done(error)
            }else{
              if(body){
                var dest = path.resolve(outputPath, terraform.helpers.outputPath(file))
                fs.mkdirp(path.dirname(dest), function(err){
                  fs.writeFile(dest, body, done)
                })
              }else{
                done()
              }
            }
          })
        })
      }

      /**
       * Copy File
       *
       * TODO: reference ignore extensions from a terraform helper.
       */
      var copyFile = function(file, done){
        var ext = path.extname(file)
        if(!terraform.helpers.shouldIgnore(file) && [".jade", ".ejs", ".md", ".styl", ".less", ".scss", ".sass", ".coffee"].indexOf(ext) === -1){
          var localPath = path.resolve(outputPath, file)
          fs.mkdirp(path.dirname(localPath), function(err){
            fs.copy(path.resolve(setup.publicPath, file), localPath, done)
          })
        }else{
          done()
        }
      }

      /**
       * Scan dir, Compile Less and Jade, Copy the others
       */
      helpers.prime(outputPath, { ignore: projectPath }, function(err){
        if(err) console.log(err)

        helpers.ls(setup.publicPath, function(err, results){
          async.each(results, compileFile, function(err){
            if(err){
              callback(err)
            }else{
              async.each(results, copyFile, function(err){
                setup.config['harp_version'] = pkg.version
                delete setup.config.globals
                callback(null, setup.config)
              })
            }
          })
        })
      })
    }

The first portion defines the output path as specified by the call to `harp compile` via the command line (source here). The default, as you can see, is `www`. The callback is supplied by the command line utility and is not configurable.
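The optional-argument shuffle at the top of the compile function is a common JavaScript idiom, so here it is isolated in a small sketch. `normalizeArgs` is a name I invented. The logic simply mirrors the two `if` checks in the compile function.

```javascript
// If the callback slot is empty, the caller skipped outputPath, so the
// function they passed second is really the callback.
function normalizeArgs(projectPath, outputPath, callback) {
  if (!callback) {
    callback = outputPath;
    outputPath = "www";
  }
  if (!outputPath) {
    outputPath = "www";
  }
  return { projectPath, outputPath, callback };
}

const done = () => {};
console.log(normalizeArgs("site", done).outputPath);         // output path omitted
console.log(normalizeArgs("site", "dist", done).outputPath); // output path given
console.log(normalizeArgs("site", null, done).outputPath);   // output path passed as null
```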

The next part starts by calling the setup function in the helpers module. For the sake of brevity, we won’t go into the specific code of the function (feel free to look for yourself), but essentially it reads the site configuration (i.e. `harp.json`).

You may also notice a call to something called `terraform`. This will come up again within this function. Terraform is actually a separate project required by Harp that is the basis of its asset pipeline. The asset pipeline is where the hard work of compiling and building the finished site gets done (we’ll look at Terraform code in a little bit).

The next portion of code, as it states, tries to prevent you from specifying an output directory that would inadvertently overwrite your source code (which would be bad as you’d lose any work since your last commit).
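To show the kind of check involved, here is a deliberately simplified guard. It is my own assumption of what a reasonable check could look like, not the logic of Harp's `willAllow`, whose rules (per the error message) are more specific. This version only refuses an output path that is the project directory itself or one of its ancestors.

```javascript
const path = require("path");

// Simplified guard (an assumption, not Harp's willAllow): refuse to write into
// the project directory itself or into any directory that contains it.
function isSafeOutput(projectPath, outputPath) {
  const project = path.resolve(projectPath);
  const output = path.resolve(projectPath, outputPath);
  const relative = path.relative(output, project);
  // relative is "" when the two paths are equal, and has no leading ".."
  // segment when the project lives somewhere inside the output directory.
  const firstSegment = relative.split(path.sep)[0];
  const projectInsideOutput =
    relative === "" || (firstSegment !== ".." && !path.isAbsolute(relative));
  return !projectInsideOutput;
}

console.log(isSafeOutput("/tmp/site", "www"));    // a folder inside the project
console.log(isSafeOutput("/tmp/site", "."));      // the project itself
console.log(isSafeOutput("/tmp/site", ".."));     // the project's parent
console.log(isSafeOutput("/tmp/site", "../out")); // a sibling folder
```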

The compileFile and copyFile functions are fairly self-explanatory. The compileFile function relies on Terraform to do the actual compilation. Neither one runs until the final block: once helpers.prime has prepared the output directory, helpers.ls lists the files under the public path, and async.each runs compileFile and then copyFile over each file in that list, compiling or copying as necessary.
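One way to picture the outcome is as a plan that sorts every file into "compile", "copy", or "skip". This sketch is a simplification of mine: in the real function both passes visit every file and each decides for itself whether to act. The extension list is taken from copyFile, while `planBuild` and the underscore check are hypothetical.

```javascript
const path = require("path");

// Extensions taken from the copyFile function quoted in this post.
const COMPILED_EXTENSIONS = [
  ".jade", ".ejs", ".md", ".styl", ".less", ".scss", ".sass", ".coffee",
];

// Sort each relative file path into one of three buckets.
function planBuild(files) {
  const plan = { compile: [], copy: [], skip: [] };
  for (const file of files) {
    const hidden = file.split("/").some((part) => part.startsWith("_"));
    if (hidden) {
      plan.skip.push(file); // underscore files never reach the output
    } else if (COMPILED_EXTENSIONS.includes(path.extname(file))) {
      plan.compile.push(file); // needs a processor
    } else {
      plan.copy.push(file); // copied as-is
    }
  }
  return plan;
}

console.log(planBuild(["index.jade", "_layout.jade", "css/main.scss", "img/logo.png"]));
```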

Terraform

As I discussed, Terraform does the grunt work for compiling the Jade, Markdown, Sass and CoffeeScript into HTML, CSS and JavaScript (and assembling these pieces as defined by Harp). Terraform is organized into folders that define its processors for JavaScript, CSS/stylesheets, and templates (which, in this case, includes Markdown).

Within each of these folders is a processors folder that contains the code for each specific processor that Terraform (i.e. Harp) supports. For example, in the templates folder are files that form the basis for compiling EJS, Jade, and Markdown files.

I won’t delve into the code for each of these, but, for the most part, they rely upon external npm modules that handle the supported processor. For example, for Markdown support, it depends upon Marked.

The core logic of Terraform is contained in its render function.

    /**
     * Render
     *
     * This is the main method to to render a view. This function is
     * responsible to for figuring out the layout to use and sets the
     * `current` object.
     *
     */
    render: function(filePath, locals, callback){

      // get rid of leading slash (windows)
      filePath = filePath.replace(/^\\/g, '')

      // locals are optional
      if(!callback){
        callback = locals
        locals = {}
      }

      /**
       * We ignore files that start with underscore
       */
      if(helpers.shouldIgnore(filePath)) return callback(null, null)

      /**
       * If template file we need to set current and other locals
       */
      if(helpers.isTemplate(filePath)) {

        /**
         * Current
         */
        locals._ = lodash
        locals.current = helpers.getCurrent(filePath)

        /**
         * Layout Priority:
         *
         * 1. passed into partial() function.
         * 2. in `_data.json` file.
         * 3. default layout.
         * 4. no layout
         */

        // 1. check for layout passed in
        if(!locals.hasOwnProperty('layout')){

          // 2. _data.json layout
          // TODO: Change this lookup relative to path.
          var templateLocals = helpers.walkData(locals.current.path, data)

          if(templateLocals && templateLocals.hasOwnProperty('layout')){
            if(templateLocals['layout'] === false){
              locals['layout'] = null
            } else if(templateLocals['layout'] !== true){

              // relative path
              var dirname = path.dirname(filePath)
              var layoutPriorityList = helpers.buildPriorityList(path.join(dirname, templateLocals['layout'] || ""))

              // absolute path (fallback)
              layoutPriorityList.push(templateLocals['layout'])

              // return first existing file
              // TODO: Throw error if null
              locals['layout'] = helpers.findFirstFile(root, layoutPriorityList)
            }
          }

          // 3. default _layout file
          if(!locals.hasOwnProperty('layout')){
            locals['layout'] = helpers.findDefaultLayout(root, filePath)
          }

          // 4. no layout (do nothing)
        }

        /**
         * TODO: understand again why we are doing this.
         */
        try{
          var error = null
          var output = template(root, templateObject).partial(filePath, locals)
        }catch(e){
          var error = e
          var output = null
        }finally{
          callback(error, output)
        }

      }else if(helpers.isStylesheet(filePath)){
        stylesheet(root, filePath, callback)
      }else if(helpers.isJavaScript(filePath)){
        javascript(root, filePath, callback)
      }else{
        callback(null, null)
      }
    }

(If you were reading all this code closely, you likely noticed TODOs, typos, and even a funny “understand again why we are doing this” comment. That’s real life coding!)

The majority of the code in the render function is about handling templates. Things like CoffeeScript and Sass fundamentally render on a one-to-one basis. For example, `style.scss` will render to `style.css`. Even if the file pulls in includes, the renderer resolves them itself. The very end of the render function deals with these types of files.
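That one-to-one relationship can be sketched as a simple extension swap. The mapping table and `outputPathFor` are my own illustration, assuming templates become .html, stylesheets become .css, and CoffeeScript becomes .js. Terraform has its own helper for this (it appears as `terraform.helpers.outputPath` in the compile function), which may handle cases this sketch does not.

```javascript
const path = require("path");

// Assumed mapping from source extension to output extension.
const OUTPUT_EXTENSION = {
  ".jade": ".html", ".ejs": ".html", ".md": ".html",
  ".less": ".css", ".styl": ".css", ".scss": ".css", ".sass": ".css",
  ".coffee": ".js",
};

// Swap a known source extension for its output extension; leave other files alone.
function outputPathFor(file) {
  const ext = path.extname(file);
  const target = OUTPUT_EXTENSION[ext];
  return target ? file.slice(0, -ext.length) + target : file;
}

console.log(outputPathFor("style.scss"));
console.log(outputPathFor("about.md"));
console.log(outputPathFor("img/logo.png"));
```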

Layouts in Harp, on the other hand, are nested within each other in a variety of manners that can even depend upon configuration. For example, about.md might be rendered within the default _layout.jade (where, exactly, is determined by the use of != yield within that layout). However, _layout.jade might include multiple other layouts within itself by way of the partial support in Harp.

Partials are a way of splitting up a template into multiple files. They are especially useful for code reuse. For instance, I might put the site header inside a partial. Partials are important for making layouts within a static site generator maintainable but they also add a good deal of complexity to the logic of compiling templates. This complexity is handled within the partial function of the templates processor.
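Here is a small sketch of why layouts and partials make template compilation recursive. The `{{ yield }}` and `{{ partial name }}` placeholders are made-up syntax standing in for what the template languages provide, the "files" are just strings in an object, and none of this is Terraform's actual partial function.

```javascript
// Hypothetical in-memory "files". {{ yield }} and {{ partial name }} are
// made-up placeholders, not real Jade or EJS syntax.
const files = {
  _layout: "<body>{{ partial _header }}<main>{{ yield }}</main></body>",
  _header: "<header>My Site</header>",
  about: "<p>About us</p>",
};

// Expand partial placeholders recursively so partials can include partials.
function renderPartial(name) {
  return files[name].replace(/\{\{\s*partial\s+(\w+)\s*\}\}/g, (_, child) =>
    renderPartial(child)
  );
}

// Render a page, then drop it into the layout at the yield placeholder.
function renderPage(name, layout) {
  const body = renderPartial(name);
  if (!layout) return body;
  return renderPartial(layout).replace(/\{\{\s*yield\s*\}\}/, () => body);
}

console.log(renderPage("about", "_layout"));
console.log(renderPage("about", null));
```

Even in this toy version, rendering one page means rendering the page, its layout, and every partial the layout pulls in, which hints at where the real complexity comes from.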

Finally, you could override the default layout by specifying a specific layout or no layout at all for a particular file within the _data.json configuration file. All of these scenarios are handled (and even numbered) within the logic of the render function.
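The numbered priority list boils down to a short decision function. This sketch follows the same four steps as the comments in the render function but skips the file lookups entirely. `resolveLayout` and its parameters are hypothetical names of mine, and the real code resolves paths against the file system rather than returning the value as-is.

```javascript
// locals: values passed in by the caller; fileData: this file's entry from
// _data.json (or undefined); defaultLayout: the default layout found, or null.
function resolveLayout(locals, fileData, defaultLayout) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // 1. A layout passed in explicitly wins.
  if (has(locals, "layout")) return locals.layout;

  // 2. Next, a layout set in _data.json; `false` means "no layout",
  //    and `true` falls through to the default.
  if (fileData && has(fileData, "layout")) {
    if (fileData.layout === false) return null;
    if (fileData.layout !== true) return fileData.layout;
  }

  // 3. Fall back to the default layout, or 4. no layout at all.
  return defaultLayout || null;
}

console.log(resolveLayout({ layout: "_print" }, { layout: "_wide" }, "_layout"));
console.log(resolveLayout({}, { layout: false }, "_layout"));
console.log(resolveLayout({}, undefined, "_layout"));
```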

That’s Not So Complicated, Is It?

To make this digestible, I’ve skipped over a ton of additional detail. At its core, every static site generator I’ve ever used (and I’ve used a bunch) functions similarly: a set of rules, conventions, and configuration that is run through compilers for the various supported markups. Perhaps that is why there are a ridiculous number of static site generators out there.

That being said, I wouldn’t want to build my own!

My Report & Book

If you are interested in learning how to build sites using a static site generator, I’ve authored a report and co-authored a book for O’Reilly that might interest you. My report, simply titled Static Site Generators, is free and attempts to establish the history, landscape, and basics behind static site generators.

The book that I co-authored with Raymond Camden is called Working with Static Sites and is available as an early release, but should be available in print soon.