What is Archie Markup Language?

ArchieML (or "AML") was created at The New York Times to make it easier to write and edit structured text on deadline that could be rendered in web pages, or more specifically, rendered in interactive graphics.

One of the main goals was to make it easy to tag text as data without having to type a lot of special characters. Another goal was to allow the document to contain lots of notes and draft text that would not be read into the data. And finally, we make extensive use of Google Documents' concurrent-editing features: while working on a graphic, we can have several reporters, editors and developers all pouring information into a single document. So we wanted a format that could survive being edited by users who may never have seen ArchieML or any other markup language before.

Why not YAML? Or JSON?

ArchieML differs from other popular formats like YAML and JSON in several areas that we've found are key to making it easy to use:

Whitespace is not significant to the document structure

In YAML, lines must be indented precisely and variably; the wrong number of spaces to the left of a key invalidates the document, and tabs can't be used. AML ignores all whitespace not within a value. We believe this makes it easier for non-programmers to use, and is essential for use in environments with non-monospaced fonts, like in Google Documents.

Unstructured text is ignored; there is no such thing as a parsing error

AML was designed so that writers could work in a freeform environment. They should be able to add entire paragraphs of scratch work that do not appear in the output. JSON and YAML have strict syntaxes that reject any text deviating from the expected pattern. AML doesn't assume text follows any pattern. If it finds text that looks like data, it treats it as data. Otherwise, it moves on.

The notation makes sense to non-programmers

Lists of values are noted with bullet points (asterisks), not hyphens or quoted strings that must be separated with commas. An overriding goal was to have an intuitive format that could be passed to a non-technical user (a reporter, an assigning editor or a copy editor) to edit, and to have the format be clear enough that they could make changes without breaking the parsing of the document. With another format, we'd have to explain the indentation rules of YAML, or how to match curly braces and properly escape quotation marks in JSON, and so forth.

How Does It Work in Practice?

For a very simple example, here's a screenshot of the Google Doc that powers a recent graphic about the trick plays used by the New England Patriots and Seattle Seahawks:

To generate the graphic, we load the ArchieML data from the document using the archieml-js npm module, then pass it to an underscore template to render the final markup server-side. This lets the journalists who are focusing on the text and content concentrate on getting the copy in shape independently of the developers working on the graphic.

This is a very simple example, with only a few bits of text and data and one comment at the end that is ignored. When we're covering a breaking news story, though, we can have a half-dozen people all contributing to a Google Doc at the same time as we gather the information we need for a graphic and turn it into the final copy blocks that make their way into the finished piece.

Resources

Parsers and tools in (hopefully) your language of choice.

Integrating with Google Documents

At The New York Times, we normally write ArchieML in Google Documents. Both parsers include quick-start examples for how to download text from Google Docs and run it through the parser. They also include some formatting steps we take, such as converting links to HTML tags.

For more fully-fledged integrations with Google Docs, use one of the plugins above.

Introductory Demo

You can try out ArchieML in the Sandbox.

Keys and values

Strings are stored as key/value pairs, which are defined whenever a line begins with a key followed by a colon. Keys can contain any Unicode character except whitespace, invisible characters, and a handful of characters that have special meaning in ArchieML ( { } [ ] : . + ). The rest of the line is the value.

    key: This is a value
    ☃: Unicode Snowman for you and you and you!

Whitespace surrounding keys and values is ignored. Indent as you like. Keys are case sensitive.

    1: value
    2:value
    3 : value
    4: value
    5: value

    a: lowercase a
    A: uppercase A

Lines that don't look like keys or other special commands are ignored:

    This is a key:

    key: value

    It's a nice key!
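As a rough illustration of the rules above, here is a minimal Python sketch of key/value matching. It is not the code of any official parser: the regular expression and the function name `parse_key_values` are assumptions made for this example, and dotted keys are deliberately left out.

```python
import re

# Assumed pattern for this sketch: optional whitespace, a key made of
# characters that are neither whitespace nor one of { } [ ] : . +,
# then a colon, then the value up to the end of the line.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*?)\s*$")


def parse_key_values(text):
    """Collect key/value lines into a dict and skip every other line."""
    data = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            key, value = match.groups()
            data[key] = value  # keys are case sensitive, values are strings
    return data
```

Calling `parse_key_values` on the last example above keeps only the `key: value` line, because the two surrounding sentences do not start with a single token followed by a colon.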

Nested key structure

Use dot-notation to create nested objects.

    colors.red: #f00
    colors.green: #0f0
    colors.blue: #00f

You can also use "object" blocks to namespace a group of keys.

    {colors}
    red: #f00
    green: #0f0
    blue: #00f

    {numbers}
    one: 1
    ten: 10
    one-hundred: 100
    {}

    key: value

You can end an object with a line beginning with {}, or by beginning a new object. ArchieML is parsed one line at a time, so ending "tags" like {} are never required.

Dot notation can be used in object namespaces as well:

    {colors.reds}
    crimson: #dc143c
    darkred: #8b0000

    {colors.blues}
    cornflowerblue: #6495ed
    darkblue: #00008b
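The way dot-notation and object blocks combine can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the reference implementation: `set_path`, `parse_objects` and both regular expressions are names and patterns invented for this example.

```python
import re

# Assumed patterns for this sketch; dots are allowed inside keys here.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")
SCOPE = re.compile(r"^\s*\{\s*([^\s{}\[\]:+]*)\s*\}")


def set_path(data, path, value):
    """Store value under a dotted path, creating nested dicts as needed."""
    *parents, leaf = path.split(".")
    node = data
    for name in parents:
        # Assumption of this sketch: a non-dict value in the way is replaced.
        if not isinstance(node.get(name), dict):
            node[name] = {}
        node = node[name]
    node[leaf] = value
    return data


def parse_objects(text):
    """Parse key/value lines, prefixing keys with the current {scope}."""
    data, scope = {}, ""
    for line in text.splitlines():
        scope_match = SCOPE.match(line)
        if scope_match:
            scope = scope_match.group(1)  # a bare {} resets to the top level
            continue
        kv = KEY_VALUE.match(line)
        if kv:
            key, value = kv.groups()
            set_path(data, f"{scope}.{key}" if scope else key, value)
    return data
```

Because the scope is just a prefix that is applied line by line, starting a new `{block}` replaces the previous one, which is why a closing `{}` is optional.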

Arrays of objects

Groups of keys can be placed inside an array by giving the array a name within brackets. The name of the array can be any valid key, and can use dot-notation. You can optionally end an array with an empty set of brackets, or by beginning a new array.

    [scope.array]
    []

All keys inside the array are inserted into a single object within the array. The parser remembers the first key it found, and whenever it encounters it again, a new object is started.

    [arrayName]

    Jeremy spoke with her on Friday, follow-up scheduled for next week
    name: Amanda
    age: 26

    # Contact: 434-555-1234
    name: Tessa
    age: 30

    []
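The "first key starts a new object" rule can be sketched as follows. This is an illustrative Python sketch that only handles the lines between the opening and closing brackets; the function name `group_into_objects` and the regular expression are assumptions made for this example.

```python
import re

# Assumed key/value pattern for this sketch.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")


def group_into_objects(lines):
    """Group key/value lines into objects, splitting on the first key seen."""
    items, first_key = [], None
    for line in lines:
        match = KEY_VALUE.match(line)
        if not match:
            continue  # notes and other unstructured lines are skipped
        key, value = match.groups()
        if first_key is None:
            first_key = key  # remember the first key of the array
        if key == first_key:
            items.append({})  # seeing it again starts a new object
        items[-1][key] = value
    return items
```

Applied to the body of the example above, the sketch yields two objects, one for Amanda and one for Tessa. The note about Jeremy and the `# Contact` line are skipped because neither begins with a single token followed directly by a colon.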

Arrays of strings

You can also create "simple" or "flat" arrays of strings. If an asterisk is encountered first within an array, that array becomes a simple array, and key/value pairs within it are ignored. If a key/value pair is encountered first, asterisk lines are ignored instead.

    [days]
    * Sunday
    note: holiday!
    * Monday
    * Tuesday

    Whitespace is still fine around the '*'
      *   Wednesday

    * Thursday

    Friday!
    * Friday
    * Saturday
    []
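One way to picture how an array settles on being simple or object-based is the sketch below. It is a hedged Python illustration of the rule described above, not the actual parser; `parse_array_body` and both patterns are hypothetical names and assumptions for this example.

```python
import re

# Assumed patterns for this sketch.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")
BULLET = re.compile(r"^\s*\*[ \t]*(.*?)\s*$")


def parse_array_body(lines):
    """Return a list of strings or a list of dicts, decided by the first match."""
    mode, items, first_key = None, [], None
    for line in lines:
        bullet, kv = BULLET.match(line), KEY_VALUE.match(line)
        if mode is None:
            # The first asterisk or key/value line fixes the array type.
            mode = "simple" if bullet else "complex" if kv else None
        if mode == "simple" and bullet:
            items.append(bullet.group(1))
        elif mode == "complex" and kv:
            key, value = kv.groups()
            first_key = first_key or key
            if key == first_key:
                items.append({})
            items[-1][key] = value
    return items
```

For the `[days]` example, the first meaningful line is `* Sunday`, so the sketch returns the seven day names and drops `note: holiday!` along with the two stray sentences.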

Nested arrays

Array elements can contain arrays of their own. To begin an array while inside an array element, prepend its name with a period.

    [array]
    [.subarray]
    [.subsubarray]
    key: value

Unlike top-level arrays, nested arrays must be "closed" with empty brackets in order to move up to the parent level.

    [days]
    name: Monday
    [.tasks]
    * Clean dishes
    * Pick up room
    []

    name: Tuesday
    [.tasks]
    * Buy milk
    []
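Nesting can be modelled with a stack of open arrays, as in the following Python sketch. It is a simplified illustration under several assumptions: the name `parse_nested` and the patterns are invented here, dotted array names are treated as plain keys, and the simple-versus-object array rule is not enforced.

```python
import re

# Assumed patterns for this sketch.
ARRAY = re.compile(r"^\s*\[\s*(\.?)([^\s{}\[\]:+]*)\s*\]")
BULLET = re.compile(r"^\s*\*[ \t]*(.*?)\s*$")
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")


def parse_nested(text):
    root, stack = {}, []  # the stack holds one frame per open array
    for line in text.splitlines():
        tag = ARRAY.match(line)
        if tag:
            nested, name = tag.group(1) == ".", tag.group(2)
            if not name:
                if stack:
                    stack.pop()  # [] closes the innermost array only
                continue
            if nested and stack:
                items = stack[-1]["items"]
                if not items or not isinstance(items[-1], dict):
                    items.append({})
                parent = items[-1]  # attach to the current array element
            else:
                parent, stack = root, []  # a top-level array closes the rest
            parent[name] = []
            stack.append({"items": parent[name], "first_key": None})
            continue
        bullet, kv = BULLET.match(line), KEY_VALUE.match(line)
        if bullet and stack:
            stack[-1]["items"].append(bullet.group(1))
        elif kv:
            key, value = kv.groups()
            if not stack:
                root[key] = value
                continue
            frame, items = stack[-1], stack[-1]["items"]
            if frame["first_key"] is None:
                frame["first_key"] = key
            if key == frame["first_key"] or not items or not isinstance(items[-1], dict):
                items.append({})  # the first key starts a new element
            items[-1][key] = value
    return root
```

The stack explains why the closing `[]` matters for nested arrays: without it, `name: Tuesday` would be read as part of the `tasks` array instead of starting a second element of `days`.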

Freeform arrays

Freeforms are a third type of array that was created to have better control over presentation from within ArchieML.

Unlike regular arrays, which group lines into objects whose values have no order, a freeform preserves the order of its lines. Clients that use ArchieML's output can then use that order to render the values, allowing you to vary the presentation for each array item.

    [+books]
    kicker: Books you should read
    score: ★★★★★!!!
    title: Wuthering Heights
    author: Emily Brontë

    title: Middlemarch
    author: George Eliot
    score: ★★★★☆
    []

Each line becomes its own object, with a type and a value. ArchieML stores the key and its value in these two separate attributes to make it easier to deal with different types of information; rendering logic can always be based on the content of the type attribute.

Freeforms also allow you to type unstructured lines of text, which are included as items in the array with a type of text. Note that this means that comments do not work within freeforms.

    [+text]
    I can type words here...

    And separate them into different paragraphs without tags.
    []
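The shape of a freeform's output can be sketched in Python as below. This is an illustration only: `parse_freeform` and the pattern are assumptions made for this example, and nested objects or arrays inside a freeform are not handled.

```python
import re

# Assumed key/value pattern for this sketch.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")


def parse_freeform(lines):
    """Turn each non-blank line into a {type, value} item, keeping order."""
    items = []
    for line in lines:
        kv = KEY_VALUE.match(line)
        if kv:
            items.append({"type": kv.group(1), "value": kv.group(2)})
        elif line.strip():
            # Unstructured lines are kept as items of type "text".
            items.append({"type": "text", "value": line.strip()})
    return items
```

A renderer can then walk the list in order and switch on each item's `type`, for example choosing one template for `title` items and another for `text` items.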

Having full control over order is useful when arrays need to be mixed with other types of data. For example, you might show a list of events interspersed with general artwork.

    [+events]
    header: My Birthday
    date: August 20th, 1990
    {.image}
    src: http://example.com/photo.png
    alt: Family Photo
    {}

    header: High School Graduation
    date: June 4th, 2008
    []

Multi-line values

Values automatically end when a newline is encountered. But all subsequent text is read into a buffer that can be added to that key. Anchor the end of a multi-line value by following the value with a line beginning with ":end". All whitespace within the block is preserved.

Try removing the last line to see how it changes the output:

    key: value
    More value

    Even more value
    :end

Multi-line values also work within arrays of objects and simple arrays:

    [arrays.complex]
    key: value
    more value
    :end

    [arrays.simple]
    * value
    more value
    :end
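The buffering behavior described in this section can be sketched in Python as follows. It is a minimal illustration for top-level keys only; `parse_multiline` and the patterns are assumptions made for this example rather than the real parser's internals.

```python
import re

# Assumed patterns for this sketch.
KEY_VALUE = re.compile(r"^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*?)\s*$")
END = re.compile(r"^\s*:end\b")


def parse_multiline(text):
    """Parse key/value lines, extending a value only when :end follows it."""
    data, last_key, buffer = {}, None, []
    for line in text.splitlines():
        kv = KEY_VALUE.match(line)
        if kv:
            # A new key discards whatever was buffered for the previous one.
            last_key, buffer = kv.group(1), []
            data[last_key] = kv.group(2)
        elif END.match(line):
            if last_key is not None:
                data[last_key] = "\n".join([data[last_key]] + buffer).strip()
            last_key, buffer = None, []
        elif last_key is not None:
            buffer.append(line)  # held back until an :end confirms it
    return data
```

With the `:end` line present, the buffered lines are appended to the value; without it, the value stays as the single line that followed the key.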

Escape characters

You can place any text inside of a multi-line value. If one of your lines would be interpreted by the parser as a key or some other special command, you may have to escape that line by adding a backslash to the beginning of it. The backslash won't be included in the value.

Try removing the backslashes from the following lines:

    key: value
    \:end
    :end

    key: value
    \more: value
    :end

    key: value
    [escaping * is not necessary if we're not inside an array, but will still be removed]
    \* value
    :end

    key: value
    \:ignore
    \:skip
    \:endskip
    :end
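The effect of the leading backslash can be sketched in Python like this. The helper `read_multiline_value` is hypothetical and simplified: it only looks for `:end` and backslashes, and does not stop at unescaped keys or other commands as a full parser would.

```python
import re

# Assumed pattern for the :end command in this sketch.
END = re.compile(r"^\s*:end\b")


def read_multiline_value(first, following):
    """Join `first` with the lines in `following` up to an unescaped :end."""
    collected = [first]
    for line in following:
        if END.match(line):
            return "\n".join(collected).strip()
        if line.lstrip().startswith("\\"):
            # An escaped line is kept as text, minus its leading backslash.
            line = line.lstrip()[1:]
        collected.append(line)
    return first  # no :end found, so only the first line counts
```

In the first example above, `\:end` is stored as the literal text `:end`, and only the final, unescaped `:end` closes the value.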

Block comments

Wrap text between lines that begin with ":skip" and ":endskip" to ignore blocks of text.

    :skip
    this: text
    will: be ignored
    :endskip

There is also a safety mechanism of sorts built in. When the parser encounters a line beginning with ":ignore" (even if it's within a :skip block), parsing immediately stops, and the rest of the document is ignored.

    key: value
    :ignore

    [array]
    * Blah
    []

    other-key: other value
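Both commands can be thought of as a filter that runs before any other rule, as in this Python sketch. The function name `visible_lines` and the pattern are assumptions for illustration, and matching is case sensitive here, which may differ from real parsers.

```python
import re

# Assumed pattern for the three commands handled by this sketch.
COMMAND = re.compile(r"^\s*:(endskip|ignore|skip)\b")


def visible_lines(text):
    """Return the lines a parser would still look at after :skip and :ignore."""
    kept, skipping = [], False
    for line in text.splitlines():
        match = COMMAND.match(line)
        command = match.group(1) if match else None
        if command == "ignore":
            break  # stop reading, even inside a :skip block
        if command == "skip":
            skipping = True
        elif command == "endskip":
            skipping = False
        elif not skipping:
            kept.append(line)
    return kept
```

Everything returned by `visible_lines` would then go through the usual key, object and array rules; everything else never reaches them.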

Usage

If you use JavaScript or Ruby, we hope you'll try one of the existing ArchieML parsers.

If you want to make a parser yourself (or want the technical details on the format), the full specification is online here.

Questions or concerns? The GitHub repository for this site is available at newsdev/archieml.org, and you can use its Issues page to submit questions or bugs on the spec itself.

Created by Michael Strickland, Archie Tse, Matthew Ericson and Tom Giratikanon / The New York Times

Copyright (c) 2015 The New York Times Company