TXR: an Original, New Programming Language for Convenient Data Munging

Kaz Kylheku <kaz @ kylheku.com>

What is it?

TXR is a pragmatic, convenient tool ready to take on your daily hacking challenges with its dual personality: a whole-document pattern matching and extraction language for scraping information from arbitrary text sources, and a powerful data-processing language to slice through problems like a hot knife through butter. Many tasks can be accomplished with TXR "one-liners" directly from your system prompt. TXR is relatively new: the project started in 2009.

It is difficult to give a small introduction to TXR because it is no longer a small language. The PDF rendition of the reference manual, which takes the form of a large Unix man page, is 720 pages long, excluding any index or table of contents. There are many ways to solve a given data processing problem with TXR.

TXR is a fusion of many different ideas, a few of which are original, and it is influenced by many languages, such as Common Lisp, Scheme, Awk, M4, POSIX Shell, Prolog, Ruby, Python, Arc, Clojure, S-Lang and others.

TXR consists of two languages, which can be used separately or tangled together: the TXR Pattern Language, and TXR Lisp.

A comparison may be drawn between the TXR Pattern Language and the Unix utility Awk. Both provide an implicit, convenient way of scanning input. Awk implicitly reads a file and breaks it into records and fields, which are accessible as positional variables. TXR makes input handling implicit in quite a different way: through a nested, recursive pattern matching notation which binds variables. This approach still handles delimited fields with relative convenience, but it generalizes to messy, loosely structured data, and to data which exhibits different regularities in different sections.

Constructs in the TXR Pattern Language aren't imperative statements, but rather pattern-matching directives: each construct terminates by matching, failing, or throwing an exception. Searching and backtracking behaviors are implicit. The language has structured named blocks with nonlocal exits, structured exception handling, named pattern matching functions, and numerous other features. It is powerful enough to parse grammars, yet simple to use in an ad-hoc way on trivial tasks. Speaking of Awk, TXR in fact contains an implementation of Awk, in the form of a Lisp macro.
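To make the contrast with Awk concrete, here is a rough analogy written in Python rather than in TXR itself. It is only a sketch of the idea: the sample input, the function names `awk_style` and `pattern_style`, and the two regular expressions are all invented for illustration, and the code does not reproduce TXR's syntax, its directives, or its implicit searching and backtracking. The first function splits every line into positional fields; the second describes the expected shape of the whole document and binds named variables, failing as a whole if the shape is not there.

```python
import re

# Hypothetical sample input: a header line, some records, then a trailer.
# Different sections of the text follow different regularities.
SAMPLE = """\
Report for host alpha
user=kaz uid=1000
user=root uid=0
-- end of report --
"""

def awk_style(text):
    """Awk-like view: every line becomes a list of positional fields."""
    return [line.split() for line in text.splitlines()]

# Patterns with named groups stand in for variable-binding match lines.
HEADER = re.compile(r"Report for host (?P<host>\S+)")
RECORD = re.compile(r"user=(?P<user>\S+) uid=(?P<uid>\d+)")

def pattern_style(text):
    """Match the document as a whole and return the variable bindings.

    Returns None when the header does not match, so the result is either
    a successful match with bindings or a failure.
    """
    lines = text.splitlines()
    if not lines:
        return None
    header = HEADER.fullmatch(lines[0])
    if header is None:
        return None  # the whole match fails
    bindings = {"host": header.group("host"), "users": []}
    # Collect consecutive record lines; stop at the first non-matching line.
    for line in lines[1:]:
        record = RECORD.fullmatch(line)
        if record is None:
            break
        bindings["users"].append((record.group("user"), int(record.group("uid"))))
    return bindings

if __name__ == "__main__":
    print(awk_style(SAMPLE)[1])
    print(pattern_style(SAMPLE))
```

In the field-splitting version the program still has to work out which line is which and pull `kaz` out of `user=kaz` by hand. In the pattern version the description of the input does that work, which is the style of input handling the paragraph above attributes to the TXR Pattern Language.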

The other language in TXR is TXR Lisp. This is not an implementation of an existing dialect such as Common Lisp or Scheme, but a new dialect, which contains many new ideas. TXR Lisp is feature-rich, and oriented toward succinct, convenient expressivity. While staying completely true to the Lisp heritage, it takes cues from new scripting and functional languages.

TXR Lisp programs are shorter and clearer than those written in some mainstream languages "du jour" like Python, Ruby, Clojure, JavaScript or Racket. If you find that this isn't the case, the TXR project wants to hear from you; give a shout to the mailing list. If a program is significantly clearer and shorter in another language, that is considered a bug in TXR.

Help Needed

The TXR project is looking for hackers to develop features.

TXR has clean internals that are easy to understand and maintain, and a pleasure to work with. Be sure to read the HACKING guide.

Examples

Here is a collection of TXR solutions to a number of problems from Rosetta Code.

Make a Donation

TXR is truly free software because it is distributed under the two-clause BSD license, which allows every conceivable use, commercial and non-commercial.

If you find TXR to be a valuable tool in your arsenal, here is one way to show your appreciation and support! Developing stuff like this takes countless hours.