Eliminating FORMAT from Lisp

Drew McDermott

February 14, 2003; revised August 16, 2004

The FORMAT statement started with Fortran, in the mid-50's. To print an integer and a real, interspersed with explanatory tags, one would write:

      PRINT 101, I, X
101   FORMAT (2HI=, I5, 6X, 2HX=, F6.2)

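Here 2HI= is a Hollerith constant: the prefix 2H says that the next two characters, I=, are to be printed literally. I5 prints an integer in a field five characters wide, and F6.2 prints a real number in a field six characters wide with two digits after the decimal point.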

The C language took over the same idea, embodied in the printf function from the C standard I/O library. Instead of a separate FORMAT statement, the information about how to print the data was incorporated into a string argument to printf. So the example above became:

printf("I=%5d X=%6.2f\n", i, x);

Early versions of Lisp had built-in I/O procedures such as print, terpri, etc., but they were awkward for any purpose much more complex than printing a single S-expression. At some point someone had the bright idea of importing format into Lisp. The example above became

(format t "I=~5d X=~6,2F~%" i x)

In my opinion, this was a silly mistake. Lisp is a syntactically extensible language, meaning that it is quite easy, using macros, to create arbitrary language extensions, so long as they obey two basic rules: (1) A new statement must look like (op ...), where "..." has balanced parentheses; (2) the lexical conventions inside the new statement must be Lisp's (e.g., most characters (including '*', '+', and such) are ordinary symbol constituents, in contrast to their role in other languages, so adjacent symbols must be separated by whitespace; double quote starts a string; single quote, sharpsign, and a few other characters have special meanings). If you're used to Lisp, these rules are barely noticeable, so that Lisp hackers come to think of it as having the most flexible syntax in the world.

The format statement takes a completely different approach. format is implemented as a function, whose second argument is a string containing instructions on how to print the remaining arguments. This "format control string" is essentially a little program written in a special "format control language." This language doesn't obey any of Lisp's syntax rules. Over time, the format control language has evolved to the point where it contains conditionals, iteration, and even "goto"s. It even has its own compiler, the formatter function.
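To give a sense of how much of a programming language hides inside control strings, here is a small sketch that uses only standard Common Lisp directives. The function name control-language-demo is made up for this illustration; the directives themselves (~:[ for conditionals, ~{ for iteration, ~:* as one form of the "go-to" directive) and the formatter macro are standard.

```lisp
(defun control-language-demo ()
  "Return a list of strings, each produced by a different control-language feature."
  (list
   ;; Conditional: ~:[ prints the first clause for NIL, the second otherwise.
   (format nil "~:[no~;yes~]" t)
   ;; Iteration: ~{ ... ~} loops over a list; ~^ leaves the loop once the
   ;; list is exhausted, so no trailing separator is printed.
   (format nil "~{~A~^, ~}" '(1 2 3))
   ;; Go-to: ~:* backs up one argument, so the same argument is printed twice.
   (format nil "~A and again ~:*~A" 42)
   ;; The "compiler": FORMATTER turns a control string into a function,
   ;; which FORMAT accepts in place of the string.
   (format nil (formatter "~A-~A") 1 2)))
```

Each element of the returned list is ordinary text, but the reader has to decode a separate notation for every one of them.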

If this language were particularly suited to I/O, it might be worth putting up with. But it is unbearably clumsy from the word go. Why should one have to write (format t "x = ~s, y = ~s, z = ~s~%" x y z), and then match up the occurrences of "~s" with the variables to see where x, y, and z are printed? Why not write this instead:

(out "x = " x ", y = " y ", z = " z :%)

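The out macro prints its arguments in order, from left to right. A string is printed literally, as if by princ; any other expression is evaluated and its value is printed at that point in the output; and the keyword :% starts a new line, like "~%" in a control string.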
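To make the idea concrete, here is a minimal sketch of a macro in this spirit. It is not the author's implementation: the names simple-out and show-point are invented for the illustration, and the sketch assumes that strings are written literally, that :% means a newline, and that every other argument is evaluated and printed with princ.

```lisp
(defmacro simple-out (&rest items)
  "Print ITEMS to *STANDARD-OUTPUT*, left to right.
A literal string is written as is, the keyword :% starts a new line,
and any other item is evaluated and its value printed with PRINC."
  `(progn
     ,@(mapcar (lambda (item)
                 (cond ((stringp item) `(write-string ,item))
                       ((eq item :%) '(terpri))
                       (t `(princ ,item))))
               items)
     (values)))

;; Usage: each value appears exactly where it is written.
(defun show-point (x y z)
  (simple-out "x = " x ", y = " y ", z = " z :%))
```

Because the translation happens at macro-expansion time, the call in show-point turns into a plain sequence of write-string, princ, and terpri calls, with no control string to interpret at run time.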

As soon as control structures enter the picture, it becomes even more obvious how much better one can do than format. Suppose we have a structure f1 of type Foo, and we want the print-function for this type to print its label and each element of its contents. Furthermore, if its status is marked :abnormal, we want to print a question mark after the label. At this point the Lisp community had to choose between abandoning format and adding conditionals to it. Unfortunately, they chose to do the latter. Here is what I would like to write:

(out "#<Foo " (Foo-label f1)
     (:q ((eq (Foo-status f1) ':abnormal) "?"))
     (:e (dolist (x (Foo-contents f1))
           (:o " " x)))
     ">")

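Here :q, :e, and :o are markers that out recognizes. (:q --clauses--) is analogous to cond: the first clause whose test is true is selected, and the rest of that clause is treated as further arguments to out, so "?" is printed only when the status of f1 is :abnormal. (:e --exps--) evaluates exps as ordinary Lisp code without printing the results. Within exps, (:o --outstuff--) switches back to printing, treating outstuff as arguments to out.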
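One way such markers could be handled is sketched below. This is an illustration under stated assumptions, not the author's out macro: the names mini-out, expand-out-item, expand-out-items, walk-for-o, and foo-to-string are all hypothetical, values are assumed to be printed with princ, and the search for (:o ...) forms is a naive tree walk that does not understand quoted data or dotted lists.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun expand-out-items (items stream)
    "Translate a list of printing items into a list of Lisp forms."
    (mapcar (lambda (item) (expand-out-item item stream)) items))

  (defun walk-for-o (form stream)
    "Copy FORM, replacing each (:o ...) subform with printing code."
    (cond ((atom form) form)
          ((eq (first form) :o)
           `(progn ,@(expand-out-items (rest form) stream)))
          (t (mapcar (lambda (sub) (walk-for-o sub stream)) form))))

  (defun expand-out-item (item stream)
    "Translate one printing item into a Lisp form that writes to STREAM."
    (cond ((stringp item) `(write-string ,item ,stream))
          ((eq item :%) `(terpri ,stream))
          ;; (:q (test item ...) ...): like COND, but clause bodies are printed.
          ((and (consp item) (eq (first item) :q))
           `(cond ,@(mapcar (lambda (clause)
                              `(,(first clause)
                                ,@(expand-out-items (rest clause) stream)))
                            (rest item))))
          ;; (:e form ...): ordinary Lisp code; (:o ...) inside resumes printing.
          ((and (consp item) (eq (first item) :e))
           `(progn ,@(mapcar (lambda (form) (walk-for-o form stream))
                             (rest item))))
          ;; Anything else is evaluated and its value printed.
          (t `(princ ,item ,stream)))))

(defmacro mini-out (&rest items)
  "Print ITEMS in order; an initial (:to stream) redirects the output."
  (let ((stream-form '*standard-output*)
        (stream (gensym "STREAM")))
    (when (and (consp (first items)) (eq (first (first items)) :to))
      (setf stream-form (second (first items))
            items (rest items)))
    `(let ((,stream ,stream-form))
       ,@(expand-out-items items stream)
       (values))))

;; Usage, following the Foo example in the text.
(defstruct foo label status contents)

(defun foo-to-string (f1)
  (with-output-to-string (s)
    (mini-out (:to s)
              "#<Foo " (foo-label f1)
              (:q ((eq (foo-status f1) ':abnormal) "?"))
              (:e (dolist (x (foo-contents f1))
                    (:o " " x)))
              ">")))
```

The contents are given as strings in the usage example because princ would print symbols in upper case under the default reader settings. A production version would need a real code walker or a macrolet-based scheme in place of walk-for-o.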

The net result for printing f1 is that if f1 has label "tree", is not flagged as abnormal, and has a contents list (eenie meenie minie moe), it would be printed as

#<Foo tree eenie meenie minie moe>

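For a more elaborate example, consider the xapping data type from Guy Steele's Common Lisp: The Language (2nd edition), which is defined as follows:

(defstruct (xapping (:print-function print-xapping)
                    (:constructor xap
                      (domain range &optional
                       (default ':unknown defaultp)
                       (infinite (and defaultp :constant))
                       (exceptions '()))))
  domain
  range
  default
  (infinite nil :type (member nil :constant :universal))
  exceptions)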

where the print-function is defined thus:

(defun print-xapping (xapping stream depth)
  (declare (ignore depth))
  (format stream
          ;; Are you ready for this one?
          "~:[{~;[~]~:{~S~:[->~S~;~*~]~:^ ~}~:[~; ~]~
           ~{~S->~^ ~}~:[~; ~]~[~*~;->~S~;->~*~]~:[}~;]~]"
          ;; Is that clear?
          (xectorp xapping)
          (do ((vp (xectorp xapping))
               (sp (finite-part-is-xetp xapping))
               (d (xapping-domain xapping) (cdr d))
               (r (xapping-range xapping) (cdr r))
               (z '() (cons (list (if vp (car r) (car d))
                                  (or vp sp)
                                  (car r))
                            z)))
              ((null d) (reverse z)))
          (and (xapping-domain xapping)
               (or (xapping-exceptions xapping)
                   (xapping-infinite xapping)))
          (xapping-exceptions xapping)
          (and (xapping-exceptions xapping)
               (xapping-infinite xapping))
          (ecase (xapping-infinite xapping)
            ((nil) 0)
            (:constant 1)
            (:universal 2))
          (xapping-default xapping)
          (xectorp xapping)))

Here is a blow-by-blow description of the parts of this format string:

~:[{~;[~]
    Print "[" for a xector, and "{" otherwise.

~:{~S~:[->~S~;~*~]~:^ ~}
    Given a list of lists, print the pairs. Each sublist has three elements: the index (or the value if we're printing a xector); a flag that is true for either a xector or xet (in which case no arrow is printed); and the value. Note the use of ~:{ to iterate, and the use of ~:^ to avoid printing a separating space after the final pair (or at all, if there are no pairs).

~:[~; ~]
    If there were pairs and there are exceptions or an infinite part, print a separating space.

~ (at the end of a line)
    Do nothing. This merely allows the format control string to be broken across two lines.

~{~S->~^ ~}
    Given a list of exception indices, print them. Note the use of ~{ to iterate, and the use of ~^ to avoid printing a separating space after the final exception (or at all, if there are no exceptions).

~:[~; ~]
    If there were exceptions and there is an infinite part, print a separating space.

~[~*~;->~S~;->~*~]
    Use ~[ to choose one of three cases for printing the infinite part.

~:[}~;]~]
    Print "]" for a xector, and "}" otherwise.
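The individual pieces are easier to follow when run in isolation. The sketch below feeds a few of them to format with made-up arguments; the function name xapping-directive-demo and the sample data are assumptions for this illustration, while the directives are copied from the control string above.

```lisp
(defun xapping-directive-demo ()
  "Return the output of several fragments of the control string, one per element."
  (list
   ;; ~:[{~;[~] chooses a bracket from a boolean.
   (format nil "~:[{~;[~]" nil)
   (format nil "~:[{~;[~]" t)
   ;; ~:{ ... ~} takes a list of sublists; ~:^ stops after the last sublist,
   ;; so no trailing space is printed.
   (format nil "~:{~S~:[->~S~;~*~]~:^ ~}" '((1 nil 10) (2 nil 20)))
   ;; When the flag is true, ~* skips the value and no arrow is printed.
   (format nil "~:{~S~:[->~S~;~*~]~:^ ~}" '((10 t 10) (20 t 20)))
   ;; ~[ selects a clause by number: 0, 1, or 2.
   (format nil "~[~*~;->~S~;->~*~]" 1 7)
   ;; A tilde at the end of a line swallows the newline and the
   ;; whitespace that follows it.
   (format nil "one~
                two")))
```

Note how much of the work is done by arguments that exist only to steer the control string: the flag in the middle of each sublist and the integer passed to ~[ are never printed.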

Folks, you don't have to put up with this nonsense. Here is the civilized way to write the print-function.

(defun print-xapping (xapping stream depth)
  (declare (ignore depth))
  (out (:to stream)
       ;; Print "[" for a xector, and "{" otherwise.
       (:q ((xectorp xapping) "[")
           (t "{"))
       ;; Print the pairs implied by the xapping.
       ;; Whether the element to the left of the arrow comes from
       ;; the list 'd' or the list 'r' depends on whether the
       ;; xapping is a xector.  The arrow and the element to its
       ;; right are printed only if the xapping is neither a xector
       ;; nor a xet.  The element to the right of the arrow always
       ;; comes from 'r'.
       ;; Each pair is followed by a space, except the last.
       (:e (do ((vp (xectorp xapping))
                (sp (finite-part-is-xetp xapping))
                (d (xapping-domain xapping) (cdr d))
                (r (xapping-range xapping) (cdr r)))
               ((null d))
             (:o (if vp (car r) (car d))
                 (:q ((not (or vp sp)) "->" (car r)))
                 (:q ((not (null (cdr d))) " ")))))
       ;; If there were pairs and there are exceptions or an infinite part,
       ;; print a separating space.
       (:q ((and (xapping-domain xapping)
                 (or (xapping-exceptions xapping)
                     (xapping-infinite xapping)))
            " "))
       ;; Given a list of exception indices, print them, each followed
       ;; by an arrow.
       (:e (do ((el (xapping-exceptions xapping) (cdr el)))
               ((null el))
             (:o (car el) "->"
                 (:q ((not (null (cdr el))) " ")))))
       ;; If there were exceptions and there is an infinite part,
       ;; print a separating space.
       (:q ((and (xapping-exceptions xapping)
                 (xapping-infinite xapping))
            " "))
       ;; The infinite part is omitted if nil, printed as "->k" if it's a
       ;; constant k, and printed as "->" if it's "universal".
       (:e (ecase (xapping-infinite xapping)
             ((nil))
             (:constant (:o "->" (xapping-default xapping)))
             (:universal (:o "->"))))
       ;; Print "]" for a xector, and "}" otherwise.
       (:q ((xectorp xapping) "]")
           (t "}"))))

Note that Steele's comments now have a place to go. Of course, now the comments can be about content, not the grubby details of the control string. Furthermore, we no longer have to squeeze the output data into a form intelligible to format, because we can use any Lisp control structure we like. This especially applies to the code for printing the arrow-separated pairs and the "infinite part" of the xapping.

Irritatingly, format control strings are used in more than one Lisp construct. The standard functions error, cerror, and break also use them to control printout. So to eliminate all uses of format control strings we must provide new versions of these constructs as well.
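As a rough illustration of what such a replacement might look like, the sketch below wraps error so that the message is produced by ordinary printing code. The names error-printing and check-percentage are hypothetical and are not taken from the author's library. The one control string that remains is the trivial "~A", which passes the finished message through unchanged.

```lisp
;; For comparison, the conventional call would be:
;;   (error "Percentage ~S is outside the range 0 to 100" n)

(defmacro error-printing (&body printing-forms)
  "Signal an error whose message is whatever PRINTING-FORMS write
to *STANDARD-OUTPUT*."
  `(error "~A"
          (with-output-to-string (*standard-output*)
            ,@printing-forms)))

(defun check-percentage (n)
  "Return N if it lies between 0 and 100; otherwise signal an error."
  (unless (<= 0 n 100)
    (error-printing
      (princ "Percentage ")
      (princ n)
      (princ " is outside the range 0 to 100")))
  n)
```

Because the message is handed to "~A" as an argument, any tilde characters it happens to contain are printed literally rather than being interpreted as directives.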

Acknowledgements: The out macro I've given examples of is a descendant of the MSG macro described in

Eugene Charniak, Christopher Riesbeck, Drew McDermott, and James Meehan. 1987. Artificial Intelligence Programming (2nd edition). Lawrence Erlbaum Associates.