Readable s-expressions and sweet-expressions: Getting the infix fix and fewer parentheses in Lisp-like languages

This page is obsolete; see http://readable.sourceforge.net instead.

Many people find Lisp s-expressions hard to read as a programming notation. This paper discusses various ways to extend/modify s-expressions so they can be more readable without losing their power (such as quasiquoting, macros, and easily-manipulated program fragments). The goal is a notation that can be trivially translated to and from traditional s-expression notation (both by computer and in people’s heads), and isn't dependent on a particular underlying semantic. The paper identifies and discusses three approaches that seem particularly promising: indentation, name-prefixing (so func(x y) is the same as (func x y)), and infix support.

It then defines a particular way of combining these approaches, called “sweet-expressions”, that can be viewed as an essentially backward-compatible extension of s-expressions. A sweet-expression reader can accept typical cleanly-formatted s-expressions without change, but it also supports various extensions (optional syntactic sugar) that make much clearer code possible. This is purely a matter of screen presentation; underlying systems can continue to use s-expressions, unchanged. For example, here’s a trivial Common Lisp program that takes advantage of sweet-expression’s formatting extensions (the Scheme version is similar):

defun factorial (n)         ; Parameters can be indented, but need not be
   if (n <= 1)              ; Supports infix, prefix, & function <=(n 1)
      1                     ; This has no parameters, so it's an atom.
      n * factorial(n - 1)  ; Function(...) notation supported

Sweet-expressions add the following abilities:

Indentation. Indentation may be used instead of parentheses to start and end expressions: any indented line is a parameter of its parent, later terms on a line are parameters of the first term, lists of lists are marked with GROUP, and a function call with 0 parameters is surrounded or followed by a pair of parentheses. A “(” disables indentation until its matching “)”. Blank lines at the beginning of a new expression are ignored. A term that begins at the left edge and is immediately followed by a newline is immediately executed, to make interactive use pleasant.

Name-prefixing. Terms of the form ‘NAME(x y...)’, with no whitespace before ‘(’, are interpreted as ‘(NAME x y...)’. If its content is an infix expression, it is considered one parameter.

Infix. Optionally, expressions are automatically interpreted as infix if they have at least three parameters, the second parameter is an infix operator (by matching an “infix operator” pattern of symbols), and the first parameter is not an infix operator. Otherwise, expressions are interpreted as normal “function first” prefix notation. Infix expressions must have an odd number of parameters, with the even ones being binary infix operators. You must separate each infix operator with whitespace on both sides. You can chain the same infix operator, so (2 + 3 + 4) is fine; to mix different infix operators, use parentheses. Thus "2 + (y * -(x))" is a valid expression, equivalent to (+ 2 (* y (- x))). Infix operators must match this pattern (and in Scheme cannot be =>): [+-\*/<>=&\|\p{Sm}]{1,4}|\:|\|\|
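These infix rules are mechanical enough to sketch in a few lines of Python. The sketch below is my own illustration, not the reference reader; since Python's re module does not support \p{Sm}, that Unicode class is omitted from the operator pattern.

```python
import re

# Infix-operator pattern from the text, minus \p{Sm} (unsupported by re).
INFIX_OP = re.compile(r"^(?:[+\-*/<>=&|]{1,4}|:|\|\|)$")

def is_infix_operator(term):
    return isinstance(term, str) and bool(INFIX_OP.match(term))

def transform(expr):
    """Rewrite (a op b op c ...) as (op a b c ...) when the infix rules apply."""
    if not isinstance(expr, list):
        return expr
    expr = [transform(t) for t in expr]
    if (len(expr) >= 3 and len(expr) % 2 == 1          # odd number of parameters
            and not is_infix_operator(expr[0])         # first term is not an operator
            and is_infix_operator(expr[1])             # second term is
            and all(op == expr[1] for op in expr[3::2])):  # same operator chained
        return [expr[1]] + expr[0::2]                  # (op operand operand ...)
    return expr

print(transform(["2", "+", "3", "+", "4"]))  # ['+', '2', '3', '4']
print(transform(["n", "<=", "1"]))           # ['<=', 'n', '1']
```

A list that fails any of the tests, such as (defun factorial (n) ...), is simply left in ordinary prefix form.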

For more information, see my website at http://www.dwheeler.com/readable.

This paper describes the rationale behind the older sweet-expressions version 0.1; see Sweet-expressions: Version 0.2 for information on the changes made to sweet-expressions since this paper.

Introduction

S-expression notation is a very simple notation for programs, and programs in variants of the Lisp programming language have traditionally been written using s-expressions. In this notation, an operation and its parameters is surrounded by parentheses; the operation to be performed is identified first, and each parameter afterwards is separated by whitespace. So “2+3” is written as “(+ 2 3)”. As noted in the May 2006 Wikipedia, this syntax “is extremely regular, which facilitates manipulation by computer. The reliance on [s-]expressions gives the language great flexibility. Because Lisp functions are themselves written as lists, they can be processed exactly like data: allowing easy writing of programs which manipulate other programs (metaprogramming).” In short, s-expressions are a powerful and regular way to represent programs and other data.
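That regularity is easy to demonstrate in code: a complete s-expression reader fits in about a dozen lines. Here is a minimal sketch in Python (the function names are mine, for illustration only):

```python
def tokenize(text):
    """Split s-expression text into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Read one expression, consuming tokens from the front of the list."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expr
    return token       # anything else is an atom

print(parse(tokenize("(+ 2 3)")))  # ['+', '2', '3']
```

The same dozen lines read both programs and data, which is exactly the property the rest of this paper tries to preserve.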

I’ve written a lot of Lisp code, so I’ve learned to read s-expressions fairly well. (Much of it was written in the late 1980s on a $120,000 system.) But I am never the only one who reads my programs -- I need to make sure others can read them too.

Here’s the problem: for most software developers, programs written solely using s-expressions are hard to read, and they will only voluntarily use programming languages that allow, at least optionally, a more common notation. This is particularly true for the usual infix operations (+, <, and so on). People who use Lisp-based languages all the time eventually learn, but not everyone wants to use them all the time, and even developers who are comfortable with programs in s-expression notation need to share their work with others. Wikipedia notes that “the heavy use of parentheses in S-expressions has been criticized -- some joke acronyms for Lisp are ‘Lots of Irritating Superfluous Parentheses’, ‘Let’s Insert Some Parentheses’, or ‘Long Irritating Series of Parentheses’ “.

Yes, I know the arguments. “S-expressions are powerful and regular” ! Of course they are. They are a wonderful intermediate representation for lots of things, in fact. But they are a terrible user interface, especially if you are trying to share your results with others. People today -- even most programmers -- want systems that are “easy to use”, and one of the best ways to make something easy to use is to make it familiar. Most software developers have been trained for many years to use traditional infix mathematical notation, and S-expression notation fails to use it. Programming is often not a solo effort; development today is practically always a group effort, and readability for a large diverse group matters.

All the loud statements about the power of S-expressions cannot compete with time looking at real programs. Most software developers laugh at languages that are “obviously weak” (to them) because their default parser cannot handle <, *, and - in conventional ways. Below is a trivial Common Lisp program to compute a factorial - now ask, “is this really the most readable format possible for other readers?”

(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

Maybe you think so now, but I use this trivial program as an example to show some alternatives that I think many people would prefer. For example, here’s the same program, but exploiting the abilities of sweet-expressions; a reader for sweet-expressions could read both the previous expression and this one:

defun factorial (n)
  if (n <= 1)
     1
     n * factorial(n - 1)

The 1993 paper “The Evolution of Lisp” by Guy L. Steele, Jr. and Richard P. Gabriel, in section 3.5.1, discusses “Algol-style Syntax”, gives a history, and slyly makes fun of efforts to try to use anything other than S-expressions: “Algol-style syntax makes programs look less like the data structures used to represent them. In a culture where the ability to manipulate representations of programs is a central paradigm, a notation that distances the appearance of a program from the appearance of its representation as data is not likely to be warmly received... it is always easy for the novice to experiment with alternative notations. Therefore we expect future generations of Lisp programmers to continue to reinvent Algol-style for Lisp, over and over and over again, and we are equally confident that they will continue, after an initial period of infatuation, to reject it. (Perhaps this process should be regarded as a rite of passage for Lisp hackers.)” Paul Graham posts a longer version of this quote.

Perhaps. But it’s worth noting that thousands of languages have been invented over the years, and almost none of them use S-expressions as their surface syntax. Smalltalk and Python took many ideas from Lisp (see Norvig’s “Python for Lisp programmers” for more)... but not its surface syntax. Languages like ML and Haskell have strong academic communities... and do not use S-expressions for their surface syntax either. Logo was devised to have Lisp’s power, but intentionally chose not to use its syntax. Dylan actually switched from S-expressions to a more traditional syntax when its designers wanted “normal” people to use it. Yacas is intentionally Lisp-like underneath, but completely abandoned Lisp's surface syntax for normal user interaction (as have essentially all computer algebra systems, even though many have a Lisp underneath).

The fact that so many people are compelled to find a “fix” to Lisp syntax indicates to me that there is a problem... the fact that so many efforts fail suggests that it is a hard problem. Norvig’s Lisp retrospective found that even in 2002, when comparing Lisp to Python and Java, Lisp was far faster and more extensible, and had many powerful properties... but it was gaining few users (it was mostly stagnant). I argue that one key reason is the syntax; Lisp's syntax looks genuinely hostile to most “normal programmers”. Instead of Lisp-like languages gaining converts who adjust to prefix syntax, many people are ignoring or abandoning Lisp-like languages to use languages designed to be readable by humans. John Foderaro correctly said, “Lisp is a programmable programming language”, but people expect standard notation to be already available. Obviously it’s easy to create some kinds of front-end syntax for Lisp-like systems, but they generally don’t catch on -- in part because they usually don’t provide the full power of the system underneath (e.g., you often can’t do backquoting with comma-lifting, or insert new parameters in a control structure, or handle macros well). I prefer to use the best tool for the job; when it's not Lisp, don't use Lisp. But some jobs are naturals for Lisp-based languages, yet their poor readability seriously interferes with that. I think there are ways to retain the power of Lisp-based languages, while improving their readability and retaining their flexibility.

Oh, and let’s debunk one claim here. Steele and Gabriel claim one problem with Algol-like syntax is that there are “not enough symbols”. Yet even if that were true, lots of symbols can be created by combining characters, without resorting to fancy characters. Indeed, <= is a common character combination for “less than or equal to”; nobody has trouble understanding that.

So below, I discuss a lot of past and current work to improve the syntax of Lisp-like languages. I then focus on a few especially promising areas that can make Lisp more readable while still accepting normal s-expression notation and having only minimal changes to it. These include:

Indentation for meaning (like Python and Haskell); there is already work on this, particularly with I-Expressions.

Name-prefixing, so f(x) and (f x) mean the same thing. Norvig earlier experimented with this.

Infix approaches - how in the world can we rationally add infix support? There are lots of options, which I explore.

Goals

My goals were originally for the BitC project, but now I think there might be ways to work more widely for any language that uses Lisp-like notations. Here were my goals for a programming notation for Lisp-like systems:

Readable. It should be more “readable” to the uninitiated; in particular, it should look like more traditional notation. For example, an ideal format would support infix notation for operations that normally use infix (+, -, <=, etc.), support having function names before parentheses, and not require as many parentheses.

Mappable. There needs to be an obvious mapping to and from current s-expression notation, which must work for both data and code. The key advantage of Lisp-like languages is that you can easily manipulate programs as data and vice-versa; that must remain for any modified syntax. Otherwise, there’s no point; there are lots of other very good programming languages that support infix and other nice notations.

General/Standardizable. It should be very general and standardizable across all systems that accept s-expressions (Lisp's original syntax); nobody wants to relearn syntax everywhere. It should be useful in Common Lisp, Scheme, Emacs Lisp, BitC, ACL2, DSSSL, AutoLISP (built into AutoCAD), ISLISP (standardized by ISO), BRL, and so on. A thread about guile noted this very need.

Easily implemented (relatively speaking). It must not require tens of thousands of lines of code. However, if it takes a little extra code to produce nice results, that is fine; better to do things well once. It need not be easily implementable via a few tweaks of an existing reader, though that would be nice. Yes, rewriting a reader is a pain, but readers only have to be written once per implementation. Note that even among implementations of a particular language there is often much variance, so the format needs to be simple enough to support many implementations.

Quote well. In particular, for both forward quote (’) and backquote/quasiquote (‘), it must be easy to find the end of the quote, and for backquote/quasiquote, it would be very desirable to support initial comma (,) and friends to locally reverse the quoting (the whole point of quasiquoting).

Backward-compatible. Ideally, it should be able to read regular s-expressions (at least normally-seen formats of them) as well as the extensions. I’m willing to give a little on this one where necessary. Note: Version 0.2 of sweet-expressions is not perfectly backward-compatible, but most typical s-expressions as people actually use them are also valid sweet-expressions.

Work with macros. There will probably be some tweaks for indenting and infix notation (especially since infix reorders things!), but macro processing should continue to work in most cases.

My notion is that the underlying s-expression system would not change... instead, the system would support a reader that takes an extended notation and converts it into s-expressions. A printer could then redisplay s-expressions later in the traditional notation, or in the same kind of notation used to input them. It would also be nice if the notation did not invite confusion or likely errors.

I’ve written this document to put at least a few ideas down in writing. In the long term, I suspect what needs to happen is for there to be some sort of “neutral” forum where ideas can be discussed and code shared. For any syntax to be widely used, there must be trustworthy, widely-usable implementations. I think there must be at least one FLOSS implementation with a generous license that permits use by both proprietary programs and Free-libre / open source software (FLOSS) programs. Note that the LGPL doesn’t work as intended with most Lisp implementations; Franz has created the Lisp LGPL (LLGPL), a specific clarification of the LGPL for use with Lisp. Lisp code is typically licensed under the LLGPL instead of the LGPL, since the LLGPL clarifies some otherwise sticky issues and ambiguities in the LGPL. Note that any FLOSS implementation should be GPL-compatible, since there is so much GPL’ed code. The implementations would need to be widely portable and modular, so that they can be widely depended on.

Scheme and Common Lisp aren’t really compatible at all. Translators like scm2cl (a Scheme to Common Lisp translator) could certainly help to get started (suggesting that it'd be best to start with Scheme, and then transition at least an initial version to Common Lisp). But in the end, specialized implementations well-tuned to different environments will be necessary for an improved syntax to really work.

There are lots of Lisp code repositories and other Lisp-related sites, including Common-Lisp.net, Lisp.org / Association of Lisp users (ALU), CLiki (Common Lisp Wiki), and schemers.org.

Past Work

Special syntax for various constructs

Most readability efforts focus on creating special syntax for each language construct; these often end up unused (because they cannot keep modifying the grammar to match the underlying system), or end up creating a completely new language less suitable for self-analysis of program fragments.

“The Evolution of Lisp” lists many efforts to create "Algol-like notations for Lisp", which generally included infix notations and attempts to be more “readable”. Originally Lisp was supposed to be written in M-expressions, which were more traditional in format. Function calls were written as F[x;y...] instead of (F x y...), for example. One problem is that they kept adding new syntax, and never found a good “final” M-expression format, so it was never implemented... and it just receded into the never-finished future. Many other notations were developed, including those of LISP 2. These generally had if ... then ... else and other more traditional naming conventions.

For example, RLisp (used by Reduce, among others) is a Lisp with an infix notation. A yacc grammar for RLisp by A. C. Norman (2002) is available. Norman notes that the grammar is “ambiguous or delicate in several areas”:

It has the standard “dangling else” problem.

If R is a word tagged as RLIS, then R takes as its operands a whole bunch of things linked by commas. At present I have this grammar ambiguous on R1 a, b, c, R2 d, e, f; where R2 could (as far as the grammar is concerned) be being given one, two or three arguments. This problem arises if the operands of R may themselves end in an R. This is harder to avoid than I at first thought - one might well want conditionals in the arg list of an R, but then R1 a, IF x THEN R2 b, c; comes and bites. I guess this is a “dangling comma” problem.

The above two problems are resolved by the parser generator favouring shift over reduce in the ambiguous cases.

“IN”, “ON” are both keywords, as used in for each x in y do ... and words with the RLISTAT property. This is sordid! Similarly “END” has a dual use. This is coped with by making special provision in the grammar for these cases.

One trouble with many of these notations is that it becomes difficult to see where the end of an expression is (e.g., there might be no way to indicate the “end” of the if statement); this can create ambiguities and makes it harder to match the infix notation to its s-expression when you need to. And, if you want to describe arbitrary Lisp s-expressions, not having an end-marker means that you may not be able to access some of the capabilities of the underlying s-expressions (though that may not be an issue for all uses).

Using a hierarchy of Domain Specific Languages in complex software systems design by V. S. Lugovsky discusses using Lisp to create domain-specific languages, including syntactic transformations.

The ACL2 language is Common Lisp-based, but it has a separate front-end for a more traditional interface including an infix processor; it is named IACL2. This is not often used, for several reasons:

ACL2 is only defined in s-expression form, so you cannot read any of the documentation or examples without knowing s-expression form anyway.

IACL2 is not as portable as ACL2, nor as well supported.

IACL2 does not support all the capabilities of ACL2 -- and who wants to use a tool that is known to not work when you most need it?

Logo is basically Lisp with an infix and more readable syntax. Instead of “(”...”)”, Logo uses “[”...”]”. Normally, all commands begin with the name of the function, just like Lisp, and Logo even has text names for math functions: sum, product, difference, and quotient. More interestingly, infix is also available... by using symbols, Logo automatically uses the infix forms instead. Logo knows the number of parameters for each function, so once the number of parameters is provided you can just provide another function call on the same line without any marking of the end of the call (see the “rt” call below). This is not as flexible as s-expressions, where you can always add another parameter. Here’s a sample Logo program:

to spiral :size
  if :size > 30 [stop]   ; a condition to stop
  fd :size rt 15         ; many lines of action
  spiral :size * 1.02    ; the tailend recursive call
end
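Logo's delimiter-free call syntax can be sketched with an arity-driven parser. The code below is my own illustration (the arity table is hypothetical), not Logo's implementation, and it ignores Logo's infix operators and brackets:

```python
# Inputs per procedure (assumed for this sketch); a real Logo knows these
# from each procedure's definition.
ARITY = {"fd": 1, "rt": 1, "spiral": 1}

def parse_line(tokens):
    """Split a flat token list into calls using each procedure's arity."""
    calls = []
    while tokens:
        name = tokens.pop(0)
        # Once a call has consumed its known number of inputs, the next
        # token must begin a new call -- no end marker needed.
        args = [tokens.pop(0) for _ in range(ARITY.get(name, 0))]
        calls.append([name] + args)
    return calls

# "fd :size rt 15" parses as two calls with no delimiter between them:
print(parse_line(["fd", ":size", "rt", "15"]))  # [['fd', ':size'], ['rt', '15']]
```

The price of this convenience, as noted above, is that you cannot simply add another parameter to a call the way you can in s-expressions.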

Dylan is another Lisp with more conventional notation, including an infix format. Here’s a Lisp-to-Dylan translator, exploiting the Common Lisp pretty-printer. D-Expressions: Lisp Power, Dylan Style even shows it is possible to combine infix forms with Lisp’s abilities to manipulate programs. Dylan is a little wordy, in part because of namespace problems (types are in the same namespace, which is often not what you want), but it’s easy to read. It ends blocks with “end blockname”, e.g., “if (a) b else c end if”; a little long but clear. Here’s a simple example from the page Procedural Dylan:

define method distance (x1 :: <real>, y1 :: <real>,
                        x2 :: <real>, y2 :: <real>)
 => distance :: <real>;
  sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1))
end method distance;

There is also CGOL (an Algol-like language that compiles into Common Lisp); this was developed originally by Vaughan Pratt, and written by a UC Berkeley graduate student, Tom Phelps. This program is a Common Lisp implementation of CGOL that is translated into Lisp before execution. I do not know what its license is. You can look more generally at the CMU archive, though beware of licensing problems.

Clike (see also Clike on Sourceforge) is a compiler which converts code written in a simple C-like language to Scheme. Pico is an educational Scheme with C-ish syntax.

TwinLisp is a "new way of programming in Common Lisp"; it creates a new reader with a more conventional syntax, which it then translates to Common Lisp and executes. It predefines precedence - which unfortunately means that it is tied to a particular interpretation of all operations. It also defines special control structures; thus, like most past efforts to create "readable Lisp", it ends up sacrificing generality of the resulting format (the reader is rigged to a specific set of operators with special syntax - so creating new operators is not simply a matter of defining the meaning of the operator).

Lisp resources gives pointers for where to go for more information. For extensions specifically focused on indentation, see the text below on indentation; for extensions specifically focused on infix notation, see the section on infix.

Skill

Skill from Cadence is a proprietary Lisp-based extension language. By examining that paper, this text, and this document, a hint of its syntax can be inferred. Skill supports name-prefixing, where FUNC(x y) is treated as meaning (FUNC x y). Skill also supports infix processing, automatically translating infix operators into an internal prefix format. In Skill, it appears that all of the prefix functions have alphabetic names, e.g., "(plus 3 4)" adds 3 to 4. This means that infix operators can be unambiguously detected and used without surrounding whitespace; thus "3+4" is automatically translated to "(plus 3 4)". Skill does not consider indentation significant.

I point out Skill specifically because it shows that it's possible to have a very general syntax that is easier to use.
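Why Skill can accept "3+4" without surrounding whitespace can be sketched as follows. This is my own illustration with hypothetical operator names, not Skill's actual lexer: because every prefix function name is alphabetic, a run of operator characters can only be an infix operator, so the lexer may split on it safely.

```python
import re

# Hypothetical operator-name table (illustrative; not Skill's real names
# beyond "plus", which the text mentions).
OPS = {"+": "plus", "-": "difference", "*": "times"}

def translate(text):
    """Turn whitespace-free infix like '3+4' into an alphabetic prefix form."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+\-*]", text)
    result = tokens[0]
    i = 1
    while i < len(tokens):            # left-to-right fold; no precedence here
        op, rhs = tokens[i], tokens[i + 1]
        result = f"({OPS[op]} {result} {rhs})"
        i += 2
    return result

print(translate("3+4"))    # (plus 3 4)
```

A real implementation would also handle operator precedence; this sketch folds strictly left to right.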

Ioke

Arc and its influence: Syntax as Abbreviation

Paul Graham began devising a Lisp variant named Arc, and promulgated the term Syntax as abbreviation for his approach. Here’s what he said:

Common Lisp and Scheme only directly support s-expressions; “disadvantage: long-winded”.

Dylan and Python have s-expressions hidden underneath, “disadvantage: macros unnatural”.

Arc has inspired many people to re-evaluate Lisp syntax, and see if there are ways it can be made more readable. Some comments on Arc made some interesting points:

Peter Norvig said, “For the Scheme I wrote for Junglee’s next-generation wrapper language, I allowed three abbreviations: (1) If a non-alphanumeric symbol appeared in the first or second element of a list to be evaled, then the list is in infix form. So (x = 3 + 4) is (= x (+ 3 4)), and (- x * y) is (* (- x) y). And (2), if a symbol is delimited by a “(”, then it moves inside the list. So f(a b) is (f a b), while f (a b) is two s-exps. And (3), commas are removed from the list after infix parsing is done, but serve as barriers to infix beforehand, so f(a, b) is (f a b), while in f(-a, b + c), each of -a and b + c gets infix-parsed separately, and then they get put together as (f (- a) (+ b c)). This seemed to satisfy the infix advocates (and annoy some of the Scheme purists). You might consider something like this.”

Here are some of my thoughts about this. Rule 1 (the check on the first and second elements of the list) is a little hackish, and by having a check on the first element too, you can’t use usual Lisp notation at all for infix operators. Rule 3 (removing commas) is simple. Amazingly enough, this doesn’t interfere with quasiquoting as long as commas are always required as function parameter separators; since you cannot have a null-length parameter, any parameter beginning with a comma has a comma for lifting! But that doesn’t mean that the resulting code is easy to read; all those extra commas can make it hard to find the comma that is doing the lifting, and unless prefix operators are allowed, they aren’t needed for parameter separation... they just create more syntactic noise. So I haven’t gone further with this, though others could. Rule 2 is a neat idea; there’s the minor risk of bad spacing causing problems, but it is a trivial way to get more traditional functional notation without harming the syntax (functions cannot have “(” in their name anyway). Indeed, I think there’s a good idea here, as I discuss below in “Name-prefixing”.
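Norvig's rule 2 is simple to implement at the token level; here is a sketch (my own code, not the Junglee reader) showing how a name touching "(" moves inside the list:

```python
import re

def tokens_with_prefixing(text):
    """Tokenize, treating 'f(' as '( f' (Norvig's rule 2)."""
    out = []
    # A token is: a name fused to "(", a bare paren, or an ordinary atom.
    for tok in re.findall(r"[^\s()]+\(|\(|\)|[^\s()]+", text):
        if len(tok) > 1 and tok.endswith("("):
            out += ["(", tok[:-1]]   # f( ...  becomes  ( f ...
        else:
            out.append(tok)
    return out

print(tokens_with_prefixing("f(a b)"))   # ['(', 'f', 'a', 'b', ')']
print(tokens_with_prefixing("f (a b)"))  # ['f', '(', 'a', 'b', ')']
```

The whitespace before "(" is what distinguishes the two cases, which is the "minor risk of bad spacing" mentioned above.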

Sudhir Shenoy’s “ideas from Perl” said: “Please don’t use parentheses for s-expressions (’[’ and ‘]’ or even ‘{’ and ‘}’ would be preferable. The biggest complaint I have about parentheses is that it makes infix mathematics (which you are planning to introduce in Arc) really hard to read. e.g. (def interest (x y) (expt (-r * (t2 - t1) / (days-in-year)))) reads really badly when compared to [def interest (x y) [expt (-r * (t2 - t1) / [days-in-year])]]. The overloading of meanings of parentheses (grouping + s-expression) which isn’t a problem in Lisp may cause a loss of readability in Arc.”

Kragen Sitaker’s email “two-dimensional syntax for Lisp” summarizes some ideas from Paul Graham and Arc, but the author seems to add his own points too. He says:

”Paul Graham notes that you can infer some parens in Common Lisp from indentation and newlines. In particular, if a line contains more than one s-expression, it must be a prefix of a list of those s-expressions. If a sequence of lines is indented from the previous text line, it must be a continuation of that line’s s-expression... All of this leads to needing extra parens around invocations of parameterless methods. Oh well.” He explicitly assumes that adding parentheses will not disable indentation processing.

”In general, infix-to-prefix transformation should be relatively simple, reversible, and allow mixing of infix and prefix expressions; infix operators can’t be infix if they’re the first element of a list anyway. However, infix expressions could be valid prefix expressions; how about (reduce + mylist), for example? OCaml solves this problem by adopting the infix interpretation where there is ambiguity, requiring parens around infix operators used as values. This probably isn’t the best solution for a Lisp, but you could probably require (function +) instead of (+) and get away with it. I’d like to be able to use at least (+ - * ** / mod) infix, with their usual precedence; and a left-associative : for cons would shorten a heck of a lot of Lisp programs.”

“On multiline string syntax: it’s ridiculous to include the leading spaces on successive lines. Spaces to get subsequent lines of the string in to where the first line of the string started should not be included in the string; their absence should simply be a syntax error.”

“These syntax rules still don’t help cond much, either the Arc variant of cond or the traditional one.” He suggests borrowing McCarthy’s syntax from the 1960s: “condition => consequent”. I (Wheeler) think that using “=>” this way would be a bad idea, at least in Scheme; Scheme already uses “=>” for a completely different meaning inside “cond” constructs. Another operator could work, though finding a clear one may be more difficult. Often “->” means “implies”, and having two similar-looking operators with different semantics is a bad idea anyway. Actually, I think that when you add indentation (see below), constructs like “cond” look pretty reasonable, so this may be best solved a different way.

A conversation about Arc and related ideas is posted on reddit.

BitC version 0.9 uses this “syntax as abbreviation” approach. You can indicate that “Value” has type “Type” using the quasi-function “the”, e.g., as (the Type Value), but the preferred form is “value : type”. It does not go far right now, though; + is still prefix (not infix).

Infpre is a Common Lisp infix utility (LGPL license).

Quack is yet another Lisp derivative. It adds two syntactic items. First, "postfix colon syntax": a colon with indentation as a shortcut for parentheses. Second, "infix colon syntax": a colon without spaces around it is a shortcut for a function/macro call, right-associative, so cdr:cdr:obj is (cdr (cdr obj)). He doesn't know if they're workable, and I don't find the examples convincing. Indentation has promise, but the colon ends up in a bad place and it's completely unnecessary. And while an infix operator like the infix colon is useful, making it the only infix operator is rather weak; where's 3 + 4?

Lisp sans parentheses hell discusses some similar ideas:

1. We use () to indicate grouping. ((((a)))) will mean the same thing as (a).

2. We use right associative infix operator notation - a * b + c translates to (* a (+ b c)) in lisp.

3. We use indentation (tab-size = 4) to convey grouping - so

   say hello
       to lisp

   becomes (say hello (to lisp)).

4. We use square brackets notation to encode lists - for instance - [a,b,c] means (a b c). The items in a list can also be separated by line breaks. So

   [ one
     two
     three ]

   means (one two three). Since indentation is significant,

   [ one two
         three ]

   means ((one two) three).

There are a few special operators that help reduce the need for parentheses and aid readability -

1. a : b translates to (a . b) and due to right associativity, a : b : c translates to (a b . c)
2. a :: b translates to (a b)
3. a := b translates to (define a b)

Here are the cases where you won’t need parentheses -

1. Your expression starts on its own line. In this case, the first term is the head term and the rest of the terms on the same line are tail terms.
2. Your expression is the second argument to an infix operator. For example - mid :: quotient (low + high) 2 is equivalent to mid :: (quotient (low + high) 2)

Exceptions -

1. Use func() notation to indicate that you mean (func) (a function call) and not just the value of func.

Limitations of current implementation -

1. Back-quoting not supported.
2. The use of comma separator outside of list expressions is not defined and handled well enough.

Enjoy! ... and, for the record, I’ll stick to the parentheses, thank you :)

So now, let’s turn to three areas that have great promise, and then see if we can combine them together.

Indentation

Lisp programs are normally presented using indentation; the Common Lisp standard even includes a pretty printer! Experienced Lisp programmers eventually stop seeing the parentheses, and instead see the structure of a program as suggested by its indentation. So why not use indentation to identify structure, since people do anyway, and eliminate many unnecessary parentheses?

Other languages, including Python, Haskell, Occam, and Icon, use indentation to indicate structure, so this is a proven idea. More recently developed languages like Cobra (a variant of Python with strong compile-time typechecking) have adopted indentation too, so indentation-sensitive languages are clearly considered useful by many. The article In praise of mandatory indentation argues that mandatory indentation can be helpful.

Past work on indenting to represent s-expressions

Paul Graham (developer of Arc) is known to be an advocate of indentation for this purpose. As I noted above, Kragen Sitaker’s notes on Graham and Arc discuss how indentation can really help (in this notation, functions with no parameters need to be surrounded by parentheses, to distinguish them from atoms - “oh well”). Graham's RTML is implemented using Lisp, but uses indentation instead of parentheses to define structure. RTML is a proprietary programming language used by Yahoo!’s Yahoo! Store and Yahoo! Site hosting products, though Yahoo is transitioning away from it. See Paul Graham’s comments about the RTML language design, and Yahoo’s introduction to RTML.

Darius Bacon's ”indent” file includes his own implementation of a Python/Haskell-like syntax for Scheme, using indentation in place of parentheses; in that file he also includes Paul D. Fernhout's implementation of an indentation approach. Bacon's indentation syntax uses colons in a way that is limiting (it interferes with other uses of the colon in various Lisp-like languages). I have not had a chance to examine Paul D. Fernhout's implementation yet. (The file also includes an I-expression implementation.) All of the files are released under the MIT/X license. (Darius Bacon also created mlish, an infix syntax front end listed earlier.) Lispin discusses a way to get S-expressions with indentation.

I-expressions

I-expressions are an alternative method for presenting s-expressions (either program or data), using indentation. They are defined in SRFI ("surfie") 49; this has final status, making I-expressions a quasi-official part of Scheme. I-expressions have no special cases for semantic constructs of the language. SRFI 49 includes a sample implementation with an MIT-style license (based on the Sugar project).

Here’s an example, quoted from Scheme Request for Implementation (SRFI) number 49 (this example uses Scheme’s “define” function, not Common Lisp’s “defun” function):

define
  fac x
  if
    = x 0
    1
    * x
      fac
        - x 1

I-expressions can include traditional s-expression representation too; here's an example (using Scheme):

define (fac x)
  if (= x 0)
    1
    * x
      fac (- x 1)

Both are equivalent to the traditional s-expression:

(define (fac x) (if (= x 0) 1 (* x (fac (- x 1)))))

The keyword “group” is used to begin a list of lists. (One minor drawback: it's more difficult to use a function named "group", so it's best to simply not create such a function.) You can drop back several indentation levels without dropping them all:

let
  group
    foo
      + 1 2
    bar
      + 3 4
  + foo bar

The SRFI permits both tab and space characters. I-expressions can also be quoted (including quasi-quoted); see the proposal for details. Note that this means a ’ followed by whitespace now begins a quote, because the initial whitespace is significant, and newlines become important.

In any indentation system for s-expressions, you need to be able to express s-expressions such as (f a), (g (h 1)), and (j (k)). In the first case, the first parameter is a simple symbol; in the second case, the first parameter is a call with at least one parameter; and in the third case, the call to k has no parameters at all. I-expressions represent 0-parameter function calls by surrounding the call with (...) (or having () as a name-ender). Let’s see what the indentation rules by themselves look like:

; Here is (f a):
f a
; or
f
  a

; Here is (g (h 1)):
g
  h 1
; or
g
  h
    1
; or
g (h 1)

; Here is (j (k)), note the treatment of 0-parameter calls:
j (k)
; or
j
  (k)

The idea is simple, but it can be hard to reason about how it works -- so here is one way to think about it (if you're using this to write a program). On any line, the first term is the function to be called; the rest of the items on the line, and each indented line below, are its parameters. If a line has no other parameters (on the rest of the line or indented), then it's an atom; otherwise, it's treated as one expression and parentheses surround that line and its indents (if any). The “group” term creates an extra surrounding (...) in the resulting s-expression, so you can create lists of lists.
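To make this concrete, here is a minimal sketch in Python of the rule just described. This is my own illustration, not the SRFI 49 reference implementation: it ignores parentheses, quoting, and many edge cases, and it handles “group” only at the start of a line.

```python
def parse_block(lines, i=0):
    """Parse lines[i] plus its indented children.
    Returns (expression, index of the next unconsumed line)."""
    indent = len(lines[i]) - len(lines[i].lstrip(' '))
    terms = lines[i].split()          # first term is the head, rest are parameters
    j = i + 1
    children = []
    while j < len(lines):
        child_indent = len(lines[j]) - len(lines[j].lstrip(' '))
        if child_indent <= indent:    # no longer indented under this line
            break
        child, j = parse_block(lines, j)
        children.append(child)
    if terms[0] == 'group':
        return terms[1:] + children, j   # "group": its items become the list itself
    if len(terms) == 1 and not children:
        return terms[0], j               # a lone term with no parameters is an atom
    return terms + children, j

def to_sexp(e):
    """Render a nested list as a traditional s-expression string."""
    return e if isinstance(e, str) else '(' + ' '.join(map(to_sexp, e)) + ')'

fac = ["define",
       "  fac x",
       "  if",
       "    = x 0",
       "    1",
       "    * x",
       "      fac",
       "        - x 1"]
print(to_sexp(parse_block(fac)[0]))
# (define (fac x) (if (= x 0) 1 (* x (fac (- x 1)))))
```

Running this on the SRFI 49 “fac” example above yields the expected s-expression, and the same code handles the “group” example, since “group” simply omits its own head from the result.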

So how do you present computing a function and then calling it? Personally, I’d just switch to traditional s-expression notation in this case. But the I-expression representation actually isn’t bad; just use a “group” followed by how to compute the function to be called:

; ((getfunction x) a b) can be represented as:
group
  getfunction x
  a
  b
; or:
group (getfunction x) a b

The SRFI supplies Guile code (not fully portable Scheme code) to implement I-expressions (e.g., it uses define-public rather than define). However, it should be easy to port.

In the I-expression sample implementation a “(” disables I-expression processing until its matching “)”, and this is an intended part of the definition of I-expressions. This has all sorts of wonderful side-effects:

- I-expression parsing becomes very safe to use with existing code: pre-existing oddly-indented code will almost certainly start each expression with an opening parenthesis, disabling indentation processing.
- It supports dealing with text that is very close to running off the right-hand side; just use parentheses to disable indentation processing. Python does the same thing.

There are a few special cases where a reader with I-expressions enabled parses a file differently. When two top-level s-expressions follow each other, either (1) on the same line or (2) on consecutive lines with the second expression indented, the lines will be combined by an I-expression reader. The first seems unlikely to me; the second is a little unfortunate. What's worse, though, is that in a read-eval-print loop, users have to enter return twice to see their results, and it's easy to end up trying to evaluate an empty list. We'll discuss those below in possible extensions to I-expressions.

The mailing list discussion of I-expressions includes a posting of an I-expression pretty-printer; the author says "I just couldn't get to sleep, so I wrote one". Be warned: I believe this pretty-printer has a bug, because it does not handle (f (g)) correctly. The first pretty-printer I wrote for I-expressions in Common Lisp had the same error, so this appears to be a common mistake when implementing this task. The expression (f (g)) should print like this:

f
  (g)

I believe that users who use indenting expressions should only use space characters, and never tabs, to indent. The R5RS Scheme specification doesn't even officially permit tab characters (though implementations generally do). The real problem is that tabs produce a varying number of spaces on real systems; experience with Python suggests that using tabs can cause a lot of problems. Most of today’s text editors can be configured to turn tabs into spaces automatically.

Possible extensions to I-expressions

I-expressions and similar indentation approaches are very attractive. But there are several ways they could be extended or modified, now that I've had a chance to experiment with them.

Immediate execution when beginning at the left-hand edge

If you use indented I-expressions interactively (in a read-eval-print loop), you have to enter an extra blank line (an extra "enter") before the expression is evaluated. That's because the system needs to know when an expression has ended; the system believes you might indent the next line and add more expressions. This is fine for multi-line expressions, but if you do a lot of simple one-line commands it becomes very annoying. And simple commands are likely; few people type long, multi-line definitions into a read-eval loop, because they would stick those into a file instead!

So how can we make indenting easy to use at a command line? We could create a special command for use at the beginning of a line, such as "!", that means "at the end of the line, you're done". Or we could create a special indicator that means "execute now". But these don't seem any easier than entering a blank line, and they are yet another thing to remember.

One simple modification could be to add a special case, based on whether or not text was entered in column 1. There are several obvious variants:

1. If the leftmost edge has a "(", and the entire expression is immediately followed by a newline, then the expression is complete. This has lots of problems; in particular, it doesn't handle evaluating atoms, such as running "x" to see what x contains. So I'll reject that.

2. If text begins at the leftmost edge (no leading horizontal space), and the following term is completely read, followed immediately by a newline, then the expression is complete and the read function returns. Thus, if these are entered at the leftmost edge, they will immediately return:

   x
   (load "file")
   load("file")
   (3 + 4)

   However, the following will not (they will wait for a blank line) if entered on the leftmost edge:

   load "file"
   define x
   3 + 4

   The system will wait for a blank line in these cases, because the first term is followed by a space, not a newline... so it is waiting to see if you'll add more parameters by indenting them. The 'define x' case is the good news (probably), and the 'load' command is the bad news. Another problem is that the rule is a little complicated to explain. With these semantics, if you want to enter an indented expression, you'll need to indent the first line or include at least one parameter (expression) on the same line. If you indent at all on the command line, or enter one or more terms on one line, you must type an extra blank line to execute the sequence; this seems reasonable.

3. If text begins at the leftmost edge (no leading horizontal space), and there are no unmatched parentheses, it is immediately completed at newline. In this approach, you must indent the first line to have an indented expression. This means that if these were entered on the left edge, they would be executed immediately:

   load "file"
   3 + 4
   define x

   The good news with this approach is that 'load "file"' on the left edge works as a user might expect - it will trigger an immediate return. The same is true for 3 + 4... suddenly an interactive Lisp interpreter makes a plausible calculator! The bad news is that 'define x' on the left edge followed by return will also trigger an immediate return of the read expression, which is almost certainly not what was intended. That is not such a big deal in an interactive command loop, since the user could try again. But more seriously, this would be a likely thing to do in a file, and it could cause very hard-to-detect bugs. How can we counter the risks of not noticing the change in semantics for text on the left edge? It might be possible to have a special loop for file-reading that detects the case of "unindented expression followed immediately by indented expression", and triggers a warning or error. Another variant would be to have two slightly different modes for read (or two different starting functions): one intended for interactive use, where unindented lines end on newline where possible, and one intended for file reading (with no such distinction). This does complicate reasoning and implementation, and it's quite normal to cut-and-paste into an interactive session. One promising approach to countering the problem is to configure other tools to help detect such problems. Text editors, for example, could colorize such constructs: a left edge that would run immediately, followed by an indented line, could be flagged in red. Special tools could be designed to detect this as well. One good point is that the rule is simple: "if you want to indent, you must start by indenting the first line". To end an indented sequence, you enter a blank line or another expression at the same indentation level. Frankly, that's much simpler than the rules above.

4. If text begins at the leftmost edge (no leading horizontal space), and there is exactly one term on the line or it is a legal infix expression, then the expression is complete and the read function returns. Thus, if these are entered at the leftmost edge, they will immediately return:

   x
   (load "file")
   load("file")
   (3 + 4)
   3 + 4
   3 + (4 * 2)

   And these will not:

   define x
   load "file"
   3 + 4 * 2
   3 + 4 +
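As a sketch of how option 4's completion test might work, here is an illustrative Python predicate. The names and simplifications are my own: it ignores parenthesized forms such as (load "file"), which a real reader would accept by matching parentheses, and its operator pattern only approximates the one given earlier on this page.

```python
import re

# Approximation of the infix-operator pattern described earlier.
INFIX_OP = re.compile(r'^([+\-*/<>=&|]{1,4}|:|\|\|)$')

def immediately_complete(line):
    """Option 4: a left-edge line is complete at newline if it is a single
    term, or a legal infix expression (odd term count, the same infix
    operator in every even position, non-operators everywhere else)."""
    if line[:1] in (' ', '\t'):          # indented lines never complete early
        return False
    terms = line.split()
    if len(terms) == 1:
        return True                      # a lone atom: evaluate immediately
    if len(terms) % 2 == 0 or len(terms) < 3:
        return False                     # infix needs an odd count of >= 3 terms
    ops = terms[1::2]
    return (all(INFIX_OP.match(t) for t in ops)
            and len(set(ops)) == 1       # only chains of the same operator
            and not any(INFIX_OP.match(t) for t in terms[0::2]))

print(immediately_complete("3 + 4"))       # True
print(immediately_complete("define x"))    # False: waits for more input
print(immediately_complete("3 + 4 * 2"))   # False: mixed operators
```

Note how the same-operator restriction makes "3 + 4 * 2" wait for more input, exactly as the examples above require.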

For sweet-expressions, I kept going back and forth for a long time between option 2 and option 3. Option 3 presents the risk of subtle bugs, and that is of great concern; however, option 3 is a joy to work with interactively. I then said, "This is a hard choice to make, so I plan to experiment. Expressions like load("filename") and (3 + 4) seem particularly easy to explain, and do not have the drawbacks of this alternative, so they suggest using alternative 2." It was only in November 2007 that I finally realized that what I wanted was option 4.

Note that when activated these also partly eliminate one of the minor incompatibilities of I-expressions with traditional s-expressions. If there are two separate outermost s-expressions on consecutive lines, with the second expression indented from the first, as long as the first expression was not indented (a likely case) the lines will no longer be combined by the I-expression reader. Of course, this could be misleading, because the second line might appear to be combined with the first; this is a trade-off with no perfect answer.

Ignoring excess blank lines

Excess blank lines get interpreted oddly, as '(), which the underlying system may then try to evaluate (Scheme in particular complains about them). They can also cause confusion if interpretation gets "out of sync".

A plausible modification would be to ignore any additional blank lines before a (next) expression. In short, a blank line shouldn't really return '() to the system; users who want that should enter it directly. If there are blank lines at the end of the file, the system should probably return the end-of-file marker (if it supports one) once it has consumed the final blank lines while trying to read the next expression.

An alternative would be to ignore any additional blank lines after an expression, but that is less attractive: you would then need to treat the beginning of a file specially (assuming there was a beginning of file).

Also, should horizontal spaces followed by a newline be treated the same as a blank line? Users cannot easily see the difference, especially on typical printouts. The answer isn't clear, so I leave that as an open question for now.
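One way to sketch this blank-line rule follows; the EOF sentinel and the toy reader are my own illustrative stand-ins for a real implementation, and this sketch happens to treat whitespace-only lines as blank (one possible answer to the question above).

```python
EOF = object()   # sentinel returned when only blank lines remain

def read_skipping_blanks(lines, pos, read_expr):
    """Skip blank (or whitespace-only) lines *before* the next expression,
    instead of handing '() to the evaluator for each one. `read_expr`
    stands in for the real reader and returns (expression, next_position)."""
    while pos < len(lines) and lines[pos].strip() == '':
        pos += 1                 # ignore extra blank lines, don't emit '()
    if pos == len(lines):
        return EOF, pos          # consumed trailing blanks: report end of file
    return read_expr(lines, pos)

# toy reader: each non-blank line is one "expression"
toy_read = lambda lines, pos: (lines[pos], pos + 1)
print(read_skipping_blanks(["", "  ", "x"], 0, toy_read))    # ('x', 3)
print(read_skipping_blanks(["", ""], 0, toy_read)[0] is EOF) # True
```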

Indentation, a wrap-up

Lisp’s standard notation is different from “normal” notation in that the parentheses precede the function name, rather than follow it. Jorgen ‘forcer’ Schaefer argues that this is a more serious problem than the lack of infix notation; on July 2000 he said “I think most people would like Scheme a lot better if they could say lambda (expression) ... instead of (lambda (expression) ...”

Peter Norvig had some interesting ideas, as noted earlier. Let’s look at one of his rules, the rule that says “if a function name is immediately followed by an open parenthesis, move it inside the list (when converting to an s-expression)”. This means that “(fact x)” and “fact(x)” will mean the same thing.

Obviously, this is trivial to parse. We don’t lose any power, because this is completely optional -- we only use it when we want to, and we can switch back to the traditional s-expression notation whenever we like. It’s also trivially quoted: if you quote a symbol followed by “(”, just keep going until its matching “)” -- essentially the same rule as before! Technically, this is a change from some official Lisp s-expression notations and implementations. For example, entering “a(b)” into CLISP (a Common Lisp implementation) is the same as “a (b)” -- its parser tries to return the value of a, followed by running the function b. But it’s not clear this is a big change in practice; commonly accepted style always separates parameters (including the initial function call name) with whitespace. So normally, what follows a function call’s name is whitespace or “)”, and this is enforced by pretty-printers. Thus, many large existing Lisp programs could go through this kind of parsing without any change in meaning!
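Because an atom's name can never contain a parenthesis, this rule can even be sketched as a purely textual rewrite. The following Python fragment is my own illustration, not a real reader: it ignores strings, comments, and quoting, which a proper implementation must handle.

```python
import re

def unprefix(text):
    """Rewrite name-prefixed calls, NAME(...), into plain s-expression text
    by turning every 'NAME(' (no space before the paren) into '(NAME '.
    The closing parentheses can stay exactly where they are."""
    return re.sub(r'([^\s()]+)\(', r'(\1 ', text)

print(unprefix("fact(x)"))   # (fact x)
print(unprefix("defun(factorial (n) if(<=(n 1) 1 *(n factorial(-(n 1)))))"))
# (defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
```

Note that “(n)” is left alone, because its “(” is preceded by whitespace; only a name immediately followed by “(” is rewritten.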

Does this help? Let’s rewrite the CL factorial example, but not make the infix operations do this:

defun(factorial (n)
  if((<= n 1)
     1
     (* n factorial((- n 1)))))

It looks slightly more familiar, but not that much. Let’s try again, but move the infix ops out too; the result is actually not bad:

defun(factorial (n)
  if(<=(n 1)
     1
     *(n factorial(-(n 1)))))

If we really wanted it to look conventional, we could use wordy names instead of symbols that are traditionally infix; that isn’t too horrible for + (use “sum”), but that is rather wordy for others (I don’t like it):

defun(factorial (n)
  if(lessequal(n 1)
     1
     product(n factorial(subtract(n 1)))))

Note that, as far as I can tell, you do not need any whitespace after the opening parenthesis, because atoms (including function names) cannot have parentheses in their names in s-expressions. A variant of this idea would be to require whitespace after the opening paren if name-prefixing is used, but I see no need for that.

Skill from Cadence, a proprietary Lisp-based extension language, also supports name-prefixing.

Mathematica does something similar. As noted in How Purely Nested Notation Limits The Language's Utility, its FullForm notation can be transformed into Lisp using simple rules, starting with transforming f[a,b] into (f a b). But it isn't designed to be backward-compatible with existing Lisp notations; its use of "," to separate arguments would cause confusion since "," is also used in macro handling.

Name-prefixing can be combined with indentation (e.g., I-expressions). Let’s presume that if you indent further and begin the expression with a parenthesis, or with a function name immediately followed (with no space) by a parenthesis, that parenthesis and its mate are silently ignored (they don’t add yet another subexpression to the s-expression). Furthermore, let’s presume that the name-ender format (a function name followed immediately by an open paren) does not switch to some “no more I-expression” mode. Then you could do this (using English names for operators):

defun factorial (n)
  if lessequal(n 1)
    1
    product n factorial(subtract(n 1))

defun factorial (n)
  if <=(n 1)
    1
    *(n factorial(-(n 1)))

It turns out that name-prefixing works well with indentation. Here are some examples:

; Here is (g (h 1)) <=> g(h(1))
g h(1)
; or
g
  h(1)

; Here is (g (h)) <=> g(h())
g h()
; or
g
  h()

; Here is (g (a) (b 1) c) <=> g(a() b(1) c)
g a() b(1) c
; or
g
  a()
  b(1)
  c

; Here is (g a (b 1) c()) <=> g(a b(1) c())
g a b(1) c()
; or
g
  a
  b(1)
  c()

This does introduce the risk of someone inserting a space between the function name and the opening “(”. But whitespace is already significant as a parameter separator, so this is consistent with how the system works anyway... this is not really a change at all.

I think this is slightly better for untrained eyes. It’s hard to argue that this is a major improvement, at least by itself, but the more I look at it the more I like it. What’s more, the pain is tiny, and that’s a good thing. This is a lower pain, lower gain approach, and it can be combined with sweet-expressions, which I’ll describe in a moment.

The article Improving lisp syntax is harder than it looks discusses name-prefixing, but I think it makes a number of errors. It first claims that this would be hard to integrate with macros - this actually isn't true if it's built into the reader (and the macros aren't doing reading themselves). If the reader transforms a(x) into (a x), then when the macro has a chance to run all it sees is (a x) - exactly what it was expecting to see. He also says that "Another disadvantage to this change of syntax is that it makes functional programming much more odd looking. Lets say you have a list containing functions and you want to call the first one. In Scheme you write ((car lst) params) and in Common Lisp (funcall (car lst) params). However in our new syntax it looks like: car(lst)(params) and funcall(car(lst) (params)). Neither of these is very elegant, and it only gets worse if that call in turn returns a function, which would look like: car(lst)(params)(params2) and funcall(funcall(car(lst) (params)) (params2))." But I find this remarkably elegant, and better than the traditional notation - to do functional programming, just cuddle up the parentheses. It's much easier to understand sequential parentheses compared to a deeply nested list.

Again, let’s continue the thought that we want to support a syntax that is maximally Lisp-like, generally accepting existing expressions. The biggest issue in making Lisp s-expressions easier to read is whether or not to provide infix support, and if so, how.

The first question is, should you support infix at all? Because if we do, and we presume that the underlying s-expressions always have the operation first, this means that we are changing the external presentation of (some) data. Note that we are only changing the presentation for entering programs and program-like data, and possibly changing how they are externally displayed; the traditional s-expression would still be used internally. Thus, if “{3 + 4}” is interpreted as an infix expression, it will quietly be transformed to the s-expression “(+ 3 4)”, so the “car” (head) of “{3 + 4}” would be “+”. The ll1-discuss mailing list includes discussions on the issues of infix.

The reality is that nearly everyone prefers infix notation; people will specifically avoid Lisp-based systems solely because they lack infix support built-in. Even Paul Graham, a well-known Lisp advocate, admits that "Sometimes infix syntax is easier to read. This is especially true for math expressions. I've used Lisp my whole programming life and I still don't find prefix math expressions natural." Paul Prescod remarked, “[Regarding] infix versus prefix... I have more faith that you could convince the world to use esperanto than prefix notation.” Nearly all developers prefer to read infix for many operations. I believe Lisp-based systems have often been specifically ignored even where they were generally the best tool for the job, solely because there was no built-in support for infix operations. After all, if language creators can’t be bothered to support the standard notation for mathematical operations, then clearly it isn’t very powerful (as far as they are concerned). So let’s see some ways we can support infix, yet with minimal changes to s-expression notation.

If we are willing to support infix, there are many options. The big issues are:

1. How do you determine when to use infix notation? Is this manually specified (if so, how, and how deep does it go?), or is it automatic (if so, how, and how do you override it?)?
2. What are the legal infix operators?
3. What are the semantics? (Do you do something trivial, like swap the first and second parameters; allow many parameters for the same operator; or go all the way to full infix with precedence?)

There are some code samples that might be used as a starting point for implementations; after going over the issues, I identify some of them.

Nathan Baum (of Bournemouth, United Kingdom) posted on 4/14/2006 8:19:16 PM the idea of using [...] to surround infix notation (I’m sure he’s neither the first nor last). He said, “Lisp actually has a few syntactic shortcuts of its own. Essentially anything that isn’t part of a name is a character which triggers a specialised reader function. Strings, for example, don’t need to be built in to the Lisp core: a reader can be associated with the double quote character, and that’ll return a string object. This also means you can redefine it yourself to get shell/Perl/Ruby-style interpolated strings, for example. Even lists themselves are read using a reader function triggered by [parentheses]. One simple shortcut is to have ‘normal’ Lisp expressions delimited by [parentheses], and infix expressions delimited by brackets...

(print (+ x (- (/ y z) a)))
; or
(print [x + [[y / z] - a]])”.

How do you determine when to use infix notation?

There are several major options:

Not automatic (special syntax). If infix is not automatic, you add characters that say “make this infix”, using pairs such as {...} or [...] around infix operators, or using single expressions before the first or second parameter. The single expression might be single characters, like ? or !, or multiple characters, like the dispatching macro characters (#...). I don’t recommend “!” because it looks too much like “not” in other languages. Using something short is a good idea, because infix is very common. The Lisp FAQ mentions #I being used as the “Portable Infix Package” marker (I have searched and not found out more about this; suggestions would be appreciated). (This discussion mentions an author using #I(3+4) to notate infix, and remarks that [3+4] would have been simpler.) Requiring a syntax marker is not as “pretty” as automatically detecting infix operators, though it has the advantage of clarity if you process the resulting s-expressions often. Kantrowitz’s infix macro package for Common Lisp is an example of this approach.

Pairs of characters can force infix interpretation, and seem like the obvious idea, but they do have some disadvantages. Infix operations can work across a list, so it makes sense to mark the end of such a list and use a special pair of characters like {...} instead of (...) to mark an “infix list”. There are two pairs available: {...} and [...]. An advantage of {...} is that braces are more commonly used for statement blocks in other programming languages (Logo being an exception), and in particular users may want [...] for other purposes (such as identifying lists, as BitC does). A disadvantage of {...} is that in many fonts these characters look extremely similar to (...), making them harder to see; the characters [...] are more distinct. But in either case ({...} or [...]), using a pair means you have to match the correct closing pair, so inside many parens you have to make sure you use the right ones in the right places. This can create much extra work, and it is debatable whether they help much in error detection. Also, infix notation is really obvious when you see it... so there really isn’t a big need for a special marker to show its end. Combining this with name-prefixing does not seem hard; just accept “{” for name-prefixing as well.

Combining the paired characters {...} or [...] with indented forms like I-expressions requires careful thinking about what you are doing. In particular, if you want to identify something as a parameter of a function f and switch to infix notation, do you indent inside function f? I think you should, so the system should only add one level of s-expression (from the indent) if the first character after the indenting whitespace is “{”, instead of the two that would otherwise be implied. In all other cases (other than the beginning of a line), a “{” would normally end up matching a “(” in the corresponding s-expression, if you want to stay similar to usual s-expression syntax (if not, see the relaxed infix notations such as ZY-expressions, below). This means that you need to think carefully about the meaning of {...} (or [...]). Basically, don’t think that {...} creates a matching s-expression (...). Instead, have the mindset that the infix operators (+, -, etc.) create the s-expressions, and that the surrounding characters {...} simply clarify that what’s inside has an infix interpretation. Also, do {...} disable I-expressions inside them (as I’ve proposed (...) do), or should a “{” at the beginning of a line only switch to infix without disabling I-expressions? I’m not sure what the meaning of indented sub-expressions would even be if indentation still mattered. Thus, for the moment, I’ll assume that {...} disable I-expressions inside them, as I’ve proposed (...) do, for the simple reason of simplicity: if an infix expression is so complicated that it goes beyond one line, indentation like I-expressions is more likely to harm than help.

Note that if it’s not automatic, there’s a follow-on question: how deep does infix go when it is enabled? Do the infix operators only work at one level (one list), or do they keep going down into the lists contained inside? This question doesn’t come up if the detection is automatic, so let’s address that next.

. If infix is not automatic, you add characters that say “make this infix” , using pairs such as {...} or [...] around infix operators, or use single expressions before the first or second parameter. The single expression might be single characters, like ? or !, or multiple characters, like the dispatching macro characters (#...). I don’t recommend “!” because it looks too much like “not” in other langauges. Using something short is a good idea, because infix is very common. The Lisp FAQ mentions #I being used as the “Portable Infix Package” marker (I have searched and not found out more about this; suggestions would be appreciated). (This discussion mentions an author using #I(3+4) to notate infix, and remarks that [3+4] would have been simpler.) a syntax marker is not as “pretty” as automatically notating infix operators, though it has the advantage of clarity if you process the resulting s-expressions often. Kantrowitz’s infix macro package for Common Lisp is an example of this approach. Automatic. To work automatically, if some parameters look infix, then an infix interpretation is used. E.G., if presented with “(3 * 4)”, it is automatically interpreted as “(* 3 4)” internally. There are two major variants that I can see: (1) if the first or second parameter is an infix-type operator switch to infix notation (Norvig’s approach), or (2) only the second parameter can be infix to trigger infix processing (my approach). The first approach has the advantage that it can work with prefixed “-” traditionally, e.g., (- x * y) works, becoming (* (- x) y). But there are also many problems with that approach. For example, then it is much harder to use traditional Lisp notation (with the infix operator in front) at the same time, and it’s easier to get confused when looking at traditional s-expression notation. It is all too easy to accidentally trigger the infix notation when it wasn’t intended. 
And the traditional Lisp notation (- x) is not that hard to write or read; if you support traditional function notation, like -(x), it's even easier. Finally, if you only trigger on the second parameter, you can add a subtle safety check: the switch to infix notation can be made to only occur if there are three or more parameters. This safety check avoids unintentionally enabling infix in some cases, and makes it very simple to implement trivial quoting mechanisms: since expressions like “(self +)” will not be interpreted as being infix, using functions as parameters is simplified. In fact, you can go further on the safety check: You can say that infix lists have to have an odd number of parameters, and that the first parameter (as displayed) cannot be an infix operator (so a list of infix operators won't cause problems). I think the second approach is much better. For fully-traditional infix notation, the first is slightly better, and we’ll discuss that later, but for our purposes I don't think it's as useful. If it’s automatic, now you need to determine the legal pattern of infix operators, since they will trigger the infix interpretation. This is probably some sort of regular expression; we’ll discuss this below. You’ll also need a syntax to say when it is not infix at least (to disable the automatic part). Again, this can be identified using surrounding pairs such as {...} or [...] around non-infix operators, or using single expressions before the first or second parameter to declare that this is not infix. For functions, in many real-world use cases Common Lisp’s #’ notation or simply “(function +)” would do. Indeed, if at least three parameters are required for infix, any function that simply returns its second parameter could do the escaping. As noted below, I'm currently using "as(+)" aka "(as +)" to do this escaping. Automatic approaches, either way, are in some sense a little “hackish”.. 
the second parameter determines the order of parameters in the resulting s-expression! Yet it produces very nice-looking results, and I suspect that over time it will become workable. After all, in all other languages you have to look for infix operators too, so this is actually not that strange a requirement. And the results (shown elsewhere) are startlingly clear.

You might still have an optional “make this infix” syntax indicator as well, or at least a “warning, this is infix” marker, which can be used when printing. That way, when you read a printed result, you’ll be warned when an expression is infix. If this is optional, you might still want to make it very short, since you may have to read a lot of them in printouts. For example, let’s say that #_ means “no infix” and is placed before the first parameter (mnemonic: “base case”), while ? means “infix notation follows” and is placed before the first (displayed) parameter (#^ might make a reasonable alternative). Thus, “(3 + 4)”, “(?3 + 4)”, and “(#_+ 3 4)” could all mean the same as “(+ 3 4)”, and the “car” of all of them is “+”. Or, if [...] are the infix warnings, [x >= 4] is a way to hint that the s-expression is actually (>= x 4).
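To make the detection rule concrete, here is a small illustrative sketch in Python (not taken from any actual sweet-expression implementation; the function names and the simplified ASCII-only operator test are my own assumptions) of the “second parameter triggers infix” rule with the safety checks described above:

```python
def is_infix_op(term):
    """Hypothetical operator test for the sketch: a nonempty symbol
    made only of common operator characters."""
    return isinstance(term, str) and term != "" and \
        all(c in "+-*/<>=&|" for c in term)

def looks_infix(lst):
    """True if a list should be read as infix under the safety checks:
    at least three items, an odd number of items, the first item not
    itself an infix operator, and every even-position item the same
    infix operator (chaining only, no precedence)."""
    if len(lst) < 3 or len(lst) % 2 == 0:
        return False
    if is_infix_op(lst[0]):        # (+ 3 4) stays traditional prefix
        return False
    ops = lst[1::2]
    return all(is_infix_op(op) for op in ops) and len(set(ops)) == 1

# (3 * 4) is detected as infix; (self +) and (- x * y) are left alone.
```

Note how the odd-length check alone rules out both “(self +)” and Norvig-style prefixed-minus forms like “(- x * y)”, which is exactly the simplification the second approach buys.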

Of course, if the notation permits mixed infix and prefix notations, you can easily have a “manual” notation that just means “turn on autodetection all the way down from here” and end up having a combination of properties. Alan Manuel K. Gloria’s “infix notation macro” is a good example of a mixture - in this implementation, there must be an initial marker (nfx ...), but it then descends all the way down to all subexpressions. As a result, you could use the marker at a very high level, and have a result similar to automatic detection. A similar result could be had by having a function like “(enable-infix)” or “(sugar)” which you could insert at the beginning of a file as a top-level command (this could also enable indenting and the name-prefixing notation), which would then enable the infix operations (even at the topmost level) until specifically disabled. Using such a command would mean that everything below, for the rest of the file, would use automatic infix detection... but since there would be a specific command to enable it first, it would be compatible with existing code (which didn’t enable infix). A command-line flag, or invoking a command with a different name (possibly via a different file extension), could enable automatic detection of infix notation.

What are the legal infix operators?

If detection of infix operators is automatic, you now need to determine the legal pattern of infix operators, since they will trigger the infix interpretation. Even if infix is manually triggered, if you permit arbitrary expressions, or want error-checking of the operators (“you said this was infix, but there were no infix operators”), you’ll need to detect infix operators.

There are several options:

You could support only a fixed list (+, -, etc.). That is very inflexible, though.

You could start with a fixed list, and allow the user to specify which ones to add/override. The additions would probably include information on precedence and direction (right or left), and if so, these additions would need to be made before they are used. One big risk is that if the infix expressions are read before the commands to make the additions, the infix expressions would be misinterpreted.

You could define some sort of pattern. The pattern needs to be simple, so humans can easily remember it. This has the advantage of not depending on proper setup of a table of operators.

You could support default patterns and allow additions as well, if you wanted to. Or define a pattern, but make it an error to USE an operator matching the pattern before declaring it (and maybe its precedence).

If we use a pattern, what is the pattern? Patterns are often expressed as regular expressions (REs), and an obvious RE for this purpose is: [+-\*/<>=]+. Basically, an operator has to be a sequence of one or more special characters. (One alternative, which might be nice, would be to require only that it start with a special character, allowing other characters after that; be wary that under such a rule +h1 would be an infix operator but +1 would not.) This is enough to cover the four arithmetic operators (+, -, *, and /), and all the usual comparisons (<, <=, =, >=, >). It’s also enough to cover other operations that people like to have as infix, such as “**” (exponentiation), “->” (implies), “<->” (if and only if), and so on. You can probably add “&”, which is enough to cover common representations of “and”. But there are some issues, especially with the characters not noted here:

Adding “:” is tempting, because “:” and “::” are sometimes used as infix operators (BitC uses “:” for declaring types). On the other hand, “:” is sometimes used as filler, so there are pluses and minuses to including it in the pattern. In Common Lisp, a “:” by itself is not a problem (it’s illegal normally), but the “:” separates package names from their symbols. In addition, in Common Lisp a blank package name (e.g., :x) is a keyword... so anything with a colon at the beginning or middle is probably going to be a problem. Similarly, a Scheme proposal (SRFI 88) proposed using names ending in “:” as keywords, with SRFI 89 using them for optional parameters. Thus, while a single colon (:) appears to be a fine name for an infix operator, more complex names with colons in them appear to be more problematic for backward-compatibility with existing Lisp-like systems.

One possible concession would be to say that “and” and “or” are specially recognized as infix operators; that’s a kludge, though all Lisp-like languages have “and” and “or” operators, so it would make sense. On the other hand, if infix operators automatically switch lists to infix notation, s-expressions of sentences like “(Jack and Jill)” would silently become “(and Jack Jill)” - and that seems dangerous. It especially seems dangerous if s-expressions are sometimes displayed with automatic infix notation without any indication that the parameter order has changed. So I think this is an unwise choice, even though at first it seems reasonable... but that still begs the question of how to represent "and" and "or", since these are incredibly common infix operations.

Many languages use && or & to represent "and", and || or | to represent "or". Adding “|” or “||” is tempting because it is a common notation for “or”, and in Scheme the character is reserved for possible future extensions anyway. But simply adding it to the list of allowed symbols is more problematic, because in Common Lisp “|” already has a meaning: it escapes symbol text until the next “|”. So if “|” keeps its usual meaning, then “|” as an operator has to be written as “\|” in Common Lisp (yuck). But there's a clever trick here - the syntax "||" can always be used to mean "or", whether or not "|" has the special role of escaping symbol text. This would mean that in Common Lisp the function's name would actually be an empty string (!)... but since its name would print as "||", it would look quite clear and understandable. This wouldn't be combinable with "=" (e.g., "+=" is a fine atom, but "||=" would immediately simplify to "=")... but that seems liveable. It would be possible to force a literal | as the real symbol, but this would look ugly; \| or \|\| is hideous for code reading. In Scheme and many other Lisps this would be a non-issue; the || would just be interpreted as a two-character name for an atom, which is fine.

The & can be confusing in some contexts, too; it is sometimes interpreted specially in parameter lists. But as far as I can tell, & is not normally interpreted in any special way outside of parameter lists, and even there, && isn't normally accepted. It's certainly possible to use "^" to represent "and" (using ** for the power operator), but that still begs the question of how to represent "or"; few keyboards provide the mathematical "or" symbol (an upside-down wedge). So && and || are probably reasonable text representations for logical "and" and "or"... which is certainly consistent with typical practice. Note that "and" and "or" are special forms in Scheme (and many other Lisps), so that they can short-circuit; this means that redefining them is often more complicated.

An unfortunate oddity happens with Scheme: Scheme actually defines a “=>” syntactic marker in its “cond” processing, and it occurs as the second parameter. This means that if you are using the second parameter to automatically detect infix operations, you will almost certainly consider this construct to be an infix operation. Unfortunately, you cannot solve this by creating a new simple function => that can handle this (having only one form at this location changes the meaning of the construct). There are several solutions; two obvious ones are to require manual detection or to have a specific list of infix operations. Another solution is to specifically state that “=>” is excluded from the pattern of allowable infix operators, at least for Scheme. You can partly justify this on the additional grounds that -> and => are easily confused. It would probably be possible to solve this with a complex Scheme-unique macro for => that essentially restored its original meaning, though this would probably cause problems if an error occurs nearby; I think simply declaring that it's not an infix operator when processing Scheme is safer.

How do you represent assignment and field accessors, if the underlying language has them? The pattern supports "=", but if you use "=" for assignment, you have the problem that it is easily confused with equal-to. This is one of the big problems with C and C++ (confusing = with ==), and it'd be unfortunate to duplicate that mistake. In addition, if the system maps any infix name directly to the identical prefix function name, then you may not have a choice... in most Lisp-like systems, "=" is already an equality operator. The term "<-" is compelling for assignment, and "->" is compelling for field access, but the two do not go together well; just imagine deciphering "x -> a <- b -> c". Parentheses help, e.g., "(x -> a) <- (b -> c)" - but it's still a little awkward. One possibility is to use a different form for assignment, such as "<--" or "<==". The term "<==" is actually fairly easy to distinguish, so "x -> a <== b -> c" is easier to read, and "(x -> a) <== (b -> c)" is actually especially nice.

Another possibility is to use "<-" for assignment, and use [...] as a field accessor. That looks nice: x[a] <- b[c]. But if we use [...], we need to figure out what general function to map [...] to, and we just lost the phrase [...] for other purposes (such as describing special lists). (For sweet-expressions, I'll presume that the language or user will define the infix macros/functions, and not add [...], simply because it's easier and more general not to.)

Since Unicode/ISO 10646 is finally becoming widely available, and many systems can process it (e.g., using the UTF-8 encoding), additional infix symbols could be accepted that are not in ASCII. This would certainly resolve the “and” and “or” issues, since there are standard mathematical symbols for them: ∧ (”logical and”) is character U+2227 (decimal 8743) and ∨ (”logical or”) is character U+2228 (decimal 8744). See the Unicode symbols chart, especially the mathematical operators chart, for more information. The character range from U+2200 through U+22FF is allocated to mathematical symbols. This will remind those of us who’ve been around a while of APL! Probable characters for infix operators include U+2227 (”and”), U+2228 (”or”), U+220A (”element of”), U+2209 (”not an element of”), U+220B (”contains as member”), U+220C (”does not contain as member”), U+2229 (”intersection”), U+222A (”union”), U+225D (”defined as”), U+2260 (”not equal to”), U+2254 (”colon equals”), U+2282 (”subset of”), U+2283 (”superset of”), U+2284 (”not a subset of”), U+2285 (”not a superset of”), U+2286 (”subset of or equal to”), U+2287 (”superset of or equal to”), U+2288 (”neither a subset of nor equal to”), U+2289 (”neither a superset of nor equal to”), U+228A (”subset of with not equal to”), U+228B (”superset of with not equal to”), U+22C8 (”bowtie” - sometimes used for join), U+22BB (”xor”), and U+22BC (”nand”). But not everyone has good support for these yet, so while including a set of Unicode infix operators in the standard set might be a good idea (anticipating their use), it’s probably premature to require their use.

If the printing routines normally convert to infix, then you need a pattern that is unlikely to be triggered unintentionally... which is a good reason not to have the terms “and” and “or” be infix operators, and certainly a good reason to stay away from anything other than an isolated “:”. You can probably limit the pattern’s length, say to 3-6 characters, to help limit unintentional conversions as well. I would not include “and” and “or”; instead, I would add & and | to the infix operators (and say that | has to be escaped in Common Lisp), and limit operators to 1 through 4 characters in length. I would accept “:” alone (so it can be used for infix type declarations), but nothing else with colons.

These considerations suggest that “=>” be quietly prevented from being an infix operator, and that this regular expression be used for infix operators:

[+-\*/<>=&\|\p{Sm}]{1,4}|\:|\|\|
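As an illustrative sketch, the pattern can be applied per token as below. Assumptions: this is Python rather than Lisp, Python’s standard `re` module lacks `\p{Sm}` (the Unicode “Symbol, math” class), so that part is dropped in this ASCII-only approximation, and the helper name `is_infix_operator` is my own:

```python
import re

# ASCII approximation of the operator pattern; '-' is placed first in
# the character class so it is literal, and the quantifier is {1,4}.
INFIX_OP = re.compile(r"[-+*/<>=&|]{1,4}|:|\|\|")

def is_infix_operator(name, scheme=False):
    """True if `name` matches the infix-operator pattern.  In Scheme,
    "=>" is excluded because cond gives it a special meaning."""
    if scheme and name == "=>":
        return False
    return INFIX_OP.fullmatch(name) is not None
```

Under this sketch, "+", "<=", "||", and a lone ":" qualify, while "simple-string-p" and anything five or more characters long do not.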

As a related matter, you need to decide how to separate infix operators from non-infix operators. In many other language families, “x-y” would be parsed as x minus y. Unfortunately, in many Lisp-like languages, names often include typical infix operator characters such as “-”. Both Common Lisp and Scheme would be unusable without the - symbol, because they have many functions with “-” in the name (in addition, Scheme has a convention where “->” embedded in the name indicates a conversion). The simplest and most obvious approach is to require that any infix operator be surrounded by whitespace; this is very easy for humans to remember, and is consistent with normal s-expression syntax anyway. You could require escaping such characters as part of a name, e.g. |simple-string-p| or simple\-string\-p, but this is both ugly and unnecessarily incompatible with common practice. If you’re devising a new language, you could simply forbid such characters in regular symbol names, and then you could automatically detect all the infix operators without requiring that they be surrounded by whitespace. However, for general-purpose parsing, sweet-expressions will require that infix operators be surrounded by whitespace (in cases where function names don't have such symbols, this requirement could be relaxed).
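A tiny sketch of the whitespace rule (the helper `infix_candidates` is hypothetical, for illustration only): operators are only recognized as stand-alone, whitespace-separated tokens, so hyphenated Lisp names survive unchanged.

```python
def infix_candidates(text):
    """Return the whitespace-separated tokens that consist solely of
    operator characters; everything else is an ordinary symbol."""
    ops = set("+-*/<>=&|")
    return [t for t in text.split() if all(c in ops for c in t)]

# "x - y" yields one candidate operator; "simple-string-p" yields none,
# because the hyphens are embedded in a larger token.
```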

What are the semantics of the infix operations?

Once we detect that we are using infix notation, what are the semantics of the “infix” notation?

Swap first two parameters. An extremely easy-to-implement and general approach is to just swap the first two parameters. A good idea for this case would be to require exactly 3 parameters - that way, we won’t accidentally screw things up, because when limited to exactly 3 parameters, there’s no difference between swapping and the normal interpretation of infix. This is trivial to implement, and yet it’s enough to implement basic infix operators in a way that looks nicer. It means that the presented expression is very similar to the actual s-expression, which has its advantages. If you want to add a fancier infix format later, this approach is a reasonable stepping-stone towards it.

Allow chaining (duplication) of an identical operator. This extends the previous option: you can have an odd number of parameters, and the even ones must match. Thus, (3 + 4 + 5 + 6) becomes (+ 3 4 5 6). This is trivial to implement (just swap the first two parameters and ensure the rest match). This approach makes it possible to fully use the capabilities of underlying functions that allow multiple parameters, and that's a good thing. Interestingly enough, Common Lisp’s definition of comparison operators would make expressions like (x <= y < z) work as they do in mathematics! Note that this does not support precedence, so you have to surround different operators with parentheses or indents, e.g., (3 + (5 * 6)). In one view, that's a disadvantage - you still have to indicate some information that would be automatic in other languages. On the other hand, this makes it explicit where new lists begin, and that has some pretty big advantages. It also sidesteps the problems of defining precedence (either fixed, or with a way to add rules). Those are pretty big advantages. There are many options for determining when things aren't correct, and then deciding whether that is an error or merely an indicator that infix was not intended. Note that this only supports an odd number of parameters, and obviously there shouldn't be an infix operator presented as the first parameter in the text. Doing otherwise could be an error, or could be an indicator that the list should not be treated as infix. For the moment, I'll plan to consider both as indicators that infix was not intended. This option only permits all the operators to be identical - if they're different, but all infix operators, then the writer probably intended some sort of precedence (and so an error should be reported). Otherwise, infix was probably not intended, so again, I'd probably interpret that as "not infix".

Full infix of binary operations, with precedence rules. Allow infix and add precedence rules (* before +, etc.). This would mean that “(3 + 4 * 5 ** 2)” would quietly transform into the s-expression “(+ 3 (* 4 (** 5 2)))”. In cases where it is important that you be able to clearly see the mapping between the surface presentation and the underlying s-expression, just don’t use precedence - instead, parenthesize (so those who do not need to see that representation don’t have to). This is trivial to implement in Lisp... the problem is what to do about the precedence of unknown operators. You’d need to either not support precedence for them, have a default for unknown operators, or have a way to set precedence values (and make sure the settings are read before processing the code). Setting precedence is easy enough, but its disadvantage is that it invites trouble - you need to make sure that the rules are set before the parameters are swapped, and mistakes could result in hard-to-find errors. Left association could probably be safely assumed for more than one instance of the same unknown operator. Alternatives are giving unknown operators a specific precedence relative to other operators... or simply requiring that other operators’ precedence be made clear with parentheses (or their equivalent). Predefining precedence for common functions, and requiring statements about the others, isn’t a bad thing; people usually don’t want varying precedence tables (it’s confusing), and they also don’t want to memorize large tables. Again, I suggest making duplicate operators (3 + 4 + 5) turn into one s-expression (+ 3 4 5); have the user override if they want something different. Here, we don’t allow prefix operators, such as “- x”, or suffix operators, like “x !”. There is a trade-off: prefix and suffix operators don’t work well with automatic detection, or with trying to also read traditional s-expressions correctly - it would be all too easy to misread such expressions. Using name-prefix notation for them, such as -(x), is a very reasonable alternative.

Full infix of binary and non-binary operations, with precedence rules. This is like the above, only now we also allow prefix operators, such as “- x”, or suffix operators, like “x !”. Allowing them adds a little extra flexibility, and might make sense if the specific expression is always manually marked as being infix. But it has many problems, as noted earlier.
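The “chaining, no precedence” option above can be sketched as a small Python transform (the names and the ASCII-only operator test are illustrative assumptions, not the project’s actual code):

```python
def _is_op(t):
    # minimal operator test for the sketch
    return isinstance(t, str) and t != "" and \
        all(c in "+-*/<>=&|" for c in t)

def infix_to_prefix(lst):
    """Swap to prefix when the list has an odd length of at least
    three, its first item is not an operator, and every even-position
    item is the same infix operator; otherwise return the list
    unchanged (it is treated as ordinary prefix notation)."""
    if len(lst) >= 3 and len(lst) % 2 == 1 and not _is_op(lst[0]):
        ops = lst[1::2]
        if all(_is_op(o) for o in ops) and len(set(ops)) == 1:
            return [ops[0]] + lst[0::2]   # one call, all operands
    return lst
```

So ['3', '+', '4', '+', '5'] becomes ['+', '3', '4', '5'], while a mixed list like ['3', '+', '4', '*', '5'] is left alone: the writer must parenthesize to show the intended grouping.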

Precedence rules

If you have precedence rules, what are they, and are they controllable? It's not hard to add a command that sets precedence rules, but there's always the risk that it won't be called at the "right time" (before reading). Worse, if there are global precedence rules that can be changed, setting them in one environment may affect another (unintended) environment, so now we need a parameter for (read), and that may be hard to pass down. This is particularly a problem for Lisp-like systems, where you may have multiple levels and meta-levels that you might not want to interact. Another big challenge is that if different developers set precedences, it's harder to combine their code. Different operators may have different meanings in different circumstances, so the lack of control over precedence can be a problem too.

A chain of the same operation can normally be combined into a single operation with all those parameters, e.g., (a + b + c) should become (+ a b c). This adds capability, without any problems or loss.

Everyone agrees that * and / should have the same precedence, both of which have greater precedence than the equal-precedence binary + and -, and that all of these are left-to-right. So in theory that, at least, could be implemented. Originally I thought that it may be best to just implement those universal rules, and normally use parentheses most everywhere else for controlling infix precedence (or at least discourage lots of precedence-setting). Chaining is fine, but as discussed below, precedence causes a lot of problems in many use cases, so in the end I decided against them for sweet-expressions.

Printing expressions that might be infix-able

So far, I’ve primarily discussed how to read in expressions, but expressions need to be printed too. A likely answer is “just print s-expressions as usual” by default. That means that the output would be “(+ 3 4)”, even if the input was “(3 + 4)”. For debugging code, this is probably a Good Thing. There might be value in presenting expressions back with some infix operations “re-inserted”, at least as some sort of “pretty-printing” function or option. It might be wise to identify which lists are being interpreted as infix in these cases (e.g., by surrounding them with {...}). I think there is no reason to try to redo precedence when re-displaying; showing how the lists are actually stored is very clear and eliminates questions about precedence. Thus, if “(3 + 4 * 5)” is input, the resulting s-expression would be “(+ 3 (* 4 5))”, and the infix-printer might show this as “{3 + {4 * 5}}”.
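Such an infix-aware printer might be sketched as follows (a Python illustration under the assumption of a small fixed operator set; the real pattern is broader). It re-displays a stored s-expression with {...} marking the lists shown infix, and never re-derives precedence: the braces mirror the stored list structure exactly.

```python
OPS = set("+-*/") | {"<", "<=", "=", ">=", ">", "**"}

def show(expr):
    """Render a nested-list s-expression, displaying operator calls
    in infix form surrounded by {...}."""
    if not isinstance(expr, list):
        return str(expr)
    if len(expr) >= 3 and expr[0] in OPS:
        op = str(expr[0])
        body = (" " + op + " ").join(show(a) for a in expr[1:])
        return "{" + body + "}"
    return "(" + " ".join(show(a) for a in expr) + ")"
```

Given the stored s-expression ['+', '3', ['*', '4', '5']] (i.e., (+ 3 (* 4 5))), this prints {3 + {4 * 5}}, matching the example in the text.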

Combining infix with Name-prefixing

There is a subtlety when combining name-prefixing with automatic infix: I think people expect an infix expression to be considered a single expression, even though at first it appears to be a multi-parameter list.

For example, most people would accept this syntax for calling function f with two parameters, x and y:

f(x y)

However, if infix is normally done automatically, they would expect this to be computed with f given one parameter, not three:

f(x + y)

Note that this is actually a very nice thing; it means that in many cases, name-prefixing reduces the number of parentheses that need writing. So the original Lisp:

(f (+ x y))

becomes simply:

f(x + y)
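A toy sketch of this combination (a hypothetical `read_call` helper with a deliberately simplified tokenizer and operator test, in Python rather than Lisp): a term NAME(...) becomes (NAME ...), and if the parenthesized content looks infix, it is folded into a single prefix argument.

```python
import re

def read_call(text):
    """Parse a single flat NAME(...) term; if the contents form a
    chained infix expression, they count as ONE argument."""
    m = re.fullmatch(r"(\w+)\(([^()]*)\)", text)
    name, inner = m.group(1), m.group(2).split()
    if len(inner) >= 3 and len(inner) % 2 == 1 \
            and len(set(inner[1::2])) == 1 \
            and all(c in "+-*/<>=&|" for c in inner[1]):
        return [name, [inner[1]] + inner[0::2]]   # fold infix content
    return [name] + inner
```

So read_call("f(x y)") gives ['f', 'x', 'y'] (two arguments), while read_call("f(x + y)") gives ['f', ['+', 'x', 'y']]: the infix content is one argument, exactly as readers expect.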

Infix control

There is likely to be a need to control infix processing. There should be a standard way of saying “disable infix” and “enable infix”. In particular, you'd like to be able to enable or disable infix at only one level, so that you could leave infix on as the default and yet disable it in one particular expression. It would be especially useful to have an “enable infix for this one level of expression” operation that signals an error if it does not receive an expression that “looks like an infix expression”.

The following are some very early ideas on the topic; I include them here, but it's all very preliminary, and at least some of it is likely to be very wrong. Still, you may find useful thoughts here.

One implementation issue is that macros work from the outside in, and most macros will not expect special additional macros. In particular, it might be very useful to wrap an infix operator with a macro that says “do not treat this as infix”, like this (where "as" means "as-is"):

(define as(+) ...)

There should be a standard way, inserted at the beginning of a file or a read-eval-print loop, of saying “please switch to (or from) special processing”, and of setting some of its options (e.g., infix control). For debugging, you could just print s-expressions as now. However, a standard way to request printing that shows infix expressions as such would be a good idea, marking which lists are infix and also marking each list which could be misinterpreted as an infix expression but is not. This implies a need for a marker that says “the following list is an infix expression, and it’s an error if it is not”. If adding specific new infix operations is supported (such as “and” and “or”), a standard name and syntax for this add operation would be a good idea too. Ideally, the external interfaces for these operations would be pseudo-standard across all Lisp-like notations. Bridging the gap between Common Lisp and Scheme might be challenging for a standard interface (in some cases string parameters might be needed; string syntax is now universal, but other syntax is not - e.g., the keyword syntax of Common Lisp is incompatible with the proposed keyword syntax for Scheme).

The notation for controlling all this is to be determined. Here’s a start. Originally, I looked at implementing another # character option. If you want to do that, you might look for sharpsign options that no one currently uses, examining sources such as the Common Lisp Hyperspec (particularly the sharpsign section), the R5RS Scheme specification, the GNU Emacs Lisp Reference Manual, as well as less-common systems like NewLisp. Using #I for infix is tempting, but Scheme uses #i for inexact; it’s not clear if #I would get interpreted the same way on some systems, but it’s not worth the risk, and it might be confusing to developers anyway. I especially considered using #/ to mean "infix". The #@ combination is used by Emacs Lisp, so we should avoid that. However, Scheme only supports one-character lookahead; if you want to be able to call the "ordinary" Scheme reader for other #-beginning constructs, there isn't an easy way to do that - because to read the character after #, you have to consume the #, making that more difficult to do. So I don't think beginning with # is a good idea.

I’ve looked for some possible notation, under the presumption that this is a future standard format common to Common Lisp and Scheme (so we need something unallocated by either).

A simple approach might be this naming convention, with function-call-like macros:

nfx(...): Everything inside is interpreted as infix if it can, recursively.

unfx(...): Everything inside is NOT infix, recursively.

nfx1(...): The immediately contained expression is infix, and it is an error if it is not.

unfx1(...): The immediately contained expression is not infix.

Each of the above could accept a single list (in which case it's that list being referred to); otherwise, the rest of its parameters form the expression being referred to. Since any infix expression must have at least three parameters (two if unary operators are used), this isn't ambiguous.

We could interpret unfx(...) around a second parameter as quietly disabling infix processing. Alternatively, we could have a different function/macro name such as "as(...)". This would treat the contents (e.g., a function name) as-is, so the expression won't be considered an infix operation (use this if an infix operator is the second parameter, but you don't want the expression to be considered infix). Here's an example:

defun as(+) (left right) ...

One interesting challenge in doing this is recursion in the read function. I decided to implement much of the processing in the reader, and to have the processing happen "as I go" in the reader, rather than have the reader automatically add nfx() in front of expressions and then have eval() fix it up by calling a macro. Since the reader is called on ordinary data in s-expression format, it would be very inconvenient to automatically have the reader add nfx(), etc. calls! Yet if there is no outermost call to "fix up" the expression, then the outermost parameters would be defun, define, or other terms that would not handle these macro calls correctly. And you dare not remove things piecemeal - read can be called recursively, so if you call read, remove things partly, but then later "fix up" all the way down again, you can "unfix" things. This is one reason to have a separate "as" function, so that these different forms can be differentiated.

Note that “(s (b))” is interpreted as you might expect during sweet-expression processing, but (s(b)) is interpreted as the s-expression ((s b)), not (s (b)). In practice this is not a problem - nobody likes the format (s(b)) for s-expressions, and not all s-expression processors would even accept them. More also needs to be done to describe their interaction with macros; this is a critical area and hard to get right, but since there is relatively little that is changing, this may not be so bad.

A simple program could transform an existing program into sweet-expression format, including its comments, inserting any markers needed to control infix interpretation. I expect the need for such markers to be exceedingly rare, so the results should look very nice.

Infix implementations

Many, many people have implemented infix processors. Here are some, not including the many larger notational systems (like IACL2) that have an infix notation built into them:

A simple Lisp program that converts infix to s-expression format is available as part of Peter Norvig’s book, “Artificial Intelligence: A Modern Approach.” This code’s license appears to be open source. Besides the usual disclaimers, it says, “3. The origin of this software must not be misrepresented, either by explicit claim or by omission” and “4. Altered versions must be plainly marked as such, and must not be misrepresented as being the original software. Altered versions may be distributed in packages under other licenses (such as the GNU license).”

Alan Manuel K. Gloria’s “infix notation macro” is extremely promising. His approach is to create a macro “nfx” and then put spaces around everything; from then on, everything inside it (transitively) can use either infix or prefix notation. It detects which lists are infix by examining the second parameter and determining whether it is in a list of infix operators. At this time the license intends to allow arbitrary use, though it is not worded in a legally clear manner.

Mark Kantrowitz wrote an infix reader macro, typical of many such packages (dated Jan 18, 1995); it runs on Common Lisp. It is often mentioned, though its restrictive license makes it useless to many. It “allows the user to type arithmetic expressions in the traditional way (e.g., 1+2) when writing Lisp programs instead of using the normal Lisp syntax (e.g., (+ 1 2)). It is not intended to be a full replacement for the normal Lisp syntax. If you want a more complete alternate syntax for Lisp, get a copy of Apple’s MLisp or Pratt’s CGOL. Although similar in concept to the Symbolics infix reader (#<DIAMOND>), no real effort has been made to ensure compatibility beyond coverage of at least the same set of basic arithmetic operators. There are several differences in the syntax beyond just the choice of #I as the macro character. (Our syntax is a little bit more C-like than the Symbolics macro in addition to some more subtle differences.)”
It is not open source software; deriv