So…

To start off, I think there are two architectural choices to make - tokens vs AST and pretty printing vs heuristic adjustment. I think they are orthogonal.

Tokens vs AST is not really the right way to frame the decision. In both cases we work off an AST for most situations and fall back to tokens or source text in some instances. The ‘tokens’ approach is really building our own AST (as opposed to using the libsyntax AST). The advantage of building our own is that we can optimise for the job in hand, rather than using a more generic AST. The disadvantage is that we don’t share code with the compiler, so the two can drift out of sync. In practice, the ‘tokens’ approach is able to cope with code which does not parse - e.g., snippets or non-compiling code. It also makes handling macros easier. The ‘AST’ approach makes it easier to leverage more information from the compiler to make decisions.

My personal preference leans towards the AST approach. I think formatting incomplete code is a distraction and should be a non-goal (someone change my mind with a good use case). I think macros can be handled just fine using the AST approach - we just need to handle them pre-expansion. Furthermore, I see the end game for Rustfmt involving moving a bunch of the style lints out of the compiler and into Rustfmt. That would require more compiler info, and (I think) requires that we have knowledge of the AST.

I believe most existing tools take the pretty printing approach. I prefer the heuristic adjustment approach. Importantly, the pretty printing approach has to be perfect before it is useful, whereas a heuristic approach is useful straight away. It also allows us to use the layout of the source more easily as input. Pretty printing can never perfectly insert line breaks, so if someone has already formatted their code in a way that conforms to the style rules, re-formatting it according to pretty printing rules seems like a bad move.
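
To make the distinction a bit more concrete, here is a minimal sketch of what I mean by heuristic adjustment. The two rules (no trailing whitespace, four spaces instead of tabs for indentation) and the function names `adjust` and `adjust_line` are made up for illustration; they are not taken from the prototype or from any agreed style guide. The point is only that each line is checked against the rules and changed if it violates them, while the author's existing line breaks and alignment are left alone.

```rust
/// Apply the (hypothetical) style rules line by line, keeping the
/// author's line breaks. Note: a trailing newline on the input is
/// dropped by `lines()` in this sketch.
pub fn adjust(source: &str) -> String {
    source
        .lines()
        .map(adjust_line)
        .collect::<Vec<_>>()
        .join("\n")
}

/// Adjust a single line. A line that already conforms comes back unchanged.
fn adjust_line(line: &str) -> String {
    // Illustrative rule 1: no trailing whitespace.
    let line = line.trim_end();

    // Illustrative rule 2: indentation uses four spaces rather than tabs.
    // Spaces that are already there are kept, so hand alignment survives.
    let body = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let indent: String = line[..line.len() - body.len()]
        .chars()
        .map(|c| if c == '\t' { "    " } else { " " })
        .collect();

    format!("{}{}", indent, body)
}
```

A pretty printer would instead regenerate the whole text from the AST and choose every line break itself, which is where already well-formatted code can end up worse off.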

Orthogonal to these questions is how customisable the tool is. I believe it should be customisable (like Clang Format) rather than strict like Gofmt. Although the latter saves arguments, it seems ill-adjusted to the real world - people might want their Rust formatted like existing code or might have strong personal preferences. They will not use this tool if we go for the strict version. I would like the tool to be useful to the most people, so I err on the side of pragmatism. However, it would be easy to add configurability later, so we don’t need to worry about it for now. (As an aside, I think it would be interesting research to be able to formally describe style rules for such a tool, rather than using ad hoc constraints - e.g., style by example, analogous to macro by example. Styling is really the counterpart to parsing: it is a set of rules which map from parsed code to tokenised code, whereas parsing goes the other way round. So it seems there should be some counterpart of a formal grammar for specifying styling. Anyway, I digress.)

I would like to put together some test cases of awkward code and how it would be formatted, in order to get a better idea of what information we need. These would be useful in a test suite too. I’m not exactly sure what would make something awkward, but lots of rightward drift would give most pretty printing algorithms trouble.

I have a prototype using the AST and a heuristic adjustment approach. The key element is representing the source text using a modified rope which keeps track of original source positions as well as current ones. That makes it easy to build on previous adjustments whilst using the compiler’s spans for input. Unfortunately it is a mess, and I haven’t had time to work on it for a while. I’ll try to tidy it up and put it online very soon…
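
Until then, here is a rough sketch of the position-tracking idea, not the prototype itself. It is a deliberate simplification: a flat `String` plus a log of edits stands in for the rope, and the names (`TrackedText`, `current_offset`, `replace`) are invented for this example. It also assumes edits do not overlap and that offsets fall on character boundaries (the example is ASCII only).

```rust
/// A text buffer that remembers where each original byte offset has
/// moved to. A real implementation would use a rope; a `String` and an
/// edit log are used here to keep the sketch short.
pub struct TrackedText {
    text: String,
    /// Each entry is (original offset from which the shift applies,
    /// signed change in length caused by the edit).
    edits: Vec<(usize, isize)>,
}

impl TrackedText {
    pub fn new(src: &str) -> Self {
        TrackedText { text: src.to_string(), edits: Vec::new() }
    }

    pub fn as_str(&self) -> &str {
        &self.text
    }

    /// Map an offset in the original source to an offset in the
    /// current (already adjusted) text.
    pub fn current_offset(&self, orig: usize) -> usize {
        let shift: isize = self
            .edits
            .iter()
            .filter(|&&(at, _)| at <= orig)
            .map(|&(_, delta)| delta)
            .sum();
        (orig as isize + shift) as usize
    }

    /// Replace the range `start..end` with `new`. The range is given in
    /// ORIGINAL coordinates, the way a compiler span would be.
    /// Assumes this range does not overlap any earlier edit.
    pub fn replace(&mut self, start: usize, end: usize, new: &str) {
        let cur_start = self.current_offset(start);
        let cur_end = self.current_offset(end);
        self.text.replace_range(cur_start..cur_end, new);
        let delta = new.len() as isize - (end - start) as isize;
        // Everything from the original `end` onwards shifts by `delta`.
        self.edits.push((end, delta));
    }
}
```

For example, starting from `"fn  main( ) {}"`, calling `replace(2, 4, " ")` and then `replace(9, 10, "")` uses offsets from the original text both times, even though the first edit has already moved everything after it. That is the property that makes it easy to build on previous adjustments.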