[rust-dev] Redesigning fmt!

Having thought a bit more concretely about this, using the suggestions here, and talking to Graydon on IRC, I've come up with the following design for a string formatting library. I think that this addresses the comments in the responses to my original email, but if not please let me know!

== Format Language ==

One of the major goals of the "formatting language" is to support internationalization as necessary. This means that there must be nested format patterns, a small set of functions that can be executed at runtime, and a way to test the equivalence of format strings at runtime. To this end, I drew from these links:

http://docs.python.org/3/library/string.html#formatstrings
http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html
http://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html?is-external=true
http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/PluralFormat.html
http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/SelectFormat.html
https://github.com/SlexAxton/messageformat.js

And settled on this grammar:

    format_string := <text> [ format <text> ] *
    format := '{' [ argument [ ':' format_spec ] ',' ] function_spec ] '}'
    argument := '' | integer
    format_spec := [[fill]align][sign]['#'][width][.precision][type]
    fill := character
    align := '<' | '>'
    sign := '+' | '-'
    width := count
    precision := count | '*'
    type := identifier | ''
    count := parameter | integer
    parameter := integer '$'

    function_spec := plural | select
    plural := 'plural' ',' [ 'offset:' integer ] ( selector arm ) *
    selector := '=' integer | keyword
    keyword := 'zero' | 'one' | 'two' | 'few' | 'many'
    select := 'select' ',' ( identifier arm ) *
    arm := '{' format_string '}'

Some examples would be:

    {}
    {1}
    {1:d}
    {:date}
    {: <}
    {:+#}
    {plural, other {...}}
    {1, plural, offset:1 =1{...} one{...} many{...} other {...}}
    {select, s1{...} s2{...} other{...}}
    {select, selector{{2, plural, other{...}}} other{...}}

An overview of this:

* Any argument can be selected (0-indexed from the start)
* Any argument can be formatted any way (or at least the formatter requests a particular format)
* There are two internationalization functions, 'select' and 'plural'. I've also seen a 'choice' function which I haven't quite been able to grasp, but there's enough foundation here that it should be easy to add.
* Nested format strings are allowed
* Multi-char format names are allowed.
* I'm not convinced the format specifiers are the best they could be. They're currently modified from Python's version, and do differ slightly from what exists today.

Implementation-wise, there will be a parser in a `fmt` module which parses these strings and yields ast-like items representing the structure of the format string.

== Compile-time support ==

All of the formats available for use will be defined at compile-time. Each format will be defined by implementing a particular trait, and each of these traits will have one method defining a format function. For example:

    #[fmt="b"] pub trait Bool { fn fmt(&Self, &mut Formatter); }
    #[fmt="c"] pub trait Char { fn fmt(&Self, &mut Formatter); }
    #[fmt="d"] pub trait Signed { fn fmt(&Self, &mut Formatter); }
    #[fmt="u"] #[fmt="i"] pub trait Unsigned { fn fmt(&Self, &mut Formatter); }
    #[fmt="s"] pub trait String { fn fmt(&Self, &mut Formatter); }
    #[fmt="?"] pub trait Poly { fn fmt(&Self, &mut Formatter); }

Here each format specifier is specified via a #[fmt] attribute. Each trait has one static function called `fmt`, which takes the value being formatted as its first parameter and a `Formatter` object as its second. The `Formatter` object contains the output stream and any relevant flags like width/precision/fill/alignment. It will be up to each implementation of each trait to honor these flags, but there will be a number of helper functions in a `fmt` package for dealing with these options.
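
As a rough illustration of the trait-per-specifier idea, here is a minimal sketch in present-day Rust syntax rather than the syntax used in this email. Everything in it is an assumption made for illustration: the `Formatter` fields, the `pad` helper, and the name `FmtString` (standing in for the proposed `String` trait so that it does not clash with the standard string type). The #[fmt] attribute registration is left out because it would need compiler support.

```rust
/// Alignment requested by the `align` part of a format spec.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Align {
    Left,
    Right,
}

/// Hypothetical formatter: an output buffer plus a few flags from the spec.
pub struct Formatter {
    pub out: String,
    pub fill: char,
    pub align: Align,
    pub width: usize,
}

impl Formatter {
    pub fn new() -> Self {
        Formatter { out: String::new(), fill: ' ', align: Align::Left, width: 0 }
    }

    /// Hypothetical helper: write `s`, padded with `fill` up to `width`.
    pub fn pad(&mut self, s: &str) {
        let padding = self.width.saturating_sub(s.chars().count());
        if self.align == Align::Right {
            for _ in 0..padding {
                self.out.push(self.fill);
            }
        }
        self.out.push_str(s);
        if self.align == Align::Left {
            for _ in 0..padding {
                self.out.push(self.fill);
            }
        }
    }
}

/// Would correspond to the "s" specifier.
pub trait FmtString {
    fn fmt(this: &Self, f: &mut Formatter);
}

/// Would correspond to the "d" specifier.
pub trait Signed {
    fn fmt(this: &Self, f: &mut Formatter);
}

impl FmtString for &str {
    fn fmt(this: &Self, f: &mut Formatter) {
        f.pad(this);
    }
}

impl Signed for i32 {
    fn fmt(this: &Self, f: &mut Formatter) {
        // Each implementation decides how to honor the flags; here it
        // simply delegates to the shared padding helper.
        f.pad(&this.to_string());
    }
}
```

With this sketch, a call such as `Signed::fmt(&42i32, &mut f)` picks the implementation from the argument type, which is the property the compile-time checks rely on.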

From the compiler's point of view, there will be a new macro, let's call it ifmt!, which will have the following transformation:

    ifmt!("{:s}, {}!", "Hello", "World")

    {
        let l1 = "Hello";
        let l2 = "World";
        ::std::fmt::sprintf("{:s}, {}!", [c(String::fmt, &l1), c(Poly::fmt, &l2)])
    }

A few notable points:

* If you're wondering what this `c` function is, look below
* An attempt is made to generate as little code as possible. Each format location should purely pass all the arguments along to someone else.
* The argument list is a list of tuples where the first element is a function which takes the second element (and a formatter) and formats the result into a stream. The exact function selected depends on the format parameter specified in the string, such as: "s" == String::fmt, default == Poly::fmt
* A bit of magic goes on under the hood with unsafe casts to make these all typecheck to the same thing (more details below)

== Runtime support ==

The crux of the implementation will be around this function signature:

    type FormatFn<T> = extern "Rust" fn(&T, &mut Formatter);
    type Argument = (FormatFn<Void>, &Void);
    unsafe fn fprintf(w: &mut io::Writer, fmt: &str, args: &[Argument]) { ... }

This takes the stream to output to, the format string, and the list of arguments. Each argument is an "opaque" pointer/function pair where the function knows how to format the value at the pointer. Each FormatFn/pointer pairing is checked at compile time, so only valid calls to this function will be emitted. The function is also tagged as `unsafe`, so anyone calling it manually knows that mixing up the arguments will cause serious problems.

From above, the compiler would emit calls to the `c` function like so:

    fn c<T>(f: FormatFn<T>, t: &T) -> Argument { ... }

The actual implementation is just a wrapper around `transmute`.
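
To make the type-erasure idea concrete, here is a minimal sketch in present-day Rust syntax, not the syntax used in this email. The names `Opaque` (standing in for `Void`), `fmt_str`, and the struct form of `Argument` are assumptions for illustration. The `sprintf` below is deliberately naive: it ignores the contents of each `{...}` and just consumes arguments in order, so it does not implement the grammar above.

```rust
use std::mem;

/// Hypothetical formatter holding only the output buffer.
pub struct Formatter {
    pub out: String,
}

/// Stand-in for `Void`: a type that is never inspected directly.
pub struct Opaque;

pub type FormatFn<T> = fn(&T, &mut Formatter);

/// An erased function/pointer pair. The fields are private, so the only
/// way to build one is `c`, which guarantees that the two halves match.
pub struct Argument<'a> {
    func: FormatFn<Opaque>,
    value: &'a Opaque,
}

/// Type-checks `f` against `t`, then erases `T` from both.
pub fn c<'a, T>(f: FormatFn<T>, t: &'a T) -> Argument<'a> {
    unsafe {
        Argument {
            // Only the pointee type in the signature changes, so the
            // function pointer keeps the same size and calling convention.
            func: mem::transmute::<FormatFn<T>, FormatFn<Opaque>>(f),
            value: &*(t as *const T as *const Opaque),
        }
    }
}

/// Naive runtime: every `{...}` consumes the next argument in order.
pub fn sprintf(fmt: &str, args: &[Argument<'_>]) -> String {
    let mut f = Formatter { out: String::new() };
    let mut next = 0;
    let mut chars = fmt.chars();
    while let Some(ch) = chars.next() {
        if ch == '{' {
            // Skip the spec; the formatting function was already chosen
            // when the argument was built.
            for inner in chars.by_ref() {
                if inner == '}' {
                    break;
                }
            }
            let arg = &args[next];
            (arg.func)(arg.value, &mut f);
            next += 1;
        } else {
            f.out.push(ch);
        }
    }
    f.out
}

/// Hypothetical format function for string slices.
pub fn fmt_str(s: &&'static str, f: &mut Formatter) {
    f.out.push_str(s);
}
```

Under these assumptions, the expansion of the example above would be written as `sprintf("{:s}, {}!", &[c(fmt_str, &l1), c(fmt_str, &l2)])`. Passing a function whose parameter type does not match the value is rejected when `c` is type-checked, before anything is erased.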

Routing every argument through `c` gets us a lot of nice error messages and compile-time checks that guarantee the type of each argument is sane (regarding its format specifier). For example, an invalid program would yield the following:

    ifmt!("{:s}", MyStruct{ foo: "bar" })
    //~^ ERROR: No implementation of `String` trait found for `MyStruct`

This comes about because the 's' format specifier is registered to the `String` trait (or rather `std::fmt::String`), and due to the signature of the `c` function the compiler will attempt to look up an implementation of that trait for the `MyStruct` type (passed as the second argument of `c`).

Algorithm-wise, the runtime will create a parser for the format string and iterate over each of the "tokens", performing the necessary action (streaming output to the specified stream).

A few notes:

* I believe that parsing must occur at runtime. Otherwise i18n wouldn't work, because it could produce an arbitrary format string at runtime based on the current locale.
* Currently traits don't work well enough for `&mut io::Writer` to be usable, so the current interface would only export an `sprintf` function which emits to a `&mut ~str` object (essentially a stream).

== Internationalization ==

I also wanted to touch on how this covers internationalization. The main point of this is located within the query language, but the runtime must also support some constructs.

The format string and arguments are validated at compile-time, but any format string could be run at runtime. For this reason an equivalence function will be needed that takes the original format string and a translated format string and ensures at runtime that the two are equivalent in terms of types, position, and number of arguments.

On a related note, any argument passed to the `plural` function will be required to be of the `&uint` type, and any argument passed to the `select` function will be required to be of the `& &str` type.
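
One way the equivalence check could work is sketched below in present-day Rust syntax. This is an assumption-laden illustration, not the implementation: `signature` and `equivalent` are hypothetical names, only flat placeholders such as `{}`, `{1}` and `{1:d}` are handled (anything with `plural` or `select` is rejected), `{}` is assumed to take the next argument in order, the type is taken to be the trailing letters of the spec, and a missing type is treated as "?" since the default formatter is Poly.

```rust
use std::collections::BTreeMap;

/// Maps each argument index to the format type requested for it.
/// Returns None for input this sketch cannot handle, or when one
/// argument is requested with two different types.
pub fn signature(fmt: &str) -> Option<BTreeMap<usize, String>> {
    let mut sig = BTreeMap::new();
    let mut implicit = 0usize; // index used by the next `{}` with no argument
    let mut rest = fmt;
    while let Some(open) = rest.find('{') {
        let close = open + rest[open..].find('}')?;
        let inner = &rest[open + 1..close];
        let (arg, spec) = inner.split_once(':').unwrap_or((inner, ""));
        let index = if arg.is_empty() {
            implicit += 1;
            implicit - 1
        } else {
            arg.parse::<usize>().ok()?
        };
        // Simplification: the type is the trailing run of letters (or '?').
        let ty_start = spec
            .char_indices()
            .rev()
            .take_while(|(_, ch)| ch.is_alphabetic() || *ch == '?')
            .last()
            .map_or(spec.len(), |(i, _)| i);
        let ty = if ty_start == spec.len() { "?" } else { &spec[ty_start..] };
        if let Some(prev) = sig.insert(index, ty.to_string()) {
            if prev != ty {
                return None;
            }
        }
        rest = &rest[close + 1..];
    }
    Some(sig)
}

/// Two format strings are treated as equivalent when they request the
/// same set of arguments with the same types, in any order.
pub fn equivalent(original: &str, translated: &str) -> bool {
    match (signature(original), signature(translated)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}
```

Comparing index-to-type maps rather than raw strings lets a translation reorder its placeholders, while a translation that drops an argument or changes its type is rejected.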

Additionally, for arguments consumed by `plural` or `select`, the function pointer in the argument pair will be some dummy function that fails if called (because it should never be called). I haven't given too much thought to these constructs, but that was kinda the first thing I came up with.

== Summing up ==

Currently I have implemented the format language parsing and the runtime support necessary for this (without dealing with formatting flags). I haven't started the compiler work yet, and there's no reason that any of this couldn't completely change in the meantime.

I would love comments/suggestions on this system. I think that this takes into account almost all of the feedback which I've received about how formatting strings should work, but extra sets of eyes are always useful!