Easier Python string formatting


Some languages pride themselves on providing many ways to accomplish any given task. Python, instead, tends to focus on providing a single solution to most problems. There are exceptions, though; the creation of formatted strings would appear to be one of them. Despite the fact that there are (at least) three mechanisms available now, Python's developers have just adopted a plan to add a fourth. With luck, this new formatting mechanism (slated for Python 3.6) will improve the traditionally cumbersome string-formatting facilities available in Python.

Like many interpreted languages, Python is used heavily for string processing tasks. At the output end, that means creating formatted text. Currently, there are three supported ways to get the same result:

    import string

    'The answer is %d' % (42,)
    'The answer is {answer}'.format(answer=42)
    s = string.Template('The answer is $answer')
    s.substitute(answer=42)

The traditional % operator suffers from some interesting lexical traps and supports only a small number of types. The format() string method is more flexible, but is somewhat verbose, and the Template class seems to combine the shortcomings of the previous two methods and throws in yet another syntax to boot. All three methods require a separation between the format string and the values that are to be formatted into it, increasing verbosity and, arguably, decreasing readability, while other languages have facilities that do not require that separation.
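To make the comparison concrete, here is a small sketch (it should run on any Python 3) that checks that the three existing mechanisms produce the same string. It also shows one well-known trap of the % operator: a tuple on its right-hand side is taken as the argument list rather than as a single value to be formatted.

```python
import string

answer = 42

# The three pre-existing formatting mechanisms, producing identical output.
percent = 'The answer is %d' % (answer,)
method = 'The answer is {answer}'.format(answer=answer)
template = string.Template('The answer is $answer').substitute(answer=answer)
assert percent == method == template == 'The answer is 42'

# A classic trap of "%": a tuple on the right-hand side is unpacked as the
# argument list, so formatting a tuple as one value needs an extra wrapper.
point = (1, 2)
try:
    'Point: %s' % point          # two values supplied for one %s placeholder
except TypeError as exc:
    print('TypeError:', exc)
print('Point: %s' % (point,))    # Point: (1, 2)
```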

f-strings

Other languages, such as Perl and Ruby, have more concise string-formatting operations. With the debut of the string interpolation mechanism described in PEP 498, Python will have a similar facility. This PEP introduces a new type of string, called an "f-string" ("formatted string"), denoted by an "f" character before the opening quote:

f'This is an f-string'

F-strings thus join the short list of special string types in Python; others include r'raw' and b'byte' strings. The thing that makes an f-string special is that it is evaluated as a particular type of expression when it is executed. Thus, to replicate the above examples:

    answer = 42
    f'The answer is {answer}'

As can be seen, f-strings obtain the value to be formatted directly from the local (and global) namespace; there is no need to pass it in as a parameter to a formatting function or operator. Beyond that, though, what appears between the curly brackets can be an arbitrary expression:

    import math

    answer = 42
    f'The answer is not {answer+1}'
    f'The root of the answer is {math.sqrt(answer)}'

So formatted output can be created with expressions of just about any complexity. These expressions might even have side effects, though one suspects that would rarely be a good idea.
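As an illustration, the following sketch (assuming Python 3.6 or later, where f-strings are available) uses indexing, a method call, a conditional expression, and a function call inside the brackets. The describe() helper is purely hypothetical; it shows that names in an f-string are resolved like any other expression: the parameter n is found in the local namespace, math in the global one.

```python
import math

answer = 42
names = ['Ada', 'Guido']

def describe(n):
    # n is a local name; math is looked up in the module's globals.
    return f'{n} is {"even" if n % 2 == 0 else "odd"}; sqrt = {round(math.sqrt(n), 3)}'

print(f'The answer is not {answer + 1}')   # The answer is not 43
print(f'First name: {names[0].upper()}')   # First name: ADA
print(describe(answer))                    # 42 is even; sqrt = 6.481
```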

Under the hood, the execution of f-strings works by evaluating each expression found in curly brackets, then invoking the __format__() method on each result. So the following two lines would have an equivalent effect:

    f'The answer is {answer}'
    'The answer is ' + answer.__format__('')
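The equivalence is easy to check interactively. This sketch (Python 3.6 or later) compares an f-string with an explicit __format__() call, which takes the format specifier as a required argument (an empty string when none is given), and with the built-in format() function, which makes that call on the caller's behalf.

```python
answer = 42
pi = 3.14159

# With no format specifier, an f-string field formats its value with an
# empty spec, which is also what the built-in format() does by default.
assert f'The answer is {answer}' == 'The answer is ' + answer.__format__('')
assert f'The answer is {answer}' == 'The answer is ' + format(answer)
assert f'pi is roughly {pi}' == 'pi is roughly ' + format(pi)
print(f'The answer is {answer}')   # The answer is 42
```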

A format specifier, which is passed to __format__(), can be appended to the expression after a colon; for example:

    f'The answer is {answer:04d}'
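Because the text after the colon is simply handed to the value's __format__() method, a class can define its own specifier syntax. The Temperature class below is a hypothetical example, not anything from the PEP: a trailing "C" or "F" selects the unit, and whatever precedes it is treated as an ordinary float specifier. The sketch assumes Python 3.6 or later.

```python
class Temperature:
    """Hypothetical type whose __format__ understands its own specifiers."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # The text after the colon in an f-string field arrives here unchanged.
        if spec.endswith('F'):
            return format(self.celsius * 9 / 5 + 32, spec[:-1] + 'f') + '°F'
        if spec.endswith('C'):
            return format(self.celsius, spec[:-1] + 'f') + '°C'
        # No unit suffix: fall back to plain float formatting.
        return format(self.celsius, spec)

answer = 42
t = Temperature(21.5)
print(f'The answer is {answer:04d}')   # The answer is 0042
print(f'{t:.1C}')                      # 21.5°C
print(f'{t:.1F}')                      # 70.7°F
print(f'{t}')                          # 21.5
```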

One can also append !s to pass the value to str() first, !r to use repr(), or !a to use ascii(); the conversion comes before the colon and any format specifier. So, once again, the following two lines would do the same thing:

    f'The answer is {answer!r:>6}'
    'The answer is ' + repr(answer).__format__('>6')
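The conversions are easiest to tell apart with a string containing a non-ASCII character. In this sketch (Python 3.6 or later), !r adds quotes, !a additionally escapes the "ë", and a format specifier after the colon is applied to the already-converted string.

```python
name = 'Zoë'

# The conversion (!s, !r, !a) runs first; the spec after the colon is then
# applied to the resulting string.
print(f'{name!s}')        # Zoë
print(f'{name!r}')        # 'Zoë'
print(f'{name!a}')        # 'Zo\xeb'
print(f'[{name!r:>8}]')   # [   'Zoë']

assert f'{name!r:>8}' == format(repr(name), '>8')
```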

That is the core of the change. There are other details, of course; see the PEP for the full story. The PEP was accepted by Python benevolent dictator for life Guido van Rossum on September 8, so, unless something goes surprisingly wrong somewhere, f-strings will be a part of the Python 3.6 release.

Where next?

PEP 498 was somewhat controversial over the course of its development. There were a number of concerns about how f-strings fit into the Python worldview in general, but there was also a specific concern: security. In particular, Nick Coghlan expressed concerns that f-strings would make it easy to write insecure code. Examples would be code like:

    os.system(f'cat {file}')
    SQL.run(f'select {column} from {table}')

In either case, if any of the values substituted into the strings are supplied by the user, the result could be the compromise of the whole system. The problem is not that f-strings make it possible to incorporate untrusted data into trusted strings — that can just as easily be done with existing string-formatting mechanisms. And the problem is certainly not that f-strings make string formatting easier in general; Nick's specific concern is that f-strings will be the easiest way to put strings together, while more secure methods remain harder. Using an f-string to format an SQL query will be easier to code (and to read later) than properly escaping the parameters, so developers will be drawn toward the insecure alternative.
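The gap between the easy way and the safe way can be seen with the standard sqlite3 module and an in-memory database. This is a sketch with made-up table contents: the f-string version splices attacker-controlled text into the SQL itself, while the "?" placeholder passes the value separately so that it can never change the query's structure. Note that placeholders only cover values; identifiers such as the {column} and {table} in the example above cannot be bound this way.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, secret TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)',
                 [('alice', 'a-secret'), ('bob', 'b-secret')])

user_input = "nobody' OR '1'='1"   # attacker-controlled value

# Easy but unsafe: the input becomes part of the SQL text itself.
unsafe = conn.execute(
    f"SELECT secret FROM users WHERE name = '{user_input}'").fetchall()

# Safer: the value travels separately as a bound parameter.
safe = conn.execute(
    'SELECT secret FROM users WHERE name = ?', (user_input,)).fetchall()

print(len(unsafe))   # 2: the injected OR makes the WHERE clause always true
print(safe)          # []
```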

His suggestion, as described in PEP 501, is to make the secure way as easy to use as the insecure way. The result is "i-strings"; they look a lot like f-strings in that the syntax is nearly identical:

i'The answer is {answer}'

There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job. To see the difference, consider the two lines below, which have equivalent effect:

    print(f'The answer is {answer}')
    print(format(i'The answer is {answer}'))
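Python has no i-string syntax, so the deferred-formatting idea can only be approximated. In the following sketch, IString is a hypothetical class, not the PEP 501 design: it captures the template and the values when it is created, but produces no text until format() calls its __format__() method. Because ordinary code cannot capture the surrounding namespace automatically, the values are passed in explicitly.

```python
import string

class IString:
    """Hypothetical stand-in for an i-string: template and values are
    captured at creation time, but no formatting happens yet."""

    def __init__(self, template, **values):
        self.template = template
        self.values = values

    def fields(self):
        # Names of the replacement fields, e.g. ['answer'] for '{answer}'.
        return [field for _, field, _, _ in string.Formatter().parse(self.template)
                if field is not None]

    def __format__(self, spec):
        # Default rendering, used by the built-in format(): plain str.format rules.
        return format(self.template.format_map(self.values), spec)

answer = 42
pending = IString('The answer is {answer}', answer=answer)  # nothing rendered yet
print(pending.fields())   # ['answer']
print(format(pending))    # The answer is 42
```

Keeping the template and the values apart until the last moment is what allows a different rendering function to be substituted for format().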

The key to Nick's proposal is that format() can be replaced with another formatting function that knows how to escape dangerous characters in the intended usage scenario. Thus:

    os.system(sh(i'cat {file}'))
    SQL.run(sql(i'select {column} from {table}'))

The sh() formatter would ensure that no shell metacharacters get through, while sql() would prevent SQL-injection attacks. These formatters would be easy enough to use that developers would not be tempted to bypass them. Just as importantly, static analysis software could easily distinguish between safe and unsafe string usage for a given API, making it possible to automatically detect when the wrong type of string is being used.
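A rough idea of what an sh() renderer might do can be built from the standard library. In this sketch, sh() and ShellFormatter are hypothetical names; since i-strings do not exist, the template and values are passed separately. The string.Formatter subclass overrides format_field() so that every substituted value is run through shlex.quote(), while the literal parts of the template are left alone.

```python
import shlex
import string

class ShellFormatter(string.Formatter):
    """Hypothetical sh() renderer: every substituted value is shell-quoted."""

    def format_field(self, value, format_spec):
        # Format the value normally, then quote it so that shell
        # metacharacters (;, |, $, spaces, ...) lose their special meaning.
        return shlex.quote(format(value, format_spec))

def sh(template, **values):
    # Stand-in for sh(i'...'): the template and values arrive separately.
    return ShellFormatter().format(template, **values)

print(sh('cat {file}', file='notes.txt'))            # cat notes.txt
print(sh('cat {file}', file='notes.txt; rm -rf ~'))  # cat 'notes.txt; rm -rf ~'
```

For the shell case specifically, passing an argument list to subprocess.run() sidesteps the shell entirely; the sketch only illustrates the escaping approach the proposal describes.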

PEP 501 has been through a long series of revisions, involving significant changes, since first being posted. At times the syntax was rather more complicated, prompting Guido to ask: "Have I died and gone to Perl?". Nick's proposal had originally been intended as an alternative to PEP 498, but, over time, Nick warmed to the f-string approach and came out in favor of its adoption. PEP 501 remains outstanding, though, and will likely be pursued as an extension to f-strings.

That work, too, could conceivably happen in time for the 3.6 release, which is planned to happen in late 2016. Given its volatile history thus far, chances are that the end result will look somewhat different from what has been proposed to date. However it turns out, though, Python should no longer have to defer to other languages when it comes to the ease of creating formatted output.

