[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Request for Pronouncement

Accepted. Congrats on marshalling yet another quite contentious discussion, and on putting up with my last-minute block-headedness!

If you're going to commit another change, may I suggest adding, to the section stating that %r is not supported, that %a is usually a suitable replacement for %r?

On Thu, Mar 27, 2014 at 1:07 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> Requesting pronouncement on PEP 461. Full text below.
>
> ===============================================================================
> PEP: 461
> Title: Adding % formatting to bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <ethan at stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25,
>               2014-03-27
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes adding % formatting operations similar to Python 2's
> ``str`` type to ``bytes`` and ``bytearray`` [1]_ [2]_.
>
>
> Rationale
> =========
>
> While interpolation is usually thought of as a string operation, there are
> cases where interpolation on ``bytes`` or ``bytearray`` makes sense, and the
> work needed to make up for this missing functionality detracts from the
> overall readability of the code.
>
>
> Motivation
> ==========
>
> With Python 3 and the split between ``str`` and ``bytes``, one small but
> important area of programming became slightly more difficult, and much more
> painful -- wire format protocols [3]_.
>
> This area of programming is characterized by a mixture of binary data and
> ASCII-compatible segments of text (aka ASCII-encoded text). Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.
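To make the porting case concrete, here is a minimal sketch assuming Python 3.5 or later, where a version of this proposal was implemented. The ``build_request`` helper and the request it assembles are hypothetical illustrations. The point is that a Python 2 line such as ``"%s %s HTTP/1.1\r\n" % (method, path)`` keeps its shape once the literals and arguments are ``bytes``.

```python
def build_request(method: bytes, path: bytes, host: bytes, body: bytes = b"") -> bytes:
    """Hypothetical helper: assemble a minimal HTTP/1.1 request as bytes."""
    # Every piece stays bytes; %d renders the length as ASCII digits.
    head = (
        b"%b %b HTTP/1.1\r\n" % (method, path)
        + b"Host: %b\r\n" % host
        + b"Content-Length: %d\r\n" % len(body)
        + b"\r\n"
    )
    return head + body


# Usage: no str/bytes round-trips are needed anywhere in the message.
request = build_request(b"POST", b"/upload", b"example.com", b"hello")
assert request == (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
```
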
>
> Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
> formats, and ``FTP`` and ``HTTP`` communications, among many others.
>
>
> Proposed semantics for ``bytes`` and ``bytearray`` formatting
> =============================================================
>
> %-interpolation
> ---------------
>
> All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``,
> ``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added
> to Python 3) will be supported, and will work as they do for ``str``,
> including the padding, justification and other related modifiers (currently
> ``#``, ``0``, ``-``, `` `` (space), and ``+`` (plus any added to Python 3)).
> The only non-numeric codes allowed are ``c``, ``b``, ``a``, and ``s`` (which
> is a synonym for ``b``).
>
> For the numeric codes, the only difference between ``str`` and ``bytes``
> (or ``bytearray``) interpolation is that the results from these codes will
> be ASCII-encoded text, not unicode. In other words, for any numeric
> formatting code ``%x``::
>
>     b"%x" % val
>
> is equivalent to::
>
>     ("%x" % val).encode("ascii")
>
> Examples::
>
>     >>> b'%4x' % 10
>     b'   a'
>
>     >>> b'%#4x' % 10
>     b' 0xa'
>
>     >>> b'%04X' % 10
>     b'000A'
>
> ``%c`` will insert a single byte, either from an ``int`` in ``range(256)``,
> or from a ``bytes`` argument of length 1, not from a ``str``.
>
> Examples::
>
>     >>> b'%c' % 48
>     b'0'
>
>     >>> b'%c' % b'a'
>     b'a'
>
> ``%b`` will insert a series of bytes. These bytes are collected in one of
> two ways:
>
> - input type supports ``Py_buffer`` [4]_?
>   use it to collect the necessary bytes
>
> - input type is something else?
>   use its ``__bytes__`` method [5]_; if there isn't one, raise a
>   ``TypeError``
>
> In particular, ``%b`` will accept neither numbers nor ``str``. ``str`` is
> rejected as the string to bytes conversion requires an encoding, and we are
> refusing to guess; numbers are rejected because:
>
> - what makes a number is fuzzy (float? Decimal?
>   Fraction? some user type?)
>
> - allowing numbers would lead to ambiguity between numbers and textual
>   representations of numbers (3.14 vs '3.14')
>
> - given the nature of wire formats, explicit is definitely better than
>   implicit
>
> ``%s`` is included as a synonym for ``%b`` for the sole purpose of making
> 2/3 code bases easier to maintain. Python 3 only code should use ``%b``.
>
> Examples::
>
>     >>> b'%b' % b'abc'
>     b'abc'
>
>     >>> b'%b' % 'some string'.encode('utf8')
>     b'some string'
>
>     >>> b'%b' % 3.14
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%b' does not accept 'float'
>
>     >>> b'%b' % 'hello world!'
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%b' does not accept 'str'
>
>
> ``%a`` will give the equivalent of
> ``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated
> value. Use cases include developing a new protocol and writing landmarks
> into the stream; debugging data going into an existing protocol to see if
> the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or any situation where defining ``__bytes__`` would
> not be appropriate but a readable/informative representation is needed
> [6]_.
>
> Examples::
>
>     >>> b'%a' % 3.14
>     b'3.14'
>
>     >>> b'%a' % b'abc'
>     b"b'abc'"
>
>     >>> b'%a' % 'def'
>     b"'def'"
>
>
> Unsupported codes
> -----------------
>
> ``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
>
>
> Compatibility with Python 2
> ===========================
>
> As noted above, ``%s`` is being included solely to help ease migration
> from, and/or have a single code base with, Python 2. This is important as
> there are modules both in the wild and behind closed doors that currently
> use the Python 2 ``str`` type as a ``bytes`` container, and hence are using
> ``%s`` as a bytes interpolator.
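The proposed semantics for ``%d``/``%x``-style codes, ``%c``, ``%b``, ``%s`` and ``%a`` can be exercised with a short script. This is a sketch assuming Python 3.5 or later, where a version of this proposal was implemented; released versions differ from this draft in some details (for example, the wording of the ``TypeError`` messages), so only the exception type is checked. ``Token`` is a hypothetical stand-in for any type that defines ``__bytes__``.

```python
import array


class Token:
    """Hypothetical wire-format token that supplies its own bytes via __bytes__."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __bytes__(self) -> bytes:
        return self.name.encode("ascii")


# Numeric codes: b"<fmt>" % val matches ("<fmt>" % val).encode("ascii").
for fmt, val in [("%4x", 10), ("%#4x", 10), ("%04X", 10), ("%+.2f", 3.14159), ("%e", 1234.5)]:
    assert fmt.encode("ascii") % val == (fmt % val).encode("ascii")

# %c: one byte, from an int in range(256) or a length-1 bytes object.
assert b"%c" % 48 == b"0"
assert b"%c" % b"a" == b"a"

# %b: buffer-protocol objects and objects defining __bytes__ are accepted.
assert b"%b" % memoryview(b"xy") == b"xy"
assert b"%b" % array.array("B", [104, 105]) == b"hi"
assert b"<%b>" % Token("OK") == b"<OK>"

# %s behaves like %b for bytes input.
assert b"%s" % b"abc" == b"abc"

# %b rejects str and numbers (only the exception type is checked here).
for bad in ("hello world!", 3.14, 42):
    try:
        b"%b" % bad
    except TypeError:
        continue
    raise AssertionError(f"%b unexpectedly accepted {bad!r}")

# %a: equivalent to repr(obj).encode("ascii", "backslashreplace").
for obj in (3.14, b"abc", "def", "caf\u00e9"):
    assert b"%a" % obj == repr(obj).encode("ascii", "backslashreplace")

# bytearray supports the same operator and returns a bytearray.
assert bytearray(b"%d items") % 3 == bytearray(b"3 items")
```

The non-ASCII case shows why ``%a`` is safe for wire data: ``"caf\u00e9"`` comes out as the escaped ASCII bytes ``b"'caf\\xe9'"`` rather than raising an encoding error.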
>
> However, ``%b`` should be used in new, Python 3 only code, so ``%s`` will
> immediately be deprecated, but not removed until the next major Python
> release.
>
>
> Proposed variations
> ===================
>
> It has been proposed to automatically use ``.encode('ascii','strict')`` for
> ``str`` arguments to ``%b``.
>
> - Rejected as this would lead to intermittent failures. Better to have the
>   operation always fail so the trouble-spot can be correctly fixed.
>
> It has been proposed to have ``%b`` return the ascii-encoded repr when the
> value is a ``str`` (``b'%b' % 'abc'`` --> ``b"'abc'"``).
>
> - Rejected as this would lead to hard-to-debug failures far from the
>   problem site. Better to have the operation always fail so the
>   trouble-spot can be easily fixed.
>
> Originally this PEP also proposed adding format-style formatting, but it
> was decided that format and its related machinery were all strictly text
> (aka ``str``) based, and it was dropped.
>
> Various new special methods were proposed, such as ``__ascii__``,
> ``__format_bytes__``, etc.; such methods are not needed at this time, but
> can be revisited later if real-world use shows deficiencies with this
> solution.
>
> A competing PEP, ``PEP 460 Add binary interpolation and formatting`` [7]_,
> also exists.
>
>
> Objections
> ==========
>
> The objections raised against this PEP were mainly variations on two
> themes:
>
> - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
>   assumptions about encodings
>
> - offering %-interpolation that assumes an ASCII encoding will be an
>   attractive nuisance and lead us back to the problems of the Python 2
>   ``str``/``unicode`` text model
>
> As was seen during the discussion, ``bytes`` and ``bytearray`` are also
> used for mixed binary data and ASCII-compatible segments: file formats
> such as ``dbf`` and ``pdf``, network protocols such as ``ftp`` and
> ``email``, etc.
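As a concrete illustration of such mixed data, the sketch below packs a binary length prefix with ``struct`` and fills an ASCII payload with %-interpolation. It assumes Python 3.5 or later. The record layout (a 2-byte big-endian length followed by a ``name=value`` payload) and the ``encode_field`` name are hypothetical, not taken from dbf, pdf, or any real protocol. In line with the rejected variations above, the ``str`` name is encoded explicitly before it reaches ``%b``.

```python
import struct


def encode_field(name: str, value: float) -> bytes:
    """Hypothetical record: 2-byte big-endian length prefix + ASCII payload."""
    # The caller's str is encoded explicitly; %b will not guess an encoding.
    payload = b"%b=%.2f" % (name.encode("ascii"), value)
    # Binary part: the payload length as an unsigned 16-bit big-endian int.
    return struct.pack(">H", len(payload)) + payload


record = encode_field("price", 9.5)
assert record == b"\x00\x0aprice=9.50"

# Passing the str directly is rejected instead of being silently encoded.
try:
    b"%b=%.2f" % ("price", 9.5)
except TypeError:
    rejected = True
else:
    rejected = False
assert rejected
```
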
>
> ``bytes`` and ``bytearray`` already have several methods which assume an
> ASCII-compatible encoding: ``upper()``, ``isalpha()``, and
> ``expandtabs()``, to name just a few. %-interpolation, with its very
> restricted mini-language, will not be any more of a nuisance than the
> already existing methods.
>
> Some have objected to allowing the full range of numeric formatting codes
> with the claim that decimal alone would be sufficient. However, at least
> two formats (dbf and pdf) make use of non-decimal numbers.
>
>
> Footnotes
> =========
>
> .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
> .. [2] neither string.Template, format, nor str.format are under
>        consideration
> .. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
> .. [4] http://docs.python.org/3/c-api/buffer.html
>        examples: ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
> .. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
> .. [6] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
> .. [7] http://python.org/dev/peps/pep-0460/
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:

-- 
--Guido van Rossum (python.org/~guido)