There are too many changes to list them all here; instead, I will refer to the PEPs. However, I'd like to highlight a number of features that I find to be significant or expect to be of particular interest or controversial.

We're switching to a model known from Java: (immutable) text strings are Unicode, and binary data is represented by a separate mutable "bytes" data type. In addition, the parser will be more Unicode-friendly: the default source encoding will be UTF-8, and non-ASCII letters can be used in identifiers. There is some debate still about normalization, specific alphabets, and whether we can reasonably support right-to-left scripts. However, the standard library will continue to use ASCII only for identifiers, and limit the use of non-ASCII in comments and string literals to unit tests for some of the Unicode features, and author names.

We will use "..." or '...' interchangeably for Unicode literals, and b"..." or b'...' for bytes literals. For example, b'abc' is equivalent to creating a bytes object using the expression bytes([97, 98, 99]) .

We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bits as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API).

New I/O Library The I/O library is also changing in response to these changes. I wanted to rewrite it anyway, to remove the dependency on the C stdio library. The new distinction between bytes and text strings required a (subtle) change in API, and the two projects were undertaken hand in hand. In the new library, there is a clear distinction between binary streams (opened with a mode like "rb" or "wb") and text streams (opened with a mode not containing "b"). Text streams have a new attribute, the encoding, which can be set explicitly when the stream is opened; if no encoding is specified, a system-specific default is used (which might use guessing when an existing file is being opened). Read operations on binary streams return bytes arrays, while read operations on text streams return (Unicode) text strings; and similar for write operations. Writing a text string to a binary stream or a bytes array to a text stream will raise an exception. Otherwise, the API is kept pretty compatible. While there is still a built-in open() function, the full definition of the new I/O library is available from the new io module. This module also contains abstract base classes (see below) for the various stream types, a new implementation of StringIO, and a new, similar class BytesIO, which is like StringIO but implements a binary stream, hence reading and writing bytes arrays.