Mastering Python 3 I/O

Presented at PyCon'11, March 10, 2011, Atlanta, Georgia.

Note: A 2010 version of this tutorial is also available.

Introduction

As most Python programmers know, Python 3 breaks backwards compatibility with Python 2, both in syntax and in the semantics of built-in operations. One of the most radical changes is the ground-up redesign of the I/O system. This tutorial takes a tour of the new I/O system and the issues that are critical to know about if you're going to port existing code. Topics include text processing, binary data handling, system interfaces, the io library module, and porting advice.

Support Files

The support file download contains data files used by some of the code samples, along with some code fragments for experimenting.

This download also includes all of the code samples that follow below.

Code Samples

Here are a few code samples that you can use to try things out during the course. The course doesn't rely heavily on these examples, but I'll try a few out here and there.

Preliminaries:

timethis.py. A utility function for making performance measurements. Used in many of the code samples that follow.
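The contents of timethis.py aren't reproduced here. As a rough idea of what such a utility can look like, here is a hypothetical stand-in written as a context manager; the name, the `label` parameter, and the yielded result dict are all assumptions, not the interface of the real file.

```python
import time
from contextlib import contextmanager

@contextmanager
def timethis(label="block"):
    """Hypothetical timing helper (not the real timethis.py).

    Times the body of a with-statement, prints the elapsed wall-clock
    time, and stores it in the yielded dict under "elapsed".
    """
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["elapsed"] = time.perf_counter() - start
        print("{0}: {1:.6f}s".format(label, result["elapsed"]))

if __name__ == "__main__":
    with timethis("join 100000 strings"):
        s = "".join(str(n) for n in range(100000))
```

A context manager keeps the timed code inline; a decorator would be the natural alternative when timing whole functions.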

Part 1 : Introducing Python 3

printlinks.py. A Python 2 program that simply prints all of the links on a specified HTML page fetched with urlopen(). Try converting this program to Python 3 using 2to3.
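For comparison with whatever 2to3 produces, here is a hand-written Python 3 sketch of the same task. It is not the original printlinks.py or its 2to3 output, and it assumes links are the `href` attributes of `<a>` tags. The key Python 3 point is that `urlopen()` returns bytes, so the page has to be decoded to text before parsing; the sketch falls back to UTF-8 when the server doesn't declare a charset (an assumption).

```python
import sys
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html_text):
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links

def print_links(url):
    with urlopen(url) as resp:
        # urlopen() yields bytes in Python 3; decode explicitly.
        charset = resp.headers.get_content_charset() or "utf-8"
        html_text = resp.read().decode(charset, errors="replace")
    for link in extract_links(html_text):
        print(link)

if __name__ == "__main__":
    print_links(sys.argv[1])
```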

Part 2 : Working with Text

textop.py. Performance timings of various text operations. Try it with different versions of Python.
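A minimal sketch of this kind of measurement, using the standard `timeit` module. The sample text and the particular operations are assumptions chosen for illustration; textop.py may measure different ones.

```python
import timeit

# Hypothetical sample text; textop.py may use different data.
TEXT = "The quick brown fox jumps over the lazy dog\n" * 1000

# Hypothetical selection of text operations to time.
OPERATIONS = {
    "upper": lambda: TEXT.upper(),
    "split": lambda: TEXT.split(),
    "replace": lambda: TEXT.replace("fox", "cat"),
    "find": lambda: TEXT.find("zebra"),
    "encode utf-8": lambda: TEXT.encode("utf-8"),
}

def time_operations(number=200):
    """Return {operation name: total seconds for `number` calls}."""
    return {name: timeit.timeit(func, number=number)
            for name, func in OPERATIONS.items()}

if __name__ == "__main__":
    for name, seconds in time_operations().items():
        print("{0:15s} {1:.4f}s".format(name, seconds))
```

Running the same script under different Python versions gives a rough, machine-specific comparison; absolute numbers will vary.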

Part 3 : Printing and Formatting

textformat.py. Examples of new-style formatting applied to a list of tuples in order to make a formatted table.

textformat2.py. Examples of new-style formatting applied to a list of dictionaries in order to make a formatted table.

textformat3.py. Examples of new-style formatting applied to a list of instances in order to make a formatted table.
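The three files differ mainly in how a format field reaches into each record. A sketch of all three styles side by side, using hypothetical stock data (the field names and values are assumptions, not taken from the textformat files): positional fields for tuples, `**` unpacking for dictionaries, and attribute lookup (`{0.name}`) for instances.

```python
# Hypothetical sample data; textformat*.py may use other records and fields.
ROWS = [("GOOG", 100, 490.10), ("AAPL", 50, 545.75), ("IBM", 75, 91.50)]

class Stock:
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

def format_tuples(rows):
    # Positional fields: {0}, {1}, {2} refer to the unpacked tuple items.
    return ["{0:>8s} {1:>8d} {2:>10.2f}".format(*row) for row in rows]

def format_dicts(records):
    # Named fields: dictionary keys are supplied with ** unpacking.
    return ["{name:>8s} {shares:>8d} {price:>10.2f}".format(**r) for r in records]

def format_instances(objs):
    # Attribute lookup inside the field: {0.name} reads obj.name.
    return ["{0.name:>8s} {0.shares:>8d} {0.price:>10.2f}".format(o) for o in objs]

if __name__ == "__main__":
    stocks = [Stock(*row) for row in ROWS]
    print("{0:>8s} {1:>8s} {2:>10s}".format("Name", "Shares", "Price"))
    for line in format_instances(stocks):
        print(line)
```

All three produce identical rows; only the way each field locates its value changes.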

Part 4 : Binary Data Handling

msgfrag.py. A comparison of joining byte fragments together using concatenation, join, and bytearray extension.
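A sketch of the three approaches being compared, under the assumption of a message built from 1000 fragments of 32 bytes each (msgfrag.py may use different sizes). Because `bytes` is immutable, repeated `+=` can end up copying the accumulated message on each step, while `b"".join()` and a growing `bytearray` avoid that.

```python
import timeit

# Hypothetical message: 1000 fragments of 32 bytes each.
FRAGMENTS = [bytes([65 + i % 26]) * 32 for i in range(1000)]

def by_concat(frags):
    msg = b""
    for frag in frags:
        msg += frag          # bytes is immutable, so this may copy msg each time
    return msg

def by_join(frags):
    return b"".join(frags)   # CPython sizes the result once, then copies each piece

def by_bytearray(frags):
    msg = bytearray()
    for frag in frags:
        msg.extend(frag)     # bytearray is mutable and grows in place
    return bytes(msg)

if __name__ == "__main__":
    for func in (by_concat, by_join, by_bytearray):
        seconds = timeit.timeit(lambda: func(FRAGMENTS), number=100)
        print("{0:15s} {1:.4f}s".format(func.__name__, seconds))
```

The bytearray version is handy when fragments arrive one at a time (for example, from a socket) and can't all be collected into a list first.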

structwrite.py. Compares two techniques for writing binary data structures.
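The page doesn't say which two techniques structwrite.py uses, so the following is only an illustration of two common options with the `struct` module: packing and writing each record separately, versus packing every record into one preallocated `bytearray` with a precompiled `struct.Struct` and calling `write()` once. The record layout (a little-endian int32 followed by two float64 values) is an assumption.

```python
import io
import struct

# Hypothetical record layout: little-endian int32 id, float64 x, float64 y.
RECORD = struct.Struct("<idd")   # 4 + 8 + 8 = 20 bytes, no padding with "<"

def write_each(f, records):
    """Technique 1: pack and write one record at a time."""
    for rec in records:
        f.write(struct.pack("<idd", *rec))

def write_bulk(f, records):
    """Technique 2: pack_into one preallocated buffer, then a single write()."""
    buf = bytearray(RECORD.size * len(records))
    for i, rec in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, *rec)
    f.write(buf)

if __name__ == "__main__":
    records = [(i, i * 0.5, i * 2.0) for i in range(1000)]
    a, b = io.BytesIO(), io.BytesIO()
    write_each(a, records)
    write_bulk(b, records)
    print(len(a.getvalue()), a.getvalue() == b.getvalue())
```

Both write identical bytes; the difference is the number of `write()` calls and the per-record packing overhead.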

Part 5 : The io module

These files have a few simple performance tests for comparing different file modes, encodings, etc. You should try these under both Python 2 and 3.

iterlines.py. Iterate over lines of a text file using native open().

itercodecs.py. Iterate over lines of a text file using codecs.open().

iterbin.py. Iterate over lines of a text file using binary file mode.

iterenc.py. Iterate over lines of a text file using different text encodings. (Python 3 only).

readall.py. Read the entire contents of a file all at once.

find404.py. Find all 404 errors in a web server log using text and binary file modes.
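The text-versus-binary contrast behind these files can be sketched with a find404-style search. This is not the original find404.py. It assumes Common Log Format lines where the status code is the second-to-last field, and it decodes text mode as Latin-1 (an assumption that works because Latin-1 maps every byte to a character, so decoding never fails). In text mode each line is decoded to `str`; in binary mode lines stay as `bytes` and must be compared against a bytes literal.

```python
import os
import sys
import tempfile

# Hypothetical log lines in Common Log Format.
SAMPLE_LOG = (
    '10.0.0.1 - - [10/Mar/2011:09:00:01 -0500] "GET /index.html HTTP/1.1" 200 5120\n'
    '10.0.0.2 - - [10/Mar/2011:09:00:02 -0500] "GET /missing.png HTTP/1.1" 404 310\n'
    '10.0.0.3 - - [10/Mar/2011:09:00:03 -0500] "GET /old/page HTTP/1.1" 404 310\n'
)

def make_sample_log():
    """Write SAMPLE_LOG to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "wb") as f:
        f.write(SAMPLE_LOG.encode("latin-1"))
    return path

def find404_text(path, encoding="latin-1"):
    """Text mode: every line is decoded to str as it is read."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f
                if line.split()[-2:-1] == ["404"]]

def find404_binary(path):
    """Binary mode: lines stay bytes, so compare against b"404"."""
    with open(path, "rb") as f:
        return [line.rstrip(b"\n") for line in f
                if line.split()[-2:-1] == [b"404"]]

if __name__ == "__main__":
    cleanup = len(sys.argv) < 2
    path = make_sample_log() if cleanup else sys.argv[1]
    try:
        for line in find404_text(path):
            print(line)
    finally:
        if cleanup:
            os.remove(path)
```

Timing the two functions on a large log is one way to see what text decoding costs; changing the `encoding` argument (as iterenc.py does) shows how much that cost depends on the codec.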

Part 6 : System Interfaces

No files.

Part 7 : Library Design Issues

No files.

Feedback

I'm always looking for ways to improve presentation materials and examples. Send your ideas to dave@dabeaz.com.