In Python, text can be represented as a Unicode string or as bytes. Unicode is a standard for representing characters, assigning each one a numeric code point. A Unicode string is a Python data structure that stores zero or more Unicode characters and is designed to hold text. Bytes, on the other hand, are just a sequence of bytes that can store arbitrary binary data. As long as you work on strings in memory, Unicode strings alone are usually enough. Once you need to do IO, you need a binary representation of the string, which you get by encoding it. Typical IO includes reading from and writing to the console, files, and network sockets.
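As a minimal sketch of the in-memory versus IO distinction, the Python 3 snippet below encodes a Unicode string to bytes and decodes it back. The choice of UTF-8 is an assumption for illustration; any codec that can represent the characters would do.

```python
# Python 3: str holds Unicode text, bytes holds raw binary data.
text = "✓ means check"            # Unicode string, 13 characters

# Encode before IO: files, sockets and the console carry bytes.
data = text.encode("utf-8")       # UTF-8 is an assumed choice of codec

# Decode after IO to get text back.
restored = data.decode("utf-8")

print(type(text).__name__, len(text))   # character count
print(type(data).__name__, len(data))   # byte count (the check mark takes 3 bytes)
print(restored == text)
```

The character count and the byte count differ because one character may need several bytes in a given encoding.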

Unicode string literals, byte literals, and their types differ between Python 2 and Python 3, as shown in the following table.

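|                        | python2.7          | python3.4+                           |
|------------------------|--------------------|--------------------------------------|
| unicode string literal | u"✓ means check"   | "✓ means check" or u"✓ means check"  |
| unicode string type    | unicode            | str                                  |
| byte literal           | "abc" or b"abc"    | b"abc"                               |
| byte type              | str                | bytes                                |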
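The Python 3 column can be checked with a short sketch like the one below. It runs on Python 3 only, and the variable names are purely illustrative.

```python
# Python 3 only: check the literal types listed in the python3.4+ column.
plain_literal = "✓ means check"       # unprefixed literal is a Unicode string
prefixed_literal = u"✓ means check"   # u prefix is accepted and means the same thing
byte_literal = b"abc"                 # b prefix gives bytes

print(type(plain_literal).__name__)      # str
print(type(prefixed_literal).__name__)   # str
print(type(byte_literal).__name__)       # bytes
print(plain_literal == prefixed_literal) # True
```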

You can get Python 3.4's string literal behavior in Python 2.7 with a `__future__` import:

from __future__ import unicode_literals

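When you use Unicode string literals that include non-ASCII characters in Python source code, you need to specify the source file encoding at the beginning of the file (on the first or second line). This is required in Python 2, where the default source encoding is ASCII; Python 3 assumes UTF-8 by default.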

#!/usr/bin/env python
# coding=utf-8

The declared coding should match the actual encoding of the source file. On Linux, it's usually UTF-8.
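To see why the declared encoding has to match the real one, here is a rough byte-level analogy in Python 3. It does not reproduce how the interpreter loads source files; it only shows what happens when UTF-8 bytes are decoded with a different codec (Latin-1 and ASCII are picked here as assumed examples of a wrong declaration).

```python
# Bytes as they would sit on disk in a UTF-8 encoded file.
raw = "✓".encode("utf-8")          # three bytes for one character

right = raw.decode("utf-8")        # matching codec: the original character
wrong = raw.decode("latin-1")      # mismatched codec: three unrelated characters

# A stricter mismatched codec fails outright instead of producing garbage.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(right == "✓", len(wrong), ascii_failed)
```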

It's recommended that you always put the coding declaration there. Just configure your IDE to insert these lines whenever you create a new Python source file.