
A Developer's Guide to Python 3.0: Numbers, Strings, and Data

Python 3.0 makes critical, backward-incompatible changes to data types. Find out how these changes will affect your code.

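by Gigi Sayfan, Apr 7, 2009

The first article in this series covered important changes to the core language and its type system. This article focuses on how Python 3.0 treats the basic data types: numbers, text, and binary data.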



PEP 237: Unifying Long Integers and Integers

Python 2.x has two integral types: int and long. The int type is limited to the machine's "native" word size (32 or 64 bits on modern machines). Before Python 2.2, operations on int could overflow and raise OverflowError. In contrast, the long type is limited only by the amount of available memory and can, conceptually, represent any integer.



The reason for having two integer types is that int is very efficient, because it maps directly onto the integer support in the hardware, while long is flexible and doesn't require the developer to keep tabs on the size of numbers. But having two types causes problems when moving compiled Python files or pickled objects between machines with different architectures: a value that is an int on one machine may have to be a long on another.
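In Python 2.x, code that checks whether a value is an integer has to accept both types, and the check must not break under Python 3.0, where the name long no longer exists. One common pattern, sketched below, probes for long at import time. INTEGER_TYPES and is_integer are illustrative names chosen for this example, not part of any library.

```python
# Sketch of a version-agnostic integer check.
# INTEGER_TYPES and is_integer are hypothetical names for this example.
try:
    # Python 2.x: both int and long are integral types.
    INTEGER_TYPES = (int, long)
except NameError:
    # Python 3.0 and later: long is gone and int covers every size.
    INTEGER_TYPES = (int,)


def is_integer(value):
    """Return True if value is an integer, regardless of its size."""
    return isinstance(value, INTEGER_TYPES)


print(is_integer(7))        # True
print(is_integer(5 ** 88))  # True
print(is_integer(7.0))      # False: a float is not an integral type
```

Keep in mind that bool is a subclass of int, so is_integer(True) also returns True.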



The goal of PEP 237 is to unify these two concepts into a single integer type that changes its internal representation to use the more efficient machine integer whenever possible. The implementation stretched across four versions: it began in 2.2, continued in 2.3 and 2.4, and is complete in 3.0.
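To see the unification from the Python side, the following sketch (written for Python 3) pushes a value past sys.maxsize, the largest value a machine-sized index can hold, and checks that the type never changes along the way. It only demonstrates behavior visible to Python code; how CPython stores the value internally is an implementation detail this example doesn't inspect.

```python
import sys

# sys.maxsize is the largest value of the platform's Py_ssize_t
# (2**63 - 1 on typical 64-bit builds).
word_max = sys.maxsize
beyond = word_max + 1   # in Python 2.x this would have been a long
huge = 5 ** 88          # far larger than any machine word

for value in (word_max, beyond, huge):
    # All three report the same type: no promotion, no OverflowError.
    print(type(value).__name__, value)

# Arithmetic stays exact at any size.
print(huge // 5 ** 87 == 5)  # True
```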



Python 2.4 and higher already promote int to long automatically, without exceptions or warnings. Python 3.0 simply eliminates the long type and long literals at the Python level. If you try to use long in Python 3.0, you get an error:



    >>> long
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'long' is not defined

Python 3.0 also removes the L suffix for long literals. Now an integer is an integer is an integer. In Python 2.5 this is fine:



    >>> 5L
    5L

But in Python 3.0 it's a syntax error:



    >>> 5L
      File "<stdin>", line 1
        5L
         ^
    SyntaxError: invalid syntax

In Python 2.5 the following code generates a long object:



    >>> x = 5 ** 88
    >>> type(x)
    <type 'long'>
    >>> x
    32311742677852643549664402033982923967414535582065582275390625L

In Python 3.0 it's an int:



    >>> x = 5 ** 88
    >>> type(x)
    <class 'int'>
    >>> x
    32311742677852643549664402033982923967414535582065582275390625

PEP 3127: Integer Literal Support and Syntax

Python has always supported a plethora of radices (bases) for integers. The int() and long() functions in Python 2.5 accept an optional second argument: the base to convert from. The base can be any integer between 2 and 36 inclusive (base 0 is also accepted and means "infer the base from the string's prefix"):



    >>> int('000111', 2)
    7
    >>> int('000111', 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: int() base must be >= 2 and <= 36
    >>> int('000111', 36)
    1333
    >>> long('555555555555555555555555555555555555555', 6)
    2227915756473955677973140996095L

When you pass an explicit base, the first argument must be a string; passing a number instead raises TypeError: long() can't convert non-string with explicit base. Python 3.0 preserves this functionality (although the error message says arg 2 instead of base):



    >>> int('0001111', 2)
    15
    >>> int('5', 36)
    5
    >>> int('5', 37)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: int() arg 2 must be >= 2 and <= 36

Python 2.5 also supported integer literals in octal and hexadecimal, so wherever an integer was expected you could write an octal or hexadecimal number instead. Octal literals required a leading zero, as in 0123; hexadecimal literals required a leading zero followed by the character x or X, as in 0x123. Finally, the oct() and hex() functions each take an integer and return its string representation in octal or hexadecimal, for example:



    >>> 010
    8
    >>> 010 + 8
    16
    >>> 0xa
    10
    >>> 0xa + 010 + 2
    20
    >>> oct(20)
    '024'
    >>> hex(20)
    '0x14'

Python 3.0 maintains all this functionality with one small change: the prefix for octal numbers is now a zero followed by the character o or O, as in 0o123, instead of just 0123. The single-leading-zero notation was borrowed from the C programming language, and dropping it reduces confusion for developers unfamiliar with C-like languages or with octal numbers. Such developers expect leading zeros not to change a number's value; for example, they might pad numbers with leading zeros to line them up and unwittingly end up with the wrong values. In Python 3.0 a literal such as 0123 is a syntax error, so this mistake can no longer slip through silently. In addition, Python 3.0 adds a binary literal. All in all, this break from the C legacy creates a uniform notation for integer literals in bases 2, 8, and 16 (binary, octal, and hexadecimal). The prefixes are 0b, 0o, and 0x:



    >>> 0b10
    2
    >>> 0o10
    8
    >>> 0x10
    16

There is also a new bin() function that converts an integer to its binary string representation (analogous to oct() and hex()):



    >>> bin(5)
    '0b101'
    >>> bin(0x10)
    '0b10000'
    >>> bin(0o10)
    '0b1000'

The oct() function, of course, uses the new 0o prefix rather than the old 0 prefix of Python 2.5:



    >>> oct(12)
    '0o14'

I feel that this change, while pretty minor in the great scheme of things, is an elegant and clean win-win solution. It removes an obstacle from the path of new users, makes a clean break from the past (octal notation in C), and unifies the notation for radix literals, which matters now that there is a new binary literal.
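The uniform prefixes also make round-tripping easy. Passing base 0 to int() tells it to read the string as a Python integer literal, prefix included, so the output of bin(), oct(), and hex() can be fed straight back in. Python has no built-in inverse of int(s, base) for arbitrary bases, so the sketch below also includes a small helper; to_base() is a hypothetical name written for this example, and it assumes the same digit set int() accepts up to base 36 (0-9, then a-z).

```python
import string

# The digits int() accepts for bases up to 36: 0-9, then a-z.
DIGITS = string.digits + string.ascii_lowercase


def to_base(n, base):
    """Hypothetical inverse of int(s, base): format n in a base from 2 to 36."""
    if not 2 <= base <= 36:
        raise ValueError("base must be >= 2 and <= 36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    # Digits were produced least significant first.
    return sign + "".join(reversed(digits))


# Output of bin(), oct(), and hex() parses back with base 0.
for text in (bin(20), oct(20), hex(20)):
    print(text, "->", int(text, 0))

# The old C-style octal spelling is rejected by int() too.
try:
    int("010", 0)
except ValueError as err:
    print("rejected:", err)

print(to_base(1333, 36))                       # 111
print(int(to_base(5 ** 88, 7), 7) == 5 ** 88)  # True
```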







