There are certain Perl idioms that every Perl programmer uses: "while (<>) { foo; }" and "$foo =~ s/old/new/g" both come to mind.

When I was learning Python I was pretty peeved that certain Python books don't get to that kind of thing until much later chapters. One didn't cover it until the end! As [a long-time Perl user](https://everythingsysadmin.com/2011/03/overheard-at-the-office-perl-e.html), I found this annoying and confusing.

While they might have been trying to send a message that Python has better ways to do those things, I think the real problem was that the audience for a general Python book is a lot bigger than the audience for a book for Perl people learning Python. Imagine how confusing it would be for someone learning their first programming language if the book started out comparing one language they didn't know to another language they didn't know!

So here are the idioms I wish were in Chapter 1. I'll be updating this document as I think of new ones, but I'm trying to keep the list short.

Processing every line in a file

Perl:

while (<>) { print $_; }

Python:

with open('filename.txt') as f:
    for line in f:
        print(line, end='')  # line already ends in '\n', just like Perl's $_

To emulate Perl's <>, which reads every file named on the command line (or stdin if there are none):

import fileinput
for line in fileinput.input():
    print(line, end='')

If you must access stdin directly, that is in the "sys" module:

import sys
for line in sys.stdin:
    print(line, end='')

However, most Python programmers tend to just read the entire file into one huge string and process it that way. I feel funny doing that. Having used machines with very limited amounts of RAM, I tend to try to keep my file processing to a single line at a time. However, that method is going the way of the dodo.

import sys
with open('filename.txt') as f:
    contents = f.read()
all_input = sys.stdin.read()

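If you want one string per line with the newline removed, call splitlines() on what read() returns. (readlines() also gives you one string per line, but it leaves the trailing newline on each one.)

import sys
with open('filename.txt') as f:
    list_of_strings = f.read().splitlines()
all_input_as_list = sys.stdin.read().splitlines()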
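You might prefer to keep the line-at-a-time style from the first section but still drop the newline, which is what Perl's chomp does. In that case you can strip it as you go. This is a minimal sketch built around a hypothetical chomped_lines() helper. io.StringIO stands in for a real file or sys.stdin so the example runs on its own:

```python
import io


def chomped_lines(stream):
    """Yield each line of stream with its trailing newline removed."""
    for line in stream:
        # Like Perl's chomp: only the newline goes; trailing spaces stay.
        yield line.rstrip('\n')


# io.StringIO behaves like an open text file for this purpose.
fake_file = io.StringIO('first line\nsecond line  \nlast line')
lines = list(chomped_lines(fake_file))
print(lines)
```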

Regular expressions

Python has a very powerful RE system; you just have to enable it with "import re". Any place you can use a regular expression, you can also use a compiled regular expression. Python people tend to always compile their regular expressions; I guess they aren't used to writing throw-away scripts like in Perl:

import re
import sys
RE_DATE = re.compile(r'\d\d\d\d-\d{1,2}-\d{1,2}')
for line in sys.stdin:
    mo = re.search(RE_DATE, line)
    if mo:
        print(mo.group(0))

There is re.search() and re.match(). re.match() only matches if the string starts with the regular expression. It is like putting a "^" at the front of your regex. re.search() is like putting a ".*" at the front of your regex. Since match comes before search alphabetically, most Perl users find "match" in the documentation, try to use it, and get confused that r'foo' does not match 'i foo you'. My advice? Pretend match doesn't exist (just kidding).
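Here is a quick way to see the difference, using the 'i foo you' example from the paragraph above. The second half covers a detail worth knowing. With re.MULTILINE, a leading ^ in search() can match at the start of any line, but match() still only tries the very beginning of the string. So the "like putting a ^ in front" rule of thumb holds only without that flag.

```python
import re

text = 'i foo you'
anchored = re.match(r'foo', text)   # None: match() only tries position 0
anywhere = re.search(r'foo', text)  # a match object: search() scans the string
caret = re.search(r'^foo', text)    # None: '^' makes search() act like match()

# With re.MULTILINE, '^' also matches right after each newline,
# but match() still only tries the start of the whole string.
multi_search = re.search(r'^foo', 'bar\nfoo', re.MULTILINE)  # matches on line 2
multi_match = re.match(r'foo', 'bar\nfoo', re.MULTILINE)     # None

print(anchored, anywhere.group(0), caret, multi_search.group(0), multi_match)
```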

The big change you'll have to get used to is that the result of a match is an object, and you pull various bits of information from the object. If nothing is found, you don't get an object; you get None, which makes it easy to test for in an if statement. A match object is always true, and None is always false. Now that code above makes more sense, right?

Yes, you can put parentheses around parts of the regular expression to extract data. That's where the match object that gets returned is pretty cool:

import re
import sys
for line in sys.stdin:
    mo = re.search(r'(\d\d\d\d)-(\d{1,2})-(\d{1,2})', line)
    if mo:
        print(mo.group(0))

The first thing you'll notice is that the "mo =" and the "if" are on separate lines. There is no "if x = re.search() then" idiom in Python like there is in Perl. It is annoying at first, but eventually I got used to it, and now I appreciate that I can't accidentally assign a variable that I meant to compare. (Python 3.8 later added the := operator, so "if mo := re.search(...):" is now legal, but a plain = inside an if is still a syntax error.)
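Here is a sketch of that two-line "mo =" / "if mo:" pattern wrapped in a reusable function. first_dates() and DATE_PATTERN are names made up for this example, and the list of sample lines stands in for sys.stdin. It also shows that a compiled pattern has its own search() method. That spelling is more common than passing the pattern to re.search().

```python
import re

# Same pattern as RE_DATE above, compiled once under a made-up name.
DATE_PATTERN = re.compile(r'\d\d\d\d-\d{1,2}-\d{1,2}')


def first_dates(lines):
    """Return the first date string found on each line that has one."""
    found = []
    for line in lines:
        mo = DATE_PATTERN.search(line)  # compiled patterns have their own .search()
        if mo:                          # a match object is true; None is false
            found.append(mo.group(0))
    return found


# Any iterable of strings works here, including sys.stdin or an open file.
sample = ['backup ran 2011-03-7 ok\n', 'no date here\n', 'rotated 1999-12-31\n']
print(first_dates(sample))
```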

Let's look at that match object that we assigned to the variable "mo" earlier:

mo.group(0) -- The part of the string that matched the regex.

mo.group(1) -- The first ()'ed part

mo.group(2) -- The second ()'ed part

mo.group(1,3) -- The first and third matched parts (as a tuple)

mo.groups() -- A tuple containing all the matched parts.
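Here is that list applied to a concrete string, with the nearest Perl equivalent noted in each comment. The sample string is made up. Note that every group comes back as a string, not a number:

```python
import re

# Made-up sample line; the pattern is the one from the example above.
mo = re.search(r'(\d\d\d\d)-(\d{1,2})-(\d{1,2})', 'released on 2011-03-15, again')

whole = mo.group(0)            # '2011-03-15', like Perl's $&
year = mo.group(1)             # '2011', like $1
month = mo.group(2)            # '03', like $2
year_and_day = mo.group(1, 3)  # ('2011', '15'), like ($1, $3)
parts = mo.groups()            # ('2011', '03', '15'), like ($1, $2, $3)

# Groups are strings; convert them yourself if you need numbers.
print(whole, int(year), int(month), year_and_day, parts)
```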

Perl's s/// substitutions are easily done with re.sub(), but if you don't need a regular expression, the string method replace() is faster:

>>> import re
>>> re.sub(r'\d\d+', r'', '1 22 333 4444 55555')
'1    '
>>> re.sub(r'\d+', r'', '9876 and 1234')
' and '
>>> re.sub(r'remove', r'', 'can you remove from')
'can you  from'
>>> 'can you remove from'.replace('remove', '')
'can you  from'

You can even refer back to several parenthesized groups in the replacement, using \1 and \2 where Perl would use $1 and $2:

>>> re.sub(r'(\d+) and (\d+)', r'yours=\1 mine=\2', '9876 and 1234')
'yours=9876 mine=1234'
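One difference from Perl to watch for is that re.sub() replaces every match by default, so it behaves like s///g. To get Perl's plain s/// (first match only), pass count=1. str.replace() takes a similar count argument, and re.subn() also returns the number of substitutions made. Here is a small sketch with a made-up string:

```python
import re

text = 'old old old'
every = re.sub(r'old', 'new', text)            # like Perl's s/old/new/g
first = re.sub(r'old', 'new', text, count=1)   # like Perl's s/old/new/ (no /g)
plain = text.replace('old', 'new', 1)          # same as first, no regex involved

# re.subn() returns the new string plus how many substitutions were made,
# similar to the count Perl's s///g returns.
result, n = re.subn(r'old', 'new', text)
print(every, first, plain, result, n, sep=' | ')
```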

After you get used to that, read the [re module documentation](http://docs.python.org/library/re.html) (or "pydoc re") for more information.

String manipulations

I found it odd that Python folks don't use regular expressions as much as Perl people. At first I thought this was because Python makes them more cumbersome ('cause I didn't like having to do 'import re'). It turns out that Python's string handling can be more powerful. For example, the common Perl idiom "s/foo/bar/" (as long as "foo" is not a regex) is as simple as:

credit = 'i made this'
print(credit.replace('made', 'created'))

or

print('i made this'.replace('made', 'created'))

It is kind of fun that strings are objects that have methods. It looks funny at first.

Notice that replace() returns a string. It doesn't modify the string. In fact, strings cannot be modified, only created. Python cleans up memory for you automatically, and it can't do that very easily if things change out from under it. This is very Lisp-like. This is odd at first but you get used to it. Wait... by "odd" I mean "totally fucking annoying". However, I assure you that eventually you'll see the benefits of string de-duplication and (I'm told) speed.
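Here is a tiny demonstration of the point above, reusing the credit string from the earlier example. replace() hands back a new string and the original is untouched. Assigning to a single character raises TypeError. The mutated flag exists only for the demo:

```python
credit = 'i made this'
updated = credit.replace('made', 'created')  # returns a brand-new string

try:
    credit[0] = 'I'  # strings are immutable, so this raises TypeError
    mutated = True
except TypeError:
    mutated = False

print(credit)   # the original is unchanged
print(updated)
```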

It does mean, however, that accumulating data in a string is painfully slow:

s = 'this is the first part\n'
s += 'i added this.\n'
s += 'and this.\n'
s += 'and then this.\n'

The above code is bad. Each assignment copies all the previous data just to make a new string. The more you accumulate, the more copying is needed. The Pythonic way is to accumulate a list of the strings and join them later.

s = []
s.append('this is the first part\n')
s.append('i added this.\n')
s.append('and this.\n')
s.append('and then this.\n')
print(''.join(s))

It seems slower, but it is actually faster. The strings stay in place. Each addition to "s" just adds a pointer to where that string is in memory. You've essentially built up an array of pointers (a Python list is a resizable array, not a linked list), which is much more lightweight and faster to manage than copying those strings around. At the end, you join the strings. Python makes one run through all the strings, copying them to a buffer, a pointer to which is sent to the "print" routine. This is about the same amount of work as Perl, which internally was copying the strings into a buffer along the way. Perl did copy-bytes, copy-bytes, copy-bytes, copy-bytes, pass pointer to print. Python did append-pointer 4 times, then a highly optimized copy-bytes, copy-bytes, copy-bytes, copy-bytes, pass pointer to print.
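In a real script the pieces usually come from a loop, and that is where the list-and-join idiom pays off. This is a minimal sketch; build_report() and its sample paths and sizes are hypothetical:

```python
def build_report(entries):
    """Build a multi-line report by collecting pieces in a list and joining once."""
    parts = []
    for name, size in entries:
        parts.append('%s: %d bytes\n' % (name, size))  # just stores a reference
    return ''.join(parts)                              # one pass that copies everything


# Hypothetical sample data; a real script might gather these with os.stat().
report = build_report([('/var/log/messages', 1024), ('/var/log/secure', 2048)])
print(report, end='')
```

CPython can sometimes grow a string in place for +=, but that is an implementation detail. Collecting pieces in a list and joining once remains the portable habit.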

Joining and splitting

This killed me until I got used to it. The delimiter is not a parameter to join(); instead, join() is a method of the delimiter string.

Perl:

$new = join('|', $str1, $str2, $str3);

Python:

new = '|'.join([str1, str2, str3])

Python's join() is a method of the delimiter string. It hurt my brain until I got used to it.

Oh, and join() takes only one argument. What? It's joining a list of things... why does it take only one argument? Well, that one argument is the list (or any other iterable of strings; see the example above). I guess that makes the syntax more uniform.
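One more surprise for Perl people: Perl's join stringifies numbers for you, but Python's join() raises TypeError if any item isn't already a string. Here is a sketch with a hypothetical record. Converting each item with str() in a generator expression fixes it, since join() accepts any iterable:

```python
fields = ['web01', 80, 99.5]  # hypothetical record: host, port, uptime percentage

try:
    '|'.join(fields)
    joined_raw = True
except TypeError:  # join() will not convert 80 or 99.5 for you
    joined_raw = False

line = '|'.join(str(field) for field in fields)  # a generator works as the argument
print(line)
```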

Splitting strings is much more like Perl... kind of. The parameter is what you split on, or leave it blank for "awk-like splitting" (which heathens call "perl-like splitting" but they are forgetting their history).

Perl:

my @values = split(/\|/, $data);  # the pattern is a regex, so | must be escaped

Python:

values = data.split('|')

Python's split() takes a plain string rather than a regex, so the | needs no escaping. You can split a string literal too. In this example we don't give split() any parameters, so it does "awk-like splitting".

>>> 'one two three four'.split()
['one', 'two', 'three', 'four']
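A few splitting differences tend to bite Perl users, shown here on made-up strings. With no argument, split() collapses runs of whitespace and ignores leading and trailing whitespace, while split(' ') splits on every single space. Python also keeps trailing empty fields, which Perl's split drops by default. Finally, Python's optional second argument is a maximum number of splits, whereas Perl's LIMIT is a maximum number of fields. So Perl's split(/:/, $line, 3) corresponds to line.split(':', 2).

```python
passwd_line = 'root:x:0:0:root:/root:/bin/bash'
first_two = passwd_line.split(':', 2)  # at most 2 splits, so at most 3 pieces

messy = '  one   two  three '
awk_like = messy.split()    # runs of whitespace collapse; ends are ignored
literal = messy.split(' ')  # every single space is a separator

trailing = 'a|b||'.split('|')  # trailing empty fields are kept

print(first_two)
print(awk_like)
print(literal)
print(trailing)
```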

If you have a multi-line string that you want to break into its individual lines, bigstring.splitlines() will do that for you.
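Why not just use split('\n')? This sketch uses a made-up string. splitlines() also understands "\r\n" line endings, and it doesn't leave an empty string after a final newline. Pass keepends=True if you want the line endings kept:

```python
bigstring = 'line one\nline two\r\nline three\n'

no_ends = bigstring.splitlines()                 # newlines (including '\r\n') removed
naive = bigstring.split('\n')                    # leaves a '\r' and a trailing ''
with_ends = bigstring.splitlines(keepends=True)  # line endings kept

print(no_ends)
print(naive)
print(with_ends)
```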

Getting help

pydoc foo

except it doesn't work half the time because you need to know which module something is in. I prefer the "quick search" box on http://docs.python.org, or just using Google.
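When you don't know which module something lives in, you can often ask the object itself from inside Python. In this small sketch, dir() lists an object's attributes, filtered here to hide the underscore-prefixed ones. The printed docstring is the text that help(str.split) displays in the interactive interpreter:

```python
# Public methods of the built-in str type (underscore names filtered out).
string_methods = [name for name in dir(str) if not name.startswith('_')]
print(string_methods)

# The docstring that help(str.split) displays.
print(str.split.__doc__)
```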

I have not read [Python for Unix and Linux System Administration](http://www.amazon.com/dp/0596515820/safocus-20), but the table of contents looks excellent. I have read most of Python Cookbook (the first edition; there is a 2nd edition out too) and learned a lot. Both are from O'Reilly and can be read on Safari Books Online.

That's it!

Those few idioms make up most of the Perl code I usually wrote. Learning Python would have been so much easier if someone had shown me the Python equivalents early on.

One last thing... As a sysadmin, I've found a few modules useful: