Intermediate Python: Pythonic file searches

It's very easy to get up and running with Python, but programmers coming from other, more verbose or procedural languages tend to write code that's not very pythonic; that is, it doesn't use the idioms that experienced Python programmers use.

The problem with un-pythonic code is that it tends to be more verbose, harder to understand, and even slower to run. Here's a naive implementation of a function that finds every line in a given file containing a specified string. It returns a list of (line_num, line) tuples.

    def naive_way(to_find, filename):
        """Find string to_find in file filename"""
        file_handle = open(filename)
        line_number = 0
        lines = []
        done = False
        while done == False:
            line = file_handle.readline()
            if not line:
                done = True
            else:
                line_number += 1
                index = line.find(to_find)
                if index > -1:
                    lines.append((line_number, line))
        return lines

This code is fairly readable and it gets the job done, but we can do better. Notice all those variables lying around? They clutter up the function, making its intent harder to see, and they actually slow the code down. Things like "line_number += 1" are more costly than you might expect: the interpreter executes that statement as several bytecode instructions on every pass through the loop, and because integers are immutable, each increment rebinds the name to a different integer object rather than updating one in place.

We can get rid of "done" and "file_handle" by iterating over the file rather than using the low-level readline() method. We can avoid the code that increments "line_number" by using the built-in enumerate function. Finally, we can get rid of "index" by using the "in" operator.
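To see those three idioms in isolation, here's a small sketch. It assumes an in-memory io.StringIO object as a stand-in for an open file, and the sample text is made up for illustration.

```python
import io

# An in-memory stand-in for an open text file (hypothetical sample data).
sample = io.StringIO("first line\nyour system here\nlast line\n")

# Iterating over a file-like object yields one line at a time, newline included.
# enumerate() pairs each line with a zero-based counter.
numbered = list(enumerate(sample))

# "in" tests for a substring directly, with no index bookkeeping.
uses_in = "system" in "your system here\n"

# str.find() answers the same question indirectly, via a position or -1.
uses_find = "your system here\n".find("system") > -1
```

Note that enumerate counts from zero by default, so a one-based line number needs an adjustment somewhere.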

Here's a more pythonic version of the above function:

    def pythonic_way(to_find, filename):
        """Find string to_find in file filename"""
        lines = []
        for line_num, line in enumerate(open(filename)):
            if to_find in line:
                lines.append((line_num + 1, line))
        return lines
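If you're on Python 2.6 or later (including Python 3), the same idea can be tightened a little further. The sketch below is one possible variant, not part of the timings in this post: the name modern_way is made up here, and it relies on the with statement and on the start argument to enumerate, both of which need those newer versions.

```python
def modern_way(to_find, filename):
    """Return (line_number, line) tuples for lines containing to_find."""
    # The with block closes the file even if an exception is raised.
    with open(filename) as file_handle:
        # start=1 makes enumerate() count from one, so no "+ 1" is needed.
        return [
            (line_num, line)
            for line_num, line in enumerate(file_handle, start=1)
            if to_find in line
        ]
```

The list comprehension replaces the explicit append loop, and closing the file explicitly avoids relying on the interpreter to clean up the handle.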

Remember what I said about pythonic code being faster? Here are the times I got for running each function 100 times, searching for "your system" in "/python25/readme.txt" (rounded to three decimal places):

                    Without psyco    With psyco
    naive_way       0.411 s          0.213 s
    pythonic_way    0.116 s          0.082 s

Psyco manages to narrow the gap a bit (probably by optimizing away some of the interpreter overhead), but even with psyco the pythonic function is about 2.6x faster (0.213 s versus 0.082 s), not to mention more readable (to a Python programmer, at least!). And since bug counts tend to grow with the number of lines of source code, it's likely to have fewer bugs as well.
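If you'd like to try a comparison like this yourself, here's a rough sketch using the standard timeit module. It isn't the harness behind the numbers above: the helper name time_search and the generated sample file are assumptions made for illustration, the two functions are repeated so the script runs on its own, and psyco is left out because it doesn't run on Python 3. Your timings will differ from mine.

```python
import os
import tempfile
import timeit


def naive_way(to_find, filename):
    """Find string to_find in file filename"""
    file_handle = open(filename)
    line_number = 0
    lines = []
    done = False
    while done == False:
        line = file_handle.readline()
        if not line:
            done = True
        else:
            line_number += 1
            index = line.find(to_find)
            if index > -1:
                lines.append((line_number, line))
    return lines


def pythonic_way(to_find, filename):
    """Find string to_find in file filename"""
    lines = []
    for line_num, line in enumerate(open(filename)):
        if to_find in line:
            lines.append((line_num + 1, line))
    return lines


def time_search(func, to_find, filename, runs=100):
    """Return the total seconds taken to call func(to_find, filename) runs times."""
    return timeit.timeit(lambda: func(to_find, filename), number=runs)


# Build a throwaway sample file (hypothetical contents, not the original readme).
fd, sample_path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as handle:
        for i in range(2000):
            if i % 50 == 0:
                handle.write("line %d mentions your system\n" % i)
            else:
                handle.write("line %d is filler\n" % i)

    # Time each function over the same file and print the totals.
    for func in (naive_way, pythonic_way):
        elapsed = time_search(func, "your system", sample_path)
        print("%-13s %.3f s" % (func.__name__, elapsed))
finally:
    os.remove(sample_path)
```

Passing the function in as an argument keeps the timing code identical for both versions, so any difference comes from the functions themselves.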