Don’t overuse classes in Python

Unlike in some mainstream languages such as Java, in Python you don't have to package everything into a class. A class is a good tool when you want to package up state and behavior together, but when all you've got is a bundle of related functionality, the module is the natural unit of packaging.

In my opinion, this article is an egregious example of overuse of classes. I don't want to pick on the author in particular, but it illustrates my point so well that I want to examine the article's code here.

The article was about using Python for exploratory programming, but I think the class-heavy style makes things more complicated than they need to be. The classes in the code have essentially no state. The one exception is TopRowsWBZipContent, where state is passed into the __init__ method, but it is only used in one method and could just as easily have been passed in there. The author also uses extensive inheritance to get the various methods onto the class instances; with vanilla functions, all of that could be eliminated.
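To make the point about state concrete, here is a minimal sketch of the two styles side by side. The names (TopRowsPrinter, show_top_rows) are hypothetical and are not taken from the article, and the sketch uses plain lists instead of workbook sheets. It only aims to show that a value stored in __init__ and read in a single method can be an ordinary argument instead.

```python
# Hypothetical names throughout; this is not code from the article.

class TopRowsPrinter(object):
    """Class version: the row limit lives on the instance."""

    def __init__(self, top_n_rows=5):
        self.top_n_rows = top_n_rows

    def show(self, rows):
        # The only place the stored state is ever read.
        return rows[:self.top_n_rows]


def show_top_rows(rows, top_n_rows=5):
    """Function version: the limit is just an argument."""
    return rows[:top_n_rows]


rows = [["a", 1], ["b", 2], ["c", 3]]

# Both styles give the same result; the function needs no instance.
assert TopRowsPrinter(top_n_rows=2).show(rows) == show_top_rows(rows, top_n_rows=2)
```

The function version has one less thing to construct and one less place to look when reading the code.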

Here, I want to post the code from the article, and below that my rewrite using plain functions.

First, the article's code (I've made some of the interspersed text into comments):

# Let's look at the first class definition.
# It isn't very interesting, but it shows the design pattern.

class Operation( object ):
    def processList( self, files ):
        for fileName in files:
            self.processFile( fileName )
    def processFile( self, fileName ):
        pass

# Here's a subclass that provides that process.

class ZipContent( Operation ):
    def processFile( self, fileName ):
        zip= zipfile.ZipFile( fileName )
        for member in zip.infolist():
            print "%s: %s" % ( fileName,
                member.filename )
            self.examineMember( zip, member )

# Here's the next subclass.

# It opens each zip archive member as a workbook,
# using the xlrd module.

class WBZipContent( ZipContent ):
    def examineMember( self, zipFile, member ):
        contents= zipFile.read( member.filename )
        wb= xlrd.open_workbook( file_contents=contents,
            filename=member.filename )
        for sheet in wb.sheets():
            self.examineSheet( wb, sheet )
    def examineSheet( self, wb, sheet ):
        print "> Sheet %s %d rows" % (sheet.name, sheet.nrows )

# Exploring the Workbook sheets

# Here's sprint three of the application.
# This is yet another subclass.

class TopRowsWBZipContent( WBZipContent ):
    def __init__( self, topnRows=5 ):
        super( TopRowsWBZipContent, self ).__init__()
        self.topnRows= topnRows
    def examineSheet( self, wb, sheet ):
        print "> Sheet %s %d rows" % (sheet.name, sheet.nrows )
        if self.topnRows is None:
            limit= sheet.nrows
        else:
            limit= min( self.topnRows, sheet.nrows )
        for r in xrange(limit):
            row= sheet.row(r)
            print r, [ c.value for c in row ]

def manual():
    """Change the options manually."""
    #op= ZipContent() # What's in the ZIP files?
    # What does the data look like?
    #op= TopRowsWBZipContent( topnRows=5 )
    op= ExtractCSVWBZipContent("../data")
    files = glob.glob( "../data/*.zip" )
    op.processList( files )

Here's my rewrite using vanilla functions. The code is now a lot shorter, and I think easier to understand. (It's also easier to test. Yes, I believe in unit testing exploratory code, at least once it settles down a bit.) I've been a bit snooty and used more Pythonic coding conventions while I was at it.

import glob
import zipfile

import xlrd


def process_files(filenames):
    """Process a list of files."""
    for filename in filenames:
        process_file(filename)


def process_file(filename):
    """
    Examine a zipped file.
    Configure "examine_member" below to
    customize behavior.
    """
    zipped = zipfile.ZipFile(filename)
    for member in zipped.infolist():
        print "%s : %s" % (filename,
                           member.filename)
        examine_member(zipped, member)


def examine_workbook(zipped, member):
    """
    Examine a workbook. Open up and process each
    sheet in the workbook using the xlrd module.
    """
    contents = zipped.read(member.filename)
    try:
        wb = xlrd.open_workbook(file_contents=contents,
                                filename=member.filename)
    except xlrd.biffh.XLRDError:
        print "Not an excel file"
    else:
        for sheet in wb.sheets():
            examine_sheet(wb, sheet)


def examine_sheet(wb, sheet, top_n_rows=5):
    """
    Examine a worksheet. Print top_n_rows, or
    all rows in the sheet if top_n_rows is 0/None.
    """
    print "> Sheet %s %d rows" % (sheet.name,
                                  sheet.nrows)
    # Never read past the last row of a short sheet.
    limit = min(top_n_rows or sheet.nrows, sheet.nrows)
    for r in xrange(limit):
        row = sheet.row(r)
        print r, [c.value for c in row]


# configure behavior like this
examine_member = examine_workbook


def manual():
    """
    Run when called as main. Gets all
    the zip files in an arbitrary folder
    and processes them.
    """
    filenames = glob.glob("stuff/*.zip")
    process_files(filenames)


if __name__ == "__main__":
    manual()
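As a rough illustration of the earlier remark that plain functions are easier to test, here is a sketch of a unit test for examine_sheet. Several things in it are assumptions rather than part of the post: the function is ported to Python 3 syntax (the post's code is Python 2) with the row limit clamped to sheet.nrows, and FakeSheet and Cell are hypothetical stand-ins for xlrd's sheet and cell objects, so no real workbook is needed.

```python
import contextlib
import io
from collections import namedtuple

# Hypothetical stand-in for an xlrd cell: only the .value attribute is used.
Cell = namedtuple("Cell", "value")


class FakeSheet(object):
    """Hypothetical stand-in for an xlrd sheet, backed by a list of lists."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows
        self.nrows = len(rows)

    def row(self, index):
        return [Cell(value) for value in self._rows[index]]


def examine_sheet(wb, sheet, top_n_rows=5):
    """Python 3 port of examine_sheet, with the limit clamped to nrows."""
    print("> Sheet %s %d rows" % (sheet.name, sheet.nrows))
    limit = min(top_n_rows or sheet.nrows, sheet.nrows)
    for r in range(limit):
        print(r, [c.value for c in sheet.row(r)])


def test_examine_sheet_prints_top_rows():
    sheet = FakeSheet("prices", [["a", 1], ["b", 2], ["c", 3]])
    captured = io.StringIO()
    # Capture what the function prints so it can be compared line by line.
    with contextlib.redirect_stdout(captured):
        examine_sheet(None, sheet, top_n_rows=2)
    assert captured.getvalue().splitlines() == [
        "> Sheet prices 3 rows",
        "0 ['a', 1]",
        "1 ['b', 2]",
    ]


test_examine_sheet_prints_top_rows()
```

Because the function takes everything it needs as arguments, the test has no class hierarchy to instantiate and no zip file to create.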

Note that this code is even better suited to exploratory programming than the class-based code, because we don't have to write all the class machinery, and we can mix and match functions without the need for inheritance or other abuses.
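One way to mix functions without inheritance, sketched below, is to pass the behavior in as an argument instead of rebinding a module-level name. This is a variation on the rewrite, not what the code above does. All names here (process_members, describe, describe_truncated) are hypothetical, and plain strings stand in for zip members.

```python
from functools import partial


def process_members(members, examine_member):
    """Apply whichever examine function the caller passes in."""
    return [examine_member(member) for member in members]


def describe(member):
    return "member: %s" % member


def describe_truncated(member, max_len=5):
    return "member: %s" % member[:max_len]


names = ["report.xls", "notes.txt"]

# Swap behavior by passing a different function; no subclass required.
plain = process_members(names, describe)

# functools.partial fixes an option up front, much as a constructor argument would.
short = process_members(names, partial(describe_truncated, max_len=6))
```

Trying a new behavior during exploration then means writing one more function and passing it in, which keeps each experiment small and independent.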