Reverse File Iterator and Premature Optimization

This afternoon, I had a need for a reverse file iterator in a script I was writing to process a quite large log file. After some unsuccessful googling followed by some quick hacking, I was left with a frustratingly close, but nonfunctional, iterator. Instead of fixing it while I was at work, I hacked up a solution calling "tail" and left it for a while.

When I came back to finish the job this evening, I figured that I ought to go look at the source of tail for some inspiration. Surely the unix hackers had figured this problem out long ago? In reverse.c, I found the inspiration I needed:

for (; pos >= start; pos -- ) { /* A seek per char isn't a problem with a smart stdio */ if (fseeko(fp, pos, SEEK_SET) != 0 ) { //snip if ((ch = getc(fp)) == '\ n ' )

I had been reading the file in chunks, splitting the chunk into lines, handling a cache of the lines, and compensating for unfinished lines, all because I had buried deep down in my lizard brain the idea that an fseek per character was "slow". In pure python, for chrissake!

Properly reminded of the fact that premature optimization can sneak up anywhere, I ended up with this code:

import os class reversefile ( object ): """Iterate backwards through a file. f should be an open file handle""" def __init__ ( self , f): self . _f = f self . end = os . stat(f . name) . st_size def __iter__ ( self ): return self def next ( self ): if self . end == 0 : raise StopIteration pos = self . end - 2 while pos >= 0 : self . _f . seek(pos) if self . _f . read( 1 ) == ' \ n' : end = self . end self . end = pos return self . _f . read(end - pos - 1 ) pos -= 1 end = self . end self . end = 0 self . _f . seek( 0 ) return self . _f . read(end) . strip( " \ n" )

You can see the whole thing, with source and some very brief tests I hacked up, here.