Posted 25 May 2011 - 02:41 PM

#REGULAR GENERATOR
#read a file
myfile = open('somefile.csv')
#use a generator to automatically split each line for us, returning a list of values for each line
#note that the file isn't actually processed here, that comes later
lines = (line.strip().split(',') for line in myfile)
#'lines' is evaluated at this point in the code, element by element
for line in lines:
    pass #do a bunch of processing here
#'lines' is now useless (has been iterated through and may not be re-used)
#clean up the file
myfile.close()

#LIST COMPREHENSION
#read a file
myfile = open('somefile.csv')
#use a list comprehension to automatically split each line for us, returning a list of values for each line
#at this point the entire file is read into memory, this line will take longer to execute than
#a regular generator, but saves processing later
lines = [line.strip().split(',') for line in myfile]
#'lines' is now just a list, nothing special about looping through a list
for line in lines:
    pass #do a bunch of processing here
#'lines' can now be re-used if necessary, just like a regular list
for line in lines[5:]:
    pass #do something with 'lines', but ignoring the first 5 lines (that's the '[5:]' bit)
#clean up the file
myfile.close()
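The difference between the two halves of the listing above can be checked without a real file. Here is a minimal sketch, assuming a small hypothetical list of strings ('raw_lines') stands in for the lines of 'somefile.csv'. It shows the generator running dry after one pass while the list can be looped over again, and that 'list()' gives the same result as the '[ ]' form.

```python
# Hypothetical in-memory stand-in for the lines of 'somefile.csv'
raw_lines = ["a,1\n", "b,2\n", "c,3\n"]

# Generator expression: nothing is split until we iterate over it
gen_lines = (line.strip().split(',') for line in raw_lines)
first_pass = [row for row in gen_lines]   # consumes the generator
second_pass = [row for row in gen_lines]  # nothing left to yield

# List comprehension: evaluated immediately, can be looped over repeatedly
list_lines = [line.strip().split(',') for line in raw_lines]
first_list_pass = [row for row in list_lines]
second_list_pass = [row for row in list_lines]

print(len(first_pass), len(second_pass))            # 3 0
print(len(first_list_pass), len(second_list_pass))  # 3 3

# list() turns a fresh generator into a list as well
converted = list(line.strip().split(',') for line in raw_lines)
print(converted == list_lines)  # True
```

With a real file the same thing applies: the generator version only touches the file while you loop, so the file has to stay open until the loop is finished.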

names = ['andy', 'dave', 'rebecca', 'john']
ages = [21, 40, 34, 18]
generator = ((names[i], ages[i]) for i in range(len(names)))
people = dict(generator)
#personally I would combine the above 2 lines into 1 like so:
#people = dict((names[i], ages[i]) for i in range(len(names)))
print(people)

{'rebecca': 34, 'dave': 40, 'john': 18, 'andy': 21}
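The index-based generator in the listing above can also be written with the built-in 'zip()', which pairs up the two lists directly. This is an alternative I am swapping in for comparison, not what the listing itself uses; a minimal sketch with the same 'names' and 'ages' data:

```python
names = ['andy', 'dave', 'rebecca', 'john']
ages = [21, 40, 34, 18]

# Index-based generator of 2-element tuples, as in the listing above
by_index = dict((names[i], ages[i]) for i in range(len(names)))

# Alternative (not from the original listing): zip() yields the same pairs
by_zip = dict(zip(names, ages))

print(by_index == by_zip)  # True
print(by_zip['rebecca'])   # 34
```

Both forms hand 'dict()' an iterable of 2-element tuples, so the resulting dictionaries compare equal regardless of the order they print in.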

#list comprehension
lines = [line.strip().split(',') for line in open('myfile.csv')]
#assume column names are on the first line
colnames = lines[0]
#create a dictionary to hold our file info (by column)
#note that this is effectively a 'single line', I have just split it up to make it a little more readable
#'lines[1:]' skips the header row so only the data ends up in each column's list
myfile = dict(
    (colnames[i], [line[i] for line in lines[1:]])
    for i in range(len(colnames))
)

Create a dictionary from:
    A 'Tuple' for every column:
        This 'Tuple' consists of:
            The column name
            A list containing every element that belongs in that column
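The breakdown above can be tried out without a file on disk. This is only a sketch under a few assumptions: 'io.StringIO' and the sample contents are hypothetical stand-ins for 'myfile.csv', and the header row is deliberately left out of each column's list.

```python
import io

# Hypothetical CSV contents standing in for 'myfile.csv'
fake_file = io.StringIO("name,age\nandy,21\ndave,40\n")

lines = [line.strip().split(',') for line in fake_file]
colnames = lines[0]  # header row holds the column names
rows = lines[1:]     # data rows only (assumption: header is not data)

# One (column name, list of column values) tuple per column
columns = dict(
    (colnames[i], [row[i] for row in rows])
    for i in range(len(colnames))
)

print(columns['name'])  # ['andy', 'dave']
print(columns['age'])   # ['21', '40']
```

Note that every value comes out as a string, so a numeric column such as 'age' would still need converting (with 'int()' for instance) before doing any maths on it.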

This tutorial assumes you have a semi-working knowledge of Python. If you don't already know what a 'generator' is, check out my other tut 'Generators'. Another good tutorial to read before this one is atik97's 'Data Structures in Python'.

List comprehensions are a subject I neglected to cover in my 'Generators' tut, but they are really handy and worth mentioning. If you recall, the standard format of a single-line generator is a kind of one-line 'for' loop inside brackets. This produces a 'one-shot' iterable object: you can iterate over it in only one direction, and you can't re-use it once you reach the end.

A 'list comprehension' looks almost the same as a regular one-line generator, except that the regular brackets - ( ) - are replaced by square brackets - [ ]. The major advantage of a list comprehension is that it produces a 'list' rather than a 'one-shot' iterable object, so you can go back and forth through it, add elements, sort, etc. While regular generators are only evaluated when they are used, and then usually only element by element, list comprehensions are evaluated on the line where they are defined. Perhaps the clearest example is a bit of code: the first listing at the top of this post reads the same file both ways.

It should be noted here that a generator may also be turned into a list via the built-in function 'list()', but personally I prefer the '[ ]' approach. Also, using list comprehensions in place of generators for reading in files (or any data) can pose a problem if your file is too big. Keep in mind that your entire file will be loaded into memory, so if you try to load a 2GB+ file you will either crash outright or be waiting around for a very long time.

I have done more than my fair share of data processing with Python, particularly of space- and comma-delimited data in vanilla text files, and find this trick really useful on a day-to-day basis.

One really important thing I need to mention is that DICTIONARIES ARE NOT ORDERED. It doesn't matter what order you define or add the key/value pairs in; if you loop through a Dictionary you will get the data out in a funny order. This is really important to remember, as it will catch you out sooner or later if you forget it. (From Python 3.7 onwards a Dictionary does keep its insertion order, but on the older versions this tut was written for you cannot rely on it.)

Tip for creating a dictionary: combine 'generators' and the built-in 'dict()' function to quickly and automatically build a Dictionary from some data. 'dict()' is really handy, as it takes any 'Iterable' object that produces 'Tuples' containing 2 elements and uses it to produce a fully fledged 'Dictionary'. For example, the 'names' and 'ages' listing above builds the 'people' Dictionary this way, and its output is shown directly beneath it. Notice how this is a stunning example of my earlier caution that Dictionaries are not ordered - 'rebecca' was added third but appears first.

This example may not appear to be the most interesting, but with a little imagination it can be morphed into code that will read in a CSV file and store all the data in a Dictionary, with the column names as the keys and a List of all the data entries in that column as the value. The 'myfile.csv' listing above does exactly that, and a breakdown of its 'dict()' statement follows it.

That's it for the moment, the second edition should be along soon. Any questions, feel free to ask.