Previously, we looked at how to create Microsoft Excel (i.e. *.xls) files using the xlwt package. Today we will be looking at how we can read an *.xls/*.xlsx file using a package called xlrd. The xlrd package can be run on Linux and Mac as well as Windows. This is great when you need to process an Excel file on a Linux server.

We will start out by reading the first Excel file we created in our previous article.

Let’s get started!

Reading an Excel Spreadsheet

In this section, we will look at a function that demonstrates different ways of reading an Excel file. Here’s the code example:

import xlrd #---------------------------------------------------------------------- def open_file(path): """ Open and read an Excel file """ book = xlrd.open_workbook(path) # print number of sheets print book.nsheets # print sheet names print book.sheet_names() # get the first worksheet first_sheet = book.sheet_by_index(0) # read a row print first_sheet.row_values(0) # read a cell cell = first_sheet.cell(0,0) print cell print cell.value # read a row slice print first_sheet.row_slice(rowx=0, start_colx=0, end_colx=2) #---------------------------------------------------------------------- if __name__ == "__main__": path = "test.xls" open_file(path)

Let’s break this down a bit. First we import xlrd and then in our function, we open the Excel workbook that was passed in. The next couple of lines show how to introspect the book. We find out how many worksheets there are in the workbook and we print out their names. Next we extract the first worksheet via the sheet_by_index method. We can read an entire row from the worksheet using the row_values method. If we want to get a particular cell’s value, we can call the cell method and pass it the row and column indexes. Finally we use xlrd’s row_slice method to read a portion of the row. As you can see, this last method accepts a row index and the starting and ending column indexes to determine what to return. The row_slice method returns a list of cell instances.

This makes it very easy to iterate over a group of cells. Here’s a small snippet to demonstrate:

cells = first_sheet.row_slice(rowx=0, start_colx=0, end_colx=2) for cell in cells: print cell.value

The xlrd package supports the following types of cells: text, number (i.e. float), dates (any number format that “looks” like a date), Boolean, error and empty/blank. The package also supports extracting data from named cells, although the project doesn’t support all types of named cells. The reference text is a bit vague on what exactly it does not support though.

If you need to copy cell formatting you will need to download the xlutils package.

Wrapping Up

At this point you should know enough to read most Excel files that were built using Microsoft’s XLS format. There is another package that also supports reading xls/xlsx files called the openpyxl project. You might want to check it out as an alternative.