Introduction

I have written several times about the usefulness of pandas as a data manipulation/wrangling tool and how it can be used to efficiently move data to and from Excel. There are cases, however, where you need an interactive environment for data analysis, and pulling that together in pure python in a user-friendly manner would be difficult. This article will discuss how to use xlwings to tie Excel, Python and pandas together to build a data analysis tool that pulls information from an external database, manipulates it and presents it to the user in a familiar spreadsheet format.

A Quick Excel Automation Intro

Excel supports several automation options using VBA. User Defined Functions (UDFs) are relatively simple in that they take inputs and return a single value. The more powerful option is a macro (or procedure) that can automate just about anything Excel can do.

Despite the fact that UDFs and macros are powerful, they are still written in VBA and there are times when it would be useful to bring the power of python to our Excel-based solution. That’s where xlwings comes into play. At the simplest level, xlwings allows us to glue python and Excel together in two main ways:

- Control Excel from python
- Call custom python code from within Excel

This article will focus on building an Excel worksheet that calls your custom python code.

The Problem

For this example, we are going to develop a simple modeling application that will allow someone to enter an account number and date range, then return some summarized sales information that has been transformed via pandas. The solution is simple but shows the power of this combination and how easily you could perform more complex data analysis.

[Figure: diagram of what we are trying to do]

The example shown below could easily be expanded to query multiple databases or interact with any kind of file that python can read (CSV, Excel, JSON, etc.).
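
To make the pandas step concrete before any Excel wiring, here is a minimal sketch of the kind of summary the tool is meant to return. The sales records, column names and the `summarize` helper are all hypothetical stand-ins for whatever the external database would provide; they are not part of the original application.

```python
import datetime as dt

import pandas as pd

# Hypothetical sales records standing in for rows returned by a database query
sales = pd.DataFrame({
    "account": [1001, 1001, 1001, 2002],
    "date": pd.to_datetime(
        ["2015-01-05", "2015-01-20", "2015-03-01", "2015-01-10"]
    ),
    "product": ["Widget", "Gadget", "Widget", "Widget"],
    "amount": [100.0, 250.0, 75.0, 300.0],
})


def summarize(df, account, start_date, end_date):
    """Keep one account within a date range, then total the amount per product."""
    mask = (
        (df["account"] == account)
        & (df["date"] >= start_date)
        & (df["date"] <= end_date)
    )
    return df.loc[mask].groupby("product")["amount"].sum()


# The three arguments mirror the inputs the user will type into the sheet
summary = summarize(sales, 1001, dt.datetime(2015, 1, 1), dt.datetime(2015, 1, 31))
print(summary)
```

Keeping the filtering and grouping in a plain function like this means the logic can be checked from a normal Python session, independent of the spreadsheet that will eventually supply the inputs.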

Setting Up The Environment

For the purposes of this article, I will assume you are running the application on a Windows-based system. I highly recommend you use anaconda (or miniconda) as your distro of choice. The first thing we need to do is install xlwings (assuming python+pandas are already installed):

    conda install xlwings

Version Warning

xlwings is being constantly updated. This code is based on version 0.7.1.

There is a nice xlwings helper function called quickstart which will create a sample Excel file and stub python file for you.

    c:\> xlwings quickstart pbp_proj

If you look in the newly created pbp_proj directory, you’ll see two files:

    pbp_proj.py
    pbp_proj.xlsm

The python file is empty and the Excel file looks empty, but some behind-the-scenes work has been done to make the Excel-to-python interface easier for you. To see what is put into the Excel file, open your newly created file in Excel and go into Developer -> Visual Basic.

You will notice that there are two modules - xlwings and Module1. The xlwings module includes all the VBA code to make your custom code work. For the most part you should leave that alone. However, if you have issues with your configuration (for example, python cannot be found) then you can update the config information in this section. Module1 will contain some default code, which we will modify in a moment to call our custom code.

First, I want to create the Excel input fields. For this application, we are going to allow the user to enter an account number, start date and end date, and will manipulate the sales data based on these inputs. The spreadsheet is simple: the account number goes in cell B2, the start date in D2 and the end date in F2. I have only made some minor formatting changes; there are no formulas in the cells. Be sure to save the changes to the Excel file.

For the next step, I’m going to create a short python function that illustrates how to read data from Excel and write it back.
I will be saving this in the empty file called pbp_proj.py:

    import pandas as pd
    from xlwings import Workbook, Range


    def summarize_sales():
        """
        Retrieve the account number and date ranges from the Excel sheet
        """
        # Make a connection to the calling Excel file
        wb = Workbook.caller()

        # Retrieve the account number and dates
        account = Range('B2').value
        start_date = Range('D2').value
        end_date = Range('F2').value

        # Output the data just to make sure it all works
        Range('A5').value = account
        Range('A6').value = start_date
        Range('A7').value = end_date

The program is simple and not very useful at this point. I think it is easier to develop a skeleton program first in order to make sure all the “plumbing” is in place. The key thing to remember is that the file is called pbp_proj.py and the function is called summarize_sales.

To wire this all together, we need to define an Excel procedure to run our code. The code is really concise: just import the module and execute the function.

    Sub RetrieveSales()
        RunPython ("import pbp_proj;pbp_proj.summarize_sales()")
    End Sub

The final piece is to add a button to our sheet and assign it to the procedure/macro RetrieveSales. Once you have that in place, you should be able to press the button and see the account number, start date and end date written to cells A5, A6 and A7.

The basic process is in place. We can read from Excel into a python program and use that to output data back into Excel. Now, let’s make this a little more useful.
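
One detail worth handling before building on this skeleton: values read from cells generally arrive as Excel's native types, so a numeric account number tends to come back as a float and a date-formatted cell as a datetime. The sketch below shows one way to normalize and sanity-check the three inputs. The `clean_inputs` helper is hypothetical (it is not part of xlwings), and the sample values simply simulate what the B2, D2 and F2 reads might return.

```python
import datetime as dt


def clean_inputs(account, start_date, end_date):
    """Normalize raw cell values (hypothetical helper, not part of xlwings)."""
    if account is None or start_date is None or end_date is None:
        raise ValueError("Account number, start date and end date are all required")

    # Numeric cells usually arrive as floats, so 1001.0 becomes 1001
    account = int(account)

    if not isinstance(start_date, dt.datetime) or not isinstance(end_date, dt.datetime):
        raise ValueError("Start and end date must be date-formatted cells")
    if start_date > end_date:
        raise ValueError("Start date must not be after end date")

    return account, start_date, end_date


# Simulated cell values; no Excel connection is needed to try this out
cleaned = clean_inputs(1001.0, dt.datetime(2015, 1, 1), dt.datetime(2015, 1, 31))
print(cleaned)
```

Inside summarize_sales, a helper like this would sit between reading the ranges and using the values, so that a blank or mistyped cell produces a clear error instead of an odd query result.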