Introduction Waterfall charts can be a really useful tool to for certain types of data plotting. Not surprisingly, we can use pandas and matplotlib to create a repeatable waterfall chart. Before I go any further, I want to level set with everyone about which type of chart I’m referring to. I will be building a 2-D waterfall chart described in this wikipedia article. A fairly typical use for a chart such as this is to show what the + and - values are that “bridge” between a start and end value. For this reason, finance folks will sometimes refer to this as a bridge. Like the other examples I’ve used, this type of plot is not easy to generate in Excel. There are certainly ways to do it but it is not easy to remember. The key thing to keep in mind with a waterfall chart is: at its heart it is a stacked bar chart. The “special sauce” is that you have a blank bottom bar so the top bar “floats” in space. Let’s get started.

Creating the Chart Execute the standard imports and make sure IPython will display matplot plots. import numpy as np import pandas as pd import matplotlib.pyplot as plt % matplotlib inline Setup the data we want to waterfall chart and load it into a dataframe. The data needs to start with your starting value but you leave out the final total. We will calculate it. index = [ 'sales' , 'returns' , 'credit fees' , 'rebates' , 'late charges' , 'shipping' ] data = { 'amount' : [ 350000 , - 30000 , - 7500 , - 25000 , 95000 , - 7000 ]} trans = pd . DataFrame ( data = data , index = index ) I am using the handy display function in IPython to make it easier to control what I want to display. from IPython.display import display display ( trans ) amount sales 350000 returns -30000 credit fees -7500 rebates -25000 late charges 95000 shipping -7000 The biggest trick with a waterfall plot is figuring out what the bottom stacked bar chart should be. I learned a lot from this stackoverflow discussion First, let’s get the cumulative sum. display ( trans . amount . cumsum ()) sales 350000 returns 320000 credit fees 312500 rebates 287500 late charges 382500 shipping 375500 Name: amount, dtype: int64 This looks good but we need to shift the data one place to the right. blank = trans . amount . cumsum () . shift ( 1 ) . fillna ( 0 ) display ( blank ) sales 0 returns 350000 credit fees 320000 rebates 312500 late charges 287500 shipping 382500 Name: amount, dtype: float64 We need to add a net total amount to the trans and blank dataframe. total = trans . sum () . amount trans . loc [ "net" ] = total blank . loc [ "net" ] = total display ( trans ) display ( blank ) amount sales 350000 returns -30000 credit fees -7500 rebates -25000 late charges 95000 shipping -7000 net 375500 sales 0 returns 350000 credit fees 320000 rebates 312500 late charges 287500 shipping 382500 net 375500 Name: amount, dtype: float64 Create the steps we use to show the changes. step = blank . reset_index ( drop = True ) . repeat ( 3 ) . shift ( - 1 ) step [ 1 :: 3 ] = np . nan display ( step ) 0 0 0 NaN 0 350000 1 350000 1 NaN 1 320000 2 320000 2 NaN 2 312500 3 312500 3 NaN 3 287500 4 287500 4 NaN 4 382500 5 382500 5 NaN 5 375500 6 375500 6 NaN 6 NaN Name: amount, dtype: float64 For the net row, we need to make sure the blank value is 0 so we don’t double stack. blank . loc [ "net" ] = 0 Plot it and see what it looks like my_plot = trans . plot ( kind = 'bar' , stacked = True , bottom = blank , legend = None , title = "2014 Sales Waterfall" ) my_plot . plot ( step . index , step . values , 'k' ) That looks pretty good but let’s try formatting the y-axis to make it more readable. We use FuncFormatter and some of the python 2.7+ syntax to truncate decimals and add a comma to the format. def money ( x , pos ): 'The two args are the value and tick position' return "$ {:,.0f} " . format ( x ) from matplotlib.ticker import FuncFormatter formatter = FuncFormatter ( money ) Pull it all together my_plot = trans . plot ( kind = 'bar' , stacked = True , bottom = blank , legend = None , title = "2014 Sales Waterfall" ) my_plot . plot ( step . index , step . values , 'k' ) my_plot . set_xlabel ( "Transaction Types" ) my_plot . yaxis . set_major_formatter ( formatter )

Full Script The basic graph works but I wanted to add labels and make some minor formatting changes. Here is my final script: import numpy as np import pandas as pd import matplotlib.pyplot as plt from matplotlib.ticker import FuncFormatter #Use python 2.7+ syntax to format currency def money ( x , pos ): 'The two args are the value and tick position' return "$ {:,.0f} " . format ( x ) formatter = FuncFormatter ( money ) #Data to plot. Do not include a total, it will be calculated index = [ 'sales' , 'returns' , 'credit fees' , 'rebates' , 'late charges' , 'shipping' ] data = { 'amount' : [ 350000 , - 30000 , - 7500 , - 25000 , 95000 , - 7000 ]} #Store data and create a blank series to use for the waterfall trans = pd . DataFrame ( data = data , index = index ) blank = trans . amount . cumsum () . shift ( 1 ) . fillna ( 0 ) #Get the net total number for the final element in the waterfall total = trans . sum () . amount trans . loc [ "net" ] = total blank . loc [ "net" ] = total #The steps graphically show the levels as well as used for label placement step = blank . reset_index ( drop = True ) . repeat ( 3 ) . shift ( - 1 ) step [ 1 :: 3 ] = np . nan #When plotting the last element, we want to show the full bar, #Set the blank to 0 blank . loc [ "net" ] = 0 #Plot and label my_plot = trans . plot ( kind = 'bar' , stacked = True , bottom = blank , legend = None , figsize = ( 10 , 5 ), title = "2014 Sales Waterfall" ) my_plot . plot ( step . index , step . values , 'k' ) my_plot . set_xlabel ( "Transaction Types" ) #Format the axis for dollars my_plot . yaxis . set_major_formatter ( formatter ) #Get the y-axis position for the labels y_height = trans . amount . cumsum () . shift ( 1 ) . fillna ( 0 ) #Get an offset so labels don't sit right on top of the bar max = trans . max () neg_offset = max / 25 pos_offset = max / 50 plot_offset = int ( max / 15 ) #Start label loop loop = 0 for index , row in trans . iterrows (): # For the last item in the list, we don't want to double count if row [ 'amount' ] == total : y = y_height [ loop ] else : y = y_height [ loop ] + row [ 'amount' ] # Determine if we want a neg or pos offset if row [ 'amount' ] > 0 : y += pos_offset else : y -= neg_offset my_plot . annotate ( " {:,.0f} " . format ( row [ 'amount' ]),( loop , y ), ha = "center" ) loop += 1 #Scale up the y axis so there is room for the labels my_plot . set_ylim ( 0 , blank . max () + int ( plot_offset )) #Rotate the labels my_plot . set_xticklabels ( trans . index , rotation = 0 ) my_plot . get_figure () . savefig ( "waterfall.png" , dpi = 200 , bbox_inches = 'tight' ) Running the script will generate this nice looking chart: