Let’s get started! First things first, download your payments from Google. Go to https://takeout.google.com. Once there, click “Manage Archives” and then “Create New Archive”. For our purposes we are only interested in your Google Pay data, so make sure all options are unticked except for Google Pay. Click “Next”, then click “Create Archive”. When your archive is ready you will receive an email.

When it’s ready, download your archive and extract the file located at Takeout > Google Pay > My Activity > My Activity.html from the zip. This is the file we will be scraping the information from. Save it to an empty folder somewhere; we’ll use this folder for the code as well as the input and output. Go ahead and look through the HTML file to get a feel for the layout.
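If you’d rather not click through the zip by hand, a small sketch using Python’s built-in zipfile module can pull the file out for you. It assumes the path inside the archive is exactly Takeout/Google Pay/My Activity/My Activity.html (check yours, since Google may change the layout); `extract_activity` and `ACTIVITY_PATH` are hypothetical names, not part of the original script.

```python
import zipfile
from pathlib import Path

# Assumption: the path inside the Takeout zip mirrors the folders described above.
# Zip archives always use forward slashes for member names.
ACTIVITY_PATH = "Takeout/Google Pay/My Activity/My Activity.html"


def extract_activity(zip_path, out_dir):
    """Copy My Activity.html out of a Takeout zip into out_dir and return its path."""
    out_file = Path(out_dir) / "My Activity.html"
    with zipfile.ZipFile(zip_path) as archive:
        # read() raises KeyError if ACTIVITY_PATH isn't a member of the archive
        out_file.write_bytes(archive.read(ACTIVITY_PATH))
    return out_file
```

For example, `extract_activity("your-takeout.zip", ".")` would drop the file into the current folder.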

The first thing I wanted to do was strip out the following items from each purchase and save them to a CSV file: amount, date, time, latitude and longitude. To begin with, I inspected the HTML file with Firefox and found that each payment entry was wrapped in a div with the class “mdl-grid”.

From there I was able to work out which divs held the date and time, the price, and the latitude and longitude. Now, on to the Python!

For this project I’m using Python 3.6, but I think any Python 3 version should work (don’t quote me). The first thing we need to do is install Beautiful Soup. If you don’t know what it is, Beautiful Soup is a super handy tool for searching through HTML files (online or offline). It makes it really easy to find an element by tag, class or ID. If you want to use a virtual environment, go ahead and activate it now. To install Beautiful Soup, run:

pip install beautifulsoup4

Now make a new file called ‘scrapePayments.py’.

Below is the code I used to scrape the values to a CSV file:

To begin with, this code reads the HTML file into a Beautiful Soup object. From that object we select the elements that relate to payments (every element with the class ‘mdl-grid’). The way we’ve chosen to select them gives us one extra element we don’t want, so we just pop it off the list.
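Here’s a minimal sketch of that selection step, assuming the html.parser backend. `load_payment_entries` is a hypothetical name, and dropping index 0 is an assumption: in my reading of the Takeout page the extra match is the outer `mdl-grid` container that wraps every entry, and `find_all` returns matches in document order, so it comes first. If your extra element turns up elsewhere, change the `pop` index.

```python
from bs4 import BeautifulSoup


def load_payment_entries(html):
    """Return the 'mdl-grid' divs that wrap individual Google Pay entries."""
    soup = BeautifulSoup(html, "html.parser")
    entries = soup.find_all("div", class_="mdl-grid")
    # Assumption: the unwanted extra match is the outer container that wraps
    # every entry; find_all returns matches in document order, so it is first.
    if entries:
        entries.pop(0)
    return entries
```

To use it, read the file with `open("My Activity.html", encoding="utf-8")` and pass its contents in.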

After that we go through each payment and feed it through our ‘extract_purchase_details’ function, which extracts all the information from the HTML elements. The date and time, for example, are pulled from the element with the class ‘mdl-typography--body-1’. There are actually two elements with this class, so the search returns a list of two elements; we just take the first one. After that we use a string slice to remove the excess text, which leaves us with text like the following:

Attempted contactless payment<br>8 Jan 2019, 20:47:30 AEDT
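To get that raw text out of an entry, you can take the first match for the class and ask Beautiful Soup for its inner HTML. This is a sketch, not the original function: `raw_date_text` is a hypothetical helper, and `decode_contents()` keeps the `<br>` tag in the string (html.parser writes it back out as `<br/>`), so the line break survives into the text you then cut up.

```python
from bs4 import BeautifulSoup


def raw_date_text(entry):
    """Return the inner HTML of the first 'mdl-typography--body-1' element in an entry."""
    # class_ matches if any one of an element's classes equals the value.
    # There are two such elements per payment; the date and time live in the first.
    cells = entry.find_all(class_="mdl-typography--body-1")
    return cells[0].decode_contents()
```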

To remove the excess, we use a string slice that removes everything except the date and time. Then we split the string at the comma and save the date to the ‘date’ variable and the time to the ‘time’ variable. The remaining values are extracted in a similar way. Once we have all the values, the function returns them as a tuple, which we store in ‘payment_details’. Finally, we append this tuple to our payments list.
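The original uses fixed string slices; a slightly sturdier alternative (my substitution, not the author’s code) is to split on the line break and then on the first comma. Cutting at markers rather than fixed positions means a different description before the `<br>` doesn’t shift the cut. `split_date_time` is a hypothetical helper, and the regex allows for `<br>`, `<br/>` and `<br />` because the exact form depends on how the HTML was serialised.

```python
import re


def split_date_time(raw):
    """Turn 'Attempted contactless payment<br>8 Jan 2019, 20:47:30 AEDT'
    into ('8 Jan 2019', '20:47:30 AEDT')."""
    # Keep only the text after the line break (matches <br>, <br/> and <br />)
    date_time = re.split(r"<br\s*/?>", raw)[-1]
    # The date and time are separated by the first comma
    date, time = date_time.split(",", 1)
    return date.strip(), time.strip()
```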

This is all wrapped in a try-except block. This is a bit of a cheat to get rid of the entries that don’t actually contain purchases (things like promotions). Because they don’t have the same layout, trying to access purchase elements that don’t exist raises an exception. Instead of handling the exception, we just ignore it (just like how I handle all my real-life problems!)
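A sketch of that loop, with one tweak: instead of swallowing every exception, it only skips the errors a missing element typically produces (`AttributeError` from calling a method on `None`, `IndexError` from an empty result list, `ValueError` from a failed split-and-unpack). `collect_payments` is a hypothetical name, and `extract` stands in for a function like ‘extract_purchase_details’; which exceptions you actually see depends on how that function is written.

```python
def collect_payments(entries, extract):
    """Run extract() over every entry, skipping ones that don't look like purchases."""
    payments = []
    for entry in entries:
        try:
            payments.append(extract(entry))
        except (AttributeError, IndexError, ValueError):
            # Promotions and other non-purchase entries lack the expected
            # elements, so looking them up fails; skip those entries.
            continue
    return payments
```

Catching only these types means a genuine bug elsewhere (a typo in a variable name raising `NameError`, say) still shows up instead of silently emptying your CSV.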

Once all the values are scraped, we use Python’s csv module to write them to a CSV file. To run the script, use the following command:
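A minimal version of the CSV step with the standard csv module might look like this. The column order follows the list of fields from earlier (amount, date, time, latitude, longitude), but `HEADER` and `write_payments_csv` are hypothetical names; match the header to whatever your extraction function returns.

```python
import csv

# Hypothetical column order; match it to the tuples your extractor returns.
HEADER = ["amount", "date", "time", "latitude", "longitude"]


def write_payments_csv(path, payments):
    """Write a header row followed by one row per payment tuple."""
    # newline="" is what the csv docs recommend, so row endings aren't doubled on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(payments)
```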

python scrapePayments.py 'My Activity.html'
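The quotes around the filename matter: it contains a space, so without them the shell would pass ‘My’ and ‘Activity.html’ as two separate arguments. Inside the script the filename arrives in `sys.argv[1]`; here’s a small sketch of reading it, with a usage message when it’s missing (`input_path` is a hypothetical helper, not from the original script).

```python
import sys


def input_path(argv):
    """Return the HTML file named on the command line, e.g. 'My Activity.html'."""
    if len(argv) != 2:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"usage: python {argv[0]} 'My Activity.html'")
    return argv[1]
```

In the script itself you’d call `input_path(sys.argv)`.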

Now you have all your purchases in a nice CSV file. Depending on your currency, you may need to adjust line 3, where we define the local currency symbol ‘$’:

LOCAL_CURRENCY_SYMBOL = '$'

# Might need to become

LOCAL_CURRENCY_SYMBOL = '£'
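Splitting on the symbol is also what filters out foreign charges. Here’s a hedged sketch of that idea; the price formats it handles (like A$12.50 or $1,234.00) are assumptions about what the Takeout file contains, and `parse_amount` is a hypothetical helper. It returns `None` when the symbol is missing so you can decide what to do with those rows, and it uses `Decimal` rather than `float` so amounts like 12.50 stay exact.

```python
from decimal import Decimal


def parse_amount(price_text, symbol="$"):
    """Return the amount after the currency symbol as a Decimal, or None if the
    symbol isn't present (e.g. a charge in a foreign currency)."""
    if symbol not in price_text:
        return None
    # Take whatever follows the last symbol and drop thousands separators
    number = price_text.rsplit(symbol, 1)[1].replace(",", "").strip()
    return Decimal(number)
```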

Funny story, before I added that line to split on the dollar sign, I found out that last time I went to the casino someone charged me 18 Indonesian R̶u̶p̶e̶e̶s̶ Rupiah*! 😲 (I live in Australia, that’s like $0.0018 AUD at the current exchange rate!)

*Thanks to @yogasukmap for pointing out that they use Rupiah, not Rupees in Indonesia!