Data-driven marketing at Smarkets

A quick look at how we made our marketing data-oriented, and some of the reasoning behind the technical stack

At Smarkets we pride ourselves on being a lean, data-oriented company. We are a rapidly-growing team of 80 with around 60% developers, handling billions of pounds of trades yearly. So when we created the marketing tech team, it was based on the premise that one of the most critical channels should not be left behind.

Our journey in creating the data-driven marketing machine has been a really interesting one, and we have come a long way from running campaigns that required a lot of engineering effort. Now we have non-technical people spinning up relevant dashboards and using data to determine what campaigns we run. We also looked into how some other similarly-sized companies built their marketing pipelines and this blog post by 500px was a particularly useful source of motivation for us.

Background

Smarkets is one of Europe’s largest betting exchanges, and our technology challenges are quite varied, including:

Handling billions of pounds worth of trades with Erlang in our back-end

Creating a reactive, modern website with React and Redux

Having a Flask micro-services-based architecture powering our admin tools, which help automate a ton of tasks, catch fraud and ensure people trade responsibly

Mapping and handling thousands of new events in our website from different data providers

Managing a few hundred hosts spread across multiple teams, monitoring everything and having quick recovery

It was about time we decided to tackle traditional marketing practices with Python and a bunch of powerful open-source tools.

I’ve worked at huge companies like Cisco and Facebook, but this was definitely one of the most interesting challenges I have faced given the scale and fuzzy nature of the problem. We had to accept the fact that instead of trying to create a perfect marketing tool, we should start with something usable and keep iterating on it.

Problems

Unlike our tech monitoring dashboards (based on Grafana and Datadog), which were easy to configure and spin up, our marketing dashboards required JavaScript knowledge to add new features, so the time from request to finished dashboard was quite high.

We avoid third-party solutions: we have experimented with them in the past and ran into issues with them not being customisable enough, taking too much time to set up and being too expensive for the value they provide.

We wanted our retention to be automated, rather than spamming people with generic messages.

There was no way to measure the costs and benefits of running marketing campaigns without manual accounting or ad hoc scripts written by developers.

Our technical architecture consists of multiple dockerised micro-services, and most of them have their own, different data sources. Therefore it’s really hard to get a combined picture of our data.

Most of our non-technical staff are sufficiently adept with SQL, but without some scripting knowledge it is not possible for them to get the relevant data.

Goals

Based on the Pirate (AARRR) Model, have a dashboard that focuses on relevant marketing and business statistics and graphs for non-technical team members.

Make the dashboard easy enough to modify that non-technical staff can add new metrics on the go and have their own easily accessible dashboards for things like retention, content and customer service.

Have an OLAP (Online Analytical Processing) oriented data source which is fast for analytics and aggregation-oriented queries and makes it easy for any team to store and view data in one place.

Build a service that focuses on sending and keeping track of retention emails in an automated manner.

Gather web usage statistics, with minimum moving parts, so we can focus on improving the new website we are gradually rolling out.

Technology stack choices

These were our final picks for our technical stack.

Python, the predominant language at Smarkets.

Luigi for our ETL pipeline. Luigi, open-sourced by Spotify, helps you build complex pipelines of batch jobs, execute them in a fault-tolerant, scalable way, and visualise their dependency graphs.

Amazon Redshift for our data warehouse. Redshift is a fully-managed data warehouse which is really good for OLAP thanks to its column-oriented structure, unlike the PostgreSQL we use for most of our micro-services. It was easy to set up, is managed by Amazon (a big positive, as we wanted our focus to be on analytics and getting things done rather than on infrastructure) and is relatively cheap to get started with.
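For context, the standard way to bulk-load Redshift is a `COPY` from S3. A small sketch of building such a statement follows; the table, bucket and IAM role here are placeholders, not our real configuration:

```python
def redshift_copy_sql(table: str, bucket: str, key: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that bulk-loads a gzipped CSV from S3."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "CSV GZIP IGNOREHEADER 1 TIMEFORMAT 'auto';"
    )

sql = redshift_copy_sql(
    table="marketing.signups",
    bucket="example-etl-bucket",  # hypothetical bucket
    key="signups/2017-01-01.csv.gz",
    iam_role="arn:aws:iam::123456789012:role/redshift-load",  # placeholder ARN
)
```

`COPY` loads in parallel across the cluster's slices, which is why it is preferred over row-by-row `INSERT`s for batch loads.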

Pandas/NumPy/Jupyter notebooks for processing the extracted CSVs and pulling useful information out of them.
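To give a flavour of the notebook work, here is a tiny pandas example; the CSV columns are illustrative, not our actual schema:

```python
import io

import pandas as pd

# An inline stand-in for a CSV extract produced by the ETL pipeline.
raw = io.StringIO(
    "user_id,channel,deposit\n"
    "1,facebook,20.0\n"
    "2,adwords,35.0\n"
    "3,facebook,15.0\n"
)

df = pd.read_csv(raw)

# Aggregate deposits per acquisition channel: total and number of users.
per_channel = df.groupby("channel")["deposit"].agg(["sum", "count"])
```

A couple of lines like this in a notebook replaces what used to be an ad hoc script request to a developer.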

Flask/Gunicorn/Docker for building a container-based micro-service for retention. We use an internal docker-py wrapper called dockyard to simplify deploying Docker containers.
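A minimal sketch of what such a Flask service might expose (the endpoint name and payload are hypothetical, and the in-memory list stands in for a real database of sent emails):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a real store tracking which retention emails went out.
SENT = []


@app.route("/retention/send", methods=["POST"])
def send_retention_email():
    """Queue a retention email for a user (hypothetical endpoint)."""
    payload = request.get_json(force=True)
    SENT.append({"user_id": payload["user_id"], "template": payload["template"]})
    return jsonify(status="queued", total_sent=len(SENT)), 202
```

In production a service like this runs under Gunicorn inside a Docker container, so deploys are just swapping an image.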

Metabase for our dashboards. It visualises the data in Redshift and makes it easy for people to create custom dashboards without engineering intervention. It was also easy to host internally, which mattered for compliance.

Amazon CloudFront and Amazon S3 for fetching access logs that give us usage statistics on our website visitors.
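CloudFront writes gzipped, tab-separated access logs to S3 with `#Version:` and `#Fields:` header lines. Once fetched, a minimal stdlib parser for those lines could look like this (the sample log line is made up):

```python
def parse_cloudfront_log(lines):
    """Parse CloudFront access-log lines into dicts keyed by the #Fields header."""
    fields = []
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line:
            continue  # skip the #Version header and blank lines
        else:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows

# Hypothetical sample in CloudFront's W3C extended log format.
sample = [
    "#Version: 1.0\n",
    "#Fields: date time cs-method cs-uri-stem sc-status\n",
    "2017-01-01\t12:00:00\tGET\t/index.html\t200\n",
]
hits = parse_cloudfront_log(sample)
```

The resulting dicts can then be loaded into the warehouse alongside the rest of the marketing data.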

This is what our current marketing architecture looks like:

In Part 2, I will go into details on why we decided to go with the above technical stack.