Blog

How can a sales team prioritize which leads to focus on? How can a marketing team measure the quality of the leads they produce? One simple option is lead scoring — a system that grades how likely a lead is to convert.

Today we’ll run through one simple version of lead scoring: point based models built from historical conversion performance.

There are more sophisticated options that involve machine learning and a heap of statistics, but point based lead scoring can do a decent job with much less complexity.

Our Experience with Prediction Models

We’ve had great success with a growing number of prediction models.

We’ve built models with regularized logistic regressions, decision trees, and point based systems. So far point based systems are our favorites. They’re easily understood, relatively easy to build by hand, and perform well on evaluation data and real world post-launch data.

For scoring leads, the most useful features have been:

- whether a user has a common email address like gmail vs. a business email
- employee counts
- the number of days a user visited a website before signing up
- which technologies are used on the business’s website
- industry sector
- employee seniority and job role
- the lead’s existing tools

Let’s run through how we’ll build our favorite kind of lead scoring system.

Process Overview

There’s a short process for training a new lead scoring model:

1. Start with a collection of lead email addresses
2. Enrich the leads with a third-party data source like Clearbit
3. Train a scoring system with half of the leads
4. Evaluate the scoring system on the other half of the leads

Data Model

We’ll start with a table of leads. The only requirements are that each row represents a single lead and includes a lead id, a column for whether that lead converted, and additional feature fields we can use for modeling.
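As a concrete sketch, a minimal version of that table might look like the following. The column names beyond id and converted are illustrative, not prescriptive; enrichment fields get added in the next section.

```sql
-- A minimal leads table; the feature columns here are examples.
create table leads (
  id             bigint primary key,
  email          text not null,
  converted      boolean not null default false,
  employee_count integer,  -- example feature field
  sector         text      -- example feature field
);
```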

Enrichment

Before we train a model, we need enough information on each lead to generate good predictions.

Most companies already have several pieces of useful information in their signup forms. This is a good start, but additional information on the company and individual can result in a better model. Ideally, we’ll have five or more high quality features to use.

We’ll use Clearbit to annotate leads with additional information such as industries, job roles, seniority, and especially website technologies.

Since we use a Ruby/Rails stack, our example Clearbit code is in Ruby. First add a json field clearbit to the model with a db migration. Second, loop through every lead and save the newly fetched Clearbit attributes:

```ruby
require 'clearbit'

Clearbit.key = 'your_api_key'

Lead.all.each do |lead|
  clearbit_options = { email: lead.email, stream: true }
  lead.clearbit = Clearbit::Enrichment.find(clearbit_options)
  lead.save!
end
```

Expect a couple seconds per lead on average for the Clearbit API to respond. If you want to parallelize, the streaming API allows up to 5 simultaneous requests.
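As a sketch, the loop above could be parallelized with plain Ruby threads, working in batches of 5 to stay under that limit. Here fetch_enrichment is a hypothetical stand-in for the Clearbit lookup and save:

```ruby
# Enrich leads in batches of 5, one thread per lead in the batch.
# fetch_enrichment is a hypothetical stand-in for
# Clearbit::Enrichment.find plus persisting the result.
def enrich_in_batches(leads, batch_size: 5)
  leads.each_slice(batch_size) do |batch|
    batch.map do |lead|
      Thread.new { lead[:clearbit] = fetch_enrichment(lead[:email]) }
    end.each(&:join)
  end
  leads
end
```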

Splitting the Data

We want to split the data into two groups so we have one group to train the model on, and the other set to evaluate on. We can’t get a real evaluation on the same data we trained on, since the model may have memorized some of the answers during training.

Let’s create two views for our training and evaluation leads:

```sql
create view training_leads as
  select * from leads where id % 2 = 0;

create view evaluation_leads as
  select * from leads where id % 2 = 1;
```

Now we can train a model based on our training_leads view.

Feature Engineering

Our leads have many features now, so we need to decide which features to use and how to weight them. This is an exploratory stage where we see which features have a big impact.

Let’s first look at the baseline performance for the dataset, so we know what the typical performance is.

```sql
with leads_with_feature as (
  select *, 'all_leads' as feature
  from training_leads
)
select
  feature,
  count(1) as leads,
  count(distinct case when converted then id else null end) as conversions,
  count(distinct case when converted then id else null end)::float /
    count(1) as conversion_rate
from leads_with_feature
group by 1
```

Our baseline conversion rate is 13.8% across the training set.

Now let’s see if we can make a good feature out of website technologies. Clearbit returns several Google related technologies — Google Apps, Google Analytics, and Google AdWords. Let’s look at the general feature of whether the lead’s website uses any Google technologies:
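The check itself can reuse the shape of the baseline query, deriving the feature from the Clearbit tech list. This is a sketch; the exact JSON path depends on your enrichment payload:

```sql
with leads_with_feature as (
  select *,
    case
      when json_extract_path_text(clearbit, 'company', 'tech') like '%google%'
        then 'uses_google_tech'
      else 'no_google_tech'
    end as feature
  from training_leads
)
select
  feature,
  count(1) as leads,
  count(distinct case when converted then id else null end) as conversions,
  count(distinct case when converted then id else null end)::float /
    count(1) as conversion_rate
from leads_with_feature
group by 1
```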

That’s a promising feature! We’ll keep track of it for use in our model. We’ll repeat this step until we have 5 to 10 good features to be used in the next step.

For the rest of the example, it’s assumed we added three additional features: whether Clearbit knew anything about the company, whether it knew the person’s job role, and whether the company sector is “Industrials” or “Financial.”

Assigning Points

We have several features that look predictive, but how can we assign points? If we don’t want to dive too deep into machine learning and statistics, we’ll need to use a bit of educated guessing to weight the features.

If a feature predicts a big difference and applies to a large number of datapoints, we’ll add a point. A great example is the Google technologies feature in the previous section: double the conversion rate and at least eighty conversions on each side of the split. We could be more nuanced and add multiple points for especially strong features, but for simplicity we’ll treat it as a simple point-or-not question.

If you want a sense of the statistics involved, playing around with a binomial calculator can be enlightening. Plug in the average conversion rate along with a feature’s lead and conversion counts to see how likely that result is to occur by chance.
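If you’d rather compute it directly, here’s a small Ruby sketch of that binomial tail probability. The numbers in the example are illustrative, not from our dataset:

```ruby
# Probability mass of exactly k conversions out of n leads,
# if each lead independently converts with probability p.
def binomial_pmf(n, k, p)
  # n choose k, computed iteratively to avoid huge factorials
  choose = (1..k).reduce(1.0) { |acc, i| acc * (n - k + i) / i }
  choose * p**k * (1 - p)**(n - k)
end

# Probability of seeing k or more conversions out of n leads.
def prob_at_least(n, k, p)
  (k..n).sum { |i| binomial_pmf(n, i, p) }
end

# Example: 200 leads at a 13.8% baseline conversion rate.
# How surprising would 40 or more conversions be?
puts prob_at_least(200, 40, 0.138).round(4)
```

A small tail probability means the feature’s lift is unlikely to be luck.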

Fortunately, a point based model adds some redundancy to the scoring. So long as most of our features are actually significantly predictive, we’ll probably get a useful model.

Here’s the query and results for our point allocation. We start with zero points, and add points with a series of case statements based on the features we’ve selected.

```sql
with leads_with_points as (
  select *,
    (0
      + case when json_extract_path_text(
          clearbit, 'person', 'employment', 'role') is not null
          then 1 else 0 end
      + case when json_extract_path_text(
          clearbit, 'company', 'category', 'sector')
          in ('Industrials', 'Financial')
          then 1 else 0 end
      + case when json_extract_path_text(clearbit, 'company') is not null
          then 1 else 0 end
      + case when json_extract_path_text(clearbit, 'company', 'tech')
          like '%google%'
          then 1 else 0 end
    ) as points
  from training_leads
)
select
  points,
  count(1) as leads,
  count(distinct case when converted then id else null end) as conversions,
  count(distinct case when converted then id else null end)::float /
    count(1) as conversion_rate
from leads_with_points
group by 1
```

Our model is doing a good job on the training data — the more points, the higher the conversion rate.

Picking Thresholds

Threshold decisions are often driven by business concerns. Let’s assume that the sales team is very happy to get a lead with a greater than 20% chance to convert, and happy with one at 10% or more.

Given those business thresholds, we can group our model points into a low tier (zero points), medium tier (one or two points) and high tier (three to four points).

This new query adds a leads_with_tier expression which uses a case statement on points to bucket into tiers:

```sql
with leads_with_points as (
  select *,
    (0
      + case when json_extract_path_text(
          clearbit, 'person', 'employment', 'role') is not null
          then 1 else 0 end
      + case when json_extract_path_text(
          clearbit, 'company', 'category', 'sector')
          in ('Industrials', 'Financial')
          then 1 else 0 end
      + case when json_extract_path_text(clearbit, 'company') is not null
          then 1 else 0 end
      + case when json_extract_path_text(clearbit, 'company', 'tech')
          like '%google%'
          then 1 else 0 end
    ) as points
  from training_leads
),
leads_with_tier as (
  select *,
    case
      when points = 0 then 'low'
      when points between 1 and 2 then 'medium'
      when points between 3 and 4 then 'high'
      else 'other'
    end as tier
  from leads_with_points
)
select
  tier,
  count(1) as leads,
  count(distinct case when converted then id else null end) as conversions,
  count(distinct case when converted then id else null end)::float /
    count(1) as conversion_rate
from leads_with_tier
group by 1
```

Nice! Big buckets and clear separation of conversion rates between tiers. Our model looks great on the training data.

Evaluating the model

We built our model by looking at which features were successful and keeping the ones that were predictive on the training set. The risk is that the model has only learned to predict the training data, and will fail on future leads if we ship it.

Because we saved half of our leads for evaluation, we can now try the model on leads it wasn’t trained on. This gives us an estimate of how well it will predict lead value on future leads it hasn’t seen before.

If we had chosen several features that only applied to a small number of leads, and only made a small difference, it is very likely we would have made a model that only memorized the training set, and it would do poorly on the evaluation set.

This memorization of the training set is called overfitting, and overfit models perform worse on the evaluation set and on future data.

We can evaluate our model by swapping evaluation_leads into the from clause of the lead tier query above and comparing the tiers.

Success! The evaluation results still show a clear separation of tiers and big buckets. There are some small changes in the exact conversion rates, but nothing severe enough to cause a problem.

If our future leads are similar to our past leads, then we can be confident that our model will do a good job of separating them into these tiers. A corollary is that if we change what kinds of leads we get by significantly changing our marketing efforts, the model may no longer apply and we’d need to train a new model.

Had the evaluation results been unsatisfactory, the next step would be to carefully develop a new model. Were we to make a single small change to our model and try it again on the evaluation set, we the model builders would start to memorize which features evaluate well. This destroys the integrity of the evaluation process, because the evaluation data has been transformed into training data.

If we need to develop a new model, we must make enough changes that we can’t predict which individual changes will evaluate well.

Score!

We now know how to train and evaluate simple data-based lead scoring models. They can be used for both measuring the lead quality that marketing delivers, and prioritizing sales effort.

Good luck with lead scoring!