Churn Prediction

Churn Prediction with XGBoost Binary Classification

This series of articles was designed to explain, in a simple way, how to use Python to fuel your company’s growth by applying a predictive approach to all your actions. It will be a combination of programming, data analysis, and machine learning.

I will cover all the topics in the following nine articles:

1- Know Your Metrics

2- Customer Segmentation

3- Customer Lifetime Value Prediction

4- Churn Prediction

5- Predicting Next Purchase Day

6- Predicting Sales

7- Market Response Models

8- Uplift Modeling

9- A/B Testing Design and Execution

Each article has its own code snippets so that you can apply them easily. If you are completely new to programming, you can find a good introduction to Python and Pandas (a popular library that we will use throughout) here. But even without a coding introduction, you can learn the concepts, how to use your data, and how to start generating value from it:

Sometimes you gotta run before you can walk — Tony Stark

As a pre-requisite, be sure Jupyter Notebook and Python are installed on your computer. The code snippets will run on Jupyter Notebook only.

Alright, let’s start.

Part 4: Churn Prediction

In the last three articles of the Data Driven Growth series, we covered tracking essential metrics, customer segmentation, and predicting lifetime value programmatically. Since segmentation and lifetime value prediction tell us who our best customers are, we should also work hard on retaining them. That’s what makes Retention Rate one of the most critical metrics.

Retention Rate is an indication of how good your product-market fit (PMF) is. If your PMF is not satisfactory, you should see your customers churning very soon. One of the most powerful tools for improving Retention Rate (and hence the PMF) is Churn Prediction. With this technique, you can find out who is likely to churn in a given period. In this article, we will use a Telco dataset and go over the following steps to develop a Churn Prediction model:

Exploratory data analysis

Feature engineering

Investigating how the features affect Retention by using Logistic Regression

Building a classification model with XGBoost

Exploratory Data Analysis

We start by checking what our data looks like and visualizing how it interacts with our label (churned or not). Let’s import our data and print the first ten rows:

import pandas as pd

# Load the Telco churn dataset and preview the first ten rows
df_data = pd.read_csv('churn_data.csv')
df_data.head(10)

Output:

A better way to see all the columns and their data types is to use the .info() method:
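As a minimal sketch of that call, the snippet below runs .info() on a tiny stand-in frame so that it works without the CSV. The three rows are made up for illustration, and the column spellings tenure and MonthlyCharges are assumptions based on the feature names mentioned in this article. On the real data you would simply call df_data.info().

```python
import io
import pandas as pd

# Hypothetical stand-in for churn_data.csv: made-up rows, only four columns.
sample_csv = io.StringIO(
    "gender,tenure,MonthlyCharges,Churn\n"
    "Female,1,29.85,No\n"
    "Male,34,56.95,No\n"
    "Male,2,53.85,Yes\n"
)
df_sample = pd.read_csv(sample_csv)

# Prints one line per column with its non-null count and dtype,
# plus the overall row count and memory usage.
df_sample.info()

# .dtypes exposes the same type information as a Series you can work with.
column_types = df_sample.dtypes
```

The non-null counts are worth a glance too: a column with fewer non-null entries than rows has missing values to deal with before modeling.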

It seems like our data fall under two categories:

Categorical features: gender, streaming tv, payment method, etc.

Numerical features: tenure, monthly charges, total charges
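One way to get these two lists programmatically is sketched below. It assumes that numerical columns were parsed as numbers and that everything else is categorical. The helper name split_feature_types is hypothetical and not part of pandas.

```python
import pandas as pd

def split_feature_types(df, label='Churn'):
    """Return (categorical, numerical) column names, excluding the label."""
    features = df.drop(columns=[label])
    # Columns with a numeric dtype (int, float) count as numerical.
    numerical = features.select_dtypes(include='number').columns.tolist()
    # Everything else (strings, booleans, ...) is treated as categorical.
    categorical = features.select_dtypes(exclude='number').columns.tolist()
    return categorical, numerical

# Usage on the loaded data:
# categorical_columns, numerical_columns = split_feature_types(df_data)
```

Treat the result as a first pass: a column of numbers stored as text would land in the categorical list, and a 0/1 flag would land in the numerical one.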

Starting with the categorical ones, we look at each feature and see how helpful it is for identifying whether a customer is going to churn.

As a side note, in our dataset the Churn column is a string with Yes/No values. We convert it to an integer to make it easier to use in our analysis.

# Map the Yes/No strings to 1/0 so the column gets an integer dtype
# and its mean can be read directly as the churn rate
df_data['Churn'] = df_data['Churn'].map({'No': 0, 'Yes': 1})

Gender

With the code block below, we can visualize what the Churn Rate (1 - Retention Rate) looks like for each value:

import plotly.graph_objs as go
import plotly.offline as pyoff

# Render plotly figures inline in the notebook
pyoff.init_notebook_mode()

# The mean of the 0/1 Churn column per gender is the churn rate
df_plot = df_data.groupby('gender').Churn.mean().reset_index()

plot_data = [
    go.Bar(
        x=df_plot['gender'],
        y=df_plot['Churn'],
        width=[0.5, 0.5],
        marker=dict(color=['green', 'blue'])
    )
]

plot_layout = go.Layout(
    xaxis={"type": "category"},
    yaxis={"title": "Churn Rate"},
    title='Gender',
    plot_bgcolor='rgb(243,243,243)',
    paper_bgcolor='rgb(243,243,243)',
)

fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

Output:

Churn Rate by Gender

Gender breakdown for the churn rate:

Female customers are more likely to churn than male customers, but the difference is minimal (~0.8%).

Let’s replicate this for all categorical columns. To avoid repeating what we did for gender, you can find the code needed for all of them below:
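A minimal sketch of how the gender code can be generalized follows; it is one possible way to do it, not necessarily the code behind the charts in this article. It assumes Churn has already been converted to 0/1. The helper names churn_rate_by and churn_bar_figure, and the categorical_columns list in the usage comment, are hypothetical.

```python
import pandas as pd

def churn_rate_by(df, column):
    """Churn rate (mean of the 0/1 Churn column) for each value of `column`."""
    return df.groupby(column)['Churn'].mean().reset_index()

def churn_bar_figure(df, column):
    """Build the same style of bar chart as the gender example for any column."""
    # Imported here so the rate helper above works without plotly installed.
    import plotly.graph_objs as go

    df_plot = churn_rate_by(df, column)
    bar = go.Bar(x=df_plot[column], y=df_plot['Churn'])
    layout = go.Layout(
        xaxis={"type": "category"},
        yaxis={"title": "Churn Rate"},
        title=column,
        plot_bgcolor='rgb(243,243,243)',
        paper_bgcolor='rgb(243,243,243)',
    )
    return go.Figure(data=[bar], layout=layout)

# Usage in a notebook, with categorical_columns being your own list of names:
# import plotly.offline as pyoff
# for column in categorical_columns:
#     pyoff.iplot(churn_bar_figure(df_data, column))
```

Keeping the rate calculation separate from the plotting makes it easy to inspect the numbers behind each chart, for example with churn_rate_by(df_data, 'gender').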

Now we go over the features which show the most significant difference across their values:

Internet Service

Churn Rate by Internet Service

This chart reveals that customers who have Fiber optic as their Internet Service are more likely to churn. I would normally expect Fiber optic customers to churn less because they use a more premium service. But this can happen due to high prices, competition, customer service, and many other reasons.

Contract

Churn Rate by Contract

As expected, a shorter contract means a higher churn rate.

Tech Support

Churn Rate by Tech Support

Customers who don’t use Tech Support are more likely to churn (~25% difference).

Payment Method

Automating the payment makes the customer more likely to stay on your platform (~30% difference).

Others

Let’s show the graphs for some of the other features here for reference: