Loading the Data

We’ll load our dataset with the read.table utility, specifying that the file has a header row and that fields are separated by semicolons.
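As a minimal sketch of this step, the snippet below assumes base R's read.table, whose header and sep arguments match the description above (readr's read_delim with delim = ";" would be the Tidyverse equivalent). The file contents and column values are made up for illustration and written to a temporary file so the example runs on its own.

```r
# Write a tiny, made-up semicolon-separated file to a temporary location.
# The real dataset has many more columns and rows.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    '"age";"job";"marital";"y"',
    '30;"blue-collar";"married";"no"',
    '45;"management";"single";"yes"'
  ),
  csv_path
)

# header = TRUE treats the first line as column names;
# sep = ";" splits fields on semicolons.
dataset <- read.table(csv_path, header = TRUE, sep = ";",
                      stringsAsFactors = FALSE)

# Show the column types that were inferred.
str(dataset)
```

With the real file, only the path passed to read.table would change.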

A complete description of the columns can be found here. We can quickly check the head of the dataset. The most important column is y, the one we’re going to predict: it indicates whether or not a customer subscribed to a term deposit.
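A quick inspection might look like the following sketch. The data frame here is a small, made-up stand-in for the real dataset, so the counts it prints say nothing about the actual data.

```r
# Made-up stand-in for the loaded dataset (illustration only).
dataset <- data.frame(
  age = c(30, 45, 38, 52, 27),
  job = c("blue-collar", "management", "services", "blue-collar", "student"),
  y   = c("no", "yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# First few rows of the dataset.
head(dataset, 3)

# How many customers fall into each class of the target column y.
target_counts <- table(dataset$y)
target_counts

# The same counts as proportions, useful for spotting class imbalance.
prop.table(target_counts)
```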

R has a package called DataExplorer that lets us quickly explore the dataset. If you don’t have any of the packages mentioned in this article, install them with the install.packages("package name") command.
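One way to check which packages still need installing is sketched below. The helper name missing_packages is hypothetical, and the package list is an assumption based on the packages this article mentions. The install call is left commented out so that running the snippet has no side effects.

```r
# Packages mentioned in this article (assumed list; adjust as needed).
required <- c("DataExplorer", "fastDummies", "caret")

# Hypothetical helper: return the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
}
```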

Using the introduce function, we see the number of columns, rows, and missing values:
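The sketch below shows the kind of call involved, assuming DataExplorer is installed. The toy data frame is made up for illustration and contains one deliberately missing value.

```r
library(DataExplorer)

# Made-up stand-in for the real dataset, with one missing age.
toy <- data.frame(
  age = c(30, 45, NA, 52),
  job = c("blue-collar", "management", "blue-collar", "services"),
  y   = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# introduce() returns a one-row data frame of basic metadata.
overview <- introduce(toy)

# Pick out the fields discussed in the text.
overview$rows
overview$columns
overview$total_missing_values
```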

Before diving into machine learning, we sometimes want to plot a few graphs to give us a glimpse into our dataset. DataExplorer provides one function to plot all these graphs.
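The text does not say which function is meant, so the sketch below assumes plot_bar, which draws one bar chart per categorical column (plot_histogram does the same for numeric columns). The toy data frame is made up for illustration.

```r
library(DataExplorer)

# Made-up stand-in for the real dataset.
toy <- data.frame(
  job     = c("blue-collar", "blue-collar", "management",
              "blue-collar", "services"),
  marital = c("married", "married", "single", "married", "divorced"),
  stringsAsFactors = FALSE
)

# One bar chart for every discrete (categorical) column.
plot_bar(toy)

# The counts behind the job chart, most frequent category first.
job_counts <- sort(table(toy$job), decreasing = TRUE)
job_counts
```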

A quick look at the graphs shows us that the largest group of customers are blue-collar workers and that most customers are married.

One-Hot Encoding

We noticed earlier that some of our columns are categorical. In order to use them in our machine learning model, we have to convert them to dummy variables: each category becomes its own column of zeros and ones.

We also need to drop the first dummy variable in order to avoid the dummy variable trap, so a column with N categories ends up with N-1 dummy variables. For example, instead of having a column for both male and female, we want one column that holds 1 for male and 0 for female, or vice versa.
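To see why the extra column is a problem, the base R sketch below builds a tiny design matrix by hand. The data are made up. With an intercept plus a column for every category, the columns are linearly dependent (intercept = male + female), which shows up as a rank lower than the number of columns.

```r
# Made-up categorical column with two categories.
gender <- factor(c("male", "female", "female", "male"))

# Design matrix with an intercept and one dummy per category.
full <- cbind(
  intercept = 1,
  male      = as.integer(gender == "male"),
  female    = as.integer(gender == "female")
)

# Three columns but only rank 2: the dummy variable trap.
qr(full)$rank

# Dropping one dummy removes the redundancy.
reduced <- full[, c("intercept", "male")]

# Two columns, rank 2: full column rank.
qr(reduced)$rank
```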

R has several packages that can convert columns into dummy variables. In this case, we’ll use the fastDummies package. We call its dummy_cols function for the conversion and set remove_first_dummy to TRUE in order to avoid the dummy variable trap.
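A minimal sketch of the conversion, assuming fastDummies is installed. The toy data frame and the explicit select_columns argument are illustrative choices, not taken from the original code.

```r
library(fastDummies)

# Made-up stand-in for the real dataset.
toy <- data.frame(
  age     = c(30, 45, 38, 52),
  marital = c("married", "single", "married", "divorced"),
  y       = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Create dummy columns for the categorical variables, dropping the
# first dummy of each so that N categories give N-1 new columns.
encoded <- dummy_cols(
  toy,
  select_columns     = c("marital", "y"),
  remove_first_dummy = TRUE
)

# New columns are named <column>_<category>; originals are kept.
names(encoded)
```

Note that dummy_cols keeps the original categorical columns alongside the new ones by default, which is why a separate column-selection step is still needed afterwards.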

Next, we select the columns that we’ll use in our machine learning model. We do this by storing the names of the columns to keep in a variable and then using that variable to make the selection.
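The selection step can be sketched in base R as follows. The encoded data frame is built by hand here so the example runs on its own, and the column names are hypothetical examples of what the dummy encoding might produce.

```r
# Hand-built, made-up example of a dummy-encoded data frame.
# "divorced" is the dropped baseline category for marital.
encoded <- data.frame(
  age             = c(30, 45, 38),
  marital         = c("married", "single", "divorced"),
  marital_married = c(1L, 0L, 0L),
  marital_single  = c(0L, 1L, 0L),
  y               = c("no", "yes", "no"),
  y_yes           = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Names of the columns to keep: numeric features and dummies only.
columns_to_keep <- c("age", "marital_married", "marital_single", "y_yes")

# Use the variable to subset the data frame.
model_data <- encoded[, columns_to_keep]

# Every remaining column is numeric.
head(model_data)
```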

If we check the head of our dataset, we notice that every column is now numeric. We’re now ready to move on to the next step, where we’ll use the caret package to build our machine learning model.