In this article, I would like to show how to predict employee attrition with machine learning. For the analysis I will use a data set created by IBM data scientists, which is available here. However, I will split it into train and test samples to better explain how machine learning methods can be applied to this problem. The split data is available on my GitHub.

The train set represents historical data about employees. Each sample (row) describes one employee with parameters such as age, department, distance from home, marital status, income, and years at the company. You can check all the descriptors used here. For each employee in the train set the attrition is known (it is a historical value). In the test data the employee descriptors are available, but the attrition is unknown and we want to predict (compute) it with our machine learning model. (To be honest, the attrition values in the test data are available, but for the sake of explanation let's assume that they are missing.)
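To make the train/test idea concrete, here is a minimal sketch of such a split in Python with pandas and scikit-learn. The tiny table below is made-up toy data, and the column names (`Age`, `DistanceFromHome`, `MonthlyIncome`, `Attrition`) are assumptions for illustration; this is not necessarily how the split files for this article were produced.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the employee table; values and column names are illustrative only.
employees = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27, 32, 59, 30, 38, 36],
    "DistanceFromHome": [1, 8, 2, 3, 2, 2, 3, 24, 23, 27],
    "MonthlyIncome": [5993, 5130, 2090, 2909, 3468, 3068, 2670, 2693, 9526, 5237],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No", "No", "No", "No", "No"],
})

# Hold out 20% of the rows as the test set; random_state makes the split repeatable.
train, test = train_test_split(employees, test_size=0.2, random_state=42)

# Pretend the attrition is unknown for the test employees by dropping that column.
test_features = test.drop(columns=["Attrition"])

print(len(train), "train rows,", len(test_features), "test rows")
```

The train rows keep the `Attrition` column, while the test rows only carry the descriptors, which mirrors the setup described above.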

There are 1200 samples (employees) in the train data. For model training we will use MLJAR, which lets you create machine learning models in the browser (no installation required!). We start with the project setup: we set the project title and the task, which is binary classification (we predict yes/no for attrition).
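If you wonder what "binary classification" means in practice, the sketch below shows the idea outside of MLJAR: the yes/no attrition is encoded as 1/0 and a model learns to output the probability of the "yes" class. It uses scikit-learn's logistic regression on made-up toy data with assumed column names (`Age`, `YearsAtCompany`); it is only an illustration and not what MLJAR runs internally.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data; values and column names are illustrative only.
train = pd.DataFrame({
    "Age": [25, 28, 31, 45, 50, 38, 23, 55],
    "YearsAtCompany": [1, 2, 1, 10, 20, 8, 1, 25],
    "Attrition": ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No"],
})

# Binary target: 1 means the employee left, 0 means the employee stayed.
y = train["Attrition"].map({"Yes": 1, "No": 0})
X = train[["Age", "YearsAtCompany"]]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Two hypothetical employees whose attrition is unknown.
new_employees = pd.DataFrame({"Age": [26, 48], "YearsAtCompany": [1, 15]})

# Column 1 of predict_proba holds the predicted probability of class 1 (attrition = yes).
attrition_probability = model.predict_proba(new_employees)[:, 1]
print(attrition_probability)
```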