Intro

This blog post shows the kind of feature engineering that can be done to improve machine learning models.

I entered Kaggle’s instacart-market-basket-analysis challenge with two goals:

finish in the top 5% of a Kaggle competition

keep learning Python (I come from R)

I ended up 52nd (top 2%) out of 2,623 data scientists, which I’m pretty happy with in hindsight, even though I was bitter during the last week because the release of this and this public posts cost me some of the competitive edge I had. I’m also now part of the Kaggle Master club, ranked 1,162 out of 65,891 data scientists worldwide.

Instacart delivers groceries from local stores and asked the Kaggle community to predict which products each customer will reorder in their next purchase. Submissions were evaluated on their mean F1 score.
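To make the metric concrete, here is a minimal sketch of mean F1 as I understand it for this setup: an F1 score is computed per order between the predicted and actual sets of reordered product IDs, then averaged over orders. The function names and the convention that two empty sets score 1.0 (the "None" prediction case) are my own illustration, not code from the competition.

```python
def order_f1(pred, actual):
    """F1 score for one order; pred and actual are sets of product IDs."""
    if not pred and not actual:
        return 1.0  # illustrative convention: predicting "no reorders" correctly
    tp = len(pred & actual)  # true positives: products in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)

def mean_f1(predictions, actuals):
    """Average the per-order F1 over all orders."""
    return sum(order_f1(p, a) for p, a in zip(predictions, actuals)) / len(predictions)

# Example with two orders: one partial match, one correctly-empty prediction.
preds = [{1, 2, 3}, set()]
truth = [{2, 3, 4}, set()]
print(mean_f1(preds, truth))  # (2/3 + 1.0) / 2 ≈ 0.833
```

Because the metric is averaged per order rather than computed globally, the optimal number of products to predict varies from order to order, which is part of what makes this competition interesting.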

Here are some key numbers:

200,000 customers

3 million orders

32 million products purchased

50,000 unique products

In this blog post I’ll detail my general approach (from a machine learning perspective) and the feature engineering work that was done. Feature engineering is the oil that lets machine learning models shine. In my opinion, feature engineering and data wrangling are more important than the models themselves!

My whole code can be found on my GitHub here.

A few of the ideas came from people sharing on the Kaggle forum, and I thank them for it.