In one of my earlier pieces I explored decision trees in Python, which let you train a machine learning algorithm to predict or classify data.

I like this style of model because the model itself is valuable; I’m more interested in finding underlying patterns than in attempting to predict the future. Decision trees are nice if you want to predict one particular feature; however, they aren’t as good for exploratory analysis, and they force-fit a hierarchy onto the data.

Association rules are an alternative model with a somewhat similar character (they produce probabilistic logic models), but they do not focus on a specific attribute. This is closer to what Amazon does when it says “people who bought this also bought this”: it doesn’t matter which item you start or end with. We have to use R instead of Python here. The only mention of association rules in scikit-learn was a discussion of bumping them from the release so that it could ship in a timely manner, so maybe they will be available there at some point in the future.
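To make "people who bought this also bought this" a bit more concrete, here is a minimal base-R sketch of the two numbers usually attached to a rule: support (how often the items appear together) and confidence (how often the right-hand side appears given the left-hand side). The baskets are made up, and the `support` and `confidence` helpers are hypothetical names written for this illustration; this is not how a rule-mining library finds rules, only what a single rule measures.

```r
# Toy transactions (made-up data): each element is one basket of items.
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk", "eggs"),
  c("bread", "milk"),
  c("milk")
)

# Support: fraction of baskets that contain every item in `items`.
support <- function(items, txns) {
  mean(vapply(txns, function(basket) all(items %in% basket), logical(1)))
}

# Confidence of the rule lhs => rhs: among baskets containing lhs,
# the fraction that also contain rhs.
confidence <- function(lhs, rhs, txns) {
  support(c(lhs, rhs), txns) / support(lhs, txns)
}

# The same pair of items gives a rule in either direction;
# no item is singled out as "the" target attribute.
support(c("bread", "milk"), transactions)
confidence("bread", "milk", transactions)
confidence("milk", "bread", transactions)
```

Note that the two directions share the same support but can have different confidence, which is why a rule is reported with both a left-hand and a right-hand side even though the model as a whole has no fixed target.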

To construct this, I’ve built database views that contain only the values I want in my model. This makes it relatively easy to pick and choose which features are included in the resulting model. (If you find yourself running the code below a lot, it makes sense to materialize the view.)

This uses the RPostgreSQL library; nothing special here.