I’m shopping for a house, and it’s taking longer than I anticipated. To deal with the frustration, I’ve decided to try a more data-driven approach. Here is what I’ve accomplished so far: I have collected data on homes for sale in the Amsterdam area and, using some fairly straightforward statistics, characterized the relationship between sale price, number of rooms, size, and location.

The Data:

In total, there are about 1,800 listings, collected online from a real estate website in the Netherlands. Each listing includes the price, the number of rooms, and the size in square meters. It also has a free-text address, from which we extract details such as the precise location and the zip code.
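To make the zip code step concrete, here is a minimal sketch in base R. It assumes the address contains a standard Dutch postcode (four digits followed by two capital letters, with an optional space). The helper name `extract_zip` and the example addresses are made up for illustration and are not taken from the actual data set.

```r
# Hypothetical helper: pull a Dutch postcode (four digits + two letters)
# out of a free-text address. Returns NA when no postcode is found.
extract_zip <- function(address) {
  pattern <- "[1-9][0-9]{3} ?[A-Z]{2}"
  hit <- regexpr(pattern, address)        # -1 where there is no match
  found <- !is.na(hit) & hit > 0
  zip <- rep(NA_character_, length(address))
  zip[found] <- regmatches(address, hit)  # regmatches() returns matches only
  gsub(" ", "", zip)                      # normalize "1012 AB" to "1012AB"
}

# Made-up addresses, for illustration only.
addresses <- c("Example Street 1, 1012 AB Amsterdam", "No postcode here")
zips <- extract_zip(addresses)
```

Real addresses are messier than this (lowercase letters, missing postcodes, house numbers that happen to look like postcodes), so the pattern would likely need tuning against the actual listings.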

library(dplyr)  # dplyr (via magrittr) provides the %>% pipe
data %>% head()  # preview the first six listings

Distributions and Bivariate Plots:

Let’s take a peek at the distributions of these variables, and how they might be related to one another.
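
As a rough sketch of what such a first look might involve, the base R snippet below draws one histogram per variable, a scatterplot matrix, and a correlation table. The `listings` data frame, its column names, and its values are all made up for illustration; they stand in for the real data and are not results from it.

```r
# Toy stand-in for the listings table; values are invented for illustration.
listings <- data.frame(
  price   = c(350000, 425000, 610000, 295000, 780000),
  rooms   = c(2, 3, 4, 2, 5),
  size_m2 = c(55, 72, 105, 48, 140)
)

# One histogram per variable to inspect each distribution.
op <- par(mfrow = c(1, 3))
for (v in names(listings)) hist(listings[[v]], main = v, xlab = v)
par(op)

# Scatterplot matrix: every pair of variables plotted against each other.
pairs(listings)

# Pairwise Pearson correlations as a numeric companion to the plots.
cor_matrix <- cor(listings)
```

Prices are often right-skewed, so it may also be worth repeating the plots with `log(price)`.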