Any visitor to Manchester city centre will be struck by the number of modern blocks of flats.

Take the tram out towards Altrincham and you’ll see modern apartment blocks on both sides, out towards Salford and back towards Castlefield.

Many of these flats will have been sold, which means they will turn up in the Land Registry’s Price Paid data.

This is an excellent dataset containing residential property sales in England and Wales. You can download the (huge) full file, or you can use the data wizard to filter the data you want. The Land Registry also let you download this year’s data only, which they update monthly. This is the file I’m going to be using (note it’s since been updated with another month).

Getting the data into RStudio

# read Land Registry CSV, called pp-2016.csv
houses <- read.csv("pp-2016.csv", header = FALSE, stringsAsFactors = FALSE)

# add column names
colnames(houses) <- c("id", "price", "date", "postcode", "type",
                      "y/n", "hold", "housename2", "housename1", "street",
                      "neighbourhood1", "neighbourhood2", "la", "county", "a", "a2")

Our data is now in RStudio. The CSV file doesn’t come with column headers, so we’ve correctly labelled them. We won’t need all these columns but it helps to know what they are.

Calling str(houses) shows us the structure of our data frame:

'data.frame': 469605 obs. of 16 variables:
 $ id            : chr "{369DFB16-3F24-3A19-E050-A8C0620518C6}" "{369DFB16-3F25-3A19-E050-A8C0620518C6}" "{369DFB16-3F26-3A19-E050-A8C0620518C6}" "{369DFB16-3F27-3A19-E050-A8C0620518C6}" ...
 $ price         : int 169995 110000 240000 70000 80000 165000 138500 246500 145000 230000 ...
 $ date          : chr "2016-06-01 00:00" "2016-04-29 00:00" "2016-06-10 00:00" "2016-05-27 00:00" ...
 $ postcode      : chr "S73 0BX" "S12 4RW" "S74 9NW" "S5 7DQ" ...
 $ type          : chr "D" "S" "T" "S" ...
 $ y/n           : chr "N" "N" "N" "N" ...
 $ hold          : chr "F" "F" "F" "F" ...
 $ housename2    : chr "1" "34" "MOOR VIEW BARN" "29" ...
 $ housename1    : chr "" "" "" "" ...
 $ street        : chr "COTTERDALE GARDENS" "ALPORT PLACE" "HIGH ROYD LANE" "MUSGRAVE CRESCENT" ...
 $ neighbourhood1: chr "WOMBWELL" "" "HOYLAND" "" ...
 $ neighbourhood2: chr "BARNSLEY" "SHEFFIELD" "BARNSLEY" "SHEFFIELD" ...
 $ la            : chr "BARNSLEY" "SHEFFIELD" "BARNSLEY" "SHEFFIELD" ...
 $ county        : chr "SOUTH YORKSHIRE" "SOUTH YORKSHIRE" "SOUTH YORKSHIRE" "SOUTH YORKSHIRE" ...
 $ a             : chr "A" "A" "A" "A" ...
 $ a2            : chr "A" "A" "A" "A" ...

Focusing on ‘type’

‘Type’ refers to the kind of home being sold. The five possibilities are ‘D’ (detached), ‘S’ (semi-detached), ‘T’ (terraced), ‘F’ (flat) and ‘O’ (other).
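A quick way to see which codes appear, and how often, is base R's table(). The sketch below uses a short made-up vector in place of houses$type, so the counts are purely illustrative:

```r
# made-up type codes standing in for houses$type (illustration only)
type_codes <- c("D", "S", "T", "S", "F", "O", "T", "F", "F")

# count how many times each code appears
type_counts <- table(type_codes)
print(type_counts)
```

On the real data, the equivalent call would be table(houses$type).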

We don’t really want ‘other’ as it can skew the data. We also want to write these out in full so they show up properly on our legends.

# remove 'other' by using a regular expression to match everything except 'O'
no.other <- grep("[^O]", houses$type)
filtered_data <- houses[no.other, ]

# load plyr for revalue(), then write the values in the 'type' column out in full
library(plyr)
filtered_data$type <- revalue(filtered_data$type,
                              c("D" = "Detached", "F" = "Flat",
                                "S" = "Semi-detached", "T" = "Terraced"))
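The regular expression works here because every type code is a single character. As an alternative, here is a minimal base R sketch that does the same job without grep or plyr. The data frame houses_toy and the lookup vector type_labels are made up for illustration and are not part of the original analysis:

```r
# toy stand-in for the houses data frame (made-up rows)
houses_toy <- data.frame(
  price = c(250000, 120000, 90000, 160000, 110000),
  type  = c("D", "F", "O", "S", "T"),
  stringsAsFactors = FALSE
)

# drop 'other' with a plain comparison instead of a regular expression
filtered_toy <- houses_toy[houses_toy$type != "O", ]

# named lookup vector: code -> full label
type_labels <- c("D" = "Detached", "F" = "Flat",
                 "S" = "Semi-detached", "T" = "Terraced")

# index the lookup vector by code to write the labels out in full
filtered_toy$type <- unname(type_labels[filtered_toy$type])
print(filtered_toy)
```

The comparison is a little more explicit about intent than the regular expression, at the cost of a slightly longer rename step.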

We now have the data we want

So let’s take a look at Manchester, using ggplot2. We are going to use geom_jitter, which spreads our dots out so we can see them better. If we used geom_point they would all be on one line because they are all from Manchester.

# load ggplot2 for plotting
library(ggplot2)

selected <- grep("MANCHESTER", filtered_data$la)
manchester_data <- filtered_data[selected, ]

ggplot(manchester_data, aes(x = price, y = la, col = type)) +
  geom_jitter()

Three things jump out at me:

There was one detached house that sold for a fortune, almost £3.5m

Terraced houses tend to be among the cheapest properties sold in Manchester

There are lots of flats being sold in the city

Now let’s take a look at Oldham:

oldham <- grep("OLDHAM", filtered_data$la)
oldham_data <- filtered_data[oldham, ]

ggplot(oldham_data, aes(x = price, y = la, col = type)) +
  geom_jitter()

This is much easier to read. Two things jump out from the data:

Hardly any flats have been sold in Oldham this year

The cheapest properties are almost always terraced houses

Putting them both on the same graph doesn’t really help much, because of that one detached house:

manol <- grep("MANCHESTER|OLDHAM", filtered_data$la)
manol_data <- filtered_data[manol, ]

ggplot(manol_data, aes(x = price, y = la, col = type)) +
  geom_jitter()
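One possible way to stop a single very expensive sale stretching the axis is to cap the price before plotting. This is only a sketch: the data frame manol_toy is made up, and the £1m cut-off (price_cap) is an arbitrary assumption rather than anything taken from the data:

```r
# toy stand-in for manol_data (made-up rows, one extreme price)
manol_toy <- data.frame(
  price = c(85000, 140000, 3450000, 210000),
  la    = c("OLDHAM", "MANCHESTER", "MANCHESTER", "MANCHESTER"),
  stringsAsFactors = FALSE
)

# hypothetical cut-off; pick whatever suits the data
price_cap <- 1000000

# keep only sales below the cut-off
manol_capped <- manol_toy[manol_toy$price < price_cap, ]
print(manol_capped)
```

The capped data frame could then be passed to the same ggplot call. The trade-off is that the outlier disappears from the chart entirely, so it is worth saying so in a caption.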

How about if we only focus on new builds?

If we go back to the structure of our data, there is a y/n column. ‘Y’ means a new build, ‘N’ means an existing property.

Let’s filter just for new builds and see what comes up:
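A minimal sketch of that filter is below. The name manolnew_data matches the data frame used in the tidied-up plot later on. The small manol_data built here is a made-up stand-in so the snippet runs on its own, and the exact filtering line is an assumption about how the step was done:

```r
# toy stand-in for manol_data (made-up rows, for illustration only);
# in the post, manol_data comes from the Manchester/Oldham filter
manol_data <- data.frame(
  price = c(120000, 310000, 95000, 405000),
  la    = c("OLDHAM", "MANCHESTER", "OLDHAM", "MANCHESTER"),
  type  = c("Terraced", "Flat", "Semi-detached", "Flat"),
  `y/n` = c("N", "Y", "N", "Y"),
  check.names = FALSE,        # keep the "y/n" column name as-is
  stringsAsFactors = FALSE
)

# keep only rows flagged "Y" (new build); [["y/n"]] is used because
# the column name contains a slash, so $y/n would not parse
manolnew_data <- manol_data[manol_data[["y/n"]] == "Y", ]
print(manolnew_data)
```

The same ggplot call as before, pointed at manolnew_data, then plots only the new builds.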

Now this is more interesting.

We can see two trends here:

Hardly any new homes are being sold in Oldham compared to Manchester

Almost all the new homes in Manchester being sold are flats
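Both trends can also be checked with a simple cross-tabulation of area against type. The sketch below uses a made-up data frame, manolnew_toy, so the counts are illustrative and not the real 2016 figures:

```r
# toy stand-in for the new-build data (made-up rows)
manolnew_toy <- data.frame(
  la   = c("MANCHESTER", "MANCHESTER", "MANCHESTER", "MANCHESTER", "OLDHAM"),
  type = c("Flat", "Flat", "Flat", "Terraced", "Semi-detached"),
  stringsAsFactors = FALSE
)

# rows are local authorities, columns are property types
new_build_counts <- table(manolnew_toy$la, manolnew_toy$type)
print(new_build_counts)
```

On the real data, table(manolnew_data$la, manolnew_data$type) would give the actual counts behind the chart.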

Let’s tidy up the graph:

# load scales for dollar_format()
library(scales)

ggplot(manolnew_data, aes(x = price, y = la, col = type)) +
  geom_jitter(size = 3) +
  ggtitle("New homes sold in Manchester and Oldham, 2016") +
  labs(y = "", x = "Price", color = "") +
  theme(plot.title = element_text(size = 30),
        legend.title = element_text(size = 18),
        axis.title.x = element_text(size = 28),
        axis.text = element_text(size = 16),
        legend.text = element_text(size = 18),
        legend.key.size = unit(0.8, "cm")) +
  # add pound signs and thousands separator
  scale_x_continuous(labels = dollar_format(prefix = "£"))

Some swish new flats are being sold in Manchester this year for extraordinary prices – well over £300,000, even £400,000 in three cases.

Meanwhile, hardly anyone is purchasing new homes in Oldham.

A quick glance at the Government’s house building tables shows that only 40 new builds were completed in Oldham in Q2 of 2016, the joint lowest in Greater Manchester. Perhaps that goes some way to explaining why.