In this paper, we call all domestic properties “properties”. The subset of properties without a permanent resident we call “low-use properties” (LUPs). The word “property” is used to separate LUPs from the emotional connotations of “home”, as described by Paris (2009). This separation is important as we do not know where a LUP sits on the spectrum between a home and a purely commercial investment. Properties with a permanent resident will be referred to as “homes”. These definitions imply that homes + LUPs = properties.

Two common threads in the literature on both foreign investment and second homes are the difficulty of defining the properties of interest and the lack of data about which properties are empty. We address the lack of data by using the Freedom of Information (FOI) Act (ICO, 2017) to request council tax data from local authorities. These FOI responses combine to form a unique and detailed picture of housing use in England and Wales. Local authorities are a form of local government used in England and Wales, and council tax is a form of property tax used to pay for local services.

We take a pragmatic approach to defining a LUP. We use the council tax classes provided by local authorities to determine whether a given house is a LUP. Broadly, a LUP is a property that is not registered as the primary residence of any individual. However, this definition comes with caveats, as we include only data that falls under council tax “discounts” and not “exemptions”. Council tax “exemptions” cover situations such as the recent death of the resident, incarceration of the resident and so on. Discounts cover situations such as properties empty for longer than six months, second homes and properties that are under renovation. The discounts class implies a degree of agency absent from the exemptions class.

In this paper, we do not distinguish between local and foreign buyers as this is not possible given our dataset (see Section “Obtaining data”). We view both local and foreign owners as “out-of-town”, the term used by Favilukis and Van Nieuwerburgh (2017). Although local and foreign owners may have different drivers and occupy different parts of the spectrum between a long-term commitment to a property and community and a purely commercial investment, we are interested in the overall relationship between LUPs and affordability and tourism, independent of owner type.

Of note, after April 2013, councils were no longer obliged to give a discount on council tax to LUPs (National Archives, 2012), so registration of LUPs may have decreased. This change in the law means that LUP percentages recorded by the local authorities may not reflect the true LUP levels. However, this does not affect whether we class a local authority as having high-value LUPs, as our definition depends on the distribution, not the absolute level (see Section “Types of low-use property”).

We use three different geographical units: the lower super output area (LSOA), the middle super output area (MSOA) and the local authority. These three geographic units and their boundaries are defined and specified by the Office for National Statistics for census data and other demographic analysis.

An LSOA contains a minimum of 400 properties (ONS, 2018a) and is the smallest publishable geographical unit for which it is possible to obtain council tax data. LSOAs are themselves constructed from output areas (OAs). OAs are constructed using adjacent postcodes to be as socially and demographically homogeneous as possible (ONS, 2015). This homogeneity extends to dwelling type and tenure and prevents splitting along an urban/rural boundary. Contiguous blocks of LSOAs form an MSOA. Some data, such as local income statistics, are only gathered at the MSOA level. The final geographic group, the local authority, is made up of one or more MSOAs. The local authority provides the FOI data.

Obtaining data

This paper uses exclusively publicly available data provided by various branches of government in England and Wales; all data is available under an open government licence (OGL, 2015). The low-use data was obtained through FOI requests (ICO, 2017); data was received between January and September 2017 from 112 local authorities. This data is available in an open data repository (Bourne, 2018). Prior to the collection of this dataset, vacancy data was only available through the annual long-term vacants dataset from the Ministry of Housing, Communities and Local Government (Department for Communities and Local Government, 2017) or through the Census. The method used to create this dataset allows detailed demographic data to be collected from local authorities at much more regular intervals.

The local authorities were selected to cover the different regions of England and Wales, including both rural and urban areas, as well as areas popular with tourists and those that are not. London regions are over-represented due to the focus on London as a hub of foreign investment, although the remainder of South East England is under-represented. The East Midlands is the only region not present in the dataset. Data was requested from approximately 120 local authorities. Those few that were not included in this analysis either lacked the ability to return the data or did not return data of sufficient quality in a timely manner, necessitating their exclusion from the study before data was received.

Geographical data is from the Office for National Statistics (ONS) (ONS, 2010a; ONS, 2017a; ONS, 2017c; ONS, 2016a; ONS, 2018b). Data on price paid for houses was obtained from the Land Registry for 2003–2007 and 2013–2017 (Land Registry Open Data, 2016). Local area income estimates and population estimates are from the ONS (ONS, 2016b; ONS, 2010b). Company data and data on homes per local authority are from the Valuation Office Agency (VOA, 2018; Valuation Office Agency, 2016). Local authority council tax income data was obtained from the Ministry of Housing, Communities and Local Government (Ministry of Housing, Communities and Local Government, 2017) and StatsWales (StatsWales, 2018).

Types of low-use property

In England and Wales, areas with a shrinking population have typically been affected by financial issues caused by deindustrialisation and associated loss of jobs (Rieniets, 2009). These areas then enter a cycle of district decline (Accordino and Johnson, 2000; Han, 2014), further exacerbating the problem. As such, we expect to see high levels of LUPs in areas that are very affordable due to low demand for housing. However, it should be noted that the number of cities experiencing population declines has become very small since 2000 (Pike et al., 2016). As we saw previously, Sá (2016) proposed that foreign investment disproportionately affects the upper end of the market. The work of Badarinza and Ramadorai (2018) suggests that foreign buyers will cluster together in certain areas. Even if we do not take explicit account of foreign buyers, Guerrieri et al. (2013) found that wealthy prospective property owners tend to want to buy near other wealthy property owners. Taking further inspiration from the literature, we note that second homes bought as luxury items tend to be in scenic or otherwise desirable locations. We therefore hypothesise that we will also see more LUPs at the least affordable end of the market.

Considering both ends of the affordability scale, we expect the number of low-value LUPs to decrease as the affordability ratio increases and the number of high-value LUPs to do the opposite. Together, the low-value LUPs and the high-value LUPs should result in a U-shaped curve, as illustrated by Fig. 1.

Fig. 1 Hypothetical LUP vs. affordability curves. Low-value LUPs decrease as the affordability ratio increases, whilst high-value LUPs increase as the affordability ratio increases.

Within a single local authority, we assume that the affordability ratio is a proxy for property quality and desirability, although this may not be the case at the national level. As such, when the median price of a LUP is higher than the median price of homes in the local authority, we call the area a high-value LUP area. This is interesting as it means that at least 50% of LUPs are more expensive than at least 50% of the homes; we can then say that the LUPs tend to be more desirable than homes.
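The high-value rule reduces to a comparison of two medians. A minimal sketch in Python, assuming two lists of prices for one local authority; the function name and the example prices are illustrative and do not come from the study's data:

```python
from statistics import median


def is_high_value_area(lup_prices, home_prices):
    """Return True when the median LUP price exceeds the median home price."""
    return median(lup_prices) > median(home_prices)


# Made-up prices in pounds for a single hypothetical local authority.
lup_prices = [310_000, 450_000, 520_000]
home_prices = [180_000, 240_000, 300_000, 410_000]

print(is_high_value_area(lup_prices, home_prices))
```

In the paper the two price lists are not observed directly; they come from the resampling procedure described in the next section.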

Finding the mean value of homes and LUPs

To find the difference in price between homes and LUPs, we use a simple graphical model, with the following attributes:

1. C: the price of each property in the local authority
2. W: the LSOA within the local authority
3. T: the type of property, either LUP or home

The dependency structure of these variables is shown in Fig. 2. The belief network tells us that the distribution of property prices is dependent on knowing the LSOA that the property is in. The LSOA distribution is dependent on knowing whether the property is a home or a LUP. The belief network has this structure because the property types have separate distributions and knowing the property type tells us the distribution over the LSOA, which then tells us the price distribution within each LSOA.

Fig. 2 Belief network of the variables. The nodes are price (C), LSOA (W) and property type (T).

We create the joint probability P(T, W, C) using the factorisation shown in Eq. (1) and represented in the belief network. From the data, we know the values of P(T) and P(W | T). We do not know the price of each property in the local authority and so do not know P(C | W). However, we assume that the distribution of the value of property sales in W is approximately the same as the distribution of the property values such that \(P(C|W) \approx P(\widehat {C|W})\). This assumption allows us to calculate the joint distribution using empirical data. We can then derive the mean value of homes and LUPs in each local authority, as shown in Eqs. (1–5). When calculating the mean, \(C_y\) is the price of the yth property.

Because price depends on property type only through the LSOA (Eq. (1)), the mean price of a LUP and of a home within any single LSOA is the same. To find the price difference at the local authority level, we therefore effectively look at geographic concentrations of property types across the local authority. This crucial detail is shown in Eq. (4), where P(C | T = H) = P(C | T = LUP) holds when P(W | T = H) = P(W | T = LUP). In other words, in any local authority, the price distributions of homes and LUPs are equal when the two types have the same distribution across the LSOAs of that local authority.

$$P(T,W,C) = P(C|W)P(W|T)P(T)$$ (1)

$$\mathop {\sum}\limits_W {\kern 1pt} P(T,W,C) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)P(T)$$ (2)

$$P(T,C) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)P(T)$$ (3)

$$P(C|T) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)$$ (4)

$$\left\langle C \right\rangle _{T = i} = \mathop {\sum}\limits_y {\kern 1pt} C_yP(C_y|T = i)$$ (5)
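Combining Eqs. (4) and (5) gives \(\left\langle C \right\rangle _{T = i} = \sum_W P(W|T = i)\left\langle C \right\rangle _W\), so the mean price for each property type is a weighted average of the LSOA mean prices. The sketch below works through this for a hypothetical local authority with two LSOAs; all names and numbers are invented for illustration.

```python
def mean_price_by_type(mean_price_by_lsoa, p_lsoa_given_type):
    """Weighted average of LSOA mean prices, with weights P(W | T)."""
    return sum(
        p_lsoa_given_type[w] * mean_price_by_lsoa[w]
        for w in mean_price_by_lsoa
    )


# Hypothetical mean sale price in each LSOA (pounds).
mean_price_by_lsoa = {"lsoa_a": 200_000.0, "lsoa_b": 400_000.0}

# Hypothetical P(W | T): homes concentrate in lsoa_a, LUPs in lsoa_b.
p_w_given_home = {"lsoa_a": 0.75, "lsoa_b": 0.25}
p_w_given_lup = {"lsoa_a": 0.25, "lsoa_b": 0.75}

print(mean_price_by_type(mean_price_by_lsoa, p_w_given_home))  # 250000.0
print(mean_price_by_type(mean_price_by_lsoa, p_w_given_lup))   # 350000.0
```

If both property types had the same P(W | T), the two means would coincide, which is the point made about Eq. (4).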

While the theory provides the basis for the sampling, in practice the procedure is more involved than the previous equations suggest. As we wish to calculate the local authority median, we sample each LSOA \(n_{Q_w}\) times with replacement, where \(n_{Q_w}\) is the total number of properties in the LSOA. This creates the vector of the empirical sample of property prices \(\widehat {\bf{q}}_w\) defined in Eq. (6), where \(\widehat {\bf{C}}_w\) is the observed set of sales prices in the LSOA. The total empirical vector of prices is then \(\widehat {\bf{q}} = \left\{ {\widehat {\bf{q}}_1,\widehat {\bf{q}}_2 \ldots \widehat {\bf{q}}_m} \right\}\), where m is the total number of LSOAs in the local authority. The LUPs of any given area are a subset of the properties in that area, i.e., LUP ⊆ Q. This means the vector of house prices that make up the empirical sample of LUPs is also a subset of the properties vector (see the definition in Eq. (7)). However, while the properties are sampled with replacement from the sales price data, the LUPs are sampled without replacement from the properties vector \(\widehat {\bf{q}}_w\) of each LSOA. The homes are the complement of the LUPs, as shown in Eq. (8). We can then calculate the mean and median values for properties, LUPs and homes while ensuring that LUPs and homes are subsets of the sample of property prices. We repeat the sampling process 501 times to find the distribution of the sample mean and median. This process is similar to bootstrapping (Efron and Tibshirani, 1993).

$$\widehat {\bf{q}}_w = \left\{ {x \in \widehat {\bf{C}}_w} \right\},\;\# \widehat {\bf{q}}_w = n_{Q_w},\;0 < n_{Q_w}$$ (6)

$$\widehat {{\bf{LUP}}}_w = \left\{ {x \in \widehat {\bf{q}}_w} \right\},\;\# \widehat {{\bf{LUP}}}_w = n_{{\mathrm{LUP}}_w},\;0 \le n_{{\mathrm{LUP}}_w} \le n_{Q_w}$$ (7)

$$\widehat {\bf{h}}_w = \left\{ {x \in \widehat {\bf{q}}_w|x \notin \widehat {{\bf{LUP}}}_w} \right\}$$ (8)
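The resampling in Eqs. (6–8) can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' code: the function names, the dictionary layout and the toy inputs are assumptions. LUPs are chosen by position within the sampled vector so that duplicate prices are handled correctly.

```python
import random
from statistics import median


def resample_local_authority(sales_by_lsoa, n_props_by_lsoa, n_lups_by_lsoa, rng):
    """One resample; returns (lup_prices, home_prices) for the local authority."""
    lups, homes = [], []
    for w, sales in sales_by_lsoa.items():
        # Eq. (6): draw n_Q_w prices with replacement from observed sales.
        q_w = rng.choices(sales, k=n_props_by_lsoa[w])
        # Eq. (7): choose which sampled properties are LUPs, without replacement.
        lup_positions = set(rng.sample(range(len(q_w)), k=n_lups_by_lsoa[w]))
        # Eq. (8): homes are the sampled properties that are not LUPs.
        for i, price in enumerate(q_w):
            (lups if i in lup_positions else homes).append(price)
    return lups, homes


def median_differences(sales_by_lsoa, n_props_by_lsoa, n_lups_by_lsoa,
                       n_resamples=501, seed=0):
    """Median LUP price minus median home price for each resample.

    Assumes the local authority has at least one LUP and one home.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        lups, homes = resample_local_authority(
            sales_by_lsoa, n_props_by_lsoa, n_lups_by_lsoa, rng)
        diffs.append(median(lups) - median(homes))
    return diffs


# Toy inputs: two LSOAs with a handful of observed sale prices each.
sales = {"a": [100_000, 120_000, 150_000], "b": [300_000, 350_000]}
n_props = {"a": 10, "b": 6}
n_lups = {"a": 1, "b": 3}
diffs = median_differences(sales, n_props, n_lups)
print(sum(d > 0 for d in diffs) / len(diffs))  # share of resamples that are high-value
```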

The Vancouver tax

We explore how much tax would be generated in the considered areas by implementing the same 1% value tax as was enacted in Vancouver, Canada (City of Vancouver, 2017). The tax will be compared to the local authorities' total income from domestic council tax.
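As a rough illustration of the comparison, the sketch below applies a 1% charge to the value of each LUP and expresses the total as a share of council tax income. The function names and all figures are hypothetical; in practice the LUP values would come from the resampled price estimates rather than a known list.

```python
def lup_value_tax(lup_values, rate=0.01):
    """Total revenue from taxing each LUP at `rate` of its value."""
    return rate * sum(lup_values)


def tax_share_of_council_tax(lup_values, council_tax_income, rate=0.01):
    """LUP tax revenue as a fraction of domestic council tax income."""
    return lup_value_tax(lup_values, rate) / council_tax_income


# Made-up figures: three LUPs and a council tax income of 2 million pounds.
lup_values = [250_000, 400_000, 350_000]
print(lup_value_tax(lup_values))
print(tax_share_of_council_tax(lup_values, council_tax_income=2_000_000))
```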

Models

The data is too noisy to create regression models, so we focus on producing a binary classification. This classification is used for three dependent variables, as follows:

1. High-LUP percentage: The percentage of LUPs in the area is in the top 50% of areas.
2. High-value: The LUP median price is higher than the median price of homes.
3. High-value-high-LUP: The LUP median price is higher than the median price of homes, and the percentage of LUPs in the area is in the top 50% of areas (abbreviated to HVHL).
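The three labels can be derived from two quantities per area: the LUP percentage and the pair of median prices. The sketch below assumes that "top 50%" means strictly above the median LUP percentage across areas; that reading, the field names and the example records are assumptions made for illustration.

```python
from statistics import median


def classify_areas(areas):
    """Return copies of the area records with the three binary labels added."""
    cutoff = median([a["lup_pct"] for a in areas])
    labelled = []
    for a in areas:
        high_lup = a["lup_pct"] > cutoff  # assumption: strictly above the median
        high_value = a["lup_median_price"] > a["home_median_price"]
        labelled.append({
            **a,
            "high_lup": high_lup,
            "high_value": high_value,
            "hvhl": high_lup and high_value,
        })
    return labelled


# Hypothetical local authorities.
areas = [
    {"name": "A", "lup_pct": 1.0, "lup_median_price": 150_000, "home_median_price": 200_000},
    {"name": "B", "lup_pct": 2.0, "lup_median_price": 320_000, "home_median_price": 300_000},
    {"name": "C", "lup_pct": 3.0, "lup_median_price": 180_000, "home_median_price": 210_000},
    {"name": "D", "lup_pct": 4.0, "lup_median_price": 600_000, "home_median_price": 450_000},
]
for row in classify_areas(areas):
    print(row["name"], row["high_lup"], row["high_value"], row["hvhl"])
```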

In this analysis, we use logistic regression, as shown in Eq. (9). Logistic regression is a linear method that allows easy interpretation of the relationship between the coefficients and the outcome variable.

$$Pr(Y_i = 1|X_i) = \frac{{\exp \left( {\beta _0 + \beta _1X_1 + \beta _2X_2 + \beta _3X_3} \right)}}{{1 + \exp \left( {\beta _0 + \beta _1X_1 + \beta _2X_2 + \beta _3X_3} \right)}}$$ (9)
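Eq. (9) can be evaluated directly once the coefficients are known. The sketch below uses the algebraically equivalent form 1/(1 + exp(−z)); the function name and the coefficient values are hypothetical and are not fitted values from this study.

```python
import math


def logistic_probability(x, beta):
    """Pr(Y = 1 | x) for predictors x and coefficients beta = [b0, b1, ...]."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # exp(z) / (1 + exp(z)) rewritten as 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for three predictors.
beta = [-1.0, 0.2, 0.05, 0.3]
print(logistic_probability([8.0, 1.5, 2.0], beta))
print(logistic_probability([0.0, 0.0, 0.0], [0.0, 0.2, 0.05, 0.3]))  # 0.5
```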

As there are only 112 observations at the local authority level, the number of variables that can be used without overfitting is limited. We use three independent variables to predict the three dependent variables. The independent variables are A (affordability ratio), \(A^2\) (the centred squared affordability ratio) and T (the tourism density). The formulae that will be trialled in the logistic regression are shown in Table 1, where y is the dependent variable. The models are tested using 20 repetitions of a 5-fold cross validation for 100 models in total per formula per dependent variable.
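The count of 100 models follows from 20 repetitions × 5 folds. A minimal sketch of how such train/test splits could be generated with the standard library is given below; the function name, the seed and the interleaved fold assignment are assumptions, and any standard repeated k-fold routine would serve equally well.

```python
import random


def repeated_kfold_indices(n_obs, n_folds=5, n_repeats=20, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_obs))
        rng.shuffle(indices)  # a fresh random partition for each repetition
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


splits = list(repeated_kfold_indices(n_obs=112))
print(len(splits))  # 100
```

Each split would be used to fit one logistic regression on the training indices and score it on the held-out indices.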

Table 1 Formulae to be used in the logistic regression

The affordability ratio is similar to the price-to-earnings ratio. The higher the affordability ratio, the more years it takes the average resident to earn the average property price. The affordability ratio takes into account findings (Hilber et al., 2016; Follain and Jimenez, 1985; Xiao, 2017) that property prices are related to income. As LUP purchasers do not usually live in the same area as their LUPs, properties should become less affordable (affordability ratio increases) with more high-value LUPs. Tourism density is the number of companies in the local authority registered as guest houses or hotels per 1000 homes.
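Both predictors are simple ratios. The sketch below assumes the affordability ratio is an average property price divided by an average annual income; the text does not state which averages are used, so that choice, the function names and the figures are illustrative only.

```python
def affordability_ratio(average_price, average_annual_income):
    """Years of average income needed to match the average property price."""
    return average_price / average_annual_income


def tourism_density(n_guest_houses_and_hotels, n_homes):
    """Registered guest houses and hotels per 1000 homes."""
    return 1000 * n_guest_houses_and_hotels / n_homes


# Made-up figures for one local authority.
print(affordability_ratio(300_000, 30_000))  # 10.0
print(tourism_density(25, 50_000))           # 0.5
```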

Estimating the number of people living in high-value areas

We wish to answer the following question: What fraction of the population lives in a high-value or HVHL area? As we can only infer the median difference between LUPs and properties through resampling, we can at best find an estimated range for the fraction of the population living in such areas. For the local authorities for which we have data, we take the probability of being a high-value area as the fraction of times that the local authority was high-value across the 501 resamples. We assume that the probability of an area being accurately classed as a high-value/low-value area is equal to the accuracy of the model. Thus, if an area is classed as high-value and the model has an accuracy of p, then the probability of being high-value is p, and if the area is classed as not being high-value, it is actually high-value with a probability of 1−p. Thus we will have a vector of the probabilities of high-value local authorities and another vector of the probabilities of HVHL local authorities across the whole of England and Wales. These vectors will be sampled 1000 times each to get the expected range within which the true population fraction exists.
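The estimate can be sketched as a simple Monte Carlo simulation: each local authority is independently drawn as high-value with its assigned probability, and the population living in the drawn areas is summed. The function names, the seed, the independence of the draws and the toy inputs are assumptions made for illustration.

```python
import random


def probability_high_value(predicted_high_value, accuracy):
    """Probability for an area without data, from the model prediction."""
    return accuracy if predicted_high_value else 1.0 - accuracy


def sample_population_fractions(probabilities, populations, n_samples=1000, seed=0):
    """Sorted list of simulated population fractions living in high-value areas."""
    rng = random.Random(seed)
    total = sum(populations)
    fractions = []
    for _ in range(n_samples):
        in_high_value = sum(
            pop for p, pop in zip(probabilities, populations)
            if rng.random() < p  # the area counts as high-value in this draw
        )
        fractions.append(in_high_value / total)
    return sorted(fractions)


# Toy example: one area with resampled data (0.9) and two model-classified areas.
probabilities = [
    0.9,
    probability_high_value(True, accuracy=0.7),
    probability_high_value(False, accuracy=0.7),
]
populations = [120_000, 80_000, 200_000]
fractions = sample_population_fractions(probabilities, populations)
print(fractions[0], fractions[-1])  # smallest and largest simulated fraction
```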