Airbnb has 2 million listings and operates in 65,000 cities. Here we look for insights into the vacation rental space of the sharing economy, using property listings data for Texas, US.

By Preetish Panda, PromptCloud.

Airbnb’s growth has been phenomenal over the years, and it is now one of the hottest unicorn startups. In fact, it has become profitable and will hopefully go for an IPO in the near future. According to popular media outlets, Airbnb currently has more than 2 million listings in 192 countries and operates in 65,000 cities. Given the amount of data that Airbnb hosts, it is interesting to perform analyses and uncover insights related to the vacation rental space in the sharing economy.

In this study we’ll be using the property listings data extracted for Texas, United States. Here are the data fields of the dataset:

Rate per night

Number of bedrooms

City

Joining month and year

Longitude

Latitude

Property description

Property title

Property URL

You can download this dataset from our Kaggle page.
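Before plotting, it can help to confirm that the coordinate columns look sane. The sketch below is an illustration rather than part of the original analysis: the helper name `check_listing_coords` and the bounding box values (a rough rectangle around Texas) are assumptions, and the toy data frame is made up purely to show the call. The only column names taken from the dataset are `longitude` and `latitude`.

```r
# Rough bounding box around Texas (approximate values, assumed for illustration)
tx_bounds <- list(lon = c(-107, -93), lat = c(25, 37))

# Hypothetical helper: split off rows whose coordinates are missing
# or fall outside the bounding box
check_listing_coords <- function(df, bounds = tx_bounds) {
  stopifnot(all(c("longitude", "latitude") %in% names(df)))
  in_box <- !is.na(df$longitude) & !is.na(df$latitude) &
    df$longitude >= bounds$lon[1] & df$longitude <= bounds$lon[2] &
    df$latitude >= bounds$lat[1] & df$latitude <= bounds$lat[2]
  list(valid = df[in_box, , drop = FALSE], n_dropped = sum(!in_box))
}

# Toy rows (made up): one plausible Texas point, one missing value,
# one point far outside the box
toy <- data.frame(longitude = c(-97.7, NA, 2.35),
                  latitude  = c(30.3, 29.8, 48.85))
checked <- check_listing_coords(toy)
checked$n_dropped
```

On the real data, the same call would take the data frame read from the CSV file, and `checked$valid` could then be passed to the plotting code in place of the raw data.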

There are two goals of this study:

Plot the pricing data on the Texas map in terms of density and expensiveness

Perform topic modelling on the description text of the property listings

Spatial Analysis

Let’s first visualize the property density across the cities of Texas and find out which one has the maximum number of listings. Given below is the `R` code to create the contour heat map:

    # Load the library
    library(ggmap)

    # Create Texas map
    tx_map = get_map(location = "texas", zoom = 6)

    # Load the dataset from the CSV file
    tx_bnb_data = read.csv(file.choose(), stringsAsFactors = FALSE)

    # Create the heat map
    ggmap(tx_map, extent = "device") +
      geom_density2d(data = tx_bnb_data, aes(x = longitude, y = latitude), size = 0.3) +
      stat_density2d(data = tx_bnb_data,
                     aes(x = longitude, y = latitude, fill = ..level.., alpha = ..level..),
                     size = 0.01, bins = 16, geom = "polygon") +
      scale_fill_gradient(low = "black", high = "red", name = "Density") +
      scale_alpha(range = c(0, 0.3), guide = FALSE) +
      ggtitle("Density Distribution of Airbnb Properties in Texas")

You should be able to generate the following heat map:

We can see that the Dallas and Fort Worth area sits at the top in terms of property density, closely followed by Houston and Austin. But what about the expensiveness of the properties? Does it follow the same pattern as this visualization? Let’s find out.
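The contour layers in the heat map come from a two-dimensional kernel density estimate (ggplot2's `stat_density2d` is built on `MASS::kde2d`). As a rough numeric cross-check of the visual reading, the following sketch finds the grid point with the highest estimated density. The helper name `density_peak` and the grid size are assumptions, and the coordinates in the demonstration are synthetic, not taken from the dataset.

```r
library(MASS)  # provides kde2d(), a 2D kernel density estimator

# Hypothetical helper: return the grid location where the estimated density is highest
density_peak <- function(lon, lat, n = 100) {
  # dens$z[i, j] holds the density estimate at (dens$x[i], dens$y[j])
  dens <- kde2d(lon, lat, n = n)
  idx <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
  c(longitude = dens$x[idx[1]], latitude = dens$y[idx[2]])
}

# Synthetic demo (made-up points): a large cluster and a smaller one
set.seed(42)
lon <- c(rnorm(200, -96.8, 0.1), rnorm(20, -95.4, 0.1))
lat <- c(rnorm(200, 32.8, 0.1), rnorm(20, 29.8, 0.1))
peak <- density_peak(lon, lat)
peak
```

With the real data, the call would presumably be `density_peak(tx_bnb_data$longitude, tx_bnb_data$latitude)`. Note that `kde2d` does not accept missing values, so rows with `NA` coordinates would need to be removed first.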

This dataset contains the longitude and latitude of each listing along with its price. We’ll use these data points to plot pricing on the Texas map. Each location cluster (city) gets a circle, and a larger circle signifies a higher price. Here is the `R` code: