All the code for this project can be found on my GitHub.

Objective: train a model that predicts the price of an Airbnb listing using only the first image of the apartment. I will try two approaches. The first is to build a custom CNN (convolutional neural network), train it on 6,000 images, and test it on 1,000 images. The second is similar; the main difference is that instead of a custom CNN architecture I will use the InceptionV3 model, pretrained on ImageNet.

Visualize the Data

Overall data

Here are all the Airbnb listings in NY. Color shows the price, and point size shows the number of reviews.
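A map like this can be drawn as a scatter plot of listing coordinates. The snippet below is only a rough sketch with matplotlib, not the code from my repository: the function name plot_listings and its four arguments (longitude, latitude, price, number_of_reviews) are assumptions made for illustration.

```python
import matplotlib.pyplot as plt


def plot_listings(longitude, latitude, price, number_of_reviews):
    """Scatter listings on a map: color encodes price, size encodes review count."""
    fig, ax = plt.subplots(figsize=(8, 8))
    points = ax.scatter(
        longitude,
        latitude,
        c=price,               # color of each point = nightly price
        s=number_of_reviews,   # area of each point = number of reviews
        cmap="viridis",
        alpha=0.5,
    )
    fig.colorbar(points, ax=ax, label="Price per night (USD)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig, ax
```

Listings with zero reviews get a marker size of zero and disappear, so adding a small constant to the sizes may be worth trying.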





Image Data

Custom CNN

Here is an image from the dataset.

Before looking any further, try to guess how much this listing costs per night.

Human accuracy: I asked 10 people to guess the price of three different listings based only on the picture. Their mean absolute error was around $30 (this is a pretty small sample, so I am not sure how much weight to put on it). The price of the listing pictured above was $89. In the next part I will discuss the custom CNN architecture I used to train the model.

Structure:

Input (size 224 by 224)

CNN (depth: 32, kernel: 3, stride: 2)

Max_Pooling (size: 3 by 3, stride: 2)

CNN (64,3,1)

Average_pooling(3,3,1)

CNN(128,3,2)

Dropout of .5

CNN(64,3,1)

Fully connected layers

output (predicted price)
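The layer stack above could be written in Keras roughly as follows. This is a minimal sketch, not the exact code from my repository. The list does not specify the number of input channels, the activations, the padding, or the sizes of the fully connected layers, so the choices below (3 channels, ReLU, default "valid" padding, one hidden dense layer of 128 units) are assumptions, as is the function name build_custom_cnn.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_custom_cnn(input_shape=(224, 224, 3)):
    """Sketch of the custom CNN; each layer follows the structure list in order."""
    return keras.Sequential([
        keras.Input(shape=input_shape),                        # 224 by 224 input (3 channels assumed)
        layers.Conv2D(32, 3, strides=2, activation="relu"),    # CNN (depth 32, kernel 3, stride 2)
        layers.MaxPooling2D(pool_size=3, strides=2),           # Max pooling (3 by 3, stride 2)
        layers.Conv2D(64, 3, strides=1, activation="relu"),    # CNN (64, 3, 1)
        layers.AveragePooling2D(pool_size=3, strides=1),       # Average pooling (3 by 3, stride 1)
        layers.Conv2D(128, 3, strides=2, activation="relu"),   # CNN (128, 3, 2)
        layers.Dropout(0.5),                                   # Dropout of 0.5
        layers.Conv2D(64, 3, strides=1, activation="relu"),    # CNN (64, 3, 1)
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                  # fully connected layer (size assumed)
        layers.Dense(1),                                       # output: predicted price
    ])
```

The last layer has a single unit and no activation because the model predicts a price (regression) rather than a class.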

Training: This model was trained on 3,200 images, with a validation set of 800 images, for 5 epochs with a batch size of 50. I used Adam as the optimizer, MSE as the loss function, and MAE and MSE as the metrics for tracking how the model does on the training set and validation set.
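In Keras, this training setup might look roughly like the sketch below. The helper name compile_and_fit is hypothetical, and the learning rate is left at the Keras default for Adam because it is not stated here; everything else (MSE loss, MAE and MSE metrics, 5 epochs, batch size 50) comes from the description above.

```python
from tensorflow import keras


def compile_and_fit(model, x_train, y_train, x_val, y_val, epochs=5, batch_size=50):
    """Compile a regression model with Adam + MSE and train it with a validation set."""
    model.compile(
        optimizer=keras.optimizers.Adam(),  # default learning rate (assumption)
        loss="mse",                         # mean squared error as the loss
        metrics=["mae", "mse"],             # reported for both training and validation
    )
    history = model.fit(
        x_train,
        y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        verbose=0,
    )
    return history
```

The returned history object holds the per-epoch loss and metric values, which is where numbers like the ones reported in the results come from.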

Results:

Training set: MSE = 7229, MAE = 63

Validation set: MSE = 8364, MAE = 62

Test set: MSE = 5929, RMSE = 77
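RMSE is just the square root of MSE, so the training and validation numbers can be put on the same scale as the test RMSE (and as the RMSE values reported for InceptionV3). A small sketch of that arithmetic, using only the MSE values listed above:

```python
import math

# MSE values reported above for the custom CNN
reported_mse = {"training": 7229, "validation": 8364, "test": 5929}


def rmse_from_mse(mse):
    """RMSE is the square root of MSE, which puts the error back in dollars."""
    return math.sqrt(mse)


rmse = {split: rmse_from_mse(mse) for split, mse in reported_mse.items()}

for split, value in rmse.items():
    print(f"{split}: RMSE = {value:.1f}")
```

This gives an RMSE of roughly 85 for the training set and 91 for the validation set, next to 77 for the test set.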

InceptionV3 Model

The InceptionV3 model was developed by researchers at Google and is known as one of the best models for image classification and other image-based ML problems. I used Keras to implement this model.

Training: The model's initial weights were the ones obtained from training on the ImageNet dataset. To train the model on new data, training was split into two parts: first, training the layers I added to the end of the model; second, training some of the convolutional layers in the InceptionV3 model.

Training part 1: I added an average pooling layer, a dense layer with 1024 neurons, and an output layer that outputs the predicted price. Because this model takes a while to train, the dataset was pretty small: the training size was 200 and the test size was 60. After training the first part of the model, the results were as follows. Training set: RMSE = 65, Validation set: RMSE = 124, Test set: RMSE = 122.

Training part 2: I unfroze some of the layers in the InceptionV3 model and trained it on 200 images. Again, if this model were trained on more images for a longer time, it would probably do better. After training the second part, the results were as follows. Training set: RMSE = 14, Validation set: RMSE = 71, Test set: RMSE = 96. As you can see, the gap between the training set and test set errors is very large, which indicates overfitting; training on more data could mitigate it.
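The two training parts can be sketched in Keras along these lines. This is a hedged illustration, not the exact code from my repository. The function names are hypothetical, global average pooling is assumed for the "average pooling layer", and the 299 by 299 input size is the Keras default for InceptionV3. The cut-off of 249 layers and the low-learning-rate SGD optimizer in part 2 are borrowed from the fine-tuning example in the Keras applications documentation, since the text above only says that "some" layers were unfrozen.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import InceptionV3


def build_price_model(weights="imagenet", input_shape=(299, 299, 3)):
    """InceptionV3 base plus the added head: pooling, Dense(1024), price output."""
    base = InceptionV3(weights=weights, include_top=False, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)   # average pooling layer (global assumed)
    x = layers.Dense(1024, activation="relu")(x)        # dense layer with 1024 neurons
    price = layers.Dense(1)(x)                          # output: predicted price
    return base, keras.Model(inputs=base.input, outputs=price)


def prepare_part_one(base, model):
    """Part 1: freeze every InceptionV3 layer so only the added head is trained."""
    for layer in base.layers:
        layer.trainable = False
    model.compile(optimizer="adam", loss="mse",
                  metrics=[keras.metrics.RootMeanSquaredError()])


def prepare_part_two(base, model, first_trainable=249):
    """Part 2: unfreeze the later InceptionV3 layers and recompile (cut-off is an assumption)."""
    for layer in base.layers[:first_trainable]:
        layer.trainable = False
    for layer in base.layers[first_trainable:]:
        layer.trainable = True
    # A small learning rate keeps fine-tuning from wrecking the pretrained weights.
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  loss="mse",
                  metrics=[keras.metrics.RootMeanSquaredError()])
```

In use, you would call prepare_part_one, fit the model, then call prepare_part_two and fit again. The model has to be recompiled after changing which layers are trainable for the change to take effect.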

Conclusion

While this model may not work so well on its own, it could be used as part of a larger model that looks not only at the picture of a listing but also at the location, the number of bedrooms, etc. Another use case for this model could be to determine which picture you should use as the first picture of a listing to maximize the price you could charge.

Future model