
While at Insight, I had the opportunity to consult on a data science project for AptDeco.com. AptDeco is an NYC-based peer-to-peer online marketplace for buying and selling used furniture. (They’ve recently expanded to DC too!) The website simplifies the resale process by handling all of the logistics for its users. The AptDeco team fills in any missing details about the furniture, creates high quality listings on their website, and even delivers the furniture when it’s purchased.

The first step a user takes when creating a listing on AptDeco is to submit a picture of their furniture. The editors at AptDeco will manually review the pictures, and decide if the images are of high enough quality to be edited and displayed on the front page of the listing. Unfortunately, around 75% of the submitted images are low quality, and can’t be displayed on the front of a listing. This means that AptDeco’s team spends a large amount of time sifting through low quality pictures and requesting improvements from their users.

With this in mind, I created DecoRater — an algorithm for automatically assessing the quality of furniture images on AptDeco.com. DecoRater provides immediate feedback to AptDeco users, increasing the odds that they will submit high quality images for their listings. At the same time, DecoRater’s image assessments can be used behind the scenes to automatically flag listings which are in high need of editor intervention.

In this post, I’ll explain how I created DecoRater from the ground up: choosing an appropriate model, extracting image features, assessing model performance, and implementing the model.

I’m no longer hosting DecoRater on the public AWS server, but I’ve included a video demonstrating the application below:

In this demonstration, I upload two sets of furniture images to DecoRater. The quality of each image is evaluated on a five-star scale, and DecoRater identifies whether each image is suitable for editing and display on the front page of an AptDeco listing. If none of the images in a set are high quality (as is the case with the second set), DecoRater prompts the user to upload additional images.

Choosing an appropriate model

Before I built a model to assess the quality of each image, I needed to define what the “quality” of an image meant. AptDeco’s database has a number of metrics which could be used to define “quality.” Most of these metrics, such as a listing’s click-through rate, are influenced by outside factors, such as the price of a listing or the age of the furniture. Even if I included these non-image features in the model, the images that are displayed in the store are edited before being placed on the front page of a listing. This means that any model which uses a listing’s click-through rate as the definition of quality could only evaluate the already-edited images, rather than the raw images that users upload directly.

A more useful measure of image quality can be defined by AptDeco’s editors’ choice to edit and display an image, or to reject it. I did not have access to historical accept/reject decisions, but AptDeco was willing to help me hand-label a subset of 2000 images for this purpose. In this light, the goal of assessing image quality becomes a binary classification problem in which we try to predict the probability that an unedited picture will be chosen for editing and display on the front page of a listing.

To accomplish this goal, I trained a machine learning model to emulate a human’s subjective opinion of each image’s quality. After considering multiple algorithms, I settled on an AdaBoosted Random Forest Classifier due to its higher test performance relative to the other models. The random forest model is well suited for the nonlinearity of the classification problem, and the AdaBoost algorithm helps to properly classify the many borderline-acceptable images in our database. Most importantly, the model can output the probability that an image belongs to the “high quality” class, rather than simply classifying the images on a binary scale. This is a nice feature to have, since we can use the continuous probability to automatically identify the lowest quality images for the AptDeco editors.

Extracting Image Features

Much of the work on this project went into engineering features to describe each furniture image. I created over 60 features to use as input for the model. In this section I’ll discuss some of the more interesting and important ones.

Image Symmetry

The editors at AptDeco told me that they strongly prefer front-facing pictures of furniture, so one of the first features that I developed was a measurement of each image’s symmetry. I measured the symmetry of an image by decomposing it into symmetric (Is) and antisymmetric (Ia) components. The symmetric component of an image is produced by mirroring the image across the axis of symmetry and adding the mirrored copy to the original image. For instance, to measure the horizontal symmetry of an image, I mirrored the image across the vertical axis and added it to the original image to produce the symmetric component. Similarly, the antisymmetric component of an image is produced by mirroring the image across the axis of symmetry and subtracting the mirrored copy from the original image. The total symmetry of an image is the ratio of the total intensity of the symmetric component to the total intensity of both components.

From left to right: The original image, the value channel of the image, and the corresponding antisymmetric and symmetric components of the value channel.
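The symmetry score described above can be sketched in a few lines of NumPy. This is a minimal sketch: the function name and the exact normalization are my own, and the real feature would be computed per channel and per axis.

```python
import numpy as np

def horizontal_symmetry(channel):
    """Symmetry score in [0, 1] for a single image channel.

    Decomposes the channel into a symmetric component (image plus its
    mirror) and an antisymmetric component (image minus its mirror),
    then returns symmetric intensity / total intensity.
    """
    mirrored = channel[:, ::-1]               # mirror across the vertical axis
    symmetric = (channel + mirrored) / 2.0
    antisymmetric = (channel - mirrored) / 2.0
    total = np.abs(symmetric).sum() + np.abs(antisymmetric).sum()
    return np.abs(symmetric).sum() / total

# A perfectly left-right symmetric image scores exactly 1.0.
perfect = np.array([[1.0, 2.0, 1.0],
                    [3.0, 4.0, 3.0]])
print(horizontal_symmetry(perfect))  # → 1.0
```

A perfectly symmetric image has a vanishing antisymmetric component, so the score is 1.0; a purely antisymmetric pattern scores 0.0.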

Image Sharpness

The editors at AptDeco also mentioned that high quality images need to be sharp, with well-defined edges. I took two separate approaches to measuring how sharp an image is. The first approach applies a Laplacian filter to the image, which approximates the second spatial derivative of a single image channel. If the variance of the Laplacian response is high, then the image has a wide range of edge-like and non-edge-like responses. On the other hand, if the variance of the Laplacian is low, then there are very few edges present in the picture, indicating a blurry image.

Top: A blurry image and the corresponding Laplacian filter.

Bottom: A sharp image and the corresponding Laplacian filter. Note the higher intensity of edges in the sharper image.
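The Laplacian-variance check can be sketched as follows, here using SciPy’s `laplace` filter on a single channel. The helper name and the toy images are illustrative, not the project’s actual code.

```python
import numpy as np
from scipy.ndimage import laplace

def laplacian_variance(gray):
    """Variance of the Laplacian response; low values indicate blur."""
    return laplace(gray.astype(float)).var()

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))        # strong pixel-to-pixel variation
blurry = np.full((64, 64), 0.5)     # a featureless (fully blurred) image

# The featureless image has no edge response at all.
print(laplacian_variance(sharp) > laplacian_variance(blurry))  # → True
```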

My second approach to measure the sharpness of an image was to calculate the Fast Fourier Transform of the image. The Fourier Transform decomposes the image into high frequency and low frequency variations. High frequency variations indicate rapid changes in the image, which correspond to edge-like features. Therefore, the average frequency found in the Fourier Transform provides a measure of how sharp an image is, with higher frequencies corresponding to sharper images. Note that the two sharpness features are highly correlated, which is another reason that I favored using a random forest classifier over other models.
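The Fourier-based sharpness feature can be sketched as a magnitude-weighted mean spatial frequency. The radial-distance weighting below is one reasonable way to define “average frequency”; the exact definition used in the project may differ.

```python
import numpy as np

def mean_frequency(gray):
    """Magnitude-weighted mean spatial frequency of a grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    magnitude = np.abs(spectrum)
    h, w = gray.shape
    yy, xx = np.mgrid[:h, :w]
    # Radial distance of each frequency bin from the zero-frequency centre.
    radius = np.hypot(yy - h // 2, xx - w // 2)
    return (radius * magnitude).sum() / magnitude.sum()

rng = np.random.default_rng(1)
noisy = rng.random((32, 32))     # rapid variation: lots of high frequencies
flat = np.full((32, 32), 0.5)    # featureless: essentially all energy at DC
print(mean_frequency(noisy) > mean_frequency(flat))  # → True
```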

Focal Points of an Image

The final image feature which I’ll discuss involves the focal points of each image. I measured the focal points of an image by constructing a saliency map. The concept of saliency originates from neuroscience, and measures the extent to which a pixel stands out relative to its surroundings. The method I used to create the saliency maps is described by Itti, Koch, and Niebur in their 1998 paper “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis.” This method takes a bottom-up approach, in which the image is decomposed into 12 color maps, 6 intensity maps, and 24 orientation maps. The extent to which an object stands out within each map is calculated, and normalized saliency maps representing each of the three categories (color, intensity, and orientation) are produced. The final saliency map is created by averaging over the three normalized maps. Once I identified areas of high saliency in each image, I calculated the average hue, saturation, and value at the focal points to produce three additional image features.

An image and the corresponding saliency map. White areas correspond to higher saliency, and identify focal points of the image.
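Given a saliency map, the final step of turning focal points into HSV features can be sketched like this. The top-fraction cutoff and the helper name are my own assumptions; computing the saliency map itself, per Itti et al., is considerably more involved.

```python
import colorsys
import numpy as np

def focal_point_hsv(rgb, saliency, top_frac=0.1):
    """Mean hue, saturation, and value over the most salient pixels.

    rgb: (H, W, 3) array of floats in [0, 1]; saliency: (H, W) map,
    where larger values mark stronger focal points.
    """
    k = max(1, int(top_frac * saliency.size))
    idx = np.argsort(saliency.ravel())[-k:]   # the k most salient pixels
    ys, xs = np.unravel_index(idx, saliency.shape)
    hsv = np.array([colorsys.rgb_to_hsv(*rgb[y, x]) for y, x in zip(ys, xs)])
    return hsv.mean(axis=0)

# Toy example: a pure-red patch marked as salient on a gray background.
img = np.full((8, 8, 3), 0.5)
img[2:6, 2:6] = [1.0, 0.0, 0.0]
sal = np.zeros((8, 8))
sal[2:6, 2:6] = 1.0
h, s, v = focal_point_hsv(img, sal)   # red hue, fully saturated, bright
```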

Assessing Model Performance

Once I extracted the features from each image, I set aside 40% of the labeled images as a test set, and used the remaining 60% of the labeled images to train and tune the random forest model.

A common way to assess the performance of a binary classification algorithm is the receiver operating characteristic curve (more commonly called the ROC curve). The ROC curve illustrates the classifier’s performance as the threshold for calling a picture “high quality” is altered. For example, suppose we classified everything with a “high quality” probability higher than 70% as a “high quality” image. In this case, the classifier requires the vast majority of trees to agree that the image is “high quality” before classifying it as such. With such a strict threshold, we’re likely to misclassify many of the high quality images as low quality. In exchange, we are less likely to misclassify a low quality image as high quality. In this situation, we say that our classifier has a low true positive rate (the fraction of high quality images that we correctly identified as high quality) and a low false positive rate (the fraction of low quality images that we incorrectly identified as high quality). As we lower the threshold of our classifier, we increase both the true positive and false positive rates, tracing out the ROC curve. The ROC curve for DecoRater’s image classifier is shown below:

The dashed diagonal line in the figure above indicates how well the classifier would perform with completely random guesses. An ideal classifier would have an extremely high true positive rate while maintaining a low false positive rate. Thus, better classifiers hug the top left corner of the ROC plot. As the ROC curve approaches the top left corner, the area under the curve (AUC) approaches 1.0. In this case, the classifier has an AUC of 0.74, roughly halfway between random guessing (0.5) and perfect classification (1.0).
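The threshold sweep described above is exactly what scikit-learn’s `roc_curve` computes; a quick sketch on toy labels and scores (not the project’s data):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels (1 = high quality) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.65, 0.9, 0.3])

# Each returned point is one (false positive rate, true positive rate)
# pair as the classification threshold is lowered.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)  # area under the ROC curve
```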

Another tool that I used to evaluate the classifier is called a confusion matrix. The confusion matrix is a heat map which plots the true class of an image against the predicted class of an image, as shown below.

From the top row of the confusion matrix, we can see that 68% of the low quality images were correctly identified, and 32% of the low quality images were incorrectly identified. From the bottom row of the confusion matrix, we can see that 73% of the high quality images were correctly identified, and 27% of the high quality images were incorrectly identified. This information is also encoded by the color of the squares. (Note: The numbers presented here use a probability threshold of 50%)

The confusion matrix is particularly useful for evaluating classification models when the balance of classes is uneven. In this case, only about 25% of AptDeco’s images truly belong to the high quality class. Therefore, if the classifier always predicted that images were low quality, it would be correct 75% of the time. This would lead to a confusion matrix with dark squares in the first column, and light squares in the second column. In contrast, a high quality classifier will correctly identify both the majority and minority class, leading to dark squares in the top left and bottom right corners, as is the case with our classifier.
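A row-normalized confusion matrix like the one described can be computed with scikit-learn. The labels below are toy data, so the percentages won’t match the figure’s 68%/73%.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: 0 = low quality, 1 = high quality.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])

# normalize='true' divides each row by that class's total, giving the
# per-class fractions discussed in the text.
cm = confusion_matrix(y_true, y_pred, normalize='true')
```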

Implementing the Model

There are a few ways in which AptDeco can implement DecoRater. First, we can use the model to identify the most common mistakes users make when uploading images, and to provide general guidelines to AptDeco users. I extracted this information by identifying the most important features in the model (those which provided the most effective splits in the trees). A list of the top 15 features is shown below. By far, the most important features have to do with the horizontal symmetry of brightness and saturation in the image. The location of the furniture with respect to focal points in the image, the presence of color gradients and shadows, and the perceived luminance of the image are also distinguishing factors.
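Extracting a ranked feature list from a fitted forest can be sketched as follows. The data and feature names here are synthetic placeholders (only one feature carries signal); the real ranking came from the AdaBoosted forest described earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: only feature_3 actually carries signal here.
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(60)]  # placeholder names
X = rng.random((500, 60))
y = (X[:, 3] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by mean decrease in impurity across the trees.
order = np.argsort(forest.feature_importances_)[::-1]
top15 = [feature_names[i] for i in order[:15]]
```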

AptDeco could also integrate the classifier with their current image upload system. The classifier would extract the relevant image features and calculate the probability that an image is high quality. If all of a user’s images are classified as low quality, the website can immediately ask the user to upload a higher quality picture. Note that the classifier incorrectly identifies 27% of high quality images when a probability threshold of 50% is used as the cutoff. To avoid a frustrating user experience where we misclassify high quality images too often, it may be beneficial to lower the threshold for calling an image high quality in this case.

Finally, the classifier could be implemented in the backend of AptDeco’s website. In this situation, the classifier would predict the probability that each image is high quality, and store that information in a database. The editors at AptDeco could use this information to automatically identify which pictures are suitable for editing, and which listings are in urgent need of intervention. If the classifier is integrated into the backend of AptDeco, we don’t need to worry about a poor user experience, and can leave the probability threshold at 50%. Based on the performance metrics shown in the previous section, using the algorithm in this way would reduce the time that editors spend sifting through low quality images by 40%.

Conclusion

I learned a lot about developing a data science project from start to finish during this project. The opportunity to consult with a local business was invaluable, and gave me a chance to apply my technical skills to a real-world business problem. Ultimately, the project left me with a more business-oriented mindset, and a great network of connections in the data science community. The team at AptDeco was a pleasure to work with, and I’m excited to see the creative ways in which they deploy DecoRater. Additional details about the project can be found on my personal blog.