Data

To classify shoes as classy vs. sporty, we first needed access to a few thousand shoe images. Our friends at UT had conveniently put together a repository of 50,000 shoes. The repository contained slippers, heels, and other types of shoes, but we decided to restrict our data to oxfords/loafers and sneakers, so we wrote a bash script to extract 6,000 sneakers and 5,000 oxfords/loafers.

Using the bash script, we created two large repositories of shoes: one contained pictures of classy shoes, and the other contained sporty shoes.

Classy Shoes for Training

Sport Shoes for Training

Implementation

Step One: CNN (Convolutional Neural Network)

The first step was classifying the shoes as classy or sporty: given only a picture of the shoe, our goal was to return an output of “classy” or “sporty”. With a data set this large, a neural network seemed to be the best way forward.

Attempt 1

The first CNN implementation yielded fairly accurate results, but there were some glaring problems. For example, a picture of Nikes yielded the following results:

One angle classified these Nikes as Sporty but the other classified them as Formal (classy)

Another misclassification is shown here:

This dress shoe is classified as sporty

From this, we noticed two things:

1. Some images in the training set are angled with the toe box touching the bottom left of the window (like in the first image above), while others are flipped (like the second image above).
2. There are significantly more sporty shoes in the training set than classy shoes.

Attempt 2

In our second attempt, we decided to oversample the classy shoes data set. This addresses both problems at once: it provides more images for the classy data set and exposes the network to both orientations. The code below flips each image in the training set.

import numpy as np
from keras.preprocessing.image import list_pictures, load_img, img_to_array

target_size = (128, 128)  # assumed input size; set to whatever the CNN expects

X = []
Y = []

# Sporty shoes (label 0): append each image and its horizontal mirror
for picture in list_pictures('./shoesimages/Sports/'):
    img = img_to_array(load_img(picture, target_size=target_size))
    X.append(img)
    Y.append(0)

    img_flip = np.flip(img, axis=1)
    X.append(img_flip)
    Y.append(0)

# Classy shoes (label 1): oversampled the same way
for picture in list_pictures('./shoesimages/Formals/'):
    img = img_to_array(load_img(picture, target_size=target_size))
    X.append(img)
    Y.append(1)

    img_flip = np.flip(img, axis=1)
    X.append(img_flip)
    Y.append(1)

Training the CNN on this augmented data improved the accuracy of the model, but images from the training set like the ones below were still misclassified:

Upon further inspection, it is easy to see why the neural network lacks accuracy with these particular images. The shoes above could fall into either category, so they are rather ambiguous. The very first image, for example, is supposed to be a sporty shoe, but the model classifies it as a classy shoe. Given that the shoe looks like a cross between a sporty shoe and a classy one, this misclassification is understandable.

Attempt 3

In our final attempt, we cleaned the data of these ambiguous pictures. This was a rather tricky process: we listed all shoes from the training set that were misclassified, saved those images into separate folders, removed them from the original training set, and trained the CNN a final time.

Some of the ambiguous images removed from the classy training set

Some of the ambiguous images removed from the sporty training set

The picture of the Nikes, reproduced below, yielded very accurate results after oversampling and cleaning the data.

Both angles are classified as Sporty

The dress shoe is now classified properly

Below is the confusion matrix. Though not perfect, this final CNN is much more accurate in predicting new input images, and we chose it as our classifier.
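For reference, a confusion matrix like this one takes only a few lines to tally (a generic sketch, not our exact evaluation code; 0 is sporty and 1 is classy, as elsewhere in the pipeline):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """cm[i, j] counts examples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

The diagonal holds the correctly classified shoes; off-diagonal cells are the misclassifications discussed above.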

Background Removal

One final area of concern was the background of the input image. All the training images had a pure white background, but in actual use the image is provided by the user and likely taken on a phone camera. It was therefore important to account for varying backgrounds and remove that variable to ensure accuracy. We used a method of edge detection called Canny Edge Detection to isolate the shoe from the background image. Credits to this StackOverflow post for the idea and implementation. Then, we colored the background a standard yellow to isolate it from the shoe. Yellow was chosen because yellow shoes are rare, and coloring the background a color the shoe is unlikely to contain ensures easy detection and classification. After this process, all input pictures, regardless of background color, were given a standard yellow background. Input images were also resized to comply with the training set.

Step Two: It’s All About the Color

Getting the color from the picture of the shoe was another significant hurdle, and we went through many iterations and attempts before getting accurate results.

Before we began, we restricted the colors of our shoes to a manageable list. For classy shoes, the colors were: dark brown, light brown, and black. For sporty shoes, they were: black, red, blue, green, grey, pink, light brown, dark brown, and white.

Attempt 1: K Means Clustering

Initial research showed that color recognition could be achieved with K Means clustering and OpenCV. Essentially, we pick a number of “clusters” (in our case, colors), and the K Means algorithm separates our data (the image's pixels) into that many clusters. We implemented this by following a tutorial from Google. The results gave us a histogram with the three most prominent colors in the image. Though it worked to our specifications, there were some issues: there was a lot of overlap between colors, so it was difficult to distinguish pairs like dark brown vs. black and red vs. brown.
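For intuition, here is a minimal NumPy-only sketch of the clustering idea (our actual implementation followed the tutorial's OpenCV code; the iteration count and seeding here are illustrative):

```python
import numpy as np

def dominant_colors(img, k=3, iters=10, seed=0):
    """K Means over pixels: return the k most prominent colors and their shares."""
    pixels = img.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center
        dists = np.linalg.norm(pixels[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned pixels
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    order = counts.argsort()[::-1]
    return centers[order].astype(int), counts[order] / counts.sum()
```

The returned shares are what the histogram in the tutorial visualizes: the fraction of the image covered by each dominant color.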

Attempt 2: RGB Looping

The next technique was RGB looping. Essentially, we iterated through every pixel of the input and used domain knowledge to determine its color. For example, a pixel was black if it fell within a range around the black RGB value (0, 0, 0). Based on the classification of the shoe (classy vs. sporty), we kept track of how many pixels fell into each color group, and the color with the most pixels was returned as the most prominent color of the shoe. In theory, this sounded like a viable option, but once implemented, there were certain issues. The ranges varied greatly between colors: black occupied a much smaller slice of the RGB space than a color like green, which made black very difficult to capture. For example, black was in the range 0 to 127, but the next color on our list had a range from 127 to 127627. From this, it is easy to see that black would likely never be chosen even if the image were of a black shoe.
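In sketch form, the approach looked something like this (the ranges below are illustrative placeholders, not the ones we used, but they show how lopsided the color regions are):

```python
from collections import Counter

# Illustrative ranges: ((r_lo, r_hi), (g_lo, g_hi), (b_lo, b_hi)) per color.
# Note how small the black region is compared to the others -- this imbalance
# is exactly what made the RGB approach unreliable.
RGB_RANGES = {
    'black': ((0, 50), (0, 50), (0, 50)),
    'red':   ((150, 255), (0, 100), (0, 100)),
    'green': ((0, 100), (150, 255), (0, 100)),
    'blue':  ((0, 100), (0, 100), (150, 255)),
}

def most_prominent_color(pixels):
    """Count pixels per color range and return the most common color."""
    counts = Counter()
    for r, g, b in pixels:
        for color, ((rl, rh), (gl, gh), (bl, bh)) in RGB_RANGES.items():
            if rl <= r <= rh and gl <= g <= gh and bl <= b <= bh:
                counts[color] += 1
                break
    return counts.most_common(1)[0][0] if counts else None
```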

Attempt 3: HSV Looping

The final technique was HSV looping. HSV stands for hue, saturation, and value. Hue is the base color (red, blue, green), saturation is the “depth” of the color, and value is its brightness. In our final attempt, we still looped through the pixels and obtained the RGB values as before. However, these RGB values were then converted to HSV using colorsys. Then, we followed a similar process to place each HSV value into a color classification. Since we had agreed on a limited set of colors, we used an HSV simulation slider to find the parameters for each color. An example is given below:

if 0.075 < saturation <= 0.1 and 0.3 < brightness <= 0.6:
    key = 'gray'

After obtaining the color, we kept track of the number of pixels associated with each color and returned the most prevalent one. This method proved the best of the three, since it eliminated overlap and yielded more accurate results. The hue gave the color, while saturation and value allowed us to distinguish darker vs. lighter shades. Below are two examples of the color recognition program in action:
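Putting the pieces together, the HSV loop can be sketched as follows (the thresholds are illustrative, not our tuned slider values; note that colorsys expects channels scaled to 0–1):

```python
import colorsys
from collections import Counter

def classify_pixel(r, g, b):
    """Map one RGB pixel (0-255 per channel) to a coarse color name via HSV."""
    hue, saturation, value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if value <= 0.2:
        return 'black'                    # low brightness regardless of hue
    if saturation <= 0.15:
        return 'gray' if value <= 0.8 else 'white'
    if hue < 1/12 or hue >= 11/12:
        return 'red'                      # hue wraps around at 0/1
    if 1/4 <= hue < 5/12:
        return 'green'
    if 7/12 <= hue < 3/4:
        return 'blue'
    return 'other'

def shoe_color(pixels):
    """Return the most prevalent color over all pixels."""
    return Counter(classify_pixel(*p) for p in pixels).most_common(1)[0][0]
```

Because black, gray, and white are handled by value and saturation before hue is ever consulted, the tiny-range problem from the RGB approach disappears.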

A sporty shoe classified as blue

A classy shoe classified as black

Step Three: The Final Product

Now that the classification and the color of the shoe have been determined, it is time to actually put together an outfit. After the intense process of training a neural network and using OpenCV for color recognition, this was the easy step.

First, we (being oblivious to any fashion rules) used a style guide to pick out color combinations. Given the color of a shoe, it gives a list of acceptable and not-so-acceptable pant colors. This became one Python dictionary, essentially shoe color → list of pants to match that shoe color. Once this was mapped out (literally), we matched pants to shirts (pant color → list of shirts to match that pant color). Accessories were also included to shake up the status quo.

Then, given two things, the color and the classification (1 for classy and 0 for sporty), we generated a random selection of outfits. The pant color was randomly determined from the list of possible options (e.g. black formal shoes matched with black, dark grey, and light grey pants, so we picked one of those at random). Then, based on the pant color, a shirt color was chosen at random in the same way. Throw in an accessory (a hat for sporty and a belt for classy) and voilà! You have an outfit. You can also catch these on the runway at Paris Fashion Week in 2019.
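The whole matching step boils down to two dictionary lookups and a few random choices (the color pairings below are illustrative stand-ins for the style guide's actual recommendations):

```python
import random

# Illustrative pairings -- the real tables came from the style guide.
SHOE_TO_PANTS = {
    'black': ['black', 'dark grey', 'light grey'],
    'dark brown': ['khaki', 'navy'],
}
PANTS_TO_SHIRTS = {
    'black': ['white', 'light blue'],
    'dark grey': ['white', 'pink'],
    'light grey': ['navy', 'white'],
    'khaki': ['white', 'olive'],
    'navy': ['white', 'light blue'],
}

def generate_outfit(shoe_color, is_classy, seed=None):
    """Pick pants for the shoe, a shirt for the pants, and an accessory."""
    rng = random.Random(seed)
    pants = rng.choice(SHOE_TO_PANTS[shoe_color])
    shirt = rng.choice(PANTS_TO_SHIRTS[pants])
    accessory = 'belt' if is_classy else 'hat'
    return {'shoes': shoe_color, 'pants': pants,
            'shirt': shirt, 'accessory': accessory}
```

Calling `generate_outfit('black', True)` repeatedly yields different, but always guide-compliant, outfits.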