Photo by Paweł Czerwiński on Unsplash

An Orwellian Approach to the Litter Problem

Using computer vision to detect someone missing the garbage

Anyone who has lived in an urban environment knows how filthy it can be. No matter the effort exerted by municipalities, trash finds a way to roll through cities like tumbleweeds. Simple solutions involve sending individuals with trash pickers to decontaminate city streets. Unfortunately, it is not always known where the litter is: street cleaners can meander through low-density areas while garbage piles up in more bustling ones.

In a recent hackathon at my former university, my team and I attempted to tackle the litter problem, both to improve our communities and to fit the hackathon's environmental-awareness theme. Combining our experience in computer vision and machine learning, we wanted to create a solution that cut to the source. Instead of waiting for a dedicated litter-removal service, our project harnessed the abundance of security cameras in cities and basic human self-respect.

What a vile litterbug

Human Tracking

In order to detect litter being thrown, we first tried to isolate a human as a blob of pixels. If the entire human could be converted into a binary blob, we could detect when a second blob of pixels begins to move away from it; we can reasonably assume that the second blob is an object being littered. This thought process spurred the idea of using a derivative video that tracks changes in a scene. It is not a robust solution: even after we attempted dilation schemes, the human in the image was often fragmented.

A derivative approach to motion detection in a scene. Red box shows trash and is not generated by program
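As a sketch, the derivative-video idea amounts to differencing consecutive grayscale frames, thresholding, and dilating the result to merge fragmented blobs. The helper names below are hypothetical, and plain NumPy stands in for OpenCV's `absdiff` and `dilate`:

```python
import numpy as np

def motion_mask(prev, curr, thresh=25):
    """Binary mask of pixels that changed between two grayscale frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

def dilate(mask, iters=1):
    """Naive 3x3 dilation to merge fragmented pieces of the human blob."""
    h, w = mask.shape
    for _ in range(iters):
        padded = np.pad(mask, 1)
        # each pixel becomes the max of its 3x3 neighborhood
        mask = np.max([padded[i:i + h, j:j + w]
                       for i in range(3) for j in range(3)], axis=0)
    return mask
```

Even with the dilation pass merging nearby fragments, a person wearing low-contrast clothing still tends to break into several disconnected blobs, which is what pushed us toward a dedicated detector.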

After quickly becoming fed up with the inconsistency of the naive approach, our group transitioned to using a Histogram of Oriented Gradients (HOG) to determine the position of a human. A HOG detector is often used to detect pedestrians. The method finds the gradient of the image using a Sobel operator, then segments the gradient image into eight-by-eight pixel patches. In each patch, the gradient vectors are sorted into a nine-bin histogram based on their direction. Using these histograms as features, a classifier can predict where a person might be.

Implementing the HOG detector gave surprisingly pristine results. A person could be tracked moving through a scene, and the method could even differentiate between multiple people in one image. Once a person was detected, a bounding box was created. We found that with too small a bounding box, wide, expressive movements were occasionally mistaken for litter, so we expanded the bounding box by a constant value.

The result of the HOG detector, with an enlarged bounding box

Trash Detection

Detecting the actual litter was the trickiest portion of this project. Two approaches were tried. The first relied on machine learning to identify garbage in an image. A data set of literal images of trash was used to train a convolutional neural network. The images were resized to sixty-by-sixty pixels and normalized on a scale of 0 to 1. The neural net itself contained three convolutional layers and two hidden layers. Unfortunately, this method was incredibly clunky and did not have a high success rate. At best, the team was able to achieve an 80% success rate when distinguishing between cardboard and glass.
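A minimal Keras sketch of the network described above — three convolutional layers and two hidden dense layers over 60×60 inputs normalized to [0, 1] — could look like this. The filter counts and the class list are assumptions, not the team's exact values:

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 5  # e.g. cardboard, glass, metal, paper, plastic (assumed)

model = keras.Sequential([
    layers.Input(shape=(60, 60, 3)),       # 60x60 RGB, scaled to [0, 1]
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),  # hidden layer 1
    layers.Dense(64, activation="relu"),   # hidden layer 2
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With only a few hundred images per class, a network this size overfits quickly, which matches the clunky results we saw.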

A more successful approach relied on straightforward image processing. We took the aforementioned derivative image and monitored pixels within the bounding box created by the HOG detector. If a set of pixels was detected crossing the boundary of the bounding box, it was marked. That set of pixels was bounded by its own box, and the associated color image was stored.
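A sketch of that boundary check, assuming a binary motion mask and an (x, y, w, h) box from the detector (the helper names are hypothetical):

```python
import numpy as np

def litter_candidates(mask, box):
    """Coordinates of motion pixels outside the person's bounding box.

    Anything still moving outside the (expanded) box is a potential
    piece of litter leaving the person's hands.
    """
    x, y, w, h = box
    outside = mask.copy()
    outside[y:y + h, x:x + w] = 0      # erase motion belonging to the person
    return np.argwhere(outside > 0)    # (row, col) pairs of candidate pixels

def candidate_box(points, margin=2):
    """Bound the candidate pixels with their own (x, y, w, h) box."""
    (r0, c0), (r1, c1) = points.min(axis=0), points.max(axis=0)
    return (c0 - margin, r0 - margin,
            (c1 - c0) + 2 * margin + 1, (r1 - r0) + 2 * margin + 1)
```

The candidate box is what gets cropped from the color frame and handed to the tracker in the next step.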

With an image of the potential trash captured, a tracking algorithm was used to follow it. OpenCV has several convenient tracking algorithms available. For this project, a Discriminative Correlation Filter with Channel and Spatial Reliability (CSRT) tracker was used. This filter is known for its robustness and was convenient for our purpose because it can handle occlusion of objects. The downside of the method is its computational complexity. But, who’s looking for optimization at a hackathon?

If the CSRT tracker found that the object moved beyond the litterer, the object was marked as trash. The tracker continued to follow the trash and saved its final image coordinates and dimensions.

Example of CSRT tracker resolving occlusion

In addition to catching a litterer, the team wanted to detect whether a piece of trash was picked up. Such a good Samaritan would remove the need to send a crew to clean up the trash. More invasive municipalities could even go out of their way to reward the tidy individual.

Since the position of the litter was known and tracked, the algorithm for cleaning it up was simple. The program continued to track the piece of garbage. If the garbage was displaced from its discarded position by a threshold and was contained within a HOG generated frame for a specific number of frames, the garbage was considered cleaned up.

Being a good Samaritan
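The cleanup logic described above reduces to a displacement check plus a frame counter. Here is a sketch with hypothetical names and thresholds:

```python
def box_center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def contains(outer, point):
    x, y, w, h = outer
    px, py = point
    return x <= px <= x + w and y <= py <= y + h

class CleanupDetector:
    """Declare litter cleaned up once it has moved past a displacement
    threshold while staying inside a person's box for enough frames."""

    def __init__(self, dropped_box, min_disp=30, min_frames=15):
        self.origin = box_center(dropped_box)
        self.min_disp = min_disp
        self.min_frames = min_frames
        self.frames_held = 0

    def update(self, trash_box, person_boxes):
        cx, cy = box_center(trash_box)
        ox, oy = self.origin
        displaced = (cx - ox) ** 2 + (cy - oy) ** 2 >= self.min_disp ** 2
        carried = any(contains(p, (cx, cy)) for p in person_boxes)
        self.frames_held = self.frames_held + 1 if (displaced and carried) else 0
        return self.frames_held >= self.min_frames
```

Requiring several consecutive frames inside a person's box filters out someone merely stepping over the trash.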

Twitter Trash

As mentioned earlier, this project aimed to curb littering by targeting the root of the problem: a complete lack of common decency mixed with the protection of anonymity allows people to commit this heinous crime. Since our approach is capable of detecting the individual who litters, an image of the culprit can easily be saved.

Using Twitter’s incredibly practical API, a magic touch was added to this project: an image of the criminal is uploaded to Twitter. The message lets a neighborhood see who contributed to the accumulation of garbage within their community. A camera could potentially report its geographical coordinates along with the image, which would allow courteous residents to clean up the litter. In the future, we would want to add a method of rewarding the good Samaritan, whether through a positive Twitter post or other means.
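Composing the post itself is simple string work; the wording and map link below are made up for illustration. The actual upload would go through a library such as tweepy (`media_upload` followed by `update_status` in its v1.1 API):

```python
def compose_litter_tweet(timestamp, lat=None, lon=None):
    """Build the shaming message that accompanies the culprit's photo."""
    msg = f"Litterbug spotted at {timestamp}! Help keep our streets clean."
    if lat is not None and lon is not None:
        # Appending coordinates lets courteous residents find the litter
        msg += f" Location: https://www.google.com/maps?q={lat},{lon}"
    return msg

# Posting the photo with tweepy would look roughly like:
#   media = api.media_upload("culprit.jpg")
#   api.update_status(status=compose_litter_tweet(...),
#                     media_ids=[media.media_id])
```

Keeping message composition separate from the upload call made it easy to test the pipeline without spamming a real account.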

An example Twitter post by the litter tracker

Results

This hackathon project successfully demonstrated a prototype of a basic monitoring system, capable of detecting when an individual fails to place their trash into a garbage receptacle.

It’s important to consider the limitations of this approach. Our method relies on an individual throwing a piece of litter away from themselves. I imagine this is the most common method of littering (I don’t have much personal experience). Yet, if someone decided to crouch and gently place the litter beside them, our algorithm would fail to detect it. The program is also incapable of determining the type of litter: an individual could drop a wallet, and the algorithm would interpret the item as trash. The team initially hoped to solve this with the previously mentioned neural network, but with a limited data set that turned out to be an arduous task.

While the project did not land us a victory at the hackathon, this was an incredibly exciting experience. I’ve done a considerable amount of work with static images in the past, but very little with video footage. This project took many computer vision techniques and applied them to a silly and fun application.

Final Video