A while back I wrote an article about detecting ripe fruit with OpenCV. While a simple color detector is a fun introduction to computer vision, some problems require a bit more complexity and forward thinking. Deep learning lets us build a robust approach to most problems in computer vision. The algorithm I’ll use in this article is called Mask R-CNN.

Kauai Coffee Company. Source: Hawaii Coffee Association

Coffee orchards around the globe rely a great deal on hand-counting the number of ripe vs. unripe coffee cherries on a branch in order to estimate time-to-harvest. It takes a great deal of expertise to make fast, educated estimates of when to harvest based on this simple metric. Mistakes could cost the company a successful harvest in a particular field. Less harvest, less money.

How many cherries can you count? Now, how long should we wait until harvest?

Training Computers

If you aren’t familiar with machine learning, there’s really only one thing you need to know for this type of problem: without it, we would have to hand-code every feature of those coffee cherries for a program to recognize. No matter how you put it, that is too much code to write, and no, the coffee cherries can’t be approximated as green circles (I also definitely have not had this thought).

We therefore take the following approach:

1. Take ~100 pictures of our object of interest
2. Hand label/annotate ~30 pictures
3. Feed the images into our network
4. Evaluate the model
5. Repeat until we can detect our objects with >90% accuracy
6. Bonus: have our model label new images for us

Mask R-CNN

This article is geared toward applications of the algorithm, so if you’re interested in how the network functions, check out the original paper. In simple terms, Mask R-CNN returns both the location of each object and the pixels that make it up. That works great for being able to:

- Determine the location/number of the coffee cherries (bounding boxes)
- Determine the color of each coffee cherry (image segmentation)
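Both bullet points fall out of the model’s outputs directly. Here’s a minimal sketch of that post-processing, assuming masks come back as an (H, W, N) boolean array with one channel per detection, the shape Matterport’s implementation uses; the function name and the red-vs-green heuristic are my own illustration, not part of the algorithm:

```python
import numpy as np

def summarize_detections(image, masks):
    """Count detected cherries and classify each as ripe/unripe
    from the mean color of the pixels under its mask.

    image: (H, W, 3) RGB array
    masks: (H, W, N) boolean array, one channel per detected cherry
    """
    n_cherries = masks.shape[-1]
    labels = []
    for i in range(n_cherries):
        pixels = image[masks[:, :, i]]      # (n_pixels, 3) colors inside the mask
        mean_r, mean_g, _ = pixels.mean(axis=0)
        # Crude heuristic: ripe cherries skew red, unripe ones green.
        labels.append("ripe" if mean_r > mean_g else "unripe")
    return n_cherries, labels
```

Counting the boxes gives the yield estimate; the per-mask color gives the ripeness breakdown.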

VGG Image Annotator (VIA) is probably the best annotating tool for polygonal objects that I’ve found. It has since gained features for saving projects, loading previous annotations, etc. Annotating the objects for this case takes a long time, and it’s really easy to fall into uncertainty during this stage: Should I really be annotating all these images? Does it usually take this long? What if it doesn’t work?
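For what it’s worth, VIA exports annotations as JSON where each polygon is stored as parallel `all_points_x` / `all_points_y` lists. A sketch of turning that export into per-image vertex lists (the key names follow VIA’s export format; `load_via_polygons` is just my name for it):

```python
import json

def load_via_polygons(json_text):
    """Parse a VIA annotation export into {filename: [vertex lists]}.

    Handles 'regions' stored either as a list (VIA 2.x) or as a
    dict keyed by index (VIA 1.x).
    """
    annotations = json.loads(json_text)
    polygons = {}
    for entry in annotations.values():
        regions = entry["regions"]
        if isinstance(regions, dict):
            regions = list(regions.values())
        shapes = []
        for region in regions:
            attrs = region["shape_attributes"]
            if attrs.get("name") != "polygon":
                continue  # skip circles, rectangles, etc.
            shapes.append(list(zip(attrs["all_points_x"],
                                   attrs["all_points_y"])))
        polygons[entry["filename"]] = shapes
    return polygons
```

This is essentially what Matterport’s dataset-loading code does with VIA exports before rasterizing the polygons into masks.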

Yes. Absolutely. Suck it up. We beat the USSR and landed on the moon without knowing it would work. Back to annotating coffee cherries.

This really is the hardest part of the process.

With ~4 hours sunk into annotating all these images, this has to work. Right? The easiest way to test it is through transfer learning with someone else’s implementation of the algorithm. If needed, the model can be fine-tuned and adjusted later. Don’t be ashamed.

The folks at Matterport have a great foundation to use. Here’s the GitHub repo.
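The transfer-learning setup with that repo is mostly configuration: subclass `Config`, load the COCO weights while skipping the class-dependent layers, then train only the network heads. This is a sketch, not a runnable script — `dataset_train` and `dataset_val` stand in for `mrcnn.utils.Dataset` subclasses built from the VIA annotations (as in the repo’s balloon example), and the class count and epoch numbers are illustrative:

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class CherryConfig(Config):
    """Training configuration for the coffee cherry dataset."""
    NAME = "cherry"
    NUM_CLASSES = 1 + 2        # background + ripe + unripe (assumption)
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 100
    DETECTION_MIN_CONFIDENCE = 0.9

config = CherryConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")

# Start from COCO weights, excluding the layers whose shapes
# depend on the number of classes.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Transfer learning step: train only the network heads,
# leaving the pretrained backbone frozen.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30, layers="heads")
```

With only ~30 annotated images, freezing the backbone and training the heads is what makes this feasible at all.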

The Results

After fighting endlessly with Amazon Web Services and package dependency issues, the final results looked like this:

Trained model in the wild.

This is art.

Not bad at all, especially considering the low amount of pictures needed to make it work. The next steps will be to refine the model by taking more pictures in settings similar to its application and ultimately deploy it for general use.

If you liked the article, be sure to follow me for more applications of machine learning and computer vision in agriculture. If you have any questions, leave a comment or email me at jamesthesken@gmail.com.

— James