For each of the 500 images in the Accuracy Evaluation part of the study, every tag returned by the image recognition engines was evaluated for accuracy. This was a basic "yes, no, or I'm not sure" decision (only 1.2% of tags were marked "not sure").

The distinction here is that a tag could be judged to be accurate, even if it was one that a human would not be likely to use in describing the image. For example, a picture of an outdoor scene might get tagged by the engine as "panorama," and be perfectly accurate, but still not be one of the tags a user would think of to describe the image.

With that in mind, here is the summary data with the overall score for each engine, across all of the tags they returned:

The clear winner here is Google Vision, with Amazon AWS Rekognition coming in second.
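To make the scoring concrete, here is a minimal sketch of how an overall accuracy score per engine could be computed from the yes/no/not sure judgments. The engine names, sample rows, and the function `accuracy_by_engine` are hypothetical, and leaving "not sure" verdicts out of the calculation is an assumption for illustration, not necessarily how the study treated them.

```python
from collections import defaultdict

# Hypothetical judgments: (engine, tag, verdict).
# verdict is one of "yes", "no", or "not sure".
judgments = [
    ("engine_a", "panorama", "yes"),
    ("engine_a", "dog", "no"),
    ("engine_a", "sky", "yes"),
    ("engine_b", "tree", "yes"),
    ("engine_b", "car", "not sure"),
    ("engine_b", "road", "no"),
]

def accuracy_by_engine(rows):
    """Return the share of tags judged accurate for each engine.

    Assumption: "not sure" verdicts are excluded from both the
    numerator and the denominator.
    """
    accurate = defaultdict(int)
    decided = defaultdict(int)
    for engine, _tag, verdict in rows:
        if verdict == "not sure":
            continue
        decided[engine] += 1
        if verdict == "yes":
            accurate[engine] += 1
    return {engine: accurate[engine] / decided[engine] for engine in decided}

scores = accuracy_by_engine(judgments)
```

Note that a tag such as "panorama" counts as accurate here even if a human would be unlikely to choose it, which matches the distinction drawn earlier.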

Confidence Levels

The above scores are across all tags returned by each engine. However, each engine also returns a confidence score for each tag, which allows it to return tags that are quite a bit more speculative. Here is a summary of the confidence scores each engine provided:

It's interesting to look more closely at the tags that the engines assign a very high degree of confidence. Here is a look at all the tags where the engines have a 90% or higher confidence level:

What's fascinating about this data is that on a pure accuracy basis, three of the four engines (Amazon, Google, and Microsoft) scored higher than human tagging for tags with 90% or higher confidence.

Let's see how this varies when we take the confidence level down to 80% or higher:

At this level, we see that the score for 'human hand tagged' is basically equivalent to the scores for Amazon AWS Rekognition, Google Vision, and Microsoft Azure Computer Vision.
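The 90% and 80% cuts above amount to filtering each engine's tags by a confidence threshold before computing accuracy. The sketch below shows that idea under a few assumptions: the sample rows are made up, confidence values are taken to be normalized to a 0 to 1 scale, and `accuracy_at_or_above` is a hypothetical helper rather than anything from the engines' own APIs.

```python
# Hypothetical rows: (engine, confidence on a 0-1 scale, judged_accurate).
tags = [
    ("engine_a", 0.97, True),
    ("engine_a", 0.92, True),
    ("engine_a", 0.85, False),
    ("engine_a", 0.55, False),
    ("engine_b", 0.95, True),
    ("engine_b", 0.81, True),
    ("engine_b", 0.40, False),
]

def accuracy_at_or_above(rows, engine, threshold):
    """Accuracy over one engine's tags with confidence >= threshold.

    Returns None when the engine has no tags at or above the threshold.
    """
    kept = [ok for name, conf, ok in rows if name == engine and conf >= threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)

acc_90 = accuracy_at_or_above(tags, "engine_a", 0.90)  # only the two highest-confidence tags
acc_80 = accuracy_at_or_above(tags, "engine_a", 0.80)  # adds the 0.85 tag
```

Lowering the threshold admits more tags, so the score at 80% or higher is computed over a superset of the tags used at 90% or higher.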

One would expect that the tags that were given a low confidence level would be lower in accuracy, and that proves to be the case:

For the next few charts we'll take a look at the accuracy by image recognition engine across many classes of confidence levels.

Amazon AWS Rekognition:

Google Vision:

IBM Watson:

Microsoft Azure Computer Vision:

Across all engines, we can see that accuracy is significantly higher for the tags that were assigned higher confidence scores.
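A per-engine breakdown like the ones above can be sketched by grouping tags into confidence bands and computing accuracy within each band. The band boundaries, the sample data, and the function `accuracy_by_band` below are all assumptions for illustration; the study's actual confidence classes may differ.

```python
from bisect import bisect_right

def accuracy_by_band(rows, lower_bounds=(0.0, 0.5, 0.7, 0.8, 0.9)):
    """Group (confidence, judged_accurate) pairs into bands.

    Each band starts at one of lower_bounds (sorted ascending) and runs up
    to the next bound; the last band is open-ended. Returns a dict mapping
    each band's lower bound to its accuracy, omitting empty bands.
    """
    totals = [0] * len(lower_bounds)
    hits = [0] * len(lower_bounds)
    for conf, ok in rows:
        i = bisect_right(lower_bounds, conf) - 1
        if i < 0:
            continue  # confidence below the lowest bound
        totals[i] += 1
        hits[i] += int(ok)
    return {
        lower_bounds[i]: hits[i] / totals[i]
        for i in range(len(lower_bounds))
        if totals[i]
    }

# Hypothetical tags for a single engine: (confidence, judged_accurate).
sample = [
    (0.95, True), (0.91, True),   # 0.9 and up
    (0.85, True), (0.82, False),  # 0.8 to 0.9
    (0.60, False),                # 0.5 to 0.7
    (0.30, False), (0.35, True),  # below 0.5
]

bands = accuracy_by_band(sample)
```

Running this once per engine gives the kind of band-by-band comparison shown in the charts, and makes it easy to check whether accuracy rises with confidence.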