Performance Across Categories

I first evaluated each API by category to see how well it detects each type of NSFW content.
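The per-category metric throughout is a simple detection rate: the fraction of images in a category that a provider correctly flags as NSFW. A minimal sketch, assuming each provider's output has already been reduced to a binary verdict per image (provider names and verdicts below are made up for illustration):

```python
# Sketch of the per-category evaluation. Assumes each provider's response
# has been reduced to 1 (flagged NSFW) or 0 (not flagged) per image.

def detection_rate(verdicts):
    """Fraction of a category's images that a provider flagged NSFW."""
    return sum(verdicts) / len(verdicts)

# Hypothetical verdicts for 10 pornographic images (1 = flagged NSFW).
results = {
    "ProviderA": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # 100% detection
    "ProviderB": [1, 1, 1, 0, 1, 1, 1, 1, 1, 0],  # 80% detection
}

for provider, verdicts in results.items():
    print(f"{provider}: {detection_rate(verdicts):.0%}")
```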

Pornography / Sexual Acts

The Google and Sightengine APIs really shine here: they are the only two able to detect all the pornographic images correctly. Nanonets and Algorithmia are a close second, correctly classifying 90% of the pornographic images. Microsoft and Imagga have the worst performance in this category.

Links to original images: Porn19, Porn7, Porn18, Porn14

The easy-to-identify images are explicitly pornographic. All the providers got the images above correct, and most predicted NSFW with very high confidence.

Links to original images: Porn6, Porn2, Porn10, Porn3

The images that were difficult to identify suffered from occlusion or blurring. In the worst case, 11 of 12 vendors got the image wrong. Performance on pornography varies widely depending on how explicit the content is and how clearly visible it is.

Explicit Nudity

Most of the APIs performed remarkably well in this category, with many achieving a 100% detection rate. Even the lowest-performing APIs (Clarifai and Algorithmia) had a 90% detection rate here. What counts as nudity has always been subject to debate, and as the difficult-to-identify images show, the providers mostly fail in cases where one could argue the image is SFW.

Links to original images: Nudity10, Nudity7, Nudity13, Nudity14

The easy-to-identify images contained clear, explicit nudity; anyone would call these NSFW without a difference of opinion. None of the providers made an error, and the average scores were all 0.99.

Links to original images: Nudity9, Nudity8, Nudity18, Nudity4

The images the providers got wrong were the ones subject to debate. This could simply be because each provider has a different sensitivity setting for nudity.
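Differing sensitivity settings are enough to explain the disagreement. Each provider ultimately returns a confidence score and applies a threshold to it; a borderline image flips between NSFW and SFW depending on where that threshold sits. A minimal sketch (the score and threshold values are illustrative, not measured):

```python
# How a sensitivity threshold turns a confidence score into a verdict.
# The numbers here are made up to illustrate the effect, not measured.

def classify(nsfw_score, threshold):
    """Return the verdict a provider with the given threshold would give."""
    return "NSFW" if nsfw_score >= threshold else "SFW"

borderline_score = 0.55  # a debatable nudity image scoring near the boundary

for threshold in (0.4, 0.5, 0.7):
    print(f"threshold {threshold}: {classify(borderline_score, threshold)}")
```

The same image flips from NSFW to SFW as the threshold rises, so vendors with identical models but different thresholds would still disagree on exactly these borderline cases.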

Suggestive Nudity

Google once again leads the pack with a 100% detection rate in this category. Sightengine and Nanonets perform better than the rest, with detection rates of 95% and 90% respectively. Suggestive nudity is almost as easy for a machine to identify as explicit nudity; the mistakes come on images that otherwise look like SFW images but contain some element of nudity.

Links to original images: Suggestive13, Suggestive10, Suggestive2, Suggestive8

Once again, none of the providers got the easy-to-identify images wrong. These images were all clearly NSFW.

Links to original images: Suggestive17, Suggestive12, Suggestive11, Suggestive5

On suggestive nudity the providers were split. As with explicit nudity, they each had different thresholds for what is tolerable. I am personally unsure whether these images should be considered SFW or not.

Simulated / Animated Porn

Almost all the APIs performed exceptionally well here, detecting 100% of the simulated porn examples; the only exception was Imagga, which missed one image. It's interesting to note how uniformly strong the providers are in this category. This suggests these algorithms find it easier to identify artificially generated images than naturally occurring ones.

Links to original images: SimulatedPorn1, SimulatedPorn16, SimulatedPorn19, SimulatedPorn9

All the providers classified these images correctly, with high confidence scores.

Links to original images: SimulatedPorn15

The one image that Imagga got wrong could arguably be mistaken for something other than porn at a quick glance.

Gore

This was one of the most difficult categories: the average detection rate across APIs was below 50%. Clarifai outperforms its competitors here, identifying 100% of the gore images.

Links to original images: Gore2, Gore3, Gore6, Gore10

The images on which most providers scored with high confidence were medical images, probably because such images are easier to find as training data. Even on these best-performing images, however, 4 of 12 providers got it wrong.

Links to original images: Gore7, Gore9, Gore17, Gore18

There was no discernible pattern in the images that were difficult to predict. However, a human would find it very easy to identify any of these images as gory, which suggests the poor performance stems from a lack of available training data.

Safe-for-work

Safe-for-work images are those that should not be flagged as NSFW. Collecting a safe-for-work dataset is itself difficult: the images should be close to NSFW to get a real sense of what these providers do, and whether every one of them is truly SFW is debatable. Sightengine and Google are the worst performers here, which partly explains their great performance across the other categories: they flag just about everything as NSFW. Imagga does well because it flags almost nothing as NSFW. X-Moderator also does very well here.

Links to original images: SFW15, SFW12, SFW6, SFW4

The easy-to-identify images showed very little skin and would be trivial for a human to classify as SFW. Only one or two providers got these images wrong.

Links to original images: SFW17, SFW18, SFW10, SFW3

The difficult-to-identify SFW images all showed a larger amount of skin or were anime (there is a strong bias toward classifying anime as porn). Most of the providers flagged the images with a lot of skin showing, which raises the question of whether these images are truly SFW.

Overall Comparison

Looking at the performance of the APIs across all the NSFW categories, as well as their ability to correctly identify safe-for-work (SFW) content, Nanonets has the best F1 score and average accuracy, performing consistently well across all categories. Google, which does exceptionally well at detecting the NSFW categories, marks too many SFW pieces of content as NSFW and is therefore penalized in its F1 score.
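The F1 penalty on an aggressive classifier is worth making concrete. Treating NSFW as the positive class, flagging nearly everything maximizes recall but wrecks precision, and F1 (the harmonic mean of the two) punishes that imbalance. A small sketch with made-up counts, not the actual benchmark numbers:

```python
# Why an over-aggressive classifier loses on F1 despite perfect recall.
# The confusion-matrix counts below are illustrative, not measured.

def f1(tp, fp, fn):
    """F1 score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Balanced classifier: misses a few NSFW images, rarely flags SFW ones.
balanced = f1(tp=90, fp=5, fn=10)
# Aggressive classifier: catches every NSFW image but flags half the SFW set.
aggressive = f1(tp=100, fp=50, fn=0)

print(f"balanced:   {balanced:.3f}")
print(f"aggressive: {aggressive:.3f}")
```

The balanced classifier wins on F1 even though the aggressive one has perfect recall, which is exactly the pattern described above for Nanonets versus Google.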

By Provider

I compared the top 5 providers by accuracy and F1 Score to showcase the differences in their performance. The larger the area of the radar chart the better.

1. Nanonets

Nanonets does not perform the best overall in any one category. However, it is the most balanced, doing well in every category. Where it could improve is in identifying more images as SFW; it is oversensitive to any skin.

2. Google

Google performs the best in most NSFW categories but the worst at detecting SFW. One point to note is that the images I used were found via Google, which means Google "should know" what they were anyway. This might be the reason for its really good performance in most categories.

3. Clarifai

Clarifai really shines at identifying Gore, doing better than most other APIs. It is also well balanced and does well in most categories, but it lags in identifying Suggestive Nudity and Porn.

4. X-Moderator

X-Moderator is another well-balanced API. Apart from Gore, it identifies most other types of NSFW content well. It achieved 100% accuracy on SFW, which sets it apart from its competitors.

5. Sightengine

Sightengine, like Google, has an almost perfect score at identifying NSFW content. However, it didn't identify a single Gore image.

Pricing

Another criterion for deciding which API to go with is pricing. Below is a comparison of each vendor's pricing. Most of the APIs offer a free trial with limited usage. Yahoo's is the only one that is completely free to use, but it is self-hosted and hence not included in this table.

Amazon, Microsoft, Nanonets, and DeepAI all come in lowest at $1k a month for 1M API calls.
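As a quick sanity check on that figure (my arithmetic, not a vendor quote), the cheapest tier works out to a tenth of a cent per image moderated:

```python
# Unit cost implied by the cheapest tier cited above:
# $1,000/month covering 1M API calls.

monthly_price = 1000.0      # USD per month
included_calls = 1_000_000  # API calls per month

cost_per_call = monthly_price / included_calls
print(f"${cost_per_call:.4f} per call")  # $0.0010 per call
```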

Which is the Best Content Moderation API out there?

The subjective nature of NSFW content makes it difficult to declare any one API as the go-to API for content moderation.

A general social media application that is geared more toward content distribution and wants a balanced classifier would prefer the Nanonets API, as evidenced by its classifier having the highest F1 score.

An application targeted at kids would definitely err on the side of caution and prefer to hide even marginally inappropriate content; it would therefore prefer the Google API, with its exemplary performance across all NSFW categories. The trade-off is losing a lot of SFW content that Google might declare NSFW.