In September 2016, Yahoo announced Open NSFW (https://yahooeng.tumblr.com/post/151148689421/open-sourcing-a-deep-learning-solution-for). Open NSFW is a neural network model that detects not-safe-for-work (NSFW) images. They open sourced the model weights and parameters, but it's too bad they didn't open source the dataset (pun intended). The model uses Caffe and can be found here (https://github.com/yahoo/open_nsfw).

After looking at this model, I thought of doing something similar, so I started looking for datasets with NSFW and SFW images. Unfortunately, nothing like that is available online, so the only option left for me was crawling the web. During this process, I decided to take it to the next level and build a model that distinguishes between anime and hentai images instead.

In this post, we will learn how to use Keras to fine-tune a pre-trained model that distinguishes between anime and hentai with over 95% accuracy! B.Y.O.N.D (Build Your Own NSFW Detector).

Let’s first look at what anime and hentai are. Anime is Japanese for animation; it can be hand-drawn or computer generated. Hentai is a genre of anime that is pornographic; the word means “pervert” in Japanese.

Some of the anime images that I used for training look like the following:

Unfortunately, I cannot post hentai images here without censoring parts of them. These images looked like the following:

I won’t be posting more hentai images, so don’t get too excited. ;)





The Dataset

Obviously, this dataset is not available online either, but it is relatively easy to build. To build this anime vs. hentai dataset, I used Reddit. With RedditImageGrab, I collected over 2,000 images of anime and 2,000 of hentai. The Reddit image grabber can be found here: https://github.com/HoverHell/RedditImageGrab

To use this image grabber, clone the repository to any place you prefer on your computer and run a command along the following lines (the subreddit names here are placeholders, and the repository's README lists the full set of flags):

    python redditdownload.py <anime-subreddit> ./anime --num 2000

And similarly for the hentai images:

    python redditdownload.py <hentai-subreddit> ./hentai --num 2000

This library made my life simple: I now had the data and only had to build the convolutional neural network, which is easy thanks to the interface provided by Keras (https://keras.io/).





Building the Convolutional Neural Network

Keras provides ImageDataGenerator, which makes it easy to generate batches of images for convnet classification tasks. To use this class, the training and validation images should be stored in the following layout:

training_images/
    anime/
        Image1
        Image2
        ...
        ImageN
    hentai/
        Image1
        Image2
        ...
        ImageN

And for validation data:

validation_images/
    anime/
        Image1
        Image2
        ...
        ImageN
    hentai/
        Image1
        Image2
        ...
        ImageN
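This layout maps straight onto Keras' flow_from_directory, where each subdirectory becomes a class. A minimal sketch of that mapping (the empty folders created here are just stand-ins for the downloaded images):

```python
import os
from keras.preprocessing.image import ImageDataGenerator

# Build the expected folder layout with empty class directories, just to
# demonstrate the mapping; in practice these hold the downloaded images.
for split in ('training_images', 'validation_images'):
    for label in ('anime', 'hentai'):
        os.makedirs(os.path.join(split, label), exist_ok=True)

datagen = ImageDataGenerator(rescale=1. / 255)
train_gen = datagen.flow_from_directory(
    'training_images',
    target_size=(224, 224),   # resize every image on the fly
    batch_size=32,
    class_mode='binary')      # two classes -> a single 0/1 label

print(train_gen.class_indices)
```

Each subdirectory name becomes a class label automatically, assigned in alphabetical order, so anime maps to 0 and hentai to 1.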

I started with a simple convolutional neural network trained from scratch. The architecture was created by removing a few layers from the VGG16 model.

An image size of 224x224 was used as input to this model. It should be noted that since I’m using the TensorFlow backend, the input_shape is (224, 224, 3). If you plan to use the Theano backend, change this to (3, 224, 224). After the model definition, the model needs to be compiled. I used the SGD optimizer with binary cross-entropy as the loss function.
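A sketch of what such a trimmed-down, VGG-style network might look like (the exact layer counts and filter sizes are my assumptions, since the post's original snippet is not reproduced here):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import SGD

# A small VGG-style convnet; layer sizes here are illustrative guesses.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),  # single sigmoid unit: anime vs. hentai
])

# Binary cross-entropy with the SGD optimizer, as described above.
model.compile(optimizer=SGD(momentum=0.9),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

A single sigmoid output with binary cross-entropy is the natural fit for a two-class problem like this one.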

Keras offers a ModelCheckpoint callback to save the best model during training as an h5 file:
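A minimal example of that callback (the filename and monitored metric are my choices):

```python
from keras.callbacks import ModelCheckpoint

# Write the model to an h5 file every time validation loss improves,
# keeping only the best one seen so far.
checkpoint = ModelCheckpoint('best_model.h5',
                             monitor='val_loss',
                             save_best_only=True,
                             verbose=1)
# Later: pass callbacks=[checkpoint] to the fit call.
```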

Now we need to create generators for the training and validation data we have. Keras’ ImageDataGenerator not only lets us create a generator easily, but also supports on-the-fly image augmentation!

The augmentations I used were rescaling, shear, zoom, rotation, width shift, height shift and horizontal flipping. These augmentations are not needed for validation data; if rescaling is applied while generating training images, only that same rescaling is needed for the validation data.
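Sketched out, the two generators might be configured like this (the augmentation magnitudes are my guesses, not the exact values used in the post):

```python
from keras.preprocessing.image import ImageDataGenerator

# Training generator: rescaling plus on-the-fly augmentation.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,        # map pixel values into [0, 1]
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=20,       # degrees
    width_shift_range=0.1,   # fraction of total width
    height_shift_range=0.1,
    horizontal_flip=True)

# Validation generator: only the same rescaling, no augmentation.
val_datagen = ImageDataGenerator(rescale=1. / 255)
```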

And that’s it! Time to train this simple convolutional neural network.
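The real run feeds the two directory generators into the model; as a self-contained stand-in, here is the same fit pattern on random tensors (tiny model, random data and 2 epochs purely for illustration):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Tiny stand-in model; the real one is the convnet defined above.
model = Sequential([
    Flatten(input_shape=(32, 32, 3)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy',
              metrics=['accuracy'])

# Random stand-ins for the training and validation generators.
x_train = np.random.rand(32, 32, 32, 3)
y_train = np.random.randint(0, 2, size=(32,))
x_val = np.random.rand(8, 32, 32, 3)
y_val = np.random.randint(0, 2, size=(8,))

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=2, batch_size=8, verbose=0)
```

With real data you would pass the training generator, the validation generator and the checkpoint callback to the same call.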

This model achieves an accuracy of only 68.34%, which doesn't look good. Time to move on to fine-tuning pre-trained networks.

After this basic neural network, I tried fine-tuning pre-trained networks like ResNet50 and InceptionV3, and the results were amazing!

Fine-tuning a pre-trained network using Keras is pretty easy once the images are stored in a proper manner. I loaded both InceptionV3 and ResNet50 without the fully-connected layers at the top of the network and with ImageNet weights. The input image size for InceptionV3 is 299x299, and for ResNet50 it is 224x224.

I added an average pooling layer on top of the InceptionV3 model:
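A sketch of that setup (I use a global average pooling head here, and the dense layer size is my assumption; weights=None keeps the snippet offline, while the real run would use weights='imagenet'):

```python
from keras.applications import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# InceptionV3 base without its top classifier; 299x299 input as noted above.
# weights=None avoids the ImageNet download here; use weights='imagenet'
# when actually fine-tuning.
base = InceptionV3(weights=None, include_top=False,
                   input_shape=(299, 299, 3))

x = GlobalAveragePooling2D()(base.output)  # the average pooling head
x = Dense(256, activation='relu')(x)       # hypothetical head size
out = Dense(1, activation='sigmoid')(x)

inception_model = Model(inputs=base.input, outputs=out)
```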

Similarly for Resnet50:
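The same pattern with a ResNet50 base, again with a head whose exact size is my assumption and weights=None standing in for weights='imagenet':

```python
from keras.applications import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# ResNet50 base without its top classifier; 224x224 input this time.
base = ResNet50(weights=None, include_top=False,
                input_shape=(224, 224, 3))

x = GlobalAveragePooling2D()(base.output)  # the average pooling head
x = Dense(256, activation='relu')(x)       # hypothetical head size
out = Dense(1, activation='sigmoid')(x)

resnet_model = Model(inputs=base.input, outputs=out)
```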

Everything else remains the same as described above for training neural networks from scratch. Let’s compare the log loss during training for Inception and ResNet:

We see a large jump in the validation loss between the 5th and 10th epochs for the ResNet50 model. The InceptionV3 model seems quite stable in this regard. At the end of 35 epochs, the InceptionV3 model achieves a log loss of 0.1, while the log loss for ResNet50 is closer to 0.25.

Let’s compare accuracy achieved by these two models:

Both models achieve quite good accuracy on the validation set. ResNet50 reaches 0.93, while InceptionV3 outperforms it by approximately 0.03, getting a little more than 96% accuracy! It’s interesting to see how these pre-trained models can be fine-tuned to achieve much higher accuracy than models built from scratch (which is also a very time-consuming process).

Code used in this blog post can be found here: https://github.com/abhishekkrthakur/anime_hentai
