Machine learning is a buzzword nowadays. There are plenty of theories going around, but it’s hard to see real applications that can be built by an indie developer. Developing an end-to-end machine learning system requires a wide range of expertise in areas like linear algebra, vector calculus, statistics, and optimization.

From a developer’s perspective, then, there’s a steep learning curve standing in the way, but the latest tools take care of most of that work, leaving developers free to code. In this tutorial, we’ll create an indie iOS app that uses image classification to recognize banknotes and read their values aloud for people with visual impairments.

This post will guide you through four steps:

1. Preparing a dataset to use in machine learning
2. Data augmentation to diversify the dataset
3. Transfer learning and fine-tuning to train the model faster
4. Converting a Keras model to Core ML to use in an iOS app

First, let’s have a look at the tools and models we’ll be using.

Keras

As Mr. Le states: “Keras is a wrapper over its backend libraries, which can be TensorFlow or Theano — meaning that if you’re using Keras with TensorFlow backend, you’re running TensorFlow code. Keras takes care of a lot of the nitty-gritty details for you, as it’s geared towards neural network technology consumers and is well suited for those practicing data science. It allows for easy and fast prototyping, supports multiple neural network architectures, and runs seamlessly on CPU/GPU.”
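That fast prototyping is easy to see in a minimal sketch: a small classifier assembled and compiled in a few lines. The layer sizes here are arbitrary placeholders, not part of this tutorial's model.

```python
# A minimal Keras prototyping sketch; layer sizes are illustrative only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),  # hidden layer
    Dense(6, activation="softmax"),                    # e.g. 6 classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

With the TensorFlow backend, this compiles down to TensorFlow code without the developer touching it directly.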

ResNet50

ResNet is an abbreviation for residual neural network. This network model is an improved version of the convolutional neural network (CNN). If you need to recap your knowledge about CNNs, take a look at this beginner’s guide.

ResNet solves the degradation problem of the CNN. This degradation problem is clearly stated in the original paper: “When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly.”

ResNet solves this problem by using shortcuts between layers. It’s a simple idea, but it really helps as the network gets deeper. It also uses a bottleneck design to shorten training time. ResNet50 is a 50-layer network trained on the ImageNet dataset. Instead of stacking two 3x3 convolutions, each bottleneck block uses a 1x1, a 3x3, and another 1x1 convolution.
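The bottleneck block with its shortcut can be sketched in Keras’s functional API. This is an illustrative simplification (it omits batch normalization and strides, and the filter counts mirror ResNet50’s first stage), not ResNet50’s exact implementation.

```python
# A sketch of a ResNet bottleneck block: 1x1 conv reduces channels,
# 3x3 conv processes them, 1x1 conv expands them back, and the shortcut
# adds the block's input to its output. Simplified for illustration.
from tensorflow.keras import layers

def bottleneck_block(x, filters=64):
    shortcut = x
    y = layers.Conv2D(filters, (1, 1), activation="relu")(x)
    y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(y)
    y = layers.Conv2D(4 * filters, (1, 1))(y)
    # Project the shortcut with a 1x1 conv if the channel counts differ.
    if shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, (1, 1))(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

inp = layers.Input(shape=(56, 56, 64))
out = bottleneck_block(inp)  # shape: (None, 56, 56, 256)
```

Because the 1x1 convolutions do the channel squeezing and expanding, the expensive 3x3 convolution operates on fewer channels, which is what shortens training time.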

Dataset and Augmentation

Since I wanted to classify Turkish banknotes, I had to create my own image dataset under varied conditions (lighting, perspective, etc.). For this task, I took 900 photos. This is an amateur dataset and quite biased, but it’s fine for a prototype; for production, a larger dataset with much more variance should be used. There are six denominations. 80% of the images are set aside for training, with the remaining 20% used for the test dataset.
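The 80/20 split can be sketched as below. This assumes the photos sit in one folder per denomination; the folder layout and function name are hypothetical, not taken from the original project.

```python
# A sketch of an 80/20 train/test split over per-denomination folders.
# Directory layout assumed: src/<denomination>/<image files>.
import os
import random
import shutil

def split_dataset(src, dst, train_frac=0.8, seed=42):
    random.seed(seed)
    for denom in sorted(os.listdir(src)):
        files = sorted(os.listdir(os.path.join(src, denom)))
        random.shuffle(files)
        cut = int(train_frac * len(files))  # first 80% go to training
        for split, names in (("train", files[:cut]), ("test", files[cut:])):
            out = os.path.join(dst, split, denom)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src, denom, name),
                            os.path.join(out, name))
```

Shuffling before the cut keeps each denomination’s train and test sets drawn from the same varied conditions.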

In order to increase variance, I used a technique called data augmentation. This allows us to rotate, zoom, and shift images in order to increase both the number and the variance of images. All the images are resized to 224x224 in order to fit ResNet50’s input size. You can see augmented image samples below. The one on the left is the original, and the collage on the right is augmented.
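The augmentation step can be sketched with Keras’s ImageDataGenerator. The transform ranges here are illustrative choices, and the train/ folder path is a hypothetical layout (one subfolder per denomination), not the original project’s.

```python
# A sketch of data augmentation with Keras; parameter values are
# illustrative, not the tutorial's exact settings.
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,       # random rotations up to 30 degrees
    zoom_range=0.2,          # random zoom in/out
    width_shift_range=0.1,   # small horizontal shifts
    height_shift_range=0.1,  # small vertical shifts
    rescale=1.0 / 255,       # scale pixel values into [0, 1]
)

# Stream augmented batches from disk, resized to ResNet50's 224x224 input
# (only if a train/ folder with per-class subfolders exists).
if os.path.isdir("train"):
    train_gen = datagen.flow_from_directory(
        "train",
        target_size=(224, 224),
        batch_size=32,
        class_mode="categorical",
    )
```

Because the generator applies a fresh random transform each epoch, the model effectively sees many more than 900 distinct images.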