Getting Started

Before we dive into the various augmentation techniques, there’s one issue that we must consider beforehand.

Where do we augment data in our ML pipeline?

The answer may seem quite obvious; we do augmentation before we feed the data to the model, right? Yes, but you have two options here. One option is to perform all the necessary transformations beforehand, essentially increasing the size of your dataset. The other option is to perform these transformations on a mini-batch, just before feeding it to your machine learning model.

The first option is known as offline augmentation. This method is preferred for relatively smaller datasets, as you would end up increasing the size of the dataset by a factor equal to the number of transformations you perform (For example, by flipping all my images, I would increase the size of my dataset by a factor of 2).

The second option is known as online augmentation, or augmentation on the fly. This method is preferred for larger datasets, as you can’t afford the explosive increase in size. Instead, you would perform transformations on the mini-batches that you would feed to your model. Some machine learning frameworks have support for online augmentation, which can be accelerated on the GPU.
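To make this concrete, here is a minimal NumPy sketch of online augmentation (the function name and the choice of a random horizontal flip are illustrative assumptions, not taken from any particular framework): a generator that transforms each mini-batch just before it is yielded to the model.

```python
import numpy as np

def augment_batches(images, batch_size, rng=None):
    """Yield mini-batches, applying a random horizontal flip on the fly.

    'images' is an array of shape (N, H, W) or (N, H, W, C).
    """
    if rng is None:
        rng = np.random.default_rng()
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        if rng.random() < 0.5:
            # Flip along the width axis (axis 2)
            batch = batch[:, :, ::-1]
        yield batch
```

Because the flipped batches are produced lazily, the dataset on disk never grows; the model simply sees a different random variant of each mini-batch.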

Popular Augmentation Techniques

In this section, we present some basic but powerful augmentation techniques that are popularly used. Before we explore them, let us make one simplifying assumption: we don't need to consider what lies beyond the image's boundary. We'll use the techniques below in a way that keeps this assumption valid.

What would happen if we use a technique that forces us to guess what lies beyond an image’s boundary? In this case, we need to interpolate some information. We’ll discuss this in detail after we cover the types of augmentation.

For each of these techniques, we also specify the factor by which the size of your dataset would increase (a.k.a. the Data Augmentation Factor).

1. Flip

You can flip images horizontally and vertically. Some frameworks do not provide a function for vertical flips, but a vertical flip is equivalent to rotating an image by 180 degrees and then performing a horizontal flip. Below are examples of flipped images.

From the left, we have the original image, followed by the image flipped horizontally, and then the image flipped vertically.

You can perform flips by using any of the following commands, from your favorite packages. Data Augmentation Factor = 2 to 4x

# NumPy. 'img' = A single image.
flip_1 = np.fliplr(img)

# TensorFlow. 'x' = A placeholder for an image.
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)
flip_2 = tf.image.flip_up_down(x)
flip_3 = tf.image.flip_left_right(x)
flip_4 = tf.image.random_flip_up_down(x)
flip_5 = tf.image.random_flip_left_right(x)
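As a quick NumPy sanity check of the workaround mentioned above (a 180 degree rotation followed by a horizontal flip equals a vertical flip):

```python
import numpy as np

img = np.arange(12).reshape(3, 4)            # a tiny dummy "image"
vertical = np.flipud(img)                    # direct vertical flip
workaround = np.fliplr(np.rot90(img, k=2))   # rotate 180, then flip horizontally
print(np.array_equal(vertical, workaround))  # True
```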

2. Rotation

One key thing to note about this operation is that image dimensions may not be preserved after rotation. If your image is a square, rotating it at right angles will preserve the image size. If it's a rectangle, rotating it by 180 degrees would preserve the size. Rotating the image by finer angles, however, will change the final image size. We'll see how we can deal with this issue in the next section. Below are examples of square images rotated at right angles.

The images are rotated by 90 degrees clockwise with respect to the previous one, as we move from left to right.

You can perform rotations by using any of the following commands, from your favorite packages. Data Augmentation Factor = 2 to 4x

# TensorFlow. Placeholders: 'x' = A single image, 'y' = A batch of images
# 'k' denotes the number of 90 degree anticlockwise rotations
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)
rot_90 = tf.image.rot90(x, k=1)
rot_180 = tf.image.rot90(x, k=2)

# To rotate by any angle. In the example below, 'angles' is in radians
shape = [batch, height, width, 3]
y = tf.placeholder(dtype = tf.float32, shape = shape)
rot_tf_180 = tf.contrib.image.rotate(y, angles=3.1415)

# Scikit-Image. 'angle' = Degrees. 'img' = Input Image
# For details about 'mode', check out the interpolation section below.
rot = skimage.transform.rotate(img, angle=45, mode='reflect')
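To see the dimension issue concretely, here is a small framework-agnostic NumPy check: rotating a rectangular array by 90 degrees swaps its height and width, while a 180 degree rotation preserves them.

```python
import numpy as np

img = np.zeros((100, 200))     # rectangular "image": height 100, width 200
rot_90 = np.rot90(img, k=1)    # right-angle rotation: dimensions swap
rot_180 = np.rot90(img, k=2)   # half turn: dimensions preserved
print(rot_90.shape)            # (200, 100)
print(rot_180.shape)           # (100, 200)
```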

3. Scale

The image can be scaled outward or inward. While scaling outward, the final image size will be larger than the original image size. Most image frameworks cut out a section from the new image, with size equal to the original image. We'll deal with scaling inward in the next section, as it reduces the image size, forcing us to make assumptions about what lies beyond the boundary. Below are examples of images being scaled.

From the left, we have the original image, the image scaled outward by 10%, and the image scaled outward by 20%.

You can perform scaling by using the following commands, using scikit-image. Data Augmentation Factor = Arbitrary.

# Scikit-Image. 'img' = Input Image, 'scale' = Scale factor
# For details about 'mode', check out the interpolation section below.
scale_out = skimage.transform.rescale(img, scale=2.0, mode='constant')
scale_in = skimage.transform.rescale(img, scale=0.5, mode='constant')
# Don't forget to crop the images back to the original size (for scale_out)
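If scikit-image is not at hand, the scale-out-then-crop recipe can be sketched in plain NumPy (integer scale factors and nearest-neighbour upscaling are simplifying assumptions made for this illustration, and the function name is our own):

```python
import numpy as np

def scale_out_and_crop(img, factor=2):
    """Upscale by an integer factor, then centre-crop back to the original size."""
    # Nearest-neighbour upscale: repeat each pixel 'factor' times along each axis
    scaled = np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
    h, w = img.shape[:2]
    top = (scaled.shape[0] - h) // 2
    left = (scaled.shape[1] - w) // 2
    return scaled[top:top + h, left:left + w]
```

The centre crop keeps the output the same shape as the input, so the augmented image can be fed to the model unchanged.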

4. Crop

Unlike scaling, we just randomly sample a section from the original image. We then resize this section to the original image size. This method is popularly known as random cropping. Below are examples of random cropping. If you look closely, you can notice the difference between this method and scaling.

From the left, we have the original image, a square section cropped from the top-left, and then a square section cropped from the bottom-right. The cropped sections were resized to the original image size.

You can perform random crops by using the following commands in TensorFlow. Data Augmentation Factor = Arbitrary.

# TensorFlow. 'x' = A placeholder for an image.
original_size = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = original_size)

# Use the following commands to perform random crops
crop_size = [new_height, new_width, channels]
seed = np.random.randint(1234)
x = tf.random_crop(x, size = crop_size, seed = seed)
# Note: resize_images expects the new size as [height, width]
output = tf.image.resize_images(x, size = [height, width])
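The same idea can be sketched framework-free in NumPy (the helper name and parameters are our own): sample a random top-left corner, then slice out the section.

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng=None):
    """Randomly sample a (crop_h, crop_w) section from 'img'."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]
```

In a full pipeline, the cropped section would then be resized back to the original image size, as described above.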

5. Translation

Translation just involves moving the image along the X or Y direction (or both). In the following example, we assume that the image has a black background beyond its boundary and is translated appropriately. This method of augmentation is very useful, as most objects can be located almost anywhere in the image. This forces your convolutional neural network to look everywhere.

From the left, we have the original image, the image translated to the right, and the image translated upwards.

You can perform translations in TensorFlow by using the following commands. Data Augmentation Factor = Arbitrary.

# pad_left, pad_right, pad_top, pad_bottom denote the pixel
# displacement. Set one of them to the desired value and the rest to 0
shape = [batch, height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)

# We use two functions to get our desired augmentation
x = tf.image.pad_to_bounding_box(x, pad_top, pad_left, height + pad_bottom + pad_top, width + pad_right + pad_left)
output = tf.image.crop_to_bounding_box(x, pad_bottom, pad_right, height, width)
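The pad-then-crop trick can also be written directly in NumPy (a sketch for a single-channel image, with a hypothetical helper name; the zero padding plays the role of the black background):

```python
import numpy as np

def translate(img, shift_x=0, shift_y=0):
    """Shift 'img' right by shift_x and down by shift_y, filling with black (0)."""
    # Pad on the opposite side of the shift, then crop back to the original size
    padded = np.pad(img, ((shift_y, 0), (shift_x, 0)), mode='constant')
    return padded[:img.shape[0], :img.shape[1]]
```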

6. Gaussian Noise

Over-fitting usually happens when your neural network tries to learn high frequency features (patterns that occur a lot) that may not be useful. Gaussian noise, which has zero mean, essentially has data points in all frequencies, effectively distorting the high frequency features. This also means that lower frequency components (usually, your intended data) are also distorted, but your neural network can learn to look past that. Adding just the right amount of noise can enhance the learning capability.

A toned down version of this is the salt and pepper noise, which presents itself as random black and white pixels spread through the image. This is similar to the effect produced by adding Gaussian noise to an image, but may have a lower information distortion level.

From the left, we have the original image, the image with added Gaussian noise, and the image with added salt and pepper noise.

You can add Gaussian noise to your image by using the following commands in TensorFlow. Data Augmentation Factor = 2x.

# TensorFlow. 'x' = A placeholder for an image.
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)

# Adding Gaussian noise
noise = tf.random_normal(shape=tf.shape(x), mean=0.0, stddev=1.0, dtype=tf.float32)
output = tf.add(x, noise)
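Salt and pepper noise is described above but not shown in code, so here is a hedged NumPy sketch (it assumes a float image scaled to [0, 1]; the function name and the 'amount' parameter are our own):

```python
import numpy as np

def salt_and_pepper(img, amount=0.05, rng=None):
    """Set a random fraction 'amount' of pixels to pure black or pure white."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = img.copy()
    # One random draw per pixel decides salt, pepper, or leave unchanged
    mask = rng.random(img.shape[:2])
    noisy[mask < amount / 2] = 0.0          # pepper: black pixels
    noisy[mask > 1.0 - amount / 2] = 1.0    # salt: white pixels
    return noisy
```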

Advanced Augmentation Techniques

Real world, natural data can still exist in a variety of conditions that cannot be accounted for by the simple methods above. For instance, take the task of identifying the landscape in a photograph. The landscape could be anything: freezing tundras, grasslands, forests and so on. Sounds like a pretty straightforward classification task, right? You'd be right, except for one thing. We are overlooking a crucial feature in the photographs that would affect the performance: the season in which the photograph was taken.

If our neural network does not understand the fact that certain landscapes can exist in a variety of conditions (snow, damp, bright etc.), it may spuriously label frozen lakeshores as glaciers or wet fields as swamps.

One way to mitigate this situation is to add more pictures, such that we account for all the seasonal changes. But that is an arduous task. Extending our data augmentation concept, imagine how cool it would be to generate effects such as different seasons artificially!

Conditional GANs to the rescue!

Without going into gory detail, conditional GANs can transform an image from one domain into an image in another domain. If you think that sounds too vague, it's not; that's literally how powerful this neural network is! Below is an example of conditional GANs used to transform photographs of summer sceneries into winter sceneries.

Changing seasons using a CycleGAN (Source: https://junyanz.github.io/CycleGAN/)

The above method is robust, but computationally intensive. A cheaper alternative would be something called neural style transfer. It grabs the texture/ambiance/appearance of one image (aka, the “style”) and mixes it with the content of another. Using this powerful technique, we produce an effect similar to that of our conditional GAN (In fact, this method was introduced before cGANs were invented!).

The only downside of this method is that the output tends to look more artistic than realistic. However, there are certain advancements, such as Deep Photo Style Transfer, shown below, that have impressive results.

Deep Photo Style Transfer. Notice how we could generate the effect we desire on our dataset. (Source: https://arxiv.org/abs/1703.07511)

We have not explored these techniques in great depth as we are not concerned with their inner workings. We can use existing trained models, along with the magic of transfer learning, to perform the augmentation.