How neural networks work

Recently there has been a great deal of buzz around the term “neural network” in computer science, and it has attracted attention from many people. But what is this all about, how do these networks work, and are they really beneficial?

Essentially, neural networks are composed of layers of computational units called neurons, with connections between the layers. These networks transform data layer by layer until they can classify it as an output. Each neuron multiplies an incoming value by some weight, sums the results with other values arriving at the same neuron, adjusts the resulting number by the neuron’s bias, and then normalizes the output with an activation function.
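The computation a single neuron performs can be sketched in a few lines. This is a minimal illustration, not code from any particular library; the function name, weights, and the choice of a sigmoid activation are all assumptions for the example.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Multiply each input by its weight and sum the results,
    # then adjust by the neuron's bias.
    z = np.dot(inputs, weights) + bias
    # Normalize the output with an activation function (sigmoid here).
    return 1.0 / (1.0 + np.exp(-z))
```

For example, `neuron(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1)` produces a single value between 0 and 1, which can then feed into neurons in the next layer.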

Iterative learning process

A key feature of neural networks is an iterative learning process in which records (rows) are presented to the network one at a time, and the weights associated with the input values are adjusted each time. After all cases have been presented, the process often starts over. During this learning phase, the network trains by adjusting the weights to predict the correct class label of the input samples.

Advantages of neural networks include their high tolerance to noisy data, as well as their ability to classify patterns on which they have not been trained. The most popular neural network algorithm is the backpropagation algorithm.

Once a network has been structured for a particular application, that network is ready to be trained. To start this process, the initial weights (described in the next section) are chosen randomly. Then the training (learning) begins.

The network processes the records in the “training set” one at a time, using the weights and functions in the hidden layers, then compares the resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights for application to the next record.

This process occurs repeatedly as the weights are tweaked. During the training of a network, the same set of data is processed many times as the connection weights are continually refined.
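The training loop described above can be sketched with a single sigmoid neuron learning the OR function. This is a hypothetical toy example: the learning rate, epoch count, and use of the cross-entropy gradient are assumptions for illustration, not details from the post.

```python
import numpy as np

# Training set: inputs and desired outputs (the OR function).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # initial weights chosen randomly
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The same set of data is processed many times (epochs)
# as the connection weights are continually refined.
for epoch in range(1000):
    for xi, ti in zip(X, y):          # one record at a time
        out = sigmoid(np.dot(w, xi) + b)
        err = out - ti                # compare output against desired output
        # For a sigmoid output with cross-entropy loss, the gradient
        # propagated back to the weights simplifies to err * input.
        w -= lr * err * xi            # adjust weights for the next record
        b -= lr * err

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
```

After repeated passes over the same four records, `preds` matches the desired outputs `[0, 1, 1, 1]`.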

So what’s so hard about that?

One of the challenges for beginners learning about neural networks is understanding what exactly goes on at each layer. We know that after training, each layer extracts progressively higher-level features of the input, until the final layer essentially makes a decision about what the input represents. How is this done?

Instead of exactly prescribing which feature we want the network to amplify, we can let the network make that decision. Let’s say we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance.

Popular types of neural networks and their usage

In this post on neural networks for beginners, we’ll look at autoencoders, convolutional neural networks, and recurrent neural networks.

Autoencoders

Autoencoders grew out of the observation that random initialization is a bad idea and that pre-training each layer with an unsupervised learning algorithm can yield better initial weights. Deep Belief Networks are one example of such unsupervised pre-training. There have been a few recent research attempts to revive this area, for example, using variational methods for probabilistic autoencoders.

They are rarely used in practical applications. Once batch normalization allowed for even deeper networks, and residual learning made it possible to train arbitrarily deep networks from scratch, layer-wise pre-training fell out of favor. Still, with appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques.

Let’s look at two interesting practical applications of autoencoders:

• In data denoising, a denoising autoencoder built from convolutional layers can efficiently denoise medical images.

A stochastic corruption process randomly sets some of the inputs to zero, forcing the denoising autoencoder to predict the missing (corrupted) values from the values that remain.

• Dimensionality reduction for data visualization uses methods such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). These have been used in conjunction with neural network training to increase model prediction accuracy. MLP prediction accuracy also depends greatly on the network architecture, the pre-processing of the data, and the type of problem for which the network was developed.
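The stochastic corruption step used by a denoising autoencoder is simple to sketch. This is an assumed, minimal version: the function name, the drop probability, and the masking scheme are illustrative choices, not a specific library API.

```python
import numpy as np

def corrupt(x, drop_prob, rng):
    # Randomly set a fraction of the input values to zero.
    # The autoencoder is then trained to reconstruct the
    # original (uncorrupted) input from this damaged version.
    mask = rng.random(x.shape) >= drop_prob
    return x * mask
```

During training, each batch is passed through `corrupt` before being fed to the encoder, while the loss is computed against the clean original.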

Convolutional Neural Networks

ConvNets derive their name from the “convolution” operator. The primary purpose of convolution in a ConvNet is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features from small squares of input data. ConvNets have been successful in fields such as image recognition and face detection.

In face detection work, researchers have used a CNN cascade for fast detection. The detector evaluates the input image at low resolution to quickly reject non-face regions, then carefully processes the challenging regions at a higher resolution for accurate detection.

Calibration nets were also introduced in the cascade to accelerate detection and improve bounding box quality.
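The convolution operation that gives these networks their name can be sketched in a few lines: a small square of weights (the kernel) slides over the image, and at each position the overlapping pixel neighborhood is multiplied element-wise by the kernel and summed. This is a plain, unoptimized illustration with "valid" padding; real frameworks use much faster implementations.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every position where it fully overlaps
    # the image, computing a weighted sum of the pixel neighborhood.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Convolving a 5x5 image with a 3x3 kernel produces a 3x3 feature map; in a trained ConvNet, the kernel weights are learned so each feature map responds to a particular pattern, such as an edge or a texture.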