A single-hidden-layer fully-connected neural network used for classification

High-level deep learning libraries such as TensorFlow, Keras, and PyTorch do a wonderful job of making the life of a deep learning practitioner easier by hiding many of the tedious inner workings of neural networks. As great as this is for deep learning, it comes with the minor downside of leaving many newcomers to pick up that foundational understanding elsewhere. Our goal here is simply to provide a one-hidden-layer fully-connected neural network classifier written from scratch (no deep learning libraries) to help chip away at that mysterious black-box feeling you might have about neural networks. The GitHub repo for this project is at:

The provided neural network classifies a dataset describing geometrical properties of kernels belonging to three classes of wheat (you can easily replace this with your own custom dataset). An L2 loss function is assumed, and a sigmoid transfer function is used on every node in the hidden and output layers. The weight-update method uses the delta rule, which is gradient descent applied to the L2 (squared-error) loss.
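Since the sigmoid and its derivative appear in both the forward and backward passes described below, here is a minimal sketch of the two helpers, assuming plain Python with no numeric libraries (the function names are ours, not necessarily those used in the repo):

```python
import math

def sigmoid(x):
    # Sigmoid transfer function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(output):
    # Derivative of the sigmoid written in terms of its output:
    # if y = sigmoid(x), then dy/dx = y * (1 - y).
    return output * (1.0 - output)
```

Writing the derivative in terms of the node's output (rather than its pre-activation) is convenient because the forward pass already stores that output on each node.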

For the remainder of this article, we outline the general steps taken by our code to build and train a neural network for class prediction. For more of my blogs, tutorials, and projects on Deep Learning and Reinforcement Learning, please check out my Medium and my GitHub.

Our steps toward building a single-hidden-layer neural network classifier from scratch

1. Setting up n-fold cross-validation

For our n-fold cross-validation, we randomly permute all N example indices, then take consecutive blocks of size ~N/n as our folds. Each fold serves as the test set for one of the n cross-validation experiments, and the remaining (complement) indices serve as the training set.
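A minimal sketch of this split, assuming the wheat-seeds dataset's 210 examples and a helper name of our own choosing:

```python
import random

def cross_validation_folds(n_examples, n_folds, seed=0):
    # Randomly permute all example indices, then slice consecutive
    # blocks of size ~n_examples / n_folds as the folds.
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    fold_size = n_examples // n_folds
    folds = [indices[i * fold_size:(i + 1) * fold_size]
             for i in range(n_folds)]
    # Hand out any leftover indices, one per fold, starting from the first.
    for j, idx in enumerate(indices[n_folds * fold_size:]):
        folds[j].append(idx)
    return folds

# Each fold is the test set of one experiment; its complement is the training set.
folds = cross_validation_folds(n_examples=210, n_folds=5)
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
```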

2. Building and training the neural network model

We have 2 fully-connected layers of weights: one connecting the input-layer nodes to the hidden-layer nodes, and one connecting the hidden-layer nodes to the output-layer nodes. Without any bias terms, this totals n_input*n_hidden + n_hidden*n_output weights in the network. We initialize each weight by sampling from a normal distribution.
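A sketch of this initialization, assuming a dictionary-per-node representation (our choice for illustration) and a standard normal for the weight samples:

```python
import random

def initialize_network(n_input, n_hidden, n_output, seed=0):
    # Each node is a dict holding its incoming weights; its 'output' and
    # 'delta' attributes are filled in later by the forward and backward passes.
    rng = random.Random(seed)
    hidden_layer = [{'weights': [rng.gauss(0.0, 1.0) for _ in range(n_input)]}
                    for _ in range(n_hidden)]
    output_layer = [{'weights': [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]}
                    for _ in range(n_output)]
    return [hidden_layer, output_layer]

# E.g., 7 input features, 5 hidden nodes, and 3 wheat classes gives
# 7*5 + 5*3 = 50 weights in total (no bias terms).
network = initialize_network(n_input=7, n_hidden=5, n_output=3)
```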

Each node (neuron) has 3 attributes stored in memory: a list of weights connecting it to its input nodes, an output value calculated by forward-passing some input, and a delta value representing its error from backward-propagating classification mismatches at the output layer. These 3 attributes are intertwined and are updated through a three-process cycle of:

( A ) Forward-passing a training example to update the node outputs given our current node weights. Each node output is computed as a weighted sum of the previous layer's outputs (no bias terms) followed by a sigmoid transfer function.

( B ) Backward-passing classification errors to update the node deltas given our current node weights. To understand more about these deltas, we suggest reading https://en.wikipedia.org/wiki/Delta_rule, as we use the same delta-rule equations derived by applying gradient descent to an L2 loss function.

( C ) Updating the current weights using both the updated node outputs and the updated node deltas (the delta-rule weight update).

The training cycle of ( A ) → ( B ) → ( C ) is performed on each training example in each training epoch, as sketched below.
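Here is a minimal sketch of one such cycle, reusing the sigmoid helpers and network structure from the earlier sketches; the learning rate, the one-hot encoding of the target class, and the train_X / train_y placeholders are assumptions on our part:

```python
def forward_pass(network, inputs):
    # (A) Propagate inputs layer by layer: each node's output is the
    # sigmoid of the weighted sum of the previous layer's outputs.
    for layer in network:
        outputs = []
        for node in layer:
            activation = sum(w * x for w, x in zip(node['weights'], inputs))
            node['output'] = sigmoid(activation)
            outputs.append(node['output'])
        inputs = outputs
    return inputs  # outputs of the final (output) layer

def backward_pass(network, expected):
    # (B) Propagate errors backward to compute each node's delta.
    for l in reversed(range(len(network))):
        for j, node in enumerate(network[l]):
            if l == len(network) - 1:
                # Output layer: error is (target - output).
                error = expected[j] - node['output']
            else:
                # Hidden layer: error is the delta-weighted sum over next-layer nodes.
                error = sum(nxt['weights'][j] * nxt['delta'] for nxt in network[l + 1])
            node['delta'] = error * sigmoid_derivative(node['output'])

def update_weights(network, inputs, lr):
    # (C) Delta-rule update for every weight: w += lr * delta * input.
    for l, layer in enumerate(network):
        layer_inputs = inputs if l == 0 else [n['output'] for n in network[l - 1]]
        for node in layer:
            for j, x in enumerate(layer_inputs):
                node['weights'][j] += lr * node['delta'] * x

# One epoch over the training set, with one-hot encoded targets.
for row, label in zip(train_X, train_y):
    forward_pass(network, row)                                  # ( A )
    expected = [1.0 if c == label else 0.0 for c in range(3)]   # 3 wheat classes
    backward_pass(network, expected)                            # ( B )
    update_weights(network, row, lr=0.3)                        # ( C )
```

Note that because the error is defined as (target − output), adding lr * delta * input moves each weight down the gradient of the L2 loss.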

3. Making class predictions

After training, we can simply use the model to make class predictions on our test examples by taking the argmax of the output received from forward-passing the test examples through the trained neural network. The accuracy score is the intuitive one: the number of correctly classified examples divided by the total number of examples, computed on both the training and test sets from n-fold cross-validation.
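A sketch of this prediction and scoring step, continuing with the helpers assumed above:

```python
def predict(network, row):
    # Class prediction = argmax over the output-layer activations.
    outputs = forward_pass(network, row)
    return outputs.index(max(outputs))

def accuracy(network, X, y):
    # Fraction of examples whose predicted class matches the true label.
    correct = sum(predict(network, row) == label for row, label in zip(X, y))
    return correct / len(y)
```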