So how does a neural network become a logical network?

In this section I am going to assume that you are at least somewhat comfortable with the ideas of neural networks. I’ll be throwing around the words bias, weight and activation function and I shan’t be taking the time to explain what I mean beyond this simple diagram.

In order for the neural network to become a logical network, we need to show that an individual neuron can act as an individual logic gate. To show that a neural network can carry out any logical operation, it would be enough to show that a neuron can function as a NAND gate (which it can). However, to make things more beautiful and understandable, let's dive in deeper and show how a neuron can act as each of the gates we will need: the AND and OR gates, as well as a comparison gate of x>0.

Let’s ground this in a simple concrete example. In order to do this I would like to link you to a particular data set on the Tensorflow Playground. An image of a generated set of data from this distribution is below. Orange data points have a value of -1 and blue points have a value of +1. I’ll refer to the coordinates along the x-axis as the variable x1, and the coordinates along the y-axis as variable x2.

Example from Tensorflow playground

In the above figure you can see that the target variable is positive when both x1 and x2 are positive or when both x1 and x2 are negative. If we were to code this up using a logical network we might choose the logic

If (x1>0) AND (x2>0) THEN predict +1

Else if (x1<0) AND (x2<0) THEN predict +1

Else predict -1

In Python we could write this logic up as

expression = lambda x1, x2: \
    (((x1 > 0) and (x2 > 0)) or ((not (x1 > 0)) and (not (x2 > 0)))) * 2 - 1

If you are familiar with logical gates you might notice that an inner part of this expression is an XNOR gate.

XNOR = lambda A, B: (A and B) or ((not A) and (not B))
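
As a quick sanity check, you can evaluate these two lambdas on a few inputs (the printed values are simply what the expressions above evaluate to):

print(XNOR(True, True), XNOR(True, False), XNOR(False, False))
# True False True
print(expression(0.5, 0.5), expression(-0.5, 0.5), expression(-0.5, -0.5))
# 1 -1 1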

What I want to show you is that we can build an efficient neural network for this logical expression.

Of course in this example, the most obvious thing would be to use the cross feature x1*x2 and make predictions using that. However, what I want to show is that a good solution is possible without using x1*x2, and to dive into the structure and interpretation of the network that emerges when this feature is not created. Also note that the logical neural network we are going to create could be extended to three variables or more. You could imagine needing a variable that acts as a 3-input XNOR gate, which we could not create by feature crossing pairs of variables. We also wouldn't want to create all crosses of 3 variables, i.e. x1*x2*x3, as this is likely to make our number of features explode.

So let's go term by term through the logical expression, which will make the right predictions for this sample data set, and figure out what weights our neurons will need. Remember, the expression we want to emulate with our neural network is

(((x1 > 0) and (x2 > 0)) or ((not (x1 > 0)) and (not (x2 > 0)))) * 2 - 1

Activation function

I want to quickly address the activation function we will be using for our neurons: the Sigmoid function. The reason we choose it is that the output of the Sigmoid function is close to the outputs of logical gates. Logical gates output either 1 or 0, and the Sigmoid function outputs ~1 or ~0 for all values of z that are not close to 0, i.e. |z| >> 0. This means the Sigmoid function is a good choice to help our neurons emulate the AND and OR gates we need.
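
To make this concrete, here is a minimal sketch of the Sigmoid function in plain Python, showing how it saturates towards 0 and 1 away from z=0:

from math import exp

# Sigmoid activation: squashes any real z into the range (0, 1)
sigmoid = lambda z: 1 / (1 + exp(-z))

for z in [-10, -5, 0, 5, 10]:
    print(z, round(sigmoid(z), 4))
# -10 -> 0.0, -5 -> 0.0067, 0 -> 0.5, 5 -> 0.9933, 10 -> 1.0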

Comparison x1>0

This is the innermost part of our function. We can calculate this expression quite easily using a single neuron with a Sigmoid activation function. In this case there will only be one variable input to the neuron, x1, and we would like the neuron to output a value close to 1 if x1 is positive and a value close to 0 if x1 is negative. Although it isn't important for this problem, let's say that we want an output of 0.5 if x1 is exactly zero.

We have the input to the activation function of z = w1*x1 + b1. Looking at the figure of the Sigmoid function above, and thinking about our criteria, we can deduce that we want b1=0 and w1 to be a large positive number. In fact, as w1 goes towards infinity, the output of this neuron gets closer and closer to the output of the logical comparison x1>0. However, let's take a more moderate value and say w1=10.

In the table below we show the output of this comparison neuron (right-hand column) for different values of x1 and for w1=10. You can see that it behaves very similarly to x1>0.

Neural comparison truth table

Note that we can apply the same logic to calculate the comparison x2>0, setting w2=10 and b2=0.
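
As a small sketch, here is this comparison neuron in plain Python with the w1=10, b1=0 values chosen above (the helper name greater_than_zero is just mine for illustration):

from math import exp

sigmoid = lambda z: 1 / (1 + exp(-z))

# Comparison neuron emulating x1 > 0, with w1 = 10 and b1 = 0
greater_than_zero = lambda x1, w1=10, b1=0: sigmoid(w1 * x1 + b1)

for x1 in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(x1, round(greater_than_zero(x1), 4))
# -1.0 -> 0.0, -0.5 -> 0.0067, 0.0 -> 0.5, 0.5 -> 0.9933, 1.0 -> 1.0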

AND Gate

Now that we can do greater-than comparisons, the next innermost part of the target logical expression is the AND operator. In this case the input to the Sigmoid function will be z=w3*a1 + w4*a2 + b3. Here w3 and w4 are weights and a1 and a2 are the activations of the first layer of neurons. The variable a1 is very close to one if x1>0 and very close to zero if x1<0; a2 behaves the same way with respect to x2.

To emulate the AND operator we would like to set the weights and bias such that the output of the Sigmoid is very close to one if a1 and a2 are both ~1, and close to zero otherwise. A good solution is to set b3=-3/2 * w3 and w4=w3: the input then becomes z = w3*(a1 + a2 - 3/2), which is positive only when both activations are close to one. You can do some calculations yourself and see how this choice of weights fulfils our criteria (there are also some examples below). As w3 goes to infinity, and w4 and b3 go to their corresponding limits, this neuron becomes more and more like a perfect AND gate. However, we will choose a more moderate value of w3=10 and let b3 and w4 follow.

In the table below we show the output of this AND neuron (right-hand column) for different values of a1 and a2 and for w3=10. You can see that it behaves very similarly to an AND gate.

Neural AND gate truth table
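
For reference, the same AND neuron written as a small Python sketch with w3=w4=10 and b3=-15:

from math import exp

sigmoid = lambda z: 1 / (1 + exp(-z))

# AND neuron: w3 = w4 = 10, b3 = -3/2 * w3 = -15
AND_neuron = lambda a1, a2: sigmoid(10 * a1 + 10 * a2 - 15)

for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a1, a2, round(AND_neuron(a1, a2), 4))
# (0,0) -> 0.0, (0,1) -> 0.0067, (1,0) -> 0.0067, (1,1) -> 0.9933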

OR Gate

Moving outward, the next part of our logical expression is the OR gate. For an OR gate we want the output to be close to 1 when one or more of the inputs is ~1, and close to zero otherwise. The input to the Sigmoid function is z=w7*a3 + w8*a4 + b5. Here a3 represents whether both x1 and x2 were positive, and a4 represents whether both x1 and x2 were negative.

By comparing back to the Sigmoid function and considering the criteria above, we deduce that we can solve this by setting b5=-1/2 * w7 and w8=w7: the input then becomes z = w7*(a3 + a4 - 1/2), which is positive whenever at least one of the activations is close to one. Again this approaches a perfect OR gate as w7 goes to infinity and b5 and w8 go to their corresponding limits. However, let's choose a modest value of w7=10 and let b5 and w8 follow.

Neural OR gate truth table
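
And the corresponding OR neuron as a Python sketch with w7=w8=10 and b5=-5:

from math import exp

sigmoid = lambda z: 1 / (1 + exp(-z))

# OR neuron: w7 = w8 = 10, b5 = -1/2 * w7 = -5
OR_neuron = lambda a3, a4: sigmoid(10 * a3 + 10 * a4 - 5)

for a3, a4 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a3, a4, round(OR_neuron(a3, a4), 4))
# (0,0) -> 0.0067, (0,1) -> 0.9933, (1,0) -> 0.9933, (1,1) -> 1.0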

Final piece - multiply and add

For this outermost part of our expression you would either need to multiply the output of your neural network by 2 and subtract 1, or else have a final neuron with a linear activation function, weight w=2 and bias b=-1.
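
In code this final step is just a linear rescaling of the last activation (call it a5):

# Final linear neuron: weight 2, bias -1, maps activations in (0, 1) to values in (-1, +1)
scale = lambda a5: 2 * a5 - 1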

Putting it all together

We now have all of the pieces we need to create our neural network emulator of the logical expression above. So putting everything together the full architecture of the network looks like this

Neural architecture for expression

Where the values of the weights and biases are as such

You might notice that the weights of the NOT a1 AND NOT a2 neuron aren't the same as those in the a1 AND a2 neuron. This is due to the NOT gates, which I didn't want to get bogged down in here. It may be fun, however, to try to figure out those weights yourself.

Using this full network we can test input values to predict an output value.
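
As a sketch of what that looks like end to end, here is the whole network in plain Python. The first-layer, AND, OR and output weights are the values derived above; the weights I use for the NOT a1 AND NOT a2 neuron (w=-10, b=+5) are one consistent choice rather than necessarily the exact values shown in the table:

from math import exp

sigmoid = lambda z: 1 / (1 + exp(-z))

def network(x1, x2):
    # Layer 1: comparison neurons x1 > 0 and x2 > 0
    a1 = sigmoid(10 * x1)
    a2 = sigmoid(10 * x2)
    # Layer 2: a1 AND a2, and NOT a1 AND NOT a2
    a3 = sigmoid(10 * a1 + 10 * a2 - 15)
    a4 = sigmoid(-10 * a1 - 10 * a2 + 5)  # assumed weights for the NOT ... AND NOT ... neuron
    # Layer 3: OR gate
    a5 = sigmoid(10 * a3 + 10 * a4 - 5)
    # Output: linear neuron with weight 2 and bias -1
    return 2 * a5 - 1

for x1, x2 in [(1, 1), (-1, 1), (1, -1), (-1, -1)]:
    print(x1, x2, round(network(x1, x2), 3))
# (1, 1) -> ~+1, (-1, 1) -> ~-1, (1, -1) -> ~-1, (-1, -1) -> ~+1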

Looks good! It has the output we wanted from the logical expression, or at least it is very close.

Training and Learning

We have now shown that this neural network is possible, so the remaining question is: is it possible to train? Can we expect that, if we simply fed in the data drawn from the graph above after defining the layers, number of neurons and activation functions correctly, the network would train to this solution?

No, not always, and not even often. The problem, as with many neural networks, is one of optimization. In training, this network will often get stuck in a local minimum even though a near-perfect solution exists. This is where your optimization algorithm may play a large role, which is something Tensorflow Playground doesn't allow you to change and may be the subject of a future post.

Your Turn

Now I recommend you go to Tensorflow Playground and try to build this network yourself, using the architecture (as shown in the diagrams both above and below) and the weights in the table above. The challenge is to do it using only x1 and x2 as features and to build up the neural network manually. Note that due to peculiarities of the Tensorflow Playground you should only add the first three layers. The output layer where the scaling happens is obscured, but if you build up the first three layers and then train the network for a very short time you should get results like those below.

In the playground you can edit the weights by clicking on the lines which connect neurons. You can edit the bias values by clicking on the small squares at the bottom left corner of each neuron. You should be able to achieve a loss of 0.000. Remember to set your activation function to Sigmoid.

Ideas:

After you have built this network by manually inputting the weights, why not try to train the weights of this network from scratch instead of constructing it manually? I have managed to do this after many trials, but I believe it is quite sensitive to the seeding and often ends up in local minima. If you find a reliable way to train this network using these features and this network structure, please reach out in the comments.

Try to build this network using only this number of neurons and layers. In this article I have shown that it is possible with only this many neurons; if you introduce any more nodes then you will certainly have some redundant neurons. Although, with more neurons/layers, I have had better luck training a good model consistently.

I've shown you the weights and biases needed for two logical gates and a comparison; can you find the weights and biases for the other logical gates in the table? Particularly the NAND gate (NOT AND), which is a universal gate, i.e. if a neuron can implement this gate (or an approximation of it) then that proves a neural network is capable of any computational task.

Next Time

I hope you have enjoyed reading this article and the playground experiment as much as I enjoyed discovering it.

I have ideas to extend this demonstration further by producing a Jupyter notebook to implement something along these lines in Keras. Then you can play with training neural network logical gate emulators yourself.