For this experiment, two schemes were generated, “A” and “B”. Each scheme was a list of transformations that were applied sequentially to a randomly generated string of 7 characters.

If you want to take the Human Intuition Test before you see what transformations were used in the experiment, then you can do so now:

Take the Human Intuition Test!

Okay, here are the transformations for Scheme A and Scheme B:

Each transformation script uses a global “cursor” class instance that navigates around the string and modifies letters via helper methods. As an example, the last transformation in Scheme B is:

cursor.moveTo(4).set(cursor.getRandomCon());

This transformation moves the cursor to the 5th letter (the index is 4 because positions are zero-based) and sets it to a random consonant.
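The article doesn’t show the cursor helpers themselves, but a minimal Python sketch of how such a chainable cursor might work (the class body and the `text` helper are assumptions; only `moveTo`, `set`, and `getRandomCon` appear in the original snippet) could look like this:

```python
import random

CONSONANTS = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "AEIOU"]

class Cursor:
    """Sketch of a chainable cursor over a mutable 7-letter string."""

    def __init__(self, text):
        self.chars = list(text)   # strings are immutable, so work on a char list
        self.pos = 0

    def moveTo(self, index):
        self.pos = index          # zero-based, as in the article
        return self               # return self so calls can be chained

    def set(self, letter):
        self.chars[self.pos] = letter
        return self

    def getRandomCon(self):
        return random.choice(CONSONANTS)

    def text(self):
        return "".join(self.chars)

cursor = Cursor("WIYLEWS")
cursor.moveTo(4).set(cursor.getRandomCon())   # replaces the 5th letter ("E")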

Gathering the Data From the Humans

Below is a screenshot of what the human test looked like within the web application. Delivering the test this way let the web application record each human’s accuracy. Getting real people to take the online test was the next hurdle.

The Human Intuition Online Test: Wrong Answer!

Duncan Watts’s book, “Everything Is Obvious: *Once You Know the Answer”, provided the solution for gathering human data. In the book, Watts describes some non-obvious results from using human workers in the Mechanical Turk ecosystem. Mechanical Turk is an online service run by Amazon for organizing and deploying “human intelligence tasks” that real humans complete.

After a “human intelligence task” was set up on Mechanical Turk, 73 people signed up for and completed the task online in less than 5 hours.

(Read about experiences using Mechanical Turk in this article: “How to Attract “Turkers” and be a Mechanical Turk Hero!”)

Gathering the Data For the Machines

Getting the labelled data for training the neural network was the easy part: the web application used the configured schemes to generate random strings along with their corresponding labels, “A” or “B”.
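As a sketch of that generation step, the loop below produces (string, label) pairs. The actual transformation lists aren’t reproduced here, so `SCHEME_A` and `SCHEME_B` are hypothetical single-transformation stand-ins:

```python
import random
import string

def random_string(n=7):
    """Generate a random uppercase string of length n."""
    return "".join(random.choice(string.ascii_uppercase) for _ in range(n))

# Hypothetical stand-ins for the article's (unlisted) transformation lists.
SCHEME_A = [lambda s: s[:4] + "T" + s[5:]]
SCHEME_B = [lambda s: s[:4] + "K" + s[5:]]

def generate_labelled_data(count):
    """Yield (string, label) pairs by applying a random scheme to a random string."""
    for _ in range(count):
        label = random.choice(["A", "B"])
        s = random_string()
        for transform in (SCHEME_A if label == "A" else SCHEME_B):
            s = transform(s)
        yield s, label

data = list(generate_labelled_data(100))
```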

Developing the Neural Network

After the analysis of the human participants was completed, a neural network needed to be set up for comparison with the human participants.

View the human results here: “Are you Intuitive? Challenge my Machine!”

How does the process of creating a neural network start?

The data set will ultimately determine the structure of the optimal network. For instance, doing image recognition on the ImageNet data set requires a deep network with many convolutional and pooling layers in order to extract increasingly complex patterns. It is also often recommended to keep hidden layers as small as possible to avoid overfitting.

This data set isn’t as complex as a two-dimensional color image, so it was reasonable to start with one hidden layer and move on to more complex networks with more layers as needed.

What Should the Inputs Look Like?

Feeding letters straight into the neural network is not ideal, because every important feature should be represented in a numerical form that can be manipulated mathematically.

A simple way to transform each 7-letter string into numerical input would be to map each letter to a corresponding digit between 1 and 26, so that “A” would be 1 and “Z” would be 26. Mathematically, however, “Z” could be treated as more important because of its larger magnitude, which could lead to undesired results during training.

There is another option that still represents each letter and its position but does not weight letters differently. This option is to transform each string into its one-hot encoding representation.

The one-hot encoding format transforms each letter into a 26-dimensional array that is ZERO for all entries except for a single ONE in the position of the letter that is “ON”.

For example, take the first letter, “W”, in “WIYLEWS”. For the first letter you could say the “W” is “ON” and the rest of the possible letters are “OFF”. In a one-hot encoding representation, this would be a vector with a ONE for the 23rd letter and a ZERO for the rest. As a one-dimensional array, that would look like this:

[0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,0, 0] — “W”

Repeating the process for the entire string gives:

[0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,0, 0] — “W”
[0,0,0,0,0, 0,0,0,1,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0] — “I”
[0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,1, 0] — “Y”
[0,0,0,0,0, 0,0,0,0,0, 0,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0] — “L”
[0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0] — “E”
[0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,1,0,0, 0] — “W”
[0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0, 0,0,0,1,0, 0,0,0,0,0, 0] — “S”

Each of these arrays is concatenated into one big ordered array that represents which of the possible letters is “on” or “off” at each of the 7 positions. Each input is therefore a one-dimensional array of length 7 × 26 = 182.

This method would give the neural network access to every possible bit of information that was available.
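The encoding described above can be sketched in a few lines of Python (the function name is an assumption; the mapping itself follows the example directly):

```python
import string

def one_hot_word(word):
    """Concatenate a 26-entry one-hot vector per letter: 7 letters x 26 = 182 values."""
    encoding = []
    for letter in word:
        vec = [0] * 26
        vec[string.ascii_uppercase.index(letter)] = 1  # "A" -> index 0, "Z" -> index 25
        encoding.extend(vec)
    return encoding

x = one_hot_word("WIYLEWS")   # 182-element input vector, exactly seven ones
```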

The Single Hidden Layer

With the proper inputs created, a neural network could now be developed. The first artificial neural network explored was a simple one that had a single hidden layer. This hidden layer had the same number of neurons as the number of inputs (182).

The code for the neural network with one hidden layer is here:

The goal was to compare the learning capabilities of the neural network with those of the humans. The best human performers hit 80% accuracy after an average of 43 examples, so the question was how many training iterations the neural network would need to reach 80% accuracy.

During training, a batch size of 1 was used (adjusting the weights after every input) for one epoch (a single pass through the data set) to mimic the one-example-at-a-time training the humans received.

Here are the hyperparameters and configurations used:

Batch Size: 1

Epochs: 1

Non-linear Activation Function: None (simple linear activation)

Gradient Descent Learning Rate: 0.03

Initial Random Weight Function: Random Uniform with stddev = 0.01

Initial Random Weights for Bias: Zeros

Figure: single hidden layer as a first attempt (the 3 inputs shown represent the actual 182 inputs used in the real neural network)
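The article’s actual implementation is embedded elsewhere; as a rough NumPy sketch of the configuration listed above (182 inputs, a 182-unit linear hidden layer, batch size 1, one epoch, learning rate 0.03, small random initial weights, zero biases), one could write the following. The sigmoid output unit, cross-entropy loss, and the toy labelling rule are all assumptions, since the original doesn’t specify the output layer:

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID = 182, 182
lr = 0.03

# Small random initial weights and zero biases, per the configuration list.
W1 = rng.normal(0.0, 0.01, size=(N_IN, N_HID))
b1 = np.zeros(N_HID)
W2 = rng.normal(0.0, 0.01, size=(N_HID, 1))
b2 = np.zeros(1)

def forward(x):
    h = x @ W1 + b1                            # linear hidden layer (no activation)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output (assumed)
    return h, p

def train_step(x, y):
    """One SGD update (batch size 1) with sigmoid + cross-entropy loss."""
    global W1, b1, W2, b2
    h, p = forward(x)
    d_out = p - y              # dL/dlogit for sigmoid + cross-entropy
    dW2 = np.outer(h, d_out)
    d_h = W2 @ d_out           # gradient flowing back into the hidden layer
    dW1 = np.outer(x, d_h)
    W2 -= lr * dW2; b2 -= lr * d_out
    W1 -= lr * dW1; b1 -= lr * d_h

# One epoch over a toy labelled set (y = 1 for scheme "A", 0 for "B").
X = rng.integers(0, 2, size=(43, N_IN)).astype(float)
y = (X[:, 0] > 0).astype(float)   # toy rule standing in for the real schemes
for xi, yi in zip(X, y):
    train_step(xi, np.array([yi]))
```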

After the hyperparameters were optimized, the neural network achieved an average of 77% accuracy on the verification data after 43 training rounds. Below is a graph of the accuracy across 20 random data sets: