Years ago in college, I took a course on the Philosophy of A.I. and Machine Learning. This course had me thinking about computers in ways I’d never thought of them before—not as abstract, magical black boxes, but as mechanical devices using simple rules. Understanding how these simple rules lead to complex processes is the key to understanding machine learning.

In this article I want to demonstrate the basics of Machine Learning, one of the more advanced, cutting-edge areas of Computer Science. And as in the previous articles of this series, we’re not going to need any computers. Instead, we’ll use willing participants (a group of students) to create a real-life “machine” that can learn.

But first, let’s review how computers make decisions.

Every computer is like a river

In my previous article How Computers Work, I described how you can visualize electricity flowing through different switches in a computer like water flowing through different paths down a mountainside. If you could control the flow of the river at many different points, and if the river flows fast enough, you’d have a rudimentary computer.

Conceptually, every fork in the river represents a decision between 0 and 1 (a true/false switch). If you wanted the mountain river to become a not-so-instant messenger, you’d need 26 different paths to represent the alphabet. If I’m at the top controlling the direction, I could use these different paths to write messages to someone at the bottom.
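As a rough illustration of the fork idea, here is a small Python sketch. It assumes every fork is a two-way choice (0 for left, 1 for right), so five forks in a row give 2^5 = 32 possible paths, enough to cover the 26 letters. The names `letter_to_forks` and `forks_to_letter` are invented for this example.

```python
import string

FORKS = 5  # 2**5 = 32 paths, enough for 26 letters (paths 26-31 go unused)

def letter_to_forks(letter):
    """Return the list of 0/1 fork decisions that leads to `letter`."""
    index = string.ascii_uppercase.index(letter.upper())
    return [(index >> bit) & 1 for bit in reversed(range(FORKS))]

def forks_to_letter(forks):
    """Follow the fork decisions downhill and return the letter at the bottom."""
    index = 0
    for decision in forks:
        index = index * 2 + decision
    return string.ascii_uppercase[index]

# The person at the top sends "HI" one letter at a time.
message = "HI"
paths = [letter_to_forks(ch) for ch in message]
decoded = "".join(forks_to_letter(path) for path in paths)
print(paths, decoded)
```

Each letter costs five yes/no decisions here, which is the sense in which a handful of simple switches can carry a message.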

With many more paths for my river, I could program a calculator.

And with many, many more paths — we could have an artificially intelligent mountain river that can leverage machine learning techniques to easily beat the world’s best chess master.

Add many, many more paths still, and maybe we would have a mountain that thinks, has consciousness, and acts the way we imagine computers will one day act. It really has nothing to do with electricity or circuitry, any more than it has to do with water and mountains. (Speed and size definitely help, though.)

From decisions to learning

It’s easy enough to understand how switches make decisions. But how does a computer change the way it makes decisions — in other words, how can it learn?

The basic approach to learning is the same with machines as it is with animals, businesses, and people: Reinforce good choices, discourage bad ones. This simple idea lends itself beautifully to a demonstration that requires no technology at all — just lots of Starbursts.

Pavlov could have taught machines to learn.

In creating this activity, I was inspired by a tech-less Machine Learning system called MENACE. MENACE is a “computer” made up of more than 150 matchboxes that “plays” Tic-Tac-Toe against a human opponent.

At first, moves are random choices. But over time, as the better moves are reinforced, the machine will eventually only win or draw. It’s a fascinating approach that dates back to the 1960s, but unfortunately, because of the number of matchboxes and the time it takes to train, it doesn’t translate easily to a one-hour session with a younger class.

Starbursts, on the other hand, have a few great properties that make them an ideal substitute for both technology and matchboxes:

They come in many different colors (more on this in a moment).

They are easy to manipulate.

You can make eating them part of the algorithm.

Before getting into the exercise, I’ll quickly cover the game we’ll be playing. Rather than play Tic-Tac-Toe, which has many different permutations of the game board, we’ll play another classic game called Nim.

Understanding Nim

Nim is a mathematical game in which players take turns taking 1, 2, or 3 marbles from a row of marbles. The player who takes the last marble in the row loses. With the right setup, you can always win Nim, regardless of your opponent’s moves, which makes for a great parlor trick. Here’s how:

Starting with 13 marbles, have your opponent go first. Whatever number your opponent takes (1, 2, or 3), take the number that brings the total removed in that round to 4. If your opponent takes 3, you take 1. Next round, if they take 2, you take 2, and so on. After three rounds only 1 marble is left, and your opponent has to take it.

This strategy works when starting with 5, 9, 13, or any larger number of the form 4x+1.
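The strategy can be checked with a short brute-force search. The Python sketch below is only an illustration: `mover_can_win` tries every legal move, and `winning_move` encodes the "leave your opponent 4x+1 marbles" rule. Both function names are made up for this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mover_can_win(marbles):
    """True if the player about to move can force a win (taking the last marble loses)."""
    if marbles == 0:
        return True  # the opponent just took the last marble and lost
    return any(not mover_can_win(marbles - take)
               for take in (1, 2, 3) if take <= marbles)

def winning_move(marbles):
    """Marbles to take so the opponent is left with 4x+1, or None if no such move exists."""
    take = (marbles - 1) % 4
    return take if take else None

# Starting counts where the player who moves first is bound to lose.
losing_starts = [n for n in range(1, 18) if not mover_can_win(n)]
print(losing_starts)  # [1, 5, 9, 13, 17]
```

The losing starts are exactly the 4x+1 values, which is why you ask your opponent to go first when the row has 13 marbles.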

How to always win playing Nim

Because Nim has a guaranteed strategy for success, it’s perfect for demonstrating a reinforced Machine Learning process. What we’ll be doing in our exercise is reinforcing this strategy with the human machine the students will “create.”

Before doing the following exercise, explain the rules of Nim to the students but don’t explain the winning strategy.

Exercise: Machine Learning with Starbursts

To set up this exercise, divide the students into groups of four. One player will be the human player, and the other 3 players will represent the computer. The human player can play as strategically as they want. But the computer players will play randomly, and then learn to win.

Here’s how the computer player will be “programmed” to behave:

Each of the computer players will have a cup initially filled with 1 pink, 1 orange, and 1 yellow Starburst.

On a given computer player’s turn, they’ll randomly (without peeking) pull out a Starburst.

If the computer player pulls out a pink Starburst, they’ll remove 3 marbles. If it’s orange, they’ll remove 2. If it’s yellow, they’ll only remove 1.

Now set up 5 marbles. The human will go first, selecting 1, 2, or 3 marbles. (It’s entirely up to them!)

On the computer’s turn there will be either 4, 3, or 2 marbles to choose from. We only need to cover these 3 cases. (A few sequences of moves can leave the computer facing only 1 marble. For example, first round: human picks 2 marbles, and computer picks 1 marble. Second round: human picks 1 marble, leaving the computer to pick the last marble. But since the computer has already lost at that point, we don’t need to represent it.) So each of the 3 cases (4, 3, or 2 marbles remaining) is represented by a different computer player’s cup of Starbursts.
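Which marble counts the computer can actually face is easy to check by enumeration. The sketch below walks through every sequence of legal moves with the human going first. It ignores moves that take the last marble, since those end the game. The function name `computer_states` is hypothetical.

```python
def computer_states(start=5, takes=(1, 2, 3)):
    """Return every marble count the computer can face on its turn, human moving first."""
    seen = set()

    def human_turn(marbles):
        for take in takes:
            if take < marbles:  # taking the last marble ends the game
                computer_turn(marbles - take)

    def computer_turn(marbles):
        seen.add(marbles)
        for take in takes:
            if take < marbles:
                human_turn(marbles - take)

    human_turn(start)
    return sorted(seen)

print(computer_states(5))  # [1, 2, 3, 4]
```

The 1 in the result is the already-lost position that needs no cup, which leaves cups for 2, 3, and 4. Calling the same function with 9 marbles returns every count from 1 through 8, so a longer game would need seven cups.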

After the human’s turn, the computer player whose cup number matches the number of marbles left on the table determines (randomly) how many marbles the computer takes. Remember the winning strategy: number of marbles to take = 4 - number taken by the human.

Because, initially, each of the cups will have 1 of each color Starburst in it, the computer’s choice is entirely random.
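The setup so far can be sketched in a few lines of Python. This is a minimal model, not a definitive one: the color values follow the run-through later in the article (pink 3, orange 2, yellow 1), and the names `new_cups`, `computer_turn`, and `winning_take` are invented for illustration.

```python
import random

# Color values as used in the run-through later in the article.
TAKES = {"pink": 3, "orange": 2, "yellow": 1}

def new_cups():
    """One cup per marble count the computer can face, one Starburst of each color."""
    return {marbles_left: list(TAKES) for marbles_left in (4, 3, 2)}

def computer_turn(cups, marbles_left, rng=random):
    """Draw blindly from the cup that matches the marbles left on the table."""
    color = rng.choice(cups[marbles_left])
    return color, TAKES[color]

def winning_take(human_take):
    """Opening-round strategy: the two players together remove 4 marbles."""
    return 4 - human_take

cups = new_cups()
for human_take in (1, 2, 3):
    marbles_left = 5 - human_take
    color, take = computer_turn(cups, marbles_left)
    verdict = "winning" if take == winning_take(human_take) else "losing"
    print(f"human takes {human_take}, cup {marbles_left} draws {color} ({take}): {verdict}")
```

With one Starburst of each color in every cup, each draw hits the winning color only one time in three.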

Starbursts learning to play Nim

If the computer picks the wrong color Starburst and immediately causes a loss, the human player gets to eat that Starburst. This discourages the computer from making the same bad choice in later rounds. If, however, the computer makes a winning choice, that Starburst goes back in the cup and another of the same color is added to reinforce that selection. (You don’t have to eat or remove the bad-choice Starbursts, but leaving them in means it will take longer for the computer to learn the best strategy.)
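The reinforce-and-discourage rule is small enough to write down directly. In this sketch a cup is modeled as a `Counter` of colors, using the color names from the run-through later in the article. `update_cup` and `draw_odds` are hypothetical helper names.

```python
from collections import Counter

def update_cup(cup, color, computer_won):
    """Add a Starburst of `color` after a win; remove one (the human eats it) after a loss."""
    if computer_won:
        cup[color] += 1
    elif cup[color] > 0:
        cup[color] -= 1
    return cup

def draw_odds(cup):
    """Chance of drawing each color from the cup."""
    total = sum(cup.values())
    return {color: count / total for color, count in cup.items()}

cup = Counter(pink=1, orange=1, yellow=1)
update_cup(cup, "pink", computer_won=False)   # bad draw: the pink one is eaten
update_cup(cup, "yellow", computer_won=True)  # good draw: a second yellow goes in
print(draw_odds(cup))
```

After one loss on pink and one win on yellow, the cup holds two yellows and one orange, so yellow is now drawn two times out of three.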

After playing a few rounds, each computer cup should contain only its winning color: the 4 cup has only pink Starbursts (pink representing taking 3), the 3 cup has only orange, and the 2 cup has only yellow. The computer players get to eat their respective Starbursts at that point.
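To watch the cups drift toward their winning colors, the whole exercise can be simulated. The sketch below rests on assumptions that are not spelled out in the exercise: the human opens with a random move and then plays the winning strategy, and a losing draw is eaten whenever the game ends in a computer loss. All function names are invented for illustration.

```python
import random

TAKES = {"pink": 3, "orange": 2, "yellow": 1}  # color values from the run-through

def human_move(marbles, rng):
    """Strategic human: leave 4x+1 marbles when possible, otherwise guess."""
    best = (marbles - 1) % 4
    return best if best else rng.randint(1, min(3, marbles))

def play_game(cups, rng, start=5):
    """Play one game, human first. Return (computer_won, [(cup_number, color), ...])."""
    marbles, draws = start, []
    while True:
        marbles -= human_move(marbles, rng)
        if marbles == 0:
            return True, draws    # human took the last marble
        if marbles == 1:
            return False, draws   # computer is stuck with the last marble
        color = rng.choice(cups[marbles])
        draws.append((marbles, color))
        marbles -= TAKES[color]
        if marbles <= 0:
            return False, draws   # computer took the last marble (or too many)

def learn(cups, computer_won, draws):
    """Reinforce the drawn colors after a win; the human eats them after a loss."""
    for cup_number, color in draws:
        if computer_won:
            cups[cup_number].append(color)
        else:
            cups[cup_number].remove(color)

def train(games, seed=None):
    """Start with fresh cups and play the given number of games."""
    rng = random.Random(seed)
    cups = {n: list(TAKES) for n in (4, 3, 2)}
    for _ in range(games):
        learn(cups, *play_game(cups, rng))
    return cups

cups = train(games=200)
for n in (4, 3, 2):
    print(n, {color: cups[n].count(color) for color in TAKES})
```

Under these assumptions the winning color is never eaten, so its count can only grow, while each losing color disappears the first time it is drawn.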

As mentioned, in this setup we’re both discouraging and reinforcing, but you can also only reinforce or only discourage. You can also start with 9, 13, or 17 marbles and correspondingly more cups, but this would only prolong the exercise.

A typical run-through goes like this:

5 marbles, human takes 3, leaving 2 on the table.

Computer player with 2 cup randomly picks a pink Starburst from their cup containing a pink (3), orange (2), and yellow (1).

Computer loses. (There were only 2 marbles and the computer tried to remove 3.) The game resets, and the human gets to eat the pink Starburst from the 2 cup.

5 marbles, human takes 3, leaving 2.

Computer player with 2 cup randomly pulls out yellow. Computer removes 1 marble, leaving 1 on the table.

Human loses. 2 cup gets reinforced with an additional yellow.

Final discussion

At the end of this exercise, I like to recap what “learning” is, whether for computers, our pets, or ourselves. Simply hearing or seeing something isn’t enough. What’s important is doing. And when we do, sometimes we don’t get it quite right. But we try again, adjust, and try again until we learn, just like the Starbursts.