Simply put, any algorithm that can learn on its own from a set of data, without the rules of the domain being programmed explicitly, falls under the ambit of Machine Learning. This is different from Data Analytics or Expert Systems, where rules, logic, propositions or activities have to be manually coded by an expert programmer.

Systems that can learn on their own and progress towards a pre-defined goal without much human intervention can be broadly termed Intelligent Systems. The level of intelligence can range from that of an amoeba, algae, ant or armadillo all the way to chimps, humans or beyond.

Why Machine Learning?

As an example, systems that interact with humans in natural language cannot be built by coding the rules and conversational logic of human language. A rule base cannot accommodate the nuances and vastness of the conversations that are possible. Similarly, systems that view pictures of birds and catalog them into different categories cannot be programmed for all possible variations: image quality, features of the bird, angle of photography, lighting, shadows, noise in the image, etc.

Types of AI

Machine Learning is usually regarded as a branch of AI, or Artificial Intelligence, and you will notice that people sometimes use the two terms interchangeably. AI is the broader field: it also includes approaches, such as hand-coded expert systems, that do not learn from data. AI can be broadly grouped into Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).

The current (advanced) crop of algorithms you constantly hear about in news and articles are all in the early stages of Artificial Narrow Intelligence, or ANI. ANI is also termed weak or lite AI. The objective of an ANI is to take a very narrow goal and improve the quality of results through accelerated, autonomous learning, improving accuracy while reducing costs. The goal can be to drive a car, play chess or Go, find an anomaly in trading, optimize search relevance, etc. While ANIs have out-competed humans in several such narrow tasks, they are limited by their domain specificity (AlphaGo cannot drive a car, even if you want to teach it).

Artificial General Intelligence or AGI (Strong AI or the holy grail) is the ability of a system (or a collection of systems) to perform most activities that a human can perform. We are talking about common-sense activities that involve all senses of perception (vision, speech, hearing, touch, ...), and the understanding and judgement that any normal human is blessed with. Common sense includes understanding that something is ‘vulgar’ within the context of what is commonly shared by other humans in a given culture, locality or region (or the universality of what vulgar means). It also includes the ability to tell good from bad in an ethical and moral way, as an ‘ideal’ human would.

Artificial Super Intelligence or ASI (Singularity, fairy dust and beyond) is the ability of machines to acquire senses and sentience beyond human capabilities. It is hard to comprehend the magical powers these machines would acquire and the possible outcomes given such powers.

IMO, when you read AI articles from 2016 and 2017, you should equate AI with ANI in your mind.

What’s with the AI proponents and detractors?

There are strong camps around AI: one camp fears the rise of the machines, while the other embraces the progress. The fear is mainly around AGI and ASI, where it is hard to predict what the machines will do once they come to full senses. There are questions around the survival of the human species after an untoward accident (or glitch). The proponents, though, are cautiously optimistic and hold that the progress of technology is inevitable, whether we ‘will’ it or not. Either way, both camps are putting in efforts to ensure that, from the ground up, we consciously encode (or bias) the machines with the highest order of morality and ethics, so as to avert ‘evil’ tendencies once the machines come to full senses.

Several luminaries such as Stephen Hawking, Bill Gates and Elon Musk have been warning us about the rise of intelligent machines and urging all of us to work on safeguards, starting right now. To know more, you can follow the work of OpenAI (an initiative to help build safe AI), for example Concrete AI Safety Problems.

While AGI and ASI are probably decades away, given the acceleration of technological systems it is hard to put a probability on the arrival of AGI, even by the end of this year (lucky accidents happen all the time).

That said, lite ANI is already quite pervasive, in your phones, at your service providers (telco), when you search for something, and probably in your homes if you have SmartHome kits installed.

How does AI work?

An AI system needs Data. Large quantities of Data. And it needs training. Broadly, learning systems can be classified by the activity they perform:

Recognize Patterns (tell a cat from a car)

Detect Anomalies (detect fraud in expense reports)

Predict Outcomes (predict stock prices)

These activities can be performed using the following learning techniques:

Supervised Learning: Here we need to know what to expect from the system as an output. During training, the system is told what the output needs to be. For example, to train a system to recognize English handwriting, a number of handwritten letters are fed in as input, each along with the letter it is supposed to match. Once the system becomes accurate at recognizing handwritten letters, it can be exposed to reading cheques or signs on the highway.

Unsupervised Learning: The system is not told what the output needs to be, but is left to discover the internal structure of the input on its own. For example, it can automatically put images of sunsets in a different category from images of trains. The system cannot recognize and tag a train as a “train” or a sunset as a “sunset”, though (unlike supervised learning, which can recognize and tag).

Reinforcement Learning: The system is asked to increase the payoff of an activity by selecting the best action. Actions that increase the payoff are rewarded, while actions that reduce it are penalized. While there may not be a predefined expected output, the system can look at which actions earned higher payoffs and optimize its actions towards them. For example, recommending music that I like based on the higher ratings I give to music that I particularly enjoy.
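To make the contrast between supervised and unsupervised learning concrete, here is a minimal sketch in Python. It assumes each image has already been boiled down to a single made-up number, and the function names, the sample values and the "closest average" approach are all illustrative assumptions rather than anything a production system would use as is.

```python
# Toy data: each number stands in for one image (say, its average brightness).
# All values and labels are invented for illustration.

def train_supervised(samples, labels):
    """Learn one average value per label from labelled examples."""
    sums, counts = {}, {}
    for value, label in zip(samples, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(averages, value):
    """Return the label whose learnt average is closest to the new value."""
    return min(averages, key=lambda label: abs(averages[label] - value))

def cluster_unsupervised(samples, rounds=10):
    """Split samples into two unnamed groups (a tiny 2-means clustering)."""
    low, high = min(samples), max(samples)  # starting guesses for the group centres
    groups = []
    for _ in range(rounds):
        groups = [0 if abs(v - low) <= abs(v - high) else 1 for v in samples]
        members0 = [v for v, g in zip(samples, groups) if g == 0]
        members1 = [v for v, g in zip(samples, groups) if g == 1]
        low = sum(members0) / len(members0)
        high = sum(members1) / len(members1)
    return groups

samples = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0]
labels = ["sunset", "sunset", "sunset", "train", "train", "train"]

averages = train_supervised(samples, labels)
print(predict(averages, 4.5))         # supervised: answers with a name
print(cluster_unsupervised(samples))  # unsupervised: group numbers only, no names
```

The supervised half can name its answer because it was given names during training; the unsupervised half only reports which items belong together.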

Typically, a learning system works as follows:

To understand how learning systems train, we need to take a glimpse at how kids learn. The very first time a toddler comes across a cat, we say, “that’s a cat”. Every time the toddler sees a cat, we continue to say that it’s a cat, until the toddler learns to recognize any cat (not necessarily a specific cat).

We have to train the AI systems in a similar fashion.

Let’s say I would like to build a model that recognizes pictures of cats. Let’s also say that a picture of a cat is fed to the model as a grid of pixels along X and Y coordinates, for example a picture that is 100x200 pixels in size. Let’s say I have many different pictures of cats, about 1000 of them.

During training, we set aside 500 pictures of cats to train the system to recognize what a cat looks like. In other words, we send the cat’s picture as the input signal, and we also tell the model that we are sending the picture of a cat. (This is the idea behind supervised learning, where we tell the system what output is desired, and supervise the behavior of the system to see if it can produce that output.)

The equation may look something like this:

x -> Input pixels of cat

y -> Output which in our case is “cat”

w -> The “knowledge weight” that needs to be acquired or learnt: the cattiness of an image that makes it a cat.

So Output ‘y’ = (Some input picture ‘x’) and (some knowledge weight ‘w’ about the cattiness or composition of a cat)

In simplified terms:

y = x * w

We need to learn what ‘w’ is, as we don’t know what makes a cat. Without AI, you would have to encode a lot of rules into ‘w’ about all the different variations of a cat: how it looks from the side, from above, under different lighting conditions, in different sizes, colors, poses, etc. This is close to impossible.

With AI modeling, the equation y = x * w can be considered as follows:

Let’s assume the operator “*” is a known function that can be applied to x and w.

If ‘y’ as an output is known during training (which is “cat” in our case), and ‘x’ as an input is given (which is the pixels of a cat image), then can we learn the cattiness ‘w’ through training?

In other words:

w = y / x

The problem is that the ‘*’ operator in the equation y = x * w is not a straightforward multiplication. Hence there is no inverse of that operation that can be coded.

Instead, the system can be built as follows: if “*” is a known function, can we change the equation to the following?

E = (x * w) - y, where E is an error, and we can find another function, let’s say equivalent to ‘minus’, that can be used.

In other words, (as a mathematical equivalent),

if y = 20, x = 5, and w is unknown, then in the equation 20 = 5 * w, what we are trying to find is the value of ‘w’.

20 = 5 * w ; (what is the value of ‘w’?)

Since w = 20 / 5 is not a possible operation as explained, can we arrive at:

E = (5 * w) - 20, where E tends to zero? In other words, can we substitute different values for the variable ‘w’ in such a way that the error ‘E’ gets close to zero?

This is, in essence, how Neural Networks learn. A Neural Net is a collection of nodes (called neurons) arranged in layers: input nodes, hidden nodes and output nodes. The input nodes can be the pixels of the cat image, the connections into the hidden nodes carry the ‘knowledge weights’ that capture the cattiness of a cat, and one of the output nodes is selected based on the type of cat.

In our model, the input pixels are fed in through many input nodes {x1, x2, x3, ..., xn}, there is an output node ‘y’ for a particular type of cat, and there are many hidden nodes, with a weight ‘w’ assigned to each connection between an input node ‘x’ and a hidden node.
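As a rough illustration of that layout, the sketch below pushes three made-up pixel values through two hidden nodes and one output node. The weights are arbitrary numbers chosen by hand (a real network would learn them), and the sigmoid used to squash each node's value is just one common choice that is assumed here; the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """One pass through a network with a single hidden layer and one output node."""
    # Each hidden node takes a weighted sum of all inputs, then squashes it.
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)))
              for weights in hidden_weights]
    # The output node does the same with the hidden node values.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

pixels = [0.0, 0.5, 1.0]             # x1, x2, x3: three made-up pixel intensities
hidden_weights = [[0.2, -0.4, 0.6],  # weights on the connections into hidden node 1
                  [-0.5, 0.3, 0.1]]  # weights on the connections into hidden node 2
output_weights = [0.7, -0.2]         # weights from the hidden nodes into output 'y'

score = forward(pixels, hidden_weights, output_weights)
print(f"cat score: {score:.3f}")
```

With hand-picked weights the score is meaningless; training is the process of adjusting those weights until the score is high for cats.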

So in our model, with the equation E = (5 * w) - 20,

we can take a first random guess at ‘w’ as 10, and then we get

E = (5 * 10) - 20 = 30.

Since E = 30 is clearly greater than zero, we have to take a second guess.

The second random guess, w = -8, results in E = (5 * -8) - 20 = -60.

By now, we have clear information that w = 10 resulted in E = 30 and w = -8 resulted in E = -60, so the value of w should lie between -8 and 10. We can repeat this exercise, decreasing ‘w’ from 10 or increasing it from -8, until we arrive at 4, where E = 0.

The above is an oversimplified example of the process. But as an analogy for how a neural network learns that the cattiness is the value 4, it is good enough.
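The guessing game can be automated. The sketch below assumes, purely for the sake of the analogy, that ‘*’ is ordinary multiplication, and it picks each new guess as the midpoint between the last guess that was too low and the last one that was too high. That halving strategy is my own choice for illustration; the function names are hypothetical.

```python
def error(w, x=5, y=20):
    """E = (x * w) - y, with plain multiplication standing in for '*'."""
    return (x * w) - y

def search_weight(low=-8.0, high=10.0, tolerance=1e-6, max_steps=100):
    """Narrow the interval [low, high] until the error is close to zero."""
    # Assumes error(low) < 0 < error(high), as with the guesses -8 and 10.
    mid = (low + high) / 2
    for step in range(1, max_steps + 1):
        mid = (low + high) / 2
        e = error(mid)
        if abs(e) < tolerance:
            return mid, step
        if e > 0:
            high = mid   # guess was too large, so look lower
        else:
            low = mid    # guess was too small, so look higher
    return mid, max_steps

w, steps = search_weight()
print(f"w is about {w:.4f} after {steps} guesses")
```

Each guess uses only the sign of the error to decide which way to move, which is the same information the two hand-made guesses gave us.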

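In a neural network, the role of the ‘*’ operator is played by the Summation Function, which adds up the inputs to a node, each multiplied by its weight. The role of the guess-and-correct ‘minus’ step is played by Gradient Descent, which adjusts the weights step by step to push the error towards a minimum. Each hidden node additionally contains an activation function, which transforms the node’s weighted sum into its output.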
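Gradient descent can be tried out on the same toy numbers. This is a minimal sketch that again treats ‘*’ as ordinary multiplication and minimizes the squared error, so that both positive and negative errors count as bad; the learning rate, the number of steps and the starting guess of 10 are arbitrary values picked for illustration.

```python
def gradient_descent(x=5.0, y=20.0, w=10.0, learning_rate=0.01, steps=100):
    """Minimize the squared error ((x * w) - y) ** 2 by following its slope."""
    for _ in range(steps):
        error = (x * w) - y
        gradient = 2 * error * x        # derivative of error ** 2 with respect to w
        w -= learning_rate * gradient   # take a small step against the slope
    return w

learned_w = gradient_descent()
print(f"learned w is about {learned_w:.4f}")
```

Unlike random guessing, each step here uses the size of the error as well as its sign, so the corrections shrink as ‘w’ approaches the value that makes the error zero.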

By now, you may have got the basic premise of how neural networks, or AI, work. The beauty of this technique is that, as long as we can convert the inputs to numbers, we can always learn the “knowledge weight” (another number), which automatically codifies the knowledge without our having to program any rules. The flip side is that we generally cannot tell why the knowledge weight is 4 for cattiness (as an analogy again). As long as the system finds knowledge weights (which happen to be fractions in real systems) that can recognize an input, we are good.

Remember that we set aside 500 of the 1000 cat pictures for training? The other 500 are used as a validation set after training, to see whether the model can now recognize them as cats. If there are large errors in validation, the model is tuned further until the error is contained. Once the model performs within an acceptable error range, it can be taken to production.
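The split and the validation check can be sketched as follows. Everything concrete here is a stand-in: the pictures are just id numbers, `fake_model` is a hypothetical placeholder for a trained model (it "misses" every picture whose id is divisible by 10), and the acceptable error of 0.15 is a made-up threshold.

```python
import random

def split(pictures, train_size, seed=0):
    """Shuffle the pictures and split them into a training and a validation set."""
    shuffled = pictures[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

def validation_error(model, validation_set):
    """Fraction of validation pictures the model fails to recognize as a cat."""
    misses = sum(1 for picture in validation_set if not model(picture))
    return misses / len(validation_set)

# Hypothetical stand-ins: picture ids instead of images, and a fake "trained model".
pictures = list(range(1000))

def fake_model(picture_id):
    return picture_id % 10 != 0

train_set, validation_set = split(pictures, train_size=500)
error = validation_error(fake_model, validation_set)
acceptable_error = 0.15  # made-up threshold for going to production

print(len(train_set), len(validation_set))
print("ready for production" if error <= acceptable_error else "keep tuning")
```

The key point is that the validation pictures are never shown to the model during training, so the error measured on them says something about pictures the model has not seen before.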

Where do we go from here?

As stated, ANI has just started gaining prominence. I presume it will take a decade before AGI can be achieved. But there is a whole host of research pending in ANI to automate many tasks around Sales, Journalism, IT Support, Law, Security, Medicine, Games, Operating Vehicles, Retail, eCommerce, Packaging, Manufacturing, Banking, Trading, Education, Governance, etc. The world is waiting to be automated by such learning systems, and the possibilities for ANI are endless before we even move towards AGI.