Newton’s method: the visual intuition.

If at first you don’t succeed, try try try again.

Newton and Raphson

The method we’re going to cover in this blog is also called the Newton-Raphson method. From the name, you might picture Newton and Raphson coming up with it together as a team. But in reality, they discovered it independently. This seems to be a common theme with Newton. Remember how he also discovered calculus independently of Leibniz? How come this guy kept having ideas at the exact time other people were having them too? It’s actually quite plausible: for multiple people at the cutting edge of research in a field, the next steps are often not too ambiguous.

It’s the same reason that multiple independent teams discovered the Spectre and Meltdown security vulnerabilities in Intel chips.

But anyway, enough history, let’s get into how it works.

What is x?

The essence of algebra (from the Arabic al-jabr, the reunion of broken parts) is solving equations for unknown quantities. For example, find x where:

2x + 5 = 7 __(1)

It’s easy to see that the solution to the equation above is x=1. Another way to look at this is to take everything to one side and call the expression y. We get: y=2x−2. Then, we can try to find the x for which y=0.

But what happens when the expression of y in terms of x becomes more and more complex? Can we find all (or at least one of) the values of x that satisfy y=0? For example, figure 1 below shows some more complex relationships y can have with x. In all cases though, y=0 is satisfied at x=1 (though maybe not exclusively).

Figure 1: Different functions that all have zeros at x=1.

If linear functions are the simplest, then perhaps the next in rank are quadratics (involving x²). Now, if we have a quadratic equation, how would we go about solving it? Well, we could simply use the quadratic formula (when there is only one variable, x). But let’s suppose for a second that we don’t know this formula. We only know how to solve linear equations like equation (1). Can we use this knowledge of solving a linear equation to solve a non-linear (in this case, quadratic) equation?

y=x² __(2)

Well, we can, but only if we’re persistent. Let’s start with any random point, x. Now, calculate the value of our quadratic function at this x and call it y. Our tool is a linear equation solver, but we have a quadratic equation instead. So, let’s convert the quadratic equation into a linear one: approximate it with a linear equation (the first-order Taylor series at x). When we solve this linear equation, we get an “answer” for x. But we can’t expect this answer to be “right”, since we “cheated” and solved a linear equation instead of a quadratic one. And since the definition of insanity is doing the same thing and expecting a different result, we keep repeating this process. The only difference is that each time, we use the previous solution of the linear equation as our starting point. And lo and behold, eventually this process leads us to the solution of the quadratic equation, as you can see below.

Figure 2: Newton Raphson iterations for a parabola taking us closer and closer to one of the points where the parabola intersects the x-axis.
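The loop described above can be sketched in a few lines of Python. The function name, the starting point, and the tolerance are my own choices for illustration; the update rule x − f(x)/f′(x) is just the zero of the tangent line at x.

```python
def newton_1d(f, f_prime, x0, tol=1e-9, max_iter=100):
    """Newton-Raphson in one dimension.

    At each step, approximate f by its tangent line at x
    (first-order Taylor expansion) and solve that linear
    equation for its zero: x_new = x - f(x) / f'(x).
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / f_prime(x)
    return x

# The parabola y = x^2 from equation (2): f'(x) = 2x.
# Each step simply halves x, walking us toward the root at 0.
root = newton_1d(lambda x: x**2, lambda x: 2 * x, x0=2.0)
```

Note that for y = x² the root at 0 is a double root, so convergence here is slower (linear) than Newton’s usual quadratic rate, but the iterates still home in on the answer.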

Cranking up the dimensions

Now, it is possible that you have seen something very similar to the visualization above before. But how does this extend to multiple dimensions? For example, instead of the one variable x, let’s say we now have two variables, x and y. A natural way to extend equation (2) to two dimensions is z=x²+y². But that paraboloid touches the x-y plane at just a single point (the origin), so to make things more interesting, let’s lower it by one unit:

z=x²+y²−1 __(3)

And here is what its plot looks like:

Figure 3: A paraboloid, a kind of quadratic surface in multiple dimensions. Here, x and y are the input dimensions and the z-axis represents the value of the function.

Like before, we want to solve for z=0. This happens at the green circle in the figure above, which is where our paraboloid equation intersects the x-y plane. But the green circle is an infinite number of points, not one, two or three. This is quite natural since we increased the number of variables to two but kept the number of equations the same. In order to get a finite number of solutions, we need to have the same number of equations as variables, which is two.

There are many candidates we could choose for our second equation. To keep things simple, let’s just replicate our existing equation and move it by a small amount. And just like that, we have two equations now.

Figure 4: We want to demonstrate solving multiple equations, so to keep things simple we take the existing equation of our paraboloid and replicate it to form the second one.

The second equation also intersects the x-y plane in a circle, and the two circles intersect at two distinct points, which are the solutions to the system of equations. These are the two yellow points in the figure below.

Figure 5: The two paraboloids representing our two quadratic equations intersecting at two points.

Now, how do we use our method from before to get to one of these solutions?

We start with any random point on the x-y plane (the pink point in figure 6 below). It can be projected to the green point on the green paraboloid (our first quadratic equation) and the yellow point on the yellow paraboloid (our second quadratic equation). We then draw the best linear approximations of the two paraboloids at those points. Since linear equations in two variables are planes, this gives us the green and yellow planes. These planes intersect in the purple line, which in turn intersects the x-y plane at some point. This point is the solution of the system of two linear equations (the approximations of the two paraboloids).

Figure 6: Newton-Raphson iterations - we repeatedly solve the system of linear equations obtained by approximating the two quadratic equations. This leads us closer and closer to one of the real solutions of the original quadratic system.

This point is not the solution of the system of quadratic equations, of course, because we “cheated” and approximated them with linear equations. But then we repeat the entire process starting at this new point. And doing this again and again takes us to one of the solutions (two yellow points) of the two quadratic equations.
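Here is a sketch of the two-dimensional version in Python. I’m assuming the first paraboloid is z = x² + y² lowered by one unit so it cuts the plane in a circle, and that the second is a copy shifted by 0.5 along the x-axis; the exact shift in the figure isn’t specified, so that offset is my own choice. Solving the tangent-plane system at each step is exactly a linear solve with the Jacobian matrix.

```python
import numpy as np

def newton_2d(F, J, p0, tol=1e-10, max_iter=50):
    """Newton-Raphson for a system of two equations.

    Each step linearizes both surfaces at the current point
    (their tangent planes) and solves the resulting linear
    system J(p) * delta = -F(p) for the step delta.
    """
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Fp = F(p)
        if np.linalg.norm(Fp) < tol:
            break
        p = p + np.linalg.solve(J(p), -Fp)
    return p

# Two paraboloids: the second is the first shifted by 0.5 along x
# (the shift amount is an assumption for illustration).
def F(p):
    x, y = p
    return np.array([x**2 + y**2 - 1,
                     (x - 0.5)**2 + y**2 - 1])

# Jacobian: partial derivatives of each equation w.r.t. x and y.
def J(p):
    x, y = p
    return np.array([[2 * x,         2 * y],
                     [2 * (x - 0.5), 2 * y]])

solution = newton_2d(F, J, p0=[1.0, 1.0])
```

Starting from a different initial point, for example below the x-axis, the same loop lands on the other intersection point instead.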

What if we wanted the second solution? Well, then we start with a different random starting point and repeat the process until we find a solution we haven’t already seen before.

Note: you’ll often see Newton-Raphson presented in the context of optimization, whereas here we described it as a method for solving equations. But once we note that optimization generally involves taking the gradient (a vector of derivatives) and setting it to zero, it reduces to the problem of solving a system of equations.
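As a small illustration of that reduction, here is a sketch that minimizes a one-variable function by running Newton-Raphson on its derivative. The example function g(x) = (x - 3)^2 and all the names are my own choices, not anything from a particular library.

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Minimize a 1-D function by applying Newton-Raphson to its
    derivative: find the x where grad(x) = 0."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x = x - g / hess(x)
    return x

# Example: g(x) = (x - 3)^2 has derivative 2(x - 3) and constant
# second derivative 2, so the minimum sits at x = 3.
x_min = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)
```

Because the derivative here is linear, a single Newton step lands exactly on the minimum; for messier functions the loop earns its keep.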

Check out the video version of this blog: