What is gradient descent?

It is an optimization algorithm used to find the minimum of a function. We start from a random point on the function and repeatedly move in the direction of the negative gradient until we reach a local (or global) minimum.
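Concretely, each step updates the current point with the standard gradient descent rule, where the learning rate (covered in Step 2 below) controls the step size:

x_new = x_old - rate * dy/dx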

(Image: Homer descending)

Example by hand:

Question: Find the local minimum of the function y = (x+5)², starting from the point x = 3.

Solution: We know the answer just by looking at the graph. y = (x+5)² reaches its minimum value when x = -5 (i.e. when x = -5, y = 0). Hence x = -5 is both the local and global minimum of the function.

Now, let’s see how to obtain the same result numerically using gradient descent.

Step 1: Initialize x = 3. Then, find the gradient of the function, dy/dx = 2*(x+5).

Step 2: Move in the direction of the negative of the gradient. (Why? Because the gradient points in the direction of steepest ascent, so stepping against it decreases y fastest.) But wait, how much should we move? For that, we require a learning rate. Let us assume a learning rate of 0.01.
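With rate = 0.01, the update rule for our function becomes:

x_new = x_old - 0.01 * 2*(x_old + 5)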

Step 3: Let’s perform 2 iterations of gradient descent, worked out below.
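Substituting the numbers into the update rule above (plain arithmetic):

Iteration 1: x = 3 - 0.01 * 2*(3 + 5) = 3 - 0.16 = 2.84
Iteration 2: x = 2.84 - 0.01 * 2*(2.84 + 5) = 2.84 - 0.1568 = 2.6832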

Step 4: We can observe that the x value is slowly decreasing and should converge to -5 (the local minimum). However, how many iterations should we perform?

Let us set a precision variable in our algorithm that measures the difference between two consecutive x values. If the difference between the x values from 2 consecutive iterations is less than the precision we set, stop the algorithm!
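In symbols, the stopping criterion is simply (with precision = 0.000001, as used in the code below):

abs(x_new - x_old) < precision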

Gradient descent in Python:

Step 1: Initialize parameters

cur_x = 3 # The algorithm starts at x = 3
rate = 0.01 # Learning rate
precision = 0.000001 # This tells us when to stop the algorithm
previous_step_size = 1 # Initial step size; any value larger than precision works
max_iters = 10000 # Maximum number of iterations
iters = 0 # Iteration counter
df = lambda x: 2*(x+5) # Gradient of our function

Step 2: Run a loop to perform gradient descent:

i. Stop the loop when the difference between the x values from 2 consecutive iterations is less than 0.000001, or when the number of iterations exceeds 10,000.

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x # Store current x value in prev_x
    cur_x = cur_x - rate * df(prev_x) # Gradient descent update
    previous_step_size = abs(cur_x - prev_x) # Change in x
    iters = iters + 1 # Iteration count
    print("Iteration", iters, "\nX value is", cur_x) # Print iterations



print("The local minimum occurs at", cur_x)

Output: From the output, we can observe the x values for the first 10 iterations, which can be cross-checked with our calculation above. The algorithm runs for 595 iterations before it terminates.
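As an aside, the same loop generalizes to any differentiable function of one variable. Below is a minimal sketch that wraps it in a reusable function; the function name and signature here are illustrative, not part of the original code.

def gradient_descent(df, start, rate=0.01, precision=0.000001, max_iters=10000):
    # df: gradient of the target function; start: initial x value
    cur_x = start
    for _ in range(max_iters):
        prev_x = cur_x
        cur_x = prev_x - rate * df(prev_x) # Same update rule as above
        if abs(cur_x - prev_x) < precision: # Same stopping criterion
            break
    return cur_x

print(gradient_descent(lambda x: 2*(x+5), start=3)) # Converges near x = -5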