Posted September 9, 2014 by Presh Talwalkar.

A Nash equilibrium is a situation in which no single player can profitably deviate. In other words, once the players choose strategies to form a Nash equilibrium, it is impossible for any single person to change a strategy and gain.

In some games, it is easy to understand how players reach the Nash equilibrium. In the prisoner’s dilemma, for example, confessing to the crime is a dominant strategy, so it is obvious why each player acts in self-interest. Or in a game like rock-paper-scissors, even a child quickly learns to randomize equally to avoid being outguessed.

Accordingly, one justification for the concept of a Nash equilibrium is that players might reach it by learning as they play. In fact, in certain games we can prove that players are guaranteed to reach the Nash equilibrium from a dynamic learning process.

The result is presented in Fudenberg and Tirole’s Game Theory, which I do my best to re-tell in this post.

I warn the casual reader this post is one of the more math-heavy discussions. One of the beauties of math is seeing how results in seemingly different fields relate to each other. In this post, we’ll explain how Nash equilibrium is connected to the concepts of learning, differential equations, and even eigenvalues!

We start out with a game of Cournot competition and then generalize to a dynamic learning process.




Cournot Competition

Imagine two firms that compete on quantity, with each firm picking a quantity level between 0 = minimum and 1 = maximum. We denote the quantities as q1 and q2, and we assume the firms can produce the good costlessly.

The price they will receive for the good depends on the total quantity the two firms produce. If they both produce very little, the good is scarce and the price can reach a level of 1 = maximum. If they both flood the market, then the good will be plentiful and the price drops accordingly. We will model the market price as dropping linearly with quantity. Specifically, the market price is p = 1 – q1 – q2.

If each firm wants to maximize profits, and they are playing strategically, what will be the result of this game?

Let’s consider the game from firm 1’s perspective. The profit of the firm is equal to the amount the firm produces, q1, times the market price, which depends on q1 and q2. So let’s write the profit level as a function π1 in terms of what each firm chooses to produce.

Firm 1 Profit = (quantity firm 1 makes)(market price)
π1(q1, q2) = q1(1 – q1 – q2)
π1(q1, q2) = q1 – q1² – q1q2

Firm 1 wants to maximize profits, which happens when the partial derivative of the profit function with respect to q1 is equal to 0. The first-order condition implies the following.

1 – 2q1 – q2 = 0
q1 = (1 – q2)/2

The above equation describes a reaction curve, or best-reply function. For a given quantity q2 that firm 2 produces, firm 1 maximizes profits by producing r1(q2) = (1 – q2)/2. For example, if firm 2 produces 0, then firm 1 should produce 1/2. Or if firm 2 produces 1/2, then firm 1 should produce 1/4. Or if firm 2 produces 1, then firm 1 should produce 0. So for any given quantity firm 2 produces, firm 1 knows what to do.
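The best-reply formula can be checked numerically. Here is a minimal sketch in Python (the function names profit_1 and r1 are my own labels, not from the text):

```python
def profit_1(q1, q2):
    """Firm 1's profit: its quantity times the market price p = 1 - q1 - q2."""
    return q1 * (1 - q1 - q2)

def r1(q2):
    """Firm 1's best reply, from the first-order condition."""
    return (1 - q2) / 2

# The examples from the text:
print(r1(0), r1(0.5), r1(1))   # 0.5 0.25 0.0

# Sanity check: at q2 = 1/2, no nearby deviation improves firm 1's profit.
q2 = 0.5
best = r1(q2)
assert all(profit_1(best, q2) >= profit_1(best + d, q2)
           for d in (-0.1, -0.01, 0.01, 0.1))
```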

We can plot this reaction-curve to get the following graph.

By a similar analysis, we can derive the reaction curve for firm 2. The profit-maximizing amount for firm 2, given what firm 1 produces, is r2(q1) = (1 – q1)/2.

We can plot this reaction-curve on the same axes as well.

What is the Nash equilibrium of this game?

From firm 2’s profit-maximizing condition, q2 = (1 – q1)/2. We can substitute that into firm 1’s profit-maximizing condition and derive the following.

q1 = (1 – q2)/2
q1 = (1 – [(1 – q1)/2])/2
2q1 = 1 – (1 – q1)/2
2q1 = 1/2 + q1/2
4q1 = 1 + q1
3q1 = 1
q1 = 1/3

We can then derive q2 = (1 – q1)/2 = 1/3 as well.

Both firms produce 1/3, and that is the Nash equilibrium of the game. The solution corresponds to the point on the graph where the two reaction curves intersect.
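We can verify that (1/3, 1/3) is a mutual best response with a couple of lines of Python (a minimal check; r1 and r2 are my own names for the reaction curves):

```python
def r1(q2):
    """Firm 1's reaction curve."""
    return (1 - q2) / 2

def r2(q1):
    """Firm 2's reaction curve."""
    return (1 - q1) / 2

q1 = q2 = 1 / 3
# At the Nash equilibrium, each quantity is a best reply to the other:
assert abs(r1(q2) - q1) < 1e-12   # firm 1 has no profitable deviation
assert abs(r2(q1) - q2) < 1e-12   # firm 2 has no profitable deviation
```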

Sequential Learning in the Cournot Game

We solved the Cournot game algebraically. But we could also have imagined the game as the result of a sequential learning process. That is, we could imagine the two firms start out producing certain levels and then each, in turn, improves production by optimizing using the reaction curves.

For example, let’s say initially firm 1 produced q1 = 1/6.

Firm 2 thinks: what is its best response? Given that firm 1 produces 1/6, firm 2’s profit-maximizing reaction is q2 = (1 – 1/6)/2 = 5/12.

Graphically, this is seen by drawing a horizontal line from (0, 1/6) to firm 2’s reaction curve, which will intersect at the point (5/12, 1/6).

Now firm 1 will think if it can do any better. If firm 2 is producing 5/12, then firm 1 should produce (1 – 5/12)/2 = 7/24.

This can be seen graphically by drawing a vertical line to firm 1’s reaction curve.

We can continue the sequential learning process to find out that the two firms do end up producing the Nash equilibrium of 1/3 each.

At the point (1/3, 1/3), each firm’s best response is to keep producing the same quantity, so the learning process stays put. Hence this is a steady state.

Note how the learning process converged to a steady state that was the Nash equilibrium. In fact, the learning process for this game will converge to the Nash equilibrium from any initial quantities “close” to it.
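The sequential learning process above can be simulated in a few lines of Python (a sketch; the variable names and the number of iterations are my own choices):

```python
def r1(q2):
    """Firm 1's reaction curve."""
    return (1 - q2) / 2

def r2(q1):
    """Firm 2's reaction curve."""
    return (1 - q1) / 2

q1 = 1 / 6               # firm 1's initial quantity, as in the text
for step in range(50):
    q2 = r2(q1)          # firm 2 best-responds (first step gives 5/12)
    q1 = r1(q2)          # then firm 1 best-responds (first step gives 7/24)

print(round(q1, 6), round(q2, 6))   # 0.333333 0.333333
```

Each full round of best responses cuts the distance to (1/3, 1/3) by a factor of four, so convergence is fast.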

This is known as an asymptotically stable steady state.

Differential Equations

When does the learning process result in an asymptotically stable steady state?

Consider the following graph of reaction curves with three Nash equilibria, defined by the points where the reaction curves intersect.

The leftmost and rightmost equilibria are asymptotically stable: if you start out close enough to them, the learning process will end up in those steady states.

The center equilibrium is not asymptotically stable. If you start out just to its left, you will go to the leftmost steady state, and if you start out to its right, you will go to the rightmost steady state.

As you might surmise from these examples, the condition of asymptotic stability depends on the slope of the lines for the reaction curves.

If we denote firm 1’s reaction curve (derived from its profit-maximizing condition) as r1, and firm 2’s as r2, then a sufficient condition for an equilibrium to be asymptotically stable is that the product of the slopes of the two reaction curves is less than 1 in absolute value.

This should be true in an open neighborhood of the Nash equilibrium.

For the Cournot game, each reaction curve has slope 1/2 in absolute value. The product of the slopes is 1/4, which is indeed less than 1.

Eigenvalues

What if the firms were learning simultaneously instead of sequentially? That is, what if both firms were responding in the next period to what the other firm was producing in the last period?

We can describe a discrete-time dynamic process. Define the quantity at period t as the ordered pair of what firm 1 and firm 2 produce. That is,

q^t = (q1^t, q2^t)

What each firm produces in period t is the best response to what the other firm produced in the previous period t – 1. So we have the following.

q^t = (q1^t, q2^t) = (r1(q2^(t–1)), r2(q1^(t–1)))

We will define a function f as the learning process that maps the quantities to the next period. Hence we have

q^t = (q1^t, q2^t) = (r1(q2^(t–1)), r2(q1^(t–1))) = f(q^(t–1))

In a Nash equilibrium, both firms want to keep producing the same quantities. Therefore, a Nash equilibrium is a fixed point q* of f, for which f(q*) = q*.
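Iterating the simultaneous learning map f also drives the quantities to the fixed point. A short sketch (the starting quantities are arbitrary, not from the text):

```python
def f(q):
    """Simultaneous best responses: each firm reacts to the other's
    quantity from the previous period."""
    q1, q2 = q
    return ((1 - q2) / 2, (1 - q1) / 2)

q = (0.9, 0.1)           # arbitrary starting quantities
for _ in range(60):
    q = f(q)

# q approaches the fixed point q* = (1/3, 1/3), where f(q*) = q*.
print(round(q[0], 6), round(q[1], 6))   # 0.333333 0.333333
assert all(abs(fi - qi) < 1e-9 for fi, qi in zip(f(q), q))
```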

When is a fixed point of f asymptotically stable?

The answer comes from the study of dynamical systems, and it depends on linear algebra! A fixed point of a discrete-time process is asymptotically stable precisely when the eigenvalues of the Jacobian matrix of f at the fixed point all have absolute value less than 1. (You can read more about the condition for discrete asymptotic stability.)

In the Cournot example, the mapping was f(q^(t–1)) = (0.5(1 – q2^(t–1)), 0.5(1 – q1^(t–1))). The Jacobian matrix of f has rows [0, –0.5] and [–0.5, 0], and its eigenvalues are 0.5 and –0.5. Both are less than 1 in absolute value, so the Nash equilibrium is stable.
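As a quick check, we can compute these eigenvalues directly from the 2×2 characteristic polynomial (a sketch in plain Python; the variable names are my own):

```python
import math

# Jacobian of the simultaneous best-response map at the fixed point:
# rows [a, b] = [0, -0.5] and [c, d] = [-0.5, 0].
a, b = 0.0, -0.5
c, d = -0.5, 0.0

# Eigenvalues of a 2x2 matrix solve x^2 - (a + d)x + (ad - bc) = 0.
trace = a + d
det = a * d - b * c
disc = math.sqrt(trace ** 2 - 4 * det)
eig1 = (trace + disc) / 2   # 0.5
eig2 = (trace - disc) / 2   # -0.5

print(eig1, eig2)                       # 0.5 -0.5
assert max(abs(eig1), abs(eig2)) < 1    # stability condition holds
```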

This concludes the mathematical post and I hope you enjoyed seeing how game theory is related to learning, differential equations, and eigenvalues.

If you liked this kind of math, you would definitely enjoy studying a text like Fudenberg and Tirole’s Game Theory.