Strategic-form games

For reference, we begin by introducing some basic terminology and formalism for strategic-form games. For an introduction, see, e.g., Osborne (2004). For reasons that will become apparent later on, we limit our treatment to two-player games.

A two-player strategic game \(G=(A_1,A_2,u_1,u_2)\) consists of a countable set of moves \(A_i\) and a bounded utility function \(u_i:A_1\times A_2 \rightarrow {\mathbb {R}}\) for each player \(i\in \{1,2\}\). A (mixed) strategy for player i is a probability distribution \(\pi _i\) over \(A_i\).

Given a strategy profile \((\pi _1,\pi _2)\), the probability of an outcome \((a_1,a_2)\in A_1\times A_2\) is

$$\begin{aligned} P(a_1,a_2 \mid \pi _1,\pi _2 ):=\pi _1(a_1)\cdot \pi _2(a_2). \end{aligned}$$ (1)

The expected value for player i given that strategy profile is

$$\begin{aligned} {\mathbb {E}}\left[ u_i \mid \pi _1,\pi _2 \right] :=\sum _{(a_1,a_2)\in A_1\times A_2} P(a_1,a_2 \mid \pi _1,\pi _2 ) \cdot u_i (a_1,a_2). \end{aligned}$$

Note that because the utility function is bounded, the sum converges absolutely, so the order in which the action pairs are summed does not affect the sum’s value.
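As a concrete aside, when the move sets are finite the expected value above can be computed by direct summation. The following sketch uses hypothetical Prisoner's-Dilemma-style payoffs and mixed strategies; all names and numbers are illustrative and not taken from the text.

```python
# Expected utility in a two-player strategic game under mixed strategies.
# The payoffs below are a hypothetical Prisoner's-Dilemma-style example.
A1 = ["C", "D"]
A2 = ["C", "D"]
u1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
u2 = {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 0, ("D", "D"): 1}

def expected_utility(u, pi1, pi2):
    """E[u | pi1, pi2] = sum over outcomes of P(a1, a2) * u(a1, a2)."""
    return sum(pi1[a1] * pi2[a2] * u[(a1, a2)] for a1 in A1 for a2 in A2)

pi1 = {"C": 0.5, "D": 0.5}
pi2 = {"C": 0.25, "D": 0.75}
print(expected_utility(u1, pi1, pi2))  # 1.25
```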

Program equilibrium

We now introduce the concept of program equilibrium, first proposed by Tennenholtz (2004). The main idea is to replace strategies with computer programs that are given access to each other’s source code. The programs then give rise to strategies.

For any game G, we first need to define the set of program profiles \( PROG (G)\), consisting of pairs of programs. The ith entry of an element of \( PROG (G)\) must be a program source code \(p_i\) that, when interpreted by a function \( apply \), probabilistically maps program profiles onto \(A_i\).

We require that for any program profile \((p_1,p_2)\in PROG (G)\), both programs halt. Otherwise, the profile would not give rise to a well-defined strategy. Whether \(p_i\) halts depends on the program \(p_{-i}\) it plays against, where (in accordance with convention in game theory) \(-:\{1,2\}\rightarrow \{1,2\}:1\mapsto 2, 2 \mapsto 1\) and we write \(-i\) instead of \(-(i)\). For example, if \(p_{i}\) runs \( apply (p_{-i},(p_i, p_{-i}))\), i.e., simulates the opponent, then that is fine as long as \(p_{-i}\) does not also run \( apply (p_i,(p_i,p_{-i}))\), which would yield an infinite loop. To avoid this mutual dependence, we will generally require that \( PROG (G)= PROG _1(G)\times PROG _2(G)\), where \( PROG _i(G)\) consists of programs for player i. Methods of doing this while maintaining expressive power include hierarchies of players (e.g., higher-indexed players are allowed to simulate lower-indexed ones but not vice versa), hierarchies of programs (programs can only call their opponents with simpler programs as input), requiring programs to have a “plan B” in case termination cannot otherwise be guaranteed, or allowing each player to only start strictly less than one simulation in expectation. These methods may also be combined. In this paper, we do not assume any particular definition of \( PROG (G)\). However, we assume that programs can perform arbitrary computations as long as these computations are guaranteed to halt regardless of the output of the parts of the code that do depend on the opponent program. We also require that \( PROG (G)\) is compatible with our constructions. We will show our constructions to be so benign in terms of infinite loops that this is not too strong an assumption.
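To make these definitions concrete, here is a toy sketch (not the paper's construction) in which a program is a Python source string and \( apply \) interprets it on the full program profile. The program shown cooperates exactly against a syntactic copy of itself, a standard example in this literature; note that it trivially halts because it never simulates its opponent.

```python
# A toy illustration of programs conditioning on source code (hypothetical,
# not the paper's construction): each "program" is a Python source string
# that receives the full program profile and returns a distribution over
# its own moves.

def apply(source, profile):
    """Interpret a program source on a program profile; return a
    distribution over that player's moves as a dict."""
    env = {}
    exec(source, env)  # defines the function `run` from the source string
    return env["run"](profile)

# Cooperate iff the opponent's source code is syntactically identical to
# our own; otherwise defect. No simulation occurs, so halting is trivial.
CLIQUE_BOT = '''
def run(profile):
    own, other = profile
    if own == other:
        return {"C": 1.0, "D": 0.0}
    return {"C": 0.0, "D": 1.0}
'''

# Both players submit the same source, so each sees an identical opponent.
profile = (CLIQUE_BOT, CLIQUE_BOT)
print(apply(profile[0], profile))  # prints {'C': 1.0, 'D': 0.0}
```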

Given a program profile \((p_1,p_2)\), we receive a strategy profile \(( apply (p_1,(p_1,p_2)), apply (p_2,(p_1,p_2)))\). For any outcome \((a_1,a_2)\) of G, we define

$$\begin{aligned} P(a_1,a_2\mid p_1,p_2) :=P(a_1,a_2\mid apply (p_1,(p_1,p_2)), apply (p_2,(p_1,p_2))). \end{aligned}$$ (2)

and for every player \(i\in \{1,2\}\), we define

$$\begin{aligned} {\mathbb {E}}\left[ u_i \mid p_1,p_2 \right] :=\sum _{(a_1,a_2)\in A_1\times A_2} P(a_1,a_2 \mid p_1,p_2 ) \cdot u_i (a_1,a_2). \end{aligned}$$ (3)

For player i, we define the (set-valued) best response function as

$$\begin{aligned} B_i(p_{-i})={{\mathrm{arg\,max}}}_{p_i\in PROG _i(G)} {\mathbb {E}}\left[ u_i \mid p_i,p_{-i} \right] . \end{aligned}$$

A program profile \((p_1,p_2)\) is a (weak) program equilibrium of G if \(p_i\in B_i(p_{-i})\) for both \(i\in \{1,2 \}\).

Repeated games

Our construction will involve strategies for the repeated version of a two-player game. Thus, for any game G, we define \(G_{\epsilon }\) to be the repetition of G with a probability of \(\epsilon \in \left( 0,1\right] \) of ending after each round. Both players of \(G_{\epsilon }\) are informed only of the last move of their opponent. This differs from the more typical assumption that players have access to the entire history of past moves; we will later see why this deviation is necessary. A strategy \(\pi _i\) for player i non-deterministically maps the opponent’s previous move, or the information that there was none, onto a move

$$\begin{aligned} \pi _i :\{ 0 \} \cup A_{-i} \rightsquigarrow A_i. \end{aligned}$$

Thus, for \(a\in A_i, b\in A_{-i}\), \(\pi _i(b,a):=\pi _i(b)(a)\) denotes the probability of choosing a given that the opponent played b in the previous round and \(\pi _i(0,a):=\pi _i(0)(a)\) denotes the probability of choosing a in the first round. We call a strategy \(\pi _i\) stationary if for all \(a\in A_i\), \(\pi _i(b,a)\) is constant with respect to \(b\in \{ 0 \} \cup A_{-i}\). If \(\pi _i\) is stationary, we write \(\pi _i(a):=\pi _i(b,a)\). The probability that the game follows a complete history of moves \(h=a_0b_0a_1b_1\cdots a_nb_n\) and then ends is

$$\begin{aligned} P(h\mid (\pi _1,\pi _2)) :=\pi _1(0,a_0) \pi _2(0,b_0) \epsilon (1-\epsilon )^n \prod _{i=1}^n \pi _1(b_{i-1},a_i) \pi _2(a_{i-1},b_i). \end{aligned}$$ (4)

Note that the moves in the history always come in pairs \(a_ib_i\) which are chosen “simultaneously” in response to \(b_{i-1}\) and \(a_{i-1}\), respectively. The expected value for player i given the strategy profile \((\pi _1,\pi _2)\) is

$$\begin{aligned} {\mathbb {E}}\left[ u_i \mid \pi _1,\pi _2 \right] :=\sum _{h\in (A_1\cdot A_2)^+} P(h \mid \pi _1,\pi _2 ) \cdot u_i (h), \end{aligned}$$ (5)

where \((A_1\cdot A_2)^+\) is the set of all histories and

$$\begin{aligned} u_i (a_0b_0a_1b_1\cdots a_nb_n) :=\sum _{i=0}^n u_i(a_i,b_i). \end{aligned}$$ (6)

The lax unordered summation in Eq. 5 is, again, unproblematic because of the absolute convergence of the series, which is a direct consequence of the proof of Lemma 1. Note how the organization of the history into pairs of moves allows us to apply the utility function of the stage game in Eq. 6.
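As an aside, Eqs. 4 and 6 translate directly into code. The sketch below evaluates the probability and the accumulated utility of one complete history; the payoffs and strategies are hypothetical (player 1 plays a tit-for-tat-like strategy, player 2 a stationary one).

```python
# P(h | pi1, pi2) and u_i(h) for a complete history of the repeated game
# (Eqs. 4 and 6). Payoffs and strategies are hypothetical examples.
EPS = 0.5  # probability that the game ends after each round

u1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

# A strategy maps the opponent's previous move (or 0 in round 0) to a
# distribution over own moves: tit-for-tat-like for player 1, stationary
# for player 2.
pi1 = {0: {"C": 1.0, "D": 0.0}, "C": {"C": 1.0, "D": 0.0}, "D": {"C": 0.0, "D": 1.0}}
pi2 = {0: {"C": 0.5, "D": 0.5}, "C": {"C": 0.5, "D": 0.5}, "D": {"C": 0.5, "D": 0.5}}

def history_probability(h, pi1, pi2, eps):
    """Probability that the game follows h = [(a0,b0),...,(an,bn)] and ends."""
    n = len(h) - 1
    p = pi1[0][h[0][0]] * pi2[0][h[0][1]] * eps * (1 - eps) ** n
    for i in range(1, n + 1):
        p *= pi1[h[i - 1][1]][h[i][0]] * pi2[h[i - 1][0]][h[i][1]]
    return p

def history_utility(h, u):
    """Total stage-game utility accumulated along h (Eq. 6)."""
    return sum(u[pair] for pair in h)

h = [("C", "C"), ("C", "D"), ("D", "C")]
print(history_probability(h, pi1, pi2, EPS), history_utility(h, u1))
```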

For player i, we define the set-valued best response function as

$$\begin{aligned} B_i(\pi _{-i})={{\mathrm{arg\,max}}}_{\pi _i:\{0\}\cup A_{-i}\rightsquigarrow A_i} {\mathbb {E}}\left[ u_i \mid \pi _i,\pi _{-i} \right] . \end{aligned}$$

Analogously, \(B_i^c(\pi _{-i})\) is the set of responses to \(\pi _{-i}\) that are best among the computable ones, \(B_i^s(\pi _{-i})\) the set of responses that are best among the stationary ones, and \(B_i^{s,c}(\pi _{-i})\) the set of responses that are best among the stationary computable ones. A strategy profile \((\pi _1,\pi _2)\) is a (weak) Nash equilibrium of \(G_\epsilon \) if \(\pi _i\in B_i(\pi _{-i})\) for both \(i\in \{1,2 \}\).

We now prove a few lemmas that we will need later on. First, we have suggestively called the values P(h) probabilities, but we have not shown that they satisfy, say, Kolmogorov’s axioms. Additivity is not an issue because we have only defined probabilities for atomic events, and non-negativity is obvious from the definition. However, we will also need the fact that the numbers we have called probabilities indeed sum to 1, which takes a few lines to prove.

Lemma 1

Let \(G_\epsilon \) be a repeated game and \(\pi _1,\pi _2\) be strategies for that game. Then

$$\begin{aligned} \sum _{h\in (A_1\cdot A_2)^+} P(h \mid \pi _1,\pi _2 ) = 1. \end{aligned}$$

Proof

$$\begin{aligned} \sum _{h\in (A_1A_2)^+} P(h \mid \pi _1,\pi _2 )&\underset{\text {Eq. }4}{=} \sum _{a_0b_0\cdots a_nb_n\in (A_1A_2)^+} \pi _1(0,a_0) \pi _2(0,b_0) \epsilon (1-\epsilon )^n \prod _{i=1}^n \pi _1(b_{i-1},a_i) \pi _2(a_{i-1},b_i)\\&= \sum _{n=0}^\infty \epsilon (1-\epsilon )^n \sum _{a_0\in A_1} \pi _1(0,a_0) \sum _{b_0\in A_2} \pi _2(0,b_0) \cdots \sum _{a_n\in A_1} \pi _1(b_{n-1},a_n) \sum _{b_n\in A_2} \pi _2(a_{n-1},b_n)\\&= \sum _{n=0}^\infty \epsilon (1-\epsilon )^n = 1. \end{aligned}$$

To see why the second-to-last equation is true, notice that the inner-most sum is 1. Thus, the next sum is 1 as well, and so on. Since the ordering in the right-hand side of the first line is lax, and because only the second line is known to converge absolutely, the re-ordering is best understood from right to left. The last step uses the well-known formula \(\sum _{k=0}^\infty x^k=1/(1-x)\) for the geometric series. \(\square \)
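Lemma 1 can also be checked numerically: summing Eq. 4 over all histories of up to N rounds captures probability mass exactly \(\sum _{n=0}^{N-1}\epsilon (1-\epsilon )^n = 1-(1-\epsilon )^N\). A sketch with hypothetical strategies:

```python
# Numeric sanity check of Lemma 1: the history probabilities of Eq. 4
# sum to 1. Enumerating all histories of up to N rounds captures mass
# exactly 1 - (1 - eps)^N. Strategies below are hypothetical examples.
from itertools import product

EPS = 0.5
A1, A2 = ["C", "D"], ["C", "D"]
pi1 = {0: {"C": 0.7, "D": 0.3}, "C": {"C": 0.9, "D": 0.1}, "D": {"C": 0.2, "D": 0.8}}
pi2 = {0: {"C": 0.4, "D": 0.6}, "C": {"C": 0.5, "D": 0.5}, "D": {"C": 0.5, "D": 0.5}}

def history_probability(h, eps=EPS):
    """Eq. 4 for a history h given as a tuple of move pairs."""
    n = len(h) - 1
    p = pi1[0][h[0][0]] * pi2[0][h[0][1]] * eps * (1 - eps) ** n
    for i in range(1, n + 1):
        p *= pi1[h[i - 1][1]][h[i][0]] * pi2[h[i - 1][0]][h[i][1]]
    return p

N = 8
total = 0.0
for length in range(1, N + 1):  # histories of 1..N rounds
    for h in product(product(A1, A2), repeat=length):
        total += history_probability(h)
print(total)  # approaches 1 as N grows; here 1 - 0.5**8
```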

For any game \(G_\epsilon \), \(k\in {\mathbb {N}}_{+}\), \(a\in A_1\), \(b\in A_2\) and strategies \(\pi _1\) and \(\pi _2\) for \(G_\epsilon \), we define

$$\begin{aligned} P_{k, G_\epsilon } (a,b\mid \pi _1,\pi _2) :=(1-\epsilon )^k \sum _{a_0b_0\cdots a_{k-1}b_{k-1}\in (A_1A_2)^{k}} \pi _1(0,a_0) \pi _2(0,b_0) \prod _{j=1}^{k} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j), \end{aligned}$$ (7)

where \(a_k:=a\) and \(b_k:=b\).

For \(k=0\), we define

$$\begin{aligned} P_{0, G_\epsilon } (a,b\mid \pi _1,\pi _2) :=\pi _1(0,a)\cdot \pi _2(0,b). \end{aligned}$$

Intuitively speaking, \(P_{k, G_\epsilon } (a,b\mid \pi _1,\pi _2)\) is the probability of reaching at least round k and that (a, b) is played in that round. With this

$$\begin{aligned} \sum _{ab\in A_1A_2} P_{k, G_\epsilon } (a,b\mid \pi _1,\pi _2) u_i(a,b) \end{aligned}$$

should be the expected utility from the kth round (where not getting to the kth round counts as 0). This suggests a new way of calculating expected utilities on a more round-by-round basis.
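Because each player sees only the opponent's last move, the joint distribution of the round-k move pair evolves as a Markov chain on \(A_1\times A_2\). This yields an iterative way to compute the round-wise probabilities and, via the identity suggested above, the expected utility. The sketch below uses hypothetical payoffs and strategies and truncates the sum over rounds.

```python
# Iterative computation of P_k (probability of reaching round k and
# playing (a, b) there) and of the expected utility summed round by
# round. The joint distribution of the round-k move pair depends only on
# the round-(k-1) pair, so it evolves as a Markov chain. Payoffs and
# strategies are hypothetical examples.
EPS = 0.5
A1, A2 = ["C", "D"], ["C", "D"]
u1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
pi1 = {0: {"C": 0.7, "D": 0.3}, "C": {"C": 0.9, "D": 0.1}, "D": {"C": 0.2, "D": 0.8}}
pi2 = {0: {"C": 0.4, "D": 0.6}, "C": {"C": 0.5, "D": 0.5}, "D": {"C": 0.5, "D": 0.5}}

def round_distributions(k_max):
    """Yield Q_k, the joint distribution of the round-k move pair
    conditional on round k being reached, for k = 0..k_max."""
    q = {(a, b): pi1[0][a] * pi2[0][b] for a in A1 for b in A2}
    for _ in range(k_max + 1):
        yield q
        q = {(a, b): sum(q[(a2, b2)] * pi1[b2][a] * pi2[a2][b]
                         for a2 in A1 for b2 in A2)
             for a in A1 for b in A2}

def expected_utility(u, k_max=60):
    """Sum_k Sum_ab P_k(a,b) u(a,b) with P_k = (1-eps)^k Q_k,
    truncated at k_max (the tail mass is (1-eps)^(k_max+1))."""
    return sum((1 - EPS) ** k * q[ab] * u[ab]
               for k, q in enumerate(round_distributions(k_max))
               for ab in q)

print(expected_utility(u1))
```

With the constant utility function \(u\equiv 1\), this sum must equal \(\sum _k (1-\epsilon )^k = 1/\epsilon \), which gives a quick correctness check.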

Lemma 2

Let \(G_\epsilon \) be a game, and let \(\pi _1,\pi _2\) be strategies for that game. Then

$$\begin{aligned} {\mathbb {E}}\left[ u_i \mid \pi _1,\pi _2 \right] = \sum _{k=0}^\infty \sum _{ab\in A_1A_2} P_{k, G_\epsilon } (a,b\mid \pi _1,\pi _2) u_i(a,b). \end{aligned}$$

Proof

$$\begin{aligned}&{\mathbb {E}}_{G_\epsilon } \left[ u_i \mid \pi _1, \pi _{2}\right] = \sum _{h\in (A_1A_2)^+} P(h\mid \pi _1,\pi _{2})u_i(h)\\&\quad \underset{\text {Eqs. }4,~6}{=} \sum _{a_0b_0\cdots a_nb_n\in (A_1A_2)^+} \pi _1(0,a_0) \pi _2(0,b_0) \epsilon (1-\epsilon )^n \left( \prod _{j=1}^n \pi _1(b_{j-1},a_j) \pi _{2}(a_{j-1},b_j) \right) \\&\qquad \quad \quad \cdot \sum _{k=0}^n u_i(a_k,b_k) = \sum _{k=0}^\infty \sum _{a_0b_0\cdots a_nb_n\in (A_1A_2)^{\ge k+1}} \pi _1(0,a_0) \pi _{2}(0,b_0) \epsilon (1-\epsilon )^n \\&\qquad \quad \quad \cdot \left( \prod _{j=1}^n \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j)\right) u_i(a_k,b_k)\\&\quad \quad = \sum _{k=0}^\infty \sum _{a_0b_0\cdots a_kb_k\in (A_1A_2)^{k+1}} \sum _{a_{k+1}b_{k+1}\cdots a_nb_n\in (A_1A_2)^*} \pi _1(0,a_0) \pi _2(0,b_0) \epsilon (1-\epsilon )^n u_i(a_k,b_k)\\&\qquad \qquad \cdot \left( \prod _{j=1}^{k} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j) \right) \left( \prod _{j=k+1}^{n} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j) \right) \\&\quad \quad = \sum _{k=0}^\infty \sum _{a_0b_0\cdots a_kb_k\in (A_1A_2)^{ k+1}} \pi _1(0,a_0) \pi _2(0,b_0) (1-\epsilon )^k u_i(a_k,b_k)\\&\qquad \qquad \cdot \left( \prod _{j=1}^{k} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j) \right) \\&\qquad \qquad \cdot \sum _{a_{k+1}b_{k+1}\cdots a_nb_n\in (A_1A_2)^*} \epsilon (1-\epsilon )^{n-k} \left( \prod _{j=k+1}^{n} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j) \right) \\&\quad \underset{\text {lemma }1}{=} \sum _{k=0}^\infty \sum _{a_0b_0\cdots a_kb_k\in (A_1A_2)^{ k+1}} \pi _1(0,a_0) \pi _2(0,b_0) (1-\epsilon )^k u_i(a_k,b_k)\\&\qquad \qquad \cdot \left( \prod _{j=1}^{k} \pi _1(b_{j-1},a_j) \pi _2(a_{j-1},b_j) \right) \\&\quad \underset{\text {Eq. }7}{=} \sum _{k=0}^\infty \sum _{a_kb_k\in A_1A_2} P_k(a_k,b_k \mid \pi _1,\pi _2) u_i(a_k,b_k) \end{aligned}$$

\(\square \)

To find the probability of player i choosing a in round k, we usually have to calculate the probabilities of all actions in all previous rounds. After all, player i reacts to player \(-i\)’s previous move, who in turn reacts to player i’s move in round \(k-2\), and so on. This is what makes Eq. 7 so long. However, imagine that player \(-i\) uses a stationary strategy. This, of course, means that player \(-i\)’s probability distribution over moves in round k (assuming the game indeed reaches round k) can be computed directly as \(\pi _{-i}(b)\). Player i’s distribution over moves in round k is almost as simple to calculate, because it only depends on player \(-i\)’s distribution over moves in round \(k-1\), which can also be calculated directly. We hence get the following lemma.

Lemma 3

Let \(G_\epsilon \) be a game, let \(\pi _i\) be any strategy for \(G_\epsilon \), and let \(\pi _{-i}\) be a stationary strategy for \(G_\epsilon \). Then, for all \(k\in {\mathbb {N}}_+\),

$$\begin{aligned} P_{k, G_\epsilon } (a,b\mid \pi _i,\pi _{-i}) = (1-\epsilon )^k \sum _{b'\in A_{-i}} \pi _{-i}(b') \pi _{-i}(b) \pi _{i} ( b',a). \end{aligned}$$

Proof

We prove the lemma by induction on k. For \(k=1\), we have

$$\begin{aligned}&P_1(a,b \mid \pi _i, \pi _{-i}) \underset{\text {Eq. }7}{=} (1-\epsilon ) \sum _{a_0b_0} \pi _i(b_0,a)\pi _{-i}(a_0,b) \pi _i(0,a_0) \pi _{-i}(0,b_0)\\&\quad = (1-\epsilon )\sum _{b_0} \pi _i(b_0,a) \pi _{-i}(b) \pi _{-i}(b_0) \sum _{a_0}\pi _i(0,a_0)\\&\quad = (1-\epsilon )\sum _{b_0} \pi _i(b_0,a) \pi _{-i}(b) \pi _{-i}(b_0). \end{aligned}$$

If the lemma is true for k, it is also true for \(k+1\): unrolling Eq. 7 by one round gives

$$\begin{aligned} P_{k+1}(a,b \mid \pi _i, \pi _{-i})&= (1-\epsilon ) \sum _{a'b'\in A_iA_{-i}} P_{k}(a',b' \mid \pi _i, \pi _{-i}) \pi _i(b',a) \pi _{-i}(a',b)\\&= (1-\epsilon ) \pi _{-i}(b) \sum _{b'\in A_{-i}} \pi _i(b',a) \sum _{a'\in A_i} P_{k}(a',b' \mid \pi _i, \pi _{-i})\\&= (1-\epsilon ) \pi _{-i}(b) \sum _{b'\in A_{-i}} \pi _i(b',a) (1-\epsilon )^k \pi _{-i}(b') \sum _{b''\in A_{-i}} \pi _{-i}(b'') \sum _{a'\in A_i} \pi _i(b'',a')\\&= (1-\epsilon )^{k+1} \sum _{b'\in A_{-i}} \pi _{-i}(b') \pi _{-i}(b) \pi _i(b',a), \end{aligned}$$

where the second equation uses that \(\pi _{-i}\) is stationary, the third the induction hypothesis, and the last that \(\sum _{a'\in A_i} \pi _i(b'',a')=1\) and \(\sum _{b''\in A_{-i}} \pi _{-i}(b'')=1\).

\(\square \)
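Lemma 3's closed form can also be checked numerically against an independent round-by-round computation of \(P_k\), exploiting the Markov-chain structure of the move pairs. The strategies and numbers below are hypothetical.

```python
# Numeric check of Lemma 3 on a hypothetical example: player 1 plays an
# arbitrary reactive strategy, player 2 a stationary one; the closed form
# (1-eps)^k * sum_{b'} pi2(b') * pi2(b) * pi1(b', a) must match the
# probabilities obtained by unrolling the game round by round.
EPS = 0.25
A1, A2 = ["C", "D"], ["C", "D"]
pi1 = {0: {"C": 0.7, "D": 0.3}, "C": {"C": 0.9, "D": 0.1}, "D": {"C": 0.2, "D": 0.8}}
pi2_stat = {"C": 0.4, "D": 0.6}                       # stationary: same in every round
pi2 = {key: dict(pi2_stat) for key in [0, "C", "D"]}  # viewed as a reactive strategy

def P_k(k, a, b):
    """P_k(a,b): reach round k (probability (1-eps)^k) and play (a,b)
    there, computed by iterating the joint move-pair distribution."""
    q = {(x, y): pi1[0][x] * pi2[0][y] for x in A1 for y in A2}
    for _ in range(k):
        q = {(x, y): sum(q[(x2, y2)] * pi1[y2][x] * pi2[x2][y]
                         for x2 in A1 for y2 in A2)
             for x in A1 for y in A2}
    return (1 - EPS) ** k * q[(a, b)]

def P_k_closed(k, a, b):
    """Lemma 3's closed form, valid for k >= 1 because pi2 is stationary."""
    return (1 - EPS) ** k * pi2_stat[b] * sum(
        pi2_stat[b2] * pi1[b2][a] for b2 in A2)

for k in range(1, 6):
    for a in A1:
        for b in A2:
            assert abs(P_k(k, a, b) - P_k_closed(k, a, b)) < 1e-12
print("Lemma 3 verified on this example")
```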