# Project #2: The prisoner's dilemma

## A mathematical model of cooperation and retaliation

When the result of everyone's single-minded pursuit of his own interests is not satisfactory to anyone, the members of a community can often achieve a better outcome by cooperating with one another. Cooperation requires the members of the community to trust one another, at least to some extent; but they can often develop this mutual trust both by experiencing the benefits of cooperation and by recognizing the threat of retaliation for breaches of trust. Often the benefits of cooperation are so great that the threatened retaliation need not be extreme, enduring, or inevitable in order to be effective.

A mathematical game, traditionally called "the prisoner's dilemma," can be used to illustrate these claims. It is a game for two players; I'll call them Alice and Bob. The object of the game is to score as many points as possible. Each player must choose between two alternatives, cooperation and defection. The players make their decisions independently, without consultation; they then announce those decisions simultaneously. The outcome is scored as follows:

• If both players choose cooperation, each receives 3 points.
• If Alice chooses cooperation and Bob chooses defection, then Bob receives 5 points. Alice gets nothing in this case.
• Similarly, if Alice chooses defection and Bob chooses cooperation, then Alice receives 5 points and Bob gets nothing.
• If both players choose defection, each receives 1 point.
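
The scoring rule above can be sketched as a small Scheme procedure. This is only an illustration: the names `payoff`, `c`, and `d` are my own choices here, and the tournament program may represent choices differently.

```scheme
;; A sketch of the scoring rule, assuming choices are represented by
;; the symbols c (cooperate) and d (defect).  Returns the number of
;; points awarded to the player making my-choice.
(define (payoff my-choice other-choice)
  (cond ((and (eq? my-choice 'c) (eq? other-choice 'c)) 3)   ; mutual cooperation
        ((and (eq? my-choice 'c) (eq? other-choice 'd)) 0)   ; exploited cooperator
        ((and (eq? my-choice 'd) (eq? other-choice 'c)) 5)   ; successful defector
        (else 1)))                                           ; mutual defection
```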

It is clear to Alice that, no matter what Bob does, she will receive more points for choosing defection than for choosing cooperation. (If Bob chooses cooperation, Alice will get 5 points for defection and only 3 for cooperation; if Bob chooses defection, Alice will get 1 point for defection, none at all for cooperation.) So it is rational for Alice to choose defection.

However, for exactly the same reason, it is rational for Bob to choose defection; the game is symmetrical. Thus, if both players make rational choices, each will receive 1 point. Both Alice and Bob would prefer to score the 3 points that they would receive if they both chose cooperation, but because they cannot collaborate -- their choices must be made independently and simultaneously, remember -- there is no way for either player to make this happen. Alice can't trust Bob; Bob can't trust Alice.

At least, this appears to be the case if Alice and Bob are going to play only one round of this game. Suppose, however, that they go on to play a whole sequence of rounds -- an indefinitely long sequence, so that neither Alice nor Bob knows when the sequence will end. The object of the game is to score as many points as possible over the course of the entire match -- not to score more points than the other player, but simply to accumulate as large a total as one can. (Imagine that a point is a dollar. Bob would rather "lose" by a score of 328 to 350 than "win" by a score of 156 to 139, because he'd get $172 more.) We still assume that Alice and Bob make their decisions in each round independently and simultaneously; but we allow them to know the outcome of each round before proceeding to the next.

In such a match, it is sometimes rational to choose cooperation, if by doing so one can induce the other player to choose cooperation in subsequent rounds. Choosing cooperation in an early round is a tentative offer of trust, a way for Alice to send a signal to Bob, expressing an interest in cooperating for their mutual benefit. Defection can also be used as a signal -- for example, as a way of retaliating for an unprovoked defection by the other player in a preceding round.

Of course, such signals are pointless unless the other player picks them up and acts on them. One can imagine an unresponsive player who figures out in advance what he is going to do in each round of the match and sticks to that plan without paying any attention to what the other player does. (For instance, a player who always chooses defection, no matter what, is unresponsive; so is one who chooses cooperation in every odd-numbered round and defection in every even-numbered one.) If the other player is unresponsive, then the match is just the basic one-round game played over and over again, and the rational course of action is to defect every time.
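
The two unresponsive players just described can be sketched as follows. I am assuming here, purely for illustration, that a player is a procedure receiving the list of the opponent's past choices; the actual interface in the tournament program may well differ.

```scheme
;; Two unresponsive players, sketched under the assumption that a
;; player is a procedure taking the list of the opponent's past
;; choices and returning c (cooperate) or d (defect).

;; Defects unconditionally, ignoring the opponent entirely.
(define (always-defect opponent-history)
  'd)

;; Cooperates in odd-numbered rounds, defects in even-numbered ones,
;; again without consulting the opponent's choices.  The current round
;; number is one more than the number of completed rounds.
(define (alternator opponent-history)
  (if (even? (length opponent-history)) 'c 'd))
```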

If the other player is responsive, however, some more interesting strategies are possible, and no one of them is "best" under all circumstances. (Alice would get the best possible result if she could somehow anticipate Bob's decisions and always make the same choice he does. But there is no way for her to do this reliably, since their decisions are announced simultaneously.) One could, for instance, choose cooperation in every round so long as the other player continued to cooperate; if the other player ever chose defection, one could retaliate by choosing defection in the next round, or perhaps in the next two rounds, or perhaps even in all subsequent rounds. Or one could choose cooperation for several rounds and then try to sneak in a single defection without attracting retribution. One could choose cooperation for twenty rounds and then toss a coin to decide between cooperation and defection on subsequent rounds. One could try to deduce, from the other player's choices in early rounds, whether the other player is responsive, with the intention of choosing cooperation in later rounds if so and choosing defection if not. And there are still other possibilities.
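
The first responsive strategy mentioned above -- cooperate as long as the other player does, and answer each defection with a single retaliatory defection -- might be sketched like this, under the same assumed interface as before (a procedure taking the opponent's past choices, most recent first; the real program's conventions may differ):

```scheme
;; Cooperate by default; defect on the round immediately after any
;; defection by the opponent.  Cooperates on the first round, since
;; the history is then empty.
(define (retaliator opponent-history)
  (if (and (pair? opponent-history)
           (eq? (car opponent-history) 'd))
      'd
      'c))
```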

In the late 1970s, Robert Axelrod, a political scientist at the University of Michigan School of Public Policy, devised a way to test such strategies against one another: He organized a "prisoner's dilemma" tournament. He invited a number of distinguished psychologists, economists, political scientists, mathematicians, and sociologists to suggest strategies, in the form of FORTRAN or BASIC subroutines. He fitted these into a program that paired off the suggested strategies in every possible way and ran each pair through a two-hundred-round match, collecting the scores. He then added up the scores that each strategy had accumulated in all of its pairwise matches and ranked the strategies by total score.

I've written a program to stage a prisoner's dilemma tournament similar to Axelrod's. The main procedure is `play-tournament`; here's an example of its use:

```
> (play-tournament roster)
(("Just mean" . 2128)
("Look back" . 4511)
("Absentee" . 4275)
("Pavlov" . 4116)
("Gotcha" . 3295)
("Tester" . 4095)
("Friedman" . 3849)
("Punisher" . 4262)
("Forgiver" . 4291))
```

In this association list, the keys are strings denoting different strategies for playing the prisoner's dilemma game, and the values are the total numbers of points scored by these strategies. In this particular tournament, each player faced every other player in a 209-round series, and so participated in 1672 rounds. The winner, Look back, managed to score about 2.7 points per round; the loser, Just mean, scored less than 1.3.

## Exercise 1

Copy the file `prisoners-dilemma.ss` into your home directory by opening a terminal-emulator window and giving the shell command

```
cp /home/stone/courses/scheme/html/labs/prisoners-dilemma.ss ~/prisoners-dilemma.ss
```

## Exercise 2

Read the program and reflect on how it works.

## Exercise 3

Design and implement a player that always cooperates in the first two rounds, and subsequently cooperates if its opponent has cooperated in both of the two preceding rounds (and defects if its opponent has defected in either of those rounds). Add this player to the tournament by putting the procedure that implements it into the program that conducts the tournament and extending the list called `roster`. Re-run the tournament to see how the players' scores change in the new environment.

## Exercise 4

All of the PLT dialects that DrScheme supports provide a non-standard procedure called `random` that takes one argument, a positive integer, and returns a randomly selected natural number less than that integer. (It makes a new random choice each time it is invoked, so repeating the same call with the same argument can yield different results.) So, for instance, you can simulate a coin toss by writing

```
(if (zero? (random 2)) 'heads 'tails)
```

because the value of `(random 2)` will be either 0 or 1 (with equal probability), so that the test `(zero? (random 2))` will be true half the time and false half the time. Similarly, you can simulate the roll of a die by writing `(+ (random 6) 1)`; the call to `random` produces a natural number in the range from 0 to 5, so adding one to its result gives you a number in the range from 1 to 6.
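
The same idea extends to unequal probabilities. For example, `(random 10)` yields one of the ten natural numbers below 10, each equally likely, and exactly seven of them are less than 7, so an expression of the following shape takes its first branch 70% of the time:

```scheme
;; A 70/30 biased random choice: (random 10) is uniformly distributed
;; over 0..9, and seven of those ten values satisfy (< ... 7).
(if (< (random 10) 7) 'first-branch 'second-branch)
```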

Change languages, if necessary, to activate one of the PLT dialects (such as Textual). Then, using `random`, design and implement the Point Seven player, which always cooperates on the first round, and subsequently cooperates if its opponent has cooperated in the preceding round, but, when its opponent has defected in the preceding round, randomly defects 70% of the time and cooperates 30% of the time.

## Exercise 5

Design and implement your own player of the prisoner's dilemma. Add it to the roster of players and conduct the tournament. Try to explain why your player did well or badly.

When you have your player working, email it to me. I'm hoping to prepare a tournament among the players submitted by members of the class.

## Exercise 6

Change the number of rounds played in each match from 209 to 4. How does this affect the relative performance of the various players? How can the changes be explained? What conclusions about cooperation do they suggest?

## Exercise 7

Adjust the reward function so that the payoff for defecting against a cooperator is 25 instead of 5. Again, how does this change affect the relative performance of the various players? How can the differences be explained? What conclusions about cooperation do they suggest?

## Exercise 8

Adapt the program so that the payoffs increase in proportion to the round number: 3, 0, 5, and 1 in the first round, 6, 0, 10, and 2 in the second round, 9, 0, 15, and 3 in the third round, and so on. Again, how does this change affect the relative performance of the various players? How can the differences be explained? What conclusions about cooperation do they suggest?