# Computer models of cooperation, retaliation, and forgiveness

When the result of everyone's single-minded pursuit of his own interests is not satisfactory to anyone, the members of a community can often achieve a better outcome by cooperating with one another. Cooperation requires the members of the community to trust one another, at least to some extent; but they can often develop this mutual trust both by experiencing the benefits of cooperation and by recognizing the threat of retaliation for breaches of trust. Often the benefits of cooperation are so great that the threatened retaliation need not be extreme, enduring, or inevitable in order to be effective.

A mathematical game, traditionally called ``the prisoner's dilemma,'' can be used illustrate these claims. It is a game for two players; I'll call them Alice and Bob. The object of the game is to score as many points as possible. Each player has to make a decision between two alternatives, cooperation and defection. The players make their decisions independently, without consultation, and simultaneously; they then announce those decisions simultaneously. The outcome is then scored as follows:

• If both players choose cooperation, each receives 3 points.
• If Alice chooses cooperation and Bob chooses defection, then Bob receives 5 points. Alice gets nothing in this case.
• Similarly, if Alice chooses defection and Bob chooses cooperation, then Alice receives 5 points and Bob gets nothing.
• If both players choose defection, each receives 1 point.

It is clear to Alice that, no matter what Bob does, she will receive more points for choosing defection than for choosing cooperation. (If Bob chooses cooperation, Alice will get 5 points for defection and only 3 for cooperation; if Bob chooses defection, Alice will get 1 point for defection, none at all for cooperation.) So it is rational for Alice to choose defection.

However, for exactly the same reason, it is rational for Bob to choose defection; the game is symmetrical. Thus, if both players make rational choices, each will receive 1 point. Both Alice and Bob would prefer to score the 3 points that they would receive if they both chose cooperation, but because they cannot collaborate -- their choices must be made independently and simultaneously, remember -- there is no way for either player to make this happen. Alice can't trust Bob; Bob can't trust Alice.

At least, this appears to be the case if Alice and Bob are going to play only one round of this game. Suppose, however, that they go on to play a whole sequence of rounds -- an indefinitely long sequence, so that neither Alice nor Bob knows when the sequence will end. The object of the game is to score as many points as possible over the course of the entire match -- not to score more points than the other player, but simply to accumulate as large a total as one can. (Imagine that a point is a dollar. Bob would rather ``lose'' by a score of 328 to 350 than ``win'' by a score of 156 to 139, because he'd get \$172 more.) We still assume that Alice and Bob make their decisions in each round independently and simultaneously; but we allow them to know the outcome of each round before proceeding to the next.

In such a match, it is sometimes rational to choose cooperation, if by doing so one can induce the other player to choose cooperation in subsequent rounds. Choosing cooperation in an early round is a tentative offer of trust, a way for Alice to send a signal to Bob, expressing an interest in cooperating for their mutual benefit. Defection can also be used as a signal -- for example, as a way of retaliating for an unprovoked defection by the other player in a preceding round.

Of course, such signals are pointless unless the other player picks them up and acts on them. One can imagine an unresponsive player who figures out in advance what he is going to do in each round of the match and sticks to that plan without paying any attention to what the other player does. (For instance, a player who always chooses defection, no matter what, is unresponsive; so is one who chooses cooperation in every odd-numbered round and defection in every even-numbered one.) If the other player is unresponsive, then the match is just the basic one-round game played over and over again, and the rational course of action is to defect every time.

If the other player is responsive, however, some more interesting strategies are possible, and no one of them is ``best'' under all circumstances. (Alice would get the best possible result if she could somehow anticipate Bob's decisions and always make the same choice he does. But there is no way for her to do this reliably, since their decisions are announced simultaneously.) One could, for instance, choose cooperation in every round so long as the other player continued to cooperate; if the other player ever chose defection, one could retaliate by choosing defection in the next round, or perhaps in the next two rounds, or perhaps even in all subsequent rounds. Or one could choose cooperation for several rounds and then try to sneak in a single defection without attracting retribution. One could choose cooperation for twenty rounds and then toss a coin to decide between cooperation and defection on subsequent rounds. One could try to deduce, from the other player's choices in early rounds, whether the other player is responsive, with the intention of choosing cooperation in later rounds if she is and choosing defection if she is not. And there are still other possibilities.

Robert Axelrod, a political scientist at the University of Michigan School of Public Policy, devised a way to test such strategies against one another: He organized a ``prisoner's dilemma'' tournament. He invited a number of distinguished psychologists, economists, political scientists, mathematicians, and sociologists to suggest strategies, in the form of FORTRAN or BASIC subroutines. He fitted these into a program that paired off the suggested strategies in every possible way and ran each pair through a two-hundred-round match, collecting the scores. He then added up the scores that each strategy had run up in all of its pairwise matches and ranked the strategies by total score.

I've written a program to stage prisoner's dilemma tournaments similar to Axelrod's. When I have presented the game to various groups of high-school and college students, I've also asked them to suggest strategies and collected their proposals. Here are some of the players they designed:

1. The Absentee player cooperates in the first two rounds and subsequently defects if the other player has defected in each of the two preceding rounds, but cooperates if the other player cooperated in either of those two rounds.

2. The Downing player defects in the first and second rounds, then decides whether to choose cooperation or defection by looking at the other player's track record: It reviews the history of the match so far, determining how often in the past the other player has responded to defection with defection and how often it has responded to cooperation with defection. It then assumes that the other player will continue to respond to future acts of cooperation and defection with cooperation and defection in the same proportions. Finally, Downing computes whether it is more profitable to cooperate or to defect, given the opponent's response policy, and makes the appropriate choice. (In cases where Downing would be cooperating for the first time, it assumes that the probability that the other player will respond to cooperation with cooperation is fifty percent.)

3. In the first five rounds, Faye alternately cooperates and defects (beginning with cooperation). Subsequently, Faye does whatever the other player did in the previous round.

4. Florencio defects in the first round, cooperates in the second round, defects in the third round, and so on, alternately.

5. The Just mean player defects in every round, regardless of what the other player does.

6. The Look back player cooperates in the first round. On every subsequent round, it does whatever the other player did in the previous round.

7. Marilee cooperates in every round until the other player defects. It responds to any defection by defecting in the next two rounds.

8. The Overtime player follows a pre-programmed sequence of acts of cooperation and defection, basing its choice entirely on the round number and not on the opponent's behavior. The pattern consists of alternating strings of cooperations and defections, growing longer and longer as the round number increases. After round 100, it always defects.

9. The Point seven player cooperates in the first round and thereafter cooperates whenever the other player cooperated in the preceding round; after a defection by the other player, Point seven generates a random number in the range from 0 to 1 and defects if the random number is less than 0.7. (In other words, Point seven is like Look back, except that it retaliates only seventy percent of the time; it ignores the other thirty percent of the other player's defections.)

10. Punisher cooperates in every round until the other player defects. It responds to the other player's first defection by defecting in the next round, to the second defection by defecting in the next two rounds, to the third defection by defecting in the next three rounds, and so on. Each new defection starts the retaliation count over at 0. When not retaliating, Punisher responds to cooperation with cooperation.

11. Saint cooperates in every round, regardless of what the other player does.

12. The Sneaky player follows the same strategy as Look back for the first fifteen rounds. Beginning in the sixteenth round, however, if the other player has cooperated in all five of the preceding rounds, Sneaky defects once, then cooperates in the following round. If, in that following round, the other player ``punishes'' Sneaky by defecting, Sneaky reverts to following the Look back strategy for another fifteen rounds and then tries another unprovoked defection, and so on. If the other player fails to punish Sneaky, Sneaky continues to defect indefinitely until it provokes a response.

13. Tester defects in the first round, cooperates in the second round, and continues to alternate defection with cooperation as long as the other player does not defect. Tester cooperates after the other player's first defection, but after that plays the same strategy as Look back.

14. Three strikes cooperates in every round until the other player has defected three times altogether; after that, it defects in every round.

15. Vanessa cooperates in the first three rounds, then switches over to constant defection if the other player has defected in each of those first three rounds, and otherwise does whatever the other player did in the preceding round.

Here are the results of a round-robin prisoner's-dilemma tournament among players. One hundred eighty-six rounds were played in each match. The following table gives the match score for each pair of players. For instance, when player 2 (Downing) was paired with player 8 (Overtime), Downing scored 355 points (row 2, column 8) and Overtime scored 150 (row 8, column 2).

```                      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1 Absentee             266 552 279 184 558 558 268 558 558 558 462 279 558 558
2 Downing          581     196 558 185 554 189 355 374 189 930 194 542 197 197
3 Faye             562 186     465 183 468 190 290 550 194 562 468 468 562 471
4 Florencio        744  93 465      93 465  97 244 564 101 744 465 464 111 468
5 Just mean        194 190 198 558     190 190 358 410 190 930 190 190 198 198
6 Look back        558 554 463 465 185     558 290 558 558 558 549 557 558 558
7 Marilee          558 189 195 557 185 558     328 558 558 558 228 196 558 558
8 Overtime         373 150 295 494 143 295 238     492 154 844 299 296 164 299
9 Point seven      558 139 545 399 130 558 558 232     558 558 532 554 558 558
10 Punisher         558 189 194 556 185 558 558 354 558     558 265 557 558 558
11 Saint            558   0 552 279   0 558 558 129 558 558      51 279 558 558
12 Sneaky           582 319 463 465 185 549 223 289 557 260 896     548 231 549
13 Tester           744 542 463 464 185 557 191 291 559 557 744 548     200 559
14 Three strikes    558 187 552 551 183 558 558 349 558 558 558 221 195     558
15 Vanessa          558 187 461 463 183 558 558 289 558 558 558 549 554 558
```

Here are the players' totals:

```Look back             6969
Tester                6604
Vanessa               6592
Point seven           6437
Punisher              6206
Absentee              6196
Three strikes         6144
Sneaky                6116
Marilee               5784
Faye                  5619
Downing               5241
Saint                 5196
Florencio             5118
Overtime              4536
Just mean             4184
```

Axelrod's tournament did not, of course, prove which strategy is ``best'' -- different strategies are optimal in different matches. However, by studying the standings, Axelrod identified some characteristics shared by the most successful and robust strategies in this arena of shrewdly designed players:

Axelrod described a player as nice if she never chooses defection before the other player does. This implies that she chooses cooperation in the first round of every match and continues to choose cooperation unless the other player defects without provocation. In Axelrod's tournament, eight of the fourteen strategies were nice. These eight strategies took the first eight places in the final rankings. Axelrod concluded that, in an environment in which there are other nice players around, it pays to be nice.

A slightly subtler characteristic is provocability. The strategies that placed first and second in Axelrod's tournament shared this characteristic: When the other player chose defection in an early round, both of the successful strategies retaliated (by choosing defection) in the very next round. (Axelrod called players that have this characteristic retaliatory.) Players that chose to ignore a single defection or postponed retaliation for it were less successful.

Axelrod also observed that willingness to forgive the other player for unprovoked defections -- eventually -- helped the most successful strategies. The eighth-place finisher (the least successful of the nice players) always chose cooperation until the other player defected, but subsequently defected every time. Such a player is nice, but totally unforgiving; it ignores the other player's attempts to apologize. Not surprisingly, the other player often detected this unresponsiveness and continued to defect. As a result, neither player emerged from the match with a good score.

In our tournament (the one reported above):

• Every one of the eleven responsive players finished ahead of every one of the four unresponsive ones (Florencio, Just mean, Overtime, and Saint).

• The non-nice strategies in this tournament are Downing, Faye, Florencio, Just mean, Overtime, Sneaky, and Tester -- all the others are nice. Six of the top seven finishers were nice, and five of the bottom six were not nice. Notice that when two nice players met in the tournament, each got 558 points -- considerably more than any player's average overall score per round.

• Just mean, Look back, Marilee, and Punisher are retaliatory; the rest are not.

• Just mean is the only player in this group that is totally unforgiving, although Overtime, Three strikes, and Vanessa can also reach a state in which they will defect in every remaining round. In general, the players that were more forgiving (such as Absentee and Point seven) did better than otherwise similar but less forgiving players (such as Marilee).

• Vanessa gained by being nice and forgiving, particularly by staying on good terms with the non-nice players Sneaky and Faye.

• Faye lost ground by committing early unprovoked defections against Marilee and Punisher, both of which demand multiple apologies. Since Faye did not give such apologies, both of those players locked Faye into an unfortunate cycle of constant defection.

The player that finished in first place in Axelrod's tournament was Look back, which was submitted by Anatol Rapoport of the University of Toronto. Look back was the simplest player submitted; the FORTRAN subroutine that implemented it contained only four statements! Recall that Look back's strategy is to choose cooperation on the first round and subsequently to duplicate the choice that the other player made in the preceding round (cooperation if the other player cooperated, defection if the other player defected). Thus Look back is nice, since it cannot choose defect until the other player has done so. It is retaliatory, since it immediately replies to a defection by defecting right back. And it is forgiving; it retaliates for a defection immediately, but does not hold a grudge into subsequent rounds.

Axelrod published the results of this tournament, together with his thought-provoking analysis. He then announced a second tournament! Once again, he solicited entries from a large variety of thoughtful and competent people. Some of the participants in the first tournament also submitted entries in the second one, with improvements reflecting their study and consideration of the results of the first tournament. Rapoport submitted Look back again. Altogether, the second tournament attracted sixty-two entries.

As in the first tournament, these entries were paired off in every possible way. This time, the matches were halted after 153 rounds. (The number of rounds per match was not announced in advance, since one of the conventions of the game is that players do not know when the match will end.) The match scores for each player were summed, and the players were ranked by total score.

Astonishingly, Look back won again! Even though all of the participants in the second tournament knew that this simple strategy had won the first tournament, and several of them submitted players that would have scored even higher than Look back if they had been entered in that first tournament, none of those ingenious ``improvements'' was quite as successful in adapting to the rather different collection of players entered in the second tournament.

In his subsequent analysis, Axelrod explained the surprising robustness of Look back -- its success with a wide variety of players -- by noting three conditions that combined to make it non-exploitable: (1) It is a common enough strategy that other players expect to encounter it or some variation of it. (2) When other players encounter Look back, they can identify it quickly (or at least find out quickly that it is nice, retaliatory, and forgiving). (3) When other players have perceived these features of Look back, it is easy for them to infer that there is no way to exploit it.

Look back has another surprising characteristic: It won both of these tournaments without ever scoring more points than the other player in a match! Look back either scores exactly the same number of points as the other player, or else slightly fewer (if the other player has managed to get in one more defection than Look back).

You can find a fuller account of both tournaments, with many fascinating details, in Axelrod's excellent book The evolution of cooperation (New York: Basic Books, Inc., 1984). There's also a World Wide Web site, http://pscs.physics.lsa.umich.edu/Software/CC/ECHome.html, that includes the FORTRAN implementation of Axelrod's second tournament.

This document is available on the World Wide Web as

```http://www.cs.grinnell.edu/~stone/events/cooperation/
```

Grinnell College's Web site begins at

```http://www.grinnell.edu/
```