Tuesday, 5 June 2007

Lecture 11 (part 2)

The classical prisoner's dilemma
The prisoner's dilemma was originally framed by Merrill Flood and Melvin Dresher, working at RAND in 1950. Albert W. Tucker formalized the game with prison-sentence payoffs and gave it the name "Prisoner's Dilemma" (Poundstone, 1992).

The classical prisoner's dilemma (PD) is as follows:

Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both stay silent, both prisoners are sentenced to only six months in jail for a minor charge. If each betrays the other, each receives a two-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what choice the other prisoner will make. So this dilemma poses the question: How should the prisoners act?
The dilemma can be summarized thus:

                          Prisoner B Stays Silent         Prisoner B Betrays
Prisoner A Stays Silent   Each serves six months          Prisoner A serves ten years,
                                                          Prisoner B goes free
Prisoner A Betrays        Prisoner A goes free,           Each serves two years
                          Prisoner B serves ten years

The dilemma arises when one assumes that both prisoners only care about minimizing their own jail terms. Each prisoner has two options: to cooperate with his accomplice and stay quiet, or to defect from their implied pact and betray his accomplice in return for a lighter sentence. The outcome of each choice depends on the choice of the accomplice, but each prisoner must choose without knowing what his accomplice has chosen to do.

In deciding what to do in strategic situations, it is normally important to predict what others will do. This is not the case here. If you knew the other prisoner would stay silent, your best move would be to betray, as you would then walk free instead of receiving the minor sentence. If you knew the other prisoner would betray, your best move would still be to betray, as you would receive a two-year sentence instead of the full ten years. Betraying is therefore a dominant strategy. The other prisoner reasons similarly, and therefore also chooses to betray. Yet by both betraying, each serves two years, longer than the six months each would serve if both stayed silent. So rational, self-interested play results in each prisoner being worse off than if they had stayed silent. In more technical language, this demonstrates very elegantly that in a non-zero-sum game a Nash equilibrium need not be a Pareto optimum.
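The dominance argument can be checked mechanically. The following is a minimal sketch, assuming the jail terms from the story above (six months written as 0.5 years); the names `YEARS`, `best_reply_for_a`, `is_nash` and `is_pareto_optimal` are illustrative and not part of the original text.

```python
# Jail terms in years: (A's move, B's move) -> (A's term, B's term).
# Shorter is better, so each prisoner wants to MINIMIZE their own entry.
SILENT, BETRAY = "silent", "betray"
MOVES = (SILENT, BETRAY)
YEARS = {
    (SILENT, SILENT): (0.5, 0.5),
    (SILENT, BETRAY): (10, 0),
    (BETRAY, SILENT): (0, 10),
    (BETRAY, BETRAY): (2, 2),
}


def best_reply_for_a(b_move):
    """Return the move that minimizes A's own jail term, given B's move."""
    return min(MOVES, key=lambda a_move: YEARS[(a_move, b_move)][0])


def is_nash(a_move, b_move):
    """True if neither prisoner can shorten their own term by switching alone."""
    a_years, b_years = YEARS[(a_move, b_move)]
    a_ok = all(YEARS[(alt, b_move)][0] >= a_years for alt in MOVES)
    b_ok = all(YEARS[(a_move, alt)][1] >= b_years for alt in MOVES)
    return a_ok and b_ok


def is_pareto_optimal(a_move, b_move):
    """True if no other outcome is at least as good for both and differs."""
    here = YEARS[(a_move, b_move)]
    for other in YEARS.values():
        if other != here and other[0] <= here[0] and other[1] <= here[1]:
            return False
    return True


if __name__ == "__main__":
    for b_move in MOVES:
        print(f"If B plays {b_move}, A's best reply is {best_reply_for_a(b_move)}")
    equilibria = [cell for cell in YEARS if is_nash(*cell)]
    print("Nash equilibria:", equilibria)
    print("Mutual betrayal Pareto optimal?", is_pareto_optimal(BETRAY, BETRAY))
```

Under these assumed numbers, betraying is A's best reply to either move by B, mutual betrayal is the only cell that passes `is_nash`, and it fails `is_pareto_optimal` because mutual silence is better for both prisoners.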

Note that the paradox of the situation lies in the fact that the prisoners are not defecting in the hope that the other will not. Even when each knows the other to be rational and selfish, both will play defect. Defect is what they will play no matter what, even though they know full well that the other player is playing defect as well and that they would both be better off with a different result.

Note that the "Stay Silent" and "Betray" strategies are also known as "don't confess" and "confess", or, more commonly, "cooperate" and "defect", respectively.

Generalized form
We can expose the skeleton of the game by stripping it of the Prisoners' subtext. The generalized form of the game has been used frequently in experimental economics. The following rules give a typical realization of the game.

There are two players and a banker. Each player holds a set of two cards: one printed with the word "Cooperate", the other printed with "Defect" (the standard terminology for the game). Each player puts one card face-down in front of the banker. By laying them face down, the possibility of a player knowing the other player's selection in advance is eliminated (although revealing one's move does not affect the dominance analysis[1]). At the end of the turn, the banker turns over both cards and gives out the payments accordingly.

If player 1 defects and player 2 cooperates, player 1 gets the Temptation to Defect payoff of 5 points while player 2 receives the Sucker's payoff of 0 points. If both cooperate, they get the Reward for Mutual Cooperation payoff of 3 points each, while if both defect, they get the Punishment for Mutual Defection payoff of 1 point each. The checkerboard payoff matrix showing the payoffs is given below.

Canonical PD payoff matrix (rows: player 1; columns: player 2; each cell lists player 1's payoff, then player 2's)

            Cooperate   Defect
Cooperate   3, 3        0, 5
Defect      5, 0        1, 1
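The banker's job in the card game amounts to a table lookup. The sketch below assumes the canonical point values from the matrix above and uses "C" and "D" as shorthand for the two cards; `PAYOFFS`, `banker_pays` and `play_rounds` are hypothetical names chosen for illustration.

```python
# (player 1's card, player 2's card) -> (player 1's points, player 2's points)
PAYOFFS = {
    ("C", "C"): (3, 3),  # Reward for mutual cooperation
    ("C", "D"): (0, 5),  # Sucker's payoff vs. Temptation to defect
    ("D", "C"): (5, 0),  # Temptation to defect vs. Sucker's payoff
    ("D", "D"): (1, 1),  # Punishment for mutual defection
}


def banker_pays(card_1, card_2):
    """Turn over both face-down cards and return the two payments."""
    return PAYOFFS[(card_1, card_2)]


def play_rounds(cards_1, cards_2):
    """Sum the payments over several turns, one card per player per turn."""
    total_1 = total_2 = 0
    for card_1, card_2 in zip(cards_1, cards_2):
        pay_1, pay_2 = banker_pays(card_1, card_2)
        total_1 += pay_1
        total_2 += pay_2
    return total_1, total_2


if __name__ == "__main__":
    print(banker_pays("D", "C"))
    print(play_rounds("CCD", "CDD"))
```

Because both cards are looked up together, neither player's payment can be computed before both choices are fixed, which mirrors laying the cards face down.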

In "win-lose" terminology the table looks like this:

            Cooperate             Defect
Cooperate   win-win               lose much-win much
Defect      win much-lose much    lose-lose

These point assignments are given arbitrarily for illustration. It is possible to generalize them. Let T stand for Temptation to defect, R for Reward for mutual cooperation, P for Punishment for mutual defection and S for Sucker's payoff. The following inequalities must hold:

T > R > P > S

In addition to the above condition, if the game is repeatedly played by two players, the following condition should be added.[2]

2 R > T + S

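If that condition does not hold, then full cooperation is not necessarily Pareto optimal: by taking turns, with one player defecting while the other cooperates and then swapping roles, each player averages (T + S) / 2 per round, which is then at least as large as the R earned from mutual cooperation.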
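Both conditions are easy to test for any candidate set of payoffs. The following is a small sketch; the function names are illustrative, and the second example (T = 8) is an assumed set of values chosen only to show the repeated-game condition failing.

```python
def is_prisoners_dilemma(T, R, P, S):
    """One-shot condition: T > R > P > S."""
    return T > R > P > S


def cooperation_beats_alternation(T, R, S):
    """Repeated-game condition: 2 R > T + S."""
    return 2 * R > T + S


def per_round_averages(T, R, S):
    """Per-player average payoff per round for two ways of playing:
    (always cooperating together, taking turns to defect and cooperate)."""
    return R, (T + S) / 2


if __name__ == "__main__":
    # Canonical values from the text: T=5, R=3, P=1, S=0.
    print(is_prisoners_dilemma(5, 3, 1, 0), cooperation_beats_alternation(5, 3, 0))
    print(per_round_averages(5, 3, 0))

    # Assumed values with a larger temptation: still a one-shot PD,
    # but alternating now pays more per round than mutual cooperation.
    print(is_prisoners_dilemma(8, 3, 1, 0), cooperation_beats_alternation(8, 3, 0))
    print(per_round_averages(8, 3, 0))
```

With the canonical values, mutual cooperation averages 3 points per round against 2.5 for alternation; with T = 8 the comparison flips to 3 against 4, which is the situation the second inequality rules out.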

These rules were established by cognitive scientist Douglas Hofstadter and form the formal canonical description of a typical game of Prisoner's Dilemma.
