Big game theory discovered

One of the things that we've really discovered through this use of game theory in studying science is that it's really very complicated. That scientists strive to get credit from their peers might initially seem counterproductive; you think, "Well, scientists should care about truth and nothing more."

Scientists are encouraged to publish their results quickly rather than sort of keeping them secret until they're absolutely positively sure, and that can actually be beneficial because then other scientists can build on it, can use it in their discoveries.

All of that is the good side. There's also a bad side, which is that striving for credit can sometimes encourage scientists to skew towards one project or another and not distribute their labor as a community of scientists over a bunch of different projects.

One thing that's really important in science is that scientists work on different projects, because we don't know which project is going to end up working out. There are many different ways to, say, detect gravitational waves, or many different ways to try to discover a subatomic particle, and what we want is for scientists to distribute themselves so that we have groups working on each different project.

The desire for credit can sometimes encourage them to be too homogeneous, to all jump on board with what looks like the best project and not distribute themselves in a good or useful way across all the different projects.

Nobody really has a good idea yet of how to balance these, because there's a good side and a bad side. We don't want scientists to be completely divorced from the world, but on the other hand the danger is that if we put too much emphasis on public reward and the acclaim of the public, we end up with scientists just generating false but exciting results over and over and over again in an attempt to get popular acclaim.

However, this condition may often not hold. Suppose now that the utility functions are more complicated. The pursuer most prefers an outcome in which she shoots the fugitive and so claims credit for his apprehension to one in which he dies of rockfall or snakebite; and she prefers this second outcome to his escape. The fugitive prefers a quick death by gunshot to the pain of being crushed or the terror of an encounter with a cobra.

Most of all, of course, he prefers to escape. Suppose, plausibly, that the fugitive cares more strongly about surviving than he does about getting killed one way rather than another.

This is because utility does not denote a hidden psychological variable such as pleasure. As we discussed in Section 2. How, then, can we model games in which cardinal information is relevant? Here, we will provide a brief outline of von Neumann and Morgenstern's ingenious technique for building cardinal utility functions out of ordinal ones.

It is emphasized that what follows is merely an outline, so as to make cardinal utility non-mysterious to you as a student who is interested in knowing about the philosophical foundations of game theory, and about the range of problems to which it can be applied.

Providing a manual you could follow in building your own cardinal utility functions would require many pages. Such manuals are available in many textbooks. Suppose that we now assign an ordinal utility function to the river-crossing fugitive. We are supposing that his preference for escape over any form of death is stronger than his preferences between causes of death.

This should be reflected in his choice behaviour in the following way. In a situation such as the river-crossing game, he should be willing to run greater risks to increase the relative probability of escape over shooting than he is to increase the relative probability of shooting over snakebite.

Suppose we asked the fugitive to pick, from the available set of outcomes, a best one and a worst one; call them W and L. Now imagine expanding the set of possible prizes so that it includes prizes that the agent values as intermediate between W and L.

We find, for a set of outcomes containing such prizes, a lottery over them such that our agent is indifferent between that lottery and a lottery including only W and L. In our example, this is a lottery that includes being shot and being crushed by rocks. Call this lottery T. What exactly have we done here?
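One way to make the outcome of such a construction concrete is the small Python sketch below. It follows the standard textbook calibration rather than anything spelled out in the text above: the best outcome is given utility 1, the worst utility 0, and each intermediate outcome receives the probability of W at which the agent is indifferent between a W-or-L gamble and that outcome for sure. The choice of snakebite as the worst outcome, the indifference probabilities and the function names are all hypothetical, chosen only to reflect a fugitive who cares far more about escaping than about how he dies.

```python
# Sketch only: outcome labels and indifference probabilities are hypothetical.

def cardinal_utilities(best, worst, indifference_prob):
    """Return utilities scaled so that u(worst) = 0 and u(best) = 1.

    indifference_prob maps each intermediate outcome to the probability p of
    winning `best` (with 1 - p of getting `worst`) at which the agent is
    indifferent between that gamble and receiving the outcome for certain.
    """
    utilities = {worst: 0.0, best: 1.0}
    for outcome, p in indifference_prob.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {outcome!r} must lie in [0, 1]")
        utilities[outcome] = p
    return utilities


def expected_utility(lottery, utilities):
    """Expected utility of a lottery given as {outcome: probability}."""
    if abs(sum(lottery.values()) - 1.0) > 1e-9:
        raise ValueError("lottery probabilities must sum to 1")
    return sum(prob * utilities[outcome] for outcome, prob in lottery.items())


# Hypothetical fugitive: escape is best, snakebite is worst.
u = cardinal_utilities(
    best="escape",
    worst="snakebite",
    indifference_prob={"shot": 0.05, "rockfall": 0.02},
)

# A lottery like T in the text: being shot or being crushed, here equally likely.
T = {"shot": 0.5, "rockfall": 0.5}
print(u)
print(expected_utility(T, u))
```

The numbers now carry cardinal information: the gap between escape and being shot is much larger than the gap between being shot and rockfall, which is the kind of difference in strength of preference that an ordinal ranking alone cannot express.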

Furthermore, two agents in one game, or one agent under different sorts of circumstances, may display varying attitudes to risk. Perhaps in the river-crossing game the pursuer, whose life is not at stake, will enjoy gambling with her glory while our fugitive is cautious.

Both agents, after all, can find their NE strategies if they can estimate the probabilities each will assign to the actions of the other. We can now fill in the rest of the matrix for the bridge-crossing game that we started to draw in Section 2.

If both players are risk-neutral and their revealed preferences respect ROCL (the reduction of compound lotteries), then we have enough information to be able to assign expected utilities, expressed by multiplying the original payoffs by the relevant probabilities, as outcomes in the matrix.

Suppose that the hunter waits at the cobra bridge with probability x and at the rocky bridge with probability y. Then, continuing to assign the fugitive a payoff of 0 if he dies and 1 if he escapes, and the hunter the reverse payoffs, our complete matrix is as follows:. We can now read the following facts about the game directly from the matrix.

No pair of pure strategies is a pair of best replies to the other. But in real interactive choice situations, agents must often rely on their subjective estimations or perceptions of probabilities. In one of the greatest contributions to twentieth-century behavioral and social science, Savage showed how to incorporate subjective probabilities, and their relationships to preferences over risk, within the framework of von Neumann-Morgenstern expected utility theory.

Then, just over a decade later, Harsanyi showed how to solve games involving maximizers of Savage expected utility. This is often taken to have marked the true maturity of game theory as a tool for application to behavioral and social science, and was recognized as such when Harsanyi joined Nash and Selten as a recipient of the first Nobel prize awarded to game theorists.

As we observed in considering the need for people playing games to learn trembling hand equilibria and QRE, when we model the strategic interactions of people we must allow for the fact that people are typically uncertain about their models of one another.

This uncertainty is reflected in their choices of strategies. Consider the fourth of these NE. The structure of the game incentivizes efforts by Player I to supply Player III with information that would open up her closed information set.

Player III should believe this information because the structure of the game shows that Player I has incentive to communicate it truthfully. Theorists who think of game theory as part of a normative theory of general rationality, for example most philosophers, and refinement program enthusiasts among economists, have pursued a strategy that would identify this solution on general principles.

The relevant beliefs here are not merely strategic, as before, since they are not just about what players will do given a set of payoffs and game structures, but about what understanding of conditional probability they should expect other players to operate with.

What beliefs about conditional probability is it reasonable for players to expect from each other? Consider again the NE (R, r2, r3). Suppose that Player III assigns pr(1) to her belief that if she gets a move she is at node The use of the consistency requirement in this example is somewhat trivial, so consider now a second case also taken from Kreps, p.

The idea of SE is hopefully now clear. We can apply it to the river-crossing game in a way that avoids the necessity for the pursuer to flip any coins if we modify the game a bit. This requirement is captured by supposing that all strategy profiles be strictly mixed, that is, that every action at every information set be taken with positive probability.

You will see that this is just equivalent to supposing that all hands sometimes tremble, or alternatively that no expectations are quite certain. An SE is said to be trembling-hand perfect if all strategies played at equilibrium are best replies to strategies that are strictly mixed.

You should also not be surprised to be told that no weakly dominated strategy can be trembling-hand perfect, since the possibility of trembling hands gives players the most persuasive reason for avoiding such strategies. How can the non-psychological game theorist understand the concept of an NE that is an equilibrium in both actions and beliefs? Multiple kinds of informational channels typically link different agents with the incentive structures in their environments. Some agents may actually compute equilibria, with more or less error.

Others may settle within error ranges that stochastically drift around equilibrium values through more or less myopic conditioned learning. Still others may select response patterns by copying the behavior of other agents, or by following rules of thumb that are embedded in cultural and institutional structures and represent historical collective learning. Note that the issue here is specific to game theory, rather than merely being a reiteration of a more general point, which would apply to any behavioral science, that people behave noisily from the perspective of ideal theory.

In a given game, whether it would be rational for even a trained, self-aware, computationally well resourced agent to play NE would depend on the frequency with which he or she expected others to do likewise. If she expects some other players to stray from NE play, this may give her a reason to stray herself. Instead of predicting that human players will reveal strict NE strategies, the experienced experimenter or modeler anticipates that there will be a relationship between their play and the expected costs of departures from NE.
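A relationship of this kind, between how often an action is played and how costly it is, can be sketched in a few lines of Python. The logit form used here is one common specification in the QRE literature, not something the text above commits to, and the function name, the `precision` parameter and the example utilities are hypothetical.

```python
import math


def logit_response(expected_utilities, precision):
    """Choice probabilities proportional to exp(precision * expected utility).

    precision = 0 gives uniformly random play; as precision grows, play
    concentrates on the action with the highest expected utility.
    """
    peak = max(expected_utilities)  # subtract the peak for numerical stability
    weights = [math.exp(precision * (u - peak)) for u in expected_utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical expected utilities: a best reply, a cheap mistake, a costly mistake.
utilities = [1.0, 0.9, 0.0]
for precision in (0.0, 1.0, 5.0):
    print(precision, [round(p, 3) for p in logit_response(utilities, precision)])
```

In this sketch the cheap mistake is always played more often than the costly one, and both become rarer as `precision` rises, which is the pattern the modeler described above expects to see in data.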

Consequently, maximum likelihood estimation of observed actions typically identifies a QRE as providing a better fit than any NE. Rather, she conjectures that they are agents, that is, that there is a systematic relationship between changes in statistical patterns in their behavior and some risk-weighted cardinal rankings of possible goal-states.

If the agents are people or institutionally structured groups of people that monitor one another and are incentivized to attempt to act collectively, these conjectures will often be regarded as reasonable by critics, or even as pragmatically beyond question, even if always defeasible given the non-zero possibility of bizarre unknown circumstances of the kind philosophers sometimes consider e.

The analyst might assume that all of the agents respond to incentive changes in accordance with Savage expected-utility theory, particularly if the agents are firms that have learned response contingencies under normatively demanding conditions of market competition with many players. All this is to say that use of game theory does not force a scientist to empirically apply a model that is likely to be too precise and narrow in its specifications to plausibly fit the messy complexities of real strategic interaction.

A good applied game theorist should also be a well-schooled econometrician. However, games are often played with future games in mind, and this can significantly alter their outcomes and equilibrium strategies.

Our topic in this section is repeated games, that is, games in which sets of players expect to face each other in similar situations on multiple occasions. This may no longer hold, however, if the players expect to meet each other again in future Prisoner's Dilemmas (PDs). Imagine that four firms, all making widgets, agree to maintain high prices by jointly restricting supply. That is, they form a cartel. This will only work if each firm maintains its agreed production quota.

Typically, each firm can maximize its profit by departing from its quota while the others observe theirs, since it then sells more units at the higher market price brought about by the almost-intact cartel.

In the one-shot case, all firms would share this incentive to defect and the cartel would immediately collapse. However, the firms expect to face each other in competition for a long period. In this case, each firm knows that if it breaks the cartel agreement, the others can punish it by underpricing it for a period long enough to more than eliminate its short-term gain.

Of course, the punishing firms will take short-term losses too during their period of underpricing. But these losses may be worth taking if they serve to reestablish the cartel and bring about maximum long-term prices. One simple and famous (but not, contrary to widespread myth, necessarily optimal) strategy for preserving cooperation in repeated PDs is called tit-for-tat.

This strategy tells each player to cooperate in the first round and thereafter to take whatever action the opponent took in the previous round. A group of players all playing tit-for-tat will never see any defections. Since, in a population where others play tit-for-tat, tit-for-tat is the rational response for each player, everyone playing tit-for-tat is a NE. You may frequently hear people who know a little but not enough game theory talk as if this is the end of the story.

It is not. There are two complications. First, the players must be uncertain as to when their interaction ends. Suppose the players know when the last round comes.

In that round, it will be utility-maximizing for players to defect, since no punishment will be possible. Now consider the second-last round. In this round, players also face no punishment for defection, since they expect to defect in the last round anyway.

So they defect in the second-last round. But this means they face no threat of punishment in the third-last round, and defect there too. We can simply iterate this backwards through the game tree until we reach the first round. Since cooperation is not a NE strategy in that round, tit-for-tat is no longer a NE strategy in the repeated game, and we get the same outcome—mutual defection—as in the one-shot PD. Therefore, cooperation is only possible in repeated PDs where the expected number of repetitions is indeterminate.
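The unraveling argument can be traced mechanically. The Python sketch below is a minimal illustration, assuming hypothetical stage-game payoffs with the usual PD ordering; the names `PAYOFF`, `best_reply` and `solve_known_horizon` are inventions for this example. It works backwards from the final round and records the move that is best against either action by the opponent.

```python
# Hypothetical stage-game payoffs to "me", indexed by (my move, opponent's move).
# They follow the usual PD ordering: temptation > reward > punishment > sucker.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_reply(opponent_move, continuation_value):
    """Best move today when the future payoff is the same whatever I do now."""
    return max("CD", key=lambda mine: PAYOFF[(mine, opponent_move)] + continuation_value)


def solve_known_horizon(rounds):
    """Work backwards from the last round of a PD repeated `rounds` times."""
    plan = []
    continuation_value = 0  # nothing follows the final round
    for _ in range(rounds):
        # Play in later rounds is already fixed by the induction, so it cannot
        # reward or punish today's move: each round reduces to a one-shot PD.
        replies = {best_reply(opponent, continuation_value) for opponent in "CD"}
        move = replies.pop() if len(replies) == 1 else None  # dominant move, if any
        plan.append(move)
        continuation_value += PAYOFF[("D", "D")]
    return plan[::-1]  # ordered from first round to last


print(solve_known_horizon(5))
```

Because the continuation value is added to both options alike, it never changes the comparison, and defection comes out as the chosen move in every round however long the known horizon is.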

Of course, this does apply to many real-life games. Note that in this context any amount of uncertainty in expectations, or possibility of trembling hands, will be conducive to cooperation, at least for a while. When people in experiments play repeated PDs with known end-points, they indeed tend to cooperate for a while, but learn to defect earlier as they gain experience.

Now we introduce a second complication. Consider our case of the widget cartel. Suppose the players observe a fall in the market price of widgets. Perhaps this is because a cartel member cheated.

Or perhaps it has resulted from an exogenous drop in demand. If tit-for-tat players mistake the second case for the first, they will defect, thereby setting off a chain-reaction of mutual defections from which they can never recover, since every player will reply to the first encountered defection with defection, thereby begetting further defections, and so on.
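The fragility described here is easy to reproduce in a toy simulation. The Python sketch below assumes two players using the standard tit-for-tat rule (cooperate first, then copy the opponent's last move as perceived) and a single hypothetical shock after which both players wrongly believe the other has just defected, much as cartel members might misread a fall in demand as cheating. The function names and the `shock_round` parameter are inventions for this example.

```python
COOPERATE, DEFECT = "C", "D"


def tit_for_tat(perceived_history):
    """Cooperate first, then copy the opponent's last move as it was perceived."""
    if not perceived_history:
        return COOPERATE
    return perceived_history[-1]


def play(rounds, shock_round=None):
    """Simulate two tit-for-tat players and return a list of (move_a, move_b).

    If shock_round is given (0-based, and greater than 0), then just before
    that round both players misread the other's previous move as a defection.
    """
    seen_by_a, seen_by_b = [], []  # what each player believes the other did
    outcomes = []
    for t in range(rounds):
        if t == shock_round and t > 0:
            seen_by_a[-1] = DEFECT
            seen_by_b[-1] = DEFECT
        move_a = tit_for_tat(seen_by_a)
        move_b = tit_for_tat(seen_by_b)
        outcomes.append((move_a, move_b))
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return outcomes


print(play(6))                 # no shock: cooperation throughout
print(play(6, shock_round=3))  # one shared misreading before round 3
```

Without the shock every round is mutual cooperation. With it, both players defect in round 3, each then sees a real defection, and mutual defection repeats in every later round, even though nobody ever intended to cheat.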

If players know that such miscommunication is possible, they have incentive to resort to more sophisticated strategies. In particular, they may be prepared to sometimes risk following defections with cooperation in order to test their inferences. However, if they are too forgiving, then other players can exploit them through additional defections. In general, sophisticated strategies have a problem. Because they are more difficult for other players to infer, their use increases the probability of miscommunication.

But miscommunication is what causes repeated-game cooperative equilibria to unravel in the first place. The complexities surrounding information signaling, screening and inference in repeated PDs help to intuitively explain the folk theorem, so called because no one is sure who first recognized it, that in repeated PDs, for any strategy S there exists a possible distribution of strategies among other players such that the vector of S and these other strategies is a NE.

Thus there is nothing special, after all, about tit-for-tat. Real, complex, social and political dramas are seldom straightforward instantiations of simple games such as PDs.

Hardin offers an analysis of two tragically real political cases, the Yugoslavian civil war of —95, and the Rwandan genocide, as PDs that were nested inside coordination games. A coordination game occurs whenever the utility of two or more players is maximized by their doing the same thing as one another, and where such correspondence is more important to them than whatever it is, in particular, that they both do. In these circumstances, any strategy that is a best reply to any vector of mixed strategies available in NE is said to be rationalizable.

That is, a player can find a set of systems of beliefs for the other players such that any history of the game along an equilibrium path is consistent with that set of systems.

Pure coordination games are characterized by non-unique vectors of rationalizable strategies. The Nobel laureate Thomas Schelling conjectured, and empirically demonstrated, that in such situations, players may try to predict equilibria by searching for focal points, that is, features of some strategies that they believe will be salient to other players, and that they believe other players will believe to be salient to them.

Coordination was, indeed, the first topic of game-theoretic application that came to the widespread attention of philosophers. The philosopher David Lewis published Convention, in which the conceptual framework of game theory was applied to one of the fundamental issues of twentieth-century epistemology, the nature and extent of conventions governing semantics and their relationship to the justification of propositional beliefs.

The basic insight can be captured using a simple example. This insight, of course, well preceded Lewis; but what he recognized is that this situation has the logical form of a coordination game.

Thus, while particular conventions may be arbitrary, the interactive structures that stabilize and maintain them are not. Furthermore, the equilibria involved in coordinating on noun meanings appear to have an arbitrary element only because we cannot Pareto-rank them; but Millikan shows implicitly that in this respect they are atypical of linguistic coordinations.

In a city, drivers must coordinate on one of two NE with respect to their behaviour at traffic lights. Either all must follow the strategy of rushing to try to race through lights that turn yellow or amber and pausing before proceeding when red lights shift to green, or all must follow the strategy of slowing down on yellows and jumping immediately off on shifts to green. Both patterns are NE, in that once a community has coordinated on one of them then no individual has an incentive to deviate: those who slow down on yellows while others are rushing them will get rear-ended, while those who rush yellows in the other equilibrium will risk collision with those who jump off straightaway on greens.

However, the two equilibria are not Pareto-indifferent, since the second NE allows more cars to turn left on each cycle in a left-hand-drive jurisdiction, and right on each cycle in a right-hand jurisdiction, which reduces the main cause of bottlenecks in urban road networks and allows all drivers to expect greater efficiency in getting about.
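The structure of this example can be checked with a short Python sketch. It simplifies the city to a representative pair of drivers, and the payoff numbers are hypothetical, chosen only so that matching behaviour beats mismatching and the slow-on-yellow convention is better for both. The function name `pure_nash_equilibria` is likewise an invention for this example.

```python
from itertools import product


def pure_nash_equilibria(payoffs, strategies):
    """List pure-strategy profiles from which neither player gains by deviating.

    payoffs[(row, col)] is a pair (row player's payoff, column player's payoff).
    """
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_payoff, col_payoff = payoffs[(row, col)]
        row_ok = all(row_payoff >= payoffs[(alt, col)][0] for alt in strategies)
        col_ok = all(col_payoff >= payoffs[(row, alt)][1] for alt in strategies)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


strategies = ["rush_yellow", "slow_on_yellow"]
# Hypothetical payoffs: mismatched conventions cause collisions (0 for both).
payoffs = {
    ("rush_yellow", "rush_yellow"): (2, 2),
    ("rush_yellow", "slow_on_yellow"): (0, 0),
    ("slow_on_yellow", "rush_yellow"): (0, 0),
    ("slow_on_yellow", "slow_on_yellow"): (3, 3),
}

equilibria = pure_nash_equilibria(payoffs, strategies)
print(equilibria)
```

Both matching profiles come out as equilibria, yet every driver does better in the slow-on-yellow one, so a community that has settled on rushing is stable without being efficient.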

Unfortunately, for reasons about which we can only speculate pending further empirical work and analysis, far more cities are locked onto the Pareto-inferior NE than on the Pareto-superior one.

Conditional game theory (see Section 5 below) provides promising resources for modeling cases such as this one, in which maintenance of coordination game equilibria likely must be supported by stable social norms, because players are anonymous and encounter regular opportunities to gain once-off advantages by defecting from supporting the prevailing equilibrium. This work is currently ongoing. While various arrangements might be NE in the social game of science, as followers of Thomas Kuhn like to remind us, it is highly improbable that all of these lie on a single Pareto-indifference curve.

These themes, strongly represented in contemporary epistemology, philosophy of science and philosophy of language, are all at least implicit applications of game theory.

The reader can find a broad sample of applications, and references to the large literature, in Nozick. Most of the social and political coordination games played by people also have this feature.

Unfortunately for us all, inefficiency traps represented by Pareto-inferior NE are extremely common in them. And sometimes dynamics of this kind give rise to the most terrible of all recurrent human collective behaviors. That is, in neither situation, on either side, did most people begin by preferring the destruction of the other to mutual cooperation. However, the deadly logic of coordination, deliberately abetted by self-serving politicians, dynamically created PDs. Some individual Serbs (Hutus) were encouraged to perceive their individual interests as best served through identification with Serbian (Hutu) group-interests.

That is, they found that some of their circumstances, such as those involving competition for jobs, had the form of coordination games. They thus acted so as to create situations in which this was true for other Serbs (Hutus) as well. Eventually, once enough Serbs (Hutus) identified self-interest with group-interest, the identification became almost universally correct, because (1) the most important goal for each Serb (Hutu) was to do roughly what every other Serb (Hutu) would, and (2) the most distinctively Serbian thing to do, the doing of which signalled coordination, was to exclude Croats (Tutsi).

That is, strategies involving such exclusionary behavior were selected as a result of having efficient focal points. But the outcome is ghastly: Serbs and Croats (Hutus and Tutsis) seem progressively more threatening to each other as they rally together for self-defense, until both see it as imperative to preempt their rivals and strike before being struck.

If Hardin is right (and the point here is not to claim that he is, but rather to point out the worldly importance of determining which games agents are in fact playing), then the mere presence of an external enforcer (NATO?) The Rwandan genocide likewise ended with a military solution, in this case a Tutsi victory.

But this became the seed for the most deadly international war on earth since , the Congo War of — Of course, it is not the case that most repeated games lead to disasters. The biological basis of friendship in people and other animals is partly a function of the logic of repeated games.

The importance of payoffs achievable through cooperation in future games leads those who expect to interact in them to be less selfish than temptation would otherwise encourage in present games. The fact that such equilibria become more stable through learning gives friends the logical character of built-up investments, which most people take great pleasure in sentimentalizing.

Furthermore, cultivating shared interests and sentiments provides networks of focal points around which coordination can be increasingly facilitated.

More directly, her claim was that conventions are not merely the products of decisions of many individual people, as might be suggested by a theorist who modeled a convention as an equilibrium of an n-person game in which each player was a single person.

Similar concerns about allegedly individualistic foundations of game theory have been echoed by another philosopher, Martin Hollis, and by the economists Robert Sugden and Michael Bacharach. The explanation seems to require appeal to very strong forms of both descriptive and normative individualism.

The players undermine their own welfare, one might argue, because they obstinately refuse to pay any attention to the social context of their choices. Binmore forcefully argues that this line of criticism confuses game theory as mathematics with questions about which game theoretic models are most typically applicable to situations in which people find themselves.

At 3, players would be indifferent between cooperating and defecting. Then we get the following transformation of the game:. Thus if the players find this equilibrium, we should not say that they have played non-NE strategies in a PD. Rather, we should say that the PD was the wrong model of their situation. What is at issue here is the best choice of a convention for applying mathematics to empirical description. Binmore is clearly right, and the majority of commentators have come to recognize that he is right, if we interpret the payoffs of games by reference to utility functions with unrestricted domains.

This is the overwhelmingly standard practice in both economics and formal decision theory. For a number of years this issue was regarded as closed in the mainstream literature. However, Sugden argues in very recent work that there are reasons, quite independent of technical considerations about which conventions are most convenient for representing empirical interactions as games, for avoiding appeal to preferences over unrestricted domains in analyzing welfare (that is, in doing normative economics).

On the basis of this argument, Sugden reverts to using game-theoretic models in which payoffs are restricted to objectively specifiable metrics, such as monetary returns. The substantive issues in welfare economics on which Sugden sheds new light are too interesting for a critic to reasonably refuse to engage with them out of mere stubbornness about adhering to convention in interpreting game representations.

It is too soon to assess whether the advances in welfare analysis that Sugden seeks are sustainable under critical stress-testing. If they prove not to be, then his motivation for an alternative convention on payoff interpretation will dissolve. I think it more likely, however, that a period of intensive innovation in welfare economics lies just ahead of us, and that in the course of this economists and other analysts will grow comfortable with operating two different representational conventions depending on problem contexts.

If that is indeed our future, then we can anticipate a further stage in which, because problem contexts tend not to remain conveniently isolated from one another, new formalism is demanded to allow both conventions to be operated in a single application without confusion.

But these speculations run well ahead of the current state of theory. Under this assumption, Bacharach, Sugden and Gold argue, human game players will often or usually avoid framing situations in such a way that a one-shot PD is the right model of their circumstances.

Note that the welfare of the team might make a difference to cardinal payoffs without making enough of a difference to trump the lure of unilateral defection. Suppose it bumped them up to 2. This point is important, since in experiments in which subjects play sequences of one-shot PDs (not repeated PDs, since opponents in the experiments change from round to round), majorities of subjects begin by cooperating but learn to defect as the experiments progress.

The team reasoners then re-frame the situation to defend themselves.

And a pure strategy, such as the one you found for tick-tack-toe, is an overall plan specifying moves to be taken in all eventualities that can arise in a play of the game.

A game is said to have perfect information if, throughout its play, all the rules, possible choices, and past history of play by any player are known to all participants. Games like tick-tack-toe, backgammon and chess are games with perfect information and such games are solved by pure strategies.

But whereas you may be able to describe all such pure strategies for tick-tack-toe, it is not possible to do so for chess, hence the latter's age-old intrigue. Games without perfect information, such as matching pennies, stone-paper-scissors or poker offer the players a challenge because there is no pure strategy that ensures a win.

For matching pennies you have two pure strategies: play heads or tails. For stone-paper-scissors you have three pure strategies: play stone or paper or scissors. In both instances you cannot just continually play a pure strategy like heads or stone because your opponent will soon catch on and play the associated winning strategy. What to do? We soon learn to try to confound our opponent by randomizing our choice of strategy for each play (for heads-tails, just toss the coin in the air and see what happens for an even split).

There are also other ways to control how we randomize. For example, for stone-paper-scissors we can toss a six-sided die and decide to select stone half the time (the numbers 1, 2 or 3 are tossed), select paper one third of the time (the numbers 4 or 5 are tossed) or select scissors one sixth of the time (the number 6 is tossed).

Doing so would tend to hide your choice from your opponent. But, by mixing strategies in this manner, should you expect to win or lose in the long run?
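A quick calculation suggests an answer for the die-based mix. The Python sketch below assumes a stake of one unit per play (win +1, lose -1, tie 0), which is an illustrative assumption rather than part of the description above, and computes what an opponent would earn on average from each pure reply to a given mix. The names `expected_payoffs_against`, `die_mix` and `uniform_mix` are inventions for this example.

```python
from fractions import Fraction

MOVES = ("stone", "paper", "scissors")
BEATS = {"stone": "scissors", "paper": "stone", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 if `mine` beats `theirs`, -1 if it loses, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoffs_against(mix):
    """Opponent's expected payoff from each pure reply to our mixed strategy."""
    return {
        reply: sum(prob * payoff(reply, ours) for ours, prob in mix.items())
        for reply in MOVES
    }


# The die-based mix from the text: stone 1/2, paper 1/3, scissors 1/6.
die_mix = {"stone": Fraction(1, 2), "paper": Fraction(1, 3), "scissors": Fraction(1, 6)}
# For comparison, equal weight on all three moves.
uniform_mix = {move: Fraction(1, 3) for move in MOVES}

print(expected_payoffs_against(die_mix))
print(expected_payoffs_against(uniform_mix))
```

Against the die-based mix, an opponent who always plays paper wins one third of a unit per play on average, so under these assumptions you should expect to lose in the long run once the pattern is noticed. Against the equal-weight mix every reply earns exactly zero, which is why no opponent can exploit it.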

What is the optimal mix of strategies you should play? How much would you expect to win? This is where the modern mathematical theory of games comes into play. Games such as heads-tails and stone-paper-scissors are called two-person zero-sum games. Zero-sum means that any money Player 1 wins or loses is exactly the same amount of money that Player 2 loses or wins.

Discover the biggest explosion in the history of the universe

Could you imagine a bigger explosion than the Big Bang?

Well, although it seems unthinkable, scientists have discovered a new explosion that until now is considered the largest in the history of the universe since its origin. Scientists have discovered the biggest explosion seen in the universe since the Big Bang.

The explosion, which released five times more energy than the previous record holder, emanated from a supermassive black hole at the center of a galaxy hundreds of millions of light-years from Earth.


