Thanks!

Interesting. I encountered the concept of correlated equilibrium in the context of regret-minimizing algorithms. It might be concrete enough to be more intuitive for more people. Okay, so Nash equilibria are too difficult to compute for general-sum games (this is well known; what looks new is that they are also difficult to approximate); we want something easier. A correlated equilibrium can be computed in polynomial time and is achieved when, given a random signal that tells each agent an action to select, no agent gains from deviating. An example is two cars at an intersection, each trying to decide whether to Go or Wait. The traffic signal helps coordinate their actions, unlike in a Nash equilibrium, which assumes each player randomizes without communication (much harder to get fair outcomes!).

In the low-regret learning scenario, you have a learner that selects an action each round and suffers a loss or collects a reward. The actions are offered by N experts/strategies. External regret is minimized when, in hindsight, you have not done much worse than the best fixed expert. E.g. you’re betting and the experts are a pool of speculators; you allocate weights by a min-regret rule and so lose not much more than the best of them. The experts can be abstract: you’re playing rock-paper-scissors and the experts are rock, paper and scissors. One standard rule is to randomize according to the weight of each expert, which is determined by its accumulated loss; this learns a distribution over actions. For a 2-player zero-sum game, if both players play this way, their time-averaged play converges to Nash. This algorithm is so simple and so successful as to be almost offensive :).
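Here is a sketch of that weighting rule in action (the learning rate, round count, and all names are my own illustrative choices): two multiplicative-weights ("Hedge") learners play rock-paper-scissors against each other, and the empirical play frequencies approach the uniform Nash equilibrium.

```python
import random

# Sketch of the multiplicative-weights ("Hedge") rule: each expert is a
# pure action, and an expert's weight decays exponentially with its
# accumulated loss.  eta and the round count are illustrative choices.

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def loss(mine, theirs):
    """Loss in [0, 1]: win -> 0, draw -> 0.5, lose -> 1."""
    if mine == theirs:
        return 0.5
    return 0.0 if BEATS[mine] == theirs else 1.0

def hedge_selfplay(rounds=20000, eta=0.05, seed=0):
    rng = random.Random(seed)
    w1 = {a: 1.0 for a in ACTIONS}
    w2 = {a: 1.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(rounds):
        a1 = rng.choices(ACTIONS, weights=[w1[a] for a in ACTIONS])[0]
        a2 = rng.choices(ACTIONS, weights=[w2[a] for a in ACTIONS])[0]
        counts[a1] += 1
        # Full-information update: every expert is charged the loss it
        # would have suffered against the opponent's actual action.
        for a in ACTIONS:
            w1[a] *= (1.0 - eta) ** loss(a, a2)
            w2[a] *= (1.0 - eta) ** loss(a, a1)
        # Normalize so the weights don't underflow over many rounds.
        z1, z2 = sum(w1.values()), sum(w2.values())
        for a in ACTIONS:
            w1[a] /= z1
            w2[a] /= z2
    return {a: counts[a] / rounds for a in ACTIONS}

freqs = hedge_selfplay()
# In this zero-sum game the time-averaged play of both no-regret
# learners approaches the uniform Nash equilibrium (1/3, 1/3, 1/3).
```

Note that the individual iterates can cycle; it is the empirical average that converges, which is why the code reports counts rather than final weights.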

Internal regret and swap regret are stronger notions. They let you say: even if I had swapped every bet on that team for a bet on some other team, I would not have done any better in expectation. Multiple agents each minimizing their swap regret converge to a correlated equilibrium, and the dynamics of many learning algorithms converge on correlated equilibria. Minimizing swap regret is a very simple algorithm and much more efficient than linear programming. It’s also trivial to extend to multiple players (just have each of them minimize swap regret), whereas the linear-programming approach is more mathematically involved.
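To make the distinction concrete, here is a small sketch (the history and losses are invented for illustration) that computes both regrets from a finished play history. Swapping is done per action, so swap regret is always at least external regret: mapping every action to the single best fixed action is one of the allowed swaps.

```python
# losses[t][a] is the loss action a would have suffered at round t;
# played[t] is the action the learner actually chose.

def external_regret(played, losses):
    # Compare against the best single fixed action in hindsight.
    actual = sum(losses[t][a] for t, a in enumerate(played))
    best_fixed = min(sum(row[a] for row in losses)
                     for a in range(len(losses[0])))
    return actual - best_fixed

def swap_regret(played, losses):
    # Compare against the best per-action remapping in hindsight: every
    # round on which action i was played may be re-routed to whichever
    # action j works best on exactly those rounds.
    n = len(losses[0])
    actual = sum(losses[t][a] for t, a in enumerate(played))
    best_swapped = 0
    for i in range(n):
        rounds_i = [t for t, a in enumerate(played) if a == i]
        best_swapped += min(sum(losses[t][j] for t in rounds_i)
                            for j in range(n))
    return actual - best_swapped

# Alternating play that is always wrong: any fixed action is right only
# half the time, but swapping each action to the other fixes every round.
played = [0, 1, 0, 1]
losses = [[1, 0], [0, 1], [1, 0], [0, 1]]
ext = external_regret(played, losses)   # → 2
swp = swap_regret(played, losses)       # → 4
```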

I’ve never run into an explanation of swap regret that wasn’t very abstract. It’s not obvious why individually minimizing swap regret leads to a correlated equilibrium, but the key is that the shared history serves as the correlator. Memory, plus the ability to learn conditional distributions, then acts as the signal that tells you what to do when. (Formally this is done via a time correlation: the signal randomly selects a time t and tells you to perform the action you took at time t, without revealing which t it chose. You then set up a correspondence between the no-profitable-swap condition and the no-gain-from-deviating-from-the-signal condition of the correlation device.)
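A tiny sketch of that "history as correlator" idea (the history below is invented): the empirical joint distribution of past action pairs is the candidate correlated equilibrium, and "pick a uniformly random past round t and replay it" is exactly a draw from this distribution.

```python
from collections import Counter

def empirical_joint(history):
    """history: list of (player1_action, player2_action) pairs."""
    total = len(history)
    # Fraction of rounds on which each action pair was jointly played.
    return {pair: count / total for pair, count in Counter(history).items()}

joint = empirical_joint([("A", "A"), ("B", "B"), ("A", "A"), ("A", "A")])
# → {("A", "A"): 0.75, ("B", "B"): 0.25}
```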

A lot of this work joining game theory, economics and learning theory is done in the setting of ads and click optimization. These are the algorithms that run the digital world; they operate in the shadows, so you never hear about them. Another fun thing about them is their link to evolution, which gives CS/ML people another angle into that field.

Kram wrote:

I assume the two equilibria are both-choose-A and both-choose-B, right?

Those are two Nash equilibria, but apparently not the only ones!

Let’s work it out. Suppose player 1 has probability > 1/2 of choosing A. Then to maximize his expected payoff, player 2 will choose A all the time. By the same token, if player 2 has probability > 1/2 of choosing A, then to maximize his expected payoff player 1 will choose A all the time.

The same is true if we change “A” to “B” everywhere in the previous paragraph.

This gives the two Nash equilibria you mentioned: the one where both players choose A with probability 1, and the one where both players choose B with probability 1.

The only case not considered is where both players choose A with probability 1/2. This is a Nash equilibrium as well, since neither player can improve their payoff by unilaterally changing the probability that they choose A.

So there appear to be 3 Nash equilibria.
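The best-response argument above can be checked numerically. This sketch (function names and the grid size are my own choices) scans a grid of unilateral deviations for the game where matching pays 1 and mismatching pays 0:

```python
def payoff(p, q):
    # Expected payoff to a player choosing A with probability p when the
    # other chooses A with probability q: the pair wins 1 on a match.
    return p * q + (1 - p) * (1 - q)

def is_nash(p, q, grid=101, eps=1e-9):
    devs = [i / (grid - 1) for i in range(grid)]
    # Neither player may gain from a unilateral deviation (the game is
    # symmetric, so player 2's payoff is payoff(q, p)).
    return (all(payoff(d, q) <= payoff(p, q) + eps for d in devs) and
            all(payoff(d, p) <= payoff(q, p) + eps for d in devs))

candidates = [i / 4 for i in range(5)]  # 0, 0.25, 0.5, 0.75, 1
equilibria = [(p, q) for p in candidates for q in candidates
              if is_nash(p, q)]
# → [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```

Only the two pure profiles and the 50/50 mix survive, matching the three equilibria found above.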

Going by the example of how it’s done on Wikipedia:

I assume the two equilibria are both-choose-A and both-choose-B, right?

Now consider a third party that draws one of two cards labelled (A,A) and (B,B) with some probability p and 1-p respectively.

After drawing the card, the third party informs each player of the strategy assigned to them on the card (but *not* the strategy assigned to their opponent). Suppose a player is assigned A. Supposing the other player plays their assigned strategy, they would not want to deviate, since following the card yields 1 (the highest payoff possible). Likewise for B.

So there is a continuum of correlated equilibria, one for each p in [0,1], all with precisely the same payoffs: both players always get 1, and p only controls which coordinated outcome is selected. The two Nash equilibria correspond to p = 1 (both choose A) and p = 0 (both choose B).
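This can be checked against the standard correlated-equilibrium condition: conditional on any signal that occurs with positive probability, no unilateral deviation should raise a player's expected payoff. A sketch (all names are mine) testing the card device for a range of p, and, for contrast, a device that deals miscoordinated cards:

```python
A, B = 0, 1
payoff = [[1, 0], [0, 1]]   # u(my_action, their_action); same for both players

def is_correlated_eq(joint, eps=1e-12):
    # joint maps an action pair (a1, a2) to the probability that the
    # device draws that card.
    for me in (0, 1):              # player being checked
        for told in (A, B):        # the action that player was assigned
            for dev in (A, B):     # a candidate deviation
                follow = dev_pay = 0.0
                for (a1, a2), q in joint.items():
                    mine, theirs = (a1, a2) if me == 0 else (a2, a1)
                    if mine != told:
                        continue   # this card didn't send that signal
                    follow += q * payoff[mine][theirs]
                    dev_pay += q * payoff[dev][theirs]
                if dev_pay > follow + eps:
                    return False   # deviating beats obeying the signal
    return True

# Every mixture of the (A, A) and (B, B) cards obeys the condition...
device_ok = all(is_correlated_eq({(A, A): p / 10, (B, B): 1 - p / 10})
                for p in range(11))
# ...while dealing miscoordinated cards does not.
bad_device_ok = is_correlated_eq({(A, B): 0.5, (B, A): 0.5})
```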

Is that correct?

I’ll have to read that book—thanks for pointing it out!

In the game where both players choose A or B, and both win $1 if they make the same choice but $0 if they make different choices, there are two pure Nash equilibria. But if this game is played just once, in the absence of communication, there is no reasonable way for either player to decide which choice to make. So, we shouldn’t expect that the Nash equilibrium is what actually occurs.

What are the correlated equilibria in this game?

It took me a while to get that joke.

Pretty elaborate stuff indeed! It’s worth noting that Khachiyan’s work on linear programming was considered a breakthrough because it showed this problem could be solved in polynomial time, even though his algorithm was too slow to be practical.
