Liars and Hypocrites

26 July, 2017


Who do you trust: the liar or the hypocrite?

Some people like to accuse those who are worried about climate change of being “hypocrites”. Why? Because we still fly around in planes, drive cars and so on.

What’s the argument? Could it be this?

“If even those folks who claim there’s a problem aren’t willing to do anything about it, it must not really be a problem.”

That argument is invalid. Say we have a married couple who both smoke. The husband says “we should quit smoking.” But he keeps smoking. Does this mean that it’s okay to smoke?

Or suppose he says “you should quit smoking”, but keeps on smoking himself. That would be infuriating. But it doesn’t make the statement less true.

Indeed, our civilization is addicted to burning carbon. It’s a lot like being addicted to nicotine. Addiction leads people to say one thing and do another. You know you should change your behavior—but you don’t have the will power. Or you do for a while… but then you lapse.

I see this in myself. I try to stop taking airplane flights, but like most successful scientists I get lots of invitations to conferences, with free flights to fun places. It’s hard to resist. It’s like offering cigarettes to someone who is trying to quit. I can resist nine times and cave in on the tenth! I can “relapse” for months and then come to my senses.

In fact the accusation of hypocrisy is not about the facts of climate change. It’s about choosing a social group:

“The people who want you to take climate change seriously are hypocrites. Don’t be a sucker. Don’t let them boss you around. Join us instead.”

This takes advantage of a psychological fact: most of us prefer liars to hypocrites. A lie is forgivable. But hypocrisy—someone publicly saying you should do something when they don’t themselves—is not.

There are studies about this:

• Association for Psychological Science, We dislike hypocrites because they deceive us, 30 January 2017.

The title of this article is wrong. Liars also deceive us. We hate hypocrites for other reasons.

We’re averse to hypocrites because their disavowal of bad behavior sends a false signal, misleading us into thinking they’re virtuous when they’re not, according to new findings in Psychological Science, a journal of the Association for Psychological Science. The research shows that people dislike hypocrites more than those who openly admit to engaging in a behavior that they disapprove of.

“People dislike hypocrites because they unfairly use condemnation to gain reputational benefits and appear virtuous at the expense of those who they are condemning–when these reputational benefits are in fact undeserved,” explains psychological scientist Jillian Jordan of Yale University, first author on the research.

Intuitively, it seems that we might dislike hypocrites because their word is inconsistent with their behavior, because they lack the self-control to behave according to their own morals, or because they deliberately engage in behaviors that they know to be morally wrong. All of these explanations seem plausible, but the new findings suggest that it’s the misrepresentation of their moral character that really raises our ire.

In an online study with 619 participants, Jordan and Yale colleagues Roseanna Sommers, Paul Bloom, and David G. Rand presented each participant with four scenarios about characters engaging in possible moral transgressions: a member of a track team using performance-enhancing drugs, a student cheating on a take-home chemistry exam, an employee failing to meet a deadline on a team project, and a member of a hiking club who engaged in infidelity.

In each scenario, participants read about a conversation involving moral condemnation of a transgression. The researchers varied whether the condemnation came from a “target character” (who subjects would later evaluate) or somebody else, as well as whether the scenario provided direct information about the target character’s own moral behavior. Participants then evaluated how trustworthy and likeable the target character was, as well as the likelihood that the target character would engage in the transgression.

The results showed that participants viewed the target more positively when he or she condemned the bad behavior in the scenario, but only when they had no information about how the character actually behaved. This suggests that we tend to interpret condemnation as a signal of moral behavior in the absence of direct information.

A second online study showed that condemning bad behavior conveyed a greater reputational boost for the character than directly stating that he or she didn’t engage in the behavior.

“Condemnation can act as a stronger signal of one’s own moral goodness than a direct statement of moral behavior,” the researchers write.

And additional data suggest that people dislike hypocrites even more than they dislike liars. In a third online study, participants had a lower opinion of a character who illegally downloaded music when he or she condemned the behavior than when he or she directly denied engaging in it.

I believe the accusation of hypocrisy is trying to set up a binary choice:

“Whose side are you on? Those hypocrites who say climate change is a problem and try to get you to make sacrifices, while they don’t? Or us liars, who say there’s no problem and your behavior is fine?”

Of course, being liars, they leave out the word “liars”.

One way out is to realize it’s not a binary choice.

There’s a third position: the honest hypocrite.

Perhaps the most critical piece of evidence for the theory of hypocrisy as false signaling is that people disliked hypocrites more than so-called “honest hypocrites.” In a fourth online study, the researchers tested perceptions of “honest hypocrites,” who—like traditional hypocrites—condemn behaviors that they engage in, but who also admit that they sometimes commit those behaviors.

“The extent to which people forgive honest hypocrites was striking to us,” says Jordan. “These honest hypocrites are seen as no worse than people who commit the same transgressions but keep their mouths shut and refrain from judging others for doing the same — suggesting that the entirety of our dislike for hypocrites can be attributed to the fact that they falsely signal their virtue.”

There’s also a fourth position: the non-liar, non-hypocrite. That’s even better. But sometimes, when we need to take collective action, we should listen to the honest hypocrite, who tells us that we should all take action, but admits he’s not doing it yet.

And now, here’s a great example of someone trying to take advantage of our hatred of hypocrites. Pay careful attention to how she cleverly tries to manipulate you! By the end you’ll feel different than when you started. If she were trying to get you to smoke, by the end you’d light up a cigarette and feel proud of yourself.

An example

The hypocrisy of climate change advocates
Julie Kelly

So according to all the hysterical people, President-elect Donald Trump has appointed the most climate denier cabinet ever. As cabinet confirmation hearings get underway, expect to hear the charge “climate denier!” a lot.

For those of you who don’t know what a climate denier is, it means you either challenge, question or flat-out reject the idea that the planet is warming due to human activity. In the scientific world and in the world of international liberal groupthink (but I repeat myself), this is blasphemy. Should you remotely doubt the dubious models, unrealized dire predictions, changing goal posts or flawed data related to climate science, you are not just stupid according to these folks, but you are on par with those who deny the Holocaust.

Even people who believe in manmade climate change (or AGW, anthropogenic global warming) have been excommunicated from the climate tribe for raising any concern about climate science. Last month, Roger Pielke, Jr. wrote a revealing op-ed in the Wall Street Journal about how he became a target of the climate junta for saying there was no connection between weather disasters and climate change. Although Pielke believes in AGW and even supports a carbon tax to mitigate its impact, his scrutiny made him a target of powerful folks in Congress, the media and even the White House.

The first time I was called a climate denier was a few years ago, after I started writing about agricultural biotechnology or GMOs. The charge was an attempt to undermine my credibility on supporting genetic engineering: the line of attack was, if you don’t believe the science and consensus about man-made global warming, you are a scientific illiterate who has no business speaking in defense of other scientific issues like biotechnology. This was often dished out by climate change pushers who also oppose GMOs because they are anti-capitalist, anti-corporate ideologues (Bernie Sanders could be the poster child for this).

As I did more research on climate change, I learned one important thing: being a climate change believer means never having to say you’re sorry, or at least never making any major sacrifice to your lifestyle that would mitigate the pending doom you are so preoccupied with (but, sea ice!). You can go along with climate change dogma and do virtually nothing about it except recycle your newspapers while self-righteously calling the other side names. From the Pope to the president to the smug suburban mom, climate adherents live in glass houses that function thanks to evil stuff like oil and gas while throwing rocks at us so-called deniers.

So who are the real deniers: those who are reasonably skeptical about climate change or those who give lots of lip service to it while living a lifestyle totally inimical to every tenet of the climate change creed?

To that end, you might be a climate change denier if:

You are the Holy Father of the largest denomination of the Christian faith who calls climate change “one of the principal challenges facing humanity in our day” and that coal, oil and gas must be replaced “without delay” yet lives a palatial lifestyle powered by fossil fuels.

You are the president of the United States who tried to ban fracking on public land because it emits greenhouse gases but then takes credit for cutting “dependence on foreign oil by more than half” thanks to fracking.

You are a presidential candidate whose primary message is blasting big corporations from Exxon to Monsanto for destroying the planet but then demands a private jet to make meaningless campaign appearances on behalf of the woman who beat you so you can keep getting attention for yourself.

You are a movie star who works in one of the most energy-intensive and frivolous industries but now earns fame by leading protests against fracking and demands the country live on 100 percent renewables by 2050 then jets your family off from Manhattan to Australia on a jumbo jet to take pictures of the Great Barrier Reef.

You are Robert Kennedy, Jr.

You drive a Tesla but don’t know the electricity comes from a grid supported by fossil fuels.

You are a legislator who pushes solar panels and wind turbines without having the slightest clue how much energy and materials — like steel, concrete, diesel fuel, fiberglass and plastic — are needed to manufacture them.

You are Leonardo DiCaprio.

You are a suburban mom who looks down at other moms who don’t care/know/believe in climate change but you spend the day driving your privileged kids around in a pricy SUV and have two air-conditioners in your 6,000 square-foot house.

You oppose nuclear energy and/or genetically engineered crops.

You eat meat because meat production allegedly emits about 14.5 percent of greenhouse gases or some made-up number according to the United Nations.

You eat any sort of food because agriculture uses all kinds of climate polluting energy not to mention the big carbon footprint to process, package, ship and deliver that food to your local Whole Foods.

You are John Kerry.

So if you live off the grid, never fly in an airplane and don’t eat, then you can call me a denier. For the rest of you, please zip it. You deny climate change by your actions because you contribute daily to the very greenhouse gases you contend are destroying the planet. I’d rather be a denier than a hypocrite any day.

Correlated Equilibria in Game Theory

24 July, 2017

Erica Klarreich is one of the few science journalists who explains interesting things I don’t already know clearly enough so I can understand them. I recommend her latest article:

• Erica Klarreich, In game theory, no clear path to equilibrium, Quanta, 18 July 2017.

Economists like the concept of ‘Nash equilibrium’, but it’s problematic in some ways. This matters for society at large.

In a Nash equilibrium for a multi-player game, no player can improve their payoff by unilaterally changing their strategy. This doesn’t mean everyone is happy: it’s possible to be trapped in a Nash equilibrium where everyone is miserable, because anyone changing their strategy unilaterally would be even more miserable. (Think ‘global warming’.)
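To make the definition concrete, here is a minimal sketch in Python that finds the pure-strategy Nash equilibria of a two-player game by brute force. The payoff table is a made-up prisoner's dilemma chosen purely for illustration, and the name `pure_nash_equilibria` is hypothetical. Note that Nash's theorem is about mixed strategies; a pure equilibrium like the one found here need not exist in general.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Hypothetical payoffs: PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2).
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return all action profiles where no single player gains by deviating alone."""
    equilibria = []
    for profile in product(actions, repeat=2):
        stable = True
        for player in (0, 1):
            for alternative in actions:
                deviated = list(profile)
                deviated[player] = alternative
                # A strictly better unilateral deviation breaks the equilibrium.
                if payoffs[tuple(deviated)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

In this toy game the only equilibrium has both players defecting, even though both would be better off if they both cooperated. That is the 'everyone is miserable' trap in miniature.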

The great thing about Nash equilibria is that their meaning is easy to fathom, and they exist. John Nash won a Nobel prize for a paper proving that they exist. His paper was less than one page long. But he proved the existence of Nash equilibria for arbitrary multi-player games using a nonconstructive method: a fixed point theorem that doesn’t actually tell you how to find the equilibrium!

Given this, it’s not surprising that Nash equilibria can be hard to find. Last September a paper came out making this precise, in a strong way:

• Yakov Babichenko and Aviad Rubinstein, Communication complexity of approximate Nash equilibria.

The authors show there’s no guaranteed method for players to find even an approximate Nash equilibrium unless they tell each other almost everything about their preferences. This makes a Nash equilibrium prohibitively difficult to find when there are lots of players… in general. There are particular games where it’s not difficult, and that makes these games important: for example, if you’re trying to run a government well. (A laughable notion these days, but still one can hope.)

Klarreich’s article in Quanta gives a nice readable account of this work and also a more practical alternative to the concept of Nash equilibrium. It’s called a ‘correlated equilibrium’, and it was invented by the mathematician Robert Aumann in 1974. You can see an attempt to define it here:

• Wikipedia, Correlated equilibrium.

The precise mathematical definition near the start of this article is a pretty good example of how you shouldn’t explain something: it contains a big fat equation containing symbols not mentioned previously, and so on. By thinking about it for a while, I was able to fight my way through it. Someday I should improve it—and someday I should explain the idea here! But for now, I’ll just quote this passage, which roughly explains the idea in words:

The idea is that each player chooses their action according to their observation of the value of the same public signal. A strategy assigns an action to every possible observation a player can make. If no player would want to deviate from the recommended strategy (assuming the others don’t deviate), the distribution is called a correlated equilibrium.
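Here is a small sketch in Python of what "no player would want to deviate" means in practice. It uses the game of Chicken with a commonly used illustrative payoff table; the numbers and the function name `is_correlated_equilibrium` are assumptions made for this example, not taken from the quoted article. The mediator draws a pair of recommendations from a joint distribution and tells each player only their own part.

```python
from fractions import Fraction

ACTIONS = ("dare", "chicken")

# Illustrative payoffs: PAYOFF[(a1, a2)] = (payoff to player 1, payoff to player 2).
PAYOFF = {
    ("dare", "dare"): (0, 0),
    ("dare", "chicken"): (7, 2),
    ("chicken", "dare"): (2, 7),
    ("chicken", "chicken"): (6, 6),
}


def is_correlated_equilibrium(dist):
    """dist maps action profiles to probabilities of being recommended.

    For each player and each recommendation they might receive, check that
    switching to any other action does not raise their expected payoff,
    assuming the other player follows their own recommendation.
    """
    for player in (0, 1):
        for told in ACTIONS:
            for alternative in ACTIONS:
                gain = 0
                for profile, prob in dist.items():
                    if profile[player] != told:
                        continue
                    deviated = list(profile)
                    deviated[player] = alternative
                    gain += prob * (PAYOFF[tuple(deviated)][player] - PAYOFF[profile][player])
                if gain > 0:
                    return False
    return True


third = Fraction(1, 3)
mediated = {
    ("chicken", "chicken"): third,
    ("dare", "chicken"): third,
    ("chicken", "dare"): third,
}
print(is_correlated_equilibrium(mediated))
print(is_correlated_equilibrium({("chicken", "chicken"): Fraction(1)}))
```

The first distribution passes the test although it is not a mixture of Nash equilibria: it puts weight on both players chickening out, which is not a Nash equilibrium of this game. With these payoffs each player expects 5, a total of 10, while the two pure Nash equilibria each give a total of 9. The second distribution fails, because a player told to chicken out would rather dare.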

According to Erica Klarreich it’s a useful notion. She even makes it sound revolutionary:

This might at first sound like an arcane construct, but in fact we use correlated equilibria all the time—whenever, for example, we let a coin toss decide whether we’ll go out for Chinese or Italian, or allow a traffic light to dictate which of us will go through an intersection first.

In [some] examples, each player knows exactly what advice the “mediator” is giving to the other player, and the mediator’s advice essentially helps the players coordinate which Nash equilibrium they will play. But when the players don’t know exactly what advice the others are getting—only how the different kinds of advice are correlated with each other—Aumann showed that the set of correlated equilibria can contain more than just combinations of Nash equilibria: it can include forms of play that aren’t Nash equilibria at all, but that sometimes result in a more positive societal outcome than any of the Nash equilibria. For example, in some games in which cooperating would yield a higher total payoff for the players than acting selfishly, the mediator can sometimes beguile players into cooperating by withholding just what advice she’s giving the other players. This finding, Myerson said, was “a bolt from the blue.”

(Roger Myerson is an economics professor at the University of Chicago who won a Nobel prize for his work on game theory.)

And even though a mediator can give many different kinds of advice, the set of correlated equilibria of a game, which is represented by a collection of linear equations and inequalities, is more mathematically tractable than the set of Nash equilibria. “This other way of thinking about it, the mathematics is so much more beautiful,” Myerson said.

While Myerson has called Nash’s vision of game theory “one of the outstanding intellectual advances of the 20th century,” he sees correlated equilibrium as perhaps an even more natural concept than Nash equilibrium. He has opined on numerous occasions that “if there is intelligent life on other planets, in a majority of them they would have discovered correlated equilibrium before Nash equilibrium.”

When it comes to repeated rounds of play, many of the most natural ways that players could choose to adapt their strategies converge, in a particular sense, to correlated equilibria. Take, for example, “regret minimization” approaches, in which before each round, players increase the probability of using a given strategy if they regret not having played it more in the past. Regret minimization is a method “which does bear some resemblance to real life — paying attention to what’s worked well in the past, combined with occasionally experimenting a bit,” Roughgarden said.

(Tim Roughgarden is a theoretical computer scientist at Stanford University.)

For many regret-minimizing approaches, researchers have shown that play will rapidly converge to a correlated equilibrium in the following surprising sense: after maybe 100 rounds have been played, the game history will look essentially the same as if a mediator had been advising the players all along. It’s as if “the [correlating] device was somehow implicitly found, through the interaction,” said Constantinos Daskalakis, a theoretical computer scientist at the Massachusetts Institute of Technology.

As play continues, the players won’t necessarily stay at the same correlated equilibrium — after 1,000 rounds, for instance, they may have drifted to a new equilibrium, so that now their 1,000-game history looks as if it had been guided by a different mediator than before. The process is reminiscent of what happens in real life, Roughgarden said, as societal norms about which equilibrium should be played gradually evolve.
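For a feel of how regret minimization works, here is a rough sketch in Python of one simple variant, often called regret matching, played on the game of Chicken. Everything here is an assumption for illustration: the payoff table, the function names, the fixed random seed, and the choice to track only "external" regret (how much better a player would have done by always playing one fixed action). In general that simple kind of regret only guarantees convergence to a larger set, the coarse correlated equilibria, and the results quoted above rely on more refined notions of regret. With just two actions per player, as here, the two kinds of regret coincide.

```python
import random

ACTIONS = ("dare", "chicken")

# Illustrative payoffs: PAYOFF[(a1, a2)] = (payoff to player 1, payoff to player 2).
PAYOFF = {
    ("dare", "dare"): (0, 0),
    ("dare", "chicken"): (7, 2),
    ("chicken", "dare"): (2, 7),
    ("chicken", "chicken"): (6, 6),
}


def payoff(player, own, other):
    """Payoff to `player` when they play `own` and the opponent plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFF[profile][player]


def regret_matching(rounds, seed=0):
    """Play repeatedly; return empirical frequencies and the largest average regret."""
    rng = random.Random(seed)
    regret = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    history = {}
    for _ in range(rounds):
        choice = []
        for player in range(2):
            positive = [max(regret[player][a], 0.0) for a in ACTIONS]
            if sum(positive) > 0:
                # Play each action with probability proportional to its positive regret.
                choice.append(rng.choices(ACTIONS, weights=positive)[0])
            else:
                # No regrets yet: experiment uniformly at random.
                choice.append(rng.choice(ACTIONS))
        profile = tuple(choice)
        history[profile] = history.get(profile, 0) + 1
        for player in range(2):
            own, other = profile[player], profile[1 - player]
            for a in ACTIONS:
                regret[player][a] += payoff(player, a, other) - payoff(player, own, other)
    frequencies = {p: count / rounds for p, count in history.items()}
    worst_average_regret = max(max(r.values()) for r in regret) / rounds
    return frequencies, worst_average_regret


frequencies, worst = regret_matching(10000)
print(frequencies, worst)
```

The returned frequencies are the "game history" viewed as a joint distribution over action profiles. The average regret shrinking toward zero is what makes that history look as though a mediator had been handing out recommendations.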

In the kinds of complex games for which Nash equilibrium is hard to reach, correlated equilibrium is “the natural leading contender” for a replacement solution concept, Nisan said.

As Klarreich hints, you can find correlated equilibria using a technique called linear programming. That was proved here, I think:

• Christos H. Papadimitriou and Tim Roughgarden, Computing correlated equilibria in multi-player games, J. ACM 55 (2008), 14:1-14:29.
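For a finite game written out in full, the linear program is easy to state. The following is the standard textbook formulation, in my own notation rather than quoted from the paper: p(a) is the probability that the mediator recommends the action profile a, u_i is player i's payoff, and a_{-i} stands for the actions of everyone except player i. The objective shown (total payoff) is just one possible choice; any linear objective would do.

```latex
\begin{aligned}
\text{maximize} \quad & \sum_{a} p(a) \sum_{i} u_i(a) \\
\text{subject to} \quad & \sum_{a_{-i}} p(a_i, a_{-i}) \,
  \bigl[ u_i(a_i, a_{-i}) - u_i(a_i', a_{-i}) \bigr] \ge 0
  \quad \text{for all players } i \text{ and all actions } a_i, a_i' , \\
& p(a) \ge 0 \ \text{ for all } a, \qquad \sum_{a} p(a) = 1 .
\end{aligned}
```

Everything here is linear in the unknowns p(a). The catch is that the number of unknowns grows exponentially with the number of players, so for big games one needs something smarter than writing the program down naively.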

Do you know something about correlated equilibria that I should know? If so, please tell me!

A Bicategory of Decorated Cospans

8 July, 2017

My students are trying to piece together a general theory of networks, inspired by many examples. A good general theory should clarify and unify these examples. What some people call network theory, I’d just call ‘applied graph invariant theory’: they come up with a way to calculate numbers from graphs, they calculate these numbers for graphs that show up in nature, and then they try to draw conclusions about this. That’s fine as far as it goes, but there’s a lot more to network theory!

There are many kinds of networks. You can usually create big networks of a given kind by sticking together smaller networks of this kind. The networks usually do something, and the behavior of the whole is usually determined by the behavior of the parts and how the parts are stuck together.

So, we should think of networks of a given kind as morphisms in a category, or more generally elements of an algebra of some operad, and define a map sending each such network to its behavior. Then we can study this map mathematically!

All these insights (and many more) are made precise in Fong’s theory of ‘decorated cospans’:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Kenny Courser is starting to look at the next thing: how one network can turn into another. For example, a network might change over time, or we might want to simplify a complicated network somehow. If a network is a morphism, a process where one network turns into another could be a ‘2-morphism’: that is, a morphism between morphisms. Just as categories have objects and morphisms, bicategories have objects, morphisms and 2-morphisms.

So, Kenny is looking at bicategories. As a first step, Kenny took Brendan’s setup and souped it up to define ‘decorated cospan bicategories’:

• Kenny Courser, Decorated cospan bicategories, to appear in Theory and Applications of Categories.

In this paper, he showed that these bicategories are often ‘symmetric monoidal’. This means that you can not only stick networks together end to end, you can also set them side by side or cross one over the other—and similarly for processes that turn one network into another! A symmetric monoidal bicategory is a somewhat fearsome structure, so Kenny used some clever machinery developed by Mike Shulman to get the job done:

• Mike Shulman, Constructing symmetric monoidal bicategories.

I would love to talk about the details, but they’re a bit technical so I think I’d better talk about something more basic. Namely: what’s a decorated cospan category and what’s a decorated cospan bicategory?

First: what’s a decorated cospan? A cospan in some category C is a diagram like this:

X \to \Gamma \leftarrow Y

where the objects and morphisms are all in C. For example, if C is the category of sets, we’ve got two sets X and Y mapped to a set \Gamma.

In a ‘decorated’ cospan, the object \Gamma is equipped or, as we like to say, ‘decorated’ with extra structure. For example:

Here the set \Gamma consists of 3 points—but it’s decorated with a graph whose edges are labelled by numbers! You could use this to describe an electrical circuit made of resistors. The set X would then be the set of ‘input terminals’, and Y the set of ‘output terminals’.

In this example, and indeed in many others, there’s no serious difference between inputs and outputs. We could reflect the picture, switching the roles of X and Y, and the inputs would become outputs and vice versa. One reason for distinguishing them is that we can then attach the outputs of one circuit to the inputs of another and build a larger circuit. If we think of our circuit as a morphism from the input set X to the output set Y, this process of attaching circuits to form larger ones can be seen as composing morphisms in a category.

In other words, if we get the math set up right, we can compose a decorated cospan from X to Y and a decorated cospan from Y to Z and get a decorated cospan from X to Z. So with luck, we get a category with objects of C as objects, and decorated cospans between these guys as morphisms!

For example, we can compose this:

and this:

to get this:

What did I mean by saying ‘with luck’? Well, there’s not really any luck involved, but we need some assumptions for all this to work. Before we even get to the decorations, we need to be able to compose cospans. We can do this whenever our cospans live in a category with pushouts. In category theory, a pushout is how we glue two things together.
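In the category of finite sets this gluing can be computed by hand: take the disjoint union of the two middle sets and identify the two images of each point of the shared set. Here is a minimal sketch in Python, assuming a cospan is stored as a triple `(apex, in_map, out_map)` with the maps given as dictionaries. That encoding and the name `compose_cospans` are just choices made for this illustration.

```python
def compose_cospans(first, second):
    """Compose X -> A <- Y with Y -> B <- Z by taking a pushout over Y.

    Each cospan is (apex, in_map, out_map); maps are dicts into the apex.
    The new apex consists of equivalence classes of tagged elements of A and B.
    """
    apex1, in1, out1 = first
    apex2, in2, out2 = second

    # Disjoint union of the two apexes, with tags to keep them apart.
    nodes = [("L", a) for a in apex1] + [("R", b) for b in apex2]
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Glue: for each y in Y, identify its image in A with its image in B.
    for y in out1:
        parent[find(("L", out1[y]))] = find(("R", in2[y]))

    classes = {}
    for v in nodes:
        classes.setdefault(find(v), []).append(v)
    label = {v: frozenset(members) for members in classes.values() for v in members}

    apex = set(label.values())
    new_in = {x: label[("L", in1[x])] for x in in1}
    new_out = {z: label[("R", out2[z])] for z in out2}
    return apex, new_in, new_out


# Two tiny cospans: {x} -> {a, b} <- {y} and {y} -> {c, d} <- {z}.
first = ({"a", "b"}, {"x": "a"}, {"y": "b"})
second = ({"c", "d"}, {"y": "c"}, {"z": "d"})
apex, new_in, new_out = compose_cospans(first, second)
print(len(apex))
```

In the example, b and c get glued into one point, so the composite has a three-point middle set. The particular names of the glued points are arbitrary, which is a first hint that the answer is only determined up to isomorphism.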

So, suppose our category C has pushouts. If we then have two cospans in C, one from X to Y and one from Y to Z:

we can take a pushout:

and get a cospan from X to Z:

All this is fine and dandy, but there’s a slight catch: the pushout is only defined up to isomorphism, so we can’t expect this process of composing cospans to be associative: it will only be associative up to isomorphism.

What does that mean? What’s an isomorphism of cospans?

I’m glad you asked. A map of cospans is a diagram like this:

where the two triangles commute. You can see two cospans in this picture; the morphism f provides the map from one to the other. If f is an isomorphism, then this is an isomorphism of cospans.
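For finite sets the commuting triangles are easy to test directly. The sketch below, in Python, again assumes a cospan is stored as a triple `(apex, in_map, out_map)` with dictionaries for the maps; the function names are hypothetical. A map f between the middle sets is a map of cospans when following f after each leg of the first cospan gives the corresponding leg of the second.

```python
def is_map_of_cospans(f, first, second):
    """Check that both triangles commute for f: apex of first -> apex of second."""
    _, in1, out1 = first
    _, in2, out2 = second
    left_triangle = all(f[in1[x]] == in2[x] for x in in1)
    right_triangle = all(f[out1[y]] == out2[y] for y in out1)
    return left_triangle and right_triangle


def is_isomorphism_of_cospans(f, first, second):
    """A map of cospans whose underlying function is a bijection of apexes."""
    apex1, _, _ = first
    apex2, _, _ = second
    bijective = set(f) == set(apex1) and len(set(f.values())) == len(apex1) == len(apex2)
    return bijective and is_map_of_cospans(f, first, second)


# The same cospan from {x} to {y}, with the middle points renamed.
cospan_a = ({"a", "b"}, {"x": "a"}, {"y": "b"})
cospan_b = ({1, 2}, {"x": 1}, {"y": 2})
print(is_isomorphism_of_cospans({"a": 1, "b": 2}, cospan_a, cospan_b))
print(is_map_of_cospans({"a": 2, "b": 1}, cospan_a, cospan_b))
```

The first renaming respects both legs, so it is an isomorphism of cospans. The second swaps the two middle points and breaks the triangles.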

To get around this problem, we can work with a category where the morphisms aren’t cospans, but isomorphism classes of cospans. That’s what Brendan did, and it’s fine for many purposes.

But back around 1972, when Bénabou was first inventing bicategories, he noticed that you could also create a bicategory with

• objects of C as objects,
• spans in C as morphisms, and
• maps of spans in C as 2-morphisms.

Bicategories are perfectly happy for composition of 1-morphisms to be associative only up to isomorphism, so this solves the problem in a somewhat nicer way. (Taking equivalence classes of things when you don’t absolutely need to is regarded with some disdain in category theory, because it often means you’re throwing out information—and when you throw out information, you often regret it later.)

So, if you’re interested in decorated cospan categories, and you’re willing to work with bicategories, you should consider thinking about decorated cospan bicategories. And now, thanks to Kenny Courser’s work, you can!

He showed how the decorations work in the bicategorical approach: for example, he proved that whenever C has finite colimits and

F : (C,+) \to (\mathrm{Set}, \times)

is a lax symmetric monoidal functor, you get a symmetric monoidal bicategory where a morphism is a cospan in C:

with the object \Gamma decorated by an element of F(\Gamma).

Proving this took some virtuosic work in category theory. The key turns out to be this glorious diagram:

For the explanation, check out Proposition 4.1 in his paper.

I’ll talk more about applications of cospan bicategories when I blog about some other papers Kenny Courser and Daniel Cicala are writing.

Entropy 2018

6 July, 2017

The editors of the journal Entropy are organizing this conference:

Entropy 2018 — From Physics to Information Sciences and Geometry, 14–16 May 2018, Auditorium Enric Casassas, Faculty of Chemistry, University of Barcelona, Barcelona, Spain.

They write:

One of the most frequently used scientific words is the word “entropy”. The reason is that it is related to two main scientific domains: physics and information theory. Its origin goes back to the start of physics (thermodynamics), but since Shannon, it has become related to information theory. This conference is an opportunity to bring researchers of these two communities together and create a synergy. The main topics and sessions of the conference cover:

• Physics: classical and quantum thermodynamics
• Statistical physics and Bayesian computation
• Geometrical science of information, topology and metrics
• Maximum entropy principle and inference
• Kullback and Bayes or information theory and Bayesian inference
• Entropy in action (applications)

The inter-disciplinary nature of contributions from both theoretical and applied perspectives are very welcome, including papers addressing conceptual and methodological developments, as well as new applications of entropy and information theory.

All accepted papers will be published in the proceedings of the conference. A selection of invited and contributed talks presented during the conference will be invited to submit an extended version of their paper for a special issue of the open access journal Entropy.

The Geometric McKay Correspondence (Part 2)

2 July, 2017

Last time I sketched how the E_8 Dynkin diagram arises from the icosahedron. This time I’ll fill in some details. I won’t fill in all the details, because I don’t know how! Working them out is the goal of this series, and I’d like to enlist your help.

(In fact, I’m running this series of posts both here and at the n-Category Café. So far I’m getting many more comments over there. So, to keep the conversation in one place, I’ll disable comments here and urge you to comment over there.)

Remember the basic idea. We start with the rotational symmetry group of the icosahedron and take its double cover, getting a 120-element group \Gamma called the binary icosahedral group. Since this is naturally a subgroup of \mathrm{SU}(2) it acts on \mathbb{C}^2, and we can form the quotient space

S = \mathbb{C}^2/\Gamma

This is a smooth manifold except at the origin, by which I mean the point coming from 0 \in \mathbb{C}^2. Luckily we can ‘resolve’ this singularity! This means that we can find a smooth manifold \widetilde{S} and a smooth map

\pi \colon \widetilde{S} \to S

that’s one-to-one and onto except at the origin. There may be various ways to do this, but there’s one best way, the ‘minimal’ resolution, and that’s what I’ll be talking about.
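Going back to the group \Gamma for a moment: it can be written down quite explicitly. If we identify \mathrm{SU}(2) with the unit quaternions, one standard description lists the 120 elements as the 24 'Hurwitz units' together with 96 more built from the golden ratio. Here is a sketch in Python that builds this list and checks that it is closed under multiplication. It uses floating-point coordinates (w, x, y, z), with rounding used only to compare elements, and the helper names are made up for this example.

```python
from itertools import permutations, product
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden ratio


def key(q):
    """Rounded coordinates, used only to compare quaternions for equality."""
    return tuple(round(c, 6) for c in q)


def is_even(perm):
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    return inversions % 2 == 0


def qmul(p, q):
    """Quaternion product, with quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def binary_icosahedral_group():
    found = {}
    # 8 elements: +-1, +-i, +-j, +-k.
    for position in range(4):
        for sign in (1.0, -1.0):
            q = [0.0, 0.0, 0.0, 0.0]
            q[position] = sign
            found[key(q)] = tuple(q)
    # 16 elements: (+-1 +-i +-j +-k)/2.
    for q in product((0.5, -0.5), repeat=4):
        found[key(q)] = tuple(q)
    # 96 elements: even permutations of (0, 1, 1/phi, phi)/2 with all signs.
    base = (0.0, 0.5, 0.5 / PHI, 0.5 * PHI)
    for perm in permutations(range(4)):
        if not is_even(perm):
            continue
        for signs in product((1.0, -1.0), repeat=4):
            q = tuple(signs[i] * base[perm[i]] for i in range(4))
            found[key(q)] = q
    return list(found.values())


GROUP = binary_icosahedral_group()
KEYS = {key(q) for q in GROUP}
CLOSED = all(key(qmul(p, q)) in KEYS for p in GROUP for q in GROUP)
print(len(GROUP), CLOSED)
```

A finite set of unit quaternions that contains 1 and is closed under multiplication is automatically a group, so the closure check is all that is needed here.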

The origin is where all the fun happens. The map \pi sends 8 spheres to the origin in \mathbb{C}^2/\Gamma, one for each dot in the \mathrm{E}_8 Dynkin diagram:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

This is wonderful! So, the question is just how do we really see it? For starters, how do we get our hands on this manifold \widetilde{S} and this map \pi \colon \widetilde{S} \to S?

For this we need some algebraic geometry. Indeed, the whole subject of ‘resolving singularities’ is part of algebraic geometry! However, since I still remember my ignorant youth, I want to avoid flinging around the vocabulary of this subject until we actually need it. So, experts will have to pardon my baby-talk. Nonexperts can repay me in cash, chocolate, bitcoins or beer.

What’s \widetilde{S} like? First I’ll come out and tell you, and then I’ll start explaining what the heck I just said.

Theorem. \widetilde{S} is the space of all \Gamma-invariant ideals I \subseteq \mathbb{C}[x,y] such that \mathbb{C}[x,y]/I is isomorphic, as a representation of \Gamma, to the regular representation of \Gamma.

If you want a proof, this is Corollary 12.8 in Kirillov’s Quiver Representations and Quiver Varieties. It’s on page 245, so you’ll need to start by reading lots of other stuff. It’s a great book! But it’s not completely self-contained: for example, right before Corollary 12.8 he brings in a crucial fact without proof: “it can be shown that in dimension 2, if a crepant resolution exists, it is minimal”.

I will not try to prove the theorem; instead I will start explaining what it means.

Suppose you have a bunch of points p_1, \dots, p_n \in \mathbb{C}^2. We can look at all the polynomials on \mathbb{C}^2 that vanish at these points. What is this collection of polynomials like?

Let’s use x and y as names for the standard coordinates on \mathbb{C}^2, so polynomials on \mathbb{C}^2 are just polynomials in these variables. Let’s call the ring of all such polynomials \mathbb{C}[x,y]. And let’s use I to stand for the collection of such polynomials that vanish at our points p_1, \dots, p_n.

Here are two obvious facts about I:

A. If f \in I and g \in I then f + g \in I.

B. If f \in I and g \in \mathbb{C}[x,y] then fg \in I.

We summarize these by saying I is an ideal, and this is why we called it I. (So clever!)

Here’s a slightly less obvious fact about I:

C. If the points p_1, \dots, p_n are all distinct, then \mathbb{C}[x,y]/I has dimension n.

The point is that the value of a function f \in \mathbb{C}[x,y] at a point p_i doesn’t change if we add an element of I to f, so this value defines a linear functional on \mathbb{C}[x,y]/I. Guys like this form a basis of linear functionals on \mathbb{C}[x,y]/I, so it’s n-dimensional.
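Here is a minimal sketch of fact C in Python, using exact rational arithmetic. It checks that the evaluation functionals at n distinct points are linearly independent already on polynomials of degree at most n - 1, which is what makes the quotient n-dimensional. The function names and the sample points are illustrative choices of mine, and I use rational coordinates only to keep the arithmetic exact.

```python
from fractions import Fraction


def monomials(max_degree):
    """Exponent pairs (a, b) of all monomials x^a y^b with a + b <= max_degree."""
    return [(a, b) for a in range(max_degree + 1) for b in range(max_degree + 1 - a)]


def evaluation_matrix(points, max_degree):
    """Row i lists the values of every monomial at the point p_i."""
    return [[Fraction(x) ** a * Fraction(y) ** b for a, b in monomials(max_degree)]
            for x, y in points]


def rank(rows):
    """Rank of a matrix of Fractions, by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


# Four distinct points: the four evaluation functionals are independent.
distinct = [(0, 0), (1, 0), (0, 1), (2, 3)]
rank_distinct = rank(evaluation_matrix(distinct, len(distinct) - 1))

# A repeated point gives a repeated functional, so the rank drops.
repeated = [(0, 0), (1, 0), (1, 0)]
rank_repeated = rank(evaluation_matrix(repeated, len(repeated) - 1))
```

The rank equals the number of points exactly when the points are distinct, which is why fact C needs that hypothesis.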

All this should make you interested in the set of ideals I with \mathrm{dim}(\mathbb{C}[x,y]/I) = n. This set is called the Hilbert scheme \mathrm{Hilb}^n(\mathbb{C}^2).

Why is it called a scheme? Well, Hilbert had a bunch of crazy schemes and this was one. Just kidding: actually Hilbert schemes were invented by Grothendieck in 1961. I don’t know why he named them after Hilbert. The kind of Hilbert scheme I’m using is a very basic one, more precisely called the ‘punctual’ Hilbert scheme.

The Hilbert scheme \mathrm{Hilb}^n(\mathbb{C}^2) is a whole lot like the set of unordered n-tuples of distinct points in \mathbb{C}^2. Indeed, we’ve seen that every such n-tuple gives a point in the Hilbert scheme. But there are also other points in the Hilbert scheme! And this is where the fun starts!

Imagine n particles moving in \mathbb{C}^2, with their motion described by polynomial functions of time. As long as these particles don’t collide, they define a curve in the Hilbert scheme. But it still works when they collide! When they collide, this curve will hit a point in the Hilbert scheme that doesn’t come from an unordered n-tuple of distinct points in \mathbb{C}^2. This point describes a ‘type of collision’.

More precisely: n-tuples of distinct points in \mathbb{C}^2 give an open dense set in the Hilbert scheme, but there are other points in the Hilbert scheme which can be reached as limits of those in this open dense set! The topology here is very subtle, so let’s look at an example.

Let’s look at the Hilbert scheme \mathrm{Hilb}^2(\mathbb{C}^2). Given two distinct points p_1, p_2 \in \mathbb{C}^2, we get an ideal

I = \{ f \in \mathbb{C}[x,y] \, : \; f(p_1) = f(p_2) = 0 \}

This ideal is a point in our Hilbert scheme, since \mathrm{dim}(\mathbb{C}[x,y]/I) = 2 .

But there are other points in our Hilbert scheme! For example, if we take any point p \in \mathbb{C}^2 and any vector v \in \mathbb{C}^2, there’s an ideal consisting of polynomials that vanish at p and whose directional derivative in the v direction also vanishes at p:

\displaystyle{ I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+t v) - f(p)}{t} = 0 \} }

It’s pretty easy to check that this is an ideal and that \mathrm{dim}(\mathbb{C}[x,y]/I) = 2 . We can think of this ideal as describing two particles in \mathbb{C}^2 that have collided at p with relative velocity some multiple of v.

For example you could have one particle sitting at p while another particle smacks into it while moving with velocity v; as they collide the corresponding curve in the Hilbert scheme would hit I.

This would also work if the velocity were any multiple of v, since we also have

\displaystyle{ I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+ c t v) - f(p)}{t} = 0 \} }

for any constant c \ne 0. And note, this constant can be complex. I’m trying to appeal to your inner physicist, but we’re really doing algebraic geometry over the complex numbers, so we can do weird stuff like multiply velocities by complex numbers.
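To make this concrete, here is a small Python sketch of the membership test for such a 'collision' ideal. Polynomials are stored as dictionaries sending an exponent pair (a, b) to the coefficient of x^a y^b. That representation, the function names and the sample point and velocity are all assumptions made for illustration. The sketch checks the absorbing property B on one example, and checks that rescaling v by a nonzero complex number does not change membership.

```python
def evaluate(f, p):
    """Value at p of a polynomial stored as {(a, b): coefficient of x^a y^b}."""
    return sum(c * p[0] ** a * p[1] ** b for (a, b), c in f.items())


def directional_derivative(f, p, v):
    """Derivative of f at p in the direction v, computed term by term."""
    dx = sum(c * a * p[0] ** (a - 1) * p[1] ** b for (a, b), c in f.items() if a > 0)
    dy = sum(c * b * p[0] ** a * p[1] ** (b - 1) for (a, b), c in f.items() if b > 0)
    return v[0] * dx + v[1] * dy


def multiply(f, g):
    """Product of two polynomials in the dictionary representation."""
    product = {}
    for (a, b), c in f.items():
        for (d, e), k in g.items():
            product[(a + d, b + e)] = product.get((a + d, b + e), 0) + c * k
    return product


def in_collision_ideal(f, p, v):
    """True if f vanishes at p and its derivative along v vanishes at p."""
    return evaluate(f, p) == 0 and directional_derivative(f, p, v) == 0


p, v = (1, 2), (1, 1)                     # illustrative position and velocity
f = {(1, 0): 1, (0, 1): -1, (0, 0): 1}    # x - y + 1, which lies in I
g = {(0, 0): 3, (1, 1): 1, (0, 2): 2}     # 3 + xy + 2y^2, an arbitrary polynomial
h = {(1, 0): 1, (0, 0): -1}               # x - 1 vanishes at p but is not in I

c = 2 + 1j                                # a nonzero complex multiple of v
scaled_v = (c * v[0], c * v[1])

f_in_ideal = in_collision_ideal(f, p, v)
fg_in_ideal = in_collision_ideal(multiply(f, g), p, v)
h_in_ideal = in_collision_ideal(h, p, v)
same_after_scaling = (in_collision_ideal(f, p, scaled_v) == f_in_ideal
                      and in_collision_ideal(h, p, scaled_v) == h_in_ideal)
```

The product fg stays in the ideal because of the product rule: both f and its derivative along v vanish at p, so the same holds for fg.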

Or, both particles could be moving and collide at p while their relative velocity was some complex multiple of v. As they collide, the corresponding curve in the Hilbert scheme would still hit I.

But here’s the cool part: such ‘2-particle collisions with specified position and relative velocity’ give all the points in the Hilbert scheme \mathrm{Hilb}^2(\mathbb{C}^2), except of course for those points coming from 2 particles with distinct positions.

What happens when we go to the next Hilbert scheme, \mathrm{Hilb}^3(\mathbb{C}^2)? This Hilbert scheme has an open dense set corresponding to triples of particles with distinct positions. It has other points coming from situations where two particles collide with some specified position and relative velocity while a third ‘bystander’ particle sits somewhere else. But it also has points coming from triple collisions. And these are more fancy! Not only velocities but accelerations play a role!

I could delve into this further, but for now I’ll just point you here:

• John Baez, The Hilbert scheme for 3 points on a surface, MathOverflow, June 7, 2017.

The main thing to keep in mind is this. As n increases, there are more and more ways we can dream up ideals I with \mathrm{dim}(\mathbb{C}[x,y]/I) = n. But all these ideals consist of functions that vanish at n or fewer points and also obey other equations saying that various linear combinations of their first, second, and higher derivatives vanish. We can think of these ideals as ways for n particles to collide, with conditions on their positions, velocities, accelerations, etc. The total number of conditions needs to be n.

Now let’s revisit that description of the wonderful space we’re seeking to understand, \widetilde{S}:

Theorem. \widetilde{S} is the space of all \Gamma-invariant ideals I \subseteq \mathbb{C}[x,y] such that \mathbb{C}[x,y]/I is isomorphic, as a representation of \Gamma, to the regular representation of \Gamma.

Since \Gamma has 120 elements, its regular representation—the obvious representation of this group on the space of complex functions on this group—is 120-dimensional. So, points in \widetilde{S} are ideals I with \mathrm{dim}(\mathbb{C}[x,y]/I) = 120 . So, they’re points in the Hilbert scheme \mathrm{Hilb}^{120}(\mathbb{C}^2).

But they’re not just any old points in this Hilbert scheme! The binary icosahedral group \Gamma acts on \mathbb{C}^2 and thus anything associated with it. In particular, it acts on the Hilbert scheme \mathrm{Hilb}^{120}(\mathbb{C}^2). A point in this Hilbert scheme can lie in \widetilde{S} only if it’s invariant under the action of \Gamma. And given this, it’s in \widetilde{S} if and only if \mathbb{C}[x,y]/I is isomorphic to the regular representation of \Gamma.

Given all this, there’s an easy way to get your hands on a point I \in \widetilde{S}. Just take any nonzero element of \mathbb{C}^2 and act on it by \Gamma. You’ll get 120 distinct points in \mathbb{C}^2 — I promise. Do you see why? Then let I be the set of polynomials that vanish on all these points.

If you don’t see why this works, please ask me.

In fact, we saw last time that your 120 points will be the vertices of a 600-cell centered at the origin of \mathbb{C}^2:

By this construction we get enough points to form an open dense subset of \widetilde{S}. These are the points that aren’t mapped to the origin by

\pi \colon \widetilde{S} \to S

Alas, it’s the other points in \widetilde{S} that I’m really interested in. As I hope you see, these are certain ‘limits’ of 600-cells that have ‘shrunk to the origin’… or in other words, highly symmetrical ways for 120 points in \mathbb{C}^2 to collide at the origin, with some highly symmetrical conditions on their velocities, accelerations, etc.

That’s what I need to understand.

The Theory of Devices

20 June, 2017

I’m visiting the University of Genoa and talking to two category theorists: Marco Grandis and Giuseppe Rosolini. Grandis works on algebraic topology and higher categories, while Rosolini works on the categorical semantics of programming languages.

Yesterday, Marco Grandis showed me a fascinating paper by his thesis advisor:

• Gabriele Darbo, Aspetti algebrico-categoriali della teoria dei dispositivi, Symposia Mathematica IV (1970), Istituto Nazionale di Alta Matematica, 303–336.

It’s closely connected to Brendan Fong’s thesis, but also different—and, of course, much older. According to Grandis, Darbo was the first person to work on category theory in Italy. He’s better known for other things, like ‘Darbo’s fixed point theorem’, but this piece of work is elegant, and, it seems to me, strangely ahead of its time.

The paper’s title translates as ‘Algebraic-categorical aspects of the theory of devices’, and its main concept is that of a ‘universe of devices’: a collection of devices of some kind that can be hooked up using wires to form more devices of this kind. Nowadays we might study this concept using operads—but operads didn’t exist in 1970, and Darbo did quite fine without them.

The key is the category \mathrm{FinCorel}, which has finite sets as objects and ‘corelations’ as morphisms. I explained corelations here:

• Corelations in network theory, 2 February 2016.

Briefly, a corelation from a finite set X to a finite set Y is a partition of the disjoint union of X and Y. We can get such a partition from a bunch of wires connecting points of X and Y. The idea is that two points lie in the same part of the partition iff they’re connected, directly or indirectly, by a path of wires. So, if we have some wires like this:

they determine a corelation like this:

There’s an obvious way to compose corelations, giving a category \mathrm{FinCorel}.
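Here is one way to sketch that composition in Python. I am assuming a corelation from X to Y is stored as a set of blocks, each block a frozenset of tagged elements ("dom", x) or ("cod", y). To compose, we glue the two partitions along the middle set Y with a union-find structure and then forget the elements of Y. The representation and the name compose are illustrative choices, not anything taken from Darbo or Fong.

```python
def compose(f, g):
    """Compose corelations f: X -> Y and g: Y -> Z, each given as a set of
    frozenset blocks whose members are ("dom", element) or ("cod", element)."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def merge(block, names):
        # Retag members as living in X, Y or Z, then join them into one class.
        members = [(names[side], e) for side, e in block]
        for m in members:
            find(m)
        for m in members[1:]:
            parent[find(m)] = find(members[0])

    for block in f:
        merge(block, {"dom": "X", "cod": "Y"})
    for block in g:
        merge(block, {"dom": "Y", "cod": "Z"})

    # Restrict the glued partition to X + Z; classes lying entirely in Y vanish.
    blocks = {}
    for where, e in list(parent):
        if where == "Y":
            continue
        side = "dom" if where == "X" else "cod"
        blocks.setdefault(find((where, e)), set()).add((side, e))
    return {frozenset(b) for b in blocks.values()}


# Two parallel wires 1-a and 2-b, followed by a junction joining a and b to u.
f = {frozenset({("dom", 1), ("cod", "a")}), frozenset({("dom", 2), ("cod", "b")})}
g = {frozenset({("dom", "a"), ("dom", "b"), ("cod", "u")})}
fg = compose(f, g)

# Dangling ends: nothing is connected, and the middle point a disappears.
dangling = compose({frozenset({("dom", 1)}), frozenset({("cod", "a")})},
                   {frozenset({("dom", "a")}), frozenset({("cod", "u")})})
```

In the first example the composite has a single block containing 1, 2 and u, since both input wires end up connected to u through the middle set.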

Gabriele Darbo doesn’t call them ‘corelations’: he calls them ‘trasduttori’. A literal translation might be ‘transducers’. But he’s definitely talking about corelations, and like Fong he thinks they are basic for studying ways to connect systems.

Darbo wants a ‘universe of devices’ to assign to each finite set X a set D(X) of devices having X as their set of ‘terminals’. Given a device in D(X) and a corelation f \colon X \to Y, thought of as a bunch of wires, he wants to be able to attach these wires to the terminals in X and get a new device with Y as its set of terminals. Thus he wants a map D(f): D(X) \to D(Y). If you draw some pictures, you’ll see this should give a functor

D : \mathrm{FinCorel} \to \mathrm{Set}

Moreover, if we have a device with a set X of terminals and a device with a set Y of terminals, we should be able to set them side by side and get a device whose set of terminals is X + Y, meaning the disjoint union of X and Y. So, Darbo wants to have maps

\delta_{X,Y} : D(X) \times D(Y) \to D(X + Y)

If you draw some more pictures you can convince yourself that these maps should make D into a lax symmetric monoidal functor… if you’re one of the lucky few who knows what that means. If you’re not, you can look it up in many places, such as Section 1.2 here:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Darbo does not mention lax symmetric monoidal functors, perhaps because such concepts were first introduced by Mac Lane only in 1968. But as far as I can tell, Darbo’s definition is almost equivalent to this:

Definition. A universe of devices is a lax symmetric monoidal functor D \colon \mathrm{FinCorel} \to \mathrm{Set}.

One difference is that Darbo wants there to be exactly one device with no terminals. Thus, he assumes D(\emptyset) is a one-element set, say 1, while the definition above would only demand the existence of a map \delta \colon 1 \to D(\emptyset) obeying a couple of axioms. That gives a particular ‘favorite’ device with no terminals. I believe we get Darbo’s definition from the above one if we further assume \delta is the identity map. This makes sense if we take the attitude that ‘a device is determined by its observable behavior’, but not otherwise. This attitude is called ‘black-boxing’.

Darbo does various things in his paper, but the most exciting to me is his example connected to linear electrical circuits. He defines, for any pair of objects V and I in an abelian category C, a particular universe of devices. He calls this the universe of linear devices having V as the object of potentials and I as the object of currents.

If you don’t like abelian categories, think of C as the category of finite-dimensional real vector spaces, and let V = I = \mathbb{R}. Electric potential and electric current are described by real numbers so this makes sense.

The basic idea will be familiar to Fong fans. In an electrical circuit made of purely conductive wires, when two wires merge into one we add the currents to get the current on the wire going out. When one wire splits into two we duplicate the potential to get the potentials on the wires going out. Working this out further, any corelation f : X \to Y between finite sets determines two linear relations, one

f_* : I^X \rightharpoonup I^Y

relating the currents on the wires coming in to the currents on the wires going out, and one

f^* : V^Y \rightharpoonup V^X

relating the potentials on the wires going out to the potentials on the wires coming in. Here I^X is the direct sum of X copies of I, and so on; the funky arrow indicates that we have a linear relation rather than a linear map. Note that f_* goes forward while f^* goes backward; this is mainly just conventional, since you can turn linear relations around, but we’ll see it’s sort of nice.
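As a rough sketch, take V = I = \mathbb{R} and store a corelation as a list of blocks of tagged terminals ("dom", x) or ("cod", y). My reading of the two rules above, which is an assumption rather than a quotation from Darbo, is this: currents are related by f_* when, block by block, the total current coming in equals the total current going out, and potentials are related by f^* when all terminals in a block share one potential. The Python function names below are hypothetical.

```python
def currents_related(f, incoming, outgoing):
    """f_*: in every block, currents on X terminals and on Y terminals have equal sums."""
    return all(
        sum(incoming[e] for side, e in block if side == "dom")
        == sum(outgoing[e] for side, e in block if side == "cod")
        for block in f
    )


def potentials_related(f, out_potentials, in_potentials):
    """f^*: every terminal in a block carries the same potential."""
    for block in f:
        values = {in_potentials[e] if side == "dom" else out_potentials[e]
                  for side, e in block}
        if len(values) > 1:
            return False
    return True


# Two wires 1 and 2 merging into a single wire a: one block.
merge = [{("dom", 1), ("dom", 2), ("cod", "a")}]
currents_add = currents_related(merge, {1: 2, 2: 3}, {"a": 5})
currents_wrong = currents_related(merge, {1: 2, 2: 3}, {"a": 4})

# One wire 1 splitting into wires a and b: the potential is duplicated.
split = [{("dom", 1), ("cod", "a"), ("cod", "b")}]
potential_copied = potentials_related(split, {"a": 7, "b": 7}, {1: 7})
potential_wrong = potentials_related(split, {"a": 7, "b": 8}, {1: 7})
```

Both conditions are linear equations in the currents or potentials, so each one cuts out a linear subspace, which is what it means for f_* and f^* to be linear relations.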

If we let \mathrm{Rel}(A,B) be the set of linear relations between two objects A, B \in C, we can use the above technology to get a universe of devices where

D(X) = \mathrm{Rel}(V^X, I^X)

In other words, a device of this kind is simply a linear relation between the potentials and currents at its terminals!

How does D get to be a functor D : \mathrm{FinCorel} \to \mathrm{Set}? That’s pretty easy. We’ve defined it on objects (that is, finite sets) by the above formula. So, suppose we have a morphism (that is, a corelation) f \colon X \to Y. How do we define D(f) : D(X) \to D(Y)?

To answer this question, we need a function

D(f) : \mathrm{Rel}(V^X, I^X) \to \mathrm{Rel}(V^Y, I^Y)

Luckily, we’ve got linear relations

f_* : I^X \rightharpoonup I^Y

and

f^* : V^Y \rightharpoonup V^X

So, given any linear relation R \in \mathrm{Rel}(V^X, I^X), we just define

D(f)(R) = f_* \circ R \circ f^*


People who have read Fong’s thesis, or my paper with Blake Pollard on reaction networks:

• John Baez and Blake Pollard, A compositional framework for reaction networks.

will find many of Darbo’s ideas eerily similar. In particular, the formula

D(f)(R) = f_* \circ R \circ f^*

appears in Lemma 16 of my paper with Blake, where we are defining a category of open dynamical systems. We prove that D is a lax symmetric monoidal functor, which is just what Darbo proved—though in a different context, since our R is not linear like his, and for a different purpose, since he’s trying to show D is a ‘universe of devices’, while we’re trying to construct the category of open dynamical systems as a ‘decorated cospan category’.

In short: if this work of Darbo had become more widely known, the development of network theory could have been sped up by three decades! But there was less interest in a general theory of networks at the time, lax monoidal functors were brand new, operads unknown… and, sadly, few mathematicians read Italian.

Darbo has other papers, and so do his students. We should read them and learn from them! Here are a few open-access ones:

• Franco Parodi, Costruzione di un universo di dispositivi non lineari su una coppia di gruppi abeliani, Rendiconti del Seminario Matematico della Università di Padova 58 (1977), 45–54.

• Franco Parodi, Categoria degli universi di dispositivi e categoria delle T-algebre, Rendiconti del Seminario Matematico della Università di Padova 62 (1980), 1–15.

• Stefano Testa, Su un universo di dispositivi monotoni, Rendiconti del Seminario Matematico della Università di Padova 65 (1981), 53–57.

At some point I will scan in G. Darbo’s paper and make it available here.

The Geometric McKay Correspondence (Part 1)

19 June, 2017

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the \mathrm{E}_8 Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

• Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and E8. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of \mathrm{SO}(3) with 60 elements, so its double cover up in \mathrm{SU}(2) has 120. This double cover is called the binary icosahedral group, but I’ll call it \Gamma for short.
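The factor of two comes from the fact that a unit quaternion q and its negative -q give the same rotation of 3d space. Here is a minimal Python check, assuming the usual convention that q acts on imaginary quaternions by v \mapsto q v q^{-1}. The function names are mine, and the sample quaternion is just a convenient element of \mathrm{SU}(2) whose arithmetic is exact in floating point.

```python
def qmul(a, b):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def conj(q):
    """Quaternion conjugate, which is the inverse when q has unit norm."""
    return (q[0], -q[1], -q[2], -q[3])


def rotation(q):
    """Columns of the 3x3 matrix of v -> q v q^{-1}, for a unit quaternion q."""
    columns = []
    for e in ((0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)):
        image = qmul(qmul(q, e), conj(q))
        columns.append(image[1:])   # drop the real part, which is zero
    return tuple(columns)


q = (0.5, 0.5, 0.5, 0.5)            # the unit quaternion (1 + i + j + k)/2
minus_q = tuple(-c for c in q)

same_rotation = rotation(q) == rotation(minus_q)
image_of_i = rotation(q)[0]         # where this rotation sends the i axis
```

This particular q rotates by a third of a turn about the axis (1,1,1), cycling i to j to k, and -q does exactly the same thing. So each of the 60 rotations has two preimages in \mathrm{SU}(2).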

This group \Gamma is the star of the show, the link between the icosahedron and E8. To visualize this group, it’s good to think of \mathrm{SU}(2) as the unit quaternions. This lets us think of the elements of \Gamma as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:

It’s called the 600-cell.

Since \Gamma is a subgroup of \mathrm{SU}(2) it acts on \mathbb{C}^2, and we can form the quotient space

S = \mathbb{C}^2/\Gamma

This is a smooth manifold except at the origin (that is, the point coming from 0 \in \mathbb{C}^2). There’s a singularity at the origin, and this is where \mathrm{E}_8 is hiding! The reason is that there’s a smooth manifold \widetilde{S} and a map

\pi : \widetilde{S} \to S

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of \widetilde{S}, the map \pi : \widetilde{S} \to S, and these 8 spheres.

But first it’s good to get a mental image of S. Each point in this space is a \Gamma orbit in \mathbb{C}^2, meaning a set like this:

\{g x : \; g \in \Gamma \}

for some x \in \mathbb{C}^2. For x = 0 this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in \mathbb{C}^2. This 600-cell is centered at the point 0 \in \mathbb{C}^2, but it can be big or small, depending on the magnitude of x.
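If you want to see these orbits numerically, here is a Python sketch that builds \Gamma as a set of unit quaternions and lets it act by quaternion multiplication. It assumes a standard pair of generators, s = (1 + i + j + k)/2 and t = (\phi + \phi^{-1} i + j)/2 with \phi the golden ratio. Elements are compared after rounding to six decimal places, and the helper names are my own.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def qmul(a, b):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def key(q):
    """Integer fingerprint of q, used to compare floating-point quaternions."""
    return tuple(round(c * 10**6) for c in q)


def generate(generators, limit=1000):
    """Close the identity under right multiplication by the generators."""
    identity = (1.0, 0.0, 0.0, 0.0)
    seen = {key(identity): identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for g in frontier:
            for h in generators:
                product = qmul(g, h)
                if key(product) not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("generators do not seem to give a finite group")
                    seen[key(product)] = product
                    next_frontier.append(product)
        frontier = next_frontier
    return list(seen.values())


S_GEN = (0.5, 0.5, 0.5, 0.5)                  # assumed generator s
T_GEN = (PHI / 2, 1 / (2 * PHI), 0.5, 0.0)    # assumed generator t
GAMMA = generate([S_GEN, T_GEN])


def orbit(x):
    """Fingerprints of the points g x as g runs over the group."""
    return {key(qmul(g, x)) for g in GAMMA}


x = (0.3, -1.2, 0.7, 2.0)                     # an arbitrary nonzero point
generic_orbit = orbit(x)
origin_orbit = orbit((0.0, 0.0, 0.0, 0.0))
```

The nonzero point has an orbit of 120 distinct points, all at the same distance from 0, while the orbit of 0 is a single point.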

So, as we take a journey starting at the origin in S, we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in S, is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

\{g x : \; g \in \Gamma \}

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of \mathbb{C}^2. Then the points g \in \Gamma are unit quaternions forming the vertices of a 600-cell, and multiplying g on the right by x dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

\{x g y : \; g \in \Gamma \}

for a pair of quaternions x, y.

Luckily, there’s a simpler picture of the space S. It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

S = \mathbb{H}/\Gamma

Specifying a point x \in \mathbb{H} amounts to specifying the magnitude \|x\| together with x/\|x\|, which is a unit quaternion, or equivalently an element of \mathrm{SU}(2). So, specifying a point

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying the magnitude \|x\| together with a point in \mathrm{SU}(2)/\Gamma. But \mathrm{SU}(2) modulo the binary icosahedral group \Gamma is the same as \mathrm{SO}(3) modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, \mathrm{SO}(3) modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of \mathbb{R}^3.

So, specifying a point

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying a nonnegative number \|x\| together with a unit-sized icosahedron centered at the origin of \mathbb{R}^3. But this is the same as specifying an icosahedron of arbitrary size centered at the origin of \mathbb{R}^3. There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, S is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold \widetilde{S}, we replace this singularity with 8 spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram.

Points on these spheres are limiting cases of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.