Topos Theory (Part 5)

28 January, 2020

It’s time to understand why the category of sheaves on a topological space acts like the category of sets in the following ways:

• It has finite colimits.
• It has finite limits.
• It is cartesian closed.
• It has a subobject classifier.

We summarize these four properties by saying the category of sheaves is an elementary topos. (In fact it’s better, since it has all limits and colimits.)

As a warmup, first let’s see why the category of presheaves on a topological space is an elementary topos!

It’s actually just as easy to see something more general, which will come in handy later: the category of presheaves on any category is an elementary topos. Remember, given a category \mathsf{C}, a presheaf on \mathsf{C} is a functor

F \colon \mathsf{C}^{\textrm{op}} \to \mathsf{Set}

A morphism of presheaves from F to another presheaf

G \colon \mathsf{C}^{\textrm{op}} \to \mathsf{Set}

is a natural transformation

\alpha \colon F \Rightarrow G

Presheaves on \mathsf{C} and morphisms between them form a presheaf category, which we call

\mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

or \widehat{\mathsf{C}} for short.

Presheaves on a topological space X are just the special case where we take \mathsf{C} to be the poset of open subsets of X. So if you want a ‘geometrical’ intuition for presheaves on a category, you should imagine the objects of the category as being like open sets. This will come in handy later, when we talk about sheaves on a category.

But there is also another important intuition regarding presheaves. This is more of an ‘algebraic’ intuition. Starting from a category \mathsf{C} and building the presheaf category

\mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

is analogous to taking a set S and building the set

\mathbb{Z}^S

of all functions from S to the integers. \mathbb{Z}^S is a commutative ring if we use pointwise addition and multiplication of functions as our ring operations. \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}} is an elementary topos, and this suggests we should think of an elementary topos as being a bit like a commutative ring. More precisely, it’s like a ‘categorified’ commutative ring, since it’s a category rather than merely a set.

There’s a one-to-one function

S \hookrightarrow \mathbb{Z}^S

sending any element of S to the characteristic function of that element. Similarly, there’s a full and faithful functor

\mathsf{C} \hookrightarrow \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

sending any object c \in \mathsf{C} to the presheaf \mathrm{hom}(-,c). This is called the Yoneda embedding. So, presheaf categories are a trick for embedding categories into elementary topoi!
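The set-level half of this analogy is easy to make concrete. Here is a minimal Python sketch, assuming S is finite and elements of \mathbb{Z}^S are stored as dictionaries (an encoding chosen just for illustration): addition and multiplication are computed pointwise, and each element of S goes to its characteristic function.

```python
from typing import Dict, Hashable, Iterable

# An element of Z^S for a finite set S, stored as a dict from points to integers.
Func = Dict[Hashable, int]


def add(f: Func, g: Func) -> Func:
    """Pointwise addition: (f + g)(x) = f(x) + g(x)."""
    return {x: f[x] + g[x] for x in f}


def mul(f: Func, g: Func) -> Func:
    """Pointwise multiplication: (f g)(x) = f(x) g(x)."""
    return {x: f[x] * g[x] for x in f}


def char(S: Iterable[Hashable], s: Hashable) -> Func:
    """The characteristic function of s: 1 at s and 0 elsewhere."""
    return {x: 1 if x == s else 0 for x in S}


S = ["a", "b", "c"]
# Distinct elements have distinct characteristic functions, so s -> char(S, s) is one-to-one.
embedding = {s: char(S, s) for s in S}
```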

In fact presheaf categories have all limits and colimits, so they are better than just elementary topoi: we will someday see they are examples of ‘Grothendieck topoi’. There are also other ways in which my story just now could be polished, but it would be a bit distracting to do so. Let’s get to work and study presheaf categories!

Today I’ll just talk about colimits and limits.

Colimits in presheaf categories

Presheaf categories have all colimits, and these colimits can be ‘computed pointwise’. What does this mean? Colimits are a lot like addition, and taking colimits of presheaves is a lot like how we add functions from a set to \mathbb{Z}. We just add their values at each point.

More precisely, say we have a diagram of presheaves on \mathsf{C}. This is a functor

F \colon \mathsf{D} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

where \mathsf{D} is any small category, the ‘shape’ of our diagram. The colimit of F should be a presheaf, and I’ll call this

\mathrm{colim} F \in \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

How do we compute it? Well, notice that F is a functor from \mathsf{D} to the category of functors from \mathsf{C}^{\mathrm{op}} to \mathsf{Set}. So, we can change our viewpoint and think of it as a functor

F \colon \mathsf{D} \times \mathsf{C}^{\mathrm{op}} \to \mathsf{Set}

I’m using the same name for this because I’m lazy! Note that for each object c \in \mathsf{C}^{\mathrm{op}} we get a functor

F(-, c) \colon \mathsf{D} \to \mathsf{Set}

Since \mathsf{Set} has colimits, we can take the colimit of this functor and get a set

\mathrm{colim} F(-, c)

But you can check that this set depends functorially on c, so it defines a functor from \mathsf{C}^{\mathrm{op}} to \mathsf{Set}, which is our desired functor

\mathrm{colim} F \in  \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

Of course, you also have to check that this really is the colimit of our diagram of presheaves on \mathsf{C}.
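Here is a small sketch of the simplest case: a coproduct, the colimit of a diagram whose shape \mathsf{D} has two objects and no non-identity morphisms, computed pointwise. The encoding is an assumption for illustration only: a presheaf on a finite category is a dict of sets indexed by objects, plus, for each non-identity morphism m \colon a \to b, a dict representing the function F(m) \colon F(b) \to F(a).

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical encoding of a presheaf on a finite category:
#   sets[c] is the set F(c) for each object c;
#   maps[m] is F(m): F(b) -> F(a), as a dict, for each morphism m: a -> b.
Presheaf = Tuple[Dict[Hashable, Set], Dict[str, Dict]]


def coproduct(F: Presheaf, G: Presheaf) -> Presheaf:
    """Coproduct of two presheaves, computed objectwise as a tagged disjoint union."""
    F_sets, F_maps = F
    G_sets, G_maps = G
    # At each object c, take the disjoint union of F(c) and G(c), tagging elements 0 or 1.
    sets = {c: {(0, x) for x in F_sets[c]} | {(1, y) for y in G_sets[c]} for c in F_sets}
    maps = {}
    for m in F_maps:
        # Each restriction map acts on whichever summand an element came from.
        maps[m] = {(0, x): (0, F_maps[m][x]) for x in F_maps[m]}
        maps[m].update({(1, y): (1, G_maps[m][y]) for y in G_maps[m]})
    return sets, maps


# Example on the category with objects a, b and one morphism m: a -> b.
F = ({"a": {0}, "b": {"p", "q"}}, {"m": {"p": 0, "q": 0}})
sets, maps = coproduct(F, F)
```

Nothing about the shape of \mathsf{C} is used beyond looking up one set and one function at a time, which is the sense in which the colimit is ‘computed pointwise’.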

Mac Lane and Moerdijk refer the reader to Section V.3 of Categories for the Working Mathematician for a proof of this result, but I think it’s better to prove it yourself. That is, it seems less painful to follow your nose and do the obvious thing at each step than to wrap your brain around someone else’s notation. I guess this is only true if you’ve got the hang of the subject, but anyway:

Puzzle. Show in the above situation that \mathrm{colim} F(-, c) depends functorially on c \in \mathsf{C}^{\mathrm{op}} and that the resulting functor is the colimit of the diagram F.

By the way, I can’t resist mentioning an important fact here: the category of presheaves on \mathsf{C} is the free category with all colimits on \mathsf{C}. In other words, it not only has all colimits, it’s precisely what you’d get by taking \mathsf{C} and ‘freely’ throwing in all colimits. This is why I mentioned colimits first.

This fact is analogous to the fact that when S is a finite set, \mathbb{Z}^S is the free abelian group on S. The fact that we don’t need a finiteness condition when working with presheaves is one of those ways in which categories are nicer than sets.

Limits in presheaf categories

Presheaf categories also have all limits, and these too can be ‘computed pointwise’. The argument is just like that for colimits, but now we use the fact that \mathsf{Set} has all limits.

Puzzle. Given a diagram of presheaves on \mathsf{C}

F \colon \mathsf{D} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

show that \mathrm{lim} F(-, c) depends functorially on c \in \mathsf{C}^{\mathrm{op}} and that the resulting functor is the limit of the diagram F.
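A matching sketch for limits, under the same kind of illustrative assumption: a presheaf on a finite category is stored as a dict of sets per object together with, for each morphism m \colon a \to b, a dict for F(m) \colon F(b) \to F(a). The product, the limit over a discrete two-object diagram, is the cartesian product at each object, with restriction acting coordinatewise.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical encoding: sets per object, plus a dict F(m): F(b) -> F(a)
# for each morphism m: a -> b.
Presheaf = Tuple[Dict[Hashable, Set], Dict[str, Dict]]


def product(F: Presheaf, G: Presheaf) -> Presheaf:
    """Product of two presheaves, computed objectwise as a cartesian product."""
    F_sets, F_maps = F
    G_sets, G_maps = G
    sets = {c: {(x, y) for x in F_sets[c] for y in G_sets[c]} for c in F_sets}
    # A restriction map acts on each coordinate separately.
    maps = {
        m: {(x, y): (F_maps[m][x], G_maps[m][y]) for x in F_maps[m] for y in G_maps[m]}
        for m in F_maps
    }
    return sets, maps


# Example on the category with objects a, b and one morphism m: a -> b.
F = ({"a": {0, 1}, "b": {"p", "q", "r"}}, {"m": {"p": 0, "q": 1, "r": 1}})
sets, maps = product(F, F)
```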

The category of graphs

It helps to understand some examples of what we’re doing here, and a nice example is the category of graphs. There’s a category \mathsf{G} that has two objects v and e, and just two morphisms

s, t \colon v \to e

besides the identity morphisms. A presheaf on \mathsf{G} is what category theorists call a graph.

Why? Well, say we have such a presheaf

F \colon \mathsf{G}^{\mathrm{op}} \to \mathsf{Set}

It consists of a set V = F(v) called the set of vertices and a set E = F(e) called the set of edges, along with two functions

F(s) \colon E \to V

F(t) \colon E \to V

assigning to each edge its source and target. This is just what category theorists call a graph; graph theorists might call it a ‘directed multigraph’, and some other mathematicians would call it a ‘quiver’.
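To make this dictionary between presheaves on \mathsf{G} and graphs concrete, here is a minimal Python sketch. The class name Graph and its fields are hypothetical choices for illustration; src and tgt play the roles of F(s) and F(t).

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Set


@dataclass
class Graph:
    """A presheaf on G: V = F(v), E = F(e), src = F(s), tgt = F(t)."""
    V: Set[Hashable]
    E: Set[Hashable]
    src: Dict[Hashable, Hashable]  # F(s): E -> V
    tgt: Dict[Hashable, Hashable]  # F(t): E -> V

    def is_valid(self) -> bool:
        """Check that src and tgt really are functions from E to V."""
        return all(
            set(f) == self.E and set(f.values()) <= self.V
            for f in (self.src, self.tgt)
        )


# A directed multigraph with two parallel edges from 1 to 2 and a loop at 2.
g = Graph(V={1, 2}, E={"a", "b", "loop"},
          src={"a": 1, "b": 1, "loop": 2},
          tgt={"a": 2, "b": 2, "loop": 2})
```

Parallel edges and loops are allowed, which is why graph theorists would call this a directed multigraph.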

We will use \mathsf{Graph} to mean the category of graphs, namely the presheaf category

\widehat{\mathsf{G}} = \mathsf{Set}^{\mathsf{G}^{\mathrm{op}}}

Note how the ‘op’ in the definition of presheaf turned the morphisms s,t \colon v \to e into functions F(s), F(t) \colon E \to V. The ‘op’ is more of a nuisance than a help in this example (at least so far): we had to make s and t go from v to e precisely to counteract the effect of this ‘op’.

Since colimits and limits are computed pointwise in a presheaf category, we can compute the colimit or limit of a diagram of graphs quite easily: we just take the colimit of the sets of vertices and the sets of edges separately, and the rest goes along for the ride.
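Here is a hedged sketch of one such colimit, a pushout, with graphs stored as plain Python sets and dicts (a hypothetical encoding). The pushout of sets is computed with a simple union-find on a tagged disjoint union; the graph pushout just does this once for vertices and once for edges, and the source and target maps are induced. The example glues the target of one copy of the graph with one edge and two vertices to the source of another copy, along the graph with one vertex and no edges.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Set


@dataclass
class Graph:
    V: Set[Hashable]
    E: Set[Hashable]
    src: Dict[Hashable, Hashable]
    tgt: Dict[Hashable, Hashable]


def pushout_sets(A, B, C, f, g):
    """Pushout of B <- A -> C in Set: the disjoint union of B and C (tagged 0 and 1)
    modulo f(a) ~ g(a). Returns a dict sending each tagged element to its class."""
    parent = {(0, b): (0, b) for b in B}
    parent.update({(1, c): (1, c) for c in C})

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a in A:
        root_b, root_c = find((0, f[a])), find((1, g[a]))
        if root_b != root_c:
            parent[root_b] = root_c
    return {x: find(x) for x in parent}


def pushout_graphs(A, B, C, f, g):
    """Pushout of graphs, computed separately on vertices and on edges.
    f and g are graph morphisms given as (vertex map, edge map) pairs of dicts."""
    qV = pushout_sets(A.V, B.V, C.V, f[0], g[0])
    qE = pushout_sets(A.E, B.E, C.E, f[1], g[1])
    src, tgt = {}, {}
    for (tag, e), cls in qE.items():
        H = B if tag == 0 else C
        # Source and target go along for the ride via the vertex quotient.
        src[cls] = qV[(tag, H.src[e])]
        tgt[cls] = qV[(tag, H.tgt[e])]
    return Graph(set(qV.values()), set(qE.values()), src, tgt)


# Glue the target of one edge to the source of another, giving a path of length 2.
vertex = Graph(V={"*"}, E=set(), src={}, tgt={})
edge = Graph(V={0, 1}, E={"x"}, src={"x": 0}, tgt={"x": 1})
path = pushout_graphs(vertex, edge, edge, ({"*": 1}, {}), ({"*": 0}, {}))
```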

Puzzle. Let the graph G_V be the walking vertex: that is, the graph with just one vertex and no edges. Let G_E be the walking edge: that is, the graph with one edge and two vertices, the source and target of that edge. Show that there are exactly two different morphisms

f, g \colon G_V \to G_E

Compute the equalizer and coequalizer of

f, g \colon G_V \to G_E

Puzzle. Show explicitly how to build any graph as a colimit of copies of the walking vertex and the walking edge. That is, show any graph is isomorphic to the colimit of some diagram in \mathsf{Graph}, where the objects in this diagram are all copies of G_V and/or G_E.

Entropy in the Universe

25 January, 2020

There’s a zoomable image of the Milky Way showing 84 million stars.

But stars contribute only a tiny fraction of the total entropy in the observable Universe. If it’s random information you want, look elsewhere!

First: what’s the ‘observable Universe’, exactly?

The further you look out into the Universe, the further you look back in time. You can’t see through the hot gas from 380,000 years after the Big Bang. That ‘wall of fire’ marks the limits of the observable Universe.

But as the Universe expands, the distant ancient stars and gas we see have moved even farther away, so they’re no longer observable. Thus, the so-called ‘observable Universe’ is really the ‘formerly observable Universe’. Its edge is 46.5 billion light years away now!

This is true even though the Universe is only 13.8 billion years old. A standard challenge in understanding general relativity is to figure out how this is possible, given that nothing can move faster than light.

What’s the total number of stars in the observable Universe? Estimates go up as telescopes improve. Right now people think there are between 100 and 400 billion stars in the Milky Way. They think there are between 170 billion and 2 trillion galaxies in the Universe.

In 2009, Chas Egan and Charles Lineweaver estimated the total entropy of all the stars in the observable Universe at 10^{81} bits. You should think of these as qubits: it’s the amount of information to describe the quantum state of everything in all these stars.

But the entropy of interstellar and intergalactic gas and dust is about ten times the entropy of stars! It’s about 10^{82} bits.

The entropy in all the photons in the Universe is even more! The Universe is full of radiation left over from the Big Bang, called the ‘cosmic microwave background radiation’. The photons in the observable Universe left over from the Big Bang have a total entropy of about 10^{90} bits.

The neutrinos from the Big Bang also carry about 10^{90} bits, a bit less than the photons. The gravitons carry much less, about 10^{88} bits. That’s because they decoupled from other matter and radiation very early, and have been cooling ever since. On the other hand, photons in the cosmic microwave background radiation were formed by annihilating electron-positron pairs until about 10 seconds after the Big Bang. Thus the graviton radiation is expected to be cooler than the microwave background radiation: about 0.6 kelvin as compared to 2.7 kelvin.

Black holes have immensely more entropy than anything listed so far. Egan and Lineweaver estimate the entropy of stellar-mass black holes in the observable Universe at 10^{98} bits. This is connected to why black holes are so stable: the Second Law says entropy likes to increase.

But the entropy of black holes grows quadratically with mass! So black holes tend to merge and form bigger black holes, ultimately forming the ‘supermassive’ black holes at the centers of most galaxies. These dominate the entropy of the observable Universe: about 10^{104} bits.
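The quadratic growth is what makes merging favorable. A tiny Python sketch, assuming only that entropy is proportional to M^2 (in arbitrary units) and ignoring the mass radiated away as gravitational waves during a merger:

```python
def bh_entropy(mass: float) -> float:
    """Black hole entropy in arbitrary units, assuming S is proportional to mass**2."""
    return mass ** 2


def merger_entropy_gain(m1: float, m2: float) -> float:
    """Entropy after a merger minus entropy before, ignoring any mass radiated away.
    Since (m1 + m2)**2 - m1**2 - m2**2 = 2*m1*m2, this is always positive."""
    return bh_entropy(m1 + m2) - (bh_entropy(m1) + bh_entropy(m2))


# Two equal black holes: the merged one has twice their combined entropy.
gain = merger_entropy_gain(1.0, 1.0)
```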

Hawking predicted that black holes slowly radiate away their mass when they’re in a cold enough environment. But the Universe is much too hot for supermassive black holes to be losing mass now. Instead, they very slowly grow by eating the cosmic microwave background, even when they’re not eating stars, gas and dust.

So, only in the far future will the Universe cool down enough for large black holes to start slowly decaying via Hawking radiation. Entropy will continue to increase… going mainly into photons and gravitons! This process will take a very long time. Assuming nothing is falling into it and no unknown effects intervene, a solar-mass black hole takes about 10^{67} years to evaporate due to Hawking radiation, while a really big one, comparable to the mass of a galaxy, should take about 10^{99} years.
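The two lifetimes quoted above fit the standard Hawking scaling, in which evaporation time grows as the cube of the mass. A back-of-the-envelope Python sketch, assuming that M^3 scaling and the figure of 10^{67} years for one solar mass:

```python
import math

# Lifetime of a one-solar-mass black hole, as quoted above.
SOLAR_MASS_LIFETIME_YEARS = 1e67


def evaporation_time_years(solar_masses: float) -> float:
    """Hawking evaporation time, assuming it scales as the cube of the mass."""
    return SOLAR_MASS_LIFETIME_YEARS * solar_masses ** 3


# Which mass evaporates in the 10^99 years quoted for a galaxy-mass hole?
mass = (1e99 / SOLAR_MASS_LIFETIME_YEARS) ** (1 / 3)
print(f"about 10^{math.log10(mass):.2f} solar masses")  # about 10^10.67 solar masses
```

That is about 4.6 \times 10^{10} solar masses, a plausible mass for a galaxy.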

If our current most popular ideas on dark energy are correct, the Universe will continue to expand exponentially. Thanks to this, there will be a cosmological event horizon surrounding each observer, which will radiate Hawking radiation at a temperature of roughly 10^{-30} kelvin.

In this scenario the Universe in the very far future will mainly consist of massless particles produced as Hawking radiation at this temperature: photons and gravitons. The entropy within the exponentially expanding ball of space that is today our ‘observable Universe’ will continue to increase exponentially… but more to the point, the entropy density will approach that of a gas of photons and gravitons in thermal equilibrium at 10^{-30} kelvin.

Of course, it’s quite likely that some new physics will turn up, between now and then, that changes the story! I hope so: this would be a rather dull ending to the Universe.

For more details, go here:

• Chas A. Egan and Charles H. Lineweaver, A larger estimate of the entropy of the universe, The Astrophysical Journal 710 (2010), 1825.

Also read my page on information.

Topos Theory (Part 4)

21 January, 2020

In Part 1, I said how to push sheaves forward along a continuous map. Now let’s see how to pull them back! This will set up a pair of adjoint functors with nice properties, called a ‘geometric morphism’.

First recall how we push sheaves forward. I’ll say it more concisely this time. If you have a continuous map f \colon X \to Y between topological spaces, the inverse image of any open set is open, so we get a map

f^{-1} \colon \mathcal{O}(Y) \to \mathcal{O}(X)

A functor between categories gives a functor between the opposite categories. I’ll use the same name for this, if you can stand it:

f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathcal{O}(X)^{\mathrm{op}}

A presheaf on X is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

and we can compose this with f^{-1} to get a presheaf on Y,

F \circ f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathsf{Set}

We call this presheaf on Y the direct image or pushforward of F along f, and we write it as f_\ast F. In a nutshell:

f_\ast F = F \circ f^{-1}

Even better, this direct image operation extends to a functor from the category of presheaves on X to the category of presheaves on Y:

f_\ast \colon \widehat{\mathcal{O}(X)} \to \widehat{\mathcal{O}(Y)}

Better still, this functor sends sheaves to sheaves, so it restricts to a functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

This is how we push forward sheaves on X to get sheaves on Y.
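For finite spaces the formula f_\ast F = F \circ f^{-1} can be run directly. Here is a minimal Python sketch, assuming a finite space X with the discrete topology (so every map out of it is continuous) and using the presheaf of all \{0,1\}-valued functions as a hypothetical example F; all names are illustrative.

```python
from itertools import product


def preimage(f, V):
    """f^{-1}(V) for a map f given as a dict from points of X to points of Y."""
    return frozenset(x for x in f if f[x] in V)


def sections(U):
    """Example presheaf F on X: F(U) = all functions U -> {0, 1},
    each stored as a frozenset of (point, value) pairs."""
    pts = sorted(U)
    return {frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))}


def restrict(s, W):
    """Restriction of a section s to a smaller open set W."""
    return frozenset((x, b) for (x, b) in s if x in W)


def pushforward(F_sections, F_restrict, f):
    """f_* F = F ∘ f^{-1}: its sections over V are F's sections over f^{-1}(V)."""
    def push_sections(V):
        return F_sections(preimage(f, V))

    def push_restrict(s, W):
        return F_restrict(s, preimage(f, W))

    return push_sections, push_restrict


# X = {1, 2, 3} with the discrete topology, so any f: X -> Y is continuous.
f = {1: "p", 2: "p", 3: "q"}
G_sections, G_restrict = pushforward(sections, restrict, f)
```

Restriction in f_\ast F along W \subseteq V is just restriction in F along f^{-1}(W) \subseteq f^{-1}(V), which is why functoriality comes for free.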

All this seems very natural and nice. But now let’s stop pushing and start pulling! This will give a functor going the other way:

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

The inverse image of a sheaf

At first it seems hard to see how to pull back sheaves, given how natural it was to push them forward. This is where our second picture of sheaves comes in handy!

Remember, a bundle over a topological space Y is a topological space E equipped with a continuous map

p \colon E \to Y

We say it’s an etale space over Y if it has a special property: each point e \in E has an open neighborhood such that p restricted to this neighborhood is a homeomorphism from this neighborhood to an open subset of Y. In Part 2 we defined the category of bundles over X, which is called \mathsf{Top}/X, and the full subcategory of this whose objects are etale spaces, called \mathsf{Etale}(X). I also sketched how we get an equivalence of categories

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

So, to pull back sheaves we can just convert them into etale spaces, pull those back, and then convert them back into sheaves!

First I’ll tell you how to pull back a bundle. I’ll assume you know the general concept of ‘pullbacks’, and what they’re like in the category of sets. The category of topological spaces and continuous maps has pullbacks, and they work a lot like they do in the category of sets. Say we’re given a bundle over Y, which is really just a continuous map

p \colon E \to Y

and a continuous map

f \colon X \to Y

Then we can form their pullback and get a bundle over X called

f^\ast(p) \colon f^\ast(E) \to X

In class I’ll draw the pullback diagram, but it’s too much work to do here! As a set,

f^\ast E = \{ (e,x) \in E \times X \; \colon \; p(e) = f(x) \}

It’s a subset of E \times X, and we make it into a topological space using the subspace topology. The map

f^\ast p  \colon f^\ast E \to X

does the obvious thing: it sends (e,x) to x.
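At the level of underlying sets this is a one-line computation. Here is a minimal Python sketch, with finite sets and maps stored as dicts (an illustrative encoding that ignores the subspace topology).

```python
def pullback(E, X, p, f):
    """Underlying set of f^*E = {(e, x) in E × X : p(e) = f(x)} and the projection
    f^*p sending (e, x) to x. Maps are dicts; topology is ignored here."""
    pulled = {(e, x) for e in E for x in X if p[e] == f[x]}
    projection = {pair: pair[1] for pair in pulled}
    return pulled, projection


# A bundle over Y = {0, 1} with two points over 0 and one over 1.
E = {"a", "b", "c"}
p = {"a": 0, "b": 0, "c": 1}
X = {"u", "w"}
f = {"u": 0, "w": 1}
pulled, projection = pullback(E, X, p, f)
```

In this example the fiber of f^\ast p over u is a copy of the fiber of p over f(u) = 0, and likewise for w.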

Puzzle. Prove that this construction really obeys the universal property for pullbacks in the category \mathsf{Top} where objects are topological spaces and morphisms are continuous maps.

Puzzle. Show that this construction extends to a functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

That is, find a natural way to define the pullback of a morphism between bundles, and prove that this makes f^\ast into a functor.

Puzzle. Prove that if p \colon E \to Y is an etale space over Y, and f \colon X \to Y is any continuous map, then f^\ast p \colon f^\ast E \to X is an etale space over X.

Putting these puzzles together, it instantly follows that we can restrict the functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

to etale spaces and morphisms between those, and get a functor

f^\ast \colon \mathsf{Etale}(Y) \to \mathsf{Etale}(X)

Using the equivalence

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

we then get our desired functor

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

called the inverse image or pullback functor.

Slick! But what does the inverse image of a sheaf actually look like?

Suppose we have a sheaf F on Y and a continuous map f \colon X \to Y. We get an inverse image sheaf f^\ast(F) on X. But what is it like, concretely?

That is, suppose we have an open set U \subseteq X. What does an element s of (f^\ast F)(U) amount to?

Unraveling the definitions, s must be a section over U of the pullback along f of the etale space corresponding to F.

A point in the etale space corresponding to F is the germ at some y \in Y of some t \in F(V), where V is some open neighborhood of y.

Thus, our section s is just a continuous function sending each point x \in U to some germ of this sort at y = f(x).

There is more to say: we could try to unravel the definitions a bit more, and describe (f^\ast F)(U) directly in terms of the sheaf F, without mentioning the corresponding etale space! But maybe one of you reading this can do that more gracefully than I can.

The adjunction between direct and inverse image functors

Once they have direct and inverse images in hand, Mac Lane and Moerdijk prove the following as Theorem 2 in Section II.9:

Theorem. For any continuous map f \colon X \to Y, the direct image functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

is right adjoint to the inverse image functor:

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

I won’t do it here, so please look at their proof if you’re curious! As you might expect, it involves hopping back and forth between our two pictures of sheaves: as presheaves with an extra property, and as bundles with an extra property — namely, etale spaces.

I don’t think there’s anything especially sneaky about their argument. They do however use this: if you take a sheaf, and convert it into an etale space, and convert that back into a sheaf, you get back where you started up to natural isomorphism. This isomorphism is just the unit \eta that I mentioned in Part 3.

Remember, the functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to the functor that turns bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

So, there’s a unit

\eta \colon 1 \Rightarrow \Gamma \Lambda

and a counit

\epsilon \colon \Lambda \Gamma \Rightarrow 1

The fact we need now is that whenever a presheaf F is a sheaf, its unit

\eta_F \colon F \to \Gamma \Lambda F

is an isomorphism. This is part of Theorem 2 in Section II.6 in Mac Lane and Moerdijk.

And by the way, this fact has a partner! Whenever a bundle is an etale space, its counit is an isomorphism. So, converting an etale space into a sheaf and then back into an etale space also gets you back where you started, up to natural isomorphism. But the two morphisms point in opposite directions: any presheaf maps to the sheaf of sections of its associated etale space, while the etale space built from the sheaf of sections of any bundle maps to that bundle.

Topos Theory (Part 3)

13 January, 2020

Last time I described two viewpoints on sheaves. In the first, a sheaf on a topological space X is a special sort of presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

Namely, it’s one obeying the ‘sheaf condition’.

I explained this condition in Part 1, but here’s a slicker way to say it. Suppose U \subseteq X is an open set covered by a collection of open sets U_i \subseteq U. Then we get this diagram:

\displaystyle{ FU \rightarrow \prod_i FU_i \rightrightarrows \prod_{i,j} F(U_i \cap U_j) }

The first arrow comes from restricting elements of FU to the smaller sets U_i. The other two arrows come from this: we can either restrict from FU_i to F(U_i \cap U_j), or restrict from FU_j to F(U_i \cap U_j).

The sheaf condition says that this diagram is an equalizer! This is just another way of saying that a family of s_i \in FU_i are the restrictions of a unique s \in FU iff their restrictions to the overlaps U_i \cap U_j are equal.
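On a finite space the equalizer condition can be checked by brute force. This hedged Python sketch (the function names and the encoding of sections are illustrative assumptions) tests one cover at a time. It confirms that all \{0,1\}-valued functions form a sheaf on a two-point discrete space, while the constant presheaf with value \{0,1\} (with a one-point set over the empty open set) fails: two different constants on the two singletons agree on their empty overlap but do not glue.

```python
from itertools import product


def glues_uniquely(F, restrict, U, cover):
    """Check the equalizer condition for one open U and one cover of U: every family of
    sections on the cover that agrees on all overlaps is the restriction of exactly one
    section on U. F(V) returns the set of sections over V; restrict(s, V, W) restricts
    s in F(V) to W, a subset of V."""
    for family in product(*(F(Ui) for Ui in cover)):
        pieces = list(zip(cover, family))
        compatible = all(
            restrict(si, Ui, Ui & Uj) == restrict(sj, Uj, Ui & Uj)
            for (Ui, si), (Uj, sj) in product(pieces, repeat=2)
        )
        if not compatible:
            continue
        gluings = [s for s in F(U) if all(restrict(s, U, Ui) == si for Ui, si in pieces)]
        if len(gluings) != 1:
            return False
    return True


def functions(V):
    """Sheaf of all functions V -> {0, 1}, as frozensets of (point, value) pairs."""
    pts = sorted(V)
    return {frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))}


def restrict_function(s, V, W):
    return frozenset((x, b) for (x, b) in s if x in W)


def constant(V):
    """Constant presheaf with value {0, 1}, except a single point over the empty set."""
    return {0, 1} if V else {None}


def restrict_constant(s, V, W):
    return s if W else None


# X = {a, b} with the discrete topology, covered by the two singletons.
U = frozenset({"a", "b"})
cover = [frozenset({"a"}), frozenset({"b"})]
```

Requiring exactly one gluing handles both halves of the equalizer condition at once: none means a compatible family fails to glue, and two or more means the map FU \to \prod_i FU_i is not one-to-one.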

In the second viewpoint, a sheaf is a bundle over X

p \colon Y \to X

with the special property of being ‘etale’. Remember, this means that every point in Y has an open neighborhood that’s mapped homeomorphically onto an open neighborhood in X.

Last time I showed you how to change viewpoints. We got a functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

and a functor that turns bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Moreover, I claimed \Lambda actually turns presheaves into etale spaces, and \Gamma actually turns bundles into sheaves. And I claimed that these functors restrict to an equivalence between the category of sheaves and the category of etale spaces:

\mathsf{Sh}(X) \simeq  \mathsf{Etale}(X)

What can we do with these ideas? Right away we can do two things:

• We can describe ‘sheafification’: the process of improving a presheaf to get a sheaf.

• We can see how to push forward and pull back sheaves along a continuous map between spaces.

I’ll do the first now and the second next time. I’m finding it pleasant to break up these notes into small bite-sized pieces, shorter than my actual lectures.


To turn a presheaf into a sheaf, we just hit it with \Lambda and then with \Gamma. In other words, we turn our presheaf into a bundle and then turn it back into a presheaf. It turns out the result is a sheaf!

Why? The reason is this:

Theorem. If we apply the functor

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

to any object, the result is a sheaf on X.

(The objects of \mathsf{Top}/X are, of course, the bundles over X.)

Proving this theorem was a puzzle last time; let me outline the solution. Remember that if we take a bundle

p \colon Y \to X

and hit it with \Gamma, we get a presheaf called \Gamma_p where \Gamma_p U is the set of sections of p over U, and we restrict sections in the usual way, by restricting functions. But you can check that if we have an open set U covered by a bunch of open subsets U_i, and a bunch of sections s_i on the U_i that agree on the overlaps U_i \cap U_j, these sections piece together to define a unique section on all of U that restricts to each of the s_i. So, \Gamma_p is a sheaf!

It follows that \Gamma \Lambda sends presheaves to sheaves. Since sheaves form a full subcategory of presheaves, \Gamma \Lambda automatically sends any morphism of presheaves to a morphism of sheaves, and we get the sheafification functor

\Gamma \Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Sh}(X)

To fully understand this, it’s good to actually take a presheaf and sheafify it. So take a presheaf:

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

When we hit this with \Lambda, we get a bundle

p \colon \Lambda F \to X

Remember: any element of F(U) for any open neighborhood U of x gives a point over x in \Lambda F, all points over x show up this way, and two such elements s \in F(U), s' \in F(U') determine the same point iff they become equal when we restrict them to some sufficiently small open neighborhood of x.
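On a finite space the ‘sufficiently small open neighborhood’ can be made explicit: each point x has a smallest open neighborhood, namely the intersection of all open sets containing x, so two elements determine the same point over x exactly when their restrictions to that neighborhood agree. A hedged Python sketch, using the Sierpinski space and sections stored as frozensets of (point, value) pairs (illustrative choices):

```python
def minimal_open_nbhd(opens, x):
    """Intersection of all open sets containing x. In a finite space this is a finite
    intersection of opens, hence itself open: the smallest neighborhood of x."""
    result = None
    for O in opens:
        if x in O:
            result = O if result is None else result & O
    return result


def same_germ(opens, x, s, t, restrict):
    """Whether s and t (sections over open sets containing x) define the same point over x.
    Any neighborhood where they agree contains the smallest one, so it suffices to
    compare their restrictions to it. restrict(s, W) restricts s to W."""
    Ux = minimal_open_nbhd(opens, x)
    return restrict(s, Ux) == restrict(t, Ux)


def restrict(s, W):
    return frozenset((y, v) for (y, v) in s if y in W)


# Sierpinski space: points a, b; open sets are {}, {a}, {a, b}.
opens = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
s = frozenset({("a", 0), ("b", 1)})   # a function on {a, b}
t = frozenset({("a", 0), ("b", 0)})   # another function on {a, b}
```

Here s and t give the same point over a, since they agree on \{a\}, but different points over b, whose only open neighborhood is the whole space.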

When we hit this bundle with \Gamma, we get a sheaf

\Gamma \Lambda F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

where (\Gamma \Lambda F)U is the set of sections of p over U. This is the sheafification of F.

So, if you think about it, you’ll see this: to define a section of the sheafification of F over an open set U, you can just take a bunch of sections of F over open sets covering U that agree when restricted to the overlaps.

Puzzle. Prove the above claim. Give a procedure for constructing a section of \Gamma \Lambda F over U given open sets U_i \subseteq U covering U and sections s_i of F over the U_i that obey

\displaystyle{ s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j} }

The adjunction between presheaves and bundles

Here’s one nice consequence of the last puzzle. We can always use the trivial cover of U by U itself! Thus, any section of F over U gives a section of \Gamma \Lambda F over U. This is the key to the following puzzle:

Puzzle. Show that for any presheaf F there is a morphism of presheaves

\eta_F \colon F \to \Gamma \Lambda F

Show that these morphisms are natural in F, so they define a natural transformation \eta \colon 1 \Rightarrow \Gamma \Lambda.

Now, this is just the sort of thing we’d expect if \Lambda were the left adjoint of \Gamma. Remember, when you have a left adjoint L \colon C \to D and a right adjoint R \colon D \to C, you always have a ‘unit’

\eta \colon 1 \Rightarrow R L

and a ‘counit’

\epsilon \colon L R \Rightarrow 1

where the double arrows stand for natural transformations.

And indeed, in Part 2 I claimed that \Lambda is the left adjoint of \Gamma. But I didn’t prove it. What we’re doing now could be part of the proof: in fact Mac Lane and Moerdijk prove it this way in Theorem 2 of Section II.6.

Let’s see if we can construct the counit

\epsilon \colon \Lambda \Gamma \Rightarrow 1

For this I hand you a bundle

p \colon Y \to X

You form its sheaf of sections \Gamma_p, and then you form the etale space \Lambda \Gamma_p of that. Then you want to construct a morphism of bundles \epsilon_p from your etale space \Lambda \Gamma_p to my original bundle.

Mac Lane and Moerdijk call the construction ‘inevitable’. Here’s how it works. We get points in \Lambda \Gamma_p over x \in X from sections of p \colon Y \to X over open sets containing x. But you can just take one of these sections and evaluate it at x and get a point in Y.

Puzzle. Show that this procedure gives a well-defined continuous map

\epsilon_p \colon \Lambda \Gamma_p \to Y

and that this is actually a morphism of bundles over X. Show that these morphisms define a natural transformation \epsilon \colon \Lambda \Gamma \Rightarrow 1.

Now that we have the unit and counit, if you’re feeling ambitious you can show they obey the two equations required to get a pair of adjoint functors, thus solving the following puzzle:

Puzzle. Show that

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

If you’re not feeling so ambitious, just look at Mac Lane and Moerdijk’s proof of Theorem 2 in Section II.6!

Can We Fix The Air?

12 January, 2020

A slightly different version of this article I wrote first appeared in Nautilus on November 28, 2019.

Water rushes into Venice’s city council chamber just minutes after the local government rejects measures to combat climate change. Wildfires consume eastern Australia as fire danger soars past “severe” and “extreme” to “catastrophic” in parts of New South Wales. Ice levels in the Chukchi Sea, north of Alaska, hit record lows. England sees floods all across the country. And that’s just this week, as I write this.

Human-caused climate change, and the disasters it brings, are here. In fact, they’re just getting started. What will things be like in another decade, or century?

It depends on what we do. If our goal is to stop global warming, the best way is to cut carbon emissions now, all the way to zero. The United Kingdom, Denmark, and Norway have passed laws requiring net zero emissions by 2050. Sweden is aiming at 2045. But the biggest emitters (China, the United States, and India) are dragging their heels. So to keep global warming below 2 degrees Celsius over pre-industrial levels by 2100, it’s becoming more and more likely that we’ll need negative carbon emissions.

That is, we’ll need to fix the air. We’ll need to suck more carbon dioxide out of the atmosphere than we put in.

This may seem like a laughably ambitious goal. Can we actually do it? Or is it just a fantasy? I want to give you a sense of what it would take. But first, here’s one reason this matters. Most people don’t realize that large negative carbon emissions are assumed in many of the more optimistic climate scenarios. Even some policymakers tasked with dealing with climate change don’t know this.

In 2016, climate scientists Kevin Anderson and Glen Peters published a paper on this topic, called “The trouble with negative emissions.” The title is a bit misleading, since they are not against negative emissions. They are against lulling ourselves into complacency by making plans that rely on negative emissions—because we don’t really know how to achieve them at the necessary scale. We could be caught in a serious bind, with the poorest among us taking the biggest hit.

So, how large do our negative carbon emissions need to be for us to stay below 2 degrees Celsius of warming, and how are people hoping to achieve them? Let’s dive in!

In 2018, humans put about 37 billion tonnes of carbon dioxide into the air. A “tonne” is a metric ton, a bit larger than a US ton. Since the oxygen is not the problem—carbon dioxide consisting of one atom of carbon and two of oxygen—it might make more sense to count tonnes of carbon. But it’s customary to keep track of carbon by its carbon dioxide equivalent, so I’ll do that here. The National Academy of Sciences says that to keep global warming below 2 degrees Celsius by the century’s end, we will probably need to be removing about 10 billion tonnes of carbon dioxide from the air each year by 2050, and double that by 2100. How could we do this?
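Converting between the two bookkeeping conventions is simple arithmetic with atomic weights. A Python sketch, using the standard atomic weights of carbon and oxygen and the two figures quoted above:

```python
# Standard atomic weights in grams per mole.
CARBON = 12.011
OXYGEN = 15.999
CO2 = CARBON + 2 * OXYGEN  # about 44.01


def co2_to_carbon(tonnes_co2: float) -> float:
    """Tonnes of carbon contained in the given tonnes of carbon dioxide."""
    return tonnes_co2 * CARBON / CO2


# Figures from the text: 2018 emissions, and the removal rate needed by 2050.
for label, tonnes in [("emitted in 2018", 37e9), ("to remove per year by 2050", 10e9)]:
    print(f"{label}: {co2_to_carbon(tonnes) / 1e9:.1f} billion tonnes of carbon")
```

So a tonne of carbon dioxide contains only a bit over a quarter of a tonne of carbon.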

Whenever I talk about this, I get suggestions. Many ignore the sheer scale of the problem. For example, a company called Climeworks is building machines that suck carbon dioxide out of the air using a chemical process. They’re hoping to use these gadgets to make carbonated water for soft drinks—or create greenhouses that have lots of carbon dioxide in the air, for tastier vegetables. This sounds very exciting…until you learn that currently their method of getting carbon dioxide costs about $500 per ton. It’s much cheaper to make the stuff in other ways; beverage-grade carbon dioxide costs about a fifth as much. But even if they bring down the price and become competitive in their chosen markets, greenhouses and carbonation use only 6 million tonnes of carbon dioxide annually. This is puny compared to the amount we need to remove.

Thus, the right way to think of Climeworks is as a tentative first step toward a technology that might someday be useful for fighting global warming—but only if it can be dramatically scaled up and made much cheaper. The idea of finding commercial uses for carbon dioxide as a stepping-stone, a way to start developing technologies and bringing prices down, is attractive. But it’s different from finding commercial uses that could make a serious dent in our carbon emissions problem.

Here’s another example: using carbon dioxide from the air to make plastics. There’s a company called RenewCO2 that wants to do this. But even ignoring the cost, it’s clear that such a scheme could remove 10 billion tonnes of carbon dioxide from the air each year only if we drastically ramped up our production of plastics. In 2018, we made about 360 million tonnes of plastic. So, we’d have to boost plastic production almost ten-fold. Furthermore, we’d have to make all this plastic without massively increasing our use of fossil fuels. And that’s a general issue with schemes to fix the air. If we could generate a huge abundance of power in a carbon-free way—say from nuclear, solar, or wind—we could use some of that power to remove carbon dioxide from the atmosphere. But for the short term, a better use of that power is to retire carbon-burning power plants. Thus, while we can dream about energy-intensive methods of fixing the air, they will only come into their own—if ever—later in the century.
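The ‘almost ten-fold’ figure can be reconstructed with a back-of-the-envelope calculation. This sketch assumes, purely for illustration, that all the plastic is polyethylene, (C2H4)n, which is about 86% carbon by mass; real plastics vary, and the text does not say which composition it had in mind.

```python
# Assumption (not from the text): treat all plastic as polyethylene, (C2H4)n.
CARBON, HYDROGEN, OXYGEN = 12.011, 1.008, 15.999  # standard atomic weights, g/mol

co2_to_remove = 10.0  # billion tonnes of CO2 per year (target from the text)
plastic_made = 0.36   # billion tonnes of plastic made in 2018 (from the text)

carbon_in_co2 = co2_to_remove * CARBON / (CARBON + 2 * OXYGEN)
carbon_fraction_pe = 2 * CARBON / (2 * CARBON + 4 * HYDROGEN)  # about 0.86
plastic_needed = carbon_in_co2 / carbon_fraction_pe            # billion tonnes

print(f"plastic needed: {plastic_needed:.1f} Gt per year")
print(f"increase over 2018: {plastic_needed / plastic_made:.1f}-fold")
```

Under this assumption we would need roughly 3.2 billion tonnes of plastic per year, close to nine times 2018 production, which is consistent with ‘almost ten-fold’.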

If plastics aren’t big enough to eat up 10 billion tonnes of carbon dioxide per year, what comes closer? Agriculture. I’m having trouble finding the latest data, but in 2004 the world created roughly 5 billion tonnes of “crop residue”: stems, leaves, and such left over from growing food. If we could dispose of most of this residue in a way that would sequester the carbon, that would count as serious progress. Indeed, environmental engineer Stuart Strand and physicist Gregory Benford—also a noted science fiction writer—have teamed up to study what would happen if we dumped bales of crop residue on the ocean floor. Even though this stuff would rot, it seems that the gases produced will take hundreds of years to resurface. And there’s plenty of room on the ocean floor.

Short of a massive operation to sink crop residues to the bottom of the sea, there are still many other ways to improve agriculture so that the soil accumulates more carbon. For example, tilling the land less reduces the rate at which organic matter decays and carbon goes back into the air. You can actually fertilize the land with half-burnt plant material full of carbon, called “biochar.” Planting crops with bigger roots, or switching from annual crops to perennials, also helps. These are just a few of the good ideas people have had. While agriculture and soil science are complex, and you probably don’t want to get into the weeds on this, the National Academy of Sciences estimates that we could draw down 3 billion tonnes of carbon dioxide per year from improved agriculture. That’s huge.

Having mentioned agriculture, it’s time to talk about forests. Everyone loves trees. However, it’s worth noting that a mature forest doesn’t keep on pulling down carbon at a substantial rate forever. Yes, carbon from the air goes to form wood and organic material in the soil. But decaying wood and organic material releases carbon back into the air. A climax forest is close to a steady state: the rate at which it removes carbon from the air is roughly equal to the rate at which it releases this carbon. So, the time when a forest pulls down the most carbon is when it’s first growing.

In July 2019, a paper in Science argued that the Earth has room for almost 4 million square miles of new forests. The authors claimed that as these new trees grow, they could pull down about 730 billion tonnes of carbon dioxide.

At first this sounds great. But remember, we are putting out 37 billion tonnes a year. So, the claim is that if we plant new forests over an area somewhat larger than the US, they will absorb the equivalent of roughly 20 years of carbon emissions. In short, this heroic endeavor would buy us time, but it wouldn’t be a permanent solution. Worse, many other authors have argued that the Science paper was overly optimistic. One rebuttal points out that it mistakenly assumed treeless areas have no organic carbon in the soil already. It also counted on a large increase of forests in regions that are now grassland or savanna. With such corrections made, it’s possible that new forests could only pull down at most 150 billion tonnes of carbon dioxide.
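The ‘roughly 20 years’ comparison is a single division; this short sketch also applies it to the corrected estimate of 150 billion tonnes. Both totals and the annual emissions come from the text; only the division is added.

```python
def years_of_emissions(total_gt_co2, annual_gt_co2=37):
    """How many years of current emissions a one-time drawdown would offset."""
    return total_gt_co2 / annual_gt_co2

estimates = {"Science paper": 730, "corrected estimate": 150}  # billion tonnes of CO2

for name, total in estimates.items():
    print(f"{name}: about {years_of_emissions(total):.0f} years of emissions")
```

So the corrected figure would offset only about four years of current emissions.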

That’s still a lot. But getting people to plant vast new forests will be hard. Working with more realistic assumptions, the National Academy of Sciences says that in the short term we could draw down 2.5 billion tonnes of carbon dioxide per year by planting new forests and better managing existing ones. In short: If we push really hard, better agriculture and forestry could pull 5.5 billion tonnes of carbon dioxide from the air each year. One great advantage of both these methods is that they harness the marvelous ability of plants to turn carbon dioxide into complex organic compounds in a solar-powered way—much better than any technology humans have devised so far. If we ever invent new technologies that do better, it’ll probably be because we’ve learned some tricks from our green friends.

And here’s another way plants can help: biofuels. If we burn fuels that come from plants, we’re taking carbon out of the atmosphere and putting it right back in: net zero carbon emissions, roughly speaking. That’s better than fossil fuels, where we dig carbon up from the ground and burn it. But it would be even better if we could burn plants as fuels but then capture the carbon dioxide, compress it, and pump it underground into depleted oil and gas fields, unmineable coal seams, and the like.

To do this, we probably shouldn’t cut down forests to clear space for crops that we burn. Turning corn into ethanol is also rather inefficient, though the corn lobby in the U.S. has persuaded the government to spend lots of money on this, and about 40 percent of all corn grown in the U.S. now gets used this way. Suppose we just took all available agricultural, forestry, and municipal waste, like lawn trimmings, food waste, and such, to facilities able to burn it and pump the carbon dioxide underground. All this stuff ultimately comes from plants sucking carbon from the air. So, how much carbon dioxide could we pull out of the atmosphere this way? The National Academy of Sciences says up to 5.2 billion tonnes per year.

Of course, we can’t do this and also sink all agricultural waste into the ocean—that’s just another way of dealing with the same stuff. Furthermore, this high-end figure would require immensely better organization than we’ve been able to achieve so far. And there are risks involved in pumping lots of carbon dioxide underground.

What other activities could draw down lots of carbon? It pays to look at the biggest human industries: biggest, that is, in terms of sheer mass being processed. For example, we make lots of cement. Global cement production in 2017 was about 4.5 billion tons, with China making more than the rest of the world combined, though there is large uncertainty in how much China made. As far as I know, only digging up and burning carbon is bigger: for example, 7.7 billion tons of coal are mined per year.

Right now cement is part of the problem: To make the most commonly used kind we heat limestone until it releases carbon dioxide and becomes “quicklime.” Only about 7 percent of the total carbon we emit worldwide comes from this process, but that still accounts for more than the entire aviation industry. Some scientists have invented cement that absorbs carbon dioxide as it dries. It has not yet caught on commercially, but the pressure on the industry is increasing. If we could somehow replace cement with a substance made mostly of carbon pulled from the atmosphere, and do it in an economically viable way, that would be huge. But this takes us into the realm of technologies that haven’t been invented yet.

New technologies may in fact hold the key to the problem. In the second half of the century we should be doing things that we can’t even dream of yet. In the next century, even more so. But it takes time to perfect and scale up new technologies. So it makes sense to barrel ahead with what we can do now, then shift gears as other methods become practical. Merely waiting and hoping is not wise.

Totaling up some of the options I’ve listed, we could draw down 1 billion tonnes of carbon dioxide by planting trees, 1.5 billion by better forest management, 3 billion by better agricultural practices, and up to 5.2 billion by biofuels with carbon capture. This adds up to over 10 billion tonnes per year. It’s not nearly enough to cancel the 37 billion tonnes we’re dumping into the air each year now. But combined with strenuous efforts to cut emissions, we might squeak by, and keep global warming below 2 degrees Celsius.
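The totals above can be checked the same way. The figures are the ones quoted in the text, in billions of tonnes of carbon dioxide per year; the sketch simply adds them and compares the sum with current emissions.

```python
# Drawdown options quoted in the text, in billion tonnes of CO2 per year.
options = {
    "planting trees": 1.0,
    "better forest management": 1.5,
    "better agricultural practices": 3.0,
    "biofuels with carbon capture (high end)": 5.2,
}
emissions = 37.0  # billion tonnes of CO2 per year (2018)

total = sum(options.values())
forestry = options["planting trees"] + options["better forest management"]

print(f"total drawdown: {total:.1f} Gt/yr")
print(f"fraction of current emissions: {total / emissions:.0%}")
print(f"forestry subtotal: {forestry:.1f} Gt/yr")
```

The two forestry lines add up to the 2.5 billion tonnes quoted earlier from the National Academy of Sciences, and the grand total is a bit under 30 percent of 2018 emissions.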

Even if we try, we are far from guaranteed to succeed—Anderson and Peters are right to warn about this. But will we even try? This is more a matter of politics and economics than of science and technology. The engineer Saul Griffith said that dealing with global warming is not like the Manhattan Project—it’s like the whole of World War II but with everyone on the same side. He was half right: We are not all on the same side. Not yet, anyway. Getting leaders who are inspired by these huge challenges, rather than burying their heads in the sand, would be a big step in the right direction.

Topos Theory (Part 2)

7 January, 2020

Last time I defined sheaves on a topological space X; this time I’ll say how to get these sheaves from ‘bundles’ over X. You may or may not have heard of bundles of various kinds, like vector bundles or fiber bundles. If you have, be glad: the bundles I’m talking about now include these as special cases. If not, don’t worry: the bundles I’m talking about now are much simpler!

A bundle over X is simply a topological space Y equipped with a continuous map to X, say

p \colon Y \to X

You should visualize Y as hovering above X, and p as projecting points y \in Y down to their shadows p(y) in X. This explains the word ‘over’, the term ‘projection’ for the map p, and many other things. It’s a powerful metaphor.

Bundles are not only a great source of examples of sheaves; in fact every sheaf comes from a bundle! Conversely, every sheaf—and even every presheaf—gives rise to a bundle.

But these constructions, which I’ll explain, do not give an equivalence of categories. That is, sheaves are not just another way of thinking about bundles, and neither are presheaves. Instead, we’ll get adjoint functors between the category of presheaves on X and the category of bundles over X, and these will restrict to give an equivalence between the category of ‘nice’ presheaves on X—namely, the sheaves—and a certain category of ‘nice’ bundles over X, which are called ‘etale spaces’.

Thus, in the end we’ll get two complementary viewpoints on sheaves: the one I discussed last time, and another, where we think of them as specially nice bundles over X. In Sections 2.8 and 2.9 Mac Lane and Moerdijk use these complementary viewpoints to efficiently prove some of the big theorems about sheaves that I stated last time.

Before we get going, a word about a word: ‘etale’. This is really a French word, ‘étalé’, meaning ‘spread out’. We’ll see why Grothendieck chose this word. But now I mainly just want to apologize for leaving out the accents. I’m going to be typing a lot, it’s a pain to stick in those accents each time, and in English words with accents feel ‘fancy’.

From bundles to presheaves

Any bundle over X, meaning any continuous map

p \colon Y \to X,

gives a sheaf on X. Here’s how. Given an open set U \subseteq X, define a section of p over U to be a continuous function

s \colon U \to Y

such that

p \circ s = 1_U

In terms of pictures (which I’m too lazy to draw here) s maps each point of U to a point in Y ‘sitting directly over it’. There’s a presheaf \Gamma_p on X that assigns to each open set U \subset X the set of all sections of p over U:

\Gamma_p U = \{s: \; s \textrm{ is a section of } p \textrm{ over } U \}

Of course, to make \Gamma_p into a presheaf we need to say how to restrict sections over U to sections over a smaller open set, but we do this in the usual way: by restricting a function to a subset of its domain.
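To make \Gamma_p concrete, here is a minimal Python sketch for a toy bundle. It assumes X and Y are finite sets with the discrete topology, so every function is continuous and every subset of X is open; the point names and the map p are hypothetical. A section over U is then just a choice of one point in each fiber over U, and restriction forgets the choices made outside the smaller set.

```python
from itertools import product

# Hypothetical toy bundle p: Y -> X. Both spaces are finite and discrete,
# so every function is continuous and every subset of X is open.
p = {"y1": "x1", "y2": "x1", "y3": "x2"}  # each point of Y -> its shadow in X

def fiber(x):
    """The points of Y sitting directly over x."""
    return sorted(y for y, base in p.items() if base == x)

def sections(U):
    """Gamma_p(U): every s: U -> Y with p(s(x)) == x, encoded as a dict."""
    pts = sorted(U)
    return [dict(zip(pts, choice)) for choice in product(*(fiber(x) for x in pts))]

def restrict(s, V):
    """Restriction Gamma_p(U) -> Gamma_p(V) for V contained in U."""
    return {x: s[x] for x in V}

for U in [set(), {"x1"}, {"x2"}, {"x1", "x2"}]:
    print(sorted(U), len(sections(U)))
```

Over the empty set there is exactly one section (the empty function), and over all of X there are 2 × 1 = 2 sections, one for each way of choosing a point in each fiber.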

Puzzle. Check that with this choice of restriction maps \Gamma_p is a presheaf, and in fact a sheaf.

There’s actually a category of bundles over X. Given bundles

p \colon Y \to X

and

p' \colon Y' \to X

a morphism from the first to the second is a continuous map

f \colon Y \to Y'

making the obvious triangle commute:

p' \circ f = p

I’m too lazy to draw this as a triangle, so if you don’t see it in your mind’s eye you’d better draw it. Draw Y and Y' as two spaces hovering over X, and f as mapping each point in Y over x \in X to a point in Y' over the same point x.

We can compose morphisms between bundles over X in an evident way: a morphism is a continuous map with some property, so we just compose those maps. We thus get a category of bundles over X, which is called \mathsf{Top}/X.

I’ve told you how a bundle over X gives a presheaf on X. Similarly, a morphism of bundles over X gives a morphism of presheaves on X. Because this works in a very easy way, it should be no surprise that this gives a functor, which we call

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Puzzle. Suppose we have two bundles over X, say p \colon Y \to X and p' \colon Y' \to X, and a morphism from the first to the second, say f \colon Y \to Y'. Suppose s \colon U \to Y is a section of the first bundle over the open set U \subset X. Show that f \circ s is a section of the second bundle over U. Use this to describe what the functor \Gamma does on morphisms, and check functoriality.
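For finite discrete bundles (a simplifying assumption; all names below are hypothetical), the commuting triangle and the claim that f \circ s is a section can be checked directly. The function gamma_on_morphism is one way to describe what \Gamma does on morphisms: it post-composes sections with f, and this commutes with restriction.

```python
# Hypothetical finite bundles over X = {"x1", "x2"}; all spaces are discrete.
p = {"y1": "x1", "y2": "x1", "y3": "x2"}   # first bundle p: Y -> X
p2 = {"z1": "x1", "z2": "x2", "z3": "x2"}  # second bundle p': Y' -> X
f = {"y1": "z1", "y2": "z1", "y3": "z2"}   # candidate morphism f: Y -> Y'

def is_bundle_morphism(f, p, p2):
    """The triangle commutes: p'(f(y)) == p(y) for every y in Y."""
    return all(p2[f[y]] == p[y] for y in p)

def is_section(s, bundle):
    """bundle(s(x)) == x for every x in the domain of s."""
    return all(bundle[s[x]] == x for x in s)

def gamma_on_morphism(f, s):
    """Gamma(f) sends a section s over U to the section f o s over U."""
    return {x: f[y] for x, y in s.items()}

def restrict(s, V):
    """Restrict a section to a smaller set V."""
    return {x: s[x] for x in V}

s = {"x1": "y2", "x2": "y3"}  # a section of p over U = {"x1", "x2"}
assert is_bundle_morphism(f, p, p2) and is_section(s, p)
assert is_section(gamma_on_morphism(f, s), p2)
# Naturality: pushing forward then restricting equals restricting then pushing forward.
assert restrict(gamma_on_morphism(f, s), {"x1"}) == gamma_on_morphism(f, restrict(s, {"x1"}))
```

The identity map of Y sends every section to itself, which is the identity part of functoriality.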

From presheaves to bundles

How do we go back from presheaves to bundles? Start with a presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

on X. To build a bundle over X, we’ll start by building a bunch of sets called \Lambda(F)_x, one for each point x \in X. Then we’ll take the union of these and put a topology on it, getting a space called \Lambda(F). There will be a map

p \colon \Lambda(F) \to X

sending all the points in \Lambda(F)_x to x, and this will be our bundle over X.

How do we build these sets \Lambda(F)_x? Our presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

doesn’t give us sets for points of X, just for open sets. So, we should take some sort of ‘limit’ of the sets F U over smaller and smaller open neighborhoods U of x. Remember, if U' \subseteq U our presheaf gives a restriction map

F U \to FU'

So, what we’ll actually do is take the colimit of all these sets FU, as U ranges over all neighborhoods of x. That gives us our set \Lambda(F)_x.

It’s good to ponder what elements of \Lambda(F)_x are actually like. They’re called germs at x, which is a nice name, because you can only see them under a microscope! For example, suppose F is the sheaf of continuous real-valued functions, so that FU consists of all continuous functions from U to \mathbb{R}. By the definition of colimit, for any open neighborhood U of x we have a map

FU \to \Lambda(F)_x

So any continuous real-valued function defined on any open neighborhood of x gives a ‘germ’ of a function at x. But also by the definition of colimit, any two such functions give the same germ iff they become equal when restricted to some open neighborhood of x. So the germ of a function is what’s left of that function as you zoom in closer and closer to the point x.
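The colimit description of germs can be computed directly on a small finite space. This sketch assumes a hypothetical space with points a, b, c and open sets ∅, {a}, {a, b}, {a, b, c}, and the presheaf F that assigns to U all functions U → {0, 1}. Two pairs (U, s) and (V, t) give the same germ at x exactly when they agree on some open W containing x, which is the identification the colimit makes. In a finite space each point has a smallest open neighborhood, so the stalk ends up matching F of that neighborhood.

```python
from itertools import product

# Hypothetical finite space: points a, b, c; the open sets are listed below.
OPENS = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]

def F(U):
    """Hypothetical presheaf: all functions U -> {0, 1}, as frozensets of (point, value) pairs."""
    pts = sorted(U)
    return [frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))]

def restrict(s, W):
    """Restriction map F(U) -> F(W) for W contained in U."""
    return frozenset((pt, v) for pt, v in s if pt in W)

def same_germ(x, U, s, V, t):
    """Same germ at x iff s and t agree on some open W with x in W and W inside U & V."""
    return any(x in W and W <= (U & V) and restrict(s, W) == restrict(t, W) for W in OPENS)

def stalk(x):
    """Lambda(F)_x: the pairs (U, s) with x in U, grouped into germ classes."""
    classes = []
    for U in OPENS:
        if x not in U:
            continue
        for s in F(U):
            for cls in classes:
                V, t = cls[0]
                if same_germ(x, U, s, V, t):
                    cls.append((U, s))
                    break
            else:  # no existing class matched, so this pair starts a new germ
                classes.append([(U, s)])
    return classes

print({x: len(stalk(x)) for x in "abc"})
```

With these choices the stalks at a, b and c contain 2, 4 and 8 germs, matching F({a}), F({a, b}) and F({a, b, c}).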

(If we were studying analytic functions on the real line, the germ at x would remember exactly their Taylor series at that point. But smooth functions have more information in their germs, and continuous functions are weirder still. For more on germs, watch this video.)

Now that we have the set of germs \Lambda(F)_x for each point x \in X, we define \Lambda(F) to be their disjoint union:

\Lambda(F) = \coprod_{x \in X} \Lambda(F)_x

There is then a unique function

p \colon \Lambda(F) \to X

sending everybody in \Lambda(F)_x to x. So we’ve almost gotten our bundle over X. We just need to put a topology on \Lambda(F).

We do this as follows. We’ll give a basis for the topology, by describing a bunch of open neighborhoods of each point in \Lambda(F). Remember, any point in \Lambda(F) is a germ. More specifically, any point in \Lambda(F) is in some set \Lambda(F)_x, so it’s the germ of some s \in FU where U is an open neighborhood of x. But this s has lots of other germs, too, namely its germs at all points y \in U. We take the collection of all these germs to be an open neighborhood of the germ we started with. A general open set in \Lambda(F) will then be an arbitrary union of sets like this.
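The basic open sets of \Lambda(F) can be listed in a finite example. This sketch assumes a hypothetical finite space with points a, b, c and open sets ∅, {a}, {a, b}, {a, b, c}, and the presheaf of {0, 1}-valued functions. Since each point y of a finite space has a smallest open neighborhood U_y, it encodes the germ of s at y as the pair (y, s restricted to U_y); this encoding is a shortcut that works for finite spaces, not a general construction. It then checks that p maps each basic open set bijectively onto U.

```python
from itertools import product

# Hypothetical finite space: points a, b, c with the open sets listed below.
OPENS = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]

def F(U):
    """Hypothetical presheaf: all functions U -> {0, 1}, as frozensets of (point, value) pairs."""
    pts = sorted(U)
    return [frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))]

def restrict(s, W):
    """Restriction map F(U) -> F(W) for W contained in U."""
    return frozenset((pt, v) for pt, v in s if pt in W)

def smallest_nbhd(y):
    """Intersection of all open sets containing y; open because there are finitely many."""
    return frozenset.intersection(*(U for U in OPENS if y in U))

def germ(y, s):
    """Finite-space encoding of a germ: (y, s restricted to the smallest neighborhood of y)."""
    return (y, restrict(s, smallest_nbhd(y)))

def basic_open(U, s):
    """The basic open set of Lambda(F) consisting of all germs of s in F(U)."""
    return {germ(y, s) for y in U}

def projection(g):
    """p: Lambda(F) -> X sends a germ at y to the point y."""
    return g[0]

# p restricted to each basic open set is a bijection onto U.
for U in OPENS:
    for s in F(U):
        B = basic_open(U, s)
        assert {projection(g) for g in B} == set(U) and len(B) == len(U)
```

Only the bijection is checked here; showing that it is a homeomorphism is the content of the puzzles.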

Puzzle. Show that with this topology on \Lambda(F), the map p \colon \Lambda(F) \to X is continuous.

Thus any presheaf on X gives a bundle over X.

Puzzle. Describe how a morphism of presheaves on X gives a morphism of bundles over X, and show that your construction defines a functor

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

Etale spaces

So now we have functors that turn bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

and presheaves into bundles:

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

But we have already seen that the presheaves coming from bundles are ‘better than average’: they are sheaves! Similarly, the bundles coming from presheaves are better than average. They are ‘etale spaces’.

What does this mean? Well, if you think back on how we took a presheaf F and gave \Lambda(F) a topology a minute ago, you’ll see something very funny about that topology. Each point in \Lambda(F) has a neighborhood such that

p \colon \Lambda(F) \to X

restricted to that neighborhood is a homeomorphism. Indeed, remember that each point in \Lambda(F) is a germ of some

s \in F U

for some open U \subseteq X. We made the set of all germs of s into an open set in \Lambda(F). Call that open set V.

Puzzle. Show that p is a homeomorphism from V to U.

In class I’ll draw a picture of what’s going on. \Lambda(F) is a space sitting over X that has lots of open sets V that look exactly like open sets U down in X. In terms of our visual metaphor, these open sets V are ‘horizontal’, which is why we invoke the term ‘etale’:

Definition. A bundle p \colon Y \to X is etale if each point y \in Y has an open neighborhood V such that p restricted to V is a homeomorphism from V to an open subset of X. We often call such a bundle an etale space over X.

So, if you did the last puzzle, you’ve shown that any presheaf on X gives an etale space over X.

(By the way, if you know about covering spaces, you should note that every covering space of X is an etale space over X but not conversely. In a covering space p \colon Y \to X we demand that each point down below, in X, has a neighborhood U such that p^{-1}(U) is a disjoint union of open sets homeomorphic to U, with p restricting to a homeomorphism on each of these open sets. In an etale space we merely demand that each point up above, in Y, has a neighborhood V such that p restricted to V is a homeomorphism. This is a weaker condition. In general, etale spaces are rather weird if you’re used to spaces like manifolds: for example, Y will often not be Hausdorff.)

Sheaves versus etale spaces

Now things are nicely symmetrical! We have a functor that turns bundles into presheaves

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

but in fact it turns bundles into sheaves. We have a functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

but in fact it turns presheaves into etale spaces.

Last time we defined \mathsf{Sh}(X) to be the full subcategory of \widehat{\mathcal{O}(X)} having sheaves as objects. Now let’s define \mathsf{Etale}(X) to be the full subcategory of \mathsf{Top}/X having etale spaces as objects. And here’s the punchline:

Theorem. The functor

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to the functor

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Moreover, if we restrict these functors to the subcategories \mathsf{Sh}(X) and \mathsf{Etale}(X), we get an equivalence of categories

\mathsf{Sh}(X) \simeq  \mathsf{Etale}(X)

The proof involves some work but also some very beautiful abstract nonsense: see Theorem 2, Corollary 3 and Lemma 4 of Section II.6. There’s a lot more to say, but this seems like a good place to stop.

Topos Theory (Part 1)

5 January, 2020

I’m teaching an introduction to topos theory this quarter, loosely based on Mac Lane and Moerdijk’s Sheaves in Geometry and Logic.

I’m teaching one and a half hours each week for 10 weeks, so we probably won’t make it very far through this 629-page book. I may continue for the next quarter, but still, to make good progress I’ll have to do various things.

First, I’ll assume basic knowledge of category theory, a lot of which is explained in the Categorical Preliminaries and Chapter 1 of this book. I’ll start in with Chapter 2. Feel free to ask questions!

Second, I’ll skip a lot of proofs and focus on stating definitions and theorems, and explaining what they mean and why they’re interesting.

These notes to myself will be compressed versions of what I will later write on the whiteboard.


Topos theory emerged from Grothendieck’s work on algebraic geometry; he developed it as part of his plan to prove the Weil Conjectures. It was really just one of many linked innovations in algebraic geometry that emerged from the French school, and it makes the most sense if you examine the whole package. Unfortunately algebraic geometry takes a long time to explain! But later Lawvere and Tierney realized that topos theory could serve as a grand generalization of logic and set theory. This logical approach is more self-contained, and easier to explain, but also a bit more dry—at least to me. I will try to steer a middle course, and the title Sheaves in Geometry and Logic shows that Mac Lane and Moerdijk were trying to do this too.

The basic idea of algebraic geometry is to associate to a space the commutative ring of functions on that space, and study the geometry and topology of this space using that ring. For example, if X is a compact Hausdorff space there’s a ring C(X) consisting of all continuous real-valued functions on X, and you can recover X from this ring. But algebraic geometers often deal with situations where there aren’t enough everywhere-defined functions (of the sort they want to consider) on a space. For example, the only analytic functions on the Riemann sphere are constant functions. That’s not good enough! Most analytic functions on the Riemann sphere have poles, and are only defined away from these poles. (I’m giving an example from complex analysis, in hopes that more people will get what I’m talking about, but there are plenty of purely algebraic examples.)

This forced algebraic geometers to invent ‘sheaves’, around 1945 or so. The idea of a sheaf is that instead of only considering functions defined everywhere, we look at functions defined on open sets.

So, let X be a topological space and let \mathcal{O}(X) be the collection of open subsets of X. This is a poset with inclusion as the partial ordering, and thus it is a category. A presheaf is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

So, a presheaf assigns to each open set U a set F U. It allows us to restrict an element of F U to any smaller open set U' \subseteq U, and a couple of axioms hold, which are encoded in the word ‘functor’. Note the ‘op’: that’s what lets us restrict elements of F U to smaller open sets.

The example to keep in mind is where F U consists of functions on U (that is, functions of the sort we want to consider, such as continuous or smooth or analytic functions). However, other examples are important too.

In many of these examples something nice happens. First, suppose we have s \in F U and an open cover of U by open sets U_i. Then we can restrict s to U_i getting something we can call s|_{U_i}. We can then further restrict this to U_i \cap U_j. And by the definition of presheaf, we have

(s|_{U_i})|_{U_i \cap U_j} = (s|_{U_j})|_{U_i \cap U_j}

In other words, if we take a guy in F U and restrict it to a bunch of open sets covering U, the resulting guys agree on the overlaps U_i \cap U_j. Check that this follows from the definition of functor and some other facts!

This is true for any presheaf. A presheaf is a sheaf if we can start the other way around, with a bunch of guys s_i \in F U_i that agree on overlaps:

s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j}

and get a unique s \in F U that restricts to all these guys:

s|_{U_i} = s_i

Note this definition secretly has two clauses: I’m saying that in this situation s exists and is unique. If we have uniqueness but not necessarily existence, we say our presheaf is a separated presheaf.

The point of a sheaf is that you can tell if something is in F U by examining it locally. These examples explain what I mean:

Puzzle. Let X = \mathbb{R} and for each open set U \subseteq \mathbb{R} take F U to be the set of continuous real-valued functions on U. Show that with the usual concept of restriction of functions, F is a presheaf and in fact a sheaf.

Puzzle. Let X = \mathbb{R} and for each open set U \subseteq \mathbb{R} take F U to be the set of bounded continuous real-valued functions on U. Show that with the usual concept of restriction of functions, F is a separated presheaf but not a sheaf.

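The problem is that a function can be bounded on each open set in an open cover of U yet not bounded on U. For example, 1/x is bounded on each interval (1/n, 1) with n \ge 2, and these intervals cover (0, 1), but 1/x is not bounded on (0, 1). You can tell if a function is continuous by examining it locally, but you can’t tell if it’s bounded!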
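The difference between a property you can check locally and one you cannot can be tested mechanically on a finite space. This sketch assumes a hypothetical space with points a, b, c and open sets ∅, {a}, {b}, {a, b}, {a, b, c}, and compares two presheaves: all functions to {0, 1}, and only the constant functions to {0, 1}. Being constant, like being bounded, is not a local property: a function that is 0 on {a} and 1 on {b} is constant on each piece of that cover of {a, b}, and the pieces agree on the empty overlap, but they do not glue to a constant function.

```python
from itertools import product

# Hypothetical finite space: points a, b, c with these open sets.
OPENS = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("abc")]

def all_functions(U):
    """All functions U -> {0, 1}, each encoded as a frozenset of (point, value) pairs."""
    pts = sorted(U)
    return [frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))]

def constant_functions(U):
    """Only the constant functions U -> {0, 1} (the empty function counts as constant)."""
    return [s for s in all_functions(U) if len({v for _, v in s}) <= 1]

def restrict(s, W):
    """Restrict a section to the subset W."""
    return frozenset((pt, v) for pt, v in s if pt in W)

def covers(U):
    """Every family of open subsets of U whose union is U."""
    subs = [V for V in OPENS if V <= U]
    for mask in range(2 ** len(subs)):
        cover = [V for i, V in enumerate(subs) if (mask >> i) & 1]
        if frozenset().union(*cover) == U:
            yield cover

def is_separated(F):
    """Two sections over U that agree on every piece of a cover are equal."""
    for U in OPENS:
        for cover in covers(U):
            for s, t in product(F(U), repeat=2):
                if s != t and all(restrict(s, V) == restrict(t, V) for V in cover):
                    return False
    return True

def is_sheaf(F):
    """Every family agreeing on overlaps glues to exactly one section."""
    for U in OPENS:
        for cover in covers(U):
            for family in product(*(F(V) for V in cover)):
                pieces = list(zip(cover, family))
                matching = all(restrict(s, V & W) == restrict(t, V & W)
                               for (V, s), (W, t) in product(pieces, repeat=2))
                if matching:
                    gluings = [s for s in F(U) if all(restrict(s, V) == t for V, t in pieces)]
                    if len(gluings) != 1:
                        return False
    return True

print(is_sheaf(all_functions), is_sheaf(constant_functions), is_separated(constant_functions))
```

Like the presheaf of bounded continuous functions, the presheaf of constant functions comes out separated but not a sheaf.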

So, in a sense that should gradually become clear, sheaves are about ‘local truth’.

The category of sheaves on a space

There’s a category of presheaves on any topological space X. Since a presheaf on X is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set},

a morphism between presheaves is a natural transformation between such functors.

Remember, if \mathsf{C} and \mathsf{D} are categories, we use \mathsf{C}^{\mathsf{D}} to stand for the category where the objects are functors from \mathsf{D} to \mathsf{C}, and the morphisms are natural transformations. This is called a functor category.

So, a category of presheaves is just an example of a functor category, and the category of presheaves on X is called

\mathsf{Set}^{\mathcal{O}(X)^{\mathrm{op}}}

But this name is rather ungainly, so we make an abbreviation

\widehat{\mathsf{C}} = \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

Then the category of presheaves on X is called

\widehat{\mathcal{O}(X)}

Sheaves are subtler, but we define morphisms of sheaves the exact same way. Every sheaf has an underlying presheaf, so we define a morphism between sheaves to be a morphism between their underlying presheaves. This gives the category of sheaves on X, which we call \mathsf{Sh}(X).

By how we’ve set things up, \mathsf{Sh}(X) is a full subcategory of \widehat{\mathcal{O}(X)}.

Now, what Grothendieck realized is that \mathsf{Sh}(X) acts a whole lot like the category of sets. For example, in the category of sets we can define ‘commutative rings’, but we can copy the definition in \mathsf{Sh}(X) and get ‘sheaves of commutative rings’, and so on. The point is that we’re copying ordinary math, but doing it locally, in a topological space.

Elementary topoi

Lawvere and Tierney clarified what was going on here by inventing the concept of ‘elementary topos’. I’ll throw the definition at you now and explain all the pieces in future classes:

Definition. An elementary topos, or topos for short, is a category with finite limits and colimits, exponentials and a subobject classifier.

I hope you know limits and colimits, since that’s the kind of basic category theory definition I’m assuming. Given two objects x and y in a category, their exponential is an object x^y that acts like the thing of all maps from y to x. I’ll give the actual definition later. A subobject classifier is, roughly, an object \Omega that generalizes the usual set of truth values

2 = \{0,1\}

Namely, subobjects of any object x are in one-to-one correspondence with morphisms from x to \Omega, which serve as ‘characteristic functions’. Again, this is just a sketch: I’ll give the actual definition later, or you can click on the link and read it now.
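In the topos of sets, both constructions can be written out directly. This minimal Python sketch (the function names are illustrative, not from any library) builds x^y as the list of all functions from y to x, and checks that subsets of a finite set correspond exactly to maps into 2 = {0, 1}, with 1 playing the role of ‘true’.

```python
from itertools import product

def exponential(x, y):
    """x^y in Set: all functions from y to x, each encoded as a dict."""
    ys = sorted(y)
    return [dict(zip(ys, vals)) for vals in product(sorted(x), repeat=len(ys))]

def characteristic(S, x):
    """The characteristic function of a subset S of x, as a map x -> {0, 1}."""
    return {a: int(a in S) for a in x}

def subset_of(chi):
    """Recover the subset as the points that chi sends to 1 ('true')."""
    return {a for a, v in chi.items() if v == 1}

x = {"p", "q", "r"}
maps_to_omega = exponential({0, 1}, x)  # all maps x -> {0, 1}
print(len(maps_to_omega))  # 2 ** 3 = 8, one for each subset of x
```

Going from a map to a subset and back returns the same map, which is the one-to-one correspondence described above, specialized to \mathsf{Set}.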

The point is that an elementary topos has enough bells and whistles that we can ‘do mathematics inside it’. It’s like an alternative universe, a variant of our usual category of sets and functions, where mathematicians can live. But beware: in general, the kind of mathematics we do in an elementary topos is finitistic mathematics using intuitionistic logic.

You see, the category of finite sets is an elementary topos, so you can’t expect to have ‘infinite objects’ like the set of natural numbers in an elementary topos—unless you decree that you want them (which people often do).

Also, we will see that while 2 = \{0,1\} is a Boolean algebra, the subobject classifier of an elementary topos need only be a ‘Heyting algebra’: a generalization of a Boolean algebra in which the law of excluded middle fails. This is actually not weird: it’s connected to the fact that a category of sheaves lets us reason ‘locally’. For example, we don’t just care if two functions are equal or not, we care if they’re equal or not in each open set. So we need a subtler form of logic than classical Boolean logic.
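A tiny example shows how excluded middle can fail. Assuming the standard fact that truth values in \mathsf{Sh}(X) correspond to open subsets of X, this sketch uses the Sierpinski space (two points 0 and 1, with open sets ∅, {1} and {0, 1}, chosen for illustration) and computes the Heyting negation of an open set U as the largest open set disjoint from U. For U = {1}, the union of U and its negation is not the whole space, so ‘U or not U’ is not true everywhere.

```python
# Hypothetical example: the Sierpinski space, points 0 and 1, open sets {}, {1}, {0, 1}.
POINTS = frozenset({0, 1})
OPENS = [frozenset(), frozenset({1}), POINTS]

def neg(U):
    """Heyting negation: the largest open set disjoint from U (a union of opens is open)."""
    return frozenset().union(*(V for V in OPENS if not (V & U)))

def implies(U, V):
    """Heyting implication: the largest open W with W & U contained in V."""
    return frozenset().union(*(W for W in OPENS if (W & U) <= V))

U = frozenset({1})
print(U | neg(U) == POINTS)  # does 'U or not U' cover the whole space?
print(neg(neg(U)) == U)      # does double negation give U back?
```

Both lines print False: here the negation of {1} is the empty set, and double negation sends {1} to the whole space.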

There’s a lot more to say, and I’m just sketching out the territory now, but one of the first big theorems we’re aiming for is this:

Theorem. For any topological space X, \mathsf{Sh}(X) is an elementary topos.

The topos of sheaves \mathsf{Sh}(X) remembers a lot about the topological space X that it came from… so a topos can also be seen as a way of talking about a space! This is even true for elementary topoi that aren’t topoi of sheaves on an actual space. So, topos theory is more than a generalization of set theory. It’s also, in a different way, a generalization of topology.

Grothendieck topoi

You’ll notice that sheaves on X were defined starting with the poset \mathcal{O}(X) of open sets of X. In fact, to define them we never used anything about X except this poset! This suggests that we could define sheaves more generally starting from any poset.

And that’s true—but Grothendieck went further: he defined sheaves starting from any category, as long as that category was equipped with some extra structure saying when a bunch of morphisms f_i \colon x_i \to x serve to ‘cover’ the object x. This extra data is called a ‘coverage’ or more often (rather confusingly) a ‘Grothendieck topology’. A category equipped with a Grothendieck topology is called a ‘site’.

So, Grothendieck figured out how to talk about the category of sheaves \mathsf{Sh}(\mathsf{C}) on any site \mathsf{C}. He did this before Lawvere and Tierney came along, and this was his definition of a topos. So, nowadays we say a category of sheaves on a site is a Grothendieck topos. However:

Theorem. Any Grothendieck topos is an elementary topos.

So, Lawvere and Tierney’s approach subsumes Grothendieck’s, in a sense. Not every elementary topos is a Grothendieck topos, though! For example, the category of finite sets is an elementary topos but not a Grothendieck topos. (It’s not big enough: any Grothendieck topos has, not just finite limits and colimits, but all small limits and colimits.) So both concepts of topos are important and still used. But when I say just ‘topos’, I’ll mean ‘elementary topos’.

Why did Grothendieck bother to generalize the concept of sheaves from sheaves on a topological space to sheaves on a site? He wasn’t just doing it for fun: it was a crucial step in his attempt to prove the Weil Conjectures!

Basically, when you’re dealing with spaces that algebraic geometers like—say, algebraic varieties—there aren’t enough open sets to do everything we want, so we need to use covering spaces as a generalization of open covers. So, instead of defining sheaves using the poset of open subsets of our space X, Grothendieck needed to use the category of covering spaces of X.

That’s the rough idea, anyway.

Geometric morphisms

As you probably know if you’re reading this, category theory is all about the morphisms. This is true not just within a category, but between them. The point of topos theory is not just to study one topos, but many. We don’t want merely to do mathematics in alternative universes: we want to be able to translate mathematics from one alternative universe to another!

So, what are the morphisms between topoi?

First, if you have a continuous map f \colon X \to Y between topological spaces, you can take the ‘direct image’ of a presheaf on X to get a presheaf on Y. Here’s how this works.

The inverse image of any open set is open, so we get an inverse image map

f^{-1} \colon \mathcal{O}(Y) \to \mathcal{O}(X)

sending each open set V \subseteq Y to the open set

f^{-1} V = \{x \in X :\; f(x) \in V \} \subseteq X

Given a presheaf F on X, we define its direct image to be the presheaf on Y given by

(f_\ast F)(V) = F(f^{-1} V)
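For a quick example, let Y = \{\ast\} be a one-point space and let f \colon X \to Y be the unique map. Then \mathcal{O}(Y) = \{\emptyset, Y\}, with f^{-1}\emptyset = \emptyset and f^{-1} Y = X, so

(f_\ast F)(Y) = F(X), \qquad (f_\ast F)(\emptyset) = F(\emptyset)

In other words, the direct image along the map to a point simply remembers the global sections of F.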

Note the double reversal here: f maps points in X to points in Y, but open sets in Y give open sets in X, and then presheaves on X give presheaves on Y.

Of course we need to check that it works:

Puzzle. Show that f_\ast F is a presheaf. That is, explain how we can restrict an element of (f_\ast F)(V) to any open set contained in V, and check that we get a presheaf this way.

In fact it works very nicely:

Puzzle. Show that taking direct images gives a functor from the category of presheaves on X to the category of presheaves on Y.

Puzzle. Show that if F is a sheaf on X, its direct image f_\ast F is a sheaf on Y.
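These puzzles are meant to be done by hand. What follows is only a brute-force sanity check on one tiny example, not a proof. It is a Python sketch using hypothetical data:

• X = \{1, 2, 3\} with opens \emptyset, \{1\}, \{1,2\}, \{1,3\}, X
• the Sierpiński space Y = \{p, q\} with opens \emptyset, \{p\}, Y
• the continuous map f sending 1 to p, and 2 and 3 to q
• F, the sheaf of all \{0,1\}-valued functions on X

Restriction in f_\ast F is taken to be restriction in F along f^{-1} V' \subseteq f^{-1} V. The function is_sheaf checks that every matching family on every open cover glues to exactly one section.

```python
from itertools import combinations, product

# Hypothetical example data (not from the text).
X_OPENS = [frozenset(), frozenset({1}), frozenset({1, 2}),
           frozenset({1, 3}), frozenset({1, 2, 3})]
Y_OPENS = [frozenset(), frozenset({"p"}), frozenset({"p", "q"})]  # Sierpinski space
f = {1: "p", 2: "q", 3: "q"}  # continuous: preimages of Y-opens are X-opens


def preimage(v):
    """f^{-1} V as a subset of X."""
    return frozenset(x for x in f if f[x] in v)


def F(u):
    """Sections of F over u: all functions u -> {0, 1}, stored as frozensets of pairs."""
    pts = sorted(u)
    return {frozenset(zip(pts, bits)) for bits in product((0, 1), repeat=len(pts))}


def restrict(section, w):
    """Restriction in F: forget the values at points outside w."""
    return frozenset((x, b) for (x, b) in section if x in w)


def push_F(v):
    """Direct image: (f_* F)(V) = F(f^{-1} V)."""
    return F(preimage(v))


def push_restrict(section, v_small):
    """Restriction in f_* F: restrict in F along f^{-1} V' ⊆ f^{-1} V."""
    return restrict(section, preimage(v_small))


def is_sheaf(opens, sections, res):
    """Brute force: every matching family on every open cover glues uniquely."""
    for v in opens:
        below = [w for w in opens if w <= v]
        for r in range(len(below) + 1):
            for cover in combinations(below, r):
                if frozenset().union(*cover) != v:
                    continue  # not a cover of v
                for family in product(*(sections(w) for w in cover)):
                    pairs = list(zip(cover, family))
                    matching = all(res(s, w1 & w2) == res(t, w1 & w2)
                                   for (w1, s), (w2, t) in combinations(pairs, 2))
                    if not matching:
                        continue
                    gluings = [g for g in sections(v)
                               if all(res(g, w) == s for w, s in pairs)]
                    if len(gluings) != 1:
                        return False
    return True
```

Both is_sheaf(X_OPENS, F, restrict) and is_sheaf(Y_OPENS, push_F, push_restrict) come out true. A constant presheaf with two elements over every open, including \emptyset, fails, because the empty cover of \emptyset forces exactly one section there. The search is exponential in the number of opens, so it is only practical for very small spaces.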

The upshot of all this is that a continuous map between topological spaces

f \colon X \to Y

gives a functor between sheaf categories

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

And this functor turns out to be very nice! This is another big theorem we aim to prove later:

Theorem. If f \colon X \to Y is a continuous map between topological spaces, the functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

has a left adjoint

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

that preserves finite limits.

This left adjoint is called the inverse image functor. Since f_\ast has a left adjoint, it is a right adjoint, so it preserves limits. Since f^\ast is a left adjoint, it preserves colimits. The fact that f^\ast also preserves finite limits is extra gravy on top of an already nice situation!
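Proving the theorem takes real work, because in general constructing f^\ast on sheaves needs extra machinery such as sheafification. In one very special case, though, everything is concrete. Suppose X and Y are finite discrete spaces. Then a sheaf is just a family of sets indexed by points, (f_\ast F)_y is the product of the sets F_x over the fiber f^{-1}(y), and (f^\ast G)_x = G_{f(x)}. The Python sketch below assumes exactly this setting, with hypothetical example data. It checks by brute force that transposing along the universal property of products gives a bijection between morphisms f^\ast G \to F and morphisms G \to f_\ast F.

```python
from itertools import product

# Hypothetical data: finite *discrete* spaces, where a sheaf is just a family of
# sets indexed by points (its values on the one-point open sets).
X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
f = {"x1": "y1", "x2": "y1", "x3": "y2"}


def fiber(y):
    """The points of f^{-1}(y), in a fixed order."""
    return [x for x in X if f[x] == y]


def push(F):
    """Direct image: (f_* F)_y is the product of F_x over x in f^{-1}(y)."""
    return {y: set(product(*(sorted(F[x]) for x in fiber(y)))) for y in Y}


def pull(G):
    """Inverse image on discrete spaces: (f^* G)_x = G_{f(x)}."""
    return {x: set(G[f[x]]) for x in X}


def morphisms(A, B, points):
    """All morphisms A -> B: one function A_p -> B_p (a dict) for each point p."""
    per_point = []
    for p in points:
        dom, cod = sorted(A[p]), sorted(B[p])
        per_point.append([dict(zip(dom, vals)) for vals in product(cod, repeat=len(dom))])
    return [dict(zip(points, fam)) for fam in product(*per_point)]


def transpose(alpha, G):
    """Send alpha: f^*G -> F to beta: G -> f_*F using the universal property of products."""
    return {y: {g: tuple(alpha[x][g] for x in fiber(y)) for g in G[y]} for y in Y}


def freeze(m):
    """Hashable form of a morphism, so morphisms can be compared as set elements."""
    return tuple(sorted((p, tuple(sorted(fn.items()))) for p, fn in m.items()))


F = {"x1": {0, 1}, "x2": {0, 1, 2}, "x3": {0}}
G = {"y1": {0, 1}, "y2": {0, 1}}

left = morphisms(pull(G), F, X)                        # hom(f^* G, F)
right = {freeze(b) for b in morphisms(G, push(F), Y)}  # hom(G, f_* F)
transposed = {freeze(transpose(a, G)) for a in left}
```

Here both hom-sets have 36 elements, and transposition reaches every morphism G \to f_\ast F, so it is a bijection. Because f^\ast in this setting only reindexes sets pointwise, and limits of such families are computed pointwise, it visibly preserves finite limits as well.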

We bundle all this niceness into a definition:

Definition. A functor f_\ast \colon \mathsf{T} \to \mathsf{T'} between topoi is a geometric morphism if it has a left adjoint that preserves finite limits.
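Spelled out, write f^\ast \colon \mathsf{T'} \to \mathsf{T} for the left adjoint, and take F an object of \mathsf{T} and G an object of \mathsf{T'}. Then the adjunction is a bijection

\mathrm{hom}_{\mathsf{T}}(f^\ast G, F) \cong \mathrm{hom}_{\mathsf{T'}}(G, f_\ast F)

natural in F and G. Since all finite limits can be built from a terminal object and pullbacks, preserving finite limits amounts to

f^\ast 1 \cong 1, \qquad f^\ast(A \times_C B) \cong f^\ast A \times_{f^\ast C} f^\ast B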

And this is the most important kind of morphism between topoi. It’s not a very obvious definition, but it’s extracted straight from what happens in examples.

To wrap up, I should add that people usually call the pair consisting of f_\ast \colon \mathsf{T} \to \mathsf{T'} and its left adjoint f^\ast \colon \mathsf{T'} \to \mathsf{T} a geometric morphism. A functor has at most one left adjoint, up to natural isomorphism, so my definition is at least tolerable. But I’ll probably switch to the standard one when we get serious about geometric morphisms.

And we will eventually see that geometric morphisms let us translate mathematics from one alternative universe to another!


If this seemed like too much too soon, fear not, I’ll go over it again and actually define a lot of the concepts I merely sketched, like ‘exponentials’, ‘subobject classifier’, ‘Heyting algebra’, ‘Grothendieck topology’, and ‘Grothendieck topos’. I just wanted to get a lot of the main concepts on the table quickly. You should do the puzzles to see if you understand what I wanted you to understand. Unless I made a mistake, all of these are straightforward definition-pushing if you’re comfortable with some basic category theory.

For more background on topos theory I highly recommend this:

• Colin McLarty, The uses and abuses of the history of topos theory.

Abstract. The view that toposes originated as generalized set theory is a figment of set theoretically educated common sense. This false history obstructs understanding of category theory and especially of categorical foundations for mathematics. Problems in geometry, topology, and related algebra led to categories and toposes. Elementary toposes arose when Lawvere’s interest in the foundations of physics and Tierney’s in the foundations of topology led both to study Grothendieck’s foundations for algebraic geometry. I end with remarks on a categorical view of the history of set theory, including a false history plausible from that point of view that would make it helpful to introduce toposes as a generalization from set theory.

There’s also a lot of background material in the book for this course: