Topos Theory (Part 7)

18 February, 2020

I’m almost done explaining why any presheaf category is an elementary topos, meaning that

• it has finite colimits;
• it has finite limits;
• it’s cartesian closed;
• it has a subobject classifier.

Last time I explained why such categories are cartesian closed; now let’s talk about the subobject classifier!

Subobject classifiers

In the category of sets, 0-element and 1-element sets play special roles. The 0-element set is the initial object, and any 1-element set is a terminal object. But 2-element sets are also special! They let us talk about ‘truth values’.

We can take any such set, call it 2, and call its elements ‘true’ and ‘false’. Then subsets of any set X correspond in a one-to-one way with functions

\chi \colon X \to 2

by associating to any subset S \subseteq X the function taking the value ‘true’ on S and ‘false’ elsewhere: its characteristic function.
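If you like code, here’s a quick Python sketch of this correspondence for finite sets (the function names are mine, purely for illustration):

def characteristic(S, X):
    # send a subset S of X to its characteristic function chi: X -> {True, False}
    return {x: (x in S) for x in X}

def subset_of(chi):
    # recover the subset from a characteristic function
    return {x for x, value in chi.items() if value}

X = {1, 2, 3, 4}
S = {2, 4}
assert subset_of(characteristic(S, X)) == S   # the round trip is the identity

Going back and forth this way gives the one-to-one correspondence between subsets of X and functions from X to 2.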

The idea of a subobject classifier is simply to copy this idea in other categories. A subobject classifier, which we’ll call \Omega, will serve as an ‘object of truth values’ in the category at hand. To make this precise, we’ll demand that subobjects of any object X correspond bijectively to morphisms \chi \colon X \to \Omega.

That’s the idea. But to make the idea useful we need to say how subobjects of X correspond to morphisms \chi \colon X \to \Omega. And of course before that we’ll need to say exactly what a subobject is!

In \mathsf{Set} we can pick out a subset of X using a monomorphism i \colon S \rightarrowtail X, where I use the funny arrow to denote a monomorphism (affectionately known as a ‘mono’). The idea is that the image of i is the subset of X we care about. But two different monos can have the same image, so we have to be careful!

Suppose we have two different monos

i \colon S \rightarrowtail X

and

i' \colon S' \rightarrowtail X

in \mathsf{Set}. When do they have the same image? The answer is: precisely when there’s an isomorphism

f \colon S \to S'

such that

i = i' \circ f.

Puzzle. Prove this.

So, more generally, in any category, we shall say that two monos i \colon S \rightarrowtail X and i' \colon S' \rightarrowtail X are isomorphic iff there’s an isomorphism f \colon S \to S' such that i = i' \circ f.

Definition. A subobject of an object X in a category \mathsf{C} is an isomorphism class of monos into X.

We will eventually study subobjects to see how much they behave like subsets—there’s a lot to say about that! But first let’s see what a subobject classifier is.

Again we return to the category \mathsf{Set} to see how this should work. We need to know how subobjects of X \in \mathsf{Set} arise from functions \chi \colon X \to 2. For this we need to know which elements of X get mapped by \chi to ‘true’, so we need to specify which element of the set 2 counts as ‘true’. We do this using a function

\mathrm{true} \colon 1 \rightarrowtail 2

which of course is a mono. Then, given any function

\chi \colon X \to 2

we can take the pullback of \mathrm{true} along \chi and get a mono

i \colon S \rightarrowtail X

Please see diagram (2) in Section I.3 of Mac Lane and Moerdijk to see this pullback in all its glory.

The usual description of pullbacks in \mathsf{Set} assures us that

S \cong \{x \in X : \; \chi(x) = \mathrm{true} \}

and the map i \colon S \rightarrowtail X is the obvious inclusion.
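To see this concretely, here’s a small Python sketch (names mine) computing the pullback of \chi \colon X \to 2 and \mathrm{true} \colon 1 \to 2 in \mathsf{Set}, as the set of pairs whose images agree:

def pullback(f, g, A, B):
    # the pullback of f: A -> C and g: B -> C in Set:
    # all pairs (a, b) with f(a) = g(b)
    return {(a, b) for a in A for b in B if f(a) == g(b)}

X = {1, 2, 3, 4}
chi = lambda x: x % 2 == 0     # 'true' on the even numbers
one = {'*'}                    # a 1-element set
true = lambda _: True          # true: 1 -> 2

S = pullback(chi, true, X, one)
assert {x for (x, _) in S} == {2, 4}   # S is (up to iso) the subset where chi is true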

We can copy this procedure in any category with pullbacks, leading to the definition we seek:

Definition. Let \mathsf{C} be a category with pullbacks. A subobject classifier is an object \Omega \in \mathsf{C} equipped with a mono

\mathrm{true} \colon 1 \rightarrowtail \Omega

such that for any object X \in \mathsf{C} there is a bijection between morphisms

\chi \colon X \to \Omega

and subobjects of X, given as follows: for any such morphism \chi, we pull back \mathrm{true} along \chi obtaining a mono

i \colon S \rightarrowtail X

and then take the subobject of X corresponding to this.

Here we are using a fact:

Puzzle. Show that the pullback of any mono is a mono.

Thus, i is automatically a mono, because we are assuming \mathrm{true} is.

Subobjects in presheaf categories

Let’s see what subobjects and subobject classifiers look like in presheaf categories. So now let \mathsf{C} be any category and let

\widehat{\mathsf{C}} = \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

be the category of presheaves on \mathsf{C}.

To get started: what are monos in \widehat{\mathsf{C}} like? Remember from Part 5 that limits, and in particular pullbacks, are computed pointwise in presheaf categories. Furthermore, monos can be defined in terms of pullbacks:

Puzzle. Show that a morphism f \colon X \to Y is a mono iff the pullback of f along itself is X, with both projections the identity 1 \colon X \to X.

This implies that determining whether a morphism in \widehat{\mathsf{C}} is a mono must also be a ‘pointwise’ matter: that is, one that you can check by looking at one object c \in \mathsf{C} at a time. But you can prove this directly:

Puzzle. Let i \colon F \Rightarrow G be a morphism between presheaves F, G \in \widehat{\mathsf{C}}, that is, a natural transformation between functors F, G \colon \mathsf{C}^{\mathrm{op}} \to \mathsf{Set}. Show that i is a mono iff for each c \in \mathsf{C},

i_c \colon F(c) \to G(c)

is a mono in \mathsf{Set}, that is, a one-to-one function.

Thus, people say a morphism of presheaves i \colon F \to G makes F into a subpresheaf of G when each function i_c \colon F(c) \to G(c) is one-to-one, so F(c) corresponds to a subset of G(c). It’s good to look at this in examples, like the case of graphs, where we get the concept of ‘subgraph’: one graph included in another.

Of course, we should remember that a subobject of G is really an equivalence class of monos. Two different monos i \colon F \rightarrowtail G and i' \colon F' \rightarrowtail G give the same subobject of G if there’s an isomorphism between F and F' that makes the obvious triangle commute. But sometimes people slack off and say F is a subobject of G if it’s equipped with a mono i \colon F \rightarrowtail G. I was coming close to doing that in the last paragraph.

Subobject classifiers in presheaf categories

Now we’re ready to understand the subobject classifier in a presheaf category. I’ll just tell you what it is. The subobject classifier in the presheaf category \widehat{\mathsf{C}} assigns to each object c \in \mathsf{C} the set of ‘sieves’ on c. So what’s a sieve?

Definition. Given a category \mathsf{C}, a sieve on an object c \in \mathsf{C} is a collection of morphisms with target c such that if f \colon d \to c is in the sieve and g \colon d' \to d is any morphism, then f \circ g \colon d'\to c is in the sieve.

In other words, a sieve is a collection of morphisms with a fixed target that’s closed under precomposition. The name ‘sieve’ should remind you that if a piece of grain can get through a sieve, any smaller piece of grain can also get through. You can think of f \circ g as ‘smaller’ than f in some sense.

Here’s a slick way to think about sieves. Remember that the Yoneda embedding

y \colon \mathsf{C} \to \widehat{\mathsf{C}}

sends any object c \in \mathsf{C} to a presheaf

y(c) = \mathrm{hom}(-, c)

called a representable presheaf.

Here’s the cool fact: a sieve on c is just the same as a subobject of y(c)! For each object d it gives a subset of \mathsf{hom}(d,c), and for each morphism g \colon d' \to d it gives a map from \mathsf{hom}(d,c) to \mathsf{hom}(d',c), namely the map sending each f to f \circ g.

The subobject classifier \Omega in \widehat{\mathsf{C}} is a beautiful thing: it assigns to each object the set of all sieves on that object!

That is, for each object c \in \mathsf{C}, the set \Omega(c) is the set of all sieves on c. But we also need to say what \Omega does to morphisms. Given a morphism f \colon c' \to c, the map

\Omega(f) \colon \Omega(c) \to \Omega(c')

sends sieves on c to sieves on c' as follows: for any sieve S on c, a morphism g \colon d \to c' is in \Omega(f)(S) iff the composite f \circ g \colon d \to c is in S.
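If a computational example helps, here’s a small Python sketch (all names mine) for the special case where \mathsf{C} is a poset, say the divisors of 12 ordered by divisibility. There’s at most one morphism d \to c, existing iff d divides c, so a sieve on c is just a set of divisors of c closed under passing to further divisors, and \Omega(f) is given by restriction:

def divides(a, b):
    return b % a == 0

objects = [1, 2, 3, 4, 6, 12]    # the divisors of 12

def is_sieve(S, c):
    # every element of S divides c, and S is closed under
    # 'precomposition', i.e. under passing to further divisors
    return all(divides(d, c) for d in S) and \
           all(e in S for d in S for e in objects if divides(e, d))

def restrict(S, c_prime):
    # Omega(f) for the unique morphism f: c' -> c in a poset:
    # d -> c' lands in Omega(f)(S) iff the composite d -> c is in S,
    # which here just means d is in S
    return {d for d in S if divides(d, c_prime)}

S = {1, 2, 3, 6}                  # a sieve on 12
assert is_sieve(S, 12)
assert restrict(S, 4) == {1, 2}   # and this is a sieve on 4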

We also need to describe

\mathrm{true} \colon 1 \to \Omega

The terminal presheaf 1 sends each object of \mathsf{C} to a one-element set, so \mathrm{true} must pick out an element of \Omega(c) for each object c \in \mathsf{C} in a natural way (it’s a natural transformation).

In other words, \mathrm{true} must choose a sieve on c for each c \in \mathsf{C} in a natural way. The naturality condition here says that if g \colon c' \to c, a morphism is in the chosen sieve on c' iff its composite with g is in the chosen sieve on c.

How does \mathrm{true} do this wonderful thing? Simple: for each object c it chooses the sieve containing all morphisms to c. Then the naturality condition holds trivially.

Of course, we have to check that

\mathrm{true} \colon 1 \to \Omega

really is a subobject classifier for \widehat{\mathsf{C}}. Only this will let us really understand what we’ve done here.

I’ll talk about this more next time, perhaps focusing on examples to build up intuition. For now I recommend that you read Section I.4 of Mac Lane and Moerdijk’s book for a general proof—and also a look at some examples!


Topos Theory (Part 6)

11 February, 2020

I’m explaining why any presheaf category is an elementary topos, meaning that

• it has finite colimits;
• it has finite limits;
• it’s cartesian closed;
• it has a subobject classifier.

Last time I tackled the first two bullet points; now let’s do the third. For starters, what’s a cartesian closed category, and why are they so nice? Answering this question will get us into some more ‘philosophical’ aspects of topos theory.

Cartesian closed categories

The category of sets has ‘sums’, also known as coproducts: given two sets X and Y there is a set X + Y, their disjoint union, which acts like their sum in various ways; for example you get its cardinality by adding those of X and Y:

|X + Y| = |X| + |Y|

The category of sets also has products: given two sets X and Y there is a set X \times Y, their cartesian product, which obeys

|X \times Y| = |X| \times |Y|

Both these can be understood using category theory, and I’m assuming you know how—if not, click the links. But after addition and multiplication comes exponentiation, and this too can be understood categorically. Even before you learn category theory, you should learn that the set of all functions from X to Y is called Y^X, and it has

|Y^X| = |Y|^{|X|}

But we can generalize this idea of ‘the set of functions from one set to another’ to ‘the object of morphisms from one object to another’ using category theory.

Here’s how. In the category of sets, something nice is true: there’s a natural one-to-one correspondence between functions from X \times Y to Z and functions from X to Z^Y. A function

f \colon X \times Y \to Z

eats a pair (x,y) \in X \times Y and spits out an element f(x,y) \in Z. We can turn this into a function

\tilde{f} \colon X \to Z^Y

that eats an element x and spits out a function from Y to Z, as follows:

\tilde{f}(x)(y) = f(x,y)

It’s like ‘delayed gratification’: instead of eating your dinner and dessert at once, you eat your dinner and turn into someone who is ready to eat dessert. Computer scientists call the map sending f to \tilde{f} currying… perhaps because they always eat curry for dinner? No, actually it’s named after Haskell Curry, who also has a computer language named after him.
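In Python (not Haskell!), currying is a two-line affair. Here’s a little sketch with names of my own:

def curry(f):
    # turn f: X x Y -> Z into a function X -> (Y -> Z)
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    # the inverse process
    return lambda x, y: g(x)(y)

f = lambda x, y: x + y
assert curry(f)(2)(3) == f(2, 3) == 5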

This currying process is natural, meaning there’s a natural isomorphism

\mathrm{hom}(X \times Y, Z) \cong \mathrm{hom}(X, Z^Y)

More precisely, for any set Y there’s a natural isomorphism

\alpha \colon \mathrm{hom}(- \times Y, --) \stackrel{\sim}{\longrightarrow} \mathrm{hom}(-, {--}^Y)

Here both sides are functors waiting to eat two sets (which I’d called X and Z a moment ago) and spit out a set. The blank spots - and -- are places where you can stick sets.

Note that this natural isomorphism says the functor sending any set X to the set X \times Y has a right adjoint, namely the functor sending any set Z to the set Z^Y. In the usual way of category theorists we can abbreviate this by saying

- \times Y \colon \mathsf{Set} \to \mathsf{Set}

has a right adjoint,

-^{Y} \colon \mathsf{Set} \to \mathsf{Set}

We can generalize this idea to any category that has finite products. But first, a spiffy new name:

Definition. A category \mathsf{C} is cartesian if it has finite products, or equivalently if it has binary products and a terminal object.

Definition. A category \mathsf{C} is cartesian closed if it is cartesian and for any object Y the functor

- \times Y \colon \mathsf{C} \to \mathsf{C}

has a right adjoint.

Of course in this situation we call the right adjoint

-^{Y} \colon \mathsf{C} \to \mathsf{C}

so for any objects X,Y,Z \in \mathsf{C} there’s a natural one-to-one correspondence between morphisms

f \colon X \times Y \to Z

and morphisms

\tilde{f} \colon X \to Z^Y

It’s no surprise that taking \mathsf{C} = \mathsf{Set} gives an example of a cartesian closed category—that’s the example we were generalizing in the first place. But this example is actually a bit confusing, because in this example and only in this example there’s no difference between \mathrm{hom}(X,Y), which is a set, and Y^X, which is an object of \mathsf{C}.

(Well, pedants may argue that \mathsf{FinSet}, the category of finite sets, gives another example. But ultra-pedants may argue that a finite set is not exactly the same as a set. Let’s leave them to fight it out, and move on.)

In a cartesian closed category it’s very important to distinguish between \mathrm{hom}(X,Y) and Y^X. We call \mathrm{hom}(X,Y) the external hom and Y^X the internal hom or exponential. The idea is that the internal hom lives ‘inside’ the category \mathsf{C}, while \mathrm{hom}(X,Y) lives ‘outside’ \mathsf{C}, in the category of sets.

The philosophy here is that while category theory is traditionally based on set theory—for example any two objects of a category have a set of morphisms between them—it is nice to move away from this, and try to work with categories in a more free-standing way.

A first step is to consider categories where there’s an object of morphisms between two objects. Of course as long as they’re still categories, they also have sets of morphisms between objects, and also a set (or class) of objects. So we have not completely lifted ourselves up by our bootstraps and floated away from the set-theoretic foundations of mathematics. Not yet, anyway. But this is just the first stage! Eventually we can base set theory on topos theory, which in turn is based on first-order logic.

More to the point right now, the internal hom contains more information than the external hom. To see this, first note that in any category with a terminal object we can define an element of an object X to be a morphism

p \colon 1 \to X

where 1 is the terminal object (that is, our chosen terminal object: they’re all isomorphic). Any object has a set of elements. Furthermore any morphism

f \colon X \to Y

lets us turn elements of X into elements of Y, as follows: given an element

p \colon 1 \to X

we get an element

f \circ p \colon 1 \to Y

Puzzle. Prove that if \mathsf{C} has a terminal object, there is a functor

\mathrm{elt} \colon \mathsf{C} \to \mathsf{Set}

sending each object to its set of elements and each morphism f \colon X \to Y to the function sending elements

p \colon 1 \to X

to elements

f \circ p \colon 1 \to Y.

Puzzle. Prove that if \mathsf{C} is cartesian, the functor \mathrm{elt} \colon \mathsf{C} \to \mathsf{Set} preserves finite products.

In particular, this means that the terminal object 1 \in \mathsf{C} has just one element, and the god-given function

\mathrm{elt}(X \times Y) \to \mathrm{elt}(X) \times \mathrm{elt}(Y)

is a bijection. (If you don’t see what this god-given function is, ask me; this function is part of the definition of ‘preserves finite product’.)

What more can we say if \mathsf{C} is cartesian closed? First, any morphism

f \colon X \to Y

can be turned into a morphism from 1 \times X to Y, since there’s a god-given isomorphism 1 \times X \cong X. Abusing language we’ll call this morphism

f \colon 1 \times X \to Y

Then we can curry it, getting

\tilde{f} \colon 1 \to Y^X

This is an element of Y^X, called the name of f.

So, any morphism f \in \mathrm{hom}(X,Y) gives an element \widetilde{f} \in \mathrm{elt}(Y^X). This gives a nice link between the external hom and the internal hom: namely, the map

\begin{array}{ccc} \mathrm{hom}(X,Y) &\to& \mathrm{elt}(Y^X) \\                                          f &\mapsto&  \tilde{f}   \end{array}

sending any morphism to its name! Moreover this map is a bijection, because currying is a bijection and so is the process that turns morphisms X \to Y into morphisms 1 \times X \to Y.

This may make it seem that the internal hom is exactly the same as the external hom. But no! We’ve just shown that elements of the internal hom (in the category-theoretic sense of ‘elements’) are in one-to-one correspondence with elements of the external hom (in the usual sense of ‘elements of a set’). But there’s more to an object than its elements!

To see this, it’s best to look at an example from last time.

The cartesian closed category of graphs

There’s a category \mathsf{G} that has two objects v and e, and just two morphisms

s, t \colon v \to e

besides the identity morphisms. Last time we saw a presheaf on \mathsf{G} is just what category theorists call a graph: it consists of a set V = F(v) called the set of vertices and a set E = F(e) called the set of edges, along with functions

F(s) \colon E \to V

F(t) \colon E \to V

assigning to each edge its source and target.

The category of graphs, \widehat{\mathsf{G}}, is a nice example of a cartesian closed category. It’s cartesian because every presheaf category has limits, including products. Furthermore, we know that products are computed ‘pointwise’. This means that the terminal object is the graph with one vertex and one edge: in general, terminal objects in presheaf categories have ‘one of each thing’. The product of two graphs F and G is a graph with

(F \times G)(v) = F(v) \times G(v)

(F \times G)(e) = F(e) \times G(e)

and similarly for the source and target functions. So a vertex in the product graph is just a pair of vertices, one in F and one in G, and similarly for edges.
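If you prefer code to pictures, here’s a hedged Python sketch of this pointwise product, using a home-made representation where a graph is a tuple of a vertex set, an edge set, and source/target dictionaries:

def product(F, G):
    # the product of graphs, computed separately on vertices and edges
    FV, FE, Fs, Ft = F
    GV, GE, Gs, Gt = G
    V = {(a, b) for a in FV for b in GV}
    E = {(d, e) for d in FE for e in GE}
    src = {(d, e): (Fs[d], Gs[e]) for (d, e) in E}
    tgt = {(d, e): (Ft[d], Gt[e]) for (d, e) in E}
    return V, E, src, tgt

# a single edge x -> y, and a single loop at z
F = ({'x', 'y'}, {'a'}, {'a': 'x'}, {'a': 'y'})
G = ({'z'}, {'l'}, {'l': 'z'}, {'l': 'z'})
V, E, src, tgt = product(F, G)
assert src[('a', 'l')] == ('x', 'z') and tgt[('a', 'l')] == ('y', 'z')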

Puzzle. Draw two graphs and draw their product.

We can figure out what morphisms of graphs are like, again using the fact that \widehat{\mathsf{G}} is a presheaf category. A morphism from the graph F to the graph G is just a natural transformation \alpha \colon F \Rightarrow G. So, it consists of a function

\alpha_v \colon F(v) \to G(v)

sending each vertex of F to a vertex of G, and a function

\alpha_e \colon F(e) \to G(e)

sending each edge of F to an edge of G. But the naturality condition requires that some squares commute, one for each morphism in \mathsf{G}. One of these commuting squares says that taking the source of an edge of F and then mapping the result to G is the same as first mapping that edge to G and then taking its source:

\alpha_v \circ F(s) = G(s) \circ \alpha_e

The other commuting square says the same thing for targets:

\alpha_v \circ F(t) = G(t) \circ \alpha_e

So, a morphism of graphs is just what you’d hope it would be!

This lets us see why \widehat{\mathsf{G}} is cartesian closed, and what its exponentials are like. Say we have three graphs F,G,H. What is H^G like? We use the fact that morphisms

\alpha \colon F \to H^G

must correspond naturally to morphisms

\tilde{\alpha} \colon F \times G \to H

To get the most mileage out of this it’s best to take F to be a very simple sort of graph, like the ‘walking vertex’ or the ‘walking edge’ (explained at the end of my last post). The walking vertex is a graph with just one vertex and no edges. If we take F to be this, morphisms

\alpha \colon F \to H^G

are just vertices of H^G. We’d like to know what those vertices are! But by currying, we know they correspond in a one-to-one way with morphisms

\tilde{\alpha} \colon F \times G \to H

Furthermore, we know F \times G is a graph with vertices corresponding to those of G, but no edges. (See why?) So, \widetilde{\alpha} amounts to an arbitrary function sending vertices of G to vertices of H.

So, a vertex of H^G is a function sending vertices of G to vertices of H. And we figured this out just by currying! This is a general trick.

Puzzle. Use a similar argument to show that an edge of H^G is a map sending edges of G to edges of H, together with two maps sending vertices to vertices, obeying a compatibility condition. For this it will help to use currying and take F to be the ‘walking edge’: the graph with one edge and two vertices, namely its source and its target.

Puzzle. Figure out the source and target of any edge of H^G.

So, you can figure out what H^G is like. Then you can do this:

Puzzle. Work out the set of elements of H^G, that is, the set of morphisms from the terminal graph to H^G.

Of course, we’ve already seen by a much more general argument that the elements of H^G must correspond to graph morphisms from G to H. But if you worked out what H^G is like as a graph, you can solve the puzzle that way too and check that the answers agree!

Anyway, my main point here is that the internal hom H^G knows more than the external hom \mathrm{hom}(G,H). The elements of H^G are the same as elements of the set \mathrm{hom}(G,H): they’re just graph morphisms from G to H. But H^G contains other information: for example, its vertices are arbitrary functions from the set of vertices of G to the set of vertices of H.

Puzzle. Suppose H is the walking vertex and G is the graph with one vertex and one edge from that vertex to itself. Show that \mathrm{hom}(G,H) is the empty set, but H^G is a nonempty graph.
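If you want to check your answers by brute force, here’s a hedged Python sketch. It uses the same representation of graphs as my earlier snippet, and it takes the answer to the first of these puzzles for granted: an edge of H^G is a pair of vertex maps f_0, f_1 together with an edge map \varphi obeying the compatibility condition, with source f_0 and target f_1.

from itertools import product as cartesian

def functions(A, B):
    # all functions from the finite set A to the finite set B, as dicts
    A = sorted(A)
    return [dict(zip(A, images)) for images in cartesian(sorted(B), repeat=len(A))]

def exponential_edges(G, H):
    # edges of H^G: triples (f0, f1, phi) with
    # src(phi(e)) = f0(src(e)) and tgt(phi(e)) = f1(tgt(e))
    GV, GE, Gs, Gt = G
    HV, HE, Hs, Ht = H
    return [(f0, f1, phi)
            for f0 in functions(GV, HV)
            for f1 in functions(GV, HV)
            for phi in functions(GE, HE)
            if all(Hs[phi[e]] == f0[Gs[e]] and Ht[phi[e]] == f1[Gt[e]]
                   for e in GE)]

G = ({'z'}, {'l'}, {'l': 'z'}, {'l': 'z'})   # one vertex with a loop
H = ({'x'}, set(), {}, {})                   # the walking vertex
assert len(functions(G[0], H[0])) == 1       # H^G has a vertex, so it's nonempty
assert exponential_edges(G, H) == []         # but it has no edges, hence no loops:
                                             # elt(H^G) = hom(G, H) is empty

The elements of H^G are its loops, the edges with f_0 = f_1, and these are exactly the graph morphisms from G to H.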

Presheaf categories are cartesian closed

We can generalize some of these lessons to any presheaf category.

Theorem. Let \mathsf{C} be any small category. Then the presheaf category \widehat{\mathsf{C}} is cartesian closed.

This is Proposition 1 of Section I.6 in Mac Lane and Moerdijk’s book. The most interesting step is to determine for any two presheaves

G, H \colon \mathsf{C}^{\mathrm{op}} \to \mathrm{Set}

what the internal hom H^G should be. We’ve already gotten some practice doing this in the case of graphs, and the general case works similarly.

In the case of graphs the key was to use the bijection between morphisms

\alpha \colon F \to H^G

and morphisms

\tilde{\alpha} \colon F \times G \to H

We focused on two particular choices of F, the walking vertex and the walking edge. These are fundamental because they are ‘representable’. Each object X \in \mathsf{C} gives a presheaf called y(X), defined by

y(X)(-) = \mathrm{hom}(-,X)

Presheaves isomorphic to these are called representable. Every object in a presheaf category is a colimit of representable presheaves—see Corollary 3 of Section I.5 for a proof. For example, the puzzle at the end of last time asked you to show that every graph is a colimit of copies of the walking vertex and the walking edge. This is just a fancy way of saying that every graph is made of vertices and edges. But the point is that this idea generalizes! In any presheaf category, all objects are built from representables, and it’s always good to focus on the representables.

In general, to figure out H^G we need to know its value H^G(X) on every object X \in \mathsf{C}. (We also need to know its value on every morphism, but hold your horses.) How do we do this? The Yoneda lemma gives an isomorphism

H^G(X) \cong \mathrm{hom}(y(X),H^G)

but the definition of ‘cartesian closed’ requires that

\mathrm{hom}(y(X),H^G) \cong \mathrm{hom}(y(X) \times G, H)

So, you might as well try defining H^G on objects by

H^G(X) = \mathrm{hom}(y(X) \times G, H)

And this definition works just as well for morphisms: the Yoneda embedding gives a functor

y \colon \mathsf{C} \to \widehat{\mathsf{C}}

so you can try defining

H^G(-) = \mathrm{hom}(y(-) \times G, H)

where you can now stick either an object or a morphism in the blank. (You can now release your horses.) It’s easy to see that H^G, defined this way, is a functor. So, the only thing we need to do is check that we really have a natural bijection between morphisms

\alpha \colon F \to H^G

and morphisms

\tilde{\alpha} \colon F \times G \to H

You can try it yourself, or you can read Mac Lane and Moerdijk!


The series so far:

Part 1: sheaves, elementary topoi, Grothendieck topoi and geometric morphisms.

Part 2: turning presheaves into bundles and vice versa; turning sheaves into etale spaces and vice versa.

Part 3: sheafification; the adjunction between presheaves and bundles.

Part 4: direct and inverse images of sheaves.

Part 5: why presheaf categories are elementary topoi: colimits and limits in presheaf categories.

Part 6: why presheaf categories are elementary topoi: cartesian closed categories and why presheaf categories are cartesian closed.


The Category Theory Behind UMAP

10 February, 2020

An interesting situation has arisen. Some people working on applied category theory have been seeking a ‘killer app’: that is, an application of category theory to practical tasks that would be so compelling it would force the world to admit categories are useful. Meanwhile, the UMAP algorithm, based to some extent on category theory, has become very important in genomics:

• Leland McInnes, John Healy and James Melville, UMAP: uniform manifold approximation and projection for dimension reduction.

But while practitioners have embraced the algorithm, they’re still puzzled by its category-theoretic underpinnings, which are discussed in Section 2 of the paper. (You can read the remaining sections, which describe the algorithm quite concretely, without understanding Section 2.)

I first heard of this situation on Twitter when James Nichols wrote:

Wow! My first sighting of applied category theory: the UMAP algorithm. I’m a category novice, but the resulting adjacency-graph algorithm is v simple, so surely the theory boils down to reasonably simple arguments in topology/Riemannian geometry?

Do any of you prolific ACT tweeters know much about UMAP? I understand the gist of the linked paper, but not say why we need category theory to define this “fuzzy topology” concept, as opposed to some other analytic defn.

Junhyong Kim added:

What was gained by CT for UMAP? (honest question, not trying to be snarky)

Leland McInnes, one of the inventors of UMAP, responded:

It is my math background, how I think about the problem, and how the algorithm was derived. It wasn’t something that was added, but rather something that was always there—for me at least. In that sense what was gained was the algorithm.

I don’t really understand UMAP; for a good introduction to it see the original paper above and also this:

• Nikolay Oskolkov, How Exactly UMAP Works—and Why Exactly It Is Better Than tSNE, 3 October 2019.

tSNE is an older algorithm for taking clouds of data points in high dimensions and mapping them down to fewer dimensions so we can understand what’s going on. From the viewpoint of those working on genomics, the main good thing about UMAP is that it solves a bunch of problems that plagued tSNE. Oskolkov explains what these problems are and how UMAP deals with them. But he also alludes to the funny disconnect between these practicalities and the underlying theory:

My first impression when I heard about UMAP was that this was a completely novel and interesting dimension reduction technique which is based on solid mathematical principles and hence very different from tSNE which is a pure Machine Learning semi-empirical algorithm. My colleagues from Biology told me that the original UMAP paper was “too mathematical”, and looking at the Section 2 of the paper I was very happy to see strict and accurate mathematics finally coming to Life and Data Science. However, reading the UMAP docs and watching Leland McInnes talk at SciPy 2018, I got puzzled and felt like UMAP was another neighbor graph technique which is so similar to tSNE that I was struggling to understand how exactly UMAP is different from tSNE.

He then goes on and attempts to explain exactly why UMAP does so much better than tSNE. None of his explanation mentions category theory.

Since I don’t really understand UMAP or why it does better than tSNE, I can’t add anything to this discussion. In particular, I can’t say how much the category theory really helps. All I can do is explain a bit of the category theory. I’ll do that now, very briefly, just as a way to get a conversation going. I will try to avoid category-theoretic jargon as much as possible—not because I don’t like it or consider it unimportant, but because that jargon is precisely what’s stopping certain people from understanding Section 2.

I think it all starts with this paper by Spivak, which McInnes, Healy and Melville cite but for some reason don’t provide a link to:

• David Spivak, Metric realization of fuzzy simplicial sets.

Spivak showed how to turn a ‘fuzzy simplicial set’ into an ‘uber-metric space’ and vice versa. What are these things?

An ‘uber-metric space’ is very simple. It’s a slight generalization of a metric space that relaxes the usual definition in just two ways: it lets distances be infinite, and it lets distinct points have distance zero from each other. This sort of generalization can be very useful. I could talk about it a lot, but I won’t.

A fuzzy simplicial set is a generalization of a simplicial set.

A simplicial set starts out as a set of vertices (or 0-simplices), a set of edges (or 1-simplices), a set of triangles (or 2-simplices), a set of tetrahedra (or 3-simplices), and so on: in short, a set of n-simplices for each n. But there’s more to it. Most importantly, each n-simplex has a bunch of faces, which are lower-dimensional simplices.

I won’t give the whole definition. To a first approximation you can visualize a simplicial set as being like this:



But of course it doesn’t have to stop at dimension 3—and more subtly, you can have things like two different triangles that have exactly the same edges.

In a ‘fuzzy’ simplicial set, instead of a set of n-simplices for each n, we have a fuzzy set of them. But what’s a fuzzy set?

Fuzzy set theory is good for studying collections where membership is somewhat vaguely defined. Like a set, a fuzzy set has elements, but each element has a ‘degree of membership’ that is a number 0 < x ≤ 1. (If its degree of membership were zero, it wouldn't be an element!)

A map f: X → Y between fuzzy sets is an ordinary function, but obeying this condition: it can only send an element x ∈ X to an element f(x) ∈ Y whose degree of membership is greater than or equal to that of x. In other words, we don't want functions that send things to things with a lower degree of membership.

Why? Well, if I'm quite sure something is a dog, and every dog has a nose, then I must be at least equally sure that this dog has a nose! (If you disagree with this, then you can make up some other concept of fuzzy set. There are a number of such concepts, and I'm just describing one.)
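Here is this condition in a few lines of Python (a toy sketch; the fuzzy sets and names are made up):

# a fuzzy set: each element has a degree of membership in (0, 1]
dogs  = {'rex': 0.9, 'maybe_dog': 0.4}
noses = {'rex_nose': 0.95, 'maybe_nose': 0.4}

def is_fuzzy_map(f, X, Y):
    # f may only send x to an element whose degree of
    # membership is at least that of x
    return all(Y[f[x]] >= X[x] for x in X)

assert is_fuzzy_map({'rex': 'rex_nose', 'maybe_dog': 'maybe_nose'}, dogs, noses)
assert not is_fuzzy_map({'rex': 'maybe_nose', 'maybe_dog': 'maybe_nose'}, dogs, noses)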

So, a fuzzy simplicial set will have a set of n-simplices for each n, with each n-simplex having a degree of membership… but the degree of membership of its faces can't be less than its own degree of membership.

This is not the precise definition of fuzzy simplicial set, because I'm leaving out some distracting nuances. But you can get the precise definition by taking a nuts-and-bolts definition of simplicial set, like Definition 3.2 here:

• Greg Friedman, An elementary illustrated introduction to simplicial sets.

and replacing all the sets by fuzzy sets, and all the maps by maps between fuzzy sets.

If you like visualizing things, you can visualize a fuzzy simplicial set as an ordinary simplicial set, as in the picture above, but where an n-simplex is shaded darker if its degree of membership is higher. An n-simplex can’t be shaded darker than any of its faces.

How can you turn a fuzzy simplicial set into an uber-metric space? And how can you turn an uber-metric space into a fuzzy simplicial set?

Spivak focuses on the first question, because the answer is simpler, and it determines the answer to the second using some category theory. (Psst: adjoint functors!)

The answer to the first question goes like this. Say you have a fuzzy simplicial set. For each n-simplex whose degree of membership equals a, you turn it into a copy of this uber-metric space:

\{ (t_0, t_1, \dots, t_n) : t_0 + \cdots + t_n = - \log a , \; t_0, \ldots, t_n \geq 0 \} \subseteq \mathbb{R}^{n+1}

This is really just an ordinary metric space: an n-simplex that’s a subspace of Euclidean (n+1)-dimensional space with its usual Euclidean distance function. Then you glue together all these uber-metric spaces, one for each simplex in your fuzzy simplicial set, to get a big fat uber-metric space.

This process is called ‘realization’. The key here is that if an n-simplex has a high degree of membership, it gets ‘realized’ as a metric space shaped like a small n-simplex. I believe the basic intuition is that an n-simplex with a high degree of membership describes an (n+1)-tuple of things—its vertices—that are close to each other.
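Here’s a tiny Python sketch of that intuition (names mine): the distance between two vertices of the realized simplex shrinks as the degree of membership a grows, since each vertex has one coordinate equal to -log a and the rest zero.

from math import log, sqrt

def vertex_distance(a):
    # distance between two vertices of the realized simplex
    # { t : t_0 + ... + t_n = -log a, t_i >= 0 }
    return sqrt(2) * (-log(a))

assert vertex_distance(1.0) == 0.0                    # full membership: a point
assert vertex_distance(0.9) < vertex_distance(0.1)    # higher membership, smaller simplex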

In theory, I should try to describe the reverse process that turns an uber-metric space into a fuzzy simplicial set. If I did, I believe we would see that whenever an (n+1)-tuple of things—that is, points of our uber-metric space—are close, they give an n-simplex with a high degree of membership.

If so, then both uber-metric spaces and fuzzy simplicial sets are just ways of talking about which collections of data points are close, and we can translate back and forth between these descriptions.

But I’d need to think about this a bit more to do a good job of going further, and reading the UMAP paper a bit more I’m beginning to suspect that’s not the main thing that practitioners need to understand. I’m beginning to think the most useful thing is to get a feeling for fuzzy simplicial sets! I hope I’ve helped a bit in that direction. They are very simple things. They are also closely connected to an idea from topological data analysis:

• Nina Otter, Magnitude meets persistence. Homology theories for filtered simplicial sets.

I should admit that McInnes, Healy and Melville tweak Spivak’s formalism a bit. They call Spivak’s uber-metric spaces ‘extended-pseudo-metric spaces’, but they focus on a special kind, which they call ‘finite’. Unfortunately, I can’t find where they define this term. They also only consider a special sort of fuzzy simplicial set, which they call ‘bounded’, but I can’t find the definition of this term either! Without knowing these definitions, I can’t comment on how these tweaks change things.


Superfluid Quasicrystals

31 January, 2020

Condensed matter physics is so cool! Bounce 4 laser beams off mirrors to make an interference pattern with 8-fold symmetry. Put a Bose–Einstein condensate of potassium atoms into this “optical lattice” and you get a superfluid quasicrystal!

You see, no periodic pattern in the plane can have 8-fold symmetry, so the interference pattern of the light is ‘quasiperiodic’: it never repeats itself, though it comes arbitrarily close, sort of like this pattern drawn by Greg Egan:

In the Bose–Einstein condensate all the particles have the same wavefunction, and the wavefunction itself, influenced by the light, also becomes quasiperiodic.

But that’s not all! As you increase the intensity of the lasers, the Bose-Einstein condensate suddenly collapses from a quasicrystal to a ‘localized’ state where all the atoms sit in the same place!

Below, the gray curve is the potential V formed by the lasers, while the blue curve is the absolute value squared of the wavefunction of the Bose–Einstein condensate, |ψ_0|^2.

At top the lasers are off, so V is zero and |ψ_0|^2 is constant. In the middle the lasers are on, but not too bright, so V and |ψ_0|^2 are quasiperiodic. At the bottom the lasers are brighter, so V is quasiperiodic and larger, and |ψ_0|^2 is localized.

It’s well known that when a crystal is sufficiently disordered, its electrons may localize: instead of having spread-out wavefunctions, they get trapped in specific regions as shown here:

This phenomenon is called ‘Anderson localization’, and it was discovered around 1958.

But when a Bose-Einstein condensate localizes, all the atoms get trapped in the same place—because they’re all in exactly the same state! This phenomenon was discovered experimentally at the University of Cambridge very recently:

• Matteo Sbroscia, Konrad Viebahn, Edward Carter, Jr-Chiun Yu, Alexander Gaunt and Ulrich Schneider, Observing localisation in a 2D quasicrystalline optical lattice.

The evidence for it is somewhat indirect, so I’m sure people will continue to study it. Localization of a Bose–Einstein condensate in a one-dimensional quasiperiodic potential was seen much earlier, in 2008:

• Giacomo Roati, Chiara D’Errico, Leonardo Fallani, Marco Fattori, Chiara Fort, Matteo Zaccanti, Giovanni Modugno, Michele Modugno and Massimo Inguscio, Anderson localization of a non-interacting Bose–Einstein condensate, Nature 453 (2008), 895–898.

The holy grail, a ‘Bose glass’, remains to be seen. It’s a Bose-Einstein condensate that’s also a glass: its wavefunction is disordered rather than periodic or quasiperiodic.

New forms of matter with strange properties—I love ’em!

For more popularizations of these ideas, see:

• Julia C. Keller, Researchers create new form of matter—supersolid is crystalline and superfluid at the same time, Phys.org, 3 March 2018.

• University of Texas at Dallas, Solid research leads physicists to propose new state of matter, Phys.org, 9 April 2018.

The latter says “The term ‘superfluid quasicrystal’ sounds like something a comic-book villain might use to carry out his dastardly plans.”


Topos Theory (Part 5)

28 January, 2020

It’s time to understand why the category of sheaves on a topological space acts like the category of sets in the following ways:

• It has finite colimits.
• It has finite limits.
• It is cartesian closed.
• It has a subobject classifier.

We summarize these four properties by saying the category of sheaves is an elementary topos. (In fact it’s better, since it has all limits and colimits.)

As a warmup, first let’s see why the category of presheaves on a topological space is an elementary topos!

It’s actually just as easy to see something more general, which will come in handy later: the category of presheaves on any category is an elementary topos. Remember, given a category \mathsf{C}, a presheaf on \mathsf{C} is a functor

F \colon \mathsf{C}^{\textrm{op}} \to \mathsf{Set}

A morphism of presheaves from F to another presheaf

G \colon \mathsf{C}^{\textrm{op}} \to \mathsf{Set}

is a natural transformation

\alpha \colon F \Rightarrow G

Presheaves on \mathsf{C} and morphisms between them form a presheaf category, which we call

\mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

or \widehat{\mathsf{C}} for short.

Presheaves on a topological space X are just the special case where we take \mathsf{C} to be the poset of open subsets of X. So if you want a ‘geometrical’ intuition for presheaves on a category, you should imagine the objects of the category as being like open sets. This will come in handy later, when we talk about sheaves on a category.

But there is also another important intuition regarding presheaves. This is more of an ‘algebraic’ intuition. Starting from a category \mathsf{C} and building the presheaf category

\mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

is analogous to taking a set S and building the set

\mathbb{Z}^S

of all functions from S to the integers. \mathbb{Z}^S is a commutative ring if we use pointwise addition and multiplication of functions as our ring operations. \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}} is an elementary topos—but this means we should think of an elementary topos as being a bit like a commutative ring. More precisely, it’s like a ‘categorified’ commutative ring, since it’s a category rather than merely a set.

There’s a one-to-one function

S \hookrightarrow \mathbb{Z}^S

sending any element of S to the characteristic function of that element. Similarly, there’s a full and faithful functor

\mathsf{C} \hookrightarrow \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

sending any object c \in \mathsf{C} to the presheaf \mathrm{hom}(-,c). This is called the Yoneda embedding. So, presheaf categories are a trick for embedding categories into elementary topoi!

In fact presheaf categories have all limits and colimits, so they are better than just elementary topoi: we will someday see they are examples of ‘Grothendieck topoi’. There are also other ways in which my story just now could be polished, but it would be a bit distracting to do so. Let’s get to work and study presheaf categories!

Today I’ll just talk about colimits and limits.

Colimits in presheaf categories

Presheaf categories have all colimits, and these colimits can be ‘computed pointwise’. What does this mean? Colimits are a lot like addition, and taking colimits of presheaves is a lot like how we add functions from a set to \mathbb{Z}. We just add their values at each point.

More precisely, say we have a diagram of presheaves on \mathsf{C}. This is a functor

F \colon \mathsf{D} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

where \mathsf{D} is any small category, the ‘shape’ of our diagram. The colimit of F should be a presheaf, and I’ll call this

\mathrm{colim} F \in \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

How do we compute it? Well, notice that F is a functor from \mathsf{D} to the category of functors from \mathsf{C}^{\mathrm{op}} to \mathsf{Set}. So, we can change our viewpoint and think of it as a functor

F \colon \mathsf{D} \times \mathsf{C}^{\mathrm{op}} \to \mathsf{Set}

I’m using the same name for this because I’m lazy! Note that for each object c \in \mathsf{C}^{\mathrm{op}} we get a functor

F(-, c) \colon \mathsf{D} \to \mathsf{Set}

Since \mathsf{Set} has colimits, we can take the colimit of this functor and get a set

\mathrm{colim} F(-, c)

But you can check that this set depends functorially on c, so it defines a functor from \mathsf{C}^{\mathrm{op}} to \mathsf{Set}, which is our desired functor

\mathrm{colim} F \in  \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

Of course, you also have to check that this really is the colimit of our diagram of presheaves on \mathsf{C}.

Mac Lane and Moerdijk refer the reader to Section V.3 of Categories for the Working Mathematician for a proof of this result, but I think it’s better to prove it yourself. That is, it seems less painful to follow your nose and do the obvious thing at each step than to wrap your brain around someone else’s notation. I guess this is only true if you’ve got the hang of the subject, but anyway:

Puzzle. Show in the above situation that \mathrm{colim} F(-, c) depends functorially on c \in \mathsf{C}^{\mathrm{op}} and that the resulting functor is the colimit of the diagram F.

By the way, I can’t resist mentioning an important fact here: the category of presheaves on \mathsf{C} is the free category with all colimits on \mathsf{C}. In other words, it not only has all colimits, it’s precisely what you’d get by taking \mathsf{C} and ‘freely’ throwing in all colimits. This is why I mentioned colimits first.

This fact is analogous to the fact that when S is a finite set, \mathbb{Z}^S is the free abelian group on S. The fact that we don’t need a finiteness condition when working with presheaves is one of those ways in which categories are nicer than sets.

Limits in presheaf categories

Just as colimits generalize addition, limits generalize multiplication. Presheaf categories also have all limits, and these too can be ‘computed pointwise’. The argument is just like that for colimits, but now we use the fact that \mathsf{Set} has all limits.

Puzzle. Given a diagram of presheaves on \mathsf{C}

F \colon \mathsf{D} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

show that \mathrm{lim} F(-, c) depends functorially on c \in \mathsf{C}^{\mathrm{op}} and that the resulting functor is the limit of the diagram F.

The category of graphs

It helps to understand some examples of what we’re doing here, and a nice example is the category of graphs. There’s a category \mathsf{G} that has two objects v and e, and just two morphisms

s, t \colon v \to e

besides the identity morphisms. A presheaf on \mathsf{G} is what category theorists call a graph.

Why? Well, say we have such a presheaf

F \colon \mathsf{G}^{\mathrm{op}} \to \mathsf{Set}

It consists of a set V = F(v) called the set of vertices and a set E = F(e) called the set of edges, along with two functions

F(s) \colon E \to V

F(t) \colon E \to V

assigning to each edge its source and target. This is just what category theorists call a graph; graph theorists might call it a ‘directed multigraph’, and some other mathematicians would call it a ‘quiver’.

We will use \mathsf{Graph} to mean the category of graphs, namely the presheaf category

\widehat{\mathsf{G}} = \mathsf{Set}^{\mathsf{G}^{\mathrm{op}}}

Note how the ‘op’ in the definition of presheaf turned the morphisms s,t \colon v \to e into functions F(s), F(t) \colon E \to V. The ‘op’ is more of a nuisance than a help in this example (at least so far): we had to make s and t go from v to e precisely to counteract the effect of this ‘op’.

Since colimits and limits are computed pointwise in a presheaf category, we can compute the colimit or limit of a diagram of graphs quite easily: we just take the colimit of the sets of vertices and the sets of edges separately, and the rest goes along for the ride.
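For example, here’s a hedged Python sketch of the coproduct of two graphs, computed pointwise as a disjoint union, using a home-made representation where a graph is a tuple of a vertex set, an edge set, and source/target dictionaries:

def coproduct(F, G):
    # disjoint union, computed separately on vertices and edges
    FV, FE, Fs, Ft = F
    GV, GE, Gs, Gt = G
    V = {(0, v) for v in FV} | {(1, v) for v in GV}
    E = {(0, e) for e in FE} | {(1, e) for e in GE}
    src = {(i, e): (i, (Fs if i == 0 else Gs)[e]) for (i, e) in E}
    tgt = {(i, e): (i, (Ft if i == 0 else Gt)[e]) for (i, e) in E}
    return V, E, src, tgt

walking_vertex = ({'v'}, set(), {}, {})
walking_edge = ({'s', 't'}, {'e'}, {'e': 's'}, {'e': 't'})
V, E, _, _ = coproduct(walking_vertex, walking_edge)
assert len(V) == 3 and len(E) == 1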

Puzzle. Let the graph G_V be the walking vertex: that is, the graph with just one vertex and no edges. Let G_E be the walking edge: that is, the graph with one edge and two vertices, the source and target of that edge. Show that there are exactly two different morphisms

f, g \colon G_V \to G_E

Compute the equalizer and coequalizer of

f, g \colon G_V \to G_E

Puzzle. Show explicitly how to build any graph as a colimit of copies of the walking vertex and the walking edge. That is, show any graph is isomorphic to the colimit of some diagram in \mathsf{Graph}, where the objects in this diagram are all copies of G_V and/or G_E.


Entropy in the Universe

25 January, 2020

If you click on this picture, you’ll see a zoomable image of the Milky Way with 84 million stars:



But stars contribute only a tiny fraction of the total entropy in the observable Universe. If it’s random information you want, look elsewhere!

First: what’s the ‘observable Universe’, exactly?

The further you look out into the Universe, the further you look back in time. You can’t see through the hot gas from 380,000 years after the Big Bang. That ‘wall of fire’ marks the limits of the observable Universe.

But as the Universe expands, the distant ancient stars and gas we see have moved even farther away, so they’re no longer observable. Thus, the so-called ‘observable Universe’ is really the ‘formerly observable Universe’. Its edge is 46.5 billion light years away now!

This is true even though the Universe is only 13.8 billion years old. A standard challenge in understanding general relativity is to figure out how this is possible, given that nothing can move faster than light.

What’s the total number of stars in the observable Universe? Estimates go up as telescopes improve. Right now people think there are between 100 and 400 billion stars in the Milky Way. They think there are between 170 billion and 2 trillion galaxies in the Universe.

In 2009, Chas Egan and Charles Lineweaver estimated the total entropy of all the stars in the observable Universe at 10^81 bits. You should think of these as qubits: it’s the amount of information to describe the quantum state of everything in all these stars.

But the entropy of interstellar and intergalactic gas and dust is about ten times more than the entropy of stars! It’s about 10^82 bits.

The entropy in all the photons in the Universe is even more! The Universe is full of radiation left over from the Big Bang. The photons in the observable Universe left over from the Big Bang have a total entropy of about 10^90 bits. It’s called the ‘cosmic microwave background radiation’.

The neutrinos from the Big Bang also carry about 10^90 bits—a bit less than the photons. The gravitons carry much less, about 10^88 bits. That’s because they decoupled from other matter and radiation very early, and have been cooling ever since. On the other hand, the photons in the cosmic microwave background were heated by the annihilation of electron-positron pairs until about 10 seconds after the Big Bang. Thus the graviton radiation is expected to be cooler than the microwave background radiation: about 0.6 kelvin as compared to 2.7 kelvin.

Black holes have immensely more entropy than anything listed so far. Egan and Lineweaver estimate the entropy of stellar-mass black holes in the observable Universe at 10^98 bits. This is connected to why black holes are so stable: the Second Law says entropy likes to increase.

But the entropy of black holes grows quadratically with mass! So black holes tend to merge and form bigger black holes — ultimately forming the ‘supermassive’ black holes at the centers of most galaxies. These dominate the entropy of the observable Universe: about 10^104 bits.
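If you want to check the orders of magnitude yourself, here’s a quick Python estimate (a sketch, with rounded constants) using the Bekenstein–Hawking entropy S = 4πGM²k/(ħc), converted from nats to bits:

from math import pi, log

G_const = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
hbar    = 1.055e-34   # reduced Planck constant, J s
c       = 2.998e8     # speed of light, m/s
M_sun   = 1.989e30    # solar mass, kg

def entropy_bits(M):
    # Bekenstein-Hawking entropy in bits: quadratic in the mass M
    return 4 * pi * G_const * M**2 / (hbar * c * log(2))

print(entropy_bits(M_sun))        # roughly 1.5e77 bits for a solar-mass black hole
print(entropy_bits(10 * M_sun))   # 100 times more

So a black hole ten times as massive has a hundred times the entropy, which is why mergers increase entropy so dramatically.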

Hawking predicted that black holes slowly radiate away their mass when they’re in a cold enough environment. But the Universe is much too hot for supermassive black holes to be losing mass now. Instead, they very slowly grow by eating the cosmic microwave background, even when they’re not eating stars, gas and dust.

So, only in the far future will the Universe cool down enough for large black holes to start slowly decaying via Hawking radiation. Entropy will continue to increase… going mainly into photons and gravitons! This process will take a very long time. Assuming nothing is falling into it and no unknown effects intervene, a solar-mass black hole takes about 1067 years to evaporate due to Hawking radiation — while a really big one, comparable to the mass of a galaxy, should take about 1099 years.

If our current most popular ideas on dark energy are correct, the Universe will continue to expand exponentially. Thanks to this, there will be a cosmological event horizon surrounding each observer, which will radiate Hawking radiation at a temperature of roughly 10-30 kelvin.

In this scenario the Universe in the very far future will mainly consist of massless particles produced as Hawking radiation at this temperature: photons and gravitons. The entropy within the exponentially expanding ball of space that is today our ‘observable Universe’ will continue to increase exponentially… but more to the point, the entropy density will approach that of a gas of photons and gravitons in thermal equilibrium at 10-30 kelvin.

Of course, it’s quite likely that some new physics will turn up, between now and then, that changes the story! I hope so: this would be a rather dull ending to the Universe.

For more details, go here:

• Chas A. Egan and Charles H. Lineweaver, A larger estimate of the entropy of the universe, The Astrophysical Journal 710 (2010), 1825.

Also read my page on information.


Topos Theory (Part 4)

21 January, 2020

In Part 1, I said how to push sheaves forward along a continuous map. Now let’s see how to pull them back! This will set up a pair of adjoint functors with nice properties, called a ‘geometric morphism’.

First recall how we push sheaves forward. I’ll say it more concisely this time. If you have a continuous map f \colon X \to Y between topological spaces, the inverse image of any open set is open, so we get a map

f^{-1} \colon \mathcal{O}(Y) \to \mathcal{O}(X)

A functor between categories gives a functor between the opposite categories. I’ll use the same name for this, if you can stand it:

f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathcal{O}(X)^{\mathrm{op}}

A presheaf on X is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

and we can compose this with f^{-1} to get a presheaf on Y,

F \circ f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathsf{Set}

We call this presheaf on Y the direct image or pushforward of F along f, and we write it as f_\ast F. In a nutshell:

f_\ast F = F \circ f^{-1}
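Here’s this direct image construction in a few lines of Python, for finite spaces (a toy sketch, all names mine; open sets are frozensets and a presheaf is a dictionary from open sets to sets):

def preimage(f, V):
    return frozenset(x for x in f if f[x] in V)

def direct_image(F, f, opens_Y):
    # (f_* F)(V) = F(f^{-1} V) for each open set V of Y
    return {V: F[preimage(f, V)] for V in opens_Y}

# X = {1, 2} with the discrete topology, mapped to a point Y = {'*'}
f = {1: '*', 2: '*'}
opens_Y = [frozenset(), frozenset({'*'})]
F = {frozenset(): {()},              # a presheaf on X
     frozenset({1}): {'a', 'b'},
     frozenset({2}): {'c'},
     frozenset({1, 2}): {('a', 'c'), ('b', 'c')}}
assert direct_image(F, f, opens_Y)[frozenset({'*'})] == F[frozenset({1, 2})]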

Even better, this direct image operation extends to a functor from the category of presheaves on X to the category of presheaves on Y:

f_\ast \colon \widehat{\mathcal{O}(X)} \to \widehat{\mathcal{O}(Y)}

Better still, this functor sends sheaves to sheaves, so it restricts to a functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

This is how we push forward sheaves on X to get sheaves on Y.

All this seems very natural and nice. But now let’s stop pushing and start pulling! This will give a functor going the other way:

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

The inverse image of a sheaf

At first it’s not obvious how to pull back sheaves, given how natural it was to push them forward. This is where our second picture of sheaves comes in handy!

Remember, a bundle over a topological space Y is a topological space E equipped with a continuous map

p \colon E \to Y

We say it’s an etale space over Y if it has a special property: each point e \in E has an open neighborhood such that p restricted to this neighborhood is a homeomorphism from this neighborhood to an open subset of Y. In Part 2 we defined the category of bundles over X, which is called \mathsf{Top}/X, and the full subcategory of this whose objects are etale spaces, called \mathsf{Etale}(X). I also sketched how we get an equivalence of categories

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

So, to pull back sheaves we can just convert them into etale spaces, pull those back, and then convert them back into sheaves!

First I’ll tell you how to pull back a bundle. I’ll assume you know the general concept of ‘pullbacks’, and what they’re like in the category of sets. The category of topological spaces and continuous maps has pullbacks, and they work a lot like they do in the category of sets. Say we’re given a bundle over Y, which is really just a continuous map

p \colon E \to Y

and a continuous map

f \colon X \to Y

Then we can form their pullback and get a bundle over X called

f^\ast(p) \colon f^\ast(E) \to X

In class I’ll draw the pullback diagram, but it’s too much work to do here! As a set,

f^\ast E = \{ (e,x) \in E \times X \; \colon \; p(e) = f(x) \}

It’s a subset of E \times X, and we make it into a topological space using the subspace topology. The map

f^\ast p  \colon f^\ast E \to X

does the obvious thing: it sends (e,x) to x.
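In Python, for finite sets (a sketch with invented names, ignoring the topology):

def pullback_bundle(p, f):
    # f^* E = { (e, x) : p(e) = f(x) }, and f^* p is projection onto x
    fE = {(e, x) for e in p for x in f if p[e] == f[x]}
    fp = {(e, x): x for (e, x) in fE}
    return fE, fp

p = {'a0': 0, 'a1': 1, 'b0': 0, 'b1': 1}   # a bundle E -> Y with Y = {0, 1}
f = {'x': 0, 'y': 0}                        # f: X -> Y hitting only 0
fE, fp = pullback_bundle(p, f)
assert fE == {('a0', 'x'), ('a0', 'y'), ('b0', 'x'), ('b0', 'y')}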

Puzzle. Prove that this construction really obeys the universal property for pullbacks in the category \mathsf{Top} where objects are topological spaces and morphisms are continuous maps.

Puzzle. Show that this construction extends to a functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

That is, find a natural way to define the pullback of a morphism between bundles, and prove that this makes f^\ast into a functor.

Puzzle. Prove that if p \colon E \to Y is an etale space over Y, and f \colon X \to Y is any continuous map, then f^\ast p \colon f^\ast E \to X is an etale space over X.

Putting these puzzles together, it instantly follows that we can restrict the functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

to etale spaces and morphisms between those, and get a functor

f^\ast \colon \mathsf{Etale}(Y) \to \mathsf{Etale}(X)

Using the equivalence

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

we then get our desired functor

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

called the inverse image or pullback functor.

Slick! But what does the inverse image of a sheaf actually look like?

Suppose we have a sheaf F on Y and a continuous map f \colon X \to Y. We get an inverse image sheaf f^\ast(F) on X. But what is it like, concretely?

That is, suppose we have an open set U \subseteq X. What does an element s of (f^\ast F)(U) amount to?

Unraveling the definitions, s must be a section over U of the pullback along f of the etale space corresponding to F.

A point in the etale space corresponding to F is the germ at some point y \in Y of some t \in F(V), where V is an open neighborhood of y.

Thus, our section s is just a continuous function sending each point x \in U to some germ of this sort at y = f(x).

There is more to say: we could try to unravel the definitions a bit more, and describe (f^\ast F)(U) directly in terms of the sheaf F, without mentioning the corresponding etale space! But maybe one of you reading this can do that more gracefully than I can.

The adjunction between direct and inverse image functors

Once they have direct and inverse images in hand, Mac Lane and Moerdijk prove the following as Theorem 2 in Section II.9:

Theorem. For any continuous map f \colon X \to Y, the inverse image functor

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

is left adjoint to the direct image functor:

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

I won’t do it here, so please look at their proof if you’re curious! As you might expect, it involves hopping back and forth between our two pictures of sheaves: as presheaves with an extra property, and as bundles with an extra property — namely, etale spaces.

I don’t think there’s anything especially sneaky about their argument. They do however use this: if you take a sheaf, and convert it into an etale space, and convert that back into a sheaf, you get back where you started up to natural isomorphism. This isomorphism is just the unit \epsilon that I mentioned in Part 3.

Remember, the functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to the functor that turns bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

So, there’s a unit

\epsilon \colon 1 \Rightarrow \Gamma \Lambda

and a counit

\eta \colon \Lambda \Gamma \Rightarrow 1

The fact we need now is that whenever a presheaf F is a sheaf, the unit

\epsilon_F \colon F \to \Gamma \Lambda F

is an isomorphism. This is part of Theorem 2 in Section II.6 in Mac Lane and Moerdijk.

And by the way, this fact has a partner! Whenever a bundle is an etale space, its counit is an isomorphism. So, converting an etale space into a sheaf and then back into an etale space also gets you back where you started, up to natural isomorphism. But the favored directions of these morphisms differ: any presheaf maps to the sheaf of sections of its associated bundle, while the etale space of a bundle’s sheaf of sections maps to that bundle.