Shannon Entropy from Category Theory

22 April, 2022

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or recorded later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

You can see the slides now, here. I talk a bit about all these papers:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

Categorical Semantics of Entropy

19 April, 2022

There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

10:00-10:45 — Operadic composition of thermodynamic systems, Owen Lynch, Utrecht University.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Setop which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

Wine and cheese reception to follow, Room 5209.

Compositional Thermostatics (Part 4)

8 March, 2022

guest post by Owen Lynch

This is the fourth and final part of a blog series on this paper:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In Part 1, we went over our definition of thermostatic system: it’s a convex space X of states and a concave function S \colon X \to [-\infty, \infty] saying the entropy of each state. We also gave examples of thermostatic systems.

In Part 2, we talked about what it means to compose thermostatic systems. It amounts to constrained maximization of the total entropy.

In Part 3 we laid down a categorical framework for composing systems when there are choices that have to be made for how the systems are composed. This framework has been around for a long time: operads and operad algebras.

In this post we will bring together all of these parts in a big synthesis to create an operad of all the ways of composing thermostatic systems, along with an operad algebra of thermostatic systems!

Recall that in order to compose thermostatic systems (X_1, S_1), \ldots, (X_n, S_n), we need to use a ‘parameterized constraint’, a convex subset

R \subseteq X_1 \times \cdots \times X_n \times Y,

where Y is some other convex set. We end up with a thermostatic system on Y, with S \colon Y \to [-\infty,\infty] defined by

S(y) = \sup_{(x_1,\ldots,x_n,y) \in R} S_1(x_1) + \cdots + S_n(x_n)
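To make this formula concrete, here is a minimal Python sketch that evaluates the supremum over a finite relation. It assumes finite sets in place of convex sets, purely for illustration, so it ignores convexity and concavity entirely. The names `compose_entropy`, `S1`, `S2` and `R`, and the numerical values, are hypothetical and do not come from the paper.

```python
import math

def compose_entropy(entropies, relation):
    """Return S with S(y) = sup over (x_1, ..., x_n, y) in relation of sum_i S_i(x_i).

    `entropies` is a list of dicts S_i mapping states to entropy values, and
    `relation` is a finite set of tuples (x_1, ..., x_n, y). The sup of an
    empty set is taken to be -infinity.
    """
    def S(y):
        best = -math.inf
        for *xs, y2 in relation:
            if y2 == y:
                best = max(best, sum(S_i[x] for S_i, x in zip(entropies, xs)))
        return best
    return S

# Two toy systems whose states are energies 0, 1, 2 (illustrative values only).
S1 = {0: 0.0, 1: 1.0, 2: 1.5}
S2 = {0: 0.0, 1: 0.8, 2: 1.4}

# Parameterized constraint: the two energies add up to the total energy y.
R = {(u1, u2, u1 + u2) for u1 in S1 for u2 in S2}

S = compose_entropy([S1, S2], R)
```

Here `S(2)` picks the best way of splitting two units of energy between the systems, and `S(5)` is minus infinity because no states of the two systems are related to a total energy of 5.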

In order to model this using operads and operad algebras, we will make an operad \mathcal{CR} which has convex sets as its types, and convex relations as its morphisms. Then we will make an operad algebra that assigns to any convex set X the set of concave functions

S \colon X \to [-\infty,\infty]

This operad algebra will describe how, given a relation R \subseteq X_1 \times \cdots \times X_n \times Y, we can ‘push forward’ entropy functions on X_1,\ldots,X_n to form an entropy function on Y.

The operad \mathcal{CR} is built using a construction from Part 3 that takes a symmetric monoidal category and produces an operad. The symmetric monoidal category that we start with is \mathsf{ConvRel}, which has convex sets as its objects and convex relations as its morphisms. This symmetric monoidal category has \mathsf{Conv} (the category of convex sets and convex-linear functions) as a subcategory with all the same objects, and \mathsf{ConvRel} inherits a symmetric monoidal structure from this subcategory \mathsf{Conv}.

Following the construction from Part 3, we see that we get an operad

\mathcal{CR} = \mathrm{Op}(\mathsf{ConvRel})

exactly as described before: namely it has convex sets as types, and

\mathcal{CR}(X_1,\ldots,X_n;Y) = \mathsf{ConvRel}(X_1 \times \cdots \times X_n, Y)

Next we want to make an operad algebra on \mathcal{CR}. To do this we use a lax symmetric monoidal functor \mathrm{Ent} from \mathsf{ConvRel} to \mathsf{Set}, defined as follows. On objects, \mathrm{Ent} sends any convex set X to the set of entropy functions on it:

\mathrm{Ent}(X) = \{ S \colon X \to [-\infty,\infty] \mid S \: \text{is concave} \}

On morphisms, \mathrm{Ent} sends any convex relation to the map that “pushes forward” an entropy function along that relation:

\mathrm{Ent}(R \subseteq X \times Y) = (S \mapsto (y \mapsto \sup_{(x,y) \in R} S(x)))

And finally, the all-important laxator \epsilon produces an entropy function on X_1 \times X_2 by summing an entropy function on X_1 and an entropy function on X_2:

\epsilon_{X_1,X_2} = ((S_1,S_2) \mapsto S_1 + S_2)
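Here is a rough Python sketch of the push-forward and the laxator, again assuming finite sets and finite relations in place of convex ones. That simplification is mine, not the paper's. The function names `push_forward`, `laxator` and `compose_relations` are hypothetical. The sketch also spot-checks functoriality on one example, which is of course not a proof.

```python
import math

def push_forward(R):
    """Ent(R) for a finite relation R, given as a set of pairs (x, y)."""
    def ent_R(S):
        # The sup over an empty set is -infinity.
        return lambda y: max((S(x) for (x, y2) in R if y2 == y), default=-math.inf)
    return ent_R

def laxator(S1, S2):
    """Combine entropy functions on X1 and X2 into one on X1 x X2 by summing."""
    return lambda pair: S1(pair[0]) + S2(pair[1])

def compose_relations(R, Q):
    """Relational composite: (x, z) whenever (x, y) is in R and (y, z) is in Q."""
    return {(x, z) for (x, y) in R for (y2, z) in Q if y == y2}

# Toy entropy functions on the state set {0, 1, 2} (illustrative values only).
X = [0, 1, 2]
S1 = lambda u: [0.0, 1.0, 1.5][u]
S2 = lambda u: [0.0, 0.8, 1.4][u]
S = laxator(S1, S2)

# R relates a pair of energies to their sum; Q caps the total at 3.
R = {((u1, u2), u1 + u2) for u1 in X for u2 in X}
Q = {(y, min(y, 3)) for y in range(5)}

S_total = push_forward(R)(S)
```

Applying the laxator and then pushing forward along `R` is the binary case of the composition formula for thermostatic systems: `S_total(y)` is the largest total entropy compatible with total energy `y`.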

The proof that all this indeed defines a lax symmetric monoidal functor can be found in our paper. The main point is that once we have proven this really is a lax symmetric monoidal functor, we can invoke the machinery of lax symmetric monoidal functors and operad algebras to prove that we get an operad algebra! This is very convenient, because proving that we have an operad algebra directly would be somewhat tedious.

We have now reached the technical high point of the paper, which is showing that this operad algebra exists and thus formalizing what it means to compose thermostatic systems. All that remains to do now is to show off a bunch of examples of composition, so that you can see how all this categorical machinery works in practice. In our paper we give many examples, but here let’s consider just one.

Consider the following setup with two ideal gases connected by a movable divider.

The state space of each individual ideal gas is \mathbb{R}^3_{> 0}, with coordinates (U,V,N) representing energy, volume, and number of particles respectively. Let (U_1, V_1, N_1) be the coordinates for the left-hand gas, and (U_2, V_2, N_2) be the coordinates for the right-hand gas. Then as the two gases move to thermodynamic equilibrium, the conserved quantities are U_1 + U_2, V_1 + V_2, N_1 and N_2. We picture this with the following diagram.

Ports on the inner circles represent variables for the ideal gases, and ports on the outer circle represent variables for the composed system. Wires represent relations between those variables. Thus, the entire diagram represents an operation in \mathcal{CR}, given by

U_1 + U_2 = U^e
V_1 + V_2 = V^e
N_1 = N_1^e
N_2 = N_2^e

We can then use the operad algebra to take entropy functions S_1,S_2 \colon \mathbb{R}^3_{> 0} \to [-\infty, \infty] on the two inner systems (the two ideal gases), and get an entropy function S^e \colon \mathbb{R}^4_{> 0} \to [-\infty,\infty] on the outer system.

As a consequence of this entropy maximization procedure, the inner states (U_1,V_1,N_1) and (U_2,V_2,N_2) are such that the temperature and pressure equilibrate between the two ideal gases. This is because constrained maximization with the constraints U_1 + U_2 = U^e and V_1 + V_2 = V^e leads to the following equations at a maximizer:

\displaystyle{ \frac{1}{T_1} = \frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2} = \frac{1}{T_2} }

(where T_1 and T_2 are the respective temperatures), and

\displaystyle{ \frac{p_1}{T_1} = \frac{\partial S_1}{\partial V_1} = \frac{\partial S_2}{\partial V_2} = \frac{p_2}{T_2} }

(where p_1 and p_2 are the respective pressures).

Thus we arrive at the expected conclusion, which is that temperature and pressure equalize when we maximize entropy under constraints on the total energy and volume.
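As a numerical sanity check, here is a small Python sketch that carries out this maximization for two ideal gases. It assumes the standard monatomic ideal gas entropy S(U,V,N) = N(\tfrac{3}{2}\ln(U/N) + \ln(V/N)) in units where Boltzmann's constant is 1, dropping additive constants proportional to N. This formula is not stated in the post. The helper names and the ternary search are illustrative choices, not the paper's method.

```python
import math

def entropy(U, V, N):
    """Monatomic ideal gas entropy with k_B = 1, up to a constant proportional to N."""
    return N * (1.5 * math.log(U / N) + math.log(V / N))

def temperature(U, N):
    """From 1/T = dS/dU = 3N / (2U)."""
    return 2.0 * U / (3.0 * N)

def pressure(U, V, N):
    """From p/T = dS/dV = N / V."""
    return N * temperature(U, N) / V

def argmax_concave(f, lo, hi, iters=100):
    """Ternary search for the maximizer of a concave function on the interval (lo, hi)."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

def equilibrium(U_e, V_e, N1, N2):
    """Maximize S_1 + S_2 subject to U_1 + U_2 = U_e and V_1 + V_2 = V_e."""
    # The total entropy is a function of U_1 plus a function of V_1,
    # so each coordinate can be maximized on its own.
    U1 = argmax_concave(lambda u: entropy(u, 1.0, N1) + entropy(U_e - u, 1.0, N2), 0.0, U_e)
    V1 = argmax_concave(lambda v: entropy(1.0, v, N1) + entropy(1.0, V_e - v, N2), 0.0, V_e)
    return U1, V1

U_e, V_e, N1, N2 = 3.0, 6.0, 1.0, 2.0
U1, V1 = equilibrium(U_e, V_e, N1, N2)
T1, T2 = temperature(U1, N1), temperature(U_e - U1, N2)
p1, p2 = pressure(U1, V1, N1), pressure(U_e - U1, V_e - V1, N2)
```

With these numbers the energy and volume split in proportion to the particle numbers, and the two temperatures and two pressures agree up to the accuracy of the search.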

And that concludes this series of blog posts! For more examples of thermostatic composition, I invite you to read our paper, which has some “thermostatic systems” that one does not normally see thought of in this way, such as heat baths and probabilistic systems! And if you find this stuff interesting, don’t hesitate to reach out to me! Just drop a comment here or email me at the address in the paper.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Topos Institute Research Associates

5 March, 2022

Come spend the summer at the Topos Institute! For early-career researchers, we’re excited to open up applications for our summer research associate (RA) program.

Summer RAs are an important part of life at Topos — they help explore new directions relevant to Topos projects, and they bring new ideas, energy, and expertise to our research groups. This year we’ll welcome a new cohort of RAs to our offices in Berkeley, CA, with the program running from June to August.

RAs will be responsible for performing an in-depth research or teaching project, mentored by a Topos faculty mentor or advisor. This year possible mentors may include Conal Elliott, Valeria de Paiva, Evan Patterson, Dana Scott, David Spivak, and others. RAs will work closely with their mentor to define and pursue a research project.

Along the way, RAs will also participate in our weekly lunches and seminars, blog about their time here, and produce papers, books, software, or policy, according to the parameters of their project.

Applications are now open. The position is full-time (~40 hours per week) and paid hourly starting at $30/hour. Unfortunately, Topos is not able to sponsor US work visas for participants.

Please apply by Sunday March 27th. Offers of positions will be made in early April.

For some more information, and to apply, go here.

Applied Category Theory 2022

25 February, 2022

The Fifth International Conference on Applied Category Theory, ACT2022, will take place at the University of Strathclyde from 18 to 22 July 2022, preceded by the Adjoint School 2022 from 11 to 15 July. This conference follows previous events at Cambridge (UK), Cambridge (MA), Oxford and Leiden.

Applied category theory is important to a growing community of researchers who study computer science, logic, engineering, physics, biology, chemistry, social sciences, linguistics and other subjects using category-theoretic tools. The background and experience of our members is as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers together, strengthen the applied category theory community, disseminate the latest results, and facilitate further development of the field.


We accept submissions in English of original research papers, talks about work accepted/submitted/published elsewhere, and demonstrations of relevant software. Accepted original research papers will be published in a proceedings volume. The keynote addresses will be chosen from the accepted papers. The conference will include an industry showcase event and community meeting. We particularly encourage people from underrepresented groups to submit their work and the organizers are committed to non-discrimination, equity, and inclusion.

Submission formats

Extended Abstracts should be submitted describing the contribution and providing a basis for determining the topics and quality of the anticipated presentation (1-2 pages). These submissions will be adjudicated for inclusion as a talk at the conference. Such work should include references to any longer papers, preprints, or manuscripts providing additional details.

Conference Papers should present original, high-quality work in the style of a computer science conference paper (up to 14 pages, not counting the bibliography; detailed proofs may be included in an appendix for the convenience of the reviewers). Such submissions should not be an abridged version of an existing journal article (see Extended Abstracts above) although pre-submission arXiv preprints are permitted. These submissions will be adjudicated for both a talk and publication in the conference proceedings.

Software Demonstrations should be submitted in the format of an Extended Abstract (1-2 pages) giving the program committee enough information to assess the content of the demonstration. We are particularly interested in software that makes category theory research easier, or uses category theoretic ideas to improve software in other domains.

Extended abstracts and conference papers should be prepared with LaTeX. For conference papers please use the EPTCS style files available at

The submission link is

Important dates

The following dates are all in 2022, and Anywhere On Earth.

• Submission Deadline: Monday 9 May
• Author Notification: Tuesday 7 June
• Camera-ready version due: Tuesday 28 June
• Adjoint School: Monday 11 to Friday 15 July
• Main Conference: Monday 18 to Friday 22 July

Conference format

We hope to run the conference as a hybrid event with talks recorded or streamed for remote participation. However, due to the state of the pandemic, the possibility of in-person attendance is not yet confirmed. Please be mindful of changing conditions when booking travel or hotel accommodations.

Financial support

Limited financial support will be available. Please contact the organisers for more information.

Program committee

• Jade Master, University of Strathclyde (Co-chair)
• Martha Lewis, University of Bristol (Co-chair)

The full program committee will be announced soon.

Organizing committee

• Jules Hedges, University of Strathclyde
• Jade Master, University of Strathclyde
• Fredrik Nordvall Forsberg, University of Strathclyde
• James Fairbanks, University of Florida

Steering committee

• John Baez, University of California, Riverside
• Bob Coecke, Cambridge Quantum
• Dorette Pronk, Dalhousie University
• David Spivak, Topos Institute

Categories: the Mathematics of Connection

17 February, 2022

I gave this talk at Mathematics of Collective Intelligence, a workshop organized by Jacob Foster at UCLA’s Institute of Pure and Applied Mathematics, or IPAM for short. There have been a lot of great talks here, all available online.

Perhaps the main interesting thing about this talk is that I sketch some work happening at the Topos Institute where we are using techniques from category theory to design epidemiological models:

Categories: the mathematics of connection

Abstract. As we move from the paradigm of modeling one single self-contained system at a time to modeling ‘open systems’ which interact with their — perhaps unmodeled — environment, category theory becomes a useful tool. It gives a mathematical language to describe the interface between an open system and its environment, the process of composing open systems along their interfaces, and how the behavior of a composite system relates to the behaviors of its parts. It is far from a silver bullet: at present, every successful application of category theory to open systems takes hard work. But I believe we are starting to see real progress.

You can see my slides or watch a video of my talk on the IPAM website or here:

For some other related talks, see:

Monoidal categories of networks.

Symmetric monoidal categories: a Rosetta stone.

To read more about my work on categories and open systems, go here:

Network theory.

Compositional Thermostatics (Part 3)

14 February, 2022

guest post by Owen Lynch

This is the third part (Part 1, Part 2) of a blog series on a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.

The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also have natural ways of composition.

However, there is not always a canonical way of composing widgets. For instance, let R be a commutative ring, and let a and b be elements of R. Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact any element of the free commutative ring \mathbb{Z}[x,y] gives a way of composing a pair of elements in a commutative ring. For instance, x^2 + xy - y^2, when applied to a and b, gives a^2 + ab - b^2. Note that there is nothing special here about the fact that we started with two elements of R; we could start with as many elements of R as we liked, say, a_1,\ldots,a_n, and any element of \mathbb{Z}[x_1,\ldots,x_n] would give a ‘way of composing’ a_1,\ldots,a_n.

The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on n variables’.

Let’s also discuss a less algebraic example. A point process X on a subset of Euclidean space A \subseteq \mathbb{R}^n can be described as an assignment of an \mathbb{N}-valued random variable X_U to each measurable set U \subseteq A that is countably additive under disjoint union of measurable sets.

The interpretation is that a point process gives a random collection of points in A, and X_U counts how many points fall in U. Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of A.

Now suppose that f \colon B \to A and g \colon C \to A are rigid embeddings such that f(B) \cap g(C) = \emptyset, and that X is a point process on B and Y is a point process on C. Then we can define a new point process Z on A (assuming that X and Y are independent) by letting

Z_U = X_{f^{-1}(U)}+ Y_{g^{-1}(U)}

This is the union of the point process X running in f(B) and the point process Y running in g(C).
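Here is a minimal Python sketch of this formula, working with one fixed realization of each point process rather than with random variables. It assumes a one-dimensional setting where measurable sets are half-open intervals and the embeddings are translations. The names `count_in` and `compose_counts`, and the sample points, are hypothetical.

```python
def count_in(points, U):
    """Number of points in the half-open interval U = [a, b): one realization of X_U."""
    a, b = U
    return sum(1 for p in points if a <= p < b)

def compose_counts(x_points, f, y_points, g):
    """Return U |-> Z_U = X_{f^{-1}(U)} + Y_{g^{-1}(U)} for fixed samples of X and Y."""
    def Z(U):
        # A point x of B lies in f^{-1}(U) exactly when f(x) lies in U.
        in_f = count_in([f(x) for x in x_points], U)
        in_g = count_in([g(y) for y in y_points], U)
        return in_f + in_g
    return Z

# A = [0, 10), B = [0, 1), C = [0, 2); f and g are translations with disjoint images.
f = lambda t: t + 1.0   # f(B) = [1, 2)
g = lambda t: t + 5.0   # g(C) = [5, 7)

x_points = [0.1, 0.5, 0.9]   # one sample of X on B
y_points = [0.3, 1.7]        # one sample of Y on C

Z = compose_counts(x_points, f, y_points, g)
```

For example, `Z((0.0, 10.0))` counts all five points, while `Z((5.0, 6.0))` only sees the one point of Y that lands in that interval.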

Composing two point processes

The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings f and g give us a way of gluing together a point process on B and a point process on C to get a point process on A. We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).


In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to ‘ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally; we first think about the ways of composing things, and then we think about what things we want to compose, rather than first thinking about the things we want to compose and only later thinking about the ways of composing them!

Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.

An operad \mathcal{O} consists of a collection \mathcal{O}_0 of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types X_1,\ldots,X_n,Y \in \mathcal{O}_0, a collection of operations \mathcal{O}(X_1,\ldots,X_n;Y).

These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.

Suppose that g \in \mathcal{O}(Y_1,\ldots,Y_n;Z) and for each i = 1,\ldots,n, f_i \in \mathcal{O}(X_{i,1},\ldots,X_{i,k_i};Y_i). Then we can make an operation

g(f_1,\ldots,f_n) \in \mathcal{O}(X_{1,1},\ldots,X_{1,k_1},\ldots,X_{n,1},\ldots,X_{n,k_n};Z)

We visualize operads by letting an operation be a circle that can take several inputs and produces a single output. Then composition of operations is given by attaching the output of circles to the input of other circles. Pictured below is the composition of a unary operator f_1, a nullary operator f_2, and a binary operator f_3 with a ternary operator g to create a ternary operator g(f_1,f_2,f_3).

One view of composition in an operad
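The composition just described can be sketched in a few lines of Python. This sketch assumes a single type (everything is an integer) and represents an operation as a function together with its arity. The class `Op`, the function `compose` and the particular operations are hypothetical, chosen only to mirror the arities in the picture.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Op:
    """An operation with `arity` inputs and one output."""
    arity: int
    fn: Callable

    def __call__(self, *args):
        assert len(args) == self.arity
        return self.fn(*args)

def compose(g, fs):
    """g(f_1, ..., f_n): feed the outputs of the f_i into the inputs of g."""
    assert g.arity == len(fs)
    def fn(*args):
        outputs, start = [], 0
        for f in fs:
            # Each f_i consumes its own consecutive block of the arguments.
            outputs.append(f(*args[start:start + f.arity]))
            start += f.arity
        return g(*outputs)
    return Op(sum(f.arity for f in fs), fn)

identity = Op(1, lambda x: x)

f1 = Op(1, lambda x: x + 1)           # unary
f2 = Op(0, lambda: 10)                # nullary
f3 = Op(2, lambda x, y: x * y)        # binary
g = Op(3, lambda a, b, c: a + b + c)  # ternary

h = compose(g, [f1, f2, f3])          # arity 1 + 0 + 2 = 3
```

The composite `h` is again ternary, and composing with `identity` on either side returns an operation that behaves like the original one.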

Additionally, for every type X \in \mathcal{O}_0, there is an ‘identity operation’ 1_X \in \mathcal{O}(X;X) that satisfies for any g \in \mathcal{O}(X_1,\ldots,X_n;Y)

g(1_{X_1},\ldots,1_{X_n}) = g

and for any f \in \mathcal{O}(X;Y)

1_Y(f) = f

There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators f,g,h, it states

f(g(h)) = f(g)(h)

The last condition for being an operad is that if f \in \mathcal{O}(X_1,\ldots,X_n;Y) and \sigma \in S(n), the symmetric group on n elements, then we can apply \sigma to f to get

\sigma^\ast(f) \in \mathcal{O}(X_{\sigma(1)},\ldots,X_{\sigma(n)};Y).

We require that (\sigma \tau)^\ast(f) = \tau^\ast(\sigma^\ast(f)) if \sigma,\tau \in S(n), and there are also some conditions for how \sigma^\ast interacts with composition, which can be straightforwardly derived from the intuition that \sigma^\ast permutes the arguments of an operation.
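The order of \sigma and \tau in this law is easy to get wrong, so here is a small Python check. It assumes one particular convention, not fixed in the text above: permutations are 0-indexed tuples, \sigma \tau means ‘apply \tau first, then \sigma’, and the i-th argument of \sigma^\ast(f) is fed into slot \sigma(i) of f. The names `act` and `multiply` are hypothetical.

```python
def act(sigma, f):
    """sigma^*(f): the i-th argument of the result goes to slot sigma[i] of f."""
    n = len(sigma)
    def permuted(*args):
        slots = [None] * n
        for i, a in enumerate(args):
            slots[sigma[i]] = a
        return f(*slots)
    return permuted

def multiply(sigma, tau):
    """The product sigma tau, meaning 'apply tau first, then sigma'."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

# An operation that just records the order in which it received its arguments.
f = lambda x, y, z: (x, y, z)

sigma = (1, 2, 0)
tau = (1, 0, 2)

lhs = act(multiply(sigma, tau), f)("a", "b", "c")
rhs = act(tau, act(sigma, f))("a", "b", "c")
other_order = act(sigma, act(tau, f))("a", "b", "c")
```

Under this convention `lhs` and `rhs` agree, while applying the two permutations in the other order gives a different result, which is why the law reverses the order.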

Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.

That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.

Define \mathcal{R} to be an operad with one type, which we will call R \in \mathcal{R}_0, and let \mathcal{R}(R^n;R) = \mathbb{Z}[x_1,\ldots,x_n], where \mathcal{R}(R^n;R) is \mathcal{R}(R,\ldots,R;R) with R repeated n times.

Composition is simply polynomial substitution. That is, if

q(y_1,\ldots,y_n) \in \mathbb{Z}[y_1,\ldots,y_n] \cong \mathcal{R}(R^n;R)

and

p_i(x_{i,1},\ldots,x_{i,k_i}) \in \mathbb{Z}[x_{i,1},\ldots,x_{i,k_i}] \cong \mathcal{R}(R^{k_i};R)

for i = 1,\ldots,n, then

q(p_1(x_{1,1},\ldots,x_{1,k_1}),\ldots,p_n(x_{n,1},\ldots,x_{n,k_n})) \in \mathcal{R}(R^{\sum_{i=1}^n k_i};R)

is the composite of p_1,\ldots,p_n,q. For instance, composing

x^2 \in \mathbb{Z}[x] \cong \mathcal{R}(R;R)

and

y+z \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)

results in

(y+z)^2 \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)
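For readers who like to see this run, here is a minimal Python sketch of polynomial substitution. It assumes a polynomial in k variables is stored as a dictionary from exponent tuples of length k to integer coefficients. That representation and all the function names are my own illustrative choices.

```python
def poly_add(p, q):
    """Sum of two polynomials in the same number of variables."""
    out = dict(p)
    for exps, c in q.items():
        out[exps] = out.get(exps, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def poly_mul(p, q):
    """Product of two polynomials in the same number of variables."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def poly_pow(p, n, nvars):
    """n-th power of p, a polynomial in nvars variables."""
    out = {(0,) * nvars: 1}
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def pad(p, offset, total):
    """View p as a polynomial in `total` variables, with its variables starting at `offset`."""
    return {(0,) * offset + e + (0,) * (total - offset - len(e)): c for e, c in p.items()}

def substitute(q, ps, arities):
    """The composite q(p_1, ..., p_n), where p_i has arities[i] variables of its own."""
    total = sum(arities)
    padded, offset = [], 0
    for p, k in zip(ps, arities):
        padded.append(pad(p, offset, total))
        offset += k
    result = {}
    for exps, c in q.items():
        term = {(0,) * total: c}
        for e, p in zip(exps, padded):
            term = poly_mul(term, poly_pow(p, e, total))
        result = poly_add(result, term)
    return result

square = {(1 + 1,): 1}              # x^2
plus = {(1, 0): 1, (0, 1): 1}       # y + z
composite = substitute(square, [plus], [2])
```

The dictionary `composite` has keys `(2, 0)`, `(1, 1)` and `(0, 2)`, which encodes y^2 + 2yz + z^2, the expansion of the composite above.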

The reader is invited to supply details for identities and the symmetry operators.

For the other example, define an operad \mathcal{P} by letting \mathcal{P}_0 be the set of compact subsets of \mathbb{R}^2 (we could consider something more exciting, but this works fine and is easy to visualize). An operation f \in \mathcal{P}(X_1,\ldots,X_n;Y) consists of disjoint embeddings f_1,\ldots,f_n, where f_i \colon X_i \to Y.

We can visualize such an operation as simply a shape with holes in it.

Shape with holes

Composition of such operations is just given by nesting the holes.

Composing by nesting

The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).

The composed operation

Another source of examples for operads comes from the following construction. Suppose that (C,\otimes,1) is a symmetric monoidal category. Define \mathrm{Op}(C,\otimes,1) = \mathrm{Op}(C) by letting

\mathrm{Op}(C)_0 = C_0

where C_0 is the collection of objects in C, and

\mathrm{Op}(C)(X_1,\ldots,X_n;Y) = \mathrm{Hom}_C(X_1 \otimes \cdots \otimes X_n, Y)

To compose operations f_1,\ldots,f_n and g (assuming that the types are such that these are composable), we simply take g \circ (f_1 \otimes \ldots \otimes f_n). Moreover, the identity operation is simply the identity morphism, and the action of \sigma \in S(n) is given by the symmetric monoidal structure.

In fact, the second example that we talked about is an example of this construction! If we let C be the category where the objects are compact subsets of \mathbb{R}^2, with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.

Perhaps the most important example of this construction is when it is applied to (\mathsf{Set}, \times, 1), because this is important in the next section! This operad has sets as its types, and an operation

f \in \mathrm{Op}(\mathsf{Set})(X_1,\ldots,X_n;Y)

is simply a function

f \colon X_1 \times \cdots \times X_n \to Y

Operad algebras

Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.

Moreover, a group action can be described as a functor from the one-object category representing that group to \mathsf{Set}, and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad to \mathrm{Op}(\mathsf{Set}), the operad just described in the last section.

In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to \mathrm{Op}(\mathsf{Set}).

An operad morphism F from an operad \mathcal{O} to an operad \mathcal{P} is exactly what one would expect: it consists of

• A map F \colon \mathcal{O}_0 \to \mathcal{P}_0 on types
• For every X_1,\ldots,X_n,Y \in \mathcal{O}_0, a map

F \colon \mathcal{O}(X_1,\ldots,X_n;Y) \to \mathcal{P}(F(X_1),\ldots,F(X_n);F(Y))

such that F commutes with all of the things an operad does, i.e. composition, identities, and the action of \sigma \in S(n).

Thus an operad morphism F from \mathcal{O} to \mathrm{Op}(\mathsf{Set}), also known as an operad algebra, consists of

• A set F(X) for every X \in \mathcal{O}_0
• A function F(f) \colon F(X_1) \times \cdots \times F(X_n) \to F(Y) for every operation f \in \mathcal{O}(X_1,\ldots,X_n;Y)

such that the assignment of sets and functions preserves identities, composition, and the action of \sigma \in S(n).

Without further ado, let’s look at the examples. From any ring A we can produce an algebra F_A of \mathcal{R}. We let F_A(R) = A (considered as a set), and for

p(x_1,\ldots,x_n) \in \mathbb{Z}[x_1,\ldots,x_n] = \mathcal{R}(X_1,\ldots,X_n;Y)

we let

F_A(p)(a_1,\ldots,a_n) = p(a_1,\ldots,a_n)
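As a rough illustration of this algebra, here is a small Python sketch in which a polynomial with integer coefficients acts on elements of a ring by evaluation. The dictionary encoding of polynomials and the function name act are assumptions made for this example, and the ring elements are assumed to be Python objects supporting +, * and ** with integers (plain integers and Fraction both qualify).

```python
from fractions import Fraction

# A polynomial in Z[x_1, ..., x_n] is encoded (hypothetically) as a dict
# mapping exponent tuples to integer coefficients. For example,
# x_1 * x_2 + 2 * x_1 - 3 is {(1, 1): 1, (1, 0): 2, (0, 0): -3}.
p = {(1, 1): 1, (1, 0): 2, (0, 0): -3}


def act(poly, *elements):
    """Evaluate poly at the given ring elements: F_A(p)(a_1, ..., a_n)."""
    total = 0
    for exponents, coeff in poly.items():
        if len(exponents) != len(elements):
            raise ValueError("one ring element is needed per variable")
        term = coeff
        for a, e in zip(elements, exponents):
            # Multiply in a^e for each variable of the monomial.
            term = term * a ** e
        total = total + term
    return total


# The same operation p acting on two different rings.
print(act(p, 2, 5))                          # in Z: 2*5 + 2*2 - 3 = 11
print(act(p, Fraction(1, 2), Fraction(4)))   # in Q: 2 + 1 - 3 = 0
```

The point of the sketch is that one polynomial gives a function A^n \to A for every ring A at once, which is what it means for the ring to be an algebra of \mathcal{R}.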

We can also make an operad algebra of point processes, \mathrm{PP}, for \mathcal{P}. For A \in \mathcal{P}_0, we let \mathrm{PP}(A) be the set of point processes on A. If f \colon A_1 \sqcup \cdots \sqcup A_n \to B is an embedding, then we let \mathrm{PP}(f) be the map that sends point processes X_1,\ldots,X_n on A_1,\ldots,A_n respectively to the point process Y defined by

Y_U = (X_1)_{f^{-1}(U) \cap A_1} + \cdots + (X_n)_{f^{-1}(U) \cap A_n}

Finally, if (C,\otimes,1) is a symmetric monoidal category, there is a way to make an operad algebra of \mathrm{Op}(C) from a special type of functor F \colon C \to \mathsf{Set}. This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.

The special kind of functor we need is a lax symmetric monoidal functor. This is a functor F equipped with a natural transformation \tau_{A,B} \colon F(A) \times F(B) \to F(A \otimes B) that is well-behaved with respect to the associator, identity, and symmetric structure of (C, \otimes, 1). We call \tau the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator. I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given an operation f \in \mathrm{Op}(C)(X,Y;Z) (which is a morphism f \colon X \otimes Y \to Z), we can construct a function F(X) \times F(Y) \to F(Z) by composing

\tau_{X,Y} \colon F(X) \times F(Y) \to F(X \otimes Y)

with

F(f) \colon F(X \otimes Y) \to F(Z)

This basic idea can be extended using associativity to produce a function F(X_1) \times \cdots \times F(X_n) \to F(Y) from an operation f \colon X_1 \otimes \cdots \otimes X_n \to Y.

As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor \mathrm{PP} by sending a set A to \mathrm{PP}(A), the set of point processes on A, and an embedding f \colon A \to B to the map \mathrm{PP}(f) that sends a point process X on A to the point process Y on B defined by

Y_U = X_{f^{-1}(U)}

The laxator \tau_{A,B} \colon \mathrm{PP}(A) \times \mathrm{PP}(B) \to \mathrm{PP}(A \sqcup B) sends a point process X on A and a point process Y on B to the point process Z on A \sqcup B defined by

Z_{U} = X_{U \cap A} + Y_{U \cap B}

The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.


This was a long post, so I’m going to try and go over the main points so that you can organize what you just learned in some sort of coherent fashion.

First I talked about how situations frequently arise in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.

I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.

This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 2)

7 February, 2022

guest post by Owen Lynch

In Part 1, John talked about a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

and he gave an overview of what a ‘thermostatic system’ is.

In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.

Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?

Recall that a thermostatic system is given by a convex space X and a concave entropy function S \colon X \to [-\infty,\infty]. A ‘tank’ of constant heat capacity, whose state is solely determined by its energy, has state space X = \mathbb{R}_{> 0} and entropy function S(U) = C \log(U), where C is the heat capacity.

Now suppose that we have two tanks of heat capacity C_1 and C_2 respectively. As thermostatic systems, the state of both tanks is described by two energy variables, U_1 and U_2, and we have entropy functions

S_1(U_1) = C_1 \log(U_1)

S_2(U_2) = C_2 \log(U_2)

By conservation of energy, the total energy of both tanks must remain constant, so

U_1 + U_2 = U

for some U; equivalently

U_2 = U - U_1

The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium state (U_1^{\mathrm{eq}},U_2^{\mathrm{eq}}) must satisfy

S_1(U_1^{\mathrm{eq}}) + S_2(U_2^{\mathrm{eq}}) = \max_{U_1+U_2=U} S_1(U_1) + S_2(U_2)

We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by

\displaystyle{ \frac{1}{T} = \frac{\partial S}{\partial U} }

The interested reader should calculate this for our entropy functions, and in doing this, see why we identify C with the heat capacity. Now, manipulating the condition of equilibrium, we get

\max_{U_1+U_2=U} S_1(U_1) + S_2(U_2) = \max_{U_1} S_1(U_1) + S_2(U-U_1)

At the maximum, the expression being maximized on the right-hand side, viewed as a function of U_1, must have derivative equal to 0. Thus,

\displaystyle{ \frac{\partial}{\partial U_1} (S_1(U_1) + S_2(U-U_1)) = 0 }

Now, note that if U_2 = U - U_1, then

\displaystyle{  \frac{\partial}{\partial U_1} S_2(U-U_1) = -\frac{\partial}{\partial U_2} S_2(U_2) }

Thus, the condition of equilibrium is

\displaystyle{  \frac{\partial}{\partial U_1} S_1(U_1) = \frac{\partial}{\partial U_2} S_2(U_2) }

Using the fact that

\displaystyle{ \frac{1}{T_1} = \frac{\partial}{\partial U_1} S_1(U_1) , \qquad \frac{1}{T_2} = \frac{\partial}{\partial U_2} S_2(U_2) }

the above equation reduces to

\displaystyle{ \frac{1}{T_1} = \frac{1}{T_2} }

so we have our expected condition of temperature equilibration!

The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system U = U_1 + U_2. The entropy function of this new thermostatic system is given by the constrained supremum:

S(U) = \max_{U = U_1 + U_2} S_1(U_1) + S_2(U_2)

The reader should verify that this ends up being the same as a system with heat capacity C_1 + C_2, i.e. with entropy function given, up to an additive constant that does not depend on U, by

S(U) = (C_1 + C_2) \log(U)
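For readers who would rather check this numerically than by hand, here is a minimal Python sketch of the two-tank example. It approximates the constrained maximum by a plain grid search, which is only an illustrative choice; the function names, the grid size and the sample values of C_1, C_2 and U are all assumptions of mine. For these entropy functions 1/T = C/U, so the temperature of a tank is T = U/C, and the composite entropy agrees with (C_1 + C_2) \log(U) up to the constant C_1 \log(C_1/(C_1+C_2)) + C_2 \log(C_2/(C_1+C_2)).

```python
import math


def tank_entropy(C):
    """Entropy function S(U) = C log(U) of a tank with heat capacity C."""
    return lambda U: C * math.log(U)


def compose_by_energy(S1, S2, U, steps=20000):
    """Approximate the max of S1(U1) + S2(U2) over U1 + U2 = U on a grid.

    Returns (maximal total entropy, maximizing U1). The grid search is a
    crude stand-in for the exact constrained maximization.
    """
    best_S, best_U1 = -math.inf, None
    for k in range(1, steps):
        U1 = U * k / steps
        S = S1(U1) + S2(U - U1)
        if S > best_S:
            best_S, best_U1 = S, U1
    return best_S, best_U1


# Hypothetical sample values.
C1, C2, U = 2.0, 3.0, 10.0
S, U1 = compose_by_energy(tank_entropy(C1), tank_entropy(C2), U)
U2 = U - U1

# Temperatures from 1/T = dS/dU = C/U, i.e. T = U/C.
T1, T2 = U1 / C1, U2 / C2
print("temperatures at equilibrium:", T1, T2)

# Offset between S(U) and (C1 + C2) log(U); it should not depend on U.
offset = C1 * math.log(C1 / (C1 + C2)) + C2 * math.log(C2 / (C1 + C2))
print("S(U) - (C1 + C2) log(U):", S - (C1 + C2) * math.log(U), "expected:", offset)
```

Rerunning the last lines with a different total energy U should give the same offset, which is the sense in which the composite behaves like a single tank of heat capacity C_1 + C_2.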

A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.

The general procedure is the following. Suppose that we have n thermostatic systems, (X_1,S_1),\ldots,(X_n,S_n). Let Y be a convex space, which we think of as describing the quantities that are conserved when we compose the n thermostatic systems (i.e., total energy, total volume, etc.). Each value of the conserved quantities y \in Y corresponds to many different possible values for x_1 \in X_1, \ldots, x_n \in X_n. We represent this with a relation

R \subseteq X_1 \times \cdots \times X_n \times Y

We then turn Y into a thermostatic system by using the entropy function

S(y) = \max_{R(x_1,\ldots,x_n,y)} S_1(x_1) + \ldots + S_n(x_n)

It turns out that if we require R to be a convex relation (that is, a convex subspace of X_1 \times \cdots \times X_n \times Y) then S as defined above ends up being a concave function, so (Y,S) is a true thermostatic system.

We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.

In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:

• Jeffrey Commons, Ying-Jen Yang and Hong Qian, Duality symmetry, two entropy functions, and an eigenvalue problem in Gibbs’ theory.

There is also a dynamical systems interpretation of entropy, where entropy serves as a Lyapunov function for a dynamical system. This is the viewpoint taken here:

• Wassim M. Haddad, A Dynamical Systems Theory of Thermodynamics, Princeton U. Press.

In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.

We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Submission to arXiv

4 February, 2022

guest post by Phillip Helbig

Monthly Notices of the Royal Astronomical Society is one of the oldest and most prestigious journals in the fields of astronomy, astrophysics, and cosmology. My latest MNRAS paper was not allowed to appear in the astro-ph category at arXiv (the main avenue of distribution for scientific articles in many fields) because it was reclassified to a category which is inappropriate for several reasons. This is definitely not due to some technical error, misunderstanding, or oversight. It took more than three months for me to even be told why it had been reclassified, and that only after a well known cosmologist threatened the Scientific Director of arXiv that he would complain to the arXiv sponsors if things weren’t cleared up. Also, there is evidence that the reason I was given is not the real one.

Although I would like my paper to appear in astro-ph, this is not just about my paper. Rather, it is about the question whether the community wants arXiv to decide which papers, and hence which people, are allowed to be part of that community, as opposed to peer review by respected journals such as MNRAS. Below, after some general background on arXiv, I mention some policies which are probably not as well known as they should be, before briefly describing my own odyssey.

Like it or not, many if not most astronomers rely on arXiv at least for learning about new papers; some rely on it exclusively, despite the facts that not everything is on arXiv, that that which is there is not always in the definitive version, and that even if the definitive version is there, then that might not be clear. The last two (and, in some cases, the first as well) can be due to lazy authors or to restrictions imposed by journals as to what version, such as the ‘author’s accepted manuscript’, is allowed to appear; more-definitive versions hence either don’t appear or if so then that fact is not advertised. At the same time, publication in a respected journal is generally recognized as a mark of quality. In fact, the main reason that the quality of papers at arXiv is so high is that most of them will eventually appear in respected journals. So essentially journals are for separating the wheat from the chaff while arXiv has become the main method of distribution, because no subscription is required and because a majority of articles can be found at one website with a reasonably useful interface (the former is crucial for those without access to a subscription to every journal they might want to access and the latter saves large amounts of time). There is thus a problem if standards of acceptance between journals and arXiv differ.

The main reason, at least for me, to have my papers on arXiv is visibility. All else being equal, papers on arXiv are almost certainly read more, and probably cited more, than those which are not. (In a field in which a large fraction are on arXiv, the reason can’t be that only the better papers are put on arXiv. Also, at least a few years after the paper has appeared, having it on arXiv before it has appeared in the journal probably won’t substantially increase the number of times it is read and/or cited due to the only slightly increased time during which it has been available; the increased citation rate is due to the higher visibility from being on arXiv.) The ‘stamp of approval’ comes from the journal. It is easy to distribute open-access versions of the paper, although implementing a robust long-term storage strategy is not. Finding them is more difficult; that would be easiest via arXiv, but author-supplied links at the corresponding ADS& abstract web page are good enough.

People often look for open-access versions of papers via links on such web pages, especially if they want to make sure that they find the official version, not whatever version might be on arXiv; arXiv itself is not an option for papers which are not on arXiv; of course, ADS can be and is used completely independently of arXiv. Lack of visibility at arXiv is a serious disadvantage to an author and such decisions should be made only in extreme cases. (Also, having the paper at arXiv but in the wrong category can be worse than not having it there at all.)

arXiv is under no obligation to allow even a paper which has been accepted by a leading journal in the field to appear in the appropriate category (e.g., astro-ph for astronomy / astrophysics / cosmology), or even to appear at all. There are also some other things which are documented but not as well known as they should be, some things which are at best poorly documented, and inconsistent and/or incomplete recommendations. I think that it is important to alert the community to those in order to counter the impression held by many that everything worth reading is on arXiv and/or if something is not on arXiv then it must be a matter of the author excluding himself from the community, rather than being excluded by arXiv (references intentionally not included to avoid public shaming). (Of course, most who claim that all papers in their field worth reading are on arXiv are not in a position to make that claim, because they don’t read any papers which are not on arXiv.) I suspect that at least some of those things are known by many, but also that there is a fear of criticizing arXiv in public for fear of getting banned, which is the modern-day equivalent of excommunication.

According to the submission agreement, “[t]he Submitter waives…[a]ny claims against arXiv…based upon actions…including…decisions to include the Work in, or exclude the Work from, the repository…the classification or characterization of the Work.” “arXiv reserves the right to reject or reclassify any submission.” In other words, the idea that any serious paper (‘serious’ being defined here as having appeared in a respected journal) can (assuming, of course, that the journal allows it) be uploaded to arXiv is wrong. Also, arXiv reserves the right to reclassify the article, e.g. a paper submitted to astro-ph can be reclassified to gen-ph. Moreover, after such a reclassification, the author is not allowed to withdraw the paper (Steinn Sigurdsson*, personal communication; Eleonora Presani@, personal communication), although that is technically possible (by first ‘unsubmitting’ it then ‘deleting’ it).

Of course, journals also decide which papers they accept and reject. However, the comparison of arXiv with journals is not appropriate, for several reasons: arXiv does not peer-review submissions and claims to do only a minimal amount of moderation. Also, journals offer something between acceptance and rejection, namely the possibility of revision, coupled with the opportunity to discuss the degree of revision, or even reasons for rejection, with the referee(s) and/or editor(s). Of course, revision of an article accepted by a journal doesn’t make sense, but the fact that it is not offered is another piece of evidence that interaction with arXiv shouldn’t be compared to interaction with a journal. Moreover, if an article is rejected by a journal, it is not automatically submitted to another journal, much less without any possibility for the author to choose to withdraw it completely, hence the claim that the various arXiv categories are comparable to various journals with different standards (Eleonora Presani, personal communication) is dubious at best. In addition, there is usually more than one journal of comparable reputation in a given field, so the author has the chance of getting an independent evaluation. In that case, competition between journals is good. In the case of arXiv, however, a monopoly is actually good, as long as it works, because one of the main advantages of arXiv is that there is only one place one needs to look in order to find most papers. This is the main point of my criticism: arXiv’s unique relevance to the community means that excluding a paper from its intended category should be done only under extreme circumstances. arXiv has become one of the most important resources for the astronomical community but that community has essentially no control over arXiv. Great power should be accompanied by great responsibility. Quis custodiet ipsos custodes?

It is possible to appeal a decision. However, the appeals process is not well documented, in part because astro-ph is sometimes seen as a top-level category, sometimes as one of the physics categories. As part of the appeals process, “[e]xtreme cases may be addressed to the appropriate advisory committee chair only”. The value of a successful appeal is questionable, because most rely on the abstract lists for recent papers in a particular category, either sent via email or available at the arXiv website. As far as I know, a paper reclassified after a successful appeal would not appear in the ‘recent’ list for that category. The main problem with such an appeal, though, is that arXiv is policing itself.

For various reasons, in recent years so-called arXiv-overlay journals have sprung up. There is even one for astrophysics, The Open Journal of Astrophysics, and I have published a review paper there. The basic idea is that there is a robust distribution structure already in place, namely arXiv, so the job of the journal is essentially only to provide refereeing. Such journals usually assume that all potential authors could post their paper to arXiv before submitting it to the journal, but obviously that is not the case. (Some even use the arXiv category as a filter to determine whether the paper could even be considered to be appropriate for the journal.) It is sometimes possible, though usually not widely advertised, to submit to the journal first and submit the paper to arXiv only after acceptance, which is what I did (like many, I prefer to put papers on arXiv only after acceptance). That paper had no problems at arXiv, but based on the reasons I’m presenting here, arXiv-overlay journals are no longer an option for me. (I have long suggested not only that the possibility to submit to the journal before submitting to arXiv should be more widely advertised, but also that the journal should have some sort of agreement with arXiv that any paper accepted by the journal automatically qualifies for the corresponding category at arXiv (after all, the purpose of a journal is publication); alas, the Open Journal of Astrophysics does not plan to pursue that at all: “OJA has no power to compel arXiv to accept submissions, nor would we want to. We see arXiv as the most important resource in astrophysics….”.) Despite the longevity and robustness of some traditional journals, the scientific publishing landscape is changing rapidly. That is a topic for another discussion, but part of it involves arXiv-overlay journals, and wrong assumptions about arXiv mean that a substantial part of the new system is built on shaky foundations.

Those who are interested in high-quality, free-for-readers-and-authors, well organized, open-access journals should check out Is there any valid reason to submit anywhere else? Their astronomy journals are just getting underway; please consider supporting them.

I learned about some of the things discussed above the hard way when my latest MNRAS paper was reclassified from astro-ph to gen-ph (general physics). Of course, I appealed the decision quickly, after discussing the matter with a few colleagues, some of whom assumed that it must have been some sort of technical glitch. It took more than three months before I was told a reason for the classification (after having escalated up to the highest levels of arXiv)§, and more than four before the appeals process finally ended. That paper is not on arXiv, and I don’t intend to post anything else to arXiv before the procedure becomes fairer, more transparent, and more accountable (if it ever does). I had escalated as highly as possible within arXiv before I asked Cornell University (which hosts arXiv) to investigate possible academic misconduct, which led to an email from Eleonora Presani. Her stance is essentially the same as that of Licia Verde#: my accusations themselves don’t seem to have been investigated and authors just have to live with the fact that arXiv can reclassify papers at will and even prevent authors from withdrawing them completely before announcement if they disagree with the reclassification. Unfortunately, Cornell takes the point of view that although Cornell maintains and sustains arXiv, it is not the university’s role to interfere in the moderation or appeal process.

There is evidence that I wasn’t told the real reason why my paper was reclassified$, and no-one with whom I have discussed the matter thinks that arXiv was right to reclassify my paper. (That doesn’t mean that they necessarily have a high opinion of my paper, but those are two separate issues. One colleague stated (though not in reference to my paper) that even the occasional papers which appear in respected journals obviously by mistake should appear on arXiv; that would put pressure on journals to be more careful and also benefit those wishing to critically discuss or refute them.) However, I will discuss that and other aspects (hopefully) unique to my case elsewhere (perhaps in the comments if there is interest), and here concentrate on problems which the astronomical community should recognize and try to correct.

I certainly regard reclassifying a paper which has appeared in MNRAS to a category other than astro-ph, giving reasons for the reclassification only after threat from a famous colleague, and then giving me a completely different reason, to be an extreme case. Thus, I did contact the chair of the physics advisory committee, Robert Seiringer; that he is the appropriate person was also confirmed by Licia Verde. Nevertheless, his response was that he could not investigate disputes involving individual submissions, which was also Verde’s reply to my complaint. Hence, not only is there disagreement between arXiv’s documented appeals procedure and how those involved actually behave, there seems to be no system of checks and balances within arXiv, not to mention the problem that the community, despite relying on arXiv, in practice has no way to arbitrate disputes with it; it is judge, jury, and executioner.

All who believe that my paper should be on arXiv in the astro-ph.CO category if I so desire are encouraged to contact the Scientific Director, the Executive Director, the Chair of the Scientific Advisory Board, and the Chair of the Physics Advisory Committee and complain. It is not necessary to think that my paper is great. It is enough if one thinks that it is not so bad that it should be banned from astro-ph, or even if one can point to worse papers which are in astro-ph. (Of course, if one agrees that my paper should appear in astro-ph.CO, the reasons why arXiv has not (yet?) let it appear are irrelevant.)

Of course, my bad experience with arXiv is not the main point. The main point is that arXiv can, and does, make decisions which experts in the field (see third footnote; Tegmark wasn’t the only expert consulted by me) cannot understand at all. Due to fear of the consequences of criticizing arXiv, most of those probably go unnoticed. While arXiv does need the possibility to reject or reclassify some papers, that needs to be done transparently and fairly. However, in view of its value to the community, there should be some simple rules, such as a ‘white list’ of journals so that papers accepted by them automatically qualify for the corresponding category at arXiv. Fortunately, my own livelihood does not depend on submitting to arXiv (in either sense of the word). Imagine the consequences for a young scientist who, after a year or so of work, gets their first paper accepted by a serious journal, only to have it rejected by arXiv or reclassified into a category where no colleague, potential employer, and so on will see it. Not only that, but the decision is made by someone (or some thing; arXiv is now moving to classification based on machine learning, but that was not relevant to the reclassification of my paper (Licia Verde, personal communication)) via an untransparent algorithm and no reason is given. Any appeal is within arXiv itself and essentially consists of some people asking others if they are guilty and accepting the expected answer. Such behaviour should be an embarrassment to the scientific community.

I think that some action on the part of the community would be in order even if my paper were the only one affected. However, the problem is much larger. Many colleagues have told me that they disagree with the reclassification of my paper, but are afraid to say so publicly for fear of getting banned from arXiv themselves. Also, I have been told that I am far from the first person to make such complaints about arXiv. Since I started discussing this with colleagues, a few other similar cases have been mentioned to me. Considering that many of those affected probably don’t mention it at all out of a false sense of shame, the number of people affected is probably larger than many might at first guess. (I am not on Facebook, but I understand that a similar problem was recently discussed within a Facebook group for professional astronomers.)

A new development is that arXiv, by its own admission, doesn’t have the necessary means to do its job properly, and that I am not the only one complaining about it:

• Daniel Garisto, arXiv.org reaches a milestone and a reckoning, Scientific American, 10 January 2022.

A red herring is that the American Astronomical Society has made all of its journals (which are some of the major journals in cosmology/astrophysics/astronomy) open-access. That probably won’t diminish the importance of arXiv (and hence the importance of making sure that it is run responsibly), for several reasons. First, an attraction of arXiv is that it is a one-stop shop with a reasonable interface, and by following it one can keep up with much of the literature in one’s field (though of course not all papers are posted to arXiv, but if it is run responsibly then there should be no reason for them not to be, except if the journal forbids posting (some version of) the paper to arXiv). Even if all papers were open-access, that would mean following websites, or RSS feeds, of several or even dozens of websites, not nearly as convenient as the abstract listings at arXiv. Second, the AAS journals have rather expensive publication fees, which are becoming increasingly hard to justify, especially in the case of online-only publications. (Note that there are journals with no publication fees which actually encourage the author to post something equivalent to the final version on arXiv with no embargo period; MNRAS is an example.) Third, items which would otherwise have limited circulation, such as theses and conference proceedings, can (in principle) be on arXiv.

I’m all for giving arXiv more support, but first my paper needs to be rehabilitated by being allowed into astro-ph, and the policies should be changed, and publicly communicated, so that such problems do not happen in the future (neither to me nor anyone else); I could then post my backlog. The evidence is that the goof is so large that a public apology is called for. The minimum which needs to be done:

  1. When a paper is reclassified, authors should be informed (now, there is not even an automatic email; that makes sense because arXiv thinks that it needs to reclassify some papers against the will of the submitter) and given a chance to approve the reclassification, delete the submission entirely, suggest another reclassification, or appeal. Until the matter is resolved, the submission should stay in the ‘hold’ status with no action required to keep it there (now, one has to unsubmit and resubmit it to keep it from going away).

  2. When a paper is reclassified, the submitter must be given concrete reasons.

  3. The appeals process needs to be overseen by some authority outside of arXiv which has the power to overrule arXiv’s decisions, otherwise it is more or less a farce. It seems to me that some committee in the corresponding professional organization would be a good choice, e.g. the International Astronomical Union for papers on cosmology / astrophysics / astronomy. There can be an internal appeals process, but the final authority over arXiv’s decisions should not reside with arXiv if arXiv is to provide a meaningful service to the community.

  4. Papers from the major journals should be essentially white-listed. If a paper is really so bad that it is obvious that it somehow slipped in by mistake, arXiv should request the journal to formally withdraw it. If the journal does so, then arXiv shouldn’t accept it either. If not, then it should go onto arXiv. (It should go on even if it is bad, to put pressure on journals to uphold quality and so that it can be discussed and rebutted).

  5. arXiv needs to publicly apologize for reclassifying papers for reasons other than quality or content (e.g. my case), and invite those papers to be resubmitted after the other points above have been implemented.

  6. The points above should make (re)submissions by wronged authors viable, but perhaps some sort of special protection is needed for whistle-blowers such as myself.

  7. I was going to call for the resignation of Seiringer, Verde, and Presani, but it seems that none of them is still in the post they held when interacting with me. The main guilty person, though, Sigurdsson, is still Scientific Director. How anyone can be aware of my story (which can be backed up with evidence, in court if necessary) and still think that Sigurdsson should have anything at all to do with arXiv is beyond me. Also, although they have chosen (probably with good reason) to remain nameless, if arXiv were not drastically wrong on this point, the distinguished colleagues who put in a lot of time and effort trying to get arXiv to reverse its decision would not have done so. I am extremely grateful to them for their courage.

Of course, a boycott will not put pressure on arXiv. (It would actually remove pressure if people who are critical of arXiv stop using it.) If really famous people publicly announce that they will stop posting to arXiv until the points I raise have been cleared up, that might lead to something.

It is not clear how large the problem is, in part because not everyone feels able to complain. I don’t think that my case is a one-off, or even part of a small minority, because otherwise arXiv would not have invested so much time and effort to prevent one more abstract from appearing in astro-ph. I have given them several opportunities to revert their decision and hence cut their losses, but never even received a reply to such requests. Thus, the problem is probably substantial, and hence should be of interest to the entire community.

Information based on the web pages pointed to by the URLs in the reference list reflects the state of those pages on 28 August 2020; that based on the technical behaviour of the arXiv interface reflects my experiences between 20 April and 25 July 2020. References to ‘arXiv’ reflect my experience with the astro-ph category.

I would be interested in hearing anything relevant to this topic by email (my address is easy enough to find). Please indicate the degree of confidentiality you wish.

Please point as many people as possible, by all means at your disposal, to this post and related discussion. I am probably taking a big risk by going public, but if I do so, I want it to have the maximum effect. I see the lack of accountability of arXiv as a serious problem in modern academia.


* Steinn Sigurdsson is the Scientific Director of arXiv.

@ Eleonora Presani was the first Executive Director of arXiv, the post having been created only in 2020, while arXiv itself was created in 1991. She used to work for Elsevier. On 21 December 2021, it was announced that she would step down. According to the same announcement, Steinn Sigurdsson is still Scientific Director. Robert Seiringer is no longer Chair of the physics committee. I don’t see a new Executive Director listed on the arXiv Leadership Team web page.

§ Even that happened only after noted cosmologist Max Tegmark had threatened to complain to arXiv’s sponsors if my paper wasn’t taken out of limbo. Before, I had received only an extremely brief reply from Sigurdsson, and that only after a colleague who has known him for a long time discussed my complaints with him. Tegmark not only agrees that arXiv is overstepping its bounds by essentially overriding the refereeing process of a respected journal, but also that there is no reason that my paper should not be allowed to appear in astro-ph. He was also kind enough and brave enough to give me permission to quote from his emails to me. These do contain quotations of emails he received from arXiv. Ethically, I think that trying to correct the tremendous harm done to me and others because of wrong reclassification overrides any concerns about quoting without permission (which of course would not be given), especially since such quotations make my case much stronger than merely paraphrasing what others have told me or even just my own suspicions; this is a typical whistle-blower situation.

$ The only reason which I was given is the alleged lack of “substantiveness” of the paper. Max Tegmark, on the other hand, wasn’t told that, but was told that my case is “complicated” and that “[t]he reason for this [arXiv not automatically accepting a paper accepted by a journal] is partly the SCOAP3 agreement, which arXiv is not party to but still put certain obligations on us, and partly because we can not privilege any one journal or publisher for legal reasons. We get sued.” (Max Tegmark, personal communication.) I certainly don’t think that arXiv should automatically accept a paper just because it has been accepted by any journal, but do think that rejecting or reclassifying a paper which has been accepted by a respected journal should be done only under extreme circumstances, via a transparent and fair process, and for reasons which can be explained. Also, no one I have talked to has any idea how SCOAP3 could be relevant to my paper. Apart from Max Tegmark, several other colleagues (all full professors of cosmology / astrophysics / astronomy at major research universities) tried to intervene with arXiv (which did not even want to discuss the matter with a low-life such as myself). That none of them want their names mentioned publicly is a problem in itself: the people whom arXiv is supposed to serve do not feel free to offer constructive criticism in public. Between the lines (or even in them, if one is allowed to see them), it seems that, in my case, the reclassification was not due to the contents or quality of my paper, but rather indicates another, possibly even more serious, problem: arXiv appears to be afraid of getting sued by crackpots.
Apparently they abuse the gen-ph category (which is a mix of papers about general physics, papers which at first or even second or third glance obviously belong in another category and have nothing obviously wrong with them, and genuine crackpot stuff) by reclassifying some real papers to it and also letting through a few crackpot papers, thus avoiding the accusation of white-listing the major journals (which shouldn’t be a problem) and appeasing the crackpots by having their papers in the same category as some major-journal papers. Of course this is not a policy which arXiv has published, but when several people get the same message behind the scenes, it is as certain as it needs to be to make my case. Although I believe that the concept still would have been deeply flawed, I offered to leave the paper in gen-ph but have it cross-listed to astro-ph, but that suggestion was rejected by arXiv. Of course, if their goal is to appease the crackpots but at the same time keep them out of the major categories, that strategy wouldn’t work, because they would then have to cross-list crackpot papers or make a distinction, which is what they are trying to avoid (or rather they want to have a few alibi papers with no distinction).

# Licia Verde was Chair of the arXiv Scientific Advisory Committee. The Chair is now Ralph Wijers, who is also Chair of the Physics Advisory Committee. I did contact him, but he sees no reason to investigate my case, as it happened before he took up his posts as Chair.

& The SAO/NASA Astrophysics Data System is the most important bibliographic database in astronomy/astrophysics/cosmology, operated by the Smithsonian Astrophysical Observatory (part of the Harvard/Smithsonian Center for Astrophysics, which also includes the Harvard College Observatory) under a grant from the National Aeronautics and Space Administration.

Hardy, Ramanujan and Taxi No. 1729

30 January, 2022

In his book Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work, G. H. Hardy tells this famous story:

He could remember the idiosyncracies of numbers in an almost uncanny way. It was Littlewood who said every positive integer was one of Ramanujan’s personal friends. I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. “No,” he replied, “it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.”


10^3 + 9^3 = 1000 + 729 = 1729 = 1728 + 1 = 12^3 + 1^3
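If you want to check Ramanujan’s claim by brute force, here is a minimal sketch in Python. The function name `two_way_cube_sums` and the search limit of 2000 are my own choices for illustration, and I am assuming “cubes” means cubes of positive integers:

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def two_way_cube_sums(limit):
    """Map each n <= limit to its pairs (a, b), 1 <= a <= b, with a^3 + b^3 = n,
    keeping only those n that have at least two such pairs."""
    # Largest base worth trying; the +1 guards against floating-point rounding.
    max_base = round(limit ** (1 / 3)) + 1
    ways = defaultdict(list)
    for a, b in combinations_with_replacement(range(1, max_base + 1), 2):
        n = a**3 + b**3
        if n <= limit:
            ways[n].append((a, b))
    return {n: pairs for n, pairs in ways.items() if len(pairs) >= 2}


found = two_way_cube_sums(2000)
smallest = min(found)
print(smallest, found[smallest])  # 1729 [(1, 12), (9, 10)]
```

Nothing below 1729 shows up, so it really is the smallest such number.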

But there’s more to this story than meets the eye.

First, it’s funny how this story becomes more dramatic with each retelling. In the foreword to Hardy’s book A Mathematician’s Apology, his friend C. P. Snow tells it thus:

Hardy used to visit him, as he lay dying in hospital at Putney. It was on one of those visits that there happened the incident of the taxicab number. Hardy had gone out to Putney by taxi, as usual his chosen method of conveyance. He went into the room where Ramanujan was lying. Hardy, always inept about introducing a conversation, said, probably without a greeting, and certainly as his first remark: “I thought the number of my taxicab was 1729. It seemed to me rather a dull number.” To which Ramanujan replied: “No, Hardy! No, Hardy! It is a very interesting number. It is the smallest number expressible as the sum of two cubes in two different ways.”

Here Hardy becomes “inept” and makes his comment “probably without a greeting, and certainly as his first remark”. Perhaps the ribbing of a friend who knew Hardy’s ways?

I think I’ve seen later versions where Hardy “burst into the room”.

But it’s common for legends to be embroidered with the passage of time. Here’s something more interesting. In Ono and Trebat-Leder’s paper The 1729 K3 surface, they write:

While this anecdote might give one the impression that Ramanujan came up with this amazing property of 1729 on the spot, he actually had written it down before even coming to England.

In fact they point out that Ramanujan wrote it down more than once!

Before he went to England, Ramanujan mainly published by posting puzzles to the questions section of the Journal of the Indian Mathematical Society. In 1913, in Question 441, he challenged the reader to prove a formula expressing a specific sort of perfect cube as a sum of three perfect cubes. If you keep simplifying this formula to see why it works, you eventually get

12^3 = (-1)^3 + 10^3 + 9^3

In Ramanujan’s Notebooks, Part III, Bruce Berndt explains that Ramanujan developed a method for finding solutions of Euler’s diophantine equation

a^3 + b^3 = c^3 + d^3

in his “second notebook”. This is one of three notebooks Ramanujan left behind after his death—and the results in this one were written down before he first went to England. In Item 20(iii) he describes his method and lists many example solutions, the simplest being

1^3 + 12^3 = 9^3 + 10^3
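To see that this really is the simplest solution in positive integers, one can list small solutions of Euler’s equation directly. This is only a brute-force sketch, not Ramanujan’s method; the name `euler_solutions` and the bound of 24 on the entries are assumptions made for the example:

```python
from collections import defaultdict


def euler_solutions(max_entry):
    """Positive solutions of a^3 + b^3 = c^3 + d^3 with a <= b, c <= d, a < c
    and all entries <= max_entry, sorted by the common value of the two sides."""
    by_sum = defaultdict(list)
    for a in range(1, max_entry + 1):
        for b in range(a, max_entry + 1):
            by_sum[a**3 + b**3].append((a, b))
    solutions = []
    for total in sorted(by_sum):
        pairs = by_sum[total]
        # Every two distinct pairs with the same sum give one solution.
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):
                solutions.append((total, pairs[i], pairs[j]))
    return solutions


for total, (a, b), (c, d) in euler_solutions(24):
    print(f"{a}^3 + {b}^3 = {c}^3 + {d}^3 = {total}")
```

The first line printed is 1^3 + 12^3 = 9^3 + 10^3 = 1729, followed by the solutions with common values 4104 and 13832.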

In 1915 Ramanujan posed another puzzle about writing a sixth power as a sum of three cubes, Question 661. And he posed a puzzle about writing 1 as a sum of three cubes, Question 681.

Finally, four or five years later, Ramanujan revisited the equation a^3 + b^3 = c^3 + d^3 in his so-called Lost Notebook. This was actually a pile of 138 loose unnumbered pages written by Ramanujan in the last two years of his life, 1919 and 1920. George Andrews found them in a box in Trinity College, Cambridge much later, in 1976.

Now the pages have been numbered, published and intensively studied: George Andrews and Bruce Berndt have written five books about them! Here is page 341 of Ramanujan’s Lost Notebook, where he came up with a method for finding an infinite family of integer solutions to the equation a^3 + b^3 = c^3 + d^3:

As you can see, one example is

9^3 + 10^3 = 12^3 + 1

In Section 8.5 of George Andrews and Bruce Berndt’s book Ramanujan’s Lost Notebook: Part IV, they discuss Ramanujan’s method, calling it “truly remarkable”.

In short, Ramanujan was well aware of the special properties of the number 1729 before Hardy mentioned it. And something prompted Ramanujan to study the equation a^3 + b^3 = c^3 + d^3 again near the end of his life, and find a new way to solve it.

Could it have been the taxicab incident??? Or did Hardy talk about the taxi after Ramanujan had just thought about the number 1729 yet again? In the latter case, it’s hardly a surprise that Ramanujan remembered it.

Thinking about this story, I’ve started wondering about what really happened here. First of all, as James Dolan pointed out to me, you don’t need to be a genius to notice that

1000 + 729 = 1728 + 1

Was Hardy, the great number theorist, so blind to the properties of numbers that he didn’t notice either of these ways of writing 1729 as a sum of two cubes? Base ten makes them very easy to spot if you know your cubes, and I’m sure Hardy knew 9^3 = 729 and 12^3 = 1728.

Second of all, how often do number theorists come out and say that a number is uninteresting? Except in that joke about the “least uninteresting number”, I don’t think I’ve heard it happen.

My wife Lisa suggested an interesting possibility that would resolve all these puzzles:

Hardy either knew of Ramanujan’s work on this problem or noticed himself that 1729 had a special property. He wanted to cheer up his dear friend Ramanujan, who was lying deathly ill in the hospital. So he played the fool by walking in and saying that 1729 was “rather dull”.

I have no real evidence for this, and I’m not claiming it’s true. But I like how it flips the meaning of the story. And it’s not impossible. Hardy was, after all, a bit of a prankster: each time he sailed across the Atlantic he sent out a postcard saying he had proved the Riemann Hypothesis, just in case he drowned.

We could try to see if there really was a London taxi with number 1729 at that time. It would be delicious to discover that it was merely an invention of Hardy’s. But I don’t know if records of London taxi-cab numbers from around 1919 still exist.

Maybe I’ll let C. P. Snow have the last word. After telling his version of the incident with Hardy, Ramanujan and the taxicab, he writes:

This is the exchange as Hardy recorded it. It must be substantially accurate. He was the most honest of men; and further no one could possibly have invented it.