• Brendan Fong, *The Algebra of Open and Interconnected Systems*, Ph.D. thesis, Department of Computer Science, University of Oxford, 2016.

This material is close to my heart, since I’ve informally served as Brendan’s advisor since 2011, when he came to Singapore to work with me on chemical reaction networks. We’ve been collaborating intensely ever since. I just looked at our correspondence, and I see it consists of 880 emails!

At some point I gave him a project: *describe the category whose morphisms are electrical circuits*. He took up the challenge much more ambitiously than I’d ever expected, developing powerful general frameworks to solve not only this problem but also many others. He did this in a number of papers, most of which I’ve already discussed:

• Brendan Fong, Decorated cospans, *Th. Appl. Cat.* **30** (2015), 1096–1120. (Blog article here.)

• Brendan Fong and John Baez, A compositional framework for passive linear circuits. (Blog article here.)

• Brendan Fong, John Baez and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Brendan Fong and Brandon Coya, Corelations are the prop for extraspecial commutative Frobenius monoids. (Blog article here.)

• Brendan Fong, Paolo Rapisarda and Paweł Sobociński, A categorical approach to open and interconnected dynamical systems.

But Brendan’s thesis is the best place to see a lot of this material in one place, integrated and clearly explained.

I wanted to write a summary of his thesis. But since he did that himself very nicely in the preface, I’m going to be lazy and just quote that! (I’ll leave out the references, which are crucial in scholarly prose but a bit off-putting in a blog.)

This is a thesis in the mathematical sciences, with emphasis on the mathematics. But before we get to the category theory, I want to say a few words about the scientific tradition in which this thesis is situated.

Mathematics is the language of science. Twinned so intimately with physics, over the past centuries mathematics has become a superb—indeed, unreasonably effective—language for understanding planets moving in space, particles in a vacuum, the structure of spacetime, and so on. Yet, while Wigner speaks of the unreasonable effectiveness of mathematics in the natural sciences, equally eminent mathematicians, not least Gelfand, speak of the unreasonable *ineffectiveness* of mathematics in biology and related fields. Why such a difference?

A contrast between physics and biology is that while physical systems can often be studied in isolation—the proverbial particle in a vacuum—biological systems are necessarily situated in their environment. A heart belongs in a body, an ant in a colony. One of the first to draw attention to this contrast was Ludwig von Bertalanffy, biologist and founder of general systems theory, who articulated the difference as one between closed and open systems:

Conventional physics deals only with closed systems, i.e. systems which are considered to be isolated from their environment. […] However, we find systems which by their very nature and definition are not closed systems. Every living organism is essentially an open system. It maintains itself in a continuous inflow and outflow, a building up and breaking down of components, never being, so long as it is alive, in a state of chemical and thermodynamic equilibrium but maintained in a so-called ‘steady state’ which is distinct from the latter.

While the ambitious generality of general systems theory has proved difficult, von Bertalanffy’s philosophy has had great impact in his home field of biology, leading to the modern field of systems biology. Half a century later, Dennis Noble, another great pioneer of systems biology and the originator of the first mathematical model of a working heart, describes the shift as one from reduction to integration.

Systems biology […] is about putting together rather than taking apart, integration rather than reduction. It requires that we develop ways of thinking about integration that are as rigorous as our reductionist programmes, but different. It means changing our philosophy, in the full sense of the term.

In this thesis we develop rigorous ways of thinking about integration or, as we refer to it, interconnection.

Interconnection and openness are tightly related. Indeed, openness implies that a system may be interconnected with its environment. But what is an environment but comprised of other systems? Thus the study of open systems becomes the study of how a system changes under interconnection with other systems.

To model this, we must begin by creating language to describe the interconnection of systems. While reductionism hopes that phenomena can be explained by reducing them to “elementary units investigable independently of each other” (in the words of von Bertalanffy), this philosophy of integration introduces as an additional and equal priority the investigation of the way these units are interconnected. As such, this thesis is predicated on the hope that the meaning of an expression in our new language is determined by the meanings of its constituent expressions together with the syntactic rules combining them. This is known as the principle of compositionality.

Also commonly known as Frege’s principle, the principle of compositionality both dates back to Ancient Greek and Vedic philosophy, and is still the subject of active research today. More recently, through the work of Montague in natural language semantics and Strachey and Scott in programming language semantics, the principle of compositionality has found formal expression as the dictum that the interpretation of a language should be given by a homomorphism from an algebra of syntactic representations to an algebra of semantic objects. We too shall follow this route.

The question then arises: what do we mean by algebra? This mathematical question leads us back to our scientific objectives: what do we mean by system? Here we must narrow, or at least define, our scope. We give some examples. The investigations of this thesis began with electrical circuits and their diagrams, and we will devote significant time to exploring their compositional formulation. We discussed biological systems above, and our notion of system includes these, modelled say in the form of chemical reaction networks or Markov processes, or the compartmental models of epidemiology, population biology, and ecology. From computer science, we consider Petri nets, automata, logic circuits, and the like. More abstractly, our notion of system encompasses matrices and systems of differential equations.

Drawing together these notions of system are well-developed diagrammatic representations based on network diagrams—that is, topological graphs. We call these network-style diagrammatic languages. In the abstract, by ‘system’ we shall simply mean that which can be represented by a box with a collection of terminals, perhaps of different types, through which it interfaces with the surroundings. Concretely, one might envision a circuit diagram with terminals, such as

[Figure: an example diagram with terminals]

or

[Figure: a second example diagram with terminals]

The algebraic structure of interconnection is then simply the structure that results from the ability to connect terminals of one system with terminals of another. This graphical approach motivates our language of interconnection: indeed, these diagrams will be the expressions of our language.

We claim that the existence of a network-style diagrammatic language to represent a system implies that interconnection is inherently important in understanding the system. Yet, while each of these example notions of system is well-studied in and of itself, their compositional, or algebraic, structure has received scant attention. In this thesis, we study an algebraic structure called a ‘hypergraph category’, and argue that this is the relevant algebraic structure for modelling interconnection of open systems.

Given these pre-existing diagrammatic formalisms and our visual intuition, constructing algebras of syntactic representations is thus rather straightforward. The semantics and their algebraic structure are more subtle.

In some sense our semantics is already given to us too: in studying these systems as closed systems, scientists have already formalised the meaning of these diagrams. But we have shifted from a closed perspective to an open one, and we need our semantics to also account for points of interconnection.

Taking inspiration from Willems’ behavioural approach and Deutsch’s constructor theory, in this thesis I advocate the following position. First, at each terminal of an open system we may make measurements appropriate to the type of terminal. Given a collection of terminals, the **universum** is then the set of all possible measurement outcomes. Each open system has a collection of terminals, and hence a universum. The semantics of an open system is the subset of measurement outcomes on the terminals that are permitted by the system. This is known as the **behaviour** of the system.

For example, consider a resistor of resistance $r$. This has two terminals—the two ends of the resistor—and at each terminal, we may measure the potential and the current. Thus the universum of this system is the set $\mathbb{R} \oplus \mathbb{R} \oplus \mathbb{R} \oplus \mathbb{R}$, where the summands represent respectively the potentials and currents at each of the two terminals. The resistor is governed by Kirchhoff’s current law, or conservation of charge, and Ohm’s law. Conservation of charge states that the current flowing into one terminal must equal the current flowing out of the other terminal, while Ohm’s law states that this current will be proportional to the potential difference, with constant of proportionality $1/r$. Thus the behaviour of the resistor is the set

$$\{(\phi_1, \phi_2, i, -i) \mid \phi_1 - \phi_2 = r i\},$$

where $i$ is the current flowing into the first terminal.

Note that in this perspective a law such as Ohm’s law is a mechanism for partitioning *behaviours* into possible and impossible behaviours.

Interconnection of terminals then asserts the identification of the variables at the identified terminals. Fixing some notion of open system and subsequently an algebra of syntactic representations for these systems, our approach, based on the principle of compositionality, requires this to define an algebra of semantic objects and a homomorphism from syntax to semantics. The first part of this thesis develops the mathematical tools necessary to pursue this vision for modelling open systems and their interconnection.
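The resistor example and the idea that interconnection identifies variables can be checked mechanically. Below is a minimal Python sketch, not code from the thesis. It assumes currents are measured flowing *into* each terminal, and that joining two terminals makes their potentials equal and makes the currents flowing into them sum to zero. The names `resistor` and `series` are hypothetical. It checks that two resistors of resistances $r_a$ and $r_b$, joined at one terminal and observed only at the outer terminals, have the same behaviour as a single resistor of resistance $r_a + r_b$.

```python
from fractions import Fraction

def resistor(r):
    """Behaviour of a resistor of resistance r, as a test on the universum:
    (phi1, phi2, i1, i2) = potentials, then currents flowing in, at terminals 1 and 2."""
    def allowed(phi1, phi2, i1, i2):
        conserves_charge = (i1 + i2 == 0)    # what flows in at one end flows out at the other
        obeys_ohm = (phi1 - phi2 == r * i1)  # potential drop = resistance * current
        return conserves_charge and obeys_ohm
    return allowed

def series(r_a, r_b):
    """Two resistors with terminal 2 of the first joined to terminal 1 of the second,
    observed only at the two outer terminals (variables phi1, phi3, i1, i3)."""
    first, second = resistor(r_a), resistor(r_b)
    def allowed(phi1, phi3, i1, i3):
        # The hidden variables at the junction are forced by the first resistor and the
        # interconnection rule, so "some hidden values work" reduces to checking these.
        i_a2 = -i1              # charge conservation in the first resistor
        phi2 = phi1 - r_a * i1  # Ohm's law in the first resistor fixes the junction potential
        i_b1 = -i_a2            # interconnection: inflowing currents at the junction sum to zero
        return first(phi1, phi2, i1, i_a2) and second(phi2, phi3, i_b1, i3)
    return allowed

two_in_series, single = series(Fraction(2), Fraction(3)), resistor(Fraction(5))
print(two_in_series(10, 0, 2, -2), single(10, 0, 2, -2))  # True True
```

Exact `Fraction` arithmetic is used so that the equality tests in Ohm's law are not spoiled by floating-point rounding.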

The next goal is to demonstrate the efficacy of this philosophy in applications. At core, this work is done in the faith that the right language allows deeper insight into the underlying structure. Indeed, after setting up such a language for open systems there are many questions to be asked: Can we find a sound and complete logic for determining when two syntactic expressions have the same semantics? Suppose we have systems that have some property, for example controllability. In what ways can we interconnect controllable systems so that the combined system is also controllable? Can we compute the semantics of a large system quicker by computing the semantics of subsystems and then composing them? If I want a given system to achieve a specified trajectory, can we interconnect another system to make it do so? How do two different notions of system, such as circuit diagrams and signal flow graphs, relate to each other? Can we find homomorphisms between their syntactic and semantic algebras? In the second part of this thesis we explore some applications in depth, providing answers to questions of the above sort.

The thesis is divided into two parts. Part I, comprising Chapters 1 to 4, focuses on mathematical foundations. In it we develop the theory of hypergraph categories and a powerful tool for constructing and manipulating them: decorated corelations. Part II, comprising Chapters 5 to 7, then discusses applications of this theory to examples of open systems.

The central refrain of this thesis is that the syntax and semantics of network-style diagrammatic languages can be modelled by hypergraph categories. These are introduced in Chapter 1. **Hypergraph categories** are symmetric monoidal categories in which every object is equipped with the structure of a special commutative Frobenius monoid in a way compatible with the monoidal product. As we will rely heavily on properties of monoidal categories, their functors, and their graphical calculus, we begin with a whirlwind review of these ideas. We then provide a definition of hypergraph categories and their functors, a strictification theorem, and an important example: the category of cospans in a category with finite colimits.

A **cospan** is a pair of morphisms

$$X \xrightarrow{\;i\;} N \xleftarrow{\;o\;} Y$$

with a common codomain. In Chapter 2 we introduce the idea of a ‘decorated cospan’, which equips the apex $N$ with extra structure. Our motivating example is cospans of finite sets decorated by graphs, as in this picture:

[Figure: a cospan of finite sets whose apex is decorated by a graph]

Here graphs are a proxy for expressions in a network-style diagrammatic language. To give a bit more formal detail, let $\mathcal{C}$ be a category with finite colimits, writing its coproduct as $+$, and let $(\mathcal{D}, \otimes)$ be a braided monoidal category. Decorated cospans provide a method of producing a hypergraph category from a lax braided monoidal functor

$$F \colon (\mathcal{C}, +) \longrightarrow (\mathcal{D}, \otimes).$$

The objects of these categories are simply the objects of $\mathcal{C}$, while the morphisms are pairs comprising a cospan $X \rightarrow N \leftarrow Y$ in $\mathcal{C}$ together with an element $1 \rightarrow FN$ in $\mathcal{D}$—the so-called **decoration**. We will also describe how to construct hypergraph functors between decorated cospan categories. In particular, this provides a useful tool for constructing a hypergraph category that captures the syntax of a network-style diagrammatic language.

Having developed a method to construct a category where the morphisms are expressions in a diagrammatic language, we turn our attention to categories of semantics. This leads us to the notion of a corelation, to which we devote Chapter 3. Given a factorisation system $(\mathcal{E}, \mathcal{M})$ on a category $\mathcal{C}$, we define a **corelation** to be a cospan $X \rightarrow N \leftarrow Y$ such that the copairing of the two maps, a map $X + Y \rightarrow N$, is a morphism in $\mathcal{E}$. Factorising maps using the factorisation system leads to a notion of equivalence on cospans, and this helps us describe when two diagrams are equivalent. Like cospans, corelations form hypergraph categories.
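In the category of finite sets with the factorisation system (surjections, injections), a corelation is a cospan whose copairing is surjective, so every apex element is reached from the boundary. The following Python sketch is an illustration, not code from the thesis; the function names and the encoding are hypothetical. A cospan $X \rightarrow N \leftarrow Y$ is stored as two lists of indices into $\{0, \dots, n-1\}$ plus the apex size $n$. Cospans are composed by pushout, computed with union-find, and a corelation is obtained by keeping only the image of the copairing.

```python
def pushout_compose(f, g):
    """Compose cospans f: X -> N <- Y and g: Y -> M <- Z of finite sets by pushout.
    A cospan is (i, o, n): i lists the image in {0..n-1} of each point of the left
    boundary, o does the same for the right boundary, and n is the apex size."""
    i1, o1, n1 = f
    i2, o2, n2 = g
    assert len(o1) == len(i2), "the shared boundary Y must match"
    parent = list(range(n1 + n2))  # disjoint union N + M; M is shifted by n1

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(len(o1)):  # glue o1(y) in N to i2(y) in M
        ra, rb = find(o1[y]), find(n1 + i2[y])
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    roots = sorted({find(a) for a in range(n1 + n2)})
    label = {r: k for k, r in enumerate(roots)}
    return ([label[find(a)] for a in i1], [label[find(n1 + b)] for b in o2], len(roots))

def to_corelation(c):
    """Factor the copairing X + Y -> N as a surjection followed by an injection and
    keep the surjective part: only apex points hit by some boundary point survive."""
    i, o, _ = c
    image = sorted(set(i) | set(o))
    label = {a: k for k, a in enumerate(image)}
    return ([label[a] for a in i], [label[a] for a in o], len(image))

def compose_corelations(f, g):
    return to_corelation(pushout_compose(f, g))

f = ([0, 1], [0], 2)  # X = {0, 1}, Y = {0}, apex {0, 1}
g = ([0], [0], 2)     # Y = {0}, Z = {0}, apex {0, 1} with point 1 isolated
print(pushout_compose(f, g))      # ([0, 1], [0], 3)
print(compose_corelations(f, g))  # ([0, 1], [0], 2)
```

The isolated apex point of `g` survives in the composite cospan but is dropped in the composite corelation: corelations forget any part of the apex not reached from a terminal.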

In Chapter 4 we decorate corelations. Like decorated cospans, decorated corelations are corelations together with some additional structure on the apex. We again use a lax braided monoidal functor to specify the sorts of extra structure allowed. Moreover, decorated corelations too form the morphisms of a hypergraph category. The culmination of our theoretical work is to show that every hypergraph category and every hypergraph functor can be constructed using decorated corelations. This implies that we can use decorated corelations to construct a semantic hypergraph category for any network-style diagrammatic language, as well as a hypergraph functor from its syntactic category that interprets each diagram. We also discuss how the intuitions behind decorated corelations guide construction of these categories and functors.

Having developed these theoretical tools, in the second part we turn to demonstrating that they have useful applications. Chapter 5 uses corelations to formalise signal flow diagrams representing linear time-invariant discrete dynamical systems as morphisms in a category. Our main result gives an intuitive sound and fully complete equational theory for reasoning about these linear time-invariant systems. Using this framework, we derive a novel structural characterisation of controllability, and consequently provide a methodology for analysing controllability of networked and interconnected systems.

Chapter 6 studies passive linear networks. Passive linear networks are used in a wide variety of engineering applications, but the best studied are electrical circuits made of resistors, inductors and capacitors. The goal is to construct what we call the ‘black box functor’, a hypergraph functor from a category of open circuit diagrams to a category of behaviours of circuits. We construct the former as a decorated cospan category, with each morphism a cospan of finite sets decorated by a circuit diagram on the apex. In this category, composition describes the process of attaching the outputs of one circuit to the inputs of another. The behaviour of a circuit is the relation it imposes between currents and potentials at its terminals. The space of these currents and potentials naturally has the structure of a symplectic vector space, and the relation imposed by a circuit is a Lagrangian linear relation. Thus, the black box functor goes from our category of circuits to the category of symplectic vector spaces and Lagrangian linear relations. Decorated corelations provide a critical tool for constructing these hypergraph categories and the black box functor.

Finally, in Chapter 7 we mention two further research directions. The first is the idea of a ‘bound colimit’, which aims to describe why epi-mono factorisation systems are useful for constructing corelation categories of semantics for open systems. The second research direction pertains to applications of the black box functor for passive linear networks, discussing the work of Jekel on the inverse problem for electric circuits and the work of Baez, Fong, and Pollard on open Markov processes.


Yesterday Blake Pollard and I drove to Metron’s branch in San Diego. For the first time, I met four of the main project participants: John Foley (math), Thy Tran (programming), Tom Mifflin and Chris Boner (two higher-ups involved in the project). Jeff Monroe and Tiffany Change gave us a briefing on Metron’s ExAMS software. This lets you design complex systems and view them in various ways.

The most fundamental view is the ‘activity trace’, which consists of a bunch of parallel rows, one for each ‘performer’. Each row has a bunch of boxes which represent ‘activities’ that the performer can do. Two boxes are connected by a wire when one box’s activity causes another to occur. In general, time goes from left to right. Thus, if B can only occur after A, the box for B is drawn to the right of the box for A.

The wires can also merge via logic gates. For example, suppose activity D occurs whenever A and B but not C have occurred. Then wires coming out of the A, B, and C boxes merge in a logic gate and go into the D box. However, these gates are a bit more general than your ordinary Boolean logic gates. They may also involve ‘delays’, e.g. we can say that A occurs 10 minutes after B occurs.

I would like to understand the mathematics of just these logic gates, for starters. Ignoring delays for a minute (get the pun?), they seem to be giving a generalization of Petri nets. In a Petri net we only get to use the logical connective ‘and’. In other words, an activity can occur when *all* of some other activities have occurred. People have considered various generalizations of Petri nets, and I think some of them allow more general logical connectives, but I’m forgetting where I saw this done. Do you know?
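To pin down the difference between a Petri net's ‘and’ rule and these more general gates, here is a toy Python sketch. It is not a description of how ExAMS actually works: the names `petri_and`, `a_and_b_but_not_c`, `simulate` and the rule format are made up for illustration. A gate is any yes/no test on the set of activities that have already occurred, and a delay postpones an activity after its gate first becomes true. One modelling choice is built in: once an activity is scheduled, later events (say, C occurring afterward) do not cancel it.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Gate = Callable[[FrozenSet[str]], bool]  # a yes/no test on the activities that have occurred

def petri_and(*inputs: str) -> Gate:
    """Petri-net-style rule: true once *all* the given activities have occurred."""
    needed = frozenset(inputs)
    return lambda occurred: needed <= occurred

def a_and_b_but_not_c(occurred: FrozenSet[str]) -> bool:
    """The gate from the text: A and B have occurred, but C has not."""
    return "A" in occurred and "B" in occurred and "C" not in occurred

def simulate(external: Dict[str, int], rules: Dict[str, Tuple[Gate, int]],
             horizon: int) -> Dict[str, int]:
    """external: activity -> time at which it occurs, driven from outside.
    rules: activity -> (gate, delay); the activity occurs `delay` time steps after
    the first step at which its gate is true. Once scheduled, it is never cancelled."""
    times = dict(external)
    for t in range(horizon + 1):
        changed = True
        while changed:  # let activities with zero delay trigger others in the same step
            changed = False
            occurred = frozenset(a for a, s in times.items() if s <= t)
            for activity, (gate, delay) in rules.items():
                if activity not in times and gate(occurred):
                    times[activity] = t + delay
                    changed = True
    return times

rules = {"D": (a_and_b_but_not_c, 10), "E": (petri_and("A", "B"), 0)}
print(simulate({"A": 0, "B": 2, "C": 5}, rules, horizon=20))  # D at 12, E at 2
print(simulate({"A": 0, "B": 2, "C": 1}, rules, horizon=20))  # no D, E at 2
```

Only the `petri_and` kind of gate fits the plain Petri net rule; the gate for D needs a ‘not’, which is the kind of extra logical connective asked about above.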

In the full-fledged activity traces, the ‘activity’ boxes also compute functions, whose values flow along the wires and serve as inputs to other boxes. That is, when an activity occurs, it produces an output, which depends on the inputs entering the box along input wires. The output then appears on the wires coming out of that box.

I forget if each activity box can have multiple inputs and multiple outputs, but that’s certainly a natural thing.

The fun part is that one can zoom in on any activity trace, seeing more fine-grained descriptions of the activities. In this more fine-grained description each box turns into a number of boxes connected by wires. And perhaps each wire becomes a number of parallel wires? That would be mathematically natural.

Activity traces give the so-called ‘logical’ description of the complex system being described. There is also a much more complicated ‘physical’ description, saying the exact mechanical functioning of all the parts. These parts are described using ‘plugins’ which need to be carefully described ahead of time—but can then simply be *used* when assembling a complex system.

Our little team is supposed to be designing our own complex systems using operads, but we want to take advantage of the fact that Metron already has this working system, ExAMS. Thus, one thing I’d like to do is understand ExAMS in terms of operads and figure out how to do something exciting and new using this understanding. I was very happy when Tom Mifflin embraced this goal.

Unfortunately there’s no manual for ExAMS: the US government was willing to pay for the creation of this system, but not willing to pay for documentation. Luckily it seems fairly simple, at least the part that I care about. (There are a lot of other views derived from the activity trace, but I don’t need to worry about these.) Also, ExAMS uses some DoDAF standards which I can read about. Furthermore, in some ways it resembles UML and SysML, or more precisely, certain *parts* of these languages.

In particular, the ‘activity diagrams’ in UML are a lot like the activity traces in ExAMS. There’s an activity diagram at the top of this page, and another below, in which time proceeds down the page.

So, I plan to put some time into understanding the underlying math of these diagrams! If you know people who have studied them using ideas from category theory, please tell me.


David Thouless of the University of Washington:

and Duncan Haldane of Princeton University:

They won it for their “theoretical discovery of topological phase transitions and topological phases of matter”, which was later confirmed by many experiments.

Sadly, the world’s reaction was aptly summarized by *Wired* magazine’s headline:

Nobel Prize in Physics Goes to Another Weird Thing Nobody Understands

Journalists worldwide struggled to pronounce ‘topology’, and a member of the Nobel prize committee was reduced to waving around a bagel and a danish to explain what the word means:

That’s fine as far as it goes: I’m all for using food items to explain advanced math! However, it doesn’t explain what Kosterlitz, Thouless and Haldane actually did. I think a 3-minute video with the right animations would make the beauty of their work perfectly clear. I can see it in my head. Alas, I don’t have the skill to make those animations—hence this short article.

I’ll just explain the Kosterlitz–Thouless transition, which is an effect that shows up in thin films of magnetic material. Haldane’s work on magnetic wires is related, but it deserves a separate story.

I’m going to keep this very quick! For more details, try this excellent blog article:

• Brian Skinner, Samuel Beckett’s guide to particles and antiparticles, *Ribbonfarm*, 24 September 2015.

I’m taking all my pictures from there.

Imagine a thin film of stuff where each atom’s spin likes to point in the same direction as its neighbors. Also suppose that each spin must point in the plane of the material.

Your stuff will be happiest when all its spins are lined up, like this:

What does ‘happy’ mean? Physicists often talk this way. It sounds odd, but it means something precise: it means that the energy is low. When your stuff is very cold, its energy will be as low as possible, so the spins will line up.

When you heat up your thin film, it gets a bit more energy, so the spins can do more interesting things.

Here’s one interesting possibility, called a ‘vortex’:

The spins swirl around like the flow of water in a whirlpool. Each spin is *fairly* close to being lined up to its neighbors, except near the middle where they’re doing a terrible job.

The total energy of a vortex is enormous. The reason is not the problem at the middle, which certainly contributes some energy. The reason is that ‘fairly’ close is not good enough. The spins fail to perfectly line up with their neighbors even far away from the middle of this picture. This problem is bad enough to make the energy huge. (In fact, the energy would be *infinite* if our thin film of material went on forever.)

So, even if you heat up your substance, there won’t be enough energy to make many vortices. This made people think vortices were irrelevant.

But there’s another possibility, called an ‘antivortex’:

A single antivortex has a huge energy, just like a vortex. So again, it might seem antivortices are irrelevant if you’re wondering what your stuff will do when it has just a little energy.

But here’s what Kosterlitz and Thouless noticed: the combination of *a vortex together with an antivortex* has much less energy than either one alone! So, when your thin film of stuff is hot enough, the spins will form ‘vortex-antivortex pairs’.
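To see these energy claims in numbers, here is a rough Python sketch, an illustration rather than the calculation Kosterlitz and Thouless did. It assumes the simplest model of such a film, the ‘XY model’ on an L × L square grid, where the energy above the perfectly lined-up state is the sum over neighbouring spins of 1 − cos(angle difference), in units of the coupling strength. The spin patterns are idealised (each arrow points along, or for the antivortex is reflected from, the direction away from the defect) and are not relaxed to minimum energy; the function names are hypothetical. Running it, the single vortex's energy keeps growing as the grid gets bigger, roughly like π times the logarithm of the size, while the energy of a tightly bound vortex-antivortex pair levels off.

```python
import math

def spin_angles(L, defects):
    """Arrow angle at each site (x, y) of an L x L grid, superposing defects.
    Each defect is (cx, cy, charge): charge +1 for a vortex, -1 for an antivortex."""
    return [[sum(q * math.atan2(y - cy, x - cx) for (cx, cy, q) in defects)
             for x in range(L)] for y in range(L)]

def excess_energy(L, defects):
    """Energy above the fully aligned state, in units of the coupling J:
    the sum over neighbouring pairs of 1 - cos(angle difference).
    cos is 2*pi-periodic, so the branch cut of atan2 does no harm."""
    theta = spin_angles(L, defects)
    energy = 0.0
    for y in range(L):
        for x in range(L):
            if x + 1 < L:  # bond to the right-hand neighbour
                energy += 1.0 - math.cos(theta[y][x] - theta[y][x + 1])
            if y + 1 < L:  # bond to the neighbour above
                energy += 1.0 - math.cos(theta[y][x] - theta[y + 1][x])
    return energy

def single_vortex(L):
    c = (L - 1) / 2  # for even L this is the middle of a square of four spins
    return excess_energy(L, [(c, c, +1)])

def vortex_antivortex_pair(L, separation=2):
    c = (L - 1) / 2
    return excess_energy(L, [(c - separation / 2, c, +1),
                             (c + separation / 2, c, -1)])

for L in (16, 32, 64, 128):
    print(L, round(single_vortex(L), 2), round(vortex_antivortex_pair(L), 2))
```

Only angle differences enter the energy, so turning every arrow by the same amount (for instance turning this outward-pointing vortex into the swirling one in the picture) leaves these numbers unchanged.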

Brian Skinner has made a beautiful animation showing how this happens. A vortex-antivortex pair can appear out of nothing:

… and then disappear again!

Thanks to this process, at low temperatures our thin film will contain a dilute ‘gas’ of vortex-antivortex pairs. Each vortex will stick to an antivortex, since it takes a lot of energy to separate them. These vortex-antivortex pairs act a bit like particles: they move around, bump into each other, and so on. But unlike most ordinary particles, they can appear out of nothing, or disappear, in the process shown above!

As you heat up the thin film, you get more and more vortex-antivortex pairs, since there’s more energy available to create them. But here’s the really surprising thing. Kosterlitz and Thouless showed that as you turn up the heat, there’s a certain temperature at which the vortex-antivortex pairs suddenly ‘unbind’ and break apart!

Why? Because at this point, the density of vortex-antivortex pairs is so high, and they’re bumping into each other so much, that we can’t tell which vortex is the partner of which antivortex. All we’ve got is a thick soup of vortices and antivortices!

What’s interesting is that this happens *suddenly* at some particular temperature. It’s a bit like how ice *suddenly* turns into liquid water when it warms above its melting point. A sudden change in behavior like this is called a **phase transition**.

So, the **Kosterlitz–Thouless transition** is the sudden unbinding of the vortex-antivortex pairs as you heat up a thin film of stuff where the spins are confined to a plane and they like to line up.

In fact, the pictures above are relevant to many other situations, like thin films of superconductive materials. So, these too can exhibit a Kosterlitz–Thouless transition. Indeed, the work of Kosterlitz and Thouless was the key that unlocked a treasure room full of strange new states of matter, called ‘topological phases’. But this is another story.

What is the actual definition of a vortex or antivortex? As you march around either one and look at the little arrows, the arrows turn around—one full turn. It’s a **vortex** if when you walk around it clockwise the little arrows make a full turn *clockwise*:

It’s an **antivortex** if when you walk around it clockwise the little arrows make a full turn *counterclockwise*:

Topologists would say the vortex has ‘winding number’ 1, while the antivortex has winding number -1.
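The definition can be turned into a small computation: walk around a closed loop, add up how much the arrow turns between neighbouring points (each step taken the short way round), and divide by one full turn. This Python sketch is only an illustration with hypothetical function names. It walks counterclockwise rather than clockwise; that gives the same winding number, since reversing the walk reverses both the walk's turning and the arrows' turning.

```python
import math

def wrap(angle):
    """Bring an angle difference into the range [-pi, pi]: the short way round."""
    return math.atan2(math.sin(angle), math.cos(angle))

def winding_number(spin_angle, loop):
    """Total turning of the arrows along a closed loop, in full turns."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(loop, loop[1:] + loop[:1]):
        total += wrap(spin_angle(x1, y1) - spin_angle(x0, y0))
    return round(total / (2 * math.pi))

def square_loop(h, steps_per_side=40):
    """Points on the square with corners (+-h, +-h), walked counterclockwise."""
    pts = []
    for k in range(steps_per_side):
        pts.append((h, -h + 2 * h * k / steps_per_side))   # right side, going up
    for k in range(steps_per_side):
        pts.append((h - 2 * h * k / steps_per_side, h))    # top side, going left
    for k in range(steps_per_side):
        pts.append((-h, h - 2 * h * k / steps_per_side))   # left side, going down
    for k in range(steps_per_side):
        pts.append((-h + 2 * h * k / steps_per_side, -h))  # bottom side, going right
    return pts

def vortex(x, y):
    return math.atan2(y, x)

def antivortex(x, y):
    return -math.atan2(y, x)

def pair(x, y):
    # vortex at (-0.5, 0) together with antivortex at (0.5, 0)
    return math.atan2(y, x + 0.5) - math.atan2(y, x - 0.5)

loop = square_loop(2.0)
print(winding_number(vortex, loop), winding_number(antivortex, loop),
      winding_number(pair, loop))  # 1 -1 0
```

A loop drawn around a whole vortex-antivortex pair gives 0, matching the statement that such pairs have winding number 0.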

In the physics, the winding number is very important. Any collection of vortex-antivortex pairs has winding number 0, and Kosterlitz and Thouless showed that situations with winding number 0 are the only ones with small enough energy to be important for a large thin film at rather low temperatures.

Now for the puzzles:

**Puzzle 1:** What’s the mirror image of a vortex? A vortex, or an antivortex?

**Puzzle 2:** What’s the mirror image of an antivortex?

Here are some clues, drawn by the science fiction writer Greg Egan:

and the mathematician Simon Willerton:

To dig a bit deeper, try this:

• The Nobel Prize in Physics 2016, Topological phase transitions and topological phases of matter.

It’s a very well-written summary of what Kosterlitz, Thouless and Haldane did.

Also, check out Simon Burton’s simulation of the system Kosterlitz and Thouless were studying:

In this simulation the spins start out at random and then evolve towards equilibrium at a temperature far below the Kosterlitz–Thouless transition. When equilibrium is reached, we have a gas of vortex-antivortex pairs. Vortices are labeled in blue while antivortices are green (though this is not totally accurate because the lattice is discrete). Burton says that if we raise the temperature to the Kosterlitz–Thouless transition, the movie becomes ‘a big mess’. That’s just what we’d expect as the vortex-antivortex pairs unbind.

I thank Greg Egan, Simon Burton, Brian Skinner, Simon Willerton and Haitao Zhang, whose work made this blog article infinitely better than it otherwise would be.


What’s a ‘system of systems’?

It’s a system made of many disparate parts, each of which is a complex system in its own right. The biosphere is a system of systems. But so far, people usually use this buzzword for large human-engineered systems where the different components are made by different organizations, perhaps over a long period of time, with changing and/or incompatible standards. This makes it impossible to fine-tune everything in a top-down way and have everything fit together seamlessly.

So, systems of systems are inherently messy. And yet we need them.

Metron was applying for a grant from **DARPA**, the Defense Advanced Research Projects Agency, which funds a lot of cutting-edge research for the US military. It may seem surprising that DARPA is explicitly interested in using *category theory* to study systems of systems. But it actually shouldn’t be surprising: their mission is to try many things and find a few that work. They are willing to take risks.

Metron was applying for a grant under a DARPA program run by John S. Paschkewitz, who is interested in

new paradigms and foundational approaches for the design of complex systems and system-of-systems (SoS) architectures.

This program is called **CASCADE**, short for Complex Adaptive System Composition and Design Environment. Here’s the idea:

Complex interconnected systems are increasingly becoming part of everyday life in both military and civilian environments. In the military domain, air-dominance system-of-systems concepts, such as those being developed under DARPA’s SoSITE effort, envision manned and unmanned aircraft linked by networks that seamlessly share data and resources in real time. In civilian settings such as urban “smart cities”, critical infrastructure systems—water, power, transportation, communications and cyber—are similarly integrated within complex networks. Dynamic systems such as these promise capabilities that are greater than the mere sum of their parts, as well as enhanced resilience when challenged by adversaries or natural disasters. But they are difficult to model and cannot be systematically designed using today’s tools, which are simply not up to the task of assessing and predicting the complex interactions among system structures and behaviors that constantly change across time and space.

To overcome this challenge, DARPA has announced the Complex Adaptive System Composition and Design Environment (CASCADE) program. The goal of CASCADE is to advance and exploit novel mathematical techniques able to provide a deeper understanding of system component interactions and a unified view of system behaviors. The program also aims to develop a formal language for composing and designing complex adaptive systems. A special notice announcing a Proposers Day on Dec. 9, 2015, was released today on FedBizOpps here: http://go.usa.gov/cT7uR.

“CASCADE aims to fundamentally change how we design systems for real-time resilient response within dynamic, unexpected environments,” said John Paschkewitz, DARPA program manager. “Existing modeling and design tools invoke static ‘playbook’ concepts that don’t adequately represent the complexity of, say, an airborne system of systems with its constantly changing variables, such as enemy jamming, bad weather, or loss of one or more aircraft. As another example, this program could inform the design of future forward-deployed military surgical capabilities by making sure the functions, structures, behaviors and constraints of the medical system—such as surgeons, helicopters, communication networks, transportation, time, and blood supply—are accurately modeled and understood.”

CASCADE could also help the Department of Defense fulfill its role of providing humanitarian assistance in response to a devastating earthquake, hurricane or other catastrophe, by developing comprehensive response models that account for the many components and interactions inherent in such missions, whether in urban or austere environs.

“We need new design and representation tools to ensure resilience of buildings, electricity, drinking water supply, healthcare, roads and sanitation when disaster strikes,” Paschkewitz said. “CASCADE could help develop models that would provide civil authorities, first responders and assisting military commanders with the sequence and timing of critical actions they need to take for saving lives and restoring critical infrastructure. In the stress following a major disaster, models that could do that would be invaluable.”

The CASCADE program seeks expertise in the following areas:

• Applied mathematics, especially in category theory, algebraic geometry and topology, and sheaf theory

• Operations research, control theory and planning, especially in stochastic and non-linear control

• Modeling and applications responsive to challenges in battlefield medicine logistics and platforms, adaptive logistics, reliability, and maintenance

• Search and rescue platforms and modeling

• Adaptive and resilient urban infrastructure

Metron already designs systems of systems used in Coast Guard search and rescue missions. Their grant proposal was to use category theory and operads to do this better. They needed an academic mathematician as part of their team: that was one of the program’s requirements. So they asked if I was interested.

I had mixed feelings.

On the one hand, I come from a line of peaceniks including Joan Baez, Mimi Fariña, their father the physicist Albert Baez, and my parents. I don’t like how the US government puts so much energy into fighting wars rather than solving our economic, social and environmental problems. It’s interesting that ‘systems of systems engineering’, as a field, is so heavily dominated by the US military. It’s an important subject that could be useful in many ways. We need it for better energy grids, better adaptation to climate change, and so on. I dream of using it to develop ‘ecotechnology’: technology that works *with* nature instead of trying to battle it and defeat it. But it seems the US doesn’t have the money, or the risk-taking spirit, to fund applications of category theory to *those* subjects.

On the other hand, I was attracted by the prospect of using category theory to design complex adaptive systems—and using it not just to tackle foundational issues, but also concrete challenges. I liked the idea of working with a team of people who are more practical than me. In this project, a big part of my job would be to write and publish papers: that’s something I can do. But Metron had other people who would try to create prototypes of software for helping the Coast Guard design search and rescue missions.

So I was torn.

In fact, because of my qualms, I’d already turned down an offer from another company that was writing a proposal for the CASCADE program. But the Metron project seemed slightly more attractive—I’m not sure why, perhaps because it was described to me in a more concrete way. And unlike that other company, Metron has a large existing body of software for evaluating complex systems, which should help me focus my theoretical ideas. The interaction between theory and practice can make theory a lot more interesting.

Something tipped the scales and I said yes. We applied for the grant, and we got it.

And so, an interesting adventure began. It will last for 3 years, and I’ll say more about it soon.


The next step is to ask whether these singularities rob general relativity of its predictive power. The ‘cosmic censorship hypothesis’, proposed by Penrose in 1969, claims they do not.

In this final post I’ll talk about cosmic censorship, and conclude with some big questions… and a place where you can get all these posts in a single file.

To say what we want to rule out, we must first think about what behaviors we consider acceptable. Consider first a black hole formed by the collapse of a star. According to general relativity, matter can fall into this black hole and ‘hit the singularity’ in a finite amount of proper time, but nothing can come out of the singularity.

The time-reversed version of a black hole, called a ‘white hole’, is often considered more disturbing. White holes have never been seen, but they are mathematically valid solutions of Einstein’s equation. In a white hole, matter can come *out* of the singularity, but nothing can fall *in*. Naively, this seems to imply that the future is unpredictable given knowledge of the past. Of course, the same logic applied to black holes would say the past is unpredictable given knowledge of the future.

If white holes are disturbing, perhaps the Big Bang should be more so. In the usual solutions of general relativity describing the Big Bang, *all matter in the universe* comes out of a singularity! More precisely, if one follows any timelike geodesic back into the past, it becomes undefined after a finite amount of proper time. Naively, this may seem a massive violation of predictability: in this scenario, the whole universe ‘sprang out of nothing’ about 14 billion years ago.

However, in all three examples so far—astrophysical black holes, their time-reversed versions and the Big Bang—spacetime is globally hyperbolic. I explained what this means last time. In simple terms, it means we can specify initial data at one moment in time and use the laws of physics to predict the future (and past) throughout all of spacetime. How is this compatible with the naive intuition that a singularity causes a failure of predictability?

For any globally hyperbolic spacetime $M$ one can find a smoothly varying family of Cauchy surfaces $S_t$ ($t \in \mathbb{R}$) such that each point of $M$ lies on exactly one of these surfaces. This amounts to a way of chopping spacetime into ‘slices of space’ for various choices of the ‘time’ parameter $t$. For an astrophysical black hole, the singularity is in the future of all these surfaces. That is, an incomplete timelike or null geodesic must go through all these surfaces before it becomes undefined. Similarly, for a white hole or the Big Bang, the singularity is in the past of all these surfaces. In either case, the singularity cannot interfere with our predictions of what occurs in spacetime.

A more challenging example is posed by the Kerr–Newman solution of Einstein’s equation coupled to the vacuum Maxwell equations. When

$$ Q^2 + (J/M)^2 \le M^2 $$

this solution describes a rotating charged black hole with mass $M$, charge $Q$ and angular momentum $J$ in units where $c = G = 1$. However, an electron violates this inequality. In 1968, Brandon Carter pointed out that if the electron were described by the Kerr–Newman solution, it would have a gyromagnetic ratio of $g = 2$, much closer to the true answer than a classical spinning sphere of charge, which gives $g = 1$. But since

$$ Q^2 + (J/M)^2 > M^2 $$

this solution gives a spacetime that is not globally hyperbolic: it has closed timelike curves! It also contains a ‘naked singularity’. Roughly speaking, this is a singularity that can be seen by arbitrarily faraway observers in a spacetime whose geometry asymptotically approaches that of Minkowski spacetime. The existence of a naked singularity implies a failure of global hyperbolicity.
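To see how badly the electron violates the black-hole inequality $Q^2 + (J/M)^2 \le M^2$, here is a rough numerical check. It is a sketch under our own assumptions, none of which come from the text: rounded SI values for $G$, $c$, the Coulomb constant and $\hbar$; the electron's spin angular momentum taken as $\hbar/2$; and the standard conversions to geometrized units ($M \to GM/c^2$, $Q \to Q\sqrt{G/4\pi\epsilon_0}/c^2$, $a = J/(Mc)$), which turn all three quantities into lengths.

```python
import math

# Rounded SI constants (illustrative inputs, not taken from the text).
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8         # speed of light, m/s
K_E = 8.9875517923e9     # Coulomb constant 1/(4 pi eps0), N m^2 C^-2
HBAR = 1.054571817e-34   # reduced Planck constant, J s

def to_geometrized(mass_kg, charge_c, ang_mom_js):
    """Return (M, Q, a) in meters, with a = J/M, in units where c = G = 1."""
    M = G * mass_kg / C**2
    Q = charge_c * math.sqrt(G * K_E) / C**2
    a = ang_mom_js / (mass_kg * C)
    return M, Q, a

def has_horizon(mass_kg, charge_c, ang_mom_js):
    """True if Q^2 + (J/M)^2 <= M^2, so Kerr-Newman describes a black hole."""
    M, Q, a = to_geometrized(mass_kg, charge_c, ang_mom_js)
    return Q**2 + a**2 <= M**2

# Electron: mass, charge, spin angular momentum hbar/2.
electron = (9.1093837015e-31, 1.602176634e-19, HBAR / 2)
M, Q, a = to_geometrized(*electron)
print(f"M = {M:.3e} m, Q = {Q:.3e} m, a = J/M = {a:.3e} m")
print("electron has a horizon:", has_horizon(*electron))
```

The mass comes out around $10^{-57}$ meters while $J/M$ is around $10^{-13}$ meters, so the inequality fails by dozens of orders of magnitude; a non-rotating, uncharged solar-mass object passes easily.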

The cosmic censorship hypothesis comes in a number of forms. The original version due to Penrose is now called ‘weak cosmic censorship’. It asserts that in a spacetime whose geometry asymptotically approaches that of Minkowski spacetime, gravitational collapse cannot produce a naked singularity.

In 1991, Preskill and Thorne made a bet against Hawking in which they claimed that weak cosmic censorship was false. Hawking conceded this bet in 1997 when a counterexample was found. This features finely-tuned infalling matter poised right on the brink of forming a black hole. It *almost* creates a region from which light cannot escape—but not quite. Instead, it creates a naked singularity!

Given the delicate nature of this construction, Hawking did not give up. Instead he made a second bet, which says that weak cosmic censorship holds ‘generically’, that is, for an open dense set of initial conditions.

In 1999, Christodoulou proved that for spherically symmetric solutions of Einstein’s equation coupled to a massless scalar field, weak cosmic censorship holds generically. While spherical symmetry is a very restrictive assumption, this result is a good example of how, with plenty of work, we can make progress in rigorously settling the questions raised by general relativity.

Indeed, Christodoulou has been a leader in this area. For example, the vacuum Einstein equations have solutions describing gravitational waves, much as the vacuum Maxwell equations have solutions describing electromagnetic waves. However, gravitational waves can actually form black holes when they collide. This raises the question of the stability of Minkowski spacetime. Must sufficiently small perturbations of the Minkowski metric go away in the form of gravitational radiation, or can tiny wrinkles in the fabric of spacetime somehow amplify themselves and cause trouble—perhaps even a singularity? In 1993, together with Klainerman, Christodoulou proved that Minkowski spacetime is indeed stable. Their proof fills a 514-page book.

In 2008, Christodoulou completed an even longer rigorous study of the formation of black holes. This can be seen as a vastly more detailed look at questions which Penrose’s original singularity theorem addressed in a general, preliminary way. Nonetheless, there is much left to be done to understand the behavior of singularities in general relativity.

In this series of posts, we’ve seen that in every major theory of physics, challenging mathematical questions arise from the assumption that spacetime is a continuum. The continuum threatens us with infinities! Do these infinities threaten our ability to extract predictions from these theories—or even our ability to formulate these theories in a precise way?

We can answer these questions, but only with hard work. Is this a sign that we are somehow on the wrong track? Is the continuum as we understand it only an approximation to some deeper model of spacetime? Only time will tell. Nature is providing us with plenty of clues, but it will take patience to read them correctly.

To delve deeper into singularities and cosmic censorship, try this delightful book, which is free online:

• John Earman, *Bangs, Crunches, Whimpers and Shrieks: Singularities and Acausalities in Relativistic Spacetimes*, Oxford U. Press, Oxford, 1993.

To read this whole series of posts in one place, with lots more references and links, see:

• John Baez, Struggles with the continuum.


In general relativity, infinities coming from the continuum nature of spacetime are deeply connected to its most dramatic successful predictions: black holes and the Big Bang. In this theory, the density of the Universe approaches infinity as we go back in time toward the Big Bang, and the density of a star approaches infinity as it collapses to form a black hole. Thus we might say that instead of struggling against infinities, general relativity *accepts* them and has learned to live with them.

General relativity does not take quantum mechanics into account, so the story is not yet over. Many physicists hope that quantum gravity will eventually save physics from its struggles with the continuum! Since quantum gravity is far from being understood, this remains just a hope. This hope has motivated a profusion of new ideas on spacetime: too many to survey here. Instead, I’ll focus on the humbler issue of how singularities arise in general relativity, and why they might not rob this theory of its predictive power.

General relativity says that spacetime is a 4-dimensional Lorentzian manifold. Thus, it can be covered by patches equipped with coordinates, so that in each patch we can describe points by lists of four numbers $x^\mu$ ($\mu = 0, 1, 2, 3$). Any curve $\gamma(s)$ going through a point then has a tangent vector $v$ whose components are $v^\mu = d\gamma^\mu(s)/ds$. Furthermore, given two tangent vectors $v, w$ at the same point we can take their inner product

$$ g(v,w) = g_{\mu\nu} v^\mu w^\nu $$

where as usual we sum over repeated indices, and $g_{\mu\nu}$ is a $4 \times 4$ matrix called the metric, depending smoothly on the point. We require that at any point we can find some coordinate system where this matrix takes the usual Minkowski form:

$$ g = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

However, as soon as we move away from our chosen point, the form of the matrix in these particular coordinates may change.
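As a small illustration of the inner product $g(v,w) = g_{\mu\nu} v^\mu w^\nu$ at a point where the metric has the Minkowski form, here is a sketch in plain Python. It assumes coordinates $(t, x, y, z)$, units with $c = 1$, and the sign convention used in this post (timelike vectors have $g(v,v) < 0$); the function names are our own.

```python
# Minkowski metric in coordinates (t, x, y, z), signature (-, +, +, +), c = 1.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def inner(g, v, w):
    """g(v, w) = g_{mu nu} v^mu w^nu, summing over the repeated indices."""
    return sum(g[mu][nu] * v[mu] * w[nu] for mu in range(4) for nu in range(4))

def classify(g, v, tol=1e-12):
    """Classify a nonzero tangent vector by the sign of g(v, v)."""
    q = inner(g, v, v)
    if q < -tol:
        return "timelike"   # slower than light
    if q > tol:
        return "spacelike"
    return "null"           # moving at the speed of light

print(classify(ETA, [1, 0.5, 0, 0]))  # → timelike
print(classify(ETA, [1, 1, 0, 0]))    # → null
print(classify(ETA, [0, 1, 0, 0]))    # → spacelike
```

Away from the chosen point the matrix `g` would change, so in a curved spacetime one would pass a different `g` at each point.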

General relativity says how the metric is affected by matter. It does this in a single equation, Einstein’s equation, which relates the ‘curvature’ of the metric at any point to the flow of energy-momentum through that point. To define the curvature, we need some differential geometry. Indeed, Einstein had to learn this subject from his mathematician friend Marcel Grossmann in order to write down his equation. Here I will take some shortcuts and try to explain Einstein’s equation with a bare minimum of differential geometry. For how this approach connects to the full story, and a list of resources for further study of general relativity, see:

• John Baez and Emory Bunn, The meaning of Einstein’s equation.

Consider a small round ball of test particles that are initially all at rest relative to each other. This requires a bit of explanation. First, because spacetime is curved, it only looks like Minkowski spacetime (the world of special relativity) in the limit of very small regions. The usual concepts of ‘round’ and ‘at rest relative to each other’ only make sense in this limit. Thus, all our forthcoming statements are precise only in this limit, which of course relies on the fact that spacetime is a continuum.

Second, a test particle is a classical point particle with so little mass that while it is affected by gravity, its effects on the geometry of spacetime are negligible. We assume our test particles are affected only by gravity, no other forces. In general relativity this means that they move along timelike geodesics. Roughly speaking, these are paths that go slower than light and bend as little as possible. We can make this precise without much work.

A path in *space* is a geodesic if, when we slightly vary any small portion of it, it can only become longer. However, a path $\gamma(s)$ in *spacetime* traced out by a particle moving slower than light must be ‘timelike’, meaning that its tangent vector $v = \gamma'(s)$ satisfies $g(v,v) < 0$. We define the proper time along such a path from $s = s_0$ to $s = s_1$ to be

$$ \tau = \int_{s_0}^{s_1} \sqrt{-g(\gamma'(s), \gamma'(s))} \, ds $$

This is the time ticked out by a clock moving along that path. A timelike path is a geodesic if the proper time can only *decrease* when we slightly vary any small portion of it. Particle physicists prefer the opposite sign convention for the metric, and then we do not need the minus sign under the square root. But the fact remains the same: timelike geodesics locally maximize the proper time.
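The statement that timelike geodesics locally maximize proper time can be checked numerically in the simplest case, flat spacetime, where the geodesics are straight worldlines. The sketch below assumes the signature $(-,+,+,+)$ with $c = 1$ and approximates $\tau = \int \sqrt{-g(\gamma'(s),\gamma'(s))}\, ds$ by the midpoint rule; the two example paths are hypothetical choices of ours, both running from $(t, x) = (0, 0)$ to $(2, 0)$.

```python
import math

def proper_time(velocity, s0, s1, n=10_000):
    """Approximate tau = integral of sqrt(-g(gamma'(s), gamma'(s))) ds from s0
    to s1 with the midpoint rule. Flat spacetime, signature (-, +, +, +), c = 1;
    velocity(s) returns the components of gamma'(s) in (t, x, y, z)."""
    h = (s1 - s0) / n
    tau = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * h
        vt, vx, vy, vz = velocity(s)
        q = -vt**2 + vx**2 + vy**2 + vz**2   # g(v, v)
        if q >= 0:
            raise ValueError(f"path is not timelike at s = {s}")
        tau += math.sqrt(-q) * h
    return tau

def stay_home(s):
    # Straight (geodesic) worldline gamma(s) = (s, 0, 0, 0).
    return (1.0, 0.0, 0.0, 0.0)

def round_trip(s):
    # gamma(s) = (s, 0.2 sin(pi s), 0, 0): out and back, top speed 0.2*pi < 1.
    return (1.0, 0.2 * math.pi * math.cos(math.pi * s), 0.0, 0.0)

print(proper_time(stay_home, 0.0, 2.0))
print(proper_time(round_trip, 0.0, 2.0))
```

The clock that stays home ticks off 2 units of proper time, while the traveling clock ticks off noticeably less: the familiar twin paradox, seen as a maximization property of geodesics.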

Actual particles are not test particles! First, the concept of test particle does not take quantum theory into account. Second, all known particles are affected by forces other than gravity. Third, any actual particle affects the geometry of the spacetime it inhabits. Test particles are just a mathematical trick for studying the geometry of spacetime. Still, a sufficiently light particle that is affected very little by forces other than gravity can be approximated by a test particle. For example, an artificial satellite moving through the Solar System behaves like a test particle if we ignore the solar wind, the radiation pressure of the Sun, and so on.

If we start with a small round ball consisting of many test particles that are initially all at rest relative to each other, to first order in time it will not change shape or size. However, to second order in time it can expand or shrink, due to the curvature of spacetime. It may also be stretched or squashed, becoming an ellipsoid. This should not be too surprising, because any linear transformation applied to a ball gives an ellipsoid.

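Let $V(t)$ be the volume of the ball after a time $t$ has elapsed, where time is measured by a clock attached to the particle at the center of the ball. Then in units where $c = 8\pi G = 1$, Einstein’s equation says:

$$ \left. \frac{\ddot V}{V} \right|_{t=0} = -\frac{1}{2} \left( \begin{array}{l} \text{flow of } t\text{-momentum in the } t \text{ direction} \; + \\ \text{flow of } x\text{-momentum in the } x \text{ direction} \; + \\ \text{flow of } y\text{-momentum in the } y \text{ direction} \; + \\ \text{flow of } z\text{-momentum in the } z \text{ direction} \end{array} \right) $$

These flows are measured at the center of the ball at time zero, and the coordinates used here take advantage of the fact that to first order, at any one point, spacetime looks like Minkowski spacetime.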

The flows in Einstein’s equation are the diagonal components of a $4 \times 4$ matrix $T$ called the ‘stress-energy tensor’. The components $T_{\mu\nu}$ of this matrix say how much momentum in the $\mu$ direction is flowing in the $\nu$ direction through a given point of spacetime. Here $\mu$ and $\nu$ range from $0$ to $3$, corresponding to the $t, x, y$ and $z$ coordinates.

For example, $T_{00}$ is the flow of $t$-momentum in the $t$-direction. This is just the energy density, usually denoted $\rho$. The flow of $x$-momentum in the $x$-direction is the pressure in the $x$ direction, denoted $P_x$, and similarly for $y$ and $z$. You may be more familiar with direction-independent pressures, but it is easy to manufacture a situation where the pressure depends on the direction: just squeeze a book between your hands!

Thus, Einstein’s equation says

$$ \left. \frac{\ddot V}{V} \right|_{t=0} = -\frac{1}{2} \left( \rho + P_x + P_y + P_z \right) $$

It follows that positive energy density and positive pressure both curve spacetime in a way that makes a freely falling ball of point particles tend to shrink. Since $E = mc^2$ and we are working in units where $c = 1$, ordinary mass density counts as a form of energy density. Thus a massive object will make a swarm of freely falling particles at rest around it start to shrink. In short, *gravity attracts*.
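The formula $\ddot V / V|_{t=0} = -\frac{1}{2}(\rho + P_x + P_y + P_z)$ is simple enough to evaluate directly. This sketch uses the post's units $c = 8\pi G = 1$; the numbers for each kind of matter are illustrative choices of ours, except that radiation's pressure of $\rho/3$ in each direction is the standard equation of state.

```python
def ball_volume_acceleration(rho, p_x, p_y, p_z):
    """(d^2V/dt^2) / V at t = 0 for a small ball of test particles initially at
    rest, in units c = 8*pi*G = 1. Negative means the ball starts to shrink."""
    return -0.5 * (rho + p_x + p_y + p_z)

examples = {
    "dust": (1.0, 0.0, 0.0, 0.0),                 # energy density, no pressure
    "radiation": (1.0, 1 / 3, 1 / 3, 1 / 3),      # P = rho/3 in each direction
    "squeezed book": (1.0, 0.2, 0.0, 0.0),        # pressure along x only (made-up numbers)
}
for name, args in examples.items():
    print(f"{name:14s} {ball_volume_acceleration(*args):+.3f}")
```

Adding positive pressure, even in a single direction, only makes the result more negative, which is the sense in which pressure also attracts.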

Already from this, gravity seems dangerously inclined to create singularities. Suppose that instead of test particles we start with a stationary cloud of ‘dust’: a fluid of particles having nonzero energy density but no pressure, moving under the influence of gravity alone. The dust particles will still follow geodesics, but they will affect the geometry of spacetime. Their energy density will make the ball start to shrink. As it does, the energy density will increase, so the ball will tend to shrink ever faster, approaching infinite density in a finite amount of time. This in turn makes the curvature of spacetime become infinite in a finite amount of time. The result is a ‘singularity’.
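The runaway described here can be mimicked with a Newtonian stand-in, which is our own simplification rather than the general-relativistic calculation: a uniform, pressureless ball of mass $M$ released from rest at radius $R_0$ has a radius obeying $\ddot R = -GM/R^2$, and its density $M / (\tfrac{4}{3}\pi R^3)$ blows up at the finite free-fall time $t = \tfrac{\pi}{2}\sqrt{R_0^3/(2GM)}$. The sketch integrates this with a leapfrog scheme whose step shrinks with the local free-fall time; the units $G = M = R_0 = 1$ and the stopping radius are hypothetical choices.

```python
import math

def collapse_time(R0=1.0, M=1.0, G=1.0, r_stop=1e-6, eta=1e-3):
    """Integrate R'' = -G M / R^2 from rest at R0 (kick-drift-kick leapfrog)
    until R < r_stop * R0. Returns (elapsed time, density at that moment)."""
    R, v, t = R0, 0.0, 0.0
    while R > r_stop * R0:
        # Step proportional to the local free-fall time sqrt(R^3 / GM), so each
        # step moves R by only a small fraction of itself and R stays positive.
        dt = eta * math.sqrt(R**3 / (G * M))
        v -= 0.5 * dt * G * M / R**2   # half kick
        R += dt * v                    # drift
        v -= 0.5 * dt * G * M / R**2   # half kick
        t += dt
    density = M / (4.0 / 3.0 * math.pi * R**3)
    return t, density

t, density = collapse_time()
t_exact = (math.pi / 2) * math.sqrt(1.0 / 2.0)
print(f"numerical collapse time {t:.5f}, analytic {t_exact:.5f}, density {density:.2e}")
```

Shrinking the stopping radius further barely changes the elapsed time while the density keeps growing without bound: infinite density is reached in a finite time.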

In reality, matter is affected by forces other than gravity. Repulsive forces may prevent gravitational collapse. However, this repulsion creates pressure, and Einstein’s equation says that pressure also creates gravitational attraction! In some circumstances this can overwhelm whatever repulsive forces are present. Then the matter collapses, leading to a singularity—at least according to general relativity.

When a star more than 8 times the mass of our Sun runs out of fuel, its core suddenly collapses. The surface is thrown off explosively in an event called a supernova. Most of the energy, the equivalent of thousands of Earth masses, is released in a ten-second burst of neutrinos, formed as a byproduct when protons and electrons combine to form neutrons. If the star’s mass is below 20 times that of our Sun, its core crushes down to a large ball of neutrons with a crust of iron and other elements: a neutron star.

However, this ball is unstable if its mass exceeds the Tolman–Oppenheimer–Volkoff limit, somewhere between 1.5 and 3 times that of our Sun. Above this limit, gravity overwhelms the repulsive forces that hold up the neutron star. And indeed, no neutron stars heavier than 3 solar masses have been observed. Thus, for very heavy stars, the endpoint of collapse is not a neutron star, but something else: a *black hole*, an object that bends spacetime so much even light cannot escape.

If general relativity is correct, a black hole contains a singularity. Many physicists expect that general relativity breaks down inside a black hole, perhaps because of quantum effects that become important in strong gravitational fields. The singularity is considered a strong hint that this breakdown occurs. If so, the singularity may be a purely theoretical entity, not a real-world phenomenon. Nonetheless, everything we have observed about black holes matches what general relativity predicts. Thus, unlike all the other theories we have discussed, general relativity predicts infinities that are connected to striking phenomena that are *actually observed*.

The Tolman–Oppenheimer–Volkoff limit is not precisely known, because it depends on properties of nuclear matter that are not well understood. However, there are theorems that say singularities *must* occur in general relativity under certain conditions.

One of the first was proved by Raychaudhuri and Komar in the mid-1950s. It applies only to ‘dust’, and indeed it is a precise version of our verbal argument above. It introduced Raychaudhuri’s equation, which is the geometrical way of thinking about spacetime curvature as affecting the motion of a small ball of test particles. It shows that under suitable conditions, the energy density must approach infinity in a finite amount of time along the path traced out by a dust particle.

The first required condition is that the flow of dust be initially converging, not expanding. The second condition, not mentioned in our verbal argument, is that the dust be ‘irrotational’, not swirling around. The third condition is that the dust particles be affected only by gravity, so that they move along geodesics. Due to the last two conditions, the Raychaudhuri–Komar theorem does not apply to collapsing stars.
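Here is a rough numerical sketch of the focusing argument, under simplifying assumptions of our own. For irrotational dust moving on geodesics, with shear also set to zero, Raychaudhuri’s equation for the expansion $\theta = \dot V / V$ reduces to $\dot\theta = -\theta^2/3 - \rho/2$ in the post’s units $c = 8\pi G = 1$, and conservation of dust gives $\dot\rho = -\theta\rho$. At $\theta = 0$ this reproduces $\ddot V/V = -\rho/2$. Since $\rho \ge 0$ we have $\dot\theta \le -\theta^2/3$, so an initially converging flow ($\theta_0 < 0$) must reach $\theta = -\infty$, and hence infinite density, within proper time $3/|\theta_0|$.

```python
import math

def rhs(theta, rho):
    """Shear-free, rotation-free dust in units c = 8*pi*G = 1:
    d(theta)/d(tau) = -theta^2/3 - rho/2,  d(rho)/d(tau) = -theta * rho."""
    return -theta**2 / 3.0 - rho / 2.0, -theta * rho

def focusing_time(theta0, rho0, eta=1e-2, theta_stop=-1e6):
    """RK4 integration from an initially converging flow (theta0 < 0) until
    theta < theta_stop; returns the elapsed proper time."""
    if theta0 >= 0 or rho0 < 0:
        raise ValueError("need theta0 < 0 (converging) and rho0 >= 0")
    theta, rho, tau = theta0, rho0, 0.0
    while theta > theta_stop:
        # Shrink the step as the expansion rate or density blows up.
        h = eta / max(1.0, abs(theta), math.sqrt(rho))
        k1 = rhs(theta, rho)
        k2 = rhs(theta + 0.5 * h * k1[0], rho + 0.5 * h * k1[1])
        k3 = rhs(theta + 0.5 * h * k2[0], rho + 0.5 * h * k2[1])
        k4 = rhs(theta + h * k3[0], rho + h * k3[1])
        theta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        rho += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        tau += h
    return tau

print(focusing_time(-1.0, 0.0))   # no dust: blows up right at the bound 3/|theta0| = 3
print(focusing_time(-1.0, 0.3))   # dust makes the blow-up happen sooner
```

Setting the shear to zero only weakens the effect: in the full equation a nonzero shear adds another negative term, so the bound $3/|\theta_0|$ still holds.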

The more modern singularity theorems eliminate these conditions. But they do so at a price: they require a more subtle concept of singularity! There are various possible ways to define this concept. They’re all a bit tricky, because a singularity is not a point or region in spacetime.

For our present purposes, we can define a singularity to be an ‘incomplete timelike or null geodesic’. As already explained, a timelike geodesic is the kind of path traced out by a test particle moving slower than light. Similarly, a null geodesic is the kind of path traced out by a test particle moving at the speed of light. We say a geodesic is ‘incomplete’ if it ceases to be well-defined after a finite amount of time. For example, general relativity says a test particle falling into a black hole follows an incomplete geodesic. In a rough-and-ready way, people say the particle ‘hits the singularity’. But the singularity is not a place in spacetime. What we really mean is that the particle’s path becomes undefined after a finite amount of time.

We need to be a bit careful about what we mean by ‘time’ here. For test particles moving slower than light this is easy, since we can parametrize a timelike geodesic by proper time. However, the tangent vector $v$ of a null geodesic has $g(v,v) = 0$, so a particle moving along a null geodesic does not experience any passage of proper time. Still, any geodesic, even a null one, has a family of preferred parametrizations. These differ only by changes of variable like this: $s \mapsto as + b$ with $a \neq 0$. By ‘time’ we really mean the variable $s$ in any of these preferred parametrizations. Thus, if our spacetime is some Lorentzian manifold $M$, we say a geodesic $\gamma \colon [s_0, s_1] \to M$ is incomplete if, parametrized in one of these preferred ways, it cannot be extended to a strictly longer interval.

The first modern singularity theorem was proved by Penrose in 1965. It says that if space is infinite in extent, and light becomes trapped inside some bounded region, and no exotic matter is present to save the day, either a singularity or something even more bizarre must occur. This theorem applies to collapsing stars. When a star of sufficient mass collapses, general relativity says that its gravity becomes so strong that light becomes trapped inside some bounded region. We can then use Penrose’s theorem to analyze the possibilities.

Shortly thereafter Hawking proved a second singularity theorem, which applies to the Big Bang. It says that if space is finite in extent, and no exotic matter is present, generically either a singularity or something even more bizarre must occur. The singularity here could be either a Big Bang in the past, a Big Crunch in the future, both—or possibly something else. Hawking also proved a version of his theorem that applies to certain Lorentzian manifolds where space is infinite in extent, as seems to be the case in our Universe. This version requires extra conditions.

There are some undefined phrases in this summary of the Penrose–Hawking singularity theorems, most notably these:

• ‘exotic matter’

• ‘singularity’

• ‘something even more bizarre’.

So, let me say a bit about each.

These singularity theorems precisely specify what is meant by ‘exotic matter’. This is matter for which

$$ \rho + P_x + P_y + P_z < 0 $$

at some point, in some coordinate system. By Einstein’s equation, this would make a small ball of freely falling test particles tend to *expand*. In other words, exotic matter would create a repulsive gravitational field. No matter of this sort has ever been found; the matter we know obeys the so-called ‘dominant energy condition’

$$ -\rho \le P_x, P_y, P_z \le \rho $$

The Penrose–Hawking singularity theorems also say what counts as ‘something even more bizarre’. An example would be a closed timelike curve. A particle following such a path would move slower than light yet eventually reach the same point where it started—and not just the same point in space, but the same point in *spacetime!* If you could do this, perhaps you could wait, see if it would rain tomorrow, and then go back and decide whether to buy an umbrella today. There are certainly solutions of Einstein’s equation with closed timelike curves. The first interesting one was found by Einstein’s friend Gödel in 1949, as part of an attempt to probe the nature of time. However, closed timelike curves are generally considered less plausible than singularities.

In the Penrose–Hawking singularity theorems, ‘something even more bizarre’ means that spacetime is not ‘globally hyperbolic’. To understand this, we need to think about when we can predict the future or past given initial data. When studying field equations like Maxwell’s theory of electromagnetism or Einstein’s theory of gravity, physicists like to specify initial data on space at a given moment of time. However, in general relativity there is considerable freedom in how we choose a slice of spacetime and call it ‘space’. What should we require? For starters, we want a 3-dimensional submanifold $S$ of spacetime that is ‘spacelike’: every vector $v$ tangent to $S$ should have $g(v,v) > 0$. However, we also want any timelike or null curve to hit $S$ exactly once. A spacelike surface with this property is called a Cauchy surface, and a Lorentzian manifold containing a Cauchy surface is said to be globally hyperbolic. There are many theorems justifying the importance of this concept. Global hyperbolicity excludes closed timelike curves, but also other bizarre behavior.

By now the original singularity theorems have been greatly generalized and clarified. Hawking and Penrose gave a unified treatment of both theorems in 1970. The 1973 textbook by Hawking and Ellis gives a systematic introduction to this subject. Hawking gave an elegant informal overview of the key ideas in 1994, and a paper by Garfinkle and Senovilla reviews the subject and its history up to 2015.

If we accept that general relativity really predicts the existence of singularities in physically realistic situations, the next step is to ask whether they rob general relativity of its predictive power. I’ll talk about that next time!


I concluded with a famous example: the magnetic moment of the electron. With a truly heroic computation, physicists have used QED to compute this quantity up to order $\alpha^5$. If we also take other Standard Model effects into account we get agreement to roughly one part in $10^{12}$.

However, if we continue adding up terms in this power series, there is no guarantee that the answer converges. Indeed, in 1952 Freeman Dyson gave a heuristic argument that makes physicists expect that the series *diverges*, along with most other power series in QED!

The argument goes as follows. If these power series converged for small positive $\alpha$, they would have a nonzero radius of convergence, so they would also converge for small negative $\alpha$. Thus, QED would make sense for small negative values of $\alpha$, which correspond to *imaginary* values of the electron’s charge. If the electron had an imaginary charge, electrons would attract each other electrostatically, since the usual repulsive force between them is proportional to $e^2$. Thus, if the power series converged, we would have a theory like QED for electrons that attract rather than repel each other.

However, there is a good reason to believe that QED cannot make sense for electrons that attract. The reason is that it describes a world where the vacuum is unstable. That is, there would be states with arbitrarily large negative energy containing many electrons and positrons. Thus, we expect that the vacuum could spontaneously turn into electrons and positrons together with photons (to conserve energy). Of course, this is not a rigorous proof that the power series in QED diverge: just an argument that it would be strange if they did not.

To see why electrons that attract could have arbitrarily large negative energy, consider a state $\Psi$ with a large number $N$ of such electrons inside a ball of radius $R$. We require that these electrons have small momenta, so that nonrelativistic quantum mechanics gives a good approximation to the situation. Since its momentum is small, the kinetic energy of each electron is a small fraction of its rest energy $mc^2$. If we let $\langle \Psi, E \Psi \rangle$ be the expected value of the total rest energy and kinetic energy of all the electrons, it follows that $\langle \Psi, E \Psi \rangle$ is approximately proportional to $N$.

The Pauli exclusion principle puts a limit on how many electrons with momentum below some bound can fit inside a ball of radius $R$. This number is asymptotically proportional to the volume of the ball. Thus, we can assume $N$ is approximately proportional to $R^3$. It follows that $\langle \Psi, E \Psi \rangle$ is approximately proportional to $R^3$.

There is also the negative potential energy to consider. Let $V$ be the operator for potential energy. Since we have $N$ electrons attracted by a $1/r$ potential, and each pair contributes to the potential energy, we see that $\langle \Psi, V \Psi \rangle$ is approximately proportional to $-N^2 R^{-1}$, or $-R^5$. Since $R^5$ grows faster than $R^3$, we can make the expected energy $\langle \Psi, (E + V) \Psi \rangle$ arbitrarily large and negative as $N, R \to \infty$.
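The scaling argument can be encoded in a few lines. This sketch only captures the proportionalities stated in the argument: rest plus kinetic energy $\approx aN$ and attractive pair energy $\approx -bN^2/R$, with $N \approx R^3$. The constants $a, b > 0$ are hypothetical placeholders, not physical values.

```python
def expected_energy(R, a=1.0, b=1.0):
    """Scaling model of <Psi, (E + V) Psi> for N ~ R^3 attracting electrons.
    a * N       : rest plus kinetic energy, proportional to N ~ R^3
    -b * N^2 / R: attractive pair energy, ~ N^2 pairs at separation ~ R, so ~ -R^5
    a and b are hypothetical positive constants."""
    N = R**3
    return a * N - b * N**2 / R

# Even a very weak attraction (small b) eventually wins.
for R in (1.0, 10.0, 100.0, 1000.0):
    print(R, expected_energy(R, a=1.0, b=1e-3))
```

Setting $aR^3 = bR^5$ shows the total changes sign at $R = \sqrt{a/b}$; beyond that the $-R^5$ term dominates and the energy falls without bound, however small $b$ is.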

Note the interesting contrast between this result and some previous ones we have seen. In Newtonian mechanics, the energy of particles attracting each other with a $1/r$ potential is unbounded below. In quantum mechanics, thanks to the uncertainty principle, the energy is bounded below for any fixed number of particles. However, quantum field theory allows for the creation of particles, and this changes everything! Dyson’s disaster arises because the vacuum can turn into a state with *arbitrarily large numbers* of electrons and positrons. This disaster only occurs in an imaginary world where $\alpha$ is negative, but it may be enough to prevent the power series in QED from having a nonzero radius of convergence.

We are left with a puzzle: how can perturbative QED work so well in practice, if the power series in QED diverge?

Much is known about this puzzle. There is an extensive theory of ‘Borel summation’, which allows one to extract well-defined answers from certain divergent power series. For example, consider a particle of mass on a line in a potential

When this potential is bounded below, but when it is not: classically, it describes a particle that can shoot to infinity in a finite time. Let be the quantum Hamiltonian for this particle, where is the usual operator for the kinetic energy and is the operator for potential energy. When the Hamiltonian is essentially self-adjoint on the set of smooth wavefunctions that vanish outside a bounded interval. This means that the theory makes sense. Moreover, in this case has a ‘ground state’: a state whose expected energy is as low as possible. Call this expected energy One can show that depends smoothly on for and one can write down a Taylor series for

On the other hand, when $\beta < 0$ the Hamiltonian is *not* essentially self-adjoint. This means that the quantum mechanics of a particle in this potential is ill-behaved when $\beta < 0$. Heuristically speaking, the problem is that such a particle could tunnel through the barrier given by the local maxima of $V$ and shoot off to infinity in a finite time.

This situation is similar to Dyson’s disaster, since we have a theory that is well-behaved for $\beta \ge 0$ and ill-behaved for $\beta < 0$. As before, the bad behavior seems to arise from our ability to convert an infinite amount of potential energy into other forms of energy. However, in this simpler situation one can *prove* that the Taylor series for $E(\beta)$ does not converge. Barry Simon did this around 1969. Moreover, one can prove that Borel summation, applied to this Taylor series, gives the correct value of $E(\beta)$ for $\beta \ge 0$. The same is known to be true for certain quantum field theories. Analyzing these examples, one can see why summing the first few terms of a power series can give a good approximation to the correct answer even though the series diverges. The terms in the series get smaller and smaller for a while, but eventually they become huge.
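The behavior described in the last two sentences can be seen in a simpler stand-in. The coefficients of the series for $E(\beta)$ are tedious to generate, so the sketch below instead uses Euler’s series $\sum_n (-1)^n n!\, x^n$, a standard textbook example of a divergent but Borel-summable series, chosen here purely as an illustration. Its Borel transform is $\sum_n (-t)^n = 1/(1+t)$, so its Borel sum is $\int_0^\infty e^{-t}/(1+xt)\,dt$. At $x = 0.1$ the terms shrink until $n \approx 1/x = 10$ and then blow up, and the best partial sums land close to the Borel sum.

```python
import math


def euler_term(n: int, x: float) -> float:
    """n-th term (-1)^n n! x^n of Euler's divergent series."""
    return (-1) ** n * math.factorial(n) * x ** n


def partial_sum(n_max: int, x: float) -> float:
    """Sum of the terms with n = 0, 1, ..., n_max."""
    return sum(euler_term(n, x) for n in range(n_max + 1))


def simpson(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def borel_sum(x: float, t_max: float = 60.0) -> float:
    """Borel sum of Euler's series: integral of e^(-t) / (1 + x t) for t >= 0.

    The tail beyond t_max is smaller than e^(-t_max) and is dropped.
    """
    return simpson(lambda t: math.exp(-t) / (1 + x * t), 0.0, t_max)


if __name__ == "__main__":
    x = 0.1
    exact = borel_sum(x)
    print(f"Borel sum at x = {x}: {exact:.6f}")
    for n_max in (2, 5, 9, 10, 15, 20, 30):
        err = abs(partial_sum(n_max, x) - exact)
        print(f"terms up to n = {n_max:2d}: |term| = {abs(euler_term(n_max, x)):.2e}, error = {err:.2e}")
```

Truncating near the smallest term is the usual practical recipe; adding more terms past that point makes the approximation worse, not better.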

Unfortunately, nobody has been able to carry out this kind of analysis for quantum electrodynamics. In fact, the current conventional wisdom is that this theory is inconsistent, due to problems at very short distance scales. In our discussion so far, we summed over Feynman diagrams with $\le n$ vertices to get the first $n$ terms of power series for answers to physical questions. However, one can also sum over all diagrams with $\le n$ loops. This more sophisticated approach to renormalization, which sums over infinitely many diagrams, may dig a bit deeper into the problems faced by quantum field theories.

If we use this alternate approach for QED we find something surprising. Recall that in renormalization we impose a momentum cutoff $\Lambda$, essentially ignoring waves of wavelength less than $\hbar/\Lambda$, and use this to work out a relation between the electron’s bare charge $e_\mathrm{bare}(\Lambda)$ and its renormalized charge $e_\mathrm{ren}$. We try to choose $e_\mathrm{bare}(\Lambda)$ so that $e_\mathrm{ren}$ equals the electron’s experimentally observed charge $e$. If we sum over Feynman diagrams with $\le n$ vertices this is always possible. But if we sum over Feynman diagrams with at most one loop, it ceases to be possible when $\Lambda$ reaches a certain very large value, namely

$$\Lambda = \exp\left(\frac{3\pi}{2\alpha} + \frac{5}{6}\right) m_e c \approx e^{647} m_e c$$

According to this one-loop calculation, the electron’s bare charge becomes *infinite* at this point! This value of $\Lambda$ is known as a ‘Landau pole’, since it was first noticed in about 1954 by Lev Landau and his colleagues.
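To get a feel for the size of this number, the sketch below evaluates the one-loop formula $\Lambda = \exp(3\pi/2\alpha + 5/6)\, m_e c$ as an energy $\Lambda c$ in electron volts, working in logarithms. The constants are rounded CODATA values supplied here; this only reproduces the arithmetic, not the one-loop calculation itself.

```python
import math

ALPHA = 1 / 137.035999   # fine structure constant (rounded CODATA value)
ME_C2_EV = 0.51099895e6  # electron rest energy m_e c^2, in eV


def landau_exponent(alpha: float = ALPHA) -> float:
    """Exponent in the one-loop formula Lambda = exp(3*pi/(2*alpha) + 5/6) * m_e * c."""
    return 3 * math.pi / (2 * alpha) + 5 / 6


def log10_landau_energy_ev(alpha: float = ALPHA) -> float:
    """log10 of Lambda * c in eV, computed from logs to keep the numbers readable."""
    return math.log10(ME_C2_EV) + landau_exponent(alpha) / math.log(10)


if __name__ == "__main__":
    print(f"exponent: {landau_exponent():.2f}")  # about 646.6, i.e. roughly e^647
    print(f"Lambda * c is about 10^{log10_landau_energy_ev():.1f} eV")
```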

What is the meaning of the Landau pole? We said that, poetically speaking, the bare charge of the electron is the charge we would see if we could strip off the electron’s virtual particle cloud. A somewhat more precise statement is that $e_\mathrm{bare}(\Lambda)$ is the charge we would see if we collided two electrons head-on with a momentum on the order of $\Lambda$. In this collision, there is a good chance that the electrons would come within a distance of $\hbar/\Lambda$ from each other. The larger $\Lambda$ is, the smaller this distance is, and the more we penetrate past the effects of the virtual particle cloud, whose polarization ‘shields’ the electron’s charge. Thus, the larger $\Lambda$ is, the larger $e_\mathrm{bare}(\Lambda)$ becomes.

So far, all this makes good sense: physicists have done experiments to actually measure this effect. The problem is that according to a one-loop calculation, $e_\mathrm{bare}(\Lambda)$ becomes infinite when $\Lambda$ reaches a certain huge value.

Of course, summing only over diagrams with at most one loop is not definitive. Physicists have repeated the calculation summing over diagrams with $\le 2$ loops, and again found a Landau pole. But again, this is not definitive. Nobody knows what will happen as we consider diagrams with more and more loops. Moreover, the distance corresponding to the Landau pole is absurdly small! For the one-loop calculation quoted above, this distance is about

$$\frac{\hbar}{\Lambda} = e^{-\left(\frac{3\pi}{2\alpha} + \frac{5}{6}\right)} \frac{\hbar}{m_e c} \approx 6 \times 10^{-294} \text{ meters}$$

This is hundreds of orders of magnitude smaller than the length scales physicists have explored so far. Currently the Large Hadron Collider can probe energies up to about 10 TeV, and thus distances down to about $10^{-20}$ meters, or about 0.00002 times the radius of a proton. Quantum field theory seems to be holding up very well so far, but no reasonable physicist would be willing to extrapolate this success down to $10^{-294}$ meters, and few seem upset at problems that manifest themselves only at such a short distance scale.
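The two distance scales can be compared with a few lines of arithmetic. The sketch uses the rule of thumb that a collision at energy $E$ probes distances of order $\hbar c/E$, together with the one-loop Landau-pole formula $\Lambda = \exp(3\pi/2\alpha + 5/6)\, m_e c$, for which $\hbar/\Lambda$ is the reduced Compton wavelength $\hbar/m_e c$ times $e^{-(3\pi/2\alpha + 5/6)}$. The constants are rounded CODATA values, and the proton radius of about 0.84 fm is an approximate figure supplied here, not taken from the text.

```python
import math

HBAR_C_EV_M = 1.973269804e-7  # hbar * c in eV * meters
ALPHA = 1 / 137.035999        # fine structure constant
ME_C2_EV = 0.51099895e6       # electron rest energy m_e c^2 in eV
PROTON_RADIUS_M = 0.84e-15    # approximate proton charge radius in meters


def probe_distance_m(energy_ev: float) -> float:
    """Rough distance scale hbar*c/E probed at collision energy E (in eV)."""
    return HBAR_C_EV_M / energy_ev


def landau_distance_m(alpha: float = ALPHA) -> float:
    """hbar/Lambda for the one-loop Landau pole Lambda = exp(3*pi/(2*alpha) + 5/6) m_e c."""
    exponent = 3 * math.pi / (2 * alpha) + 5 / 6
    reduced_compton_m = HBAR_C_EV_M / ME_C2_EV  # hbar / (m_e c)
    return reduced_compton_m * math.exp(-exponent)


if __name__ == "__main__":
    lhc = probe_distance_m(10e12)  # 10 TeV
    landau = landau_distance_m()
    print(f"LHC scale:    {lhc:.1e} m ({lhc / PROTON_RADIUS_M:.0e} proton radii)")
    print(f"Landau scale: {landau:.1e} m")
    print(f"gap: about {math.log10(lhc / landau):.0f} orders of magnitude")
```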

Indeed, attitudes on renormalization have changed significantly since 1948, when Feynman, Schwinger and Tomonaga developed it for QED. At first it seemed a bit like a trick. Later, as the success of renormalization became ever more thoroughly confirmed, it became accepted. However, some of the most thoughtful physicists remained worried. In 1975, Dirac said:

Most physicists are very satisfied with the situation. They say: ‘Quantum electrodynamics is a good theory and we do not have to worry about it any more.’ I must say that I am very dissatisfied with the situation, because this so-called ‘good theory’ does involve neglecting infinities which appear in its equations, neglecting them in an arbitrary way. This is just not sensible mathematics. Sensible mathematics involves neglecting a quantity when it is small—not neglecting it just because it is infinitely great and you do not want it!

As late as 1985, Feynman wrote:

The shell game that we play [. . .] is technically called ‘renormalization’. But no matter how clever the word, it is still what I would call a dippy process! Having to resort to such hocus-pocus has prevented us from proving that the theory of quantum electrodynamics is mathematically self-consistent. It’s surprising that the theory still hasn’t been proved self-consistent one way or the other by now; I suspect that renormalization is not mathematically legitimate.

By now renormalization is thoroughly accepted among physicists. The key move was a change of attitude emphasized by Kenneth Wilson in the 1970s. Instead of treating quantum field theory as the correct description of physics at arbitrarily large energy-momenta, we can assume it is only an approximation. For renormalizable theories, one can argue that even if quantum field theory is inaccurate at large energy-momenta, the corrections become negligible at smaller, experimentally accessible energy-momenta. If so, instead of seeking to take the $\Lambda \to \infty$ limit, we can use renormalization to relate bare quantities at some large but finite value of $\Lambda$ to experimentally observed quantities.

From this practical-minded viewpoint, the possibility of a Landau pole in QED is less important than the behavior of the Standard Model. Physicists believe that the Standard Model would suffer from a Landau pole at momenta low enough to cause serious problems if the Higgs boson were considerably more massive than it actually is. Thus, they were relieved when the Higgs was discovered at the Large Hadron Collider with a mass of about 125 GeV/$c^2$. However, the Standard Model may still suffer from a Landau pole at high momenta, as well as an instability of the vacuum.

Regardless of practicalities, for the *mathematical* physicist, whether or not QED and the Standard Model can be made into well-defined mathematical structures that obey the axioms of quantum field theory remains an open problem of great interest. Most physicists believe that this can be done for pure Yang–Mills theory, but actually proving this is the first step towards winning $1,000,000 from the Clay Mathematics Institute.


I want to sketch some of the key issues in the case of quantum electrodynamics, or ‘QED’. The history of QED has been nicely told here:

• Silvian Schweber, *QED and the Men who Made it: Dyson, Feynman, Schwinger, and Tomonaga*, Princeton U. Press, Princeton, 1994.

Instead of explaining the history, I will give a very simplified account of the current state of the subject. I hope that experts forgive me for cutting corners and trying to get across the basic ideas at the expense of many technical details. The nonexpert is encouraged to fill in the gaps with the help of some textbooks.

QED involves just one dimensionless parameter, the fine structure constant:

$$\alpha = \frac{1}{4\pi\epsilon_0} \frac{e^2}{\hbar c} \approx \frac{1}{137.036}$$

Here $e$ is the elementary charge (the magnitude of the electron’s charge), $\epsilon_0$ is the permittivity of the vacuum, $\hbar$ is the reduced Planck constant and $c$ is the speed of light. We can think of $\alpha^{1/2}$ as a dimensionless version of the electron charge. It says how strongly electrons and photons interact.
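As a quick check on the formula, the sketch below evaluates $\alpha = e^2/(4\pi\epsilon_0\hbar c)$ in SI units. The numerical values are CODATA 2018 figures supplied here, not taken from the text; the result is dimensionless, so any consistent unit system gives the same number.

```python
import math

# CODATA 2018 values in SI units (e and c are exact in the 2019 SI; hbar is h/(2 pi), rounded)
E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
HBAR = 1.054571817e-34        # reduced Planck constant, J s
C_LIGHT = 299_792_458.0       # speed of light, m/s


def fine_structure_constant() -> float:
    """alpha = e^2 / (4 pi epsilon_0 hbar c); the SI units cancel."""
    return E_CHARGE**2 / (4 * math.pi * EPSILON_0 * HBAR * C_LIGHT)


if __name__ == "__main__":
    alpha = fine_structure_constant()
    print(f"alpha = {alpha:.10f}, 1/alpha = {1 / alpha:.6f}")
    print(f"sqrt(alpha) = {math.sqrt(alpha):.5f}")  # the 'dimensionless charge'
```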

Nobody knows why the fine structure constant has the value it does! In computations, we are free to treat it as an adjustable parameter. If we set it to zero, quantum electrodynamics reduces to a free theory, where photons and electrons do not interact with each other. A standard strategy in QED is to take advantage of the fact that the fine structure constant is small and expand answers to physical questions as power series in $\alpha^{1/2}$. This is called ‘perturbation theory’, and it allows us to exploit our knowledge of free theories.

One of the main questions we try to answer in QED is this: if we start with some particles with specified energy-momenta in the distant past, what is the probability that they will turn into certain other particles with certain other energy-momenta in the distant future? As usual, we compute this probability by first computing a complex amplitude and then taking the square of its absolute value. The amplitude, in turn, is computed as a power series in $\alpha^{1/2}$.

The term of order $\alpha^{n/2}$ in this power series is a sum over Feynman diagrams with $n$ vertices. For example, suppose we are computing the amplitude for two electrons with some specified energy-momenta to interact and become two electrons with some other energy-momenta. One Feynman diagram appearing in the answer is this:

Here the electrons exchange a single photon. Since this diagram has two vertices, it contributes a term of order $\alpha$. The electrons could also exchange two photons:

giving a term of order $\alpha^2$. A more interesting term of order $\alpha^2$ is this:

Here the electrons exchange a photon that splits into an electron-positron pair and then recombines. There are infinitely many diagrams with two electrons coming in and two going out. However, there are only finitely many with $n$ vertices. Each of these contributes a term proportional to $\alpha^{n/2}$ to the amplitude.

In general, the external edges of these diagrams correspond to the experimentally observed particles coming in and going out. The internal edges correspond to ‘virtual particles’: that is, particles that are not directly seen, but appear in intermediate steps of a process.

Each of these diagrams is actually a notation for an integral! There are systematic rules for writing down the integral starting from the Feynman diagram. To do this, we first label each edge of the Feynman diagram with an energy-momentum, a variable $p \in \mathbb{R}^4$. The integrand, which we shall not describe here, is a function of all these energy-momenta. In carrying out the integral, the energy-momenta of the external edges are held fixed, since these correspond to the experimentally observed particles coming in and going out. We integrate over the energy-momenta of the internal edges, which correspond to virtual particles, while requiring that energy-momentum is conserved at each vertex.

However, there is a problem: the integral typically diverges! Whenever a Feynman diagram contains a loop, the energy-momenta of the virtual particles in this loop can be arbitrarily large. Thus, we are integrating over an infinite region. In principle the integral could still converge if the integrand goes to zero fast enough. However, we rarely have such luck.
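Why loops in particular? Conservation at each vertex fixes most internal energy-momenta in terms of the external ones. For a connected diagram with $I$ internal edges and $V$ vertices, one of the $V$ conservation laws just says total external energy-momentum is conserved, so $L = I - V + 1$ energy-momenta remain free to integrate over: one per independent loop. The sketch below applies this standard graph count to the three diagrams described above (one-photon exchange, two-photon exchange, and a photon splitting into an electron-positron pair); the edge and vertex tallies were made by hand for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagram:
    name: str
    vertices: int        # V
    internal_edges: int  # I, the virtual particles

    def loops(self) -> int:
        """Free loop energy-momenta of a connected diagram: L = I - V + 1."""
        return self.internal_edges - self.vertices + 1

    def order_in_alpha(self) -> float:
        """Each vertex carries one factor of the charge, i.e. of alpha^(1/2)."""
        return self.vertices / 2


EXAMPLES = [
    Diagram("one-photon exchange", vertices=2, internal_edges=1),
    Diagram("two-photon exchange", vertices=4, internal_edges=4),
    Diagram("photon splitting into an e+e- pair", vertices=4, internal_edges=4),
]

if __name__ == "__main__":
    for d in EXAMPLES:
        print(f"{d.name}: order alpha^{d.order_in_alpha():g}, {d.loops()} loop(s)")
```

The tree diagram has no free energy-momentum, so nothing is integrated over an unbounded region; the two order-$\alpha^2$ diagrams each leave one loop energy-momentum unconstrained, which is where the divergence can enter.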

What does this mean, physically? It means that if we allow virtual particles with arbitrarily large energy-momenta in intermediate steps of a process, there are ‘too many ways for this process to occur’, so the amplitude for this process diverges.

Ultimately, the continuum nature of spacetime is to blame. In quantum mechanics, particles with large momenta are the same as waves with short wavelengths. Allowing light with arbitrarily short wavelengths created the ultraviolet catastrophe in classical electromagnetism. Quantum electromagnetism averted that catastrophe—but the problem returns in a different form as soon as we study the interaction of photons and charged particles.

Luckily, there is a strategy for tackling this problem. The integrals for Feynman diagrams become well-defined if we impose a ‘cutoff’, integrating only over energy-momenta in some bounded region, say a ball of some large radius $\Lambda$. In quantum theory, a particle with momentum of magnitude greater than $\Lambda$ is the same as a wave with wavelength less than roughly $\hbar/\Lambda$. Thus, imposing the cutoff amounts to ignoring waves of short wavelength and, for the same reason, ignoring waves of high frequency. We obtain well-defined answers to physical questions when we do this. Unfortunately the answers depend on $\Lambda$, and if we let $\Lambda \to \infty$ they diverge.
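To see a cutoff at work, consider a toy integrand chosen here for illustration (it is not an actual QED integrand): $\int_{|p| \le \Lambda} d^4p \,/\, (p^2 + m^2)^2$ over 4-dimensional Euclidean momentum space. In 4 dimensions $d^4p = 2\pi^2 p^3\,dp$, and the integral works out to $\pi^2[\ln(1 + \Lambda^2/m^2) - \Lambda^2/(\Lambda^2 + m^2)]$. This is finite for every finite $\Lambda$ but grows like $2\pi^2 \ln \Lambda$, a ‘logarithmic divergence’. The sketch checks the closed form numerically and tabulates its growth.

```python
import math


def radial_integrand(p: float, m: float) -> float:
    """Integrand after doing the angles: d^4p = 2 pi^2 p^3 dp (2 pi^2 = area of the unit 3-sphere)."""
    return 2 * math.pi**2 * p**3 / (p**2 + m**2) ** 2


def simpson(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cutoff_integral(cutoff: float, m: float = 1.0) -> float:
    """Closed form of the integral over the ball |p| <= cutoff."""
    x = cutoff**2 / m**2
    return math.pi**2 * (math.log1p(x) - x / (1 + x))


if __name__ == "__main__":
    m = 1.0
    numeric = simpson(lambda p: radial_integrand(p, m), 0.0, 10.0)
    print(f"cutoff 10: numeric {numeric:.6f} vs closed form {cutoff_integral(10.0, m):.6f}")
    for cutoff in (1e1, 1e2, 1e3, 1e4, 1e6):
        print(f"Lambda = {cutoff:.0e}: I = {cutoff_integral(cutoff, m):8.3f}")
    # Each factor of 10 in the cutoff adds about 2 pi^2 ln 10, so I -> infinity as Lambda -> infinity.
```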

However, this is not the correct limiting procedure. Indeed, among the quantities that we can compute using Feynman diagrams are the charge and mass of the electron! Its charge can be computed using diagrams in which an electron emits or absorbs a photon:

Similarly, its mass can be computed using a sum over Feynman diagrams where one electron comes in and one goes out.

The interesting thing is this: to do these calculations, we must start by assuming some charge and mass for the electron—but the charge and mass we *get out* of these calculations do not equal the masses and charges we *put in!*

The reason is that virtual particles affect the observed charge and mass of a particle. Heuristically, at least, we should think of an electron as surrounded by a cloud of virtual particles. These contribute to its mass and ‘shield’ its electric field, reducing its observed charge. It takes some work to translate between this heuristic story and actual Feynman diagram calculations, but it can be done.

Thus, there are two different concepts of mass and charge for the electron. The numbers we put into the QED calculations are called the ‘bare’ charge and mass, $e_\mathrm{bare}$ and $m_\mathrm{bare}$. Poetically speaking, these are the charge and mass we would see if we could strip the electron of its virtual particle cloud and see it in its naked splendor. The numbers we get out of the QED calculations are called the ‘renormalized’ charge and mass, $e_\mathrm{ren}$ and $m_\mathrm{ren}$. These are computed by doing a sum over Feynman diagrams. So, they take virtual particles into account. These are the charge and mass of the electron clothed in its cloud of virtual particles. It is these quantities, not the bare quantities, that should agree with experiment.

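Thus, the correct limiting procedure in QED calculations is a bit subtle. For any value of $\Lambda$ and any choice of $e_\mathrm{bare}$ and $m_\mathrm{bare}$, we compute $e_\mathrm{ren}$ and $m_\mathrm{ren}$. The necessary integrals all converge, thanks to the cutoff. We choose $e_\mathrm{bare}$ and $m_\mathrm{bare}$ so that $e_\mathrm{ren}$ and $m_\mathrm{ren}$ agree with the experimentally observed charge and mass of the electron. The bare charge and mass chosen this way depend on $\Lambda$, so call them $e_\mathrm{bare}(\Lambda)$ and $m_\mathrm{bare}(\Lambda)$.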

Next, suppose we want to compute the answer to some other physics problem using QED. We do the calculation with a cutoff $\Lambda$, using $e_\mathrm{bare}(\Lambda)$ and $m_\mathrm{bare}(\Lambda)$ as the bare charge and mass in our calculation. Then we take the limit $\Lambda \to \infty$.

In short, rather than simply fixing the bare charge and mass and letting $\Lambda \to \infty$, we cleverly adjust the bare charge and mass as we take this limit. This procedure is called ‘renormalization’, and it has a complex and fascinating history:

• Laurie M. Brown, ed., *Renormalization: From Lorentz to Landau (and Beyond)*, Springer, Berlin, 2012.
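The step of choosing the bare charge as a function of the cutoff can be sketched in a drastically simplified setting. The sketch assumes the standard leading-logarithm one-loop relation between the bare coupling at cutoff $\Lambda$ and the observed coupling, $1/\alpha_\mathrm{bare}(\Lambda) = 1/\alpha - \frac{2}{3\pi}\ln(\Lambda/m_e c)$, with scheme-dependent constants dropped, and works with $\alpha \propto e^2$ rather than the charge itself. It illustrates the bookkeeping only and is not the full renormalization procedure. Two features show up: the bare coupling must grow as $\Lambda$ grows, and no positive bare coupling works once $\ln(\Lambda/m_e c)$ exceeds $3\pi/(2\alpha)$.

```python
import math

ALPHA_OBS = 1 / 137.035999  # observed (renormalized) fine structure constant


def bare_alpha(cutoff_over_mec: float, alpha_ren: float = ALPHA_OBS) -> float:
    """Bare coupling reproducing alpha_ren with cutoff Lambda = cutoff_over_mec * m_e c.

    Assumes the leading-log one-loop relation
        1/alpha_bare = 1/alpha_ren - (2 / (3 pi)) ln(Lambda / m_e c).
    """
    inverse = 1 / alpha_ren - (2 / (3 * math.pi)) * math.log(cutoff_over_mec)
    if inverse <= 0:
        # No positive bare coupling works: the cutoff is past the one-loop Landau pole.
        raise ValueError("cutoff at or beyond the one-loop Landau pole")
    return 1 / inverse


if __name__ == "__main__":
    for log_cutoff in (0, 10, 100, 300, 600, 645):
        a_bare = bare_alpha(math.exp(log_cutoff))
        print(f"ln(Lambda / m_e c) = {log_cutoff:3d}: alpha_bare = {a_bare:.6g}")
```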

There are many technically different ways to carry out renormalization, and our account so far neglects many important issues. Let us mention three of the simplest.

First, besides the classes of Feynman diagrams already mentioned, we must also consider those where one photon goes in and one photon goes out, such as this:

These affect properties of the photon, such as its mass. Since we want the photon to be massless in QED, we have to adjust parameters as we take $\Lambda \to \infty$ to make sure we obtain this result. We must also consider Feynman diagrams where nothing comes in and nothing comes out, so-called ‘vacuum bubbles’, and make these behave correctly as well.

Second, the procedure just described, where we impose a ‘cutoff’ and integrate over energy-momenta lying in a ball of radius $\Lambda$, is not invariant under Lorentz transformations. Indeed, any theory featuring a smallest time or smallest distance violates the principles of special relativity: thanks to time dilation and Lorentz contractions, different observers will disagree about times and distances. We could accept that Lorentz invariance is broken by the cutoff and hope that it is restored in the limit, but physicists prefer to maintain symmetry at every step of the calculation. This requires some new ideas: for example, replacing Minkowski spacetime with 4-dimensional Euclidean space. In 4-dimensional Euclidean space, Lorentz transformations are replaced by rotations, and a ball of radius $\Lambda$ is a rotation-invariant concept. To do their Feynman integrals in Euclidean space, physicists often let time take imaginary values. They do their calculations in this context and then transfer the results back to Minkowski spacetime at the end. Luckily, there are theorems justifying this procedure.

Third, besides infinities that arise from waves with arbitrarily short wavelengths, there are infinities that arise from waves with arbitrarily *long* wavelengths. The former are called ‘ultraviolet divergences’. The latter are called ‘infrared divergences’, and they afflict theories with massless particles, like the photon. For example, in QED the collision of two electrons will emit an infinite number of photons with very long wavelengths and low energies, called ‘soft photons’. In practice this is not so bad, since any experiment can only detect photons with energies above some nonzero value. However, infrared divergences are conceptually important. It seems that in QED any electron is inextricably accompanied by a cloud of soft photons. These are real, not virtual particles. This may have remarkable consequences.

Battling these and many other subtleties, many brilliant physicists and mathematicians have worked on QED. The good news is that this theory has been proved to be ‘perturbatively renormalizable’:

• J. S. Feldman, T. R. Hurd, L. Rosen and J. D. Wright, *QED: A Proof of Renormalizability*, Lecture Notes in Physics **312**, Springer, Berlin, 1988.

• Günter Scharf, *Finite Quantum Electrodynamics: The Causal Approach*, Springer, Berlin, 1995.

This means that we can indeed carry out the procedure roughly sketched above, obtaining answers to physical questions as power series in $\alpha^{1/2}$.

The bad news is we do not know if these power series converge. In fact, it is widely believed that they diverge! This puts us in a curious situation.

For example, consider the magnetic dipole moment of the electron. An electron, being a charged particle with spin, has a magnetic field. A classical computation says that its magnetic dipole moment is

$$\vec{\mu} = -\frac{e}{2m_e}\vec{S}$$

where $\vec{S}$ is its spin angular momentum. Quantum effects correct this computation, giving

$$\vec{\mu} = -g\,\frac{e}{2m_e}\vec{S}$$

for some constant $g$ called the gyromagnetic ratio, which can be computed using QED as a sum over Feynman diagrams with an electron exchanging a single photon with a massive charged particle:

The answer is a power series in $\alpha^{1/2}$, but since all these diagrams have an even number of vertices, it only contains integral powers of $\alpha$. The lowest-order term gives simply $g = 2$. In 1948, Julian Schwinger computed the next term and found a small correction to this simple result:

$$g = 2 + \frac{\alpha}{\pi} \approx 2.00232$$
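For concreteness, the sketch below evaluates the lowest-order value and Schwinger’s correction numerically, along with the corresponding ‘anomaly’ $(g-2)/2 = \alpha/2\pi$. The value of $\alpha$ is a rounded CODATA figure supplied here, and the higher-order coefficients computed later are deliberately not reproduced.

```python
import math

ALPHA = 1 / 137.035999  # fine structure constant (rounded CODATA value)


def g_factor(order: int = 1, alpha: float = ALPHA) -> float:
    """Electron g-factor truncated after the given order in alpha.

    order=0 gives the lowest-order value g = 2; order=1 adds Schwinger's term alpha/pi.
    Higher orders are not implemented in this sketch.
    """
    if order == 0:
        return 2.0
    if order == 1:
        return 2.0 + alpha / math.pi
    raise NotImplementedError("only orders 0 and 1 are included here")


if __name__ == "__main__":
    g1 = g_factor(1)
    print(f"g = {g1:.7f}; anomaly (g - 2)/2 = {(g1 - 2) / 2:.7f}")
```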

By now a team led by Toichiro Kinoshita has computed $g$ up to order $\alpha^5$. This requires computing over 13,000 integrals, one for each Feynman diagram of the above form with up to 10 vertices! The answer agrees very well with experiment: in fact, if we also take other Standard Model effects into account we get agreement to roughly one part in $10^{12}$.

This is the most accurate prediction in all of science.

However, as mentioned, it is widely believed that this power series *diverges!* Next time I’ll explain why physicists think this, and what it means for a divergent series to give such a good answer when you add up the first few terms.


Right now the world of particle physics is in a shocked, somewhat demoralized state because the Large Hadron Collider has not yet found any physics beyond the Standard Model. Some Chinese scientists want to forge ahead by building an even more powerful, even more expensive accelerator.

But Yang recently came out *against* this. This is a big deal, because he is very prestigious, and only China has the will to pay for the next machine. The director of the Chinese institute that wants to build the next machine, Wang Yifang, issued a point-by-point rebuttal of Yang the very next day.

Over on G+, Willie Wong translated some of Wang’s rebuttal in some comments to my post on this subject. The real goal of my post here is to make this translation a bit easier to find—not because I agree with Wang, but because this discussion is important: it affects the future of particle physics.

First let me set the stage. In 2012, two months after the Large Hadron Collider found the Higgs boson, the Institute of High Energy Physics proposed a bigger machine: the Circular Electron Positron Collider, or CEPC.

This machine would be a ring 100 kilometers around. It would collide electrons and positrons at an energy of 250 GeV, about twice what you need to make a Higgs. It could make lots of Higgs bosons and study their properties. It might find something new, too! Of course that would be the hope.

It would cost $6 billion, and the plan was that China would pay for 70% of it. Nobody knows who would pay for the rest.

According to *Science*:

On 4 September, Yang, in an article posted on the social media platform WeChat, says that China should not build a supercollider now. He is concerned about the huge cost and says the money would be better spent on pressing societal needs. In addition, he does not believe the science justifies the cost: The LHC confirmed the existence of the Higgs boson, he notes, but it has not discovered new particles or inconsistencies in the standard model of particle physics. The prospect of an even bigger collider succeeding where the LHC has failed is “a guess on top of a guess,” he writes. Yang argues that high-energy physicists should eschew big accelerator projects for now and start blazing trails in new experimental and theoretical approaches.

That same day, IHEP’s director, Wang Yifang, posted a point-by-point rebuttal on the institute’s public WeChat account. He criticized Yang for rehashing arguments he had made in the 1970s against building the BEPC. “Thanks to comrade [Deng] Xiaoping,” who didn’t follow Yang’s advice, Wang wrote, “IHEP and the BEPC … have achieved so much today.” Wang also noted that the main task of the CEPC would not be to find new particles, but to carry out detailed studies of the Higgs boson.

Yang did not respond to request for comment. But some scientists contend that the thrust of his criticisms are against the CEPC’s anticipated upgrade, the Super Proton-Proton Collider (SPPC). “Yang’s objections are directed mostly at the SPPC,” says Li Miao, a cosmologist at Sun Yat-sen University, Guangzhou, in China, who says he is leaning toward supporting the CEPC. That’s because the cost Yang cites—$20 billion—is the estimated price tag of both the CEPC and the SPPC, Li says, and it is the SPPC that would endeavor to make discoveries beyond the standard model.

Still, opposition to the supercollider project is mounting outside the high-energy physics community. Cao Zexian, a researcher at CAS’s Institute of Physics here, contends that Chinese high-energy physicists lack the ability to steer or lead research in the field. China also lacks the industrial capacity for making advanced scientific instruments, he says, which means a supercollider would depend on foreign firms for critical components. Luo Huiqian, another researcher at the Institute of Physics, says that most big science projects in China have suffered from arbitrary cost cutting; as a result, the finished product is often a far cry from what was proposed. He doubts that the proposed CEPC would be built to specifications.

The state news agency Xinhua has lauded the debate as “progress in Chinese science” that will make big science decision-making “more transparent.” Some, however, see a call for transparency as a bad omen for the CEPC. “It means the collider may not receive the go-ahead in the near future,” asserts Institute of Physics researcher Wu Baojun. Wang acknowledged that possibility in a 7 September interview with Caijing magazine: “opposing voices naturally have an impact on future approval of the project,” he said.

Willie Wong prefaced his translation of Wang’s rebuttal with this:

Here is a translation of the essential parts of the rebuttal; some standard Chinese language disclaimers of deference etc are omitted. I tried to make the translation as true to the original as possible; the viewpoints expressed are not my own.

Here is the translation:

Today (September 4) saw the publication of the article by CN Yang titled “China should not build an SSC today”. As a scientist who works on the front line of high energy physics and the current director of the high energy physics institute in the Chinese Academy of Sciences, I cannot agree with his viewpoint.

(A) The first reason for Dr. Yang’s objection is that a supercollider is a bottomless hole. His objection stemmed from the American SSC wasting 3 billion US dollars and amounting to naught. The LHC cost over 10 billion US dollars. Thus the proposed Chinese accelerator cannot cost less than 20 billion US dollars, with no guaranteed returns. [Ed: emphasis original]

Here, there are actually three problems. The first is “why did SSC fail”? The second is “how much would a Chinese collider cost?” And the third is “is the estimate reasonable and realistic?” Here I address them point by point.

(1) Why did the American SSC fail? Are all colliders bottomless pits?

The many reasons leading to the failure of the American SSC include the government deficit at the time, the fight for funding against the International Space Station, the party politics of the United States, and the regional competition between Texas and other states. Additionally there were problems with poor management, bad budgeting, ballooning construction costs, and failure to secure international collaboration. See references [2,3] [Ed: consult original article for references; items 1-3 are English language]. In reality, “exceeding the budget” is definitely not the primary reason for the failure of the SSC; rather, the failure should be attributed to some special and circumstantial reasons, caused mainly by political elements.

For the US, abandoning the SSC was a very incorrect decision. It lost the US the chance for discovering the Higgs Boson, as well as the foundations and opportunities for future development, and thereby also the leadership position that US has occupied internationally in high energy physics until then. This definitely had a very negative impact on big science initiatives in the US, and caused one generation of Americans to lose the courage to dream. The reasons given by the American scientific community against the SSC are very similar to what we hear today against the Chinese collider project. But actually the cancellation of the SSC did not increase funding to other scientific endeavors. Of course, activation of the SSC would not have reduced the funding to other scientific endeavors, and many people who objected to the project are not regretting it.

Since then, the LHC was constructed in Europe, and achieved great success. Its construction exceeded its original budget, but not by a lot. This shows that supercollider projects do not have to be bottomless, and have a chance to succeed.

The Chinese political landscape is entirely different from that of the US. In particular, for large scale constructions, the political system is superior. China has already accomplished to date many tasks which the Americans would not, or could not do; many more will happen in the future. The failure of SSC doesn’t mean that we cannot do it. We should scientifically analyze the situation, and at the same time foster international collaboration, and properly manage the budget.

(2) How much would it cost?

Our planned collider (using a circumference of 100 kilometers for computations) will proceed in two steps. [Ed: details omitted. The author estimated that the electron-positron collider will cost 40 Billion Yuan, followed by the proton-proton collider which will cost 100 billion Yuan, not accounting for inflation. With approximately 10 year construction time for each phase.] The two-phase planning is to showcase the scientific longevity of the project, especially entrainment of other technical development (e.g. high energy superconductors), and that the second phase [ed: the proton-proton collider] is complementary to the scientific and technical developments of the first phase. The reason that the second phase designs are incorporated in the discussion is to prevent the scenario where design elements of the first phase inadvertently shut off the possibility of further expansion in the second phase.

(3) Is this estimate realistic? Are we going to go down the same road as the American SSC?

First, note that in the past 50 years, there were many successful colliders internationally (LEP, LHC, PEPII, KEKB/SuperKEKB etc) and many unsuccessful ones (ISABELLE, SSC, FAIR, etc). The failed ones are all proton accelerators. All electron colliders have been successful. The main reason is that proton accelerators are more complicated, and it is harder to correctly estimate the costs related to constructing machines beyond the current frontiers.

There are many successful large-scale constructions in China. In the 40 years since the founding of the high energy physics institute, we’ve built [list of high energy experiment facilities, I don’t know all their names in English], each costing over 100 million Yuan, and none are more than 5% over budget, in terms of actual costs of construction, time to completion, meeting milestones. We have a well developed expertise in budget, construction, and management.

For the CEPC (electron-positron collider) our estimates relied on two methods:

(i) Summing of the parts: separately estimating costs of individual elements and adding them up.

(ii) Comparisons: using costs for elements derived from costs of completed instruments both domestically and abroad.

At the level of the total cost and at the systems level, the two methods should produce cost estimates within 20% of each other.

After completing the initial design [ref. 1], we produced a list of more than 1000 required items of equipment, and based our estimates on that list. The estimates are refereed by local and international experts.

For the SPPC (the proton-proton collider; second phase) we only used the second method (comparison). This is due to the second phase not being the main mission at hand, and we are not yet sure whether we should commit to the second phase. It is therefore not very meaningful to discuss its potential cost right now. We are committed to only building the SPPC once we are sure the science and the technology are mature.

(B) The second reason given by Dr. Yang is that China is still a developing country, and there are many social-economic problems that should be solved before considering a supercollider.

Any country, especially one as big as China, must consider both the immediate and the long-term in its planning. Of course social-economic problems need to be solved, and indeed solving them is currently taking the lion’s share of our national budget. But we also need to consider the long term, including an appropriate amount of expenditures on basic research, to enable our continuous development and the potential to lead the world. At the end of the Qing dynasty, China had a rich populace and the world’s highest GDP. But even though the government had the ability to purchase armaments, the lack of scientific understanding left the country always on the losing side of wars.

In the past few hundred years, advances in understanding the structure of matter, from molecules and atoms to the nucleus and the elementary particles, have all contributed to and led the scientific developments of their era. High energy physics pursues the finest structure of matter and its laws, and the techniques it uses cover many different fields, from accelerators and detectors to low temperature, superconductivity, microwaves, high frequency, vacuum, electronics, high precision instrumentation, automatic control, computer science and networking; in many ways it has led the development of those fields and their broad adoption. It is a flagship field for basic science and technical development.

Building the supercollider could put China in a leadership position in such diverse scientific fields for several decades, and also lead to the domestic production of many important scientific and technical instruments. Furthermore, it will allow us to attract international intellectual capital and to train thousands of world-leading specialists in our institutes. How is this not an urgent need for the country? In fact, the impression the Chinese government and the average Chinese person create for the world at large is of a populace with lots of money, and also infatuated with money. It is hard for a large country to have an international voice and influence without significant contributions to human culture. This influence, in turn, affects the benefits China receives from other countries. As a fraction of current GDP, the cost of the proposed project (including the phase 2 SPPC) does not exceed that of the Beijing positron-electron collider completed in the 80s, and is in fact lower than that of LEP, LHC, SSC, and ILC.

Designing and starting the construction of the next supercollider within the next 5 years is a rare opportunity for us to achieve a leading position internationally in the field of high energy physics. First, the newly discovered Higgs boson has a relatively low mass, which allows us to probe it further using a circular positron-electron collider. Furthermore, such a collider could later be modified into a proton collider, so this facility will have over 5 decades of scientific use. Second, Europe, the US, and Japan all already have other scientific projects on their agendas, and probably cannot construct similar facilities within the next 20 years. This gives us a competitive advantage. Thirdly, we already have the experience of building the Beijing positron-electron collider, so such a facility plays to our strengths. A window of opportunity like this typically lasts only 10 years; if we miss it, we don’t know when the next one will come. Furthermore, we have extensive experience in underground construction, and the Chinese economy is currently in a stage of high growth. We have both the ability to do the construction and the scientific need. Therefore a supercollider is a very suitable project to consider.

(C) The third reason given by Dr. Yang is that constructing a supercollider necessarily excludes funding for other basic sciences. China currently spends 5% of its total R&D budget on basic research; internationally, 15% is more typical for developed countries. As a developing country aiming to join the ranks of developed countries, and as a large country, I believe we should aim to raise the ratio gradually to 10% and eventually to 15%. In terms of numbers, funding for basic science has a large potential for growth (around 100 billion yuan per annum) without taking anything away from other basic science research.

On the other hand, where should the increased funding be directed? Everyone knows that a large portion of our basic science research budget is spent on purchasing scientific instruments, especially from abroad. If we distribute the increased funding evenly among all basic science fields, the end result is to raise the GDP of the US, Europe, and Japan. If we instead spend 30 billion Yuan on accelerator science over 10 years, more than 90% of the money will remain in the country, improving our technical development and the market share of domestic companies. This will also allow us to train many new scientists and engineers, and greatly advance the state of the art in domestically produced scientific instruments.

In addition, putting emphasis on high energy physics will only bring us up to the normal funding level internationally (it is a fact that particle physics and nuclear physics are severely underfunded in China). As a world-leading big science project, CEPC is a very good candidate, and it does not conflict with a desire to also develop other basic sciences.

(D) Dr. Yang’s fourth objection is that neither supersymmetry nor quantum gravity has been verified, and that the particles we hope to discover using the new collider will in fact turn out not to exist. But discovering such particles is not the goal of collider science. In [ref 1], which I gave to Dr. Yang myself, we clearly discussed the scientific purpose of the instrument. Briefly speaking, the standard model is only an effective theory in the low energy limit, and a new and deeper theory is needed. Even though there is some experimental evidence for physics beyond the standard model, more data will be needed to indicate the correct direction in which to develop the theory. Of the known problems with the standard model, most are related to the Higgs boson. Thus a better understanding of the Higgs boson should give hints of a deeper physical theory. CEPC can probe Higgs bosons to 1% precision [ed. I am not sure what this means], 10 times better than LHC. From this we hope to correctly identify various properties of the Higgs boson, and to test whether it in fact matches the standard model. At the same time, CEPC may be able to measure the self-coupling of the Higgs boson and to understand the Higgs contribution to the vacuum phase transition, which is important for understanding the early universe. [Ed. in this previous sentence, the translations are a bit questionable since some HEP jargon is used with which I am not familiar] Therefore, regardless of whether LHC discovers new physics, CEPC is necessary.

If there are new coupling mechanisms for the Higgs, new associated particles, a composite structure for the Higgs boson, or other differences from the standard model, we can continue with the second phase, the proton-proton collider, to probe the differences directly. Of course these could be due to supersymmetry, but they could also be due to other particles. For us experimentalists, while we care about theoretical predictions, our experiments are not designed only to test them. To predict at this moment whether a collider can or cannot discover a hypothetical particle seems premature, and is not the viewpoint of the HEP community in general.

(E) The fifth objection is that in the past 70 years high energy physics has not led to tangible improvements for humanity, and likely will not in the future. In the past 70 years, there have been many results from high energy physics that led to techniques common in everyday life. [Ed: list of examples includes synchrotron radiation, free electron lasers, spallation neutron sources, MRI, PET, radiation therapy, touch screens, smart phones, the world-wide web. I omit the prose.]

[Ed. Author proceeds to discuss hypothetical economic benefits from

a) superconductor science

b) microwave source

c) cryogenics

d) electronics

sort of the usual stuff you see in funding proposals.]

(F) The sixth reason was that the Institute of High Energy Physics of the Chinese Academy of Sciences has not produced much in the past 30 years, that the major scientific contributions to the proposed collider will be directed by non-Chinese, and that the Nobel will therefore also go to a non-Chinese. [Ed. I’ll skip this section because it is a self-congratulatory pat on one’s own back (we actually did pretty well for the amount of money invested), a promise to promote Chinese participation in the project (in accordance with the economic investment), and the required comment that “we do science for the sake of science, and not for winning the Nobel.”]

(G) The seventh reason is that the future of HEP lies in developing new techniques to accelerate particles and in developing a geometric theory, not in building large accelerators. A new method to accelerate particles is definitely an important aspect of accelerator science. In the next several decades this may prove useful for scattering experiments or for applied fields where beam confinement is not essential. For high energy colliders, in terms of beam emittance and energy efficiency, new acceleration principles have a long way to go. In the meantime, high energy physics cannot simply be put on hold. As for “geometric theory” or “string theory”, these are too far from experimental reach, and are not problems we can currently consider.

People disagree on the future of high energy physics. Currently there are no Chinese winners of the Nobel prize in physics, but there are many internationally. Dr. Yang’s viewpoints are clearly out of the mainstream, not just currently but also over the past several decades. Dr. Yang is documented to have held a pessimistic view of high energy physics and its future since the 60s, and that’s how he missed out on the discovery of the standard model. He is on record as having been against Chinese collider science since the 70s. It is fortunate that the government supported the Institute of High Energy Physics and constructed various supporting facilities, leading to our current achievements in synchrotron radiation and neutron scattering. For the future, we should listen to the younger scientists at the forefront of current research, for that is how we can gain international recognition for our scientific research.

It will be very interesting to see how this plays out.


In Part 1 we looked at classical point particles interacting gravitationally. We saw they could convert an infinite amount of potential energy into kinetic energy in a finite time! Then we switched to electromagnetism, and went a bit beyond traditional Newtonian mechanics: in Part 2 we threw quantum mechanics into the mix, and in Part 3 we threw in special relativity. Briefly, quantum mechanics made things better, but special relativity made things worse.

Now let’s throw in *both!*

When we study charged particles interacting electromagnetically in a way that takes both quantum mechanics and special relativity into account, we are led to quantum field theory. The ensuing problems are vastly more complicated than in any of the physical theories we have discussed so far. They are also more consequential, since at present quantum field theory is our best description of all known forces except gravity. As a result, many of the best minds in 20th-century mathematics and physics have joined the fray, and it is impossible here to give more than a quick summary of the situation. This is especially true because the final outcome of the struggle is not yet known.

It is ironic that quantum field theory originally emerged as a *solution* to a problem involving the continuum nature of spacetime, now called the ‘ultraviolet catastrophe’. In classical electromagnetism, a box with mirrored walls containing only radiation acts like a collection of harmonic oscillators, one for each vibrational mode of the electromagnetic field. If we assume waves can have arbitrarily small wavelengths, there are infinitely many of these oscillators. In classical thermodynamics, a collection of harmonic oscillators in thermal equilibrium will share the available energy equally: this result is called the ‘equipartition theorem’.

Taken together, these principles lead to a dilemma worthy of Zeno. The energy in the box must be divided into an infinite number of equal parts. If the energy in each part is nonzero, the total energy in the box must be infinite. If it is zero, there can be no energy in the box.
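The dilemma can be made quantitative with the standard Rayleigh–Jeans count, a textbook result rather than part of the argument above. In a large cavity, the number of electromagnetic modes per unit volume with frequency between $\nu$ and $\nu + d\nu$ is $8\pi\nu^2\,d\nu/c^3$ (counting both polarizations), and equipartition assigns each mode the energy $k_B T$, where $k_B$ is Boltzmann’s constant and $T$ the temperature. Integrating over all frequencies then gives an infinite energy density:

```latex
u(\nu)\,d\nu = \frac{8\pi\nu^2}{c^3}\,k_B T\,d\nu,
\qquad
\int_0^\infty u(\nu)\,d\nu = \frac{8\pi k_B T}{c^3}\int_0^\infty \nu^2\,d\nu = \infty .
```

The divergence comes entirely from the high-frequency end of the integral.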

For the logician, there is an easy way out: perhaps a box of electromagnetic radiation can only be in thermal equilibrium if it contains no energy at all! But this solution cannot satisfy the physicist, since it does not match what is actually observed. In reality, any nonnegative amount of energy is allowed in thermal equilibrium.

The way out of this dilemma was to change our concept of the harmonic oscillator. Planck did this in 1900, almost without noticing it. Classically, a harmonic oscillator can have any nonnegative amount of energy. Planck instead treated the energy

…not as a continuous, infinitely divisible quantity, but as a discrete quantity composed of an integral number of finite equal parts.

In modern notation, the allowed energies of a quantum harmonic oscillator are integer multiples of $h\nu$, where $\nu$ is the oscillator’s frequency and $h$ is a new constant of nature, named after Planck. When energy can only take such discrete values, the equipartition theorem no longer applies. Instead, the principles of thermodynamics imply that there is a well-defined thermal equilibrium in which vibrational modes with shorter and shorter wavelengths, and thus higher and higher energies, hold less and less of the available energy. The results agree with experiments when the constant $h$ is given the right value.
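To see how discreteness tames the high-frequency modes, here is a minimal Python sketch of the standard statistical-mechanics calculation (not Planck’s own derivation). It assumes a single mode whose allowed energies are $nh\nu$ for $n = 0, 1, 2, \dots$, each weighted by the Boltzmann factor $e^{-nh\nu/k_B T}$, and works in the dimensionless variable $x = h\nu/k_B T$. The direct sum reproduces the closed form $x/(e^x - 1)$, which is close to the equipartition value 1 at low frequency and falls toward 0 at high frequency. Weighting by the $\nu^2$ density of cavity modes then gives a finite total, $\int_0^\infty x^3/(e^x - 1)\,dx = \pi^4/15$. The function names and numerical parameters are illustrative choices.

```python
import math


def mean_energy_by_summation(x: float, n_max: int = 10_000) -> float:
    """Mean energy of one mode in units of k_B*T, by direct summation.

    Allowed energies are n*h*nu (n = 0, 1, ..., n_max - 1), each weighted by
    the Boltzmann factor exp(-n*x), where x = h*nu / (k_B*T).
    """
    weights = [math.exp(-n * x) for n in range(n_max)]
    partition_function = sum(weights)
    return sum(n * x * w for n, w in enumerate(weights)) / partition_function


def mean_energy_closed_form(x: float) -> float:
    """Geometric-series result for the same mean energy: x / (e^x - 1)."""
    return x / math.expm1(x)


def total_energy_integral(x_max: float = 50.0, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of x^3 / (e^x - 1) from 0 to x_max.

    The extra factor x^2 mimics the nu^2 density of cavity modes. Under
    equipartition the integrand would be x^2 instead, and the integral would
    grow without bound as x_max increases.
    """
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += x**3 / math.expm1(x) * dx
    return total


if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 5.0, 10.0):
        print(f"x = {x:5.2f}: Planck {mean_energy_closed_form(x):.6f}, equipartition 1.000000")
    print("total (Planck):", total_energy_integral(), "vs pi^4/15 =", math.pi**4 / 15)
```

The truncation at `n_max` terms is harmless for the values of $x$ used here, since the neglected Boltzmann weights are astronomically small.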

The full import of what Planck had done became clear only later, starting with Einstein’s 1905 paper on the photoelectric effect, for which he won the Nobel prize. Einstein proposed that the discrete energy steps actually arise because light comes in particles, now called ‘photons’, with a photon of frequency $\nu$ carrying energy $h\nu$. It was even later that Ehrenfest emphasized the role of the equipartition theorem in the original dilemma, and called this dilemma the ‘ultraviolet catastrophe’. As usual, the actual history is more complicated than the textbook summaries. For details, try:

• Helge Kragh, *Quantum Generations: A History of Physics in the Twentieth Century*, Princeton U. Press, Princeton, 1999.
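Einstein’s photon picture gives a simple quantitative prediction for the photoelectric effect. The following minimal Python sketch assumes the exact SI values of $h$ and of the elementary charge, and takes the metal’s work function $W$ as a hypothetical input, since no particular metal is discussed here. Each photon carries $E = h\nu$, and an electron can be ejected only if $h\nu$ exceeds $W$, leaving it at most $h\nu - W$ of kinetic energy no matter how intense the light.

```python
# Exact SI values (fixed by the 2019 redefinition of the SI).
PLANCK_CONSTANT_J_S = 6.62607015e-34  # h, in joule-seconds
ELEMENTARY_CHARGE_C = 1.602176634e-19  # e, in coulombs; 1 eV = e joules


def photon_energy_ev(frequency_hz: float) -> float:
    """Energy E = h * nu of one photon of the given frequency, in electronvolts."""
    return PLANCK_CONSTANT_J_S * frequency_hz / ELEMENTARY_CHARGE_C


def max_kinetic_energy_ev(frequency_hz: float, work_function_ev: float) -> float:
    """Einstein's photoelectric relation: K_max = h*nu - W, or 0 if h*nu < W.

    work_function_ev (W) is the energy needed to free an electron from the
    metal; it is a material-dependent, hypothetical input supplied by the caller.
    """
    return max(0.0, photon_energy_ev(frequency_hz) - work_function_ev)


if __name__ == "__main__":
    # Hypothetical metal with W = 2.0 eV, lit by infrared and ultraviolet light.
    for nu in (1e14, 1e15):
        print(f"nu = {nu:.0e} Hz: E = {photon_energy_ev(nu):.4f} eV, "
              f"K_max = {max_kinetic_energy_ev(nu, 2.0):.4f} eV")
```

Below the threshold frequency $W/h$ the function returns 0, reflecting that no electrons are ejected at all.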

The theory of the ‘free’ quantum electromagnetic field—that is, photons not interacting with charged particles—is now well-understood. It is a bit tricky to deal with an infinite collection of quantum harmonic oscillators, but since each evolves independently from all the rest, the issues are manageable. Many advances in analysis were required to tackle these issues in a rigorous way, but they were erected on a sturdy scaffolding of algebra. The reason is that the quantum harmonic oscillator is exactly solvable in terms of well-understood functions, and so is the free quantum electromagnetic field. By the 1930s, physicists knew precise formulas for the answers to more or less any problem involving the free quantum electromagnetic field. The challenge to mathematicians was then to find a coherent mathematical framework that takes us to these answers starting from clear assumptions. This challenge was taken up and completely met by the mid-1960s.
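In the idealized setting of a box, the independence of these oscillators can be written down explicitly. The following is the standard textbook form, offered as a sketch: each mode $k$ of the field has angular frequency $\omega_k$ and annihilation and creation operators $a_k$ and $a_k^\dagger$, and the infinite constant coming from the zero-point energies has been dropped. The Hamiltonian is a sum of independent oscillator Hamiltonians, so its energies are simply sums of photon numbers $n_k$ times $\hbar\omega_k = h\nu_k$:

```latex
H = \sum_k \hbar\,\omega_k\, a_k^\dagger a_k,
\qquad
[a_k, a_{k'}^\dagger] = \delta_{kk'},
\qquad
E(\{n_k\}) = \sum_k n_k\,\hbar\,\omega_k, \quad n_k \in \{0, 1, 2, \dots\}.
```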

However, for physicists, the free quantum electromagnetic field is just the starting-point, since this field obeys a quantum version of Maxwell’s equations where the charge density and current density vanish. Far more interesting is ‘quantum electrodynamics’, or QED, where we also include fields describing charged particles—for example, electrons and their antiparticles, positrons—and try to impose a quantum version of the full-fledged Maxwell equations. Nobody has found a fully rigorous formulation of QED, nor has anyone proved such a thing cannot be found!

QED is part of a more complicated quantum field theory, the Standard Model, which describes the electromagnetic, weak and strong forces, quarks and leptons, and the Higgs boson. It is widely regarded as our best theory of elementary particles. Unfortunately, nobody has found a rigorous formulation of this theory either, despite decades of hard work by many smart physicists and mathematicians.

To spur progress, the Clay Mathematics Institute has offered a million-dollar prize for anyone who can prove a widely believed claim about a class of quantum field theories called ‘pure Yang–Mills theories’.

A good example is the fragment of the Standard Model that describes only the strong force—or in other words, only gluons. Unlike photons in QED, gluons interact with each other. To win the prize, one must prove that the theory describing them is mathematically consistent and that it describes a world where the lightest particle is a ‘glueball’: a blob made of gluons, with mass strictly greater than zero. This theory is considerably simpler than the Standard Model. However, it is already very challenging.
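The ‘mass gap’ half of the prize problem can be phrased as a condition on the spectrum of the theory’s Hamiltonian $H$, with the energy of the vacuum normalized to zero. This is a paraphrase of the usual formulation, not its full text: there should be a constant $\Delta > 0$ such that

```latex
\operatorname{spec}(H) \subseteq \{0\} \cup [\Delta, \infty), \qquad \Delta > 0 .
```

So every state other than the vacuum has energy at least $\Delta$, and in units with $c = 1$ no particle, glueballs included, can have mass below $\Delta$.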

This is not the only million-dollar prize that the Clay Mathematics Institute is offering for struggles with the continuum. They are also offering one for a proof of global existence of solutions to the Navier–Stokes equations for fluid flow. However, their quantum field theory challenge is the only one for which the problem statement is not completely precise. The Navier–Stokes equations are a collection of partial differential equations for the velocity and pressure of a fluid. We know how to precisely phrase the question of whether these equations have a well-defined solution for all time given smooth initial data. Describing a quantum field theory is a trickier business!
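For comparison, here is the kind of precise statement available in the fluid case. In the standard formulation on $\mathbb{R}^3$, with the fluid’s density set to 1, the unknowns are the velocity field $u(x,t)$ and the pressure $p(x,t)$; $\nu > 0$ is the viscosity, $f$ is a given external force, and $u_0$ is smooth, divergence-free initial data:

```latex
\frac{\partial u}{\partial t} + (u \cdot \nabla)\,u = \nu\,\nabla^2 u - \nabla p + f,
\qquad
\nabla \cdot u = 0,
\qquad
u(x, 0) = u_0(x).
```

The question is whether smooth solutions exist for all $t \ge 0$ for every such $u_0$, or whether they can break down in finite time. The official problem statement adds precise decay and finite-energy conditions that this sketch omits.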

To be sure, there are a number of axiomatic frameworks for quantum field theory:

• Ray Streater and Arthur Wightman, *PCT, Spin and Statistics, and All That*, Benjamin Cummings, San Francisco, 1964.

• James Glimm and Arthur Jaffe, *Quantum Physics: A Functional Integral Point of View*, Springer, Berlin, 1987.

• John C. Baez, Irving Segal and Zhengfang Zhou, *Introduction to Algebraic and Constructive Quantum Field Theory*, Princeton U. Press, Princeton, 1992.

• Rudolf Haag, *Local Quantum Physics: Fields, Particles, Algebras*, Springer, Berlin, 1996.

We can prove physically interesting theorems from these axioms, and also rigorously construct some quantum field theories obeying these axioms. The easiest are the free theories, which describe non-interacting particles. There are also examples of rigorously constructed quantum field theories that describe interacting particles in fewer than 4 spacetime dimensions. However, no quantum field theory that describes interacting particles in 4-dimensional spacetime has been proved to obey the usual axioms. Thus, much of the wisdom of physicists concerning quantum field theory has not been fully transformed into rigorous mathematics.

Worse, the question of whether a particular quantum field theory studied by physicists obeys the usual axioms is not completely precise—at least, not yet. The problem is that going from the physicists’ formulation to a mathematical structure that might or might not obey the axioms involves some choices.

This is not a cause for despair; it simply means that there is much work left to be done. In practice, quantum field theory is marvelously good for calculating answers to physics questions. The answers involve approximations, and these approximations work very well. The problem is just that we do not fully understand, in a mathematically rigorous way, what these approximations are supposed to be approximating.

How could this be? In the next part, I’ll sketch some of the key issues in the case of quantum electrodynamics. I won’t get into all the technical details: they’re too complicated to explain in one blog article. Instead, I’ll just try to give you a feel for what’s at stake.
