The Madelung Rules

8 December, 2021

I’ve been thinking about chemistry lately. I’m always amazed by how far we can get in the study of multi-electron atoms using ideas from the hydrogen atom, which has just one electron.

If we ignore the electron’s spin and special relativity, and use just Schrödinger’s equation, the hydrogen atom is exactly solvable—and a key part of any thorough introduction to quantum mechanics. So I’ll zip through that material now, saying just enough to introduce the ‘Madelung rules’, which are two incredibly important rules of thumb for how electrons behave in the ground states of multi-electron atoms.

If take the Hilbert space \mathbf{H} of bound states of the hydrogen atom, ignoring electron spin, we can decompose it as a direct sum of subspaces:

\displaystyle{ \mathbf{H} = \bigoplus_{n = 1}^\infty \mathbf{H}_n }

where the energy equals -1/n^2 in the subspace \mathbf{H}_n.

Since the hydrogen atom has rotational symmetry, each subspace \mathbf{H}_n further decomposes into irreducible representations of the rotation group \mathrm{SO}(3). Any such representation is classified by a natural number \ell = 0, 1, 2, \dots. Physically this number describes angular momentum; mathematically it describes the dimension of the representation. The angular momentum is \sqrt{\ell(\ell + 1)}, while the dimension is 2\ell + 1. Concretely, we can think of this representation as the space of homogeneous polynomials of degree \ell with vanishing Laplacian. I’ll say a bit more about this next time.

The subspace \mathbf{H}_n decomposes as a direct sum of irreducible representations with 0 \le \ell \le n - 1, like this:

\displaystyle{\mathbf{H}_n = \bigoplus_{\ell = 0}^n \mathbf{H}_{n,\ell} }

So, in the subspace \mathbf{H}_{n,\ell} the electron has energy -1/n^2 and angular momentum \sqrt{\ell(\ell + 1)}.

I’ve been ignoring the electron’s spin, but we shouldn’t ignore it completely, because it’s very important for understanding the periodic table and chemistry in general. To take it into account, all I will do is tensor \mathbf{H} with \mathbb{C}^2, to account for the electron’s two spin states. So, the true Hilbert space of bound states of a hydrogen atom is \mathbf{H} \otimes \mathbb{C}^2.

By what I’ve said, we can decompose this Hilbert space into subspaces

\mathbf{H}_n \otimes \mathbb{C}^2

called shells, getting this:

\displaystyle{ \mathbf{H} \otimes \mathbb{C}^2 = \bigoplus_{n = 1}^\infty \mathbf{H}_n \otimes \mathbb{C}^2 }

We can then decompose the shells further into subspaces

\mathbf{H}_{n,\ell} \otimes \mathbb{C}

called subshells, obtaining our final result:

\displaystyle{ \mathbf{H} \otimes \mathbb{C}^2 = \bigoplus_{n = 1}^\infty \bigoplus_{\ell = 0}^{n-1} \mathbf{H}_{n , \ell} \otimes \mathbb{C}^2 }


\dim(\mathbf{H}_{n,\ell}) = 2\ell + 1

the dimensions of the subshells are

\dim(\mathbf{H}_{n,\ell} \otimes \mathbb{C}^2) = 2(2\ell + 1)

or in other words, twice the odd numbers:

2, 6, 10, 14, \dots

This lets us calculate the dimensions of the shells:

\displaystyle{ \dim(\mathbf{H}_{n}) = \sum_{\ell = 0}^{n-1} \dim(\mathbf{H}_{n,\ell})  = 2\sum_{\ell = 0}^{n-1} (2\ell + 1) = 2n^2}

since the sum of the first bunch of odd numbers is perfect square. So, the dimensions of the shells are twice the perfect squares:

2, 8, 18, 32, \dots

Okay, so much for hydrogen!

Now, the ‘Aufbau principle’ says that as we keep going through the periodic table, looking at elements with more and more electrons, these electrons will act approximately as if each one occupies a subshell. Thanks to the Pauli exclusion principle, we can’t put more electrons into some subshell than the dimension of that subshell. So the big question is: which subshell does each electron go into?

This is the question that the Madelung rules answer! Here’s what they say:

  1. Electrons are assigned to subshells in order of increasing values of n + \ell.
  2. For subshells with the same value of n + \ell, electrons are assigned first to the subshell with lower n—or equivalently, higher \ell.

They aren’t always right, but they’re damned good, given that in reality we’ve got a bunch of electrons interacting with each other—and not even described by separate wavefunctions, but really one big fat ‘entangled’ wavefunction.

Here’s what the Madelung rules predict:

So, subshells get filled in order of increasing n + \ell, and for any choice of n + \ell we start with the biggest possible \ell and work down.

This chart uses some old-fashioned but still very popular notation for the subshells. We say the number n, but instead of saying the number \ell we use a letter:

\ell = 0: s
\ell = 1: p
\ell = 2: d
\ell = 3: f

The reasons for these letters would make for a long and thrilling story, but not today.

The history of the Madelung rules

At this point I should go through the periodic table and show you how well the Madelung rules predict what’s going on. Basically, as we fill a particular subshell we get a bunch of related elements, and then as we go on up to the next subshell we get another bunch—and we can understand a lot about their properties! But there are also plenty of deviations from the Madelung rules, which are also interesting. I’ll talk about all these things next time.

I should also say a bit about why the Madelung rules work as well as they do! For example, what’s the importance of n + \ell?

But that’s not what I have lined up for today. Sorry! Instead, I want to talk about something much less important: the historical origin of the Madelung rules. According to Wikipedia they have many names:

• the Madelung rule (after Erwin Madelung)
• the Janet rule (after Charles Janet)
• the Klechkowsky rule (after Vsevolod Klechkovsky)
• Wiswesser’s rule (after William Wiswesser)
• the Aufbau approximation
• the diagonal rule, and
• the Uncle Wiggly path.

Seriously! Uncle Wiggly!

Understanding the history of these rules is going to be difficult. You can read a lot about their prehistory here:

• Wikipedia, Aufbau principle.

Bohr and Sommerfeld played a big role in setting up the theory of shells and subshells, and perhaps the whole idea of ‘Aufbau’: German for ‘building up’ atoms one electron at a time.

But I was confused about whether Madelung discovered both rules or just one, especially because a lot of people say ‘Madelung rule’ in the singular, when there are really two. So I asked on History of Science and Mathematics Stackexchange:

There are two widely used rules of thumb to determine which subshells are filled in a neutral atom in its ground state:

• Electrons are assigned to subshells in order of increasing value of n + \ell.

• For subshells with the same value of n + \ell, electrons are assigned first to the subshell with lower n.

These rules don’t always hold, but that’s not my concern here. My question is: which of these rules did Erwin Madelung discover? (Both? Just one?)

Wikipedia seems to say Charles Janet discovered the first in 1928:

A periodic table in which each row corresponds to one value of n + \ell (where the values of n and \ell correspond to the principal and azimuthal quantum numbers respectively) was suggested by Charles Janet in 1928, and in 1930 he made explicit the quantum basis of this pattern, based on knowledge of atomic ground states determined by the analysis of atomic spectra. This table came to be referred to as the left-step table. Janet “adjusted” some of the actual n + \ell values of the elements, since they did not accord with his energy ordering rule, and he considered that the discrepancies involved must have arisen from measurement errors. In the event, the actual values were correct and the n + \ell energy ordering rule turned out to be an approximation rather than a perfect fit, although for all elements that are exceptions the regularised configuration is a low-energy excited state, well within reach of chemical bond energies.

It then goes on to say:

In 1936, the German physicist Erwin Madelung proposed this as an empirical rule for the order of filling atomic subshells, and most English-language sources therefore refer to the Madelung rule. Madelung may have been aware of this pattern as early as 1926.

Is “this” still rule 1?

It then goes on to say:

In 1945 William Wiswesser proposed that the subshells are filled in order of increasing values of the function

{\displaystyle W(n,\ell)=n+\ell-{\frac {\ell}{\ell+1}}.}

This is equivalent to the combination of rules 1 and 2. So, this article seems to suggest that rule 2 was discovered by Wiswesser. But I have my doubts. Goudsmit and Richards write:

Madelung discovered a simple empirical rule for neutral atoms. It consists of two parts.

and then they list both 1 and 2.

I have not yet managed to get ahold of Erwin Madelung’s work, e.g.

• E. Madelung, Die mathematischen Hilfsmittel der Physikers, Springer, Berlin, 1936

I got a helpful reply from M. Farooq:

The only relevant section in E. Madelung’s edited book, Die mathematischen Hilfsmittel der Physikers, Springer: Berlin, 1936 is a small paragraph. It seems Madelung just called this for electron book-keeping and he had no justification for his proposition. He calls it a lexicographic order (lexikographische Ordnung).

I was searching for it some other reasons several months ago: Question in SE Physics. The original and the (machine but edited) translations is shared.

15 Atomic structure (electron catalog) (to p. 301).

The eigenfunction of an atom, consisting of Z electrons and Z-times positively charged nucleus, can be constructed in the case of removed degeneracy in first approximation as a product of Z hydrogen eigenfunctions (cf. p. 356), each of which is defined by four quantum numbers n, \ell, m, s defined by n>0, n-1 \geqq \ell \geqq 0, s=\pm \frac{1}{2}, m \leqq l. According to Pauli’s principle, no two of these functions may coincide in all four quantum numbers. According to the Bohr principle of structure, an atom with Z electrons is formed from an atom with (Z-1) electrons by adding another one (and increasing the nuclear charge by 1) without changing the quantum numbers of the already existing electrons. Therefore a catalog can be set up, from whose in each case Z first positions the atom is built up in the basic state (cf. the table p. 360).

The ordering principle of this catalog is a lexicographic order according to the numbers (n+\ell), n, s, m. A theoretical justification of just this arrangement is not yet available. One reads from it:

1. The periodic system of the elements. Two atoms are homologous if in each case their “last electron” in the l, m, s coincides.

2. The spectroscopic character of the basic term, entered in column 10 . There is namely |\Sigma m|=0,1,2,3 \ldots the characterS P, D, F, G, H, I \ldots and (2|\Sigma s|+1) the multiplicity.

3. The possibilities for excited states (possible terms), where not all Z electrons are in the first Z positions of the catalog.

The catalog is the representation form of an empirical rule. It idealizes the experience, because in some cases deviations are observed.

So, it’s clear that Madelung knew and efficiently stated both rules in 1936! By the way, ‘lexicographic order’ is the standard mathematical term for how we’re ordering the pairs (n+\ell, n) in the Madelung rules.


I took the handsome chart illustrating the Madelung rules from here:

Aufbau Principle Chemistry God, 18 February 2020.

This blog article is definitely worth reading!

Compositional Thermostatics (Part 1)

22 November, 2021

At the Topos Institute this summer, a group of folks started talking about thermodynamics and category theory. It probably started because Spencer Breiner and my former student Joe Moeller, both working at NIST, were talking about thermodynamics with some people there. But I’ve been interested in thermodynamics for quite a while now –and Owen Lynch, a grad student visiting from the University of Utrecht, wanted to do his master’s thesis on the subject. He’s now working with me. Sophie Libkind, David Spivak and David Jaz Myers also joined in: they’re especially interested in open systems and how they interact.

Prompted by these conversations, a subset of us eventually wrote a paper on the foundations of equilibrium thermodynamics:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

The idea here is to describe classical thermodynamics, classical statistical mechanics and quantum statistical mechanics in a unified framework based on entropy maximization. This framework can also handle ‘generalized probabilistic theories’ of the sort studied in quantum foundations—that is, theories like quantum mechanics, but more general.

To unify all these theories, we define a ‘thermostatic system’ to be any convex space X of ‘states’ together with a concave function

S \colon X \to [-\infty, \infty]

assigning to each state an ‘entropy’.

Whenever several such systems are combined and allowed to come to equilibrium, the new equilibrium state maximizes the total entropy subject to constraints. We explain how to express this idea using an operad. Intuitively speaking, the operad we construct has as operations all possible ways of combining thermostatic systems. For example, there is an operation that combines two gases in such a way that they can exchange energy and volume, but not particles—and another operation that lets them exchange only particles, and so on.

It is crucial to use a sufficiently general concept of ‘convex space’, which need not be a convex subset of a vector space. Luckily there has been a lot of work on this, so we can just grab a good definition off the shelf:

Definition. A convex space is a set X with an operation c_\lambda \colon X \times X \to X for each \lambda \in [0, 1] such that the following identities hold:

1) c_1(x, y) = x

2) c_\lambda(x, x) = x

3) c_\lambda(x, y) = c_{1-\lambda}(y, x)

4) c_\lambda(c_\mu(x, y) , z) = c_{\lambda'}(x, c_{\mu'}(y, z)) for all 0 \le \lambda, \mu, \lambda', \mu' \le 1 satisfying \lambda\mu = \lambda' and 1-\lambda = (1-\lambda')(1-\mu').

To understand these axioms, especially the last, you need to check that any vector space is a convex space with

c_\lambda(x, y) = \lambda x + (1-\lambda)y

So, these operations c_\lambda describe ‘convex linear combinations’.

Indeed, any subset of a vector space closed under convex linear combinations is a convex space! But there are other examples too.

In 1949, the famous mathematician Marshall Stone invented ‘barycentric algebras’. These are convex spaces satisfying one extra axiom: the cancellation axiom, which says that whenever \lambda \ne 0,

c_\lambda(x,y) = c_\lambda(x',y) \implies x = x'

He proved that any barycentric algebra is isomorphic to a convex subset of a vector space. Later Walter Neumann noted that a convex space, defined as above, is isomorphic to a convex subset of a vector space if and only if the cancellation axiom holds.

Dropping the cancellation axiom has convenient formal consequences, since the resulting more general convex spaces can then be defined as algebras of a finitary commutative monad, giving the category of convex spaces very good properties.

But dropping this axiom is no mere formal nicety. In our definition of ‘thermostatic system’, we need the set of possible values of entropy to be a convex space. One obvious candidate is the set [0,\infty). However, for a well-behaved formalism based on entropy maximization, we want the supremum of any set of entropies to be well-defined. This forces us to consider the larger set [0,\infty], which does not obey the cancellation axiom.

But even that is not good enough! In thermodynamics you often read about the ‘heat bath‘, an idealized system that can absorb or emit an arbitrarily large amount of energy while keeping a fixed temperature. We want to treat the ‘heat bath’ as a thermostatic system on an equal footing with any other. To do this, we need to allow consider negative entropies—not because the heat bath can have negative entropy, but because it acts as an infinite reservoir of entropy, and the change in entropy from its default state can be positive or negative.

This suggests letting entropies take values in the convex space \mathbb{R}. But then the requirement that any set of entropies have a supremum (including empty and unbounded sets) forces us to use the larger convex space [-\infty,\infty].

This does not obey the cancellation axiom, so there is no way to think of it as a convex subset of a vector space. In fact, it’s not even immediately obvious how to make it into a convex space at all! After all, what do you get when you take a nontrivial convex linear combination of \infty and -\infty? You’ll have to read our paper for the answer, and the justification.

We then define a thermostatic system to be a convex set X together with a concave function

S \colon X \to [-\infty, \infty]

where concave means that

S(c_\lambda(x,y)) \ge c_\lambda(S(x), S(y))

We give lots of examples from classical thermodynamics, classical and quantum statistical mechanics, and beyond—including our friend the ‘heat bath’.

For example, suppose X is the set of probability distributions on an n-element set, and suppose S \colon X \to [-\infty, \infty] is the Shannon entropy

\displaystyle{ S(p) = - \sum_{i = 1}^n p_i \log p_i }

Then given two probability distributions p and q, we have

S(\lambda p + (1-\lambda q)) \ge \lambda S(p) + (1-\lambda) S(q)

for all \lambda \in [0,1]. So this entropy function is convex, and S \colon X \to [-\infty, \infty] defines a thermostatic system. But in this example the entropy only takes nonnegative values, and is never infinite, so you need to look at other examples to see why we want to let entropy take values in [-\infty,\infty].

After looking at examples of thermostatic systems, we define an operad whose operations are convex-linear relations from a product of convex spaces to a single convex space. And then we prove that thermostatic systems give an algebra for this operad: that is, we can really stick together thermostatic systems in all these ways. The trick is computing the entropy function of the new composed system from the entropy functions of its parts: this is where entropy maximization comes in.

For a nice introduction to these ideas, check out Owen’s blog article:

• Owen Lynch, Compositional thermostatics, Topos Institute Blog, 9 September 2021.

And then comes the really interesting part: checking that this adequately captures many of the examples physicists have thought about!

The picture at the top of this post shows one that we discuss: two cylinders of ideal gas with a movable divider between them that’s permeable to heat. Yes, this is an operation in an operad—and if you tell us the entropy function of each cylinder of gas, our formalism will automatically compute the entropy function of the resulting combination of these two cylinders.

There are many other examples. Did you ever hear of the ‘canonical ensemble’, the ‘microcanonical ensemble’, or the ‘grand canonical ensemble’? Those are famous things in statistical mechanics. We show how our formalism recovers these.

I’m sure there’s much more to be done. But I feel happy to see modern math being put to good use: making the foundations of thermodynamics more precise. Once Vladimir Arnol’d wrote:

Every mathematician knows that it is impossible to understand any elementary course in thermodynamics.

I’m not sure our work will help with that—and indeed, it’s possible that once the mathematicians finally understand thermodynamics, physicists won’t understand what the mathematicians are talking about! But at least we’re clearly seeing some more of the mathematical structures that are hinted at, but not fully spelled out, in such an elementary course.

I expect that our work will interact nicely with Simon Willerton’s work on the Legendre transform. The Legendre transform of a concave (or convex) function is widely used in thermostatics, and Simon describes this for functions valued in [-\infty,\infty] using enriched profunctors:

• Simon Willerton, Enrichment and the Legendre–Fenchel transform I, The n-Category Café, April 16, 2014.

• Simon Willerton, Enrichment and the Legendre–Fenchel transform II, The n-Category Café, May 22, 2014.

He also has a paper on this, and you can see him talk about it on YouTube.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

The Kuramoto–Sivashinsky Equation (Part 7)

3 November, 2021

I have a lot of catching up to do. I want to share a bunch of work by Steve Huntsman. I’ll start with some older material. A bit of this may be ‘outdated’ by his later work, but I figure it’s all worth recording.

One goal here is to define ‘stripes’ for the Kuramoto–Sivashinky equation in a way that lets us count them, their births, and their mergers, and so on. We need a good definition to test the conjectures I made in Part 1.

While I originally formulated my conjectures for the ‘integral form’ of the
Kuramoto–Sivashinky equation:

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

Steve has mostly been working with the derivative form:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

so you can assume that unless I say otherwise. He’s using periodic boundary conditions such that

u(t,x) = u(t,x+L)

for some length L. The length depends on the particular experiment he’s doing.

First, a plot of stripes. It looks like L = 100 here:

Births and deaths are shown as green and red dots, respectively. But to see them, you may need to click on the picture to enlarge it!

According to my conjecture there should be no red dots. The red dots at the top and the bottom of the image don’t count: they mostly arise because this program doesn’t take the periodic boundary conditions into account. There are two other red dots, which are worth thinking about.

Nice! But how are stripes being defined here? He describes how:

The stripe definition is mostly pretty simple and not image processy at all, and the trick to improve it is limited to removing little blobs and is easily explained.

Let u(t,x) be the solution to the KSE. Then let

v(t,x) := u(t,x)- u(t,x+a)

where a is the average integer offset (maybe I’m missing a minus sign a la -a) that maximizes the cross-correlation between u(t,x) and -u(t,x+a). Now anywhere v exceeds its median is part of a stripe.

The image processing trick is that I delete little stripes (and I use what image processors would call 4-connectivity to define simply connected regions—this is the conservative idea that a pixel should have a neighbor to the north, south, east, or west to be connected to that neighbor, instead of the aggressive 8-connectivity that allows NE, NW, SE, SW too) whose area is less than 1000 grid points. So it uses lots of image processing machinery to actually do its job, but the definition is simple and easily explained mathematically.

An obvious fix that removes the two nontrivial deaths in the picture I sent is to require a death to be sufficiently far away from another stripe: here I am guessing that the characteristic radius of a stripe will work just fine.

Learn Applied Category Theory!

27 October, 2021

Do you like the idea of learning applied category theory by working on a project, as part of a team led by an expert? If you’re an early career researcher you can apply to do that now!

Mathematical Research Community: Applied Category Theory, meeting 2022 May 29–June 4. Details on how to apply: here. Deadline to apply: Tuesday 2022 February 15 at 11:59 Eastern Time.

After working with your team online, you’ll take an all-expenses-paid trip to a conference center in upstate New York for a week in the summer. There will be a pool, bocci, lakes with canoes, woods to hike around in, campfires at night… and also whiteboards, meeting rooms, and coffee available 24 hours a day to power your research!

Later you’ll get invited to the 2023 Joint Mathematics Meetings in Boston.

There will be three projects to choose from:

Valeria de Paiva (Topos Institute) will lead a study in the context of computer science that investigates indexed containers and partial compilers using lenses and Dialectica categories.

Nina Otter (Queen Mary University of London) will lead a study of social networks using simplicial complexes.

John Baez (University of California, Riverside) will lead a study of chemical reaction networks using category theoretic methods such as structured cospans.

The whole thing is being organized by Daniel Cicala of the University of New Haven:

and Simon Cho of Two Six Technologies:

I should add that this is just one of four ‘Mathematical Research Communities’ run by the American Mathematical Society in 2022, and you may prefer another. The applied category theory session will be held at the same time and place as one on data science! Then there are two more:

• Week 1a: Applied Category Theory

Organizers: John Baez, University of California, Riverside; Simon Cho, Two Six Technologies; Daniel Cicala, University of New Haven; Nina Otter, Queen Mary University of London; Valeria de Paiva, Topos Institute.

• Week 1b: Data Science at the Crossroads of Analysis, Geometry, and Topology

Organizers: Marina Meila, University of Washington; Facundo Mémoli, The Ohio State University; Jose Perea, Northeastern University; Nicolas Garcia Trillos, University of Wisconsin-Madison; Soledad Villar, Johns Hopkins University.

• Week 2a: Models and Methods for Sparse (Hyper)Network Science

Organizers: Sinan G. Aksoy, Pacific Northwest National Laboratory; Aric Hagberg, Los Alamos National Laboratory; Cliff Joslyn, Pacific Northwest National Laboratory; Bill Kay, Oak Ridge National Laboratory; Emilie Purvine, Pacific Northwest National Laboratory; Stephen J. Young, Pacific Northwest National Laboratory; Jennifer Webster, Pacific Northwest National Laboratory.

• Week 2b: Trees in Many Contexts

Organizers: Miklós Bóna, University of Florida; Éva Czabarka, University of South Carolina; Heather Smith Blake, Davidson College; Stephan Wagner, Uppsala University; Hua Wang, Georgia Southern University.

Applicants should be ready to engage in collaborative research and should be “early career”—either expecting to earn a PhD within two years or having completed a PhD within five years of the date of the summer conference. Exceptions to this limit on the career stage of an applicant may be made on a case-by-case basis. The Mathematical Research Community (MRC) program is open to individuals who are US citizens as well as to those who are affiliated with US institutions and companies/organizations. A few international participants may be accepted. Depending on space and other factors, a small number of self-funded participants may be admitted. Individuals who have once previously been an MRC participant will be considered for admission, and their applications must include a rationale for repeating. Please note that individuals cannot participate in the MRC program more than twice: applications from individuals who have twice been MRC participants will not be considered.

We seek individuals who will both contribute to and benefit from the MRC experience, and the goal is to create a collaborative research community that is vibrant, productive, and diverse. We welcome applicants from academic institutions of all types, as well as from private industry and government laboratories and agencies. Women and under-represented minorities are especially encouraged to apply.

All participants are expected to be active in the full array of MRC activities—the summer conference, special sessions at the Joint Mathematics Meetings, and follow-up collaborations.

The Kuramoto–Sivashinsky Equation (Part 6)

25 October, 2021

guest post by Theodore Kolokolnikov

I coded up a simple dynamical system with the following rules (loosely motivated by theory of motion of spikes in reaction-diffusion systems, see e.g. appendix of this paper, as well as this paper):

• insert a particle if inter-particle distance is more than some maxdist
• merge any two particles that collide
• otherwise evolve particles according to the ODE

\displaystyle{ x'_k(t) = \sum_{j=1}^N G_x(x_k, x_j) }

Here, G is a Green’s function that satisfies

G_{xx}-\lambda^2 G = -\delta(x,y)

inside the interval [-L,L] with Neumann boundary conditions G_x(\pm L, y)=0. Explicitly,

\displaystyle{ G(x,y) = \frac{\cosh((x+y)\lambda)+\cosh((2L-|x-y|) \lambda )}{2 \lambda \sinh(2L \lambda ) }  }


\displaystyle{ G_x(x,y)= \frac{\sinh((x+y) \lambda )+ \sinh((|x-y|-2L) \lambda ) \mbox{sign}(x-y) } {2 \sinh(2L\lambda)} }

where sign(0) is taken to be zero so that

\displaystyle{ G_x(y,y) := \frac{G_x(y^+,y)+ G_x(y^-,y)}{2} }

In particular, for large \lambda, one has

\displaystyle{ G(x,y)\sim\frac{e^{-\lambda | x-y|}}{2\lambda} }


\displaystyle{ G_x(x,y)\sim-\frac{e^{-\lambda | x-y|}} {2} \mbox{sign}(x-y), ~~\lambda \gg 1 }

Here are some of the resulting simulations, with different \lambda (including complex \lambda). This is mainly just for fun but there is a wide range of behaviours. In particular, I think the large-lambda limit should be able to capture analogous dynamics in the Keller-Siegel model with logistic growth (Hillen et. al.), see e.g. figures in this paper.

The Kuramoto–Sivashinsky Equation (Part 5)

24 October, 2021

In Parts 3 and 4, I showed some work of Cheyne Weis on the ‘derivative form’ of the Kuramoto–Sivashinksy equation, namely

u_t + u_{xx} + u_{xxxx} + u u_x = 0

Steve Huntsman’s picture of a solution above gives you a good feel for how this works.

Now let’s turn to the ‘integral form’, namely

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

This has rather different behavior, though it’s closely related, since if h is any solution of the integral form then

u = h_x

is a solution of the derivative form.

Cheyne drew a solution of the integral form:

You’ll immediately see the most prominent feature: it slopes down! I’ll show later that the average of h over space can never increase with time, and it decreases unless h is constant as a function of space. By contrast, we saw in Part 2 that the average of u over space never changes with time.

However, we can subtract off the average of h over space to eliminate this dramatic but rather boring effect. The result looks like this:

Now it’s very easy to see the ‘stripes’ I’m so obsessed with: they are the ridges in these pictures. You can see how as time increases from left to right these stripes are born and merge, but never die or split.

But how can we mathematically define these stripes, to make it possible to state precise conjectures about them? We could try defining them to be points where u is locally maximized of as a function of x at any time t. With this definition, Cheyne gets stripes like this:

The previous picture shows up in the lower right hand corner of this one.

These stripes look pretty good, but you’ll see some gaps where they momentarily disappear and then reappear. I don’t think these invalidate my conjecture that stripes never ‘die’. I just think this definition of stripe is not quite right. (Of course I would think that, wouldn’t I? I want the conjecture to be true!)

Cheyne thought that maybe overlaying maxima in time would help:

This fills in some gaps, but there are still stripes that momentarily die, only to be shortly reborn. It might be good to define stripes to be points where this function— u minus its average over space—exceeds a certain cutoff.

Let’s conclude by proving that the average of h over space can never increase with time. To prove this, just take the time derivative of the integral of h over space, and show it’s \le 0. Remember that we’re assuming h(t,x) is periodic in x with period L, so ‘space’ is the interval [0,L] with its endpoints identified to form a circle. So, we get

\begin{array}{ccl}  \displaystyle{ \frac{d}{d t} \int_0^L h(t,x) \, dx } &=& \displaystyle{ \int_0^L h_t(t,x) \, dx } \\ \\  &=& \displaystyle{ -\int_0^L \left( h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 \right) \, dx } \\ \\  &=& \displaystyle{  -\left( h_x + h_{xxx} \right) \Big|_0^L -\int_0^L (h_x)^2 \, dx } \\ \\  &=& \displaystyle{ -\int_0^L (h_x)^2 \, dx }  \end{array}

This is \le 0, as desired. Moreover, it’s zero iff h is constant as a function on space!

The Kuramoto–Sivashinsky Equation (Part 4)

23 October, 2021

Here is some more work by Cheyne Weis. Last time I explained that Cheyne and Steve Huntsman were solving the ‘derivative form’ of the Kuramoto–Sivashinsky equation, namely this:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

Above is one of Steve’s pictures of a typical solution with its characteristic ‘stripes’. Cheyne started trying to identify these stripes as locations where du/dx \le c for some cutoff c, for example c = -0.7. He made some nice 3d views of du/dx illustrating the problems with this. As he explains:

So there is the tradeoff that if I make the cutoff too high, the sections that look greenish are getting identified as stripes. If I make it too low, the points near where two stripes merge disappear. I attached some 3D plots showing the landscape of du/dx reiterating this point.

The disappearance of the stripes as they merge is unavoidable to a certain extent. The plots in the PowerPoint show how even when there is no cutoff, there is some gap between the merging stripes. There may be some characteristic length scale needed to qualify them as merging.

You can click on these pictures to make them bigger.

The Kuramoto–Sivashinsky Equation (Part 3)

23 October, 2021

I’ve been getting a lot of help from Steve Huntsman and also Cheyne Weis, who is a physics grad student at the University of Chicago. You can see a lot, but far from all, of Steve’s work as comments on part 1. Here are some things Cheyne has been doing.

Cheyne started out working with the ‘derivative form’ of the Kuramoto–Sivashinsky equation, meaning this:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

and he soon noticed what Steve made clear in the image above: the ‘stripes’ in solutions of this equation aren’t ‘bumps’ (regions where u is large) but regions where the solution is rapidly changing from positive to negative. This suggests a way to define stripes: look for where du/dx < c for some negative c. It seems c = -0.7 is a pretty good choice.

I thought maybe it would be better to use the derivative of the PDE’s solution (du/dx) to define the stripes. You can find an image of this in the attached PowerPoint.

The second slide has another image where the lines represent the minima of du/dx (as a function of x) that are below a certain threshold c. You can see these lines appearing and combining as apparent in Thien An’s animation. Hopefully this is some progress on the definition of a “bump”. If you agree, I could use this to test some of your other conjectures.

Here are the result for a range of alternative choices of c. The problem, if we’re seeking a definition of ‘stripe’ where stripes never die as time passes, is the presence of short ‘ministripes’ that die shortly after they appear. What’s really going on, I believe, is that when small stripes merge with larger ones, the derivative du/dx becomes smaller in absolutely value, thus going above the cutoff c. In short, merging is being misinterpreted as death.

The Kuramoto–Sivashinsky Equation (Part 2)

22 October, 2021

I love the Kuramoto–Sivashinsky equation, beautifully depicted here by Thien An, because it’s one of the simplest partial differential equations that displays both chaos and a visible ‘arrow of time’. Last time I made some conjectures about it. Most notably, I conjecture that the ‘stripes’ you see above can appear or merge as time passes, but never disappear or split.

But I was quite confused for a while, because the Kuramoto–Sivashinsky equation comes in two forms. The integral form, which I spent most of my time discussing, is this:

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

where h(t,x) is a real function of two real variables, ‘time’ t and ‘space’ x. The derivative form is this:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

where u(t,x) is again a real function of two real variables. If h is a solution of the integral form,

u = h_x

is a solution of the derivative form. This is easy to see: just take the x derivative of the equation with h in it and see what you get.

By the way, beware of this: Thien An’s wonderful movie above shows the integral form of the equation, but she calls the function u.

Note that if h has a local maximum as a function of x, then u will go positive and then negative. This is important because it affects the definition of ‘stripe’. Here’s what stripes look in the derivative form of the Kuramoto–Sivashinsky equation, as computed by Steve Huntsman:

As you can see, the function u goes negative and then positive as we increase x moving through a stripe. This means that h would have a local minimum.

Another thing puzzling me was that Steve Huntsman found solutions of the derivative form where the stripes in u mostly move in the same direction as time passes:

This suggests that there’s some way to take a solution where the stripes aren’t moving much, and switch to a moving frame of reference to get a solution with moving stripes. And indeed I’ll show that’s true! But I’ll also show it’s not true for the integral form of the Kuramoto–Sivashinsky equation—at least not with periodic boundary conditions, which is what we’re using here.

Galilean transformations

Galileo was the guy who first emphasized that the laws of physics look the same in a moving frame of reference, but later Maxwell explained the idea very poetically:

Our whole progress up to this point may be described as a gradual development of the doctrine of relativity of all physical phenomena. Position we must evidently acknowledge to be relative, for we cannot describe the position of a body in any terms which do not express relation… There are no landmarks in space; one portion of space is exactly like every other portion, so that we cannot tell where we are. We are, as it were, on an unruffled sea, without stars, compass, sounding, wind or tide, and we cannot tell in which direction we are going. We have no log which we can case out to take a dead reckoning by; we may compute our rate of motion with respect to the neighboring bodies, but we do not know how these bodies may be moving in space.

Before Einstein came along, a transformation into a moving frame of reference worked like this:

t \mapsto t
x \mapsto x + vt

where v is a constant, the velocity.

These are called ‘Galilean transformations’ And here’s something cool: the derivative form of the Kuramoto–Sivashinsky equation has Galilean transformations as symmetries! If u(t,x) is a solution, so is the boosted function

u'(t,x) = u(t,x+vt) - v

The prime here does not mean derivative: u' is the name of our new boosted solution. To get this boosted solution, we do the obvious coordinate transformation into a moving frame of reference, but then a bit more: we must subtract the constant v.

Let’s see how this works! Suppose u is a solution of the derivative form of the Kuramoto–Sivashinsky equation:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

Then defining

u'(t,x) = u(t, x+vt) - v

we see the t derivative of u' has an extra vu_x term:

u'_t(t,x) = u_t(t,x+vt) + vu_x(t,x+vt)

while its x derivatives are simple:

u'_x(t,x) = u_x(t,x+vt)

and so on for the higher x derivatives.

This lets us check that the boosted function u' is also a solution:

\begin{array}{ccl}  u'_t + u'_{xx} + u'_{xxxx} + u' u'_x &=& u_t(t,x+vt) + vu_x(t,x+vt)  \\ \\  &&  + u_{xx}(t,x+vt) + u_{xxxx}(t,x+tv)  \\ \\  && + (u(t,x+vt) - v) u_x(t,x+vt) \\ \\  &=& 0  \end{array}

Note how subtracting the constant v when we transform u exactly cancels the extra term vu_x term we get when we transform the time derivative of u.

I got this idea from here:

• Uriel Frisch, Zhensu She and Olivier Thual, Viscoelastic behaviour of cellular solutions to the Kuramoto–Sivashinsky model, Journal of Fluid Mechanics 168 (1986), 221–240.

thanks to someone on Twitter.

Does the integral form of the Kuramoto-Sivashinsky equation also have Galilean transformations as symmetries? Yes, but there’s a certain problem with them. Let’s see how it goes.

If h(t,x) is a solution, then so is this boosted function:

h'(t,x) = h(t,x+vt) - vx - \frac{1}{2}t v^2

But now the ‘fudge factors’ in the boosted function are more sneaky! We don’t just subtract a constant, as we did with the derivative form. We subtract a function that depends on both the space and time coordinates.

Let’s check. The first time derivative of h' works like this:

h'_t(t,x) = h_t(t,x+vt) + vh_t(t,x+vt) - \frac{1}{2} v^2

The first space derivative works like this:

h'_x(t,x) = h_x(t,x+vt) - v

The second space derivative is simpler:

h'_{xx}(t,x) = h_{xx}(t,x+vt)

and the higher space derivatives work the same way. Now suppose h was a solution of the integral form of the Kuramoto-Sivashinsky equation:

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

We can use our formulas to check that h' is a solution:

\begin{array}{ccl}  h'_t \! +\! h'_{xx}\! +\! h'_{xxxx} \!+ \! \frac{1}{2} (h'_x)^2 \!  &=& h_t(t,x+vt) + vh_t(t,x+vt) \! - \!\frac{1}{2} v^2 \\ \\  && + h_{xx}(t,x+vt)   \\ \\  && + h_{xxxx}(t,x+vt) \\ \\  && + \frac{1}{2}(h_x(t,x+vt) - v)^2 \\ \\  &=& h_x(t,x+vt) + h_{xx}(t,x+vt)  \\ \\  && + h_{xxxx}(t,x+vt) + \frac{1}{2} h_x(t,x+vt)^2  \\ \\  &=& 0  \end{array}

The cancellations in the second step rely crucially on those sneaky fudge factors in the definition of the boosted solution h'. I chose those fudge factors precisely to make these cancellations happen.

Periodic boundary conditions

A lot of really interesting results about the Kuramoto-Sivashinsky equation involve looking at solutions that are periodic in space. For the derivative form this means

u(t,x+L) = u(t,x)

It’s easy to see that if u obeys this equation, so does its boosted version:

u'(t,x) = u(t,x+vt) - v

After all, a quick calculation shows

\begin{array}{ccl}  u'(t,x+L) &=& u(t,x+L+vt) - v \\ \\  &=& u(t,x+vt) - v \\ \\  &=& u'(t,x)   \end{array}

So for any spatially periodic solution with some stripes that are basically standing still, there’s a spatially periodic boosted version where they’re moving at velocity v. You can think of a spatially periodic solution as a function on the cylinder. In the boosted version, the stripes spiral around this cylinder like the stripes on a barber pole!

But for the integral form this doesn’t work! Suppose h is a solution of the integral form that is periodic in space:

h(t,x+L) = h(t,x)

Now its boosted version is defined by

h'(t,x) = h(t,x+vt) - vx - \frac{1}{2}t v^2

and this is not periodic in space:

\begin{array}{ccl}  h'(t,x+L) &=& h(t,x+L+vt) - v(x+L) - \frac{1}{2}t v^2  \\ \\  &=& h(t,x+vt) - v(x+L) - \frac{1}{2}t v^2 \\ \\  &\ne& h(t,x+vt) = h'(t,x)  \end{array}

A conserved quantity

The derivative form of the Kuramoto–Shivashinsky equation has an unexpected symmetry under Galilean transformations, or boosts, even for periodic solutions. It also has an interesting conserved quantity, namely

\displaystyle{ \int_0^L u(t,x) \, dx }

To see this, note that

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt} \int_0^L u(t,x) \, dx } &=& \displaystyle{ \int_0^L u_t(t,x) \, dx }  \\ \\  &=& \displaystyle{ - \int_0^L \left( u_{xx} + u_{xxxx} + u u_x \right) \, dx } \\ \\  &=& \displaystyle{ \left. - \left( u_x + u_{xxx} + \frac{1}{2} u^2 \right)   \right|_0^L  } \\ \\  &=& 0  \end{array}

Note that if we boost a solution, replacing u(t,x) by u(t,x+vt) - v, we also subtract a constant from this conserved quantity, namely v L. So, it seems that the conserved quantity

\displaystyle{ \int_0^L u(t,x) \, dx }

is a way of measuring ‘how much the stripes are moving, on average’.

I’ll venture a guess that when

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

the stripes are stationary, on average. One reason I’m willing to guess this is that this equation has another meaning, too.

I mentioned that we can get solutions u of the derivative form from solutions h of the integral form by differentiating them, like this:

u = h_x

Does every solution of the derivative form arise this way from a solution of the integral form? Given u we can define h by

\displaystyle{ h(t,x) = \int_0^x u(t,y) \, dy }

and we can check it obeys the integral form of the Kuramoto–Sivashinsky equation. I’ve been showing you lots of calculations and we’re probably both getting tired of them, so I won’t actually go through this check. But here’s the interesting part. Suppose we restrict attention to spatially periodic solutions! If u is spatially periodic, h will be so iff

\displaystyle{ \int_0^L u(t,x) \, dx = 0}

since we have

\begin{array}{ccl}  h(t,x+L) &=& \displaystyle{ \int_0^{x+L} u(t,y) \, dy } \\ \\  &=&  \displaystyle{ \int_0^x u(t,y) \, dy + \int_x^{x+L} u(t,y) \, dy } \\ \\  &=& h(t,x)  \end{array}

where in the last step, the first integral is h(t,x) by definition and the second is zero because

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

and u is periodic with period L.


Limiting ourselves to spatially periodic solutions, we see:

• The derivative form of the Kuramoto–Sivashinsky equation has symmetry under boosts.

• The integral form does not.

• Solutions of the integral form correspond precisely to solutions of the derivative form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

• Boosting a solution of the derivative form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

gives a solution where this quantity is not zero.

Given all this, I think the good theorems about spatially periodic solutions of the Kuramoto–Sivashinsky equation will focus on solutions of the integral form, or equivalently solutions of the differential form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

Epilogue: symmetries of the heat equation

At first I was shocked that the Kuramoto–Sivashinsky equation had Galilean transformations as symmetries, because it’s a relative of the heat equation, and the heat equation obviously doesn’t have Galilean transformations as symmetries: you can’t get a solution of the heat equation where a wave of heat moves along at constant velocity as it spreads out.

But then I remembered that the heat equation

u_t = u_{xx}

is very similar to Schrödinger’s equation:

u_t = i u_{xx}

And Schrödinger’s equation does have Galilean transformations as symmetries, since it describes a free particle! Physicists know how they work. You don’t just replace u(t,x) with u(t,x+vt), you also multiply it by a suitable complex phase that depends on t and x. That is, you multiply u(t,x+vt) by the exponential of some imaginary-valued function of t and x.

So then I realized that the heat equation does have Galilean symmetries! They work just as for Schrödinger’s equation, but now you have to multiply u(t,x+vt) by the exponential of some real-valued function of t and x.

This grows exponentially as you move in one direction in space, so people thinking about heat often don’t think about such solutions. Yes, you can get a ‘wave of heat that moves along at constant velocity as it spreads out’, but it’s like a huge tsunami wave rolling in from infinity!

I was very proud of myself for having discovered this weird Galilean invariance of the heat equation, and I posed it as a puzzle on Twitter. But then Tom Price pointed out a paper about it:

• U. Niederer, Schrödinger invariant generalized heat equations, Helvetica Physica Acta 51 (1978), 220–239.

It turns out the Galilean invariance of the heat equation, and much more, was discovered by Sophus Lie in 1882!

The so-called ‘Schrödinger group’ also includes certain fractional linear transformations of the plane. So this raises the question of whether the Kuramoto–Sivashinsky equation has even more symmetries than those in the Galilei group (translations, spatial reflections and Galilean transformations).

But it seems this too has been studied:

• Mehdi Nadjafikhah and Fatemeh Ahangari, Lie symmetry analysis of the two-dimensional generalized Kuramoto–Sivashinsky equation, Mathematical Sciences 6 (2012).

This discusses the analogue of the Kuramoto–Sivashinsky equation in two dimensions of space and one of time. It uses some techniques to compute its symmetry group, but I don’t see any sign of invariance under a big group like the ‘Schrödinger group’.

The Kuramoto–Sivashinsky Equation (Part 1)

17 October, 2021

I love this movie showing a solution of the Kuramoto–Sivashinsky equation, made by Thien An. If you haven’t seen her great math images on Twitter, check them out!

I hadn’t known about this equation, and it looked completely crazy to me at first. But it turns out to be important, because it’s one of the simplest \partial differential equations that exhibits chaotic behavior.

As the image scrolls to the left, you’re seeing how a real-valued function h(t,x) of two real variables changes with the passage of time. The vertical direction is ‘space’, x, while the horizontal direction is time, t.

Near the end of this post I’ll make some conjectures about the Kuramoto–Sivashinsky equation. The first one is very simple: as time passes, stripes appear and merge, but they never disappear or split.

The behavior of these stripes makes the Kuramoto–Sivashinsky equation an excellent playground for thinking about how differential equations can describe ‘things’ with some individuality, even though their solutions are just smooth functions. But to test my conjectures, I could really use help from people who are good at numerical computation or creating mathematical images!

First let me review some known stuff. You can skip this and go straight to the conjectures if you want, but some terms might not make sense.


For starters, note that these stripes seem to appear out of nowhere. That’s because this system is chaotic: small ripples get amplified. This is especially true of ripples with a certain wavelength: roughly 2 \sqrt{2} \pi, as we’ll see later.

And yet while solutions of the Kuramoto–Sivanshinsky equation are chaotic, they have a certain repetitive character. That is, they don’t do completely novel things; they seem to keep doing the same general sort of thing. The world this equation describes has an arrow of time, but it’s ultimately rather boring compared to ours.

The reason is that all smooth solutions of the Kuramoto–Sivanshinsky equation quickly approach a certain finite-dimensional manifold of solutions, called an ‘inertial manifold’. The dynamics on the inertial manifold is chaotic. And sitting inside it is a set called an ‘attractor’, which all solutions approach. This attractor is probably a fractal. This attractor describes the complete repertoire of what you’ll see solutions do if you wait a long time.

Some mathematicians have put a lot of work into proving these things, but let’s see how much we can understand without doing anything too hard.

Written out with a bit less jargon, the Kuramoto–Sivashinky equation says

\displaystyle{ \frac{\partial h}{\partial t} = - \frac{\partial^2 h}{\partial x^2}  - \frac{\partial^4 h}{\partial x^4}  - \frac{1}{2}\left( \frac{\partial h}{\partial x}\right)^2 }

or in more compressed notation,

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

To understand it, first remember the heat equation:

h_t = h_{xx}

This describes how heat spreads out. That is: if h(t,x) is the temperature of an iron rod at position x at time t, the heat equation describes how this temperature function flattens out as time passes and heat spreads.

But the Kuramoto–Sivashinsky equation more closely resembles the time-reversed heat equation

h_t = -h_{xx}

This equation describes how, running a movie of a hot iron rod backward, heat tends to bunch up rather than smear out! Small regions of different temperature, either hotter or colder than their surroundings, will tend to amplify.

This accounts for the chaotic behavior of the Kuramoto–Sivashinsky equation: small stripes emerge as if out of thin air and then grow larger. But what keeps these stripes from growing uncontrollably?

The next term in the equation helps. If we have

h_t = -h_{xx} - h_{xxxx}

then very sharp spikes in h(t,x) tend to get damped out exponentially.

To see this, it helps to bring in a bit more muscle: Fourier series. We can easily solve the heat equation if our iron rod is the interval [0,2\pi] and we demand that its temperature is the same at both ends:

h(t,0) = h(t,2\pi)

This lets us write the temperature function h(t,x) in terms of the functions e^{ikx} like this:

\displaystyle{  h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(t) e^{ikx} }

for some functions \hat{h}_k(t). Then the heat equation gives

\displaystyle{   \frac{d}{d t} \hat{h}_k(t) = -k^2 \hat{h}_k(t) }

and we can easily solve these equations and get

\displaystyle{  \hat{h}_k(t) = e^{-k^2 t} \hat{h}_k(0) }

and thus

\displaystyle{   h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{-k^2 t} e^{ikx} }

So, each function \hat{h}_k(t) decays exponentially as time goes on, and the so-called ‘high-frequency modes’, \hat{h}_k(t) with |k| big, get damped really fast due to that e^{-k^2 t} factor. This is why heat smears out as time goes on.

If we solve the time-reversed heat equation the same way we get

\displaystyle{  h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{k^2 t} e^{ikx} }

so now high-frequency modes get exponentially amplified. The time-reversed heat equation is a very unstable: if you change the initial data a little bit by adding a small amount of some high-frequency function, it will make an enormous difference as time goes by.

What keeps things from going completely out of control? The next term in the equation helps:

h_t = -h_{xx} - h_{xxxx}

This is still linear so we can still solve it using Fourier series. Now we get

\displaystyle{   h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{(k^2-k^4) t} e^{ikx} }

Since k^2 - k^4 \le 0, none of the modes \hat{h}_k(t) grows exponentially. In fact, all the modes decay exponentially except for three: k = -1,0,1. These will be constant in time. So, any solution will approach a constant as time goes on!

We can make the story more interesting if we don’t require our rod to have length 2\pi. Say it has length L. We can write periodic functions on the interval [0,L] as linear combinations of functions e^{ikx} where now the frequencies k aren’t integers: instead

k = 2\pi n/L

for integers n. The longer our rod, the lower these frequencies k can be. The rest of the math works almost the same: we get

\displaystyle{ h(t,x) = \sum_{n = -\infty}^\infty \hat{h}_k(0) e^{(k^2-k^4) t} e^{ikx} }

but we have to remember k = 2\pi n/L. The modes with k^2 - k^4 > 0 will grow exponentially, while the rest will decay exponentially or stay constant. Note that k^2 - k^4 > 0 only for 0 <|k| < 1. So, modes with these frequencies grow exponentially. Modes with |k| > 1 decay exponentially.

If L < 2\pi, all the frequencies k are integers times 2\pi/L, which is bigger than 1, so no modes grow exponentially—and indeed all solutions approach a constant! But as you look at longer and longer rods, you get more and more modes that grow exponentially. The number of these will be roughly proportional to L, though they will ‘pop into existence’ at certain specific values of L.

Which exponentially growing modes grow the fastest? These are the ones that make k^2 - k^4 as large as possible, so they happen near where

\displaystyle{ \frac{d}{dk} (k^2 - k^4) = 0 }

namely k = 1/\sqrt{2}. The wavelength of a mode is 2\pi/k, so these fastest-growing modes have wavelength close to 2\sqrt{2} \pi.

In short, our equation has a certain length scale where the instability is most pronounced: temperature waves with about this wavelength grow fastest.

All this is very easy to work out in as much detail as we want, because our equation so far is linear. The full-fledged Kuramoto–Sivashinsky equation

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

is a lot harder. And yet some features of the linear version remain, which is why it was worth spending time on that version.

For example, I believe the stripes we see in the movie above have width roughly 2 \sqrt{2} \pi. Stripes of roughly this width tend to get amplified. Why don’t they keep on growing taller forever? Apparently the nonlinear term -(h_x)^2 prevents it. But this is not obvious. Indeed, it’s conjectured that if you solve the Kuramoto–Sivashinsky equation starting with a bounded smooth function h(0,x), the solution will remain bounded by a constant. But this has not been proved—or at least it was not proved as of 2000, when this very nice article was written:

• Encyclopedia of Mathematics, Kuramoto–Sivashinsky equation.

The inertial manifold

The most fascinating fact about the Kuramoto–Sivashinsky equation is that for any fixed length L, it has a finite-dimensional manifold M of solutions such that every solution approaches one of these, exponentially fast! So, while this equation really describes an infinite-dimensional dynamical system, as t \to \infty its solutions move closer and closer to the solutions of some finite-dimensional dynamical system. This finite-dimensional system contains all the information about the patterns we’re seeing in Thien An’s movie.

As I mentioned, the manifold M is called an ‘inertial manifold’. This is a general concept in dynamical systems theory:

• Wikipedia, Inertial manifold.

To make these ideas precise we need to choose a notion of distance between two solutions at a given time. A good choice uses the L^2 norm for periodic functions on [0,L]:

\displaystyle{  \|f\| = \sqrt{\int_0^L |f(x)|^2 \, dx} }

Functions on [0,L] with finite L^2 norm form a Hilbert space called L^2[0,L]. If we start with any function h(0,-) in this Hilbert space we get a solution h(t,x) of the Kuramoto–Sivashinsky equation such that the function h(t,-) is in this Hilbert space at all later times t. Furthermore, this function is smooth, even analytic, for all later times:

• P. Collet, J.-P. Eckmann, H. Epstein and J. Stubbe, Analyticity for the Kuramoto–Sivashinsky equation, Physica D 67 (1993), 321–326.

This smoothing property is well-known for the heat equation, but it’s much less obvious here!

This work also shows that the Kuramoto–Sivashinsky equation defines a dynamical system on the Hilbert space L^2[0,L]. And based on earlier work by other mathematicians, Temam and Wang have heroically shown that this Hilbert space contains an inertial manifold of dimension bounded by some constant times L^{1.64} (\ln L)^{0.2}.

• Roger Temam and Xiao Ming Wang, Estimates on the lowest dimension of inertial manifolds for the Kuramoto-Sivashinsky equation in the general case, Differential and Integral Equations 7 (1994), 1095–1108.

I conjecture that in reality its dimension grows roughly linearly with L. Why? We’ve just seen this is true for the linearized version of the Kuramoto–Sivashinsky equation: all modes with frequency |k| > 1 get damped exponentially, but since there’s one mode for each integer n, and k = 2\pi n/L, these modes correspond to integers n with |n| \le L /2\pi. So, there are \lfloor L/\pi \rfloor of these modes. In short, for the linearized Kuramoto–Sivashinsky equation the inertial manifold has dimension about L/\pi.

This evidence is rather weak, since it completely ignores the nonlinearity of the Kuramoto–Sivashinsky equation. I would not be shocked if the dimension of the inertial manifold grew at some other rate than linearly with L.

Sitting inside the inertial manifold is an attractor, the smallest set that all solutions approach. This is probably a fractal, since that’s true of many chaotic systems. So besides trying to estimate the dimension of the inertial manifold, which is an integer we should try to estimate the dimension of this attractor, which may not be an integer!

There have been some nice numerical experiments studying solutions of the Kuramoto–Sivashinsky equation for various values of L, seeing how they get more complicated as L increases. For small L, every solution approaches a constant, just as in the linearized version. For larger L we get periodic solutions, and as L continues to increase we get period doubling and finally chaos—a typical behavior for dynamical systems. But that’s just the start. For details, read this:

• Demetrios T. Papageorgiou and Yiorgos S. Smyrlis, The route to chaos for the Kuramoto–Sivashinsky equation, Theoretical and Computational Fluid Dynamics 3 (1991), 15–42.

I’ll warn you that they use a slightly different formalism. Instead of changing the length L, they keep it equal to 2\pi and change the equation, like this:

h_t = -h_{xx} - v h_{xxxx} - \frac{1}{2} (h_x)^2

for some number v they call the ‘viscosity’. It’s just a different way of describing the same business, so if I had more energy I could figure out the relation between L and v and tell you at which length L chaos first kicks in. But I won’t now. Instead, I want to make some conjectures.


There should be some fairly well-defined notion of a ‘stripe’ for the Kuramoto–Sivashinsky equations: you can see the stripes form and merge here, and if we can define them, we can count them and say precisely when they’re born and when they merge:

For now I will define a ‘stripe’ as follows. At any time, a solution of the Kuramoto–Sivashinsky gives a periodic function h on the interval [0,L]. We can think of this as a function on the circle. A stripe will be a maximal closed subinterval of the circle on which h \ge c. This definition depends on a constant c > 0, and it’s up to you to pick a value of the constant that makes my conjectures true—or at least, almost true!

So, here are the conjectures:

First, I conjecture that if L is large enough, almost every non-negative solution in the inertial manifold has a finite number of stripes at any time t, and that while they can appear and merge as we increase t, they can never split or disappear.

(Here ‘almost every’ is in the usual sense of measure theory. There are certainly solutions of the Kuramoto–Sivashinsky equation that don’t have stripes that appear and merge, like constant solutions. These solutions may lie on the inertial manifold, but I’m claiming they are rare.)

I also conjecture that the time-averaged number of stripes is asymptotically proportional to L as L \to \infty for almost every nonnegative solution on the inertial manifold. The constant of proportionality shouldn’t depend on the solution we pick, except for solutions in some set of measure zero. It will, however, depend on our precise definition of ‘stripe’, e.g. our choice of the constant c.

I also conjecture that there’s a well-defined time average of the rate at which new stripes form, which is also asymptotically proportional to L and independent of which solution we pick, except for solutions in a set of measure zero.

I also conjecture that this rate equals the time-averaged rate at which stripes merge, while the time-averaged rate at which stripes disappear or split is zero.

These conjectures are rather bold, but of course there are various fallback positions if they fail.

How can we test these conjectures? It’s hard to explicitly describe solutions that are actually on the inertial manifold, but by definition, any solution keeps getting closer to the inertial manifold at an exponential rate. Thus, it should behave similarly to solutions that are on the inertial manifold, after we wait long enough. So, I’ll conjecture that the above properties hold not only for almost every solution on the inertial manifold, but for typical solutions that start near the inertial manifold… as long as we wait long enough when doing our time averages.

If you feel like working on this, here are some things I’d really like:

• Images like Thien An’s but with various choices of L. To create these, maybe start with

h(0,x) = \sin \frac{x}{2L} + \textrm{a small amount of noise}

and run it for long enough to ‘settle down’—that is, get near the inertial manifold.

• A time-averaged count of the average number of stripes for various choices of L. I’m conjecturing that this is asymptotically proportional to L for large L.

• Time-averaged counts of the rates at which stripes are born, merge, split, and die—again for various choices of L. I’m conjecturing that the first two are asymptotically proportional to L for large L and that they’re equal. I’m conjecturing that the last two are zero, or tiny.

If someone gets into this, maybe we could submit a short paper to Experimental Mathematics. I’ve been browsing papers on the Kuramoto–Sivashinsky equations, and I haven’t yet seen anything that gets into as much detail on what solutions look like as I’m trying to do here.

The arrow of time

One more thing. I forgot to emphasize that the dynamical system on the Hilbert space L^2[0,L] is not reversible: we can evolve a solution forwards in time and it will stay in this Hilbert space, but not backwards in time. This is very well-known for the heat equation; the point is that solutions get smoother as we run them forward, but when we run them backward they typically get more wild and eventually their L^2 norm blows up.

What makes this especially interesting is that the dynamical system on the inertial manifold probably is reversible. As long as this manifold is compact, it must be: any smooth vector field on a compact manifold M generates a ‘flow’ that you can run forward or backward in time.

And yet, even if this flow is reversible, as I suspect it is, it doesn’t resemble its time-reversed version! It has an ‘arrow of time’ built in, since bumps are born and merge much more often than they merge and split.

So, if my guesses are right, the inertial manifold for the Kuramoto–Sivashinsky equation describes a deterministic universe where time evolution is reversible—and yet the future doesn’t look like the past, because the dynamics carry the imprint of the irreversible dynamics of the Kuramoto–Sivashinsky equation on the larger Hilbert space of all solutions.

A warning

If you want to help me, the following may be useful. I believe the stripes are ‘bumps’, that is, regions where u > c for some positive constant c. That would make them easy to define. I was shocked when Steve Huntsman did some calculations and produced a picture showing a solution where L = 128:

Here the stripes are not mere bumps: they are regions where, as we increase x, the solution first becomes large and negative and then becomes large and positive!

After massive confusion I realized that Steve was using some MATLAB code adapted from this website:

• Mathab Lak, Test case for PDEs: Kuramoto–Sivashinksy, Crank-Nicolson/Adams-Bashforth (CNAB2) timestepping.

and this code solves a different version of the Kuramoto–Sivashinksy equations, the so-called derivative form:

u_t = -u_{xx} - u_{xxxx} - uu_x

If h satisfies the integral form of the Kuramoto–Sivashinksy equation, which is the one I’ve been talking about all along:

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

then its derivative

u = h_x

satisfies the derivative form.

So, the two equations are related, but you have to be careful because some of their properties are quite different! For the integral form, the cross-section of a typical stripe looks very roughly like this:

but for the derivative form it looks more like this:

You can grab Steve Huntsman’s MATLAB code here, but beware: this program solves the derivative form! Many of Steve’s pictures in comments to this blog—indeed, all of them so far—show solutions of the derivative form. In particular, the phenomenon he discovered of stripes tending to move at a constant nonzero velocity seems to be special to the derivative form of the Kuramoto–Sivashinsky equation. I’ll say more about this next time.