## Effective Thermodynamics for a Marginal Observer

8 May, 2018

guest post by Matteo Polettini

Suppose you receive an email from someone who claims “here is the project of a machine that runs forever and ever and produces energy for free!” Obviously he must be a crackpot. But he may be well-intentioned. You opt for not being rude, roll up your sleeves, and put your hands into the dirt, holding the Second Law as lodestar.

Keep in mind that there are two fundamental sources of error: either he is not considering certain input currents (“hey, what about that tiny hidden cable entering your machine from the electrical power line?!”, “uh, ah, that’s just to power the “ON” LED”, “mmmhh, you sure?”), or else he is not measuring the energy input correctly (“hey, why are you using a Geiger counter to measure input voltages?!”, “well, sir, I ran out of voltmeters…”).

In other words, the observer might only have partial information about the setup, either in quantity or quality. Because he has been marginalized by society (most crackpots believe they are misunderstood geniuses) we will call such an observer “marginal,” which incidentally is also the word that mathematicians use when they focus on the probability of a subset of stochastic variables.

In fact, our modern understanding of thermodynamics as embodied in statistical mechanics and stochastic processes is founded (and funded) on ignorance: we never really have “complete” information. If we actually had, all energy would look alike, it would not come in “more refined” and “less refined” forms, there would not be differentials of order/disorder (using Paul Valéry’s beautiful words), and that would end thermodynamic reasoning, the energy problem, and generous research grants altogether.

Even worse, within this statistical approach we might be missing chunks of information because some parts of the system are invisible to us. But then, what warrants that we are doing things right, and he (our correspondent) is the crackpot? Couldn’t it be the other way around? Here I would like to present some recent ideas I’ve been working on together with some collaborators on how to deal with incomplete information about the sources of dissipation of a thermodynamic system. I will do this in a quite theoretical manner, but somehow I will mimic the guidelines suggested above for debunking crackpots. My three buzzwords will be: marginal, effective, and operational.

### “Complete” thermodynamics: an out-of-the-box view

The laws of thermodynamics that I address are:

• The good ol’ Second Law (2nd)

• The Fluctuation-Dissipation Relation (FDR), and the Reciprocal Relation (RR) close to equilibrium.

• The more recent Fluctuation Relation (FR)1 and its corollary the Integral Fluctuation Relation (IFR), which have been discussed on this blog in a remarkable post by Matteo Smerlak.

The list above is all in the “area of the second law”. How about the other laws? Well, thermodynamics has for long been a phenomenological science, a patchwork. So-called stochastic thermodynamics is trying to put some order in it by systematically grounding thermodynamic claims in (mostly Markov) stochastic processes. But it’s not an easy task, because the different laws of thermodynamics live in somewhat different conceptual planes. And it’s not even clear if they are theorems, prescriptions, or habits (a bit like in jurisprudence2).

Within stochastic thermodynamics, the Zeroth Law is so easy nobody cares to formulate it (I do, so stay tuned…). The Third Law: no idea, let me know. As regards the First Law (or, better, “laws”, as many as there are conserved quantities across the system/environment interface…), we will assume that all related symmetries have been exploited from the outset to boil down the description to a minimum.

This minimum is as follows. We identify a system that is well separated from its environment. The system evolves in time, the environment is so large that its state does not evolve within the timescales of the system3. When tracing out the environment from the description, an uncertainty falls upon the system’s evolution. We assume the system’s dynamics to be described by a stochastic Markovian process.

How exactly the system evolves and what is the relationship between system and environment will be described in more detail below. Here let us take an “out of the box” view. We resolve the environment into several reservoirs labeled by index $\alpha$. Each of these reservoirs is “at equilibrium” on its own (whatever that means4). Now, the idea is that each reservoir tries to impose “its own equilibrium” on the system, and that their competition leads to a flow of currents across the system/environment interface. Each time an amount of the reservoir’s resource crosses the interface, a “thermodynamic cost” has to be paid or gained (be it a chemical potential difference for a molecule to go through a membrane, or a temperature gradient for photons to be emitted/absorbed, etc.).

The fundamental quantities of stochastic thermodynamic modeling thus are:

• On the “-dynamic” side: the time-integrated currents $\Phi^t_\alpha$, independent among themselves5. Currents are stochastic variables distributed with joint probability density

$P(\{\Phi_\alpha\}_\alpha)$

• On the “thermo-” side: The so-called thermodynamic forces or “affinities”6 $\mathcal{A}_\alpha$ (collectively denoted $\mathcal{A}$). These are tunable parameters that characterize reservoir-to-reservoir gradients, and they are not stochastic. For convenience, we conventionally take them all positive.

Dissipation is quantified by the entropy production:

$\sum \mathcal{A}_\alpha \Phi^t_\alpha$

We are finally in the position to state the main results. Be warned that in the following expressions the exact treatment of time and its scaling would require a lot of specifications, but keep in mind that all these relations hold true in the long-time limit, and that all cumulants scale linearly with time.

FR: The probability of observing positive currents is exponentially favoured with respect to negative currents according to

$P(\{\Phi_\alpha\}_\alpha) / P(\{-\Phi_\alpha\}_\alpha) = \exp \sum \mathcal{A}_\alpha \Phi^t_\alpha$

Comment: This is not trivial, it follows from the explicit expression of the path integral, see below.

IFR: The average of the exponential of minus the entropy production is unity

$\big\langle \exp - \sum \mathcal{A}_\alpha \Phi^t_\alpha \big\rangle_{\mathcal{A}} =1$

Homework: Derive this relation from the FR in one line.
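
One possible route, sketched under the assumption that the FR holds exactly and that the average is an integral over all values of the currents: insert the FR under the integral and change variables $\Phi_\alpha \to -\Phi_\alpha$, so that what remains is just the normalization of $P$:

$\big\langle \exp - \sum \mathcal{A}_\alpha \Phi^t_\alpha \big\rangle_{\mathcal{A}} = \int \prod_\alpha d\Phi_\alpha \, P(\{\Phi_\alpha\}_\alpha) \, e^{- \sum \mathcal{A}_\alpha \Phi_\alpha} = \int \prod_\alpha d\Phi_\alpha \, P(\{-\Phi_\alpha\}_\alpha) = 1$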

2nd Law: The average entropy production is not negative

$\sum \mathcal{A}_\alpha \left\langle \Phi^t_\alpha \right\rangle_{\mathcal{A}} \geq 0$

Homework: Derive this relation using Jensen’s inequality.
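
A sketch of that argument, assuming nothing more than the IFR and the convexity of the exponential: Jensen’s inequality states that $\langle e^{X} \rangle \geq e^{\langle X \rangle}$ for any random variable $X$, so taking $X = - \sum \mathcal{A}_\alpha \Phi^t_\alpha$ gives

$1 = \big\langle \exp - \sum \mathcal{A}_\alpha \Phi^t_\alpha \big\rangle_{\mathcal{A}} \geq \exp \left( - \sum \mathcal{A}_\alpha \left\langle \Phi^t_\alpha \right\rangle_{\mathcal{A}} \right) \quad \Longrightarrow \quad \sum \mathcal{A}_\alpha \left\langle \Phi^t_\alpha \right\rangle_{\mathcal{A}} \geq 0$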

Equilibrium: Average currents vanish if and only if affinities vanish:

$\left\langle \Phi^t_\alpha \right\rangle_{\mathcal{A}} \equiv 0, \forall \alpha \iff \mathcal{A}_\alpha \equiv 0, \forall \alpha$

Homework: Derive this relation taking the first derivative w.r.t. ${\mathcal{A}_\alpha}$ of the IFR. Notice that also the average depends on the affinities.

S-FDR: At equilibrium, it is impossible to tell whether a current is due to a spontaneous fluctuation (quantified by its variance) or to an external perturbation (quantified by the response of its mean). In a symmetrized (S-) version:

$\left. \frac{\partial}{\partial \mathcal{A}_\alpha}\left\langle \Phi^t_{\alpha'} \right\rangle \right|_{0} + \left. \frac{\partial}{\partial \mathcal{A}_{\alpha'}}\left\langle \Phi^t_{\alpha} \right\rangle \right|_{0} = \left. \left\langle \Phi^t_{\alpha} \Phi^t_{\alpha'} \right\rangle \right|_{0}$

Homework: Derive this relation taking the mixed second derivatives w.r.t. ${\mathcal{A}_\alpha}$ of the IFR.

RR: The reciprocal response of two different currents to a perturbation of the reciprocal affinities close to equilibrium is symmetrical:

$\left. \frac{\partial}{\partial \mathcal{A}_\alpha}\left\langle \Phi^t_{\alpha'} \right\rangle \right|_{0} - \left. \frac{\partial}{\partial \mathcal{A}_{\alpha'}}\left\langle \Phi^t_{\alpha} \right\rangle \right|_{0} = 0$

Homework: Derive this relation taking the mixed second derivatives w.r.t. ${\mathcal{A}_\alpha}$ of the FR.

Notice the implication scheme: FR ⇒ IFR ⇒ 2nd, IFR ⇒ S-FDR, FR ⇒ RR.
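
To get a feeling for how the FR, the IFR and the 2nd Law fit together, here is a minimal numerical sketch in Python. The toy model is my own choice and does not come from the papers: a single current counts forward jumps (rate `w_plus`) minus backward jumps (rate `w_minus`), so the two counts are independent Poisson variables and the affinity is assumed to be $\log(w_+/w_-)$. For this particular model the relations happen to hold at any finite time, not only in the long-time limit.

```python
import math

def poisson_pmf(k, mean):
    # Poisson probability, computed in log space to avoid overflow.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def current_pmf(n, a, b, kmax=200):
    # Probability that (#forward jumps) - (#backward jumps) equals n, with
    # forward ~ Poisson(a) and backward ~ Poisson(b) independent.
    total = 0.0
    for k in range(kmax):
        fwd, bwd = (k + n, k) if n >= 0 else (k, k - n)
        total += poisson_pmf(fwd, a) * poisson_pmf(bwd, b)
    return total

# Hypothetical parameters, chosen only for illustration.
w_plus, w_minus, t = 2.0, 0.5, 3.0
a, b = w_plus * t, w_minus * t
affinity = math.log(w_plus / w_minus)

pmf = {n: current_pmf(n, a, b) for n in range(-60, 61)}

# FR: log P(n)/P(-n) should equal affinity * n.
fr_gap = max(abs(math.log(pmf[n] / pmf[-n]) - affinity * n) for n in range(1, 15))

# IFR: the average of exp(-affinity * n) should be 1.
ifr = sum(p * math.exp(-affinity * n) for n, p in pmf.items())

# 2nd Law: affinity times the mean current should not be negative.
mean_entropy_production = affinity * sum(n * p for n, p in pmf.items())

print(fr_gap, ifr, mean_entropy_production)
```

The truncations (`kmax` and the range of `n`) are generous for these parameters but would need to grow with `t`.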

### “Marginal” thermodynamics (still out-of-the-box)

Now we assume that we can only measure a marginal subset of currents $\{\Phi_\mu^t\}_\mu \subset \{\Phi_\alpha^t\}_\alpha$ (index $\mu$ always has a smaller range than $\alpha$), distributed with joint marginal probability

$P(\{\Phi_\mu\}_\mu) = \int \prod_{\alpha \neq \mu} d\Phi_\alpha \, P(\{\Phi_\alpha\}_\alpha)$

Notice that a state where these marginal currents vanish might not be an equilibrium, because other currents might still be whirling around. We call this a stalling state.

$\mathrm{stalling:} \qquad \langle \Phi_\mu \rangle \equiv 0, \quad \forall \mu$

My central question is: can we associate to these currents some effective affinity $\mathcal{Q}_\mu$ in such a way that at least some of the results above still hold true? And, are all definitions involved just a fancy mathematical construct, or are they operational?

First the bad news: In general the FR is violated for all choices of effective affinities:

$P(\{\Phi_\mu\}_\mu) / P(\{-\Phi_\mu\}_\mu) \neq \exp \sum \mathcal{Q}_\mu \Phi^t_\mu$

This is not surprising and nobody would expect that. How about the IFR?

Marginal IFR: There are effective affinities such that

$\left\langle \exp - \sum \mathcal{Q}_\mu \Phi^t_\mu \right\rangle_{\mathcal{A}} =1$

Mmmhh. Yeah. Take a closer look at this expression: can you see why there actually exists an infinite choice of “effective affinities” that would make that average cross 1? Which on the other hand is just a number, so who even cares? So this can’t be the point.

The fact is, the IFR per se is hardly of any practical interest, as are all “absolutes” in physics. What matters is “relatives”: in our case, response. But then we need to specify how the effective affinities depend on the “real” affinities. And here steps in a crucial technicality, whose precise argumentation is a pain. Based on reasonable assumptions7, we demonstrate that the IFR holds for the following choice of effective affinities:

$\mathcal{Q}_\mu = \mathcal{A}_\mu - \mathcal{A}^{\mathrm{stalling}}_\mu$,

where $\mathcal{A}^{\mathrm{stalling}}$ is the set of values of the affinities that make marginal currents stall. Notice that this latter formula gives an operational definition of the effective affinities that could in principle be reproduced in laboratory (just go out there and tune the tunable until everything stalls, and measure the difference). Obviously:

Stalling: Marginal currents vanish if and only if effective affinities vanish:

$\left\langle \Phi^t_\mu \right\rangle_{\mathcal{A}} \equiv 0, \forall \mu \iff \mathcal{Q}_\mu \equiv 0, \forall \mu$

Now, according to the inference scheme illustrated above, we can also prove that:

Effective 2nd Law: The average marginal entropy production is not negative

$\sum \mathcal{Q}_\mu \left\langle \Phi^t_\mu \right\rangle_{\mathcal{A}} \geq 0$

S-FDR at stalling:

$\left. \frac{\partial}{\partial \mathcal{A}_\mu}\left\langle \Phi^t_{\mu'} \right\rangle \right|_{\mathcal{A}^{\mathrm{stalling}}} + \left. \frac{\partial}{\partial \mathcal{A}_{\mu'}}\left\langle \Phi^t_{\mu} \right\rangle \right|_{\mathcal{A}^{\mathrm{stalling}}} = \left. \left\langle \Phi^t_{\mu} \Phi^t_{\mu'} \right\rangle \right|_{\mathcal{A}^{\mathrm{stalling}}}$

Notice instead that the RR is gone at stalling. This is a clear-cut prediction of the theory that can be tested with basically the same apparatus with which response theory has been experimentally studied so far (not that I actually know what these apparatus are…): at stalling states, unlike at equilibrium states, the S-FDR still holds, but the RR does not.

### Into the box

You’ve definitely gotten enough at this point, and you can give up here. Please exit through the gift shop.

If you’re stubborn, let me tell you what’s inside the box. The system’s dynamics is modeled as a continuous-time, discrete configuration-space Markov “jump” process. The state space can be described by a graph $G=(I, E)$ where $I$ is the set of configurations, $E$ is the set of possible transitions or “edges”, and there exists some incidence relation between edges and pairs of configurations. The process is determined by the rates $w_{i \gets j}$ of jumping from one configuration to another.

We choose these processes because they allow some nice network analysis and because the path integral is well defined! A single realization of such a process is a trajectory

$\omega^t = (i_0,\tau_0) \to (i_1,\tau_1) \to \ldots \to (i_N,\tau_N)$

A “Markovian jumper” waits at some configuration $i_n$ for some time $\tau_n$ with an exponentially decaying probability $w_{i_n} \exp - w_{i_n} \tau_n$ with exit rate $w_i = \sum_k w_{k \gets i}$, then instantaneously jumps to a new configuration $i_{n+1}$ with transition probability $w_{i_{n+1} \gets {i_n}}/w_{i_n}$. The overall probability density of a single trajectory is given by

$P(\omega^t) = \delta \left(t - \sum_n \tau_n \right) e^{- w_{i_N}\tau_N} \prod_{n=0}^{N-1} w_{i_{n+1} \gets i_n} e^{- w_{i_n} \tau_n}$

One can in principle obtain the probability distribution function of any observable defined along the trajectory by taking the marginal of this measure (though in most cases this is technically impossible). Where does this expression come from? For a formal derivation, see the very beautiful review paper by Weber and Frey, but be aware that this is what one would intuitively come up with if one had to simulate with the Gillespie algorithm.
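
The waiting-and-jumping rule described above can be sketched as a bare-bones Gillespie-style simulation in Python. The function name `gillespie`, the dictionary layout `rates[(k, i)]` for $w_{k \gets i}$ and the three-state example are assumptions made for illustration only; the last waiting time is cut so that the waiting times add up to `t_max`, mirroring the delta function in the trajectory density.

```python
import random

def gillespie(rates, start, t_max, rng):
    """Sample one trajectory [(i_0, tau_0), ..., (i_N, tau_N)] up to time t_max.

    rates[(k, i)] is the rate w_{k <- i} of jumping to k from i.
    """
    trajectory = []
    state, clock = start, 0.0
    while True:
        out = [(k, w) for (k, i), w in rates.items() if i == state and w > 0]
        exit_rate = sum(w for _, w in out)
        # Exponentially distributed waiting time with the exit rate of `state`.
        wait = rng.expovariate(exit_rate) if exit_rate > 0 else float("inf")
        if clock + wait >= t_max:
            trajectory.append((state, t_max - clock))
            return trajectory
        trajectory.append((state, wait))
        clock += wait
        # Jump to k with probability w_{k <- state} / exit_rate.
        r = rng.random() * exit_rate
        for k, w in out:
            r -= w
            if r < 0:
                break
        state = k

# Hypothetical three-state ring, biased in the direction 0 -> 1 -> 2 -> 0.
rates = {(1, 0): 2.0, (0, 1): 1.0,
         (2, 1): 2.0, (1, 2): 1.0,
         (0, 2): 2.0, (2, 0): 1.0}
trajectory = gillespie(rates, start=0, t_max=10.0, rng=random.Random(1))
print(len(trajectory), trajectory[:3])
```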

The dynamics of the Markov process can also be described by the probability of being at some configuration $i$ at time $t$, which evolves via the master equation

$\dot{p}_i(t) = \sum_j \left[ w_{ij} p_j(t) - w_{ji} p_i(t) \right]$.

We call such probability the system’s state, and we assume that the system relaxes to a uniquely defined steady state $p = \mathrm{lim}_{t \to \infty} p(t)$.
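
For a small network the steady state can be computed directly. The following is a minimal sketch, assuming an irreducible network with configurations labeled `0..n-1` and rates stored as `rates[(i, j)]` for $w_{ij}$ (both conventions are mine): it sets the right-hand side of the master equation to zero, replaces one redundant balance equation by the normalization $\sum_i p_i = 1$, and solves the linear system by Gaussian elimination.

```python
def steady_state(rates, n):
    """Solve sum_j [w_ij p_j - w_ji p_i] = 0 together with sum_i p_i = 1.

    rates[(i, j)] is the rate w_ij of jumping to i from j.
    """
    # Generator: W[i][j] = w_ij for i != j, W[j][j] = -(exit rate of j).
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in rates.items():
        W[i][j] += w
        W[j][j] -= w
    # Replace the last balance equation by the normalization condition.
    A = [row[:] for row in W]
    A[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            b[r] -= f * b[col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (b[r] - sum(A[r][c] * p[c] for c in range(r + 1, n))) / A[r][r]
    return p

# Hypothetical three-state ring; by symmetry its steady state is uniform.
rates = {(1, 0): 2.0, (0, 1): 1.0,
         (2, 1): 2.0, (1, 2): 1.0,
         (0, 2): 2.0, (2, 0): 1.0}
p = steady_state(rates, 3)
print(p)
```

Note that the uniform steady state of this ring is not an equilibrium: a net current keeps circulating around the cycle.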

A time-integrated current along a single trajectory is a linear combination of the net number of jumps $\#^t$ between configurations in the network:

$\Phi^t_\alpha = \sum_{ij} C^{ij}_\alpha \left[ \#^t(i \gets j) - \#^t(j\gets i) \right]$
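
As a small illustration of this definition, here is a sketch in Python that evaluates a time-integrated current on a given sequence of visited configurations. The function name and the dictionary `coeffs[(i, j)]` holding $C^{ij}_\alpha$ are hypothetical choices; a jump to `i` from `j` is counted with weight $C^{ij}_\alpha$ and the reverse jump with the opposite sign.

```python
def time_integrated_current(states, coeffs):
    """Evaluate sum_ij C^{ij} [#(i <- j) - #(j <- i)] along a trajectory.

    states: visited configurations i_0, i_1, ..., i_N (waiting times play no role).
    coeffs[(i, j)]: the coefficient C^{ij} attached to the jump to i from j.
    """
    phi = 0.0
    for src, dst in zip(states, states[1:]):
        phi += coeffs.get((dst, src), 0.0) - coeffs.get((src, dst), 0.0)
    return phi

# Single-edge current: net number of jumps to 1 from 0.
single_edge = {(1, 0): 1.0}
print(time_integrated_current([0, 1, 2, 0, 1], single_edge))  # two forward jumps
print(time_integrated_current([0, 1, 0], single_edge))        # one forward, one backward
```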

The idea here is that one or several transitions within the system occur because of the “absorption” or the “emission” of some environmental degrees of freedom, each with different intensity. However, for the moment let us simplify the picture and require that only one transition contributes to a current, that is that there exist $i_\alpha,j_\alpha$ such that

$C^{ij}_\alpha = \delta^i_{i_\alpha} \delta^j_{j_\alpha}$.

Now, what does it mean for such a set of currents to be “complete”? Here we get inspiration from Kirchhoff’s Current Law in electrical circuits: the continuity of the trajectory at each configuration of the network implies that after a sufficiently long time, cycle or loop or mesh currents completely describe the steady state. There is a standard procedure to identify a set of cycle currents: take a spanning tree $T$ of the network; then the currents flowing along the edges $E\setminus T$ left out from the spanning tree form a complete set.
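
The spanning-tree procedure can be sketched as follows. This is one of many possible implementations (a union-find pass over the edge list, with names of my own choosing); which spanning tree comes out depends on the order of the edges, and any of them will do. For a connected graph the number of left-out edges is always $|E| - |I| + 1$.

```python
def spanning_tree_and_chords(n_states, edges):
    """Split undirected edges into a spanning tree T and the chords E \\ T."""
    parent = list(range(n_states))

    def find(x):
        # Root of the component containing x, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, chords = [], []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            chords.append((i, j))   # closing a cycle: this edge carries a cycle current
        else:
            parent[ri] = rj
            tree.append((i, j))
    return tree, chords

# Hypothetical network: a four-state ring plus one diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
tree, chords = spanning_tree_and_chords(4, edges)
print(tree, chords)
```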

The last ingredients you need are the affinities. They can be constructed as follows. Consider the Markov process on the network $G' = (I,T)$ where the observable edges are removed. Calculate the steady state of its associated master equation $(p^{\mathrm{eq}}_i)_i$, which is necessarily an equilibrium (since there cannot be cycle currents in a tree…). Then the affinities are given by

$\mathcal{A}_\alpha = \log \frac{w_{i_\alpha j_\alpha} p^{\mathrm{eq}}_{j_\alpha}}{w_{j_\alpha i_\alpha} p^{\mathrm{eq}}_{i_\alpha}}$.
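
Here is a minimal sketch of this construction in Python, under assumptions of my own: configurations are labeled `0..n-1`, `rates[(i, j)]` stores $w_{ij}$, and the equilibrium on the tree is obtained by propagating detailed balance, $w_{ij} p^{\mathrm{eq}}_j = w_{ji} p^{\mathrm{eq}}_i$, outward from configuration 0. For a single cycle the result can be compared with the logarithm of the product of the rates along the cycle divided by the product of the rates against it.

```python
import math
from collections import deque

def tree_equilibrium(n, rates, tree):
    """Equilibrium state on a spanning tree, obtained from detailed balance."""
    neighbours = {i: [] for i in range(n)}
    for i, j in tree:
        neighbours[i].append(j)
        neighbours[j].append(i)
    weight = {0: 1.0}
    queue = deque([0])
    while queue:
        j = queue.popleft()
        for i in neighbours[j]:
            if i not in weight:
                # Detailed balance on the tree edge: w_ij p_j = w_ji p_i.
                weight[i] = weight[j] * rates[(i, j)] / rates[(j, i)]
                queue.append(i)
    total = sum(weight.values())
    return [weight[i] / total for i in range(n)]

def affinity(rates, p_eq, i, j):
    """Affinity of the removed edge, for the jump to i from j."""
    return math.log(rates[(i, j)] * p_eq[j] / (rates[(j, i)] * p_eq[i]))

# Hypothetical three-state ring; rates[(i, j)] is the rate to i from j.
rates = {(1, 0): 2.0, (0, 1): 1.0,
         (2, 1): 3.0, (1, 2): 1.0,
         (0, 2): 4.0, (2, 0): 1.0}
tree = [(0, 1), (1, 2)]            # the edge between 0 and 2 is left out
p_eq = tree_equilibrium(3, rates, tree)
A = affinity(rates, p_eq, 0, 2)
print(A, math.log(2.0 * 3.0 * 4.0))
```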

Now you have all that is needed to formulate the complete theory and prove the FR.

Homework: (Difficult!) With the above definitions, prove the FR.

How about the marginal theory? To define the effective affinities, take the set $E_{\mathrm{mar}} = \{i_\mu j_\mu, \forall \mu\}$ of edges where there run observable currents. Notice that now its complement obtained by removing the observable edges, the hidden edge set $E_{\mathrm{hid}} = E \setminus E_{\mathrm{mar}}$, is not in general a spanning tree: there might be cycles that are not accounted for by our observations. However, we can still consider the Markov process on the hidden space, and calculate its stalling steady state $p^{\mathrm{st}}_i$, and ta-taaa: The effective affinities are given by

$\mathcal{Q}_\mu = \log \frac{w_{i_\mu j_\mu} p^{\mathrm{st}}_{j_\mu}}{w_{j_\mu i_\mu} p^{\mathrm{st}}_{i_\mu}}$.
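
To make this concrete, here is a hedged numerical sketch in Python for a single observable edge. Everything specific in it is an assumption of mine: the four-state network (a ring plus a diagonal), the numerical rates, the helper `steady_state`, and the shortcut of tuning one rate on the observed edge as a stand-in for tuning an affinity. It computes the stalling state on the hidden network, the effective affinity, and then checks that the observed steady-state current has the same sign as $\mathcal{Q}_\mu$ and vanishes when the rate is tuned so that $\mathcal{Q}_\mu = 0$.

```python
import math

def steady_state(rates, n):
    """Steady state of the master equation; rates[(i, j)] is the rate to i from j."""
    A = [[0.0] * n for _ in range(n)]
    for (i, j), w in rates.items():
        A[i][j] += w
        A[j][j] -= w
    A[n - 1] = [1.0] * n               # normalization replaces one balance equation
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):               # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            b[r] -= f * b[col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (b[r] - sum(A[r][c] * p[c] for c in range(r + 1, n))) / A[r][r]
    return p

# Hypothetical network: ring 0-1-2-3-0 plus the diagonal 0-2.
rates = {(1, 0): 3.0, (0, 1): 1.0,
         (2, 1): 2.0, (1, 2): 1.0,
         (3, 2): 2.0, (2, 3): 0.5,
         (0, 3): 1.5, (3, 0): 1.0,
         (2, 0): 0.7, (0, 2): 1.2}
i, j = 1, 0                            # observed transition: to 1 from 0

# Hidden network: drop the observed edge (it still contains the cycle 0-2-3).
hidden = {e: w for e, w in rates.items() if set(e) != {i, j}}
p_st = steady_state(hidden, 4)
Q = math.log(rates[(i, j)] * p_st[j] / (rates[(j, i)] * p_st[i]))

# Observed steady-state current in the full network.
p = steady_state(rates, 4)
J = rates[(i, j)] * p[j] - rates[(j, i)] * p[i]

# Tune the forward rate until the effective affinity vanishes.
stalled = dict(rates)
stalled[(i, j)] = rates[(j, i)] * p_st[i] / p_st[j]
p_stall = steady_state(stalled, 4)
J_stall = stalled[(i, j)] * p_stall[j] - stalled[(j, i)] * p_stall[i]

print(Q, J, J_stall)
```

At stalling the full steady state coincides with the hidden one, even though currents may still circulate along the hidden cycle.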

Proving the marginal IFR is far more complicated than the complete FR. In fact, very often in my field we do not work with the currents’ probability density itself, but we prefer to take its bidirectional Laplace transform and work with the currents’ cumulant generating function. There things take a quite different and more elegant look.

Many other questions and possibilities open up now. The most important one left open is: Can we generalize the theory to the (physically relevant) case where the current is supported on several edges? For example, for a current defined like $\Phi^t = 5 \Phi^t_{12} + 7 \Phi^t_{34}$? Well, it depends: the theory holds provided that the stalling state is not “internally alive”, meaning that if the observable current vanishes on average, then so should $\Phi^t_{12}$ and $\Phi^t_{34}$ separately. This turns out to be a physically meaningful but quite strict condition.

### Is all of thermodynamics “effective”?

Let me conclude with some more of those philosophical considerations that sadly I have to leave out of papers…

Stochastic thermodynamics strongly depends on the identification of physical and information-theoretic entropies, something that I did not openly talk about, but that lurks behind the whole construction. Throughout my short experience as a researcher I have been pursuing a program of “relativization” of thermodynamics, by making the role of the observer more and more evident and movable. Inspired by Einstein’s Gedankenexperimenten, I also tried to make the theory operational. This program may raise eyebrows here and there: Many thermodynamicians embrace a naive materialistic world-view whereby the only things that matter are “real” physical quantities like temperature and pressure, and all the rest of the information-theoretic discourse is at best mathematical speculation or a fascinating analogy with no fundamental bearings. According to some, information as a physical concept lingers alarmingly close to certain extreme postmodern claims in the social sciences that “reality” does not exist unless observed, a position deemed dangerous at times when the authoritativeness of science is threatened by all sorts of anti-scientific waves.

I think, on the contrary, that making concepts relative and effective and summoning the observer explicitly is a laic and prudent position that serves as an antidote to radical subjectivity. The other way around (clinging to the objectivity of a preferred observer, which is implied in any materialistic interpretation of thermodynamics, e.g. by assuming that the most fundamental degrees of freedom are the positions and velocities of a gas’s molecules) is the dangerous position, especially when the role of such a preferred observer is passed around from the scientist to the technician and eventually to the technocrat, who would be induced to believe there are simple technological fixes to complex social problems.

How do we reconcile observer-dependency and the laws of physics? The object and the subject? On the one hand, much like the position of an object depends on the reference frame, so much so entropy and entropy production do depend on the observer and the particular apparatus that he controls or experiment he is involved with. On the other hand, much like motion is ultimately independent of position and it is agreed upon by all observers that share compatible measurement protocols, so much so the laws of thermodynamics are independent of that particular observer’s quantification of entropy and entropy production (e.g., the effective Second Law holds independently of how much the marginal observer knows of the system, if he operates according to our phenomenological protocol…). This is the case even in the every-day thermodynamics as practiced by energetic engineers et al., where there are lots of choices to gauge upon, and there is no other external warrant that the amount of dissipation being quantified is the “true” one (whatever that means…)—there can only be trust in one’s own good practices and methodology.

So in this sense, I like to think that all observers are marginal, that this effective theory serves as a dictionary by which different observers practice and communicate thermodynamics, and that we should not revere the laws of thermodynamics as “true” idols, but rather as tools of good scientific practice.

### References

• M. Polettini and M. Esposito, Effective fluctuation and response theory, arXiv:1803.03552.

In this work we give the complete theory and numerous references to work of other people that was along the same lines. We employ a “spiral” approach to the presentation of the results, inspired by the pedagogical principle of Albert Baez.

• M. Polettini and M. Esposito, Effective thermodynamics for a marginal observer, Phys. Rev. Lett. 119 (2017), 240601, arXiv:1703.05715.

This is a shorter version of the story.

• B. Altaner, M. Polettini and M. Esposito, Fluctuation-dissipation relations far from equilibrium, Phys. Rev. Lett. 117 (2016), 180601, arXiv:1604.0883.

An early version of the story, containing the FDR results but not the full-fledged FR.

• G. Bisker, M. Polettini, T. R. Gingrich and J. M. Horowitz, Hierarchical bounds on entropy production inferred from partial information, J. Stat. Mech. (2017), 093210, arXiv:1708.06769.

Some extras.

• M. F. Weber and E. Frey, Master equations and the theory of stochastic path integrals, Rep. Progr. Phys. 80 (2017), 046601, arXiv:1609.02849.

Great reference if one wishes to learn about path integrals for master equation systems.

### Footnotes

1 There are as many so-called “Fluctuation Theorems” as there are authors working on them, so I decided not to call them by any name. Furthermore, notice I prefer to distinguish between a relation (a formula) and a theorem (a line of reasoning). I lingered more on this here.

2 “Just so you know, nobody knows what energy is.”—Richard Feynman.

I cannot help but mention here the beautiful book by Shapin and Schaffer, Leviathan and the Air-Pump, about the Boyle vs. Hobbes diatribe about what constitutes a “matter of fact,” and Bruno Latour’s interpretation of it in We Have Never Been Modern. Latour argues that “modernity” is a process of separation of the human and natural spheres, and within each of these spheres a process of purification of the unit facts of knowledge and the unit facts of politics, of the object and the subject. At the same time we live in a world where these two spheres are never truly separated, a world of “hybrids” that are at the same time necessary “for all practical purposes” and inconceivable according to the myths that sustain the narration of science, of the State, and even of religion. In fact, despite these myths, we cannot conceive a scientific fact out of the contextual “network” where this fact is produced and replicated, and neither can we conceive society out of the material needs that shape it: so in this sense “we have never been modern”, we are not quite different from all those societies that we take pleasure in studying with the tools of anthropology. Within the scientific community Latour is widely despised; probably he is also misread. While it is really difficult to see how his analysis applies to, say, high-energy physics, I find that thermodynamics and its ties to the industrial revolution perfectly embody this tension between the natural and the artificial, the matter of fact and the matter of concern. Such great thinkers as Einstein and Ehrenfest thought of the Second Law as the only physical law that would never be replaced, and I believe this is revelatory.
A second thought on the Second Law, a systematic and precise definition of all its terms and circumstances, reveals that the only formulations that make sense are those phenomenological statements such as Kelvin-Planck’s or similar, which require a lot of contingent definitions regarding the operation of the engine, while fetishized and universal statements are nonsensical (such as that masterwork of confusion that is “the entropy of the Universe cannot decrease”). In this respect, it is neither a purely natural law, as the moderns argue, nor a purely social construct, as the postmoderns argue. One simply has to give up making this separation. While I do not have a definite answer on this problem, I like to think of the Second Law as a practice, a consistency check of the thermodynamic discourse.

3 This assumption really belongs to a time, the XIXth century, when resources were virtually infinite on planet Earth…

4 As we will see shortly, we define equilibrium as that state where there are no currents at the interface between the system and the environment, so what is the environment’s own definition of equilibrium?!

5 This because we have already exploited the First Law.

6 This nomenclature comes from alchemy, via chemistry (think of Goethe’s The elective affinities…), it propagated in the XXth century via De Donder and Prigogine, and eventually it is still present in language in Luxembourg because in some way we come from the “late Brussels school”.

7 Basically, we ask that the tunable parameters are environmental properties, such as temperatures, chemical potentials, etc. and not internal properties, such as the energy landscape or the activation barriers between configurations.

## Symposium on Compositional Structures

4 May, 2018

As I type this, sitting in a lecture hall at the Lorentz Center, Jamie Vicary, University of Birmingham and University of Oxford, is announcing a new series of meetings:

The website, which will probably change, currently says this:

### Symposium on Compositional Structures (SYCO)

The Symposium on Compositional Structures is a new interdisciplinary meeting aiming to support the growing community of researchers interested in the phenomenon of compositionality, from both applied and abstract perspectives, and in particular where category theory serves as a unifying common language.

We welcome submissions from researchers across computer science, mathematics, physics, philosophy, and beyond, with the aim of fostering discussion, disseminating new ideas, and spreading knowledge of open problems between fields. Submission is encouraged for both mature research and work in progress, and by both established academics and junior researchers, including students. The meeting does not have proceedings.

While no list of topics could be exhaustive, SYCO welcomes submissions with a compositional focus related to any of the following areas, in particular from the perspective of category theory:

logical methods in computer science, including quantum and classical programming, concurrency, natural language processing and machine learning;

graphical calculi, including string diagrams, Petri nets and reaction networks;

languages and frameworks, including process algebras, proof nets, type theory and game semantics;

abstract algebra and pure category theory, including monoidal category theory, higher category theory, operads, polygraphs, and relationships to homotopy theory;

quantum algebra, including quantum computation and representation theory;

tools and techniques, including rewriting, formal proofs and proof assistants;

industrial applications, including case studies and real-world problem descriptions.

### Meetings

Meetings will involve both invited and contributed talks. The first meeting is planned for Autumn 2018, with more details to follow soon.

### Funding

Some funding may be available to support travel and subsistence, especially for junior researchers who are speaking at the meeting.

### Steering committee

The symposium is managed by the following people:

Ross Duncan, University of Strathclyde.
Chris Heunen, University of Edinburgh.
Aleks Kissinger, Radboud University Nijmegen.
Samuel Mimram, École Polytechnique.
Pawel Sobocinski, University of Southampton.
Jamie Vicary, University of Birmingham and University of Oxford.

## Thermodynamics of Computation

4 May, 2018

David Wolpert of the Santa Fe Institute has set up a website on the thermodynamics of computation:

Here’s the idea:

This website is the result of a successful meeting at SFI which brought together researchers from diverse disciplines including biology, computer science, physics, bioinformatics, and chemistry to discuss overlapping interests in thermodynamics and computation.

The thermodynamic restrictions on all systems that perform computation provide major challenges to modern design of computers. For example, at present ~5% of US energy consumption is used to run computers. Similarly, ~50% of the lifetime budget of a modern high-performance computing center is to pay the energy bill. As another example, Google has placed some of their data servers next to rivers in order to use the water in the river to remove the heat they generate. As a final example, one of the major challenges facing current efforts to build exascale computers is how to keep them from melting.

The thermodynamic costs of performing computation also play a major role in biological computation. For example, ~ 20% of the calories burned by a human are used by their brain — which does nothing but compute. This is a massive reproductive fitness penalty. Indeed, one of the leading contenders for what development in hominin evolution allowed us to develop into homo sapiens is the discovery of fire, since that allowed us to cook meat and tubers, and thereby for the first time release enough calories from our food sources to power our brains. Despite this huge fitness cost though, no current evolution theory accounts for the thermodynamic consequences of computation. It also seems that minimizing the thermodynamic costs of computation has been a major factor in evolution’s selection of the chemical networks in all biological organisms, not just hominin brains. In particular, simply dividing the rate of computation in the entire biosphere by the rate of free energy incident on the earth in sunlight shows that the biosphere as a whole performs computation with a thermodynamic efficiency orders of magnitude greater than our current supercomputers.

In the 1960s and 1970s, Rolf Landauer, Charlie Bennett and collaborators performed the first, pioneering analysis of the fundamental thermodynamic costs involved in bit erasure, perhaps the simplest example of a computation. Unfortunately, their physics was semi-formal, initially with no equations at all. This is because when they did their work, nonequilibrium statistical physics was in its infancy, and so they simply did not have the tools for a formal analysis of the thermodynamics of computation.

Moreover, only a trivially small portion of computer science theory (CS) is concerned with the number of erasure operations needed to perform a given computation. At its core, much of CS is concerned with unavoidable resource / time tradeoffs in running computation. That is the basis of all computational complexity theory, many approaches to characterizing the algorithmic power of different kinds of computers, etc.

Fortunately, the last two decades have seen a revolution in nonequilibrium statistical physics. This has resulted in some extremely powerful tools for analyzing the fundamental thermodynamic properties of far-from-equilibrium dynamic systems – like computers. These tools have already clarified that there are unavoidable thermodynamic tradeoffs in computation, in addition to the resource/time tradeoffs of conventional CS theory. These thermodynamic tradeoffs relate the (physical) speed of a computation, the noise level, and whether the computational system is thermodynamically “tailored” for one specific environment or is general purpose, to name just a few. Interestingly, some of these tradeoffs are also addressed in modern computer engineering, for example in the techniques of approximate computing, adaptive slowing, etc. However, this work is being done in an ad hoc manner, driven entirely by phenomenology.

As a result, the time is ripe to pursue a new field of science and engineering: a modern thermodynamics of computation. This would combine the resource/time tradeoffs of concern in conventional CS with the thermodynamic tradeoffs in computation that are now being revealed. In this way we should be able to develop the tools necessary both for analyzing thermodynamic costs in biological systems and for engineering next-generation computers.

We hope this website will serve as a gathering place and hub for all interested.

Wolpert writes:

This website contains

1) An initial list of researchers working on related topics, grouped by scientific field:

i) Nonequilibrium statistical physics
ii) Stochastic thermodynamics
iii) Chemical reaction networks
iv) Computer science theory
v) Computer science engineering to address energy costs
vi) Thermodynamics of neurobiology
vii) Thermodynamics of single cells
viii) Artificial biological computation
ix) Naturally occurring biological computation
x) Quantum thermodynamics and information processing

2) An initial list of relevant papers, grouped by the same fields, with added meta-data of keywords, citation scores, and click-through counts.

3) The researchers are cross-referenced with any of the papers they are authors on.

4) A job board

5) An events page

6) A discussion forums page.

Coming soon are more elaborate search functions, a dedicated page for funding opportunities, one for video presentations, and automated special announcement notifications.

Please note that this website is a wiki. That means that everyone who is registered can edit it. Please register (under “request an account”) and then edit the site.

In particular, the current list of researchers and papers is only a starting point. There are likely many omissions, misfilings of researchers and papers in the wrong subject area, and outright mistakes.

## The Golden Ratio and the Entropy of Braids

22 November, 2017

Here’s a cute connection between topological entropy, braids, and the golden ratio. I learned about it in this paper:

• Jean-Luc Thiffeault and Matthew D. Finn, Topology, braids, and mixing in fluids.

### Topological entropy

I’ve talked a lot about entropy on this blog, but not much about topological entropy. This is a way to define the entropy of a continuous map $f$ from a compact topological space $X$ to itself. The idea is that a map that mixes things up a lot should have a lot of entropy. In particular, any map defining a ‘chaotic’ dynamical system should have positive entropy, while non-chaotic maps should have zero entropy.

How can we make this precise? First, cover $X$ with finitely many open sets $U_1, \dots, U_k.$ Then take any point in $X,$ apply the map $f$ to it over and over, say $n$ times, and report which open set the point lands in each time. You can record this information in a string of symbols. How much information does this string have? The easiest way to define this is to simply count the total number of strings that can be produced this way by choosing different points initially. Then, take the logarithm of this number.

Of course the answer depends on $n,$ typically growing bigger as $n$ increases. So, divide it by $n$ and try to take the limit as $n \to \infty.$ Or, to be careful, take the lim sup: this could be infinite, but it’s always well-defined. This will tell us how much new information we get, on average, each time we apply the map and report which set our point lands in.

And of course the answer also depends on our choice of open cover $U_1, \dots, U_k.$ So, take the supremum over all finite open covers. This is called the topological entropy of $f.$

Believe it or not, this is often finite! Even though the log of the number of symbol strings we get will be larger when we use a cover with lots of small sets, when we divide by $n$ and take the limit as $n \to \infty$ this dependence often washes out.
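To see the counting recipe in action, here is a minimal sketch for one standard example that is not discussed in this post: the doubling map $x \mapsto 2x \bmod 1$ on the circle. As a simplifying assumption it uses a partition into two half-open arcs instead of a genuine open cover, and it samples starting points on a finite grid of exact rationals. The function names are just illustrative.

```python
from fractions import Fraction
import math

def doubling_map(x):
    """The doubling map x -> 2x mod 1 on the circle [0, 1)."""
    return (2 * x) % 1

def itinerary(x, n):
    """Apply the map n times, recording after each step which half the point lands in."""
    symbols = []
    for _ in range(n):
        x = doubling_map(x)
        symbols.append(0 if x < Fraction(1, 2) else 1)
    return tuple(symbols)

def entropy_estimate(n, denominator=1009):
    """(1/n) * log(number of distinct length-n strings) over a grid of starting points."""
    starts = (Fraction(k, denominator) for k in range(denominator))
    strings = {itinerary(x, n) for x in starts}
    return math.log(len(strings)) / n

for n in (2, 4, 8):
    print(n, entropy_estimate(n))
```

For these values of $n$ the grid is fine enough to hit every possible string, so each estimate comes out at $\ln 2$ up to rounding: one new bit of information per application of the map. Exact fractions are used because repeated doubling of a floating-point number quickly destroys its low-order bits.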

### Braids

Any braid gives a bunch of maps from the disc to itself. So, we define the entropy of a braid to be the minimum—or more precisely, the infimum—of the topological entropies of these maps.

How does a braid give a bunch of maps from the disc to itself? Imagine the disc as made of very flexible rubber. Grab it at some finite set of points and then move these points around in the pattern traced out by the braid. When you’re done you get a map from the disc to itself. The map you get is not unique, since the rubber is wiggly and you could have moved the points around in slightly different ways. So, you get a bunch of maps.

I’m being sort of lazy in giving precise details here, since the idea seems so intuitively obvious. But that could be because I’ve spent a lot of time thinking about braids, the braid group, and their relation to maps from the disc to itself!

This picture by Thiffeault and Finn may help explain the idea:

As we keep moving points around each other, we keep building up more complicated braids with 4 strands, and keep getting more complicated maps from the disc to itself. In fact, these maps are often chaotic! More precisely: they often have positive entropy.

In this other picture the vertical axis represents time, and we more clearly see the braid traced out as our 4 points move around:

Each horizontal slice depicts a map from the disc (or square: this is topology!) to itself, but we only see their effect on a little rectangle drawn in black.

### The golden ratio

Okay, now for the punchline!

Puzzle 1. Which braid with 3 strands has the highest entropy per generator? What is its entropy per generator?

I should explain: any braid with 3 strands can be written as a product of generators $\sigma_1, \sigma_2, \sigma_1^{-1}, \sigma_2^{-1}.$ Here $\sigma_1$ switches strands 1 and 2, moving them counterclockwise around each other, $\sigma_2$ does the same for strands 2 and 3, and $\sigma_1^{-1}$ and $\sigma_2^{-1}$ do the same but moving the strands clockwise.

For any braid we can write it as a product of $n$ generators with $n$ as small as possible, and then we can evaluate its entropy divided by $n.$ This is the right way to compare the entropy of braids, because if a braid gives a chaotic map we expect powers of that braid to have entropy growing linearly with $n.$

Now for the answer to the puzzle!

Answer 1. A 3-strand braid maximizing the entropy per generator is $\sigma_1 \sigma_2^{-1}.$ And the entropy of this braid, per generator, is the logarithm of the golden ratio:

$\displaystyle{ \log \left( \frac{\sqrt{5} + 1}{2} \right) }$

In other words, the entropy of this braid is

$\displaystyle{ \log \left[ \left( \frac{\sqrt{5} + 1}{2} \right)^2 \right] = 2 \log \left( \frac{\sqrt{5} + 1}{2} \right) }$

All this works regardless of which base we use for our logarithms. But if we use base e, which seems pretty natural, the maximum possible entropy per generator is

$\displaystyle{ \ln \left( \frac{\sqrt{5} + 1}{2} \right) \approx 0.48121182506\dots }$

Or if you prefer base 2, then each time you stir around a point in the disc with this braid, you’re creating

$\displaystyle{ \log_2 \left( \frac{\sqrt{5} + 1}{2} \right) \approx 0.69424191363\dots }$

bits of unknown information.

This fact was proved here:

• D. D’Alessandro, M. Dahleh and I. Mezić, Control of mixing in fluid flow: A maximum entropy approach, IEEE Transactions on Automatic Control 44 (1999), 1852–1863.

So, people call this braid $\sigma_1 \sigma_2^{-1}$ the golden braid. But since you can use it to generate entropy forever, perhaps it should be called the eternal golden braid.

What does it all mean? Well, the 3-strand braid group is called $\mathrm{B}_3$, and I wrote a long story about it:

• John Baez, This Week’s Finds in Mathematical Physics (Week 233).

You’ll see there that $\mathrm{B}_3$ has a representation as 2 × 2 matrices:

$\displaystyle{ \sigma_1 \mapsto \left(\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right)}$

$\displaystyle{ \sigma_2 \mapsto \left(\begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array}\right) }$

These matrices are shears, which is connected to how the braids $\sigma_1$ and $\sigma_2$ give maps from the disc to itself that shear points. If we take the golden braid and turn it into a matrix using this representation, we get a matrix for which the magnitude of its largest eigenvalue is the square of the golden ratio! So, the amount of stretching going on is ‘the golden ratio per generator’.
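The eigenvalue claim is easy to check by hand or by machine. Here is a small sketch in plain Python that multiplies the matrix for $\sigma_1$ by the matrix for $\sigma_2^{-1}$ from the representation above and reads off the largest eigenvalue from the trace. The helper names are made up for this example, and the eigenvalue shortcut assumes a 2 × 2 matrix with determinant 1, which holds for every product of these generators.

```python
import math

def matmul(a, b):
    """Multiply two 2 x 2 integer matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def spectral_radius(m):
    """Largest eigenvalue magnitude of a 2 x 2 matrix with determinant 1."""
    t = abs(m[0][0] + m[1][1])
    if t < 2:
        return 1.0  # complex-conjugate eigenvalues on the unit circle
    return (t + math.sqrt(t * t - 4)) / 2

SIGMA_1 = ((1, 1), (0, 1))
SIGMA_2_INVERSE = ((1, 0), (1, 1))
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2

golden_braid = matmul(SIGMA_1, SIGMA_2_INVERSE)
radius = spectral_radius(golden_braid)
entropy_per_generator = math.log(radius) / 2  # the golden braid has 2 generators

print(golden_braid)
print(radius, GOLDEN_RATIO ** 2)
print(entropy_per_generator, math.log(GOLDEN_RATIO))

# Powers of the golden braid: log(stretching) per generator stays the same.
power = ((1, 0), (0, 1))
for k in range(1, 5):
    power = matmul(power, golden_braid)
    print(k, math.log(spectral_radius(power)) / (2 * k))
```

The product has trace 3 and determinant 1, so its eigenvalues are $(3 \pm \sqrt{5})/2,$ and the larger one is the square of the golden ratio. The final loop illustrates why dividing by the number of generators is a sensible normalization: the $k$th power uses $2k$ generators and stretches by the golden ratio to the power $2k.$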

I guess this must be part of the story too:

Puzzle 2. Is it true that when we multiply $n$ matrices of the form

$\displaystyle{ \left(\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right) , \quad \left(\begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array}\right) }$

or their inverses:

$\displaystyle{ \left(\begin{array}{rr} 1 & -1 \\ 0 & 1 \end{array}\right) , \quad \left(\begin{array}{rr} 1 & 0 \\ 1 & 1 \end{array}\right) }$

the magnitude of the largest eigenvalue of the resulting product can never exceed the $n$th power of the golden ratio?
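A brute-force search cannot settle Puzzle 2, but it can look for counterexamples among short products. The sketch below tries every product of $n$ of the four matrices for $n$ up to 6 and compares the largest eigenvalue magnitude found with the $n$th power of the golden ratio. The helper names and the cutoff at 6 are arbitrary choices for this example, and the eigenvalue shortcut again assumes determinant 1.

```python
import itertools
import math

GENERATORS = (
    ((1, 1), (0, 1)),    # sigma_1
    ((1, 0), (-1, 1)),   # sigma_2
    ((1, -1), (0, 1)),   # inverse of sigma_1
    ((1, 0), (1, 1)),    # inverse of sigma_2
)
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2

def matmul(a, b):
    """Multiply two 2 x 2 integer matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def spectral_radius(m):
    """Largest eigenvalue magnitude of a 2 x 2 matrix with determinant 1."""
    t = abs(m[0][0] + m[1][1])
    if t < 2:
        return 1.0  # complex-conjugate eigenvalues on the unit circle
    return (t + math.sqrt(t * t - 4)) / 2

def max_spectral_radius(n):
    """Largest spectral radius over all 4**n products of n generators."""
    best = 0.0
    for word in itertools.product(GENERATORS, repeat=n):
        product = ((1, 0), (0, 1))
        for g in word:
            product = matmul(product, g)
        best = max(best, spectral_radius(product))
    return best

for n in range(1, 7):
    print(n, max_spectral_radius(n), GOLDEN_RATIO ** n)
```

In this range no product beats the bound, and for even $n$ the bound is attained, for example by powers of the golden braid. The search grows like $4^n,$ so this is evidence for small $n$ only, not a proof.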

There’s also a strong connection between braid groups, certain quasiparticles in the plane called Fibonacci anyons, and the golden ratio. But I don’t see the relation between these things and topological entropy! So, there is a mystery here—at least for me.

For more, see:

• Matthew D. Finn and Jean-Luc Thiffeault, Topological optimisation of rod-stirring devices, SIAM Review 53 (2011), 723–743.

Abstract. There are many industrial situations where rods are used to stir a fluid, or where rods repeatedly stretch a material such as bread dough or taffy. The goal in these applications is to stretch either material lines (in a fluid) or the material itself (for dough or taffy) as rapidly as possible. The growth rate of material lines is conveniently given by the topological entropy of the rod motion. We discuss the problem of optimising such rod devices from a topological viewpoint. We express rod motions in terms of generators of the braid group, and assign a cost based on the minimum number of generators needed to write the braid. We show that for one cost function—the topological entropy per generator—the optimal growth rate is the logarithm of the golden ratio. For a more realistic cost function, involving the topological entropy per operation where rods are allowed to move together, the optimal growth rate is the logarithm of the silver ratio, $1+ \sqrt{2}.$ We show how to construct devices that realise this optimal growth, which we call silver mixers.

Here is the silver ratio:

But now for some reason I feel it’s time to stop!

## Biology as Information Dynamics (Part 3)

9 November, 2017

On Monday I’m giving this talk at Caltech:

Biology as information dynamics, November 13, 2017, 4:00–5:00 pm, General Biology Seminar, Kerckhoff 119, Caltech.

If you’re around, please check it out! I’ll be around all day talking to people, including Erik Winfree, my graduate student host Fangzhou Xiao, and other grad students.

If you can’t make it, you can watch this video! It’s a neat subject, and I want to do more on it:

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’, a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.
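For a concrete feel of how the replicator equation and relative information fit together, here is a toy sketch in Python. It covers only the simplest special case, chosen for illustration and not taken from the talk: three types with constant fitness, evolved by a forward-Euler discretization of $dp_i/dt = p_i (f_i - \langle f \rangle).$ The fitness values, initial population and step size are all made-up numbers.

```python
import math

def replicator_step(p, fitness, dt):
    """One forward-Euler step of dp_i/dt = p_i * (f_i - mean fitness)."""
    mean_fitness = sum(pi * fi for pi, fi in zip(p, fitness))
    return [pi + dt * pi * (fi - mean_fitness) for pi, fi in zip(p, fitness)]

def relative_information(q, p):
    """Information of q relative to p (Kullback-Leibler divergence), in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

fitness = [1.0, 2.0, 3.0]   # hypothetical constant fitnesses
p = [0.6, 0.3, 0.1]         # hypothetical initial population fractions
q = [0.0, 0.0, 1.0]         # all weight on the fittest type

history = []
for _ in range(200):
    history.append(relative_information(q, p))
    p = replicator_step(p, fitness, dt=0.05)

print(history[0], history[-1])
print(p)
```

In this special case the information of $q$ relative to the population $p(t)$ is just $-\ln p_3(t),$ and it can only go down, because the fittest type never has below-average fitness. One way to read this is that the population is ‘learning’ which type is fittest, and the relative information measures how much it has left to learn.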

## Entropy 2018

6 July, 2017

The editors of the journal Entropy are organizing this conference:

Entropy 2018 — From Physics to Information Sciences and Geometry, 14–16 May 2018, Auditorium Enric Casassas, Faculty of Chemistry, University of Barcelona, Barcelona, Spain.

They write:

One of the most frequently used scientific words is the word “entropy”. The reason is that it is related to two main scientific domains: physics and information theory. Its origin goes back to the start of physics (thermodynamics), but since Shannon, it has become related to information theory. This conference is an opportunity to bring researchers of these two communities together and create a synergy. The main topics and sessions of the conference cover:

• Physics: classical and quantum thermodynamics
• Statistical physics and Bayesian computation
• Geometrical science of information, topology and metrics
• Maximum entropy principle and inference
• Kullback and Bayes or information theory and Bayesian inference
• Entropy in action (applications)

Interdisciplinary contributions from both theoretical and applied perspectives are very welcome, including papers addressing conceptual and methodological developments, as well as new applications of entropy and information theory.

All accepted papers will be published in the proceedings of the conference. A selection of invited and contributed talks presented during the conference will be invited to submit an extended version of their paper for a special issue of the open access journal Entropy.

## Information Processing in Chemical Networks (Part 2)

13 June, 2017

I’m in Luxembourg, and I’ll be blogging a bit about this workshop:

Dynamics, Thermodynamics and Information Processing in Chemical Networks, 13-16 June 2017, Complex Systems and Statistical Mechanics Group, University of Luxembourg. Organized by Massimiliano Esposito and Matteo Polettini.

I’ll do it in the comments!

I explained the idea of this workshop here:

and now you can see the program here.