## Coupling Through Emergent Conservation Laws (Part 2)

27 June, 2018

joint post with Jonathan Lorand, Blake Pollard, and Maru Sarazola

Here’s a little introduction to the chemistry and thermodynamics prerequisites for our work on ‘coupling’. Luckily, it’s fun stuff that everyone should know: a lot of the world runs on these principles!

We will be working with reaction networks. A reaction network consists of a set of reactions, for example $\mathrm{X}+\mathrm{Y}\longrightarrow \mathrm{XY}$

Here X, Y and XY are the species involved, and we interpret this reaction as species X and Y combining to form species XY. We call X and Y the reactants and XY the product. Additive combinations of species, such as X + Y, are called complexes.

The law of mass action states that the rate at which a reaction occurs is proportional to the product of the concentrations of the reactants. The proportionality constant is called the rate constant; it is a positive real number associated to a reaction that depends on chemical properties of the reaction along with the temperature, the pH of the solution, the nature of any catalysts that may be present, and so on. Every reaction has a reverse reaction; that is, if X and Y combine to form XY, then XY can also split into X and Y. The reverse reaction has its own rate constant.

We can summarize this information by writing $\mathrm{X} + \mathrm{Y} \mathrel{\substack{\alpha_{\rightarrow} \\\longleftrightarrow\\ \alpha_{\leftarrow}}} \mathrm{XY}$

where $\alpha_{\to}$ is the rate constant for X and Y to combine and form XY, while $\alpha_\leftarrow$ is the rate constant for the reverse reaction.

As time passes and reactions occur, the concentration of each species will likely change. We can record this information in a collection of functions $[\mathrm{X}] \colon \mathbb{R} \to [0,\infty),$

one for each species $\mathrm{X},$ where $[\mathrm{X}](t)$ gives the concentration of the species $\mathrm{X}$ at time $t.$ This naturally leads one to consider the rate equation of a given reaction network, which specifies the time evolution of these concentrations. The rate equation can be read off from the reaction network, and in the above example it is: $\begin{array}{ccc} \dot{[\mathrm{X}]} & = & -\alpha_\to [\mathrm{X}][\mathrm{Y}]+\alpha_\leftarrow [\mathrm{XY}]\\ \dot{[\mathrm{Y}]} & = & -\alpha_\to [\mathrm{X}][\mathrm{Y}]+\alpha_\leftarrow [\mathrm{XY}]\\ \dot{[\mathrm{XY}]} & = & \alpha_\to [\mathrm{X}][\mathrm{Y}]-\alpha_\leftarrow [\mathrm{XY}] \end{array}$

Here $\alpha_\to [\mathrm{X}] [\mathrm{Y}]$ is the rate at which the forward reaction is occurring; thanks to the law of mass action, this is the rate constant $\alpha_\to$ times the product of the concentrations of X and Y. Similarly, $\alpha_\leftarrow [\mathrm{XY}]$ is the rate at which the reverse reaction is occurring.

We say that a system is in detailed balanced equilibrium, or simply equilibrium, when every reaction occurs at the same rate as its reverse reaction. This implies that the concentration of each species is constant in time. In our example, the condition for equilibrium is $\displaystyle{ \frac{\alpha_\to}{\alpha_\leftarrow}=\frac{[\mathrm{XY}]}{[\mathrm{X}][\mathrm{Y}]} }$

and the rate equation then implies that $\dot{[\mathrm{X}]} = \dot{[\mathrm{Y}]} =\dot{[\mathrm{XY}]} = 0$
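As a quick numerical check, here is a minimal Python sketch that integrates the rate equation above and compares the long-time value of $[\mathrm{XY}]/([\mathrm{X}][\mathrm{Y}])$ with $\alpha_\to/\alpha_\leftarrow$. The rate constants, initial concentrations, step size and the choice of a fourth-order Runge-Kutta integrator are all assumptions made for illustration; none of them come from the paper.

```python
def rate_equation(conc, alpha_fwd, alpha_rev):
    """Right-hand side of the mass-action rate equation for X + Y <-> XY."""
    x, y, xy = conc
    net = alpha_fwd * x * y - alpha_rev * xy  # forward rate minus reverse rate
    return (-net, -net, net)


def integrate(conc, alpha_fwd, alpha_rev, dt=1e-3, steps=20_000):
    """Integrate the rate equation with the classical fourth-order Runge-Kutta method."""
    def shifted(c, slope, h):
        return tuple(ci + h * si for ci, si in zip(c, slope))

    for _ in range(steps):
        s1 = rate_equation(conc, alpha_fwd, alpha_rev)
        s2 = rate_equation(shifted(conc, s1, dt / 2), alpha_fwd, alpha_rev)
        s3 = rate_equation(shifted(conc, s2, dt / 2), alpha_fwd, alpha_rev)
        s4 = rate_equation(shifted(conc, s3, dt), alpha_fwd, alpha_rev)
        conc = tuple(
            c0 + dt / 6 * (a + 2 * b + 2 * c + d)
            for c0, a, b, c, d in zip(conc, s1, s2, s3, s4)
        )
    return conc


# Hypothetical rate constants and initial concentrations ([X], [Y], [XY]).
ALPHA_FWD, ALPHA_REV = 2.0, 1.0
x, y, xy = integrate((1.0, 0.5, 0.0), ALPHA_FWD, ALPHA_REV)
print(f"[XY]/([X][Y]) = {xy / (x * y):.6f}")
print(f"alpha_fwd/alpha_rev = {ALPHA_FWD / ALPHA_REV:.6f}")
```

With these illustrative numbers the two printed ratios should agree closely once the system has relaxed. Note also that $[\mathrm{X}] + [\mathrm{XY}]$ and $[\mathrm{Y}] + [\mathrm{XY}]$ stay constant along the way, since the three right-hand sides differ only by sign.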

The laws of thermodynamics determine the ratio of the forward and reverse rate constants. For any reaction at all, this ratio is $\displaystyle{ \frac{\alpha_\to}{\alpha_\leftarrow} = e^{-\Delta {G^\circ}/RT} } \qquad \qquad \qquad (1)$

where $T$ is the temperature, $R$ is the ideal gas constant, and $\Delta {G^\circ}$ is the free energy change under standard conditions.

Note that if $\Delta {G^\circ} < 0$, then the rate constant of the forward reaction is larger than the rate constant of the reverse reaction: $\alpha_\to > \alpha_\leftarrow$

In this case one may loosely say that the forward reaction ‘wants’ to happen ‘spontaneously’. Such a reaction is called exergonic. If on the other hand $\Delta {G^\circ} > 0$, then the forward reaction is ‘non-spontaneous’ and it is called endergonic.
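To get a feel for equation (1), here is a small Python sketch that evaluates the ratio $\alpha_\to/\alpha_\leftarrow$ for a few values of $\Delta {G^\circ}$. The function name, the temperature of 298.15 K and the sample free energy changes are hypothetical choices for illustration, with $\Delta {G^\circ}$ taken in joules per mole.

```python
import math

R = 8.314462618  # ideal gas constant in J/(mol K)


def rate_constant_ratio(delta_g_standard, temperature):
    """Return alpha_fwd / alpha_rev = exp(-delta_g_standard / (R T)).

    delta_g_standard is in J/mol and temperature is in kelvin.
    """
    return math.exp(-delta_g_standard / (R * temperature))


T = 298.15  # assumed temperature in kelvin (25 degrees Celsius)
for delta_g in (-10_000.0, 0.0, 10_000.0):  # hypothetical values in J/mol
    ratio = rate_constant_ratio(delta_g, T)
    kind = "exergonic" if delta_g < 0 else "endergonic" if delta_g > 0 else "neither"
    print(f"dG = {delta_g:+9.1f} J/mol   ratio = {ratio:.4g}   ({kind})")
```

Because the dependence is exponential, even a modest free energy change shifts the ratio of rate constants by a large factor.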

The most important thing for us is that $\Delta {G^\circ}$ takes a very simple form. Each species has a free energy. The free energy of a complex $\mathrm{A}_1 + \cdots + \mathrm{A}_m$

is the sum of the free energies of the species $\mathrm{A}_i$. Given a reaction $\mathrm{A}_1 + \cdots + \mathrm{A}_m \longrightarrow \mathrm{B}_1 + \cdots + \mathrm{B}_n$

the free energy change $\Delta {G^\circ}$ for this reaction is the free energy of $\mathrm{B}_1 + \cdots + \mathrm{B}_n$

minus the free energy of $\mathrm{A}_1 + \cdots + \mathrm{A}_m.$

As a consequence, $\Delta{G^\circ}$ is additive with respect to combining multiple reactions in either series or parallel. In particular, then, the law (1) imposes relations between ratios of rate constants: for example, if we have the following more complicated set of reactions $\mathrm{A} \mathrel{\substack{\alpha_{\rightarrow} \\\longleftrightarrow\\ \alpha_{\leftarrow}}} \mathrm{B}$ $\mathrm{B} \mathrel{\substack{\beta_{\rightarrow} \\\longleftrightarrow\\ \beta_{\leftarrow}}} \mathrm{C}$ $\mathrm{A} \mathrel{\substack{\gamma_{\rightarrow} \\\longleftrightarrow\\ \gamma_{\leftarrow}}} \mathrm{C}$

then we must have $\displaystyle{ \frac{\gamma_\to}{\gamma_\leftarrow} = \frac{\alpha_\to}{\alpha_\leftarrow} \frac{\beta_\to}{\beta_\leftarrow} . }$
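To see where this relation comes from, here is the short derivation. The symbols $G_{\mathrm{A}}, G_{\mathrm{B}}, G_{\mathrm{C}}$ for the free energies of the three species, and the subscripts on $\Delta G^\circ$ labelling the three reactions, are notation introduced just for this sketch. Since the free energy change of a reaction is the free energy of its product minus that of its reactant, $\displaystyle{ \Delta G^\circ_\gamma = G_{\mathrm{C}} - G_{\mathrm{A}} = (G_{\mathrm{B}} - G_{\mathrm{A}}) + (G_{\mathrm{C}} - G_{\mathrm{B}}) = \Delta G^\circ_\alpha + \Delta G^\circ_\beta }$

and applying (1) to each reaction gives $\displaystyle{ \frac{\gamma_\to}{\gamma_\leftarrow} = e^{-(\Delta G^\circ_\alpha + \Delta G^\circ_\beta)/RT} = e^{-\Delta G^\circ_\alpha/RT} \, e^{-\Delta G^\circ_\beta/RT} = \frac{\alpha_\to}{\alpha_\leftarrow} \frac{\beta_\to}{\beta_\leftarrow} . }$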

So, not only are the rate constant ratios of reactions determined by differences in free energy, but also nontrivial relations between these ratios can arise, depending on the structure of the system of reactions in question!

Okay—this is all the basic stuff we’ll need to know. Please ask questions! Next time we’ll go ahead and use this stuff to start thinking about how biology manages to make reactions that ‘want’ to happen push forward reactions that are useful but wouldn’t happen spontaneously on their own.

The paper:

• John Baez, Jonathan Lorand, Blake S. Pollard and Maru Sarazola,
Biochemical coupling through emergent conservation laws.

The blog series:

Part 1 – Introduction.

Part 2 – Review of reaction networks and equilibrium thermodynamics.

Part 3 – What is coupling?

Part 4 – Interactions.

Part 5 – Coupling in quasiequilibrium states.

Part 6 – Emergent conservation laws.

Part 7 – The urea cycle.

Part 8 – The citric acid cycle.

## Coupling Through Emergent Conservation Laws (Part 1)

27 June, 2018

joint post with Jonathan Lorand, Blake Pollard, and Maru Sarazola

In the cell, chemical reactions are often ‘coupled’ so that reactions that release energy drive reactions that are biologically useful but involve an increase in energy. But how, exactly, does coupling work?

Much is known about this question, but the literature is also full of vague explanations and oversimplifications. Coupling cannot occur in equilibrium; it arises in open systems, where the concentrations of certain chemicals are held out of equilibrium due to flows in and out. One might thus suspect that the simplest mathematical treatment of this phenomenon would involve non-equilibrium steady states of open systems. However, Bazhin has shown that some crucial aspects of coupling arise in an even simpler framework:

• Nicolai Bazhin, The essence of ATP coupling, ISRN Biochemistry 2012 (2012), article 827604.

He considers ‘quasi-equilibrium’ states, where fast reactions have come into equilibrium and slow ones are neglected. He shows that coupling occurs already in this simple approximation.

In this series of blog articles we’ll do two things. First, we’ll review Bazhin’s work in a way that readers with no training in biology or chemistry should be able to follow. (But if you get stuck, ask questions!) Second, we’ll explain a fact that seems to have received insufficient attention: in many cases, coupling relies on emergent conservation laws.

Conservation laws are important throughout science. Besides those that are built into the fabric of physics, such as conservation of energy and momentum, there are also many ‘emergent’ conservation laws that hold approximately in certain circumstances. Often these arise when processes that change a given quantity happen very slowly. For example, the most common isotope of uranium decays into lead with a half-life of about 4.5 billion years, but for the purposes of chemical experiments in the laboratory, it is useful to treat the amount of uranium as a conserved quantity.

The emergent conservation laws involved in biochemical coupling are of a different nature. Instead of making the processes that violate these laws happen more slowly, the cell uses enzymes to make other processes happen more quickly. At the time scales relevant to cellular metabolism, the fast processes dominate, while slowly changing quantities are effectively conserved. By a suitable choice of these emergent conserved quantities, the cell ensures that certain reactions that release energy can only occur when other ‘desired’ reactions occur. To be sure, this is only approximately true, on sufficiently short time scales. But this approximation is enlightening!

Following Bazhin, our main example involves ATP hydrolysis. We consider the following schema for a whole family of reactions: $\begin{array}{ccc} \mathrm{X} + \mathrm{ATP} & \longleftrightarrow & \mathrm{ADP} + \mathrm{XP}_{\mathrm{i}} \qquad (1) \\ \mathrm{XP}_{\mathrm{i}} + \mathrm{Y} & \longleftrightarrow & \mathrm{XY} + \mathrm{P}_{\mathrm{i}} \,\;\;\;\;\qquad (2) \end{array}$

Some concrete examples of this schema include:

• The synthesis of glutamine (XY) from glutamate (X) and ammonium (Y). This is part of the important glutamate-glutamine cycle in the central nervous system.

• The synthesis of sucrose (XY) from glucose (X) and fructose (Y). This is one of many processes whereby plants synthesize more complex sugars and starches from simpler building-blocks.

In these and other examples, the two reactions, taken together, have the effect of synthesizing a larger molecule XY out of two parts X and Y while ATP is broken down to ADP and the phosphate ion $\mathrm{P}_{\mathrm{i}}.$ Thus, they have the same net effect as this other pair of reactions: $\begin{array}{ccc} \mathrm{X} + \mathrm{Y} &\longleftrightarrow & \mathrm{XY} \;\;\;\quad \quad \qquad (3) \\ \mathrm{ATP} &\longleftrightarrow & \mathrm{ADP} + \mathrm{P}_{\mathrm{i}} \qquad (4) \end{array}$

The first reaction here is just the synthesis of XY from X and Y. The second is a deliberately simplified version of ATP hydrolysis. The first involves an increase of energy, while the second releases energy. But in the schema used in biology, these processes are ‘coupled’ so that ATP can only break down to $\mathrm{ADP} + \mathrm{P}_{\mathrm{i}}$ if X and Y combine to form XY.

As we shall see, this coupling crucially relies on a conserved quantity: the total number of Y molecules plus the total number of $\mathrm{P}_{\mathrm{i}}$ ions is left unchanged by reactions (1) and (2). This fact is not a fundamental law of physics, nor even a general law of chemistry (such as conservation of phosphorus atoms). It is an emergent conservation law that holds approximately in special situations. Its approximate validity relies on the fact that the cell has enzymes that make reactions (1) and (2) occur more rapidly than reactions that violate this law, such as (3) and (4).
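The bookkeeping behind these claims is easy to check by hand, but here is a minimal Python sketch that does it mechanically. It records how many molecules of each species are gained or lost when each reaction fires once from left to right. The helper names and the plain-text species labels (such as `"XPi"` and `"Pi"`) are hypothetical and introduced only for this illustration.

```python
from collections import Counter


def net_change(reactants, products):
    """Change in the count of each species when a reaction fires once, left to right."""
    change = Counter(products)
    change.subtract(Counter(reactants))
    return {species: n for species, n in change.items() if n != 0}


def add_changes(*changes):
    """Net effect of several reactions, each firing once."""
    total = Counter()
    for change in changes:
        for species, n in change.items():
            total[species] += n
    return {species: n for species, n in total.items() if n != 0}


def y_plus_pi(change):
    """Change in (number of Y) + (number of Pi) caused by one firing."""
    return change.get("Y", 0) + change.get("Pi", 0)


r1 = net_change(["X", "ATP"], ["ADP", "XPi"])  # reaction (1)
r2 = net_change(["XPi", "Y"], ["XY", "Pi"])    # reaction (2)
r3 = net_change(["X", "Y"], ["XY"])            # reaction (3)
r4 = net_change(["ATP"], ["ADP", "Pi"])        # reaction (4)

# Do (1) + (2) have the same net effect as (3) + (4)?
print(add_changes(r1, r2) == add_changes(r3, r4))
# Change in Y + Pi for reactions (1), (2), (3), (4) in turn.
print([y_plus_pi(r) for r in (r1, r2, r3, r4)])
```

The second printed list is zero for reactions (1) and (2) and nonzero for (3) and (4), which is exactly the sense in which the quantity is conserved only by the fast, enzyme-catalyzed reactions.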

In the series to come, we’ll start by providing the tiny amount of chemistry and thermodynamics needed to understand what’s going on. Then we’ll raise the question “what is coupling?” Then we’ll study the reactions required for coupling ATP hydrolysis to the synthesis of XY from components X and Y, and explain why these reactions are not yet enough for coupling. Then we’ll show that coupling occurs in a ‘quasiequilibrium’ state where reactions (1) and (2), assumed much faster than the rest, have reached equilibrium, while the rest are neglected. And then we’ll explain the role of emergent conservation laws!


## A Biochemistry Question

26 June, 2018

Does anyone know a real-world example of a cycle like this: $\begin{array}{ccc} \mathrm{A} + \mathrm{C}_1 \longrightarrow \mathrm{C}_2 \\ \mathrm{X} + \mathrm{C}_2 \longrightarrow \mathrm{C}_3 \\ \mathrm{C}_3 \longrightarrow \mathrm{B} + \mathrm{C}_4 \\ \mathrm{C}_4 \longrightarrow \mathrm{Y} + \mathrm{C}_1 \end{array}$

where the reaction $\mathrm{A} \to \mathrm{B}$

is exergonic (i.e., involves a decrease in free energy) while $\mathrm{X} \to \mathrm{Y}$

is endergonic (i.e., involves a free energy increase)?

The idea is that the above cycle, presumably catalyzed so that all the reactions go fairly fast under normal conditions, ‘couples’ the exergonic reaction, which ‘wants to happen’, to the endergonic reaction, which doesn’t… thus driving the endergonic one.
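To make the net effect of the cycle explicit, here is a minimal Python sketch that adds up the four steps, each firing once. The function name and the plain-text species labels are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# The four steps of the cycle, written as (reactants, products).
CYCLE = [
    (["A", "C1"], ["C2"]),
    (["X", "C2"], ["C3"]),
    (["C3"], ["B", "C4"]),
    (["C4"], ["Y", "C1"]),
]


def net_effect(reactions):
    """Net change in the count of each species after every reaction fires once."""
    total = Counter()
    for reactants, products in reactions:
        total.update(products)      # species produced
        total.subtract(reactants)   # species consumed
    return {species: n for species, n in sorted(total.items()) if n != 0}


print(net_effect(CYCLE))
```

The intermediates $\mathrm{C}_1, \dots, \mathrm{C}_4$ cancel out, so one turn of the cycle amounts to $\mathrm{A} + \mathrm{X} \to \mathrm{B} + \mathrm{Y}$: the exergonic and endergonic reactions happen together or not at all.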

I would love an example from biochemistry. This is like a baby version of much more elaborate cycles such as the citric acid cycle, as pictured in Stryer’s Biochemistry. I’m writing a paper on this stuff with Jonathan Lorand, Blake Pollard and Maru Sarazola, and we have (presumably obvious) reasons to want to discuss a simpler cycle!

## Applied Category Theory 2018 – Videos

30 April, 2018

Some of the talks at Applied Category Theory 2018 were videotaped by the Statebox team. You can watch them on YouTube:

• David Spivak, A higher-order temporal logic for dynamical systems. Book available here and slides here.

• Fabio Zanasi and Bart Jacobs, Categories in Bayesian networks. Paper available here. (Some sound missing; when you hit silence skip forwards to about 15:00.)

• Bob Coecke and Aleks Kissinger, Causality. Paper available here.

• Samson Abramsky, Games and constraint satisfaction, Part 1 and Part 2. Paper available here.

• Dan Ghica, Diagrammatic semantics for digital circuits. Paper available here.

• Kathryn Hess, Towards a categorical approach to neuroscience.

• Tom Leinster, Biodiversity and the theory of magnitude. Papers available here and here.

• John Baez, Props in network theory. Slides available here, paper here and blog article here.

## Retrotransposons

14 January, 2018

• Ed Yong, Brain cells share information with virus-like capsules, Atlantic, January 12, 2018.

Your brain needs a protein called Arc. If you have trouble making this protein, you’ll have trouble forming new memories. The neuroscientist Jason Shepherd noticed something weird:

He saw that these Arc proteins assemble into hollow, spherical shells that look uncannily like viruses. “When we looked at them, we thought: What are these things?” says Shepherd. They reminded him of textbook pictures of HIV, and when he showed the images to HIV experts, they confirmed his suspicions. That, to put it bluntly, was a huge surprise. “Here was a brain gene that makes something that looks like a virus,” Shepherd says.

That’s not a coincidence. The team showed that Arc descends from an ancient group of genes called gypsy retrotransposons, which exist in the genomes of various animals, but can behave like their own independent entities. They can make new copies of themselves, and paste those duplicates elsewhere in their host genomes. At some point, some of these genes gained the ability to enclose themselves in a shell of proteins and leave their host cells entirely. That was the origin of retroviruses—the virus family that includes HIV.

It’s worth pointing out that gypsy is the name of a specific kind of retrotransposon. A retrotransposon is a gene that can make copies of itself by first transcribing itself from DNA into RNA and then converting itself back into DNA and inserting itself at other places in your chromosomes.

About 40% of your genome consists of retrotransposons! They seem to mainly be ‘selfish genes’, focused on their own self-reproduction. But some are also useful to you.

So, Arc genes are the evolutionary cousins of these viruses, which explains why they produce shells that look so similar. Specifically, Arc is closely related to a viral gene called gag, which retroviruses like HIV use to build the protein shells that enclose their genetic material. Other scientists had noticed this similarity before. In 2006, one team searched for human genes that look like gag, and they included Arc in their list of candidates. They never followed up on that hint, and “as neuroscientists, we never looked at the genomic papers so we didn’t find it until much later,” says Shepherd.

I love this because it confirms my feeling that viruses are deeply entangled with our evolutionary past. Computer viruses are just the latest phase of this story.

As if that wasn’t weird enough, other animals seem to have independently evolved their own versions of Arc. Fruit flies have Arc genes, and Shepherd’s colleague Cedric Feschotte showed that these descend from the same group of gypsy retrotransposons that gave rise to ours. But flies and back-boned animals co-opted these genes independently, in two separate events that took place millions of years apart. And yet, both events gave rise to similar genes that do similar things: Another team showed that the fly versions of Arc also sends RNA between neurons in virus-like capsules. “It’s exciting to think that such a process can occur twice,” says Atma Ivancevic from the University of Adelaide.

This is part of a broader trend: Scientists have in recent years discovered several ways that animals have used the properties of virus-related genes to their evolutionary advantage. Gag moves genetic information between cells, so it’s perfect as the basis of a communication system. Viruses use another gene called env to merge with host cells and avoid the immune system. Those same properties are vital for the placenta—a mammalian organ that unites the tissues of mothers and babies. And sure enough, a gene called syncytin, which is essential for the creation of placentas, actually descends from env. Much of our biology turns out to be viral in nature.

Here’s something I wrote in 1998 when I was first getting interested in this business:

RNA reverse transcribing viruses

RNA reverse transcribing viruses are usually called retroviruses. They have a single-stranded RNA genome. They infect animals, and when they get inside the cell’s nucleus, they copy themselves into the DNA of the host cell using reverse transcriptase. In the process they often cause tumors, presumably by damaging the host’s DNA.

Retroviruses are important in genetic engineering because they raised for the first time the possibility that RNA could be transcribed into DNA, rather than the reverse. In fact, some of them are currently being deliberately used by scientists to add new genes to mammalian cells.

Retroviruses are also important because AIDS is caused by a retrovirus: the human immunodeficiency virus (HIV). This is part of why AIDS is so difficult to treat. Most usual ways of killing viruses have no effect on retroviruses when they are latent in the DNA of the host cell.

From an evolutionary viewpoint, retroviruses are fascinating because they blur the very distinction between host and parasite. Their genome often contains genetic information derived from the host DNA. And once they are integrated into the DNA of the host cell, they may take a long time to reemerge. In fact, so-called endogenous retroviruses can be passed down from generation to generation, indistinguishable from any other cellular gene, and evolving along with their hosts, perhaps even from species to species! It has been estimated that up to 1% of the human genome consists of endogenous retroviruses! Furthermore, not every endogenous retrovirus causes a noticeable disease. Some may even help their hosts.

It gets even spookier when we notice that once an endogenous retrovirus lost the genes that code for its protein coat, it would become indistinguishable from a long terminal repeat (LTR) retrotransposon—one of the many kinds of “junk DNA” cluttering up our chromosomes. Just how much of us is made of retroviruses? It’s hard to be sure.

For my whole article, go here:

It’s about the mysterious subcellular entities that stand near the blurry border between the living and the non-living—like viruses, viroids, plasmids, satellites, transposons and prions. I need to update it, since a lot of new stuff is being discovered!

Jason Shepherd’s new paper has a few other authors:

• Elissa D. Pastuzyn, Cameron E. Day, Rachel B. Kearns, Madeleine Kyrke-Smith, Andrew V. Taibi, John McCormick, Nathan Yoder, David M. Belnap, Simon Erlendsson, Dustin R. Morado, John A.G. Briggs, Cédric Feschotte and Jason D. Shepherd, The neuronal gene Arc encodes a repurposed retrotransposon gag protein that mediates intercellular RNA transfer, Cell 172 (2018), 275–288.

## Biology as Information Dynamics (Part 3)

9 November, 2017

On Monday I’m giving this talk at Caltech:

Biology as information dynamics, November 13, 2017, 4:00–5:00 pm, General Biology Seminar, Kerckhoff 119, Caltech.

If you’re around, please check it out! I’ll be around all day talking to people, including Erik Winfree, my graduate student host Fangzhou Xiao, and other grad students.

If you can’t make it, you can watch this video! It’s a neat subject, and I want to do more on it:

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.
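For readers who have not met it before, the information of one probability distribution $q$ relative to another $p$ is $\sum_i q_i \ln(q_i/p_i)$. Here is a minimal Python sketch of that formula. The function name, the use of natural logarithms (so the answer is in nats) and the example distributions are assumptions made for illustration, not something taken from the talk.

```python
import math


def relative_information(q, p):
    """Information of q relative to p: sum_i q_i * ln(q_i / p_i), in nats."""
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # terms with q_i = 0 contribute nothing, by convention
        if pi == 0.0:
            return math.inf  # q puts weight where p puts none
        total += qi * math.log(qi / pi)
    return total


# Hypothetical distributions over two types of replicator.
uniform = [0.5, 0.5]
skewed = [0.9, 0.1]
print(relative_information(uniform, skewed))
print(relative_information(skewed, uniform))
```

The two printed numbers differ, which is a reminder that relative information is not symmetric in its two arguments, and it vanishes when the two distributions coincide.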

## Biology as Information Dynamics (Part 2)

27 April, 2017

Here’s a video of the talk I gave at the Stanford Complexity Group:

You can see slides here:

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.

I’d given a version of this talk earlier this year at a workshop on Quantifying biological complexity, but I’m glad this second try got videotaped and not the first, because I was a lot happier about my talk this time. And as you’ll see at the end, there were a lot of interesting questions.