Applied Category Theory 2018 – Videos

30 April, 2018

Some of the talks at Applied Category Theory 2018 were videotaped by the Statebox team. You can watch them on YouTube:

• David Spivak, A higher-order temporal logic for dynamical systems. Book available here and slides here.

• Fabio Zanasi and Bart Jacobs, Categories in Bayesian networks. Paper available here. (Some sound missing; when you hit silence skip forwards to about 15:00.)

• Bob Coecke and Aleks Kissinger, Causality. Paper available here.

• Samson Abramsky, Games and constraint satisfaction, Part 1 and Part 2. Paper available here.

• Dan Ghica, Diagrammatic semantics for digital circuits. Paper available here.

• Kathryn Hess, Towards a categorical approach to neuroscience.

• Tom Leinster, Biodiversity and the theory of magnitude. Papers available here and here.

• John Baez, Props in network theory. Slides available here, paper here and blog article here.


Retrotransposons

14 January, 2018

This article is very interesting:

• Ed Yong, Brain cells share information with virus-like capsules, Atlantic, January 12, 2018.

Your brain needs a protein called Arc. If you have trouble making this protein, you’ll have trouble forming new memories. The neuroscientist Jason Shepherd noticed something weird:

He saw that these Arc proteins assemble into hollow, spherical shells that look uncannily like viruses. “When we looked at them, we thought: What are these things?” says Shepherd. They reminded him of textbook pictures of HIV, and when he showed the images to HIV experts, they confirmed his suspicions. That, to put it bluntly, was a huge surprise. “Here was a brain gene that makes something that looks like a virus,” Shepherd says.

That’s not a coincidence. The team showed that Arc descends from an ancient group of genes called gypsy retrotransposons, which exist in the genomes of various animals, but can behave like their own independent entities. They can make new copies of themselves, and paste those duplicates elsewhere in their host genomes. At some point, some of these genes gained the ability to enclose themselves in a shell of proteins and leave their host cells entirely. That was the origin of retroviruses—the virus family that includes HIV.

It’s worth pointing out that gypsy is the name of a specific kind of retrotransposon. A retrotransposon is a gene that can make copies of itself by first transcribing itself from DNA into RNA and then converting itself back into DNA and inserting itself at other places in your chromosomes.

About 40% of your genes are retrotransposons! They seem to mainly be ‘selfish genes’, focused on their own self-reproduction. But some are also useful to you.

So, Arc genes are the evolutionary cousins of these viruses, which explains why they produce shells that look so similar. Specifically, Arc is closely related to a viral gene called gag, which retroviruses like HIV use to build the protein shells that enclose their genetic material. Other scientists had noticed this similarity before. In 2006, one team searched for human genes that look like gag, and they included Arc in their list of candidates. They never followed up on that hint, and “as neuroscientists, we never looked at the genomic papers so we didn’t find it until much later,” says Shepherd.

I love this because it confirms my feeling that viruses are deeply entangled with our evolutionary past. Computer viruses are just the latest phase of this story.

As if that wasn’t weird enough, other animals seem to have independently evolved their own versions of Arc. Fruit flies have Arc genes, and Shepherd’s colleague Cedric Feschotte showed that these descend from the same group of gypsy retrotransposons that gave rise to ours. But flies and back-boned animals co-opted these genes independently, in two separate events that took place millions of years apart. And yet, both events gave rise to similar genes that do similar things: Another team showed that the fly versions of Arc also sends RNA between neurons in virus-like capsules. “It’s exciting to think that such a process can occur twice,” says Atma Ivancevic from the University of Adelaide.

This is part of a broader trend: Scientists have in recent years discovered several ways that animals have used the properties of virus-related genes to their evolutionary advantage. Gag moves genetic information between cells, so it’s perfect as the basis of a communication system. Viruses use another gene called env to merge with host cells and avoid the immune system. Those same properties are vital for the placenta—a mammalian organ that unites the tissues of mothers and babies. And sure enough, a gene called syncytin, which is essential for the creation of placentas, actually descends from env. Much of our biology turns out to be viral in nature.

Here’s something I wrote in 1998 when I was first getting interested in this business:

RNA reverse transcribing viruses

RNA reverse transcribing viruses are usually called retroviruses. They have a single-stranded RNA genome. They infect animals, and when they get inside the cell’s nucleus, they copy themselves into the DNA of the host cell using reverse transcriptase. In the process they often cause tumors, presumably by damaging the host’s DNA.

Retroviruses are important in genetic engineering because they raised for the first time the possibility that RNA could be transcribed into DNA, rather than the reverse. In fact, some of them are currently being deliberately used by scientists to add new genes to mammalian cells.

Retroviruses are also important because AIDS is caused by a retrovirus: the human immunodeficiency virus (HIV). This is part of why AIDS is so difficult to treat. Most usual ways of killing viruses have no effect on retroviruses when they are latent in the DNA of the host cell.

From an evolutionary viewpoint, retroviruses are fascinating because they blur the very distinction between host and parasite. Their genome often contains genetic information derived from the host DNA. And once they are integrated into the DNA of the host cell, they may take a long time to reemerge. In fact, so-called endogenous retroviruses can be passed down from generation to generation, indistinguishable from any other cellular gene, and evolving along with their hosts, perhaps even from species to species! It has been estimated that up to 1% of the human genome consists of endogenous retroviruses! Furthermore, not every endogenous retrovirus causes a noticeable disease. Some may even help their hosts.

It gets even spookier when we notice that once an endogenous retrovirus lost the genes that code for its protein coat, it would become indistinguishable from a long terminal repeat (LTR) retrotransposon—one of the many kinds of “junk DNA” cluttering up our chromosomes. Just how much of us is made of retroviruses? It’s hard to be sure.

For my whole article, go here:

Subcellular life forms.

It’s about the mysterious subcellular entities that stand near the blurry border between the living and the non-living—like viruses, viroids, plasmids, satellites, transposons and prions. I need to update it, since a lot of new stuff is being discovered!

Jason Shepherd’s new paper has a few other authors:

• Elissa D. Pastuzyn, Cameron E. Day, Rachel B. Kearns, Madeleine Kyrke-Smith, Andrew V. Taibi, John McCormick, Nathan Yoder, David M. Belnap, Simon Erlendsson, Dustin R. Morado, John A.G. Briggs, Cédric Feschotte and Jason D. Shepherd, The neuronal gene Arc encodes a repurposed retrotransposon gag protein that mediates intercellular RNA transfer, Cell 172 (2018), 275–288.


Biology as Information Dynamics (Part 3)

9 November, 2017

On Monday I’m giving this talk at Caltech:

Biology as information dynamics, November 13, 2017, 4:00–5:00 pm, General Biology Seminar, Kerckhoff 119, Caltech.

If you’re around, please check it out! I’ll be around all day talking to people, including Erik Winfree, my graduate student host Fangzhou Xiao, and other grad students.

If you can’t make it, you can watch this video! It’s a neat subject, and I want to do more on it:

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Liebler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.


Biology as Information Dynamics (Part 2)

27 April, 2017

Here’s a video of the talk I gave at the Stanford Complexity Group:

You can see slides here:

Biology as information dynamics.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Liebler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.

I’d given a version of this talk earlier this year at a workshop on Quantifying biological complexity, but I’m glad this second try got videotaped and not the first, because I was a lot happier about my talk this time. And as you’ll see at the end, there were a lot of interesting questions.


Complexity Theory and Evolution in Economics

24 April, 2017

This book looks interesting:

• David S. Wilson and Alan Kirman, editors, Complexity and Evolution: Toward a New Synthesis for Economics, MIT Press, Cambridge Mass., 2016.

You can get some chapters for free here. I’ve only looked carefully at this one:

• Joshua M. Epstein and Julia Chelen, Advancing Agent_Zero.

Agent_Zero is a simple toy model of an agent that’s not the idealized rational actor often studied in economics: rather, it has emotional, deliberative, and social modules which interact with each other to make decisions. Epstein and Chelen simulate collections of such agents and see what they do:

Abstract. Agent_Zero is a mathematical and computational individual that can generate important, but insufficiently understood, social dynamics from the bottom up. First published by Epstein (2013), this new theoretical entity possesses emotional, deliberative, and social modules, each grounded in contemporary neuroscience. Agent_Zero’s observable behavior results from the interaction of these internal modules. When multiple Agent_Zeros interact with one another, a wide range of important, even disturbing, collective dynamics emerge. These dynamics are not straightforwardly generated using the canonical rational actor which has dominated mathematical social science since the 1940s. Following a concise exposition of the Agent_Zero model, this chapter offers a range of fertile research directions, including the use of realistic geographies and population levels, the exploration of new internal modules and new interactions among them, the development of formal axioms for modular agents, empirical testing, the replication of historical episodes, and practical applications. These may all serve to advance the Agent_Zero research program.

It sounds like a fun and productive project as long as one keeps ones wits about one. It’s hard to draw conclusions about human behavior from such simplified agents. One can argue about this, and of course economists will. But regardless of this, one can draw conclusions about which kinds of simplified agents will engage in which kinds of collective behavior under which conditions.

Basically, one can start mapping out a small simple corner of the huge ‘phase space’ of possible societies. And that’s bound to lead to interesting new ideas that one wouldn’t get from either 1) empirical research on human and animal societies or 2) pure theoretical pondering without the help of simulations.

Here’s an article whose title, at least, takes a vastly more sanguine attitude toward benefits of such work:

• Kate Douglas, Orthodox economics is broken: how evolution, ecology, and collective behavior can help us avoid catastrophe, Evonomics, 22 July 2016.

I’ll quote just a bit:

For simplicity’s sake, orthodox economics assumes that Homo economicus, when making a fundamental decision such as whether to buy or sell something, has access to all relevant information. And because our made-up economic cousins are so rational and self-interested, when the price of an asset is too high, say, they wouldn’t buy—so the price falls. This leads to the notion that economies self-organise into an equilibrium state, where supply and demand are equal.

Real humans—be they Wall Street traders or customers in Walmart—don’t always have accurate information to hand, nor do they act rationally. And they certainly don’t act in isolation. We learn from each other, and what we value, buy and invest in is strongly influenced by our beliefs and cultural norms, which themselves change over time and space.

“Many preferences are dynamic, especially as individuals move between groups, and completely new preferences may arise through the mixing of peoples as they create new identities,” says anthropologist Adrian Bell at the University of Utah in Salt Lake City. “Economists need to take cultural evolution more seriously,” he says, because it would help them understand who or what drives shifts in behaviour.

Using a mathematical model of price fluctuations, for example, Bell has shown that prestige bias—our tendency to copy successful or prestigious individuals—influences pricing and investor behaviour in a way that creates or exacerbates market bubbles.

We also adapt our decisions according to the situation, which in turn changes the situations faced by others, and so on. The stability or otherwise of financial markets, for instance, depends to a great extent on traders, whose strategies vary according to what they expect to be most profitable at any one time. “The economy should be considered as a complex adaptive system in which the agents constantly react to, influence and are influenced by the other individuals in the economy,” says Kirman.

This is where biologists might help. Some researchers are used to exploring the nature and functions of complex interactions between networks of individuals as part of their attempts to understand swarms of locusts, termite colonies or entire ecosystems. Their work has provided insights into how information spreads within groups and how that influences consensus decision-making, says Iain Couzin from the Max Planck Institute for Ornithology in Konstanz, Germany—insights that could potentially improve our understanding of financial markets.

Take the popular notion of the “wisdom of the crowd”—the belief that large groups of people can make smart decisions even when poorly informed, because individual errors of judgement based on imperfect information tend to cancel out. In orthodox economics, the wisdom of the crowd helps to determine the prices of assets and ensure that markets function efficiently. “This is often misplaced,” says Couzin, who studies collective behaviour in animals from locusts to fish and baboons.

By creating a computer model based on how these animals make consensus decisions, Couzin and his colleagues showed last year that the wisdom of the crowd works only under certain conditions—and that contrary to popular belief, small groups with access to many sources of information tend to make the best decisions.

That’s because the individual decisions that make up the consensus are based on two types of environmental cue: those to which the entire group are exposed—known as high-correlation cues—and those that only some individuals see, or low-correlation cues. Couzin found that in larger groups, the information known by all members drowns out that which only a few individuals noticed. So if the widely known information is unreliable, larger groups make poor decisions. Smaller groups, on the other hand, still make good decisions because they rely on a greater diversity of information.

So when it comes to organising large businesses or financial institutions, “we need to think about leaders, hierarchies and who has what information”, says Couzin. Decision-making structures based on groups of between eight and 12 individuals, rather than larger boards of directors, might prevent over-reliance on highly correlated information, which can compromise collective intelligence. Operating in a series of smaller groups may help prevent decision-makers from indulging their natural tendency to follow the pack, says Kirman.

Taking into account such effects requires economists to abandon one-size-fits-all mathematical formulae in favour of “agent-based” modelling—computer programs that give virtual economic agents differing characteristics that in turn determine interactions. That’s easier said than done: just like economists, biologists usually model relatively simple agents with simple rules of interaction. How do you model a human?

It’s a nut we’re beginning to crack. One attendee at the forum was Joshua Epstein, director of the Center for Advanced Modelling at Johns Hopkins University in Baltimore, Maryland. He and his colleagues have come up with Agent_Zero, an open-source software template for a more human-like actor influenced by emotion, reason and social pressures. Collections of Agent_Zeros think, feel and deliberate. They have more human-like relationships with other agents and groups, and their interactions lead to social conflict, violence and financial panic. Agent_Zero offers economists a way to explore a range of scenarios and see which best matches what is going on in the real world. This kind of sophistication means they could potentially create scenarios approaching the complexity of real life.

Orthodox economics likes to portray economies as stately ships proceeding forwards on an even keel, occasionally buffeted by unforeseen storms. Kirman prefers a different metaphor, one borrowed from biology: economies are like slime moulds, collections of single-celled organisms that move as a single body, constantly reorganising themselves to slide in directions that are neither understood nor necessarily desired by their component parts.

For Kirman, viewing economies as complex adaptive systems might help us understand how they evolve over time—and perhaps even suggest ways to make them more robust and adaptable. He’s not alone. Drawing analogies between financial and biological networks, the Bank of England’s research chief Andrew Haldane and University of Oxford ecologist Robert May have together argued that we should be less concerned with the robustness of individual banks than the contagious effects of one bank’s problems on others to which it is connected. Approaches like this might help markets to avoid failures that come from within the system itself, Kirman says.

To put this view of macroeconomics into practice, however, might mean making it more like weather forecasting, which has improved its accuracy by feeding enormous amounts of real-time data into computer simulation models that are tested against each other. That’s not going to be easy.

 


Stanford Complexity Group

19 April, 2017

Aaron Goodman of the Stanford Complexity Group invited me to give a talk there on Thursday April 20th. If you’re nearby—like in Silicon Valley—please drop by! It will be in Clark S361 at 4:20 pm.

Here’s the idea. Everyone likes to say that biology is all about information. There’s something true about this—just think about DNA. But what does this insight actually do for us, quantitatively speaking? To figure this out, we need to do some work.

Biology is also about things that make copies of themselves. So it makes sense to figure out how information theory is connected to the replicator equation—a simple model of population dynamics for self-replicating entities.

To see the connection, we need to use ‘relative information’: the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Then everything pops into sharp focus.

It turns out that free energy—energy in forms that can actually be used, not just waste heat—is a special case of relative information Since the decrease of free energy is what drives chemical reactions, biochemistry is founded on relative information.

But there’s a lot more to it than this! Using relative information we can also see evolution as a learning process, fix the problems with Fisher’s fundamental theorem of natural selection, and more.

So this what I’ll talk about! You can see my slides here:

• John Baez, Biology as information dynamics.

but my talk will be videotaped, and it’ll eventually be put here:

Stanford complexity group, YouTube.

You can already see lots of cool talks at this location!

 


Periodic Patterns in Peptide Masses

6 April, 2017

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which since 1974 was the most famous conjecture in mathematical chemistry. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain of made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a frequency of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop looking if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, which have different masses, which are almost integers. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers m_1, ... , m_{20}, how many ways can you write any natural number N as a finite ordered sum of these numbers? Call it F(N) and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)

There’s a well-known way to write down a formula for F(N). It obeys a linear recurrence:

F(N) = F(N - m_1) + \cdots + F(N - m_{20})

and we can solve this using the ansatz

F(N) = x^N

Then the recurrence relation will hold if

x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}

for all N. But this is fairly easy to achieve! If m_{20} is the biggest mass, we just need this polynomial equation to hold:

x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1

There will be a bunch of solutions, about m_{20} of them. (If there are repeated roots things get a bit more subtle, but let’s not worry about.) To get the actual formula for F(N) we need to find the right linear combination of functions x^N where x ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root x with the largest magnitude will make x^N grow the fastest.

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses m_1 = 1, m_2 = 2. Then the numbers F(N) are the Fibonacci numbers. I hope you know this: the Nth Fibonacci number is the number of ways to write N as the sum of an ordered list of 1’s and 2’s!

1

1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.

In this example the recurrence gives the polynomial equation

x^2 = x + 1

and the root with largest magnitude is the golden ratio:

\Phi = 1.6180339...

The other root is

1 - \Phi = -0.6180339...

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers m_1 < \cdots < m_k are all the roots of

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

real?

I don’t know the answer. But apparently this kind of polynomial equation always one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that F(N) is asymptotically proportional to x^N where x is this root.

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of F(N). The red roots have the second largest magnitude. These give the main oscillations in F(N), which have period 14.28.

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers m_1 < \cdots < m_k and draw all the roots of the equations

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

What pattern do we get in the complex plane?

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The beauty of roots.

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:


But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses m_i are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!