I’m going to try posting more short articles here. I often read papers and want to jot down a few notes about them. I’ve avoided doing that here, on the theory that articles here should be beautiful works of art. Now I’ll try being a bit more relaxed. I hope you ask questions, or fill in details I leave out.
My former grad student Mike Stay has become fascinated by the pi-calculus as a model of computation. The more famous lambda-calculus is a simple model of functional programming: each expression in this calculus can be seen as a program, or a description of a function. The pi-calculus can do all the same things, but more: it models processes that can compute but also create channels and send messages to each other along these channels.
I’m trying to use network theory to understand the essence of biology in a very simplified way. I’m not convinced that the pi-calculus is the right formalism for this. But some people are trying to apply it to cell biology, so I want to learn what they’re doing.
I’m especially fascinated by this paper, and I’d like to understand it in detail:
• Davide Chiarugi, Pierpaolo Degano and Roberto Marangoni, A computational approach to the functional screening of genomes, PLoS Computational Biology, 28 September 2007.
They used a framework for describing computational networks, the “enhanced pi-calculus”, to help find a minimal set of genes required for the survival of a bacterial cell. If we knew this, it would help us make a guess for the genome of the Last Universal Ancestor. This is the most recent ancestor of all organisms now living on Earth—some single-celled guy, between 3.5 and 3.8 billion years old.
I’m not mainly interested in understanding the Last Universal Ancestor: I’m mainly interested in understanding a very simple form of life. So, I want to find what sort of equations they’re using to describe what happens in a cell… and how they’re converting their pi-calculus model to those equations. Unfortunately this paper doesn’t say, and it doesn’t seem like their code is open-source. But I’ll poke around.
I know how to use Petri nets to model chemical reactions. In their paper they mildly disparage these, and refer to other papers on this general issue. So, I should read that stuff. But I want to see whether Petri nets are merely inconvenient for them, or insufficiently powerful to describe their model.
I especially want to see if their model includes processes involving membranes, like the cell membrane, since Petri nets are not really suited to these. A eukaryotic cell has lots of organelles with membranes, so we will need some kind of ‘membrane calculus’ to understand it, and I’ve been gradually gearing up to understand that. But it’s possible their prokaryotic cell is just treated as a single bag of chemicals all of which can freely react with each other. Petri nets would suffice for this!
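To make the 'single bag of chemicals' picture concrete, here is a minimal Python sketch of one step in a Petri net. The helper names `enabled` and `fire` are my own, and the example reaction (glucose + ATP → glucose-6-phosphate + ADP, the textbook first step of glycolysis) is just an illustration, not something taken from the paper's model.

```python
from collections import Counter

def enabled(marking, inputs):
    """A transition can fire only if every input species has enough tokens."""
    return all(marking.get(s, 0) >= n for s, n in inputs.items())

def fire(marking, inputs, outputs):
    """Return the new marking after consuming inputs and producing outputs."""
    if not enabled(marking, inputs):
        raise ValueError("transition not enabled")
    new = Counter(marking)
    new.subtract(inputs)   # remove the consumed molecules
    new.update(outputs)    # add the produced molecules
    return +new            # unary plus drops species whose count fell to zero

# Illustrative reaction only: one well-mixed bag, no membranes.
inputs = {"glucose": 1, "ATP": 1}
outputs = {"glucose-6-phosphate": 1, "ADP": 1}
bag = Counter({"glucose": 2, "ATP": 1})
print(fire(bag, inputs, outputs))
```

Nothing in this sketch knows where a molecule is: the marking is just a count per species. That is exactly why membranes are awkward here, unless one resorts to tricks like treating 'glucose inside' and 'glucose outside' as two different species.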
The π-calculus was designed to express, run, and reason about concurrent systems. These are abstract systems composed of processes, i.e., autonomous, independent processing units that run in parallel and eventually communicate, by exchanging messages through channels. A biochemical reaction between two metabolites, catalyzed by an enzyme, can be modeled in π-calculus as a communication. The two metabolites are represented by two processes, and, in our approach, the enzyme is modeled as the channel which permits the communication.
In addition to communications, the π-calculus also allows us to specify silent internal actions, used to model those activities of the cell, the details of which we are not interested in (e.g., the pure presence of a catalyst in a reaction, where it is not actively involved). The calculus has the means to express alternative behavior, when a metabolite can act in different possible manners: the way to follow is chosen according to a given probability distribution.
The main difference between the standard π-calculus and the enhanced version we used in this work is the notion of address. An address is a unique identifier of a process, totally transparent to the user, automatically assigned to all of its child subprocesses. This labeling technique helps in tracking the history of virtual metabolites and reasoning about computations, in a purely mechanical way. In particular, stochastic implementation or causality are kept implicit and are recovered as needed.
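The idea of 'reaction as communication, enzyme as channel' can be sketched in a few lines of Python. This is only a toy of my own devising, assuming a process is either a sender, a receiver, or an inert metabolite; the class names `Send`, `Receive`, `Idle` and the function `step` are hypothetical, and nothing here models the addresses or probabilities of the enhanced π-calculus.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Idle:
    """An inert metabolite that is not offering any communication."""
    name: str

@dataclass
class Send:
    """A process offering to send `message` on `channel`."""
    channel: str
    message: str
    then: List[object] = field(default_factory=list)  # what it becomes afterwards

@dataclass
class Receive:
    """A process waiting on `channel`; `then` builds what it becomes."""
    channel: str
    then: Callable[[str], List[object]]

def step(soup):
    """Perform one communication if a sender and receiver share a channel.

    Returns the new list of processes, or None if nothing can react.
    """
    for i, p in enumerate(soup):
        if not isinstance(p, Send):
            continue
        for j, q in enumerate(soup):
            if isinstance(q, Receive) and q.channel == p.channel:
                rest = [r for k, r in enumerate(soup) if k not in (i, j)]
                return rest + p.then + q.then(p.message)
    return None

# Illustrative only: ATP hands a phosphate to glucose over the enzyme channel.
atp = Send("hexokinase", "phosphate", then=[Idle("ADP")])
glucose = Receive("hexokinase", then=lambda m: [Idle("glucose-6-" + m)])
print(step([atp, glucose]))
```

Note that the enzyme never appears as a process. It is just the name the two metabolites must agree on before they can react, which is how I read the authors' description.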
I would also like to understand all the chemical reactions that occur in their minimal model of a cell. They started by simulating an already known model, the ‘minimal gene set’ or ‘MGS’:
The MGS-prokaryote has been exhaustively described in the enhanced π-calculus. We represented the 237 genes, their relative products, and the metabolic pathways expressed and regulated by the genes, as the corresponding processes and channels. In particular: the glycolytic pathway, the pentose phosphate pathway, the pathways involved in nucleotide, aminoacids, coenzyme, lipids, and glycerol metabolism. Moreover, MGS genes encode for a set of membrane carriers for metabolite uptake, including the PTS carrier. We placed this virtual cell in an optimal virtual environment, in which all nutrients and water were available, and where no problems were present in eliminating waste substances.
However, they found that a cell described by this minimal gene set would soon die! So, they added more features until they got a cell that could survive.
This seems like a fun way to learn a manageable amount of biochemistry. I’m less interested in knowing all the molecules on a first-name basis than in getting a rough overview of what goes on in the simplest organisms.
Many of the processes mentioned above seem adequately described by chemical reaction networks—or in other words, Petri nets. For example, here is the glycolytic pathway:
I didn’t know what the ‘PTS carrier’ is, but Graham Jones helped out, writing:
Wikipedia says the ‘phosphotransferase system or PTS, is a distinct method used by bacteria for sugar uptake where the source of energy is from phosphoenolpyruvate (PEP).’
Pyruvate is then converted to acetate which, being a catabolite, can diffuse out of the cell. A transmembrane reduced-NAD dehydrogenase complex catalyzes the oxidation of reduced-NAD; this reaction is coupled with the synthesis of ATP through the ATP synthase/ATPase transmembrane system. This set of reactions enables the cell to manage its energetic metabolism.
The cell “imports” fatty acids, glycerol, and some other metabolites, e.g., choline, and uses them for the synthesis of triglycerides and phospholipids; these are essential components of the plasma membrane. (The plasma membrane is the same thing as the cell membrane.)
I guess that a Petri net could suffice, with some tricks to represent the environment outside the cell. But I think it would need to be stochastic, not just differential equations, because some processes would involve a single copy, or a tiny number of copies, of a molecule.
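Here is roughly what I mean by 'stochastic', as a minimal Python sketch of a Gillespie-style simulation of a Petri net. I am assuming mass-action kinetics, where a transition's propensity is its rate constant times the number of ways of choosing its input tokens. The function name `gillespie` and the (rate, inputs, outputs) representation of a transition are my own choices, not anything from the paper.

```python
import random

def gillespie(marking, transitions, t_end, rng=None):
    """Simulate a stochastic Petri net up to time t_end.

    marking:     dict mapping species name -> number of molecules
    transitions: list of (rate_constant, inputs, outputs), where inputs and
                 outputs are dicts mapping species name -> multiplicity
    Returns a list of (time, marking) pairs, one per firing, plus the start.
    """
    rng = rng or random.Random()
    marking = dict(marking)
    t = 0.0
    history = [(t, dict(marking))]
    while True:
        # Propensity = rate constant times a falling power of each input count.
        props = []
        for rate, inputs, _ in transitions:
            a = rate
            for s, n in inputs.items():
                for k in range(n):
                    a *= max(marking.get(s, 0) - k, 0)
            props.append(a)
        total = sum(props)
        if total == 0:
            break                      # nothing can fire any more
        t += rng.expovariate(total)    # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which transition fires, weighted by propensity.
        _, inputs, outputs = rng.choices(transitions, weights=props)[0]
        for s, n in inputs.items():
            marking[s] -= n
        for s, n in outputs.items():
            marking[s] = marking.get(s, 0) + n
        history.append((t, dict(marking)))
    return history

# Illustrative only: three copies of A, each decaying to B at rate 1.
for time, state in gillespie({"A": 3}, [(1.0, {"A": 1}, {"B": 1})], t_end=100.0):
    print(round(time, 3), state)
```

With only three molecules, each run gives a different staircase of jumps, which a differential equation for the average concentration would smooth away. To mimic an environment outside the cell, one could add transitions with no inputs (nutrients arriving) or no outputs (waste leaving).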
For more references and links to the pi-calculus, other process calculi and their applications to biology, see:
• Azimuth Library, Process calculus.