Category Theory for Better Spreadsheets

5 February, 2014

Since one goal of Azimuth is to connect mathematicians to projects that can more immediately help the world, I want to pass this on. It’s a press release put out by Jocelyn Paine, who has blogged about applied category theory on the n-Category Café. I think he’s a serious guy, so I hope we can help him out!

Spreadsheet researcher Jocelyn Ireson-Paine has launched an Indiegogo campaign to fund a project to make spreadsheets safer. It will show how to write spreadsheets that are easier to read and less error-prone than when written in Excel. This is important because spreadsheet errors have cost some companies millions of pounds, even causing resignations and share-price crashes. An error in one spreadsheet, an economic model written in 2010 by Harvard economists Carmen Reinhart and Kenneth Rogoff, has even been blamed for tax rises and public-sector cuts. If he gets funding, Jocelyn will re-engineer this spreadsheet. He hopes that, because of its notoriety, this will catch public attention.

Reinhart and Rogoff’s spreadsheet was part of a paper on the association between debt and economic growth. They concluded that in countries where debt exceeds 90% of gross domestic product, growth is notably lower. But in spring 2013, University of Massachusetts student Thomas Herndon found they had omitted data when calculating an average. Because their paper’s conclusion supported governments’ austerity programmes, much criticism followed. They even received hate email blaming them for tax rises and public-sector cuts.

Jocelyn said, “The error probably didn’t change the results much. But better software would have made the nature of the error clearer, as well as the economics calculations, thus averting ill-informed and hurtful media criticism. Indeed, it might have avoided the error altogether.”

Jocelyn’s project will use two ideas. One is “literate programming”. Normally, a programmer writes a program first, then adds comments explaining how it works. But in literate programming, the programmer becomes an essayist. He or she first writes the explanation, then inserts the calculations as if putting equations into a maths essay. In ordinary spreadsheets, you’re lucky to get any documentation at all; in literate spreadsheets, documentation comes first.

The other idea is “modularity”. This means building spreadsheets from self-contained parts which can be developed, tested, and documented independently of one another. This gives the spreadsheet’s author less to think about, making mistakes less likely. It also makes it easier to replace parts that do have mistakes.

Jocelyn has embodied these ideas in a piece of software named Excelsior. He said, “‘Excelsior’ means ‘higher’ in Latin, and ‘upwards!’ in Longfellow’s poem. I think of it as meaning ‘upwards from Excel’. In fact, though, it’s the name of a wonderful Oxford café where I used to work on my ideas.”

Jocelyn also wants to show how advanced maths benefits computing. Some of his inspiration came from a paper he found on a friend’s desk in the Oxford University Department of Computer Science. Written by Professor Joseph Goguen, it used a branch of maths called category theory to elucidate what it means for something to be part of a system, and how the behaviour of a system arises from the behaviours of its parts. Jocelyn said, “The ideas in the paper were extremely general, applying to many different areas. And when you think of modules as parts, they even apply to spreadsheets. This shows the value of abstraction.”

For more

For pictures or more information, please contact Jocelyn Ireson-Paine:

Postal: 23 Stratfield Road, Oxford, OX2 7BG, UK.
Tel: 07768 534 091.

Campaign Web site:

Campaign blog:

Jocelyn’s bio:

Jocelyn’s personal website, for academic and general stuff:

Background information: links to all topics mentioned can be found at the end of Paine’s campaign text.

These include literate programming, modularity, the Reinhart-Rogoff spreadsheet, category theory, and many horror stories about the damage caused by spreadsheet errors.

The Selected Papers Network (Part 4)

29 July, 2013

guest post by Christopher Lee

In my last post, I outlined four aspects of walled gardens that make them very resistant to escape:

• walled gardens make individual choice irrelevant, by transferring control to the owner, and tying your one remaining option (to leave the container) to being locked out of your professional ecosystem;

• all the competition consists of walled gardens;

• walled garden competition is winner-take-all;

• even if the “good guys” win (build the biggest walled garden), they become “bad guys” (masters of the walled garden, whose interests become diametrically opposed to that of the people stuck in their walled garden).

To state the obvious: even if someone launched a new site with the perfect interface and features for an alternative system of peer review, it would probably starve to death both for lack of users and lack of impact. Even for the rare user who found the site and switched all his activity to it, he would have little or no impact because almost no one would see his reviews or papers. Indeed, even if the Open Science community launched dozens of sites exploring various useful new approaches for scientific communication, that might make Open Science’s prospects worse rather than better. Since each of these sites would in effect be a little walled garden (for reasons I outlined last time), their number and diversity would mainly serve to fragment the community (i.e. the membership and activity on each such site might be ten times less than it would have been if there were only a few such sites). When your strengths (diversity; lots of new ideas) act as weaknesses, you need a new strategy.

Selectedpapers.net is an attempt to offer such a new strategy. It represents only about two weeks of development work by one person (me), and has only been up for about a month, so it can hardly be considered the last word in the manifold possibilities of this new strategy. However, this bare-bones prototype demonstrates how we can solve the four ‘walled garden dilemmas’:

Enable walled-garden users to ‘levitate’—be ‘in’ the walled garden but ‘above’ it at the same time. There’s nothing mystical about this. Think about it: that’s what search engines do all the time—a search engine pulls material out of all the world’s walled gardens, and gives it a new life by unifying it based on what it’s about. All selectedpapers.net does is act as a search engine that indexes content by what paper and what topics it’s about, and who wrote it.

This enables isolated posts by different people to come together in a unified conversation about a specific paper (or topic), independent of what walled gardens they came from—while simultaneously carrying on their full, normal life in their original walled garden.

Concretely, rather than telling Google+ users (for example) that they should stop posting on Google+ and post only on selectedpapers.net instead (which would make their initial audience plunge to near zero), we tell them to add a few tags to their Google+ post so selectedpapers.net can easily index it. They retain their full Google+ audience, but they acquire a whole new set of potential interactions and audience (trivial example: if they post on a given paper, selectedpapers.net will display their post next to other people’s posts on the same paper, resulting in all sorts of possible crosstalk).

Some people have expressed concern that selectedpapers.net indexes Google+, rightly pointing out that Google+ is yet another walled garden. Doesn’t that undercut our strategy to escape from walled gardens? No. Our strategy is not to try to find a container that is not a walled garden; our strategy is to ‘levitate’ content from walled gardens. Google+ may be a walled garden in some respects, but it allows us to index users’ content, which is all we need.

It should be equally obvious that selectedpapers.net should not limit itself to Google+. Indeed, why should a search engine restrict itself to anything less than the whole world? Of course, there’s a spectrum of different levels of technical challenges for doing this. And this tends to produce an 80-20 rule, where 80% of the value can be attained by only 20% of the work. Social networks like Google+, Twitter etc. provide a large portion of the value (potential users) for very little effort—they provide open APIs that let us search their indexes very easily. Blogs represent another valuable category for indexing.

More to the point, far more important than technology is building a culture where users expect their content to ‘fly’ unrestricted by walled-garden boundaries, and adopt shared practices that make that happen easily and naturally. Tagging is a simple example of that. By putting the key metadata (paper ID, topic ID) into the user’s public content, in a simple, standard way (as opposed to hidden in the walled garden’s proprietary database), tagging makes it easy for anyone and everyone to index it. And the more users get accustomed to the freedom and benefits this provides, the less willing they’ll be to accept walled gardens’ trying to take ownership (i.e. control) of the users’ own content.
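To make this concrete, here is a sketch in Python of what such an indexer has to do: scan a public post for the opt-in tag and pull out the metadata. The #spnetwork tag is the project’s real convention; the particular paper-ID formats matched here (arXiv:…, doi:…) are my own illustrative assumptions, not the official tag grammar.

```python
import re

# Hypothetical tag extractor for public posts. The #spnetwork tag is the
# opt-in signal; the paper-ID patterns below are illustrative assumptions.
TAG_RE = re.compile(r'#spnetwork\b')
PAPER_RE = re.compile(r'\b(arXiv:[\w.\-/]+|doi:\S+)')
TOPIC_RE = re.compile(r'#(\w+)')

def index_post(text):
    """Return (paper_ids, topics) if the post opts in with #spnetwork, else None."""
    if not TAG_RE.search(text):
        return None  # the author did not opt in to indexing
    papers = PAPER_RE.findall(text)
    topics = [t for t in TOPIC_RE.findall(text) if t != 'spnetwork']
    return papers, topics

post = "Nice proof in this paper! #spnetwork arXiv:1234.5678 #categoryTheory"
print(index_post(post))  # (['arXiv:1234.5678'], ['categoryTheory'])
```

The point of keeping the metadata in the public text, rather than in a private database, is exactly that any third party could write these ten lines and join the federation.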

Don’t compete; cooperate: if we admit that it will be extremely difficult for a small new site (like selectedpapers.net) to compete with the big walled gardens that surround it, you might rightly ask: what options are left? Obviously, not to compete. But concretely, what would that mean?

☆ enable users in a walled garden to liberate their own content by tagging and indexing it;

☆ add value for those users (e.g. for mathematicians, give them LaTeX equation support);

☆ use the walled garden’s public channel as your network transport—i.e. build your community within and through the walled garden’s community.

This strategy treats the walled garden not as a competitor (to kill or be killed by) but instead as a partner (that provides value to you, and that you in turn add value to). Moreover, since this cooperation is designed to be open and universal rather than an exclusive partnership (concretely, anyone could index posts, because they are public), we can best describe this as public data federation.

Any number of sites could cooperate in this way, simply by:

☆ sharing a common culture of standard tagging conventions;

☆ treating public data (i.e. viewable by anybody on the web) as public (i.e. indexable by anybody);

☆ drawing on the shared index of global content (i.e. when the index has content that’s relevant to your site’s users, let them see and interact with it).

To anyone used to the traditional challenges of software interoperability, this might seem like a tall order—it might take years of software development to build such a data federation. But consider: by using Google+’s open API, selectedpapers.net has de facto established such a data federation with Google+, one of the biggest players in the business. Following the checklist:

☆ selectedpapers.net offers a very simple tagging standard, and more and more Google+ users are trying it;

☆ Google+ provides the API that enables public posts to be searched and indexed. Selectedpapers.net in turn ensures that posts made on selectedpapers.net are visible to Google+, simply by posting them on Google+;

☆ selectedpapers.net users can see posts from (and have discussions with) Google+ users who have never logged into (or even heard of) selectedpapers.net, and vice versa.

Now consider: what if someone set up their own site based on the open-source selectedpapers.net code (or even wrote their own implementation of our protocol from scratch)? What would they need to do to ensure 100% interoperability (i.e. our three federation requirements above) with selectedpapers.net? Nothing. That federation interoperability is built into the protocol design itself. And since this is federation, that also means they’d have 100% interoperation with Google+ as well. We can easily do the same with Twitter, WordPress, and other public networks.

There are lots of relevant websites in this space. Which of them can we actually federate with in this way? This divides into two classes: those that have open APIs vs. those that don’t. If a walled garden has an API, you can typically federate with it simply by writing some code to use their API, and encouraging its users to start tagging. Everybody wins: the users gain new capabilities for free, and you’ve added value to that walled garden’s platform. For sites that lack such an API (typically smaller sites), you need more active cooperation to establish a data exchange protocol. For example, we are just starting discussions with arXiv and MathOverflow about such ‘federation’ data exchange.

To my mind, the most crucial aspect of this is sincerity: we truly wish to cooperate with (add value to) all these walled garden sites, not to compete with them (harm them). This isn’t some insidious commie plot to infiltrate and somehow destroy them. The bottom line is that websites will only join a federation if it benefits them, by making their site more useful and more attractive to users. Re-connecting with the rest of the world (in other walled gardens) accomplishes that in a very fundamental way. The only scenario I see where this would not seem advantageous, would be for a site that truly believes that it is going to achieve market dominance across this whole space (‘one walled garden to rule them all’). Looking over the landscape of players (big players like Google, Twitter, LinkedIn, Facebook, vs. little players focused on this space like Mendeley, ResearchGate, etc.), I don’t think any of the latter can claim this is a realistic plan—especially when you consider that any success in that direction will just make all other players federate together in self-defense.

Level the playing field: these considerations lead naturally to our third concern about walled gardens: walled-garden competition strongly penalizes new, small players, and makes bigger players assume a winner-takes-all outcome. Concretely, selectedpapers.net (or any other new site) is puny compared with, say, Mendeley. However, the federation strategy allows us to turn that on its head: Mendeley is puny compared with Google+, and selectedpapers.net operates in de facto federation with Google+. How likely is it that Mendeley is going to crush Google+ as a social network where people discuss science? If a selectedpapers.net user could only post to other selectedpapers.net members (a small audience), then Mendeley would win by default. But that’s not how it works: a selectedpapers.net user has all of Google+ as his potential audience. In a federation strategy, the question isn’t how big you are, but rather how big your federation is. And in this day of open APIs, it is really easy to extend that de facto federation across a big fraction of the world’s social networks. And that is a level playing field.

Provide no point of control: our last concern about walled gardens was that they inevitably create a divergence of interests for the winning garden’s owner vs. the users trapped inside. Hence the best of intentions (great ideas for building a wonderful community) can truly become the road to hell—an even better walled garden. After all, that’s how the current walled garden system evolved (from the reasonable and beneficial idea of establishing journals). If any one site ‘wins’, our troubles will just start all over again. Is there any alternative?

Yes: don’t let any one site win; only build a successful federation. Since user data can flow freely throughout the federation, users can move freely within the federation, without losing their content, accumulated contacts and reputation, in short, their professional ecosystem. If a successful site starts making policies that are detrimental to users, they can easily vote with their feet. The data federation re-establishes the basis for a free market, namely unconstrained individual freedom of choice.

The key is that there is no central point of control. No one ‘owns’ (i.e. controls) the data. It will be stored in many places. No one can decide to start denying it to someone else. Anyone can access the public data under the rules of the federation. Even if multiple major players conspired together, anyone else could set up an alternative site and appeal to users: vote with your feet! As we know from history, the problem with senates and other central control mechanisms is that given enough time and resources, they can be corrupted and captured by both elites and dictators. Only a federation system with no central point of control has a basic defense: regardless of what happens at ‘the top’, all individuals in the system have freedom of choice between many alternatives, and anybody can start a new alternative at any time. Indeed, the key red flag in any such system is when the powers-that-be start pushing all sorts of new rules that hinder people from starting new alternatives, or freely migrating to alternatives.

Note that implicit in this is an assertion that a healthy ecosystem should contain many diverse alternative sites that serve different subcommunities, united in a public data federation. I am not advocating that selectedpapers.net should become the ‘one paper index to rule them all’. Instead, I’m saying we need one successful exemplar of a federated system that can help people see how to move their content beyond the walled garden and start ‘voting with their feet’.

So: how do we get there? In my view, we need to use selectedpapers.net to prove the viability of the federation model in two ways:

☆ we need to develop the selectedpapers.net interface into a genuinely good way to discuss scientific papers, and to subscribe to others’ recommendations. It goes without saying that the current interface needs lots of improvements, e.g. to work past some of Google+’s shortcomings. Given that the current interface took only a couple of weeks of hacking by just one developer (yours truly), this is eminently doable.

☆ we need to show that selectedpapers.net is not just a prisoner of Google+, but actually an open federation system, by adding other systems to the federation, such as Twitter and independent blogs. Again, this is straightforward.

To Be or Not To Be?

All of which brings us to the real question that will determine our fates. Are you for a public data federation, or not? In my view, if you seriously want reform of the current walled-garden system, federation is the only path forward that is actually a path forward (instead of a path to just another walled garden). It is the only strategy that allows the community to retain control over its own content. That is fundamental.

And if you do want a public data federation, are you willing to work for that outcome? If not, then I think you don’t really want it—because you can contribute very easily. Even just adding #spnetwork tags to your posts—wherever you write them—is a very valuable contribution that enormously increases the value of the federation ecosystem.

One more key question: who will join me in developing the selectedpapers.net platform (both the software and the federation alliances)? As long as selectedpapers.net is a one-man effort, it must fail. We don’t need a big team, but it’s time to turn the project into a real team. The project has solid foundations that will enable rapid development of new federation partnerships (e.g. via open, REST-style APIs) and of seamless, intuitive user interfaces (using tools such as the MongoDB NoSQL database and AJAX methods). A small, collaborative team will be able to push this system forward quickly in exciting, useful ways. If you jump in now, you can be one of the very first people on the team.

I want to make one more appeal. Whatever you think about selectedpapers.net as it exists today, forget about it.

Why? Because it’s irrelevant to the decision we need to make today: public data federation, yes or no? First, because the many flaws of the current selectedpapers.net have almost no bearing on that critical question (they mainly reflect the limitations of a version 0.1 alpha product). Second, because the whole point of federation is to ‘let a thousand flowers bloom’—to enable a diverse ecology of different tools and interfaces, made viable because they work together as a federation, rather than starving to death as separate, warring walled gardens.

Of course, to get to that diverse, federated ecosystem, we first have to prove that one federated system can succeed—and liberate a bunch of minds in the process, starting with our own. We have to assemble a nucleus of users who are committed to making this idea succeed by using it, and a team of developers who are driven to build it. Remember, talking about the federation ideal will not by itself accomplish anything. We have to act, now; specifically, we have to quickly build a system that lets more and more people see the direct benefits of public data federation. If and when that is clearly successful, and growing sustainably, we can consider branching out, but not before.

For better or worse, in a world of walled gardens, selectedpapers.net is the one effort (in my limited knowledge) to do exactly that. It may be ugly, and annoying, and alpha, but it offers people a new and different kind of social contract than the walled gardens do. (If someone can point me to an equivalent effort to implement the same public data federation strategy, we will of course be delighted to work with them! That’s what federation means.)

The question now for the development of public data federation is whether we are working together to make it happen, or on the contrary whether we are fragmenting and diffusing our effort. I believe that public data federation is the Manhattan Project of the war for Open Science. It really could change the world in a fundamental and enduring way. Right now the world may seem headed in the opposite direction (higher and higher walls), but it does not have to be that way. I believe that all of the required ingredients are demonstrably available and ready to go. The only remaining requirement is that we rise as a community and do it.

I am speaking to you, as one person to another. You as an individual do not even have the figleaf of saying “Well, if I do this, what’s the point? One person can’t have any impact.” You as an individual can change this project. You as an individual can change the world around you through what you do on this project.

Petri Net Programming (Part 3)

19 April, 2013

guest post by David Tanzer

The role of differential equations

Last time we looked at stochastic Petri nets, which use a random event model for the reactions. Individual entities are represented by tokens that flow through the network. When the token counts get large, we observed that they can be approximated by continuous quantities, which opens the door to the application of continuous mathematics to the analysis of network dynamics.

A key result of this approach is the “rate equation,” which gives a law of motion for the expected sizes of the populations. Equilibrium can then be obtained by solving for zero motion. The rate equations are applied in chemistry, where they give the rates of change of the concentrations of the various species in a reaction.

But before discussing the rate equation, here I will talk about the mathematical form of this law of motion, which consists of differential equations. This form is naturally associated with deterministic systems involving continuous magnitudes: think of the equation of motion that generates the sine wave, or the one behind the graceful ellipses that are traced out by the orbits of the planets around the sun.

This post provides some mathematical context to programmers who have not worked on scientific applications. My goal is to get as many of you on board as possible, before setting sail with Petri net programming.

Three approaches to equations: theoretical, formula-based, and computational

Let’s first consider the major approaches to equations in general. We’ll illustrate with a Diophantine equation

x^9 + y^9 + z^9 = 2

where x, y and z are integer variables.

In the theoretical approach (aka “qualitative analysis”), we start with the meaning of the equation and then proceed to reason about its solutions. Here are some simple consequences of this equation: x, y and z can’t all be zero, can’t all be positive, can’t all be negative, can’t all be even, and can’t all be odd.

In the formula-based approach, we seek formulas to describe the solutions. Here is an example of a formula (which does not solve our equation):

\{(x,y,z) \mid x = n^3,\ y = 2n - 4,\ z = 4n,\ 1 \leq n \leq 5 \}

Such formulas are nice to have, but the pursuit of them is diabolically difficult. In fact, for Diophantine equations, even the question of whether an arbitrarily chosen equation has any solutions whatsoever has been proven to be algorithmically undecidable.

Finally, in the computational approach, we seek algorithms to enumerate or numerically approximate the solutions to the equations.
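For the Diophantine example above, the simplest computational approach is brute-force enumeration over a small box of integers. Note the inherent limitation: a finite search can exhibit solutions, but it can never prove that none exist outside the box. A minimal sketch:

```python
from itertools import product

# Enumerate integer solutions of x^9 + y^9 + z^9 = 2 inside a small box.
# This can find solutions, but cannot rule out solutions beyond the box.
def ninth_power_solutions(bound):
    return [(x, y, z)
            for x, y, z in product(range(-bound, bound + 1), repeat=3)
            if x**9 + y**9 + z**9 == 2]

print(ninth_power_solutions(2))  # the three permutations of (1, 1, 0)
```

Even this tiny search is informative: it immediately turns up the solutions that permute (1, 1, 0), which is consistent with the qualitative observations above (the variables are not all zero, not all positive, not all of the same parity).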

The three approaches to differential equations

Let’s apply the preceding classification to differential equations.

Theoretical approach

A differential equation is one that constrains the rates at which the variables are changing. This can include constraints on the rates at which the rates are changing (second-order equations), etc. The equation is ordinary if there is a single independent variable, such as time, otherwise it is partial.

Consider the equation stating that a variable increases at a rate equal to its current value. The bigger it gets, the faster it increases. Given a starting value, this determines a process — the solution to the equation — which here is exponential growth.

Let X(t) be the value at time t, and let’s initialize it to 1 at time 0. So we have:

X(0) = 1

X'(t) = X(t)

These are first-order equations, because the derivative is applied at most once to any variable. They are linear equations, because the terms on each side of the equations are linear combinations of either individual variables or derivatives (in this case all of the coefficients are 1). Note also that a system of differential equations may in general have zero, one, or multiple solutions. This example belongs to a class of equations which are proven to have a unique solution for each initial condition.

You could imagine more complex systems of equations, involving multiple dependent variables, all still depending on time. That includes the rate equations for a Petri net, which have one dependent variable for each of the population sizes. The ideas for such systems are an extension of the ideas for a single-variable system. Then, a state of the system is a vector of values, with one component for each of the dependent variables. For first-order systems, such as the rate equations, where the derivatives appear on the left-hand sides, the equations determine, for each possible state of the system, a “direction” and rate of change for the state of the system.

Now here is a simple illustration of what I called the theoretical approach. Can X(t) ever become negative? No, because it starts out positive at time 0, and in order to later become negative, it must be decreasing at a time t_1 when it is still positive. That is to say, X(t_1) > 0, and X'(t_1) < 0. But that contradicts the assumption X'(t) = X(t). The general lesson here is that we don’t need a solution formula in order to make such inferences.

For the rate equations, the theoretical approach leads to substantial theorems about the existence and structure of equilibrium solutions.

Formula-based approach

It is natural to look for concise formulas to solve our equations, but the results of this overall quest are largely negative. The exponential differential equation cannot be solved by any formula that involves a finite combination of simple operations. So the solution function must be treated as a new primitive, and given a name, say \exp(t). But even when we extend our language to include this new symbol, there are many differential equations that remain beyond the reach of finite formulas. So an endless collection of primitive functions is called for. (As standard practice, we always include \exp(t), and the trigonometric functions that arise from its complex extension, as primitives in our toolbox.)

But the hard mathematical reality does not end here, because even when solution formulas do exist, finding them may call for an ace detective. Only for certain classes of differential equations, such as the linear ones, do we have systematic solution methods.

The picture changes, however, if we let the formulas contain an infinite number of operations. Then the arithmetic operators give a far-reaching base for defining new functions. In fact, as you can verify, the power series

X(t) = 1 + t + t^2/2! + t^3/3! + ...

which we view as an “infinite polynomial” over the time parameter t, exactly satisfies our equations for exponential motion, X(0) = 1 and X'(t) = X(t). This power series therefore defines \exp(t). By the way, applying it to the input 1 produces a definition for the transcendental number e:

e = X(1) = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + ... \approx 2.71828

Computational approach

Let’s leave aside our troubles with formulas, and consider the computational approach. For broad classes of differential equations, there are approximation algorithms that can be successfully applied.

For starters, any power series that satisfies a differential equation yields a simple approximation method: if the series is known to converge over some range of inputs, one can approximate its value at those points by stopping the computation after a finite number of terms.
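For the exponential series given earlier, this truncation takes only a few lines. A minimal sketch (note that term k+1 is obtained from term k by multiplying by t/(k+1), so no factorials need to be computed explicitly):

```python
import math

# Approximate exp(t) by truncating its power series after n_terms terms:
# X(t) = 1 + t + t^2/2! + t^3/3! + ...
def exp_series(t, n_terms):
    total, term = 0.0, 1.0  # term starts as t^0/0! = 1
    for k in range(n_terms):
        total += term
        term *= t / (k + 1)  # turn t^k/k! into t^(k+1)/(k+1)!
    return total

print(exp_series(1.0, 20))  # agrees with math.exp(1.0) = e to many digits
```

Because the factorials in the denominators grow so fast, twenty terms already pin down e far beyond the five decimal places quoted below.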

But the standard methods work directly with the equations, provided that they can be put into the right form. The simplest one is called Euler’s method. It works over a sampling grid of points separated by some small number \epsilon. Let’s take the case where we have a first-order equation in explicit form, which means that X'(t) = f(X(t)) for some function f.

We begin with the initial value X(0). Applying f, we get X'(0) = f(X(0)). Then for the interval from 0 to \epsilon, we use a linear approximation for X(t), by assuming that the derivative remains constant at X'(0). That gives X(\epsilon) = X(0) + \epsilon \cdot X'(0). Next, X'(\epsilon) = f(X(\epsilon)), and X(2 \epsilon) = X(\epsilon) + \epsilon \cdot X'(\epsilon), etc. Formally,

X(0) = \textrm{initial}

X((n+1) \epsilon) = X(n \epsilon) + \epsilon f(X(n \epsilon))

Applying this to our exponential equation, where f(X(t)) = X(t), we get:

X(0) = 1

X((n+1) \epsilon) = X(n \epsilon) + \epsilon X(n \epsilon) = X(n \epsilon) (1 + \epsilon)

and therefore

X(n \epsilon) = (1 + \epsilon)^n

So the approximation method gives a discrete exponential growth, which converges to a continuous exponential in the limit as the mesh size goes to zero.
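The worked example above can be checked in a few lines of code. This is a minimal sketch of Euler's method for an explicit first-order equation; with f(X) = X, step size 0.001, and 1000 steps, it reproduces the closed form (1 + \epsilon)^n and lands near e:

```python
import math

# Euler's method for an explicit first-order equation X'(t) = f(X(t)).
def euler(f, x0, eps, n_steps):
    x = x0
    for _ in range(n_steps):
        x += eps * f(x)  # linear approximation over one mesh interval
    return x

# f(X) = X with X(0) = 1: each step multiplies by (1 + eps), so after
# n steps the value is (1 + eps)^n, approaching e^t as eps shrinks.
approx = euler(lambda x: x, 1.0, 0.001, 1000)  # integrate up to t = 1
print(approx)           # about 2.7169
print(math.e - approx)  # discretization error, about 0.0014
```

Halving the step size (and doubling the step count) roughly halves the error, as expected for a first-order method.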

Note, the case we just considered has more generality than might appear at first, because (1) the ideas here are easily extended to systems of explicit first order equations, and (2) higher-order equations that are “explicit” in an extended sense—meaning that the highest-order derivative is expressed as a function of time, of the variable, and of the lower-order derivatives—can be converted into an equivalent system of explicit first-order equations.
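For example, the second-order equation X''(t) = -X(t), whose solutions are the sine and cosine waves mentioned earlier, becomes the explicit first-order system X'(t) = V(t), V'(t) = -X(t). Here is a sketch of Euler's method applied componentwise to that system; with the initial state X(0) = 1, V(0) = 0, the exact solution is X(t) = cos(t), V(t) = -sin(t):

```python
import math

# Reduce X''(t) = -X(t) to the first-order system X' = V, V' = -X,
# then apply Euler's method to the state vector (X, V).
def euler_oscillator(eps, n_steps):
    x, v = 1.0, 0.0  # initial conditions X(0) = 1, V(0) = 0
    for _ in range(n_steps):
        x, v = x + eps * v, v - eps * x  # one Euler step on both components
    return x, v

x, v = euler_oscillator(1e-4, 10000)  # integrate up to t = 1
print(x, math.cos(1.0))   # Euler value vs the exact cos(1)
print(v, -math.sin(1.0))  # Euler value vs the exact -sin(1)
```

The state here is a vector, exactly as described above for multi-variable systems: the equations assign each state (X, V) a direction and rate of change, and Euler's method just follows that direction in small steps.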

The challenging world of differential equations

So, is our cup half-empty or half-full? We have no toolbox of primitive formulas for building the solutions to all differential equations by finite compositions. And even for those which can be solved by formulas, there is no general method for finding the solutions. That is how the cookie crumbles. But on the positive side, there is an array of theoretical tools for analyzing and solving important classes of differential equations, and numerical methods can be applied in many cases.

The study of differential equations leads to some challenging problems, such as the Navier-Stokes equations, which describe the flow of fluids.

These are partial differential equations involving flow velocity, pressure, density and external forces (such as gravity), all of which vary over space and time. There are non-linear (multiplicative) interactions between these variables and their spatial and temporal derivatives, which leads to complexity in the solutions.

At high flow rates, this complexity can produce chaotic solutions, which involve complex behavior at a wide range of resolution scales. This is turbulence. Here is an insightful portrait of turbulence, by Leonardo da Vinci, whose studies in turbulence date back to the 15th Century.

Turbulence, which has been described by Richard Feynman as the most important unsolved problem of classical physics, also presents a mathematical puzzle. The general existence of solutions to the Navier-Stokes equations remains unsettled. This is one of the “Millennium Prize Problems”, for which a one million dollar prize is offered: in three dimensions, given initial values for the velocity and pressure fields, does there exist a solution that is smooth and everywhere defined? There are also complications with grid-based numerical methods, which will fail to produce globally accurate results if the solutions contain details at a smaller scale than the grid mesh. So the ubiquitous phenomenon of turbulence, which is so basic to the movements of the atmosphere and the seas, remains an open case.

But fortunately we have enough traction with differential equations to proceed directly with the rate equations for Petri nets. There we will find illuminating equations, which are the subject of both major theorems and open problems. They are non-linear and intractable by formula-based methods, yet, as we will see, they are well handled by numerical methods.

Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of the network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G and one enzyme E, and outputs the molecular form G • E, in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 10^23 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.
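One back-of-envelope way to see the scale dependence (my own illustration, using the standard fact that a Poisson-distributed count with mean N has standard deviation \sqrt{N}): the relative size of the fluctuations shrinks like 1/\sqrt{N}.

```python
import math

# Relative fluctuation (std/mean) of a Poisson-distributed count
# with mean N scales as 1/sqrt(N): large for a handful of
# molecules, utterly negligible at mole-sized populations.
for n in (10, 10_000, 6.022e23):
    print(n, 1 / math.sqrt(n))
```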

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO3^2-. As the double arrows indicate, each of these reactions is accompanied by its reverse reaction. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms are sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a lower-pH, more acidic environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster than they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rates of the reactions will depend on the number of tokens present in the system. For example, if any of the input species holds zero tokens, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (A multiset is a collection in which duplicates are allowed.) Recall that the state of the net is defined by a marking function, which maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, which gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. The specific distribution of the firing intervals for a transition will be a function of its expected firing rate in the current marking.

This definition is based on the standard treatments found, for example, in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to the study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—are part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.
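Here is how the rule might look in code. This is a sketch of my own, not part of the simulator from the previous article; the firing_rate function and its dictionary-based interface are illustrative assumptions:

```python
def firing_rate(rate_constant, inputs, conc):
    """Mass action kinetics: the rate constant times the product of
    the concentrations of the inputs, counted with multiplicity."""
    rate = rate_constant
    for species in inputs:       # repeated inputs contribute one factor each
        rate *= conc[species]
    return rate

conc = {"A": 2.0, "C": 3.0}
print(firing_rate(1.0, ["A", "C"], conc))  # A + C -> D: 2.0 * 3.0 = 6.0
print(firing_rate(1.0, ["A", "A"], conc))  # A + A -> D: 2.0 ** 2 = 4.0
```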

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species and their concentrations, which are just the population counts divided by the total volume V of the system. The mass action law applies in both cases; the only difference is that the constant factors of (1/V) used for concentrations get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in Gillespie’s paper, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law” which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]


\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(D) be the population counts for A and D, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / (4 u) }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }
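These formulas are easy to check numerically. In this sketch (the function name equilibrium is my own), we solve for the concentrations and confirm the balance condition u [A]^2 = v [D]:

```python
import math

def equilibrium(u, v, N):
    """Equilibrium concentrations [A], [D] for A + A <-> D,
    given rate constants u, v and total A-unit concentration
    [N] = [A] + 2 [D]."""
    A = (-v + math.sqrt(v**2 + 8 * u * v * N)) / (4 * u)  # positive root
    D = (N - A) / 2
    return A, D

A, D = equilibrium(1.0, 1.0, 1.0)
print(A, D)  # 0.5 0.25

# The mass action balance u [A]^2 = v [D] holds at equilibrium:
print(abs(1.0 * A**2 - 1.0 * D) < 1e-12)  # True
```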


We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

Petri Net Programming (Part 1)

1 October, 2012

guest post by David Tanzer

Petri nets are a simple model of computation with a range of modelling applications that include chemical reaction networks, manufacturing processes, and population biology dynamics. They are graphs through which entities flow and have their types transformed. They also have far-reaching mathematical properties which are the subject of an extensive literature. See the network theory series here on Azimuth for a panoramic and well-diagrammed introduction to the theory and applications of Petri nets.

In this article, I will introduce Petri nets and give an expository presentation of a program to simulate them. I am hoping that this will be of interest to anyone who wants to see how scientific concepts can be formulated and actualized in computer software. Here the simulator program will be applied to a “toy model” of a chemical reaction network for the synthesis and dissociation of H2O molecules. The source code for this program should give the reader some “clay” to work with.

This material will prepare you for the further topic of stochastic Petri nets, where the sequencing of the Petri net is probabilistically controlled.

Definition of Petri nets

A Petri net is a diagram with two kinds of nodes: container nodes, called species (or “places”, “states”), which can hold zero or more tokens, and transition nodes, which are “wired” to some containers called its inputs and some called its outputs. A transition can have multiple input or output connections to a single container.

When a transition fires, it removes one token from each input container, and adds one token to each output container. When there are multiple inputs or outputs to the same container, then that many tokens get removed or added.

The total state of the Petri net is described by a labelling function, which maps each container to the number of tokens it holds. A transition is enabled to fire in a labelling if there are enough tokens at its input containers. If no transitions are enabled, then the net is halted.

The sequencing is non-deterministic, because in a given labelling there may be several transitions that are enabled. Dataflow arises whenever one transition sends tokens to a container that is read by another transition.

Petri nets represent entity conversion networks, which consist of entities of different species, along with reactions that convert input entities to output entities. Each entity is symbolized by a token in the net, and all the tokens for a species are grouped into an associated container. The conversion of entities is represented by the transitions that transform tokens.
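The firing rule can be captured in a few lines. This dictionary-based fragment is a simplified sketch of my own, not the simulator dissected later in this article:

```python
def is_enabled(labelling, inputs):
    # Enabled when every input container holds enough tokens.
    return all(labelling[s] >= n for s, n in inputs.items())

def fire(labelling, inputs, outputs):
    # Remove the input tokens, deposit the output tokens.
    for s, n in inputs.items():
        labelling[s] -= n
    for s, n in outputs.items():
        labelling[s] += n

labelling = {"H": 5, "O": 3, "H2O": 4}
combine_in, combine_out = {"H": 2, "O": 1}, {"H2O": 1}

if is_enabled(labelling, combine_in):
    fire(labelling, combine_in, combine_out)

print(labelling)  # {'H': 3, 'O': 2, 'H2O': 5}
```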

Example 1: Disease processes

Here’s an example discussed earlier on Azimuth. It describes the virus that causes AIDS. The species are healthy cell, infected cell, and virion (the technical term for an individual virus). The transitions are for infection, production of healthy cells, reproduction of virions within an infected cell, death of healthy cells, death of infected cells, and death of virions.

Here the species are yellow circles and the transitions are aqua squares. Note that there are three transitions called “death” and two called “production.” They are disambiguated by a Greek letter suffix.

production (\alpha) describes the birth of one healthy cell, so it has no input and one healthy as output.

death (\gamma) has one healthy as input and no output.

death (\delta) has one infected as input and no output.

death (\zeta) has one virion as input and no output.

infection (\beta) takes one healthy and one virion as input, and has one infected cell as output.

production (\epsilon) describes the reproduction of the virus in an infected cell, so it has one infected as input, and one infected and one virion as output.

Example 2: Population dynamics

Here the tokens represent organisms, and the species are biological species. This simple example involves two species, wolf and rabbit:

There are three transitions: birth, which inputs one rabbit and outputs rabbit + rabbit (like asexual reproduction), predation, which converts rabbit plus wolf to wolf plus wolf, and death, which inputs one wolf and outputs nothing.

Example 3: Chemical reaction networks

Here the entities are chemical units: molecules, isolated atoms, or ions. Each chemical unit is represented by a token in the net, and a container holds all of the tokens for a chemical species. Chemical reactions are then modeled as transitions that consume input tokens and generate output tokens.

We will be using the following simplified model for water formation and dissociation:

• The species are H, O, and H2O.

• Transition combine inputs two H atoms and one O atom, and outputs an H2O molecule.

• Transition split is the reverse process, which inputs one H2O and outputs two H’s and one O.

Note that the model is intended to show the idea of chemical reaction Petri nets, so it is not physically realistic. The actual reactions involve H2 and O2, and there are intermediate transitions as well. For more details, see Part 3 of the network theory series.

A Petri net simulator

The following Python script will simulate a Petri net, using parameters that describe the species, the transitions, and the initial labelling. It will run the net for a specified number of steps. At each step it chooses one of the enabled transitions at random, fires it, and prints the new labelling on the console.


Here is the first Petri net simulator.

Running the script

The model parameters are already coded into the script. So let’s give it a whirl:


This produced the output:

    H, O, H2O, Transition
    5, 3, 4, split
    7, 4, 3, split
    9, 5, 2, combine
    7, 4, 3, combine
    5, 3, 4, split
    7, 4, 3, split
    9, 5, 2, split
    11, 6, 1, combine
    9, 5, 2, split
    11, 6, 1, split
    13, 7, 0, ...

We started out in a state with 5 H’s, 3 O’s, and 4 H2O’s, then a split took place, which increased H by 2, increased O by 1, and decreased H2O by one, then…

Running it again gives a different execution sequence.

Software structures used in the program

Before performing a full dissection of the code, I’d like to make a digression to discuss some of the basic software constructs that are used in this program. This is directed in particular to those of you who are coming from the science side, and are interested in learning more about programming.

This program exercises a few of the basic mechanisms of object-oriented and functional programming. The Petri net logic is bundled into the definition of a single class called PetriNet, and each PetriNet object is an instance of the class. This logic is grouped into methods, which are functions whose first parameter is bound to an instance of the class.

For example, here is the method IsHalted of the class PetriNet:

    def IsHalted(this):
        return len(this.EnabledTransitions()) == 0

The first parameter, conventionally called “this” or “self,” refers to the PetriNet class instance. len returns the length of a list, and EnabledTransitions is a method that returns the list of transitions enabled in the current labelling.

Here is the syntax for using the method:

    if petriNetInstance.IsHalted(): ...

If it were the case that IsHalted took additional parameters, the definition would look like this:

    def IsHalted(this, arg1, ..., argn):

and the call would look like this:

    if petriNetInstance.IsHalted(arg1, ..., argn)

Here is the definition of EnabledTransitions, which shows a basic element of functional programming:

    def EnabledTransitions(this):
        return filter(lambda transition: transition.IsEnabled(this.labelling), this.transitions)

A PetriNet instance holds this list of Transition objects, called this.transitions. The expression “lambda transition: …” is an anonymous function that maps a Transition object to the boolean result returned by calling the IsEnabled method on that transition, given the current labelling this.labelling. The filter function takes a boolean-valued function and a list of objects, and returns the sublist consisting of those objects that satisfy the predicate.
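A side note for readers on Python 3: there, filter returns a lazy iterator rather than a list, so calling len on its result fails. A list comprehension expresses the same predicate-based selection and works in both versions; the predicate below is a stand-in for illustration:

```python
transitions = ["combine", "split", "split"]

def is_enabled(name):
    # Stand-in predicate for illustration; the real test compares
    # a transition's input requirements against the labelling.
    return name == "split"

# Equivalent of filter(lambda t: is_enabled(t), transitions),
# but eager, so len() works on Python 3 as well:
enabled = [t for t in transitions if is_enabled(t)]
print(len(enabled))  # 2
```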

The program also gives an example of class inheritance, because the class PetriNet inherits all fields and methods from a base class called PetriNetBase.

Python as a language for pedagogical and scientific programming

We continue with our digression, to talk now about the choice of language. With something as fundamental to our thinking as a language, it’s easy to see how the topic of language choice tends to become partisan. Imagine, for example, a debate between different nationalities, about which language was more beautiful, logical, etc. Here the issue is put into perspective by David Tweed from the Azimuth project:

Programming languages are a lot like “motor vehicles”: a family car has different trade-offs to a sports car to a small van to a big truck to a motorbike to …. Each of these has their own niche where they make the most sense to use.

The Petri net simulator which I wrote in Python, and which will soon be dissected, could equally well have been written in any modern object-oriented programming language, such as C++, Java or C#. I chose Python for the following reasons. First, it is a scripting language. This means that the casual user need not get involved with a compilation step that is preparatory to running the program. Second, it does provide the abstraction capabilities needed to express the Petri net model. Third, I like the way that the syntax rolls out. This syntax has been described as executable pseudo-code. Finally, the language has a medium-sized user base and support community.

Python is good for proof of concept programs, and it works well as a pedagogical programming language. Yet it is also practical. It has been widely adopted in the field of scientific programming. David Tweed explains it in these terms:

I think a big part of Python’s use comes from the way that a lot of scientific programming is as much about “management” — parsing files of preexisting structured data, iterating over data, controlling network connections, calling C code, launching subprograms, etc. — as about “numeric computation”. This is where Python is particularly good, and it’s now acquired the NumPy & SciPy extensions to speed up the numerical programming elements, but it’s primarily the higher-level elements that make it attractive in science.

Because variables in Python are neither declared in the program text, nor inferred by the byte-code compiler, the types of the variables are known only at run time. This has a negative impact on performance for data intensive calculations: rather than having the compiler generate the right code for the data type that is being processed, the types need to be checked at run time. The NumPy and SciPy libraries address this by providing bulk array operations, e.g., addition of two matrices, in a native code library that is integrated with the Python runtime environment. If this framework does not suffice for your high-performance numerical application, then you will have to turn to other languages, notably, C++.
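To illustrate the bulk-operation point with a generic sketch (not taken from the simulator): a single NumPy expression moves the element-wise loop, and its per-element type checks, into native code.

```python
import numpy as np

a = np.arange(4, dtype=float)   # array([0., 1., 2., 3.])
b = np.ones(4)

# One bulk operation: the loop over elements runs in native code.
fast = a + b

# The per-element style that bulk operations replace: an interpreted
# loop, with types resolved at run time on every iteration.
slow = [float(x) + float(y) for x, y in zip(a, b)]

print(fast.tolist())  # [1.0, 2.0, 3.0, 4.0]
print(slow)           # [1.0, 2.0, 3.0, 4.0]
```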

All this being said, it is now time to return to the main topic of this article, which is the content of Petri net programming.

Top-level structure of the Petri net simulator script

At a high level, the script constructs a Petri net, constructs the initial labelling, and then runs the simulation for a given number of steps.

First construct the Petri net:

    # combine: 2H + 1O -> 1H2O
    combineSpec = ("combine", [["H",2],["O",1]], [["H2O",1]])

    # split: 1H2O -> 2H + 1O 
    splitSpec = ("split", [["H2O",1]], [["H",2],["O",1]])

    petriNet = PetriNet(
        ["H","O","H2O"],          # species
        [combineSpec,splitSpec])  # transitions

Then establish the initial labelling:

    initialLabelling = {"H":5, "O":3, "H2O":4}

Then run it for twenty steps:

    steps = 20
    petriNet.RunSimulation(steps, initialLabelling)

Program code

Species will be represented simply by their names, i.e., as strings. PetriNet and Transition will be defined as object classes. Each PetriNet instance holds a list of species names, a list of transition names, a list of Transition objects, and the current labelling. The labelling is a dictionary from species name to token count. Each Transition object contains an input dictionary and an output dictionary. The input dictionary maps the name of a species to the number of times that the transition takes it as input, and similarly for the output dictionary.

Class PetriNet

The PetriNet class has a top-level method called RunSimulation, which makes repeated calls to FireOneRule. FireOneRule obtains the list of enabled transitions, chooses one at random, and fires it. The random choice is made by the method SelectRandom, which picks a uniformly random index into the list of enabled transitions.

    from random import randrange

    class PetriNet(PetriNetBase):
        def RunSimulation(this, iterations, initialLabelling): 
            this.PrintHeader()  # prints e.g. "H, O, H2O"
            this.labelling = initialLabelling
            this.PrintLabelling() # prints e.g. "3, 5, 2" 

            for i in range(iterations):
                if this.IsHalted():
                    print "halted"
                    return
                this.FireOneRule()
                this.PrintLabelling()
            print "iterations completed" 

        def EnabledTransitions(this):
            return filter(lambda transition: transition.IsEnabled(this.labelling), this.transitions)

        def IsHalted(this):
            return len(this.EnabledTransitions()) == 0

        def FireOneRule(this):
            this.SelectRandom(this.EnabledTransitions()).Fire(this.labelling)

        def SelectRandom(this, items):
            randomIndex = randrange(len(items))
            return items[randomIndex]

Class Transition

The Transition class exposes two key methods. IsEnabled takes a labelling as parameter, and returns a boolean saying whether the transition can fire. This is determined by comparing the input map for the transition with the token counts in the labelling, to see if there are sufficient tokens for it to fire. The Fire method takes a labelling in, and updates the counts in it to reflect the action of removing input tokens and creating output tokens.

    class Transition:
        # Fields:
        # transitionName
        # inputMap: speciesName -> inputCount
        # outputMap: speciesName -> outputCount 

        # constructor
        def __init__(this, transitionName):
           this.transitionName = transitionName
           this.inputMap = {} 
           this.outputMap = {} 

        def IsEnabled(this, labelling):
           for inputSpecies in this.inputMap.keys():
              if labelling[inputSpecies] < this.inputMap[inputSpecies]: 
                  return False  # not enough tokens
           return True # good to go 

        def Fire(this, labelling):
           print this.transitionName
           for inputName in this.inputMap.keys():
              labelling[inputName] = labelling[inputName] - this.inputMap[inputName] 
           for outputName in this.outputMap.keys():
              labelling[outputName] = labelling[outputName] + this.outputMap[outputName] 

Class PetriNetBase

Notice that the class line for PetriNet declares that it inherits from a base class PetriNetBase. The base class contains utility methods that support PetriNet: PrintHeader, PrintLabelling, and the constructor, which converts the transition specifications into Transition objects.

    import string

    class PetriNetBase:
        # Fields:
        # speciesNames
        # Transition list
        # labelling: speciesName -> token count 

        # constructor
        def __init__(this, speciesNames, transitionSpecs):
            this.speciesNames = speciesNames
            this.transitions = this.BuildTransitions(transitionSpecs)

        def BuildTransitions(this, transitionSpecs):
            transitions = []
            for (transitionName, inputSpecs, outputSpecs) in transitionSpecs:
                transition = Transition(transitionName)
                for degreeSpec in inputSpecs:
                    this.SetDegree(transition.inputMap, degreeSpec)
                for degreeSpec in outputSpecs:
                    this.SetDegree(transition.outputMap, degreeSpec)
                transitions.append(transition)
            return transitions 

        def SetDegree(this, dictionary, degreeSpec):
            speciesName = degreeSpec[0]
            if len(degreeSpec) == 2:
                degree = degreeSpec[1]
            else:
                degree = 1
            dictionary[speciesName] = degree

        def PrintHeader(this):
            print string.join(this.speciesNames, ", ") + ", Transition"

        def PrintLabelling(this):
            for speciesName in this.speciesNames:
                print str(this.labelling[speciesName]) + ",", 


We’ve learned about an important model for computation, called Petri nets, and seen how they can be used to model a general class of entity conversion networks, which includes chemical reaction networks as a major case. Then we wrote a program to simulate them, and applied it to a simple model for the formation and dissociation of water molecules.

This is a good beginning, but observe the following limitation of our current program: it just randomly picks a rule. When we run the program, the simulation makes a kind of random walk, back and forth, between the states of full synthesis and full dissociation. But in a real chemical system, the rates at which the transitions fire are _probabilistically determined_, and depend, among other things, on the temperature. With a high probability for formation and a low probability for dissociation, we would expect the system to reach an equilibrium state in which H2O is the predominant “token” in the system. The relative concentrations of the various chemical species would be determined by the relative firing rates of the various transitions.

This gives motivation for the next article that I am writing, on stochastic Petri nets.

Programming exercises

1. Extend the constructor for PetriNet to accept a dictionary from transitionName to a number, which will give the relative probability of that transition firing. Modify the firing rule logic to use these values. This is a step in the direction of the stochastic Petri nets covered in the next article.

2. Make a prediction about the overall evolution of the system, given fixed probabilities for synthesis and dissociation, and then run the program to see if your prediction is confirmed.

3. If you like, improve the usability of the script, by passing the model parameters from the command line. Use sys.argv and eval. You can use single quotes, and pass a string like “{‘H’:5, ‘O’:3, ‘H2O’:4}” from the command line.
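A minimal sketch of what exercise 3 describes (the helper name ParseLabelling is my own; note that ast.literal_eval is a safer alternative to eval for untrusted input):

```python
import sys

def ParseLabelling(argText):
    # turn a string like "{'H':5, 'O':3, 'H2O':4}" into a dictionary
    return eval(argText)

# hypothetical usage from the command line:
#   initialLabelling = ParseLabelling(sys.argv[1])
```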

By the way: if you do these exercises, please post a comment including your code, or a link to your code! To make Python code look pretty on this blog, see this.


Thanks to David Tweed and Ken Webb for reviewing the code and coming up with improvements to the specification syntax for the Petri nets. Thanks to Jacob Biamonte for the diagrams, and for energetically reviewing the article. John Baez contributed the three splendid examples of Petri nets. He also encouraged me along the way, and provided good support as an editor.

Appendix: Notes on the language interpreter

The sample program is written in Python, which is a low-fuss scripting language with abstraction capabilities. The language is well-suited for proof-of-concept programming, and it has a medium-sized user base and support community. The main website is

You have a few distributions to choose from:

• If you are on a Linux or Mac type of system, it is likely to already be installed. Just open a shell and type “python.” Otherwise, use the package manager to install it.

• On Windows, you can use the version from the web site. Alternatively, install cygwin, and in the installer, select Python, along with any of your other favorite Linux-style packages. Then you can partially pretend that you are working on a Linux type of system.

• The Enthought Python distribution comes packaged with a suite of Python-based open-source scientific and numeric computing packages. This includes ipython, scipy, numpy and matplotlib.

The program is distributed as a self-contained script, so in Linux and cygwin you can just execute ./ When running this way under cygwin, you can refer either to the cygwin version of the Python executable or to any of the native Windows versions of the interpreter. You just need to adjust the first line of the script to point to the Python executable.

High-Speed Finance

8 August, 2012


These days, a lot of buying and selling of stocks is done by computers—it’s called algorithmic trading. Computers can do it much faster than people. Watch how they’ve been going wild!

The date is at lower left. In 2000 it took several seconds for computers to make a trade. By 2010 the time had dropped to milliseconds… or even microseconds. And around this year, market activity started becoming much more intense.

I can’t even see the Flash Crash on May 6 of 2010—also known as The Crash of 2:45. The Dow Jones plummeted 9% in 5 minutes, then quickly bounced back. For fifteen minutes, the economy lost a trillion dollars. Then it reappeared.

But on August 5, 2011, when the credit rating of the US got downgraded, you’ll see the activity explode! And it’s been crazy ever since.

The movie above was created by Nanex, a company that provides market data to traders. The x axis shows the time of day, from 9:30 to 16:00. The y axis… well, it’s the amount of some activity per unit time, but they don’t say what. Do you know?

The folks at Nanex have something very interesting to say about this. It’s not high frequency trading or ‘HFT’ that they’re worried about—that’s actually gone down slightly from 2008 to 2012. What’s gone up is ‘high frequency quoting’, also known as ‘quote spam’ or ‘quote stuffing’.

Over on Google+, Sergey Ten explained the idea to me:

Quote spam is a well-known tactic. It is used by high-frequency traders to get a competitive advantage over other high-frequency traders. An HF trader generates high-frequency quote spam using a pseudorandom (or otherwise structured) algorithm, with his computers coded to ignore it. His competitors don’t know the generating algorithm and have to process each quote, thus increasing their load, consuming bandwidth and suffering a slight delay in processing.

A quote is an offer to buy or sell stock at a given price. For a clear and entertaining explanation of how this works and why traders are locked into a race for speed, try:

• Chris Stucchio, A high frequency trader’s apology, Part 1, 16 April 2012. Part 2, 25 April 2012.

I don’t know a great introduction to quote spam, but this paper isn’t bad:

• Jared F. Egginton, Bonnie F. Van Ness, and Robert A. Van Ness, Quote stuffing, 15 March 2012.

Toward the physical limits of speed

In fact, the battle for speed is so intense that trading has run up against the speed of light.

For example, by 2013 there will be a new transatlantic cable at the bottom of the ocean, the first in a decade. Why? Just to cut the communication time between US and UK traders by 5 milliseconds. The new fiber optic line will be straighter than existing ones:

“As a rule of thumb, each 62 miles that the light has to travel takes about 1 millisecond,” Thorvardarson says. “So by straightening the route between Halifax and London, we have actually shortened the cable by 310 miles, or 5 milliseconds.”
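The rule of thumb is easy to check (a back-of-the-envelope sketch; the refractive index of 1.5 is an assumed typical value for glass fiber): light in fiber covers 62 miles in about half a millisecond one way, so the quoted figure of 1 millisecond per 62 miles corresponds to a round trip.

```python
c_km_s = 299792.458          # speed of light in vacuum, km/s
v_fiber_km_s = c_km_s / 1.5  # ~200,000 km/s in fiber (assumed index 1.5)
miles_to_km = 1.609344

one_way_ms = 62 * miles_to_km / v_fiber_km_s * 1000.0  # ~0.5 ms one way
round_trip_ms = 2 * one_way_ms                          # ~1 ms round trip
```

By the same arithmetic, shortening the cable by 310 miles saves about 5 milliseconds of round-trip latency, matching the quote.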

Meanwhile, a London-based company called Fixnetix has developed a special computer chip that can prepare a trade in just 740 nanoseconds. But why stop at nanoseconds?

With the race for the lowest “latency” continuing, some market participants are even talking about picoseconds––trillionths of a second. At first the realm of physics and math and then computer science, the picosecond looms as the next time barrier.

Actions that take place in nanoseconds and picoseconds in some cases run up against the sheer limitations of physics, said Mark Palmer, chief executive of Lexington, Mass.-based StreamBase Systems.

Black swans and the ultrafast machine ecology

As high-frequency trading and high-frequency quoting leave slow-paced human reaction times in the dust, markets start to behave differently. Here’s a great paper about that:

• Neil Johnson, Guannan Zhao, Eric Hunsader, Jing Meng, Amith Ravindar, Spencer Carran and Brian Tivnan, Financial black swans driven by ultrafast machine ecology.

A black swan is an unexpectedly dramatic event, like a market crash or a stock bubble that bursts. But according to this paper, such events are now happening all the time at speeds beyond our perception!

Here’s one:

It’s a price spike in the stock of a company called Super Micro Computer, Inc. On October 1st, 2010, it shot up 26% and then crashed back down. But this all happened in 25 milliseconds!

These ultrafast black swans happen at least once a day. And they happen most of all to financial institutions.

Here’s a great blog article about this stuff:

• Mark Buchanan, Approaching the singularity—in global finance, The Physics of Finance, 13 February 2012.

I won’t try to outdo Buchanan’s analysis. I’ll just quote the abstract of the original paper:

Society’s drive toward ever faster socio-technical systems, means that there is an urgent need to understand the threat from ‘black swan’ extreme events that might emerge. On 6 May 2010, it took just five minutes for a spontaneous mix of human and machine interactions in the global trading cyberspace to generate an unprecedented system-wide Flash Crash. However, little is known about what lies ahead in the crucial sub-second regime where humans become unable to respond or intervene sufficiently quickly. Here we analyze a set of 18,520 ultrafast black swan events that we have uncovered in stock-price movements between 2006 and 2011. We provide empirical evidence for, and an accompanying theory of, an abrupt system-wide transition from a mixed human-machine phase to a new all-machine phase characterized by frequent black swan events with ultrafast durations (<650ms for crashes, <950ms for spikes). Our theory quantifies the systemic fluctuations in these two distinct phases in terms of the diversity of the system's internal ecology and the amount of global information being processed. Our finding that the ten most susceptible entities are major international banks, hints at a hidden relationship between these ultrafast 'fractures' and the slow 'breaking' of the global financial system post-2006. More generally, our work provides tools to help predict and mitigate the systemic risk developing in any complex socio-technical system that attempts to operate at, or beyond, the limits of human response times.

Trans-quantitative analysts?

When you get into an arms race of trying to write algorithms whose behavior other algorithms can’t predict, the math involved gets very tricky. Over on Google+, F. Lengvel pointed out something strange. In May 2010, Christian Marks claimed that financiers were hiring experts on large ordinals—crudely speaking, big infinite numbers!—to design algorithms that were hard to outwit.

I can’t confirm his account, but I can’t resist quoting it:

In an unexpected development for the depressed market for mathematical logicians, Wall Street has begun quietly and aggressively recruiting proof theorists and recursion theorists for their expertise in applying ordinal notations and ordinal collapsing functions to high-frequency algorithmic trading. Ordinal notations, which specify sequences of ordinal numbers of ever increasing complexity, are being used by elite trading operations to parameterize families of trading strategies of breathtaking sophistication.

The monetary advantage of the current strategy is rapidly exhausted after a lifetime of approximately four seconds — an eternity for a machine, but barely enough time for a human to begin to comprehend what happened. The algorithm then switches to another trading strategy of higher ordinal rank, and uses this for a few seconds on one or more electronic exchanges, and so on, while opponent algorithms attempt the same maneuvers, risking billions of dollars in the process.

The elusive and highly coveted positions for proof theorists on Wall Street, where they are known as trans-quantitative analysts, have not been advertised, to the chagrin of executive recruiters who work on commission. Elite hedge funds and bank holding companies have been discreetly approaching mathematical logicians who have programming experience and who are familiar with arcane software such as the ordinal calculator. A few logicians were offered seven figure salaries, according to a source who was not authorized to speak on the matter.

Is this for real? I like the idea of ‘trans-quantitative analysts’: it reminds me of ‘transfinite numbers’, which is another name for infinities. But it sounds a bit like a joke, and I haven’t been able to track down any references to trans-quantitative analysts, except people talking about Christian Marks’ blog article.

I understand a bit about ordinal notations, but I don’t think this is the time to go into that—not before I’m sure this stuff is for real. Instead, I’d rather reflect on a comment of Boris Borcic over on Google+:

Last week it occurred to me that LessWrong and OvercomingBias together might play a role to explain why Singularists don’t seem to worry about High Frequency Robot Trading as a possible pathway for Singularity-like developments. I mean IMO they should, the Singularity is about machines taking control, ownership is control, HFT involves slicing ownership in time-slices too narrow for humans to know themselves owners and possibly control.

The ensuing discussion got diverted to the question of whether algorithmic trading involved ‘intelligence’, but maybe intelligence is not the point. Perhaps algorithmic traders have become successful parasites on the financial system without themselves being intelligent, simply by virtue of their speed. And the financial system, in turn, seems to be a successful parasite on our economy. Regulatory capture—the control of the regulating agencies by the industry they’re supposed to be regulating—seems almost complete. Where will this lead?

Increasing the Signal-to-Noise Ratio With More Noise

30 July, 2012

guest post by Glyn Adgie and Tim van Beek

Or: Are the Milankovich Cycles Causing the Ice Ages?

The Milankovich cycles are periodic changes in how the Earth orbits the Sun. One question is: can these changes be responsible for the ice ages? At first sight it seems impossible, because the changes are simply too small. But it turns out that we can find a dynamical system where a small periodic external force is actually strengthened by random ‘noise’ in the system. This phenomenon has been dubbed ‘stochastic resonance’ and has been proposed as an explanation for the ice ages.

In this blog post we would like to provide an introduction to it. We will look at a bistable toy example. And you’ll even get to play with an online simulation that some members of the Azimuth Project created, namely Glyn Adgie, Allan Erskine and Jim Stuttard!

Equations with noise

In This Week’s Finds, back in ‘week308’, we had a look at a model with ‘noise’. A lot of systems can be described by ordinary differential equations:

\displaystyle{ \frac{d x}{d t} = f(x, t)}

If f is nice, the time evolution of the system will be a nice smooth function x(t), like the trajectory of a thrown stone. But there are situations where we have some kind of noise, a chaotic, fluctuating influence, that we would like to take into account. This could be, for example, turbulence in the air flow around a rocket. Or, in our case, short term fluctuations of the weather of the earth. If we take this into account, we get an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) + W(t) }

where the W is some model of noise. In all the examples above, the noise is actually a simplified way to take into account fast degrees of freedom of the system at hand. This way we do not have to explicitly model these degrees of freedom, which is often impossible. But we could still try to formulate a model where long term influences of these degrees of freedom are included.

A way to make the above mathematically precise is to write the equation in the form

d x_t = f(x, t) d t + g(x, t) d W_t

as an Ito stochastic differential equation driven by a Wiener process.

The mathematics behind this is somewhat sophisticated, but we don’t need to delve into it in this blog post. For those who are interested in it, we have gathered some material here:

Stochastic differential equation, Azimuth Library.

Physicists and engineers who model systems with ordinary and partial differential equations should think of stochastic differential equations as another tool in the toolbox. With this tool, it is possible to easily model systems that exhibit new behavior.

What is stochastic resonance?

When you listen to a friend who sits across the table, the noise around you may become a serious distraction, although humans are famous for their ability to filter the signal—what your friend is saying—from the noise—all other sounds: people’s chatter, dishes clattering and so on. But could you imagine a situation where more noise makes it easier to detect the signal?

Picture a marble that sits in a double well potential:

double well potential

This graphic is taken from

• Roberto Benzi, Stochastic resonance: from climate to biology.

This review paper is also a nice introduction to the topic.

This simple model can be applied to various situations. Since we think about the ice ages, we can interpret the two metastable states at the two minima of the potential in this way: the minimum to the left at -1 corresponds to the Earth in an ice age. The minimum to the right at 1 corresponds to the Earth at warmer temperatures, as we know it today.

Let us add a small periodic force to the model. This periodic force corresponds to different solar input due to periodic changes in Earth’s orbit: the Milankovich cycles. There are actually several different cycles with different periods, but in our toy model we will consider only one force with one period.

If the force itself is too small to get the marble over the barrier between the two potential minima, we will not observe any periodic change in the position of the marble. But if we add noise to the motion, it may happen that the force together with the noise succeeds in getting the marble from one minimum to the other! This is more likely to happen when the force is pushing the right way. So, it should happen with roughly the same periodicity as the force: we should see a ‘quasi-periodic’ motion of the marble.

But if we increase the noise more and more, the influence of the external force will become unrecognizable, and we will not recognize any pattern at all.

There should be a noise strength somewhere in between that makes the Milankovich cycles as visible as possible from the Earth’s temperature record. In other words, if we think of the periodic force as a ‘signal’, there should be some amount of noise that maximizes the ‘signal-to-noise ratio’. Can we find out what it is?

In order to find out, let us make our model precise. Let’s take the time-independent potential to be

\displaystyle{ V(x) = \frac{1}{4} x^4 - \frac{1}{2} x^2 }

This potential has two local minima at x = \pm 1. Let’s take the time-dependent periodic forcing, or ‘signal’, to be

F(t) = - A \; \sin(t)

The constant A is the amplitude of the periodic forcing. The time-dependent effective potential of the system is therefore:

\displaystyle{ V(x, t) = \frac{1}{4} x^4 - \frac{1}{2} x^2 - A \; \sin(t) x }

Including noise leads to the equation of motion, which is usually called a Langevin equation by physicists and chemists:

\begin{array}{ccl} dX &=& -V'(X_t, t) \; dt + \sqrt{2D} \; dW_t \\  \\  &=& (X_t - X_t^3 + A \;  \sin(t)) \; dt  + \sqrt{2D} \; dW_t  \end{array}

For more about what a Langevin equation is, see our article on stochastic differential equations.

This model has two free parameters, A and D. As already mentioned, A is the amplitude of the forcing. D is called the diffusion coefficient and represents the strength of noise. In our case it is a constant, but in more general models it could depend on space and time.

It is possible to write programs that generate simulations, i.e., numerical approximations to solutions of stochastic differential equations. For those interested in how this works, there is an addendum at the bottom of the blog post that explains it. We can use such a program to model the three cases of low, high and optimal noise levels.

In the following graphs, green is the external force, while red is the marble position. Here is a simulation with a low level of noise:

low noise level

As you can see, within the time of the simulation there is no transition from the metastable state at 1 to the one at -1. That is, in this situation Earth stays in the warm state.

Here is a simulation with a high noise level:

high noise level

The marble jumps wildly between the states. By inspecting the graph with your eyes only, you don’t see any pattern in it, do you?

And finally, here is a simulation where the noise level seems to be about right to make the signal visible via a quasi-periodic motion:

optimal noise level

Sometimes the marble makes a transition in phase with the force, sometimes it does not, sometimes it has a phase lag. It can even happen that the marble makes a transition while the force acts in the opposite direction!

Some members of the Azimuth crew have created an online model that calculates simulations of the model, for different values of the involved constants. You can see it here:

Stochastic resonance example, Azimuth Code Project.

We especially thank Allan Erskine and Jim Stuttard.

You can change the values using the sliders under the graphic and see what happens. You can also choose different ‘random seeds’, which means that the random numbers used in the simulation will be different.

Above, we asked if one could calculate for a fixed A the level of noise D that maximizes the signal to noise ratio. We have defined the model system, but we still need to define ‘signal to noise ratio’ in order to address the question.

There is no general definition that makes sense for all kinds of systems. For our model system, let’s assume that the expectation value of x(t) has a Fourier expansion like this:

\displaystyle{ \langle x(t) \rangle = \sum_{n = - \infty}^{\infty}  M_n \exp{(i n t)} }

Then one useful definition of the signal to noise ratio \eta is

\displaystyle{ \eta := \frac{|M_1|^2}{A^2}}
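To make this definition concrete, here is a sketch of my own (not part of the original post) that estimates M_1 from a sampled mean trajectory by a discrete Fourier sum. For a noiseless sinusoid A\sin(t) we have |M_1| = A/2, so \eta = 1/4:

```python
import cmath
import math

def signal_to_noise(samples, A):
    # samples: values of <x(t)> at N equally spaced times over one
    # forcing period; M1 is the first discrete Fourier coefficient
    N = len(samples)
    M1 = sum(samples[k] * cmath.exp(-1j * 2 * math.pi * k / N)
             for k in range(N)) / N
    return abs(M1) ** 2 / A ** 2

# sanity check on a noiseless sinusoid x(t) = A sin(t)
A = 0.25
N = 1000
samples = [A * math.sin(2 * math.pi * k / N) for k in range(N)]
eta = signal_to_noise(samples, A)  # should be close to 1/4
```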

So, can one calculate, for a fixed A, the value of D that maximizes \eta? We don’t know! Do you?

If you would like to know more, have a look at this page:

Stochastic resonance, Azimuth Library.

Also, read on for more about how we numerically solved our stochastic differential equation, and the design decisions we had to make to create a simulation that runs on your web browser.

For übernerds only

Numerically solving stochastic differential equations

Maybe you are asking yourself what is behind the online model? How does one ‘numerically solve’ a stochastic differential equation and in what sense are the graphs that you see there ‘close’ to the exact solution? No problem, we will explain it here!

First, you need C code like this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define znew ((z=36969*(z&65535)+(z>>16)) << 16)
#define wnew ((w = 18000*(w&65535)+(w>>16))&65535)

static unsigned long z = 362436069, w = 521288629;

float rnor() {
	float x, y, v;

	x = ((signed long) (znew + wnew)) * 1.167239E-9;

	if (fabs(x) < 1.17741)
		return (x);

	y = (znew + wnew) * 2.328306E-10;

	if (log(y) < 0.6931472 - 0.5 * x * x)
		return x;

	x = (x > 0) ? 0.8857913 * (2.506628 - x) : -0.8857913 * (2.506628 + x);

	if (log(1.8857913 - y) < 0.5718733 - 0.5 * (x * x))
		return (x);

	do {
		v = ((signed long) (znew + wnew)) * 4.656613E-10;

		x = -log(fabs(v)) * 0.3989423;

		y = -log((znew + wnew) * 2.328306E-10);

	} while (y + y < x * x);

	return (v > 0 ? 2.506628 + x : 2.506628 - x);
}

int main(void) {
	int index = 0;

	for (index = 0; index < 10; index++) {
		printf("%f\n", rnor());
	}

	return 0;
}
Pretty scary, huh? Do you see what it does? We don’t think anyone stands a chance of understanding all the ‘magical constants’ in it. We pasted it here so you can go around and show it to friends who claim that they know the programming language C and understand every program in seconds, and see what they can tell you about it. We’ll explain it below.

Since this section is for übernerds, we will assume that you know stochastic processes in continuous time. Brownian motion on \mathbb{R} can be viewed as a probability distribution on the space of all continuous functions on a fixed interval, C[0, T]. This is also true for solutions of stochastic differential equations in general. A ‘numerical approximation’ to a stochastic differential equation is a discrete-time approximation that samples this probability space.

Another way to think about it is to start with numerical approximations of ordinary differential equations. When you are handed such an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) }

with an initial condition x(0) = x_0, you know that there are different discrete approximation schemes to calculate an approximate solution to it. The simplest one is the Euler scheme. You need to choose a step size h and calculate values for x recursively using the formula

x_{n + 1} = x_n + f(x_n, t_n) \; h

with t_{n+1} = t_n + h. Connect the discrete values of x_n with straight lines to get your approximating function x^h. For h \to 0, you get convergence for example in the supremum norm of x^h(t) to the exact solution x(t):

\displaystyle{  \lim_{h \to 0} \| x^h - x \|_\infty = 0 }
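As a quick illustration (a minimal sketch, with an example equation of my choosing), here is the Euler scheme applied to dx/dt = -x with x(0) = 1, whose exact solution is e^{-t}:

```python
import math

def euler(f, x0, t0, T, h):
    # connect-the-dots Euler approximation, returning x at time T
    x, t = x0, t0
    steps = int(round((T - t0) / h))
    for n in range(steps):
        x = x + f(x, t) * h
        t = t + h
    return x

approx = euler(lambda x, t: -x, 1.0, 0.0, 1.0, 0.001)
exact = math.exp(-1.0)
error = abs(approx - exact)  # shrinks roughly linearly in h
```

Halving the step size h roughly halves the error, reflecting the scheme's first-order convergence for ordinary differential equations.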

It is possible to extend some schemes from ordinary differential equations to stochastic differential equations that involve certain random variables in every step. The best textbook on the subject that I know is this one:

• Peter E. Kloeden and Eckhard Platen, Numerical Solution of Stochastic Differential Equations, Springer, Berlin, 2010. (ZMATH review.)

The extension of the Euler scheme is very simple, deceptively so! It is much less straightforward to determine the extensions of higher Runge–Kutta schemes, for example. You have been warned! The Euler scheme for stochastic differential equations of the form

d X_t = f(X, t) d t + g(X, t) d W_t

is simply

X_{n + 1} = X_n + f(X_n, t_n) \; h + g(X_n, t_n) \; w_n

where all the w_n are independent random variables with a normal distribution N(0, h). If you write a computer program that calculates approximations using this scheme, you need a pseudorandom number generator. This is what the code we posted above is about. Actually, it is an efficient algorithm to calculate pseudorandom numbers with a normal distribution. It’s from this paper:

• George Marsaglia and Wai Wan Tsang, The Monty Python method for generating random variables, ACM Transactions on Mathematical Software 24 (1998), 341–350.

While an approximation to an ordinary differential equation approximates a function, an approximation to a stochastic differential equation approximates a probability distribution on C[0, T]. So we need a different concept of ‘convergence’ of the approximation for h \to 0. The two most important ones are called ‘strong’ and ‘weak’ convergence.

A discrete approximation X^h to an Ito process X converges strongly at time T if

\lim_{h \to 0} \; E(|X_T^h - X_T|) = 0

It is said to be of strong order \gamma if there is a constant C such that

E(|X_T^h - X_T|) \leq C h^{\gamma}

As we see from the definition, a strong approximation needs to approximate the actual value of the original process X at the end of the time interval [0, T], on average over all paths.

What about weak convergence?

A discrete approximation X^h to an Ito process X converges weakly for a given class of functions G at a time T if for every function g \in G we have

\lim_{h \to 0} \; |E(g(X_T^h)) - E(g(X_T))| = 0

where we assume that all appearing expectation values exist and are finite.

It is said to be of weak order \gamma with respect to G if there is a constant C such that for every function g \in G:

|E(g(X_T^h)) - E(g(X_T))| \leq C h^{\gamma}

One speaks simply of ‘weak convergence’ when G is the set of all polynomials.

So weak convergence means that the approximation needs to approximate the probability distribution of the original process X. Weak and strong convergence are indeed different concepts. For example, the Euler scheme is of strong order 0.5 and of weak order 1.0.
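
One can check the strong order empirically. The sketch below (an illustration, not the model’s code; the parameters a, b are chosen arbitrarily) applies the Euler scheme to geometric Brownian motion, for which the exact solution X_T = X_0 \exp((a - b^2/2) T + b W_T) is known, and estimates E(|X_T^h - X_T|) by Monte Carlo, using the same Brownian increments for the scheme and the exact solution:

```python
import math
import random

def strong_error(h, n_paths=2000, seed=42):
    """Monte Carlo estimate of E(|X_T^h - X_T|) for dX = a X dt + b X dW."""
    a, b, x0, T = 0.05, 0.5, 1.0, 1.0
    rng = random.Random(seed)
    n = round(T / h)
    total = 0.0
    for _ in range(n_paths):
        x, w_T = x0, 0.0
        for k in range(n):
            w = rng.gauss(0.0, math.sqrt(h))  # increment w_n ~ N(0, h)
            x += a * x * h + b * x * w        # Euler step
            w_T += w                          # accumulate the Brownian path W_T
        exact = x0 * math.exp((a - 0.5 * b * b) * T + b * w_T)
        total += abs(x - exact)
    return total / n_paths

# Since the Euler scheme is of strong order 0.5, the error should shrink
# roughly like sqrt(h) as h decreases.
```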

Our simple online model implements the Euler scheme to calculate approximations to our model equation. So now you know in what sense it is indeed an approximation to the solution process!

Implementation details of the interactive online model

There appear to be three main approaches to implementing an interactive online model. It is assumed that all the main browsers understand Javascript.

1) Implement the model using a server-side programme that produces graph data in response to parameter settings, and use Javascript in the browser to plot the graphs and provide interaction inputs.

2) Pre-compute the graph data for various sets of parameter values, send these graphs to the browser, and use Javascript to select a graph and plot it. The computation of the graph data can be done on the designer’s machine, using their preferred language.

3) Implement the model entirely in Javascript.

Option 1) is only practical if one can run interactive programs on the server. While there is technology to do this, it is not something that most ISPs or hosting companies provide. You might be provided with Perl and PHP, but neither of these is a language of choice for numeric coding.

Option 2) has been tried. The problem with this approach is the large amount of pre-computed data required. For example, if we have 4 independent parameters, each of which takes on 10 values, then we need 10000 graphs. Also, it is not practical to provide much finer control of each parameter, as this increases the data size even more.

Option 3) looked doable. Javascript has some nice language features, such as closures, which are a good fit for implementing numerical methods for solving ODEs. The main question then is whether a typical Javascript engine can cope with the compute load.
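
The point about closures can be sketched in Python just as well as in Javascript (a hypothetical illustration, not the actual model code): a factory function captures the right-hand side f and the step size h, and returns a stepper that carries them around for free:

```python
def make_euler_stepper(f, h):
    """Return a one-step Euler function; the returned closure captures f and h."""
    def step(t, x):
        return t + h, x + h * f(x, t)
    return step

# Usage: step dx/dt = -x from x(0) = 1 to t = 1 with h = 0.1
step = make_euler_stepper(lambda x, t: -x, 0.1)
t, x = 0.0, 1.0
for _ in range(10):
    t, x = step(t, x)
# x is now about 0.349, the Euler approximation to exp(-1) (about 0.368)
```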

One problem that arose with a pure Javascript implementation is the production of the normal random deviates required for the numerical solution of an SDE. Javascript has a random() function, which produces uniform random deviates, and there are algorithms that could be implemented in Javascript to produce normal deviates from uniform deviates. The problems here are that there is no guarantee that all Javascript engines use the same PRNG, or that the PRNG is actually any good, and we cannot set a specific seed in Javascript. Even if the PRNG is adequate, we have the problem that different users will see different outputs for the same parameter settings. Also, reloading the page will reseed the PRNG from some unknown source, so a user will not be able to replicate a particular output.

The chosen solution was to pre-compute a sequence of random normal deviates, using a PRNG of known characteristics. This sequence is loaded from the server, and will naturally be the same for all users and sessions. This is the C++ code used to create the fixed Javascript array:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>

int main(int argc, char * argv[])
{
    boost::mt19937 engine(1234); // PRNG with seed = 1234
    boost::normal_distribution<double> rng(0.0, 1.0); // Mean 0.0, standard deviation 1.0
    std::cout << "normals = [\n";
    int nSamples = 10001;
    for (int i = 0; i < nSamples; ++i)
        std::cout << rng(engine) << ",\n";
    std::cout << "];";
    return 0;
}

Deviates with a standard deviation other than 1.0 can be produced in Javascript by multiplying the pre-computed deviates by a suitable scaling factor.
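
The scaling works because multiplying a deviate z with distribution N(0, 1) by sigma yields a deviate with distribution N(0, sigma^2); in the Euler scheme, the increments w_n need variance h, so the factor is sqrt(h). A Python sketch of this step (the actual model does it in Javascript):

```python
import math
import random

# Stand-in for the pre-computed array loaded from the server:
rng = random.Random(1234)                                # known seed, reproducible
normals = [rng.gauss(0.0, 1.0) for _ in range(1000)]     # N(0, 1) deviates

h = 0.01
w = [math.sqrt(h) * z for z in normals]                  # rescaled to N(0, h)
```

The sample variance of the rescaled deviates comes out close to h, as required by the Euler scheme.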

The graphs only use 1000 samples from the Javascript array. To see the effects of different sample paths, the Javascript code selects one of ten slices from the pre-computed array.

The Science Code Manifesto

15 October, 2011

There’s a manifesto that you can sign, calling for a more sensible approach to the use of software in science. It says:

Software is a cornerstone of science. Without software, twenty-first century science would be impossible. Without better software, science cannot progress.

But the culture and institutions of science have not yet adjusted to this reality. We need to reform them to address this challenge, by adopting these five principles:

Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

Copyright: The copyright ownership and license of any released source code must be clearly stated.

Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.

Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.

Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

The founding signatories are:

• Nick Barnes and David Jones of the Climate Code Foundation,

• Peter Norvig, the director of research at Google,

• Cameron Neylon of Science in the Open,

• Rufus Pollock of the Open Knowledge Foundation,

• Joseph Jackson of the Open Science Foundation.

I was the 312th person to sign. How about joining?

There’s a longer discussion of each point of the manifesto here. It ties in nicely with the philosophy of the Azimuth Code Project, namely:

Many papers in climate science present results that cannot be reproduced. The authors present a pretty diagram, but don’t explain which software they used to make it, and don’t make this software available, don’t really explain how they did what they did. This needs to change! Scientific results need to be reproducible. Therefore, any software used should be versioned and published alongside any scientific results.

All of this is true for large climate models such as General Circulation Models, as well—but the problem becomes much more serious, because these models have long outgrown the point where a single developer was able to understand all the code. This is a kind of phase transition in software development: it necessitates a different toolset and a different approach to software development.

As Nick Barnes points out, these ideas

… are simply extensions of the core principle of science: publication. Publication is what distinguishes science from alchemy, and is what has propelled science—and human society—so far and so fast in the last 300 years. The Manifesto is the natural application of this principle to the relatively new, and increasingly important, area of science software.

Your Model Is Verified, But Not Valid! Huh?

12 June, 2011

guest post by Tim van Beek

Among the prominent tools in climate science are complicated computer models. For more on those, try this blog:

• Steve Easterbrook, Serendipity, or What has Software Engineering got to do with Climate Change?

After reading Easterbrook’s blog post about “climate model validation”, and some discussions of this topic elsewhere, I noticed that there is some “computer terminology” floating around that disguises itself as plain English! This has led to some confusion, so I’d like to explain some of it here.

Technobabble: The Quest for Cooperation

Climate change may be the first problem in the history of humankind that has to be tackled on a global scale, by people all over the world working together. Of course, a prerequisite of working together is a mutual understanding and a mutual language. Unfortunately, every single one of the many professions that scientists and engineers engage in has created its own dialect. And most experts are proud of it!

When I read about the confusion that “validation” versus “verification” of climate models has caused, I was reminded of the phrase “technobabble”, which screenwriters for the TV series Star Trek used whenever they had to write a dialog involving the engineers on the Starship Enterprise. Something like this:

“Captain, we have to send an inverse tachyon beam through the main deflector dish!”

“Ok, make it so!”

Fortunately, neither Captain Picard nor the audience had to understand what was really going on.

It’s a bit different in the real world, where not everyone may have the luxury of staying on the sidelines while the trustworthy crew members in the Enterprise’s engine room solve all the problems. We can start today by explaining some software engineering technobabble that came up in the context of climate models. But why would software engineers bother in the first place?

Short Review of Climate Models

Climate models come in a hierarchy of complexity. The simplest ones only try to simulate the energy balance of the planet earth. These are called energy balance models. They don’t take into account the spherical shape of the earth, for example.

At the opposite extreme, the most complex ones try to simulate the material and heat flow of the atmosphere and the oceans on a topographical model of the spinning earth. These are called general circulation models, or GCMs for short. GCMs are big: sometimes more than a million lines of code.

A line of code is basically one instruction for the computer to carry out, like:

add 1/2 and 1/6 and store the result in a variable called e

print e on the console

In order to understand what a computer program does, one would in principle have to read and understand every single line of code. And most programs use a lot of other programs, so in principle one would have to understand those, too. This is of course not possible for a single person!

We hope that taking into account a lot of effects, which results in a lot of lines of code, makes the models more accurate. But it certainly means that they are complex enough to be interesting for software engineers.

In the case of software that is used to run an internet shop, a million lines of code isn’t much. But it is already too big for one single person to handle. Basically, this is where all the problems start that software engineering seeks to solve.

When more than one person works on a software project things often get complicated.

(From the manual of CVS, the “Concurrent Versions System”.)

Software Design Circle

The job of a software engineer is in some ways similar to that of an architect. The differences are mainly due to the abstract nature of software. Everybody can see whether a building is finished, but that’s not possible with software. Nevertheless every software project does come to an end, and people have to decide whether or not the product, the software, is finished and does what it should. But since software is so abstract, people have come up with special ideas about how the software “production process” should work and how to tell if the software is correct. I would like to explain these a little bit further.

Stakeholders and Shifts in Stakeholder Analysis

There are many different people working in an office building with different interests: cleaning squads, janitors, plant security, and so on. When you design a new office building, you need to identify and take into account all the different interests of all these groups. Most software projects are similar, and the process just mentioned is usually called stakeholder analysis.

Of course, if you take into account only the groups already mentioned, you’ll build an office building without any offices, because that would obviously be the simplest one to monitor and to keep working. Such an office building wouldn’t make much sense, of course! This is because we made a fatal mistake with our stakeholder analysis: we failed to take into account the most important stakeholders, the people who will actually use the offices. These are the key stakeholders of the office building project.

After all, the primary purpose of an office building is to provide offices. And in the end, if we have an office building without offices, we’ll notice that no one will pay us for our efforts.

Gathering Requirements

While it may be obvious what most people want from an office building, the situation is usually much more abstract, hence much more complicated, for software projects.

This is why software people carry out a requirement analysis, where they ask the stakeholders what they would like the software to do. A requirement for an office building might be, for example, “we need a railway station nearby, because most of the people who will work in the building don’t have cars.” A requirement for a software project might be, for example, “we need the system to send email notifications to our clients on a specific schedule”.

In an ideal world, the requirement analysis would result in a document—usually called something like a system specification—that contains both the requirements, and also descriptions of the test cases that are needed to test whether the finished system meets the requirements. For example:

“Employee A lives in an apartment 29 km away from the office building and does not have a car. She gets to work within 30 minutes by using public transportation.”

Verification versus Validation

When we have finished the office building (or the software system), we’ll have to do some acceptance testing, in order to convince our customer that she should pay us (or simply to use the system, if it is for free). When you buy a car, your “acceptance test” is driving away with it—if that does not work, you know that there is something wrong with your car! But for complicated software—or office buildings—we need to agree on what we do to test if the system is finished. That’s what we need the test cases for.

If we are lucky, the relevant test cases will already be described in the system specification, as noted above. But that is not the whole story.

Every scientific community that has its own identity invents its own new language, often borrowing words from everyday language and defining new, surprising, special meanings for them. Software engineers are no different. There are, for example, two very different aspects to testing a system:

• Did we do everything according to the system specification?


• Now that the system is there, and our key stakeholders can see it for themselves, did we get the system specification right: is our product useful to them?

The first is called verification, the second validation. As you can see, software engineers took two almost synonymous words from everyday language and gave them quite different meanings!

For example, if you wrote in the specification for an online book seller:

“we calculate the book price by multiplying the ISBN number by pi”

and the final software system does just that, then the system is verified. But if the bookseller would like to stay in business, I bet he won’t say the system has been validated.

Stakeholders of Climate Models

So, for business applications, it’s not quite right to ask “is the software correct?” The really important question is: “is the software as useful for the key stakeholders as it should be?”

But in Mathematics Everything is Either True or False!

One may wonder if this “true versus useful” stuff above makes any sense when we think about a piece of software that calculates, for example, a known mathematical function like a “modified Bessel function of the first kind”. After all, it is defined precisely in mathematics what these functions look like.

If we are talking about creating a program that can evaluate these functions, there are a lot of technical choices that need to be specified. Here is a random example (if you don’t understand it, don’t worry, that is not necessary to get the point):

• Current computers know data types with a finite value range and finite precision only, so we need to agree on which such data type we want as a model of the real or complex numbers. For example, we might want to use the “double precision floating-point format”, which is an international standard.

Another aspect is, for example, “how long may the function take to return a value?” This is an example of a non-functional requirement (see Wikipedia). These requirements will play a role in the implementation too, of course.

However, apart from these technical choices, there is no ambiguity as to what the function should do, so there is no need to distinguish verification and validation. Thank god that mathematics is eternal! A Bessel function will always be the same, for all of eternity.
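
For instance (a hypothetical sketch, not anyone’s production code), an implementation of the modified Bessel function I_0 via its power series can be verified in double precision against a tabulated reference value:

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0:
    I_0(x) = sum_{k>=0} (x^2/4)^k / (k!)^2, truncated after `terms` terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) * (k + 1))
    return total

# Verification: compare with the tabulated value I_0(1) = 1.26606588...
# (e.g. from Abramowitz & Stegun) to within double-precision tolerance.
```

The specification here is mathematically unambiguous: any correct implementation, in any language, must reproduce these values up to the agreed floating-point tolerance.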

Unfortunately, this is no longer true when a computer program computes something that we would like to compare to the real world. Like, for example, a weather forecast. In this case the computer model will, like all models, include some aspects of the real world. Or rather, some specific implementations of a mathematical model of a part of the real world.

Verification will still be the same, if we understand it to be the stage where we test whether the individual pieces of the program (those parts that do things that can be defined in a mathematically precise way) compute what they are supposed to. But validation will be a whole different step if understood in the sense of “is the model useful?”

But Everybody Knows What Weather Is!

But still, does this apply to climate models at all? I mean, everybody knows what “climate” is, and “climate models” should simulate just that, right?

As it turns out, it is not so easy, because climate models serve very different purposes:

• Climate scientists want to test their understanding of basic climate processes, just as physicists calculate a lot of solutions to their favorite theories to gain a better understanding of what these theories can and do model.

• Climate models are also used to analyse observational data, to supplement such data and/or to correct them. Climate models have had success in detecting misconfiguration and biases in observational instruments.

• Finally, climate models are also used for global and/or local predictions of climate change.

The question “is my climate model right?” therefore translates to the question “is my climate model useful?” This question has to refer to a specific use of the model, or rather: to the viewpoint of the key stakeholders.

The Shift of Stakeholders

One problem of the discussions of the past seems to be due to a shift of the key stakeholders. For example: some climate models have been developed as a tool for climate scientists to play around with certain aspects of the climate. When the scientists published papers, including insights gained from these models, they usually did not publish anything about the implementation details. Mostly, they did not publish anything about the model at all.

This is nothing unusual. After all, a physicist or mathematician will routinely publish her results and conclusions—maybe with proofs. But she is not required to publish every single thought she had to think to produce her results.

But after the results of climate science became a topic in international politics, a change of the key stakeholders occurred: a lot of people outside the climate science community developed an interest in the models. This is a good thing. There is a legitimate need of researchers to limit participation in the review process, of course. But when the results of a scientific community become the basis of far-reaching political decisions, there is a legitimate public interest in the details of the ongoing research process, too. The problem in this case is that the requirements of the new key stakeholders, such as interested software engineers outside the climate research community, are quite different from the requirements of the former key stakeholders, climate scientists.

For example, if you write a program for your own eyes only, there is hardly any need to write detailed documentation for it. If you write it for others to understand, then as a rule of thumb, you’ll have to produce at least as much documentation as code.

Back to the Start: Farms, Fields and Forests

As an example of a rather prominent critic of climate models, let’s quote the physicist Freeman Dyson:

The models solve the equations of fluid dynamics and do a very good job of describing the fluid motions of the atmosphere and the oceans.

They do a very poor job of describing the clouds, the dust, the chemistry and the biology of fields, farms and forests. They are full of fudge factors so the models more or less agree with the observed data. But there is no reason to believe the same fudge factors would give the right behaviour in a world with different chemistry, for example in a world with increased CO2.

Let’s assume that Dyson is talking here about GCMs, with all their parametrizations of unresolved processes (which he calls “fudge factors”). Then the first question that comes to my mind is “why would a climate model need to describe fields, farms and forests in more detail?”

I’m quite sure that the answer will depend on what aspects of the climate the model should represent, in what regions and over what timescale.

And that certainly depends on the answer to the question “what will we use our model for?” Dyson seems to assume that the answer to this question is obvious, but I don’t think that this is true. So, maybe we should start with “stakeholder analysis” first.