Category Theory in Epidemiology

14 January, 2024

This is a talk I gave at the Edinburgh Mathematical Society on December 4, 2023:

Abstract. “Stock and flow diagrams” are widely used for modeling in epidemiology. Modelers often regard these diagrams as an informal step toward a mathematically rigorous formulation of a model in terms of ordinary differential equations. However, these diagrams have a precise syntax, which can be explicated using category theory. Although commercial tools already exist for drawing these diagrams and solving the differential equations they describe, my collaborators and I have created new software that overcomes some limitations of existing tools. Basing this software on categories has many advantages, but I will explain three: functorial semantics, model composition, and model stratification. This is joint work with Xiaoyan Li, Sophie Libkind, Nathaniel Osgood, Evan Patterson and Eric Redekopp.

You can see my slides here. You can get the code for StockFlow
here and for ModelCollab here.

For more, read these:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Eric Redekopp, A categorical framework for modeling with stock and flow diagrams, to appear in Mathematics for Public Health, Springer.

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel Osgood and Evan Patterson, Compositional modeling with stock and flow diagrams, Proceedings Fifth International Conference on Applied Category Theory, EPTCS 380 (2022), 77-96.

• Sophie Libkind, Andrew Baas, Micah Halter, Evan Patterson and James Fairbanks, An algebraic framework for structured epidemic modeling, Phil. Trans. R. Soc. A. 380 (2022), 20210309.

• Evan Patterson and Micah Halter, Compositional epidemiological modeling using structured cospans.


Agent-Based Models (Part 2)

17 August, 2023

Some news! Nathaniel Osgood, Evan Patterson, Kris Brown, Xiaoyan Li, Sean Wu, William Waites and I are going to work together at the International Centre for Mathematical Sciences for six weeks starting on May 1st, 2024. We’re going to use category theory to design better software for agent-based models.

Needless to say, we’re getting started on the math now, since 6 weeks is not enough for that.

Here’s a bit more about our plan.

Introduction

An agent-based model (ABM) is a computational construct characterizing the dynamics of one or more open populations of individual agents, each associated with parameters, state, rules and actions, and interacting with each other and with an environment in which they are embedded [Mc]. Such agents are often spatially situated and mobile, linked by one or more networks, and hierarchically nested. While ABMs increasingly inform health policy decision-making, our work within this arena, and that of many other researchers, clearly demonstrates that extant tools and software frameworks impose many barriers to effective ABM construction and use.
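
To make the idea concrete, here is a minimal agent-based SIR simulation in plain Julia. It is only a toy version of the general notion just described, with individual agents carrying a disease state and simple stochastic rules updating that state at each time step; it has nothing to do yet with the categorical framework discussed below.

```julia
# Toy agent-based SIR model in plain Julia: individual agents with state,
# updated by simple stochastic rules. For illustration only.
mutable struct SimpleAgent
    state::Symbol            # :S (susceptible), :I (infective) or :R (recovered)
end

function step!(agents::Vector{SimpleAgent}, β::Float64, γ::Float64)
    n_infective = count(a -> a.state == :I, agents)
    for a in agents
        if a.state == :S && rand() < β * n_infective / length(agents)
            a.state = :I     # infection rule
        elseif a.state == :I && rand() < γ
            a.state = :R     # recovery rule
        end
    end
end

agents = [SimpleAgent(i <= 5 ? :I : :S) for i in 1:1000]   # start with 5 infective agents
for t in 1:100
    step!(agents, 0.3, 0.1)
end
count(a -> a.state == :R, agents)    # how many recovered after 100 steps
```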

In a domain where interdisciplinary teamwork is of central importance, current ABMs lack precise and publishable specifications, are commonly opaque to non-technical stakeholders, and are often difficult to navigate even for technical team members. They also provide poor support for modifying and reusing models, and often perform poorly. All of this raises numerous challenges to collaboration within and between the diverse teams critical for impactful modeling.

A new generation of modeling methods grounded in applied category theory offers the potential to overcome many of these barriers. Our recent work has demonstrated the promise of these methods as part of the AlgebraicJulia open-source software project [B1,B2,L]. In this application, we seek to extend our work to ABMs. We plan to develop models that offer visual transparency for stakeholders, modularity and reuse, easy optimization, alignment between types of heterogeneity, formal migration in support of model evolution, and a syntax/semantics separation supporting flexibility in model operation across interdisciplinary teams.

These new methods embody the following principles:

Modularity: Models of specific subsystems may be constructed individually by different domain experts, then coupled together using appropriate principles, supporting ongoing collaboration by the parties.

Hierarchy: Multiscale models, such as models of spatially distributed human and animal populations, can be constructed in a way that mirrors the hierarchical structure of the system.

Robustness: Models are well-defined mathematical structures, which may be specified, optimized, evolved, and visualized using high-level operations. Simulation code is generated from the specification, reducing programming effort and errors, and accelerating the feedback cycle with stakeholders.

Clarity, accessibility, and transparency: Rigorous yet intuitive diagrammatic languages, such as wiring diagrams, support clear communication of model structure. This allows input and critique from all members of a modelling team, regardless of mathematical experience.

Objectives

The overall objective of our proposed work is to reduce barriers to team-based and richly data-informed health dynamic modeling by using categorical methods to realize visually and mathematically transparent, modular, flexible, readily evolvable, efficient and reproducible agent-based modelling. We propose five lines of work. (1)-(3) will be undertaken at the ICMS facility during the event, while (4)-(5) will be conducted remotely and continue after the event:

  1. Developing a mathematical framework for ABMs.
  2. Implementing this framework in AlgebraicJulia.
  3. Creating five ABMs using our framework for teaching purposes, and beginning to develop a full-scale production-quality ABM within this framework. Here we shall draw on Osgood’s training library of over 100 diverse health science models.
  4. Building a real-time collaborative web-based interface to let teams of users simultaneously and collaboratively construct, view, modify and interact with ABMs.
  5. Evaluating, refining and disseminating the above via training bootcamps, incubators, hackathons, Osgood’s online video series [O] and sponsored projects.

While development will concentrate on these five lines of work in successive periods of the schedule, we anticipate that contributors from all spheres of work will be present throughout all six weeks of the proposed event.

The team

The team are:

• Nathaniel D. Osgood: computer scientist and public health modeler with dozens of peer-reviewed publications on ABMs; provincial director of COVID-19 modeling in Canada; supervised creation of ABMs used for COVID-19 decision-making across Canada and Australia. Contributed to development of StockFlow.jl using AlgebraicJulia as a framework for “stock and flow models” in epidemiology [B1,B2,M,SF].

• John C. Baez: mathematical physicist and applied category theorist; helped develop the “decorated cospan” framework widely used in AlgebraicJulia [B0], and helped apply this to stock and flow models in epidemiology [B1,B2].

• William Waites: computer scientist and member of the Digital Health and Wellness Research Group at Strathclyde; developed compositional and rule-based models of disease transmission [W1,W2].

• Xiaoyan Li: Doctoral student in Computer Science with extensive peer-reviewed contributions in health applications of dynamic modeling; lead developer of the StockFlow.jl package using AlgebraicJulia for stock and flow health models [B1,B2,M,SF].

• Sean Wu: Epidemiology (PhD) and public health (MPH) modeler; senior scientist at Merck; creator of an ABM package in R and a prototype ABM implementation in AlgebraicJulia [Wu].

• Evan Patterson: applied category theorist specializing in scientific computing, software systems and data science; helped design the AlgebraicJulia framework [AJ] and software using this framework for both Petri net [L] and stock and flow models [B1] of epidemiology.

• Kristopher Brown: research software engineer; led software development for ABMs and rewrite rules in AlgebraicJulia [Br,Wu].

Benefits

Building on the work proposed above in creating, evaluating and refining model libraries and updating our course exercises for use with the systems developed here, we will work to secure widespread dissemination of our ACT-based modeling tools and supporting material via our bootcamps, hackathons and online materials. The applicants and attendees have already led a successful and well-subscribed bootcamp on infectious disease modeling using ACT. Osgood has a demonstrated history of leading dozens of previous ABM bootcamps across Australia, Canada, the US and beyond, and has more than 3500 subscribers and more than 700,000 views on his teaching channel [O]. Thus, we are confident that successful delivery of this work will support effective opportunities for feedback, evaluation, uptake and dissemination of the categorical methods described here. The work will further offer significant insights for researchers in computer science (via conference publications, AlgebraicJulia contributions, and language design/engineering) and computational epidemiology, as well as ripple-through benefits for public health (PH) decision-makers and population health.

Methodological contributions will come in the form of an open-source project to support health agent-based modeling. For computational epidemiologists, this work is likely to illuminate little-recognized avenues for advancing their field, and to support more transparent, shareable, reusable, versatile and less error-prone models. For health scientists, we anticipate our work will expand data availability and reduce the cognitive and computational burdens and collaborative hurdles associated with modeling, leading to more rapid insight regarding intervention tradeoffs and indirectly improving PH decision-making via our extensive modeling application collaborations. While the effects are more indirect, we expect that decision-makers in PH (with whom we maintain extensive collaborations) will benefit from this research via earlier warnings of emergent PH concerns, more reliable and transparent models, improved ability to explore scenarios, greater confidence regarding intervention tradeoffs, better understanding of the drivers underlying observed PH trends, and enhanced decision-making.

Methodological Details

Adapting the approach employed to create software for stock and flow [B1,B2] and Petri net [L] models, we will encode the syntax of ABMs using copresheaf categories, which are well supported in AlgebraicJulia [AJ]. In standard fashion, a functor into \mathsf{Set} will serve to encode a particular ABM. Various functors will then allow various rewriting-based semantics to execute or analyze this ABM. Semantics that we seek to support (in decreasing order of priority) include single-realization simulation, ensemble simulation, sensitivity analyses, calibration, parameter sampling via Approximate Bayesian Computation, and execution using Sequential Monte Carlo methods. Drawing on insights from our previous work, we plan to use Julia macros to create a declarative domain-specific language supporting the specification of ABMs. Our work will draw on the success of a recent effort by our collaborators [Wu] to implement a simple version of individual-based models using the AlgebraicJulia package for rewrite rules [Br], but will move far beyond what was accomplished in that work by supporting both higher-level ABM constructs critical for model transparency and a repertoire of mechanisms for agent-based modeling. Constructs that we aim to support include, but are not limited to: statecharts, mode-dependent wiring diagrams, and ODEs for characterizing and evolving state; populations and queries over them; and scenario definitions. We will additionally seek to support (in order of decreasing priority) hierarchical nesting of agents, network and spatial embedding, and between-agent interaction mechanisms, such as message exchange.
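
To make the first step concrete, here is a minimal sketch of what encoding syntax with a copresheaf category looks like in AlgebraicJulia, using Catlab’s acset machinery (assuming a reasonably recent version of Catlab). The schema and its contents are invented for illustration; designing the actual schemas for ABMs is precisely what this line of work is about.

```julia
# Minimal sketch: a copresheaf (acset) on a tiny invented schema, in Catlab.
# The real schemas for ABMs are what the proposed work aims to design.
using Catlab, Catlab.CategoricalAlgebra

@present SchPopulation(FreeSchema) begin
  Agent::Ob
  State::Ob
  state::Hom(Agent, State)      # each agent occupies exactly one state
end

# A functor from this schema into Set is an acset; the macro generates its type.
@acset_type Population(SchPopulation)

pop = Population()
add_parts!(pop, :State, 3)                         # e.g. susceptible, infective, recovered
add_parts!(pop, :Agent, 5, state=[1, 1, 2, 2, 3])  # five agents and the states they sit in
```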

If there is insufficient time to adequately support all mechanisms, we shall pursue them according to priority and ease of characterization within the developed mathematics and API. During the event, we plan to implement these methods in five ABMs for pertussis, COVID-19, gestational and type 2 diabetes, opioid-related harms, and antimicrobial resistance.

References

[AJ] AlgebraicJulia: Bringing compositionality to technical computing. Available at
https://www.algebraicjulia.org.

[B0] J. C. Baez, K. Courser and C. Vasilakopoulou, Structured versus decorated cospans, Compositionality 4, 3 (2022).

[B1] J. C. Baez, X. Li, S. Libkind, N. D. Osgood and E. Patterson, Compositional modeling with stock and flow diagrams, Proceedings Fifth International Conference on Applied Category Theory, EPTCS 380 (2022), 77–96.

[B2] J. C. Baez, X. Li, S. Libkind, N. D. Osgood and E. Redekopp, A categorical framework for modeling with stock and flow diagrams, to appear in Mathematics for Public Health, Springer, Berlin.

[Br] K. Brown, E. Patterson, T. Hanks, and J. Fairbanks, Computational category-theoretic rewriting, in Graph Transformation: 15th International Conference, ICGT 2022, 2022, pp. 155–172.

[L] S. Libkind, A. Baas, M. Halter, E. Patterson and J. P. Fairbanks, An algebraic framework for structured epidemic modelling, Philosophical Transactions of the Royal Society A 380 (2022), 20210309.

[Mc] G. W. McDonald and N. D. Osgood, Agent-based modeling and its tradeoffs: an introduction and examples.

[M] N. Meadows, X. Li and N. D. Osgood, Hierarchical and upstream-downstream composition of stock and flow models.

[O] N. D. Osgood, Teaching videos.

[SF] StockFlow.jl. Available at https://github.com/AlgebraicJulia/StockFlow.jl.

[W1] W. Waites, M. Cavaliere, D. Manheim, J. Panovska-Griffiths, and V. Danos, Rule-based epidemic models, Journal of Theoretical Biology 530 (2021), 110851.

[W2] W. Waites, M. Cavaliere, V. Danos et al., Compositional modelling of immune response and virus transmission dynamics, Philosophical Transactions of the Royal Society A 380 (2022), 20210307.

[Wu] S. L. Wu, S. Libkind, K. Brown, E. Patterson and J. Fairbanks, Individual.jl: Rewriting individual-based models for epidemiology using graph rewriting.


Compositional System Dynamics for Public Health

24 June, 2023

The Topos Institute has a colloquium every week. This time it’s by my old grad school pal Nate Osgood. He’s a computer scientist who helped lead Canada’s COVID modeling effort…

… and he is now spear-heading a project to develop software that uses ideas from category theory to make it easier to create models of epidemic disease!

I am completely biased, since I’m involved in this—but I am really excited about how category theory, computer science and epidemiology are meeting in this project and yielding practical tools for public health.

His talk is Thursday June 29, 10:00-11:00 Pacific Time. To watch it either live or recorded go here. It should be fun, because I believe he and his grad student Xiaoyan Li will attempt to assemble a disease model in real time using our software.

Here’s his talk abstract:

• Nathaniel Osgood, Towards compositional system dynamics for public health.

Abstract: For decades, System Dynamics (SD) modeling has served as a prominent, diagram-centric methodology used for public health modeling. Much of its strength arises from its versatile use of 3 types of diagrams, with each serving both to elevate transparency across the interdisciplinary teams responsible for most impactful models, and to reason about patterns of system behavior. Causal loop diagrams (CLDs) are used in semi-qualitative processes early in the modeling process and seek to support insight into feedback structure, behavioral modes, and leverage points. As modeling proceeds, system structure diagrams further distinguish stocks (accumulations) from flows and material from informational dependencies. Stock & flow diagrams build on that representation to characterize mathematical dependencies, quantify parameters and initial values for stocks, and have been particularly widely used in scenario simulation in public health and mathematical epidemiology. While ubiquitous use of diagrams renders SD modeling markedly effective in supporting team science and shaping stakeholders’ mental models, existing tools suffer from a number of shortcomings. These include poor support for modularity, cumbersome and obscure model stratification, and an inability to capture the relationships between the 3 diagram types.

Within this talk, we describe initial progress towards creating a framework for compositional System Dynamics, including theory, API support via StockFlow.jl within AlgebraicJulia, and ModelCollab—a real-time collaborative tool to support interdisciplinary teams in modularly building, composing and flexibly analyzing Stock & Flow diagrams. Our approach separates syntax from semantics, and characterizes diagrams using copresheaves with a schema category. Diagram composition draws on the theory of structured cospans and undirected wiring diagrams, and employs pullbacks for model stratification. Model interpretation is achieved via functorial semantics, with ordinary differential equations being just one of several semantic domains supported. After describing the current state of implementation, we describe plans for future work, including enriching support for CLDs, and adding support for several computational statistics algorithms and additional types of structurally-informed model analyses. This is joint work with John Baez, Evan Patterson, Nicholas Meadows, Sophie Libkind, Alex Alegre and Eric Redekopp.

For details (if this talk abstract wasn’t long enough for you), try our paper:

• John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Eric Redekopp, A categorical framework for modeling with stock and flow diagrams, to appear in Mathematics for Public Health, Springer.


Categories for Epidemiology

4 May, 2023

Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Eric Redekopp and I have been creating software for modeling the spread of disease… with the help of category theory!

Lots of epidemiologists use “stock-flow diagrams” to describe ordinary differential equation (ODE) models of disease dynamics. We’ve created two tools to help them.

The first, called StockFlow.jl, is based on category theory and written in AlgebraicJulia, a framework for programming with categories that many people at or associated with Topos have been developing. The second, called ModelCollab, runs on web browsers and serves as a graphical user interface for StockFlow.jl.

Using ModelCollab requires no knowledge of Julia or category theory! This feature should be useful in “participatory modeling”, an approach where models are built with the help of diverse stakeholders. However, as we keep introducing new features in StockFlow.jl, it takes time to implement them in ModelCollab.

But what’s a stock-flow diagram, and what does our software let you do with them?

The picture here shows an example: a simple disease model where Susceptible people become Infective, then Recovered, then Susceptible again.

The boxes are “stocks” and the double-edged arrows are “flows”. There are also blue “links” from stocks to “variables”, and from stocks and variables to flows. This picture doesn’t show the formulas that say exactly how the variables depend on stocks, and how the flows depend on stocks and variables. So, this picture doesn’t show the whole thing. It’s really just what they call a “system structure diagram”: a stock-flow diagram missing the quantitative information that you need to get a system of ODEs from it. A stock-flow diagram, on the other hand, uniquely specifies a system of first-order ODEs.
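
For example, the susceptible–infective–recovered–susceptible diagram described above specifies a system of ODEs along the lines of the following hand-written sketch. The rate laws here are made up for illustration; in practice the diagram carries its own formulas, and software like StockFlow.jl generates the equations from the diagram rather than having you write them by hand.

```julia
# Hand-written sketch of the first-order ODE system such a diagram specifies.
# The rate laws below are illustrative, not taken from any particular model.
function sirs!(du, u, p, t)
    S, I, R = u
    β, γ, ω = p                   # infection, recovery and waning-immunity rates
    N = S + I + R                 # a "variable" computed from the stocks
    infection = β * S * I / N     # each flow is a function of stocks and variables
    recovery  = γ * I
    waning    = ω * R
    du[1] = waning - infection    # dS/dt: inflows minus outflows for each stock
    du[2] = infection - recovery  # dI/dt
    du[3] = recovery - waning     # dR/dt
end

# Solving it, assuming DifferentialEquations.jl is available:
# using DifferentialEquations
# prob = ODEProblem(sirs!, [990.0, 10.0, 0.0], (0.0, 100.0), (0.3, 0.1, 0.01))
# sol = solve(prob)
```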

Modelers often regard diagrams as an informal step toward a mathematically rigorous formulation of a model in terms of ODEs. However, we’ve shown that stock-flow diagrams have a precise mathematical syntax! They are objects in a category \mathsf{StockFlow}, while “open” stock-flow diagrams, where things can flow in and out of the whole system, are horizontal 1-cells in a double category \mathbb{O}\mathbf{pen}(\mathsf{StockFlow}). If you know category theory you can read a paper we wrote with Evan Patterson where we explain this:

• John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Evan Patterson, Compositional modeling with stock and flow diagrams. To appear in Proceedings of Applied Category Theory 2022.

If you don’t, we have a gentler paper for you:

• John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Eric Redekopp, A categorical framework for modeling with stock and flow diagrams, to appear in Mathematics for Public Health, Springer, Berlin.

Why does it help to formalize the syntax of stock-flow diagrams using category theory? There are many reasons, but here are three:

1. Functorial Semantics

Our software lets modelers separate the syntax of stock and flow diagrams from their semantics: that is, the various uses to which these diagrams are put. Different choices of semantics are described via different functors. This idea, called “functorial semantics”, goes back to Lawvere and is popular in certain realms of theoretical computer science.

Besides the ODE semantics, we have implemented functors that turn stock-flow diagrams into other widely used diagrams: “system structure diagrams”, which I already explained, and “causal loop diagrams”. It doesn’t really matter much here, but a causal loop diagram ignores the distinction between stocks, flows and variables, lumps them all together, and has arrows saying what affects what:

These other forms of semantics capture purely qualitative features of stock and flow models. In the future, people can implement still more forms of semantics, like stochastic differential equation models!

So, instead of a single monolithic model, we have something much more flexible.
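
As a toy illustration of what such a qualitative, “forgetful” semantics does (this is not the actual functor implemented in StockFlow.jl, and the data structure below is invented for the sketch), one can strip a stock-flow description down to a bare influence graph, which is roughly what a causal loop diagram records:

```julia
# Toy "forgetful" semantics: drop the formulas, keep only what affects what.
# The data structure is invented for this sketch.
struct StockFlowSketch
    stocks::Vector{Symbol}
    flows::Dict{Symbol,Tuple{Symbol,Symbol}}   # flow => (source stock, target stock)
    depends::Dict{Symbol,Vector{Symbol}}       # flow => stocks and variables it reads
end

function influence_graph(sf::StockFlowSketch)
    edges = Set{Pair{Symbol,Symbol}}()
    for (f, ends) in sf.flows
        src, tgt = ends
        push!(edges, f => src)                 # a flow affects the stock it drains...
        push!(edges, f => tgt)                 # ...and the stock it fills
        for x in get(sf.depends, f, Symbol[])
            push!(edges, x => f)               # and is affected by whatever it reads
        end
    end
    edges                                      # a bare "what affects what" graph
end

sir = StockFlowSketch([:S, :I, :R],
    Dict(:infection => (:S, :I), :recovery => (:I, :R)),
    Dict(:infection => [:S, :I], :recovery => [:I]))
influence_graph(sir)
```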

2. Composition

ModelCollab provides a structured way to build complex stock-flow diagrams from small reusable pieces. These pieces are open stock-flow diagrams, and sticking them together amounts to composing them.

ModelCollab lets users save these diagrams and retrieve them for reuse as parts of various larger models. Since ModelCollab can run on multiple web browsers, it lets members of a modeling team compose models collaboratively. This is a big advance on current systems, which are not optimized for collaborative work.
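
Here is a toy sketch of the idea behind composing open diagrams: glue two pieces together along the stocks they share. The real implementation uses structured cospans and undirected wiring diagrams rather than matching stock names, but on this small example the effect is the same.

```julia
# Toy sketch: compose two open models by identifying stocks with the same name.
struct OpenModelSketch
    stocks::Vector{Symbol}
    flows::Vector{Tuple{Symbol,Symbol,Symbol}}   # (flow name, source stock, target stock)
end

glue(a::OpenModelSketch, b::OpenModelSketch) =
    OpenModelSketch(union(a.stocks, b.stocks), vcat(a.flows, b.flows))

sir  = OpenModelSketch([:S, :I, :R], [(:infection, :S, :I), (:recovery, :I, :R)])
wane = OpenModelSketch([:R, :S],     [(:waning, :R, :S)])

sirs = glue(sir, wane)    # three stocks and three flows: the S-I-R-S loop from before
```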

This picture shows two small stock-flow diagrams being composed in ModelCollab:


Some of the underlying math here was developed in earlier work applying category theory to epidemiological modeling, also done by people at Topos and their collaborators:

• Sophie Libkind, Andrew Baas, Micah Halter, Evan Patterson and James P. Fairbanks, An algebraic framework for structured epidemic modelling, Philosophical Transactions of the Royal Society A 380 (2022), 20210309.

3. Stratification

Our software also allows users to “stratify” models: that is, refine them by subdividing a single population (stock) into several smaller populations with distinct features. For example, you might take a disease model and break each stock into different age groups.

In contrast to the global changes commonly required to stratify stock-flow diagrams, our software lets users build a stratified diagram as a “pullback” of simpler diagrams, which can be saved for reuse. Pullbacks are a concept from category theory, and here we are using pullbacks in the category whose objects are system structure diagrams. Remember, these are like stock and flow diagrams, but lacking the quantitative information describing the rates of flows. After a system structure diagram has been constructed, this information can be added to obtain a stock and flow diagram.
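
Here is a toy sketch of the pullback idea, carried out only at the level of stocks. Both a disease model and an age “type” model map their stocks down to a shared set of coarse labels, and the pullback pairs up exactly the stocks that agree on those labels. The names below are made up; the real construction takes pullbacks of whole system structure diagrams, not just of their sets of stocks.

```julia
# Toy sketch of stratification as a pullback of finite functions over a shared
# set of coarse labels.

# Leg 1: stocks of a disease model, each labelled by its coarse disease state.
disease_leg = Dict(:S => :S, :I => :I, :R => :R)

# Leg 2: stocks of an age "type" model, labelled by the same coarse states.
age_leg = Dict(:S_young => :S, :S_old => :S,
               :I_young => :I, :I_old => :I,
               :R_young => :R, :R_old => :R)

# The pullback: pairs of stocks that agree in the shared codomain.
stratified_stocks = Tuple{Symbol,Symbol}[]
for (d, coarse_d) in disease_leg, (a, coarse_a) in age_leg
    coarse_d == coarse_a && push!(stratified_stocks, (d, a))
end
stratified_stocks   # six stratified stocks: (:S, :S_young), (:S, :S_old), (:I, :I_young), ...
```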

This picture shows two different models stratified in two different ways, creating four larger models. I won’t try to really explain this here. But at least you can get a tiny glimpse of how complicated these models get. They get a lot bigger! That’s why we need software based on good math to deal with them efficiently.

References

[AJ] AlgebraicJulia: Bringing compositionality to technical computing.

[B1] John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Evan Patterson, Compositional modeling with stock and flow diagrams. To appear in Proceedings of Applied Category Theory 2022.

[B2] John C. Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood and Eric Redekopp, A categorical framework for modeling with stock and flow diagrams, to appear in Mathematics for Public Health, Springer, Berlin.

[H] P. S. Hovmand, Community Based System Dynamics, Springer, Berlin, 2014.

[L] Sophie Libkind, Andrew Baas, Micah Halter, Evan Patterson and James P. Fairbanks, An algebraic framework for structured epidemic modelling, Philosophical Transactions of the Royal Society A 380 (2022), 20210309.

[MC] ModelCollab: A web-based application for collaborating on simulation models in real-time using Firebase.

[SF] StockFlow.jl.


Compositional Modeling with Decorated Cospans

27 June, 2022

It’s finally here: software that uses category theory to let you build models of dynamical systems! We’re going to train epidemiologists to use this to model the spread of disease. My first talk on this will be on Wednesday June 29th. You’re invited!

Compositional modeling with decorated cospans, Graph Transformation Theory and Practice (GReTA) seminar, 19:00 UTC, Wednesday 29 June 2022.

You can attend live on Zoom if you click here. You can also watch it live on YouTube, or later recorded, here:

Abstract. Decorated cospans are a general framework for composing open networks and mapping them to dynamical systems. We explain this framework and illustrate it with the example of stock and flow diagrams. These diagrams are widely used in epidemiology to model the dynamics of populations. Although tools already exist for building these diagrams and simulating the systems they describe, we have created a new software package called StockFlow which uses decorated cospans to overcome some limitations of existing software. Our approach cleanly separates the syntax of stock and flow diagrams from the semantics they can be assigned. We have implemented a semantics where stock and flow diagrams are mapped to ordinary differential equations, although others are possible. We illustrate this with code in StockFlow that implements a simplified version of a COVID-19 model used in Canada. This is joint work with Xiaoyan Li, Sophie Libkind, Nathaniel Osgood and Evan Patterson.

My talk is at a seminar on graph rewriting, so I’ll explain how the math applies to graphs before turning to ‘stock-flow diagrams’, like this one here:

Stock-flow diagrams are used to create models in epidemiology. There’s a functor mapping them to dynamical systems.

But the key idea in our work is ‘compositional modeling’. This lets different teams build different models and then later assemble them into a larger model. The most popular existing software for stock-flow diagrams does not allow this. Category theory to the rescue!

This work would be impossible without the right team! Brendan Fong developed decorated cospans and then started the Topos Institute. My coauthors Evan Patterson and Sophie Libkind work there, and they know how to program using category theory.

Evan started a seminar on epidemiological modeling – and my old grad school pal Nate Osgood showed up, along with his grad student Xiaoyan Li! Nate is a computer scientist who now runs the main COVID model for the government of Canada.

So, all together we have serious expertise in category theory, computer science, and epidemiology. Any two parts alone would not be enough for this project.

And I’m not even listing all the people whose work was essential. For example, Kenny Courser and Christina Vasilakopoulou helped modernize the theory of decorated cospans in a way we need here. James Fairbanks, Evan and others designed AlgebraicJulia, the software environment that our package StockFlow relies on. And so on!

Moral: to apply category theory to real-world problems, you need a team.

And we’re just getting started!


Epidemiological Modeling With Structured Cospans

19 October, 2020

This is a wonderful development! Micah Halter and Evan Patterson have taken my work on structured cospans with Kenny Courser and open Petri nets with Jade Master, together with Joachim Kock’s whole-grain Petri nets, and turned them into a practical software tool!

Then they used that to build a tool for ‘compositional’ modeling of the spread of infectious disease. By ‘compositional’, I mean that they make it easy to build more complex models by sticking together smaller, simpler models.

Even better, they’ve illustrated the use of this tool by rebuilding part of the model that the UK has been using to make policy decisions about COVID-19.

All this software was written in the programming language Julia.

I had expected structured cospans to be useful in programming and modeling, but I didn’t expect it to happen so fast!

For details, read this great article:

• Micah Halter and Evan Patterson, Compositional epidemiological modeling using structured cospans, 17 October 2020.

Abstract. The field of applied category theory (ACT) aims to put the compositionality inherent to scientific and engineering processes on a firm mathematical footing. In this post, we show how the mathematics of ACT can be operationalized to build complex epidemiological models in a compositional way. In the first two sections, we review the idea of structured cospans, a formalism for turning closed systems into open ones, and we illustrate its use in Catlab through the simple example of open graphs. Finally, we put this machinery to work in the setting of Petri nets and epidemiological models. We construct a portion of the COEXIST model for the COVID-19 pandemic and we simulate the resulting ODEs.

You can see related articles by James Fairbanks, Owen Lynch and Evan Patterson here:

AlgebraicJulia Blog.

Also try these videos:

• James Fairbanks, AlgebraicJulia: Applied category theory in Julia, 29 July 2020.

• Evan Patterson, Realizing applied category theory in Julia, 16 January 2020.

I’m biased, but I think this is really cool cutting-edge stuff. If you want to do work along these lines let me know here and I’ll get Patterson to take a look.

Here’s part of a network created using their software:


Applied Category Theory at NIST (Part 3)

22 February, 2020

Sadly, this workshop has been cancelled due to the coronavirus pandemic. It may be postponed to a later date.

My former student Blake Pollard is working at the National Institute of Standards and Technology. He’s working with Spencer Breiner and Eswaran Subrahmanian, who are big advocates of using category theory to organize design and manufacturing processes. In the spring of 2018 they had a workshop on applied category theory with a lot of honchos from industry and government in attendance—you can see videos by clicking the link.

This spring they’re having another workshop on this topic!

Applied Category Theory Workshop, April 8-9, 2020, National Institute of Standards and Technology, Gaithersburg, Maryland. Organized by Spencer Breiner, Blake Pollard and Eswaran Subrahmanian.

The focus of this workshop is on fostering the development of tooling and use-cases supporting the applied category theory community. We are particularly interested in bringing together practitioners who are engaged with susceptible domains as well as those involved in the implementation, support, and utilization of software and other tools. There will be a number of talks/demos showcasing existing approaches as well as ample time for discussion.

Here are the speakers listed so far:

• John Baez, University of California, Riverside

• Arquimedes Canedo, Siemens

• Daniel Cicala, University of New Haven

• James Fairbanks, Georgia Tech Research Institute

• Jules Hedges, Max Planck Institute for the Mathematical Sciences

• Jelle Herold, Statebox

• Evan Patterson, Stanford University

• Qunfen Qi, University of Huddersfield

• Christian Williams, University of California, Riverside

• Ryan Wisnesky, Conexus.ai

I’ll also be giving a separate talk on “ecotechnology” at NIST on Friday April 10th; more about that later!


Complex Adaptive System Design (Part 7)

19 February, 2018

In March, I’ll be talking at Spencer Breiner‘s workshop on Applied Category Theory at the National Institute of Standards and Technology. I’ll be giving a joint talk with John Foley about our work using operads to design networks. This work is part of the Complex Adaptive System Composition and Design Environment project being done by Metron Scientific Solutions and managed by John Paschkewitz at DARPA.

I’ve written about this work before:

• Complex adaptive systems design: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6.

But we’ve done a lot more, and my blog articles are having trouble keeping up! So I’d like to sketch out the big picture as it stands today.

If I had to summarize, I’d say we’ve developed a formalism for step-by-step compositional design and tasking, using commitment networks. But this takes a while to explain.

Here’s a very simple example of a commitment network:

It has four nodes, which represent agents: a port, a helicopter, a UAV (an unmanned aerial vehicle, or drone) and a target. The edges between these nodes describe relationships between these agents. Some of these relationships are ‘commitments’. For example, the edges labelled ‘SIR’ say that one agent should ‘search, intervene and rescue’ the other.

Our framework for dealing with commitment networks has some special features. It uses operads, but this isn’t really saying much. An ‘operad’ is a bunch of ways of sticking things together. An ‘algebra’ of the operad gives a particular collection of these things, and says what we get when we stick them together. These concepts are extremely general, so there’s a huge diversity of operads, each with a huge diversity of algebras. To say one is using operads to solve a problem is a bit like saying one is using math. What matters more is the specific kind of operad one is using, and how one is using it.

For our work, we needed to develop a new class of operads called network operads, which are described here:

• John Baez, John Foley, Joseph Moeller and Blake Pollard, Network models.

In this paper we mainly discuss communication networks. Subsequently we’ve been working on a special class of network operads that describe how to build commitment networks.

Here are some of the key ideas:

• Using network operads we can build bigger networks from smaller ones by overlaying them. David Spivak’s operad of wiring diagrams only let us ‘wire together’ smaller networks to form bigger ones:

Here networks X1, X2 and X3 are being wired together to form Y.

Network operads also let us wire together networks, but in addition they let us take one network:

and overlay another:

to create a larger network:

This is a new methodology for designing systems. We’re all used to building systems by wiring together subsystems: anyone who has a home stereo system has done this. But overlaying systems lets us do more. For example, we can take two plans of action involving the same collection of agents, and overlay them to get a new plan. We’ve all done this, too: you tell a bunch of people to do things… and then tell the same people, or an overlapping set of people, to do some other things. But lots of problems can arise if you aren’t careful. A mathematically principled approach can avoid some of these problems.

• The nodes of our networks represent agents of various types. The edges represent various relationships between agents. For example, they can represent communication channels. But more interestingly, they can represent commitments. For example, we can have an edge from A to B saying that agent A has to go rescue agent B. We call this kind of network a commitment network.

• By overlaying commitment networks, we can not only build systems out of smaller pieces but also build complicated plans by overlaying smaller pieces of plans. Since ‘tasking’ means telling a system what to do, we call this compositional tasking.

• If one isn’t careful, overlaying commitment networks can produce conflicts. Suppose we have a network with an edge saying that agent A has to rescue agent B. On top of this we overlay a network with an edge saying that A has to rescue agent C. If A can’t do both of these tasks at once, what should A do? There are various choices. We need to build a specific choice into the framework, so we can freely overlay commitment networks and get a well-defined result that doesn’t overburden the agents involved. We call this automatic deconflicting.

• Our approach to automatic deconflicting uses an idea developed by the famous category theorist Bill Lawvere: graphic monoids. I’ll explain these later, along with some of their side-benefits.

• Network operads should let us do step-by-step compositional tasking. In other words, they should let us partially automate the process of tasking networks of agents, both

1) compositionally: tasking smaller networks and then sticking them together, e.g. by overlaying them, to get larger networks,

and

2) in a step-by-step way, starting at a low level of detail and then increasing the amount of detail.

To do this we need not just operads but their algebras.

• Remember, a network operad is a bunch of ways to stick together networks of some kind, e.g. by overlaying them. An algebra of this operad specifies a particular collection of networks of this kind, and says what we actually get when we stick them together.

So, a network operad can have one algebra in which things are described in a bare-bones, simplified way, and another algebra in which things are described in more detail. Indeed it will typically have many algebras, corresponding to many levels of detail, but for simplicity let’s just think about two.

When we have a ‘less detailed’ algebra A and a ‘more detailed’ algebra A', they will typically be related by a map

f \colon A' \to A

which ‘forgets the extra details’. This sort of map is called a homomorphism of algebras. We give examples in our paper Network models.

But what we usually want to do, when designing a system, is not forget extra detail, but rather add extra detail to a rough specification. There is not always a systematic way to do this. If there is, then we may have a homomorphism

g \colon A \to A'

going back the other way. This lets us automate the process of filling in the details. But we can’t usually count on being able to do this. So, often we may have to start with an element of A and search for an element of A' that is mapped to it by f : A' \to A. And typically we want this element to be optimal, or at least ‘good enough’, according to some chosen criteria. Expressing this idea formally helps us figure out how to automate the search. John Foley, in particular, has been working on this.
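
Here is a toy sketch of that search problem, with every specific detail invented for illustration: the less detailed description just lists which links are required, the more detailed one also assigns each link a bandwidth, the homomorphism f forgets the bandwidths, and we search for a detailed element lying over a given coarse one that is optimal under a made-up budget.

```julia
# Toy sketch of the search: find a detailed design lying over a rough specification.
# Links, bandwidth levels and the budget are all made up for illustration.

# f: forget the extra detail, keeping only which links are present.
forget(detailed::Dict) = Set(keys(detailed))

# Search over detailed elements lying over `spec`: assign each required link a
# bandwidth, maximizing total bandwidth subject to a budget.
function refine(spec::Vector, levels = 1:3, budget = 5)
    best, best_total = nothing, -1
    for choice in Iterators.product((levels for _ in spec)...)
        detailed = Dict(zip(spec, choice))
        total = sum(values(detailed))
        if total <= budget && total > best_total
            best, best_total = detailed, total
        end
    end
    best    # by construction, forget(best) == Set(spec)
end

spec = [(:port, :uav), (:uav, :target)]
refine(spec)   # e.g. bandwidth 3 on one link and 2 on the other
```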

That’s an overview of our ideas.

Next, for the mathematically inclined, I want to give a few more details on one of the new elements not mentioned in our Network models paper: ‘graphic monoids’.

Graphic monoids

In our paper Network models we explain how the ‘overlay’ operation makes the collection of networks involving a given set of agents into a monoid. A monoid is a set M with a product that is associative and has an identity element 1:

(xy)z = x(yz)
1 x = x = x 1

In our application, this product is overlaying two networks.

A graphic monoid is one in which the graphic identity

x y x = x y

holds for all x,y.

To understand this identity, let us think of the elements of the monoid as “commitments”. The product x y means “first committing to do x, then committing to do y”. The graphic identity says that if we first commit to do x, then y, and then x again, it’s the same as first committing to do x and then y. Committing to do x again doesn’t change anything!

In particular, in any graphic monoid we have

xx = x 1 x = x 1 = x

so making the same commitment twice is the same as making it once. Mathematically we say every element x of a graphic monoid is idempotent:

x^2 = x

A commutative monoid obeying this law x^2 = x automatically obeys the graphic identity, since then

x y x = x^2 y = x y

But for a noncommutative monoid, the graphic identity is stronger than x^2 = x. It says that after committing to x, no matter what intervening commitments one might have made, committing to x again has no further effect. In other words: the intervening commitments did not undo the original commitment, so making the original commitment a second time has no effect! This captures the idea of how promises should behave.
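
Here is a toy example of a noncommutative graphic monoid, sketched in code: sequences of distinct commitments, where the product concatenates two sequences and drops repeated commitments, keeping their first occurrences. This is only an illustration of the axiom, not one of the commitment network models themselves.

```julia
# Toy graphic monoid: sequences of distinct commitments, where the product
# concatenates and drops repeats, keeping first occurrences. The identity
# element is the empty sequence.
function combine(x::Vector{Symbol}, y::Vector{Symbol})
    out = copy(x)
    for c in y
        c in out || push!(out, c)
    end
    out
end

x = [:rescue_B]                    # commit A to rescue B
y = [:patrol_port, :rescue_B]      # patrol the port, and (again) rescue B

combine(combine(x, y), x) == combine(x, y)   # graphic identity: x y x = x y
combine(x, x) == x                           # idempotence: x^2 = x
combine(x, y) == combine(y, x)               # false: the monoid is not commutative
```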

As I said, for any network model, the set of all networks involving a fixed set of agents is a monoid. In a commitment network model, this monoid is required to be a graphic monoid. Joseph Moeller is writing a paper that shows how to construct a large class of commitment network models. We will follow this with a paper illustrating how to use these in compositional tasking.

For now, let me just mention a side-benefit. In any graphic monoid we can define a relation x \le y by

x \le y  \; \iff \; x a = y for some a

This makes the graphic monoid into a partially ordered set, meaning that these properties hold:

reflexivity: x \le x

transitivity: x \le y , y \le z \; \implies \; x \le z

antisymmetry: x \le y , y \le x \; \implies \; x = y

In the context of commitment networks, x \le y means that starting from x we can reach y by making some further commitment a: that is, x a = y for some a. So, as we ‘task’ a collection of agents by giving them more and more commitments, we move up in this partial order.


The whole series of posts:

Part 1. CASCADE: the Complex Adaptive System Composition and Design Environment.

Part 2. Metron’s software for system design.

Part 3. Operads: the basic idea.

Part 4. Network operads: an easy example.

Part 5. Algebras of network operads: some easy examples.

Part 6. Network models.

Part 7. Step-by-step compositional design and tasking using commitment networks.

Part 8. Compositional tasking using category-valued network models.

Part 9. Network models from Petri nets with catalysts.

Part 10. Two papers reviewing the whole project.


Pyrofex

4 February, 2018

Mike Stay is applying category theory to computation at a new startup called Pyrofex. And this startup has now entered a deal with RChain.

But let me explain why I’m interested. I’m interested in applied category theory… but this is special.

Mike Stay came to work with me at U.C. Riverside after getting a master’s in computer science at the University of Auckland, where he worked with Cristian Calude on algorithmic randomness. For example:

• Cristian S. Calude and Michael A. Stay, From Heisenberg to Gödel via Chaitin, International Journal of Theoretical Physics 44 (2008), 1053–1065.

• Cristian S. Calude and Michael A. Stay, Most programs stop quickly or never halt, Advances in Applied Mathematics, 40 (2008), 295–308.

It seems like ages ago, but I dimly remember looking at his application, seeing the title of the first of these two papers, and thinking “he’s either a crackpot, or I’m gonna like him”.

You see, the lure of relating Gödel’s theorem to Heisenberg’s uncertainty principle is fatally attractive to many people who don’t really understand either. I looked at the paper, decided he wasn’t a crackpot… and yes, it turned out I liked him.

(A vaguely similar thing happened with my student Chris Rogers, who’d written some papers applying techniques from general relativity to the study of crystals. As soon as I assured myself this stuff was for real, I knew I’d like him!)

Since Mike knew a lot about computer science, his presence at U. C. Riverside emboldened me to give a seminar on classical versus quantum computation. I used this as an excuse to learn the old stuff about the lambda-calculus and cartesian closed categories. When I first started, I thought the basic story would be obvious: people must be making up categories where the morphisms describe processes of computation.

But I soon learned I was wrong: people were making up categories where objects were data types, but the morphisms were equivalence classes of things going between data types—and this equivalence relation completely washed out the difference between, say, a program that actually computes 237 × 419 and a program that just prints out 99303, which happens to be the answer to that problem.

In other words, the actual process of computation was not visible in the category-theoretic framework. I decided that to make it visible, what we really need are 2-categories in which 2-morphisms are ‘processes of computation’. Or in the jargon: objects are types, morphisms are terms, and 2-morphisms are rewrites.

It turned out people had already thought about this:

• Barnaby P. Hilken, Towards a proof theory of rewriting: the simply-typed 2λ-calculus, Theor. Comp. Sci. 170 (1996), 407–444.

• R. A. G. Seely, Weak adjointness in proof theory, in Proc. Durham Conf. on Applications of Sheaves, Springer Lecture Notes in Mathematics 753, Springer, Berlin, 1979, pp. 697–701.

• R. A. G. Seely, Modeling computations: a 2-categorical framework, in Proc. Symposium on Logic in Computer Science 1987, Computer Society of the IEEE, pp. 65–71.

But I felt this viewpoint wasn’t nearly as popular as it should be. It should be very popular, at least among theoretical computer scientists, because it describes what’s actually going on in the lambda-calculus. If you read a big fat book on the lambda-calculus, like Barendregt’s The Lambda Calculus: Its Syntax and Semantics, you’ll see it spends a lot of time on reduction strategies: that is, ways of organizing the process of computation. All this is buried in the usual 1-categorical treatment. It’s living up at the 2-morphism level!
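
Here is a toy sketch of the distinction in code, with everything invented for illustration. If we only record that two terms are equal, the computation disappears; if we keep the chain of rewrite steps as data, the process itself becomes something we can inspect and compare. That chain is playing the role of a 2-morphism.

```julia
# Toy sketch: keep rewrites as explicit data instead of quotienting them away.
abstract type Term end

struct Lit <: Term
    value::Int
end

struct Mul <: Term
    left::Term
    right::Term
end

# One rewrite rule: a product of two literals reduces to a literal.
rewrite_step(t::Mul) =
    (t.left isa Lit && t.right isa Lit) ? Lit(t.left.value * t.right.value) : t
rewrite_step(t::Term) = t

# The analogue of a 2-morphism: the whole chain of rewrites down to normal form,
# rather than just the bare fact that the two ends are "equal".
function rewrite_trace(t::Term)
    trace = Term[t]
    while true
        t2 = rewrite_step(t)
        t2 == t && return trace
        push!(trace, t2)
        t = t2
    end
end

rewrite_trace(Mul(Lit(237), Lit(419)))   # records both the original term and Lit(99303)
```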

Mike basically agreed with me. We started by writing an introduction to the usual 1-categorical stuff, and how it shows up in many different fields:

• John Baez and Michael Stay, Physics, topology, logic and computation: a Rosetta Stone, in New Structures for Physics, ed. Bob Coecke, Lecture Notes in Physics vol. 813, Springer, Berlin, 2011, pp. 95–172.

For financial reasons he had to leave U. C. Riverside and take a job at Google. But he finished his Ph.D. at the University of Auckland, with Cristian Calude and me as co-advisors. And a large chunk of his thesis dealt with cartesian closed 2-categories and their generalizations suitable for quantum computation:

• Michael Stay, Compact closed bicategories, Theory and Applications of Categories 31 (2016), 755–798.

Great stuff! My students these days are building on this math and marching ahead.

I said Mike ‘basically’ agreed with me. He agreed that we need to go beyond the usual 1-categorical treatment to model the process of computation. But when it came to applying this idea to computer science, Mike wasn’t satisfied with thinking about the lambda-calculus. That’s an old model of computation: it’s okay for a single primitive computer, but not good for systems where different parts are sending messages to each other, like the internet, or even multiprocessing in a single computer. In other words, the lambda-calculus doesn’t really handle the pressing issues of concurrency and distributed computation.

So, Mike wanted to think about more modern formalisms for computation, like the pi-calculus, using 2-categories.

He left Google and wrote some papers with Greg Meredith on these ideas, for example:

• Michael Stay and Lucius Gregory Meredith, Higher category models of the pi-calculus.

• Michael Stay and Lucius Gregory Meredith, Representing operational semantics with enriched Lawvere theories.

The second one takes a key step: moving away from 2-categories to graph-enriched categories, which are simpler and perhaps better.

Then, after various twists and turns, he started a company called Pyrofex with Nash Foster. Or perhaps I should say Foster started a company with Mike, since Foster is the real bigshot of the two. Here’s what their webpage says:

Hello—

My name is Nash Foster, and together with my friend and colleague Mike Stay, I recently founded a company called Pyrofex. We founded our company for one simple reason: we love to build large-scale distributed systems that are always reliable and secure and we wanted to help users like yourself do it more easily.

When Mike and I founded the company, we felt strongly that several key advances in programming language theory would ease the development of every day large-scale systems. However, we were not sure exactly how to expose the APIs to users or how to build the tool-sets. Grand visions compete with practical necessity, so we spent months talking to users just like you to discover what kinds of things you were most interested in. We spent many hours at white-boards, checking our work against the best CS theory we know. And, of course, we have enjoyed many long days of hacking in the hopes that we can produce something new and useful, for you.

I think this is the most exciting time in history to be a programmer (and I wrote my first program in the early 1980’s). The technologies available today are varied and compelling and exciting. I hope that you’ll be as excited as I am about some of the ideas we discovered while building our software.

And on January 8th, 2018, their company entered into an arrangement with Greg Meredith’s company Rchain. Below I’ll quote part of an announcement. I don’t know much about this stuff—at least, not yet. But I’m happy to see some good ideas getting applied in the real world, and especially happy to see Mike doing it.

The RChain Cooperative & Pyrofex Corporation announce Strategic Partnership

The RChain Cooperative and Pyrofex Corporation today announced strategically important service contracts and an equity investment intended to deliver several mutually beneficial blockchain solutions. RChain will acquire 1.1 million shares of Pyrofex Common Stock as a strategic investment. The two companies will ink separate service contracts to reinforce their existing relationship and help to align their business interests.

Pyrofex will develop critical tools and platform components necessary for the long-term success of the RChain platform. These tools are designed to leverage RChain’s unique blockchain environment and make blockchain development simpler, faster, and more effective than ever before. Under these agreements, Pyrofex will develop the world’s first decentralized IDE for writing blockchain smart contracts on the RChain blockchain.

Pyrofex also commits to continuing the core development of RChain’s blockchain platform and to organizing RChain’s global developer events and conferences.

Comments on the News

“We’re thrilled to have an opportunity to strengthen our relationship with the RChain Cooperative in 2018. Their commitment to open-source development mirrors our own corporate values. It’s a pleasure to have such a close relationship with a vibrant open-source community. I’ve rarely seen the kind of excitement the Coop’s members share and we look forward to delivering some great new technology this year.” — Nash E. Foster, Cofounder & CEO, Pyrofex Corp.

“Intuitive development tools are important for us and the blockchain ecosystem as a whole; we’re incredibly glad Pyrofex intends to launch their tools on RChain first. But, Ethereum has been a huge supporter of RChain and we’re pleased that Pyrofex intends to support Solidity developers as well. Having tools that will make it possible for developers to migrate smart contracts between blockchains is going to create tremendous possibilities.” — Lucius Greg Meredith, President, RChain Cooperative

Background

Pyrofex is a software development company co-founded by Dr. Michael Stay and Nash Foster in 2016. Dr. Stay and Greg Meredith are long-time colleagues and collaborators whose mutual research efforts form the mathematical foundations of RChain’s technology. One example of their collaboration is the work on the LADL (Logic as Distributed Law) algorithm. You can watch Dr. Stay present the latest research from the RChain Developers retreat.

Pyrofex and its development team should be familiar to those who follow the RChain Cooperative. They currently employ 14 full-time and several part-time developers dedicated to RChain platform development. Pyrofex CEO Nash Foster and Lead Project Manager Medha Parlikar have helped grow RChain’s development team to an impressive 20+ core devs, with plans to double by mid-2018. The team now includes multiple PhDs, ex-Googlers, and other world-class talents.

Every Wednesday, you will find Medha on our debrief updating the community with the latest developments in RChain. Here she is announcing the recent node.hello release along with a demo from core developer Chris Kirkwood-Watts.

The working relationship between the RChain Cooperative and Pyrofex has gone so well that the Board of Directors and the community at large have supported Pyrofex’s proposal to develop Cryptofex, the much needed developer tool kit for the decentralized world.

The RChain Cooperative is ecstatic to further develop its relationship with the Pyrofex team.

“As we fly towards Mercury and beyond, we all could use better tools.”
— The RChain Co-op Team.

Listen to the announcement from Greg Meredith as well as a short Q&A with Pyrofex CEO Nash Foster, from a recent community debrief.

The following is an excerpt from the soon to be released Cryptofex Whitepaper.

The Problem: Writing Software is Hard, Compiling is Harder

In 1983, Borland Software Corporation acquired a small compiler called Compas Pascal and released it in the United States as Turbo Pascal. It was the first product to integrate a compiler with the editor in which software was written, and for nearly a decade Borland’s products defined the market for integrated development environments (IDEs).

The year after Borland released TurboPascal, Ken Thompson observed the distinct and unique dangers associated with compiler technologies. In his famous Turing Award acceptance speech, Thompson described a mechanism by which a virus can be injected into a compiler such that every binary compiled with that compiler will replicate the virus.

“In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.” — Ken Thompson

Unfortunately, many developers today remain stuck in a world constructed in the early 1980’s. IDEs remain essentially the same, able to solve only those problems that neatly fit onto their laptop’s single Intel CPU. But barely a month ago, on 22nd November 2017, the Intel Corporation released a critical firmware update to the Intel Management Engine and in the intervening weeks, the public at large has become aware of the “Meltdown” bug. The IME and other components are exactly the sort of low-level microcode applications that Thompson warned about. Intel has demonstrated perfectly that in the past 33 years, we have learned little and gone nowhere.

Ironically, we have had a partial solution to these problems for nearly a decade. In 2009, David A. Wheeler published his PhD dissertation, in which he proposed a mechanism by which multiple compilers can be used to verify the correctness of a compiler output. Such a mechanism turns out to be tailor-made for the decentralized blockchain environment. Combining Wheeler’s mechanism with a set of economic incentives for compile farms to submit correct outputs gives us a very real shot at correcting a problem that has plagued us for more than 30 years.

The Solution: A Distributed and Decentralized Toolchain

If we crack open the development environments at companies like Google and Amazon, many of us would be surprised to discover that very few programs are compiled on a single machine. Already, the most sophisticated organizations in the world have moved to a distributed development environment. This allows them to leverage the cloud, bringing high-performance distributed computing to bear on software development itself. At Google, many thousands of machines churn away compiling code, checking it for correctness, and storing objects to be re-used later. Through clever use of caching and “hermetic” builds, Google makes its builds faster and more computationally efficient than could possibly be done on individual developer workstations. Unfortunately, most of us cannot afford to dedicate thousands of machines to compilation.

The open-source community might be able to build large scale shared compilation environments on the Internet, but Ken Thompson explained to us why we could not trust a shared environment for these workloads. However, in the age of blockchain, it’s now possible to build development environments that harness the power of large-scale compute to compile and check programs against programmer intent. Secure, cheap, and fast — we can get all three.

CryptoFex is just such a Decentralized Integrated Development Environment (DIDE) allowing software engineers to author, test, compile, and statically check their code to ensure that it is secure, efficient, and scalable.


Statebox: A Universal Language of Distributed Systems

22 January, 2018

guest post by Christian Williams

A short time ago, on the Croatian island of Zlarin, there gathered a band of bold individuals—rebels of academia and industry, whose everyday thoughts and actions challenge the separations of the modern world. They journeyed from all over to learn of the grand endeavor of another open mind, an expert functional programmer and creative hacktivist with significant mathematical knowledge: Jelle |yell-uh| Herold.

The Dutch computer scientist has devoted his life to helping our species and our planet: from consulting in business process optimization to winning a Greenpeace hackathon, from updating Netherlands telecommunications to creating a website to determine ways for individuals to help heal the earth, Jelle has gained a comprehensive perspective of the interconnected era. Through a diverse and innovative career, he has garnered crucial insights into software design and network computation—most profoundly, he has realized that it is imperative that these immense forces of global change develop thoughtful, comprehensive systematization.

Jelle understood that initiating such a grand ambition requires a massive amount of work, and the cooperation of many individuals, fluent in different fields of mathematics and computer science. Enter the Zlarin meeting: after a decade of consideration, Jelle has now brought together proponents of categories, open games, dependent types, Petri nets, string diagrams, and blockchains toward a singular end: a universal language of distributed systems—Statebox.

Statebox is a programming language formed and guided by fundamental concepts and principles of theoretical mathematics and computer science. The aim is to develop the canonical process language for distributed systems, and thereby elucidate the way these should actually be designed. The idea invokes the deep connections of these subjects in a novel and essential way, to make code simple, transparent, and concrete. Category theory is both the heart and pulse of this endeavor; more than a theory, it is a way of thinking universally. We hope the project helps to demonstrate the importance of this perspective, and encourages others to join.

The language is designed to be self-optimizing, open, adaptive, terminating, error-cognizant, composable, and most distinctively—visual. Petri nets are the natural representation of decentralized computation and concurrency. By utilizing them as program models, the entire language is diagrammatic, and this allows one to inspect the flow of the process executed by the program. While most languages only compile into illegible machine code, Statebox compiles directly into diagrams, so that the user immediately sees and understands the concrete realization of the abstract design. We believe that this immanent connection between the “geometric” and “algebraic” aspects of computation is of great importance.

Compositionality is a rightfully popular contemporary term, indicating the preservation of type under composition of systems or processes. This is essential to the universality of the type, and it is intrinsic to categories, which underpin the Petri net. A pertinent example is that composition allows for a form of abstraction in which programs do not require complete specification. This is parametricity: a program becomes executable when the functions are substituted with valid terms. Every term has a type, and one cannot connect pieces of code that have incompatible inputs and outputs—the compiler would simply produce an error. The intent is to preserve a simple mathematical structure that imposes as little as possible, and still ensure rationality of code. We can then more easily and reliably write tools providing automatic proofs of termination and type-correctness. Many more aspects will be explained as we go along, and in more detail in future posts.
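Here is a small Python sketch of that idea, with hypothetical functions and with types checked statically by a tool such as mypy: composition is accepted only when the output type of one piece matches the input type of the next.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Serial composition: run f, then g; f's output type must match g's input type."""
    return lambda x: g(f(x))

def parse(s: str) -> int:
    return int(s)

def double(n: int) -> int:
    return 2 * n

ok = compose(double, parse)    # str -> int, then int -> int: accepted
print(ok("21"))                # 42

# bad = compose(parse, double)  # int output fed to a str input: a type checker rejects this
```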

Statebox is more than a specific implementation. It is an evolving aspiration, expressing an ideal, a source of inspiration, signifying a movement. We fully recognize that we are at the dawn of a new era, and do not assume that the current presentation is the best way to fulfill this ideal—but it is vital that this kind of endeavor gains the hearts and minds of these communities. By learning to develop and design by pure theory, we make a crucial step toward universal systems and knowledge. Formalisms are biased, fragile, transient—thought is eternal.

Thank you for reading, and thank you to John Baez—|bi-ez|, some there were not aware—for allowing me to write this post. Azimuth and its readers represent what scientific progress can and should be; it is an honor to speak to you. My name is Christian Williams, and I have just begun my doctoral studies with Dr. Baez. He received the invitation from Jelle and could not attend, and was generous enough to let me substitute. Disclaimer: I am just a young student with big dreams, with insufficient knowledge to do justice to this huge topic. If you can forgive some innocent confidence and enthusiasm, I would like to paint a big picture, to explain why this project is important. I hope to delve deeper into the subject in future posts, and in general to celebrate and encourage the cognitive revolution of Applied Category Theory. (Thank you also to Anton and Fabrizio for providing some of this writing when I was not well; I really appreciate it.)

Statebox Summit, Zlarin 2017, was awesome. Wish you could’ve been there. Just a short swim in the Adriatic from the old city of Šibenik |shib-enic|, there lies the small, green island of Zlarin |zlah-rin|, with just a few hundred kind inhabitants. Jelle’s friend, and part of the Statebox team, Anton Livaja and his family graciously allowed us to stay in their houses. Our headquarters was a hotel, one of the few places open in the fall. We set up in the back dining room for talks and work, and for food and sunlight we came to the patio and were brought platters of wonderful, wholesome Croatian dishes. As we burned the midnight oil, we enjoyed local beer, and already made history—the first Bitcoin transaction of the island, with a progressive bartender, Vinko.

Zlarin is a lovely place, but we haven’t gotten to the best part—the people. All who attended are brilliant, creative, and spirited. Everyone’s eyes had a unique spark to light. I don’t think I’ve ever met such a fascinating group in my life. The crew: Jelle, Anton, Emi Gheorghe, Fabrizio Genovese, Daniel van Dijk, Neil Ghani, Viktor Winschel, Philipp Zahn, Pawel Sobocinski, Jules Hedges, Andrew Polonsky, Robin Piedeleu, Alex Norta, Anthony di Franco, Florian Glatz, Fredrik Nordvall Forsberg. These innovators have provocative and complementary ideas in category theory, computer science, open game theory, functional programming, and the blockchain industry; and they came to share an important goal. These are people who work earnestly to better humanity, motivated by progress, not profit. Talking with them gave me hope, that there are enough intelligent, open-minded, and caring people to fix this mess of modern society. In our short time together, we connected—now, almost all continue to contribute and grow the endeavor.

Why is society a mess? The present human condition is absurd. We are in a cognitive renaissance, yet our world is in peril. We need to realize a deeper harmony of theory and practice—we need ideas that dare to dream big, that draw on the vast wealth of contemporary thought to guide and unite subjects in one mission. The way of the world is only a reflection of how we choose to think, and for more than a century we have delved endlessly into thought itself. If we truly learn from our thought, knowledge and application become imminently interrelated, not increasingly separate. It is imperative that we abandon preconception, pretense and prejudice, and ask with naive sincerity: “How should things be, really, and how can we make it happen?”

This pertains more generally to the irresponsibly ad hoc nature of society—we find ourselves entrenched in inadequate systems. Food, energy, medicine, finance, communications, media, governance, technology—our deepening dependence on centralization is our greatest vulnerability. Programming practice is the perfect example of the gradual failure of systems when their design is left to wander in abstraction. As business requirements evolved, technological solutions were created haphazardly, the priority being immediate return over comprehensive methodology, which resulted in ‘duct-taped’ systems, such as the Windows OS. Our entire world now depends on unsystematic software, giving rise to so much costly disorganization, miscommunication, and worse, bureaucracy. Statebox aims to close the gap between the misguided formalisms which came out of this type of degeneration, and design a language which corresponds naturally to essential mathematical concepts—to create systems which are rational, principled, universal. To explain why Statebox represents to us such an important ideal, we must first consider its closest relative, the elephant in the technological room: blockchain.

Often the best ideas are remarkably simple—in 2008, an unknown person under the alias Satoshi Nakamoto published the whitepaper Bitcoin: A Peer-to-Peer Electronic Cash System. In just a few pages, a protocol was proposed which underpins a new kind of computational network, called a blockchain, in which interactions are immediate, transparent, and permanent. This is a personal interpretation—the paper focuses on the application given in its title. In the original financial context, immediacy is one’s ability to directly transact with anyone, without intermediaries, such as banks; transparency is one’s right to complete knowledge of the economy in which one participates, meaning that each node owns a copy of the full history of the network; permanence is the irrevocability of one’s transactions. These core aspects are made possible by an elegant use of cryptography and game theory, which essentially removes the need for trusted third parties in the authorization, verification, and documentation of transactions. Per word, it’s almost peerless in modern influence; the short and sweet read is recommended.
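To make the “permanent” part concrete, here is a toy Python sketch of the hash-linking idea (not Bitcoin’s actual data structures): each block commits to the hash of the previous block, so rewriting any past transaction breaks every later link and is immediately detectable.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def make_block(transactions: list, prev_hash: str) -> dict:
    """A block records its transactions and the hash of the previous block."""
    return {"transactions": transactions, "prev": prev_hash}

genesis = make_block(["alice pays bob 1"], prev_hash="0" * 64)
second = make_block(["bob pays carol 1"], prev_hash=block_hash(genesis))

# Tampering with the first block invalidates the link stored in the second.
genesis["transactions"][0] = "alice pays bob 1000"
print(block_hash(genesis) == second["prev"])   # False: the chain no longer verifies
```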

The point of this simplistic explanation is that blockchain is about more than economics. The transaction could be any cooperation, the value could be any social good—when seen as a source of consensus, the blockchain protocol can be expanded to assimilate any data and code. After several years of competing cryptocurrencies, the importance of this deeper idea was gradually realized. There arose specialized tools to serve essential purposes in some broader system, and only recently have people dared to conceive of what this latter could be. In 2014, a wunderkind named Vitalik Buterin created Ethereum, a fully programmable blockchain. Solidity is a Turing-complete language of smart contracts, autonomous programs which enable interactions and enact rules on the network. With this framework, one can not only transact with others, but implement any kind of process; one can build currencies, websites, or organizations—decentralized applications, constructed with smart contracts, could be just about anything.

There is understandably great confidence and excitement around these ventures, and many are receiving massive public investment. Seriously, the numbers are staggering—but most of it is pure hype. There is talk of the first global computer, the internet of value, a single shared source of truth, and other speculative descriptions. But compared to the ambition, the actual theory is woefully underdeveloped. So far, implementations make almost no use of the powerful ideas of mathematics. There are still basic flaws in blockchain itself, the foundation of almost all decentralized technology. For example, the two viable candidates for transaction verification are called Proof of Work and Proof of Stake: the former requires unsustainable consumption of resources, namely hardware and electricity, and the latter is susceptible to centralization. Scalability is a major problem, and with it the cost and speed of transactions. A major Ethereum dApp, The DAO (a “decentralized autonomous organization”), was hacked.
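To see why proof of work is so resource-hungry, here is a toy Python sketch of the nonce search at its core (illustrative only, not a real mining implementation): the expected number of hash evaluations grows exponentially with the difficulty, and that work is what burns hardware and electricity.

```python
import hashlib

def mine(block_bytes: bytes, difficulty: int = 4) -> int:
    """Search for a nonce whose hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(block_bytes + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

# Each extra unit of difficulty multiplies the expected work by 16.
print(mine(b"example block"))
```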

These statements are absolutely not meant to disregard all of the great work of this community; they are primarily rhetoric to distinguish the high ideals of Statebox, and I lack the eloquence to make the point diplomatically, nor do I have anywhere near the knowledge to give a real account of this huge endeavor. We now return to the rhetoric.

What seems to be lost in the commotion is the simple recognition that we do not yet really know what we should make, nor how to do so. The whole idea is simply too big—the space of possibility is almost completely unknown, because this innovation can open every aspect of society to reform. But as usual, people try to ignore their ignorance, imagining it will disappear, and millions clamor about things we do not yet understand. Most involved are seeing decentralization as an exciting business venture, rather than our best hope to change the way of this broken world; they want to cash in on another technological wave. Of the relatively few idealists, most still retain the assumptions and limitations of the blockchain.

For all this talk, there is little discussion of how to even work toward the ideal abstract design. Most mathematics associated to blockchain is statistical analysis of consensus, while we’re sitting on a mountain of powerful categorical knowledge of systems. At the summit, Prof. Neil Ghani said “it’s like we’re on the Moon, talking about going to Mars, while everyone back on Earth still doesn’t even have fire.” We have more than enough conceptual technology to begin developing an ideal and comprehensive system, if the right minds come together. Theory guides practice, practice motivates theory—the potential is immense.

Fortunately, there are those who have this big picture in mind. Long before the blockchain craze, Jelle saw both the fundamental importance of distributed systems and the need for academic-industrial symbiosis. In the mid-2000s, he used Petri nets to create process tools for businesses. Employees could design and implement any kind of abstract workflow to more effectively communicate and produce. Jelle would provide consultation to optimize these processes and integrate them into a company’s existing infrastructure—as a workflow executed, it would generate tasks, emails, and forms and send them to designated individuals to be completed for the next iteration. Many institutions would have to shell out millions of dollars to IBM or Fujitsu for this kind of software, and his was more flexible and intuitive. This left a strong impression on Jelle regarding the power of Petri nets and the impact of deliberate design.

Many experiences like this gradually instilled in Jelle a conviction to expand his knowledge and begin planning bold changes to the world of programming. He attended mathematics conferences, and would discuss with theorists from many relevant subjects. On the island, he told me that it was actually one of Baez’s talks about networks which finally inspired him to go for this huge idea. By sincerely and openly reaching out to the whole community, Jelle made many valuable connections. He invited these thinkers to share his vision—theorists from all over Europe, and some from overseas, gathered in Croatia to learn and begin to develop this project—and it was a great success.

By now you may be thinking: alright, kid, spill the beans already. Here they are, right into your brain—well, most will be in the next post, but we should at least have a quick overview of some of the main ideas not already discussed.

The notion of open system complements compositionality. The great difference between closure and openness, in society as well as theory, was a central theme in many of our conversations during the summit. Although we try to isolate and suspend life and cognition in abstraction, the real, concrete truth is what flows through these ethereal forms. Every system in Statebox is implicitly open, and this impels design to idealize the inner and outer connections of processes. Open systems are central to the Baez Network Theory research team. There are several ways to categorically formalize open systems; the best are still being developed, but the first main example can be found in The Algebra of Open and Interconnected Systems by Brendan Fong, an early member of the team.

Monoidal categories, as this blog knows well, represent systems with both serial and parallel processes. One of the great challenges of this new era of interconnection is distributed computation: getting computers to work together as a single supercomputer. Monoidal categories are essential to this. Here, objects are data types and morphisms are computations, while composition is serial and the tensor is parallel. As Dr. Baez has demonstrated with years of great original research, monoidal categories are essential to understanding the complexity of the world. If we can connect our knowledge of natural systems to social systems, we can learn to integrate valuable principles—a key example being complete resource cognizance.
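A minimal Python sketch of this picture, with throwaway functions standing in for computations: morphisms are functions, serial composition chains them, the tensor runs two of them side by side on a pair, and the interchange law says the two ways of combining them agree.

```python
def compose(g, f):
    """Serial composition: the output of f feeds the input of g."""
    return lambda x: g(f(x))

def tensor(f, g):
    """Parallel composition: f and g act independently on the two halves of a pair."""
    return lambda pair: (f(pair[0]), g(pair[1]))

def inc(n: int) -> int:
    return n + 1

def shout(s: str) -> str:
    return s.upper()

# (inc then inc) in parallel with (shout then shout) ...
left = tensor(compose(inc, inc), compose(shout, shout))
# ... equals (inc parallel shout), then (inc parallel shout): the interchange law.
right = compose(tensor(inc, shout), tensor(inc, shout))

print(left((1, "go")), right((1, "go")))   # (3, 'GO') (3, 'GO')
```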

Petri nets are presentations of free strict symmetric monoidal categories, and as such they are ideal models of “normal” computation, i.e. associative, unital, and commutative. Open Petri nets are the workhorses of Statebox. They are the morphisms of a category which is itself monoidal—and via openness it is even richer and more versatile. Most importantly, it is compact closed, which introduces a simple but crucial duality into computation—input-output interchange—which is impossible in conventional cartesian closed computation, and actually brings the paradigm closer to quantum computation.
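As a toy illustration (a hypothetical epidemic-flavored net, not Statebox code), here is a Petri net in Python: tokens sit on places, and a transition fires by consuming tokens from its input places and producing tokens on its output places.

```python
from collections import Counter

# Places: S, I, R. Each transition consumes and produces a multiset of tokens.
transitions = {
    "infect":  {"consume": Counter({"S": 1, "I": 1}), "produce": Counter({"I": 2})},
    "recover": {"consume": Counter({"I": 1}),         "produce": Counter({"R": 1})},
}

def enabled(marking: Counter, name: str) -> bool:
    """A transition is enabled when the marking covers its input places."""
    return all(marking[p] >= n for p, n in transitions[name]["consume"].items())

def fire(marking: Counter, name: str) -> Counter:
    """Fire a transition: remove the consumed tokens, add the produced ones."""
    new = marking - transitions[name]["consume"]
    new.update(transitions[name]["produce"])
    return new

m = Counter({"S": 2, "I": 1})
m = fire(m, "infect")
print(m)   # Counter({'I': 2, 'S': 1})
```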

Petri nets represent processes in an intuitive, consistent, and decentralized way. These will be multi-layered via the notion of operad and a resourceful use of Petri net tokens, representing the interacting levels of a system. Compositionality makes exploring their state space much easier: the state space of a big process can be constructed from those of smaller ones, a technique that more often than not avoids state space explosion, a long-standing problem in Petri net analysis. The correspondence between open Petri nets and a logical calculus, called place/transition calculus, allows the user to perform queries on the Petri net, and a revolutionary technique called information-gain computing greatly reduces response time.
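Continuing the little net above, exploring the state space is just a graph search over markings; in the worst case the number of markings explodes, which is exactly what the compositional construction is meant to tame. A quick sketch:

```python
# Reuses Counter, transitions, enabled, and fire from the sketch above.
def reachable(start):
    """Worklist search over all markings reachable from `start`."""
    freeze = lambda m: frozenset(m.items())
    seen, todo = {freeze(start)}, [start]
    while todo:
        m = todo.pop()
        for name in transitions:
            if enabled(m, name):
                nxt = fire(m, name)
                if freeze(nxt) not in seen:
                    seen.add(freeze(nxt))
                    todo.append(nxt)
    return seen

print(len(reachable(Counter({"S": 2, "I": 1}))))   # 9 reachable markings here
```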

Dependently typed functional programming is the exoskeleton of this beautiful beast; in particular, the underlying language is Idris. Dependent types arose out of both theoretical mathematics and computer science, and they are beginning to be recognized as very general, powerful, and natural in practice. Functional programming is a similarly pure and elegant paradigm for “open” computation. They are fascinating and inherently categorical, and deserve whole blog posts in the future.

Even economics has opened its mind to categories. Statebox is very fortunate to have several of these pioneers—open game theory is a categorical, compositional version of game theory, which allows the user to dynamically analyze and optimize code. Jules’ choice of the term “teleological category” is prescient; it is about more than just efficiency—it introduces the possibility of building principles into systems, by creating game-theoretical incentives which can guide people to cooperate for the greater good, and gradually lessen the influence of irrational, selfish priorities.

Categories are the language by which Petri nets, functional programming, and open games can communicate—and amazingly, all of these theories are unified in an elegant representation called string diagrams. These allow the user to forget the formalism, and reason purely in graphical terms. All the complex mathematics goes under the hood, and the user only needs to work with nodes and strings, which are guaranteed to be formally correct.

Category theory also models the data structures that are used by Statebox: Typedefs is a very lightweight—but also very expressive—data structure that is at the very core of Statebox. It is based on initial F-algebras, and can be easily interpreted in a plethora of pre-existing solutions, enabling seamless integration with existing systems. One of the core features of Typedefs is that serialization is categorically internalized in the data structure, meaning that every operation involving types can receive a unique hash and be recorded on the blockchain public ledger. This is one of the many components that make Statebox fail-resistant: every process and event is accounted for on the public ledger, and the whole history of a process can be rolled back and analyzed thanks to blockchain technology.
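The following toy Python sketch only illustrates the flavor of this (it is not the actual Typedefs format or API): a type definition is itself just data, so it can be serialized canonically and given a content hash that any two parties will compute identically and that could be recorded on a ledger.

```python
import hashlib
import json

# A hypothetical encoding of "lists of naturals" as a least fixed point:
# List = 1 + (Nat x List), written as plain nested data.
list_of_nat = {
    "mu": "List",
    "body": {"sum": [{"unit": {}}, {"prod": ["Nat", {"var": "List"}]}]},
}

def type_hash(definition: dict) -> str:
    """Serialize the definition canonically, then hash it to get a stable identifier."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

print(type_hash(list_of_nat))
```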

The Statebox team is currently working on a monograph that will neatly present how all the pertinent categorical theories work together in Statebox. This is a formidable task that will take months to complete, but it will also be the cleanest way to understand how Statebox works, and which mathematical questions still have to be answered to obtain a working product. It will be a thorough document that also considers important aspects such as our guiding ethics.

The team members are devoted to creating something positive and different, explicitly and solely to better the world. The business paradigm is based on the principle that innovation should be open and collaborative, rather than competitive and exclusive. We want to share ideas and work with you. There are many blooming endeavors which share the ideals that have been described in this article, and we want them all to learn from each other and build off one another.

For example, Statebox contributor and visionary economist Viktor Winschel has a fantastic project called Oicos. The great proponent of applied category theory, David Spivak, has an exciting and impressive organization called Categorical Informatics. Mike Stay, a past student of Dr. Baez, has started a company called Pyrofex, which is developing categorical distributed computation. There are also somewhat related languages for blockchain, such as Simplicity, and innovative distributed systems such as Iota and RChain. Even Ethereum is beginning to utilize categories, with Casper. And of course there are research groups, such as Network Theory and Mathematically Structured Programming, as well as so many important papers, such as Algebraic Databases. This is just a slice of everything going on; as far as I know there is not yet a comprehensive account of all the great applied category theory and distributed innovations being developed. Inevitably these endeavors will follow the principle they share, and come together in a big way. Statebox is ready, willing, and able to help make this reality.

If you are interested in Statebox, you are welcomed with open arms. You can contact Jelle at jelle@statebox.io, Fabrizio at fabrizio@statebox.org, Emi at emi@statebox.io, Anton at anton@statebox.io; they can provide more information, connect you to the discussion, or anything else. There will be a second summit in 2018 in about six months, details to be determined. We hope to see you there. Future posts will keep you updated, and explain more of the theory and design of Statebox. Thank you very much for reading.

P.S. Found unexpected support in Šibenik! Great bar—once a reservoir.