Language Complexity (Part 3)

23 February, 2021

David A. Tanzer

A detailed measure of computational complexity

In Part 2, we identified language complexity by how “difficult” it is for a program to decide if some given text is a sentence of the language — in other words, by the complexity of the decision problem for the language.  

Note there may be many decision procedures — which are implementations of functions — that solve the decision problem for a language. For our purposes, we will focus our attention just on the ones which give the best performance. As motivation for this focus, consider that our simple language ALL_A = {“A”, “AA”, “AAA”, …} could be decided by an ill-conceived program which does all sorts of unnecessary work — but that would say nothing about the complexity of ALL_A itself. On the other hand, the procedure which simply scans the characters and verifies that all are ‘A’ is optimal, and simple — which shows that ALL_A is itself a simple language.

Our goal now is to quantify the notion of language complexity. That boils down to the matter of how to quantify the amount of work a program — in this case, the language decider — needs to do.

Computational complexity

Say we have a program P that inputs a string, goes through some steps, and outputs some results. For each input string, we can count how many computation steps it leads to.   Let StepCount(P,x) be the number of steps needed to compute the result from x.

Now let’s consider the effect of the size of the input on the number of steps. 

In general, we expect larger inputs to trigger longer sequences of computation steps. In the first place, it takes more steps to scan more text. And moreover, as the analysis gets larger, more steps will be involved.

Checking whether all characters are ‘A’ takes order N work, where N is the size of the input string.

For something that requires more than order N work, consider a program which takes as input the text representation of a number, parses it, and computes the number of factors in the number. The analysis of factors calls for a significant amount of computation beyond the N steps to scan the input. Moreover, this work will become larger and more complex as the input values become larger.

These facts are indicators of what is known as the time complexity of the computation.

Sidenote: the space complexity of a computation pertains to the amount of memory it uses, as a function of input size. In our context, we are considering just the time complexity, i.e., the running time as a function of input size.

Worst-case complexity

It is fair to ask what is meant by the number of computation steps required to compute the output, for input of size N. After all, there are multiple input strings of size N, each of which may trigger a different number of computation steps. For instance, take our loop-based decision procedure for ALL_A. Input “AAAAA” causes 5 computation steps (as the loop runs through to completion). But “AABAA”, also of length 5, causes only 3 computation steps (as the loop terminates once it sees the ‘B’).

What we care about here is just the maximum number of steps that will be needed for a given input size. In other words, we focus our analysis on the worst-case behavior for each input size N. The motivation here is that any bound on running time that we put on the worst-case inputs of size N gives us a bound for all inputs of size N.

Let MaxSteps(P,N) be the maximum number of steps that program P may perform for an input of size N.

Let P'(N) = MaxSteps(P,N).

So P’ is the function that gives us the maximum number of steps that P will perform for any input size N.

P’ is our first step towards quantifying the time complexity of the program P. 

But for typical purposes, P’ presents an overly detailed description of complexity.

For instance, take a decision procedure P for ALL_A. We would like to be able to summarize the time complexity of P by saying that it is linear in the size of the input string. That information is contained in the structure of the function P'(n). But P’ offers a much more granular description, with one value for every natural number n.

In the next posts we will look into the classification of P'(n) into general complexity classes such as linear, quadratic and exponential.

Reposted from the Signal Beat, Copyright © 2021, All Rights Reserved.

Applied Category Theory 2021

17 February, 2021

The big annual applied category theory conference is coming! It’s the fourth one: the first three were at Leiden, Oxford and (virtually) MIT. This one will be online and also, with luck, in person—but don’t make your travel arrangements just yet:

Fourth Annual International Conference on Applied Category Theory (ACT 2021), 12–16 July 2021, online and at the Computer Laboratory of the University of Cambridge.

It will take place shortly after the Applied Category Theory Adjoint School, which will—with luck—culminate in a meeting 5–9 July at the same location.

You can now submit a paper! As in a computer science conference, that’s how you get to give a talk. For more details, read on.


Applied category theory is a topic of interest for a growing community of researchers, interested in studying many different kinds of systems using category-theoretic tools. These systems are found across computer science, mathematics, and physics, as well as in social science, linguistics, cognition, and neuroscience. The background and experience of our members is as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers
together, disseminate the latest results, and facilitate further development of the field.

We accept submissions of both original research papers, and work accepted/submitted/ published elsewhere. Accepted original research papers will be invited for publication in a proceedings volume. The keynote addresses will be drawn from the best accepted papers. The conference will include an industry showcase event.

We hope to run the conference as a hybrid event, with physical attendees present in Cambridge, and other participants taking part online. However, due to the state of the pandemic, the possibility of in-person attendance is not yet confirmed. Please do not book your travel or hotel accommodation yet.

Important dates (all in 2021)

• Submission of contributed papers: Monday 10 May

• Acceptance/rejection notification: Monday 7 June

• Adjoint school: Monday 5 July to Friday 9 July

• Main conference: Monday 12 July to Friday 16 July


The following two types of submissions are accepted:

• Proceedings Track. Original contributions of high-quality work consisting of an extended abstract, up to 12 pages, that provides evidence of results of genuine interest, and with enough detail to allow the program committee to assess the merits of the work. Submission of work-in-progress is encouraged, but it must be more substantial than a research proposal.

• Non-Proceedings Track. Descriptions of high-quality work submitted or published elsewhere will also be considered, provided the work is recent and relevant to the conference. The work may be of any length, but the program committee members may only look at the first 3 pages of the submission, so you should ensure that these pages contain sufficient evidence of the quality and rigour of your work.

Papers in the two tracks will be reviewed against the same standards of quality. Since ACT is an interdisciplinary conference, we use two tracks to accommodate the publishing conventions of different disciplines. For example, those from a Computer Science background may prefer the Proceedings Track, while those from a Mathematics, Physics or other background may prefer the Non-Proceedings Track. However, authors from any background are free to choose the track that they prefer, and submissions may be moved from the Proceedings Track to the Non-Proceedings Track at any time at the request of the authors.

Contributions must be submitted in PDF format. Submissions to the Proceedings Track must be prepared with LaTeX, using the EPTCS style files available at

The submission link will soon be available on the ACT2021 web page:

Program committee

Chair: Kohei Kishida, University of Illinois, Urbana-Champaign

The full program committee will be announced soon.

Local organizers

• Lukas Heidemann, University of Oxford
• Nick Hu, University of Oxford
• Ioannis Markakis, University of Cambridge
• Alex Rice, University of Cambridge
• Calin Tataru, University of Cambridge
• Jamie Vicary, University of Cambridge

Steering committee

• John Baez, University of California Riverside and Centre for Quantum Technologies
• Bob Coecke, Cambridge Quantum Computing
• Dorette Pronk, Dalhousie University
• David Spivak, Topos Institute

Language Complexity (Part 2)

17 February, 2021

David A. Tanzer

Decision problems, decision procedures and complexity

In Part 1, we introduced the formal idea of a language, as being just a set of sentences. Now let’s approach the topic of language complexity.

For any given language, there is an associated decision problem: given a candidate string of characters, determine whether or not it belongs to the language. A decision procedure, or decider, is a program that solves the decision problem: it returns True or False to indicate whether a given character string belongs to the language.

decider_for_english("This is a tomato") = True 
decider_for_english("Tomato a Eso 3e") = False   

Here is the key idea for measuring language complexity. A simple language will have a decision procedure that is straightforward and runs quickly. In contrast, a complex language calls for a lot of “thinking” on the part of the decision procedure, in order for it to assess membership in the language. So the the complexity of a language will be defined by the computational complexity of its decision procedure.

Now let’s look at the complexity of a tiny language:  

L = {“Bob likes Alice”, “Alice likes Bob”}.

def decider_for_L(text):
    if text == "Bob likes Alice": return True
    if text == "Alice likes Bob": return True
    return False # none of the cases matched

Because this code runs quickly, the theory classifies L as a simple language. (Since we already knew that L was simple, this is a good check on the theory.) More generally, every language which is finite will get classified as simple, since a decision procedure can be written using this pattern.

With infinite languages, things get more interesting. Take the language ALL_A, with strings consisting only of the character ‘A’:

ALL_A = {“A”, “AA”, “AAA”, “AAAA”, …}

We can’t write a decision procedure for ALL_A using the above code pattern, as it would create a program with an infinite number of lines. That’s not feasible in reality, so we’ll rule it out.

However, we can still write a decision procedure for ALL_A. This is easy: just loop through every character in the string and check whether it equals ‘A’.

Again, the theory matches our expectations. We see that ALL_A is a simple language, and the theory classifies it as so.

Now consider the language PRIME_A, with strings just containing ‘A’, where the length is a prime number:


We can write a decision procedure for PRIME_A, but now the code has more work to do. First it has to loop through the input, to count the number of characters N. Then it has to analyze whether N is prime. This is related to the work of factorizing N, which is not a trivial matter. And the work increases as N gets larger.

So the theory tells us that PRIME_A has a greater complexity than ALL_A. And indeed it does. A human decider would have to do a lot more mental processing in order to determine membership in PRIME_A as compared to membership in ALL_A.

With these ideas, you should now have a qualitative understanding for what language complexity means. In the next posts, we will refine it into a quantitative concept.

Reposted from the Signal Beat, Copyright © 2021, All Rights Reserved.

Language Complexity (Part 1)

12 February, 2021

David A. Tanzer

A simple view of languages

How complex is the English language? The question has many dimensions and nuances. What does complexity mean, and how could it be measured? This a tough nut to crack, so in this post we’ll make a retrenchment to reconsider the question in a more formal setting — computer science.

In this setting, a language gets identified with its extension — the collection of all its sentences. So English gets modeled as just a large repository of sentences. Although this is an ‘impoverished’ view of language, it still has value, as simpler models can give us ideas to work with.

It is easy to see that the extension of the English language is an infinite set of sentences, as it includes for example:

  • The father of Adam did not exist.
  • The father of the father of Adam did not exist.
  • The father of the father of the father of Adam did not exist.
  • Etc., ad infinitum

One can also consider very small languages. For example, the sublanguage of English consisting of the sentences where the subject and object are either Alice or Bob, and the only verb is ‘likes’:

  • Alice likes Alice.
  • Alice likes Bob.
  • Bob likes Alice.
  • Bob likes Bob.

Let’s call this language AB:

AB = {“Alice likes Alice”, “Alice likes Bob”, “Bob likes Alice”, “Bob likes Bob”}.

Here, a ‘sentence’ means something rather simple – a string of characters belonging to the language. So the sentence “Bob likes Bob” just means [B, o, b, SPACE, l, i, k, e, s, SPACE, B, o, b], and the word “Alice” is just a shorthand for the string [A, l, i, c, e]. There are no semantics here.

Here are a couple of other examples of formal languages.

  • The set of all strings involving just the characters 0 and 1, which start with 0 and alternate. That is: {“0”, “01”, “010”, “0101”, …}. This is a simple example of an infinite language.
  • The set of prime numbers, written out as strings: [“2”, “3”, “5”, “7”, “11”, “13”, “17”, …]. Another infinite language.

Now that we know what these ‘languages’ are, we are ready to talk about how to measure their complexity.

Reposted from the Signal Beat, Copyright © 2021, All Rights Reserved.

Graph Transformation Theory and Applications

4 December, 2020

I love graph rewriting—the study of ways to change one graph into another by changing one small part at a time. My student Daniel Cicala did his thesis on this! So I’m happy to hear about the new virtual seminar series GReTA: Graph TRansformation Theory and Applications.

It aims to serve as a platform for the international graph rewriting community, promote recent developments and trends in the field, and encourage regular networking and interaction between members of this community.

Seminars are held twice a month in the form of Zoom sessions (some of which will be live-streamed to YouTube). Go to the link if you want to join on Zoom.

You can get regular updates on the GReTA seminars in several ways:

• Subscribe to the GReTA YouTube channel.

• Subscribe to the GReTA Google Calendar (or alternatively import it in iCal format).

• Subscribe to the GReTA mailing list.

Here are the two talks so far. Any subject that can promote talks on both logic and chemistry must be good! Thinking of chemistry and logic as two aspects of the same thing is bound to trigger new ideas. (Just as a sequence of chemical reactions converts reactants into products, a proof converts assumptions into conclusions.)

Speaker: Barbara König
Title: Graph transformation meets logic

Abstract. We review the integration of (first-order) logic respectively nested conditions into graph transformation. Conditions can serve various purposes: they can constrain graph rewriting, symbolically specify sets of graphs, be used in query languages and in verification (for instance in Hoare logic and for behavioural equivalence checking). In the graph transformation community the formalism of nested graph conditions has emerged, that is, conditions which are equivalent to first-order logic, but directly integrate graphs and graph morphisms, in order to express constraints more succinctly. In this talk we also explain how the notion of nested conditions can be lifted from graph transformation systems to the setting of reactive systems as defined by Leifer and Milner. It turns out that some constructions for graph transformation systems (such as computing weakest preconditions and strongest postconditions and showing local confluence by means of critical pair analysis) can be done quite elegantly in the more general setting.

Speakers: Daniel Merkle and Jakob Lykke Andersen
Title: Chemical graph transformation and applications

Abstract: Any computational method in chemistry must choose some level of precision in the modeling. One choice is made in the methods of quantum chemistry based on quantum field theory. While highly accurate, the methods are computationally very demanding, which restricts their practical use to single reactions of molecules of moderate size even when run on supercomputers. At the same time, most existing computational methods for systems chemistry and biology are formulated at the other abstraction extreme, in which the structure of molecules is represented either not at all or in a very rudimentary fashion that does not permit the tracking of individual atoms across a series of reactions.

In this talk, we present our on-going work on creating a practical modelling framework for chemistry based on Double Pushout graph transformation, and how it can be applied to analyse chemical systems. We will address important technical design decisions as well as the importance of methods inspired from Algorithm Engineering in order to reach the required efficiency of our implementation. We will present chemically relevant features that our framework provides (e.g. automatic atom tracing) as well as a set of chemical systems we investigated are currently investigating. If time allows we will discuss variations of graph transformation rule compositions and their chemical validity.

Epidemiological Modeling With Structured Cospans

19 October, 2020

This is a wonderful development! Micah Halter and Evan Patterson have taken my work on structured cospans with Kenny Courser and open Petri nets with Jade Master, together with Joachim Kock’s whole-grain Petri nets, and turned them into a practical software tool!

Then they used that to build a tool for ‘compositional’ modeling of the spread of infectious disease. By ‘compositional’, I mean that they make it easy to build more complex models by sticking together smaller, simpler models.

Even better, they’ve illustrated the use of this tool by rebuilding part of the model that the UK has been using to make policy decisions about COVID19.

All this software was written in the programming language Julia.

I had expected structured cospans to be useful in programming and modeling, but I didn’t expect it to happen so fast!

For details, read this great article:

• Micah Halter and Evan Patterson, Compositional epidemiological modeling using structured cospans, 17 October 2020.

Abstract. The field of applied category theory (ACT) aims to put the compositionality inherent to scientific and engineering processes on a firm mathematical footing. In this post, we show how the mathematics of ACT can be operationalized to build complex epidemiological models in a compositional way. In the first two sections, we review the idea of structured cospans, a formalism for turning closed systems into open ones, and we illustrate its use in Catlab through the simple example of open graphs. Finally, we put this machinery to work in the setting of Petri nets and epidemiological models. We construct a portion of the COEXIST model for the COVID-19 pandemic and we simulate the resulting ODEs.

You can see related articles by James Fairbanks, Owen Lynch and Evan Patterson here:

AlgebraicJulia Blog.

Also try these videos:

• James Fairbanks, AlgebraicJulia: Applied category theory in Julia, 29 July 2020.

• Evan Patterson, Realizing applied category theory in Julia, 16 January 2020.

I’m biased, but I think this is really cool cutting-edge stuff. If you want to do work along these lines let me know here and I’ll get Patterson to take a look.

Here’s part of a network created using their software:

ACT2020 Program

27 June, 2020


Applied Category Theory 2020 is coming up soon! After the Tutorial Day on Sunday July 6th, there will be talks from Monday July 7th to Friday July 10th. All talks will be live on Zoom and on YouTube. Recorded versions will appear on YouTube later.

Here is the program—click on it to download a more readable version:

Here are the talks! They come in three kinds: keynotes, regular presentations and short industry presentations. Within each I’ve listed them in alphabetical order by speaker: I believe the first author is the speaker.

This is gonna be fun.

Keynote presentations (35 minutes)

• Henry Adams, Johnathan Bush and Joshua Mirth, Operations on metric thickenings.

• Nicolas Blanco and Noam Zeilberger: Bifibrations of polycategories and classical linear logic.

• Bryce Clarke, Derek Elkins, Jeremy Gibbons, Fosco Loregian, Bartosz Milewski, Emily Pillmore and Mario Román: Profunctor optics, a categorical update.

• Tobias Fritz, Tomáš Gonda, Paolo Perrone and Eigil Rischel: Distribution functors, second-order stochastic dominance and the Blackwell–Sherman–Stein Theorem in categorical probability.

• Micah Halter, Evan Patterson, Andrew Baas and James Fairbanks: Compositional scientific computing with Catlab and SemanticModels.

• Joachim Kock: Whole-grain Petri nets and processes.

• Andre Kornell, Bert Lindenhovius and Michael Mislove: Quantum CPOs.

• Martha Lewis: Towards logical negation in compositional distributional semantics.

• Jade Master and John Baez: Open Petri nets.

• Lachlan McPheat, Mehrnoosh Sadrzadeh, Hadi Wazni and Gijs Wijnholds, Categorical vector space semantics for Lambek calculus with a relevant modality.

• David Jaz Myers: Double categories of open dynamical systems.

• Toby St Clere Smithe, Cyber Kittens, or first steps towards categorical cybernetics.

Regular presentations (20 minutes)

• Robert Atkey, Bruno Gavranović, Neil Ghani, Clemens Kupke, Jeremy Ledent and Fredrik Nordvall Forsberg: Compositional game theory, compositionally.

• John Baez and Kenny Courser: Coarse-graining open Markov processes.

• Georgios Bakirtzis, Christina Vasilakopoulou and Cody Fleming, Compositional cyber-physical systems modeling.

• Marco Benini, Marco Perin, Alexander Alexander Schenkel and Lukas Woike: Categorification of algebraic quantum field theories.

• Daniel Cicala: Rewriting structured cospans.

• Bryce Clarke: A diagrammatic approach to symmetric lenses.

• Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi, Stefano Gogioso and Nicolo Chiappori: Quantum natural language processing.

• Geoffrey Cruttwell, Jonathan Gallagher and Dorette Pronk: Categorical semantics of a simple differential programming language.

• Swaraj Dash and Sam Staton: A monad for probabilistic point processes.

• Giovanni de Felice, Elena Di Lavore, Mario Román and Alexis Toumi: Functorial language games for question answering.

• Giovanni de Felice, Alexis Toumi and Bob Coecke: DisCoPy: monoidal categories in Python.

• Brendan Fong, David Jaz Myers and David I. Spivak: Behavioral mereology: a modal logic for passing constraints.

• Rocco Gangle, Gianluca Caterina and Fernando Tohme, A generic figures reconstruction of Peirce’s existential graphs (alpha).

• Jules Hedges and Philipp Zahn: Open games in practice.

• Jules Hedges: Non-compositionality in categorical systems theory.

• Michael Johnson and Robert Rosebrugh, The more legs the merrier: A new composition for symmetric (multi-)lenses.

• Joe Moeller, John Baez and John Foley: Petri nets with catalysts.

• John Nolan and Spencer Breiner, Symmetric monoidal categories with attributes.

• Joseph Razavi and Andrea Schalk: Gandy machines made easy via category theory.

• Callum Reader: Measures and enriched categories.

• Mario Román: Open diagrams via coend calculus.

• Luigi Santocanale, Dualizing sup-preserving endomaps of a complete lattice.

• Dan Shiebler: Categorical stochastic processes and likelihood.

• Richard Statman, Products in a category with only one object.

• David I. Spivak: Poly: An abundant categorical setting for mode-dependent dynamics.

• Christine Tasson and Martin Hyland, The linear-non-linear substitution 2-monad.

• Tarmo Uustalu, Niccolò Veltri and Noam Zeilberger: Proof theory of partially normal skew monoidal categories.

• Dmitry Vagner, David I. Spivak and Evan Patterson: Wiring diagrams as normal forms for computing in symmetric monoidal categories.

• Matthew Wilson, James Hefford, Guillaume Boisseau and Vincent Wang: The safari of update structures: visiting the lens and quantum enclosures.

• Paul Wilson and Fabio Zanasi: Reverse derivative ascent: a categorical approach to learning Boolean circuits.

• Vladimir Zamdzhiev: Computational adequacy for substructural lambda calculi.

• Gioele Zardini, David I. Spivak, Andrea Censi and Emilio Frazzoli: A compositional sheaf-theoretic framework for event-based systems.

Industry presentations (8 minutes)

• Arquimedes Canedo (Siemens Corporate Technology).

• Brendan Fong (Topos Institute).

• Jelle Herold (Statebox): Industrial strength CT.

• Steve Huntsman (BAE): Inhabiting the value proposition for category theory.

• Ilyas Khan (Cambridge Quantum Computing).

• Alan Ransil (Protocol Labs): Compositional data structures for the decentralized web.

• Alberto Speranzon (Honeywell).

• Ryan Wisnesky (Conexus): Categorical informatics at scale.

ACT2020 Tutorial Day

17 June, 2020

If you’re wanting to learn some applied category theory, register for the tutorials that are taking place on July 5, 2020 as part of ACT2020!

Applied category theory offers a rigorous mathematical language and toolset for relating different concepts from across math, science, and technology. For example, category theory finds common patterns between geometry (shapes), algebra (equations), numbers, logic, probability, etc. Applied category theory (ACT) looks for how those very same patterns extend outward to data, programs, processes, physics, linguistics, and so on—things we see in the real world. The field is currently growing, as new applications and common patterns are being found all the time. When you understand these ideas, more of your intuitions about the world can be made rigorous and thus be communicated at a larger scale. This in turn gives our community a chance to solve larger and more complex scientific, technological, and maybe even societal problems.

This year’s international applied category theory conference ACT2020 is having a tutorial day, meant to introduce newcomers to applied category theory. Tutorial day will take place on July 5 and will include a few main topics that will be taught semi-traditionally (via presentation, exercises, and discussion) over Zoom, as well as mentors who will be available throughout the day to work with smaller groups and/or individuals. We invite you to sign up here if you’re interested, so we can keep you posted. Hope to see you there!

The four courses will be roughly as follows:

• David Spivak: categorical databases for introducing sets, functions, categories, and functors.

• Fabrizio Genovese: string diagrams as a graphical language for category theory.

• Emily Riehl: the Yoneda lemma in the context of matrices.

• Paolo Perrone: monads and comonads.

A Complete Axiomatisation of Partial Differentiation

18 May, 2020

In the eighth talk of the ACT@UCR seminar, Gordon Plotkin told us about partial differentiation, viewed as a logical theory.

He gave his talk on Wednesday May 20th. Afterwards we discussed it on the Category Theory Community Server, here:

You can view or join the conversation there if you sign in.

You can see his slides here, or download a video of his talk here, or watch his video here:

• Gordon Plotkin, A complete axiomatisation of partial differentiation.

Abstract. We formalise the well-known rules of partial differentiation in a version of equational logic with function variables and binding constructs. We prove the resulting theory is complete with respect to polynomial interpretations. The proof makes use of Severi’s theorem that all multivariate Hermite problems are solvable. We also hope to present a number of related results, such as decidability and Hilbert–Post completeness.


The Category Theory Behind UMAP

10 February, 2020

An interesting situation has arisen. Some people working on applied category theory have been seeking a ‘killer app’: that is, an application of category theory to practical tasks that would be so compelling it would force the world to admit categories are useful. Meanwhile, the UMAP algorithm, based to some extent on category theory, has become very important in genomics:

• Leland McInnes, John Healy and James Melville, UMAP: uniform manifold approximation and projection for dimension reduction.

But while practitioners have embraced the algorithm, they’re still puzzled by its category-theoretic underpinnings, which are discussed in Section 2 of the paper. (You can read the remaining sections, which describe the algorithm quite concretely, without understanding Section 2.)

I first heard of this situation on Twitter when James Nichols wrote:

Wow! My first sighting of applied category theory: the UMAP algorithm. I’m a category novice, but the resulting adjacency-graph algorithm is v simple, so surely the theory boils down to reasonably simple arguments in topology/Riemannian geometry?

Do any of you prolific ACT tweeters know much about UMAP? I understand the gist of the linked paper, but not say why we need category theory to define this “fuzzy topology” concept, as opposed to some other analytic defn.

Junhyong Kim added:

What was gained by CT for UMAP? (honest question, not trying to be snarky)

Leland McInnes, one of the inventors of UMAP, responded:

It is my math background, how I think about the problem, and how the algorithm was derived. It wasn’t something that was added, but rather something that was always there—for me at least. In that sense what was gained was the algorithm.

I don’t really understand UMAP; for a good introduction to it see the original paper above and also this:

• Nikolay Oskolkov, How Exactly UMAP Works—and Why Exactly It Is Better Than tSNE, 3 October 2019.

tSNE is an older algorithm for taking clouds of data points in high dimensions and mapping them down to fewer dimensions so we can understand what’s going on. From the viewpoint of those working on genomics, the main good thing about UMAP is that it solves a bunch of problems that plagued tSNE. Oskolkov explains what these problems are and how UMAP deals with them. But he also alludes to the funny disconnect between these practicalities and the underlying theory:

My first impression when I heard about UMAP was that this was a completely novel and interesting dimension reduction technique which is based on solid mathematical principles and hence very different from tSNE which is a pure Machine Learning semi-empirical algorithm. My colleagues from Biology told me that the original UMAP paper was “too mathematical”, and looking at the Section 2 of the paper I was very happy to see strict and accurate mathematics finally coming to Life and Data Science. However, reading the UMAP docs and watching Leland McInnes talk at SciPy 2018, I got puzzled and felt like UMAP was another neighbor graph technique which is so similar to tSNE that I was struggling to understand how exactly UMAP is different from tSNE.

He then goes on and attempts to explain exactly why UMAP does so much better than tSNE. None of his explanation mentions category theory.

Since I don’t really understand UMAP or why it does better than tSNE, I can’t add anything to this discussion. In particular, I can’t say how much the category theory really helps. All I can do is explain a bit of the category theory. I’ll do that now, very briefly, just as a way to get a conversation going. I will try to avoid category-theoretic jargon as much as possible—not because I don’t like it or consider it unimportant, but because that jargon is precisely what’s stopping certain people from understanding Section 2.

I think it all starts with this paper by Spivak, which McInnes, Healy and Melville cite but for some reason don’t provide a link to:

• David Spivak, Metric realization of fuzzy simplicial sets.

Spivak showed how to turn a ‘fuzzy simplicial set’ into an ‘uber-metric space’ and vice versa. What are these things?

An ‘uber-metric space’ is very simple. It’s a slight generalization of a metric space that relaxes the usual definition in just two ways: it lets distances be infinite, and it lets distinct points have distance zero from each other. This sort of generalization can be very useful. I could talk about it a lot, but I won’t.

A fuzzy simplicial set is a generalization of a simplicial set.

A simplicial set starts out as a set of vertices (or 0-simplices), a set of edges (or 1-simplices), a set of triangles (or 2-simplices), a set of tetrahedra (or 3-simplices), and so on: in short, a set of n-simplices for each n. But there’s more to it. Most importantly, each n-simplex has a bunch of faces, which are lower-dimensional simplices.

I won’t give the whole definition. To a first approximation you can visualize a simplicial set as being like this:

But of course it doesn’t have to stop at dimension 3—and more subtly, you can have things like two different triangles that have exactly the same edges.

In a ‘fuzzy’ simplicial set, instead of a set of n-simplices for each n, we have a fuzzy set of them. But what’s a fuzzy set?

Fuzzy set theory is good for studying collections where membership is somewhat vaguely defined. Like a set, a fuzzy set has elements, but each element has a ‘degree of membership’ that is a number 0 < x ≤ 1. (If its degree of membership were zero, it wouldn't be an element!)

A map f: X → Y between fuzzy sets is an ordinary function, but obeying this condition: it can only send an element x ∈ X to an element f(x) ∈ Y whose degree of membership is greater than or equal to that of x. In other words, we don't want functions that send things to things with a lower degree of membership.

Why? Well, if I'm quite sure something is a dog, and every dog has a nose, then I'm must be at least equally sure that this dog has a nose! (If you disagree with this, then you can make up some other concept of fuzzy set. There are a number of such concepts, and I'm just describing one.)

So, a fuzzy simplicial set will have a set of n-simplices for each n, with each n-simplex having a degree of membership… but the degree of membership of its faces can't be less than its own degree of membership.

This is not the precise definition of fuzzy simplicial set, because I'm leaving out some distracting nuances. But you can get the precise definition by taking a nuts-and-bolts definition of simplicial set, like Definition 3.2 here:

• Greg Friedman, An elementary illustrated introduction to simplicial sets.

and replacing all the sets by fuzzy sets, and all the maps by maps between fuzzy sets.

If you like visualizing things, you can visualize a fuzzy simplicial set as an ordinary simplicial set, as in the picture above, but where an n-simplex is shaded darker if its degree of membership is higher. An n-simplex can’t be shaded darker than any of its faces.

How can you turn a fuzzy simplicial set into an uber-metric space? And how can you turn an uber-metric space into a fuzzy simplicial set?

Spivak focuses on the first question, because the answer is simpler, and it determines the answer to the second using some category theory. (Psst: adjoint functors!)

The answer to the first question goes like this. Say you have a fuzzy simplicial set. For each n-simplex whose degree of membership equals a, you turn it into a copy of this uber-metric space:

\{ (t_0, t_1, \dots, t_n) : t_0 + \cdots + t_n = - \log a , \; t_0, \ldots, t_n \geq 0 \} \subseteq \mathbb{R}^{n+1}

This is really just an ordinary metric space: an n-simplex that’s a subspace of Euclidean (n+1)-dimensional space with its usual Euclidean distance function. Then you glue together all these uber-metric spaces, one for each simplex in your fuzzy simplical set, to get a big fat uber-metric space.

This process is called ‘realization’. The key here is that if an n-simplex has a high degree of membership, it gets ‘realized’ as a metric space shaped like a small n-simplex. I believe the basic intuition is that an n-simplex with a high degree of membership describes an (n+1)-tuple of things—its vertices—that are close to each other.

In theory, I should try to describe the reverse process that turns an uber-metric space into a fuzzy simplicial set. If I did, I believe we would see that whenever an (n+1)-tuple of things—that is, points of our uber-metric space—are close, they give an n-simplex with a high degree of membership.

If so, then both uber-metric spaces and fuzzy simplicial sets are just ways of talking about which collections of data points are close, and we can translate back and forth between these descriptions.

But I’d need to think about this a bit more to do a good job of going further, and reading the UMAP paper a bit more I’m beginning to suspect that’s not the main thing that practitioners need to understand. I’m beginning to think the most useful thing is to get a feeling for fuzzy simplicial sets! I hope I’ve helped a bit in that direction. They are very simple things. They are also closely connected to an idea from topological data analysis:

• Nina Otter, Magnitude meets persistence. Homology theories for filtered simplicial sets.

I should admit that McInnes, Healy and Melville tweak Spivak’s formalism a bit. They call Spivak’s uber-metric spaces ‘extended-pseudo-metric spaces’, but they focus on a special kind, which they call ‘finite’. Unfortunately, I can’t find where they define this term. They also only consider a special sort of fuzzy simplicial set, which they call ‘bounded’, but I can’t find the definition of this term either! Without knowing these definitions, I can’t comment on how these tweaks change things.