guest post by Mike Stay

Programs are an expression of programmer intent. We want the computer to do something for us, so we need to tell it what to do. We make mistakes, though, so we want to be able to check somehow that the program will do what we want. The idea of semantics for a programming language is that we assign some meaning to programs in such a way that we can reason about the behavior of a program. There are two main approaches to this: denotational semantics and operational semantics. I’ll discuss both below, but the post will focus for the most part on operational semantics.

There’s a long history of using 2-categories and related structures for term rewriting and operational semantics, but Greg Meredith and I are particularly fond of an approach using multisorted Lawvere theories enriched over the category of reflexive directed graphs, which we call Gph. Such enriched Lawvere theories are equal in power to, for instance, Sassone and Sobociński’s reactive systems, but in my opinion they have a cleaner categorical presentation. We wrote a paper on them:

Here I’ll just sketch the basic ideas.

Denotational Semantics

Denotational semantics works really well for functional programming languages. The actual process of computation is largely ignored in denotational semantics; it doesn’t matter how you compute the function, just what the function is. John Baez’s seminar eleven years ago explored Lambek and Scott’s approach to the denotational semantics of lambda calculus, and there was extensive discussion on this blog. Lambek and Scott constructed a cartesian closed category of types and $\alpha$-$\beta$-$\eta$-equivalence classes of terms with one free variable, and then assigned meaning to the types and terms with a cartesian closed functor into Set.

Denotational semantics gets a lot harder once we move away from functional programming languages. Modern programs run on multiple computers at the same time and each computer has several cores. The computers are connected by networks that can mix up the order in which messages are received. A program may run perfectly by itself, but deadlock when you run it in parallel with another copy of itself. The notion of “composition” begins to change, too: we run programs in parallel with each other and let them interact by passing messages back and forth, not simply by feeding the output of one function into the input of another. All of this makes it hard to think of such programs as functions.

Operational Semantics

Operational semantics is the other end of the spectrum, concerned with the rules by which the state of a computer changes. Whereas denotational semantics is inspired by Church and the lambda calculus, operational semantics is inspired by Turing and his machines. All of computational complexity lives here (e.g. $P \stackrel{?}{=} NP$).

To talk about the operational semantics of a programming language, there are five things we need to define.

First, we have to describe the layout of the state of the computer. For each kind of data that goes into a description of the state, we have a sort. If we’re using a programming language like lambda calculus, we have a sort for variables and a sort for terms, and the term is the entire state of the computer. If we’re using a Turing machine, there are more parts: the tape, the state transition table, the current state, and the position of the read/write head on the tape. If we’re using a modern language like JavaScript, the state is very complex: there are a couple of stacks, the heap, the lexical environment, the this binding, and more.

Second, we have to build up the state itself using term constructors. For example, in lambda calculus, we start with variables and use abstraction and application to build up a specific term.

Third, we say what rearrangements of the state we’re going to ignore; this is called structural congruence. In lambda calculus, we say that two terms are the same if they only differ in the choice of bound variables. In pi calculus, it doesn’t matter in what order we list the processes that are all running at the same time.

Fourth, we give reduction rules describing how the state is allowed to change. In lambda calculus, the state only changes via $\beta$-reduction, substituting the argument of a function for the bound variable. In a Turing machine, each state leads to one of five others (change the bit to 0 or 1, then move left or right; or halt). In pi calculus, there may be more than one transition possible out of a particular state: if a process is listening on a channel and there are two messages, then either message may be processed first. Computational complexity theory is all about how many steps it takes to compute a result, so we do not have equations between sequences of rewrites.

Finally, the reduction rules themselves may only apply in certain contexts; for example, in all modern programming languages based on the lambda calculus, no reductions happen under an abstraction. That is, even if a term $t$ reduces to $t'$, it is never the case that $\lambda x.t$ reduces to $\lambda x.t'$. The resulting normal form is called “weak head normal form”.
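To make the last two ingredients concrete, here is a minimal Python sketch of $\beta$-reduction restricted to weak head contexts. It assumes de Bruijn indices, so that terms differing only in the choice of bound variables are literally equal; the tuple encoding and the names `shift`, `subst`, `beta` and `whnf_step` are illustrative choices for this sketch, not anything taken from the literature discussed here.

```python
# Terms use de Bruijn indices (an assumption of this sketch):
#   ("var", n) | ("lam", body) | ("app", fun, arg)

def shift(t, d, cutoff=0):
    """Add d to every variable index that is >= cutoff (i.e. free)."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Replace variable j in t by the term s."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def beta(body, arg):
    """Contract the redex (lam body) arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def whnf_step(t):
    """One weak head reduction step, or None if no step applies.

    The function only ever descends into the left side of an
    application; it never reduces under a "lam" or inside an argument.
    """
    if t[0] == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "lam":
            return beta(fun[1], arg)
        reduced = whnf_step(fun)
        if reduced is not None:
            return ("app", reduced, arg)
    return None

identity = ("lam", ("var", 0))
redex = ("app", identity, identity)                    # (\x.x) (\y.y)
under_lambda = ("lam", ("app", identity, ("var", 0)))  # \x.((\y.y) x)
```

Here `whnf_step(redex)` contracts the outer redex, while `whnf_step(under_lambda)` finds nothing to do even though the body contains a redex, which is the restriction described above.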

Here’s an example from Boudol’s paper “The $\pi$-calculus in direct style.” There are two sorts: $x$ or $z$ for variables and $L$ or $N$ for terms. The first line, labeled “syntax,” defines four term constructors. There are equations for structural congruence, and there are two reduction rules followed by the contexts in which the rules apply:

Category theory

We’d like to formalize this using category theory. For our first attempt, we capture almost all of this information in a multisorted Gph-enriched Lawvere theory: we have a generating object for each sort, a generating morphism for each term constructor, an equation between morphisms encoding structural congruence, and an edge for each rewrite.

We interpret the theory in Gph. Sorts map to graphs, term constructors to graph homomorphisms, equations to equations, and rewrites map to things I call “graph transformations”, which are the obvious generalization of a natural transformation to the category of graphs: a graph transformation between two graph homomorphisms $\alpha:F\Rightarrow G$ assigns to each vertex $v$ an edge $\alpha_v:Fv \to Gv$. There’s nothing about a commuting square in the definition because it doesn’t even parse: we can’t compose edges to get a new edge.
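As a small illustration, the condition on a graph transformation can be checked mechanically. The following Python sketch assumes a graph is given by a dictionary from edge names to (source, target) pairs and that only the vertex parts of $F$ and $G$ are needed; the function name and the toy data are hypothetical.

```python
def is_graph_transformation(vertices, target_edges, F, G, alpha):
    """Check that alpha assigns to each vertex v an edge F(v) -> G(v).

    vertices:     vertices of the source graph
    target_edges: dict edge name -> (source, target) in the target graph
    F, G:         dicts giving the vertex parts of two graph homomorphisms
    alpha:        dict vertex -> name of an edge of the target graph

    There is deliberately no naturality square to check.
    """
    return all(
        v in alpha
        and alpha[v] in target_edges
        and target_edges[alpha[v]] == (F[v], G[v])
        for v in vertices
    )

# Toy data: the source graph has vertices 0 and 1 and only self-loops,
# so any vertex map into a reflexive graph extends to a homomorphism.
source_vertices = {0, 1}
target_edges = {
    "id_a": ("a", "a"), "id_b": ("b", "b"), "id_c": ("c", "c"),
    "e": ("a", "b"), "f": ("b", "c"),
}
F = {0: "a", 1: "b"}
G = {0: "b", 1: "c"}
good_alpha = {0: "e", 1: "f"}
bad_alpha = {0: "e", 1: "e"}  # "e" does not run from F(1) to G(1)
```

With this data, `good_alpha` passes the check and `bad_alpha` fails it.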

This initial approach doesn’t quite work because of the way reduction contexts are usually presented. The reduction rules assume that we have a “global view” of the term being reduced, but the category theory insists on a “local view”. By “local” I mean that we can always whisker a reduction with a term constructor: if $K$ is an endomorphism on a graph, then given any edge $e:v\to v'$, there’s necessarily an edge $K e:K v \to K v'$. These two requirements conflict: to model reduction to weak head normal form, if we have a reduction $t \to t'$, we don’t want a reduction $\lambda x.t \to \lambda x.t'$.

One solution is to introduce “context constructors”, unary morphisms for marking reduction contexts. These contexts become part of the rewrite rules and the structural congruence; for example, taking $C$ to be the context constructor for weak head normal form, we add a structural congruence rule that says that to reduce an application of one term to another, we have to reduce the term on the left first:

$$C(T\; U) \equiv (C T\; U).$$

We also modify the reduction rule to involve the context constructors. Here’s $\beta$ reduction when reducing to weak head normal form:

$$\beta: (C(\lambda x.T)\; U) \Rightarrow C T\{U/x\}.$$

Now $\beta$ reduction can’t happen just anywhere; it can only happen in the presence of the “catalyst” $C$.

With context constructors, we can capture all of the information about operational semantics using a multisorted Gph-enriched Lawvere theory: we have a generating object for each sort, a generating morphism for each term constructor and for each context constructor, equations between morphisms encoding structural congruence and context propagation, and an edge for each rewrite in its appropriate context.

Connecting to Lambek/Scott

The SKI combinator calculus is a formal system invented by Schönfinkel and Curry. It allows for universal computation, and expressions in this calculus can easily be translated into the lambda calculus, but it’s simpler because it doesn’t include variables. The SK calculus is a fragment of the SKI calculus that is still computationally universal.

We can recover the Lambek/Scott-style denotational semantics of the SK calculus (see the appendix) by taking the Gph-theory, modding out by the edges, and taking the monoid of endomorphisms on the generating object. The monoid is the cartesian closed category with only the “untyped” type. Using Melliès and Zeilberger’s notion of a functor as a type refinement system, we “-oidify” the monoid into a category of types and equivalence classes of terms.

However, modding out by edges utterly destroys the semantics of concurrent languages, and composition of endomorphisms doesn’t line up particularly well with composition of processes, so neither of those operations is desirable in general. That doesn’t stop us from considering Gph-enriched functors as type refinement systems, though.

Let $G$ be the free model of a theory on the empty graph. Our plan for future work is to show how different notions of a collection of edges of $G$ give rise to different kinds of logics. For example, if we take subsets of the edges of $G$, we get subgraphs of $G$, which form a Heyting algebra. On the other hand, if we consider sets of lists of composable edges in $G$, we get quantale semantics for linear logic. Specific collections will be the types in the type system, and proofs should be graph homomorphisms mapped over the collection. Edges will feature in proof normalization.

At the end, we should have a system where given a formal semantics for a language, we algorithmically derive a type system tailored to the language. We should also get a nice Curry-Howard style approach to operational semantics that even denotational semantics people won’t turn up their noses at!

Appendix

Gph

For our purposes, a graph is a set $E$ of edges and a set $V$ of vertices together with three functions $s\colon E \to V$ for the source of the edge, $t\colon E \to V$ for the target, and $a\colon V \to E$ such that $s\circ a = \mathrm{id}_V$ and $t \circ a = \mathrm{id}_V$; that is, $a$ assigns a chosen self-loop to each vertex. A graph homomorphism maps vertices to vertices and edges to edges such that sources, targets and chosen self-loops are preserved. Gph is the category of graphs and graph homomorphisms. Gph has finite products: the terminal graph is the graph with one vertex and one loop, while the product of two graphs $(E, V, s, t, a) \times (E', V', s', t', a')$ is $(E \times E', V \times V', s \times s', t\times t', a \times a')$.
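The definition and the product formula translate almost directly into code. Here is a minimal Python sketch, assuming finite graphs stored as sets and dictionaries; the class name `Graph` and the helper `graph_product` are illustrative names for this sketch only.

```python
from dataclasses import dataclass

@dataclass
class Graph:
    V: set   # vertices
    E: set   # edges
    s: dict  # source: edge -> vertex
    t: dict  # target: edge -> vertex
    a: dict  # chosen self-loop: vertex -> edge

    def is_reflexive(self):
        """Check that s(a(v)) = v and t(a(v)) = v for every vertex v."""
        return all(
            self.s[self.a[v]] == v and self.t[self.a[v]] == v for v in self.V
        )

def graph_product(g, h):
    """Componentwise product (E x E', V x V', s x s', t x t', a x a')."""
    V = {(v, w) for v in g.V for w in h.V}
    E = {(e, f) for e in g.E for f in h.E}
    s = {(e, f): (g.s[e], h.s[f]) for (e, f) in E}
    t = {(e, f): (g.t[e], h.t[f]) for (e, f) in E}
    a = {(v, w): (g.a[v], h.a[w]) for (v, w) in V}
    return Graph(V, E, s, t, a)

# The terminal graph: one vertex and one loop.
terminal = Graph({"*"}, {"id"}, {"id": "*"}, {"id": "*"}, {"*": "id"})

# A graph with two vertices, their chosen loops, and one edge 0 -> 1.
arrow = Graph(
    V={0, 1},
    E={"id0", "id1", "e"},
    s={"id0": 0, "id1": 1, "e": 0},
    t={"id0": 0, "id1": 1, "e": 1},
    a={0: "id0", 1: "id1"},
)

square = graph_product(arrow, arrow)
```

Note that `square` contains the diagonal edge `("e", "e")` from `(0, 0)` to `(1, 1)` as well as edges such as `("e", "id0")` that move in one coordinate only; pairing an edge with a chosen self-loop is how the product accounts for a change on one side alone.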

Gph is a topos; the subobject classifier has two vertices $t, f$ and five edges: the two self-loops, an edge from $t$ to $f$, an edge from $f$ to $t$, and an extra self-loop on $t$. Any edge in a subgraph maps to the chosen self-loop on $t$, while an edge not in the subgraph maps to one of the other four edges depending on whether the source and target vertex are included or not.
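The description of the classifying map can be spelled out as a small Python sketch. The edge names in `OMEGA_EDGES` and the function `characteristic_map` are hypothetical labels chosen for this illustration; graphs are again assumed to be given by a dictionary from edge names to (source, target) pairs.

```python
# Edges of the subobject classifier: name -> (source, target).
OMEGA_EDGES = {
    "id_t": ("t", "t"),     # chosen self-loop on t: the edge is in the subgraph
    "id_f": ("f", "f"),     # chosen self-loop on f
    "t_to_f": ("t", "f"),
    "f_to_t": ("f", "t"),
    "extra_t": ("t", "t"),  # both endpoints are in, but the edge is not
}

# For an edge outside the subgraph, the image depends only on whether
# its source and target vertices are included.
OUTSIDE = {
    ("t", "t"): "extra_t",
    ("t", "f"): "t_to_f",
    ("f", "t"): "f_to_t",
    ("f", "f"): "id_f",
}

def characteristic_map(vertices, edges, sub_vertices, sub_edges):
    """Return the vertex and edge parts of the map classifying a subgraph."""
    chi_v = {v: ("t" if v in sub_vertices else "f") for v in vertices}
    chi_e = {}
    for e, (src, tgt) in edges.items():
        if e in sub_edges:
            chi_e[e] = "id_t"
        else:
            chi_e[e] = OUTSIDE[(chi_v[src], chi_v[tgt])]
    return chi_v, chi_e

# Ambient graph: three vertices, their loops, parallel edges e, h: 0 -> 1,
# and an edge g: 1 -> 2. The subgraph keeps vertices 0, 1 and the edge e.
vertices = {0, 1, 2}
edges = {
    "id0": (0, 0), "id1": (1, 1), "id2": (2, 2),
    "e": (0, 1), "h": (0, 1), "g": (1, 2),
}
chi_v, chi_e = characteristic_map(vertices, edges, {0, 1}, {"id0", "id1", "e"})
```

In this example `h` lands on the extra self-loop on $t$, since both of its endpoints are in the subgraph but `h` itself is not, and `g` lands on the edge from $t$ to $f$.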

A Gph-enriched category consists of

a set of objects;

for each pair of objects $x, y$, a graph $\hom(x,y)$;

for each triple of objects $x, y, z$, a composition graph homomorphism $\circ\colon \hom(y, z) \times \hom(x, y) \to \hom(x, z)$; and

for each object $x$, a vertex of $\hom(x, x)$, the identity on $x$,

such that composition is associative, and composition and the identity obey the unit laws. A Gph-enriched category has finite products if the underlying category does.

Any category is trivially Gph-enrichable by treating the elements of the hom sets as vertices and adjoining a self-loop to each vertex. The category Gph is nontrivially Gph-enriched: Gph is a topos, and therefore cartesian closed, and therefore enriched over itself. Given two graph homomorphisms $F, F'\colon (E, V, s, t, a) \to (E', V', s', t', a')$, a graph transformation assigns to each vertex $v$ in $V$ an edge $e'$ in $E'$ such that $s'(e') = F(v)$ and $t'(e') = F'(v)$. Given any two graphs $G$ and $G'$, there is an exponential graph $G'^G$ whose vertices are graph homomorphisms between them and whose edges are graph transformations. There is a natural isomorphism between the graphs $C^{A\times B}$ and $(C^B)^A$.

A Gph-enriched functor between two Gph-enriched categories $C, D$ is a functor between the underlying categories such that the graph structure on each hom set is preserved, i.e. the functions between hom sets are graph homomorphisms between the hom graphs.

Let $S$ be a finite set, FinSet be a skeleton of the category of finite sets and functions between them, and $FinSet/S$ be the category of functions into $S$ and commuting triangles. A multisorted Gph-enriched Lawvere theory, hereafter Gph-theory, is a Gph-enriched category with finite products Th equipped with a finite set $S$ of sorts and a Gph-enriched functor $\theta\colon FinSet^{op}/S \to Th$ that preserves products strictly. Any Gph-theory has an underlying multisorted Lawvere theory given by forgetting the edges of each hom graph.

A model of a Gph-theory Th is a Gph-enriched functor from Th to Gph that preserves products up to natural isomorphism. A homomorphism of models is a braided Gph-enriched natural transformation between the functors. Let FPGphCat be the 2-category of small Gph-enriched categories with finite products, product-preserving Gph-functors, and braided Gph-natural transformations. The forgetful functor $U\colon FPGphCat[Th, Gph] \to Gph$ that picks out the underlying graph of a model has a left adjoint that picks out the free model on a graph.

Gph-enriched categories are part of a spectrum of 2-category-like structures. A strict 2-category is a category enriched over Cat with its usual product. Sesquicategories are categories enriched over Cat with the “funny” tensor product; a sesquicategory can be thought of as a generalized 2-category where the interchange law does not always hold. A Gph-enriched category can be thought of as a generalized sesquicategory where 2-morphisms (now edges) cannot always be composed. Any strict 2-category has an underlying sesquicategory, and any sesquicategory has an underlying Gph-enriched category; these forgetful functors have left adjoints.

Some examples

The SK calculus

Here’s a presentation of the Gph-theory for the SK calculus:

objects $T$

morphisms $S\colon 1 \to T$, $K\colon 1 \to T$, $(-\; -)\colon T \times T \to T$

equations none

edges $\sigma\colon (((S\; x)\; y)\; z) \Rightarrow ((x\; z)\; (y\; z))$, $\kappa\colon ((K\; x)\; z) \Rightarrow x$



The free model of this theory on the empty graph has a vertex for every term in the SK calculus and an edge for every reduction.
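To get a feel for this graph, here is a minimal Python sketch that lists some of the edges leaving a given vertex. It assumes terms are encoded as the strings `"S"` and `"K"` or as pairs `(f, x)` for application, and it only enumerates edges that contract a single redex, whiskered by application on either side; the chosen self-loops, and any edges that rewrite both sides of an application at once, are left out. The function names are hypothetical.

```python
def step_at_root(t):
    """Apply sigma or kappa at the root of t, or return None."""
    if isinstance(t, tuple) and isinstance(t[0], tuple):
        # sigma: (((S x) y) z) => ((x z) (y z))
        if isinstance(t[0][0], tuple) and t[0][0][0] == "S":
            x, y, z = t[0][0][1], t[0][1], t[1]
            return ((x, z), (y, z))
        # kappa: ((K x) z) => x
        if t[0][0] == "K":
            return t[0][1]
    return None

def one_step_reducts(t):
    """Targets of the single-redex edges out of the vertex t."""
    out = []
    root = step_at_root(t)
    if root is not None:
        out.append(root)
    if isinstance(t, tuple):
        f, x = t
        # Whisker reductions of the subterms with the application constructor.
        out += [(f2, x) for f2 in one_step_reducts(f)]
        out += [(f, x2) for x2 in one_step_reducts(x)]
    return out

skk_s = ((("S", "K"), "K"), "S")  # (((S K) K) S)
ksk = (("K", "S"), "K")           # ((K S) K)
two_redexes = (ksk, ksk)
```

Because reductions can be whiskered freely in this theory, `one_step_reducts(two_redexes)` finds an edge for each of the two redexes, one in function position and one in argument position.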

The SK calculus with the weak head normal form reduction strategy

Here’s the theory above modified to account for weak head normal form:

objects $T$

morphisms $S\colon 1 \to T$, $K\colon 1 \to T$, $(-\; -)\colon T \times T \to T$, $R\colon T \to T$

equations $R(x\; y) = (Rx\; y)$

edges $\sigma\colon (((RS\; x)\; y)\; z) \Rightarrow ((Rx\; z)\; (y\; z))$, $\kappa\colon ((RK\; x)\; z) \Rightarrow Rx$



If $M$ is an $SK$ term with no uses of $R$ and $M'$ is its weak head normal form, then $RM$ reduces to $RM'$.
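This behaviour can be tried out on small terms. The following Python sketch assumes a tagged-tuple encoding of terms and treats the equation $R(x\; y) = (Rx\; y)$ as a normalization that pushes $R$ toward the head of an application; the names `push`, `step` and `normalize` and the step limit are assumptions of the sketch.

```python
S, K = ("S",), ("K",)

def app(f, x):
    return ("app", f, x)

def R(t):
    return ("R", t)

def push(t):
    """Normalize using R(x y) = (Rx y), read left to right."""
    if t[0] == "R":
        inner = push(t[1])
        if inner[0] == "app":
            return app(push(R(inner[1])), inner[2])
        return R(inner)
    if t[0] == "app":
        return app(push(t[1]), push(t[2]))
    return t

def step(t):
    """One reduction of a pushed term, or None if no edge applies."""
    if t[0] == "app":
        f, z = t[1], t[2]
        if f[0] == "app":
            # kappa: ((RK x) z) => Rx
            if f[1] == R(K):
                return push(R(f[2]))
            # sigma: (((RS x) y) z) => ((Rx z) (y z))
            if f[1][0] == "app" and f[1][1] == R(S):
                x, y = f[1][2], f[2]
                return push(app(app(R(x), z), app(y, z)))
        # Whiskering is still allowed; redexes just need an R to fire.
        reduced = step(f)
        if reduced is not None:
            return app(reduced, z)
        reduced = step(z)
        if reduced is not None:
            return app(f, reduced)
    if t[0] == "R":
        reduced = step(t[1])
        if reduced is not None:
            return push(R(reduced))
    return None

def normalize(t, limit=1000):
    """Reduce until no edge applies (limit is an arbitrary safety bound)."""
    t = push(t)
    for _ in range(limit):
        reduced = step(t)
        if reduced is None:
            return t
        t = reduced
    raise RuntimeError("no normal form found within the step limit")

skk_s = app(app(app(S, K), K), S)  # (((S K) K) S), which reduces to S
stuck = app(K, skk_s)              # (K (((S K) K) S)), already in WHNF
```

Here `normalize(R(skk_s))` ends at `R(S)`, while `normalize(R(stuck))` leaves the argument `skk_s` untouched: the redex inside it never meets the catalyst $R$, so no edge applies.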