

Introduced by Alonzo Church in the 1930s, $\lambda$-calculus can be considered as the smallest computation programming language of the world. Reading through the various tutorials one can find various examples which showcase its universality; addition, conditional operators, equality and inequality and recursion. One thing which always intrigued me is how applicable is $\lambda$-calculus to everyday problems such as finding the set of URLs in a sitemap or a set of alternate language urls? The question is obviously rhetorical since we know that $\lambda$-calculus is Turing complete! Can we model a “modern” problem in terms of $\lambda$-calculus and will such a problem still feel natural? The answer is yes! In this post we are going to apply $\lambda$-calculus in an SEO domain, specifically to construct a Sitemap and to find the hreflang for a language.

$\lambda$-Calculus

Disclaimer to a future math reader! If you are mathsy and you might feel like pulling your hair out after reading this (mathematical loose hand-wavy) document, rather than going on an epic rant about how useless I am please help out! After all I’m not a Mathematician 🙂 but I'd love to learn!

The central concept in $\lambda$-calculus is an expression. An expression is defined recursively as follows (not pure $\lambda$-calculus):

where

name , also referred to as a variable, is an identifier which for our purposes will be any sequence of italic lowercase letters e.g. lang, slug.

, also referred to as a variable, is an identifier which for our purposes will be any sequence of italic lowercase letters e.g. lang, slug. cons is a constant which for our purposes will be represented by a sequence of lowercase letters e.g. en , fi , sv .

Having defined an expression let us look at some examples:

Notation-wise we will use the following rules:

Variables are listed after the $\lambda$ e.g. lang is a variable in $\lambda\textit{lang}.\textit{lang}$

An application is always associated to the left such that $E_1 \dots E_{n} \equiv (((E_1 E_2) E_3) \dots E_n)$

$\lambda$ binds to the right as far as possible: $\lambda \textit{l}.\textit{l} \textit{l} \equiv \lambda \textit{l}.(\textit{l} \textit{l})

ot\equiv (\lambda \textit{l}.\textit{l})\textit{l}$

Outermost parenthesis are omitted: $E_1 \ldots E_2 \equiv (E_1 \ldots E_2)$

Free and Bound Variables

In $\lambda$-calculus all names are local to definition. In the function $\lambda \textit{x}.\textit{x}$ we say that $\textit{x}$ is “bound” since its occurrence in the body of the definition is preceded by $\lambda \textit{x}$. A name not preceded by $\lambda$ is called a “free variable”.

The set of free variables can be defined recursively:

Similarly, the set of bound variables can be defined recursively:

$\beta$-Reduction

In $\lambda$-calculus there is one computational rule called $\beta-$reduction: $((\lambda\textit{x}.s)\texttt{t})$ can be reduced to $\textit{s}[\texttt{t}/\textit{x}]$, the result of replacing the arguments $\texttt{t}$ for the formal parameter $\textit{x}$ in $s$. Note: It is assumed that no substitution will mix free and bound identifiers and hence we will omit the discussion of $\alpha$-reduction. Some examples are:

A more complex example would be the following:

Some $\lambda-$expressions have a non-deterministic reduction. Let us take as an example the $\lambda$-expression $(\lambda x.x x)((\lambda y.y)z)$. This $\lambda$-expression has two possible reductions

and

On the other end of the spectrum we find expressions that do not have a single reduction sequence. As an example consider

For the purposes of our discussion we will only consider $\lambda$-expressions of the form

where $n_1\ldots n_{k}$ are the parameters of the lambda functions and $c_1\ldots c_{k}$ are the arguments of the function. Each individual argument $c_{k}$ can be either a name or constant. It can be shown that expressions of this form are deterministic and fully normalised (cannot be reduced further). Specifically, it will take $k$ $\beta-$reductions to normalise the $\lambda$-expression.

Context

Given a $\lambda$-expression of the form

we will refer to the sequence of reductions $c_1c_2\ldots c_{k}$ as the context of the $\lambda-$expression. In this context the item $c_{k}$ is called a variable. The variable $c_{k}$ can take on two possible values: a name or a constant . If $c_{k}$ is a name we will say that the variable $c_{k}$ is a free variable since the normalised expression will contain such a name $c_{k}$ in the set of free variables. If $c_{k}$ is a constant we will say that the variable $c_{k}$ is a bound variable since the normalised expression will contain such a constant $c_{k}$ in the set of bound variables. As an example let us consider the following:

In this case the variable $c_1$ is the name $\textit{a}$ and the variable $c_2$ is the constant $\texttt{lobby}$. The normalised expression is $\textit{a}\ \texttt{lobby}$. It is clear that the free variable $\textit{a}$ is a free variable in the normalised $\lambda-$expression $\textit{a}\ \texttt{lobby}$ and that $\texttt{lobby}$ is a bound variable.

When all $c_1\ldots c_{k}$ variables are free we will refer to the lambda expression as a free lambda expression. When all $c_1\ldots c_{k}$ variables are bound we will refer to the lambda expression as a bound lambda expression. When the context contains a mix of free and bound variables we will refer to the lambda expression as a partial lambda expression.

Given a context $c_1\ldots c_{k}$, the $\lambda-$expression required to normalise to the context is

Similarly, given a normalised expression $c_1\ldots c_{k}$ one can always find the source context and the source $\lambda-$expression which reduced to the normalised expression. To make such a conversion explicit we will use the function $\omega(c_1\ldots,c_{k})$

Expression Tree

Starting with a $\lambda$-expression at the root, one can visualise all the reduction steps leading to the normalised expression in a tree. As an example let us consider how the reduction sequence for $(\lambda \textit{lang}. \textit{lang}) \texttt{en}$ can be represented.

A more complex example would be the following

In general, the generic $\lambda-$expression

can be represented as follows

Notation-wise we will use $\psi((\lambda n_1\ldots\lambda n_{k} \cdot n_1\ldots n_{k})c_1\ldots c_{k})$ or $\psi(\omega(c_1\ldots c_{k}))$ to refer to the sequence of reductions i.e. expression tree.

Expression Sets

Reduction trees discussed in the previous section represent the normalisation steps for a given expression $\omega(c_1c_2\ldots c_{k})$ where $c_1,\ldots,c_{k}$ is in a given configuration. If $c_{1} \in \mathcal{C}_{1}, c_{2} \in \mathcal{C}_{2}, \ldots, c_{k} \in \mathcal{C}_{k}$ we can build the set of all expression trees by normalising all expressions in the set

You can think of $\mathcal{C}_{k}$ as the type for variable $c_{k}$. As an example let us consider the expression $((\lambda \textit{lang}. (\lambda \textit{slug}. \textit{lang}\ \textit{slug}))c_1c_2$. Where $c_1 \in \{ \texttt{en}, \texttt{fi} \}$ and $c_2 \in \{ \texttt{lobby} \}$. The reduction set $\mathcal{RS}$ would contain two trees

If $c_1 \in \{\texttt{e}, \texttt{f}\}$ and $c_2 \in \{\texttt{l}, \texttt{g}\}$ the reduction set $\mathcal{RS}$ would contain four trees

Given that each expression tree has only one reduction sequence to normalisation we will merge all expression trees in a single tree by omitting the context. As an example let us consider the $\lambda-$expression set

where $c_1 \in \{ \texttt{en},\texttt{fi}\}$ and $c_2 \in \{ \texttt{lobby}, \texttt{games}\}$.

The expression set can be represented as

It is important to note that an expression set tree is just a visualisation aide. Mentally, one should always expand each node in the tree to the set of $\lambda-$expressions. At each level the set of $\lambda-$expressions being represented can be found by mapping all descendant leaf-nodes using $\omega(c_1c_2\ldots c_{k})$ defined in the previous section. As an example the root node in the expression set $\lambda \textit{l}.\lambda \textit{s}.\textit{l}\ \textit{s}$ has four descendant leaf nodes

and hence represents the set

Free/Bound/Partial Reduction Sets

Leaf nodes in a reduction set represent normalised expressions. If all leaf expressions (normalised expressions) are bound expressions we say that the reduction set is a bound reduction set. The example below illustrates a bound reduction set.

If some normalised expressions are bound we say that the reduction set is a partial reduction set. The example below illustrates a partial reduction set.

If no normalised expressions are bound - that is all leaf expressions are free - we say that the reduction set is a free reduction set. The example below illustrates a free reduction set.

Serialisation and de-serialisation

Serialisation

Given a $\lambda-$expression of the form

we can serialise the normalised reduction $c_1\ldots c_{k}$ by means of a bijective function $f$ which takes a context configuration and maps it to a string $s$. The function $f$ has to be bijective, as later on, we will need to deserialise the string back to the context.

One implementation of serialisation function $f$ is to join the context using the / character. As we have two types of variables, free and bound, we need to distinguish between these different types so during deserialisation, the context can be correctly reconstructed. In order to distinguish between free and bound variables we will prefix free variables with the colon prefix.

Let’s serialise the following expression

The normalised context for this expression is

The result of serialising the above expression is

One other implementation of a serialisation function is $f_{int}$ which takes a context and serialises the context into an internationalised string. As an example consider

The context for this expression is

The result of serialising the above expression is

where the constant slots in Finnish is serialised to kolikkopelit.

Deserialisation

Deserialisation maps a string representation back to the context.

One implementation of the $\bar{f}$ function obtains the original context using the context extraction function $ce$

and reconstructs the original expression using the $\omega$ function

Given a serialisation

Search

Search enables us to find a particular expression tree in an expression set. In order to create a search function we will first define a match function for expressions. The match function takes a source (source) $\lambda$-expression and a search $\lambda$-expression (se) and returns a true if

As an example let us match $\lambda l.\texttt{en}$ to itself

Having defined match for expression we will move up the abstraction hierarchy and define matching on expression trees. Let’s overload the $match$ function so that it now takes a source expression tree and a search expression tree and calls match on the source and search expression trees.

Let us $match$ the expression tree $\psi(((\lambda \textit{l}. (\lambda \textit{s}. \textit{l}\ \textit{s})) \texttt{en})\texttt{lobby})$ to itself.

Having defined match for expression tree we will move up (once again) the abstraction hierarchy and define a $search$ function for expression sets. Since the expression set $\mathcal{RS}$ represents a set of expression trees, searching through an expression set is simply the act of matching each element of the expression set with the search expression tree se.

Search Properties

In this section we shall look at two search properties: the search identity and freed context identity.

Search Identity Searching for a context equal to a normalised expression $\textit{e}$ present in an expression set $\mathcal{RS}$ will yield the result set $\mathcal{RS}_{match} = \{ \psi(\omega(\textit{e})) \}$.





Freed Context Identity Starting from a normalised expression $\textit{e}$ present in the expression set $\mathcal{RS}$ and freeing one or more `constants` will yield the results set $\mathcal{RS}_{match}$ which includes $\psi(\textit{e})$ i.e. $\mathcal{RS}_{match} \supseteq \{ \psi(\omega(\textit{e})) \}$ .

Search identity

The number of normalised expression nodes a set can have is dependant on the context. Specifically it is the cartesian product of all contexts $\mathcal{C}_{1}\ldots \mathcal{C}_{k}$. The set of normalised expressions in a bound expression set $\mathcal{RS}$ implies that there is one and only one reduction to any $c’ \in \mathcal{C}_{1} \times \ldots \times \mathcal{C}_{k}$. Thus, any search starting from a bound context $c’$ will find the same normalised node $c’$. We call this the search identity.

Let’s for instance search for

on the expression set

where $c_1 \in \{\texttt{en}, \texttt{fi}\}$ and $c_2 \in \{\texttt{lobby}, \texttt{games}\}$.

The individual $match$ reductions are

Freed context search

Searching for a bound expression tree will not yield meaningful results. In general we are interested in searching for expression trees which are similar to a search expression tree. In order to generalise a normalised expression tree we will substitute one or more constants with free variables. Consider the following expression set

In general one would search for a bound expression tree to uniquely identify the expression tree within an expression set

where $c_1 \in \{\texttt{en}, \texttt{fi}\}$ and $c_2 \in \{\texttt{lobby}, \texttt{games}\}$.

The normalised reduction for the node with the $\dagger$ symbol is $\texttt{en}\ \texttt{lobby}$. In order to retrieve similar nodes one needs to substitute constants with free variables. Such a substitution marginalises all constant values for the given variable $c_{k}$ making such a variable (in the context) irrelevant to the search. For instance in order to find similar $\texttt{lobby}$ expression trees, we retain the bound $\texttt{lobby}$ constant and replace the $\texttt{en}$ value with the free variable $\ast$. Searching for the expression tree $\psi(\lambda \textit{lang}. (\lambda \textit{slug}. \textit{lang}\ \textit{slug})\ast\ \texttt{lobby})$ will yield

The individual $match$ reductions are

Nothing! Give me Nothing!

Given the $\lambda$-expression

there may be cases where the context $c_k$ is undefined or nothing. We will represent nothing using the token constant $\varnothing$. We require that the types $\mathcal{C}_{k}$ include a dummy bound value $\varnothing$ to represent nothing. A configuration $c_1,\ldots,c_{k}$ is considered valid if

Informally, this requires that if a variable in a particular position within a configuration is $\varnothing$, all subsequent variables within the configuration need to be $\varnothing$.

As an example, let us consider the $\lambda$-expression

where

In order to introduce nothing, we will add the constant $\varnothing$ to all type sets

and restrict the configuration $c_1c_2c_3$ to be in the set

$\varnothing$ does not affect $\beta-$reduction, it is after all a constant . As an example let us consider

We will typically omit $\varnothing$ from the expression set for clarity.

Search including nothing

As the $\varnothing$ is a constant, search does not need to be modified. Lets search for the expression tree $\psi(\omega({\texttt{en }\varnothing}))$ in the expression set

where $lang \in \{\texttt{en},\ \varnothing\}$ and $slug \in \{\texttt{lobby},\ \varnothing\}$.

The individual $match$ reductions are

This comes as no surprise due to the search identity! Note that $\psi(\omega(\varnothing\ \texttt{lobby}))$ is not included in $\mathcal{RS}$ since this would violate

Integrating this with the Content Management System

In this section we shall discuss how the $\lambda$-calculus can be applied to solve two SEO problems: getting all the paths of the site and getting a path’s alternates in other languages.

Constructing the Partial Expression Set

Constructing an internationalised sitemap can be translated to the problem of serialising a bound expression set. But how does one construct such a bound expression set? All URLs on our website are saved in a Content Management System (CMS) and follow one of the following forms:

One can interpret the above URLs as a serialisation originating from the expression set

where $c_{1}=\textit{lang}$, $c_{2} \in \{ \texttt{slots}, \varnothing \}$ and $c_{3}=\textit{slug}$.

Such an expression set is a partial expression set and needs to be bound in order to retrieve a site’s sitemap. Recall from earlier that free variables represent a marginalisation of constants for a particular type $\mathcal{C}_{k}$. In this case lang represents a marginalisation of all constants of type $\mathcal{C}_{lang}$ and $\textit{slug}$ represents a marginalisation of all constants of type $\mathcal{C}_{slug}$. Resolvers provide a means of finding the set of constants $c_{k}$ for a type $\mathcal{C}_{k}$ when $c_{k}$ is a free variable.

Constructing the Bound Expression Set - Resolvers

A resolver function $r$ is a function which takes a context $c_{k}$ and returns the set of all possible bound values $\mathcal{C}_{k}$. This is required when the value $c_{k}$ has been marginalised by a free variable.

To convert the partial expression set

where $c_{1}=\textit{lang}$, $c_{2} \in \{ \texttt{slots}, \varnothing \}$ and $c_{3}=\textit{slug}$, we need to resolve context $c_{1}$ and $c_{3}$ through the resolvers $r(c_{1})$ and $r(c_{3})$ to obtain the bound values for types $\mathcal{C}_{lang}$ and $\mathcal{C}_{slug}$ respectively. Using the CMS one can implemented $r(c_{1})$ by retrieving the set of all supported languages and $r(c_{3})$ by enumerating all games supported by the system. For our example we will assume that the set $\mathcal{C}_{lang}$ and $\mathcal{C}_{slug}$ are

Substituting $\textit{lang}$ and $\textit{slug}$ with the set of bound variables $\mathcal{C}_{lang}$ and $\mathcal{C}_{slug}$

results in the bound expression set $\mathcal{RS}$.

We omitted $\varnothing$ for simplicity but there exist expression tree including $\varnothing$ in the expression set $\mathcal{RS}$ namely:

99 Problems! Sitemap and Alternates ain’t one!

Having constructed the bound expression set we will now tackle the problem of serialising the complete sitemap and the problem of finding a list of alternate URLs from any given URL.

Getting all the paths of the site for sitemap

Having constructed the bound expression set we can obtain the set of URLs by passing each normalised expression through an internationalisation serialisation function $f_{int}$. For visualisation purposes, we shall display the serialised form after passing it through the serialisation function handling internationalisation $f_{int}$ below the node. Due to space limitations, we shall omit drawing the $\varnothing$ nodes.

Since some paths might not be desired in the final list of URLs (sitemap) we will filter some deserialised URLs by passing them through a predicate $p$. Such a predicate needs to be based on the value of the variable $c_{k}$ present in the context. As an example the predicate $p(c) \rightarrow c_{1}

e \texttt{en}$ will filter the serialised set

As an implementation detail the CMS should include a flag on each sitemap and game element to indicate whether such an element is to be included in the sitemap. Such a flag needs to be encoded in a variable $c_{k}$ within the context.

to

Getting alternate paths

Given a bound expression set $\mathcal{RS}$ how does one find all alternate URLs from a source URL? This problem can be solved by using the search functionality discussed in Section search . In essence we want to search the bound expression set $\mathcal{RS}$ using the normalised expression tree deserialised from the source URL. Since the language is not important the context $c_{1}$ - representing the language - will be marginalised in the search expression tree.

As an example, let us assume that a client requested the page /fi/kolikkopelit/gonzofi/ annotated with $^\dagger$ below.

We can find the original normalised expression which serialised to $\texttt{/fi/kolikkopelit/gonzofi/}$ by using the inverse function $\bar{f_{int}}$

We convert the normalised expression $\texttt{fi slots gonzo}$ to the expression tree - $\psi(\omega(\texttt{fi slots gonzo}))$ - by using the $\omega$ function and work through the $\beta$-reductions. To search for similar expression trees which vary only in language we marginalise the language context by introducing a $\textit{free}$ variable $\ast$. We can obtain the set of alternate expression trees by applying

The expression set $\mathcal{RS}_{alternates}$ is then serialised using $f_{int}$ to obtain the set of alternate URLs

We will use the filter function $p(c) \rightarrow c_{1}

e \texttt{en} \wedge c_{2}

e \texttt{slots} \wedge c_{3}

e \texttt{gonzo} $ to filter out the original URL.

Conclusion

In this post we have defined a model using $\lambda$-calculus which can be applied to common SEO problems namely the problem of retrieving all site URLs and the problem of retrieving all alternate URLs. Hopefully you find these ideas interesting! Stay safe and keep hacking!