



Mathematical Background by John F. Sowa

This web page is a revised and extended version of Appendix A from the book Conceptual Structures by John F. Sowa. It presents a brief summary of the following topics for students and general readers of that book and related books such as Knowledge Representation and books on logic, linguistics, and computer science.


1. Sets, Bags, and Sequences

Elementary or "naive" set theory is used to define basic mathematical structures. A set is an arbitrary collection of elements, which may be real or imaginary, physical or abstract. In mathematics, sets are usually composed of abstract things like numbers and points, but one can also talk about sets of apples, oranges, people, or canaries. In computer science, sets are composed of bits, bytes, pointers, and blocks of storage. In many applications, the elements are never defined, but are left as abstractions that could be represented in many different ways in the human brain, on a piece of paper, or in computer storage.

Curly braces are used to enclose a set specification. For small, finite sets, the specification of a set can be an exhaustive list of all its elements:

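{1, 97, 63, 12}.

Because the order of listing the elements is immaterial, the following specification defines the same set:

{12, 63, 97, 1}.

A set that is too large to list can be specified by a rule or property that determines which elements belong to it:

{x | vertebrate(x) and warmBlooded(x) and hasHair(x) and lactiferous(x)}

A given set may be specified in more than one way. The following four specifications all determine the same set:

{1, 2, 3}
{x | x is an integer and 0<x<4}
{x | x is a positive integer, x divides 6, and x≠6}
{x | x=1 or x=2 or x=3}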
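As a rough illustration in Python, the four specifications above can be checked against one another. The finite range that stands in for the integers and all variable names are assumptions of this sketch, and the third condition is read as x≠6.

```python
# Specification by listing the elements; the order of listing is irrelevant.
listed = {1, 2, 3}

# Specifications by a property. range(-10, 11) is an assumed finite
# stand-in for "the integers", wide enough for these examples.
universe = range(-10, 11)
by_range = {x for x in universe if 0 < x < 4}
by_divisors = {x for x in universe if x > 0 and 6 % x == 0 and x != 6}
by_cases = {x for x in universe if x == 1 or x == 2 or x == 3}

print(listed == by_range == by_divisors == by_cases)  # True
print({1, 97, 63, 12} == {12, 63, 97, 1})             # True
```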

In any theory using sets, there are two privileged sets: the empty set {}, which contains no elements at all, and the universal set U, which contains every element that is being considered. In mathematical discussions, for example, the universal set may be the set of all integers Z or the set of all real numbers R. In most discussions, the universal set is usually defined at the beginning of the presentation. Thereafter, other sets are built up from U: subsets of U, pairs of elements of U, sets of sets of elements from U, etc.

Of all the operators that deal with sets, the most basic is ∈, which states whether a particular element is in a set: the notation x∈S means that x is an element of the set S; it may also be read x is a member of the set S or simply x is in S. All other operators on sets can be defined in terms of ∈. Let A and B be any two sets. Following are the common operators of set theory; listed for each one is its name, standard symbol, informal English definition, and formal definition in terms of ∈:

Union. A∪B is the set that contains all the elements in either A or B or both: A∪B = {x | x∈A or x∈B}.

Intersection. A∩B is the set that contains all the elements that are in both A and B: A∩B = {x | x∈A and x∈B}.

Complement. -A is the set that contains everything in the universal set that is not in A: -A = {x | x∈U and not x∈A}.

Difference. A-B is the set that contains all the elements that are in A but not in B: A-B = {x | x∈A and not x∈B}.

Subset. A⊆B means that every element of A is also an element of B: If x∈A, then x∈B. In particular, every set is a subset of itself: A⊆A.

Proper subset. A is a proper subset of B if A⊆B and there is at least one element of B that is not in A: If x∈A, then x∈B and there exists some b where b∈B and not b∈A.

Superset. A is a superset of B if B is a subset of A.

Empty set. The empty set has no elements: for every x, it is false that x∈{}. The empty set is a subset of every set, including itself: for every set A, {}⊆A.

Disjoint sets. Two sets A and B are said to be disjoint if they have no common elements; i.e., their intersection is the empty set: A∩B={}.
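The operators listed above correspond closely to Python's built-in set type. The following is a minimal sketch; the particular sets U, A, and B are made up for the example, and the complement is computed relative to the assumed universal set U, since Python has no universal set of its own.

```python
U = set(range(10))   # assumed universal set for this example
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B
intersection = A & B
complement_A = U - A          # everything in U that is not in A
difference = A - B

# The same union written directly from its definition in terms of membership.
union_by_rule = {x for x in U if x in A or x in B}

print(sorted(union))          # [1, 2, 3, 4, 5]
print(sorted(intersection))   # [3, 4]
print(sorted(complement_A))   # [0, 5, 6, 7, 8, 9]
print(sorted(difference))     # [1, 2]
print(union == union_by_rule) # True

print(A <= A)                 # True: every set is a subset of itself
print(A < A)                  # False: but not a proper subset of itself
print({3, 4} < A)             # True: proper subset
print(A >= {3, 4})            # True: superset
print(set() <= A)             # True: the empty set is a subset of every set
print(A.isdisjoint({7, 8}))   # True: no common elements
```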

The operators for union, intersection, and complement satisfy several standard identities. Some of the identities listed below are similar to the rules of ordinary arithmetic. Addition and multiplication, for example, obey the rules of commutativity and associativity, and the minus sign obeys the rule of double complementation. Idempotency, absorption, and De Morgan's laws, however, do not hold for ordinary arithmetic. Distributivity holds for multiplication over addition, but addition does not distribute over multiplication.

Idempotency. A∩A is identical to A, and A∪A is also identical to A.

Commutativity. A∩B is identical to B∩A, and A∪B is identical to B∪A.

Associativity. A∩(B∩C) is identical to (A∩B)∩C, and A∪(B∪C) is identical to (A∪B)∪C.

Distributivity. A∩(B∪C) is identical to (A∩B)∪(A∩C), and A∪(B∩C) is identical to (A∪B)∩(A∪C).

Absorption. A∩(A∪B) is identical to A, and A∪(A∩B) is also identical to A.

Double complementation. - -A is identical to A.

De Morgan's laws. -(A∩B) is identical to -A∪-B, and -(A∪B) is identical to -A∩-B.
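These identities can be spot-checked by brute force over every subset of a small universal set. The sketch below is an exhaustive check for one assumed three-element universe, not a proof; the helper names subsets, comp, and identities_hold are hypothetical.

```python
from itertools import combinations

U = frozenset({1, 2, 3})  # assumed universal set, small enough to enumerate


def subsets(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(a):
    """Complement relative to the universal set U."""
    return U - a


def identities_hold():
    """Check each identity for all sets A, B, C drawn from the subsets of U."""
    all_sets = subsets(U)
    for A in all_sets:
        if not (A & A == A and A | A == A):              # idempotency
            return False
        if comp(comp(A)) != A:                           # double complementation
            return False
        for B in all_sets:
            if not (A & B == B & A and A | B == B | A):  # commutativity
                return False
            if not (A & (A | B) == A and A | (A & B) == A):  # absorption
                return False
            if comp(A & B) != comp(A) | comp(B):         # De Morgan's laws
                return False
            if comp(A | B) != comp(A) & comp(B):
                return False
            for C in all_sets:
                if A & (B & C) != (A & B) & C:           # associativity
                    return False
                if A | (B | C) != (A | B) | C:
                    return False
                if A & (B | C) != (A & B) | (A & C):     # distributivity
                    return False
                if A | (B & C) != (A | B) & (A | C):
                    return False
    return True


print(identities_hold())  # True
print(2 + 2 == 2)         # False: addition, unlike union, is not idempotent
```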

For complex sets, the rule for determining which elements are in the set may be too complex to state in a single expression. An example is the set of all grammatical sentences in some language, natural or artificial. Such sets are typically specified by a recursive definition:

First a finite starting set of elements is given.

Then some operations are specified for generating new elements of the set from old elements.

Finally, the set is defined to be the smallest set containing the starting elements and all others that can be derived from them by repeated application of the generating operations.

The set resulting from these operations is said to be the closure of the starting set under the given generating operations. As an example of a recursive definition, the set S of all positive integers not divisible by 3 could be specified by intension:

S = {x | x is an integer, x>0, and 3 does not divide x}.

The same set can also be defined recursively:

Let the set {1, 2} be a subset of S.

If x is any element of S, then x+3 is also an element of S.

S is the smallest set that has the above two properties; i.e., S is a proper subset of any other set that has those properties.

All elements of S may be enumerated by starting with {1, 2}. The first stage of adding 3 generates the new elements 4 and 5, adding 3 to them gives 7 and 8, then 10 and 11, and so on. The set S is the closure of the set {1, 2} under the operation of adding 3. A recursive definition is a special kind of definition by intension. The formal grammars presented in Section 10 define languages by a recursive definition in which the generating operations are specified by a set of production rules. For a discussion and comparison of various methods of definition, see the notes on definitions by Norman Swartz.
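Because the closure is infinite, a program can only enumerate it up to some bound. The following sketch generates the elements of S up to an assumed limit of 20 and compares them with the definition by intension. The function name closure_up_to and the limit are hypothetical, and the cutoff is only valid because adding 3 always produces a larger number.

```python
def closure_up_to(start, step, limit):
    """Elements <= limit of the closure of `start` under the operation `step`.

    Assumes step(x) > x, so nothing below the limit is lost by the cutoff.
    """
    result = set()
    frontier = [x for x in start if x <= limit]
    while frontier:
        x = frontier.pop()
        if x not in result:
            result.add(x)
            y = step(x)
            if y <= limit:
                frontier.append(y)
    return result


S = closure_up_to({1, 2}, lambda x: x + 3, 20)
by_intension = {x for x in range(1, 21) if x % 3 != 0}

print(sorted(S))          # [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20]
print(S == by_intension)  # True
```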

A set has no duplicate elements. Since all duplicates are discarded in computing the union of two sets, the union operator is idempotent: A∪A=A. In some cases, one may want to allow duplicates; therefore, a bag is a collection of things with possible duplicates. Since there may be more than one occurrence of a given element x, the count operator @ is a generalization of the element operator ∈. The expression x@A is the number of times the element x occurs in the bag A. Bags are useful for many purposes, such as taking averages: if four men have heights of 178cm, 184cm, 178cm, and 181cm, then the set of those numbers is {178, 181, 184} with the average 181; but the bag of the numbers is {178, 178, 181, 184} with average 180.25.
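In Python, one plausible stand-in for a bag is collections.Counter, whose lookup plays the role of the count operator @. The sketch below reproduces the two averages from the heights example; the variable names are illustrative.

```python
from collections import Counter

heights = [178, 184, 178, 181]

as_set = set(heights)      # duplicates discarded: {178, 181, 184}
as_bag = Counter(heights)  # duplicates kept, order ignored

print(as_bag[178])         # 2, i.e. 178@A
print(as_bag[200])         # 0, an element that does not occur

set_average = sum(as_set) / len(as_set)
bag_average = sum(as_bag.elements()) / sum(as_bag.values())
print(set_average)         # 181.0
print(bag_average)         # 180.25
```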

A sequence is an ordered bag. To distinguish ordered sequences from unordered sets and bags, the elements of a sequence are enclosed in angle brackets: ⟨178, 184, 178, 181⟩; the empty sequence is written ⟨⟩. If a sequence has n elements, the elements are numbered from 1 to n (or alternatively from 0 to n-1). A sequence of two elements is sometimes called an ordered pair; a sequence of three elements, a triple; a sequence of four, a quadruple; a sequence of five, a quintuple; and a sequence of n elements, an n-tuple. Historically, the theory of sets was first defined without considering order. On a piece of paper or in computer storage, however, the elements of a set must be listed in some order. Sequences are therefore easier to represent than bags, and bags are easier to represent than sets: a bag is a sequence with the ordering ignored, and a set is a sequence with both order and duplicates ignored.

New sets may be created by combining elements from the universe U in various ways. The cross product of two sets A and B, written A×B, is the set of all possible ordered pairs with the first element of each pair taken from A and the second element from B. If A is the set {1,2} and B is the set {x,y,z}, then A×B is the set,

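{⟨1,x⟩, ⟨1,y⟩, ⟨1,z⟩, ⟨2,x⟩, ⟨2,y⟩, ⟨2,z⟩}.

With the notation for defining a set by a rule, the cross product A×B can be defined in general as

{⟨x,y⟩ | x∈A and y∈B}.

The cross product extends to three or more sets; A×B×C, for example, is

{⟨x,y,z⟩ | x∈A, y∈B, and z∈C}.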
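The cross product can be sketched with itertools.product from the Python standard library, with tuples playing the role of ordered pairs and triples. The set C below is a made-up third set, introduced only to show the three-way product.

```python
from itertools import product

A = {1, 2}
B = {"x", "y", "z"}
C = {"p", "q"}   # hypothetical third set for the three-way product

AxB = set(product(A, B))
by_rule = {(x, y) for x in A for y in B}   # the defining rule, written out

print(len(AxB))                      # 6
print((1, "x") in AxB)               # True
print(("x", 1) in AxB)               # False: order matters in a pair
print(AxB == by_rule)                # True
print(len(set(product(A, B, C))))    # 12 triples
```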

In this book, most sets are finite. Inside a computer or the human brain, all sets that are explicitly stored must be finite. But mathematical definitions and proofs are generally simpler if there is no upper limit on the size of sets. Therefore, definitions in computer science often permit infinite sets, but with the understanding that any implementation will only choose a finite subset. Most infinite sets discussed in computer science are assumed to be countable: a countably infinite set is one whose elements can be put in a one-to-one correspondence with the integers. The set of all real numbers is uncountable, but such sets are far beyond anything that can be implemented in computer systems.

The terminology for sets is quite standard, although some authors use the word class for set and others make a distinction between classes and sets. Bags are not used as commonly as sets, and the terminology is less standard. Some authors use the word multiset for a bag. Sequences are sometimes called lists or vectors, but some authors draw distinctions between them. Some authors use the symbol ∅ for the empty set, but the notation {} is more consistent with the notation for the empty sequence.

2. Functions

A function is a rule for mapping the elements of one set to elements of another set. The notation f: A→B means that f is a function that maps any element x in the set A to some element f(x) in the set B. The set A is called the domain of f, and B is called the range of f. In mathematics, the element x is called the argument, and f(x) is called the result or the image of x under the mapping f. In computer science, x is called the input and f(x) is called the output.

Suppose Z is the set of all integers, and N is the set of non-negative integers (i.e. the positive integers and zero). Then define a function square: Z→N with the mapping rule,

square(x) = x².

A function is onto if every element of its range is the image of some element of its domain. As an example, define the absolute value function, abs: Z→N, with the mapping,

abs(x) = +x if x≥0
abs(x) = -x if x<0

A function is one-to-one if no two elements of its domain are mapped into the same element of its range. The function abs is not one-to-one because all the elements of N except 0 are the images of two different elements of Z. For example, abs(-3) and abs(3) are both 3. As a more subtle example, consider the function g: Z→N with the mapping,

g(x) = 2x² + x.
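The definitions of onto and one-to-one can be tried out on a finite window of Z. The sketch below is only a spot check under that assumption, since a finite sample cannot establish a property of all integers. The helper names are hypothetical, and g is read as 2x² + x.

```python
def square(x):
    return x * x


def g(x):
    return 2 * x * x + x


window = range(-50, 51)   # assumed finite sample of the integers Z


def is_one_to_one_on(f, xs):
    """True if no two sampled arguments are mapped to the same image."""
    images = [f(x) for x in xs]
    return len(set(images)) == len(images)


def missed(f, xs, targets):
    """Targets that are not the image of any sampled argument."""
    return set(targets) - {f(x) for x in xs}


print(is_one_to_one_on(abs, window))             # False: abs(-3) == abs(3)
print(is_one_to_one_on(g, window))               # True on this window
print(missed(abs, window, range(51)) == set())   # True: abs reaches 0..50
print(sorted(missed(square, window, range(10)))) # [2, 3, 5, 6, 7, 8]
```

The last line shows why square is not onto: numbers such as 2 and 3 are in N but are not the square of any integer.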

A function that is both one-to-one and onto is called an isomorphism. The two sets that form the domain and range of the function are said to be isomorphic to each other. Let E be the set of even integers, and let O be the set of odd integers. Then define the function increment: E→O with the mapping,

increment(x) = x + 1.

For many applications, isomorphic structures are considered equivalent. In old-fashioned computer systems, for example, holes on a punched card could represent the same data as magnetized spots on tape or currents flowing in transistors. Differences in the hardware are critical for the engineer, but irrelevant to the programmer. When programmers copied data from cards to tape, they would blithely talk about "loading cards to tape" as if the actual paper were moved. One mythical programmer even wrote a suggestion for reducing the shipping costs in recycling old cards: load the cards to tape and punch them out at the recycling works.

If f is an isomorphism from A to B, then there exists an inverse function, f⁻¹: B→A. The inverse of the function increment is the function decrement: O→E with the mapping,

decrement(x) = x - 1.

Functions may have more than one argument. A function of two arguments whose first argument comes from a set A, second argument from a set B, and result from a set C is specified f: A×B→C. A function with one argument is called monadic, with two arguments dyadic, with three arguments triadic, and with n arguments n-adic. Those terms are derived from Greek. Some authors prefer the Latin terms unary, binary, ternary, and n-ary. The number of arguments of a function is sometimes called its valence, adicity, or arity.

The rule that defines a function f: A→B as a mapping from a set A to a set B is called the intension of the function f. The extension of f is the set of ordered pairs determined by such a rule:

{⟨a₁,b₁⟩, ⟨a₂,b₂⟩, ⟨a₃,b₃⟩, ...}.
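The difference between a rule and its extension can be illustrated with the functions increment and decrement on a small sample of the even integers. The sample ranges are assumptions of this sketch, since the full extensions are infinite. Swapping the members of each pair in the extension of increment yields the extension of its inverse.

```python
def increment(x):
    return x + 1


def decrement(x):
    return x - 1


evens = range(-6, 7, 2)   # assumed finite sample of E: -6, -4, ..., 6
odds = range(-5, 8, 2)    # the images of that sample in O: -5, -3, ..., 7

# Extension: the set of ordered pairs determined by the rule.
increment_pairs = {(x, increment(x)) for x in evens}
decrement_pairs = {(y, decrement(y)) for y in odds}

# Inverting an isomorphism swaps the two members of every pair.
inverted = {(b, a) for (a, b) in increment_pairs}

print(sorted(increment_pairs)[:3])   # [(-6, -5), (-4, -3), (-2, -1)]
print(inverted == decrement_pairs)   # True
```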

Since a function is a rule for mapping one set to another, the term mapping is sometimes used as a synonym for function. Another synonym for function is the term operator. Addition, subtraction, multiplication, and division are dyadic functions defined over the real numbers, but they are usually called operators. A common distinction is that functions have ordinary alphabetic names, but operators are designated by special symbols like + or ÷. Traditional mathematical practice has tended to use several different terms as informal synonyms for functions:

If the domain of a function is a set of simple things like numbers and type labels, it is usually called a function.

If its domain and range are sets of complex structures like conceptual graphs, it is often called a mapping.

If its name is being spelled in full for readability, it may be written as an alphanumeric string, such as increment(x) or add1(x).

If it often occurs in complex expressions, it may be abbreviated by a single symbol or Greek letter and be called an operator.

3. Lambda Calculus

Defining a function by a rule is more natural or intuitive than defining it as a set of ordered pairs. But a question arises when functions defined by different rules or intensions happen to have exactly the same sets of ordered pairs or extensions. In developing his theory of lambda calculus, the logician Alonzo Church (1941) distinguished equality by intension from equality by extension:

It is possible, however, to allow two functions to be different on the ground that the rule of correspondence is different in meaning in the two cases although always yielding the same result when applied to any particular argument. When this is done, we shall say that we are dealing with functions in intension. The notion of difference in meaning between two rules of correspondence is a vague one, but in terms of some system of notation, it can be made exact in various ways.

In mathematics, the traditional way of defining a function is to specify the name of a function and its formal parameter on the left side of an equation and to put the defining expression on the right:

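f(x) = 2x² + 3x - 2.

In the lambda notation, the name f is separated from the defining expression, and the Greek letter λ marks x as the formal parameter:

f = λx(2x² + 3x - 2).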

Church's rules for lambda conversion are formal statements of the common techniques for defining and evaluating functions. Whenever a function is applied to its arguments, such as f(5), the function may be evaluated by replacing the name f with the body of the definition and substituting the argument 5 for every occurrence of the formal parameter x. Church also defined additional operators, which combined with function evaluation to produce a computational system that is as general as a Turing machine.
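Python's lambda expressions give a rough feel for this style of definition and evaluation, although Python is not the lambda calculus itself. The sketch below uses the polynomial from the example, read as 2x² + 3x - 2; the helper apply_to_all is a hypothetical name.

```python
# The name f is bound to an anonymous function; x is the formal parameter.
f = lambda x: 2 * x**2 + 3 * x - 2

# Evaluating f(5) substitutes 5 for x in the body: 2*25 + 3*5 - 2.
print(f(5))                                   # 63

# The lambda expression can be used anywhere the name f could be used.
print((lambda x: 2 * x**2 + 3 * x - 2)(5))    # 63


def apply_to_all(fn, xs):
    """Apply a function, passed as an argument, to every element of xs."""
    return [fn(x) for x in xs]


print(apply_to_all(lambda x: x + 1, [0, 2, 4]))   # [1, 3, 5]
```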

With such rules, Church answered the question about equality of functions: they are equal by extension if they have the same sets of ordered pairs, and they are equal by intension if their definitions are reducible to the same canonical form by the rules of lambda conversion. An important result of the lambda calculus is the Church-Rosser theorem: when an expression has more than one function application that can be evaluated, the order of evaluation is irrelevant because the same canonical form would be obtained with any sequence of evaluations.

In computer science, the clear separation of the name of a function from its defining expression enables a lambda expression to be used anywhere that a function name could be used. This feature is especially useful for applications that create new functions dynamically and pass them as arguments to other functions that evaluate them. John McCarthy (1960) adopted the lambda notation as the basis for defining and evaluating functions in the LISP programming language. A common technique of computational linguistics is to translate natural language phrases to lambda expressions that define their semantics. William Woods (1968) used that technique for defining the semantics of the English quantifiers every and some as well as extended quantifiers such as more than two or less than seven. He implemented his definitions in LISP programs that translated English questions to lambda expressions, which were later evaluated to compute the answers. Richard Montague (1970) adopted a similar technique for his treatment of quantifiers in natural language semantics.

4. Graphs

In diagrams, a graph is normally drawn as a network of nodes connected by arcs. Such diagrams introduce arbitrary conventions that are irrelevant to the mathematical definitions and theorems: Are the arcs drawn curved or straight? Short or long? Are the nodes drawn as dots, circles, or other shapes? Is there any significance in having node x above, below, to the right, or to the left of node y? To avoid such questions, a graph is defined formally without any reference to a diagram. Diagrams are then introduced as informal illustrations. Diagrams are essential for developing an intuitive understanding, but the definitions and proofs are independent of any features of the drawing that are not explicitly mentioned in the formal definitions.

Figure 1: A sample graph

Formally, a graph G consists of a set N of nodes and a set A of arcs. Every arc in A is a pair of nodes from the set N. For the sample graph in Figure 1, the set of nodes is {A, B, C, D, E}, and the set of arcs is {⟨A,B⟩, ⟨A,D⟩, ⟨B,C⟩, ⟨C,D⟩, ⟨D,E⟩}. Notice that node D happens to be an endpoint of three different arcs. That property can be seen instantly from the diagram, but it takes careful checking to verify it from the set of pairs. For people, diagrams are the most convenient way of thinking about graphs. For mathematical theories, a set of pairs is easier to axiomatize. And for computer implementations, many different data structures are used, such as blocks of storage for the nodes and pointers for the arcs.
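The formal definition translates almost directly into code. The sketch below stores the sample graph as a set of nodes and a set of pairs, then does the careful checking for node D mechanically. The adjacency dictionary at the end is one of many possible data structures, chosen here only for illustration.

```python
nodes = {"A", "B", "C", "D", "E"}
arcs = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E")}

# Every arc must be a pair of nodes from the node set.
print(all(a in nodes and b in nodes for a, b in arcs))   # True

# Arcs that have D as an endpoint.
touching_D = {arc for arc in arcs if "D" in arc}
print(len(touching_D))                                   # 3

# An alternative representation: each node maps to the targets of its arcs.
adjacency = {n: set() for n in nodes}
for source, target in arcs:
    adjacency[source].add(target)
print(sorted(adjacency["A"]))                            # ['B', 'D']
```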

Figure 2: An alternate way of drawing the same graph as Figure 1.

Figure 2 is another way of drawing the same graph shown in Figure 1. The two diagrams look very different, but their abstract representations as sets of nodes and arcs are the same. Even when graphs are defined in a purely abstract way, questions may arise about the order of the two nodes of an arc. If the order is irrelevant, the notation {A,B} shows that the arc is an unordered set of two nodes. A graph whose arcs are unordered pairs is said to be undirected. If the order is significant, ⟨A,B⟩ and ⟨B,A⟩ represent distinct arcs, and the graph is said to be directed. For the directed graph represented in Figures 1 and 2, an arrowhead on each arc points to the second node of each ordered pair.

Although graphs are defined abstractly, mathematicians normally visualize them as diagrams. The common conventions for drawing graphs are reflected in descriptive terms like endpoint, loop, path, and cycle. Let e be the arc ⟨a,b⟩. Then the nodes a and b are called endpoints of e, and e is said to connect a and b. If e is an arc of a directed graph, then the first endpoint a is called the source of e, and the second endpoint b is called the target of e. The word target is easy to remember since that is the direction the arrow points. A loop is an arc e whose endpoints are the same node: e=⟨a,a⟩.

Combinations of arcs are often named by the methods of traversing a graph. A walk through a graph is a sequence of nodes ⟨a₀, a₁, ..., aₙ⟩ for which any two adjacent nodes aᵢ and aᵢ₊₁ are the endpoints of some arc. Any arc whose endpoints are adjacent nodes of a walk is said to be traversed by the walk. A walk that contains n+1 nodes must traverse n arcs and is therefore said to be of length n. A path is a walk in which all nodes are distinct. A walk with only one node ⟨a₀⟩ is a path of length 0. If the first and last nodes of a walk are the same, but all other nodes are distinct, then the walk is called a cycle. Every loop is a cycle of length 1, but cycles may traverse more than one arc.

For the graph in Figure 2, the walk ⟨E, D, A, B⟩ is a path because all nodes are distinct. The path is of length 3, which is equal to the number of arcs traversed by a point that moves along the path. The walk ⟨D, C, B, A, D⟩ is a cycle because it starts and ends at the same node.

If G is a directed graph, then a walk, path, or cycle through G may or may not follow the same direction as the arrows. A walk, path, or cycle through G is said to be directed if adjacent nodes occur in the same order in which they occur in some arc of G: if aᵢ and aᵢ₊₁ are adjacent nodes on the walk, then the ordered pair ⟨aᵢ,aᵢ₊₁⟩ must be an arc of G. An arc of a directed graph is like a one-way street, and a directed walk obeys all the one-way signs (arrowheads). An undirected walk through a directed graph is possible, simply by ignoring the ordering.
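The definitions of walk, path, and cycle, in both their directed and undirected forms, can be written as short predicates over the arc set of the sample graph. This is a sketch with hypothetical function names. It assumes that a cycle must contain at least two nodes in its sequence, so that a one-node walk counts only as a path of length 0.

```python
ARCS = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E")}


def is_walk(seq, directed=False):
    """Every two adjacent nodes must be the endpoints of some arc."""
    def linked(a, b):
        return (a, b) in ARCS or (not directed and (b, a) in ARCS)
    return len(seq) > 0 and all(linked(a, b) for a, b in zip(seq, seq[1:]))


def is_path(seq, directed=False):
    """A walk in which all nodes are distinct."""
    return is_walk(seq, directed) and len(set(seq)) == len(seq)


def is_cycle(seq, directed=False):
    """A walk whose first and last nodes coincide, all others distinct."""
    inner = seq[:-1]
    return (len(seq) >= 2 and seq[0] == seq[-1]
            and is_walk(seq, directed) and len(set(inner)) == len(inner))


def length(seq):
    """A walk with n+1 nodes traverses n arcs."""
    return len(seq) - 1


print(is_path(["E", "D", "A", "B"]), length(["E", "D", "A", "B"]))  # True 3
print(is_cycle(["D", "C", "B", "A", "D"]))                 # True
print(is_cycle(["D", "C", "B", "A", "D"], directed=True))  # False
print(is_path(["A", "B", "C", "D", "E"], directed=True))   # True
```

The cycle ⟨D, C, B, A, D⟩ exists only as an undirected walk: its first step goes from D to C, against the arrow of the arc ⟨C,D⟩.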

A graph is connected if there is a possible path (directed or undirected) between any two nodes. If it is not connected, then it breaks down into disjoint components, each of which is connected, but none of which has a path linking it to any other component. A cutpoint of a graph is a node whose removal causes the graph (or the component in which it is located) to separate into two or more disconnected components.
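Components and cutpoints can be computed directly from these definitions. The following sketch ignores arc direction and removes each node in turn. It is a simple brute-force approach chosen for clarity, not an efficient algorithm, and the function names are hypothetical.

```python
NODES = {"A", "B", "C", "D", "E"}
ARCS = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E")}


def components(nodes, arcs):
    """Split the nodes into connected components, ignoring arc direction."""
    neighbors = {n: set() for n in nodes}
    for a, b in arcs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(neighbors[n] - component)
        seen |= component
        result.append(component)
    return result


def cutpoints(nodes, arcs):
    """Nodes whose removal increases the number of components."""
    base = len(components(nodes, arcs))
    found = set()
    for n in nodes:
        rest = nodes - {n}
        kept = {(a, b) for a, b in arcs if n not in (a, b)}
        if len(components(rest, kept)) > base:
            found.add(n)
    return found


print(len(components(NODES, ARCS)))   # 1, so the sample graph is connected
print(cutpoints(NODES, ARCS))         # {'D'}: removing D cuts off E
```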

Certain special cases of graphs are important enough to be given special names: an acyclic graph is one that has no cycles, and a tree is an acyclic connected graph for which the path between any two nodes is unique. The most commonly used trees are rooted trees:

The arcs of a rooted tree are directed.

If ⟨a,b⟩ is an arc of the tree, the node a is called the parent of b, and b is a child of a.

There is a privileged node called the root, which has no parent.

Every node except the root has exactly one parent.

A node that has no child is called a leaf.

The terminology of trees is extended to related graphs: a forest is a collection of disconnected trees; a chain is a tree with no branches, so that all the nodes lie along a single path; and a seed has only one node and no arcs. Some authors require every graph to have at least one node, but other authors include the empty graph or blank, which has no nodes or arcs.

Figure 3: A binary tree

A binary tree is a rooted tree where every node that is not a leaf has exactly two children (Figure 3). In a binary tree, the two children of each node are usually designated as the left child and the right child. Since a tree has no cycles, a common convention for simplifying the diagrams is to omit the arrowheads on the arcs, but to draw the parent nodes at a higher level than their children. For Figure 3, the root A, which has no parent, is at the top; and the leaves, which have no children, are arranged along the bottom.

In computer applications, each node of a tree or other graph may have some associated data. To process that data, a program can take a walk through the tree and process the data at each node it visits. For the tree in Figure 3, imagine a walk that starts at the root, visits every node at least once, and stops when it returns to the root. Assume that the left child is always visited before the right child. Such a walk will visit the leaves of the tree only once, but it will visit each of the branching nodes three times: ⟨A, B, D, B, E, B, A, C, F, H, F, I, F, C, G, C, A⟩. There are therefore three options for processing the data at the branching nodes:

Preorder. Process the data at the first visit to the node. For Figure 3, the nodes would be processed in the order A, B, D, E, C, F, H, I, G.

Postorder. Process the data at the last visit to each node. For Figure 3, the nodes would be processed in the order D, E, B, H, I, F, G, C, A.

Inorder. Process the data at the middle visit to each node. For Figure 3, the nodes would be processed in the order D, B, E, A, H, F, I, C, G.
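The three options can be sketched as recursive functions. Since the figure itself is not reproduced here, the shape of the tree below is an assumption inferred from the visiting orders given in the text: A has children B and C, B has D and E, C has F and G, and F has H and I. The dictionary layout and function names are illustrative only.

```python
# Each branching node maps to its (left child, right child); leaves are absent.
TREE = {"A": ("B", "C"), "B": ("D", "E"), "C": ("F", "G"), "F": ("H", "I")}


def full_walk(node):
    """The complete walk: a branching node is visited three times."""
    if node not in TREE:
        return [node]
    left, right = TREE[node]
    return [node] + full_walk(left) + [node] + full_walk(right) + [node]


def preorder(node):
    """Process each node at its first visit."""
    if node not in TREE:
        return [node]
    left, right = TREE[node]
    return [node] + preorder(left) + preorder(right)


def postorder(node):
    """Process each node at its last visit."""
    if node not in TREE:
        return [node]
    left, right = TREE[node]
    return postorder(left) + postorder(right) + [node]


def inorder(node):
    """Process each node at its middle visit."""
    if node not in TREE:
        return [node]
    left, right = TREE[node]
    return inorder(left) + [node] + inorder(right)


print("".join(full_walk("A")))   # ABDBEBACFHFIFCGCA
print("".join(preorder("A")))    # ABDECFHIG
print("".join(postorder("A")))   # DEBHIFGCA
print("".join(inorder("A")))     # DBEAHFICG
```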

These definitions can be generalized to trees with an arbitrary number of children at each branching node. They can also be generalized to graphs by finding a spanning tree, which includes all the nodes of the graph, but omits some subset of the arcs.

A common application of graph or tree walking algorithms is the translation of a parse tree or a conceptual graph to some natural or artificial language. The patterns of word order in various natural languages can be generated by different ways of walking through a conceptual graph and translating the concept nodes to words of the target language (Sowa 1984). Irish and Biblical Hebrew, for example, are preorder languages that put the verb first, Latin and Japanese are postorder languages that put the verb last, and English and Chinese are inorder languages that put the verb in the middle.

The terminology for graphs in this section is fairly standard, but many of the ideas have been developed independently by different people, who have introduced different terms. Some authors use the terms vertex and edge instead of node and arc. Others distinguish degrees of connectivity in a directed graph: it is strongly connected if there is a directed path between any two nodes, and it is weakly connected if there is only an undirected path between some pair of nodes. Some authors use the term digraph as an abbreviation for directed graph, but that use is confusing, since digraph should mean double graph. Occasionally, people introduce fancy terms like arborescence for rooted tree, but the simpler terminology is more descriptive.

5. Relations