Our programming language extends the standard SLDNF resolution algorithm (see Methods) with three main modifications. First, we simplified the treatment of negative predicates to avoid a consistency problem with standard SLDNF known as floundering. Floundering occurs when trying to prove a negative predicate containing free variables. Such a predicate does not have a finitely failed tree because it is indeterminate, so forcing SLDNF to fail such cases can result in inconsistencies. (36) Unfortunately, floundering is an undecidable property that cannot be verified statically. To circumvent this issue, during program execution we check that negative predicates are fully instantiated and therefore have no free variables. An example of a fully instantiated negative predicate for our logic program is. Second, we modified the resolution strategy, which determines the next node in the search tree to expand during resolution. Standard SLDNF typically aims to find a single solution to a goal in the fastest way possible, and therefore uses a depth-first search (DFS) strategy. However, our language is used to fully explore the possible behaviors of a system, rather than finding a single behavior, so our resolution strategy employs breadth-first search (BFS). This has stronger guarantees than DFS since it is complete, in the sense that a solution is always found if one exists. (38) As a consequence, the order in which clauses are written in a program is not important, unlike in other logic programming languages such as Prolog. Also, since our search strategy is BFS, we do not support extra-logical operators such as cut (!) to curb backtracking during solution finding. Third and most importantly, our language extends the standard unification algorithm with a novel equational theory of nucleic acid strands, using the approach. Our equational theory defines patterns and contexts that allow the identification and sound manipulation of general nucleic acid motifs. The soundness of our method rests on the double-pushout approach of graph grammar theory. (38,39) The following section presents the equational theory of our language through an example.
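The completeness argument for BFS can be illustrated with a toy search tree. The sketch below is not the language's resolution engine; the tree, the `children` function, and the `budget` cutoff are assumptions chosen only to show a case where leftmost-first DFS follows an infinite branch while BFS reaches a solution at finite depth.

```python
from collections import deque


def children(node):
    """Toy search tree: the all-'a' branch is infinite, nodes containing 'b' are leaves."""
    if "b" in node:
        return []
    return [node + "a", node + "b"]


def bfs(root, is_solution):
    """Expand nodes level by level; reaches any solution that sits at finite depth."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if is_solution(node):
            return node
        queue.extend(children(node))
    return None


def dfs(root, is_solution, budget):
    """Leftmost-first depth-first search, cut off after `budget` expansions."""
    stack = [root]
    for _ in range(budget):
        if not stack:
            return None
        node = stack.pop()
        if is_solution(node):
            return node
        # Push children in reverse so the leftmost child is expanded first.
        stack.extend(reversed(children(node)))
    return None


target = "aab"
found_bfs = bfs("", lambda n: n == target)
found_dfs = dfs("", lambda n: n == target, budget=1000)
```

Here BFS returns the target, whereas DFS exhausts its budget descending the infinite branch and returns `None`.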

The matching of predicates and terms in SLDNF resolution is performed by a procedure called unification. This is a constraint satisfaction algorithm that works on sets of equality constraints of the form t1 ≐ t2, and finds an assignment θ of logical variables in terms t1 and t2 such that the two terms become identical, with θ(t1) = θ(t2). It can be viewed as a generalized form of pattern matching. Unification works by iteratively decomposing equalities over composite terms into equalities over their components. For example, the equality mortal("Socrates") ≐ mortal(X) is decomposed into an equality over predicate names, mortal ≐ mortal, and an equality over the arguments, "Socrates" ≐ X. The first equality is trivially true, while the second equality is solved by the assignment θ(X) = "Socrates". Unification succeeds with a solution θ when all of the equality constraints are satisfied.
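The decomposition described above can be sketched in a few lines. This is plain syntactic unification, not the equational unification over nucleic acid strands that the language implements; the `Var` class and the encoding of compound terms as tuples are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logical variable, identified by its name."""
    name: str


def walk(term, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does `var` appear inside `term` under `subst`?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term)
    return False


def unify(t1, t2):
    """Return a substitution making t1 and t2 identical, or None if none exists."""
    subst = {}
    pending = [(t1, t2)]  # the set of equality constraints still to solve
    while pending:
        a, b = pending.pop()
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if isinstance(a, Var):
            if occurs(a, b, subst):
                return None
            subst[a] = b
        elif isinstance(b, Var):
            if occurs(b, a, subst):
                return None
            subst[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            # Decompose an equality over compound terms into component equalities.
            pending.extend(zip(a, b))
        else:
            return None
    return subst


theta = unify(("mortal", "Socrates"), ("mortal", Var("X")))
```

For the example in the text, `theta` maps `X` to `"Socrates"`, while unifying `mortal("Socrates")` with `human("Socrates")` fails.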

This program has two Horn clauses: a fact stating that Socrates is human, and a clause stating that for any X, if X is human then X is also mortal. SLDNF resolution provides a logically sound algorithm to verify that an atomic predicate A, also called a goal, is the logical consequence of a logic program. For our example, to know whether Socrates is mortal we can query the system with the goal mortal("Socrates"). SLDNF resolution then attempts to verify the goal using all declared facts and clauses in the logic program. In this case the verification succeeds, since the goal is indeed a consequence of the program. If the goal also contains logical variables, SLDNF resolution finds all possible instantiations of the variables such that the goal succeeds. For our example, the system can find all mortals with the goal mortal(X). In this case the only solution is X = "Socrates".
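The two queries discussed above can be reproduced with a very small resolution loop. The sketch below is a simplified stand-in, assuming flat atoms without nested terms and no negation (so it is SLD rather than SLDNF resolution); the tuple encoding of atoms and the names `PROGRAM`, `solve`, and `Var` are illustrative choices, not part of the language.

```python
from collections import deque


class Var:
    """A logical variable; two variables are equal only if they are the same object."""
    def __init__(self, name):
        self.name = name


def walk(term, subst):
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify_atoms(a, b, subst):
    """Unify two flat atoms (pred, arg1, ..., argN); return an extended substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x is y or x == y:
            continue
        if isinstance(x, Var):
            subst[x] = y
        elif isinstance(y, Var):
            subst[y] = x
        else:
            return None
    return subst


def rename(clause):
    """Give a clause fresh variables so that separate uses do not interfere."""
    head, body = clause
    fresh = {}

    def rn(atom):
        args = tuple(fresh.setdefault(t, Var(t.name)) if isinstance(t, Var) else t
                     for t in atom[1:])
        return (atom[0],) + args

    return rn(head), [rn(atom) for atom in body]


def solve(program, goal, max_steps=1000):
    """Breadth-first resolution: return every answer substitution found within max_steps."""
    queue, answers = deque([([goal], {})]), []
    for _ in range(max_steps):
        if not queue:
            break
        goals, subst = queue.popleft()
        if not goals:
            answers.append(subst)
            continue
        for clause in program:
            head, body = rename(clause)
            extended = unify_atoms(goals[0], head, subst)
            if extended is not None:
                queue.append((body + goals[1:], extended))
    return answers


X = Var("X")
PROGRAM = [
    (("human", "Socrates"), []),          # fact: Socrates is human
    (("mortal", X), [("human", X)]),      # clause: X is mortal if X is human
]
Q = Var("Q")
mortals = [walk(Q, s) for s in solve(PROGRAM, ("mortal", Q))]
```

Querying with a variable collects every instantiation, and querying with a ground goal returns either one empty-handed success or no answers at all.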

We developed an inference system for our logic programming language based on SLDNF resolution, which is widely used in logic programming languages such as Prolog. (36,37) We briefly summarize the semantics of SLDNF resolution using a simple logic program based on the well-known Aristotelian syllogism.

Nucleic Acid Equational Theory

We present the equational theory of our logic programming language by using it to encode the elementary rules of DNA strand displacement, (11) whose systematic use was pioneered in ref (40). DNA strand displacement involves an invading single strand of DNA displacing an incumbent strand, hybridized to a template strand. The process is mediated by a short, single-stranded region of DNA referred to as a toehold. It can implement a broad range of computation, including any computation that can be expressed as an abstract chemical reaction network, (12) such as oscillations, switches, population protocols, combinatorial logic and sequential logic. Chemical reaction networks are known to be Turing complete with an arbitrarily small probability of error, (41) due to the finite number of species involved. However, DNA strand displacement systems can form potentially unbounded polymers, and are therefore more expressive in that they have been shown to be Turing powerful. (42)

We represent a nucleic acid system in our language as a process, which is defined as a multiset of nucleic acid strands (Table 1). The basic abstraction of our language is the domain, which represents a unique nucleic acid sequence. We assume that a domain can bind to its complement but cannot interact with any other domains in the system. In practice, this is achieved by ensuring that distinct domains use noninterfering nucleic acid sequences, for instance by relying on appropriate coding strategies. (13) We represent a domain with a lower-case variable and annotate complementary domains with a star, where x* denotes the Watson–Crick complement of domain x. Toeholds are domains that are assumed to be short enough to spontaneously unbind from their complement. A toehold is labeled with a caret, written x∧, where the complement of x∧ is x∧*. We indicate that two domains x and x* are bound using the notation x!i and x*!i, where i is a unique identifier called a bond, which can be either a variable name or an integer. We define a site s as a domain that is either bound or free, and a sequence S as a nonempty sequence of sites, ordered from the 5′ end to the 3′ end. A process P is a multiset of strands <S1> | ... | <SN>, separated by the parallel composition operator (|), where each strand is a sequence enclosed in angle brackets. In addition, we define a species as a set of strands bound to each other such that they form a connected component. For convenience, we define additional syntactic sugar that allows a species to be enclosed in square brackets and preceded by a constant, which denotes the species population.
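To make the site notation concrete, the following sketch parses a site into its parts. It is an illustration only: it uses the ASCII character `^` for the toehold caret, ignores the optional tags and locations of Table 1, and the names `Site`, `parse_site`, `complement`, and `parse_strand` are hypothetical helpers rather than anything defined by the language.

```python
import re
from typing import NamedTuple, Optional


class Site(NamedTuple):
    name: str              # domain name, e.g. "tb"
    toehold: bool          # True if the domain carries a caret
    complemented: bool     # True if the domain carries a star
    bond: Optional[str]    # bond identifier, or None if the site is free


# name, optional caret, optional star, optional "!bond"
SITE_RE = re.compile(r"^([A-Za-z]\w*)(\^?)(\*?)(?:!(\w+))?$")


def parse_site(text):
    match = SITE_RE.match(text)
    if match is None:
        raise ValueError(f"not a site: {text!r}")
    name, caret, star, bond = match.groups()
    return Site(name, caret == "^", star == "*", bond)


def complement(site):
    """Watson-Crick complement of the site's domain, returned as a free site."""
    return Site(site.name, site.toehold, not site.complemented, None)


def parse_strand(text):
    """Parse '<s1 s2 ...>' into a list of sites, ordered 5' to 3'."""
    return [parse_site(token) for token in text.strip("<> ").split()]


template_end = parse_site("to^*!1")
signal = parse_strand("<tb^ b>")
```

With this encoding, `complement(parse_site("x"))` equals `parse_site("x*")`, and taking the complement twice returns the original free domain.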

Table 1. Syntax of Processes
d ::= x | x* | x∧ | x∧* | X                   domain
k ::= x | int | X                             bond, int ≥ 0
t ::= int | string | x | x(T1, ..., TN)       tag, N ≥ 1, int ≥ 0
l ::= x | int | X                             location, int ≥ 0
s ::= d{t}@l | d{t}!k@l | X                   site, with {t} and @l optional
S ::= s1 ... sN                               sequence of sites, N ≥ 1
P ::= <S1> | ··· | <SN>                       process, N ≥ 0

For example, consider the following process, which relies on DNA strand displacement to compute a join operation. A corresponding graphical representation of the process is also shown.

Table 2. Syntax of Patterns and Contexts
π ::= <S> | <S | S | S> | S1> | <S2 | ⌀                     pattern
CN ::= [·]i | P | <S CN | S CN | CN S> | CN | CN            context with N holes, 1 ≤ i ≤ N

Table 3. Syntax of Logic Programs
T ::= X | int | float | string | π | CN[π1]...[πN] | X[π1]...[πN]
    | x(T1, ..., TN) | [T1; ...; TN] | [T1; ...; TN # X]          term
A ::= x(T1, ..., TN)                                              atomic predicate
L ::= A | not A                                                   literal
H ::= A :- L1, ..., LN                                            Horn clause

The process consists of a multiset of species, separated by the parallel composition operator, where each species is enclosed in square brackets and preceded by its population. The first species is a single strand <tb∧ b>, consisting of a toehold tb∧ followed by a domain b. Similarly, the second species is a single strand <tx∧ x>. The third species is a complex consisting of a strand <to∧*!1 x*!2 tx∧*!3 b*!4 tb∧*> bound to two shorter strands <x!2 to∧!1> and <b!4 tx∧!3>. The bonds are omitted in the corresponding graphical representation, since they are only used to determine connectivity.
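The grouping of strands into species follows directly from the bonds. As a sketch, assuming strands are given as strings with `^` standing in for the caret, the connected components can be computed with a small union-find; the function name `species` mirrors the concept in the text but this is not the language's built-in predicate.

```python
import re


def species(strands):
    """Group strands into connected components, where shared bonds connect strands."""
    parent = list(range(len(strands)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # bond identifier -> index of the first strand seen carrying it
    for index, strand in enumerate(strands):
        for bond in re.findall(r"!(\w+)", strand):
            if bond in owner:
                parent[find(index)] = find(owner[bond])
            else:
                owner[bond] = index

    groups = {}
    for index, strand in enumerate(strands):
        groups.setdefault(find(index), []).append(strand)
    return list(groups.values())


join = [
    "<tb^ b>",
    "<tx^ x>",
    "<to^*!1 x*!2 tx^*!3 b*!4 tb^*>",
    "<x!2 to^!1>",
    "<b!4 tx^!3>",
]
complexes = species(join)
```

For the five strands described above this yields three species: two single strands and one three-strand complex.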

In addition to specifying the initial conditions of a system as a process, our language allows logic predicates to be defined in order to automatically generate system behavior. This is achieved by extending the syntax of processes with logical variables X (Table 1), where the wildcard "_" is the logical variable that matches any term. Logical variables can then be combined with patterns π (Table 2) to match a specific part of a process. The pattern <S> matches a strand with exactly sequence S, while the pattern <S matches a strand with sequence S at its 5′ end, and the pattern S> matches a strand with sequence S at its 3′ end. The pattern S matches a sequence that can be present anywhere in a process, and the pattern S1> | <S2 matches a nick between two adjacent strands, where S1> and <S2 represent the two ends of the strands where the nick occurs. For example, the pattern a!1> | <b!2 matches the nick in the double stranded complex <d a!1> | <b*!2 a*!1> | <b!2 c>. Note that the order in which strands are written in a complex is not significant for pattern matching, since processes are identified up to reordering of strands. In general, the pattern S1> | <S2 matches any two strands in a process such that the 3′ end of one strand matches S1, while the 5′ end of the other strand matches S2. This pattern does not require the strands to be adjacent to each other or bound to a common strand, though this constraint can be encoded explicitly for nicking enzymes (Figure 4). The empty pattern ⌀ does not match any strand, and is used to model the creation and deletion of strands.
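The five non-empty pattern forms can be mimicked on concrete strands as follows. This is a simplified sketch: patterns contain no logical variables, strands are lists of site strings, and the kind labels (`"exact"`, `"five"`, `"three"`, `"anywhere"`) and function names are hypothetical.

```python
def matches(kind, pattern, strand):
    """Check one strand against a non-empty pattern of the given kind."""
    n = len(pattern)
    if kind == "exact":       # <S>  : the whole strand
        return strand == pattern
    if kind == "five":        # <S   : sequence at the 5' end
        return strand[:n] == pattern
    if kind == "three":       # S>   : sequence at the 3' end
        return strand[-n:] == pattern
    if kind == "anywhere":    # S    : sequence anywhere in the strand
        return any(strand[i:i + n] == pattern
                   for i in range(len(strand) - n + 1))
    raise ValueError(f"unknown pattern kind: {kind}")


def nick_matches(s1, s2, strands):
    """Pattern S1> | <S2: pairs (i, j) of distinct strands whose 3' and 5' ends match."""
    return [(i, j)
            for i, a in enumerate(strands)
            for j, b in enumerate(strands)
            if i != j and matches("three", s1, a) and matches("five", s2, b)]


duplex = [["d", "a!1"], ["b*!2", "a*!1"], ["b!2", "c"]]
nicks = nick_matches(["a!1"], ["b!2"], duplex)
```

On the double stranded complex from the text, the nick pattern selects the first and third strands, independent of where they appear in the list.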

A pattern π is matched with a process P using the notion of a context CN (Table 2), defined as a "process with N holes", (43) where each hole [·]i in CN is associated with a number i ∈ N. The matching is performed by applying a context CN to patterns π1...πN, written CN[π1]...[πN]. This fills each numbered hole [·]i in CN with the corresponding pattern πi. For example, applying the context C2 = <d1 d2 [·]2 | <d4 [·]1 d6> to the patterns π1 = d5 and π2 = d3> is written C2[d5][d3>] and results in the process <d1 d2 d3> | <d4 d5 d6>. Note that only patterns of the same kind can be replaced with each other: for example, a 3′ end pattern cannot be replaced with a nick pattern. The Methods provides more details on the well-formedness conditions for pattern substitution. To allow general logic predicates to be defined, our language embeds these patterns and contexts in a general logic programming language, by extending the standard syntax of Prolog (Table 3).
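Context application amounts to filling numbered holes. A minimal string-level sketch is shown below, assuming holes are written `[.]i` as an ASCII stand-in for [·]i; it does not check the well-formedness conditions mentioned above, and `apply_context` is a hypothetical helper name.

```python
import re

HOLE = re.compile(r"\[\.\](\d+)")


def apply_context(context, patterns):
    """Fill hole [.]i in `context` with patterns[i - 1] (holes are numbered from 1)."""
    used = set()

    def fill(match):
        index = int(match.group(1))
        if not 1 <= index <= len(patterns):
            raise ValueError(f"no pattern for hole {index}")
        used.add(index)
        return patterns[index - 1]

    result = HOLE.sub(fill, context)
    if used != set(range(1, len(patterns) + 1)):
        raise ValueError("every pattern must fill a hole")
    return result


c2 = "<d1 d2 [.]2 | <d4 [.]1 d6>"
process = apply_context(c2, ["d5", "d3>"])
```

This reproduces the example in the text, where C2[d5][d3>] yields the two-strand process.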

We now illustrate how this language can be used to define logic predicates that automatically generate the behavior of DNA strand displacement systems. The following logic predicate defines the conditions that need to be satisfied in order for processes P1 and P2 to bind, producing the resulting process Q:

The predicate is satisfied if P1 matches a context C1[D] and P2 matches a context C2[D′], such that D is complementary to D′, as specified by the built-in predicate compl(D,D′). The resulting process Q is obtained by replacing D with D!i in context C1, and replacing D′ with D′!i in context C2, written C1[D!i] | C2[D′!i]. Furthermore, we require that the bond i is fresh in the sense that it should not occur anywhere in processes P1 and P2. This is enforced by the built-in predicate freshBond(D!i, P1 | P2). This example illustrates how contexts are used to match specific patterns in a process and then update these matched patterns directly in place. The use of contexts in this way is a powerful abstraction for writing rules that generate behavior. (43) If we apply this rule to our example process defined previously, we obtain the following instantiations of the logical variables: The resulting process Q is the complex that is produced when P1 binds to P2 and is represented graphically as follows:
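As a rough procedural reading of the binding conditions just described, the sketch below enumerates the processes Q obtainable from P1 and P2. It is an approximation under stated assumptions: processes are lists of strands, strands are lists of site strings with `^` for the caret, bonds are integers, and the Python functions `compl`, `fresh_bond`, and `bind` only imitate the roles of the predicates named in the text.

```python
import re


def compl(d1, d2):
    """Two free domains are complementary if they differ by a trailing star."""
    return d1 == d2 + "*" or d2 == d1 + "*"


def fresh_bond(strands):
    """Return an integer bond that occurs nowhere in the given strands."""
    used = [int(b) for strand in strands for site in strand
            for b in re.findall(r"!(\d+)", site)]
    return str(max(used, default=0) + 1)


def bind(p1, p2):
    """Yield each process Q in which a free domain of p1 binds its complement in p2."""
    bond = fresh_bond(p1 + p2)
    for i, s1 in enumerate(p1):
        for p, d in enumerate(s1):
            for j, s2 in enumerate(p2):
                for k, d_c in enumerate(s2):
                    if "!" in d or "!" in d_c or not compl(d, d_c):
                        continue
                    q1 = [list(s) for s in p1]
                    q2 = [list(s) for s in p2]
                    q1[i][p] = f"{d}!{bond}"      # D  becomes D!i  in context C1
                    q2[j][k] = f"{d_c}!{bond}"    # D' becomes D'!i in context C2
                    yield q1 + q2


signal = [["tb^", "b"]]
gate = [["to^*!1", "x*!2", "tx^*!3", "b*!4", "tb^*"],
        ["x!2", "to^!1"],
        ["b!4", "tx^!3"]]
bound = list(bind(signal, gate))
```

For the Join example, the only free complementary pair is the toehold tb^ and its complement, so a single product is generated with a new bond 5.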

Similarly, the following predicate defines the conditions that need to be satisfied in order for process P to perform a strand displacement step, resulting in the process Q:

The predicate states that if the initial process P matches a context in which the sequence D′!i E′!j is bound to the sequence E!j D on bond j and to the sequence D!i on bond i, then the unbound domain D can replace the bound domain D!i. The sites E!j and D!i are included as arguments of the predicate, to record the bound domains at the beginning and end of the displacement, respectively. Applying this predicate to the above process results in the following, where the bound domain b has been displaced: Note that this predicate only allows displacement to take place in the 5′ to 3′ direction. As a result, we also need to define a symmetric displaceL predicate for the 3′ to 5′ direction (Figure 1).
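The displacement condition can be read procedurally as a search for three matching fragments. The sketch below follows the description in the text under simplifying assumptions: strands are lists of site strings with `^` for the caret, the invader and template are taken to be different strands, and `split`, `compl`, and `displace` are hypothetical Python helpers rather than the predicate itself.

```python
def split(site):
    """Split 'b*!4' into ('b*', '4'); a free site has bond None."""
    domain, _, bond = site.partition("!")
    return domain, (bond or None)


def compl(d1, d2):
    return d1 == d2 + "*" or d2 == d1 + "*"


def displace(strands):
    """Yield each process reachable by one displacement step in the 5' to 3' direction."""
    for ti, template in enumerate(strands):
        for p in range(len(template) - 1):
            d_c, i = split(template[p])        # D'!i on the template
            e_c, j = split(template[p + 1])    # E'!j on the template
            if i is None or j is None:
                continue
            for vi, invader in enumerate(strands):
                if vi == ti:
                    continue
                for q in range(len(invader) - 1):
                    e, e_bond = split(invader[q])       # E!j on the invader
                    d, d_bond = split(invader[q + 1])   # free D on the invader
                    if e_bond != j or d_bond is not None:
                        continue
                    if not (compl(e, e_c) and compl(d, d_c)):
                        continue
                    for ci, incumbent in enumerate(strands):
                        for r, site in enumerate(incumbent):
                            if split(site) == (d, i):   # D!i on the incumbent
                                result = [list(s) for s in strands]
                                result[vi][q + 1] = f"{d}!{i}"  # invader takes over bond i
                                result[ci][r] = d               # incumbent domain is freed
                                yield result


process = [["tb^!5", "b"],
           ["to^*!1", "x*!2", "tx^*!3", "b*!4", "tb^*!5"],
           ["x!2", "to^!1"],
           ["b!4", "tx^!3"]]
steps = list(displace(process))
```

Starting from the complex formed after toehold binding, the only step found frees domain b on the incumbent strand and binds it on the invader.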

Figure 1. Logic program and automatically generated chemical reaction network for a DNA strand displacement example computing the Join of two signals. (A) Logic program encoding binding, unbinding, and displacement predicates, together with the initial conditions of the DNA strand displacement system. (B) Graphical representation of the corresponding chemical reaction network generated by the logic program. The graph consists of two types of nodes, representing species and reactions. Each species node contains a graphical representation of a DNA complex. Each reaction node displays the rate of the forward reaction on top and the rate of the reverse reaction, when present, on the bottom. Edges between a species node and a reaction node that have an open arrowhead denote the products of the reaction. Edges with either no arrowhead or a solid arrowhead denote the reactants, where solid arrowheads are used to denote a reversible reaction. Species present initially are highlighted in bold, with the remaining species generated automatically by the logic program.

Finally, the following predicate defines the conditions for unbinding:

This encodes the assumption that only the toehold domains are short enough to unbind, as specified by the built-in predicate toehold(D), and relies on an additional predicate to check that there are no bound domains adjacent to domain D:

Note that this predicate takes the site D!i as an argument, which contains both the domain D and its corresponding bond i . Since the bond can only occur twice in a well-formed process, this allows the inference system to pinpoint the specific domain on which unbinding occurs in order to test for adjacent bound domains, even if there are multiple occurrences of domain D in the system.
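One possible reading of the unbinding conditions is sketched below. The adjacency test is an assumption: it treats a toehold as blocked when a neighbor of D!i and a neighbor of its partner site share a bond, which is only one plausible interpretation of "no bound domains adjacent to domain D". Strands are lists of site strings with `^` for the caret, and all function names are hypothetical.

```python
def bond_of(site):
    return site.partition("!")[2] or None


def is_toehold(site):
    return "^" in site.partition("!")[0]


def occurrences(strands, bond):
    """Positions (strand index, site index) of the sites carrying the given bond."""
    return [(si, pi) for si, strand in enumerate(strands)
            for pi, site in enumerate(strand) if bond_of(site) == bond]


def neighbour_bonds(strands, si, pi):
    """Bonds carried by the sites immediately next to position pi on strand si."""
    strand = strands[si]
    near = strand[max(pi - 1, 0):pi] + strand[pi + 1:pi + 2]
    return {bond_of(site) for site in near} - {None}


def can_unbind(strands, bond):
    """A bond may break if it joins toeholds and the duplex does not continue next to it."""
    sites = occurrences(strands, bond)
    if len(sites) != 2:      # a bond occurs exactly twice in a well-formed process
        return False
    (s1, p1), (s2, p2) = sites
    if not is_toehold(strands[s1][p1]):
        return False
    return not (neighbour_bonds(strands, s1, p1) & neighbour_bonds(strands, s2, p2))


def unbind(strands, bond):
    """Remove the given bond from both sites that carry it."""
    return [[site.partition("!")[0] if bond_of(site) == bond else site
             for site in strand] for strand in strands]


after_displacement = [["tb^!5", "b!4"],
                      ["to^*!1", "x*!2", "tx^*!3", "b*!4", "tb^*!5"],
                      ["x!2", "to^!1"],
                      ["b", "tx^!3"]]
released = unbind(after_displacement, "3")
```

Because the bond identifies a specific pair of sites, the check applies to exactly one occurrence of the domain even if the same domain name appears elsewhere in the process.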

We now combine the various predicates to automatically generate the behavior of a nucleic acid system. In our language we choose to represent this behavior as a chemical reaction network (CRN), defined as a set of reactions, where each reaction consists of a multiset of reactant species, a reaction rate and a multiset of product species. To achieve this we follow the approach presented in ref (44), which defines a method for converting a process in an arbitrary biological programming language to a CRN. The method requires the definition of a species predicate, which converts a process in the language to a multiset of species, together with the definition of a reaction predicate, which generates the reactions that can occur involving one or more species. We adapted this method to our logic programming language, by defining a built-in species predicate that converts a process P to a multiset of species, where a species is a set of one or more strands that form a connected component, and by allowing the programmer to define one or more custom reaction predicates of the form reaction([P1;···;PN], R, Q). The reaction predicate takes as input a list of one or more processes [P1;···;PN] denoting the reactant species, together with a reaction rate R, and produces as output the product of the reaction, specified as a process Q. The system then takes care of splitting the process Q into individual product species, using the built-in species predicate. Applying this approach to our example, we can encode the rules of DNA strand displacement by defining reaction predicates for binding, displacement, and unbinding of strands, using our previously defined predicates. In this case we define fixed rates for bind, displace, and unbind; however, the predicates can be further refined so that the rates depend on the specific domains involved (Section S1.1). If we apply these three predicates to the Join example, we automatically generate a chemical reaction network of the system behavior (Figure 1), where the strand <x to∧> is produced only if both strands <tb∧ b> and <tx∧ x> are present.
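The overall enumeration can be pictured as a fixpoint computation that keeps applying reaction rules until no new reactions appear. The sketch below is not the method of ref (44); it abstracts species to plain strings and uses toy rules with symbolic rates (`A`, `B`, `Gate`, `k_bind`, and `k_unbind` are all invented for illustration) to show the shape of such a loop.

```python
from itertools import combinations_with_replacement


def generate_crn(initial, rules):
    """Apply every rule to every reactant combination until no new reaction appears."""
    species, reactions = set(initial), set()
    changed = True
    while changed:
        changed = False
        for arity, rule in rules:
            for reactants in combinations_with_replacement(sorted(species), arity):
                for rate, products in rule(*reactants):
                    reaction = (reactants, rate, tuple(sorted(products)))
                    if reaction not in reactions:
                        reactions.add(reaction)
                        species.update(products)
                        changed = True
    return species, reactions


def toy_bind(a, b):
    """Toy bimolecular rule standing in for a user-defined reaction predicate."""
    if (a, b) == ("A", "Gate"):
        yield "k_bind", ["A.Gate"]
    if (a, b) == ("A.Gate", "B"):
        yield "k_bind", ["Out", "Waste"]


def toy_unbind(c):
    """Toy unimolecular rule: the intermediate can fall apart again."""
    if c == "A.Gate":
        yield "k_unbind", ["A", "Gate"]


RULES = [(2, toy_bind), (1, toy_unbind)]
all_species, all_reactions = generate_crn({"A", "B", "Gate"}, RULES)
partial_species, _ = generate_crn({"A", "Gate"}, RULES)
```

As in the Join example, the output species appears only when both inputs are initially present.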

Although the rules in Figure 1 are sufficient to accurately generate the desired behavior of our Join example, additional predicates are needed in order for the same set of rules to generate the complete behavior of a broad range of DNA strand displacement systems. For instance, we also need to account for the case in which adjacent complementary domains can bind to each other, by defining an appropriate cover predicate (Section S1.1). In addition, since the division of a DNA sequence into domains can be done arbitrarily, in order to maintain biological accuracy we need to ensure that the displace, cover, and bind predicates are extended to occur on a maximal sequence of consecutive domains. We achieve this by defining corresponding displaces, covers, and binds predicates, using a recursive encoding. For example, the following displaces predicate defines a maximal sequence of consecutive displacements: