Genetic encoding of the Hamiltonian Path Problem

The design of our bacterial computer benefited from a series of abstractions of DNA sequence into the edges and nodes of a Hamiltonian path. The first abstraction treated DNA segments as edges of a directed graph. DNA edges flanked by hixC sites can be reshuffled by Hin recombinase, creating random orderings and orientations of edges of the graph. The second abstraction treated all nodes, except the terminal one, as genes split into two halves (Figure 2). The first (5') half of the gene for a given node is found on any DNA edge that terminates at the node, while the second (3') half of the gene is found on any DNA edge that originates at the node. The final abstraction was an arrangement of DNA edges that represented an HPP solution and exhibited a new phenotype.

To place our proposed improvement of DNA computing in the historical context of the graph in Figure 1, we designed the constructs shown in Figure 2. Each node in the graph is represented by a gene that encodes an observable phenotype, such as antibiotic resistance or fluorescence. The exception is node 5, which is represented by a transcription terminator to ensure that it will be the last node in the Hamiltonian path. Each 5' half of a gene is denoted by the left half of a circle and each 3' half is denoted by the right half of a circle. Gene halves connected by arrows and flanked by triangular hixC sites are the flippable DNA edges. The order and orientation of the DNA edges determine the starting configuration, an example of which is illustrated in Figure 2a. Hin-mediated recombination of the 14 DNA edges could produce 14!·2^14 ≈ 1.42 × 10^15 possible configurations. Of these, a small fraction represents Hamiltonian paths with all of the node genes intact (see the mathematical modeling section below for details). An example of one of these solution configurations is illustrated in Figure 2b. Bacterial colonies that contain an HPP solution will express a unique combination of phenotypes that can be detected directly or found by selection.

Figure 2 Illustration of the use of split genes to encode a seven node Hamiltonian Path Problem. a. The manner in which each of the directed edges in Figure 1 could be encoded in DNA is illustrated. The 5' half of each node gene is denoted by the left half of a circle and the 3' half is denoted by the right half of a circle. DNA edges are depicted by gene halves connected by arrows and flanked by triangles that represent hixC sites. Transcription in the direction of the solid arrow would terminate early and result in the expression of only one marker gene. b. Hin-mediated recombination would randomly reshuffle the DNA edges into many configurations. One possible example of an HPP solution configuration with its marker gene halves reunited is illustrated. Transcription in the direction of the solid arrow would result in expression of the six marker gene phenotypes.

Splitting GFP and RFP genes

Once we were convinced that our proposed in vivo DNA computer could solve an HPP, we chose a simpler three node graph for our first biological implementation of the problem. To execute our design, we needed to split two marker genes by inserting hixC sites. For each gene to be split, we had to find a site in the encoded protein where 13 specific amino acids could be inserted without destroying the function of the protein. We examined the three-dimensional structure of each protein candidate, chose a site for the insertion, built gene halves, and tested the reunited halves with the 13 amino acid insertion for protein function. We successfully inserted hixC sites into the coding sequences of both GFP and RFP without loss of fluorescence [11]. We inserted the hixC site between amino acids 157 and 158 in GFP, and between the structurally equivalent amino acids 154 and 155 in RFP. Each of the insertions extended a loop outside of the beta barrel structure of the fluorescent proteins. We also tested two hybrid constructs to ensure that they would not fluoresce. The 5' half of GFP assembled with the 3' half of RFP produced a hybrid protein that did not fluoresce red or green (data not shown). Similarly, the 5' half of RFP placed upstream of the 3' half of GFP did not cause fluorescence (data not shown). In addition, none of the four half proteins fluoresced by itself (data not shown). These results demonstrated the suitability of the GFP and RFP gene halves as parts for programming a bacterial computer, and being able to split two genes enabled us to design a bacterial computer to solve an HPP for a three node directed graph.

Mathematical modeling of bacterial computational capacity

We used mathematical modeling to examine several important questions about the system. The first question is whether the order and orientation of the DNA edges in a starting construct affect the probability of detecting an HPP solution. During an HPP experiment, billions of bacterial cells will attempt to find a solution by random flipping of DNA edges catalyzed by Hin recombinase. We developed a Markov chain model in MATLAB using the signed permutations of {1, 2, ..., n} as the states of the DNA edges in the HPP. We assumed that each possible reversal of adjacent DNA edges was equally likely. Using the resulting transition matrix, we computed the probability that any starting configuration would be in any of the solved states after k flips. We conducted this analysis for a number of different graphs. Figure 3 shows one example of the results, for a graph with four nodes and three edges. The graph shows a relatively quick convergence to equilibrium, as was the case for all the graphs we analyzed. In this example, there are 3!·2^3 = 48 possible configurations of the edges, only one of which is a solution. After about 20 flips, the probability that the edges are in the solution state (or any other state) is 1/48 (≈ 0.02). Consideration of the reaction rate reported for Hin recombinase [12] led us to conclude that equilibrium could be reached in the 3-node, 3-edge experiment that we intended to use as a proof-of-concept. Assuming that E. coli divides every 20–30 minutes and that we grow the cells for 16 hours, the culture passes through 32–48 generations, so more than 20 flips should occur even if Hin recombinase catalyzes only one reaction per cell cycle.
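The convergence described above can be reproduced with a small sketch. This is not the MATLAB model used in the study; it is a minimal Python illustration under stated assumptions: edges are numbered 1..n, a negative sign marks reverse orientation, a "flip" inverts any contiguous run of edges, and all such inversions are equally likely. The function names are hypothetical.

```python
from itertools import permutations, product

def signed_permutations(n):
    """All orderings of edges 1..n with a +/- orientation on each edge."""
    return [tuple(s * e for s, e in zip(signs, order))
            for order in permutations(range(1, n + 1))
            for signs in product((1, -1), repeat=n)]

def reversals(state):
    """Every state reachable by inverting one contiguous run of edges.
    Assumption: each hixC-bounded run (length 1..n) can be inverted."""
    n = len(state)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-e for e in reversed(state[i:j + 1]))
            yield state[:i] + flipped + state[j + 1:]

def solution_probability(start, solutions, k):
    """Probability of being in a solved state after k equally likely flips."""
    dist = {start: 1.0}
    for _ in range(k):
        nxt = {}
        for state, p in dist.items():
            moves = list(reversals(state))
            for new_state in moves:
                nxt[new_state] = nxt.get(new_state, 0.0) + p / len(moves)
        dist = nxt
    return sum(dist.get(s, 0.0) for s in solutions)

states = signed_permutations(3)   # 3 edges, as in the Figure 3 example
solved = {(1, 2, 3)}              # assumed single solved state
for start in [(1, 2, 3), (2, 1, 3), (-3, -2, -1)]:
    curve = [solution_probability(start, solved, k) for k in (0, 5, 20, 200)]
    print(start, [round(p, 4) for p in curve])
```

Under these assumptions every starting configuration approaches the same value, 1/48, as the number of flips grows, which is the behavior the Markov chain analysis reports.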

Figure 3 Markov Chain model of solving a Hamiltonian Path Problem. Each colored line represents a different starting configuration of a graph with four nodes and three edges. As the number of flips increases, the probability of finding a Hamiltonian path solution converges to 1/48, or about 0.02.

We also used mathematical modeling to determine how many bacteria would be needed to have high confidence that, after Hin recombination, at least one cell would contain a plasmid with a true HPP solution. For the example of the graph in Figure 1, each HPP solution would have six DNA edges in the proper order and orientation followed by the remaining eight edges in any order and orientation. Because there are 8! ways to order the eight remaining edges, and two ways to orient each one, there are 8!·2^8 = 10,321,920 different configurations that are solutions, one example of which is shown in Figure 2b. There is a total of 14!·2^14 ≈ 1.42 × 10^15 possible configurations of the edges (14! ways to order the edges, and two ways to orient each one), many of which are not even valid connected paths in the graph, much less Hamiltonian paths. The probability of any one plasmid holding an HPP solution is p = (8!·2^8)/(14!·2^14). Assuming that the states of different plasmids are independent and that a sufficient number of flips has occurred to achieve a uniform distribution of the 14!·2^14 possible configurations, the probability that at least one of m plasmids holds an HPP solution is 1 - (1 - p)^m. From this expression, we can solve for m to find the number of plasmids needed to reach the desired probability of finding at least one solution. For example, if we wanted to be 99.9% sure of finding an HPP solution, we would need about one billion independent, identically distributed plasmids. A billion E. coli can grow overnight in a single culture. It should be noted, however, that it may take longer than that for Hin recombination to produce a uniform distribution of all possible plasmid configurations. Since each bacterium would have at least 100 copies of the plasmid, the computational capacity of a billion cells exceeds our needs by two orders of magnitude. Because the number of processors would be increasing exponentially, the time required for a biological computer to evaluate all 14!·2^14 configurations is a constant multiple of log(14!·2^14), or approximately 14·log(14), while the time required for a conventional computer to evaluate the same number of paths would be a constant multiple of 14!·2^14.
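The arithmetic behind these counts can be checked directly. The following Python sketch recomputes the number of configurations, the per-plasmid probability p, and the number of plasmids m needed for a chosen confidence level; the variable and function names are illustrative, and the independence and uniform-distribution assumptions are the ones stated in the text.

```python
import math

n_edges = 14                         # DNA edges in the Figure 1 graph
path_edges = 6                       # edges fixed in order and orientation
free_edges = n_edges - path_edges    # remaining edges, any order/orientation

total = math.factorial(n_edges) * 2 ** n_edges          # 14! * 2^14
solutions = math.factorial(free_edges) * 2 ** free_edges  # 8! * 2^8
p = solutions / total                # chance one plasmid holds a solution

def plasmids_needed(p, confidence):
    """Smallest m with 1 - (1 - p)^m >= confidence, assuming independence."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))

m = plasmids_needed(p, 0.999)
print(f"total configurations: {total:.3e}")
print(f"solution configurations: {solutions:,}")
print(f"p = {p:.3e}")
print(f"plasmids for 99.9% confidence: {m:.3e}")
```

Solving 1 - (1 - p)^m ≥ 0.999 for m gives a value slightly below 10^9, consistent with the figure of about one billion plasmids.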

A key feature of our experimental design is the simplicity of detecting answers with phenotypes of red and green fluorescence resulting in yellow colonies. However, when our design is applied to a more complex problem such as the one presented in Figures 1 and 2, it is possible that a colony with a correct phenotype might have an incorrect genotype, resulting in a false positive. We considered the question of whether there are too many false positives to detect a true positive. Using MATLAB, we computed the number of true positives for the 14-edge graph in Figure 1 to be 10,321,920 and the number of total positives to be 168,006,848. The ratio of true positives to total positives is therefore approximately 0.06. Since all false positive solutions must have at least one more edge between the starting node and the ending node than in the true solution states, putative solutions could be screened using PCR. However, since the ratio of true to total positives gets smaller with the size of the problem, this approach becomes increasingly impractical. An alternative would be to conduct high throughput DNA sequencing of pooled putative solution plasmids.

Our mathematical modeling supported the conclusion that our experimental design could solve Hamiltonian Path Problems. As a proof-of-concept, we designed a simple directed graph with a unique Hamiltonian path and programmed a bacterial computer to find that path.

Programming a bacterial computer

Figure 4a shows the directed graph with three nodes and three edges that we chose to encode in our bacterial computer. The graph contains a unique Hamiltonian path starting at the RFP node, traveling via edge A to the GFP node, and using edge B to reach the ending TT node. Edge C, from RFP to TT, is a distractor. Figure 4b illustrates the DNA constructs we used to encode a solved HPP as a positive control and two unsolved starting configurations. Since edge A originates at the RFP node and terminates at the GFP node, DNA edge A contained the 3' half of RFP followed by the 5' half of GFP. DNA edge B originated at GFP and terminated at TT, so its DNA segment had the 3' half of GFP followed by the double transcription terminator. DNA edge C originated at RFP and terminated at TT, so it contained the 3' half of RFP followed by the terminator. Each of the 5' gene halves included a ribosome binding site (RBS) upstream of its start codon in order to support translation.

Figure 4 DNA constructs that encode a three node Hamiltonian Path Problem. a. The three node directed graph contains a Hamiltonian path starting at the RFP node, proceeding to the GFP node, and finishing at the TT node. b. Construct ABC represents a solution to the three node HPP. Its three hixC-flanked DNA segments are in the proper order and orientation for the GFP and RFP genes to be intact. ACB has the RFP gene intact but not the GFP gene, while BAC has neither gene intact.

As illustrated in Figure 4b, we designed an expression cassette to contain the three DNA edges. To ensure the solution begins at the RFP node, the cassette starts with a bacteriophage T7 RNA polymerase promoter, an RBS, and 5' RFP prior to the first hixC site. Construct ABC represents one of two HPP solutions since it begins with the RFP node, passes through GFP and ends with TT. Since both the RFP and GFP genes are intact, downstream of the promoter, in the correct orientation, and followed by the transcriptional terminators, ABC colonies should express both red and green fluorescence and appear yellow. A second solution is ABC', in which forward DNA edges A and B are followed by backwards DNA edge C. Bacteria containing this configuration are expected to fluoresce yellow, since RFP and GFP are intact and in forward orientation. Construct ACB has the RFP gene intact, in the correct orientation, and uninterrupted by transcriptional terminators, but its GFP gene halves are not united. As a result, this construct is predicted to produce red colonies. The BAC construct has neither RFP nor GFP intact and should not fluoresce at all. The three plates on the left side of Figure 5 show that all three constructs produced the predicted phenotypes in the absence of Hin recombinase: ABC colonies fluoresce yellow, ACB colonies fluoresce red, and BAC colonies show no fluorescence.
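The phenotype predictions for each construct follow mechanically from the layout of the gene halves, which the toy model below tries to capture. It is a simplified sketch, not a description of the authors' analysis: it assumes a gene is expressed only when its forward 5' half is immediately followed by its forward 3' half, that a forward terminator stops transcription, and that a reverse-oriented terminator blocks transcription only when the hypothetical `reverse_tt_blocks` flag is set. All names are illustrative.

```python
from itertools import permutations, product

# Forward-orientation content of each hixC-flanked DNA edge (per Figure 4).
EDGES = {
    "A": ["3'RFP", "5'GFP"],
    "B": ["3'GFP", "TT"],
    "C": ["3'RFP", "TT"],
}

def cassette(config):
    """Expand e.g. [("A", 1), ("B", 1), ("C", -1)] into (part, strand) pairs,
    beginning with the fixed promoter-proximal 5' RFP half."""
    parts = [("5'RFP", 1)]
    for edge, strand in config:
        content = EDGES[edge] if strand == 1 else reversed(EDGES[edge])
        parts.extend((part, strand) for part in content)
    return parts

def phenotype(config, reverse_tt_blocks=False):
    """Predict colony color from edge order and orientation (toy model)."""
    transcribed = []
    for part, strand in cassette(config):
        transcribed.append((part, strand))
        # Assumption: a forward terminator always stops transcription.
        if part == "TT" and (strand == 1 or reverse_tt_blocks):
            break

    def intact(gene):
        pair = [("5'" + gene, 1), ("3'" + gene, 1)]
        return any(transcribed[i:i + 2] == pair
                   for i in range(len(transcribed) - 1))

    colors = {(True, True): "yellow", (True, False): "red",
              (False, True): "green", (False, False): "none"}
    return colors[(intact("RFP"), intact("GFP"))]

ABC = [("A", 1), ("B", 1), ("C", 1)]
ACB = [("A", 1), ("C", 1), ("B", 1)]
BAC = [("B", 1), ("A", 1), ("C", 1)]
for name, cfg in [("ABC", ABC), ("ACB", ACB), ("BAC", BAC)]:
    print(name, phenotype(cfg))

all_configs = [list(zip(order, signs))
               for order in permutations("ABC")
               for signs in product((1, -1), repeat=3)]
yellow = [cfg for cfg in all_configs if phenotype(cfg) == "yellow"]
print(len(all_configs), "configurations,", len(yellow), "predicted yellow")
```

Under these assumptions the model reproduces the predicted colors of the three starting constructs, and only ABC and ABC' are predicted to be yellow among all orderings and orientations of the three edges.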

Figure 5 Detecting solutions to a Hamiltonian Path Problem with bacterial computing. Bacterial colonies containing each of the three starting constructs ABC, ACB, and BAC are shown on the left. Hin recombination resulted in the three plates of colonies on the right. The callouts include yellow colored colonies that contain solutions to the HPP.

Random orderings of edges in the directed graph were produced by Hin-mediated recombination in a separate experiment using each of the three starting constructs ABC, BAC, or ACB. In a given experiment, bacteria were cotransformed with 1) a plasmid conferring ampicillin resistance and containing one of the three starting constructs and 2) a plasmid encoding tetracycline resistance with a Hin recombinase expression cassette. The resulting cotransformed colonies were grown overnight for isolation of plasmids containing the Hin-exposed HPP constructs. The isolated plasmids were then used in a second round of transformation into bacteria that expressed bacteriophage T7 RNA polymerase and plated on media containing only ampicillin (Figure 5). Ampicillin-resistant colonies were grown overnight to allow the T7 RNA polymerase to transcribe each plasmid in its final flipped state. Because each colony represented a single transformation event and Hin was no longer present, each colony contained isogenic plasmids and thus only one configuration of the three DNA edges. This experimental protocol was followed for each of the three starting constructs.

Verifying bacterial computer solutions to a Hamiltonian Path Problem

Once Hin recombinase reorders the DNA edges of each of the constructs, a distribution of 48 possible configurations is expected. The positive control ABC construct should convert from its yellow fluorescent starting phenotype to the red and uncolored phenotypes of unsolved arrangements. The ABC recombination plate pictured in Figure 5 matched our prediction. We assumed that the double transcriptional terminator would function in reverse orientation, so that green colonies would not be possible in the experiment. However, green colonies on the ABC recombination plate indicate that TT did not block further transcription. The ABC recombination plate also shows a number of unusually colored colonies that were not expected, which we discuss later.

The ACB starting construct was expected to undergo Hin-mediated recombination to produce a variety of configurations, including a solution that requires at least two flips. Yellow fluorescent colonies representing putative HPP solutions are visible on the ACB recombination plate. The BAC starting configuration was three flips away from the nearest solution. Several examples of yellow fluorescent colonies on the BAC recombination plate are candidates for solutions to the HPP. As with the ABC recombination plate, we found unexpected colony colors on both the ACB and BAC recombination plates.
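The minimum number of flips separating each starting construct from a solution can be found with a breadth-first search over signed orderings of the three edges. The sketch below is an illustration under assumptions: edges A, B, and C are encoded as 1, 2, and 3, a negative sign marks reverse orientation, and one flip inverts any contiguous run of edges. The function names are hypothetical.

```python
from collections import deque

def reversals(state):
    """Every state reachable by inverting one contiguous run of edges."""
    n = len(state)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-e for e in reversed(state[i:j + 1]))
            yield state[:i] + flipped + state[j + 1:]

def min_flips(start, solutions):
    """Breadth-first search for the fewest flips from start to any solution."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, flips = queue.popleft()
        if state in solutions:
            return flips
        for nxt in reversals(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, flips + 1))
    return None  # unreachable under this flip model

# A=1, B=2, C=3; the two solutions are ABC and ABC' (C reversed).
SOLUTIONS = {(1, 2, 3), (1, 2, -3)}
for name, start in [("ABC", (1, 2, 3)), ("ACB", (1, 3, 2)), ("BAC", (2, 1, 3))]:
    print(name, min_flips(start, SOLUTIONS))
```

With this flip model the search returns 0 for ABC, 2 for ACB, and 3 for BAC, in agreement with the distances stated for the two unsolved starting constructs.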

Yellow fluorescent colonies on the ACB and BAC recombination plates provided preliminary evidence that the bacterial computer had solved both versions of the HPP. We wanted to verify this result by sequencing plasmid DNA to determine the genotypes of three yellow colonies from each of the ABC, ACB, and BAC recombination plates. All nine colonies had a genotype of ABC or ABC', in which the third DNA edge is in reverse orientation (Figure 6). Both of these configurations represent a solution to the HPP. These results verified that our bacterial computer had found true solutions to a three node HPP configured in two different starting orientations.