Building a computing system in bacteria Finite state machines are logic circuits with a predetermined sequence of actions that are triggered depending on the starting conditions. They are used for a variety of devices and biological systems, from vending machines to neural circuits. Roquet et al. have taken a finite state machine approach to control the expression of integrases, or enzymes that insert or excise phage DNA into or out of bacterial chromosomes. The integrases altered the DNA sequence of a plasmid to record all five possible combinations of two inputs. Such circuits can be used to record the states that the cell experienced over time and can be deployed in state-dependent gene expression programs. Science, this issue p. 363

Structured Abstract INTRODUCTION Living systems execute regulatory programs and exhibit specific phenotypes depending on the identity and timing of chemical signals, but general strategies for mimicking such behaviors with artificial genetic programs are lacking. Synthetic circuits that produce outputs only depending on simultaneous combinations of inputs are limited in their scale and their ability to recognize dynamics because they do not uniquely detect or respond to temporally ordered inputs. To address these limitations, we developed and experimentally validated a framework for implementing state machines that record and respond to all identities and orders of gene regulatory events in living cells. RATIONALE We built recombinase-based state machines (RSMs) that use input-driven recombinases to manipulate DNA registers made up of overlapping and orthogonal pairs of recombinase recognition sites. Specifically, chemical inputs express recombinases that can perform two types of irreversible operations on a register: excision if their recognition sites are aligned, or inversion if their recognition sites are anti-aligned. The registers are designed to adopt a distinct DNA sequence (“state”) for every possible “permuted substring” of inputs—that is, every possible combination and ordering of inputs. The state persists even when inputs are removed and may be read with sequencing or by polymerase chain reaction. Using mathematical analysis to determine how the structure of a RSM relates to its scalability, we found that incorporating multiple orthogonal pairs of recognition sites per recombinase allows a RSM to outperform combinational circuits in scale. Genetic parts (made up of promoters, terminators, and genes) may be interleaved into RSM registers to implement gene regulation programs capable of expressing unique combinations of genes in each state. 
In addition, we provide a computational tool that accepts a user-specified two-input multigene regulation program and returns corresponding registers that implement it. This searchable database enables facile creation of RSMs with desired behaviors without requiring detailed knowledge of gene circuit design. RESULTS We built two-input, five-state RSMs and three-input, 16-state RSMs capable of recording every permuted substring of their inputs. We tested the RSMs in Escherichia coli and used Sanger sequencing to measure performance. For the two-input, five-state RSM, at least 97% of cells treated with each permuted substring of inputs adopted their expected state. For the three-input, 16-state RSM, at least 88% of cells treated with each permuted substring of inputs adopted their expected state, although we observed 100% for most treatment conditions. We used these two- and three-input RSMs to implement gene regulation programs by interleaving genetic parts into their registers. For the two-input, five-state system, we designed registers for various gene regulation programs using our computational database and search function. Four single-gene regulation programs and one multigene regulation program (which expressed a different set of fluorescent reporters in each state) were successfully implemented in E. coli, with at least 94% of cells adopting their expected gene expression profile when treated with each permuted substring of inputs. Lastly, we successfully implemented two different three-input, 16-state gene regulation programs; one of these, a three-input passcode switch, performed with at least 97% of cells adopting the expected gene expression behavior. CONCLUSION Our work presents a powerful framework for implementing RSMs in living cells that are capable of recording and responding to all identities and orders of a set of chemical inputs. 
Depending on desired applications, the prototypical inducible systems used here to drive the RSMs can be replaced by sensors that correspond to desired input signals or gene regulation events. We anticipate that the integration of RSMs into complex living systems will transform our capacity to understand and engineer them. Summary of a three-input, 16-state RSM. (A) The RSM mechanism. A chemical input induces the expression of a recombinase (from a gene on the input plasmid) that modifies a DNA register made up of overlapping and orthogonal recombinase recognition sites. Distinct recombinases can be controlled by distinct inputs. These recombinases each target multiple orthogonal pairs of their cognate recognition sites (shown as triangles and half-ovals) to catalyze inversion (when the sites are anti-aligned) or excision (when the sites are aligned). (B) The register is designed to adopt a distinct DNA state for every identity and order of inputs. Three different inputs—orange, blue, and purple—are represented by colored arrows. Unrecombined recognition sites are shaded; recombined recognition sites are outlined.

Abstract State machines underlie the sophisticated functionality behind human-made and natural computing systems that perform order-dependent information processing. We developed a recombinase-based framework for building state machines in living cells by leveraging chemically controlled DNA excision and inversion operations to encode states in DNA sequences. This strategy enables convenient readout of states (by sequencing and/or polymerase chain reaction) as well as complex regulation of gene expression. We validated our framework by engineering state machines in Escherichia coli that used one, two, or three chemical inputs to control up to 16 DNA states. These state machines were capable of recording the temporal order of all inputs and performing multi-input, multi-output control of gene expression. We also developed a computational tool for the automated design of gene regulation programs using recombinase-based state machines. Our scalable framework should enable new strategies for recording and studying how combinational and temporal events regulate complex cell functions and for programming sophisticated cell behaviors.

State machines are systems that exist in any of a number of states, in which transitions between states are controlled by inputs (1). The next state of a given state machine is determined not only by a particular input, but also by its current state. This state-dependent logic can be used to produce outputs that are dependent on the order of inputs, unlike in combinational logic circuits wherein the outputs are solely dependent on the current combination of inputs. Figure 1 depicts a state machine that enters a different state for each permuted substring of two inputs A and B, by which we refer to each distinct combination and ordering of those two inputs: {no input, A only, B only, A followed by B (A → B), B followed by A (B → A)}.
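The difference between state-dependent and combinational logic can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`permuted_substrings`, `step`, `run`) are hypothetical, and treating a repeated input as a self-loop (no state change) is an assumption made here rather than something stated above.

```python
from itertools import permutations

def permuted_substrings(inputs):
    """Every distinct combination and ordering of the inputs (including 'no input')."""
    return [p for k in range(len(inputs) + 1) for p in permutations(inputs, k)]

def step(state, symbol):
    """Next state depends on the current state and the input.
    Assumption: an input that was already seen leaves the state unchanged."""
    return state if symbol in state else state + (symbol,)

def run(sequence, start=()):
    """Feed a sequence of inputs to the state machine and return the final state."""
    state = start
    for symbol in sequence:
        state = step(state, symbol)
    return state

states = permuted_substrings("AB")
print(len(states), states)

# Order-dependent: A -> B and B -> A end in different states ...
print(run("AB"), run("BA"))
# ... whereas a combinational circuit only sees the set of inputs present.
print(frozenset(run("AB")) == frozenset(run("BA")))
```

For the two inputs A and B, the enumeration yields the five states listed above, and the two orderings of the same pair of inputs remain distinguishable only in the state-machine view.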

Fig. 1 Example of a state machine. Nodes represent states; arrows represent transitions between states mediated by inputs. Each of the possible permuted substrings of the two inputs A and B generates a unique state.

Synthetic state machines that record and respond to sequences of signaling and gene regulatory events within a cell could be transformative tools in the study and engineering of complex living systems. For example, in human development, progenitor cells differentiate into specific cell types with disparate functions determined by the timing and order of transcription factor (TF) activation (2, 3). This information has allowed researchers to program human stem cells into differentiated cells (4, 5), and conversely, reprogram differentiated cells into stem cells by means of exogenous, sequential TF activation (6, 7). However, the temporal organization of TF cascades that drive different cell lineages remains largely unknown. State machines that record and actuate gene expression in response to the order of TF activation in individual cells would be useful for understanding and modulating these differentiation processes.

Such state machines may also improve our understanding of disease progression, which can also depend on the appearance and order of extracellular and intracellular factors. For example, in cancer, the temporal order of genetic mutations in a tumor can determine its phenotype (8). Similarly, in both somatic diseases and pathogenic infections, preadaptation of disease cells to different environmental conditions may affect the way the cells behave and respond to drug treatments (9–12). Integrating state machines into disease models and subsequently analyzing the history of cells that survive treatment would be useful for understanding how disease progression affects therapeutic response.

Despite their potential to transform the understanding and engineering of biological systems, complex functional state machines have yet to be implemented in living cells because of a lack of scalable and generalizable frameworks (13). Oishi et al. proposed a theoretical CRISPR interference-based strategy for building state machines in living cells, in which state is encoded epigenetically (14). In contrast, we developed a scalable recombinase-based strategy for implementing state machines in living cells, in which a given state is encoded in a particular DNA sequence. The direct storage of state information in the DNA sequence ensures that it is maintained stably and with minimal burden to the cell. Recombinases have been used to implement switches (15–19), chemical pulse counters (20), Boolean logic gates integrated with memory (21, 22), and temporal logic (23). We used them to implement scalable state machines, such as those that can distinguish among all possible permuted substrings of a set of inputs with unique gene expression outputs. We refer to our state machine implementations as recombinase-based state machines (RSMs).

Recombinase-based state machine parts and operations In a RSM, inputs are defined by chemical signals, and state is defined by the DNA sequence within a prescribed region of DNA, termed the register. Chemical signals mediate state transitions by inducing the expression of large serine recombinases that catalyze recombination events on the register, thereby changing the state. Specifically, each recombinase recognizes a cognate pair of DNA recognition sites on the register, attP (derived from a phage) and attB (derived from its bacterial host), and carries out a recombination reaction between them, yielding attL and attR sites (made up of conjoined halves of attB and attP) (24, 25). In the absence of extra cofactors, this reaction is irreversible (fig. S1 and text S1) (26–28). Each site in a cognate attB-attP pair has a matching central dinucleotide that determines its polarity (29, 30). If the two sites are anti-aligned (oriented with opposite polarity) on the register, then the result of their recombination is the inversion of the DNA between them (Fig. 2A and fig. S2A). Alternatively, if the two sites are aligned (oriented with the same polarity) on the register, then the result of their recombination is the excision of the DNA between them (Fig. 2B and fig. S2B). DNA segments that are excised from the register are assumed to be lost because of a lack of origin of replication. Fig. 2 Rules of recombination on a register. The register is depicted as an array of underscored alphabet symbols (arbitrary DNA) and shape symbols (recognition sites). (A) If sites in an attB-attP pair are anti-aligned, then the DNA between them is inverted during recombination. (B) If sites in an attB-attP pair are aligned, then the DNA between them is excised during recombination. (C) Multiple inputs can drive distinct recombinases that operate on their own attB-attP pairs. In this example, input A drives the orange recombinase and input B drives the blue recombinase. 
(D) Multiple orthogonal attB-attP pairs for a given recombinase can be placed on a register. Here, distinct shapes denote two pairs of attB-attP. Up to six orthogonal and directional attB-attP pairs can be created per large serine recombinase (31). Figure S2 gives more detail on the recombination reactions shown here. When there are multiple inputs to a RSM, they can each drive distinct recombinases that operate only on their own attB-attP pairs. At least 25 (putatively orthogonal) large serine recombinases have been described and tested in the literature (18, 25), and bioinformatics mining can be used to discover even more (18). Recognition sites for multiple recombinases may be arranged in several different ways on the register. If attB-attP pairs from different recombinases are nested or overlapping, then the operation of one recombinase can affect the operation of subsequent recombinases—either by rearranging the relative orientation of their attB and attP sites or by excising one or both sites in a pair from the register—thereby precluding any type of downstream operation on these sites. For example, if we consider the initial register design in Fig. 2C, applying input B → A leads to a unique DNA sequence, but applying A → B leads to the same DNA sequence we would expect if we only applied A, because the A-driven recombinase excises a site for the B-driven recombinase. We measure the “information capacity” of a RSM by the number of distinct states it can access, and hence the number of permuted substrings of inputs it can distinguish. Given the noncommutative nature of recombinase operations on a register, one might naïvely believe that the information capacity of RSMs would behave like N! for N inputs. 
But if a RSM is designed such that each input-driven recombinase only has one attB-attP pair on the register, the information capacity of the RSM never exceeds 2^N, which is the result we would expect if recombinase operations were commutative (Box 1 and text S2). To circumvent this information bottleneck, registers must be designed with multiple orthogonal attB-attP pairs per recombinase. Orthogonal attB-attP pairs for a large serine recombinase can be engineered by mutating the central dinucleotide of each site in the native attB-attP pair (29–31). Pairs of sites with the same central dinucleotide sequence should recombine, but they should not recombine if the central dinucleotide sequences do not match (Fig. 2D and fig. S2C).

Box 1 Mathematical discussion of RSMs. If a RSM with N inputs is designed such that each input-driven recombinase only has one attB-attP pair on the register, the number of states cannot exceed 2^N. To prove this important claim, we first introduce the concept of irreducibility. An irreducible string of recombinases is one in which, when the recombinases are applied to a register in the given order, each recombinase performs an operation (excision or inversion) on the register. We can make the following two statements about irreducible strings: Statement 1: Every possible state of a register must be accessible by the application of some irreducible string of recombinases. This follows from considering that (i) each state is the result of a string of recombination operations, and (ii) the string of recombinases corresponding to that string of recombination operations is irreducible by definition. Statement 2: Assuming a register with one attB-attP pair per recombinase, all irreducible strings from the same subset of recombinases generate the same state on the register. 
This follows from considering that (i) all rearrangeable DNA segments on the register are flanked on both sides by attB and/or attP sites belonging to the subset of recombinases being applied; (ii) by the definition of irreducibility, each recombinase in the irreducible string will catalyze recombination between its attB-attP pair; and (iii) when recombination between attB and attP sites occurs, they always form the same junctions: The back end of the attB will join the front end of the attP, and the front end of the attB will join the back end of the attP. Therefore, all rearrangeable DNA segments will form the same junctions after an irreducible string of recombinases is applied, regardless of the order in which those recombinases are applied. Now to prove the claim: Given a RSM with N input-driven recombinases and one pair of attB-attP per recombinase on its register, all states must be accessible by some irreducible string of recombinases (statement 1), and all irreducible strings from the same subset of the N recombinases must generate the same state (statement 2). Therefore, there cannot be more states than there are subsets of recombinases, which is 2^N (see text S2 for a more detailed version of this proof). More generally, this proof can be expanded to show that, given k orthogonal attB-attP pairs per recombinase on a register, the number of states it can access will never exceed 2^(kN) (see text S3). For large serine recombinases, there is a limit of k = 6 orthogonal and directional attB-attP pairs for a given recombinase (31). Therefore, the information capacity of RSMs using large serine recombinases is intrinsically bounded exponentially. There are still many unanswered mathematical questions regarding RSM structure. For example, given a DNA sequence, what is the computational difficulty of deciding whether it admits an irreducible ordering of a set of recombinases? Is this problem NP-hard? 
Also, how can we decide whether an irreducible string of recombinases has minimal length, or whether there might be a shorter irreducible string that produces the same DNA sequence?
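The recombination rules and the bound of 2^N states (one state per subset of inputs) can be explored with a small simulation. The sketch below is a toy model under stated assumptions, not the authors' software: the tuple encoding, the function names, and both register layouts are hypothetical, recombined sites are simply relabeled "L" or "R", and pairs belonging to one recombinase are processed in sorted order.

```python
from itertools import permutations

# Hypothetical encoding, for illustration only:
#   ("seg", name, orient)                      arbitrary DNA segment
#   ("site", recombinase, pair, kind, orient)  kind is "B", "P", "L", or "R"
# orient is +1 or -1.

def seg(name):
    return ("seg", name, 1)

def site(rec, pair, kind, orient):
    return ("site", rec, pair, kind, orient)

def flip(element):
    """Reverse the orientation of a segment or site."""
    return element[:-1] + (-element[-1],)

def recombine_pair(register, rec, pair):
    """Recombine the attB-attP pair (rec, pair) if both sites are still present."""
    found = {el[3]: i for i, el in enumerate(register)
             if el[0] == "site" and el[1] == rec and el[2] == pair
             and el[3] in ("B", "P")}
    if len(found) < 2:
        return register  # a site was excised or already recombined: no operation
    i, j = sorted(found.values())
    first, second = register[i], register[j]
    if first[-1] == second[-1]:
        # Aligned sites: excise everything between them, leaving one recombined site.
        scar = site(rec, pair, "L" if first[3] == "B" else "R", first[-1])
        return register[:i] + [scar] + register[j + 1:]
    # Anti-aligned sites: invert the DNA between them (order and orientation).
    inner = [flip(el) for el in reversed(register[i + 1:j])]
    left = site(rec, pair, "L", first[-1])
    right = site(rec, pair, "R", second[-1])
    return register[:i] + [left] + inner + [right] + register[j + 1:]

def apply_input(register, rec):
    """Apply one recombinase to each of its orthogonal pairs on the register."""
    pairs = sorted({el[2] for el in register if el[0] == "site" and el[1] == rec})
    for pair in pairs:
        register = recombine_pair(register, rec, pair)
    return register

def reachable_states(register, recombinases):
    """Map every permuted substring of inputs to the resulting register state."""
    states = {}
    for k in range(len(recombinases) + 1):
        for order in permutations(recombinases, k):
            reg = list(register)
            for rec in order:
                reg = apply_input(reg, rec)
            states[order] = tuple(reg)
    return states

# Hypothetical register with one attB-attP pair per recombinase.
one_pair_each = [
    site("A", 1, "B", 1), seg("x"), site("B", 1, "B", 1), seg("y"),
    site("A", 1, "P", 1), seg("z"), site("B", 1, "P", -1),
]

# Hypothetical register with two orthogonal pairs for recombinase B.
two_pairs_for_b = [
    site("B", 1, "B", 1), seg("a"), site("A", 1, "B", 1), seg("b"),
    site("B", 2, "B", 1), seg("c"), site("A", 1, "P", 1), seg("d"),
    site("B", 2, "P", -1), seg("e"), site("B", 1, "P", -1),
]

for name, reg in [("one pair each", one_pair_each), ("two pairs for B", two_pairs_for_b)]:
    states = reachable_states(reg, ["A", "B"])
    print(name, "->", len(set(states.values())), "distinct states")
```

In this toy model, the first layout cannot distinguish A → B from A alone, because the A-driven excision removes a site for recombinase B, so it reaches only four states. Adding a second orthogonal pair for B lets the second layout separate all five permuted substrings.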

Building a two-input, five-state RSM To implement a RSM that enters a different state (five in total) for every permuted substring of two inputs, it was sufficient to use two orthogonal attB-attP pairs for one recombinase and one attB-attP pair for the other recombinase. Figure 3A shows the RSM design and a detailed representation of its state diagram. This RSM is composed of two plasmids: an input plasmid and an output plasmid. The input plasmid, at a high copy number, expresses two large serine recombinases, BxbI and TP901, from the anhydrotetracycline (ATc)–inducible PLtetO promoter and the arabinose (Ara)–inducible PBAD promoter, respectively. The output plasmid, at a single copy number, contains the register that is modified by the recombinases expressed from the input plasmid. Fig. 3 Designing and validating a two-input, five-state RSM. (A) The two plasmids used to implement the RSM (top) and a detailed state diagram demonstrating the resulting register for each permuted substring of the two inputs (ATc and Ara; bottom). (B) The performance of the RSM in E. coli. Nodes represent populations of cells induced with permuted substrings of the inputs ATc (orange arrow) and Ara (blue arrow). Cultures were treated with saturating concentrations of each input (ATc, 250 ng/ml; Ara, 1% w/v) at 30°C for 18 hours in three biological replicates. Node labels indicate the expected state [corresponding to (A)] and the percentage of cells in that state as determined by Sanger sequencing of colonies from individual cells in each population (at least 66 cells totaled over all three biological replicates). The register is initially composed of an aligned BxbI attB-attP pair and two anti-aligned and orthogonal TP901 attB-attP pairs. If ATc is introduced first to the system, then BxbI is expressed and excises the DNA inside of its cognate recognition site pair, which includes a recognition site for TP901. 
Subsequent introduction of Ara to the system induces the expression of TP901, which recombines its cognate recognition sites on the outer edge of the register, thus inverting everything in between. Conversely, if Ara is introduced first to the system, then the outer TP901 sites invert everything between the edges of the register and the inner TP901 sites invert an inner portion of the register, thus setting the BxbI recognition sites into an anti-aligned configuration. Subsequent application of ATc to the system inverts the sequence of DNA between the BxbI sites. As a result, each permuted substring of the inputs yields a distinct DNA sequence on the register. To evaluate the performance of the RSM in Escherichia coli, we grew five populations of cells that were treated with all five permuted substrings of the inputs ATc and Ara (no input, ATc only, Ara only, ATc → Ara, and Ara → ATc). We Sanger-sequenced the register in colonies of at least 22 cells from each population in each of three biological replicates to determine the percent of cells with the expected DNA sequence (Fig. 3B) (32). At least 97% of all cells treated with each permuted substring of inputs adopted the expected state, thus confirming the fidelity of our RSM. Table S1 provides information for the sequenced registers that were not in the expected state. Because our Sanger sequencing readout of state was low-throughput, we also developed a quantitative polymerase chain reaction (qPCR)–based method to conveniently interrogate state on a population-wide level. The excision and inversion of DNA segments in our register permitted the design of primer pairs that were amplified in some states but not others. We created a computer program, the PCR-based state interrogation tool (PSIT), to identify all possible sets of primer pairs that uniquely identify each state of a given register (fig. S3 and appendix S2). 
For our two-input, five-state RSM, we chose a set of three primer pairs and performed qPCR on DNA that was isolated from each population of cells treated with all possible permuted substrings of the ATc and Ara inputs. The fractional amount of register DNA amplified was calculated for each primer pair in our set and was compared to what we would expect if all cells in each population adopted just one of the five possible states (32). In agreement with our sequencing results, the qPCR measurements of all experimental populations were most similar to what we would expect if all cells in each population adopted their expected state (fig. S4).
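The logic of PCR-based state interrogation can be illustrated with a short sketch. It is not the PSIT program: the amplification table `PATTERNS`, the state labels, the function names, and the use of Euclidean distance to pick the most similar state are all assumptions made for illustration.

```python
from itertools import combinations
import math

# Hypothetical table: PATTERNS[state][p] is 1 if primer pair p amplifies in that state.
PATTERNS = {
    "S1": (0, 0, 0, 1),
    "S2": (1, 0, 0, 1),
    "S3": (0, 1, 0, 0),
    "S4": (1, 1, 0, 0),
    "S5": (0, 0, 1, 1),
}

def distinguishing_sets(patterns, size):
    """All sets of `size` primer pairs whose on/off signature is unique per state."""
    n_pairs = len(next(iter(patterns.values())))
    result = []
    for combo in combinations(range(n_pairs), size):
        signatures = {tuple(p[i] for i in combo) for p in patterns.values()}
        if len(signatures) == len(patterns):
            result.append(combo)
    return result

def call_state(measured, patterns, combo):
    """Return the state whose expected signature is closest to the measured
    fractional amplification (one value per primer pair in `combo`)."""
    return min(
        patterns,
        key=lambda s: math.dist(measured, [patterns[s][i] for i in combo]),
    )

print(distinguishing_sets(PATTERNS, 2))   # two binary readouts cannot separate five states
sets_of_three = distinguishing_sets(PATTERNS, 3)
print(sets_of_three)
print(call_state((0.05, 0.90, 0.10), PATTERNS, sets_of_three[0]))
```

Because each primer pair gives an on/off readout, at least three pairs are needed to separate five states, which is consistent with the set of three primer pairs chosen above.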

Scaling RSMs We developed a modular register design strategy for building RSMs that enter a distinct state for every permuted substring of inputs (approximately e·N! states for N inputs; see table S2 and texts S4 and S5). For N inputs, the design strategy uses N – 1 recognition sites per recombinase, and hence is limited to register designs for up to seven inputs (13,700 states) because only six orthogonal and directional attB-attP pairs can be created per large serine recombinase (31). Because the two-input, five-state RSM shown in Fig. 3A represents only a marginal improvement in information capacity over two-input, four-state systems achievable by combinational computation, we sought to further demonstrate the information capacity enabled by our RSM framework by scaling to a three-input, 16-state RSM (Fig. 4A and fig. S5). The input plasmid for this state machine expresses an additional recombinase, A118, under a 2,4-diacetylphloroglucinol (DAPG)–inducible PPhlF promoter system, and its register uses two orthogonal attB-attP pairs for each of the three recombinases (following the design strategy in text S5). Fig. 4 Scaling to a three-input, 16-state RSM. (A) The two plasmids used to implement the RSM. ATc, Ara, and DAPG induce expression of BxbI, TP901, and A118 recombinases, respectively. A detailed state diagram of the register on the output plasmid is shown in fig. S5. (B) The performance of the RSM in E. coli. Nodes represent populations of cells induced with all permuted substrings of the inputs ATc (orange arrow), Ara (blue arrow), and DAPG (purple arrow). Cultures were treated with saturating concentrations of each input (ATc, 250 ng/ml; Ara, 1% w/v; DAPG, 25 μM) at 30°C for 24 hours in three biological replicates. Node labels indicate the expected state (corresponding to fig. 
S5) and the percentage of cells in that state as determined by Sanger sequencing of colonies from individual cells in each population (at least 17 cells totaled over all three biological replicates). To evaluate the performance of this RSM in E. coli, we grew 16 populations of cells that were treated with all 16 permuted substrings of the inputs ATc, Ara, and DAPG. We sequenced the register in colonies of five or six cells from each population in each of three biological replicates to determine the percentage of cells with the expected DNA sequence (Fig. 4B) (32). In most populations, 100% of the cells adopted their expected state, and even in the worst-performing population (ATc → Ara → DAPG), 88% of cells adopted their expected state. Table S1 provides information for the sequenced registers that were not in the expected state. We also measured the predominant state of each population by qPCR with a set of six primer pairs elucidated by PSIT (32). In agreement with the sequencing results, the qPCR measurements for all experimental populations were most similar to what we would expect if all cells in each population adopted their expected state (fig. S6).
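The state counts quoted in this section follow from counting permuted substrings directly: a RSM with N inputs needs one state for every ordered selection of k distinct inputs, for k from 0 to N. The short sketch below (the function name is hypothetical) computes this sum and compares it with e times N factorial and with the 2^N states available to combinational logic.

```python
import math

def num_permuted_substrings(n):
    """Number of ordered selections of k distinct inputs out of n, summed over k = 0..n."""
    return sum(math.perm(n, k) for k in range(n + 1))

for n in (1, 2, 3, 7):
    exact = num_permuted_substrings(n)
    approx = math.e * math.factorial(n)
    print(f"N={n}: {exact} permuted substrings, e*N! = {approx:.2f}, 2^N = {2 ** n}")
```

For N = 2, 3, and 7 the sum gives 5, 16, and 13,700, matching the five-state and 16-state RSMs and the seven-input limit stated above.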

Gene-regulatory RSMs Our state machine framework enables the creation of state-dependent gene regulation programs that specify which genes should be expressed or not expressed in each state. This could be useful for a wide range of biological applications, such as programming synthetic differentiation cascades, encoding the identities and order of biological events into selectable or sortable reporters, or targeting genetic perturbations to cells that experience a particular order of biological events. Gene regulation programs can be implemented by incorporating genetic regulatory elements, such as promoters, terminators, and genes, into the registers of our RSMs. The rearrangement of these elements in each state should then alter gene expression in a predictable manner. Such RSMs are a biological realization of Moore machines from automata theory, where each state is associated with a set of outputs (1). We refer to them as gene-regulatory RSMs (GRSMs). To help researchers design circuits for desired gene regulation programs, we created a large, searchable database of two-input, five-state GRSM registers. To compile this GRSM database (Fig. 5), we first enumerated all possible registers that could result from interleaving functionally distinct parts (made from terminators, constitutive promoters, and genes; see text S6 for more details) before and after each recombinase recognition site in our validated five-state register from Fig. 3A. We evaluated each state of each register for gene transcription, and aggregated registers that implement the same gene regulation program. During this evaluation step, we assumed that all genes had bidirectional terminators on their 3′ ends, thus disallowing the possibility of an RNA polymerase traversing a gene (in either direction) to transcribe another gene. We also assumed that each gene in a register was distinct. 
These assumptions were made to simplify register designs and keep the database at a manageable size for fast computational search. Fig. 5 The GRSM database. (Top) Flow diagram depicting how the database was created. (Middle) The database has a precompiled list of GRSM registers for distinct gene regulation programs. State diagrams represent gene regulation programs, with each node containing stripes of different colors corresponding to which genes are expressed in that state (no stripes implies no expression of any gene). (Bottom) A search function accepts a user-specified gene regulation program and returns registers from the database capable of implementing it. To avoid redundancy in the database, we removed any register with superfluous parts (containing terminators, promoters, or genes that do not affect gene regulation in any state) if its “parent” register [the same register except without the superfluous part(s)] was also represented in the database. Moreover, all registers that transcribed either no gene or the same gene in every state were removed from the database, as this gene regulation is trivial to implement. The resulting database (database S1) contains a total of 5,192,819 GRSM registers that implement 174,264 gene regulation programs. Each register is different in the sense that no two registers have all of the same parts in all of the same positions. Registers in the database regulate the transcription of 1 to 14 genes (fig. S7A). A register for any desired program that regulates up to three genes is likely to be in the database, which comprises 100% of possible single-gene regulation programs, 95% of possible two-gene regulation programs, and 61% of possible three-gene regulation programs (fig. S7B). Moreover, 27% of possible four-gene regulation programs are represented in the database, but the percentage drops off steeply beyond that, as the number of possible gene regulation programs grows exponentially with each additional gene (text S7). 
One could apply straightforward gene replacement principles to go beyond the scope of regulation programs represented in the database—for example, by replacing multiple distinct genes on a register with copies of the same gene, or replacing a gene with a multicistronic operon (fig. S8). To conveniently use the GRSM database for design or exploration, we created a search function that accepts a user-specified gene regulation program and returns all registers from the database that may be used to implement it (Fig. 5 and appendix S1). To create functional GRSMs in E. coli, we implemented the same input-output plasmid scheme as our two-input, five-state RSM (Fig. 3A), except that we substituted registers from our database on the output plasmid. Fluorescent protein (FP) genes were built on the registers to evaluate gene regulation performance. We grew populations of cells treated with all five permuted substrings of the inputs ATc and Ara, and then used flow cytometry on each population to measure the percentage of cells with distinct FP expression profiles (32). We successfully implemented four single-gene regulation programs (Fig. 6, A to D) and one multigene regulation program (in which unique subsets of three distinct FPs were expressed in each state; Fig. 6E), with at least 94% of cells from each experimental population adopting the expected FP expression profile. These GRSMs enable convenient fluorescent-based reporting on the identity and order of cellular events. For example, the GRSM from Fig. 6E allowed us to evaluate the performance of the underlying RSM with increasing input time durations (by 1-hour steps) by means of flow cytometry (fig. S9). Our findings demonstrated that input durations of 2 hours were sufficient for a majority of cells to adopt their expected state. Fig. 6 Implementing two-input, five-state GRSMs. (A to E) We built GRSMs (one for each panel) in E. 
coli to implement the gene regulation programs depicted at the left, with each node containing stripes of different colors corresponding to which gene products (green, GFP; red, RFP; blue, BFP) are expressed in that state (no stripes implies no expression of any gene). The corresponding GRSM state diagrams are depicted in the middle column, with expressed (ON) fluorescent reporters represented by shaded genes and non-expressed (OFF) fluorescent reporters represented by outlined genes. In the right column, nodes represent populations of cells induced with all permuted substrings of the inputs ATc (orange arrow) and Ara (blue arrow). Cultures were treated with saturating concentrations of each input (ATc, 250 ng/ml; Ara, 1% w/v) at 30°C for 24 hours in three biological replicates. The nodes are shaded according to the percent of cells with different gene expression profiles (ON/OFF combinations of the fluorescent reporters) as measured by flow cytometry. Node labels show the percentage of cells with the expected gene expression profile (averaged over all three biological replicates). Because unpredictable behaviors can result when gene regulatory parts are assembled into specific arrangements, certain GRSMs may not implement gene regulation programs as expected. Indeed, this was the case when we initially tested a GRSM that was expected to express green fluorescent protein (GFP) after being exposed to one of two inputs, Ara only or ATc → Ara (fig. S10A) (32). Rather than debugging, we constructed two alternative GRSMs using different registers from our database (fig. S10, B and C) that performed better than the initial GRSM, one of which had at least 95% of cells with the expected gene expression profile for each experimental population (fig. S10C). In general, many gene regulation programs represented in our database have multiple possible registers that can implement them (fig. S11). 
For example, most single-gene regulation programs have at least 373 possible registers, most two-gene regulation programs have at least 55 possible registers, and most three-gene regulation programs have at least 14 possible registers. Even for programs in the database that regulate up to 14 genes, most have at least four possible registers that can implement them. This highly degenerate design space offers a range of GRSM registers that can act as alternatives for one another in the event that a particular register fails to perform to a certain standard. Additional computationally and experimentally derived rules might enable ranking of candidate registers for their likelihood of successful gene regulation function. To demonstrate the scalability of GRSMs, we built two different three-input, 16-state GRSMs by interleaving genetic parts into the register from Fig. 4A. One GRSM functions as a three-input passcode switch that turns on the expression of a gene (encoding blue fluorescent protein) only when it receives the input Ara → DAPG → ATc (Fig. 7A). The other GRSM expresses a gene (encoding GFP) by default and turns it off if it receives any input that is not along the Ara → DAPG → ATc trajectory (Fig. 7B). Both GRSMs were implemented in E. coli and tested with all 16 permuted substrings of the inputs ATc, Ara, and DAPG (32). Flow cytometry revealed that at least 93% of cells from each experimental population adopted the expected gene expression profile. Thus, scalable GRSMs that function efficiently can be implemented using our design framework. Fig. 7 Implementing three-input, 16-state GRSMs. (A and B) We built GRSMs in E. coli to implement the gene regulation programs depicted at the lower left of each panel, with each node containing stripes of different colors corresponding to which gene products (blue, BFP; green, GFP) are expressed in that state (no stripes implies no expression of any gene). 
The corresponding GRSM state diagrams are depicted at the top of each panel, with expressed (ON) fluorescent reporters represented by shaded genes and non-expressed (OFF) fluorescent reporters represented by outlined genes. At the lower right of each panel, nodes represent populations of cells induced with all permuted substrings of the inputs ATc (orange arrow), Ara (blue arrow), and DAPG (purple arrow). Cultures were treated with saturating concentrations of each input (ATc, 250 ng/ml; Ara, 1% w/v; DAPG, 25 μM) at 30°C for 24 hours in three biological replicates. The nodes are shaded according to the percentage of cells with or without gene expression as measured by flow cytometry. Node labels show the percentage of cells with the expected gene expression profile (averaged over all three biological replicates).
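The experimental populations in Figs. 6 and 7 correspond to every "permuted substring" of the inputs, that is, every ordered selection of distinct inputs, including the no-input case. As a minimal sketch (the function name `permuted_substrings` is ours for illustration and is not part of the published software), these input histories can be enumerated in Python, which reproduces the counts of 5 and 16 quoted above:

```python
from itertools import permutations


def permuted_substrings(inputs):
    """Return every ordered selection of distinct inputs, including the empty one."""
    histories = []
    for k in range(len(inputs) + 1):
        # permutations(inputs, k) yields all ordered k-subsets without repetition;
        # for k = 0 it yields a single empty tuple (the "no input" history).
        histories.extend(permutations(inputs, k))
    return histories


two_input = permuted_substrings(["ATc", "Ara"])
three_input = permuted_substrings(["ATc", "Ara", "DAPG"])

print(len(two_input), len(three_input))
for history in two_input:
    print(" → ".join(history) or "no input")
```

For N inputs the count is the sum over k of N!/(N − k)!, which gives 1 + 2 + 2 = 5 for two inputs and 1 + 3 + 6 + 6 = 16 for three.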

Discussion We created state machines by using recombinases to manipulate DNA registers assembled from overlapping and orthogonal recombinase recognition sites. We used a mathematical framework to analyze the information capacity and scalability of our state machines and understand their limits. For a fixed number of inputs, the information capacity enabled by RSMs is much greater than that of traditional combinational circuits. Furthermore, we created a rich database accessible to the scientific community (database S1 and appendix S1) to enable the automatic design of GRSM registers that implement two-input, five-state gene regulation programs. We validated our RSM framework by building two-input, five-state and three-input, 16-state RSMs, testing them with Sanger sequencing and qPCR, and applying them to build state-dependent gene regulation programs. Our state machines differ from other strategies for genetic programming, such as combinational Boolean logic gates that are stateless (33–44), cell counters that do not integrate multiple inputs (20), temporal logic circuits that are unable to report on all possible input identities and permutations in a single circuit (23), and other multi-input recombinase-based circuits that do not use overlapping recombinase recognition sites and thus cannot perform order-dependent input processing (21, 22). Although we implemented RSMs in bacteria, we anticipate that our framework will be extensible to other organisms in which recombinases are functional. For example, the large serine recombinases used here (BxbI, TP901, and A118), as well as ϕC31, ϕFC1, ϕRV1, U153, and R4, catalyze recombination in mammalian cells (45–48). Identification of additional recombinases that function in different organisms should expand the applicability of our framework. 
The incorporation of reversible recombination events through proteins such as recombination directionality factors could also enable reversible transitions between gene regulatory states (15). Depending on desired applications, the prototypical inducible promoters we used here to drive the RSMs could be replaced by sensors that correspond to the desired signals to be recorded. Such sensors need not be based on transcriptional regulation, as long as they can control recombinase activity. The integration of RSMs into complex systems should enable researchers to investigate temporally distributed events without the need for continual monitoring and/or sampling. For example, by incorporating RSMs into tumor models, scientists may record the identity and order of oncogene activation and tumor suppressor deactivation events in individual cancer cells, and further correlate this information to phenotypic data from transcriptomic analysis or drug assays. In a recent study of myeloproliferative neoplasms containing mutations in both TET2 (a tumor suppressor) and JAK2 (a proto-oncogene), it was discovered that the order in which the mutations occurred played a role in determining disease phenotype, including sensitivity to therapy (8). This research underscores the potential impact of order dependencies in other malignancies and the importance of studying them. Cell sorting based on reporter gene expression from GRSMs could be used to separate cells exposed to different identities and orders of gene regulatory perturbations, which could then be further studied to determine functional cellular differences. Aside from recording and responding to naturally occurring signals, RSMs have potential applications when the signals that control them are applied by a user. For example, RSMs can generate gene expression based not only on simultaneous combinations of inputs, but also on orders of inputs. 
Thus, they may be useful to bioengineers for programming multiple functions in cell strains for which there are limited numbers of control signals. For example, they could be used to program cell differentiation down many different cell fate paths based on the order and identities of just a few inputs. Beyond applications in biological research and engineering, our work has also revealed an interesting mathematical structure to recombinase systems. At first glance, the noncommutative behavior of recombinase operations suggests that there might be a superexponential relationship between the number of possible states in a RSM and the number of recombinases it incorporates. Instead, our results show that the number of states is bounded exponentially given a finite number of attB-attP pairs per recombinase (Box 1 and texts S2 and S3). Many open mathematical problems remain. For example, what is the minimum number of recognition sites on a register needed to implement a particular state machine? Given a gene regulation program of arbitrary scale and complexity, how can we decide whether there exists a corresponding GRSM? We anticipate that solving such problems will be of interest to mathematicians and biologists alike.

Materials and methods Strains, media, antibiotics, and inducers All plasmids were implemented and tested in E. coli strain DH5αPRO [F-Φ80lacZΔM15 Δ(lacZYA-argF)U169 deoR recA1 endA1 hsdR17(rk−, mk+) phoA supE44 thi-1 gyrA96 relA1 λ−, P N25 /tetR, P laciq /lacI, Spr]. All experiments were performed in Azure Hi-Def medium (Teknova, Hollister, CA) supplemented with 0.4% glycerol. For cloning, we used E. coli strains DH5αPRO or EPI300 [F-mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZM15 ΔlacX74 recA1 endA1 araD139 Δ(ara, leu)7697 galU galK λ− rpsL (StrR) nupG trfA dhfr], as indicated below. All cloning was done in Luria-Bertani (LB)–Miller medium (BD Difco) or Azure Hi-Def medium, as indicated below. LB plates were made by mixing LB with agar (1.5% w/v; Apex). For both cloning and experiments, the antibiotics used were chloramphenicol (25 μg/ml) and kanamycin (30 μg/ml). For experiments, the inducers used were ATc (250 ng/ml), Ara (1% w/v), and DAPG (25 μM). Plasmid construction and cloning All plasmids were constructed using basic molecular cloning techniques and Gibson assembly (49, 50). Figure S12 shows all plasmids and their relevant parts. Tables S3 and S4 give a list of relevant parts, their sequences, and the sources from which they were derived. All input plasmids (pNR64 and pNR220) have a kanamycin resistance cassette (kanR) and a ColE1 (high copy) origin of replication. The input plasmid pNR64 was adapted from the dual recombinase controller from (22) (Addgene #44456). We replaced the chloramphenicol resistance cassette in this dual recombinase controller with kanR to make pNR64. To make pNR220, we inserted the PhlF promoter system from (36) onto pNR64 to drive the expression of the A118 recombinase, a gift from J. Thomson (USDA-ARS WRRC, Albany, CA). To control A118 tightly in the absence of any input, we expressed the phlF gene (responsible for suppressing transcription from P PhlF ) from the strong constitutive proD promoter (51). 
All input plasmids were transformed into chemically competent E. coli strain DH5αPRO, and subsequently isolated using the Qiagen QIAprep Spin Miniprep Kit and verified with Sanger sequencing (Quintara Biosciences). All output plasmids (pNR160, pNR163, pNR164, pNR165, pNR166, pNR186, pNR187, pNR188, pNR291, pNR292, and pNR284) have a chloramphenicol resistance cassette (camR) and are built on a bacterial artificial chromosome (BAC) vector backbone to ensure low copy number, as we ideally want ~1 register per cell. The BAC we used is derived from (52) and is capable of being induced to a higher copy number with Copy Control (Epicentre) in EPI300 cells. Strings of attB and attP recognition sites for pNR160 and pNR188 were synthesized from Integrated DNA Technologies and cloned into their respective backbones. For the construction of all GRSM output plasmids (pNR163, pNR164, pNR165, pNR166, pNR186, pNR187, pNR291, pNR292, and pNR284), we interleaved the array of recognition sites on pNR160 (for two-input, five-state) and pNR188 (for three-input, 16-state) with promoters, terminators, and genes using Gibson assembly. In order to prevent unwanted recombination on our plasmids, we avoided reusing identical part sequences on the same plasmid. For promoters, we used proD, BBa_R0051, and BBa_J54200, which have all been previously characterized to have strong expression (53). The proD promoter is an insulated promoter, which helps with consistent performance across varying contexts (51). We fused the two promoters, BBa_R0051 and BBa_J54200, upstream of 20-nucleotide initial transcribed sequences (ATATAGTGAACAAGGATTAA and ATAGGTTAAAAGCCAGACAT, respectively) characterized in (54), and named the concatenated parts proNR3 and proNR4, respectively. We chose terminators for our GRSMs from among the set of validated strong and sequence diverse terminators characterized in (55). We often constructed terminators in tandem to increase termination efficiency. 
Lastly, we used the fluorescent reporter genes gfpmut3b (56), mrfp (57), and mtagbfp (58) to produce outputs. The ribosome binding site (RBS) of each gene was optimized using the Salis Lab RBS calculator (59). Upstream of each RBS, we fused a self-cleaving hammerhead ribozyme to prevent the upstream 5′ untranslated transcript region from interfering with translation of the downstream gene (60). All output plasmids were transformed into chemically competent E. coli strain EPI300 or DH5αPRO, and subsequently isolated using the Qiagen QIAprep Spin Miniprep Kit and verified with Sanger sequencing (Quintara Biosciences). Like the output plasmids, all plasmids to test the forward (attB-attP → attL-attR) and reverse (attL-attR → attB-attP) recombination efficiencies for each recombinase used in this study (see fig. S1) have camR and are built on a BAC. The forward reaction test plasmids (pNR230 for BxbI, pNR239 for A118, and pNR276 for TP901) were each constructed with a reverse-oriented gfpmut3b (attached to the same RBS and ribozyme as on the output plasmids described above) downstream of a forward-oriented proD promoter, and with anti-aligned attB and attP sites for the cognate recombinase flanking the gene. Each forward reaction test plasmid was transformed into chemically competent E. coli strain DH5αPRO, and subsequently isolated using the Qiagen QIAprep Spin Miniprep Kit and verified with Sanger sequencing (Quintara Biosciences). To generate the reverse reaction test plasmids (pNR279 for BxbI, pNR280 for A118, and pNR287 for TP901), we transformed each forward reaction test plasmid into chemically competent E. coli strain DH5αPRO containing the pNR220 input plasmid, induced the cognate recombinase for each test plasmid, and isolated the recombined plasmid from cells using the Qiagen QIAprep Spin Miniprep Kit. Each reverse reaction test plasmid was then transformed into chemically competent E. 
coli strain DH5αPRO, and subsequently isolated again using the Qiagen QIAprep Spin Miniprep Kit and verified with Sanger sequencing (Quintara Biosciences). The second transformation and isolation step for these test plasmids was done to separate them from the pNR220 plasmid, which inevitably was present in the purified DNA solution after the first isolation step. RSM implementation All RSMs were implemented with a two-plasmid system (an input plasmid and an output plasmid). Table S5 shows each RSM and the names of the input and output plasmids used to implement them. All two-input RSMs used the pNR64 input plasmid with various output plasmids depending on the desired gene regulation program. All three-input RSMs used the pNR220 input plasmid with various output plasmids depending on the desired gene regulation program. For the two-input, five-state RSMs, the input plasmid (pNR64) and the output plasmid were simultaneously transformed into chemically competent E. coli DH5αPRO cells. Post-transformation, the cells were plated on LB plates with chloramphenicol and kanamycin. Colonies from these plates were used to initiate RSM testing experiments (see below). For the three-input, 16-state RSMs, we first transformed the input plasmid (pNR220) into chemically competent E. coli DH5αPRO cells and plated the transformants onto LB plates with kanamycin. Subsequently, we inoculated a colony in Azure Hi-Def medium (with kanamycin) and grew it overnight at 37°C, then diluted it 1:2000 into fresh medium (same as the overnight) and let it regrow at 37°C to an OD 600 of 0.2 to 0.5. The cells from this culture were then made chemically competent and transformed with the output plasmid. The purpose for the sequential transformation in this case was to allow time for the phlF gene (on the input plasmid) to be expressed at a high enough level to suppress expression of the A118 recombinase from the P PhlF promoter (also on the input plasmid). 
This was to ensure minimal recombinase levels when the output plasmid was introduced into the system; otherwise the register on the output plasmid could have falsely recorded a chemical induction event prior to its actual occurrence. After transformation of the output plasmid, the cells were plated on an LB plate with chloramphenicol and kanamycin. Colonies from these plates were used to initiate RSM testing experiments (see below). Experiment for testing the two-input, five-state RSM from Fig. 3A To test the two-input, five-state RSM for one biological replicate, a colony of E. coli cells containing input plasmid pNR64 and output plasmid pNR160 was inoculated into medium with kanamycin and chloramphenicol, grown overnight (~18 hours) at 37°C, and subjected to two rounds of induction followed by a round of outgrowth. For the first round of induction, the overnight culture was diluted 1:250 into medium with no inducer, medium with ATc, and medium with Ara, and grown at 30°C for 18 hours. For the second round of induction, these three cultures were then diluted again 1:250 into fresh medium; the noninduced culture was diluted into medium with no inducer again, the ATc-induced culture was diluted into medium with no inducer and medium with Ara, and the Ara-induced culture was diluted into medium with no inducer and medium with ATc. These cultures were again grown at 30°C for 18 hours. The resulting cultures represented five populations of cells treated with all five permuted substrings of the inputs ATc and Ara. Lastly, for the outgrowth, these cultures were diluted 1:250 into medium with no inducer and grown at 37°C for ~18 hours. The purpose of this final outgrowth was to allow all cell populations to normalize to conditions without inducer, such that detected differences between populations could be attributed to their history of inputs rather than their current environment. 
This experiment was repeated with a different starting colony for each biological replicate. All cultures were grown in 250 μl of medium (in 96-well plates) shaken at 900 rpm. All media contained chloramphenicol and kanamycin. Final populations from the experiment were analyzed with sequencing assays and qPCR assays (see below). Sequencing assay for testing the two-input, five-state RSM from Fig. 3A For the sequencing assay, each of the five experimental populations described above (from each of three biological replicates) was diluted 1:10⁶, plated (100 μl) onto LB plates with chloramphenicol and kanamycin, and grown overnight at 37°C such that each resulting colony represented the clonal population of a single cell from each experimental population. The register region on the output plasmid for around 24 (at least 22) colonies from each plate (experimental population) was amplified with colony PCR and sent for Sanger sequencing (Quintara Biosciences). Chromatograms from the sequencing reactions were aligned to the expected register sequence to determine whether they matched. Results from all three replicates were totaled, and the percentage of cells matching their expected sequence is displayed in Fig. 3B. qPCR assay for testing the two-input, five-state RSM from Fig. 3A For the qPCR assay, plasmids from each of the five experimental populations described above (from each of three biological replicates) were isolated with the Qiagen QIAprep Spin Miniprep Kit and used as template in qPCR reactions. All qPCR reactions were performed on the Roche LightCycler 96 Real-Time System using KAPA SYBR FAST Master Mix and according to Kapa Biosystems’ recommended protocol (200 nM each primer, 10 μl of 2× master mix, and no more than 20 ng of template in a 20-μl reaction). 
Each template was qPCR-amplified with each of three primer pairs (pp1, pp2, and pp3) elucidated by PSIT (described below; see appendix S2 for the program), as well as a normalizing primer pair (ppN) that amplified the backbone of the output plasmid. Figure S13 shows the regions on the register to which the three PSIT primer pairs bind and the register states that they are supposed to amplify. Table S6 gives the primer sequences. Along with the experimental templates, we also ran qPCR reactions of each primer pair with control template made up entirely of output plasmid containing register state S3 (fig. S13), which is amplified by each primer pair. We isolated this output plasmid from our Ara-treated E. coli population and sequence-verified it to make sure that the register state matched S3. We calculated the “fractional amount” of output plasmid amplified by each primer pair (pp1, pp2, or pp3) for each experimental template (t1, t2, t3, t4, or t5) as

$$f_{tx,ppy} = \frac{2^{Cq_{tc,ppy} - Cq_{tx,ppy}}}{2^{Cq_{tc,ppn} - Cq_{tx,ppn}}}$$

where tx is the experimental template of interest (t1, t2, t3, t4, or t5), ppy is the primer pair of interest (pp1, pp2, or pp3), tc is the control template (output plasmid in S3), ppn is the normalizing primer pair (ppN), and Cq is the Cq value from the qPCR reaction of the template and primer pair indicated in its subscript. From these $f_{tx,ppy}$ values, we created a qPCR result vector for each experimental template, $f_{tx}$:

$$f_{tx} = [f_{tx,pp1},\ f_{tx,pp2},\ f_{tx,pp3}]$$

This result vector was compared to the theoretical result vector that we would get if the template were made up entirely of a register from one particular state in our RSM, $f_{ts}$:

$$f_{ts} = [f_{ts,pp1},\ f_{ts,pp2},\ f_{ts,pp3}]$$

where ts is the template made entirely of register from one state (S1, S2, S3, S4, or S5). The $f_{ts,ppy}$ values are 0 or 1 depending on whether the particular primer pair ppy amplifies that state (fig. S13). 
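As a rough illustration of this normalization and of comparing a result vector against per-state 0/1 vectors, the following Python sketch may help. It rests on several assumptions that are not taken from the paper: the fractional amount is computed in the standard 2^(−ΔΔCq) form, the Cq values are invented, and the amplification patterns in `THEORETICAL` are hypothetical placeholders (only S3 being amplified by all three primer pairs is stated in the text; the real patterns are in fig. S13). All function names are ours.

```python
import math


def fractional_amount(cq_tx_ppy, cq_tc_ppy, cq_tx_ppn, cq_tc_ppn):
    """Fraction of template amplified by a primer pair, relative to the control
    template and normalized by the backbone primer pair (assumed 2^(-ddCq) form)."""
    return 2.0 ** (cq_tc_ppy - cq_tx_ppy) / 2.0 ** (cq_tc_ppn - cq_tx_ppn)


def result_vector(cq_tx, cq_tc, pairs=("pp1", "pp2", "pp3"), norm="ppN"):
    """Build the qPCR result vector for one experimental template."""
    return [
        fractional_amount(cq_tx[p], cq_tc[p], cq_tx[norm], cq_tc[norm])
        for p in pairs
    ]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# HYPOTHETICAL 0/1 amplification patterns per state, for illustration only.
THEORETICAL = {
    "S1": (0, 0, 0),
    "S2": (1, 0, 0),
    "S3": (1, 1, 1),  # amplified by every primer pair, as stated in the text
    "S4": (0, 1, 0),
    "S5": (0, 0, 1),
}


def closest_state(f_tx):
    """Return the state whose theoretical vector is nearest to f_tx."""
    return min(THEORETICAL, key=lambda s: euclidean_distance(f_tx, THEORETICAL[s]))


# Invented Cq values: the sample matches the control for pp1 and the backbone,
# but amplifies about 10 cycles later with pp2 and pp3.
cq_control = {"pp1": 15.0, "pp2": 15.0, "pp3": 15.0, "ppN": 14.0}
cq_sample = {"pp1": 15.0, "pp2": 25.0, "pp3": 25.0, "ppN": 14.0}

f_tx = result_vector(cq_sample, cq_control)
print(f_tx, closest_state(f_tx))
```

A template identical to the control gives a fractional amount of 1 for every primer pair, and each additional cycle of delay relative to the control halves the value.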
The similarity of $f_{tx}$ to $f_{ts}$ was quantified by Euclidean distance, $D_{tx,ts}$:

$$D_{tx,ts} = \sqrt{\sum_{y=1}^{3} \left(f_{tx,ppy} - f_{ts,ppy}\right)^{2}}$$

The Euclidean distances between the qPCR result vectors of each experimentally derived template and the theoretical qPCR result vectors of each state are displayed in a heat map in fig. S4 for each of three biological replicates. Experiment for testing the three-input, 16-state RSM from Fig. 4A To test the three-input, 16-state RSM for one biological replicate, a colony of E. coli cells containing input plasmid pNR220 and output plasmid pNR188 was inoculated into medium with kanamycin and chloramphenicol, grown overnight (~18 hours) at 37°C, and subjected to three rounds of induction followed by a round of outgrowth. For the first round of induction, the overnight culture was diluted 1:250 into medium with no inducer, medium with ATc, medium with Ara, and medium with DAPG, and grown at 30°C for 24 hours. For the second round of induction, these four cultures were then diluted again 1:250 into fresh media: The noninduced culture was diluted into medium with no inducer; the ATc-induced culture was diluted into medium with no inducer, medium with Ara, and medium with DAPG; the Ara-induced culture was diluted into medium with no inducer, medium with ATc, and medium with DAPG; and the DAPG-induced culture was diluted into medium with no inducer, medium with ATc, and medium with Ara. These cultures were again grown at 30°C for 24 hours. For the third round of induction, each of these 10 cultures was diluted again 1:250 into fresh media: The noninduced → noninduced, ATc → noninduced, Ara → noninduced, and DAPG → noninduced cultures were diluted into medium with no inducer; the ATc → Ara and Ara → ATc cultures were diluted into medium with no inducer and medium with DAPG; the ATc → DAPG and DAPG → ATc cultures were diluted into medium with no inducer and medium with Ara; and the Ara → DAPG and DAPG → Ara cultures were diluted into medium with no inducer and medium with ATc. 
These cultures were again grown at 30°C for 24 hours. The resulting cultures represented 16 populations of cells treated with all 16 permuted substrings of the inputs ATc, Ara, and DAPG. Lastly, for the outgrowth, these cultures were diluted 1:250 into medium with no inducer and grown at 37°C for 18 hours. This experiment was repeated with a different starting colony for each biological replicate. All cultures were grown in 250 μl of medium (in 96-well plates) shaken at 900 rpm. All media contained chloramphenicol and kanamycin. Final populations from the experiment were analyzed with sequencing assays and qPCR assays (see below). Sequencing assay for testing the three-input, 16-state RSM from Fig. 4A For the sequencing assay, each of the 16 experimental populations described above (from each of three biological replicates) was diluted 1:10⁶, plated (100 μl) onto LB plates with chloramphenicol and kanamycin, and grown overnight at 37°C such that each resulting colony represented the clonal population of a single cell from each experimental population. The register region on the output plasmid for five or six colonies from each plate (experimental population) was amplified with colony PCR and sent for Sanger sequencing (Quintara Biosciences). Chromatograms from the sequencing reactions were aligned to the expected register sequence to determine whether they matched. Results from all three biological replicates were totaled, and the percentage of cells matching their expected sequence is displayed in Fig. 4B. qPCR assay for testing the three-input, 16-state RSM from Fig. 4A For the qPCR assay, plasmids from each of the 16 experimental populations described above (from each of three biological replicates) were isolated with the Qiagen QIAprep Spin Miniprep Kit and used as template in qPCR reactions. 
As with the two-input, five-state RSM testing, all qPCR reactions were performed on the Roche LightCycler 96 Real-Time System using KAPA SYBR FAST Master Mix and according to the Kapa Biosystems recommended protocol (200 nM each primer, 10 μl of 2× master mix, and no more than 20 ng of template in a 20-μl reaction). Each template was qPCR-amplified with each of six primer pairs (pp1, pp2, pp3, pp4, pp5, and pp6) elucidated by PSIT as well as a normalizing primer pair (ppN) that amplified the backbone of the output plasmid. Figure S14 shows the regions on the register to which the six PSIT primer pairs bind and the register states that they are supposed to amplify. Table S7 gives the actual primer sequences. Similar to the two-input, five-state system, we also ran qPCR reactions of each primer pair with control template made up entirely of output plasmid containing a register that would get amplified by each primer pair. Unlike with the two-input, five-state RSM, however, there was no single register state that would get amplified by each primer pair. So we ended up using an output plasmid in state S2 as a control template for pp1, pp4, and pp5 and an output plasmid in state S8 as a control template for pp2, pp3, and pp6 (fig. S14). The plasmid with register state S2 was isolated from our ATc-treated E. coli population (and sequence-verified), and the plasmid with register state S8 was isolated from our Ara → DAPG–treated E. coli population (and sequence-verified). We proceeded with calculating the fractional amount of plasmid amplified by each primer pair for each experimental template, and then comparing the data for each template to each theoretical state (with Euclidean distance) the same way as we did for the two-input, five-state RSM, except generalized to six primer pairs and 16 states. 
That is, for each experimental template tx (t1 to t16) and primer pair ppy (pp1 to pp6),

$$f_{tx,ppy} = \frac{2^{Cq_{tc,ppy} - Cq_{tx,ppy}}}{2^{Cq_{tc,ppn} - Cq_{tx,ppn}}}$$

$$f_{tx} = [f_{tx,pp1},\ f_{tx,pp2},\ f_{tx,pp3},\ f_{tx,pp4},\ f_{tx,pp5},\ f_{tx,pp6}]$$

$$f_{ts} = [f_{ts,pp1},\ f_{ts,pp2},\ f_{ts,pp3},\ f_{ts,pp4},\ f_{ts,pp5},\ f_{ts,pp6}]$$

$$D_{tx,ts} = \sqrt{\sum_{y=1}^{6} \left(f_{tx,ppy} - f_{ts,ppy}\right)^{2}}$$

where tc is the control template for primer pair ppy (output plasmid in S2 for pp1, pp4, and pp5; output plasmid in S8 for pp2, pp3, and pp6) and ts is the template made entirely of register from one state (S1 to S16). The Euclidean distances between the qPCR result vectors of each experimentally derived template and the theoretical qPCR result vectors of each state are displayed in a heat map in fig. S6 for each of three biological replicates. Designing the GRSM registers from Fig. 6 and fig. S10 We entered our desired gene regulation programs into the database search function [coded in MATLAB R2013b (Mathworks, Natick, MA); appendix S1], and received an output list of registers, from which we chose our candidates for implementation. Table S8 shows the MATLAB search function input matrix we used to specify our desired gene regulation programs, as well as the search function output vectors that we chose as our registers to implement the gene regulation programs, as per the instructions on how to use the search function (appendix S1). Testing the GRSMs from Fig. 6 and fig. S10 The experiments to test the two-input, five-state GRSMs followed the same format as the experiment to test the two-input, five-state RSM from Fig. 3A, except that we used 24-hour inductions instead of 18-hour inductions for the induction rounds, and instead of analyzing the experimental populations with sequencing and qPCR assays, we used a fluorescence assay (see below). Testing the GRSMs from Fig. 7 The experiments to test the three-input, 16-state GRSMs followed the same format as the experiment to test the three-input, 16-state RSM from Fig. 4A, except that instead of analyzing the experimental populations with sequencing and qPCR assays, we used a fluorescence assay (see below). Testing the reversibility of BxbI, TP901, and A118 in fig. S1 For each recombinase in our study (BxbI, TP901, and A118), we isolated two plasmids that were recombined versions of each other: one with attB-attP and no GFP expression (pNR230 for BxbI, pNR239 for A118, and pNR276 for TP901), and the other with attL-attR and GFP expression (pNR279 for BxbI, pNR280 for A118, and pNR287 for TP901). 
We transformed each of these plasmids into chemically competent E. coli DH5αPRO containing the input plasmid pNR220 (prepared as described above). To measure recombination for each transformant, a colony was inoculated into media with kanamycin and chloramphenicol, grown overnight (~18 hours) at 37°C, and subjected to a round of induction followed by a round of outgrowth. For the induction, the overnight culture was diluted 1:250 into medium with no inducer and medium with inducer (ATc for BxbI, Ara for TP901, or DAPG for A118) and grown at 30°C for 16 hours. For the outgrowth, these cultures were diluted 1:250 into medium with no inducer and grown at 37°C for 18 hours. This experiment was repeated with a different starting colony for each of three biological replicates. All cultures were grown in 250 μl of medium (in 96-well plates) shaken at 900 rpm. We measured the percentage of cells from each population expressing GFP, as described below. RSM time course experiment in fig. S9 For one biological replicate, a colony of E. coli DH5αPRO cells containing input plasmid pNR64 and output plasmid pNR291 was inoculated into medium with kanamycin and chloramphenicol, grown overnight (~18 hours) at 37°C, rediluted 1:75 into fresh medium, split into 11 cultures, and grown at 30°C. When cells reached an OD 600 of 0.1, we rediluted cells from one culture 1:125 into fresh medium and let them outgrow at 37°C. This (uninduced) population would become the 0-hour time point in fig. S9, C to E. All other cultures were subjected to induction prior to outgrowth. Ara was directly added to five of the cultures, and ATc was directly added to the other five and they were allowed to continue growing at 30°C. Each of the five cultures for each input would become induction time points separated by 1-hour steps (for each input); we refer to them as input seed cultures. After 1 hour, we diluted cells from one ATc seed culture 1:125 into fresh medium and let them outgrow at 37°C. 
This would become the 1-hour time point for ATc in fig. S9C. From the same seed culture, we also diluted cells 1:25 into medium with Ara and let them grow for the equivalent amount of input exposure time (1 hour) at 30°C before diluting 1:125 into fresh medium and letting them outgrow at 37°C. This would become the 1-hour time point for ATc → Ara in fig. S9E. Then, for the same seed culture, we directly added Ara and let the cells grow for the equivalent amount of input exposure time (1 hour) at 30°C before diluting 1:125 into fresh medium and letting them outgrow at 37°C. This would become the 1-hour time point for ATc → Ara in fig. S9D. The same procedure was done for an Ara seed culture after 1 hour, except with ATc as the sequentially added input. This process was subsequently repeated at 2 hours with different ATc and Ara seed cultures, and so on for 3, 4, and 5 hours. The outgrowth for all cell populations continued for 16 hours after the final cells were diluted for outgrowth (10 hours after the initial induction began). This experiment was repeated for three biological replicates. All cultures were grown in 250 μl of medium (in 96-well plates) shaken at 900 rpm. All media contained chloramphenicol and kanamycin. Final populations from the experiment were analyzed with flow cytometry (see below). Fluorescence assay For all experiments with a fluorescence assay, we diluted cells 1:125 into phosphate-buffered solution (PBS, Research Products International) and ran them on a BD-FACS LSRFortessa-HTS cell analyzer (BD Biosciences). We measured 30,000 cells for each sample and consistently gated by forward scatter and side scatter for all cells in an experiment. 
GFP (product of gfpmut3b) intensity was measured on the FITC channel (488-nm excitation laser, 530/30 detection filter), RFP (product of mrfp) intensity was measured on the PE–Texas Red channel (561-nm excitation laser, 610/20 detection filter), and BFP (product of mtagbfp) intensity was measured on the PacBlue channel (405-nm excitation laser, 450/50 detection filter). A fluorescence threshold was applied in each channel to determine the percentage of cells with expressed (ON) versus not expressed (OFF) fluorescent proteins. The threshold was based on a negative control population (E. coli DH5αPRO containing pNR64 and a BAC with no fluorescent reporter genes), such that 0.1% of these negative control cells were considered to have ON fluorescent protein expression in each channel (corresponding to a 0.1% false positive rate). All fluorescence-based experiments had three biological replicates. For the recombinase reversibility experiment (fig. S1) and the RSM time course experiment (fig. S9), the data for all three replicates are shown. For the GRSM experiments (Figs. 6 and 7 and fig. S10), the data from all three replicates are averaged. For these experiments, the largest standard error for the percentage of any fluorescent subpopulation was 1.22%.

GRSM database and search function

The GRSM database was constructed (as discussed in the main text) using MATLAB R2013b (Mathworks), partly run on the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University. The database contains three arrays: registerArray (an array of GRSM registers), grpArray (an array of gene regulation programs), and register2grp (an array that maps each register in registerArray to its corresponding gene regulation program in grpArray, by index). Each gene regulation program in grpArray is represented by a 70-element vector of 0s and 1s. 
Each contiguous stretch of 14 elements belongs to a state (S1, S2, S3, S4, and S5, respectively), corresponding to the states in Fig. 3A. Within each state, each element (1 to 14) represents a gene (G1 to G14, respectively). For example, given a vector in grpArray, element 1 represents G1 in S1, element 15 represents G1 in S2, element 29 represents G1 in S3, element 43 represents G1 in S4, element 57 represents G1 in S5, element 2 represents G2 in S1, element 16 represents G2 in S2, and so on. The binary value of each element indicates whether that gene in that particular state is OFF (0) or ON (1). If the value of any given gene is 0 in every state of a gene regulation program, then that gene does not exist in the regulation program. Each register in registerArray is represented by a seven-element vector of numbers 1 through 25. Each element of the vector corresponds to a DNA region (a to g) interleaving the recognition sites of the register shown in Fig. 3A. The value of each element (1 to 25) represents a part, as defined in table S9. Each part is made up of genes, terminators, and constitutive promoters, arranged such that each part is functionally distinct (see text S6). Nonpalindromic parts (as indicated in table S9) can appear inverted on the register, in which case they take on a negative value. For example, part 1 is a gene, which is a nonpalindromic part. If it appears as a “1” on an element of a register vector, then it is facing left to right (5′ to 3′), and if it appears as a “–1” on an element of a register vector, then it is facing right to left (5′ to 3′). Note that all explicitly depicted terminators in the parts (table S9) are unidirectional; thus, transcription can move through them in the reverse direction. However, the unidirectional terminator in part 3 can be replaced by a bidirectional terminator without changing the function of the part. 
This is because placing an additional terminator upstream of the promoter in part 3 would only terminate transcription that would subsequently be reinitiated in the same direction. Also, the unidirectional nature of part 7 is not always necessary to the gene regulation program of the underlying register. That is, sometimes part 7 (a unidirectional terminator by itself) can be replaced by part 4 (a bidirectional terminator by itself) without affecting the gene regulation implemented by the underlying register. To make this distinction clear to database users, we parsed all occurrences of part 7 in the registerArray and replaced it with a special identifier, part 15, if its unidirectional nature is not important to the gene regulation program of the underlying register. Therefore, all occurrences of part 7 in registerArray now represent parts that necessitate “terminator read-through” (transcription through their unidirectional terminators in the reverse direction) for the gene regulation program of the underlying register. Likewise, because convergent (face-to-face) promoters can destructively interfere with each other (61), we made a special distinction for parts with promoters that necessitate “promoter read-through” (transcription through their promoters in the reverse direction; table S9). Because part 10 (a promoter by itself), depending on its register context, can sometimes necessitate read-through and sometimes not, we parsed all occurrences of part 10 in registerArray and replaced it with a special identifier, part 14, if it does not necessitate read-through for the gene regulation program of the underlying register. Therefore, all occurrences of part 10 in registerArray now represent parts that necessitate promoter read-through for the proper gene regulation program of the underlying register. All parts with genes in registerArray also have bidirectional terminators on the 3′ ends of those genes. These terminators are not explicitly depicted in table S9. 
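The vector encodings used by grpArray and registerArray can be illustrated with a short sketch. The database itself is a MATLAB MAT-file and its search function is MATLAB code (appendix S1); the Python below is only an illustration of the indexing rules stated in this section, and the function names (`grp_element`, `decode_grp`, `decode_register`) and the example vectors are hypothetical, not taken from the database.

```python
N_GENES = 14         # G1..G14 within each state
N_STATES = 5         # S1..S5, as in Fig. 3A
REGIONS = "abcdefg"  # DNA regions interleaving the recognition sites


def grp_element(gene, state):
    """1-based position of gene G<gene> in state S<state> of a grpArray vector."""
    return (state - 1) * N_GENES + gene


def decode_grp(vector):
    """Return {state: [genes that are ON]} for one 70-element 0/1 vector."""
    if len(vector) != N_GENES * N_STATES:
        raise ValueError("expected a 70-element vector")
    return {
        "S%d" % s: ["G%d" % g for g in range(1, N_GENES + 1)
                    if vector[grp_element(g, s) - 1] == 1]
        for s in range(1, N_STATES + 1)
    }


def decode_register(register):
    """Return [(region, part number, orientation)] for a seven-element register.

    A negative value marks a nonpalindromic part that appears inverted;
    palindromic parts only ever appear with a positive value.
    """
    if len(register) != len(REGIONS):
        raise ValueError("expected a seven-element vector")
    decoded = []
    for region, value in zip(REGIONS, register):
        if not 1 <= abs(value) <= 25:
            raise ValueError("part numbers run from 1 to 25")
        orientation = "inverted" if value < 0 else "forward"
        decoded.append((region, abs(value), orientation))
    return decoded


# Hypothetical program: G1 is ON only in S2 and S5, G2 is ON only in S1.
example_grp = [0] * 70
for gene, state in [(1, 2), (1, 5), (2, 1)]:
    example_grp[grp_element(gene, state) - 1] = 1

# Hypothetical register: part 1 (a gene) appears inverted in region b.
example_register = [4, -1, 10, 7, 3, 2, 15]
```

Here `decode_grp(example_grp)` lists G2 under S1 and G1 under S2 and S5, and `decode_register(example_register)` reports region b as part 1 in the inverted orientation. The explicit subtraction of 1 reflects that the element numbers in the text are 1-based, as in MATLAB, whereas Python lists are 0-based.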
Although the database has otherwise been reduced to avoid superfluous terminators, promoters, and genes, the implicit terminators on the 3′ ends of genes may sometimes be superfluous. That is, they may not be necessary for the proper gene regulation program of the underlying register. Lastly, the array register2grp has the same number of elements as registerArray. It maps each register in registerArray to a value that is the index of its corresponding gene regulation program in grpArray. We present the database as a MATLAB MAT-file (database S1), where each array is stored in a MATLAB variable. The search function for this MAT-file database was also created in MATLAB R2013b and requires MATLAB software to run. Code for the MATLAB search function and more information on how it works are included in appendix S1.

PCR-based state interrogation tool (PSIT)

The PSIT algorithm uses an abstract data type, the class DNARegister, to represent registers. To determine what sets of primer pairs may be used to uniquely detect an inputted DNARegister and all of its recombined states, the algorithm (i) “recombines” the input register, generating DNARegister instances for all states that result from any permuted substring of inputs; (ii) generates a list of primer pairs made up of all possible primers that bind to each region between recognition sites and on the terminal ends of the recognition site arrays; (iii) narrows the list to primer pairs that only amplify in any given state when they are on adjacent regions; and (iv) determines all subsets of this final list of primer pairs that can be used to uniquely identify each possible state of the DNA register. This final list of primer pair subsets is then returned as output along with details regarding which primer pairs amplify in which states. For qPCR compatibility purposes, step iii ensures that every amplicon is short and that every primer pair always yields the same amplicon when it amplifies (regardless of state). 
The PSIT program was implemented in Python 2.7. Code for the PSIT program and more information on how it works are included in appendix S2.
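To make steps i to iv concrete, the following is a minimal Python 3 sketch of the same idea. It is not the code in appendix S2 and does not use the DNARegister class. The token representation, all function names, and the toy seven-region register are assumptions made for illustration. The sketch also simplifies two points: multiple site pairs of one recombinase are applied in a fixed sorted order, and only the smallest distinguishing primer-pair subsets are returned rather than all subsets.

```python
from itertools import combinations, permutations

# Tokens: ("region", name, strand) or ("site", (recombinase, pair), strand).
# A site that has recombined becomes an inert ("scar", ...) token.

def flip(segment):
    """Reverse-complement a run of tokens: reverse order, flip each strand."""
    return [(kind, name, -strand) for kind, name, strand in reversed(segment)]

def apply_input(register, recombinase):
    """Recombine every intact site pair of one recombinase, irreversibly."""
    reg = list(register)
    pairs = sorted({n for k, n, _ in reg if k == "site" and n[0] == recombinase})
    for label in pairs:
        idx = [i for i, (k, n, _) in enumerate(reg) if k == "site" and n == label]
        if len(idx) != 2:
            continue  # the partner site was excised by an earlier operation
        i, j = idx
        scar_i, scar_j = ("scar", label, reg[i][2]), ("scar", label, reg[j][2])
        if reg[i][2] == reg[j][2]:   # aligned sites: excision
            reg = reg[:i] + [scar_i] + reg[j + 1:]
        else:                        # anti-aligned sites: inversion
            reg = reg[:i] + [scar_i] + flip(reg[i + 1:j]) + [scar_j] + reg[j + 1:]
    return reg

def layout(register):
    """Ordered (name, strand) of the regions, ignoring sites and scars."""
    return tuple((n, s) for k, n, s in register if k == "region")

def all_states(register, inputs):
    """Step i: one layout per permuted substring of the inputs."""
    states = {}
    for k in range(len(inputs) + 1):
        for order in permutations(inputs, k):
            reg = register
            for recombinase in order:
                reg = apply_input(reg, recombinase)
            states[order] = layout(reg)
    return states

def amplifies(pair, state_layout):
    """Return (primers face each other, their regions are adjacent)."""
    pos = {name: (i, strand) for i, (name, strand) in enumerate(state_layout)}
    (n1, d1), (n2, d2) = pair
    if n1 not in pos or n2 not in pos:
        return False, False
    (i1, s1), (i2, s2) = pos[n1], pos[n2]
    left, right = sorted([(i1, s1 * d1), (i2, s2 * d2)])
    return left[1] == 1 and right[1] == -1, abs(i1 - i2) == 1

def psit(register, inputs):
    states = all_states(register, inputs)
    layouts = list(states.values())
    # Step ii: a primer is (region name, direction on the unrecombined register).
    primers = [(name, d) for name, _ in layouts[0] for d in (1, -1)]
    kept = {}
    for pair in combinations(primers, 2):
        if pair[0][0] == pair[1][0]:
            continue
        calls = [amplifies(pair, lay) for lay in layouts]
        # Step iii: keep pairs that amplify somewhere, and only when adjacent.
        if any(a for a, _ in calls) and all(adj for a, adj in calls if a):
            kept[pair] = tuple(a for a, _ in calls)
    # Step iv (simplified): smallest subsets giving every state a unique signature.
    for size in range(1, len(kept) + 1):
        hits = [s for s in combinations(kept, size)
                if len(set(zip(*(kept[p] for p in s)))) == len(layouts)]
        if hits:
            return states, kept, hits
    return states, kept, []

def region(name):
    return ("region", name, 1)

def site(recombinase, pair, strand):
    return ("site", (recombinase, pair), strand)

# Toy two-input register (hypothetical, not the register of Fig. 3A).
REGISTER = [region("a"), site("B", 1, 1), region("b"), site("A", 1, 1),
            region("c"), site("B", 1, 1), region("d"), site("A", 1, -1),
            region("e"), site("A", 2, 1), region("f"), site("A", 2, -1),
            region("g")]

states, kept, subsets = psit(REGISTER, "AB")
```

For this toy register, the five permuted substrings of inputs A and B (no input, A, B, A then B, B then A) give five different region layouts, and no subset of fewer than three primer pairs can separate five states, so every subset returned contains three pairs. A primer pair such as a-forward with d-reverse is discarded in step iii because it would also amplify across non-adjacent regions in the unrecombined register, giving a long, state-dependent amplicon.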