Exploring the design space for stripe-forming networks

Step 1 of our procedure was theoretical. We created a genotype–phenotype map of stripe-forming networks, by exhaustively enumerating all possible 3-node designs excluding negative autoregulation (2,897 in total). Every network was numerically simulated with 100,000 random parameter sets in a 1D row of virtual cells, using a Hill-like function model of gene regulation based on equilibrium binding of transcription factors to their sites on the promoter36 (Supplementary Fig. 1). Criteria were defined to select those designs that could create a stripe of expression in the centre of the morphogen gradient (see Supplementary Methods for details).

We found that 109 of the possible networks generated a stripe with appropriate parameter values. However, networks with different topologies may in fact operate using the same underlying dynamics. To elucidate this question and aid understanding of how many mechanisms may be present within these 109 designs, we organized them into a complexity atlas21 (Fig. 1a, Supplementary Fig. 2). In this atlas, each node represents one of the 109 stripe-forming networks and two nodes are connected with a line when they differ by a single interaction (that is, adding or removing one repression or activation). The atlas is thus a ‘metagraph’—a graph of graphs (networks). The nodes of the atlas are laid out such that topological complexity increases upwards, revealing a structure in which four main groups (‘stalactites’) appear. Each stalactite converges downwards to a single network with minimal complexity. Strikingly, these networks correspond to the four known types of incoherent feedforward loops35 (I-FFL: I1-I4). We explored the temporal dynamics of each minimal mechanism and found them to be distinct in each case (for example, the I1 stripe gene is initially highest on the right-hand side of the field of cells and its peak then shifts to the centre, while in I2 the stripe develops from the centre; Fig. 1b). Furthermore, phase portraits of these systems and the temporal trajectories also confirm that the four networks achieve the stripe with different dynamical mechanisms (Fig. 1c).

Figure 1: Design space of stripe-forming networks. (a) Complexity atlas: grey circles are gene regulatory networks (GRNs) and edges link those with a single connectivity change. The GRNs are laid out such that complexity increases upwards. Examples of networks with the corresponding number of interactions (complexity) are depicted on the left. The bottoms of the ‘stalactites’ represent minimal networks: the four incoherent feedforward loops (I1-I4). Key: arrow, activation; bar, repression; red, morphogen input receiver gene; blue, intermediate loop gene; green, stripe output gene. (b) The temporal development of the three genes were calculated from representative parameters sets of the complexity atlas and shown schematically for each distinct mechanism. (c) Qualitative phase portraits for each distinct mechanism at low, medium and high morphogen concentration. The x axis represents the activity of the blue gene (that is, C) and the y axis, the activity of the green gene (that is, B). The nullcline curves for the blue and green genes (where one variable does not change in time) are shown as coloured lines, and the stable steady state (S) occurs where they intersect (that is, where neither variable changes over time). The black star indicates the initial condition close to the origin. The full red arrows in these phase plots show the nullcline movements in response to the morphogen gradient and the dashed red arrows indicate the increase or decrease in the height of the nullcline. The calculated temporal trajectories are shown as dashed lines, showing why in some cases a low final level for the green gene is preceded by a temporary rise in levels, while in other cases it is not. It is clear from this analysis that mechanisms cannot be smoothly transformed from one into the other, further highlighting the qualitatively different dynamics for each stalactite. Full size image

Network scaffold and engineering criteria checklist

For Step 2, we built the minimal network of each design class synthetically. A main challenge we faced was to develop a method where components could be consistently reused in different contexts, allowing the exploration of network variants37 and ideally the full design space. Before building the four networks as synthetic transcription circuits, we therefore designed a general network scaffold strategy for Escherichia coli, consisting of three compatible plasmids (Fig. 2). Each plasmid codes for one node and contains the same multiple cloning sites, for the combinatorial insertion38,39 of promoters, repressor binding sites (operators), ribosomal binding sites, repressors and activators. The arabinose-responsive promoter P BAD receives the input signal (arabinose), whereas superfolder green fluorescent protein (GFP)40 is the network output, which should form the stripe. T7 and SP6 phage RNA polymerases function as activators from T7 or SP6 promoters41, while split T7 RNA polymerase42 integrates two activations to give an AND gate output (in I4). Negative interactions employ the transcriptional repressors lacI (lactose operon repressor protein) and TetR (tetracycline repressor), placing their operator sites (lacO, TetO) behind promoters.

Figure 2: Network scaffold. (a) An arabinose-dependent promoter links the concentration-gradient input to a network of activators (T7 and SP6 RNAP) and repressors (lacI and TetR). (b) Each node is encoded on a different E. coli plasmid (color-coded). Each plasmid contains the same multiple cloning sites (MCS) for the combinatorial insertion of the network components. The plasmids contain compatible origins of replication (ori) and antibiotic resistance cassettes (kanamycin, spectinomycin and ampicillin resistance). The 3-node, 3-plasmid scaffold is described fully in Supplementary Methods. Full size image

While testing the individual components, we observed that strong bacterial stress responses can be induced by expressing synthetic networks at high levels. This metabolic load43,44 can retard growth or decrease global transcription and translation, interfering with synthetic network function. Importantly, a high inducer concentration can even shut down gene expression due to stress, giving an apparent stripe of expression (Supplementary Fig. 3). In other studies, metabolic load has been used reproducibly as a desired property8,45. However, as our goal was to build four different networks, each functioning with a distinct predefined mechanism, it was essential to exclude metabolic load in each case. Stripes formed due to metabolic load can neither reproduce the appropriate dynamics nor could the model predict the behaviour of mutants. It was therefore critically important to reduce bacterial metabolic burden by using degradation tags and low-activity promoter mutants. We ultimately developed a list of controls or engineering criteria to ensure that our stripes are not caused by metabolic load (Fig. 3a). This checklist ensures that networks and mutants behave as predicted.

Figure 3: Controls for synthetic stripe-forming circuits in E. coli. (a) A list of controls or engineering criteria to verify that an observed stripe is not caused by a bacterial stress response to high network expression (metabolic load; Supplementary Fig. 3). (b) A list of controls to verify the mechanism. Green, wild-type network; orange, mutant network; red, morphogen input receiver gene; blue, intermediate loop gene. Full size image

A family of four stripe-forming networks

We used our network scaffold to build the minimal stripe-forming I1, I2, I3 and I4 networks synthetically (Fig. 4). The modelling from step 1 not only provided us with the topologies, but also gave us an indication of the relative strengths of the interactions. For example, in I3, we knew the activator between the red and blue nodes had to be weaker than the activator between the blue and green nodes. This knowledge guided our component choices and reduced ‘trial and error’ considerably. We tuned the parameters by using different promoter strengths and repressor binding sites, as well as partially de-repressing lacI and TetR with isopropyl-β-D-thiogalactopyranoside (IPTG) and anhydrotetracycline (aTc), respectively. Screening of circuits was performed on multiwell plates, with each well containing a different concentration of arabinose, and monitoring GFP levels over time gave stripe gene readouts. All networks passed the metabolic load control checklist of Fig. 3a (Supplementary Figs 4–6). In addition, the temporal dynamics of formation of the stripe were measured (Fig. 4d) and showed good qualitative agreement with the predictions (Fig. 1). To confirm that each network was capable of true spatial patterning, we also placed a localized source of arabinose on a two-dimensional bacterial lawn, and observed a discrete ring of GFP fluorescence at a fixed distance from the source28,31 (Fig. 4c).

Figure 4: The incoherent feedforward loop stripe-forming network family. (a) The four I-FFLs built (small arrow: constitutive promoter). (b) Implementations of the circuits in the network scaffold (Fig. 2). (c) Bacterial lawns display green fluorescent rings as a function of arabinose gradients from central paper disks (white)28,31. (d) E. coli transformed with each network display single fluorescent stripes in arabinose gradients as measured by fluorescence spectrometry (normalized by the absorbance). Time course for I1: 24-min intervals between each sample set, at 4–6 h of growth. For I2, I3 and I4: 12-min intervals, at 5–6 h of growth. Mean and s.d. from three biological replicates. Full size image

Quantitative mutant analysis and model fitting

One of the challenges for synthetic biology is that real biological cells are vastly more complex than the simple circuits we wish to engineer. Therefore, our component parts may not work exactly as intended, or may interact with the cells’ native machinery causing unexpected dynamics45. It is therefore of central importance to show that an engineered circuit shows the desired behaviour using the expected mechanism and exclude any other mechanism. One advantage of our design space approach is that we know in advance that each of the four engineered circuits should be operating with a different dynamical mechanism21 (Fig. 1), which we can test (Fig. 3b).

Thus, in Step 3, we confirmed each mechanism with two approaches: First, rather than just assaying the stripe pattern gene (from the GFP fluorescence; Fig. 4), we measured the activity of all three nodes of the network, using reverse transcription quantitative PCR of the messenger RNA (mRNA) levels at different concentrations of arabinose (Fig. 5a). Different mechanisms predict different spatial patterns of these intermediate genes (for example, in I1 the blue node shows a spatial gradient in the same orientation as the red node, while in I2 it is reversed; Fig. 1). Importantly, all results agreed with the predicted patterns (Fig. 5a). Second, we made a series of mutations to each core network, to alter quantitatively the effective strength of specific regulatory links. For each mutation, the expected impact on regulation was known from the literature (Supplementary Fig. 7). In some cases, these mutations led to qualitative changes in the resulting GFP pattern (for example, the loss of the stripe in two of the mutations of I1, Fig. 5b), while in others the GFP pattern was just quantitatively altered (Fig. 5b).

Figure 5: Quantitative mutant analysis and model fitting. (a) Measured mRNA concentrations for all genes other than the stripe-forming gene at 6 h of growth. Mean and s.d. from three biological replicates. (b) Comparison of WT (green) and mutant (orange) network fluorescence at 6 h of growth. The interactions marked with an asterisk are modified in the mutant networks. The exact changes and conditions are listed in Supplementary Fig. 7. Mean and s.d. from three biological replicates. The black lines represent the model fitted simultaneously to the RNA data and fluorescence output of the WT and mutant networks for each design class. Full size image

To confirm the mechanistic basis of each network class, we explored whether measurements from the individual nodes and all the mutated networks of each design class could be simultaneously fitted to the expected steady-state values of an ordinary differential equations computer model. The mathematical model uses the standard modelling approach36 based on equilibrium binding of transcription factors to their sites on a promoter, where parameters refer to binding constants, Hill coefficients and transcription rates (Supplementary Table 1), allowing thus for a comparison with existing values in the literature.

While sharing a Hill-like mathematical expression, the regulation function used at Step 3 has more free parameters than the function from the complexity atlas exploration at Step 1. In Step 3, these free parameters acquire specific values through the fitting process, and thus characterize the specific biological conditions of the four constructs. This is in contrast to the exploratory, qualitative study of thousands of complex networks in Step 1, in which we fixed some of these parameters. Importantly however, the dynamical mechanism of stripe formation is conserved during this parameter reduction (see Supplementary Fig. 8).

Employing the model, we asked whether it was possible to find a good fit in which only the mutated parameters are free to vary between the mutants, and the remaining regulatory parameters are required to be consistent across all networks. Good fits were obtained for all four network classes (Fig. 5, Supplementary Tables 2–6), indicating that they are indeed working according the four expected mechanisms. Moreover, the changes of the parameters predicted by fitting the mutant networks match the expected changes (Supplementary Table 7). Finally, despite fitting each design class independently, a number of parameters are consistent across the study, suggesting that we have captured the universal aspects of these components (Supplementary Table 8) and can use these parameters for the prediction of future networks. To our knowledge, such an extensive quantitative and multi-network verification of distinct mechanisms covering the complete design space has not been achieved before.

The close match we achieved between theory and experiments gave us confidence in engineering further networks. For example, we built a variant of I2 with a positive feedback on the green node (Supplementary Fig. 9). This network is located higher up in the complexity atlas than the minimal I2 network (Fig. 1), and clearly has a different wiring design. However, it belongs to the same stalactite and is therefore expected to function with the same dynamical mechanism. Theoretical analysis of this particular design confirms this expectation and more importantly the synthetically constructed circuit also reveals the predicted dynamics over time and the spatial pattern of the intermediate genes (Supplementary Fig. 9).

Archetypal 2-node stripe-forming topology

The success of building and modelling all four 3-node I-FFLs, led us to search for a deeper principle in the design space. Since all four designs contain both an activating and a repressing pathway between the morphogen sensor gene (red) and the stripe gene (green) we reasoned that it might be possible to achieve the stripe pattern with only 2 nodes. In this implementation, the first node would display both activator and repressor activities46 which would act directly on the second node, but with different dynamics (Fig. 6). We first used the computer model to confirm that such a simple network can indeed produce a stripe. This design represents the “archetype” of all I-FFL design classes—it can be envisioned as located just under the four 3-node networks in a complexity atlas and is thus a sub-stalactite (Fig. 6a). We therefore named this fundamental circuit ‘I-zero’ (I0). Most importantly, we were also successful in building this network synthetically (Fig. 6) for the first time. We thus engineered the most elementary network capable of stripe formation, unifying the design space.

Figure 6: Two-node archetypal stripe-forming network (I0). (a) I0 network complexity shown relative to the I-FFL stripe-forming family. (b) Circuit implementation in the network scaffold. (c) Schematic depictions of archetypal stripe (left) and anti-stripe (right) mechanisms. (d) E. coli transformed with the I0 network display a fluorescent stripe (left) or anti-stripe (right) in arabinose gradients, depending on the aTc concentration (0.125 μM for stripe, 0 μM for antistripe). Mean and s.d. from three biological replicates. The black lines represent the model fitted simultaneously to the RNA data and fluorescence output of the stripe, anti-stripe and a further variant at 5 h of growth (Supplementary Fig. 10). Corresponding bacterial lawns display a fluorescent ring or anti-ring as a function of arabinose gradients from central paper disks (white)28,31 (bottom). Full size image

Modelling also predicted that the stripe output of the I0 network can be transformed into an ‘anti-stripe’. By strengthening the repressor activity, its dose–response curve shifts towards lower arabinose concentrations (Fig. 6c). In doing so, the effect of the repressor starts at a lower arabinose concentration than that of the activator. Given a moderate basal level of the activator, the resulting pattern is an anti-stripe. We were also able to confirm this prediction experimentally simply by lowering the concentrations of aTc (TetR regulator), thus increasing the effective concentration of the repressor (Fig. 6d, Supplementary Fig. 7, Supplementary Tables 2 and 7). Finally, we also demonstrated the power of our model to predict successfully the behaviour of further I0 mutants (Supplementary Fig. 10).