Models for eukaryotic parts and pools

Using MDL and BNGL jointly, we propose models for eukaryotic parts and pools that arise, when possible, from the corresponding bacterial modules [1, 8]. We aim to give part descriptions that are useful for synthetic biology applications and do not pursue an exhaustive representation of all the possible interactions that govern transcription and translation in eukaryotes. Moreover, not all the mechanisms behind mRNA and protein synthesis are well known, and the values of several kinetic parameters have not been measured yet. Therefore, despite the power of a rule-based modeling approach, one has to find a proper trade-off between model granularity and available knowledge in order to obtain a meaningful model that can predict synthetic gene circuit dynamics and performance. Specifically, among the set of composable parts and pools, only promoters and coding regions require a rule-based modeling approach because of their potentially complex structure, where several binding sites for regulatory factors are present together with an RNA polymerase or a ribosome binding site, respectively.

A more exhaustive description of eukaryotic systems might take into account mechanisms that have been neglected here. For instance, cell metabolic reactions can be described by a network of pools that either store free molecules (e.g. kinases and phosphatases) or represent enzymatic reactions (e.g. phosphorylation and dephosphorylation). Furthermore, the part models presented below might be enriched by also considering, for instance, operator positional effects and transcription squelching. Such a precise picture might be useful for the analysis of specific cellular phenomena, and for the evaluation of the corresponding kinetic parameter values, on rather simple gene circuits.

Promoters

In our promoter models, operator position is not explicitly taken into account, but activator binding sites are assumed to lie upstream of the TATA box, whereas repressors bind the DNA between the TATA box and the TSS (Transcription Start Site). We adopt a prokaryotic repression model based on competition between repressors and RNA polymerases, where DNA-bound repressors prevent RNA polymerases from reaching their binding sites and initiating transcription. The use of bacterial transcription factors in eukaryotic cells is a way of combining orthogonal systems that is broadly exploited in synthetic biology [39, 40].

Different transcription factors can bind a promoter, each on N operators, in principle. Transcription factors of the same species can bind cooperatively. Here, as in our previous work [1], we assume that the affinity between DNA and transcription factors varies with the relative position of the operators with respect to the TSS. As for repressors, the strongest operator is the one closest to the TSS. In contrast, activators bind with higher affinity to the one furthest from the TSS and the TATA box (see Figure 3A). The binding of a transcription factor to an operator causes a rotation of the DNA such that the binding rate constant of the adjacent operator is increased [41].

Figure 3 Synthetic eukaryotic promoter and mRNA. A) In the configuration here shown, a promoter is bound by a repressor $R_1$ and an activator $A_1$. Every operator is labeled with the name of the corresponding transcription factor and the position with respect to the TSS (the lower the integer, the closer the operator to the TSS). A star marks the operators with the highest affinity in case of cooperativity. B) mRNA with four riboswitches along the 5’-UTR (three of them are tandem ones) and two siRNA binding sites on the 3’-UTR region. The ribosome binding site is sequestered by the riboswitches nearby when they are in their inactive configuration.

RNA polymerase binds the DNA in the absence of any repressors. If it is recruited by activators, two scenarios are possible: a) if the activators do not bind cooperatively, only one of their operators has to be occupied in order to let RNA polymerase bind; b) if the activators bind cooperatively, all their operators must be bound to get transcription started. Chemicals can bind and inactivate transcription factors anchored to their operators. Depending on the presence or absence of cooperativity, the binding of a co-repressor [42] to an activator can have a different repercussion on RNA polymerase bound to the DNA. Without cooperativity, all the activator’s operators must be free to let polymerase leave the double chain. In contrast, in case of cooperativity it is enough to free the rightmost operator to destabilize the polymerase-DNA bond (see Additional file 1 for figures illustrating these interactions).

Promoter leakage is proportional to all the configurations where at least one repressor is bound or, in the absence of repressors, where there is no activator whatsoever on the DNA (without cooperativity) or all the right-most operators (with cooperativity) are free.
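The polymerase binding rules described above can be sketched as a small predicate. This is a minimal illustration, not the BNGL rules of Additional file 1: the names `ActivatorSites` and `polymerase_can_bind` are hypothetical, a single activator species is assumed, and promoter leakage is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivatorSites:
    """Occupancy of the operators of one activator species (hypothetical type)."""
    occupied: List[bool]   # one flag per operator
    cooperative: bool      # True if the activator binds cooperatively


def polymerase_can_bind(repressor_sites: List[bool],
                        activator: Optional[ActivatorSites]) -> bool:
    """Return True if RNA polymerase may bind, ignoring leakage."""
    # Any DNA-bound repressor blocks the polymerase binding site.
    if any(repressor_sites):
        return False
    # Promoter without activator operators: repressor-free DNA is enough.
    if activator is None:
        return True
    # Cooperative activators: all operators must be occupied.
    if activator.cooperative:
        return all(activator.occupied)
    # Non-cooperative activators: one occupied operator suffices.
    return any(activator.occupied)
```

For example, `polymerase_can_bind([], ActivatorSites([True, False], cooperative=True))` evaluates to `False`, because one operator of the cooperative activator is still free.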

Coding region, siRNA and mRNA pools, terminators

As already mentioned above, eukaryotic cells do not have an RBS. Therefore, translation regulation–together with gene expression–concerns the coding region. Here, in contrast to our bacterial framework, each coding region has a corresponding mRNA pool in the cytoplasm. mRNA pools are connected to the ribosome pool and, potentially, to chemical and siRNA pools as well. In the nucleus, mRNA is transcribed and spliced, and it becomes mature inside the coding region part. Free molecules of the spliceosome have their own pool and they interact with the immature mRNA by following a Michaelis-Menten (enzyme-like) scheme. All the other steps of mRNA maturation and transport into the cytoplasm are lumped into a single reaction to minimize the model’s number of kinetic parameters.
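The enzyme-like scheme for the spliceosome can be written as E + S <-> ES -> E + P, with the spliceosome as E, immature mRNA as S and spliced mRNA as P. The sketch below returns the mass-action time derivatives of this scheme. It is only an illustration under stated assumptions: the function name and the rate constants `k_on`, `k_off` and `k_cat` are hypothetical and are not taken from the model files.

```python
def splicing_derivatives(pre_mrna, spliceosome, complex_es,
                         k_on, k_off, k_cat):
    """Mass-action derivatives for E + S <-> ES -> E + P.

    Returns (d_pre_mrna, d_spliceosome, d_complex, d_spliced_mrna).
    """
    bind = k_on * spliceosome * pre_mrna   # E + S -> ES
    unbind = k_off * complex_es            # ES -> E + S
    splice = k_cat * complex_es            # ES -> E + P

    d_pre_mrna = -bind + unbind
    d_spliceosome = -bind + unbind + splice
    d_complex = bind - unbind - splice
    d_spliced_mrna = splice
    return d_pre_mrna, d_spliceosome, d_complex, d_spliced_mrna
```

Free plus bound spliceosome is conserved by construction (the second and third derivatives sum to zero), which is a convenient sanity check when such a scheme is integrated numerically.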

Translation regulation occurs either via riboswitch activation/deactivation or RNA interference. As in the promoter case, position along the mRNA is not explicitly taken into account. However, riboswitches are normally placed on the 5’-UTR (Untranslated Region) [43] whereas siRNA binding sites lie on the 3’-UTR [44] (see Figure 3B).

Riboswitches are, essentially, RNA hairpin loops that can prevent ribosome binding. In our framework, they assume two different states: active (on) and inactive (off). Only the active state allows ribosome binding to the mRNA. Riboswitches change their state upon chemical binding to their aptamers. Ribosomes are allowed to bind the mRNA and to start translation only when all the riboswitches’ aptamers are on. As an improvement of our previous representation [8], here we explicitly consider single as well as tandem riboswitches with one or two aptamers, respectively. Tandem riboswitches can be bound by a unique chemical species or by two different species. Since homo- and hetero-cooperativity have been reported in the literature [45, 46], both have been taken into account in our model. In principle, N different riboswitches can be placed along the 5’-UTR.
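The gating condition on ribosome binding can be sketched as follows. This is a hedged illustration only: each riboswitch is represented as a sequence of aptamer on/off flags (one flag for a single riboswitch, two for a tandem one), the function names are hypothetical, and the chemistry that switches an aptamer on or off is not modeled.

```python
from typing import Sequence


def riboswitch_is_active(aptamers_on: Sequence[bool]) -> bool:
    """A riboswitch is active only if every one of its aptamers is on."""
    if len(aptamers_on) not in (1, 2):
        raise ValueError("expected a single (1) or tandem (2) riboswitch")
    return all(aptamers_on)


def ribosome_can_bind(riboswitches: Sequence[Sequence[bool]]) -> bool:
    """Ribosomes may bind only when all riboswitches on the 5'-UTR are active."""
    return all(riboswitch_is_active(r) for r in riboswitches)
```

With one single and one tandem riboswitch, `ribosome_can_bind([[True], [True, False]])` is `False`: one aptamer of the tandem riboswitch is still off, so the ribosome binding site stays sequestered.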

RNA interference (RNAi) is a regulation mechanism typical of higher eukaryotes such as mammals, but it has also been engineered into budding yeast [47]. In our framework, we suppose that a siRNA-coding region drives the formation of double-stranded small interfering RNAs in the nucleus. They undergo a splicing operation after interacting with the Dicer enzyme and are then exported to the cytoplasm as a single strand. As in the mRNA case, all the nuclear maturation processes and transport are lumped into a single reaction. Free Dicer molecules (from a distinct pool) act on double-stranded RNAs following a Michaelis-Menten scheme (analogous to the mRNA-spliceosome interaction above). In the cytoplasm, siRNA pools are connected both to the mRNA and the RISC pools. Despite its complex structure, RISC is here treated as a single molecule that binds an siRNA in the siRNA pool and brings it to its target mRNA. Once the siRNA is bound to the mRNA, the mRNA is cleaved and rapidly degraded, and any ribosome along the mRNA is released. Each siRNA can bind to any of N different sites placed on the mRNA’s 3’-UTR (see Additional file 1).

mRNA half-life strongly influences the dynamics of synthetic gene circuits. Terminators introduce loop structures at the end of the mRNA sequence which may considerably alter the mRNA’s stability [48]. Therefore, in contrast to bacteria, eukaryotic terminators are characterized by specifying the decay rate of the mRNA (or siRNA) produced by the transcription unit they belong to.
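If mRNA decay is assumed to be first order (an assumption made for this sketch, not a statement about the model files), a measured half-life converts to the decay rate stored in the terminator as k = ln 2 / t_half. The helper name below is hypothetical.

```python
import math


def decay_rate_from_half_life(t_half: float) -> float:
    """First-order decay rate constant for a given half-life.

    The result is in reciprocal units of t_half (e.g. 1/min for minutes).
    """
    if t_half <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half
```

Doubling the half-life halves the decay rate, so a terminator that stabilizes the transcript is represented by a proportionally smaller rate constant.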

Transcription factors and fluorescent proteins are synthesized inside the cytoplasm. The former are imported into the nucleus where they exert their regulatory action on the DNA, and the latter flow into a pool placed in the cytoplasm, since they are not normally localized into the nucleus.

In Figure 4 we provide a graphical representation of a simple gene circuit made of parts and pools in a eukaryotic cell. A more detailed model description is available in Additional file 1, including all the circuit reactions and rules in BNGL.

Figure 4 Gene circuits in eukaryotic cells. In the circuit, fluorescence expression is under the control of an activator and an siRNA. For the sake of simplicity we do not show all the terminators and all pools; RNA polymerase, ribosome, spliceosome, Dicer, and RISC pools were removed. Every full arrow represents a transcription process.

Application: logic evaluator in mammalian cells

As a benchmark for our eukaryotic part and pool model, we chose the RNAi logic evaluator by Rinaudo et al. [38]. In this work, Boolean gates of varying complexity have been implemented via siRNA-dependent translation regulation; siRNA expression is under the control of endogenous signals. In our previous work on the automatic design of gene digital circuits [8], we presented alternative solutions–in bacteria–to the circuit associated with the Boolean function $(a \land b \land d) \lor (\bar{a} \land c)$, where $\bar{a}$ stands for NOT(a), $\land$ for AND, and $\lor$ for OR. As inputs, we considered four external chemicals that interact with promoters and RBSs. However, since our composable parts accommodated at most two binding sites on DNA or mRNA, we could not predict a circuit design that only employs transcription or translation control, respectively. With the new set of eukaryotic parts and pools, we are now able to reconstruct an RNAi-based logic evaluator that is close to the original one, and to design an alternative circuit that performs the same Boolean function via transcription regulation alone.
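The target Boolean function has 16 truth table entries, which can be enumerated directly. The sketch below is illustrative: the function names are hypothetical, and the bit order "abcd" used for the entry labels is an assumption, chosen because it makes the entry "0010" a 1-output.

```python
from itertools import product


def logic_evaluator(a: bool, b: bool, c: bool, d: bool) -> bool:
    """(a AND b AND d) OR (NOT a AND c)."""
    return (a and b and d) or ((not a) and c)


def truth_table() -> dict:
    """Map each input string 'abcd' to the expected output (0 or 1)."""
    table = {}
    for a, b, c, d in product((0, 1), repeat=4):
        key = f"{a}{b}{c}{d}"
        table[key] = int(logic_evaluator(bool(a), bool(b), bool(c), bool(d)))
    return table
```

Six of the 16 entries evaluate to 1: the four with a = 0 and c = 1, and the two with a = b = d = 1.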

In the Rinaudo et al. version of the circuit, each Boolean variable corresponds to an endogenous signal. When a signal is present, its corresponding siRNA is “inactivated”, that is, no longer produced. Moreover, signal a also activates the siRNA associated with the $\bar{a}$ variable. However, siRNA synthesis is not shown explicitly. Therefore, in our circuit we decided to put siRNA-a, -b, -c, and -d under the control of an activator that is inhibited when bound by the corresponding input chemical, whereas the transcription of siRNA-$\bar{a}$ is controlled by a repressor that is also inhibited by signal a (see Figure 5A). Both AND gates are transcription units that produce the same fluorescent protein, the circuit output. Each siRNA has two binding sites on its target mRNAs, so the $AND_1$ gate ($a \land b \land d$) has a total of 6 binding sites, whereas 4 lie on $AND_2$ ($\bar{a} \land c$); see Figure 5B. To quantify the complexity of this circuit: each mRNA pool where either an activator or a repressor is transcribed has 5 internal species, 9 reactions, and 2 exchange fluxes; the mRNA pool associated with the $AND_1$ ($AND_2$) gate has 17 (13) species, 45 (33) reactions, and 5 (4) exchange fluxes. Overall, the circuit contains 197 species and 474 reactions. We assume that parts of the same type (such as promoters regulated by an activator or siRNA coding regions) are identical. This means that we have to specify only a limited number of parameter values relative to the number of circuit reactions (see Additional file 1 for details).

Figure 5 RNAi-based logic evaluator. A) Conversion of a chemical into a siRNA. Following Rinaudo et al., signal a inhibits siRNA-a and promotes siRNA-$\bar{a}$ expression. This double function is mimicked by requiring that this chemical binds and deactivates two different transcription factors. When a is present, only siRNA-$\bar{a}$ is transcribed; thus, neglecting other signals in the circuit, $AND_2$ mRNA is cleaved whereas $AND_1$ produces fluorescence; vice versa in the absence of a. B) Cytoplasmic AND gates. C) Comparison of in silico simulations and in vivo measurements. For each truth table entry, we calculated the ratio between the corresponding fluorescent protein concentration and the minimal 1-output value (absolute values are shown in Additional file 1). This is the procedure followed by Rinaudo and co-authors (in their case, the lowest 1 concentration is at the entry “0010”). They, however, measured fluorescence rather than protein concentration.

As shown in Figure 5C, deterministic circuit simulations correctly reproduce the circuit’s truth table, not only in terms of high/low reporter outputs but also with respect to the quantitative outputs, without specific tuning of model parameters. The choice of deterministic simulations is justified by the fact that both logic evaluators exceed 100 proteins in signal separation (see Additional file 1), which we have shown to be a condition for large Boolean networks to be insensitive to stochastic noise [8]. In our simulations, we first let the circuit reach the steady state in the absence of chemicals (96 hours). Then we fed it with the inputs in order to calculate all 16 entries of the truth table. After 48 hours (i.e., the time considered in the original work), a clear signal separation is already reached. The separation does not improve substantially if we simulate the circuit for 96 hours. Discrepancies with the published measurements mainly concern the logical 0 levels. These probably reflect the fact that our circuit is not identical to the reference circuit, since we had to choose arbitrarily how to design the siRNAs’ expression. Moreover, our knowledge of RNAi kinetic parameter values is still quite limited, and adaptations of parameter values or the implementation of more detailed models for RNAi [49] might further improve our results. In this paper, however, we want to show that our framework based on composable parts and pools generated via a rule-based modeling approach is applicable to the design and analysis of eukaryotic cells; a more detailed analysis of the parameter space of such systems is left to future work.
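The normalization used for the comparison in Figure 5C (each output divided by the lowest output among the entries expected to be 1) can be sketched as below. The function name is hypothetical, and any numbers passed to it in an example are illustrative placeholders, not simulation results.

```python
def normalize_to_min_one(outputs: dict, expected: dict) -> dict:
    """Divide every output by the smallest output among logical-1 entries.

    outputs:  truth table entry -> reporter concentration
    expected: truth table entry -> expected Boolean output (0 or 1)
    """
    ones = [outputs[key] for key, bit in expected.items() if bit == 1]
    if not ones:
        raise ValueError("no entry is expected to be a logical 1")
    reference = min(ones)
    if reference <= 0:
        raise ValueError("reference 1-output must be positive")
    return {key: value / reference for key, value in outputs.items()}
```

After this step the weakest logical 1 has value 1 by construction, so a clean truth table shows every logical 0 well below 1 and every other logical 1 at or above it.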

In the transcriptional version of this circuit, siRNAs are replaced by repressors (see Figure 6A-B). Every repressor binds non-cooperatively to two operators. Therefore, symmetrically to the original circuit, $AND_1$ is a transcription unit whose promoter ($p_{and1}$) is regulated by three repressors and it contains a total of 6 operators; $AND_2$’s promoter ($p_{and2}$) is controlled by two repressors and hosts 4 operators. These promoter configurations show a high degree of complexity: $p_{and1}$ hosts 65 species, exchanges 12 fluxes, and contains 834 reactions; $p_{and2}$ hosts 17 species, exchanges 7 fluxes, and contains 130 reactions. The promoter that leads to the synthesis of the repressor associated with $\bar{a}$ has a configuration (2 operators) close to the most complex one we could achieve with our old set of bacterial parts: 5 species, 6 fluxes, and only 22 reactions. Although $p_{and1}$ and $p_{and2}$ have the same number of binding sites as the mRNAs for $AND_1$ and $AND_2$ in the RNAi-based circuit version, the promoters’ species are much more numerous than the corresponding mRNAs’, because mRNA is degraded as soon as one siRNA binds (states where more than one siRNA is bound are forbidden). Overall, the transcription-based version of the circuit is made of 187 species and 1165 reactions. Hence, our modular, rule-based modeling approach turns out to be extremely useful in designing systems with complex promoters.
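The contrast between promoter and mRNA species counts can be illustrated with two counting formulas that happen to reproduce the figures quoted in the text. Both are assumptions made for this sketch, not derivations taken from Additional file 1. For promoters, each operator is taken to be either free or occupied, plus one polymerase-bound state, giving 2^n + 1. For mRNA pools, where at most one siRNA can be bound, the count is assumed to grow linearly as 5 + 2n, with 5 being the size of an unregulated mRNA pool. The function names are hypothetical.

```python
def promoter_species(n_operators: int) -> int:
    """Assumed count: every free/occupied operator pattern, plus one
    polymerase-bound promoter state."""
    return 2 ** n_operators + 1


def mrna_pool_species(n_sirna_sites: int, base_species: int = 5) -> int:
    """Assumed count: base mRNA pool species plus two species per siRNA
    site, since only one siRNA can be bound at a time."""
    return base_species + 2 * n_sirna_sites


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, promoter_species(n), mrna_pool_species(n))
```

Under these assumptions, 2, 4 and 6 operators give 5, 17 and 65 promoter species, while 4 and 6 siRNA sites give 13 and 17 mRNA pool species. The exponential versus linear growth matches the qualitative explanation above: forbidding multiply-bound states removes the combinatorial blow-up.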

Figure 6 Transcription repression-based logic evaluator. A) Conversion of a chemical into a repressor. When signal a is present, only repressor $\bar{a}$ is expressed. Therefore, neglecting the other signals, $p_{and1}$ is not regulated and can lead to fluorescence production; vice versa when a is absent. This configuration requires two fewer genes than the siRNA-based one. B) Nuclear AND gates. C) Comparison of in silico simulations and in vivo measurements. Calculations are performed as in the RNAi-based circuit.

The transcriptional version of the logic evaluator also reproduces the circuit truth table faithfully (see Figure 6C). The final 1 and 0 output levels (concentration of the reporter proteins) are higher than the ones in the RNAi-based version of the circuit because more mRNA is transcribed and more proteins are expressed when all the regulations occur at the DNA level (see Additional file 1).