Choice of target site.

To selectively disrupt the female-specific isoform of dsx, we targeted the upstream intron–exon boundary of exon 5, which has been shown to be expressed only in female mosquitoes16. This exon spans a region of 1,712 bp on chromosome 2R (48,712,937–48,714,648) and contains at its 5′ end 89 bp encoding the sequence-specific portion of the female A. gambiae dsx isoform (AgdsxF). We identified a potential gRNA target site that showed almost complete sequence conservation across 16 different anopheline species and complete conservation across the A. gambiae species complex19 (viewed using http://people.csail.mit.edu/waterhouse/alnloc.cgi), with no nucleotide variation at 22 of the 23 targeted bases across 765 wild-caught A. gambiae collected as part of the Anopheles gambiae 1000 Genomes project20. The single-nucleotide variant in the target site was present at 2.9% allele frequency in the wild-caught mosquitoes (Supplementary Fig. 8). In vitro testing showed that this SNP variant was as susceptible as the wild-type sequence to Cas9 cleavage directed by the gRNA used in our gene drive construct (Supplementary Fig. 9). The gRNA target and protospacer-adjacent motif (5′-GTTTAACACAGGTCAAGCGG TGG-3′) were also assessed in silico for off-target activity using the online tool ChopChop (http://chopchop.cbu.uib.no)28,29.
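
A simple way to locate the site, and to see how a single-base change affects an exact match, is to scan both strands for the 20-nt spacer followed by an NGG PAM. The sketch below does this with plain string matching. It is an illustration only, not the ChopChop algorithm; the flanking sequence and the variant base are hypothetical (the position and identity of the field SNP are not given in the text).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_ok(site: str, pam: str) -> bool:
    """True if `site` matches the PAM pattern ('N' matches any base)."""
    return all(p == "N" or p == s for s, p in zip(site, pam))


def find_targets(sequence: str, spacer: str, pam: str = "NGG", max_mismatches: int = 0):
    """Scan both strands for spacer+PAM sites.

    Returns (strand, start, mismatches) tuples; `start` is the 0-based
    forward-strand coordinate of the leftmost base of the spacer+PAM site.
    The PAM must match exactly; mismatches are counted in the spacer only.
    """
    sequence, spacer, pam = sequence.upper(), spacer.upper(), pam.upper()
    n, k = len(spacer), len(spacer) + len(pam)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - k + 1):
            site = seq[i:i + k]
            if not pam_ok(site[n:], pam):
                continue
            mm = sum(a != b for a, b in zip(site[:n], spacer))
            if mm <= max_mismatches:
                start = i if strand == "+" else len(seq) - i - k
                hits.append((strand, start, mm))
    return hits


SPACER = "GTTTAACACAGGTCAAGCGG"  # dsx gRNA spacer from the text (PAM: TGG)
wild_type = "AAAA" + SPACER + "TGG" + "CCCC"  # hypothetical flanks
variant = "AAAA" + "GTTTAACACAGGTCAAGCGA" + "TGG" + "CCCC"  # hypothetical 1-base change
```

With an exact-match search the hypothetical variant is missed, whereas allowing one mismatch recovers it; this is why in vitro cleavage, rather than sequence matching alone, is needed to decide whether a variant remains a substrate.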

Generation of CRISPR and donor constructs.

We engineered available template plasmids to develop the CRISPR (p16510) and donor (pK101) constructs, used respectively to induce a double-strand break at the dsx target sequence and to provide a template for homology-directed repair. Specifically, a CRISPR construct2 containing a U6::gRNA spacer cloning cassette was modified by Golden Gate cloning to generate a Pol III transcription unit expressing the dsx-specific gRNA. The plasmid also contained a human-codon-optimized Cas9 coding sequence (hCas9) under the control of the vasa2 promoter, which directs expression of the Cas9 protein in the pole cells of the developing embryo. The donor plasmid was designed to contain a GFP transcription unit under the control of the 3xP3 promoter, enclosed within two reversible ϕC31 attP recombination sequences and flanked by 2-kb homology arms corresponding to the sequences immediately upstream (5′) and downstream (3′) of the target site in dsx exon 5. The homology arms were amplified using primers adapted for Gibson assembly (dsxϕ31L-F + dsxϕ31L-R and dsxϕ31R-F + dsxϕ31R-R; Supplementary Table 4), and the 3xP3::GFP cassette and backbone were excised from plasmid p163 (ref. 2) using restriction enzymes. The final donor vector was named K101 (GenBank accession code MH541846) and was assembled using the standard Gibson assembly protocol30.
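
Gibson assembly relies on primers whose 5′ tails copy the end of the neighboring fragment, so that adjacent PCR products share a terminal overlap. The sketch below designs the two primers that meet at one junction. The function name, the tail and annealing lengths and the toy sequences are illustrative assumptions; the actual primer sequences (dsxϕ31L-F and the others) are in Supplementary Table 4.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gibson_junction_primers(left: str, right: str, tail: int = 15, anneal: int = 20):
    """Hypothetical helper: primers joining fragment `left` to fragment `right`.

    left_rev  anneals to the 3' end of `left`; its 5' tail copies the first
              `tail` bases of `right`.
    right_fwd anneals to the 5' end of `right`; its 5' tail copies the last
              `tail` bases of `left`.
    The two PCR products then share a 2 * tail bp overlap at the junction.
    """
    left, right = left.upper(), right.upper()
    right_fwd = left[-tail:] + right[:anneal]
    left_rev = reverse_complement(left[-anneal:] + right[:tail])
    return left_rev, right_fwd


left_arm = "ATGACCGTTAGCAAGCTTGGCACTGGCCGTCG"  # toy sequence, not dsx
cassette = "TTACAACGTCGTGACTGGGAAAACCCTGGCGT"  # toy sequence
rev, fwd = gibson_junction_primers(left_arm, cassette, tail=10, anneal=18)
```

Splitting the overlap between the two primers keeps each primer short while still giving the exonuclease-generated single strands a 20-bp (here) region to anneal.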

Generation of the dsxF CRISPR homing allele (dsxFCRISPRh).

The dsxFCRISPRh homing allele was generated in vivo by ϕC31 recombinase-mediated cassette exchange (RMCE)31 using construct p17410, which encompassed the hCas9 and dsx gRNA transcription units, as well as a 3xP3::RFP reporter cassette, within two reversible ϕC31 attB recombination sequences. The gene drive construct targeting dsxF is identical in design to that described in Hammond et al.2 except for the promoter and 3′ UTR surrounding the Cas9 gene: whereas these were previously taken from the ortholog of vasa (AGAP008578), in the current construct they are replaced by 1,074 bp upstream and 1,034 bp downstream of the germline-specific gene AGAP006241, the putative ortholog of zero population growth (zpg). A comparison of fertility and homing rates in individuals heterozygous for vasa- and zpg-driven CRISPRh constructs at the same target locus (in AGAP007280, previously described by Hammond et al.2) showed improved fertility with the zpg-driven constructs17 (summarized in Supplementary Fig. 4).

To make p17410 (GenBank accession code MH541847), we amplified both the zpg promoter and terminator using primers carrying arms suitable for subsequent Gibson assembly (Supplementary Table 4). The promoter, a 1,074-bp region upstream of the gene that also contains the 5′ UTR, was amplified from the wild-type G3 mosquito strain using primers zpgprCRISPR-F and zpgprCRISPR-R. The terminator, a 1,037-bp region downstream of the gene that also contains the 3′ UTR, was amplified using primers zpgteCRISPR-F and zpgteCRISPR-R. Using restriction enzymes, we excised the hCas9 gene, backbone and gRNA cassette from p16510 and reassembled them with the zpg promoter and terminator fragments in a Gibson assembly reaction.

Microinjection of embryos and selection of transformed mosquitoes.

All mosquitoes were reared under standard conditions of 80% relative humidity and 28 °C. The mosquitoes were blood-fed on anesthetized mice, and freshly laid embryos were aligned and used for microinjection as described previously32. We injected embryos with a solution containing both p16510 and pK101 (each at 300 ng/μl) to generate mosquitoes (dsxF−) in which the splice junction of dsx exon 5 had been disrupted by insertion of the eGFP ϕC31 acceptor construct. To generate the dsxF CRISPR homing allele, embryos from the dsxF− knock-in line were injected with a solution containing p17410 and a plasmid-based source of ϕC31 integrase2. All surviving G0 larvae were crossed to wild-type mosquitoes, and G1 positive transformants were identified using a fluorescence microscope (Eclipse TE200) as GFP+ larvae for the knock-in events and RFP+ larvae for the RMCE events.

Containment of gene drive mosquitoes.

All mosquitoes were housed at Imperial College London in an insectary that complies with Arthropod Containment Guidelines Level 2 (ref. 33). All GM work was performed under institutionally approved biosafety and GM protocols. In particular, GM mosquitoes carrying constructs with the potential to show gene drive were housed in dedicated cubicles, separated from the external environment by at least six doors and requiring two levels of security card access. Moreover, because the insectary is located in a city with a northern temperate climate, the A. gambiae mosquitoes housed there are also ecologically contained. The physical and ecological containment of the insectary complies with guidelines set out in a recent commentary calling for safeguards in the study of synthetic gene drive technologies34.

Molecular confirmation of gene targeting and cassette integration.

Successful integration of the dsxF− and dsxFCRISPRh cassettes into Agdsx exon 5 was confirmed by PCR on genomic DNA extracted with the Wizard Genomic DNA purification kit (Promega). Generation of the HDR-mediated dsxF− allele was confirmed using primers binding the integrated cassette (GFP-F and 3xP3-R) and the neighboring genomic integration site outside the sequence included in the homology arms (dsxin3-F and dsxex6-R). dsxF− heterozygotes and homozygotes could be further distinguished by PCR using primers that bind either side of the inserted cassette (dsxex4-F and dsxex5-R), which give rise to a smaller and/or a larger product corresponding to the empty wild-type locus or the predicted dsxF− allele, respectively.
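
The zygosity logic of the flanking PCR can be written as a small decision rule: a small band alone indicates wild type, a large band alone indicates a dsxF− homozygote, and both bands indicate a heterozygote. The sketch below assumes hypothetical expected product sizes (they are not given in the text) and a 5% sizing tolerance.

```python
def call_dsx_genotype(bands_bp, wt_bp, ki_bp, tolerance=0.05):
    """Call zygosity from the bands of the dsxex4-F/dsxex5-R flanking PCR.

    bands_bp : observed band sizes (bp) for one individual
    wt_bp    : expected size of the empty wild-type product (hypothetical)
    ki_bp    : expected size of the dsxF- knock-in product (hypothetical)
    """
    def seen(expected):
        return any(abs(b - expected) <= tolerance * expected for b in bands_bp)

    wild_type, knock_in = seen(wt_bp), seen(ki_bp)
    if wild_type and knock_in:
        return "dsxF-/+"
    if knock_in:
        return "dsxF-/dsxF-"
    if wild_type:
        return "+/+"
    return "no call"  # neither band: PCR failure, repeat before calling


WT_BP, KI_BP = 600, 3100  # hypothetical expected product sizes
```

A missing band is only informative when the other band is present, so "no call" is kept separate from a genotype rather than defaulting to one.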

RMCE of the dsxFCRISPRh construct into the dsx locus was confirmed using primers binding the drive cassette (hCas9-F and RFP-R) and the neighboring genomic integration site (dsxin4-F and dsxex5-R1). Primer sequences are listed in Supplementary Table 4.

Phenotypic characterization and microdissections.

Microdissection and phenotypic characterization were carried out using Olympus SZX7 optical microscopes. Mosquitoes were collected in Falcon tubes and anesthetized on ice 5 min before dissection. For phenotypic comparison, the legs were removed so that the mosquitoes could be imaged in profile. Pictures were taken using a HiChrome-SMII GXCAM mounted digital camera (GT Vision). Pictures of gonads were taken using the EVOS imaging system (Thermo Fisher).

Phenotypic assays.

Phenotypic assays designed to examine relative fecundity in mosquitoes carrying either dsxF− or dsxFCRISPRh alleles were carried out essentially as described previously2. Briefly, the offspring of intercrossed heterozygous dsxF−/+ individuals were screened for heterozygous or homozygous knock-in on the basis of weak or strong GFP expression, respectively. Nonfluorescent progeny were kept as controls. Groups of 50 males and 50 females from each of the three classes were mated to an equal number of wild-type mosquitoes for 5 d and blood-fed, and a minimum of 45 females were allowed to lay individually. The entire egg and larval progeny of each lay were counted, and a minimum of 20 progeny were examined to confirm the zygosity of the dsxF− allele in the parent. Females that failed to produce progeny and had no evidence of sperm in their spermathecae were excluded from the analysis. Phenotypic assays for dsxFCRISPRh individuals were performed in the same way, except that the entire larval progeny were screened for DsRed, the marker linked to the dsxFCRISPRh allele. Statistical differences between genotypes were assessed using the Kruskal–Wallis test.
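
The Kruskal–Wallis test ranks all observations together and compares mean ranks between groups. The pure-Python sketch below computes the tie-corrected H statistic; with three groups (two degrees of freedom) the chi-square tail probability has the closed form exp(−H/2), which the code uses in place of a statistics library. In practice scipy.stats.kruskal computes the same statistic for any number of groups. The egg counts in the example are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected H and its chi-square p-value with k - 1 degrees of freedom.

    The closed-form chi-square tail used here is exact only for an even number
    of degrees of freedom (an odd number of groups, e.g. three genotypes).
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    h = h / correction if correction > 0 else 0.0
    df = len(groups) - 1
    if df % 2:
        raise ValueError("closed-form p-value implemented for even df only")
    half = h / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return h, p


wild_type = [112, 98, 120, 105, 131]  # hypothetical eggs per female
heterozygote = [101, 95, 117, 99, 108]
homozygote = [0, 0, 3, 0, 1]
h, p = kruskal_wallis(wild_type, heterozygote, homozygote)
```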

Cage trial assays.

Two cage trials were initiated, each with 300 wild-type females, 150 wild-type males and 150 dsxFCRISPRh/+ males. The wild-type and dsxFCRISPRh lines were reared in parallel and kept under the same conditions. For the starting generation only, age-matched male and female pupae were allowed to emerge in separate cages and were mixed only when all the pupae had emerged. Both dsxFCRISPRh and wild-type male pupae were screened for the presence of the RFP marker. Mosquitoes were left to mate for 5 days before they were blood-fed on anesthetized mice. Two days later, the mosquitoes were allowed to lay in a 300-ml egg bowl filled with water and lined with filter paper. The eggs produced in each cage were photographed and counted using JMicroVision V1.27. Before counting, eggs were dispersed by gentle water spraying in the egg bowl to homogenize the population, and 650 eggs were randomly selected to seed the next generation. Larvae emerging from these 650 eggs were counted and screened for the presence of the RFP marker to score the transgenic rate of the progeny. The number of pupae used to seed the next generation was also recorded.

PCR of target site and deep sequencing analysis preparation.

For the deep sequencing analysis, a limited-cycle PCR was performed on 40 ng of genomic DNA extracted en masse using the Wizard Genomic DNA purification kit (Promega) from a minimum of 359 mosquitoes sampled at G2, G3, G4 and G5 from both cage experiments. Using the KAPA HiFi HotStart Ready Mix PCR kit (Kapa Biosystems) and primers carrying the Illumina Nextera transposase adapters (in each primer below, the adapter sequence is given first, separated from the locus-specific sequence by a space), 4050-Illumina-F (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG ACTTATCGGCATCAGTTGCG) and 4050-Illumina-R (GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG GTGAATTCCGTCAGCCAGCA), we amplified a 358-bp locus containing the target site in 50-μl reactions. To preserve the proportions of reads corresponding to particular alleles at the target site, the PCR reactions were performed under nonsaturating conditions: they were run for 20 cycles, after which 25 μl was removed and stored at −20 °C. The remaining 25 μl was run for another 20 cycles and used to verify amplification on an agarose gel. Annealing was performed at 68 °C for 20 s to minimize off-target amplification.
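
Each sequencing primer is an Illumina overhang adapter followed by a locus-specific sequence. The sketch below splits the two primers printed above into these parts and reports the length and GC content of the locus-specific portion; it uses only the sequences given in the text, and the helper names are hypothetical.

```python
# Illumina overhang adapter sequences, as printed in the text above.
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

PRIMERS = {
    "4050-Illumina-F": (FWD_OVERHANG, "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACTTATCGGCATCAGTTGCG"),
    "4050-Illumina-R": (REV_OVERHANG, "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTGAATTCCGTCAGCCAGCA"),
}


def split_primer(primer: str, overhang: str):
    """Return (adapter, locus-specific part); fail if the adapter is absent."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not start with the expected overhang")
    return overhang, primer[len(overhang):]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


for name, (overhang, primer) in PRIMERS.items():
    _, locus = split_primer(primer, overhang)
    print(f"{name}: {locus} ({len(locus)} nt, GC {gc_fraction(locus):.0%})")
```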

Libraries were prepared in accordance with the Illumina 16S Metagenomic Sequencing Library Preparation protocol and the Nextera XT Index Kit. Amplicons were purified with AMPure XP beads. Dual indices and Illumina sequencing adapters were attached in a second PCR step using the Nextera XT Index Kit, and the products were again purified with AMPure XP beads. The resulting libraries were validated on an Agilent 2100 Bioanalyzer (DNA High Sensitivity kit, sample dilution 1:5) to determine their size distribution and quantified with a Qubit 3.0 fluorometer. Indexed DNA libraries were normalized to 4 nM, pooled and loaded at a concentration of 9 pM onto an Illumina v2 flow cell with 19% PhiX control, and sequenced on an Illumina MiSeq in a 2 × 250 bp paired-end v2 run.
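
Normalizing to 4 nM requires converting the Qubit mass concentration into molarity using the mean fragment size from the Bioanalyzer trace. A minimal sketch, assuming an average of 660 g/mol per base pair of double-stranded DNA (a common approximation); the concentration, size and loading volume in the example are hypothetical, and intermediate denaturation and dilution steps are ignored.

```python
BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def library_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a mass concentration (ng/ul) to nM for fragments of a given size."""
    return conc_ng_per_ul * 1e6 / (BP_MASS * mean_size_bp)


def stock_volume(stock_conc: float, target_conc: float, final_volume: float) -> float:
    """Volume of stock for a C1*V1 = C2*V2 dilution (both concentrations in the same units)."""
    if target_conc > stock_conc:
        raise ValueError("stock is more dilute than the target")
    return target_conc * final_volume / stock_conc


lib_nm = library_nm(12.0, 550)              # hypothetical Qubit and Bioanalyzer values
v_lib = stock_volume(lib_nm, 4.0, 20.0)     # ul of library per 20 ul at 4 nM
v_load = stock_volume(4000.0, 9.0, 600.0)   # 4 nM (4,000 pM) pool diluted to 9 pM in 600 ul
```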

Deep sequencing analysis.

We ran CRISPResso35 software v1.0.8 on the raw sequencing data to detect mutations at the target site, using the parameter -q 30 to set the minimum average read quality score (phred33) to 30. Raw sequencing data were deposited in the NCBI BioProject database (accession code PRJNA476358). The resulting allele frequency tables were processed with ad hoc Python and R scripts to group, filter and visualize indels and substitutions in the amplicon. To visualize the frequency of the most abundant indels around the cut site in both cages over the four generations, we calculated the mean frequency of indels occurring within the target region, including 20 bp upstream and downstream of the target site. The ten alleles with the highest mean frequency were then selected to show how the frequency of each allele changed over the four generations. To plot the distribution of indels and substitutions across the whole amplicon, we filtered out alleles with fewer than three reads.
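
The window, ranking and read-count filters described above can be sketched in plain Python. The input format (a dictionary of per-generation frequencies and a dictionary of read counts per allele) is a hypothetical simplification, not CRISPResso's actual allele table layout; intervals are 0-based and half-open, and the allele names and values are made up.

```python
from statistics import mean


def in_target_window(indel_start, indel_end, site_start, site_end, flank=20):
    """True if an indel overlaps the target site extended by `flank` bp on each side."""
    return indel_start < site_end + flank and indel_end > site_start - flank


def top_alleles(freqs, n=10):
    """Rank alleles by mean frequency across generations (ties broken by name)."""
    return sorted(freqs, key=lambda allele: (-mean(freqs[allele]), allele))[:n]


def drop_rare(read_counts, min_reads=3):
    """Remove alleles supported by fewer than `min_reads` reads."""
    return {allele: c for allele, c in read_counts.items() if c >= min_reads}


# Hypothetical frequencies at G2, G3, G4 and G5 for one cage.
freqs = {
    "del-3": [0.10, 0.12, 0.15, 0.20],
    "ins+1": [0.02, 0.02, 0.03, 0.03],
    "del-7": [0.05, 0.05, 0.05, 0.05],
}
```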

Modeling.

We use discrete-generation deterministic and stochastic models with random mating, treating males and females separately as in Hammond et al.9, and incorporate different homing rates in males and females and a modified treatment of embryonic cleavage and repair by paternally and maternally derived nuclease, as observed experimentally (see “Population genetics model” below9,36). We include wild-type (W), driver (D) and nonfunctional nuclease-resistant (R) alleles. Cleavage followed by homing and repair occurs in the germline of heterozygous W/D females and males; otherwise inheritance is Mendelian. Gametes (W, D or R) from W/D females and from W/D, D/R and D/D males carry nuclease that is transmitted to the zygote and acts in somatic cells of the embryo to reduce fitness if wild-type alleles are present, so that W/W, W/R and W/D females have fitness $w_{10}$, $w_{01}$, $w_{11}$ or 1, depending on whether nuclease was derived from a transgenic mother, father, both or neither. All males are assumed to have fitness 1, and we assume no effects of parentally deposited nuclease in germline cells. In the stochastic version of the model, probabilities of mating, egg production, hatching and emergence from pupae are estimated from experiments (Supplementary Table 5), and random numbers for these events are drawn from the appropriate multinomial distributions. To model the cage experiments, 300 female and 150 male wild-type adults, along with 150 male drive heterozygotes (from transgenic fathers), are initially present. Females may fail to mate or may mate once, randomly, with a male of a given genotype according to its frequency in the male population. The number of eggs produced by each mated female is chosen at random by sampling with replacement from the experimental values in Supplementary Table 6. To start the next generation, 650 eggs are randomly selected; these hatch with a probability that also depends on the genotype of the mother. The probability of subsequent survival to adulthood is assumed to be equal across genotypes.
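
A minimal sketch of one generation of the stochastic cage model, using Python's random module. It assumes hypothetical mating, clutch-size and hatching values in place of those in Supplementary Tables 5 and 6, and it stops at the hatched larvae; their genotypes would then follow from the gamete ratios of the population genetics model. The function name and arguments are illustrative, not the authors' code.

```python
import random


def stochastic_generation(females, males, p_mate, egg_counts, hatch_prob,
                          n_select=650, rng=None):
    """Simulate mating, egg laying, random egg selection and hatching.

    females, males : lists of genotype labels, one entry per adult
    p_mate         : probability that a female mates (once) at all
    egg_counts     : {female genotype: list of observed clutch sizes}
    hatch_prob     : {female genotype: probability that an egg hatches}
    Returns (mother, father) genotype pairs for the larvae that hatch.
    """
    rng = rng or random.Random()
    eggs = []
    for mother in females:
        if rng.random() >= p_mate:
            continue  # female fails to mate
        father = rng.choice(males)  # uniform over males = proportional to genotype frequency
        clutch = rng.choice(egg_counts[mother])  # independent draws = sampling with replacement
        eggs.extend([(mother, father)] * clutch)
    selected = rng.sample(eggs, min(n_select, len(eggs)))  # without replacement
    return [pair for pair in selected if rng.random() < hatch_prob[pair[0]]]


rng = random.Random(1)
larvae = stochastic_generation(
    females=["W/W"] * 300,
    males=["W/W"] * 150 + ["W/D"] * 150,
    p_mate=0.9,                              # hypothetical
    egg_counts={"W/W": [80, 95, 110, 120]},  # hypothetical clutch sizes
    hatch_prob={"W/W": 0.85},                # hypothetical
    rng=rng,
)
drive_sired = sum(father == "W/D" for _, father in larvae) / len(larvae)
```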

Population genetics model.

To model the results of the cage experiments, we use discrete-generation recursion equations for the genotype frequencies, treating males and females separately. $F_{ij}(t)$ and $M_{ij}(t)$ denote the frequency of females (or males) of genotype i/j in the total female (or male) population at generation t. We consider three alleles, W (wild-type), D (driver) and R (nonfunctional resistant), and therefore six genotypes.

Homing. Adults of genotype W/D produce gametes at meiosis in the ratio W:D:R as follows:

$$\text{females: } (1-d_f)(1-u_f) : d_f : (1-d_f)\,u_f, \qquad \text{males: } (1-d_m)(1-u_m) : d_m : (1-d_m)\,u_m$$

Here $d_f$ and $d_m$ are the rates of transmission of the driver allele in females and males, and $u_f$ and $u_m$ are the fractions of nondrive gametes that are nonfunctional resistant (R) alleles arising from meiotic end-joining. In all other genotypes, inheritance is Mendelian.

Fitness. Let $w_{ij} \le 1$ represent the fitness of genotype i/j relative to $w_{WW} = 1$ for the wild-type homozygote. We assume no fitness effects in males. Fitness effects in females are manifested as differences in the relative ability of genotypes to participate in mating and reproduction. We assume that the target gene is needed for female fertility, so D/D, D/R and R/R females are sterile; there is no reduction in fitness in females carrying only one functional copy of the target gene (W/D, W/R).

Parental effects. We consider that further cleavage of the W allele and repair can occur in the embryo if nuclease is present because one or both contributing gametes derived from a parent with one or two driver alleles. The presence of parental nuclease is assumed to affect somatic cells, and therefore female fitness, but to have no effect in germline cells that would alter gene transmission. Previously, embryonic end-joining (EJ) effects (maternal only) were modeled as acting immediately in the zygote. Here, because experimental measurements of females of different genotypes and parental origins show a range of fitnesses, suggesting that individuals may be mosaics with intermediate phenotypes, we model genotypes W/X (X = W, D, R) carrying parental nuclease as individuals with an intermediate reduced fitness $w_{10}$, $w_{01}$ or $w_{11}$, depending on whether the nuclease was derived from a transgenic mother, father or both. We assume that parental effects are the same whether the parent(s) had one or two drive alleles. For simplicity, in the deterministic model a baseline reduced fitness of $w_{10}$, $w_{01}$ or $w_{11}$ is assigned to all genotypes W/X (X = W, D, R) with maternal, paternal or maternal and paternal effects, respectively, and is estimated as the product of mean egg production and hatching rate relative to wild type (Supplementary Table 5). In the stochastic version of the model, egg production by females with different parentage is sampled with replacement from experimental values.
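
As a worked example of this estimate, each parental-effect fitness is egg output relative to wild type multiplied by hatching rate relative to wild type. The measurements below are hypothetical placeholders, not the values in Supplementary Table 5.

```python
def relative_fitness(mean_eggs, hatch_rate, mean_eggs_wt, hatch_rate_wt):
    """Fitness relative to wild type = relative egg output x relative hatching rate."""
    return (mean_eggs / mean_eggs_wt) * (hatch_rate / hatch_rate_wt)


# Hypothetical (mean eggs per female, hatching rate) measurements.
WILD_TYPE = (100.0, 0.80)
PARENTAGE = {"w10": (70.0, 0.60), "w01": (90.0, 0.72), "w11": (60.0, 0.50)}

fitness = {name: relative_fitness(eggs, hatch, *WILD_TYPE)
           for name, (eggs, hatch) in PARENTAGE.items()}
```

With these placeholders, for example, $w_{01} = (90/100) \times (0.72/0.80) = 0.81$.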

Recursion equations. We first consider the gamete contributions from each genotype, including parental effects on fitness. In addition to W and R gametes derived from parents that have no drive allele, and therefore carry no deposited nuclease, gametes from W/D females and from W/D, D/R and D/D males carry nuclease that is transmitted to the zygote; these are denoted W*, D* and R*. The proportions $e_i$ of type i alleles in eggs produced by females participating in reproduction are given below in terms of male and female genotype frequencies. Frequencies of mosaic individuals with parental effects (i.e., reduced fitness) due to nuclease from mothers, fathers or both are denoted by superscripts 10, 01 or 11.

The proportions $s_i$ of type i alleles in sperm are

Above, the normalizing terms are the average female and male fitness:

To model cage experiments, we start with equal numbers of males and females: all females are wild type ($F_{WW} = 1$), half of the males are wild type ($M_{WW} = 1/2$), and the other half are heterozygous drive males that inherited the drive from their fathers. Assuming a 1:1 ratio of males to females in the progeny, after the starting generation the frequencies of genotype i/j in the next generation (t + 1) are the same in males and females, $F_{ij}(t+1) = M_{ij}(t+1)$. Both are given by $G_{ij}(t+1)$ in the following set of equations in terms of the gamete proportions in the previous generation, assuming random mating:

The frequency of transgenic individuals can be compared with experiment: the fraction of RFP+ individuals is given by
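
The recursion described in this section can be sketched numerically. The minimal Python version below (the authors used Mathematica) tracks each genotype together with flags for maternal and paternal nuclease, applies homing in W/D adults, female sterility without a W allele and parental-effect fitness $w_{10}$, $w_{01}$, $w_{11}$, and reports the RFP+ fraction as the total frequency of genotypes carrying at least one D allele. The parameter values in PARAMS are hypothetical placeholders, not the estimates in Supplementary Table 5.

```python
ALLELES = ("W", "D", "R")


def genotype(a, b):
    """Canonical genotype label, e.g. ('D', 'W') -> 'W/D'."""
    return "/".join(sorted((a, b), key=ALLELES.index))


def gamete_dist(geno, sex, p):
    """Gamete proportions W:D:R; homing only in W/D adults, Mendelian otherwise."""
    if geno == "W/D":
        d, u = (p["d_f"], p["u_f"]) if sex == "f" else (p["d_m"], p["u_m"])
        return {"W": (1 - d) * (1 - u), "D": d, "R": (1 - d) * u}
    dist = {}
    for allele in geno.split("/"):
        dist[allele] = dist.get(allele, 0.0) + 0.5
    return dist


def female_fitness(geno, mat, pat, p):
    """Females without a W allele are sterile; W/X females pay parental-effect costs."""
    if "W" not in geno:
        return 0.0
    if mat and pat:
        return p["w11"]
    if mat:
        return p["w10"]
    return p["w01"] if pat else 1.0


def gamete_pool(pop, sex, p):
    """Fitness-weighted gamete proportions keyed by (allele, carries nuclease)."""
    pool, mean_w = {}, 0.0
    for (geno, mat, pat), freq in pop.items():
        w = female_fitness(geno, mat, pat, p) if sex == "f" else 1.0
        carries = "D" in geno  # gametes of drive carriers transmit nuclease
        for allele, q in gamete_dist(geno, sex, p).items():
            pool[(allele, carries)] = pool.get((allele, carries), 0.0) + freq * w * q
        mean_w += freq * w
    return {key: v / mean_w for key, v in pool.items()}  # divide by average fitness


def next_generation(F, M, p):
    """Random union of eggs and sperm; flags record maternal and paternal nuclease."""
    eggs, sperm = gamete_pool(F, "f", p), gamete_pool(M, "m", p)
    G = {}
    for (ea, e_nuc), fe in eggs.items():
        for (sa, s_nuc), fs in sperm.items():
            key = (genotype(ea, sa), int(e_nuc), int(s_nuc))
            G[key] = G.get(key, 0.0) + fe * fs
    return G


def rfp_fraction(pop):
    """RFP marks the drive, so RFP+ individuals carry at least one D allele."""
    return sum(f for (geno, _, _), f in pop.items() if "D" in geno)


def simulate(p, generations):
    F = {("W/W", 0, 0): 1.0}
    M = {("W/W", 0, 0): 0.5, ("W/D", 0, 1): 0.5}  # drive males from transgenic fathers
    trajectory = []
    for _ in range(generations):
        G = next_generation(F, M, p)
        F = M = G  # 1:1 sex ratio: identical genotype frequencies in both sexes
        trajectory.append(rfp_fraction(G))
    return trajectory


# Hypothetical parameter values, for illustration only.
PARAMS = {"d_f": 0.95, "d_m": 0.95, "u_f": 0.5, "u_m": 0.5,
          "w10": 0.7, "w01": 0.9, "w11": 0.6}
print([round(x, 3) for x in simulate(PARAMS, 8)])
```

In the first generation only the drive fathers transmit D, so the RFP+ fraction is simply half the male transmission rate, $d_m/2$; later generations depend on all parameters.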

All calculations are carried out using Wolfram Mathematica (Wolfram Research Inc.).

In vitro cleavage assay against wild-type and SNP variant target site.

We performed an in vitro cleavage assay to test whether the gRNA used in this study can cleave a target site carrying the SNP found in wild populations in Africa (Supplementary Fig. 9). Using Golden Gate cloning and primers modified to carry suitable overhangs, we introduced the wild-type and SNP-variant target sequences separately into a 2-kb plasmid. As a negative control, we also prepared a plasmid carrying a modified version of the wild-type dsx target site that lacks the PAM sequence required for Cas9 cleavage. All three vectors were linearized and verified on a gel before the cleavage assay. For the assay we used a ready-to-use sgRNA supplied by Synthego (USA) and purified S. pyogenes Cas9 nuclease (NEB). To form ribonucleoprotein complexes (RNPs), we mixed the sgRNA and Cas9 protein at a 1:1 molar ratio in a 40-μl reaction to a final concentration of 400 nM and incubated the mixture at room temperature for 10 min. The linearized substrate was then added to a final concentration of 40 nM in a final volume of 50 μl, and the reactions were incubated at 37 °C for 30 min. Proteinase K was added to stop the reaction, and 20 μl of each reaction was analyzed on a gel. The primers used to create the three target sequences are listed in Supplementary Table 4.
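
The stoichiometry implied by these volumes can be checked with a short calculation. It assumes that the 400 nM refers to the RNP in the 40-μl premix, that the linearized substrate is about 2,000 bp long (the text gives only the approximate plasmid size) and an average of 660 g/mol per base pair; the helper names are hypothetical.

```python
BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def diluted(conc, volume_before, volume_after):
    """Concentration after bringing volume_before up to volume_after."""
    return conc * volume_before / volume_after


def dna_mass_ng(conc_nm, volume_ul, length_bp):
    """Mass (ng) of dsDNA giving conc_nm in volume_ul for a molecule of length_bp."""
    moles = conc_nm * 1e-9 * volume_ul * 1e-6
    return moles * length_bp * BP_MASS * 1e9


rnp_final_nm = diluted(400.0, 40.0, 50.0)  # RNP premix diluted by substrate addition
substrate_nm = 40.0
ratio = rnp_final_nm / substrate_nm         # molar excess of RNP over target
substrate_ng = dna_mass_ng(substrate_nm, 50.0, 2000)  # assumes ~2,000-bp substrate
```

Under these assumptions the RNP ends up at 320 nM, an eightfold molar excess over the substrate, which favors complete cleavage of a susceptible target within the 30-min incubation.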

Ethics statement.

All animal work was conducted according to UK Home Office Regulations and approved under Home Office License PPL 70/8914.

Life Sciences Reporting Summary.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability.

Raw sequencing data were deposited in the NCBI BioProject database under accession code PRJNA476358.