Capturing genome conformation of Plasmodium transmission stages

To complement our previous study describing the genome architecture of P. falciparum during the intraerythrocytic developmental cycle (IDC)16, we performed Hi-C experiments on three additional stages of the P. falciparum life cycle: early gametocytes (stage II/III), late gametocytes (stage IV/V), and salivary gland sporozoites (Fig. 1a, Supplementary Table 1, and Supplementary Fig. 1) using the tethered conformation capture methodology. In addition, to evaluate similarities and differences in genome organization between the highly pathogenic P. falciparum and the less virulent P. vivax, we generated Hi-C data for P. vivax salivary gland sporozoites. For each stage, we obtained high-quality data, evidenced by a log-linear relationship between contact probability and genomic distance (Supplementary Fig. 2A), as well as interchromosomal contact probability (ICP) and percentage of long-range contacts (PLRC) values in agreement with previous studies (Supplementary Table 2). Biological replicates of both P. falciparum and P. vivax sporozoite stage parasites showed a high degree of similarity (Supplementary Fig. 2B, C), demonstrating the robustness of our methodology. The data from these replicates were combined to obtain higher resolution for subsequent analyses.

Fig. 1 Genome organization in Plasmodium parasites. a Schematic overview of the parasite life cycle, with the samples generated in this study highlighted in blue and stages available from a previous study16 shown in orange. b ICE-normalized contact count matrices (top row) and fit-hi-c p-value matrices (bottom row) at 10 kb resolution of chromosome 7 for P. falciparum stages and chromosome 11 for P. vivax sporozoites. The boxed value indicates the maximum contact count (top row) or minimum p-value (bottom row). In all other figures comparing different stages, the contact counts were subsampled to the same total. Virulence clusters are indicated by yellow boxes, the centromere location by a dashed black line, and unmappable regions by gray in these and all other heatmaps. c Models of the consensus three-dimensional organization of the P. falciparum genome in stage II/III gametocytes, stage IV/V gametocytes, and salivary gland sporozoites, with light blue spheres indicating centromeres, white spheres indicating telomeres and green spheres indicating the location of virulence gene clusters Full size image

Next, the observed intrachromosomal and interchromosomal contacts were aggregated into contact count matrices at 10-kb resolution and were normalized using the ICE method to correct for experimental and technical biases18 (Fig. 1b, Supplementary Data 1, and Supplementary Data 2). In addition, we identified significant contacts using fit-hi-c, which controls for the propensity of adjacent loci to have more contacts and calculates a p-value reflecting the probability that the number of contacts in that bin is larger than expected by chance19. We inferred a consensus 3D genome structure for each of the transmission stages and the three IDC stages using Pastis20 (Fig. 1c). The stability of these consensus structures was assessed by generating 5000 possible structures from varying initial starting points. A principal component analysis showed strong clustering of structures from the same stage, and clear separation between structures of different stages, except for early and late gametocytes (Supplementary Fig. 2D). These results indicate that the genome organization of early and late gametocytes is similar, while there are distinct differences between all other stages that are captured using a single representative structure for each stage.

Universal and stage-specific features of genome organization

From the contact count matrices and the consensus 3D structures, it became apparent that large-scale features of genome organization at the gametocyte stage were comparable to those of the IDC stages, including colocalization of centromeres and clustering of telomeres and virulence genes (all p-values < 0.00001, Witten–Noble colocalization test21). However, we observed significant intrachromosomal rearrangements, including increased interactions among virulence genes and exported proteins, repression of invasion genes, change in organization of ribosomal DNA genes, as well as the formation of large domains on chromosome 14 in close proximity to a female gametocyte-specific pfap2 transcription factor locus. These changes will be addressed in more detail in the next sections. At the sporozoite stage of P. falciparum, the clustering of telomeres was conserved, but the colocalization of the centromeres was completely lost (p-value = 0.49, Witten–Noble colocalization test), Fig. 1c, rightmost panel, and Supplementary Fig. 3A). In contrast, in P. vivax sporozoites, the centromeres colocalized significantly (p-value < 0.0001, Witten-–Noble colocalization test), but these interactions were only observed at the centromere itself and did not involve any of the surrounding regions (Supplementary Fig. 3B). While our result will need to be validated by an independent approach, our Hi-C experiment is so far the only successful technique that has been able to monitor a reduction in centromere clustering in the sporozoite stage. To better visualize the large-scale differences between the various stages of the P. falciparum life cycle, we generated an animation of the changes in genome organization during the stage transitions, which highlights that chromosomes undergo dramatic rearrangement in sporozoites as compared to the blood stages (Supplementary Movie 1).

The groups of genes described in the previous sections were examined independently and were selected based on our prior knowledge of the function of these genes, rather than as the result of a systematic screen for changes in contact counts. To systematically identify changes in genome conformation between the various stages, we designed a statistical test that analyzes differences in the number of intrachromosomal contacts for each 10-kb bin in the normalized contact count matrices between pairs of stages. Loci that show a two-fold or larger difference in contacts, after normalizing for the effect of genomic distance, with a false discovery rate of less than 1% are listed in Supplementary Data 3. As expected, interactions at the sporozoite stages were most different from those in other stages of the life cycle. Several chromosomes showed large rearrangements towards the chromosome ends, for example the right arm of chromosome 4 (Supplementary Fig. 4), involving gene families coding for exported proteins involved in virulence and erythrocyte remodeling. The fold-change heatmaps (accessible at http://noble.gs.washington.edu/proj/plasmo3d_sexualstages/), the contact count heatmaps, and the confidence score heatmaps showed many additional differences in chromosome conformation during stage transitions, as detailed below. The interaction patterns were similar in two distinct field isolates for sporozoites and in two laboratory strains for gametocytes, suggesting that genetic variations did not influence our results. Furthermore, the Hi-C methodology has recently been used to detect translocations in genomes and to correct genome assemblies based on Illumina and/or PacBio sequencing in many organisms, including Plasmodium knowlesi, Arabidopsis thaliana, and Aedes aegypti22,23,24,25. It is therefore unlikely that the changes that we observed between life cycle stages are artifacts caused by genomic recombination during in vitro parasite culture, since none of the interchromosomal heatmaps (Supplementary Fig. 3A) showed any evidence of such recombination events. To further validate our results, we have introduced interchromosomal and intrachromosomal translocations in the P. vivax genome to visualize the aberrant patterns that such recombination events would produce (Supplementary Fig. 5). In addition, we scanned our samples using a recently published metric developed to detect genome assembly errors in Hi-C data24 and did not detect any signs of misassembly or translocations (Supplementary Fig. 6). Additional simulations showed that translocations of a single 10-kb bin are not easily detectable, but larger regions are detectable when there is a separation of a few bins between them (Supplementary Fig. 7).

pfap2-g leaves the repressive center during gametocytogenesis

Previous work has shown that knockdown of heterochromatin protein 1 (PfHP1) during the IDC results in activation of the gametocyte-specific transcription factor locus pfap2-g and an increased formation of gametocytes26. PfHP1 interacts with the repressive histone mark H3K9me3 on silenced var genes that are colocalized in a perinuclear heterochromatic compartment. Using DNA-FISH, we observed that the pfap2-g locus was located in close proximity to a subtelomeric var gene on chr8 in >90% of the cells observed (Fig. 2a and Supplementary Fig. 8A), suggesting that pfap2-g is associated with the repressive cluster during the IDC. In agreement with this observation, the trophozoite and schizont stage fit-hi-c p-value heatmaps showed a significant interaction between pfap2-g and the nearest internal virulence gene cluster (Fig. 2b and Supplementary Fig. 9A). The virulence cluster and pfap2-g locus each straddle two ten kilobase bins, and in trophozoites, each of the four possible pairwise interactions were significant (fit-hi-c q-value < 0.05). We performed virtual 4C at MboI restriction site resolution to demonstrate that this interaction was specific for pfap2-g and did not involve the nearby pfap2 PF3D7_1222400 (Supplementary Fig. 10), although these two ap2 TF genes are located in neighboring 10 kb bins and therefore fall outside our limits to detect potential translocations. In addition, pfap2-g significantly interacted with virulence clusters on chromosomes 6 and 8 in trophozoites (fit-hi-c q-value < 0.05; Supplementary Data 4). In stage II/III gametocytes, no significant interactions between pfap2-g and virulence clusters were observed (q-values = 1.0; Fig. 2b; Supplementary Data 4; Supplementary Table 3), indicating that pfap2-g dissociates from the repressive cluster in the transition from the IDC to early gametocytes. Interactions between pfap2-g and virulence clusters were partially regained in the late gametocyte stage (Supplementary Data 4, Supplementary Table 3). Unfortunately, DNA-FISH experiments were unsuccessful for the gametocyte stage. An improved DNA-FISH methodology for the gametocyte stages will need to be developed to further confirm these results by an independent approach.

Fig. 2 Changes in interaction of pfap2 genes and invasion genes with the repressive center. a Colocalization of pfap2-g and var gene PF3D7_0800300 by DNA-FISH. Additional images are presented in Supplementary Fig. 8. b Dissociation of the gametocyte-specific transcription factor locus pfap2-g (red) from the nearby internal virulence gene cluster (yellow) in stage II/III gametocytes. c Overall reduced number of intrachromosomal and interchromosomal interactions between pfap2 TF genes and virulence genes in gametocytes and sporozoites as compared to the IDC stages. d Invasion gene clusters (blue bar) on chromosomes 2 (top) and 10 (bottom) interact with subtelomeric virulence genes (yellow bar) in gametocytes, but not during the IDC. In each plot, the top triangle shows the aggregated data of both gametocyte stages, while the bottom triangle shows the aggregated data from the three IDC stages. Bins that depict interactions between virulence genes and invasion genes are highlighted by a red box. e Increased number of intrachromosomal and interchromosomal interactions between invasion genes and virulence genes in gametocytes and sporozoites as compared to the IDC stages Full size image

Significant contacts with virulence gene clusters at any parasite stage were observed for a total of 9 pfap2 loci (Supplementary Table 3). The dissociation of pfap2 genes from the repressive center in other stages than the IDC seemed to be a general trend: fewer significant contacts (q-value < 0.05) between pfap2 genes and virulence gene clusters were observed in gametocytes and sporozoites as compared to the IDC stages (p-value = 0.0027, one sided sign test; Fig. 2c and Supplementary Fig. 9B). We used a resampling procedure (see Supplementary Information) to compare this result to randomly-selected bins; the p-value was 0.021. Note that pfap2 PF3D7_0420300 is located in close genomic proximity to a virulence cluster in chromosome 4. To ensure that the significant decrease in pfap2 gene loci interacting with virulence clusters is not due to this gene, we repeated the p-value calculation without PF3D7_0420300 and confirmed that this remained significant (p-value = 1.192e−07, one-sided sign test). Such changes in interactions with virulence gene clusters between stages were not observed for an unrelated gene family (histone genes, data not shown). These results are indicative of an important connection between genome organization and the activity of pfap2 genes that drive parasite life cycle progression.

Relocation of invasion genes during gametocytogenesis

Distinct changes were observed in chromosomes 2 and 10 at loci that harbor invasion genes (Fig. 2d). These genes encode proteins that are expressed in merozoites and mediate attachment to and entry of red blood cells and include several merozoite surface proteins, S-antigen, and glutamate-rich protein. In contrast to several other invasion genes (pfrh427, clag, and eba28, these genes are not known to undergo clonally variant expression, and we therefore consider their association with heterochromatin in gametocytes to be a different regulatory mechanism than the epigenetic mechanisms that control expression of pfrh4, clag, and eba during the IDC. In gametocytes, these loci showed strong interaction with the subtelomeric regions, while these contacts were not observed in the IDC. To quantify this observation, we assessed the number of significant contacts between invasion gene loci and virulence clusters in each life cycle stage. A larger number of significant contacts were observed between invasion genes and virulence clusters in the transmission stages than in IDC stages (p-value < 2.2e−16, sign test; Fig. 2e and Supplementary Fig. 9C). The lack of interactions between invasion gene GLURP on chromosome 10 (PF3D7_1035300) and var gene PF3D7_0800300 during the IDC was confirmed by DNA-FISH (Supplementary Fig. 8B). As mentioned earlier, DNA-FISH experiments were unfortunately not successful for the gametocyte stage. Collectively, our data indicate that, similar to the pfap2 gene family, the expression of invasion genes during the life cycle is correlated with association with or dissociation from repressive heterochromatin.

Expansion of subtelomeric heterochromatin in gametocytes

To further study changes in chromatin organization during gametocytogenesis, we determined the distribution of repressive histone mark H3K9me3 in late ring/early trophozoites and stage IV/V gametocytes by performing ChIP-seq on two biological replicates for each stage (Supplementary Fig. 11A and Supplementary Data 5) that were combined for downstream analyses. We used two different commercially available anti-H3K9me3 antibodies to rule out that our ChIP-seq results were influenced by the antibody used. In trophozoites, H3K9me3 marking was restricted to subtelomeric regions, internal virulence gene clusters and a few additional loci (including pfap2-g and dblmsp2), as previously described for both H3K9me37,8 and PfHP129 (Fig. 3a and Supplementary Fig. 11B, C). These same regions were occupied by H3K9me3 in gametocytes. In addition, in several chromosomes, the subtelomeric heterochromatin marking expanded to more internally located genes in gametocytes (Fig. 3a–c and Supplementary Fig. 11B, C). The expansion of heterochromatin at the chromosome ends was also visible in the contact count heatmaps as larger subtelomeric domains that showed strong intra-domain interactions and were depleted of interactions with the internal region of the chromosome (Supplementary Data 1). While not all genes in these regions were marked by H3K9me3, a total of 79 genes showed increased H3K9me3 levels in gametocytes as compared to trophozoites, 61 of which were exported proteins that may play a role in erythrocyte remodeling (Supplementary Data 5). These genes included members of the phist (n = 15 out of a total of 68), hyp (n = 7 out of 34) and fikk (n = 7 out of 19) families, as well as 32 other genes annotated as exported proteins or containing a PEXEL export motif (Fig. 3d, e). Other members of gene families encoding exported proteins were marked with H3K9me3 in the IDC. However, 47 of these 61 genes have never been detected in a heterochromatic state in the IDC7,8,29. At two loci on chr9 and chr14, respectively, H3K9me3 was lost in gametocytes (Fig. 3f). The locus on chr9 is between two genes known to be involved in gametocyte differentiation (pfgdv1 and pfgig) and is deleted in various gametocyte-defective P. falciparum strains30. On chromosome 14, the genes that were not marked by H3K9me3 in gametocytes encode exported proteins that have been implicated in gametocytogenesis: PF3D7_1476600, PF3D7_1477300 (Pfg14-744), PF3D7_1477400, and PF3D7_1477700 (Pfg14-748)30. These results imply that H3K9me3 and chromatin structure play important roles in gene activation and silencing during gametocyte formation. To validate the 3D modeling and heterochromatin clustering, we performed immunofluorescence imaging against repressive histone mark H3K9me3. We identified a single nuclear H3K9me3 focus per nuclei in rings and schizonts (Fig. 3g and Supplementary Fig. 12), corresponding to the single repressive center harboring all virulence genes as predicted by our 3D models. In trophozoites, the number of foci varied, in line with nuclear expansion16,31, and the progression of DNA replication that takes place at the end of this stage (Fig. 3g). Gametocytes showed either one or two H3K9me3 foci that did not seem to be associated with a male or female phenotype.

Fig. 3 Silencing of genes encoding exported proteins in gametocytes through expansion of heterochromatin. a ChIP-seq analysis of genome-wide H3K9me3 localization in trophozoites (top tracks in black) and stage IV/V gametocytes (bottom tracks in red). Results of one representative biological replicate are shown for each stage. Results for a second biological replicate are shown in Supplementary Fig. 11. The regions depicted in panels B and F are indicated with black boxes. b Expansion of H3K9me3 heterochromatin in gametocytes as compared to trophozoites, predominantly to genes encoding exported proteins. c Length of each subtelomeric region in which the majority of genes is marked by H3K9me3, sorted by the difference in length between these regions in trophozoites and gametocytes. d H3K9me3 levels per gene at the trophozoite and gametocyte stages. e Enrichment of genes encoding for exported proteins among genes with increased levels of H3K9me3 in gametocytes (p-value from a two-tailed Fisher’s exact test). f Loss of H3K9me3 mark in gametocytes on chromosome 9 between gametocyte development genes pfgdv1 and pfgig, as well as at gametocyte-specific genes encoding exported proteins on chromosome 14 (indicated with an asterisk). g Immunofluorescence analysis showing a single H3K9me3 focus in ring and schizont stages, and either one or two foci in gametocytes. Scale bar denotes 1 μm Full size image

Formation of superdomains on a P. falciparum chromosome

Chromosome 14 showed the formation of a strong domain boundary in both early and late gametocytes, which was not observed in any of the other life cycle stages (Fig. 4a and Supplementary Data 1). This separation of the chromosome into two superdomains is reminiscent of the bipartite structure of the inactivated X chromosome (Xi) in human, rhesus macaque, and mouse32,33,34. Zooming in on the boundary region of P. falciparum chromosome 14 showed a sharp transition at the MboI restriction site at nucleotide position 1,187,169 (Fig. 4b and Supplementary Fig. 13). To demonstrate that this sharp boundary is not the result of a chromosomal translocation in the NF54 strain used for gametocyte isolations, we confirmed that the genomic region that spans the boundary region can be amplified by PCR and can be detected by Southern blot in both 3D7 and NF54 strains (Supplementary Fig. 14). In eukaryotic genomes, genes close to the domain boundary are often associated with higher levels of transcription35,36. The domain boundary is located inside or near PF3D7_1430100, which encodes serine/threonine protein phosphatase 2A activator (PTPA; Supplementary Fig. 15A). In humans, serine/threonine protein phosphatase 2A (PP2A) is one of the four major Ser/Thr phosphatases and is thought to play a complex, but mostly inhibitory role in the control of cell growth and division37. Gametocytes have a higher expression level of pfptpa than the IDC stages and express a different variant of pfptpa that does not contain exon 1 (Supplementary Fig. 16A). The sequence of intron 1 is unusual and contains many motifs that are repetitive (for example, 12 repeats of motif TGTACATACACTTAT and minor variations thereof, within the 705 nt intron; Supplementary Fig. 16B). These could be the binding sites for a lncRNA or protein involved in formation of the domain boundary.

Fig. 4 Formation of superdomains on chr14 in gametocytes. a ICE-normalized contact count heatmap at 10 kb resolution of early gametocyte (left) and late gametocyte (right) chromosome 14 showing the separation of the chromosome into two superdomains. The dashed line indicates the location of the centromere, and the arrowhead indicates the position of PF3D7_1429200. b Smaller region of chromosome 14 centered on the domain boundary that is located inside PF3D7_1430000, a conserved gene with unknown function. c The homolog of pfap2 gene PF3D7_1429200 in P. berghei (PBANKA_1015500; pbap2-o3) has a nuclear localization in female gametocytes and gametes, but is not detected in male gametocytes. The top row shows male and female gametocytes. The bottom row shows a male and female gamete activated by mosquito ingestion, which triggers expression of the female-specific surface protein P28. Male (M) and female (F) parasite are indicated in the brightfield and merged images. Scale bar denotes 10 μm Full size image

The domain boundary is also relatively close to the pfap2-encoding locus PF3D7_1429200 (chr14:1,144,518–1,148,078) (Supplementary Fig. 15A). To evaluate whether this TF could be involved in sexual differentiation, we generated a transgenic P. berghei strain in which the homolog of PF3D7_1429200 (PBANKA_1015500) was expressed as a GFP-tagged protein (Supplementary Fig. 15B, C). P. berghei is widely used as a model for P. falciparum, in part because of the higher efficiency of genetic manipulations as compared to P. falciparum. Female gametocytes, activated female gametes, and zygotes all expressed the tagged ApiAP2 TF with nuclear localization of the protein, while the protein was completely absent in male gametocytes and gametes (Fig. 4c), as well as in IDC stages (data not shown). These results demonstrate that this ApiAP2 TF (named PfAP2-O3 hereafter in line with a recent publication38) is expressed in a strict sex-specific fashion.

Rearrangement of chromosomes in sporozoites

Similar to the re-localization of invasion genes during the transition from the IDC to the gametocyte stage, the invasion genes also interacted more strongly with virulence genes in P. falciparum sporozoites than during the IDC (Fig. 2e and Supplementary Fig. 9C). In P. vivax sporozoites, a cluster of invasion genes on chromosome 10 showed depletion of interactions with other loci on the same chromosome as compared to surrounding genomic regions (Supplementary Fig. 17B). This observation may suggest that the invasion genes also have a distinct genome organization at this stage of the P. vivax life cycle. However, these results may also be caused by sequence variation in invasion genes in the field isolates used in this study as compared to the reference genome, resulting in lower mapping to this region.

In addition, distinct changes were noticeable around rDNA loci. P. falciparum encodes four rDNA units containing single copies of the 28S, 5.8S, and 18S genes (Fig. 5a). The units on chromosomes 5 and 7 are active during the human blood stages, whereas the units on chromosomes 1 and 13 are active in the mosquito stages. In general, sporozoites showed a large increase in the number of contacts between rDNA genes and virulence genes as compared to the IDC stages and gametocytes (Fig. 5b and Supplementary Fig. 9D). These changes in conformation were most visible in chromosome 7, in which the rDNA unit is located at the boundary of two large domains in the IDC stages and gametocytes, which presumably contributes to its activation status. In sporozoites on the other hand, the separation of the chromosome into two large domains disappeared (Fig. 5c and Supplementary Fig. 4B). Similar changes in domain conformation can be observed around the rDNA locus on chromosome 5 (Supplementary Fig. 9E).

Fig. 5 Changes in genome organization in salivary gland sporozoites. a Locations of rDNA genes in the P. falciparum genome. Units of 28S, 5.8S, and 18S genes on chromosomes 1, 5, 7, 11, and 13 are indicated with a filled symbol. Several additional rDNA genes are located on other chromosomes, including a unit of three 5S genes on chromosome 14, which is indicated with an open symbol. b Increased overall number of interactions between rDNA genes and virulence genes in P. falciparum sporozoites. c Loss of domain formation around the rDNA locus on chr7 in P. falciparum sporozoites as compared to other life cycle stages. The borders of the rDNA locus are indicated by red lines. d Strong interchromosomal interactions in P. vivax sporozoites, indicated by white rectangles. Dashed lines indicate chromosome boundaries Full size image

Prominent features of genome organization in P. falciparum and P. vivax sporozoites were strong long-range and interchromosomal contacts that involved other genes than virulence genes and that did not seem to be present in the IDC or gametocyte stages. In P. vivax, strong intrachromosomal contacts were present in chromosomes 7 and 11, which also formed strong interchromosomal interactions with each other and with additional loci on chromosomes 9 and 13 (Fig. 1b, rightmost panel, Fig. 5d, and Supplementary Table 4). For P. falciparum, intrachromosomal interactions were observed in chromosomes 3, 4, 8, 9, 11, 13, and 14 (Supplementary Data 1), although, interestingly, these are not homologous to the loci that participate in loops in P. vivax. Several of these loops involved pfap2 loci and genes involved in sporozoite migration to the liver and in some cases in hepatocyte invasion, such as circumsporozoite protein (PfCSP), sporozoite micronemal protein essential for cell traversal (PfPLP1), thrombospondin-related anonymous protein (PfTRAP), sporozoite protein essential for cell traversal (PfSPECT1), and gamete egress and sporozoite traversal protein (PfGEST) (Table 1).

Table 1 Loci involved in long-range intrachromosomal interactions in P. falciparum sporozoites Full size table

A putative clonally variant gene family in P. vivax

The genome annotation of P. vivax is less complete than that of P. falciparum, and many genes have been grouped into families based solely on sequence homology. An example is the Pv-fam-e (also named rad) gene family that is closely related to the Plasmodium helical interspersed sub-telomeric (phist) gene family39, encoding exported proteins involved in erythrocyte remodeling40,41,42. The P. vivax genome contains 45 rad genes, of which 10 and 27, respectively, are located in two separate clusters on chromosome 5. In the contact count matrix of P. vivax chromosome 5, the largest of the two clusters strongly interacted with the (sub-)telomeric regions and showed a depletion of interactions with all other intrachromosomal loci (Supplementary Fig. 17C). These results suggest that P. vivax rad genes may be regulated by organization into facultative heterochromatin.

PfHP1 is essential for virulence gene colocalization

The clustering of virulence genes seems to be a general feature of the P. falciparum genome that is maintained throughout its life cycle. Recently, it was shown that depletion of PfHP1 results in loss of var gene repression and an arrest in parasite growth26, suggesting that this protein is essential for structural integrity of the repressive cluster. To study the effect of PfHP1 depletion on genome conformation, we performed Hi-C experiments on a transgenic P. falciparum strain expressing PfHP1 fused to GFP and a destabilization domain (DD), both in the presence and in the absence of Shield-1, resulting in expression or knockdown of the PfHP1 fusion protein, respectively26. Ring-stage parasites expressing the tagged PfHP1 protein showed a decrease in virulence gene clustering as compared to wild-type parasites (p-value = 0.026 and p-value ≤ 0.001, respectively, BH FDR-corrected Paulsen colocalization test43 (see Supplementary Information)). In particular, interactions between internal and subtelomeric var gene clusters were lost (Supplementary Fig. 18). This result is in agreement with an increase in internal var gene expression in this strain, as reported previously26. Virulence gene clustering was completely lost in the PfHP1-depleted strain (p-value = 0.129, BH FDR-corrected Paulsen colocalization test), in line with a generalized loss of var gene repression26. Accordingly, we observed more significantly changing virulence gene bins as compared with wild-type in the PfHP1-depleted strain (n = 71) than in the PfHP1-tagged strain (n = 18). These results confirm that PfHP1 is indeed essential for maintenance of the structure of the repressive cluster and thus for regulation of virulence gene expression.

The 3D genome structure correlates with gene expression

Finally, we explored the relationship between gene expression and 3D structure, leveraging four published expression data sets44,45,46,47 and our 3D models of the genome architecture. As in our previous study, we applied kernel canonical correlation analysis (KCCA)48. KCCA is an unsupervised learning approach akin to principal component analysis that identifies a set of orthogonal gene expression components that are coherent with the 3D structure. To perform this analysis, we separated the 3D models into three distinct groups: those related to IDC (ring, trophozoite, schizont), gametocyte (early and late) and sporozoite stages. For each of these groups of time points, we extracted a gene expression component and a structure component that exhibited coherence to the expression profiles and 3D structure, such that genes whose expression is correlated with the selected gene expression component tend to be colocalized in 3D. The gene expression components for the three sets of structures were highly correlated and were dominated by the repressive center (Fig. 6a and ref. 16). To further interpret the results of the KCCA, we extracted ranked lists of genes based on their KCCA scores: lists that rank genes based on the similarity of their gene expression profiles with the first or second gene expression components, and lists that rank genes based on the similarity of their 3D position with the first or second structure components. We then investigated whether several sets of genes were enriched in those ranked lists. Var, rifin, and exported protein genes all showed strong and significant enrichment on the first gene expression and structure components for all stages (Fig. 6b), as expected based on their localization in or near the repressive center. In contrast, the invasion genes were significantly enriched for the first gene expression component in gametocytes and sporozoites, and for the second gene expression and structure components in all stages (Fig. 6b). While the average scores for this group of genes are relatively small, these results suggest that expression of these genes is also coordinated with their location within the nucleus. Gametocyte-specific genes did not show correlation to the first or second component for either gene expression or structure, which is in line with regulation of these genes by other factors, such as the gametocyte-specific transcription factor PfAP2-G, instead of localization within the nucleus.