The human genome is “littered” with remnants of ancient retrovirus infections that invaded the germ line of our ancestors. Only one of these, HERV-K HML-2 (HK2), may still be proliferating. Not all humans have the same HK2 viruses in their genomes. Here we show that one specific uncommon HK2, which lies close to a gene involved in dopaminergic activity in the brain, is found more frequently in drug addicts and is thus significantly associated with addiction. We experimentally show that HK2 can manipulate nearby genes. Our study provides strong evidence that uncommon HK2 can be responsible for unappreciated pathogenic burden, and thus underlines the health importance of exploring the phenotypic roles of young, insertionally polymorphic HK2 integrations in human populations.

HERV-K HML-2 (HK2) has been proliferating in the germ line of humans at least as recently as 250,000 years ago, with some integrations that remain polymorphic in the modern human population. One of the solitary HK2 LTR polymorphic integrations lies between exons 17 and 18 of RASGRF2, a gene that affects dopaminergic activity and is thus related to addiction. Here we show that this antisense HK2 integration (namely RASGRF2-int) is found more frequently in persons who inject drugs compared with the general population. In a Greek HIV-1–positive population (n = 202), we found RASGRF2-int 2.5 times (14 versus 6%) more frequently in patients infected through i.v. drug use compared with other transmission route controls (P = 0.03). Independently, in a United Kingdom-based hepatitis C virus-positive population (n = 184), we found RASGRF2-int 3.6 times (34 versus 9.5%) more frequently in patients infected during chronic drug abuse compared with controls (P < 0.001). We then tested whether RASGRF2-int could be mechanistically responsible for this association by modulating transcription of RASGRF2. We show that the CRISPR/Cas9-mediated insertion of HK2 in HEK293 cells in the exact RASGRF2 intronic position found in the population resulted in significant transcriptional and phenotypic changes. We also explored mechanistic features of other intronic HK2 integrations and show that HK2 LTRs can be responsible for generation of cis-natural antisense transcripts, which could interfere with the transcription of nearby genes. Our findings suggest that RASGRF2-int is a strong candidate for dopaminergic manipulation, and emphasize the importance of accurate mapping of neglected HERV polymorphisms in human genomic studies.

The human genome is littered with retroviral elements as a result of ancient retroviral infections in the germ line of our primate ancestors. Some of these retroviral invasions proliferated successfully by continuously reintegrating in their host genomes (1). Proliferation success of endogenous retroviruses (ERVs), measured by the relative abundance within different hosts, depends on a complex interplay of factors, including viral life cycle (2) and host life history (3). Furthermore, hosts have multiple diverse control mechanisms that inactivate ERVs in place. Some of these mechanisms that reduce potentially deleterious retroviral activity include the accumulation of inactivating mutations, internal recombination events between their long terminal repeats (LTRs), host-derived restriction factors, and transcriptional silencing (4).

While pathogenic ERVs have been described throughout the animal kingdom, none have been definitively linked with harmful effects in humans. Furthermore, ERV germ-line proliferation in humans and other great apes is lower compared with other primates (5), with the majority of human endogenous retroviruses (HERVs) being defective and fixed within the population. Perhaps retroviral proliferation poses a disproportionate burden for humans compared with other mammals. Although most HERVs ceased proliferating millions of years ago, HERV-K (HML-2) (referred to in the text as HK2) continued proliferating in the germ line of our ancestors after the human–chimpanzee divergence ∼5 to 6 Mya (1, 6), with some integrations still being polymorphic in the population (7, 8). As a consequence, some HK2 proviruses are exceptionally well preserved, express viral proteins, and produce viral particles (9). HK2 integrations tend to be found near genes, with an antisense bias (10), suggesting that antisense integrations are less likely to impose a pathogenic burden (11, 12). Some evidence suggests that intronic HK2 integrations modulate transcription of nearby genes (13), but their phenotypic role remains largely unknown. Intronic HK2, as well as other intronic transposable elements, have been proposed to have contributed to host evolution by modulating transcription of the surrounding genes (14, 15). Moreover, some evidence suggests that intronic integrations of transposable elements could reshape the transcriptome upon reactivation during cancer development (16), although there is no evidence that HK2 reactivation could result in gene modulation in cancer development (17).

One of the polymorphic HK2 integrations is found between exons 17 and 18 of RASGRF2 (hereafter RASGRF2-int), a gene involved in signaling pathways. RASGRF2 has at least two domains and is expressed in T cells, the heart, and the brain. Deregulation of the gene could thus be involved in complex phenotypes (or disease syndromes) involving more than one system (the brain and the immune system). Double-knockout Rasgrf2tm1Esn mutant mice displayed no significant pathological phenotype with respect to growth and development (18), but experiments on the gene's role in T-cell signaling showed diminished immune responses (19); the importance of this deficient immune response has not been explored in humans.

Stronger evidence exists for the brain-related phenotype of RASGRF2 in humans; a genome-wide association study on alcohol addiction provided a potential hit for the SNP rs26907 that is located between exons 3 and 4 (20). This SNP was subsequently shown to modulate addictive behavior (21) due to modifications of noradrenergic and serotonergic responses (22). Young carriers of rs26907 were more likely to have alcohol-induced reinforcement and show enhanced reward-related dopaminergic activity in functional MRI experiments (20). Crucially, the rodent model of addiction is simple, reproducible, and predictive of addiction in humans (23), with double knockout RASGRF2(−/−) mice being remarkably resistant to addiction (21).

Here we show that RASGRF2-int is significantly associated with drug addiction in two independent, genetically distinct human populations. We also provide evidence in support of a mechanistic interaction between HK2 integration and RASGRF2 transcription, suggesting that the observed association is likely to be causal.

Results

RASGRF2-Int Is Associated with Drug Addiction. The best-established phenotype of RASGRF2 is linked with addiction. RASGRF2 is expressed most intensively in the brain, but also in the heart, lungs, and T cells (24, 25). Thus, we initially hypothesized that if the proviral HK2 modulated RASGRF2 in humans, it would be found at higher frequency among individuals with well-defined strong addictive behavior such as persons who inject drugs (PWIDs). We tested this by using PCR to screen the genomic DNA of 202 fully anonymized Greece-based HIV-positive individuals with blinded samples for the presence or absence of this integration (Fig. 1 A–D). The cohort consisted of 102 PWIDs (reported i.v. drug use within the past 6 mo as the likely transmission route) and 100 controls (infected by other transmission routes). In this population, we did not test for potential confounding variables including sex and age among groups. The power of our approach is 80% for recovering fourfold higher frequency in the PWIDs versus the expected 4% (7) in the general population. We found that the integration was present in 14 PWIDs vs. 6 non-PWIDs (P = 0.03, χ2, one-sided test) in the form of a solo-LTR, suggesting a more than twofold higher frequency in populations with long-term addictive behavior. Individuals who carried the integration were heterozygous, as determined by another PCR, specific for the preintegration site, which also served as an internal control to confirm the absence of the integration in other individuals.

Fig. 1. (A) LTR integration screening: PCR design. Primer mapping and product sizes relative to the wild type (Top) and the HK2-LTR (red) integrated allele (Bottom). Exonic and intronic regions are in blue and gray, respectively. (B) Integration screening results of eight (p1 to p8) random patients: Primers LTR_splc_F and dn_intg_R were used in “p,” while primers up_intg_F and dn_intg_R were used in “s” mastermix. Patients 98523 and 99583 are positive (heterozygous) for the integration. (C) Primer sequences used for the LTR integration screening and the confirmation of the editing in HEK293 cells and for the expression assessment of the RASGRF2 exonic junctions. (D) Tabular index for the genotypic interpretation of the PCR products in B. (E) CRISPR/Cas9 editing of the HEK293 cell line to incorporate RASGRF2-int. pcDNA3.1(+) plasmid containing the LTR sequence flanked by 1-kb preintegration site homologous arms (light blue) was used for the homology directed repair (HDR) insertion mechanism. Primers intg_conf_F (mastermix “c”), which avoid the false-positive amplification of the HDR plasmid, and ingr_conf_R (mapping the 5′ LTR splice site) were used in conjunction with primers up_intg_F and dn_intg_R to confirm integration/genotyping of the cell line. (F) Post CRISPR/Cas9 clonal selection and screening/genotyping for the RASGRF2-int HEK293 cells. The gel is annotated according to the scale of the cells in culture over the days posttransfection: 96w/18d (96-well plate, 18 d), 24w/34d (24-well plate, 34 d), and T75f/48d (T75 flask, 48 d). The aneuploidy of the cell line selects for the wild-type alleles versus the edited ones. (G) Differential expression of exons (triplicates of the same clone) upon editing of HEK293 cells: SYBR Green qPCR was used to evaluate the expression levels of exon–exon junctions 2–3, 16–17, 17–18, and 18–19. Error bars represent 95% confidence intervals. The expression of the exons around the integration is reduced by more than 70% (P = 0.01, P = 0.047, and P = 0.002 for 16–17, 17–18, and 18–19, respectively, t test), while the expression of exons 2–3 is increased by more than threefold (P = 0.01, t test) compared with the wild-type HEK293 cells, 18 d posttransfection. The expression levels of the exons revert to normal after 48 d posttransfection. (H) RASGRF2 gene and alternative transcripts. Positioning of HK2 solo-LTR integration, between exons 17 and 18 of the main, 201, transcript.

We further examined whether the association would be observed in an independent, genetically distinct population. Thus, we used a United Kingdom-based population of individuals with chronic hepatitis C virus (HCV) infection and tested the frequency of RASGRF2-int with respect to the route of transmission. Here, we posed a stricter criterion on the history of addictive behavior, as we selected PWIDs who had injected within the previous 6 mo from the sampling date and reported having their first injection at least 2 y before sampling (thus establishing long-term addictive behavior). The control population consisted of subjects who had been infected with HCV through bleeding disorders and matched the PWID population by age (±10 y), gender, and ethnicity. We found RASGRF2-int in 34 of the 100 PWIDs compared with 8 of the 84 controls, revealing a 3.6-fold higher frequency in populations with long-term addictive behavior (P < 0.001, χ2 test, two-sided test) compared with matched controls. We further tested potential confounding of gender, age, alcohol use, and smoking in multivariate models and found that the association of RASGRF2-int with drug addiction remained significant (P < 0.001) while no other parameters were found to be significantly associated with RASGRF2-int. A pooled analysis of the Greece- and United Kingdom-based cohorts further supported the strong significance for the association of RASGRF2-int with PWID (23.8% in 202 PWIDs versus 7.6% in 184 control patients; P < 0.0001, χ2 test, two-sided test).
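
The carrier frequencies and χ2 results reported for the two cohorts can be re-derived from the published counts. The following is a minimal Python sketch, not the authors' analysis code. It assumes a Pearson χ2 test on each 2×2 table without continuity correction (the text does not state whether a correction was applied), and the function name `chi2_2x2` and the cohort labels are illustrative.

```python
import math

def chi2_2x2(case_pos, case_n, ctrl_pos, ctrl_n):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Returns (frequency ratio, chi-square statistic, two-sided P value).
    Assumption: this is one plausible reading of the 'chi2 test' in the text.
    """
    a, b = case_pos, case_n - case_pos      # PWIDs: carriers / non-carriers
    c, d = ctrl_pos, ctrl_n - ctrl_pos      # controls: carriers / non-carriers
    n = case_n + ctrl_n
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(chi2 / 2)).
    p_two_sided = math.erfc(math.sqrt(chi2 / 2))
    ratio = (a / case_n) / (c / ctrl_n)
    return ratio, chi2, p_two_sided

# Counts taken from the text: (PWID carriers, PWIDs, control carriers, controls)
cohorts = {
    "Greece (HIV)": (14, 102, 6, 100),
    "UK (HCV)": (34, 100, 8, 84),
    "Pooled": (48, 202, 14, 184),
}

results = {name: chi2_2x2(*counts) for name, counts in cohorts.items()}
for name, (ratio, chi2, p) in results.items():
    # Halving P gives the one-sided value when the effect is in the expected direction.
    print(f"{name}: ratio={ratio:.2f}, chi2={chi2:.2f}, "
          f"two-sided P={p:.2g}, one-sided P={p / 2:.2g}")
```

Under these assumptions the Greek cohort gives a one-sided P of about 0.03 and the United Kingdom cohort a frequency ratio of about 3.6, in line with the values reported in the text. The multivariate models mentioned above are not reproduced here because the individual-level covariates are not given.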

Artificial Insertion of the HK2 LTR Can Modulate RASGRF2 Transcription. We then explored whether the associations described above were due to a causal relationship between RASGRF2-int and addiction. Our primary hypothesis is that RASGRF2-int is modulating transcription of RASGRF2 (Fig. 1), as some intronic ERVs have been shown to modulate transcription in mice (12, 26, 27). RASGRF2-int, like the majority of HK2 intronic integrations of the human genome, is antisense compared with RASGRF2 (10). Intronic ERVs in mice are mostly antisense and the majority of them do not disrupt normal gene transcription (26), while a minority of antisense intronic mouse ERVs have been shown to disrupt transcription (26, 28). The alternative explanation for the associations is that RASGRF2-int is a proxy of a genetically linked (yet unknown to us) polymorphism of RASGRF2, which bears the true causal effect. To test our primary hypothesis, we used the CRISPR/Cas9 approach (Fig. 1E) to introduce the LTR in the same position observed in the human population within the HEK293 cell line, derived from kidney cells but known to have a neuronal transcriptome (29). We performed PCR and Sanger sequencing of the integration and preintegration sites to show that the integration was heterozygous and that there was no off-target editing of the preintegration site. Eighteen days after clonal selection, we evaluated the transcription levels of RASGRF2 exons. We detected a significant modification of the normal transcription of RASGRF2 (Fig. 1G). More specifically, transcription and splicing of the early exons were significantly increased by more than threefold, while transcription and splicing of the surrounding exons were significantly diminished by more than 70% compared with the wild type (see Fig. 1G legend for detailed statistics). 
In a previous study, down-regulation of RASGRF2 by 70% with RNAi resulted in a marked decrease of the RASGRF2 protein (30), suggesting that our observed down-regulation may also be significant at the translational level. Crucially, according to ENSEMBL, two protein-coding transcripts are produced from RASGRF2 (RASGRF2-201 and RASGRF2-206; Fig. 1H), the presence of which in HEK293 cells we confirmed by analyzing publicly available RNA-sequencing (RNA-seq) datasets. RASGRF2-201 includes all of the exons, while RASGRF2-206 includes only the first 10 exons, suggesting that our findings could potentially be explained through down-regulation of RASGRF2-201 and up-regulation of RASGRF2-206. This hypothesis, however, needs to be explored with a stabilized cell-line model that will allow in-depth study of the potential underlying mechanism. RASGRF2 has at least two independent domains responsible for signaling activities through the guanine exchange factor (GEF), one for Ras and one for Rac1 (Fig. 1H). The exons with “diminished” expression are proxies for Ras-GEF activity, and their disruption should result in a decelerated rate of cell division. We therefore expected a fitness cost for the edited cells compared with wild type. Indeed, genome editing produced a slightly diminished survival-under-stress phenotype for the population of the edited cells; 3 out of 48 wells seeded with the clonally expanded edited cell line survived de novo clonal selection in nonenriched media compared with 13 out of 48 of the wild type, suggesting a selective disadvantage of the cells harboring the integration (P < 0.002, binomial test). Furthermore, chromosome 5 of HEK293 cells (where RASGRF2 lies) is aneuploid, and its copy number fluctuates during passaging (31). We found that within 30 d during serial passaging, the cells were losing the edited allele (Fig. 1F), suggesting a selective disadvantage of the allele carrying the integration and a recovery of the normal phenotype (Fig. 1G). 
Remarkably, the modulated transcription profile of RASGRF2 exons was restored in the cells that eventually lost the RASGRF2-int allele. The concurrent transcriptional recovery following the loss of the integration also supported the conclusion that the observed fitness cost was due to the RASGRF2 editing, through knockdown of the Ras-GEF activity, and not due to potential off-target effects. Unfortunately, the fitness disadvantage destabilized the HEK293 clone of RASGRF2-int; we attempted to stabilize the clone twice, but on both occasions we observed loss of the clone as described above (suggesting also that the fitness cost experiment is reproducible). We then attempted to edit eHAP1 cells, a haploid cell line that in theory should allow us to stabilize edited clones if the disruption caused by the integration is not deleterious in the homozygotic state. On the other hand, if homozygosity has a significant deleterious cost, this would result in failure to expand the edited clone. After screening 92 potential clones, we obtained nine candidates, none of which survived upon expansion (see also SI Appendix). This prevented us from obtaining a sufficient quantity of the edited clone to perform direct, in-depth transcriptomic and proteomic analyses of RASGRF2-int. Although multiple mechanisms have been described for disruption of nearby genes by intronic integrations for other classes of human transposable elements (32), only a few studies have suggested a potential importance of intronic antisense HERVs (13), while the role of polymorphic intronic HERVs is largely unknown. To indirectly explore potential mechanisms of transcriptional modulation that could be involved in the observed in vivo and in vitro phenotypic changes, we then studied intronic transcriptional dynamics of HK2 integrations in a cell line known to highly express HK2.
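
The clonal-survival comparison reported earlier in this section (3 of 48 wells for the edited clone versus 13 of 48 for the wild type) can be checked with an exact binomial tail probability. The sketch below is illustrative only: the text does not specify how the binomial test was set up, so it assumes that the wild-type survival fraction (13/48) serves as the null probability and that the test is one-sided (lower tail). The helper name `binom_lower_tail` is ours.

```python
import math

def binom_lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

wells = 48
wild_type_survivors = 13  # wells surviving clonal selection, wild-type HEK293
edited_survivors = 3      # wells surviving clonal selection, edited clone

# Assumption: the wild-type survival fraction is treated as the null probability.
p_null = wild_type_survivors / wells
p_value = binom_lower_tail(edited_survivors, wells, p_null)
print(f"one-sided binomial P = {p_value:.2g}")
```

With this setup the tail probability falls below the reported threshold of 0.002. A two-sample test such as Fisher's exact test, which also accounts for sampling error in the wild-type count, would be a more conservative alternative.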