Animals

This research used age-matched female C57BL/6, DBA/2J and BALB/cJ mice (N = 12 per strain). Animal behavior tests were performed on a single occasion. No specific statistical test was performed to determine sample size. Rather, consultation from bioinformaticians and other experts in the field were considered in order to arrive at a sample size that would allow for reasonable stratification of mice into unique cohorts, and subsequently allow for reasonable statistical measurement of gene transcriptional differences. Mice were introduced to behavior testing (described below) in random order. Sequencing analyses were performed in a single batch. Mice were purchased from Jackson Labs at 7 weeks and allowed to acclimate to the laboratory environment for 2 weeks prior to testing. At 9 weeks of age, they were exposed to the EPM; at 10 weeks, the black/white box; and at 11 weeks, the three-chamber social interaction experiment. Mice were allowed at least 20 min to acclimate to each testing environment, which was carefully engineered to be quiet and free of distraction. In all cases, where necessary and possible, the ‘operator’ was ‘blinded’ to the treatment conditions. The three experimental conditions are explained in greater detail below. All in vivo experiments were performed in compliance with IACUC protocol approved through Harvard Medical School and Brigham and Women’s Hospital, and in accordance with institutional guidelines, supervised on-site by veterinary staff.

Behavior tests

Elevated plus maze

This experimental model was used to assess emotional behavior and levels of general anxiety in mice. The EPM consists of two open and two closed arms extended out from a central platform. A camera, mounted from the ceiling, allowed the experimenters to observe murine behavior from outside the testing chamber. A computer-assisted video-tracking system (TopScan software, CleverSys Inc.) was used to record the number of open and closed arm entries as well as the total time spent in the different maze compartments over the course of 5 min. The percentage of time spent in open arms was used as a surrogate measure for anxiety-like state using the following formula: (open/open + closed) × 100. The EPM was performed under bright light. The start location of the EPM was on an open arm (around mid-way up the arm) with the mouse facing toward the center of the maze.

Light/dark box

Developed by Crawley and Goodwin45, the LD box highlights the ethological conflict between an animal’s desire to explore a new environment, and yet remain shielded from predators. Our apparatus consisted of two equally sized chambers, one dark and one brightly illuminated (700–800 lux), with a small opening connecting the two (Med Associates Dark Box Insert for Mouse Open Field Activity, Product Number: ENV-511). A mouse was allowed over 10 min to move freely between chambers and the number of entries into the bright chamber and the duration of time spent there was used as indices of bright-space anxiety in mice. Note: the experiment is initiated with the mouse in the light side of the box.

Three-chamber social interaction

A three-chambered rectangular apparatus (62 cm × 40 cm) made of clear plexiglass was used to evaluate social preference. Test mice were first placed in the middle chamber and allowed to explore all three chambers for 10 min. After this habituation period, the test mouse was confined to the center chamber while an unfamiliar female mouse of the same strain, that had no prior contact with the subject mouse, was confined to a random, counterbalanced side chamber in a small wire cup. The test mouse was then allowed to explore the entire test apparatus for a 10-min session. The amount of time spent in each chamber and the time spent in close proximity to either the empty cup or the cup with the stranger mouse, was scored by an automated video-tracking system (TopScan software, CleverSys Inc., Reston, VA). Decreased duration spent near the stranger mouse as compared to the empty cup was used as a measure of social anxiety.

The EPM is used to assess general anxiety levels in mice. The LD highlights the ethological conflict between an animal’s twin desires to explore new environments while also remaining shielded from predators. The SIT test evaluates social preference.

Stratification of isogenic mouse strains based on 12-parameter behavior model

Four parameters were defined from the EPM experiment: (1) the time in seconds spent in open arm of EPM (EPM Time in Open Arm); (2) the number of times a mouse entered the open arm of EPM (EPM Open Entry); (3) the number of times a mouse entered the closed arm of EPM (EPM Closed Entry); and (4) the number of times a mouse entered either arm of EPM (EPM Total Entry). Two parameters were defined from the black/white box include the percentage of total distance traveled spent in the lighted portion of the box (B/W Distance Traveled in Light) and the percentage of total time spent in the lighted portion of the box (B/W Time Traveled in Light). The six parameters from the social interaction (SI) experiment included: SI_3M (first 3 min in the box), SI_5M (first 5 min in the box), SI_10M (first 10 min in the box), SI_3M_Group (first 3 min in the group), SI_5M_Group (first 5 min in the group) and SI_10M_ Group (first 10 min in the group).

For each mouse subtype, we first analyzed the distributions of the values for each of the metrics integrated to identify those that manifest bimodal distributions (suggesting an ability to differentiate between mice with ‘anxiety’ behaviors and mice without). Next, each of the data points was centered and standardized to zero mean and unit variance, using a z-score transformation. For each of the mice subgroups, principal components were obtained. These represent the linear combinations of metrics that best describe the variability between the individual mice. We sought to find, across all metrics we considered, the strongest differentiators of phenotype, with equal weighting to all parameters. Moreover, to facilitate this, prior to implementing the principal components analysis, all data from all psychometric testing was standardized to mean 0 and variance 1 to ensure comparability at the same scales; it was after this analysis that the SIT emerged as a key differentiator in psychometric phenotype. That is, when performing analysis for PCA to identify mouse subgroups, we were careful to define principal components in an unbiased manner so that downstream analyses would not be biased by our group stratification. After ensuring that the first two principal components carried the majority of the variance between the individual mice with respect to these metrics, we were able to show a clustering into two distinct groups of mice. This was used to define the mice with behaviors more representative of “anxiety,” as opposed to those with behaviors less representative of “anxiety”. To ensure that the principal components we used for the separation into subgroups were in keeping with our initial hypotheses of how anxiety-like behaviors would manifest, we visualized the original variables in the principal component space. This facilitated an understanding of which results of psychometric testing were more discriminatory between the anxiety-like behaviors of the mice.

Neurosurgery

An Institutional Animal Care and Use Committee-approved rapid decapitation procedure was employed in the absence of neurological sedatives, carbon dioxide suffocation or anesthesia. A trained neurosurgeon isolated amygdala following a stereotaxic dissection protocol that harnessed murine brain geometry. Amygdala was defined according to a mouse stereotaxic atlas. A dissection microscope was employed for the dissection. Brains were gently removed from the skull and stripped of the meninges. Next, brains were submerged in RNAase-free phosphate-buffered saline and mounted on a vibratome. Cuts were made at bregma −1 mm and −2.75 mm. The resulting slice was then placed with the caudal aspect facing upward. Areas identified as amygdala were removed (1–3 mm lateral from the midline and 0–1 mm from basal to apical). Bilateral amygdalar Vibratome 1000-sliced biopsy tissue was homogenized, bathed in RNAase-free phosphate-buffered saline, and centrifuged. The precipitate was then extracted and snap frozen in liquid nitrogen. The tissue was placed in Trizol (Thermo Fisher). Ribonucleic acid was extracted using the phenol–chloroform phase separation and prepared for subsequent RNA Library preparation.

RNA Library preparation and sequencing

RNA libraries were prepared using Illumina TruSeq Stranded mRNA sample preparation kits from 500 ng of purified total RNA according to the manufacturer’s protocol. The resultant dsDNA libraries were quantified by Qubit fluorometer, Agilent TapeStation 2200, and RT-qPCR using the Kapa Biosystems library quantification kit according to manufacturer’s protocols. Uniquely indexed libraries were pooled in equimolar ratios and sequenced on a single Illumina NextSeq500 run with single-end 75 bp reads by the Dana-Farber Cancer Institute Molecular Biology Core Facilities.

RNA-Seq read mapping and expression level estimation

All samples were processed using an RNA-seq pipeline implemented in the bcbio-nextgen project (https://bcbio-nextgen.readthedocs.org/en/latest/). Raw reads were examined for quality issues using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to ensure library generation and sequencing data were suitable for further analysis. If necessary, adapter sequences, as well as other contaminant sequences such as polyA tails and low-quality sequences were trimmed from reads using cutadapt46. Trimmed reads were aligned to the UCSC build mm 10 of the mouse genome using STAR47. Quality of alignments was assessed by checking for evenness of coverage, rRNA content, genomic context of alignments, complexity and other quality checks. Expression quantification was performed with Salmon48 to identify transcript-level abundance estimates and then collapsed down to the gene-level using the R Bioconductor package tximport49.

Statistical analyses and classification of anxious mice

Statistical analysis was performed using Prism software (GraphPad) determined by two-tailed Student’s t-test used to identify statistical significance between individual groups with similar variance.

For initial exploratory analysis, filtering was applied to the full normalized counts matrix. Low expressers (bottom 25% based on average expression) were removed and of those only the most variable genes (the upper quartile based on coefficient of variation) were retained. PCA was used to identify sample AG5 as a clear outlier (Supplementary Fig. 2). This sample was removed and downstream analyses were performed on the remaining samples using all 47,729 genes in the original counts matrix.

To tie together the behavioral data to the molecular phenotypic variability data, correlations were computed for all psychometric test metrics against the first five principal components from the PCA post-outlier removal. The variables incorporated include: from the EPM, total entries and percentage of time spent on the open arms; from the black/white box experiment, percentage of time spent in the light and percentage of distance spent in the light; from the three-chamber social interaction experiment, percentage of time spent in the box beside the stranger mouse in the first 3, 5 and 10 min of the experiment. Based on a high correlation of PC2 with several of the metrics (LD and EPM), we used PC2 as a proxy for these metrics in our model fit. Rather than using PC2 as a continuous variable, we separated values in tertiles to classify groups of “less anxious”, ‘anxious’ and “more anxious”. Additionally, because we observed PC1 to explain 21% of the variance in the data, and observed slight negative correlations with the box:zone ratios, we included it as a covariate in the model to control for this effect.

Differentially expressed genes were identified using the LRT as part of the R Bioconductor package DESeq2 (ref. 50). Significant genes were obtained using an FDR threshold of 0.05. Genes were separated into clusters based on similar expression profiles across the defined anxiety groups.

IPA, cytoscape and cMap analysis

Differentially expressed genes were used to create an ‘anxiety’ gene signature of 202 well-characterized genes as described above. This was analyzed using core IPA to identify the involvement of biological pathways in an unbiased manner. Gene lists were matched to the IPA curated database of canonical signaling pathways and used to detect significantly represented biological pathways (p < 0.05). Next, the individual genes were matched to their ontological terms using IPA. The most over-represented ontologies were identified (behavior, emotion, learning, rearing, cognition) and protein network diagrams showing the genes for each of these ontological classes were created using Cytoscape. IPA was then used to build networks of known interactions between the protein products of the differentially expressed genes in the signature. IPA was allowed to impute highly connected protein nodes that connected directly to genes present in the gene signature in order to create a large protein network. The subsequent network was visualized in Cytoscape. Finally, we used the 202 gene ‘anxiety’ signature to query the latest version of the Connectivity Map (cMap) (https://clue.io/). Comparisons were carried out according to refs. 51,52 and the top 15 perturbagens with the most matching profiles were selected.