Experimental design

Crosses were performed with winter-run steelhead from the Hood River, Oregon, which are listed as threatened under the Endangered Species Act32. All steelhead returning to spawning grounds in the Hood River were first passed over the Powerdale dam, which was a complete barrier to migrating fishes. Every fish passed over the dam was individually handled, and samples of scales and fin tissue were collected for aging and genetic analysis by staff of the Oregon Department of Fish and Wildlife. Also recorded were the length, weight, sex and run timing of every fish (none of these traits differed between hatchery and wild fish). First-generation hatchery fish have been released since 1992 as part of programme to increase the local abundance of the steelhead population11,15,33. Steelhead were easily categorized as hatchery or wild origin because all hatchery fish had their adipose fin clipped before being released as juveniles. To create first-generation hatchery fish, wild adult steelhead were collected at the Powerdale dam and spawned at Parkdale fish hatchery. The hatchery fish were reared in the hatchery environment for 1 year, after which they were released into the Hood River near the spawning sites of wild fish. Both the first-generation hatchery fish and wild-born steelhead subsequently migrated downriver to the ocean, where they matured, on average, for 3 additional years before returning to the Hood River to spawn (or be collected for the crosses used in this experiment; see Supplementary Fig. 1). The first-generation hatchery parents used in this study came from broodyears 2006 and 2007. In broodyear 2006 a total of 17,061 fish were released and in broodyear 2007 a total of 26,094 fish were released. The dimension of the final rearing tank was 6,500 cubic feet, which translates to 2.62 and 3.85 fish per cubic foot of water in 2006 and 2007, respectively (or 0.093 and 0.14 fish per liter, respectively).

In 2010, upon returning to the Hood River as adults, both wild-born and first-generation hatchery fish were collected and crosses were performed in the hatchery. We created twelve 2 × 2 matrices by crossing a hatchery (H) and a wild (W) male factorially with a hatchery and a wild female (Fig. 1a; Supplementary Table 1). We initially sequenced two male and two female offspring per family from only the H × H and W × W families from six matrices (that is, six independent families of each type, n=48). There were very few DE genes between male and female offspring (sex identified via PCR of a sex-specific marker OmyY1 ref. 34; Supplementary Fig. 2), so for subsequent sequencing we used only female offspring. To test for maternal effects, we next sequenced two females per family from all four cross types (W × W, W × H, H × W, H × H) from an additional five matrices. We also added one more offspring each from an additional W × W and H × H cross for a total of 90 individuals sequenced (Supplementary Table 1). The offspring of each cross (that is, each family) were reared under identical conditions such that the only difference between the offspring was their parentage (that is, whether their parents were born in the hatchery or the wild; see Supplementary Fig. 1). All offspring were reared until the swim-up fry stage at which point the fry were frozen in liquid nitrogen and transferred to a −80 °C freezer for storage. All fry were collected on the same day and at the same time to minimize changes in gene expression due to development or circadian rhythms35. No obvious circadian genes were identified among the DE genes (Supplementary Data 1).

RNA-seq

Total RNA was isolated from frozen fry using a modified trizol/chloroform protocol described previously36. RNA was extracted using Trizol Reagent (Invitrogen). Total RNA was treated for 10 min at 65 °C with RNAsecure reagent (Ambion) and for 10 min at 37 °C with RNase-free TURBO DNase (Ambion). Total RNA was further purified using RNAeasy Mini Kit (Qiagen) according to the manufacturer’s protocol. Concentration, integrity and extent of contamination by ribosomal RNA were assessed using Qubit Fluorometer (Life Technologies) and Bioanalyzer 2100 (Agilent Technologies). Preparation of cDNA for Illumina Genome Analyzer 2500 was completed following the TruSeq RNA Sample Preparation Kit v2 protocol (Illumina). Each individual had a unique barcode and barcodes were assigned evenly across sample type. Individuals were randomly allocated across lanes, and each lane had 10–12 individuals. This procedure resulted in an average of 20 million 100-bp single-end reads for each individual, with a total of 1.8 billion reads across the entire study.

Main effects

We first processed the reads by replacing any nucleotides with a quality score of 20 or lower with an ‘N’. We next used Bowtie2 (ref. 37) using the ‘very-sensitive’ option to align all reads to the population-specific transcriptome38. Because there are a large number of homeologs within the O. mykiss genome24, we used strict criteria for read alignment where each read was only allowed to match to a single contig and reads that matched at multiple locations were discarded. After filtering and alignment, we were left with 956 million reads. For each SAM file, we counted the number of alignments to each contig for each individual. We next used the program edgeR39 to test for differential gene expression between H × H and W × W individuals. For each gene, we required all individuals to have minimum of 10 counts per million (c.p.m.) resulting in a minimum of 100 counts per gene per individual and average number of 200 counts per gene per individual given our average coverage of 20 million reads (filtering at a less stringent threshold of 1 c.p.m. gave almost identical results, with only 9 different genes changing in statistical significance). There were a considerable number of DE genes identified between each H × H and each W × W family (that is, family-level effects), thus we used a generalized linear model to control for differences between families40. All genes with an FDR-adjusted P value≤0.05 (as implemented in edgeR) were counted as DE. We calculated the number of DE genes using all H × H and W × W offspring (Fig. 1) and after accounting for siblings. To account for siblings, we: (i) randomly sampled one offspring per family, (ii) calculated the number of DE genes, (iii) repeated the process 100 times and calculated the mean and 95% confidence intervals (Supplementary Fig. 2).

To test whether differences in gene expression between the offspring of hatchery and the offspring of wild fish were substantially above the level expected between two groups of unrelated individuals having equivalent amounts of hatchery ancestry, we used the same approach outlined above to test for DE genes between H × W vs W × H individuals (Fig. 1c). Because there were fewer H × W and W × H individuals compared with H × H and W × W individuals, we only used H × H and W × W individuals from the same matrices for comparison. To account for siblings, we used the same random sampling process described above. Using even more stringent criteria for accepting DE genes by requiring a minimum fold change≥1.5 in addition to the FDR-adjusted P value≤0.05 resulted in 28% fewer DE genes overall, but the same qualitative patterns remained. When using these more stringent criteria, the mean fold-change values were substantially larger in the comparison of H × H and W × W offspring than for the comparisons between H × W and W × H offspring (P<0.009) in addition to the number of DE genes.

Testing for maternal effects

All counts were normalized across all genes and each DE gene was classified as upregulated in wild fish or upregulated in hatchery fish based upon the back-transformed values of the associated log fold change.

Gene enrichment

To identify DE genes, we annotated the entire steelhead transcriptome via blastx to the swiss-prot and uni-prot databases, resulting in the identification of 15,763 genes. All DE genes were found to have significant homology with known genes (largest E value=7 × 10−5), though the functions associated with many of these genes remain uncharacterized. The transcriptome sequence names associated with each of the DE genes and their corresponding E value obtained from the blastx search are available in Supplementary Data 2 (and the raw sequences are available in Supplementary Data 3). We next used blast2go to identify gene ontology terms associated with each annotated contig from the transcriptome41, where 41% of all contigs were determined to have significantly associated GO terms when using the annotated transcriptome as the reference. Lastly, we performed gene-enrichment analysis to identify GO terms that were over-represented with respect to the list of all GO terms associated with the DE genes42.

Code availability

The R Script used for performing tests of differential gene expression is available as Supplementary Software 1. Both R and edgeR are freely distributed at https://cran.r-project.org/ and https://www.bioconductor.org/, respectively.