Overview of data collection

The methods used are depicted graphically in Supplementary Fig. 3. Tissue samples from 360 individuals were collected from 39 localities across Tasmania between 1999 and 2014 (Fig. 1; Supplementary Data 1). IRB approval was obtained for tissue collection (Washington State University Institutional Animal Care and Use Committee protocol ASAF #04392). From 36 sites, one or two samples were collected per locality. A total of 294 samples were collected from three localities—Freycinet (92 samples), Narawntapu (80 samples) and West Pencil Pine (122 samples; Supplementary Fig. 4). These samples were chosen to get a minimum of 20 samples per time point of our analyses, a standard approach for population genetics analyses. We refer to these three sites as the ‘focal populations.’ The Freycinet site is a 160 km2 area incorporating the Freycinet Peninsula, which is on the east coast of Tasmania. DFTD was first detected in Freycinet in 2001. The Narawntapu site is Narawntapu National Park, which is in north-central Tasmania. DFTD was first detected in Narawntapu in 2007. The West Pencil Pine site is private timber land in northwestern Tasmania. DFTD was first detected at West Pencil Pine in 2006, but has impacted populations more slowly than at other sites, probably due to initial infection with a tetraploid cancer strain that was later replaced by a diploid strain in 2011 (ref. 6). Because samples could be separated into pre-DFTD arrival and post-DFTD arrival in each population, no randomization or blinding among treatments was necessary. For each sample, a single-digest RAD library was prepared with the restriction enzyme pstI using standard methods11, sequenced on an Illumina HiSeq 2500 with either paired-end or single-end 150 bp reads, and aligned to the reference genome24 after quality filtering and removal of PCR duplicates (Supplementary Fig. 3).

We genotyped all samples using the Stacks25 pipeline. After genotyping, we filtered out X chromosome SNPs, potential confounded paralogues (based on heterozygosity), SNPs with mean allele frequency (MAF) <0.01 (across all 360 samples) and SNPs that were genotyped in less than one-third of the total samples. Subsequent analyses focused on the focal populations and include some additional filtering. The following four sections provide additional details on the workflow, including specific program settings, and readers uninterested in this level of detail can skip to ‘Allele frequency changes.’

Details of sequencing

For 72 samples (one or two from each of the 39 sampling localities), three lanes of paired-end 150 bp reads were generated from individually barcoded libraries using a HiSeq 2500. The remaining 288 samples were individually barcoded, multiplexed in pools of 96 samples, and for each pool, six lanes of single-end 150 bp reads were obtained. For two of the pools, the RAD library prep was performed with both 12-PCR cycles and 14-PCR cycles; for these two pools, three lanes of sequencing were obtained for the 12-cycle library and three lanes for the 14-cycle library. All but six of the samples from the focal populations were sequenced as part of the 288-sample single-end-read sequencing run.

From the 72-sample run, we obtained 456,639,846 read pairs; after de-multiplexing, quality filtering, PCR-duplicate removal and removal of low-quality alignments, we had 30,290–3.6 million reads per sample. For the other 288 samples, we obtained a total of 2,649,033,674 reads, or 60,000–14.7 million reads per sample (mean=6.1 million; Supplementary Data 1) after de-multiplexing, quality filtering and removal of low-quality alignments. Quality filtering removed 20% of reads, PCR duplicate removal removed 50% of the remaining reads (among just the 72 samples with paired-end data) and 20% of the reads were removed due to low mapping quality (MAPQ<40). We did not explicitly remove any individuals due to low coverage. Mean coverage of RAD loci found in at least one-third of the samples was ∼6 × across all individuals and ∼9 × when including only the individuals genotyped at each locus (Supplementary Fig. 5). (See below for details of filtering and alignment.) For all analyses, we used the Murchison et al.24 genome assembly as the reference genome24, and the Ensembl Devil 7.0 annotation available from Biomart26.

Details of data processing and genotyping

We processed the sequencing data using the Stacks (v1.20) pipeline. For those samples with data from both 12-PCR-cycle and 14-PCR-cycle library preps, the data from different PCR cycles were processed separately until calling SNPs. We first de-multiplexed the reads and removed poor quality reads with process_radtags from the Stacks pipeline (using the -q option; we also ‘rescued’ RAD-tags and barcodes with the -r option; all other settings were left at the defaults). Then for the samples with paired-end data, we removed PCR duplicates with clone_filter; this is not possible with single-end read data. Reads were aligned to the reference genome using bowtie2 (ref. 27) with the following options: --sensitive, --end-to-end, -X 900. We processed the resulting alignment files with samtools28. We then used custom python scripts to remove reads with a mapping quality <40, and to keep only the first read from the paired-end data for consistency among samples. We then called SNPs and produced Plink28 format output files using the standard Stacks reference-aligned pipeline: pstacks, cstacks, sstacks and populations. The default settings were used for Stacks except that the minimum stack depth per individual (-m) was set to 3, the bounded error model was used with an upper bound of 0.1 (--bound_high 0.1), the locus catalogue was created by matching to genomic position (-g), and we initially used any locus that was present in at least 1% of the individuals (-r 1 in the populations program). Owing to computational limits, we were not able to run the rxstacks pipeline for population-based correction of SNPs.

Details of filtering

For all further analyses, we filtered out SNPs that were on scaffolds assigned to the X chromosome, SNPs on RAD loci where any SNP had an observed heterozygosity >0.5 across all the samples (to eliminate confounded paralogues), SNPs present in less than one-third of the samples (<120 samples) and SNPs with an MAF<0.01 across all the samples; across the entire set of 360 individuals, this resulted in 111,659 SNPs. The intention of the MAF filter was to remove SNPs that were the result of sequencing errors; thresholds set to less than 0.01 had a very marked excess of rare variants. After applying these filters, we conducted further analyses on just the samples from the focal populations, and for most analyses removed SNPs that were genotyped in less than one-third of the individuals in a particular population. With these filters, median proportions of SNP loci genotyped per individual were 69% in Freycinet, 56% in Narawntapu and 64% in West Pencil Pine. Within each population, we imposed additional filters for certain analyses.

Details of phasing

For analyses requiring phased data, we used fastphase (v1.4.0; ref. 29). We ran fastphase on each of the focal populations separately, with 20 random starts (-T20). Each chromosome was run separately, and scaffolds were concatenated in order according to Murchison et al.24. We only included SNPs that were genotyped in at least one-third of the samples in the target population, we did not impute missing genotypes (-g option), and we chose to minimize switching error.

Allele frequency changes

To identify SNPs and associated genes that had extreme allele frequency changes in response to DFTD, we focused on samples collected before or during the first year that cancer was detected in a population and samples from the most recent collection period. For Freycinet, we tested for allele frequency differences between 1999 and the combination of 2012 and 2013; for Narawntapu, the difference between the combination of 1999 and 2004, and 2009; and for West Pencil Pine, 2006 and the combination of 2013 and 2014. Time points were combined to increase the sample size, which allowed more accurate allele frequency estimates and increased the number of SNPs. Within each population separately, we imposed additional filters for the allele frequency change analysis: each SNP had to be genotyped in at least one-third of the samples at the beginning time point and one-third of the samples at the end time point. The exceptions to this criterion were the end time point in Freycinet and the beginning time point in West Pencil Pine, for which we required SNPs to be genotyped in at least one-half of the individuals (due to small sample sizes; Supplementary Table 2). We then pruned SNPs based on linkage disequilibrium: for every pair of SNPs within 20 SNPs and 50 kb on the same scaffold, we removed one SNP if the R2 value >0.99. R2 was calculated on unphased data by using the method implemented in Plink30, assuming that runs of Ns within each scaffold were accurate estimates of the gap size between contigs. This resulted in 19,639 SNPs in Freycinet, 39,378 SNPs in Narawntapu and 5,107 SNPs in West Pencil Pine.

For each population, we ranked the SNPs by allele frequency change, and identified the regions and annotated protein coding genes within 100 kb of the top 2.5% of SNPs using bedtools31. Those genes identified in all three populations (though not necessarily by the same SNPs in every population) were considered candidate regions based on strong LD at this genomic scale. Although it is noted that loci correlated with disease resistance could evolve independently in single populations, we restricted our candidate loci to those that evolved in all the three populations. Previous genetic studies have shown that K=2 island-wide (that is, there are two genetic clusters of devils across Tasmania9) and that our three focal sites are all part of the same genetic cluster, so that these three populations may share standing genetic variation for resistance that was present before DFTD. Focusing only on loci showing a signature in all the three populations minimizes issues with false-positive signatures of selection within any single population, and is thus a conservative approach. Some populations contained more than one SNP within a single candidate region; the two candidate regions contained nine unique SNPs in the top 2.5% of the distribution in at least one population (Supplementary Table 2). We repeated the analysis using the mean allele frequency change across 200 kb sliding windows (50 kb step) but did not identify any additional candidate regions across the devil genome.

Time series selection estimates

In addition to searching for SNPs with large allele frequency changes after DFTD introduction, we also used the method of Mathieson and McVean20 to estimate the strength of selection on each SNP within each population by testing for changes in allele frequency over time. We applied this method to SNPs genotyped in at least one-third of the target population (Freycinet: 88,676 SNPs; Narawntapu: 83,826; West Pencil Pine: 94,003), assumed a generation time of 2 years, and assumed an effective population size of 34 for Freycinet, 37 for Narawntapu and 26 for West Pencil Pine. For Freycinet and West Pencil Pine, all time points were used, but for Narawntapu, only 2004 and 2009, which bracket the first detection of DFTD, were used. Effective population sizes were estimated using NeEstimator32 (v2.01) with the Jorde and Ryman two-sample temporal method33 to calculate N e . The 95% confidence limits for N e obtained by jacknifing over loci were 32.3–35.8 (Freycinet), 35.5–38.9 (Narawntapu) and 23.7–28.2 (West Pencil Pine).

Linkage disequilibrium decay

To determine the rate of LD decay in the Tasmanian devil genome, we calculated the correlation between genotypes at pairs of sites using Plink30. This method produces results similar to the standard R2 measure of linkage disequilibrium, but does not require phased data. Only SNPs with MAF>0.05 in the target population were used. In every population, mean linkage disequilibrium persisted substantially above the background level to at least 100 kb (Supplementary Fig. 1).

To detect potential selective sweeps that occurred during the onset of infection in the three focal populations, we calculated the Rsb statistic12 on phased data using the R package rehh34. In addition to the filters we imposed before phasing, we also excluded scaffolds with <10 SNPs, we only used SNPs genotyped in at least one-third of individuals in the target population and time point, and only haplotypes with at least a 30% genotyping rate were retained. First, for each population separately, the extended haplotype homozygosity statistic was calculated for each SNP, and integrated over genomic distance to obtain an integrated EHH (iES). Then, the natural log of the ratio of iES in the pre-infection time points to iES in the post-infection time points was calculated and standardized by subtracting the median and dividing by the standard deviation. Rsb scores greater than zero indicate that the extent of haplotype homozygosity increased after the introduction of DFTD—an indication of a selective sweep. Rsb scores were only calculated for SNPs with MAF≥0.05 both before and after infection.

Composite test statistic

The composite test statistic, adapted from Grossman et al.13 was calculated by dividing the genome into 100 kb non-overlapping windows and selecting the SNP with the maximum quantile for Rsb and allele frequency change (separately for each population and statistic). Then, we raised the maximum quantile value to the power of the number of SNPs in the window and subtracted from one to get an adjusted P value for the window for each statistic:

where p i is the adjusted P value for statistic/population combination i, and s is the number of SNPs in the window with non-missing values for i. This is equivalent to the probability of getting a value this extreme in a sample of s SNPs, assuming the SNPs are independent. Because SNPs are likely not independent within each window, this adjustment is conservative. The composite score was calculated using the formula for Fisher’s method for combining P values:

where n is the number of population/statistic combinations with values for the window. We then calculated a combined P value for each window by comparing the composite score to a chi-squared distribution with degrees of freedom equal to twice the number of statistics.

Data availability

The sequence data has been deposited at NCBI under BioProject PRJNA306495 (http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA306495) and BioSamples SAMN05250006-05250365. The genotype data has been deposited at Dryad under doi:10.5061/dryad.r60sv. Any other relevant data is contained within the Article and its Supplementary Files or is available from the authors upon request.