Etter et al., 2011. Etter P.D., Preston J.L., Bassham S., Cresko W.A., Johnson E.A. Local de novo assembly of RAD paired-end contigs using short sequencing reads.

Catchen et al., 2011. Catchen J.M., Amores A., Hohenlohe P., Cresko W., Postlethwait J.H., De Koning D.-J. Stacks: building and genotyping loci de novo from short-read sequences.

Broman et al., 2003. Broman K.W., Wu H., Sen S., Churchill G.A. R/qtl: QTL mapping in experimental crosses.

Altschul et al., 1990. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool.

Krzywinski, 2009. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S.J., Marra M.A. Circos: an information aesthetic for comparative genomics.

For RAD-sequencing, DNA was isolated from fins of the 188 F2 fish as well as from the P0 surface female and Pachón male using the DNeasy Blood & Tissue Kit (QIAGEN). Library preparation was carried out by Floragenex (Eugene, Oregon, USA) following the protocol of Etter et al. (2011). Briefly, genomic DNA was digested with SbfI (New England Biolabs) and libraries from individual F2 fish were barcoded. After random shearing with a Bioruptor (Diagenode), DNA fragments of 250 bp to 500 bp were isolated and RAD fragment libraries were sequenced on an Illumina HiSeq 2000 using single-end 100 bp chemistry. Raw sequence files have been deposited in the European Nucleotide Archive (ENA: PRJEB26692).
FASTQ sequence data were demultiplexed and trimmed to 91 bp. The process_radtags function of Stacks (v.1.44; Catchen et al., 2011) was used to remove poor quality reads. The remaining reads (2.6 million/sample on average) were processed with Stacks to identify single nucleotide polymorphisms (SNPs) and to genotype F2 fish at these SNPs, essentially as described in the Stacks de novo pipeline documentation. Stack formation used default parameters except for requiring a minimum depth of 3 and enabling repetitive stack removal (ustacks -r -m 3). The Stacks catalog was built using the P0 fish, allowing 2 mismatches (cstacks -n 2). Each fish was then matched to this catalog with sstacks.
For QTL analysis, 176 F2 fish with phenotype data were used. The Stacks MySQL database interface was used to filter tags, retaining only those where the P0 fish had different, homozygous alleles and at least 150/176 F2 fish were genotyped. Tags were also filtered on log likelihood (lnl > −10). The genotypes for the 6,845 resulting markers were formatted along with phenotype values for import into R/qtl (Broman et al., 2003) for QTL analysis. We next excluded markers with distorted segregation patterns (p value < 0.05/6,845, i.e., a Bonferroni-corrected threshold). Genotypes for the remaining 5,634 markers were found in the expected 1:2:1 ratio (AA: 26.3%, AB: 49.2%, BB: 24.4%).
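The segregation-distortion filter can be illustrated with a minimal sketch. It assumes a chi-square goodness-of-fit test against the 1:2:1 F2 expectation with 2 degrees of freedom; the text does not state which test was used, and the function names and example counts below are hypothetical. R/qtl's geno.table function reports a comparable p-value, but whether it was used here is not stated.

```python
import math

N_MARKERS = 6845            # markers before the distortion filter (from the text)
ALPHA = 0.05 / N_MARKERS    # per-marker threshold quoted in the text


def segregation_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit p-value against a 1:2:1 ratio (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0  # no genotypes called, so no evidence of distortion
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return math.exp(-chi2 / 2.0)


def is_distorted(n_aa, n_ab, n_bb, alpha=ALPHA):
    """True if the marker would be excluded at the given threshold."""
    return segregation_p_value(n_aa, n_ab, n_bb) < alpha


if __name__ == "__main__":
    # Hypothetical genotype counts for two markers scored in 176 F2 fish.
    print(is_distorted(50, 88, 38))   # mild deviation
    print(is_distorted(100, 60, 16))  # strong excess of AA homozygotes
```

Because the threshold is divided by the number of markers tested, only markers with a very strong departure from 1:2:1 are removed; mild deviations of the kind expected from sampling noise in 176 fish are retained.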
Linkage groups (LG) were formed with maximum recombination fraction (RF) = 25% and minimum LOD = 6.9. Three LG with only 2 markers each were removed. After rippling and manual rearrangement to maximize the order LOD score and minimize map length, one LG was split into two because it consisted of two distinct blocks with high RF and low LOD between them. The final set of 25 LG was scanned for markers linked with regeneration category or with percent open compact ventricular wall (open V / size V). The genome-wide LOD significance threshold was set at the 95th percentile of 1,000 permutations.
Markers on LG with LOD peaks (LG 1, 9, 10) were aligned to the cavefish genome using BLASTN (Altschul et al., 1990). Single best hits were retained if they aligned over the full length of the read with > 95% nucleotide identity. Unaligned markers (∼10%) were dropped from these 3 LG. The remaining markers were then rearranged where necessary to preserve the mapped contig order, even at the cost of reducing overall LOD and increasing map length. Five markers from LG1 mapped to the same contig as 11 LG9 markers, so these 5 were moved to LG9. QTL scans were repeated with this post-BLAST arrangement and identified the same high LOD regions.
For the Circos (Krzywinski, 2009) plots, LG were scaled to make LG1 roughly the same size as the largest contig shown. LOD score tracks show values from the QTL scans using the post-BLAST genetic map. Marker positions within and flanking the high LOD regions were linked to the midpoint of their aligned position on the cavefish contigs. For expression log2 fold change (logFC) tracks, transcripts overlapping the linked contig regions were identified in our RNA-seq data. LogFC values are shown at the midpoint of the corresponding gene.
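The BLAST hit filter and the contig midpoint used for the Circos links can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes BLASTN was run with the standard 12-column tabular output (-outfmt 6), takes the "single best hit" to be the highest bit score per marker, and treats "full length" as query coordinates 1 to 91. The function name and the example records are hypothetical.

```python
READ_LENGTH = 91      # reads were trimmed to 91 bp (from the text)
MIN_IDENTITY = 95.0   # percent identity cutoff (from the text)


def best_full_length_hits(blast_lines, read_length=READ_LENGTH,
                          min_identity=MIN_IDENTITY):
    """Return {marker: (contig, midpoint)} for markers whose best hit passes.

    Assumes tabular columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        hit = {
            "contig": f[1],
            "pident": float(f[2]),
            "qstart": int(f[6]), "qend": int(f[7]),
            "sstart": int(f[8]), "send": int(f[9]),
            "bitscore": float(f[11]),
        }
        # Keep only the highest-scoring hit per marker (assumed definition of "best").
        if f[0] not in best or hit["bitscore"] > best[f[0]]["bitscore"]:
            best[f[0]] = hit

    kept = {}
    for marker, hit in best.items():
        full_length = (min(hit["qstart"], hit["qend"]) == 1
                       and max(hit["qstart"], hit["qend"]) == read_length)
        if full_length and hit["pident"] > min_identity:
            # Midpoint of the aligned interval on the contig, as used for plot links.
            kept[marker] = (hit["contig"], (hit["sstart"] + hit["send"]) / 2)
    return kept


if __name__ == "__main__":
    # Hypothetical records: m1 passes, m2 is a partial alignment, m3 is below 95%.
    example = [
        "m1\tctgA\t98.90\t91\t1\t0\t1\t91\t1000\t1090\t1e-40\t160.0",
        "m1\tctgB\t96.00\t91\t4\t0\t1\t91\t500\t590\t1e-30\t140.0",
        "m2\tctgA\t100.0\t60\t0\t0\t1\t60\t2000\t2059\t1e-25\t110.0",
        "m3\tctgC\t94.50\t91\t5\t0\t1\t91\t300\t210\t1e-20\t120.0",
    ]
    print(best_full_length_hits(example))
```

The filter is applied to the best hit only, so a marker whose top-scoring alignment is partial is dropped even if a lower-scoring full-length hit exists; this is one reading of "single best hits were retained if" and a different reading would change that step.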