Significance Copy number variation describes the degree to which contiguous genomic regions differ in their number of copies among individuals. Copy number variable regions can drive ecological adaptation, particularly when they contain genes. Here, we compare differences in gene copy numbers among 17 polar bear and 9 brown bear individuals to evaluate the impact of copy number variation on polar bear evolution. Polar bears and brown bears are ideal species for such an analysis as they are closely related, yet ecologically distinct. Our analysis identified variation in copy number for genes linked to dietary and ecological requirements of the bear species. These results suggest that genic copy number variation has played an important role in polar bear adaptation to the Arctic.

Abstract Polar bear (Ursus maritimus) and brown bear (Ursus arctos) are recently diverged species that inhabit vastly differing habitats. Thus, analysis of the polar bear and brown bear genomes represents a unique opportunity to investigate the evolutionary mechanisms and genetic underpinnings of rapid ecological adaptation in mammals. Copy number (CN) differences in genomic regions between closely related species can underlie adaptive phenotypes and this form of genetic variation has not been explored in the context of polar bear evolution. Here, we analyzed the CN profiles of 17 polar bears, 9 brown bears, and 2 black bears (Ursus americanus). We identified an average of 318 genes per individual that showed evidence of CN variation (CNV). Nearly 200 genes displayed species-specific CN differences between polar bear and brown bear species. Principal component analysis of gene CN provides strong evidence that CNV evolved rapidly in the polar bear lineage and mainly resulted in CN loss. Olfactory receptors composed 47% of CN differentiated genes, with the majority of these genes being at lower CN in the polar bear. Additionally, we found significantly fewer copies of several genes involved in fatty acid metabolism as well as AMY1B, the salivary amylase-encoding gene in the polar bear. These results suggest that natural selection shaped patterns of CNV in response to the transition from an omnivorous to primarily carnivorous diet during polar bear evolution. Our analyses of CNV shed light on the genomic underpinnings of ecological adaptation during polar bear evolution.

Brown bear (Ursus arctos) and polar bear (Ursus maritimus) diverged less than 500 kya (1⇓⇓–4). Despite this recent split, the polar bear has rapidly evolved unique morphological, physiological, and behavioral characteristics in response to the polar climate and ecology. The most obvious adaptation is the lack of fur pigmentation, which aids in camouflage (5). Additionally, polar bear claws are shorter, sharper, and more curved than those of the brown bear, adaptations that at once facilitate locomotion on icy surfaces and are better suited for grabbing and securely holding prey (5). Polar bears also have shortened tails and smaller ears to reduce heat loss, specialized front paws for swimming, and greater adipose deposits under the skin for thermal regulation (5⇓–7). Polar bears and brown bears also have drastically different diets owing to their differing ecologies. The polar bear has a lipid-rich diet consisting almost exclusively of marine mammals, while the brown bear is an omnivore with the vast majority of its diet consisting of plant material (5, 8⇓–10). The hypercarnivorous polar bear diet has resulted in several craniodental adaptations including a sharpening of the molars and a gap between canines and molars that allows for deeper canine penetration in prey (5, 9, 11).

Population genomic studies have identified genomic signatures of recent positive selection in polar bears suggesting genetic underpinnings to some adaptive phenotypes (3, 4). Miller et al. (4) sequenced the genomes of >20 polar and brown bear individuals, identifying hundreds of fixed missense substitutions in polar bears, as well as >1,000 genomic regions with highly divergent allele frequencies compared with brown bear populations. The functions of the genes identified in these analyses were involved in such processes as muscle formation, lactation, and fatty acid metabolism. Similar analyses performed by Liu et al. (3) on a different set of brown bear and polar bear individuals revealed functional associations of positively selected genes with adipose tissue development, fatty acid metabolism, heart function, and fur pigmentation. Together, these studies reveal a cohesive set of genes, pathways, and phenotypes that were likely shaped by natural selection during polar bear evolution.

These previous population genomic studies inferred recent positive selection solely from single-nucleotide polymorphisms (SNPs) and did not evaluate copy number variation (CNV) as an additional source of potential genomic divergence. CNV refers to differences in the number of repeats of a segment of DNA in the genome between individuals due to duplication, gain, or loss (12). CN variants have higher mutation rates than SNPs, and CNV loci together encompass an order of magnitude more nucleotides compared with SNPs (12, 13). CNV can influence phenotype, most commonly through changes in gene expression (14⇓–16). For example, AMY1 encodes a salivary enzyme that catalyzes the initial step of starch digestion, and in humans the CN of AMY1 corresponds to increased transcript and protein expression and is greater in populations with starch-rich diets (17, 18). Numerous other examples of population-differentiated CN profiles that may be targets of natural selection have been identified in mammals (14, 19⇓⇓⇓–23).

Here, we examine the extent of CNV in the polar bear and test the hypothesis that CN-variable genes reflect phenotypic adaptations. We generated whole-genome CN profiles for 17 polar bear and 9 brown bear individuals and identified genes with highly differentiated CN profiles between species that are indicative of recent positive selection. We observed a pervasive loss of genes involved in olfaction, as well as CN differences in genes underlying immune function, morphology, and diet. These results strongly suggest that gene CNV has contributed to polar bear adaptive evolution and illustrate the importance of including analysis of CN variants for comprehensive genomic scans of recent positive selection.

Materials and Methods Data Mining and Sequence Read Processing. We used the high-quality draft polar bear genome for our primary mapping reference (3). We downloaded the genome sequence, protein sequences, and gene annotations from the GigaDB (http://gigadb.org/dataset/100008). We also used the black bear genome as a mapping reference (24) to cross-validate our results and limit bias that could be introduced through the usage of a single reference genome. The reference black bear genome and gtf file was downloaded from ftp://ftp.jax.org/maine_blackbear_project/. We downloaded whole-genome paired-end Illumina data for 17 polar bears, 9 brown bears, and 2 black bears with reported coverage values ≥10× from the NCBI Sequence Read Archive (polar bear BioSample accession nos.: SAMN02261811, SAMN02261819, SAMN02261821, SAMN02261826, SAMN02261840, SAMN02261845, SAMN02261851, SAMN02261853, SAMN02261854, SAMN02261856, SAMN02261858, SAMN02261865, SAMN02261868, SAMN02261870, SAMN02261871, SAMN02261878, and SAMN02261880; brown bear BioSample accession nos.: SAMN02256313, SAMN02256315, SAMN02256316, SAMN02256317, SAMN02256318, SAMN02256319, SAMN02256320, SAMN02256321, and SAMN02256322; black bear BioSample accession nos.: SAMN01057691 and SAMN10023688) (2, 3, 24). We performed quality trimming for each sample using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Residual adapter sequences were removed from reads, and reads were trimmed such that they contained a minimum quality score of 30 at each nucleotide position. We discarded trimmed reads shorter than 50 nucleotides. Estimating CNV. Quality and adapter trimmed paired-end Illumina sequence reads were independently mapped against the reference polar bear and black bear genomes using the “sensitive” preset parameters in bowtie2 (85). SAM alignment files were converted into sorted BAM format using the view and sort functions in samtools (86). We used the read depth based approach implemented in Control-FREEC to estimate integer CNs for each 10-kb window with a 2-kb step size across the entire genome (25). CN variants were not predicted in reference scaffolds <10 kb. We used the following parameters in Control-FREEC: breakPointThreshold = 0.8, coefficientOfVariation = 0.062, minExpectedGC = 0.35, maxExpectedGC = 0.55, degree = 3, and telocentromeric = 0. From the Control-FREEC outputs and gene coordinate files, we used a custom Perl script to identify genes that were entirely overlapped by CN variants (script available at https://github.com/DaRinker/PolarBearCNV). Although infrequent, we observed some instances of genes with more than one CN estimate. These rare events were due to imperfect estimation of breakpoints given the resolution of our window size and sliding window, which led to overlapping boundaries between the end of one CNV and the beginning of the next CNV. In these instances, we used the average CN. Additionally, we employed an independent read depth-based approach (BDN) to estimate gene CN. This approach is similar to our previous work and the work of others (26, 27, 87, 88). For each sample, we extracted protein-coding gene coordinates from the polar bear reference gff file and calculated average coverage values for each gene using the samtools depth function. The median value of all gene average coverage values was used as a normalizing factor representing diploid CN of two. Gene CN was calculated as follows: G e n e c o p y n u m b e r = x ¯ g e n e d e p t h x ̃ d e p t h o f a l l g e n e s × 2. Identifying Genes Differentiated by CN between Polar Bear and Brown Bear. We calculated V ST to identify divergent CNV profiles between the polar bear and brown bear populations (32). V ST is a measurement specific for multiallelic genotype data such as microsatellites and CN variants and is analogous to F ST . Both V ST and F ST consider how genetic variation can partition groups (populations or closely related species) (32) and range from 0 (no differentiation between groups) to 1 (complete differentiation between groups). We calculated V ST as follows: V ST = ( V total − ( V polar bear × N polar bear + V brown bear × N brown bear ) ) / N total V total , where V total is total variance; V polar bear and V brown bear is the CN variance for the polar bear and brown bear populations, respectively; N polar bear and N brown bear is the sample size for the polar bear and brown bear populations, respectively; and N total is the total sample size. V ST was calculated for sliding windows of 10-kb genomic bins, with a 2-kb step size across the reference polar bear genome (SI Appendix, Fig. S2). We also independently calculated V ST across all genes using gene CN estimates obtained from Control-FREEC and our BDN method (SI Appendix, Fig. S4). Permutation Testing for V ST Differentiation. To determine which genes displayed the greatest degree of observed interspecific CN variation that was likely not due to sampling bias, we performed permutation tests on the CN counts. Here, we randomly permuted all brown and polar bear individuals and calculated a new V ST for every gene. This process was repeated 1,000 times creating a distribution of V ST values for each gene (R script available at https://github.com/DaRinker/PolarBearCNV). We then selected those genes whose observed V ST fell above the 95th and 99th percentile of the permuted V ST distribution. These genes displayed strong intraspecific CN homogeneity, while also showing high degrees of interspecific differentiation (SI Appendix, Fig. S6). Finally, we took the maximum permuted V ST observed in the 95th and 99th percentiles of all genes to establish a genome-wide standard cutoff (maximum 95th percentile: V ST > 0.22; maximum 99th percentile: V ST > 0.35) for all subsequent analysis. Gene V ST values were considered significant when observed V ST values were above the maximum 95% confidence interval cutoff in both gene CN estimate methods. Protein Domain Classification and Gene Enrichment. We used hmmscan to predict protein domains (33) and ShinyGO v0.50 to perform functional enrichment on the set of CN-variable genes with homology to the human genome (i.e., those genes with Ensembl gene identifiers) (89). PCA of Gene CN. We performed PCA on gene CN to evaluate whether overall CNV was sufficient to separate the polar bear, brown bear, and black bear species. The PCs were computed using the prcomp function in R using a data matrix containing the CNs of genes in all of the bear samples analyzed (90). CN counts were first scaled and centered before PC computation. Independent analyses were performed on the full set of genes with CN information (21,142 genes), as well as on subsets of genes thresholded at a minimal V ST of 0.22 (197 genes) or 0.35 (134 genes) (Fig. 3 and SI Appendix, Table S8).

Acknowledgments We thank Shiping Liu and Zijun Xiong for information regarding the polar bear reference genome, Elizabeth Weston for helpful comments on this manuscript, and two anonymous reviewers for constructive suggestions on an earlier version of this manuscript.

Footnotes Author contributions: D.C.R., N.K.S., S.Z., and J.G.G. designed research; D.C.R., N.K.S., S.Z., and J.G.G. performed research; D.C.R., N.K.S., S.Z., and J.G.G. analyzed data; and D.C.R., N.K.S., and J.G.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1901093116/-/DCSupplemental.