Sequencing, mapping, SNP and InDel calling

Sequencing of three female Berkshire pigs generated a total of 36.65 Gb of paired-end DNA sequence, of which 36.29 Gb (99.02%) high quality paired-end reads were mapped to the pig reference genome assembly (Sscrofa10.2) (Supplementary Table S1). Consequently, for each individual, ~78.75% of reads mapped to 79.39% of the reference genome assembly with 3.64-fold average depth (Supplementary Table S1). In addition, we also downloaded the genome data of 38 individuals from across the world from the EMBL-EBI database1,11, including 14 European domestic pigs from five breeds, 6 Asian domestic pigs from three breeds in China, 7 Asian wild boars from four locations, 6 European wild boars from four locations, 4 other species in the genus Sus and an African warthog. The average depth for the compiled dataset is 6.46-fold, with average mapping rate of 95.17% and ~75.25% coverage of the reference genome assembly (Supplementary Table S2).

We performed single-nucleotide polymorphism (SNP) calling and identified 18.68 million (M) SNPs from 41 individuals (Supplementary Table S3). We then pooled the SNPs into three groups, including 5.25 M from the 23 domestic pigs, 5.47 M from the 13 wild boars and 15.73 M from the four wild genus Sus and an African warthog (Supplementary Table S3). A small portion of (2.48 M of 18.68 M, or 13.28%) SNPs was shared among the three groups, indicative of substantial genomic differences among them.

We identified 3.65 M SNPs from three Berkshire pigs, of which 21,905 coding SNPs leading to 7,773 nonsynonymous nucleotide substitutions (7,713 missense, 44 stop gain and 16 stop loss) were detected in 3,978 genes (Table 1 and Supplementary Data S1). Top 1,000 genes containing the highest number of nonsynonymous SNPs (nsSNPs) were mainly over-represented in olfactory-related categories, such as ‘olfactory transduction (52 genes, P = 1.68 × 10−10)’, ‘sensory perception of smell (53 genes, P = 8.70 × 10−9)’, ‘olfactory receptor activity (52 genes, P = 1.19 × 10−8)’ and ‘sensory perception (77 genes, P = 2.35 × 10−7)’ (Supplementary Data S2). The olfactory receptors, known to be involved in sensing of the extracellular environment, are encoded by the largest gene superfamily in the mammalian genome12,13. Pigs have one of the largest repertoire of functional olfactory receptor genes14, reflecting the strong reliance of pigs on their sense of smell while scavenging for food1 and other odor-driven behavior (particularly mate recognition and sexual receptive behavior)15,16.

Table 1 Summary and annotation of SNPs in Berkshire pigs Full size table

We also identified 2.93 M small insertion or deletion polymorphisms (InDels) ranging from 1–30 bp in length (Supplementary Table S4), which tend to be detected with greater frequency than long InDels. Only 2,991 (0.10%) InDels were located in coding sequences, of which 29.35% were multiples of 3 bp (Supplementary Fig. S1 and Data S3). The enrichment of in-frame InDels that are expected to preserve reading frame, can be explained by previous findings that in-frame InDels were under weaker negative selection than frame-shift InDels with lengths that are not evenly divided by three17,18. These InDels affected genes enriched mainly in terms related to basic cellular functions, such as the ‘binding of adenyl nucleotide, purine nucleoside, ATP, cation, ion, metal ion and nucleoside’ and ‘protein kinase activity’ (Supplementary Data S4), which is similar to previous reports in mammals about the effect of InDels on the functions of genes1,5,18,19.

Phylogenetic and admixture analysis

To explore relatedness among the Berkshire pig and other pigs distributed worldwide, we conducted principle component analysis (PCA) using genomic SNPs. The first eigenvector geographically distinguishes 23 individuals in Europe from 17 individuals in Asian and a warthog in Africa, whereas the second eigenvector captures the biological differentiation between pigs (including domestic and wild boar) and other outgroup (i.e., wild genus Sus and warthog) (Fig. 1a and Supplementary Table S5). The neighbor-joining (NJ) tree confirmed these findings and further revealed genetically distinct clusters that relate to geographic locations rather than by domestic versus wild (Fig. 1b). This result is consistent with a deep phylogenetic split between European and Asian pigs since domestication about 10,000 years ago in multiple locations across Eurasia1,20.

Figure 1 Phylogenetic relationship and gene introgression. (a) Two-way PCA plot of pig breeds. The fraction of the variance explained is 33.56% for eigenvector 1 and 9.56% for eigenvector 2 with a Tracy-Widom P value < 10−6 (Supplementary Table S5). (b) NJ phylogenetic tree of pig breeds. The scale bar represents p distance. (c) Four-taxon ABBA/BABA test of introgression. First panel from the left: ABBA and BABA nucleotide sites employed in the test are derived (- - B -) in Chinese domestic pigs compared with the warthog outgroup (- - - A), but differ among Berkshire and other 5 European domestic pigs (either ABBA or BABA). As this almost exclusively restricts attention to sites polymorphic in the ancestor of Chinese domestic pigs, Berkshire and other 5 European domestic pigs, equal numbers of ABBA and BABA sites are expected under a null hypothesis of no introgression, as depicted in the two gene genealogies. Second to last panel from the left: Distribution among chromosomes of D-statistic (± s.e.), which measures excess of ABBA sites over BABA sites, here for the comparison: Other 5 European domestic pigs (i.e. Duroc, Landrace, Pietrain, Large white and Hampshire), Berkshire, Chinese domestic pigs, African warthog. Full size image

It is well documented that a clear signal for admixture between domestic pigs in Asia and Europe1,9,21 is likely due to the importation of Chinese breeds into Europe (especially UK) at the onset of the agricultural revolution in the late 18th and 19th century22. To investigate the amount genetic material of Chinese origin in the Berkshire pigs relative to that shared by other five reprehensive European domestic pigs, we performed an admixture analysis (D-statistics) using ‘ABBA/BABA’ single nucleotide sites, which was originally developed to test for admixture between Neanderthals and modern humans23,24. We divided the genome into N blocks and computed the variance of the statistics over the genome N times, leaving each block aside and derived a standard error using the theory of the Jackknife24. Given the standard error of the D-statistics of different block sizes were very similar, we used 2 Mb as the block size for further analyses (Supplementary Table S6).

The excess of ABBA sites (0 < D < 1) indicates that the Berkshire pig has a stronger signal of introgression from Chinese domestic pig than other 5 European domestic pigs (Fig. 1c). Especially, when compared with Duroc pig, the Berkshire pig exhibits an highest excess of ABBA sites across 18 autosomes, giving a significantly positive D of 0.337 ± 0.010 (two-tailed Z-test for D = 0, P = 1.680 × 10−243) (Fig. 1c). The higher amount of Chinese genetic material in the Berkshire pig is consistent its history of origin: in the county of Berkshire in England, a reddish or sandy colored pig strain (sometimes spotted) was latterly refined with a cross of Siamese and Chinese blood (~300 years ago), bringing the color pattern we see today along with more efficient meat production22. Currently, the purebred Berkshire is recorded as a ‘transboundary’ (occurring in more than one country) breed.

Genome-wide selective sweep signals

To accurately detect the genomic footprints left by selection, we measured the genome-wide variations between six European wild boars and three Berkshire pigs, which are geographically close and genetically indistinguishable.

Compared with the wild boars, the domestic Berkshire pigs have lower levels of linkage disequilibrium (LD) across the range of distances separating loci (P < 10−16, Mann-Whitney U test) (Fig. 2a), reflecting relatively higher inbreeding under artificial breeding programs and thus a lower genomic diversity in Berkshire pig.

Figure 2 Identification of genomic regions with strong selective sweep signals in Berkshire pigs. (a) LD patterns of Berkshire and European wild boars. (b) Distribution of log 2 (θ π ratio (θ π, wild boar /θ π, Berkshire )) and F ST , which are calculated in 100 kb windows sliding in 10 kb steps. Data points located to the right of the vertical lines (corresponding to 10% right tails of the empirical log 2 (θ π ratio) distribution, where log 2 (θ π ratio) is 3.14) and above the horizontal line (10% right tail of the empirical F ST distribution, where F ST is 0.71) were identified as selected regions for Berkshire pigs (red points). (c) Violin plot of θ π ratio and F ST values for regions of Berkshire pigs that have undergone positive selection versus the whole genome. Each “violin” with the width depicting a 90°-rotated kernel density trace and its reflection. Vertical black boxes denote the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively) and the white point inside denotes the median. Vertical black lines denote the lowest and highest values within 1.5 times IQR from the first and third quartiles, respectively. The statistical significance was calculated by the Mann-Whitney U test. Full size image

Out of 272,292 windows of 100 kb in length sliding in 10 kb steps across the pig genome, 210,266 windows contain ≥ 50 SNP and cover 77.22% of the genome (Supplementary Fig. S2), which were used to detect signatures of selective sweeps. We used an empirical procedure and selected windows simultaneously with significantly high log 2 (θ π ratio (θ π , wild boar /θ π , Berkshire )) (10% right tail, where log 2 (θ π ratio) is 3.14) and significantly high F ST values (10% right tail, where F ST is 0.71) of the empirical distribution as regions with strong selective sweep signals along the genome, which should harbor genes that underwent selective sweep. Consequently, we identified a total of 11.95 Mb genomic regions (4.75% of the genome, containing 482 genes) with strong selective sweep signals in Berkshire pigs (Fig. 2b), which also exhibited significant differences (P < 10−16, Mann-Whitney U test) in log 2 θ π ratio and F ST values when compared to genomic background (Fig. 2c). SNPs from these regions formed two distinct clusters (i.e. Berkshire pigs and European wild boars) (Supplementary Fig. S3).

In total, 482 genes embedded in selected regions were predominantly related to immune (such as ‘defense response to virus (4 genes, P = 0.001)’ and ‘Immunoglobulin’ (11 genes, P = 0.001)), growth (such as ‘regulation of growth (11 genes, P = 0.003)’), reproduction (such as ‘oocyte meiosis (6 genes, P = 0.004)’ and ‘reproductive developmental process (9 genes, P = 0.005)’) (Table 2). This result coincides with previous reports of pig domestic genes1,8,11,21,25 and may be responsible for dramatic phenotypic changes in domestic pigs that are of economic values, such as disease resistance, pork yield and fertility. In addition, we also identified genes related to neuron functions (such as ‘neurotrophin signaling pathway (8 genes, P = 0.003)’) that experienced selective sweep (Table 2), which support the hypothesis that selection for altered behavior (such as tameness or aggression towards humans) was important during pig domestication and that mutations affecting developmental genes may underlie these changes10,26,27. For example, one of the genes under selective sweep in Berkshire pig is transcription factor SOX6 (SRY (sex determining region Y)-box 6) (Supplementary Data S5), a modulator of cell fate during neocortex development28, which plays roles in brain development and related to the differences in the development or maturation of the frontal cortex in domesticated animals29.

Table 2 Top ten functional gene categories enriched for genes affected by domestication Full size table

Body length in domestic pigs

Notably, we detected numerous well-characterized genes related to body length embedded in selected regions (Fig. 3a), which is the most characteristic morphological change between the wild boar and domestic pig. Wild boars, which are ancestors of domestic pigs, have 19 vertebrae. In comparison, European commercial breeds have 21–23 vertebrae, probably owing to selective breeding for enlargement of body size30.

Figure 3 Genes related to body length with strong selective sweep signals in Berkshire pigs. (a) Log 2 (θ π ratio (θ π, wild boar /θ π, Berkshire )) and F ST values are plotted using a 10 kb sliding window for genes embedded in selected regions. Genomic regions located above the upper horizontal blue line (corresponding to a 10% significance level of F ST , where F ST = 0.71) and above the lower horizontal red line (a 10% significance level of θ π ratio, where log 2 (θ π ratio) = 3.14) were termed as regions with strong selective sweep signals (green regions). Genome annotations are shown at the bottom (black bar: coding sequences, blue bar: genes). The boundary of ten genes related to body length is marked in red. (b) NR6A1 gene with strong selective sweep signals. Out of 482 genes embedded in selected regions which crossed 1,144 windows of 100 kb in length sliding in 10 kb steps, only one gene (i.e. NR6A1) is embedded in the most significantly (1% right tail log 2 (θ π ratio and F ST values) selected regions (log 2 (θ π ratio) = 6.72; F ST = 0.91). Full size image

Eight genes exhibiting strong selective sweep signals are significantly over-represented in ‘OMIM-disease term: Many sequence variants affecting diversity of adult human height’ (P = 0.002)’ (Supplementary Data S5), which has been documented to associate significantly with adult human height31. For example, ADAMTSL3 (a disintegrin-like and metalloprotease domain with thrombospondin type I motifs-like 3), a glycoprotein in extracellular matrix, is associated with the chondrogenesis, morphogenesis and growth of the skeleton in human31,32,33 and other mammals (cattle)34, which is also an attractive candidate genetic marker to identify animal body size or type. GPR126 (G-protein coupled receptor 126), an orphan receptor of the adhesion-G-protein coupled receptor family, is essential for mammalian embryonic viability35, myelination36, osteoclast function and regulation of bone mineral density37. Association between variation at GPR126 with height in childhood38 and adult31,33 as well as the skeletal frame size39 has been shown. PRKG2 (cGMP-dependant type II protein kinase) is involved in preovulatory follicles as a response to luteinizing hormone and progesterone, which is highly expressed in brain and in cartilage and contributed to the determination of dwarfism in mammals. The knockout mouse40 and naturally occurring rat41 and cattle42PRKG2 mutants resulted in unorganized growth plate with abnormal stacking of chondrocytes and dwarfism. In addition, the RASGEF1B (RasGEF domain family, member 1B) is a highly conserved guanine nucleotide exchange factor for Ras family proteins43, which is neighboring with the PRKG2. Ras superfamily proteins function as molecular switches in fundamental events such as signal transduction, cytoskeleton dynamics and intracellular trafficking. In human, the microdeletion (1.37 Mb) at chromosome 4q21 that encompass PRKG2 and RASGEF1B resulted in growth restriction, mental retardation and absent or severely delayed speech43.

We also found IGF1 (Insulin-like growth factor 1), a hormone similar in molecular structure to insulin, which is a primary mediator of the effects of growth hormone, could stimulate systemic body growth, especially skeletal muscle, cartilage and bone and has been recognized as a major determinant of body size in mammals44,45. In particular, NR6A1 (nuclear receptor subfamily 6, group A, member 1), which is involved in neurogenesis and germ cell development46, to be embedded in the most significantly selected regions (simultaneously with high log 2 (θ π ratio) (1% right tail) and F ST values (1% right tail) (Fig. 3b). It has been well documented that the NR6A1 is a strong candidate for being a causal gene underlying the elongation of the back and an increased number of vertebrae in pigs varies8,47,48.

Analogously with the human height31, the porcine trunk length is a highly heritable (the number of vertebrae, h2 = 0.62) and classic polygenic trait49. The strong selective sweeps of these genes related to ‘body length’ reveal the specific evolutionary scenarios triggered by artificial selection for agricultural production.