We sequenced the genome of Snowflake at 18.7× effective coverage using the Illumina GAIIx platform (114 bp paired-end reads). We aligned the reads to the reference human genome (NCBI build 37) using GEM [12], and used samtools [13] to identify single nucleotide variants (SNVs) (Methods). We found 73,307 homozygous non-synonymous Snowflake’s mutations compared to the human reference genome. Out of those, 20 were found within candidate genes for albinism (OCA related genes), but a single mutation was private compared to two other sequenced gorillas (Additional file 1: Tables S1 and S2) [9, 10]. This substitution is located in the last exon of the SLC45A2 gene at the position hg19: chr5_33944794_C/G and it causes a substantial amino acid change, Glycine to Arginine, (pGly518Arg) in a predicted transmembrane region of the protein. We then resequenced this mutation using capillary sequencing and it was confirmed as homozygous in Snowflake and heterozygous in all five tested non-albino offspring, as expected in Mendelian recessive disorders. To rule out the possible participation of other candidate genes, we also looked for structural variants that may be disrupting other genes related to pigmentation. We applied computational methods based on paired-end and split read approaches to detect genomic deletions (Methods), followed by experimental validation using array-comparative genomic hybridization (aCGH). We identified 1,390 validated deletions totaling to 9.5 Mbps, a similar proportion of the genome compared to previous reports [5] (Additional file 1: Table S3). These deletions overlap completely with 36 RefSeq transcripts and partially (>10%) with 660 transcripts (Additional file 1: Table S4) but none of them has a direct association with albinism.

Several pieces of evidence support the hypothesis that the non-synonymous mutation found in SCL45A2 might be responsible for Snowflake’s albinism. First, this specific Glycine residue is conserved throughout all available vertebrate taxa (Additional file 2: Figure S1), suggesting a conserved role of this amino acid. Second, we predicted whether this amino acid change may affect the protein structure and function based on sequence conservation and protein properties using SIFT [14] and PolyPhen-2[15]. It is predicted as a “damaging” mutation by SIFT, and “probably damaging” by PolyPhen-2. Third, this gene was reported to be the genetic cause of albinism in several other species (e.g., mouse [16], medaka fish [17], horse [18] and chicken [19]). Last, previous reports showed that Glycine to Arginine mutations within other transmembrane regions of SCL45A2 in humans result in severe albino phenotypes [20].

We followed up on this finding with an experimental study to determine how this amino acid substitution affects the transmembrane segment where this mutation is present. For this purpose we used a functional assay based on Escherichia coli inner membrane protein leader peptidase (Lep) that detects and permits accurate measurements of the apparent free energy (ΔG app ) of translocon-mediated integration of transmembrane helices into the endoplasmic reticulum (ER) membranes [21–23]. This procedure allows the quantification of the proper integration of the transmembrane region with the normal sequence and with the mutation. When we assayed the construct with the wild type sequence, we observed that 90% of the proteins were properly recognized for membrane insertion. However, translation of the mutant (G518R) found in Snowflake resulted in a significant reduction (~25%, p-value = 0.036 Mann–Whitney U test) in the membrane integration capability (Figure 2), suggesting that the replacement of a glycine by an arginine residue lowers the affinity of the transmembrane region and possibly alter the topology of the SLC45A2 gene product.

Figure 2 Membrane integration of non-albino wild-type (wt) and mutant ( Snowflake ) G518R sequences. (a) Schematic representation of the engineered leader peptidase (Lep) model protein. Wild-type Lep has two transmembrane (TM) helices (H1 and H2) and a large C-terminal luminal domain (P2). It inserts into rough microsomal (RM) membranes in an Nt/Ct ER luminal orientation. SLC45A2 TM12 domains (TM12) wild-type and G518R mutant were inserted in the P2 domain flanked by two glycosylation acceptor sites (G1 and G2). If the inserted sequence integrates into the membrane, only the G1 site is glycosylated (left), whereas both G1 and G2 sites are glycosylated for the sequences that do not integrate into the membrane (right). (b) Plasmids encoding the constructs were transcribed and translated in vitro in the absence (−) and presence (+) of RM membranes. Non-glycosylated protein bands are indicated by a white dot; singly or doubly glycosylated proteins are indicated by one or two black dots, respectively. (c) SLC45A2 TM12 sequence in each construct and ΔG app values predicted using the ΔG Prediction Server (http://dgpred.cbr.su.se/) and deduced from the data in panel b are shown. Full size image

Finally, the last piece of evidence supporting the role of the mutation in the phenotype is based on genome-wide patterns of heterozygosity in the genome of Snowflake (Additional file 2: Figure S2). We found that SLC45A2 gene is located in a large run of homozygosity (40 Mbps) orthologous to human chromosome 5 (Figure 3a), meaning that this allele was inside a block identical by descent, which is characteristic for Mendelian recessive disorders. The other three candidate genes are not found in any autozygous regions. Overall, we found 25 large runs of homozygosity (longer than 2 Mbps), and a general reduction of heterozygosity compared to the other known genomes sequenced of the same species (Figure 3b). Some of the runs of homozygosis are particularly large, such as a continuous 68 Mbps segment in chromosome 4 (Additional file 2: Figure S2). This reduction of variation might allow the emergence of certain phenotypes otherwise masked by dominance, and they could lead to inbreeding depression [24], as previously reported in chimpanzee and other primates [25].

Figure 3 Heterozygosity distributions of the studied gorillas. (a) Heterozygosity values in 1 Mbp non-overlapping windows along the human chromosome 5. The SLC45A2 gene is contained within a 40 Mbp run of homozygosity in Snowflake (blue). The other two gorillas do not show any run of homozygosity (Kamilah in red and Kwan in green). (b) Distribution of the heterozygosity values of 1Mbp non-overlapping windows in the aforementioned gorillas. While most of the genome shows a similar distribution, we observe an excess of windows with less heterozygosity as a result of the inbreeding observed in Snowflake. Full size image