Microarray data analysis

In the current study, we performed genome-wide scans of EHH in populations from Western Siberia and Northern European Russia. Via comparisons of the two alleles of particular SNPs, the ratios of integrated EHH (iHS) provide valuable information on selection strength at each locus, and the subsequent analysis of the corresponding EHH profiles allows the regions and genes under selection to be outlined [27]. Eight such chromosomal regions were identified, four of which (i.e., the regions associated with the locations of the MESTP3, NRG3, NBEA, and PTPRM genes) were also revealed in cross-population testing of EHH (XP-EHH test).
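The relationship between EHH, its integral, and the iHS ratio can be illustrated with a minimal sketch. It assumes phased haplotypes coded as 0 (ancestral) and 1 (derived), integrates EHH with the trapezoidal rule over the whole window without a truncation threshold, and omits the frequency-bin standardization that published iHS values normally receive. The function names and the toy data are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, site):
    """EHH from the core SNP to `site` among haplotypes sharing one core allele."""
    n = len(haplotypes)
    lo, hi = min(core, site), max(core, site)
    # Group chromosomes by their extended haplotype between core and site.
    counts = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    # Probability that two randomly chosen chromosomes are identical.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def integrated_ehh(haplotypes, core, positions):
    """Trapezoidal integral of EHH over position, in both directions from the core."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    total = 0.0
    for step in (-1, 1):
        prev_ehh, prev_pos = 1.0, positions[core]
        i = core + step
        while 0 <= i < len(positions):
            cur = ehh(haplotypes, core, i)
            total += 0.5 * (prev_ehh + cur) * abs(positions[i] - prev_pos)
            prev_ehh, prev_pos = cur, positions[i]
            i += step
    return total


def unstandardized_ihs(haplotypes, core, positions):
    """ln(iHH_ancestral / iHH_derived) at the core SNP (no standardization)."""
    ancestral = [h for h in haplotypes if h[core] == 0]
    derived = [h for h in haplotypes if h[core] == 1]
    return log(integrated_ehh(ancestral, core, positions)
               / integrated_ehh(derived, core, positions))


# Toy data (hypothetical): five SNPs, core SNP at index 2.
positions = [0, 100, 200, 300, 400]
haps = [
    [0, 1, 1, 0, 1], [0, 1, 1, 0, 1], [0, 1, 1, 0, 1],   # derived, identical
    [0, 0, 0, 0, 0], [1, 0, 0, 1, 0], [0, 1, 0, 1, 1],   # ancestral, diverse
]
score = unstandardized_ihs(haps, 2, positions)
```

In this toy example the derived allele sits on a long, unbroken haplotype while the ancestral background decays quickly, so `score` is negative. That is the pattern expected when the derived allele has risen in frequency recently.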

The MESTP3 and NBEA loci were shown to be under selective pressure for the first time. This probably reflects the absence of appropriate populations, namely Western Siberian populations (Khanty, Mansi, and Nenets), from previous studies. NBEA encodes the brain-enriched scaffolding protein neurobeachin, which is involved in membrane trafficking and synaptic functioning [33]. It has been identified as a candidate gene for neurodevelopmental diseases, including autism [34]. All selection signals associated with NBEA were intronic and located inside the gene. The MESTP3 genomic region provided a more interesting case. MESTP3 is a non-coding gene (pseudogene) [35]. Moreover, none of the other genes in the region, including RN7SL101P, MIR4275, and LINC02364, encodes a protein. Their products belong to the long non-coding RNA or microRNA classes. Pseudogenes themselves can also be a source of small interfering RNAs, another RNA class resembling microRNAs [36]. All of these RNA types play important roles in the regulation of diverse cellular processes, ranging from gene transcription to chromosome remodeling and intracellular trafficking [37–39]. One can propose that the co-occurrence of these different RNA genes within a single signature of selection is not random. Together they may form a functional complex with a broad regulatory capacity that influences local adaptation, particularly in Western Siberian populations.

The NRG3 and PTPRM genes have been listed as potential targets of natural selection [15,40]. Neuregulin 3 is a key component of the NRG3–ERBB4 pathway, which is involved in the development of several tissues, with the strongest effects on the differentiation of the nervous system [41]. NRG3 has been implicated as a susceptibility gene for schizophrenia and schizoaffective disorders [41]. PTPRM encodes a protein tyrosine phosphatase with an extracellular region that functions as a receptor involved in mediating cell–cell interactions and adhesion. PTPRM expression has been shown to be negatively correlated with oncogenic cell growth and cancer prognosis [42,43]. In agreement with the work of Pickrell et al. [40], extreme iHS and XP-EHH values in NRG3 and PTPRM were mainly detected in European populations. The signals associated with NRG3 were all located inside the gene. In contrast, the signals related to PTPRM formed two distinct groups, one in an intergenic region located 80 kb from the start of the gene and the other in the middle of the first intron. Because PTPRM is a large, differentially expressed, and alternatively spliced gene [44], one can propose that this structure of variability patterns plays a role in regulating the expression of the longest forms of PTPRM.

In our search for selection signals, we paid particular attention to the significant SNPs that occurred in more than one population. Such SNPs were found in populations from the same geographical regions (e.g., North Eastern Europe and Western Siberia). Some of those detected in populations from the European part of Russia were also found in European populations from the 1000 Genomes Project. This was unsurprising given the patterns of population structure commonly shared among Europeans [45,46]. Additional support for the sharing of selection signals between populations with similar location and demography was obtained in XP-EHH tests. In contrast to the common approach, we simultaneously applied three different reference populations: YRI, TSI, and CHB. By doing this, we expected to outline the geography of the patterns of selection signals in the regions [47]. Western Siberia, exemplified by Khanty, Mansi, and Nenets, was of particular interest. We found that many of the signals identified were independently shared with both Europeans and East Asians, which might be attributed to the complex population history of Khanty, Mansi, and Nenets, which relates them to both Europeans and Asians [10]. It has recently been shown that Khanty, Mansi, and Nenets are descendants of ancient Northern Eurasians (ANE) and Eastern Siberians [10]. ANE people are believed to be basal to modern-day Western Eurasians, particularly Europeans, with no close affinity to East Asians [8]. At the same time, Eastern Siberian people are a sub-lineage of East Asian communities that diverged about 10,000 years ago [10]; thus, being related to Eastern Siberians, Western Siberians are also related to East Asians (e.g., Chinese). In that context, the observation that the signals located in some genes were shared only between Nenets and CHB was of special interest. One of those genes was EDAR. It is required for the development of different ectodermal derivatives, including hair, teeth, and sweat glands, and is a well-known target of selection in East Asian populations [20,48]. Another gene shared between Nenets and CHB was CDCP1, which is known to be involved in cell adhesion. Summarizing the above, one can propose that the stand-alone position of Nenets among Western Siberians is attributable to the greater proportion of East Asian ancestry in Nenets, which they additionally obtained from Eastern Siberians after the split from Khanty and Mansi [10].
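The idea of prioritizing SNPs that are significant in more than one population can be sketched as a simple tally. The function name, the population-to-SNP mapping, and the placeholder SNP identifiers below are hypothetical illustrations, not the data or the code used in this study.

```python
from collections import defaultdict


def shared_signals(significant_by_population, min_populations=2):
    """Return SNPs significant in at least `min_populations` populations,
    each mapped to the sorted list of populations carrying the signal."""
    carriers = defaultdict(set)
    for population, snps in significant_by_population.items():
        for snp in snps:
            carriers[snp].add(population)
    return {
        snp: sorted(pops)
        for snp, pops in carriers.items()
        if len(pops) >= min_populations
    }


# Hypothetical input: sets of significant SNP identifiers per population.
toy = {
    "Khanty": {"snp_a", "snp_b"},
    "Mansi": {"snp_a"},
    "Nenets": {"snp_b", "snp_c"},
    "CHB": {"snp_c"},
}
shared = shared_signals(toy)
```

Here `shared["snp_c"]` lists only Nenets and CHB, which mirrors the kind of pairwise sharing discussed above for EDAR and CDCP1.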

The group-level and regional population specificity identified in the distribution of selection signals agrees with the results of previous studies [24,47,49], which showed that the signals shared among populations followed the observed patterns of population structure. Taken together, these results suggest that local adaptation is tightly constrained by ancestral relationships between populations.