The vast majority of protein sequence changes between MHs and archaic humans stem from nonsynonymous substitutions, whose functional consequences are usually hard to predict, except in the rare cases where the same substitution exists in modern populations (). Nevertheless, disease phenotypes associated with these genes could offer some clues to their effect. Out of 45 genes with fixed MH-derived substitutions (), particularly noteworthy are four genes, wherein mutations that reduce protein function were shown to underlie disease phenotypes which are similar to traits that separate MHs and Neanderthals. These genes include SETD5, which affects nasal bridge depression, jaw size and cranial length; ZBTB24, which affects maxillary and mandibular projection, nose length, nasal bridge depression, palate height, and chin length; PGM1, which affects stature; and ATRX, which affects skeletal maturation rate, jaw size, nasal bridge width and depression, palate height, nose length, femoral head-to-neck angle, and more (). Because the substitutions in these genes separate MHs from Denisovans too, they could potentially be used to reconstruct Denisovan morphology. However, in order to infer whether these changes underlie phenotypic divergence, one should first determine how they affect protein function, if at all. As of today, the ability to do so is very restricted, thus limiting protein-based phenotypic inference.

Of note, the source of these methylation maps has been previously shown to be osteogenic (rather than hematopoietic or mesenchymal), but the precise cell type (i.e., osteoblasts/osteocytes/osteoclasts) has not been determined ().

We defined promoter DMRs as DMRs that overlap at least 1 bp of the region 5 kb upstream to 1 kb downstream a transcription start site (TSS). This included 154 MH-derived DMRs, 171 archaic-derived DMRs, 113 Neanderthal-derived DMRs, 55 Denisovan-derived DMRs, and 415 chimpanzee-human DMRs ( Table S1 ). Differentially methylated genes (DMGs) were defined as genes that have a DMR in their promoter.

Differentially methylated regions (DMRs) separating the human groups were previously identified through the comparison of 59 modern human (MH) DNA methylation maps, two Neanderthal maps, one Denisovan map and 5 chimpanzee maps (). This comparison yielded 873 MH-derived DMRs, 939 archaic-derived DMRs (i.e., the lineage leading to the last common ancestor of Neanderthals and Denisovans), 570 Neanderthal-derived DMRs, 443 Denisovan-derived DMRs, and 2,031 DMRs that separate chimpanzees and humans. The filters used in order to detect these DMRs were conservative, requiring at least 50% difference in methylation, a minimum of 50 CpGs, and that all individuals within a group cluster completely outside the range of methylation across the other groups (). In addition to evolutionary dynamics, the number of DMRs detected along each lineage is determined by two technical factors: 1. higher coverage and deamination rates lead to increased power to detect DMRs. 2. more samples within a human group lead to increased power in identifying fixed changes, hence leading to the detection of less, but more robust DMRs. For further discussion of the DMR identification process, and on removing noise variability that is not relevant to differences between the human groups, see Gokhman et al. ().

Associating genes with phenotypes

Köhler et al., 2014 Köhler S.

Doelken S.C.

Mungall C.J.

Bauer S.

Firth H.V.

Bailleul-Forestier I.

Black G.C.M.

Brown D.L.

Brudno M.

Campbell J.

et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Köhler et al., 2014 Köhler S.

Doelken S.C.

Mungall C.J.

Bauer S.

Firth H.V.

Bailleul-Forestier I.

Black G.C.M.

Brown D.L.

Brudno M.

Campbell J.

et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Hamosh et al., 2005 Hamosh A.

Scott A.F.

Amberger J.S.

Bocchini C.A.

McKusick V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Li et al., 2015 Li A.H.

Morrison A.C.

Kovar C.

Cupples L.A.

Brody J.A.

Polfus L.M.

Yu B.

Metcalf G.

Muzny D.

Veeraraghavan N.

et al. Analysis of loss-of-function variants and 20 risk factor phenotypes in 8,554 individuals identifies loci influencing chronic disease. MacArthur et al., 2012 MacArthur D.G.

Balasubramanian S.

Frankish A.

Huang N.

Morris J.

Walter K.

Jostins L.

Habegger L.

Pickrell J.K.

Montgomery S.B.

et al. A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes. We used the Human Phenotype Ontology (HPO) database (build 110) to link DMGs to the phenotypes they might underlie (). HPO is based on ∼4,000 human disorders, which were translated to over 100,000 gene-phenotype associations. The monogenic diseases are taken from highly curated databases (OMIM, Orphanet and DECIPHER). Thus, HPO provides direct information on the phenotypes that are incurred by alterations of each gene in the database. HPO also includes information on the frequency of each phenotype (i.e., frequent or non-frequent) (). We took only frequent HPO phenotypes, as they represent the phenotypes that are most commonly associated with alterations to the gene (appear in > 50% of patients). Many of the phenotypes in HPO are driven by loss-of-function mutations, where one or both copies of a gene are dysfunctional due to mutations that cause frameshifts, gain of stop codons, splice site alterations, or partial or complete deletions (). Therefore, loss of one or both copies of a gene could be roughly paralleled to a decrease in its activity. This potentially provides information on the organ that is affected by gene silencing, as well as the directionality of the phenotypic change, but not on the extent of phenotypic change. Based on this logic, for each hypermethylated promoter we assigned putative phenotypes based on the diseases that the gene was shown to underlie. Gene-phenotype associations that were shown to be driven by gain-of-function mutations were removed from the analysis ( Table S3 ). This is because gain-of-function mutations usually do not represent increased activity, but rather a newly acquired gene function, and this novel function could not be used to predict what phenotype would arise from gene silencing ( Table S4 ).

Next, we examined for each HPO phenotype whether genes associated with it contain promoter DMRs. We determined the lineage in which the DMRs associated with it emerged (e.g., Neanderthal-specific), and the predicted direction of phenotypic change based on the lineage in which hypermethylation was observed (i.e., the lineage in which hypermethylation is observed was associated with the disease phenotype). Phenotypes associated with promoter DMRs were predicted to have changed. However, in order to determine the direction of change, an analysis of gene activity levels is required.

Notably, none of the relevant DMGs showed a significant positive correlation between methylation and gene expression. Because a trait that separates two groups could have arisen on either of their lineages, reconstructing the skeletal profile of a hominin group included also DMGs derived along the other lineages. Specifically, MH-, archaic- and Neanderthal-derived DMGs were used to construct the Neanderthal profile, MH-derived and chimpanzee-human DMGs were used to construct the chimpanzee profile, and MH-, archaic- and Denisovan-derived DMGs were used to construct the Denisovan profile. Additionally, Neanderthal-derived DMGs were used to identify Neanderthal-derived traits that are not shared with Denisovans.

Hamosh et al., 2005 Hamosh A.

Scott A.F.

Amberger J.S.

Bocchini C.A.

McKusick V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Lee et al., 2017 Lee M.K.

Shaffer J.R.

Leslie E.J.

Orlova E.

Carlson J.C.

Feingold E.

Marazita M.L.

Weinberg S.M. Genome-wide association study of facial morphology reveals novel associations with FREM1 and PARK2. Constant et al., 2017 Constant M.

Nicot R.

Vieira A.R.

Raoul G.

Sciote J.J.

Ferri J. Condylar geometry variation is associated with ENPP1 variant in a population of patients with dento-facial deformities. MacArthur et al., 2017 MacArthur J.

Bowler E.

Cerezo M.

Gil L.

Hall P.

Hastings E.

Junkins H.

McMahon A.

Milano A.

Morales J.

et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Hamosh et al., 2005 Hamosh A.

Scott A.F.

Amberger J.S.

Bocchini C.A.

McKusick V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. One potential limitation of this approach is that our method is based on the conjecture that even partial, rather than complete, change in the activity of a gene could potentially result in a phenotypic change. This is certainly not always the case, and therefore, cases where a gene activity partially decreased along a lineage, but only complete silencing results in a phenotype, would result in incorrect predictions of divergence. We estimate such instances to be relatively rare. Their extent could be estimated by investigating whether the symptoms of the monogenic diseases used in our study were shown to be driven by homozygous (recessive) or heterozygous (dominant) mutations. In 12 out of the 27 genes used for the Denisovan profile, heterozygous mutations have been shown to underlie the HPO phenotypes. At first glance, this suggests that 15 of the genes present a phenotype only in homozygous loss-of-function states and are therefore unsuitable for inference about their phenotypic effect in partial silencing. However, such a conclusion would be inaccurate because heterozygous mutations are vastly under-represented in genetic screenings: First, many of the studies that have tried to identify the underlying genetic mechanism of a disease specifically searched for stretches of homozygosity, often in consanguineous families, while ignoring heterozygous candidates (). Second, many of these mutations do not result in a complete loss-of-function and the gene remains partially expressed and active, even in homozygous states. This suggests that the phenotype is, to some extent, dosage-dependent, and that it manifests even when the gene is still active. Finally, heterozygous mutations regularly drive more subtle phenotypes which are sometimes classified as “normal variation” or more often - ignored. An example of such a gene in our reconstruction is PDE6A. One of the HPO phenotypes associated with this gene is Wide nasal bridge (HP:0000431). This phenotype was shown to be driven by homozygous mutations, ostensibly suggesting that the mechanism is recessive. However, nasal morphology GWAS show that a SNP in this gene (rs77409096) is significantly associated with nasal morphology (). An additional example is ENPP1, which was linked to the high palate (HP:0000218) phenotype only in homozygous individuals, but contains a SNP (rs9373000) which was shown to significantly affect jaw morphology (and specifically its height) in heterozygous as well as homozygous individuals (). This suggests that while the homozygous variants in these genes cause a severe set of phenotypes, heterozygous variants underlie milder phenotypes and are often classified as normal variation. This is further exemplified in the fact that six out of the 15 genes that supposedly drive phenotypes only through recessive inheritance were shown to be associated with skeletal phenotypes through GWAS, including in heterozygous individuals (). Six additional genes out of the 15 were shown to have partial activity even in homozygous states (). This implies that only three out of the 27 genes used in the Denisovan profile do not manifest in skeletal phenotypes when gene activity is partially altered. We believe that the true number is probably even lower than that due to the limited number of skeletal GWAS.

In summary, instances where only complete (homozygous) loss-of-function of a gene results in a phenotype could potentially cause an over-estimation of divergent traits. This is because a trait would be classified as divergent due to its associated gene being partially silenced, whereas only complete silencing of this gene results in a phenotype. We estimate such instances to be relatively rare, and they probably underlie some (or even many) of the ∼15% of cases where we predicted a trait to be divergent but no difference in morphology was reported between the groups.

Of note, the fact that most HPO phenotypes are observed as a result of loss-of-function events suggests that instances where the archaic promoter is hypermethylated would better match patient’s phenotypes (compared to instances where the MH promoter is hypermethylated). This is because in instances where the MH promoter is hypermethylated, the phenotypic prediction in our profile is a mirror image of the phenotypic effect in patients, which would not necessarily be correct, whereas in instances where the MH promoter is hypomethylated, the downregulation in patients and archaic humans is similar, and therefore the phenotype is more likely to be so too. Such incorrect predictions are encompassed in our ∼15% incorrect predictions of divergence or directionality.