Comparative sample of adult modern dogs and wolves

We carefully chose only dogs (N = 91) whose breeds closely resemble wolves in skull shape (for a complete list of breeds used see Supplementary Note S1). The wolf sample is composed of Arctic wolves (Canis lupus arctos) (N = 258) from Alaska and European wolves (Canis lupus lupus) (N = 57). For modern dog and wolf specimen locations see Supplementary Note S1.

Sample of ancient canids

In addition to Eliseevichi MAE 447/5298 (13,905 +/− 55 YBP; Epigravettian)11 and Goyet (31,680 +/− 250 YBP)10 we include in our fossil sample of ancient canids Shamanka II (7,372 +/− 47 YBP) and Ust'-Belaia (6,817 +/− 63 YBP)21, which were found in the Lake Baikal region of Eastern Siberia and identified as early Neolithic dogs, Trou Balleux which was previously identified as a late Paleolithic wolf (10,110 +/− 120 YBP)26 from Belgium, six Alaskan wolf skulls from the late Pleistocene and beginning of Holocene and four ancient Alaskan dog skulls dated to near 1600 CE, deposited before the first arrival of European explorers27, three Egyptian mummified dogs from the Saite–Ptolemaic period28, four Neolithic and one Gallo-Roman dog from France29 and a modern wolf-dog hybrid (Supplementary Table S1 and Supplementary Note S1).

Digitization of the 3D anatomical landmarks

We captured the 3D coordinates from 36 osteological landmarks (descriptions given in Supplementary Table S2) on the dorsal and ventral surfaces of the skulls. The dorsal and ventral coordinate configurations were combined into one set of coordinates using a least-squares fit (rotation and translation only) of four matching landmarks15,22. Each skull was digitized twice in order to quantify measurement error.

Cranial Index Analysis

Using the coordinate data we calculated Log 10 total skull length, Log 10 viscerocranial length, Log 10 alveolar (P4-M1) tooth row length, Log 10 greatest palate width and Log 10 minimum palate width for all specimens. These measurements were used to calculate the following cranial indices: Log 10 viscerocranial length/Log 10 total skull length, Log 10 alveolar (P4-M1) tooth row length/Log 10 total skull length, Log 10 greatest palate width/Log 10 viscerocranial length and Log 10 minimum palate width/Log 10 viscerocranial length following Germonpre et al10. We conducted a principal components analysis (PCA) of the wolf and dog cranial indices. Eliseevichi MAE, Goyet, Shamanka II, Trou Balleux, Ust'-Belaia and the other fossil wolf and dog specimens (see Supplementary Table S1) were then projected into the wolf-dog cranial index PCA. We also constructed bivariate plots of these indices to illustrate plainly the overlap of dogs and wolves. In addition, we recreated Boudadi-Maligne and Escarguel's1 plot of palatal width versus total skull length again to show how, when large dogs are included, there is overlap of dogs and wolves in this morphospace (Note: Boudadi-Maligne and Escarguel's1 use condyobasal length (Prosthion to the Occipital Condyles) – because we lack the point on the condyles we used total skull length (Prosthion to Basion) as a very close approximation).

Procrustes superimposition

Geometric morphometric analysis of three-dimensional landmark-based coordinates is an effective diagnostic tool for investigating biological shape that allows for the direct visualization of shape variation15,22,30,31,32,33,34. The raw coordinates of the landmark-based configurations of the canid skulls were converted to shape coordinates by generalized least-squares Procrustes superimposition using a procedure that takes into account the object symmetry of the specimens32,35. This involves rescaling the landmark coordinates so that each configuration has a unit Centroid Size (CS: square root of the summed squared Euclidean distances from all landmarks to their centroid). Then all configurations were translated and rotated to minimize the overall sum of the squared distances between corresponding landmarks. The amount of measurement error was calculated using a Procrustes ANOVA and was found to be insignificant35; we therefore averaged all replicates into a single configuration for each specimen.

Procrustes form space principal component analysis

A significant reduction of the overall size of the skulls is thought to characterize Paleolithic dog compared to Pleistocene wolf skulls1,3,6,8,10,11,12. Therefore, in addition to shape variables, a measure of overall size such as Centroid Size, for which the shape variables are independent, may help to determine whether fossil canids are either wolves or dogs. Centroid Size has been shown to be approximately uncorrelated with the shape variables for landmarks with small amounts of isotropic variation. After performing the Procrustes superimposition, size and shape variation were first explored with a PCA based on the covariance matrix of the dog and wolf Procrustes shape coordinates augmented by a column of the natural logarithm of Centroid Size (LnCS) – called Procrustes form space PCA33. The fossil specimens Eliseevichi MAE, Goyet, Shamanka II, Trou Balleux, Ust'-Belaia and the fossils cited above (Supplementary Table S1) were projected into the wolf-dog Procrustes form space PCA. The original covariance matrix used for the PCA excludes the fossil data because we wanted to identify where the fossil canid skulls would plot in an ordination of known wolf and dog skulls and to show the range of variation in Procrustes form space spanned between extant and extinct canids. In Procrustes form space PCA, PC1 usually captures overall size variation as well as size-related shape variation (allometry), whereas the other PCs contain residual, non-allometric, shape variation and are weakly correlated with size.

A 3D digital scan of a wolf skull was warped towards the Procrustes mean form using a thin plate spline (TPS) interpolation function using IDAV Landmark software34. Thereafter, the surface of the Procrustes mean configuration (consensus) was used to visualize size and shape variation along the PCs. The shape deformation represented by the eigenvectors of a particular PC was visualized as a TPS deformation from the consensus plus or minus the eigenvectors (right and left side of the PC, respectively). Once the eigenvectors (those related to the shape variables) are added or subtracted from the consensus, all variables are also multiplied by the exponent of the eigenvector for LnCS.

Quadratic discriminant analysis and typical probability

QDA with cross validation was used for the classification of the unknown specimens Eliseevichi MAE, Goyet, Shamanka II, Trou Balleux and Ust'-Belaia as well as the fossil specimens cited above (Supplementary Table S1). The use of QDA is justified because the Box's M test indicates that the dog and wolf covariance matrices are significantly different (Mbox = 253.94, df = 28, p-value <0.001).

Because we limited the types of breeds included in this analysis the wolf sample size is more than three times larger than that of the dogs. In order to balance the wolf and dog sample sizes, we randomly resampled the wolf specimens to create a dataset of 91 specimens. We then carried out a Procrustes form space PCA of the two groups (without the fossil skulls). This was done 1,000 times. Using a statistical test borrowed from Anderson36,37,38, for each iteration, we found that the eigenvalues from PC8 onwards were nearly equal and hence their ordination is more likely than not to be expressing only noise. Therefore, for each iteration we built a QDA model based on the first seven PCs of the Procrustes form space PCA accounting for 93% of the total form variance (as much as in the original Procrustes form space PCA with the full wolf and dog sample). The computation of the posterior probabilities (P post ) was made with an equal prior probability (P prior = 0.5) for the dogs and wolves. We assigned specimens to either the dog or wolf group only if the P post was greater than 0.90. The posterior probabilities are the probability of membership for each specimen in each group based on the relative distances to each group and they sum to 1. Therefore, the unknown specimens are forced to belong to one of the reference groups. Because of this, we defined a threshold of correct classification, giving the unknown specimen the opportunity to belong to neither of the reference groups, i.e. to be classified as an undetermined specimen (P post < 0.90). Each unknown specimen's cranium was tested through all 1,000 iterations of the cross-validation QDAs. The accuracy of the classification was computed as the percentage of iterations for which the unknown specimen was classified with a P post > 0.90 either as dog, wolf or unknown.

We also computed each specimen's typical probability (Typ.P) in order to evaluate the fit of a specimen to a group39. This probability represents how likely an unknown skull belongs to a particular group based on the variance-covariance matrix of the wolf and dog data pooled together. This probability is analogous to the probability of the null hypothesis that the specimen comes from a particular group. If above 0.05 the typical probability can be ignored, because there is no statistical ground to reject the null hypothesis. However if Typ.P ≤ 0.05, then the specimen's posterior probability should be ignored because the specimen does not belong to either the wolf or dog group.

We wanted to know whether including fossil specimens in the dog and wolf groups would change the classification of Eliseevichi MAE and Goyet. Therefore, after classifying the fossil specimens as either dogs or wolves based on the above analysis, we repeated the entire analysis with only Eliseevichi MAE and Goyet as unknowns. In this analysis we found that the eigenvalues from PC7 onwards were nearly equal. We therefore built the QDA model based on the first six PCs of the Procrustes form space PCA accounting for 93% of the total form variance for each iteration. All data were analyzed via software routines written in the R programming language.