The main level of the comparison is geographic samples, which, for simplicity, we will loosely refer to as ‘populations’. We used a perfectly balanced design with the same number of observations within each group. This is desirable as it facilitates computations and avoids giving greater weight to groups with larger samples. For each of the two populations, situated approximately 1.5 km apart, two leaves of sessile oak (Quercus petraea (Mattuschka) Liebl., 1784) were sampled at random from 22 randomly selected trees. Localities of provenance are near the municipality of Campobasso and Busso ( Table 1 ), which we will use henceforth as the names for the populations. Species assignment was verified using microsatellite genetic data on samples from three sympatric white oak species including the study populations [23] .

Scanned images were used to record 11 landmarks on the right half of each leaf ( Figure 1 – see Viscosi et al. [12] , for landmark definitions). We focused on one side only to adopt the same configuration as used in previous studies on the same species [12] , [13] . This is a common expedient to reduce the time of data collection in symmetric structures. However, if patterns of asymmetry and/or data on both sides of the leaf are required, such analyses can be performed in MorphoJ [24] using the methods described in Klingenberg et al. [25] .

Leaves were pressed, dried and scanned with the abaxial surface uppermost using an Epson GT-15000 scanner with a resolution of 300 dpi. The entire data collection procedure (i.e., image acquisition and landmark digitization) on the sample of leaves was repeated twice to estimate measurement error. The repetition was performed two weeks after the first round of data collection.

Step by step geometric morphometric and statistical analyses

1a) From landmark configurations to shape variables: theoretical background. Landmark based GMM [4], [5], [26] captures the form of a structure using Cartesian coordinates of a configuration of points. These points must have a one to one correspondence in the specimens to be compared. The type of correspondence (topographical, anatomical, developmental etc.) depends on the scientific questions being asked ([27]; see also Discussion). There is no absolute landmark configuration on any given structure and the choice of the configuration must be led by a clear statement on the hypotheses being tested: “... in a study of bat and bird wings if one is interested in function, landmarks at wing tips and along the leading and trailing edges may be functionally equivalent; they might embody the question in being related to functionally relevant aspects of form. However, these landmarks may lie on structures that are not equivalent in other ways; for a study of growth or evolution, alternative landmarks may be the most suited ones” [p. 89, 27]. The choice of landmarks is therefore a crucial step in the analysis and the use of outline methods, equally spaced points on contours and surfaces (called semilandmarks) [5] or other techniques, which do not depend on the explicit identification of anatomical landmarks, does not avoid the fundamental assumption of correspondence of the morphometric descriptors being used [28]. They, at least in their current form, simply delegate to an algorithm the role of the anatomist in selecting where to exactly locate the points [28]. Despite the apparent rigour, objectivity and ability to extract information otherwise difficult to obtain, different mathematical criteria and selection of parameters (e.g., the densitiy of semilandmarks) may produce incongruent shape distances and results [28], [29]. More importantly they leave the question about the biology underlying algorithmically defined anatomical features open and the possibility that mathematically corresponding points might actually map onto areas inconsistent among individuals and in relation to the biological model set by the scientific questions being addressed is real and should be acknowledged [27]. GMM is a disparate set of techniques [5] with a common purpose: the statistical analysis of differences in form using a quantitative description that preserves the geometry of shape variation. Unless differently specified, we will henceforth use GMM to refer only to landmark-based methods using Procrustes analysis. This is currently the standard method for the analysis of landmark data and the most common GMM technique together with various forms of Fourier analysis for the study of outlines [6], [30]. Thus, we performed a generalized Procrustes analysis (GPA, [31]) to separate size and shape components of form variation. Shape coordinates were computed by standardizing each configuration to unit centroid size and by minimizing differences in translation and rotation of all specimens using a least-square algorithm. Size was measured for each specimen as the centroid size of the landmark configuration. The centroid size (to which we will refer to simply as size throughout much of this article) measures the dispersion of landmarks using a function of their distances from the centroid, which is the ‘baricenter’ of a configuration: scattered landmarks will have a large centroid size, clumped landmarks a small one. This series of operations used to compute size and shape variables from the raw Cartesian coordinates of landmarks, commonly described as GPA, is also called Procrustes superimposition / alignment / registration / fit and it is one of several superimposition methods. Compared to other methods, the GPA has desirable statistical properties as a higher power in tests and increased accuracy in estimating sample means [32]–[34]. Procrustes shape coordinates are redundant. This means that there are more coordinates than the actual number of shape variables after the superimposition. In two dimensional analyses (i.e., landmarks on flat images, as in our case study), four degrees of freedom are lost: one for size standardization, two for translating configurations on the X and Y coordinate axes to superimpose their centroids, another one for minimizing rotational differences. In three-dimensions, when each landmark has an X, Y and Z coordinate, the loss of degrees of freedom is computed in the same way but the total amounts to seven due to an extra axis (Z) for translation and two more planes for rotations. Multivariate parametric statistical tests may, depending on the way they were implemented by the software authors, incorrectly compute variable degrees of freedom if performed on shape coordinates. The redundancy is, however, readily identified and accounted for by performing a principal component analysis (PCA) using the variance-covariance matrix of the GPA shape coordinates. A PCA is a way to summarize multivariate data by building linear combinations of the original variables that are uncorrelated and maximize the sample total amount of variance explained [2]–[35]. The spatial relationships between specimens are unaltered, the whole set of PCs accounts for the entire variance in the original variables and nothing is changed in the structure of the data, as only the axes on which they are projected have been rigidly rotated. The specimens can be thought of as a cloud of points in a multivariate space where the observer has changed his/her position to get a better view of the longest sides of the cloud. In GMM a PCA on shape variables is occasionally referred to as a relative warp analysis [36], [37]. There is a subtle difference between the two methods; however, in virtually all biological applications, they effectively function in the same manner and produce identical scores. For this reason that we favoured the well know term PCA.

1b) From landmark configurations to shape variables: software applications and shape spaces. We used the freeware program MorphoJ [24] for most of the analyses. The program is concisely presented in Klingenberg [24], but also has an extensive html user's guide. We will spend some time in detailing the specific operations performed in MorphoJ and some of the other freeware software we used. These programs are powerful comprehensive computer packages, which can perform a variety of analyses and data manipulations. Several others, which we are not using, can be found following the links at http://life.bio.sunysb.edu/morph/ (Accessed 2011 June 8). For more flexibility and a broader spectrum of analytical tools, shape data can be imported in R (www.r-project.org/. Accessed 2011 June 8) or directly generated in this statistical environment by using the package ‘shapes’ [38]. Claude's book [39] on GMM in R provides guidelines and examples to assist with using this software language. Files for data manipulation and digitization were created mostly using software from the TPS Series, a suite of programs for two-dimensional geometric morphometric analyses [37]: First we created a TPS file in TPSUtil using the “build TPS file from images” option. This is a simple ascii text file with the extension .TPS, which lists the names of the image files and specifies that no landmarks have yet been digitized. TPS files can be manipulated in TPSUtil (e.g., to change the order of / remove landmarks or specimens etc.) or manually in any text editor. Then we opened the TPS file in TPSDig and digitized the 11 landmarks in the same order on each picture, after setting a scale factor. The scale factor was set using the image tool menu of TPSDig (options: measure, set scale) to measure a distance specified by the user (10 mm, in our example) on a ruler placed beside the leaf when it was scanned. The scale factor (mm/pixel) is used to convert coordinates from pixels to millimeters (or another unit of measure) and to have landmark configurations of all specimens to the same scale. If the scale factor is the same for all pictures, however, as in our case where all leaves were scanned using the same magnification and resolution, it can be set during the digitization of the first image only; this will be used for all other individuals in the same file. We converted TPS into NTS (TPSUtil, convert TPS/NTS file option) checking the box for using the scale factor and also the one for using image names as labels. The NTS format is another ascii text file used for landmark coordinates (and other variables). The information on landmark coordinates stored in this type of file is the same as in the TPS format, but data are rearranged as a matrix with rows corresponding to specimens and columns corresponding to coordinates. In this example, the conversion to NTS is only used as a shortcut to quickly create specimen identifiers based on the original names of the image files. Indeed, if descriptive names are chosen for the image files and used as labels in NTS, they can be easily converted into grouping variables in MorphoJ, after importing the NTS file (menu: “file”; option: “create new projet” using two-dimensional data in “NTSYSpc” format, without object symmetry as only one side was digitized). For instance, in a file name as Busso_T01_L1_R1.jpg or Campo_T04_L2_R2.jpg: the first five characters indicate the locality; the 7th to 9th a tree from that locality; the 11th and 12th a leaf from that tree with its first (R1) or second (R2) replica image on which landmarks were digitized once on each image. Using the option “extract new classifiers” in the menu “preliminaries” of MorphoJ, one can tell the program to use characters 1–5 for the geographic populations; 1–9 for the trees; 1–12 for the individual leaves. Classifiers, as well as covariates made of continuous variables (e.g., geographic coordinates, environmental covariates as temperature, humidity etc.), can also be imported later from separate txt files or specified manually in the edit classifiers menu of MorphoJ. Finally, to aid visualizations, we drew lines, called links, between pairs of landmarks (menu: “preliminaries; option: create or edit wireframe”) to create a wireframe which resembles a stylized leaf. We also built and imported (menu: “file”; option: “import outline file”) a leaf outline to make a more effective graphical representation of the output of the analyses. Outlines are contours that are drawn in TPSDig using a series of points. These points will not be used in the analysis, as they are not landmarks, but can be used to show shape variation by rendering the contour image in the background. Outlines cannot be imported directly in the TPS format and they have to be converted in a ascii txt file format, as described in detail in MorphoJ user's guide. After having completed data collection and preliminary operations, the numerical analysis begins: Specimens are Procrustes superimposed (menu: preliminaries; option: Procrustes fit). In MorphoJ, as in most other GMM programs, this operation separates size and shape and also projects shape coordinates into a Euclidean space tangent to the Procrustes shape space. The projection into the tangent space is performed because standard statistical methods such as regression, analysis of variance and many others generally require data to be in a flat Euclidean space. In simple terms, this means that a distance between two observations is a straight line computed using the theorem of Pythagoras (or its multivariate extension). However, because the Procrustes shape space is curved, it has to be approximated by a tangent Euclidean space using a projection computed as a cartographer would do to draw the curved surface of the Earth onto a flat map. The point of tangency between the two spaces is the sample mean shape. The approximation in the tangent space for almost all biological datasets analysed until now is excellent [5]. The space occupied by real organisms, even when it is a macroevolutionary study of differences between mammal orders [40], is tiny compared to the space of all possible shapes. The tangent space approximation is seen as a purely theoretical issue by the majority of morphometricians working on biological data. Nevertheless it should be checked. TPSSmall [37] regresses through the origin the set of Euclidean distances in the Euclidean space onto the set of Procrustes shape distances. If the approximation is excellent, one will get a regression with both slope and correlation virtually equal to 1. The sample was inspected for outliers. This was done both on the total sample and within each population sample. Sub-samples are obtained in MorphoJ using the “preliminaries” option “subdivide dataset by” with an appropriate classifier. Outliers for size are easier to find using univariate methods as, for instance, box-plots [41] in PAST [42], [43]. For shape, MorphoJ has an option in the preliminaries menu that may help to detect specimens unusually distant from the mean. This is based on a model that assumes that the data are multivariate normally distributed. A second exploratory method to spot potential outliers consists in looking for individual points separated from the main scatter of observations in PCA scatterplots. A PCA can be performed in MorphoJ after computing the variance covariance matrix of the Procrustes shape coordinates (menu: “preliminaries”; option: “generate covariance matrix”) and projecting the data onto the corresponding eigenvectors (menu: “variation”; option: “principal component analysis”). A third option to aid outlier detection in combination with the previous two is to look for isolated branches, generally near the root of the tree, in phenograms. Phenograms can be computed by performing a cluster analysis in PAST [42], [43] using Euclidean distances calculated on the matrix of Procrustes shape coordinates (menu: multivar; option: cluster analysis). A phenogram is a summary of the similarity relationships in a multivariate dataset using a tree diagram. The distance among specimens in the tree is proportional to their differences. The most similar shapes are on sister branches, the most dissimilar ones are isolated next to the root. Trees tend to distort shape distances [44]. The index of cophenetic correlation available in PAST helps to quantify the magnitude of the distortion [45]. The index is computed as the correlation between the original shape distances and the distances reconstruced using the topology of the tree and ranges between 1 (no distortion) and 0 (maximum possible distortion). Different tree building algorithms may be used and the magnitude of their cophenetic indexes compared to select the one which minimizes ovarall distortions.

2a) Testing variation in populations, trees and leaves using a modified Procrustes ANOVA: theoretical background. For clarity, when populations, trees, leaves and replicas (i.e., error) refer to effects being statistically tested, we will be now using italics. Differences in leaf size and shape can occur at several levels. Our main purpose is to measure and test variation in geographic populations. We also need to know, however, how much leaves differ within and among trees and whether this is more than explained by measurement error. For this aim, we used a hierarchical analysis of variance (ANOVA) with populations as the main effect, trees and leaves as random effects (i.e., factors whose number of levels is not set by any objective a priori criterion and just reflect sampling), and leaves nested in trees. In the “variation” menu of MorphoJ, this analysis is called “Procrustes ANOVA” after Klingenberg et al. [25]. Variance is partitioned by using a “hierarchical sum of squares” (p. 620, [41]) in a way such that each effect is adjusted for all other effects that appear earlier in the hierarchy. This is taking into account the nested structure of the data (an issue that is crucial if the design is unbalanced, i.e., with unequal sample sizes), thus allowing one to quantify differences in populations, trees regardless of population and leaves regardless of both population and tree. The variance unexplained by any of these effects is measurement error and it is estimated using the differences between repetitions, which include both digitizing error and the error during image acquisition. Thus, in summary, we decomposed total variance in size or shape into main (populations), random (trees, leaves) and error (replicas) components and computed ratios between these components (populations/trees, trees/leaves, leaves/error) corrected by the appropriate number of degrees of freedom to generate the test statistics.

2b) Testing variation in populations, trees and leaves using a modified Procrustes ANOVA: software applications. The Procrustes ANOVA in the variation menu of MorphoJ was designed for studies of asymmetry in bilateral symmetric structures [25]. The analysis is automatically performed for both size (univariate) and shape (multivariate). It is crucial that the factors are accurately specified when the analysis is requested, because it is a hierarchical model and therefore the order of the effects (first populations, then trees, followed by individual leaves) is important. The current version of MorphoJ does not allow one to specify random effects other than individuals (i.e., leaves in our study). Thus, we had to take a few additional steps to obtain the correct result: We selected populations followed by trees as main effects (right panel in the analysis window) and leaves as the only random effect (first scroll down in the menu in the left side of the analysis window). This is a misspecification of factors and determines that both populations and trees are incorrectly considered at the same level with each of them being compared to (i.e., divided by) the individual leaves mean sum of squares. We manually computed the F ratio for the main effect of populations using trees as a random effect to correct for the misspecified model. The computation consists in dividing the populations mean sum of squares by the trees mean sum of squares and it is straightforward because the mean sum of squares and the corresponding degrees of freedom are the same as in the standard output of MorphoJ. Thus, one simply needs to take the numbers from the result window of MorphoJ and manually do the ‘populations to trees’ F ratio. Finally, it is possible to obtain the significance P level of the observed F statistics using an F distribution calculator (e.g., http://davidmlane.com/hyperstat/F_table.html. Accessed 2011 June 8) or a table of critical values (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm. Accessed 2011 June 8) with the degrees of freedom corresponding to populations (numerator) and trees (denominator). To complete the analysis, we also calculated the percentage of sum of squares explained by each effect. The sum of squares measures the deviation of the observations from the mean (or means, when groups are present) and is more accurately referred to as ‘sum of squared deviations’. As it is an estimate of the variability in the sample which is readily available in the ANOVA output, it can be used to assess how well different factors fit the data. This is easily obtained by dividing the sum of squares of an effect by the total sum of squares and multiplying this ratio by 100. However, this is not the same as estimating variance components in the ANOVA, a more advanced statistical procedure that we briefly explain in a note in the Appendix 1.

3) Testing population differences using permutation tests and discriminant analyses: theoretical background and software applications. Results of the Procrustes ANOVA provide a basis on which to plan the next steps of the analysis. The lowest level, leaves, must be statistically significant, as differences among leaves regardless of populations and trees must be larger than measurement error. Measurement error should, therefore, explain a negligible percentage of variance. The next level, trees, indicates whether there is more variation in leaves of different trees than within the same tree. If that is the case, there is a stronger justification to pool leaves within trees and use their averages as an estimate of the trend in leaf form in each tree. Having removed pseudo-replicates (i.e., non-independent observations as multiple leaves from a tree) from samples, a variety of standard tests for group differences can be applied [12], [13], [46] to examine the highest and most interesting level, at least from a taxonomic perspective, of group variation: population differences. Thus: First we averaged leaves within trees in MorphoJ using the option “average observation by” from the menu “preliminaries”. Then, we used the averaged data for testing populations using a series of tests for sample mean differences including an estimate of the accuracy of leaf shape in predicting groups. For size, we performed a parametric t-test for independent samples. We also repeated the test using permutations, which do not assume normally distributed data and can be performed even if samples are small. All these tests are simultaneously performed in PAST using the menu “Statistics” with the option “F and T test for two samples”. Permutation tests for group differences can also be done in MorphoJ using a regression approach. A dummy covariate is created (menu: preliminaries; option: edit covariates), where one population is coded as -1 and the other as 1 (or vice versa). Then, size is regressed onto this dummy covariate using permutations to test significance. This test provides a P value together with the percentage of variance explained by populations. In this and other cases when we express the fit of the model in terms of variance instead of sum of squares, we do so because variance is a concept most readers may feel more familiar with. As there is a single set of predictors and no partitioning among factors, as in the Procrustes ANOVA, the percentage of sum of squares explained is identical to the percentage of variance (Zelditch et al., 2004). Multivariate shape differences can be tested pairwise in PAST using a parametric approach (menu: Multivar; option: Hotelling) or permutations (menu: Multivar; option: Two-group permutation). For those unfamilar with methods using randomizations, the rationale for permutation tests and their multivariate extension are well explained in the chapter on “Computer-based statistical methods” in Zelditch et al. [26] and also in the introductory book on resampling statistics by Manly [47]. An important caveat to bear in mind using these tests is that, although permutations can be performed with sample sizes too small for parametric tests, small samples will inevitably reduce statistical power and increase inaccuracies in estimating group means and variances. As for size, differences can also be tested in MorphoJ using the regression approach and permutations. Tests of mean sample differences are obtained in MorphoJ also as part of the output of a discriminant analysis (DA). Results will be equivalent to those using the regression approach but with the latter one also computes the variance explained and with the former one also produces a classification table, as explained in the next paragraph. We tested group differences also using a DA. In MorphoJ this is obtained from the menu “comparison”, option “discriminant analysis”. The DA is probably the most widely used statistical method for investigating taxonomic differences and is generally used as a synonym for canonical variate analysis (CVA). The term DA is preferentially used when only two groups are compared; CVA when there are three or more groups. Often in a DA on taxonomic data the main focus is on group prediction, whereas that in a CVA is more on ordinations. Neff and Marcus [35] and Albrecht [48] provide excellent and concise introductions to DA/CVAs; Klingenberg and Monteiro [49] discuss its use in GMM. In simple terms, a DA/CVA is another method to combine a set of variables, as the PCA. However, the linear combinations of the original variables are now derived to maximize group separation for: 1) testing groups (statistical inference), 2) plotting their differences (ordination) and 3) predicting group affiliation (classification). All three types of output are produced by MorphoJ when the analysis is specified as DA for two groups (as in our case). However, with three or more groups, the CVA option in MorphoJ will only test group differences pairwise and produce ordinations. The classification table and the test for overall (i.e., all groups together) differences can be obtained using PAST (menu: “Multivar”; option: “MANOVA/CVA, confusion matrix”). Thus, Procrustes shape coordinates can be exported from MorphoJ as a text file (menu: “file”; option: “export dataset”) and directly opened in PAST after changing the name of the first column from ID to LABELS (or anything else than ID). Alternatively, raw coordinates in TPS format could be directly imported in PAST, GPA superimposed, subjected to a PCA and then used for a CVA. In PAST, groups are specified by selecting and colouring rows (menu: “Edit”; option: “Row colour/symbol”) and multivariate tests should be performed using PCs with non-zero eigenvalues to be sure that degrees of freedom are correctly computed from Procrustes shape data. Finally, both in MorphoJ and PAST, only jack-knife cross-validated classification tables provide reliable information on groups [50]. In the jack-knife or leave-one-out cross-validation, one by one each individual is left out from the analysis and predicted using data from all other specimens. This way the jacknifed predictions avoid the ‘circular reasoning’ and consequently inflated accuracy of a non-cross-validated DA/CVA, where a specimen is classified using functions that were calculated on samples that included that same specimen. Both non-cross-validated and cross-validated results are produced in MorphoJ, whereas cross-validated results must be requested in PAST by checking the appropriate box in the confusion matrix window.

4) ‘Size-correction’ after testing the effect of size on shape (i.e., allometry) using a multivariate analysis of covariance (MANCOVA) design: theoretical background and software applications. Size and shape have, up to this point, been tested separately. It might be interesting to consider also the way they may interact and covary. For instance, if size variation is large, one may want to repeat shape comparisons after controlling for the effect of size on shape [51]. This effect is called allometry and in general terms refer to a change in shape associated with size differences [52]. Allometry can account for a large and statistically significant proportion of morphological variation. This is tested using a multivariate regression of shape onto size (in MorphoJ, menu: Covariation; option: Regression). Centroid size may be first transformed to its natural logarithm to increase the fit of the model, which is estimated by the percentage of shape variance explained by size. Significance is tested using a parametric test (PAST) or permutations (MorphoJ). If groups are present, one cannot fit a single regression line through all groups to test allometry. This is because lines could have group-specific slopes or intercepts. The standard parametric test for differences in slopes and intercepts of allometric trajectories also provides a method to ‘correct’ for the effect of size on shape. This is a MANCOVA with populations as groups and centroid size as a covariate [26]. A MANCOVA is similar to a MANOVA, in that one has a priori groups to compare, but also to a regression, as one can include one or more continuous variables as predictors. The aim is to test groups (populations) after removing the variance in the response variables (shape) accounted for by the covariate (size). By doing this, one may be able to say if differences in shape are actually the result of size variation only. Controlling for one factor, while testing for another one, makes simpler explanatory models and increases statistical power. The MANCOVA was applied to averaged tree leaves we have already tested for populations differences in step (3). It was also used to compute ‘size-corrected’ shapes, which were then examined using the same series of tests as on the full shapes (3): Before proceeding with the MANCOVA, the significance of allometry within groups could first be tested. This requires splitting populations into separate samples (MorphoJ menu: “Preliminaries”; option: “Subdivide dataset by”) and performing multivariate regressions of shape onto size one group at a time (menu: “Covariation”; option: “Regression”). If at least one of the groups is statistically significant, controlling for allometry using the MANCOVA, as described in the next paragraphs, might be interesting. A full MANCOVA with populations as groups, size as covariate and the populations by size interaction term included is performed. The main aim is to compare regression slopes between groups. These are tested by the interaction between populations and size. In this context, here and throughout the rest of the paper, we informally use ‘interaction’ as a concise way to indicate the test for slopes using the same convention as in most statistical programs, although this does not rigorously correspond to the meaning of ‘interaction’ in a MANOVA. The test for slopes compares the amount of variance explained by two models: one is simultaneously fitting group-specific multivariate linear regressions with each population having its own slope; the second one is also fitting group-specific lines but it does so by forcing them to be parallel. The fit of the first model (i.e., the percentage of variance explained) will always be better than that of the second one, as to keep parallel lines regression slopes become a compromise between group-specific slopes. However, if separate lines fit the data only slightly better than parallel ones and this is not enough to be statistical significant, differences between the two models are negligible and allometric trajectories can be considered parallel. In terms of shape variation, this means that the allometric pattern is the same across groups: as the leaf becomes bigger, the relative changes in proportions of its regions are similar in the different populations. For instance, population A could have an obovate leaf whereas population B might be ovate but in both A and B bigger leaves will tend to be slender compared to smaller ones. The shape of the leaf is not the same, but the trend of covariation with size is the similar. When slopes are different, allometric trajectories are pointing to different directions and one cannot easily control for the effect of size on shape in tests of group differences [26]. However, if (and generally only if) slopes do not differ significantly (i.e., the size by population interaction term of the MANCOVA is not statistically significant), one can proceed to the next step: controlling for the effect of allometry while testing groups and computing ‘size-corrected’ shapes. The MANCOVA is repeated after removing the (non-significant!) interaction term. The grouping variable, populations, is now testing differences in regression intercepts by comparing the percentages of variance explained by different models. However, the best-fit model in this instance is the one with separate but parallel lines, we have already seen, and the lower fit model is a single regression line through all data regardless of groups. If the difference is not statistically significant, intercepts of the two (or more) parallel lines with the Y axis are statistically indistinguishable. That means that allometric trajectories overlap among groups and therefore patterns are not only similar (i.e., parallel or laterally transposed) but actually the same. Group differences in this specific case are fully explained by allometry: if a population is bigger and shape covaries with size, its shape will be different simply because it has gone a little bit further along the same allometric trajectory. To reconstruct ‘size-corrected’ shapes, slopes must not be statistically significant. The ‘size-correction’ according to the MANCOVA model can be performed in MorphoJ. However, MorphoJ does not test slopes and assumes that this has already been done in another software. For instance, one can use TPSRegr [37] to perform a MANCOVA of two dimensional shapes following a regression approach. A detailed description of the analytical steps involved in this test is available in the help file of TPSRegr. Having demonstrated the non-significance of slopes, one can proceed with the ‘size-correction’ in MorphoJ. This is specified in the regression window (menu “Covariation”) using shape as the dependent dataset, centroid size as the independent one and the option “pooled regression within subgroups” with populations as subgroups. MorphoJ then performs a series of operations: fits parallel regressions lines; takes a straight line perpendicular to the size X axis; finds the points of intersection between this (b) line and the regression lines of each population (nota bene: the intersection is a point in the multivariate shape space, which has as many dimensions as the number of shape coordinates, i.e. twice the number of landmarks in two-dimensional data); computes the regression residuals, that is the differences between each observed shape and its prediction according to the population-specific parallel allometric trajectory; adds population by population (c) to (d) to reconstruct shapes in which the within-group allometric variation has been removed. The result is a sample of ‘size-corrected’ shapes according to group-specific parallel allometric trajectories. They can be used for further analyses (e.g., DA and permutation tests as in step (3)) by selecting the output of the regression in the project tree window of MorphoJ. Although MorphoJ refers to the ‘size-corrected’ shapes simply as “residuals”, they actually are residuals added to shapes computed as in (b–c). Because the (b–c) shapes are predicted at the same common size within each group and the residuals (d) are by definition independent on size, adding them back to (b–c) creates new samples whose means (b–c) are ‘size-corrected’ (d–e) and where allometric variation has been ‘squeezed’ around the (b–c) means. Thus, by using a common size to predict shapes we made the effect of size on shape comparable among groups. However, we have not said what this common size (b) might be. It turns out that one can use the grand sample mean size (i.e., the mean size of all groups) or any other size and in terms of statistical testing and relative differences among groups it will not make any difference. This is because regression lines are parallel and therefore their relative distances (i.e., the shape differences) are the same across the whole range of size variation. If this assumption is violated, because slopes appreciably different, results of the ‘size-correction’ could be different depending on the choice of the common (b) size [26].