Fossils are, by their very nature, incomplete. Whilst missing data in itself should not present a problem for the reconstruction of phylogeny [13–18,21], the results presented here demonstrate that the incompleteness particular to fossils, namely the absence of soft tissues, causes additional and systematic errors that could not be predicted from simulations alone. Not only does the preferential deletion of soft-part characters cause significantly more loss of the original phylogenetic relationships than the random deletion of characters, but it also systematically shifts the reconstructed affinity of a fossil organism. When individual taxa are subjected to such simulated fossilization, they are significantly more likely to be displaced from their original position than when they are subjected to random missing data. What is more, they are significantly more likely to be displaced towards the root of the tree than away from it. Fossils reconstructed as primitive, stem-group taxa may, therefore, actually have been derived members of the crown group, spuriously displaced to the stem as an artefact of data loss during fossilization. This phenomenon of stemward slippage is much more than an inconvenience for those interested in the relationships of fossils from a palaeobiological perspective; it is the position of fossil taxa in trees relative to the root and extant taxa that is of central importance for studies inferring rates and sequences of macroevolutionary change. Specifically, fossils enable the accurate calibration of molecular clocks by defining minimum clade divergence times. Stemward slippage does not change the absolute age of a fossil, but it does change its inferred position in the tree. Any shift in fossil placement therefore has the potential to misplace calibration points within molecular trees (Fig. 1c) and makes it harder to identify the real first occurrence of a clade.
Stemward slippage will result in calibration points that are too young and will thus systematically distort evolutionary rates, giving a narrower timeframe for evolutionary events to occur (Fig. 1c). Unless taphonomic factors are taken into account, it is likely that evolutionary rates of change, both molecular and morphological, will be overestimated (Fig. 1c). Similar problems will afflict studies attempting to determine the sequence of character change in stem lineages and thus the nature of cladogenesis. Purported stem-group fossils may not accurately reveal the stages in the origin of clades, but may instead be mere artefacts of fossilization biases and subsequent erroneous reconstruction.
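The contrast between simulated fossilization and random character loss can be made concrete with a minimal sketch. It assumes a character matrix held as a mapping from taxon name to a list of character states, together with a list of the indices that are soft-part characters. The function names, the toy matrix and the soft/hard partition are hypothetical and chosen for illustration only; they are not the data or the software used in this study. Missing entries are coded as "?".

```python
import random

MISSING = "?"

def fossilize(row, soft_idx):
    """Recode every soft-part character of one taxon as missing."""
    soft = set(soft_idx)
    return [MISSING if i in soft else state for i, state in enumerate(row)]

def random_deletion(row, n_missing, rng):
    """Recode the same number of characters as missing, chosen at random."""
    drop = set(rng.sample(range(len(row)), n_missing))
    return [MISSING if i in drop else state for i, state in enumerate(row)]

# Toy matrix (hypothetical): 8 characters per taxon.
# Indices 0-3 are treated as hard-part, 4-7 as soft-part characters.
matrix = {
    "taxon_A": list("01101101"),
    "taxon_B": list("01100011"),
    "taxon_C": list("11010010"),
}
soft_idx = [4, 5, 6, 7]
rng = random.Random(0)  # fixed seed so the control is reproducible

target = "taxon_B"
# Simulated fossil: only the target taxon loses its soft-part characters.
fossil_matrix = {**matrix, target: fossilize(matrix[target], soft_idx)}
# Control: the target taxon loses the same amount of data, at random.
control_matrix = {
    **matrix,
    target: random_deletion(matrix[target], len(soft_idx), rng),
}
```

Under these assumptions, both matrices would be analysed with the same tree search, and the position of the target taxon in each result compared with its position in the tree from the complete matrix.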

The calibration problems demonstrated above are properly seen in the context of the errors inherent in determining first fossil occurrences more generally. All such dates are necessarily provisional and subject to revision, typically downwards (towards older ages) as older fossil exemplars are discovered and documented. Some clock methods therefore specify likelihood functions around their point calibrations, admitting much greater potential for underestimated than overestimated ages. These methods are thus already designed to take account of calibration point underestimates, such that the systematic overestimation of first occurrence ages (believing a group to have originated earlier than it did) is much the more problematic type of error [1]. J. B. S. Haldane famously noted that his belief in evolution would be shattered were someone to discover a rabbit in the Precambrian [22]. His comment reflects an intuitive distrust of radically revised first occurrence dates for derived groups (in contrast to an easy acceptance of the discovery of relict, living ‘fossil’ last occurrences tens or hundreds of millions of years after their youngest fossil relatives). This, in turn, reflects the belief that the order of first fossil occurrences should be congruent with the order in which groups branch phylogenetically. Our results show that the removal of soft-part character data makes the placement of taxa (either up or down the tree) significantly more labile than would be expected; 24% of the simulated fossil taxa that shift do so significantly more than expected given random character loss. As such, fossilization filters cause appreciable displacement of taxa, irrespective of the direction of the movement. In this context, we note that significant crownward slippage (although less common than stemward slippage) occurred in 10% of all shifting simulated taxa.
Such displacements could have the effect of making a stem representative appear to be part of a crown group, thereby potentially pulling the first occurrence date downwards (that is, making it older): precisely the type of distortion with which relaxed clock methods are not optimally designed to contend.
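The effect of the two directions of slippage on minimum-age calibrations can be illustrated with a small sketch. It assumes that each fossil sets a minimum age only for the node within which it is placed, and that a node's constraint is the age of the oldest fossil assigned to it. The helper `min_age_constraints`, the node names and the ages are hypothetical and purely illustrative; this is not a clock method and not an analysis performed in this study.

```python
def min_age_constraints(placements):
    """Map each calibrated node to the oldest fossil age assigned to it.

    placements: list of (fossil_name, age_in_Ma, node_calibrated) tuples.
    """
    constraints = {}
    for _name, age, node in placements:
        constraints[node] = max(age, constraints.get(node, 0.0))
    return constraints

# Hypothetical true placements: an older stem fossil and a younger crown fossil.
true_placements = [
    ("stem_fossil", 150.0, "total_group_X"),
    ("crown_fossil", 100.0, "crown_X"),
]

# Crownward slippage: the stem fossil is wrongly resolved within the crown,
# so the crown node inherits an age that is too old.
crownward = [
    ("stem_fossil", 150.0, "crown_X"),
    ("crown_fossil", 100.0, "crown_X"),
]

# Stemward slippage: the crown fossil is wrongly resolved on the stem,
# so the crown node loses its calibration altogether.
stemward = [
    ("stem_fossil", 150.0, "total_group_X"),
    ("crown_fossil", 100.0, "total_group_X"),
]

for label, case in [("true", true_placements),
                    ("crownward", crownward),
                    ("stemward", stemward)]:
    print(label, min_age_constraints(case))
```

In this toy case the crownward error raises the minimum age of the crown node from 100 to 150 Ma, an overestimate, whereas the stemward error leaves the crown node unconstrained, so its first occurrence can only be underestimated.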

Regarding the direction of displacement, the significant bias towards stemward slippage in simulated fossilization is relatively small (53% stemward versus 47% crownward), but the bias is much larger for those taxa that exhibit a significant shift when compared to random incompleteness (61% versus 39%). The extent to which these displacements constitute a problem for clock and transition studies will ultimately depend upon the weight that is placed upon them in any given study. Where displaced fossils are utilised as one of just a small handful of calibration points, the greatest distortions are likely to ensue. We also note that the search for earliest exemplars can often result in the identification of fragmentary or poorly preserved material. Groups often originate at low diversity and with individuals of small size [23] that will lack most of the diagnostic features of the crown clade. Earliest exemplars in real empirical data sets may therefore be less complete and more volatile than those in our simulations. An exception to this incompleteness is provided by fossils from Konservat-Lagerstätten, but here different biases need to be taken into account. In fact, stemward slippage resulting from fossilization was first observed in exceptionally preserved early chordates, where it results from decay biases within soft tissues [6,7]. Under these circumstances, systematic decay of anatomical features during fossilization can distort interpretation of the affinity of fossil taxa because taphonomic loss is easily conflated with phylogenetic absence. The results presented here indicate a more problematic effect over and above this; even where uncertainty is coded as such ("?" rather than "0"), there can still be a residual tendency toward spurious migration of extinct taxa down phylogenetic trees. Hence, fossilization tends to degrade phylogenetic signal in precisely those characters that are most valuable for yielding an accurate, resolved tree.
This observation was consistent across vertebrate and invertebrate taxa and was based on matrices compiled at different taxonomic levels. Of the reptile datasets, for example, one is a family-level analysis of all squamates [24] and another an analysis of 93 species of a single genus of spiny lizard [25]. What is more, it seems that stemward slippage is a more ubiquitous phenomenon, afflicting the fossil record at a more fundamental level: all animals with biomineralized skeletons.

Hard- and soft-part characters evidently do not convey a homogeneous phylogenetic signal. Marked and significant changes to inferred phylogenies occur only with the removal of soft-part characters (simulating the palaeontological case) and not with the removal of hard-part characters (Fig. 1a). As such, synergy between the information provided by hard and soft characters is unlikely to be a factor behind the problem of palaeontological biases. One solution is to focus on morphological data from extant organisms and analyse them in the light of fossilization filters. Analysis of larger compilations of zoological data matrices will identify those characters and character types that contain most homoplasy and, crucially, highlight those subsets or modules of biomineralized characters that are most consistent with total evidence and molecular data [26,27]. It will then be possible to promote the use of such characters in palaeontological studies, enabling systematists to compare their results with and without controls for palaeontological biases. Until this point, we advocate caution when reconstructing and interpreting the phylogenetic relationships of fossil taxa and a careful consideration of the impact of missing data when doing so.