The use of model‐based methods to infer a phylogenetic tree from a given data set is frequently motivated by the truism that, under certain circumstances, the maximum parsimony (MP) approach may produce incorrect topologies, while explicitly model‐based approaches are believed to avoid this problem. For empirical data from actual taxa, it is not known (or knowable) how often MP, maximum‐likelihood or Bayesian inference are inaccurate. To test the perceived need for “sophisticated” model‐based approaches, we assessed the degree of congruence between empirical phylogenetic hypotheses generated by alternative methods applied to DNA sequence data in a sample of 1000 recently published articles. Of 504 articles that employed multiple methods, only two exhibited strongly supported incongruence among alternative methods. This result suggests that, in practice, the MP approach does not produce deviant hypotheses of relationship caused by convergent evolution on long branches. Our finding therefore indicates that the use of multiple analytical methods is largely superfluous. We encourage the use of analytical approaches unencumbered by ad hoc assumptions that sap the explanatory power of the evidence.

Methods

We examined 1000 articles published in the journal Molecular Phylogenetics and Evolution between 2007 and 2009, recording the methods employed and the authors’ interpretations of their results. We started with volume 53(3) (2009) and worked backwards through volume 42(1) (2007) until we reached our predetermined sample size of 1000 articles. No taxonomic groups were excluded, and levels of divergence ranged from intraspecific variation to relationships among genera and more inclusive taxa.

To be included in the data set (see Supplementary Table S1), an article had to fulfil the following criteria. The main purpose of the paper was to infer phylogenetic relationships among members of a taxon; papers aiming to reconstruct the evolution of gene families, for example, were not included. The paper had to employ two or more methods to construct bifurcating phylogenetic trees; methods used to construct networks were not included in the comparison. The trees had to be based on DNA sequences. The paper had to contain a statement by the authors comparing the trees generated by the different methods [e.g. MP, maximum‐likelihood (ML), Bayesian inference (BI), neighbour‐joining (NJ)] from the same data sets. Papers containing no such statement were also included if they showed only a single tree, even though multiple methods were employed.

A paper was excluded from the data set if the different phylogenetic methods were not used to analyse the same data set (e.g. MP for molecules + morphology; ML for molecules only). Apart from the single‐tree case above, papers were also excluded if there was no statement comparing the different trees: authors often discussed particulars of groups or parts of the trees without giving a general assessment of the tree topologies.

How the data were collected:

1 An article was considered to be suitable for our sample if it contained a statement about the general topology comparing the different methods.
For example: “The majority rule consensus tree obtained from the Bayesian analyses […] was similar in topology with the strict consensus tree of the MP analysis, the posterior probabilities being in accordance with the bootstrap values.”

2 If several different trees were produced from different matrices or partitions of the molecular data, the tree based on the most comprehensive combined matrix of DNA sequences was used for comparison. If the authors did not compare the topologies based on the combined data, the first data set whose tree topologies were compared in the text was used.

3 If only a single tree (or a single “combined data” tree) was reproduced in the paper, even though several phylogenetic methods were used, this was indicated in Table S1. Support values from the different methods were often plotted on this tree. If the authors did not discuss differences between results based on alternative methods, the topologies were recorded as identical. If the authors presented only a single tree but were unclear in the text about differences, i.e. they described specific clades, rather than the whole tree, as similar or different, the paper was excluded. If the authors presented only one tree but verbally described small differences between the trees, these were recorded as “minor differences”.

4 If trees produced by different methods were the same but support values differed, they were recorded as “identical”.
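Taken together, the inclusion criteria and recording rules 1 to 4 amount to a small decision procedure. The Python sketch below encodes them for illustration only: the study itself relied on reading each article, and the Article fields, the record function and the label “incongruent” for topologies the authors described as substantially different are hypothetical names, not columns or categories taken from Table S1. Rule 2 (choosing which data set to compare) is assumed to have been applied before a record is built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical summary of one article (not the layout of Table S1)."""
    infers_taxon_phylogeny: bool    # main aim: relationships among members of a taxon
    n_tree_methods: int             # methods yielding bifurcating trees (MP, ML, BI, NJ...)
    dna_based: bool                 # trees inferred from DNA sequences
    same_data_set: bool             # every method applied to the same data set
    n_trees_shown: int              # trees reproduced for the data set compared (rule 2)
    comparison: Optional[str]       # "whole_tree", "clades_only" or None (no statement)
    reported: Optional[str] = None  # authors' verdict: "same", "minor" or "different"


def record(a: Article) -> str:
    """Classify an article following the inclusion criteria and rules 1-4."""
    # Inclusion criteria: phylogenetic aim, >= 2 tree methods, DNA, same data set.
    if not (a.infers_taxon_phylogeny and a.dna_based
            and a.n_tree_methods >= 2 and a.same_data_set):
        return "excluded"
    # Remarks on particular clades only give no general assessment (rule 3).
    if a.comparison == "clades_only":
        return "excluded"
    if a.comparison is None:
        # Rule 3: one tree and no discussion of differences counts as identical;
        # several trees without a comparative statement are excluded.
        return "identical" if a.n_trees_shown == 1 else "excluded"
    # Rule 4: the same topology is "identical" even if support values differ.
    if a.reported == "same":
        return "identical"
    if a.reported == "minor":
        return "minor differences"
    if a.reported == "different":
        return "incongruent"        # hypothetical label, assessed further in Results
    raise ValueError(f"unrecognised verdict: {a.reported!r}")
```

The exclusion checks come first, mirroring the order of the criteria: an article is classified as identical, minor differences or incongruent only after it has passed the inclusion screen.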

Results and discussion

Multiple analytical methods were employed in 504 of the 1000 studies examined; of these, 411 used both MP and ML or BI on the same data set. Of the 504 papers, only two (< 0.5%) reported strong incongruence between the MP topology and those from model‐based alternatives. Three others contained what the authors considered to be significant incongruence, but inspection revealed that, although the topologies were quite different, support for the incongruent nodes was weak. The vast majority of analyses produced identical topologies or trees that exhibited only minor differences. The oft‐stated rationale for employing model‐based approaches is that MP might yield statistically inconsistent results for particular data architectures. The fact that more than 99% of the studies examined yielded MP and model‐based trees that did not differ substantively from one another suggests either that most actual data sets do not yield inconsistent results when analysed by MP, or that, if the MP results are inconsistent, the ML and BI analyses have behaved inconsistently as well. Either way, if the different methods produce the same result, then using more than one of them is redundant. Given that this appears to be the case, we think that its analytical speed, methodological clarity and the interpretability of its results make the MP approach superior in practice. Advocates of ML or BI might argue that those methods interpret the data in a more sophisticated manner, yielding branch length estimates, allowing inference of taxon ages and so forth. However, even to the most ardent ML enthusiast, sophistication is not an unqualified virtue: ML and BI model selection is usually conducted using the Akaike information criterion (Akaike, 1973) or some other statistical comparison that favours simpler models over parameter‐rich models that do not provide a significantly better explanation of the data.
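The way the Akaike information criterion penalizes parameter‐rich models can be shown with a short calculation. AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the model with the lowest AIC is preferred, so a model with Δk extra parameters wins only if it raises ln L by more than Δk. The Python sketch below assumes two hypothetical models, labelled “simple” and “complex”, with invented log‐likelihoods chosen purely for illustration; it does not reproduce any analysis from the articles surveyed.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(models: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to (maximized log-likelihood, free parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical values: 8 extra parameters buy only 3 log-likelihood units,
# so the simpler model is preferred (AIC 10002 vs 10012).
print(select_model({"simple": (-5000.0, 1), "complex": (-4997.0, 9)}))  # simple

# A gain of 10 log-likelihood units exceeds the 8 extra parameters,
# so the complex model is preferred (AIC 10002 vs 9998).
print(select_model({"simple": (-5000.0, 1), "complex": (-4990.0, 9)}))  # complex
```

Because each parameter costs 2 AIC units and each unit of log‐likelihood gains 2, the comparison reduces to asking whether the likelihood gain exceeds the number of added parameters.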
The interpretation of data by MP has been formally equated to that of simple ML models (Tuffley and Steel, 1997; Goloboff, 2003). As we have shown, in empirical studies there is almost never a difference between MP and ML results. This transitively implies that there is no difference between trees inferred via simple and not‐so‐simple models of character state transformation, even when the model selection criterion indicates that a more complex model fits the data better. Even an adherent of the statistical approach must concede that these results indicate that complex a priori character weighting schemes do not, in general, enhance the precision or accuracy of phylogenetic inference over the results of MP analysis. Another possible interpretation of our metadata is that the authors of the Molecular Phylogenetics and Evolution articles chose to overlook or de‐emphasize conflicts when they occurred. This could be due to the false notion that agreement among different methods somehow lends extra support to a phylogenetic hypothesis (Brower, 2000); de‐emphasizing differences would then let the authors believe that their results are more robust. Alternatively, perhaps the authors felt sociologically compelled toward methodological syncretism as a way to participate in the “modern” phylogenetics research paradigm. We suggest that the extra pages in these articles might be better used to describe what is biologically interesting about the phylogenetic hypothesis, rather than describing and comparing results from redundant methods.
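For readers less familiar with how MP scores a tree, the quantity it minimizes is the number of character‐state changes the tree requires. A minimal Python sketch, assuming unordered characters, no gaps or ambiguity codes, and a rooted binary tree written as nested tuples, computes this length with Fitch's algorithm. The four‐taxon alignment is hypothetical, and real parsimony programs also search among many candidate trees rather than scoring two fixed ones.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree written as nested 2-tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one unordered character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes required within it.
    """
    if isinstance(tree, str):            # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                           # children can agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length: Fitch changes summed over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment, for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "GTCT", "D": "GTCC"}
print(tree_length((("A", "B"), ("C", "D")), alignment))  # 5
print(tree_length((("A", "C"), ("B", "D")), alignment))  # 7
```

Here the tree grouping A with B and C with D requires 5 changes against 7 for the alternative, so MP would prefer the former. Rooting does not affect the count, so the nested‐tuple form can stand in for an unrooted tree.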

Acknowledgements

US National Science Foundation grant DEB 0640301 provided support for this research.

Supporting Information

Table S1 (CLA_342_sm_tS1.xls, 1.2 MB). Spreadsheet containing details of the 1000 articles examined for this study. Columns A to E give the reference data for each article. Columns F to H summarize the numbers of in‐group taxa, out‐group taxa and genes used. Columns I to N show which phylogenetic methods were used. Columns O and P give the taxa and the taxonomic level of the study. Columns Q to S give the assessment of the topologies and what the authors said about them.