Diversity and Similarity Measures

This site is aimed at ecologists rather than geneticists, but the mathematical issues are the same. Click here for the genetics version.

The biological literature about diversity and similarity indices is a mess. Much of it is superficial and confusing, and some of it is just plain wrong. Biologists frequently confuse diversity with other quantities, and frequently use diversity indices inappropriately. On the other hand, many biologists have sensed that there is something suspicious about the way diversity indices are used, but their solution is to avoid frequency-based diversity and similarity measures altogether and use only species richness and the basic similarity measures derived from it, like the Jaccard and Sorensen indices. Yet ecologically significant differences between communities are really differences in species frequencies, not mere presence or absence. Abusing or avoiding the analysis of diversity and similarity holds back the whole field.

What is diversity? What diversity index should be used for a particular purpose? How should diversity indices be interpreted? What is the real definition of alpha and beta diversity: are alpha and beta multiplicative, as in Whittaker's law (alpha times beta equals gamma), or are they additive, as in Lande's (1996) definition (alpha plus beta equals gamma), or neither? What do similarity and overlap measures really measure, and which ones should be used? How are they related to diversity measures, and how are diversity measures related to each other? Though biologists have often treated these questions as if they were matters of opinion, in fact each of them has a definitive answer. These topics have deep logical and mathematical foundations, rich contexts, and many interconnections between them. These foundations have not been appreciated by most biologists. Diversity and similarity indices cannot be treated as random ingredients in some analytical soup.

There are at least three fundamental mistakes in the literature of diversity analysis. First, the literature confuses diversity with the indices used to measure it. An index such as the Shannon-Wiener index is an entropy, not a diversity, and it must be converted to a diversity before it can be properly interpreted. This confusion is the subject of my paper published in 2006 in Oikos, Entropy and diversity. The second fundamental mistake is the incorrect definition of beta diversity. It is easy to prove that the current general definitions of beta (additive, as in Lande 1996, or multiplicative, as in Whittaker) don't work for most indices. This is the subject of my paper, Partitioning diversity into independent alpha and beta components, in press for the "Concepts and Synthesis" section of Ecology. The third mistake is a failure to appreciate the difference between the magnitude of an effect and its statistical significance. Two diversity measurements might be significantly different statistically, but this does not tell us much about the real size of the difference. The magnitude of the difference might be negligible or huge; the level of statistical significance reached has little bearing on this magnitude. To go beyond mere statistical significance and judge the real magnitude of an effect (which is the more important question scientifically), it is essential to have good, well-behaved, intuitive, informative, and easily interpreted measures of diversity and similarity.

The new synthesis I propose gives us interpretable measures; it is based on two key ideas. One is the concept of effective number of species (a quantity introduced to ecology by Robert MacArthur in 1965 and developed by Hill in 1973). As mentioned above, I explain the importance of this in my Oikos article Entropy and diversity, with additional explanation here. If you are comparing the diversities of two or more communities, you must convert your diversity indices to effective numbers of species, or you can reach wrong conclusions. Examples are worth a thousand words, so I have collected some examples (both imaginary and real) in Measuring the diversity of a single community and Comparing the diversities of two communities. Here you will see how misleading it can be to compare raw Shannon entropies or Gini-Simpson indices, and yet how sensible the results are when the comparison is done correctly using true diversities (effective numbers of species).
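
As a minimal sketch of this conversion, the Python below computes the Shannon entropy and the Gini-Simpson index of two imaginary communities and then converts each to an effective number of species, using the standard conversions exp(H) for Shannon entropy and 1/(1 - H) for the Gini-Simpson index. The function names and the two example communities are illustrative assumptions, not taken from the papers cited.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy, -Sum p_i ln p_i, of a list of relative frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def gini_simpson(freqs):
    """Gini-Simpson index, 1 - Sum p_i^2, of a list of relative frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_number_from_shannon(h):
    """Convert a Shannon entropy to an effective number of species."""
    return math.exp(h)

def effective_number_from_gini_simpson(g):
    """Convert a Gini-Simpson index to an effective number of species."""
    return 1.0 / (1.0 - g)

# Two hypothetical communities: 8 and 16 equally common species.
small = [1 / 8] * 8
large = [1 / 16] * 16

for name, community in (("small", small), ("large", large)):
    h = shannon_entropy(community)
    g = gini_simpson(community)
    print(name,
          "Shannon:", round(h, 4),
          "Gini-Simpson:", round(g, 4),
          "effective number (Shannon):", round(effective_number_from_shannon(h), 4),
          "effective number (Gini-Simpson):", round(effective_number_from_gini_simpson(g), 4))
```

The second community has twice as many equally common species as the first, yet its Shannon entropy is only one third larger (ln 16 versus ln 8) and its Gini-Simpson index rises only from 0.875 to 0.9375. Both effective numbers double, from 8 to 16.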

The other key idea in the new synthesis is the intuitive and widely accepted notion that alpha and beta must be independent of each other. They measure completely different aspects of regional diversity: alpha measures the average within-community diversity while beta measures the between-community component of diversity. These are orthogonal dimensions which can vary independently: a high value of the alpha component should not, by itself, force the beta component to be high (or low), and vice versa. This mathematical independence between alpha and beta was made into an explicit condition on beta by Wilson and Shmida (1984), who noted that without this condition, it would be difficult to compare beta diversity between sets of communities whose alpha diversities differed. These authors also noted that levels of alpha and beta diversity may be established by different ecological mechanisms, and should therefore be separated to permit independent analysis.

Surprisingly, the standard general definitions of beta (additive or multiplicative) do not produce independent alpha and beta when applied to most indices. For example, the additive definition produces independent alpha and beta for Shannon entropy (-Sum p_i ln p_i) but not for the Gini-Simpson index (1 - Sum p_i^2). This lack of independence for most indices leads to very serious problems which can completely derail an ecologist's analyses. A region consisting of a thousand equally large, completely distinct communities, each with a hundred equally common species (none shared between communities), would be described by all biologists as a region with extremely high beta diversity. A region with only two equally large communities, each with 5 equally common species, and with three of these species shared by both communities, is clearly a region of lower beta diversity. Yet by the additive definition, the Gini-Simpson beta of this second region (Hg - Ha = 0.04) is four times higher than that of the first (Hg - Ha = 0.01)! An ecologist who uses the additive definition of beta with the Gini-Simpson index is therefore very likely to reach incorrect conclusions. Anomalous results also arise when the multiplicative definition is applied to many indices, proving that ecologists do not yet have a general mathematical theory of alpha and beta that correctly captures our intuitive concepts.
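
The arithmetic behind these two numbers can be checked with a short Python sketch. It assumes equal community weights, takes alpha as the mean Gini-Simpson index of the communities, and takes gamma as the Gini-Simpson index of the pooled region. The function names and the species labels are hypothetical, chosen only for illustration.

```python
def gini_simpson(freqs):
    """Gini-Simpson index, 1 - Sum p_i^2."""
    return 1.0 - sum(p * p for p in freqs)

def additive_gini_simpson_beta(communities):
    """Additive beta (Hg - Ha) for equally weighted communities.

    Each community is a dict mapping a species label to its relative frequency.
    """
    n = len(communities)
    # Alpha: mean within-community Gini-Simpson index.
    h_alpha = sum(gini_simpson(c.values()) for c in communities) / n
    # Gamma: Gini-Simpson index of the pooled, equally weighted communities.
    pooled = {}
    for c in communities:
        for species, freq in c.items():
            pooled[species] = pooled.get(species, 0.0) + freq / n
    h_gamma = gini_simpson(pooled.values())
    return h_gamma - h_alpha

# Region 1: 1000 communities, 100 equally common species each, none shared.
region_one = [{(k, i): 0.01 for i in range(100)} for k in range(1000)]

# Region 2: two communities of 5 equally common species, three of them shared.
region_two = [
    {"a": 0.2, "b": 0.2, "c": 0.2, "d": 0.2, "e": 0.2},
    {"a": 0.2, "b": 0.2, "c": 0.2, "f": 0.2, "g": 0.2},
]

beta_one = additive_gini_simpson_beta(region_one)
beta_two = additive_gini_simpson_beta(region_two)
print("Region 1 additive beta:", round(beta_one, 5))
print("Region 2 additive beta:", round(beta_two, 5))
```

For the first region, alpha is 1 - 1/100 = 0.99 and gamma is 1 - 1/100000 = 0.99999, so the additive beta is 0.00999, or about 0.01. For the second region, alpha is 0.8 and gamma is 1 - 0.16 = 0.84, giving 0.04.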

This situation can be remedied by taking the independence condition as an axiom and deriving (instead of inventing) the proper relationship between alpha and beta for each index. This, plus a few other uncontroversial axioms, is enough to generate a complete new mathematics of alpha and beta for all standard diversity indices. This is the subject of my Ecology paper, Partitioning diversity into independent alpha and beta components. It turns out that the relation between alpha and beta depends on the index; there is no universal additive or multiplicative rule relating the alpha and beta components of an index. However, when the alpha and beta components of any index are converted to effective numbers of elements, they all follow a generalized version of Whittaker's multiplicative law, for all indices! That is, the effective number of species per community (the alpha diversity) times the effective number of distinct communities (the beta diversity) equals the effective number of species of the region (the gamma diversity). Also, it turns out that alpha and beta diversities (the effective numbers of elements) can be calculated directly from species frequencies and community weights without bothering to calculate diversity indices. Ecologists' most popular similarity and overlap indices, like the Jaccard, Sorensen, Horn, and Morisita-Horn indices, are just monotonic transformations of this new beta diversity.
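
A rough Python sketch of this multiplicative partition, applied to the same two regions described earlier, is given below. It handles only the simplest case of equally weighted communities. Alpha is computed as the effective number corresponding to the mean of Sum p_i^q over communities (or to the mean Shannon entropy when q = 1), gamma as the effective number of the pooled region, and beta as gamma divided by alpha. The function names, the restriction to equal weights, and the species labels are assumptions made for illustration; the general treatment with unequal weights is in the Ecology paper.

```python
import math

def pool(communities):
    """Pooled species frequencies for equally weighted communities."""
    n = len(communities)
    pooled = {}
    for c in communities:
        for species, freq in c.items():
            pooled[species] = pooled.get(species, 0.0) + freq / n
    return pooled

def gamma_diversity(communities, q):
    """Effective number of species of order q in the pooled region."""
    freqs = [p for p in pool(communities).values() if p > 0]
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))

def alpha_diversity(communities, q):
    """Effective number of species per community (equal weights assumed)."""
    n = len(communities)
    if q == 1:
        mean_entropy = sum(
            -sum(p * math.log(p) for p in c.values() if p > 0)
            for c in communities
        ) / n
        return math.exp(mean_entropy)
    mean_sum = sum(
        sum(p ** q for p in c.values() if p > 0) for c in communities
    ) / n
    return mean_sum ** (1.0 / (1.0 - q))

def partition(communities, q):
    """Return (alpha, beta, gamma) with beta = gamma / alpha."""
    alpha = alpha_diversity(communities, q)
    gamma = gamma_diversity(communities, q)
    return alpha, gamma / alpha, gamma

# Region 1: 1000 communities, 100 equally common species each, none shared.
region_one = [{(k, i): 0.01 for i in range(100)} for k in range(1000)]

# Region 2: two communities of 5 equally common species, three of them shared.
region_two = [
    {"a": 0.2, "b": 0.2, "c": 0.2, "d": 0.2, "e": 0.2},
    {"a": 0.2, "b": 0.2, "c": 0.2, "f": 0.2, "g": 0.2},
]

alpha_one, beta_one, gamma_one = partition(region_one, q=2)
alpha_two, beta_two, gamma_two = partition(region_two, q=2)
print("Region 1 (alpha, beta, gamma):", alpha_one, beta_one, gamma_one)
print("Region 2 (alpha, beta, gamma):", alpha_two, beta_two, gamma_two)
```

With q = 2, the order that corresponds to the Gini-Simpson index, the first region has about 100 effective species per community, 1000 effective communities, and 100000 effective species in all. The second region has 5 effective species per community and a gamma of 1/0.16 = 6.25, so its beta is 1.25 effective communities. The ranking of the two regions now matches intuition.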

If you only want to know what to do and how to do it, I provide examples of different kinds of diversity analyses in the links below. But I would hope that some readers would want to know why. Part 1: Theoretical background is for those readers.

I have produced a critical review of Anne Magurran's popular book, Measuring Biological Diversity. This book exemplifies the standard view of diversity measures, and the review serves to explain why these views are wrong. Click here to see the review.

I would like this to be a useful reference site for professional ecologists and students. Any questions, comments, or opposing views are welcome. Write me at the address that appears at the top of my home page.