In the study of group and sex differences in multivariate domains such as personality and aggression, univariate effect sizes may underestimate the extent to which groups differ from one another. When multivariate effect sizes such as Mahalanobis D are employed, sex differences are often found to be considerably larger than commonly assumed. In this paper, I review and discuss recent criticism concerning the validity of D as an effect size in psychological research. I conclude that the main arguments against D are incorrect, logically inconsistent, or easily answered on methodological grounds. When correctly employed and interpreted, D provides a valid, convenient measure of group and sex differences in multivariate domains.

Conclusion

Psychologists routinely measure and discuss sex differences as collections of univariate effects, even when dealing with highly multidimensional domains such as personality, emotional experience and expression, cognitive abilities, vocational interests, and sexuality (e.g., Hyde, 2013). This would be akin to measuring sexual dimorphism in face or body shape by considering only one trait at a time, without ever trying to aggregate variables into the bigger picture of global similarity/dissimilarity patterns. Predictably, this approach can easily lead researchers to underestimate the magnitude of sex differences in many important domains. Multivariate effect sizes such as D offer more realistic estimates of global patterns of similarity and dissimilarity in personality, cognition, and behavior. However, the increase in measured sex differences brought about by a multivariate approach has led some researchers to react by questioning the validity of the method itself. As I have shown in this paper, the criticism directed against D does not hold up to scrutiny. Contrary to the critics' beliefs, D is a valid, interpretable measure of group and sex differences.

Of course this does not mean that D is without limitations, or that it should be employed without regard for theoretical validity and/or methodological caution. For example, sample size must be commensurate with the number of variables included in the computation in order to minimize bias; having at least 100 cases per variable should prove a reasonable rule of thumb in most research contexts. Other methodological issues are discussed in Del Giudice (2009). By adding D to their analytical toolkit, researchers who deal with group and sex differences will increase their ability to see the forest for the trees, and gain a deeper appreciation of the many ways in which human beings resemble and differ from one another.
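As an illustration of how D aggregates univariate differences, the following minimal sketch computes the sample Mahalanobis distance between two groups from the vector of mean differences and the pooled within-group covariance matrix. It is not code from the original analyses: the function name `mahalanobis_d`, the rows-by-variables array layout, and the simulated data are all assumptions made for the example, and no correction for small-sample bias is applied.

```python
import numpy as np


def mahalanobis_d(group_a, group_b):
    """Sample Mahalanobis D between two groups (rows = cases, columns = variables).

    Hypothetical helper for illustration; uses the pooled within-group
    covariance matrix and applies no small-sample bias correction.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.shape[0], b.shape[0]

    # Vector of raw mean differences, one entry per variable.
    mean_diff = a.mean(axis=0) - b.mean(axis=0)

    # Pooled within-group covariance, weighted by degrees of freedom.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)

    # D = sqrt(diff' * pooled^{-1} * diff); solve() avoids an explicit inverse.
    return float(np.sqrt(mean_diff @ np.linalg.solve(pooled, mean_diff)))


# Simulated example (assumed values): 5 uncorrelated variables, each with a
# population difference of 0.3 standard deviations, and 500 cases per group,
# which matches the rule of thumb of at least 100 cases per variable.
rng = np.random.default_rng(0)
n_vars, n_cases = 5, 500
group_a = rng.normal(loc=0.3, scale=1.0, size=(n_cases, n_vars))
group_b = rng.normal(loc=0.0, scale=1.0, size=(n_cases, n_vars))

print("Multivariate D:", mahalanobis_d(group_a, group_b))
print("Univariate d for the first variable:",
      mahalanobis_d(group_a[:, :1], group_b[:, :1]))
```

With a single variable, the function returns the absolute value of Cohen's d computed with the pooled standard deviation. With several variables, each small univariate difference contributes to D, which is why the multivariate estimate can be considerably larger than any of the individual effects.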

Acknowledgements

I am indebted to Drew Bailey for his insightful comments and suggestions.