Monocots account for a quarter of angiosperm species richness and are among the most economically and culturally important plants, including cereals (grasses), palms, orchids and lilies. Previous investigations of correlates of monocot species diversity have varied in scale and usually concentrated on a few drivers of diversification. Here, to disentangle the correlates of monocot diversity, we reconstructed a genus‐level phylogenetic tree (1987 of the 2713 genera) and compiled an extensive database of abiotic, biotic and geographical characteristics to assess whether differences in these traits correlate with the vast asymmetrical species richness among genera present in this clade. Our results support several classical biodiversity theories, including species–area relationships, and latitudinal and elevational diversity gradients. Furthermore, interactions among these factors explain an additional 10% of the variation (compared to 36% from the main effects alone). We conclude that higher species richness among monocot genera is associated with geographical variables, especially larger ranges and lower elevations, rather than physical environment or physiology.

Introduction

Why certain regions and lineages contain more species than others is a long‐standing mystery and one that still challenges biologists (Connor & McCoy, 1979; Ricklefs & Renner, 1994; Hurlbert, 2004; Davies & Barraclough, 2007; Mittelbach et al., 2007; Friedman, 2009; Rabosky, 2009; Hurlbert & Jetz, 2010; Jetz & Fine, 2012). Asymmetrical species richness is especially evident in the flowering plants (e.g. Davies et al., 2004a; Kreft & Jetz, 2006), with richness in families and genera varying from single species to many thousands. The monocots, which account for 25% of angiosperm species (Anderson & Janßen, 2009) and encompass some of the most species‐rich angiosperm families, such as orchids (> 27 000 species) and grasses (> 11 000 species), are among the most culturally and economically important plants. The monocots include nearly 75 000 described species split among 11 orders and 77 families (APG IV, 2016) or 12 orders, 78 families and 2713 genera (www.e-monocot.org) and inhabit almost all terrestrial biomes on Earth. However, many genera and even families contain just a single species (e.g. Petermannia cirrosa F.Muell. is the only species of Petermanniaceae). The striking variation in species richness among monocot groups has been the focus of numerous investigations (Givnish et al., 2005, 2014), but an overarching explanation for the huge imbalance remains elusive. Monocots, with their wide variety of traits and occupied habitats, act as a useful focus for understanding the drivers of all flowering plant diversity.
Increased species richness in genera may be associated with attributes that bestow increased speciation rates and/or decreased extinction rates (Miller, 1949; Heard & Hauser, 1995). Numerous traits have been suggested as affecting diversification rates in plants, including phenotypic traits (Heilbuth, 2000; Soltis et al., 2009; Givnish, 2010; Rymer et al., 2010; Johnson, FitzJohn & Smith, 2011; Drummond et al., 2012; Bouchenak‐Khelladi, Muasya & Linder, 2014; Givnish et al., 2014; Litsios et al., 2014; Spriggs, Christin & Edwards, 2014), environmental traits (Cowling & Lombard, 2002; Davies et al., 2004b; Arrigo et al., 2011; Schnitzler et al., 2011; Givnish et al., 2014; Valente et al., 2014) and geographical traits (Hughes & Eastwood, 2006; Jansson & Davies, 2008; Cowling, Procheş & Partridge, 2009; Kisel & Barraclough, 2010; Drummond et al., 2012; Trigas, Panitsa & Tsiftsis, 2013). Past attempts to explain global diversity patterns have focused on several dominant hypotheses. Based on species–area theory, the geographical area hypothesis (Terborgh, 1973) states that species that live in larger areas have the capacity for larger ranges and population sizes and are therefore less prone to extinction and more likely to undergo speciation (Rosenzweig, 2003). Furthermore, larger areas are more likely to encompass a greater diversity of habitats (Williams, 1964), which could act as a source for ecological speciation and enable coexistence among related species. A positive correlation between area and species richness would provide support for these hypotheses. Other explanations focus instead on the physical environment. The latitudinal diversity gradient is one of the most widely recognized patterns in ecology (Rohde, 1992; Willig, Kaufman & Stevens, 2003) and describes the tendency for lower latitudes to have higher species richness than higher latitudes.
One potential explanation for this pattern is the species–energy hypothesis (Connell & Orias, 1964), which states that the upper carrying capacity of an area is set by the solar energy and water available for photosynthesis. Under this hypothesis, the tropics are more species‐rich because there is a high input of light and water available all year round. Without the need to compete for scarce resources, these high‐input areas would have a higher diversity. This higher level of sustainability and lack of seasonality would also reduce extinction risk. Furthermore, the increased UV radiation might result in increased mutation rates, which might promote increased diversification. If the species–energy hypothesis is true then environmental variables (i.e. temperature and precipitation) on their own should be closely tied with species richness. Beyond latitude, other physical gradients might explain diversity, such as elevational gradients. An alternative set of explanations has focused on the role of biological traits in promoting diversity. For example, ‘key innovations’ are traits that allow for an adaptive release, such that new ecological opportunities are available. Being able to advance into new ecological territories might result in competitive release and these opportunities might allow for increased speciation to fill new niche space. Photosynthetic pathways have attracted particular interest in monocots, namely the C 4 and CAM systems that are derived from the more common C 3 pathway (e.g. Crayn et al., 2015). Evolution of these systems allows a plant to concentrate carbon and thus photosynthesize when other plants might struggle. The evolution of C 4 photosynthesis has already been implicated in promoting species diversification in some grasses (Spriggs et al., 2014). Finding a correlation between carbon‐concentrating systems with increased diversity might generalize this pattern. 
Finally, as well as geographical, environmental and biological explanations, variation in species richness might be linked to lineage age, whereby newly colonized regions require a minimum period of time for speciation, such that reproductive isolation can develop and restrict gene flow and outcrossing among populations (time‐for‐speciation effect; Stephens & Wiens, 2003). Furthermore, it is predicted that older lineages would have had a longer time to build up diversity and so should be expected, all else being equal, to have a higher species richness. A relationship between the age of a clade and its diversity would add weight to this idea. Because many mechanisms may determine species diversity within and among biomes and clades, individual factors may only have weak independent effects on diversity (Harvey & Purvis, 2003; Ricklefs, 2006) and it is likely that the mechanisms responsible for the origin and maintenance of plant biodiversity depend on multiple factors interacting (de Queiroz, 2002; Davies et al., 2004a; Davies, Savolainen & Chase, 2005; Schemske et al., 2009). For example, some studies have found that the evolution of C 4 photosynthesis may produce subsequent increases in speciation rates under CO 2 ‐limiting conditions (Christin et al., 2008; Spriggs et al., 2014), species–area relationships in tree diversity are modulated by their age (Fine & Ree, 2006), bilateral floral symmetry promotes speciation if pollination is biotic (Sargent, 2004) and biotic seed dispersal increases speciation rates in woody, but not herbaceous, plants (Tiffney & Mazer, 1995). More generally, the elevational diversity gradient might correlate with the latitudinal diversity gradient, such that higher elevations are centred on the tropics. 
The latitudinal diversity gradient might be explained by increased temperatures and precipitation (energy hypothesis), the tendency for species to have narrower ranges (Rapoport's rule; Stevens, 1989) or finer niche partitioning in the tropics thus facilitating the coexistence of more species. Higher species richness at lower latitudes might also be associated with the greater age of the taxa, as older taxa have had longer time to diversify. Finding interactions among these correlates of diversity might shed light on particular hypotheses and how they shape each other. Therefore, a combined approach with a global sample across lineages, as well as multivariate analyses, should provide clearer insights than previous single environmental or trait studies (Barraclough, Vogler & Harvey, 1998; Davies et al., 2004a). Previous investigations into the correlates of monocot diversity have varied in scale, tended to be taxonomically focused on one or a few lineages and have concentrated on a few drivers of diversification (e.g. Smith et al., 2011). Studies may also have been hampered by the taxonomic scale of the analyses, where interactions between traits acting at finer taxonomic scales were undetectable (Davies et al., 2004a). Here, we focus on genus‐level diversity in the monocots and reconstruct a fully resolved, fossil‐calibrated phylogenetic tree for monocots, encompassing 74% of all genera. We combine this with a trait database of biological, environmental and geographical variables for each genus to identify correlates of species richness in a multivariate, phylogenetic framework. We evaluate patterns previously hypothesized to correlate with increased species richness: the latitudinal diversity gradient; the geographical area hypothesis; the elevational diversity gradient; the species–energy hypothesis; key innovations; and the clade age hypothesis.

Material and Methods

Phylogenetic tree reconstruction

Four DNA regions (rbcL, matK, ndhF and nrITS DNA) were used for phylogenetic reconstruction (Supporting Information, Table S1). DNA from herbarium samples was extracted using the 2× CTAB method (Doyle & Doyle, 1987) and purified using QIAquick silica columns (Qiagen) according to the manufacturer's protocol. DNA concentration was quantified initially on a 1% agarose gel and using a NanoDrop2000 spectrophotometer. PCR amplifications were performed in 25–30‐μL reaction volumes. Each reaction contained between 10 and 20 ng DNA, 22.5 μL of either 1.1× ReddyMix PCR Master Mix (2.5 mM MgCl2) or 1.1× ReddyMix PCR Master Mix (1.5 mM MgCl2) (Thermo Scientific), 0.5 μL each 100 mM primer and 0.5 μL 0.04% bovine serum albumin. PCR conditions for rbcL were: 94 °C for 3 min, followed by 34 cycles of 94 °C for 1 min, 48 °C for 30 s and 72 °C for 90 s, and a final extension at 72 °C for 7 min. For matK they were: 94 °C for 5 min, followed by 35 cycles of 94 °C for 40 s, 48 °C for 40 s and 72 °C for 40 s, and a final extension at 72 °C for 7 min. A PCR gradient was also used for problematic samples, with conditions as follows: 94 °C for 3 min, followed by 35 cycles of 94 °C for 30 s, 48–58 °C for 1 min and 72 °C for 40 s, and a final extension at 72 °C for 7 min. PCR products were purified with NucleoSpin Extract II columns (Macherey‐Nagel) according to the manufacturer's protocol, run on a 1% agarose gel for verification and quantified using a NanoDrop2000 spectrophotometer. Sequencing reactions were performed in 10‐μL reaction volumes. Each reaction contained 0.8–3 μL cleaned PCR product, 0.75 μL each 10 mM primer, 3 μL 2.5× sequencing buffer and 0.5 μL BigDye Terminator v3.1 (Applied Biosystems), made up to volume with MilliQ water. Conditions were as follows: 96 °C for 1 min, followed by 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 4 min. Products were cleaned using sodium acetate/ethanol, dried and re‐suspended in 10 μL HiDi formamide for sequencing on a 3130xL Genetic Analyzer (Applied Biosystems).
Sequences were edited using either Geneious v.5.1.7 (Biomatters) or Sequencher v.4.7 (Gene Codes Corporation). Sequences were aligned using Geneious v5.1.7 (Kearse et al., 2012) and concatenated into an alignment including 5855 bp. Maximum‐likelihood analysis of the concatenated matrix was performed using RAxML (Stamatakis, 2006) via the webserver (Stamatakis, Hoover & Rougemont, 2008) with a gamma model of rate heterogeneity and a maximum‐likelihood search for the best fitting model for each of the four partitions; 14 outgroup taxa were chosen for their broad representation of monocot sister taxa (Supporting Information, Table S2). The tree had 2001 taxa including the 14 outgroups (http://dx.doi.org/10.6084/m9.figshare.1549717). An ultrametric tree (chronogram) was produced using PATHd8 (Britton et al., 2007), such that substitution rates (branch lengths) corresponded to divergence times calibrated against 20 fossils or secondary calibration points (Supporting Information, Table S3). Synonymous taxa were removed based on the latest monocot taxonomy (e‐monocot.org; Chase et al., 2015) resulting in a tree with 1816 genera. As a complementary analysis, unsampled taxa were placed on the tree as polytomies within their respective clades using the most up‐to‐date taxonomy for each taxon. These 710 missing genera were added to the tree as polytomies at the oldest node subtending the members of their respective families/tribes using the phytools v0.3‐93 package (Revell, 2012) in R v3.1.1 (R Core Team, 2014). Polytomies were resolved using the birth–death polytomy resolver method of Kuhn, Mooers & Thomas (2011). For further details on the construction of the complete monocot tree see Supporting Information (Data S1). 
The sequences produced in this paper are available online on GenBank under the accessions detailed in the Supporting Information (Table S1) and the base phylogenetic tree reconstructed using RAxML and made ultrametric using PATHd8 and 20 fossil calibrations can be accessed via Figshare: http://dx.doi.org/10.6084/m9.figshare.1549717.

Trait database

Species numbers per genus were compiled from the e‐monocot database (http://www.e-monocot.org) and supplemented by data from Chase et al. (2015) for orchids. Environmental and geographical characteristics were determined for a set of polygons representing the union of geopolitical boundaries (Taxonomic Database Working Group (TDWG) Level 3; Brummitt, 2001) and Olson's biomes (Olson et al., 2001). These subdivided regions are more biologically meaningful because they encompass a homogeneous vegetation type, which is missing from country‐level distributions. The TDWG Level 3 distribution of each species within each genus (L3 – Brummitt, 2001), which broadly coincides with countries (with larger countries further subdivided), and the biome association (according to Olson et al., 2001) of each genus were mined from either the e‐monocot database or via literature searches. These distribution systems were merged using geographical information systems (in ArcGIS; ESRI, 2013) to produce a final set of 940 polygons (L3B units). We assumed that genera were present in all the applicable biomes in an applicable country; e.g. a genus found in Brazil that is capable of inhabiting both tropical dry broad forest and tropical wet broad forest is assumed in our distribution to be present in both. This assumption inflates the generic richness of some L3B units, but is still more refined than each of the constituent distribution systems (biome – 14 units and L3 – 369 units). Genera were mapped onto these polygons by the combination of e‐monocot species records (TDWG L3) and separate data on the biome associations of genera.
In addition, data for bioclimatic variables (19 measures including measures of temperature, precipitation, etc.), actual evapotranspiration, area, latitude, longitude and elevation were gathered from the Bioclim, USGS and EDIT databases (http://worldclim.org/bioclim; http://lta.cr.usgs.gov/GTOPO30; http://edit.csic.es/) and mapped onto each of the 940 polygons above. The means of these variables within occupied regions were then extracted for each genus. Traits were scored for each genus and include vegetative, leaf, reproductive and other traits (habit, photosynthetic pathway, pollination mode, dispersal mode, etc.; Supporting Information, Tables S2, S4) following an extensive literature search and supplemented by data extracted from the e‐monocot database.

Data analysis

The level of phylogenetic imbalance was assessed using Colless's Index (I c ; Colless, 1982) using the apTreeshape package (Bortolussi et al., 2006) in R (R Core Team, 2014). A medusa model (Alfaro et al., 2009) was fitted to the tree to identify whether there was a significant shift in diversification rate (Harmon et al., 2008). Multiple analyses were conducted to identify which factors were correlated with species richness in monocot genera with and without taxonomically placed data. Similar results were observed for these two sets of analyses (see Supporting Information, Data S1). Here we show the results of the analyses with the dataset represented by genetic data, which included 1816 genera. A phylogenetic generalized least squares (PGLS), phylogenetic independent contrast (PIC) and sister clade analysis [SCA; using both relative rate difference (SCA RRD ) and proportion dominance index (SCA PDI ); Isaac et al., 2003] were performed. Taxa with missing data were omitted; for the represented data tree this left 1816 data points for the PGLS and the PIC analyses and 557 data points for the SCA.
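As a rough illustration of the imbalance statistic mentioned above, the following minimal Python sketch computes the raw (unnormalised) Colless index for a small binary tree: the sum, over internal nodes, of the absolute difference in tip counts between the two daughter clades. The nested-tuple tree encoding and the function name `colless_index` are assumptions made for this example. The analysis in the text used apTreeshape in R, and the value reported in the Results may use a normalisation that is not reproduced here.

```python
def colless_index(tree):
    """Return (number of tips, raw Colless index) for a binary tree.

    The tree is encoded as nested 2-tuples (an assumption for this sketch);
    anything that is not a tuple is treated as a tip.
    """
    def walk(node):
        if not isinstance(node, tuple):
            return 1, 0  # a tip: one leaf, no imbalance
        left, right = node
        n_left, imb_left = walk(left)
        n_right, imb_right = walk(right)
        # Add this node's imbalance |n_left - n_right| to that of its subtrees.
        return n_left + n_right, imb_left + imb_right + abs(n_left - n_right)

    return walk(tree)


# Hypothetical four-tip examples: a fully balanced tree and a fully
# pectinate ("ladder") tree.
balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")

print(colless_index(balanced))
print(colless_index(ladder))
```

The balanced tree gives the minimum possible value and the ladder the maximum for four tips, which is why larger values are read as stronger imbalance.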
Tightly correlated traits can lead to unstable or unsolvable model structures. To avoid such collinearity, continuous variables were removed from the analysis if they had correlation coefficients > 0.6 (Tabachnick & Fidell, 2007) and statistical associations between categorical variables were assessed using chi‐squared tests. This reduced the candidate set of explanatory variables to eight: (1) geographical: area, latitude and elevation; (2) environmental: annual precipitation (bioclim 12), annual mean temperature (bioclim 1) and maximum temperature in the warmest month (bioclim 5); and (3) biological: photosynthetic pathway and age. A list of the unused variables is provided in Supporting Information (Table S4). We log transformed the area of occupation of each genus and took the absolute values of the mean latitude of genera. To facilitate model fitting and interpretation, we standardized all continuous variables to a mean of zero (Schielzeth, 2010). These analyses were split into (1) main effects, (2) geographical, (3) environmental and (4) biological variables alone and (5) a minimum adequate model based on all the variables and their interactions. Each of these models was compared to the best model in terms of the R2 of the model. Both PGLS and PIC were performed using the caper v0.5.2 package (Orme et al., 2013) in R, and the SCAs were performed using linear models in R. Another analysis was performed to see whether the prevalence of different photosynthetic pathways (C 3 and carbon concentrating systems C 4 and CAM) differed with respect to environmental conditions. Three phylogenetic ANOVAs (pANOVAs) were performed to identify whether (1) C 4 /CAM genera had more species than C 3 genera and whether C 4 /CAM genera were more prevalent in (2) hotter and (3) drier environments. The phylogenetic ANOVAs were performed using caper in R.
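The variable screening and standardization described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R workflow. The greedy order in which variables are kept, the use of the absolute Pearson correlation, the scaling to unit standard deviation, and the toy data and names (`drop_collinear`, `standardize`, `area_proxy`) are all assumptions for the example. The text specifies only the 0.6 threshold and centring to a mean of zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(columns, threshold=0.6):
    """Greedily keep variables whose |r| with every kept variable is <= threshold.

    Which member of a correlated pair is dropped depends on column order;
    that choice is an assumption of this sketch.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def standardize(values):
    """Centre to mean zero and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical toy data for five genera.
toy = {
    "log_area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "area_proxy": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly collinear with log_area
    "elevation": [3.0, 1.0, 4.0, 1.0, 5.0],
}

kept = drop_collinear(toy)
scaled = {name: standardize(toy[name]) for name in kept}
print(kept)
```

Centring and scaling put the coefficients of continuous predictors on a comparable footing, which is what makes effect sizes such as those summarized in Fig. 3 comparable across variables.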

Results

We present a tree of monocot genera based on maximum‐likelihood analyses of 1987 genera (Fig. 1). Taxonomic placement of the missing 710 genera produced results similar to those shown in the main text, but to avoid the limitations associated with the taxonomic placement of taxa (Rabosky, 2015), we present the results based on the tree with molecular data (results from the complete‐genus tree are shown in Supporting Information). These efforts represent the largest and most up to date phylogenetic tree reconstructions for this group. The distribution of species among genera exhibits the classic hollow curve (Supporting Information, Fig. S1; P < 0.0001), with significant phylogenetic imbalance among genera in terms of species richness (Colless index = 14.4, P < 0.0001) and multiple shifts in diversification rate across the tree (medusa analysis indicates that at least 14 shifts have occurred).

Figure 1. Summary of the monocot phylogenetic tree, indicating families. Support values for the tree can be found online (http://dx.doi.org/10.6084/m9.figshare.1549717).

After selecting eight variables that were not collinear with each other, PGLS, PIC and SCAs were performed to identify which variables best explained differences in species richness among genera (the data used for these analyses are shown in Supporting Information, Table S2). PGLS analysis explained more of the variation in species richness than the SCA [using either the relative rate difference (SCA RRD ) or the proportion dominance index (SCA PDI ) indices], which in turn explained more variation than the PIC (Supporting Information, Table S5). The highest R2 of all of the models was a PGLS model, which included interactions among geographical, environmental and biological correlates of diversity; this analysis had an R2 of 45.8% (PIC = 14.98%; SCA RRD = 21.47%; SCA PDI = 20.14%; Fig. 2; Supporting Information, Table S5).
Figure 2. Comparison of the model fits of the different phylogenetic analysis methods. Phylogenetic generalized least squares (PGLS), phylogenetic independent contrasts (PIC) and sister clade analyses (SCA) with either relative rate difference (SCA RRD ) or proportion dominance index (SCA PDI ) were performed. Each of these analyses was further split into the best model (black), a model with only the main effects and no interactions among them (white), only geographical variables (hatched), only environmental variables (dark grey) or only biological variables (light grey). The model fit of each of these methods and models is shown (R2). The asterisk represents the model presented in more detail in the main text.

For each of the analyses, including interactions among the main variables improved the explanatory power of the models (Fig. 2; Supporting Information, Table S5). The main effects alone explained an average of 76.14% (± 3.06) of the total explanatory power of the models (four model comparisons; Fig. 2; Supporting Information, Table S5). Performing separate analyses of geographical, environmental and biological variables indicated that most of the variation in the dataset (irrespective of which method was used) was predominantly explained by differences among the geographical variables (i.e. absolute latitude, total area and elevation; Fig. 2; Supporting Information, Table S5). Indeed, the average proportion of the variation explained by geography in the best model across all different models was 63.19% (± 4.69), whereas environmental and biological variables explained much less (7.15 ± 2.03 and 10.58 ± 3.84%, respectively; Fig. 2; Supporting Information, Table S5). Here we describe the output of the best fitting model (PGLS; Fig. 3; Supporting Information, Table S6), but the trends in this model are similar across the other analyses (Supporting Information, Tables S6–S8).
This model included 1816 of the 2713 genera (66.93%) and was limited to this number because some genera had no genetic data available (710 genera) or had missing trait data (171 genera) (Supporting Information, Table S2).

Figure 3. Summary of the factors explaining species diversity in monocots (PGLS models, see text for details). The following factors are shown: mean annual temperature (Mean Temp), log area of genus distribution (log Area), absolute latitude (Lat), age of the genus, mean elevation (Elev), photosynthetic pathway (PP), maximum temperature in the hottest month (Max Temp) and mean annual precipitation (Precip). Circles are proportional to the effect (effect sizes measured as the difference in logged species richness with scaled and centred explanatory variables). Branches between circles indicate interactions between variables; thickness is proportional to the strength of the interaction (1 − the standard error of each coefficient as a proportion of the coefficient value). Finally, colour indicates the direction of the effect: red is lower, blue is higher; see scale. The model output is shown in Supporting Information (Table S5) and is based on 1816 data points.

Species richness was predicted by a number of factors and interactions among them (Fig. 3; Supporting Information, Table S6). The maximal model started with eight main effects and all of the pairwise interactions (28 in total); it was then simplified to the final minimum adequate model. This minimum adequate model had eight main effects and 14 interactions and explained 46% of the variation in species richness (PGLS: F 22,1793 = 68.9, P < 0.0001; Supporting Information, Fig. S2). Of the main effects, higher species richness was predicted in genera with larger areas (t = 25.9, P < 0.0001), lower latitudes (t = −3.68, P < 0.0001; Fig. 4) and lower elevations (t = −5.58, P < 0.0001), in cooler environments with lower maximum temperatures (t = −5.98, P < 0.001), in drier environments (t = −2.78, P < 0.001) and with carbon‐concentrating photosynthetic pathways (C 4 or CAM, t = 4.58, P < 0.0001; Fig. 5). Clade age was a significant factor in the model, but it had a comparably small positive effect on species richness. Interactions among the variables explained an additional 10% of the variation (full model R2 = 45.81%, main effects model R2 = 35.88%). Genera inhabiting larger areas with higher mean annual temperatures had fewer species (t = −9.21, P < 0.0001), whereas genera inhabiting larger areas with higher maximum temperatures had more species (t = 5.32, P < 0.0001). Genera inhabiting large areas at higher latitudes had fewer species (t = −9.03, P < 0.0001), whereas genera inhabiting higher latitudes with higher mean annual temperatures, maximum temperatures, annual precipitation or elevations had more species (t = 2.92, P = 0.0036; t = 3.59, P = 0.0003; t = 3.29, P = 0.001; t = 6.55, P < 0.0001, respectively). At high elevations, genera tended to have more species if the environment had higher maximum temperatures (t = 2.65, P = 0.0081) or if they were in a smaller area (t = −3.18, P = 0.0014). Among the bioclimatic variables, wetter and hotter environments harboured more species (t = 3.55, P = 0.00038). Genera occupying areas with both high mean annual temperatures and high maximum temperatures had more species (t = 2.58, P = 0.01) and so did older genera from hotter or drier environments (t = 2.91, P = 0.0037; t = 2.14, P = 0.033, respectively). No interactions were found between photosynthetic pathway and the other variables, but at the genus level, genera with C 4 or CAM photosynthetic pathways are more prevalent in environments with warmer (pANOVA: t = −10.06, P < 0.001) or drier conditions (pANOVA: t = 5.71, P < 0.001; Fig. 5).
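The bookkeeping behind the model summary above can be checked with a few lines of Python. This is only an arithmetic sketch using the figures quoted in the text. The variable names are illustrative, and the residual degrees of freedom assume one coefficient per term (photosynthetic pathway treated as a two‐level factor) plus an intercept.

```python
from math import comb

# Candidate terms in the maximal model.
n_main_effects = 8
n_pairwise = comb(n_main_effects, 2)  # all pairwise interactions among 8 variables

# Minimum adequate model: 8 main effects plus 14 retained interactions.
n_terms_final = n_main_effects + 14

# Residual degrees of freedom: data points minus coefficients minus intercept.
n_genera = 1816
residual_df = n_genera - n_terms_final - 1

# Variation explained (percent), as reported for the PGLS models.
r2_full = 45.81
r2_main = 35.88
extra_from_interactions = r2_full - r2_main         # percentage points
share_main = 100 * r2_main / r2_full                # percent of explained variation
share_interactions = 100 - share_main

print(n_pairwise, n_terms_final, residual_df)
print(round(extra_from_interactions, 2), round(share_main, 1), round(share_interactions, 1))
```

Under these assumptions the counts agree with the 28 candidate interactions and the F 22,1793 statistic quoted above, and the two shares correspond to the roughly 78% and 22% split of explained variation discussed later in the text.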
Figure 4. Latitudinal gradient of diversity at a species (A) and a genus level (B).

Figure 5. Difference in species richness (log scale; A), annual mean temperature (B; °C) and annual precipitation (C; mm) of genera with either C 3 or C 4 /CAM photosynthetic pathways. Asterisks represent the significance (P < 0.001) of the difference between the photosynthetic pathways for each of the phylogenetic ANOVA tests.

Discussion

Using three different means of accounting for phylogenetic autocorrelation (PGLS, PIC and SCA), we found multiple correlates of diversity in monocots. Similar trends were found among the three different analyses, but the best model fit was found with the PGLS analysis, and so this is discussed here. Using a PGLS framework allowed us to model 46% of the variation in species richness among monocot genera. A multiple regression of the main effects alone accounts for the majority of the explanatory power of the model (78% of the explained variation), but interactions among main effects are also important (22% of the explained variation). Separate analyses of the geographical, environmental and biological variables indicate that geographical variables are the main source of explained variation (76.84% of the explained variation), whereas environment and biology (although tied in with the geographical variables; i.e. absolute latitude and mean annual temperature) explain a much smaller proportion of the variation (3.22 and 5.82%, respectively). To our knowledge, this is the first effort made to assess simultaneously multiple interacting correlates of diversity while controlling for phylogenetic autocorrelation, and to rank them categorically in terms of importance. In turn, we discuss the multiple correlates and the biological patterns that could explain them. We found a strong relationship between area and species richness. This was by far the most strongly supported pattern in any of the datasets and analyses; on its own, area accounted for 32.9% of the explained variation in species richness. Genera occupying larger areas tended to have more species, a pattern well documented in the literature (Arrhenius, 1921; Rosenzweig, 1995; Losos & Schluter, 2000; Kisel & Barraclough, 2010). However, the direction of causality underpinning the clade area–richness relationship is unclear.
On the one hand, larger areas are expected to provide more opportunities for diversification because they are likely to encompass greater habitat diversity while also reducing the impact of competition by allowing species to partition the habitat (habitat diversity hypothesis). The strength of gene flow is also expected to decrease with distance within large ranges, which may promote reproductive isolation (Slatkin, 1973, 1985). Furthermore, larger species range sizes (and therefore possibly also genus larger range sizes) may lower extinction risk by supporting larger population sizes and providing refugia (Davies et al., 2004b). On the other hand, the ability of a genus to occupy a broader range may be a consequence of its evolutionary lability or, alternatively, greater landscape homogeneity, and that these factors may thus be the real proximate determinants of area‐linked species richness. The latitudinal diversity gradient remains one of the most studied patterns in ecology (de Candolle, 1855; Rohde, 1992; Gaston & Blackburn, 2000; Willig et al., 2003; Mora & Robertson, 2005). Our analyses again confirm this observation, with monocot species and genus richness decreasing from the equator to the poles (McInnes et al., 2013). More than a dozen hypotheses have been proposed to explain this pattern. For example, niche partitioning along latitudes would increase diversification rates, perhaps due to the higher availability of niches (Mayr, 1942), or the increased energy supply promoting a higher carrying capacity (Wright, 1983; Rohde, 1992). Higher species richness at lower latitudes might also be linked to increased temperature, which is associated with higher productivity and higher UV radiation, and thereby higher mutation rates and faster evolution, i.e. the energy hypothesis (Connell & Orias, 1964). We also find evidence for Rapoport's rule (Stevens, 1989); i.e. 
our model indicates that even greater species richness is associated with lower latitudes when the range size of the genera is smaller. Our analyses are consistent with the hypothesis that species inhabiting the tropics partition their niches more finely (i.e. a correlate of range size) and therefore genera in the tropics tend to have higher species richness. We did not find support for the energy hypothesis. Higher temperatures are correlated with lower latitudes (linear model: P < 0.0001, R2 = 68.16%) and when these factors are assessed separately we find that both lower latitudes and higher temperatures are associated with higher species richness, but assessed together much of this variation is accounted for by latitude. Furthermore, the goodness of fit of the latitude–species richness relationship is higher than that of the temperature–species richness relationship (R2 = 5.74% vs. 4.18%, respectively). Therefore, in our multivariate model, which explains nearly ten times more of the variation than these two univariate models, we hypothesize that the variation in species richness is better explained by latitude than temperature and that the correlation between temperature and species richness offers additional insight (curvature) into these patterns. It is possible that latitude provides a better proxy of past environmental conditions than do measures of average physical environment over recent time. We found a significant negative correlation between elevation and species richness as has been documented before (MacArthur, 1972; Brown & Gibson, 1983; Begon, Townsend & Harper, 2005). At higher elevations the environment becomes more adverse to life, such that there is a decreasing availability of resources (Padien & Lajtha, 1992) and/or a need for plants to evolve adaptations to lower energy/temperature and water availability (Rahbek & Museum, 1995; McCain & Grytnes, 2010).
The necessity for these adaptations probably constrains the number of species that can inhabit these habitats, or plants may simply be unable to adapt to these climes. Our data indicate that the limited water and light availability at higher elevations impedes diversification and/or colonization; the corollary of this is that higher elevations that benefit from higher levels of precipitation and higher temperatures have higher species richness. Finding these significant interactions indicates that both water and temperature limitations at higher elevations are the restricting factors for diversity there. Similar to the relationship between latitude and area, we found evidence that genera at higher elevations but with small areas are more species rich, which is contrary to the idea that species richness at high elevations is restricted by available area (Rahbek & Museum, 1995). We would expect this pattern if genera at higher elevation have more finely partitioned the available area/niches, thus promoting higher species richness. Indeed, mountainous areas tend to be highly dissected by erosion, which may favour higher rates of allopatric speciation than less dissected habitats at lower elevations; this hypothesis would explain both the genesis and the maintenance of diversity in these cases (e.g. Lupinus L. in the Andes; Hughes & Eastwood, 2006). With regard to the environment, we found that higher species richness (of genera) is associated with lower temperatures and drier conditions, which directly contradicts both the energy hypothesis and the finding that there are more species in the hot and wet tropics. We think that these analyses are skewed by the high abundance of species‐poor genera in the tropics. Here, we treated genera as our unit of diversity, but we accept that this obscures correlates of diversity that occur at the species level.
If we were to use species as our unit of biodiversity, unrestricted by which genus they belonged to, then we would expect the patterns to be more in line with classical theory (i.e. the energy relationship). The observed pattern of low species richness associated with tropical environments arises because there is a high abundance of species‐poor monocot genera in the tropics, typically those that have higher levels of endemism. For example, 70% of the genera are between latitudes −10° and 10°, but only 52% of the species, whereas outside of this latitudinal band, the proportion of genera is lower than the proportion of species. This pattern indicates that there is a higher proportion of endemism in the tropics and this skews our analyses because we use species per genus as our measure of biodiversity. Another explanation for the surprisingly high number of small genera in the tropics is that the rate of morphological change is likely to be higher there for reasons discussed above; this would have allowed taxonomists to resolve relationships more finely and probably influenced them to recognize more, smaller genera. If so, this pattern would be explained as a taxonomic artefact. We found that photosynthetic pathways that concentrate carbon were associated with increased diversity, which concurs with more taxonomically focused investigations [e.g. in Poaceae (Christin et al., 2008; Spriggs et al., 2014) or Bromeliaceae (Givnish et al., 2014)]. We hypothesized that these pathways would be correlated with higher species per genus ratios in hotter and drier environments, where the loss of water through typical gas exchange is higher (Sage, 1999). Indeed, we found that there were more genera with such photosynthetic mechanisms in hotter and drier environments, but no significant interactions in the PGLS. Again, assessing diversity patterns at a genus level would probably lead to different conclusions from those reached at a species level.
We believe that this pattern could be due to the following: (1) hotter and drier environments promote the evolution of C4 or CAM photosynthesis and higher species richness, but this diversity is partitioned among many genera (leading to fewer species per genus), which might be due to these few species per genus filling those niche spaces with none left to partition among more species; and (2) our scoring of photosynthetic pathways might be too broad, as genera with at least one C4 or CAM species were scored as having carbon‐concentrating pathways, which potentially inflates the related number of species. Finer‐scale phylogenetics at the species level coupled with the species‐level coding of characters would provide a more powerful test. The lack of a relationship between clade age and species richness indicates that either species richness has become decoupled from time due to factors such as ecological limits (i.e. diversity‐dependent diversification; Rabosky, 2009) or that patterns across the whole tree on this scale mask opposing patterns that may be apparent at finer scales. There are a few other limitations in our analyses. Firstly, examining genera with the most poorly predicted species richness indicates a number of additional important factors to consider in future research. Like Davies et al. (2004b, 2005), our model predicts much less diversity than is actually present in the African Cape biodiversity hotspot where it has already been acknowledged that extraordinary drivers of diversity are at play (Davies et al., 2005; Schnitzler et al., 2011). The high species richness of Restio L. (Restionaceae; 167 species) and Geissorhiza D.Dietr. (Iridaceae; 97 species) is probably due to factors such as high diversity of pollinators (Goldblatt & Manning, 2002; Davies et al., 2005), the climatic stability of the region (Klopfer, 1959; Linder, 2003; Cowling et al., 2009) and edaphic heterogeneity (Schnitzler et al., 2011).
Similarly, the bamboo genus Fargesia Franch. (Poaceae; 87 species), with a narrow distribution at high elevations in East Asia (Rietze, 2001), is massively underestimated. Secondly, multicollinearity, the tight correlation of two or more traits (Graham, 2003), forced us to remove traits that could have otherwise explained more of the variation in species richness. The inclusion of these correlated traits led to unstable or unsolvable model structures, but their removal might have affected the explanatory power of the model. Traits such as pollination syndrome, leaf venation, seed dispersal and habit had to be removed from the analysis because they are tightly correlated with each other (e.g. C3 grasses have wind pollination, linear leaf venation, wind‐dispersed seeds and a perennial life history). The variables dropped include factors that are known to influence diversification in specific contexts and/or specific clades (pollination system, leaf form, dispersal mode, etc.); to remedy this, future analyses will need to be performed at a finer taxonomic scale whereby there will be more independent contrasts between the traits. Finally, a limitation of our approach is the averaging of environmental and geographical variables associated with widely distributed genera, which might tend to have higher species richness due to the availability of more niches or, conversely, the higher species richness in genera might have promoted post‐speciation range expansion. In our dataset, each such genus is represented by the mean of its environmental and geographical variables. For example, Carex L., which has the greatest number of species of any monocot genus (>2000 species) and a worldwide distribution, has averaged environmental and geographical data ranges that are unrepresentative of much of the species diversity within the genus. Again, a fine‐scale assessment of species‐level data and diversification rates would be ideal. Undoubtedly, a species‐level tree with c.
75 000 tips would add a huge number of data points that would allow one to better assess variation in diversification rates and independent contrasts for the biotic variables (Zanne et al., 2014). Finer‐scale taxonomic sampling coupled with more explanatory variables (i.e. finer‐scale geographical ranges, temperatures and pollination syndromes) would also help explain diversity differences in more detail (Barraclough et al., 1998). However, until then, this multivariate, phylogenetic framework is the most comprehensive assessment of the correlates of monocot diversity to date. We show that, despite genus‐level sampling, general evolutionary patterns are inherently observable across the rich diversity of monocots and that geographical explanations for variation in species richness are the best fit.
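
As a worked check on the genus and species shares quoted above for the −10° to 10° latitudinal band, the following minimal Python sketch converts them into mean species per genus inside and outside the band. It assumes that both percentages refer to the totals given in the Figure S1 caption (72 710 species among 2713 genera); the function and variable names are illustrative only and are not part of the original analyses.

```python
# Shares reported in the text for the -10 to 10 degree latitudinal band.
GENERA_SHARE_IN_BAND = 0.70
SPECIES_SHARE_IN_BAND = 0.52

# Totals from the Figure S1 caption. ASSUMPTION: these are the
# denominators behind the two percentages above.
TOTAL_GENERA = 2713
TOTAL_SPECIES = 72710


def mean_species_per_genus(species_share, genera_share,
                           total_species=TOTAL_SPECIES,
                           total_genera=TOTAL_GENERA):
    """Mean species per genus for a subset holding the given shares."""
    return (species_share * total_species) / (genera_share * total_genera)


inside = mean_species_per_genus(SPECIES_SHARE_IN_BAND, GENERA_SHARE_IN_BAND)
outside = mean_species_per_genus(1 - SPECIES_SHARE_IN_BAND,
                                 1 - GENERA_SHARE_IN_BAND)
# The totals cancel in this ratio, so it depends only on the two shares.
ratio = inside / outside

print(f"inside band:  {inside:.1f} species per genus")
print(f"outside band: {outside:.1f} species per genus")
print(f"ratio inside/outside: {ratio:.2f}")
```

Under these assumptions the band holds roughly 20 species per genus against roughly 43 outside it. The ratio between the two (about 0.46) depends only on the two percentages, not on the assumed totals, which illustrates why a species‐per‐genus measure can register tropical genera as comparatively species poor even though the band holds about half of all monocot species.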

Acknowledgements We thank Neil Brummit, Justin Moat and Rafael Govaerts for their help with distribution maps, Mike Fay and Felix Forest for providing DNA samples, Martyn Powell for his help in the laboratory and Peter Weston and an anonymous referee for comments. We thank the Leverhulme Trust and the Natural Environment Research Council for funding.

Author Contributions VS and TGB designed the study; SP carried out the molecular laboratory work, CQT and FAJ constructed phylogenetic trees and CQT, CDLO, LB and FAJ conducted data analyses under VS's supervision; CQT and VS wrote the manuscript with significant contributions from MWC; all authors commented on the manuscript.

Supporting Information
Filename: Description
boj12497-sup-0001-FigS1.tiff (TIFF image, 2.2 MB): Figure S1. 72 710 species distributed among the 2713 monocot genera show a hollow curve distribution (red line).
boj12497-sup-0002-FigS2.tiff (TIFF image, 153.9 MB): Figure S2. Diagnostic plots for the PIC model.
boj12497-sup-0003-FigS3.png (PNG image, 14.3 KB): Figure S3. Comparison of the model fits of the different phylogenetic analysis methods.
boj12497-sup-0004-TableS1.docx (Word document, 32.9 KB): Table S1. GenBank accession numbers for the genetic data (rbcL, matK, ndhF, ITS) mined and generated for this study.
boj12497-sup-0005-Legends.docx (Word document, 33 KB):
Table S2. Trait data used in the models.
Table S3. Fossils used to calibrate the phylogeny.
Table S4. Trait data collated for this study but removed from the analyses due to multicollinearity.
Table S5. Summary of the total explained variation of the various comparisons.
Table S6. Model output from the phylogenetic generalized least squares models: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Table S7. Model output from the phylogenetic independent contrast models: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Table S8. Model output from the sister clade analysis models for both relative rate difference and proportional dominance indices: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Table S9. Summary of the total explained variation of the various comparisons.
Table S10. Model output from the phylogenetic generalized least squares models from the phylogeny with taxonomically placed tips: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Table S11. Model output from the phylogenetic independent contrast models from the phylogeny with taxonomically placed tips: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Table S12. Model output from the sister clade analysis models for both relative rate difference and proportional dominance indices from the phylogeny with taxonomically placed tips: best fitting and, separately, the main effects (geographical, environmental and biological variables).
Data S1. All data analyses.