Germplasm and plant growth conditions

A total of 858 unique maize hybrids were tested in 21 environments across 14 states in the United States and one province in Canada for a total of 12,678 field plots in the summer of 2014 as a part of the G2F initiative. Each of the 21 environments grew a set of 250 hybrids in two field replications. The environments ranged from latitudes between 30.54° and 44.07° and longitudes between −101.99° and −75.20°. For more details about specific agronomic practice and growing conditions for each location, please refer to the metadata at https://doi.org/10.7946/P2V888. Female parents of hybrids were classified into eight pools based on genetic background. Briefly, those pools were: (1) Recombinant inbred lines (RILs) from the Intermated B73/Mo17 population (IBM)32; (2) RILs from the nested association mapping (NAM)33 population involving B73/Oh43; (3) RILs from the NAM population involving B73/Ki3; (4) Public and expired plant variety protection (PVP) lines belonging to the Iodent group; (5) Public and expired PVP lines belonging to the Stiff Stalk group; (6) Public and expired PVP lines belonging to the Lancaster group; (7) Public lines originating from the Texas A&M AgriLife corn breeding programs; (8) RILs developed by the University of Wisconsin’s biomass breeding program. The first maize inbred sequenced34, B73, is a parent of pools 1–3 and a founding member of the Stiff Stalk group represented by pool 5. Pools 4 and 6 represent the Iodent and Lancaster groups which are commonly crossed to Stiff Stalk materials in public and private breeding programs. Pool 7 represents temperate and exotic germplasm selected for adaptation to Texas35, while pool 8 contains RILS derived from diverse parents showing segregation for various biomass related traits. Pools 1 through 8 were crossed with up to five male testers (PB80, LH195, CG102, LH198, and LH185), and each pool-by-tester family was designated as a “set” (Supplementary Table 1). In addition to the sets created by the crosses described above, there were two additional sets: the first comprised of single locally adapted (in some cases commercial) check hybrids chosen by each principal investigator for their location, and the other comprised of a common set of hybrids grown in all locations. Sets were assigned to specific locations based on expected maturity with the exception of the set of common check hybrids, which were grown in all locations, and the locally adapted checks, which were grown only in their individual locations.

Field experimental design

The experiment followed a modified form of a split-plot design with individual sets as the whole-plot factor arranged in a randomized complete block design and hybrids as the subplot factor. The design differed from a classical split-plot because the subplot factor (hybrid) was nested within the whole-plot factor (set). This design is also referred to as a sets-in-replicates design. Two complete replicates of each hybrid were planted at each location; within each replicate, each set was grown in a block, and block order was randomized within replicate. Hybrid order was randomized within each whole-plot block. The locally adapted hybrid check selected by each investigator at each location was incorporated into each block within each replicate. Weather data were collected at each location (Supplementary Note 1).

Phenotypic data

Eleven morphological and agronomic traits were measured for all hybrids and across all locations. Methods for their measurement were standardized project-wide. A detailed description of phenotyping guidelines is available at https://doi.org/10.7946/P2V888. Days to anthesis was measured as the number of days between planting and half the plot exhibiting anther exertion on more than half of the main tassel spike. Days to silking was assessed as the number of days between planting and half the plot showing silk emergence. Ear height was the distance from the ground to the uppermost ear bearing node. Plant height was measured as the distance from the ground to the ligule of the uppermost leaf. Plot weight was the weight in grams of the shelled grain collected in each plot, and test weight (a measure of grain density) was recorded as pounds per bushel. Root lodging and stalk lodging were recorded respectively as the number of plants leaning more than 15 degrees from vertical and as the number of plants with broken stalks between the ground and the primary ear node. Stand count was recorded as the number of plants per plot at harvest. Grain moisture was measured as the percent water content in the grain at the time of grain harvest. Grain yield in bushels per acre assumed a 56 pounds per bushel conversion, 15.5% grain moisture, and used plot area measured without the alley. The calculation for grain yield was grain yield = (plot weight)×(1–0.01×moisture) × (area−1) ×920.5401. Ear height and plant height were measured on one to five representative plants per plot depending on the location while all other measurements were representative of the entire plot. Full phenotypic data can be found at https://doi.org/10.7946/P2V888.

Experimental design random effects model

As detailed in the field experimental design section, hybrids were classified into sets based on the female pool and the male tester. Hybrid genotypes were grown in a modified split-plot design. To calculate the variance attributable to each element of the field experimental design, we modeled each phenotype as \(y = E + R\left( E \right) + S + E \times S + R \times S\left( E \right) + L\left( S \right) + L \times E\left( S \right) + e\), where E represents the environmental effect; \(R\left( E \right)\) is the effect of replication nested within environment; S is the set effect, \(E \times S\) is the interaction term of environmental and set effects; \(R \times S\left( E \right)\) is the interaction term of replication by set, nested within environment; \(L\left( S \right)\) is the hybrid line effect nested within set; \(L \times E\left( S \right)\) is the interaction term of hybrid line by environment, nested within set; and e is the error term. Models were fit in R36 using the lmer() function in the lme4 package37. Variance component estimates were expressed as a percentage of the total variance. Predictions of hybrid effects were recorded as the best linear unbiased predictions (BLUPs) for hybrid line nested within set.

Genotypic data

A set of 336 inbred lines were used to generate the hybrid sets tested in the 2014 experiment. Sequencing data for 232 of the inbred lines used in this evaluation were downloaded from the ZeaGBSv2.7 Panzea release (http://www.panzea.org/#!genotypes/cctl). DNA for the remaining inbreds was extracted and genotyped using genotype-by-sequencing (GBS) following the protocol described by Elshire et al.38 at 96 plex. Genotypes were called using the Tassel5-GBS Production Pipeline with the ZeaGBSv2.7 Production TagsOnPhysicalMap(TOPM) file that was built using information about ~32,000 additional Zea samples39 (AllZeaGBSv2.7_ProdTOPM_20130605.topm.h5, available at panzea.org). Imputation was performed with FILLIN40 using the available set of maize donor haplotypes with 8k windows (AllZeaGBSv2.7impV5_AnonDonors8k.tar.gz, available at panzea.org). FILLIN has been shown to have an imputation accuracy of 0.996 on temperate inbred materials representative of the germplasm used in this study. All GBS samples used are listed in Supplementary Data 1. Available GBS data can be found at https://doi.org/10.7946/P2V888.

Synthetic hybrid genotypes for 624 hybrids were generated based on genotypes of parental inbreds. The subset of hybrids for which genotypes were calculated was based on availability and quality of parental genotypes, not deliberately chosen. Parental genotypes were coded as the number of major alleles at each locus (0, 1, or 2) and the hybrid genotype for each hybrid at each locus was computed as the mean of its two parents at that same locus.

G × E variation explained by high and low F ST regions

A set of 30 inbred lines which have undergone selection for high productivity in temperate growing conditions and 30 inbred lines selected for productivity in tropical climates were chosen to use for identification of genomic regions that are candidates for having undergone differential selection for growth in temperate conditions (Supplementary Table 2). Overlapping SNPs between ZeaGBS 2.739 and Hapmap 3.126 (341,048 SNPs) were used to perform a Multidimensional Scaling (MDS) analysis on the 916 Zea accessions described in Hapmap 3.1, following the procedure described by Romay et al.41 Based on the location of the inbreds that were part of Hapmap 242, inbreds with coordinates <−0.5 on the first coordinate and above 0 on the second coordinate were classified as temperate selected while inbreds with coordinates > 0.5 on the first coordinate and greater than 0 on the second coordinate were classified as tropical selected (Fig. 2). Thirty individuals were chosen from each group (Supplementary Table 2) based on pedigree, genetic distance from other individuals of the group (identity by state <0.9541), and missing SNP data. Thirty individuals from each group was the sample size that best balanced the amount of missing data between temperate and tropical lines. Previous publications have computed F ST between tropical and temperate materials with as few as 16 and 11 individuals per group (respectively)27. F ST 43 between the two groups was calculated for each SNP in Hapmap 3.1 using VCFtools44. The unweighted average was calculated to determine an F ST value for every 20-SNP interval. From the F ST results, 1248 SNPs in the hybrid lines were identified as present in regions that are more probable to have been subject to selection (windows with F ST values greater than 0.5), while 263,243 SNPs in the hybrid lines were chosen as present in regions that are unlikely to have been selected (windows with F ST values <0.15). Per-site pairwise nucleotide diversity was assessed in the high and low F ST regions within the temperate and tropical inbreds to assess evidence of divergent selection vs. directional selection within individual subpopulations. Nucleotide diversity calculations were performed with the Hapmap 3.1 sequencing data using SAMtools45 and ANGSD46. Because the high and low F ST SNP groups were chosen based on mean values of 20-SNP windows, some of the SNPs that were included in the high F ST group do not have an individual F ST greater than 0.5. The 736 high F ST SNPs that overlap between Hapmap 3.1 and the hybrid line genotypes were used to evaluate allele frequencies between the temperate and tropical groups, as well as F ST values at individual SNPs (Fig. 3). Minor allele frequencies in the hybrid lines of the high F ST SNPs, low F ST SNPs, and entire SNP set were calculated as \({\mathrm{min}}\left( {\frac{{\mu _m}}{2},1 - \frac{{\mu _m}}{2}} \right)\), where μ m was the mean at marker m. Minor allele frequency distributions were compared visually, and no major differences between the distributions were noted (Fig. 3). Despite inbred imputation, 16% of 372,273 SNPs in the hybrid genotypic data were still missing, ranging from 0 to 56% missing on a per-SNP basis and from 3 to 56% missing on a per-hybrid basis. Missing hybrid genotypes for each marker m were imputed by weighted random draws from the genotypes present at m, where the weights correspond to the genotype frequencies of m. To calculate empirical allele frequency based imputation accuracy, we performed 10,000 iterations of masking a single known SNP and comparing it to its imputed value, with an empirical imputation accuracy of 85.6%.

We calculated G × E variance explained by SNPs with high and low F ST values using a method similar to the variance components approach described by Gusev et al.28, but structured to evaluate interactions between the environments and specific loci rather than heritability estimations of functional categories47. The model describes the response of the i th hybrid in the j th environment as follows:

$$y_{ij} = \mu + E_j + g_i + \left( {g_LE} \right)_{ij} + \left( {g_HE} \right)_{ij} + e_{ij},$$

where \(\mu \) is the overall mean; \(E_j\) (j = 1,…,J) denotes the random effect of the j th environment such that \(E_j\mathop {\sim }\limits^{iid} N(0,\sigma _E^2)\) with \(\sigma _E^2\) as the variance of the environments; \(g_i = \mathop {\sum}

olimits_{m = 1}^p {x_{im}b_m} \) is a linear combination between p marker covariates \(x_{im}\) (m = 1,..,p) and their correspondent marker effects \(b_m\) such that \({\mathbf{g}} = \left\{ {{\mathrm{g}}_i} \right\}\sim N ( {0,{\boldsymbol{G}}\sigma _{\bf{g}}^2} ) ,\) where \({\boldsymbol{G}}\) is the genomic relationships matrix (GRM) computed using all 227,287 polymorphic markers with minor allele frequency >0.05, and \(\sigma _{\mathbf{g}}^2\) is the genomic variance; \(\left( {g_LE} \right)_{ij}\) represents the interaction between each SNP with low F ST and each environment such that \({\boldsymbol{g}}_{\boldsymbol{L}}{\boldsymbol{E}} = \{ {( {g_LE} )_{ij}} \}\sim {\mathbf{N}}( {0,( {{\boldsymbol{Z}}_{\boldsymbol{g}}{\boldsymbol{G}}_L{\boldsymbol{Z}}^{{'}}_{\boldsymbol{g}}} )^{\circ} ( {{\boldsymbol{Z}}_{\boldsymbol{E}}{\boldsymbol{Z}}^{{'}}_{\boldsymbol{E}}} )\sigma _{g_LE}^2} )\), where \({\boldsymbol{Z}}_{\boldsymbol{g}}\) and \({\boldsymbol{Z}}_{\boldsymbol{E}}\) are the incidence matrices for genotypes and environments, \(\sigma _{g_LE}^2\) is the associated variance parameter and ‘\( \circ \)’ stands for Hadamard or Schur (element-by-element) product between two matrices; similarly \(\left( {g_HE} \right)_{ij}\) represents a random effect of the interaction between SNPs with high F ST and the environments with \({\boldsymbol{g}}_{\boldsymbol{H}}{\boldsymbol{E}} = \{ {( {g_HE} )_{ij}} \}\sim {\mathbf{N}}( {0,( {{\boldsymbol{Z}}_{\boldsymbol{g}}{\boldsymbol{G}}_H{\boldsymbol{Z}}^{{'}}_{\boldsymbol{g}}} )^\circ ( {{\boldsymbol{Z}}_{\boldsymbol{E}}{\boldsymbol{Z}}^{{'}}_{\boldsymbol{E}}} )\sigma _{g_HE}^2} )\) and \(\sigma _{g_HE}^2\) acting as the variance component. \({\boldsymbol{G}}_H\) was a GRM computed using the 1248 SNPs whose F ST values were above 0.5 and \({\boldsymbol{G}}_L\) was a GRM computed using random samples of 1248 SNPs from the low F ST set of 263,243 SNPs. This model was fit 1000 times with random subsets of the low F ST SNPs, and the calculated variance components were recorded. Models were fit using the BGLR48 package in R36. Residuals across all model fittings followed a distribution approaching normality, and heuristic assessment of equal variance between environments49 was satisfied. Therefore, common transformations of phenotypic data were not explored.

Because presence of G × E variance is dependent on the presence of both genetic and environmental variances, we tested for the presence of genetic variance attributable to high and low F ST SNPs using the same model as described above, but with the \(g_i\) term split into \((g_H)_i\) and \((g_L)_i\) where \((g_H)_i\) was calculated using the 1248 high F ST SNPs, and \((g_L)_i\) was calculated using 1248 SNPs randomly subset from the 263,243 low F ST SNPs. The model was fit 1000 times for both grain yield and plant height and the calculated variance components were recorded.

The hypothesis being tested assumes that hybrids used in this field evaluation are representative of only temperate selected germplasm. A small number of hybrids had inbred line Ki3 as a parent, which is of tropical origin and as such could contain alleles that are not representative of germplasm selected for growth in temperate conditions. The variance decomposition analysis described above was run with hybrid lines containing Ki3 parentage both included and excluded, with no differences observed. With the hybrid lines containing Ki3 parentage removed from the data set, 552 unique hybrids were included in each model fitting.

Classification of variants associated with G × E

Hybrid stability was calculated using a method similar to the Finlay–Wilkinson regression25. Environmental means were calculated using 21 check hybrids that were grown in at least 20 of 21 environments. Ear height, plant height, number of days to silk and anthesis, and yield values for each hybrid were regressed on the respective environmental means by simple linear regression: \(y_{ij} = \beta _0 + \beta _1x_j + e_{ij}\), where \(y_{ij}\)is the phenotype of replicate i in environment j, \(x_j\) is the mean of the checks in environment j, and \(e_{ij}\) is a random error term. The deviation from a slope of one (i.e., \(\beta _1 - 1\), representative of deviation from the mean response of the checks and hereafter referred to simply as slope) and mean squared error (MSE) for each hybrid’s regression were recorded. Hybrids with less than six recorded observations or with observations recorded in less than four environments for a particular trait were excluded from further analysis.

BLUPs for the traits per se were calculated as the hybrid line within set effect from fitting the experimental design random effects model. Slope, MSE, and traits per se were used as response variables in a genome-wide associate study (GWAS) with the synthetic hybrid genotypic data. Synthetic hybrid SNPs were filtered to 413,796 SNPs with <80% missing data, a mean of 20% missing, and 95% of SNPs having less than 61% missing. GWAS was performed using the software GAPIT50 (Supplementary Figs. 9–11), with minor allele frequency threshold of 0.5%, kinship calculated by the VanRaden method51 using only individuals for which the response variable was present, and default parameters otherwise.

The 50 SNPs with the lowest p-values from each GWAS for slope were pooled. If any of the top 50 SNPs were within 5 kilobases of each other and in LD (\(r^2 >0.5\)), only the most significant SNP was retained. LD was calculated using PLINK v1.952. Results from each GWAS for MSE and the traits per se were pooled in the same manner. Base pair (bp) distances from the pooled SNPs to the closest gene were calculated in a manner similar to that described by Wallace et al.30, but rather than calculate the absolute distance to the nearest gene we also calculated whether each SNP was upstream or downstream (5ʹ or 3ʹ) based on annotated gene orientation in the B73 AGPv2 reference genome (ftp.gramene.org/pub/gramene/maizesequence.org/release-5b/filtered-set/ZmB73_5b_FGS.gff.gz). A null distribution of distance to the closest gene was calculated using all 421,142 SNPs in the hybrid genotype data set. We chose 50 SNPs per trait/parameter combination because it closely represented the proportion of total SNPs used in the Wallace et al.30 study. For the null, slope, MSE, and trait per se distances, SNPs were categorized as either upstream or downstream and as genic (within a gene), gene-proximal (1–5000 bp to closest gene), or intergenic (>5000 bp to closest gene). Tests for enrichment or reduction of slope- or MSE-associated SNPs in each position category were performed against the null distribution using a two-sided exact binomial test.

Code availability

Scripts for modeling variance attributable to high and low F ST regions can be found in Supplementary Software 1. Scripts for nucleotide diversity, stability analysis, GWAS, and distance to the nearest gene can be found at https://github.com/joegage/GxE_scripts.

Data availability

Hybrid phenotypic data, inbred genotypic data, weather data, metadata, and readme files are publicly available at https://doi.org/10.7946/P2V888. All relevant data are available from the authors upon request.