Study subjects and human samples

All samples and information were collected with written and signed informed consent. One hundred and nine children genetically predisposed to T1D were enrolled in the BABYDIET study. The BABYDIET study is an intensively monitored dietary intervention study testing the potential effect of delayed gluten exposure on the development of islet autoimmunity in children at increased risk for diabetes in Germany. Children younger than 3 months with at least one first-degree relative with T1D and one of three specific T1D-associated HLA genotypes (DRB1*03-DQA1*05:01-DQB1*0201/DRB1*04-DQA1*03:01-DQB1*03:02; DRB1*04-DQA1*03:01-DQB1*03:02/DRB1*04-DQA1*03:01-DQB1*03:02 or DRB1*03-QA1*05:01DQB1*02:01/DRB1*03-DQA1*05:01DQB1*02:01) were recruited between 2,000 and 2,006 (participation rate: 88.8 %) and randomized to exposure to dietary gluten from age 6 months or from age 12 months. After inclusion, children were followed in three monthly intervals until the age of 3 years and yearly thereafter for efficacy (persistent islet autoantibodies) and safety assessment, including intensive monitoring with three monthly sample collection of venous blood, urine and stool. PBMCs were isolated from venous blood samples taken at each visit and stored at −80 °C in TRIZOL.

The T1D PBMCs were collected as part of the Genetic Resource Investigating Diabetes (GRID)cohort collection ( http://www.childhood-diabetes.org.uk/grid.shtml) by our laboratory and others. Blood samples were collected in morning or afternoon hospital clinics, across more than 150 centres in the United Kingdom. Blood was collected into ACD vacutainers and PBMCs isolated separated using either Sigma Accuspin or Histopaque according to the manufacturer’s recommendations. PBMCs were cryopreserved in the presence of DMSO and stored at −80 °C until use. For RNA isolation, PBMC samples were thawed, washed with X-Vivo 15 (Lonza) and added to Trizol reagent. RNA was isolated and gene expression data were generated in the same way as the BABYDIET cohort, described in the following section.

All BABYDIET and T1D PBMC gene expression data are deposited with ArrayExpress (accession number: E-MTAB-1724).

Gene expression data for the multi-centre asthma cohort is publically available. The cohort was collected and processed as described by Bjornsdottir et al.34, and the raw and normalized data are deposited with ArrayExpress ( http://www.ebi.ac.uk/arrayexpress/, E-GEOD-19301). This gene expression data set was generated on Affymetrix HG-U133A GeneChip Array, which we found to have 22,283 probesets (which map to 10,457 unique ENSEMBL gene identifiers). The inclusion of patients in the initial collection of the study was dependent on participants being free from active infection, major intercurrent illness, allergen immunotherapy, pregnancy and lactation34.

The available processed data as well as the R ExpressionSet file were downloaded from ArrayExpress. Information regarding the disease phase of the samples, their country of origin and the date of bleeding was used in our analyses. Only asthma patients defined as being in a quiet disease phase were included in our analyses. Precise age at bleed for each donor was also not available in this data set, although individuals between 18 and 83 years of age were present in the cohort. The mean age of the asthma patients was 45.08 years34.

As information regarding sample gender in this data set was not available, we defined gender based on Y-expressed genes in PBMCs (DDX3Y, KDM5D, USP9Y, and RPS4Y1). The first principal component of the expression of the listed genes was calculated for each individual in the study. Patients with component values smaller than zero were classified as female and patients with component values greater than zero were classified as male.

The asthma PBMC data set was divided into four groups according to country of sample collection; United States, Australia, United Kingdom/Ireland and Iceland.

The subcutaneous adipose tissue gene expression data were collected by the MuTHER consortium53 and is publically available (Array Express E-TABM-1140). The adipose tissue data set includes 825 female twins, among them 80 singletons, 448 dizygotic and 297 monozygotic individuals. Gene expression data for 48,638 probesets (mapping to 24,332 unique Entrez genes) were downloaded.

Sample numbers included in our analyses of each cohort, and their monthly distributions, are shown in Supplementary Table 1.

Gene expression analysis in BABYDIET and T1D PBMCs

Only gene expression data for the BABYDIET and T1D PBMC cohorts were generated in our laboratory. In brief, single-stranded cDNA was synthesized from 200 ng total RNA using the Ambion whole-transcript expression kit (Ambion) according to the manufacturer’s recommendations. A total of 3.44 μg cDNA was fragmented and labelled using the GeneChip terminal labelling and hybridization kit and hybridized to 96-sample Titan Affymetrix Human Gene 1.1 ST arrays, which provide comprehensive whole-transcriptome coverage. After quality control, we measured the expression of 33,297 probesets, which map to 22,822 unique ENSEMBL gene identifiers.

BABYDIET, T1D and adipose gene expression data were summarized by exon-level probesets and normalized using variance stabilizing normalization: post quality control 454 BABYDIET35, 236 T1D and 825 adipose samples were used for analysing gene expression.

The gene expression data of the asthmatic patients were log2 transformed before any analysis.

Climatic data for modelling seasonal gene expression

Historical raw data for the mean daily temperature, as well as the total daily hours of sunlight in Munich (Germany), were obtained from the Integrated Climate Data Centre at the University of Hamburg ( http://icdc.zmaw.de/dwd_station.html?&L=1).

For the analysis of the T1D PBMC data that came from all around United Kingdom we downloaded the maximum and minimum temperature data from seven stations across United Kingdom (Armagh, Camborne, Eskdalemuir, Lerwick, Stornoway airport and Valley) from the National Climatic Data Centre, USA ( http://www.ncdc.noaa.gov/cdo-web/search) and averaged readings across all stations.

For the analysis of the asthma cohort (ArrayExpress: E-GEOD-19301), the daily maximum and minimum temperature for relevant cities/regions in the United Kingdom (Central England UK station at Birmingham), United States (New Jersey, Seattle, Atlanta, New Haven), Iceland (Reykjavik), Ireland (Dublin) and Australia (Melbourne, Perth, Adelaide) were obtained from the National Climatic Data Centre, USA and The Digital Technology Group. The average temperature values were computed and used in subsequent analyses.

Self-reported infections in BABYDIET cohort

At each visit, parents of BABYDIET children completed a detailed questionnaire on their children’s history of infections, fever and medication. Specifically, they were asked about fever, infectious symptoms (such as diarrhoea, vomiting, constipation and allergies) and the name of administered pharmaceutical agents or their active ingredient with starting date and duration of infections and medication. Infectious disease was defined as an acute event according to the ICD710 Code or by a symptom indicating an infectious genesis. Infectious events were assigned to a specific time interval by their date of onset, and infectious events that could be matched to microarray samples were included for analysis, as described35. Other disease events such as allergies or accidents were not considered as infectious diseases.

Soluble IL-6 receptor ELISA

Circulating sIL-6R concentrations were measured in BABYDIET and BABYDIAB serum samples using a highly sensitive non-isotopic time-resolved fluorescence ELISA assay based on the dissociation-enhanced lanthanide fluorescent immunoassay technology (DELFIA; PerkinElmer), as described45. Test samples were diluted 1:20 in PBS+10% FBS and measured in duplicate on 384-well MaxiSorp microtiter plates (Nunc), coated with 1 μg ml−1 monoclonal anti-human IL-6R antibody (clone 17506; RD Systems). Detection was performed using a biotinylated mouse anti-CD126 monoclonal antibody (clone M182, BD Biosciences) diluted to a final concentration of 100 ng ml−1 in PBS+10% FBS and a Europium-Streptavidin detection solution (PerkinElmer), diluted in PBS+0.05% tween, 1% BSA, 7 μg ml−1 DTPA to a final concentration of 0.05 μg ml−1. Quantification of test samples was obtained by fitting the readings to a human recombinant IL-6Rα (RD systems) serial dilution standard curve plated in quadruplicate on each plate. Data for 782 unique individuals existed from 722 families.

Cambridge BioResource full blood count data (UK cohort)

Full blood count data were obtained from the Cambridge BioResource. BioResource volunteers are subjected to a full blood count on the day of blood sample collection using Beckman Coulter LH700, Beckman Coulter DXH800 5 part diff analyser or a Sysmex 5 part diff analyser. The available months of bleed were from February to November (no FBC data was available for December) and took the numeric values 2 to 11, respectively. Responses measured included counts for basophils, eosinophils, lymphocytes, monocytes, neutrophils, platelets, erythrocytes and total white blood cells. HCT (haematocrit), HGB (haemoglobin concentration), MCH (mean corpuscular haemoglobin) and MCV (mean corpuscular volume) were also analysed.

Full blood count data from The Gambia

The Gambian cohort was collected as part of the Keneba Biobank ( http://www.ing.mrc.ac.uk/research_areas/the_keneba_biobank.aspx). All participants were recruited between 2012 and 2014 in the West Kiang district and within the catchment area of the MRC International Nutrition Group’s field station at MRC Keneba. Supplementary Figure 7 gives summary statistics for the cohort. Written informed consent was obtained from all participants and all procedures were approved by the joint Gambian Government/MRC Ethics Committee. FBCs were available from 4,200 healthy individuals (at the time of sample collection; 44.07% male) using a Medonic M-series analyser, which measures the numbers of white blood cells, lymphocytes, granulocytes, monocytes, platelets and RBCs. Furthermore, it also analyses the mean platelet volume, RBC haemoglobin concentration, the haematocrit, MCV and MCH.

C-reactive protein

The level of CRP in the peripheral circulation was measured in 3,412 donors (two samples per donor) collected as part of the ASCOT study68. Treatment with Atorvastatin did not remove the seasonal variation in this parameter. Age and sex were included as covariates, while a random intercept was added for the individual identifiers.

Statistical analysis of the data sets

Cosinor models with a period of 1 year were fitted to test the effect of season on gene expression. The general formula of the fitted model is given by:

where Y jik represents the log2 expression of gene j for individual i recorded at time t ik , with t ik computed as the calendar day of the date of bleed divided by the total number of days within the equivalent year.

The fixed covariates and random intercepts terms were data-set-specific. For the analysis of the BABYDIET and T1D data sets we added age at bleed and gender as fixed effects covariates, whereas only gender was added as a covariate in the analysis of the asthma PBMC microarray dataset (age was not available). The identity of each subject of the BABYDIET and of the asthma data sets were modelled as a random intercept in the corresponding models. For the adipose tissue data set we modelled age at bleed as a fixed covariate and added family identity and an indicator whether the twin was monozygotic or dyzogitic as random intercepts. Gender and age at bleed were treated as fixed effects covariates in the analysis of the soluble IL-6 receptor data, and family identity was included as a random intercept. As only the month of bleed was available in the Cambridge BioResource FBC data, we adjusted the cosinor model to depend on month instead of day; no other covariates were available and random intercepts were not required, as no individual was observed more than once. For the last two data sets, the response variable (Y) corresponds to IL-6R and to the tested FBC responses listed in the description of the data set. For analysis of CRP, age, sex and an age*sex interaction were included as fixed covariates, CRP was log transformed to remove right skew, and a random intercept was used to adjust for within individual repeated measures.

To examine whether the effect of season was significant we compared the fitted model in equation (1) with a model that did not include the effect of season. This alternative model is expressed by

The P-value for season was determined by comparing the two models for each gene using an analysis of variance test. Seasonal genes were classified as those with P values less than the data-set-specific Bonferroni correction threshold alpha=0.05. For the BABYDIET and T1D data sets, we defined as seasonal the genes with P values less than the corresponding Bonferroni correction P value and with mean log2 expression greater than or equal to, 6.

The relative estimated log2 expression of each seasonal gene for each data set was computed as

where and are the least squares estimates of b and c of the model in equation (1), respectively.

Furthermore, we tested whether temperature or sunlight hours could predict gene expression of the PBMC data sets. Temperature and sunlight were defined, respectively, as the average temperature and number of sunlight hours over the week preceding the date of bleed for each individual. For example the temperature model is given by

The three alternative models for the seasonal cosinor function, sunlight and temperature, each including only one of these predictors were fitted to log2 expression level for seasonal genes, as identified in each data set.

Definition of winter and summer seasonal genes in BABYDIET

Seasonal genes were classified as winter genes if the relative estimated log2 expression values of the genes were positive for all days of January, February and December and negative for all days of June, July and August. In contrast, summer seasonal genes were defined as those with positive relative estimated log2 expression for all days of June, July and August and negative for all days of January, February and December. The fold change for each summer and winter gene was computed as two raised to the power of the absolute difference of the estimated log2 expression between 15 January and 15 July (days 15 and 196 of a 365-day calendar year).

Network and functional analysis of the seasonal genes identified in BABYDIET

A weighted co-expression gene network of the seasonal genes identified in BABYDIET was constructed using the R package WGCNA69. For the construction of the network, individuals who sero-converted to T1D autoantibodies at any stage during the BABYDIET study were not included. A scale-free topology network was created based on the seasonal genes, where the correlation of their log2 gene expression was used as a measure of co-expression. Modules of highly correlated genes were detected through hierarchical clustering. Some genes were not correlated with other seasonal genes. The biological function of each module was examined through an over-representation pathway analysis carried out using the WEB-based GEne SeT AnaLysis Toolkit (WebGestalt, http://bioinfo.vanderbilt.edu/webgestalt/)70. The gene members of each module were uploaded to WebGestalt and tested for over-representation within KEGG pathways. Pathways with less than three genes within our gene lists were excluded. The Hypergeometric test was applied, and the P values of the test were corrected for multiple testing using the Benjamini–Hochberg method.

Analysis of the self-reported infections data of BABYDIET cohort

The BABYDIET samples were divided into two categories, one that included all samples with no self-reported infections (57 samples) and one with all the samples with at least one reported infection (152 samples). A principal component analysis (PCA) was performed, and the first principal component from the analysis was used to summarize the gene expression of BABYDIET seasonal genes. Similarly, the gene expression of the genes within the black module (detected using network analysis of the seasonal genes identified in BABYDIET) was also summarized as the first component of a second PCA. The effect of infection on either of the two components was tested using analysis of variance. The black module was chosen as it contained genes associated with the response to Staphylococcus infection.

Identification of common seasonal genes

We wanted to explore whether any of the seasonal genes identified in the PBMC cohorts were shared between the five data sets (excluding Iceland). We compared the Bayesian information criterion (BIC) of the cosinor model (1) with the BIC of the model excluding the seasonality effect (2) for each of the genes from the two in-house data sets (BABYDIET and T1D) that had a P value <0.05/33297 in at least one of the two data sets. The common seasonal genes of the two in-house datasets were defined as genes whose BIC was smaller for (1) than (2) within each data set. We repeated the aforementioned steps to identify common seasonal genes in the asthma cohort. The intersection of the two lists from the five data sets were defined as common seasonal genes.

We further computed a combined P value for the association of each common seasonal gene by combining the P values of the five data sets using Fisher’s product P value method.

Common seasonal genes between the adipose tissue data set and the BABYDIET data set were defined as the genes that were found seasonal for both data sets.

Seasonal analysis of The Gambian full blood count data

Given the different seasonal climates present in West Africa compared to Europe, FBC parameters from The Gambia cohort were assessed through linear models that included sex, age (modelled through splines) and with seasonality modelled using three Fourier terms using STATA12.1. The significance of season was assessed using an F-test.

Note added in proof

Adaptive oscillations at balanced polymorphisms in Drosophila in response to acute and persistent changes in climate were reported while this work was under consideration (Bergland, A.O., Behrman, E.L., O'Brien, K.R., Schmidt P.S. & Petrov D.A. Plos Genet. 10(11):e1004775 (2014)). Furthermore, seasonally-variable associations of three genes involved in glucose metabolism and circadian clock regulation, CRY1 (cryoptochrome 1), CRY2 (cryoptochrome 1) and MTNR1B (melatonin receptor 1B) have recently been reported in humans. (Renström, F., Koivula, R.W., Varga, T.V., Hallmans, G., Mulder, H., Florez, J.C., Hu, F.B. & Franks, P.W. Diabetologia 10.1007/s00125-015-3533-8 (2015)).