Ethics and consent

The UK Biobank was granted ethical approval by the North West Multi-centre Research Ethics Committee (MREC) to collect and distribute data and samples from the participants (http://www.ukbiobank.ac.uk/ethics/) and covers the work in this study, which was performed under UK Biobank application number 9072. All participants included in these analyses gave informed consent to participate. UK Biobank consent procedures are detailed at http://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=200. All 23andMe participants were customers of the personal genetics company 23andMe, Inc. and were genotyped for the 23andMe Personal Genome Service. The 23andMe participants included in our analyses provided informed consent for their data to be used for research purposes and responded to online questionnaires according to 23andMe’s human subject protocol, which was reviewed and approved by Ethical and Independent Review Services, a private institutional review board (http://www.eandireview.com). Details of 23andMe’s consent process can be found at https://www.23andme.com/en-gb/about/consent/.

Cohorts

The UK Biobank is a health resource with phenotypic and genetic data on over 500,000 volunteer participants who were aged between 40 and 69. Participants were recruited from the general UK population and baseline data were collected from 2006 to 2010 across 22 centres in England, Scotland and Wales, with recording of detailed anthropometric measures as well as self-report health and sociodemographic variables. The cohort is described in full elsewhere27. We used data on 451,454 individuals from the full UK Biobank data release that we identified as White European and that had genetic data available. To define a set of White Europeans, we performed principal components analysis in the 1000 Genomes (1KG) reference panel using a subset of variants that were of a high quality in the UK Biobank. We projected these principal components into the set of related UK Biobank participants to avoid the relatedness confounding the principal components. We then adopted a k-means clustering approach to define a European cluster, initialising the ethnic centres defined by the population-specific means of the first four 1KG principal components. This analysis was performed only within individuals self-reporting as British, Irish, White or Any other white background. Because association analyses are performed using the LMM method, we included related individuals.

We used summary statistics from a morning chronotype GWAS performed by 23andMe of 248,100 participants (120,478 cases, 127,622 controls) with a minimum of 97% European-ancestry. GWAS analysis was performed in a maximal set of unrelated participants, where pairs of individuals were considered related if they shared 700 cM IBD of genomic segments, roughly corresponding to first cousins in an outbred population. The 23andMe cohort is described in more detail elsewhere24.

Activity monitor data

A subset of the UK Biobank cohort was invited to wear a wrist-worn activity monitor for a period of a week. Individuals were mailed the device and asked to wear it continuously for seven days, including while bathing, showering and sleeping. In total, 103,720 participants returned their activity monitor devices with data covering at least three complete 24-hour periods. We downloaded the raw activity monitor data (data-field 90001) for these individuals, in the form of binary Continuous Wave Accelerometer (cwa) files. Further information, along with details of centrally derived variables, is available elsewhere66. Detailed protocol information can be found online at http://biobank.ctsu.ox.ac.uk/crystal/docs/PhysicalActivityMonitor.pdf and a sample instruction letter at http://biobank.ctsu.ox.ac.uk/crystal/images/activity_invite.png (UKB Resources 131600 and 141141, respectively; both accessed January 30th 2018). We converted the .cwa files to .wav format using the open-source software omconvert, recommended by the activity monitor manufacturers Axivity, which is available online (see https://github.com/digitalinteraction/openmovement/tree/master/Software/AX3/omconvert). To process the raw accelerometer data in.wav format, we used the freely available R package GGIR (v1.5-12)67,68. The list of our GGIR settings is provided in Supplementary Data 14 and the full list of variables produced by GGIR can be found in the CRAN GGIR reference manual (see https://cran.r-project.org/web/packages/GGIR/GGIR.pdf).

Genotyping and quality control

The 23andMe cohort was genotyped on one of four custom arrays: the first two were variants of the Illumina HumanHap550 + BeadChip (4966 cases and 5564 controls), the third a variant of the Illumina OmniExpress + BeadChip (53,747 cases and 61,637 controls) and the fourth a fully custom array (61,765 cases and 60,421 controls). Successive arrays contained substantial overlap with previous chips. These genotypes were imputed to ~15.6 million variants using the September 2013 release of the 1000 Genomes phase 1 reference panel. For analyses, we used ~11.9 million imputed variants with imputation r2 ≥ 0.3, MAF ≥ 0.001 (0.1%) and that showed no sign of batch effects.

The UK Biobank cohort was genotyped on two almost identical arrays. The first ~50,000 samples were genotyped on the UK BiLEVE array and the remaining ~450,000 samples were genotyped on the UK Biobank Axiom array in two groups (interim and full release). A total of 805,426 directly genotyped variants were made available in the full release. These variants were centrally imputed to ~93 M autosomal variants using two reference panels: a combined UK10K and 1000 Genomes panel and the Haplotype Reference Consortium (HRC) panel. For all analyses, we used ~12.0 M HRC imputed variants with an imputation r2 ≥ 0.3, MAF ≥ 0.001 (0.1%) and with a Hardy–Weinberg equilibrium (HWE) P > 1 × 10−12 (chi-squared; 1 degree of freedom). We excluded non-HRC imputed variants on advice from the UK Biobank imputation team. Further details on the UK Biobank genotyping, quality control and imputation procedures can be found elsewhere28.

Self-report phenotypes

Responses to two identical questions (“Are you naturally a night person or a morning person?”) were used to define the dichotomous morning person phenotype in the 23andMe cohort, with one question having a wider selection of neutral options. For the first instance, the possible answers were “Night owl”, “Early bird” and “Neither”, and for the second “Night person”, “Morning person”, “Neither”, “It depends” and “I’m not sure”. Individuals with discordant or neutral responses to both were excluded. For those with one neutral and one non-neutral response, their non-neutral response was used to define their phenotype. Morning people were coded as 1 (cases; N = 120,478) and evening people were coded as 0 (controls; N = 127,622).

The UK Biobank collected a single self-reported measure of Chronotype (“Morning/evening person (chronotype)”; data-field 1180). Participants were prompted to answer the question “Do you consider yourself to be?” with one of six possible answers: “Definitely a ‘morning’ person”, “More a ‘morning’ than ‘evening’ person”, “More an ‘evening’ than a ‘morning’ person”, “Definitely an ‘evening’ person”, “Do not know” or “Prefer not to answer”, which we coded as 2, 1, −1, −2, 0 and missing, respectively (distribution summarised in Table 1). Of the 451,454 white European participants with genetic data, 449,734 were included in the GWAS (had non-missing phenotype and covariates).

In order to provide interpretable ORs for our genome-wide significant variants, we also defined a binary phenotype using the same data-field as for Chronotype. Participants answering “Definitely an ‘evening’ person” and “More an ‘evening’ than a ‘morning’ person” were coded as 0 (controls) and those answering “Definitely a ‘morning’ person” and “More a ‘morning’ than ‘evening’ person” were coded as 1 (cases). Participants answering “Do not know” or “Prefer not to answer” were coded as missing. A total of 403,195 participants were included in the GWAS (252,287 cases and 150,908 controls).

Activity monitor phenotypes

The software package GGIR68,69 produces quantitative and timing measures relating to both activity levels and sleep patterns, with a day-by-day breakdown, as well averages across the period of wear. A new algorithm, implemented in version 1.5–12 of the GGIR R package and validated using PSG in an external cohort70, allows for detection of sleep periods without the use of a sleep diary and with minimal bias. Briefly, for each individual, median values of the absolute change in z-angle (representing the dorsal–ventral direction when the wrist is in the anatomical position) across 5-min rolling windows were calculated across a 24-h period, chosen to make the algorithm insensitive to activity monitor orientation. The 10th percentile was incorporated into the threshold to distinguish movement from non-movement. Bouts of inactivity lasting ≥30 min are recorded as inactivity bouts. Inactivity bouts that were <60 min apart were combined to form inactivity blocks. The start and end of longest block defined the start and end of the sleep period time-window (SPT-window).

The UK Biobank made multiple activity monitor data-quality variables available. From our activity monitor phenotypes, we excluded 4925 samples with a non-zero or missing value in data-field 90002 (“Data problem indicator”). We then excluded any individuals with the “good wear time” flag (field 90015) set to 0 (No), “good calibration” flag (field 90016) set to 0 (No), “calibrated on own data” flag (field 90017) set to 0 (No), “data recording errors” (field 90182) > 788 (Q 3 + 1.5 × IQR) or a non-zero count of “interrupted recording periods” (field 90180). Phenotypes determined using the SPT-window (all phenotypes except L5 and M10 timing) had additional exclusions based on short (<3 h) and long (>12 h) mean sleep duration and too low (<5) or too high (>30) mean number of sleep episodes per night (see below). These additional exclusions were to ensure that individuals with extreme (outlying), and likely incorrect, sleep characteristics were not included in any subsequent analyses.

Sleep midpoint was calculated as the time directly between the start and end of the SPT-window and is defined as the number of hours elapsed since midnight at the start of the calendar day on which the STP-window started (e.g., 02:30 = 26.5; 23:45 = 23.75) with a cut-off at midday (12:00 and 36:00). This accounted for participants whose sleep midpoint occurs before midnight. Our sleep midpoint phenotype represented the average of each participant over all their valid SPT-windows. After exclusions and adjustments, 84,810 participants had valid sleep midpoint, covariates and genetic data.

L5 and M10 refer to the least-active five and the most-active 10 h of each day and are commonly studied measures relating to circadian activity and sleep. L5 (M10) defines a 5-h (10-h) daily period of minimum (maximum) activity, as calculated by means of a moving average with a 5-h (10-h) window. As with sleep midpoint, we defined our L5 (M10) timing phenotype as the number of hours elapsed from the previous midnight to the L5 (M10) midpoint, averaged over all valid wear days. Of the 103,711 participants with activity monitor data, there were 85,205 and 85,670 with valid L5 and M10 timing measures respectively, covariates and genetic data. Basic summaries of these and other raw activity monitor phenotypes are given in Supplementary Table 1.

Sleep episodes within the SPT-window were defined as periods of at least 5 min with no change larger than 5° associated with the z-axis of the activity monitor68. The summed duration of all sleep episodes provided the sleep duration for a given SPT-window. We took both the mean and standard deviation of sleep duration across all valid SPT-windows to provide a measure of average sleep quantity and a measure of variability. After exclusions and adjustments, we had 85,449 (84,441) participants with valid sleep duration mean (SD), covariates and genetic data.

Sleep efficiency was calculated as a ratio of sleep duration (defined above) to SPT-window duration. The phenotype represented the mean across all valid SPT-windows and after exclusions and adjustments, left us with 84,810 participants with valid sleep efficiency, covariates and genetic data.

The number of sleep episodes was defined as the number of sleep episodes of at least 5 min separated by at least 5 s of wakefulness within the SPT-window. The phenotype represented the mean across all SPT-windows and, once adjusted for the mean length of time in bed, can be interpreted as a measure of sleep disturbance or fragmentation. After exclusions and adjustments, we had 84,810 participants with a valid number of sleep episodes, covariates and genetic data.

Diurnal inactivity was defined as the total daily duration of estimated bouts of inactivity that fall outside of the SPT-window. This comprised the total length of periods of sustained inactivity (>5 min) and captured sleep (naps), but did not include other inactivity such as sitting and reading or watching television, which involve a low but detectable level of movement. This variable likely captured some non-sleep rest as it was not possible to separate these without detailed activity diaries. The phenotype was calculated as the mean across all valid days and, after exclusions and adjustments, we were left with 84,757 participants with a valid measure, covariates and genetic data.

Genome-wide association analysis

We performed all association test using BOLT-LMM71 v2.3, which applies a linear mixed model (LMM) to adjust for the effects of population structure and individual relatedness, and allowed us to include all related individuals in our white European subset, boosting our power to detect associations. This meant a sample size of up to 449,734 individuals, as opposed to the set of 379,768 unrelated individuals. BOLT-LMM approximates relatedness within a cohort by using LD blocks and avoids the requirement of building a genetic-relationship matrix, with which calculations are intractable in cohorts of this size. From the ~805,000 directly genotyped (non-imputed) variants available, we identified 524,307 high-quality variants (bi-allelic SNPs; MAF ≥ 1%; HWE P > 1 × 10−6; non-missing in all genotype batches, total missingness < 1.5% and not in a region of long-range LD72) that BOLT-LMM used to build its relatedness model. For LD structure information, we used the default 1000 Genomes LD-Score table provided with the software. We forced BOLT-LMM to apply a non-infinitesimal model, which provides better effect size estimates for variants with moderate to large effect sizes, in exchange for increased computing time. At runtime, the chronotype and morning person phenotypes were adjusted for age (field 21003), sex (field 31), study centre (field 54; categorical) and a derived variable representing genotyping release (categorical; UKBiLEVE array, UKB Axiom array interim release and UKB Axiom array full release). Accelerometer-based phenotypes were adjusted at runtime for age activity monitor worn (derived from month and year of birth and date activity monitor worn), sex, season activity monitor worn (categorical; winter, spring, summer or autumn; derived from date activity monitor worn) and number of valid measurements (SPT-windows for sleep phenotypes, number of valid days for diurnal inactivity or number of L5 or M10 detections for L5 or M10 timing). The GWA analysis for the number of sleep episodes phenotype was also adjusted for the mean length of SPT-window (across all included SPT-windows) to account for the fact that individuals have a greater number of sleep episodes the longer they spend in bed.

In the 23andMe morning person GWAS, the summary statistics were generated through logistic regression (using an additive model) of the phenotype against the genotype, adjusting for age, sex, the first four principal components and a categorical variable representing genotyping platform. Genotyping batches in which particular variants failed to meet minimum quality control were not included in association testing for those variants, resulting in a range of sample sizes over the whole set of results. A λ GC of 1.325 was reported for this GWAS. Lead variants for the 23andMe only morning person GWAS are provided in Supplementary Data 15.

Sensitivity analysis

To avoid issues with stratification, we performed a sensitivity GWAS, in UK Biobank alone, to assess whether any of the associations were driven by a subset of the cohort with specific conditions. We excluded those reporting shift or night shift work at baseline, those taking medication for sleep or psychiatric disorders and those with either with a HES ICD10 or self-reported diagnosis of depression, schizophrenia, bipolar disorder, anxiety disorders or mood disorder (see Supplementary Methods for further details). Results for the 341 lead chronotype variants available in the UK Biobank are provided in Supplementary Data 1 alongside the main meta-analysis results.

Meta-analysis of GWAS results

Meta-analysis was performed using the software package METAL73. To obtain the largest possible sample size, and thus maximising statistical power, we performed a sample-size meta-analysis, using the results from the UK Biobank chronotype GWAS and the 23andMe morning person GWAS. Genomic control was not performed on each set of summary statistics prior to meta-analysis but instead the meta-analysis chi-squared statistics were corrected using the LD-score intercept (I LDSC = 1.0829), calculated by the software LDSC, as using λ GC is considered overly conservative and the LD-score intercept better captures inflation due to population stratification74. For interpretable results, we reported the OR from a secondary effect size meta-analysis between our dichotomous UK Biobank morning person GWAS and the 23andMe morning person GWAS. The primary chronotype sample-size meta-analysis produced results for 15,880,941 variants in up to 697,828 individuals, with the secondary effect-size morning person meta-analysis producing results for 15,880,664 variants in up to 651,295 individuals (372,765 cases and 278,530 controls). All reported meta-analysis P values were calculated by METAL using a Z-test.

Post-GWAS analyses

We used MAGENTA37, DEPICT38, PASCAL36 and MAGMA34 to perform pathway and tissue enrichment. For MAGENTA and DEPICT, we included all variants from the meta-analysis, whereas for PASCAL, we included only those with an RSID as the software assigns variants to genes using their RSID. For the MAGENTA analysis, we used upstream and downstream limits of 110Kb and 40Kb to assign variants to genes by position, we excluded the HLA region from the analysis and set the number of permutations for gene-set enrichment analysis to 10,000. For DEPICT, we used the default settings and the annotation and mapping files provided with the software. As each of the four pieces of software adopts a different gene prioritisation method or relies on different databases, we included results from all four to cover all bases and to allow for better comparison with other studies, where only a single method may have been used. Briefly PASCAL corrects for the effect of LD blocks by accounting for the LD structure between associated variants, MAGENTA uses distance-based mapping but allows the user to set the upstream and downstream distances for inclusion, DEPICT makes use of large-scale data on gene co-regulation to prioritise genes before calculating enrichment in its own reconstituted gene sets and MAGMA, the most recent method (and implemented in the FUMA GWAS35 platform), claims greater statistical power to detect enriched gene sets than methods such as MAGENTA and PASCAL, without affecting the type 1 error rate. By using multiple methods and looking for consistency, we provide more compelling evidence of enrichment in specific pathways and tissues.

We used the LD-score regression (LDSC) software, available at https://github.com/bulik/ldsc/, to quantify the genetic overlap between the trait of interest and 222 traits with publicly available GWA data. Briefly, to estimate heritability of a single phenotype, LDSC regresses chi-squared statistics from summary statistics against pre-computed LD Scores (a measure of how well each variant tags nearby variants) for all variants of the phenotype. The genetic correlation (r g ) between two phenotypes is, similarly, calculated by regressing each variant’s product of Z-scores from the two phenotypes against the LD scores; the slope of the regression line is the estimate of r g . The P values reported in this manuscript were calculated using a Z-test of calculated r g against the null hypothesis of r g = 0. Further methodological details are given elsewhere74. We used an LD-Score panel calculated in European samples from 1000 Genomes phase 3 v5 and removed variants that were not present in this reference panel. We considered any correlation as statistically significant if it had a Bonferroni-corrected P < 0.05.

Fine-mapping analyses were performed using FINEMAP v1.141 using a shotgun stochastic approach, allowing up to 20 causal SNPs at each locus and by focussing on a 1 Mb (±500 Kb) region around each index variant. As FINEMAP assumes a fixed sample size for all variants, we excluded variants not present in both the UK Biobank and 23andMe data, and to make the LD calculations more tractable we excluded variants with GWA analysis P > 0.01 to limit the total number of variants at each locus. We constructed an LD matrix for each locus by calculating the Pearson correlation coefficient for all pairs of variants using dosages derived from the unrelated European-ancestry subset of the UK Biobank imputed genotype probabilities (N = 379,769). A variant was considered to be causal if its log 10 Bayes factor was 2 or larger, a limit recommended by the FINEMAP documentation (http://www.christianbenner.com/index_v1.1.html).

We annotated variants identified by FINEMAP as likely to be causal using Alamut Batch v1.8 (Interactive Biosoftware, Rouen, France) with genome assembly GRCh37 and all options set to default. We retained only the canonical (longest) transcript for each variant and reported the variant location and coding effect (if applicable) in this transcript. To identify whether variants were cis-eQTLs for nearby genes, we performed a lookup of our variants in the GTEx single-tissue cis-eQTL dataset (v7), accessed at the GTEx portal (https://www.gtexportal.org/home/datasets) on 13/07/18, for significant associations. A variant was reported as an eQTL for a gene if the variant-gene association was significant (q value ≤ 0.05) for one or more brain or non-brain tissues.

With the aim of highlighting genes that have a role in regulating the internal circadian clock, we cross-referenced the genes identified by eQTL mapping, in addition to the two nearest genes (within 1 Mb), with catalogues from three gene expression studies. Firstly, we used data from an RNAi screen of circadian clock modifiers49, in which a genome-wide scan was performed on the effects of single-gene knockouts on the amplitude and period of the circadian expression. Secondly, we used data from a study of gene expression in SCN tissue over a 24-h light/dark cycle47 to identify whether our genes exhibit fluctuating expression in SCN tissue and whether the genes show enriched expression in the SCN compared to other tissues. Finally, we used data from a meta-analysis of gene expression in the SCN48 to investigate whether the genes were preferentially expressed in the SCN when compared to other brain tissues.

MR analyses

We undertook MR analyses to explore both the effect of chronotype on different outcomes and the effect of different exposures on chronotype as an outcome. These two-sample MR analyses can be summarised by:

1. Chronotype exposure using the 351 variants and effect sizes discovered in this meta-analysis against the five significant psychiatric outcomes from the genetic correlation analyses and three metabolic outcomes, using summary data from published GWAS (Supplementary Data 12). 2. Two of the five significant psychiatric exposures from the genetic correlation analyses and four metabolic exposures, all using variants from published GWAS, against chronotype as an outcome, using summary data from this meta-analysis (Supplementary Data 13).

In both analyses, we tested four MR methods:

a. Inverse-variance weighting (IVW)75 b. MR-Egger75 c. Weighted median (WM)76 d. Penalised weighted median (PWM)76

Analysis 1 (chronotype exposure) was performed using the R package TwoSampleMR using aggregated summary statistics available through the MR-Base platform56. We implemented the four MR methods listed above and also included the MR-Egger bootstrap to provide better estimates of the effect sizes and standard errors as compared to the MR-Egger method. We used data from published GWAS to test the effect of chronotype on the following exposures: schizophrenia77, major depressive disorder78, depressive symptoms79, subjective well-being79, PGC cross-disorder traits80, fasting insulin81, BMI55,82 and T2D83,84. To provide meaningful effect sizes for MR analyses, we used betas from the secondary effect size meta-analysis of the dichotomous UK Biobank and 23andMe morning person GWAS.

For analysis 2 (chronotype outcome) we applied the four MR methods listed above, utilising a custom pipeline. Using data from published GWAS, we tested whether chronotype is influenced by the following exposures: schizophrenia77, major depressive disorder78,85, insulin secretion86, favourable adiposity87, BMI55 and T2D88. As with analysis 1, chronotype effect sizes represented morningness liability and were taken from the secondary morning person meta-analysis, with the exception of the major depressive disorder exposure from a 23andMe study85 for which outcome effect sizes were taken from the UK Biobank-only chronotype GWAS.

We used the inverse-variance weighted approach as our main analysis method and MR-Egger, weighted median estimation and penalised weighted median estimation as sensitivity analyses in the event of unidentified pleiotropy of our genetic instruments. MR results may be biased by horizontal pleiotropy, i.e., where the genetic variants that are robustly related to the exposure of interest (here chronotype) independently influence the outcome, through association with another risk factor for the outcome. IVW assumes that there is either no horizontal pleiotropy (under a fixed effect model) or, if implemented under a random effects model after detecting heterogeneity amongst the causal estimates, that:

I. The strength of association of the genetic instruments with the risk factor is not correlated with the magnitude of the pleiotropic effects. II. The pleiotropic effects have an average value of zero.