Population registries

We established a population-based cohort of Danish women by linking data from the Civil Registration System (CRS) with data from the Medical Birth Registry and the Danish Cancer Registry. The CRS contains detailed demographic information on all Danish residents, including linkage of women to their children’s dates of birth. Since April 1, 1968, all Danish residents who were alive on that date or born thereafter have been assigned a unique identification number in the CRS. This number permits information from different national registries to be linked together. All live births and stillbirths in Denmark, with dates of birth, have been registered in the Medical Birth Registry since 1973. Since 1978, gestational week at time of birth has been recorded. For sensitivity analyses, we obtained information on induced abortions in Denmark from the National Registry of Induced Abortions, to which reporting of induced abortions has been mandatory since 1939.

Information on breast cancer diagnoses was retrieved from the Danish Cancer Registry, which contains information on all cancers diagnosed in Denmark since 1943 and is considered close to complete35. From Statistics Denmark we acquired time-varying, individual-level socioeconomic data to address covariates potentially associated with reproduction and breast cancer36: educational attainment (since 1970), employment status (since 1976), and disposable household income (since 1990).

In Norway, we linked data from the National Registry, the Medical Birth Registry of Norway, and the Cancer Registry of Norway. The Medical Birth Registry has registered all births (including gestational week of delivery) since 1967 (ref. 37), and the Cancer Registry is considered accurate and close to complete with regard to cancer diagnoses from 1953 (ref. 38).

The research project was approved by institutional review for inclusion on Statens Serum Institut's permit for research projects, issued by the Danish Data Protection Agency (permit No. 2015-57-0102), and was approved by the Regional Ethics Committee of Western Norway (permit 252.06).

Subjects

We established a cohort of all Danish women born between January 1, 1935 and December 31, 2002. Using the CRS number, we linked information on each woman’s childbirths with the corresponding pregnancy duration (gestational week of delivery), and information on whether she developed invasive breast cancer. We furthermore established a cohort of all Norwegian women born between January 1, 1935 and December 31, 1994, with equivalent information on reproductive history and breast cancer.

Statistical analyses

Incidence rate ratios (in the following termed RR) of breast cancer by pregnancy history were estimated by log-linear Poisson regression in the Danish cohort, the Norwegian cohort, and the combined cohort. In Denmark, each woman was followed from January 1, 1978, or from her 12th birthday, whichever came later, until breast cancer, death, emigration or December 31, 2014, whichever came first. In Norway, each woman was followed from January 1, 1967, or from her 12th birthday, whichever came later, until breast cancer, death, emigration or December 31, 2006, whichever came first. All analyses were adjusted for effects of current age and time period in 5-year categories.
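The follow-up rule above can be illustrated with a minimal Python sketch. It only derives each woman's entry and exit dates; it does not reproduce the Poisson regression itself. The function names, the `events` argument, and the handling of 29 February birthdays are assumptions made for illustration and are not taken from the original analysis code.

```python
from datetime import date

# Country-specific study windows as stated in the text.
STUDY_WINDOW = {
    "Denmark": (date(1978, 1, 1), date(2014, 12, 31)),
    "Norway": (date(1967, 1, 1), date(2006, 12, 31)),
}


def twelfth_birthday(birth):
    """Date of the 12th birthday (assumption: 29 Feb falls back to 1 Mar)."""
    try:
        return birth.replace(year=birth.year + 12)
    except ValueError:
        return date(birth.year + 12, 3, 1)


def follow_up(country, birth, events=()):
    """Return (entry, exit) dates, or None if there is no time at risk.

    `events` is a hypothetical collection holding the dates of breast
    cancer diagnosis, death, and emigration, where they occurred.
    """
    start, end = STUDY_WINDOW[country]
    entry = max(start, twelfth_birthday(birth))  # whichever came later
    exit_ = min([end, *events])                  # whichever came first
    return (entry, exit_) if entry < exit_ else None
```

For example, a Danish woman born on May 17, 1970 and diagnosed on March 1, 2010 would contribute time at risk from her 12th birthday in 1982 until the diagnosis date.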

Pregnancy history was modeled by time-dependent variables as described previously4. Thus, instead of describing history by the total number of childbirths (i.e., RR of cancer in women with 1, 2, 3, or 4 births compared with women with 0 births), pregnancy history was evaluated by the RR for women with n births compared with women with n−1 births (i.e., RR of cancer for 1 birth compared with 0, 2 births compared with 1, and 3 births compared with 2). This reparameterization allows for a focus on the effect of each additional birth on cancer risk. The RRs were assumed to be the same regardless of birth number, and the presented RRs are therefore RRs for each additional birth. To allow for different short-term and long-term effects of pregnancy, RRs were allowed to vary according to time since birth (<10 years, ≥10 years) for parous women. In the presentation of the model we focused on the parameters related to the long-term effect of pregnancy. We furthermore allowed RRs to differ for childbirths at younger (<30 years) and older (≥30 years) maternal age, to focus on early-age pregnancies, which have previously been associated with a long-term reduced risk of breast cancer3,4. The previously used method (4) was extended to include pregnancy duration. In the previous approach the effect of each birth was stratified according to time since birth and age at childbirth; in the extended approach it was further stratified by pregnancy duration. Thus, RRs were allowed to vary by duration of the pregnancy in weeks, using the following categories: 20–27, 28–29, 30, 31, …, 41, 42–45 weeks, missing duration of pregnancy, duration of pregnancy not reported, extremely early births (<20 weeks), and extremely late births (>45 weeks). The last four categories are further described in Table 1, Supplementary Table 1, and Supplementary Table 2.
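As an illustration of the parameterization described above, the following Python sketch shows two pieces of bookkeeping: how per-birth RRs multiply into an RR relative to nulliparous women in a log-linear model, and how a gestational week maps to the duration categories listed in the text. The function names and category labels are hypothetical. The sketch collapses "missing" and "not reported" into one label, because the text defers that distinction to Table 1.

```python
import math

# The 15 regular week categories listed in the text.
REGULAR_CATEGORIES = ["20-27", "28-29"] + [str(w) for w in range(30, 42)] + ["42-45"]


def rr_versus_nulliparous(per_birth_rrs):
    """RR compared with 0 births, as the product of the RR for each birth.

    In a log-linear model the per-birth effects add on the log scale,
    so they multiply on the RR scale.
    """
    return math.prod(per_birth_rrs)


def duration_category(week):
    """Map a gestational week (int, or None if unknown) to a category label."""
    if week is None:
        return "missing"  # assumption: also covers "not reported"
    if week < 20:
        return "<20"
    if week > 45:
        return ">45"
    if week <= 27:
        return "20-27"
    if week <= 29:
        return "28-29"
    if week <= 41:
        return str(week)  # single-week categories 30, 31, ..., 41
    return "42-45"
```

With a purely illustrative per-birth RR of 0.9, three identical births would correspond to an RR of 0.9 × 0.9 × 0.9 = 0.729 compared with nulliparous women.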

In the analysis of pregnancy duration, all parameters described above were included simultaneously. For example, for a biparous woman whose first birth occurred at an early age in week 38 and whose second birth occurred at a late age in week 40, the pregnancy history was modeled by four parameters: the short-term and long-term effects of an early-age birth at week 38, and the short-term and long-term effects of a late-age birth at week 40. Thus, when estimating the long-term effect of the early-age pregnancy at week 38, the model also included the short-term effect of an early-age pregnancy at week 38, the short-term effect of a late-age pregnancy at week 40, and the long-term effect of a late-age pregnancy at week 40.
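The biparous example can be made concrete with a small Python sketch that lists which parameter each birth contributes at a given point in follow-up. Over her whole follow-up the woman's history involves four parameters, but at any one time each birth contributes only one of them (short-term or long-term). The function names and the tuple layout are hypothetical, and ages are simplified to years.

```python
def birth_parameter(age_at_birth, week, years_since_birth):
    """Label of the parameter one birth contributes at a given time."""
    age_group = "early" if age_at_birth < 30 else "late"     # <30 vs >=30 years
    term = "short" if years_since_birth < 10 else "long"     # <10 vs >=10 years
    return (age_group, week, term)


def active_parameters(births, current_age):
    """Parameters in effect at `current_age`.

    `births` is a list of (maternal_age_at_birth, gestational_week);
    births that have not yet happened contribute nothing.
    """
    return [
        birth_parameter(age, week, current_age - age)
        for age, week in births
        if age <= current_age
    ]


# Example from the text: first birth at an early age (here 25) in week 38,
# second birth at a late age (here 32) in week 40. The ages are illustrative.
history = [(25, 38), (32, 40)]
at_36 = active_parameters(history, 36)
at_45 = active_parameters(history, 45)
```

At age 36 the first birth contributes its long-term parameter and the second its short-term parameter; by age 45 both births contribute long-term parameters.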

The analysis of pregnancy duration was based on follow-up time from January 1, 1978 in Denmark, and from January 1, 1967 in Norway, when the respective Medical Birth Registries began recording gestational week of birth. Childbirths registered in the civil registration systems were incorporated in the analyses to adjust for the effects of pregnancies before the start of the Medical Birth Registries.

In the analysis of the effect of age at childbirth on breast cancer risk, RRs were allowed to vary according to age at delivery in the categories <20, 20–21, 22–23, 24–25, 26–27, 28–29 and ≥30 years. In analyses of the effect of adjusting for socioeconomic status, each socioeconomic variable was added as an additional variable.
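The age-at-delivery grouping can be sketched in Python as a simple lookup against the category boundaries given above. The names `AGE_CUTS`, `AGE_LABELS`, and `age_category` are hypothetical and used only for illustration.

```python
from bisect import bisect_right

# Lower bounds (in years) of every category after the first.
AGE_CUTS = [20, 22, 24, 26, 28, 30]
AGE_LABELS = ["<20", "20-21", "22-23", "24-25", "26-27", "28-29", ">=30"]


def age_category(age_years):
    """Return the age-at-delivery category for an age in years."""
    # bisect_right counts how many lower bounds are <= age_years.
    return AGE_LABELS[bisect_right(AGE_CUTS, age_years)]
```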

All analyses were performed using SAS version 9.4 and procedure GENMOD.

Socioeconomic factors and risk of breast cancer

Using nationwide registries from Statistics Denmark on educational attainment, employment, and disposable household income, starting from 1970, 1976, and 1990, respectively, we were able to create a time-varying, three-factor adjustment for socioeconomic status. The following categories of the three socioeconomic variables were used:

Educational attainment: primary schooling; high school; high school with technical or mercantile focus; short basic education; higher education of short duration; higher education of intermediate duration; academic bachelor degree; academic master’s degree; and academic doctoral degree or equivalent educational degree.

Employment status: business owner, ten or more employees; business owner, five to nine employees; business owner, one to four employees; business owner, no employees; business owner, unknown number of employees; co-working spouse; executive officer in business, organization or public office; employee in job which necessitates advanced skills; employee in job which necessitates intermediate skills; employee in job which necessitates basic skills; employee, other; employee, unknown position; unemployed for more than 6 months; social security recipient because of disability; in educational program; disability pensioner; pensioner; early retirement pensioner; social security recipient; other; children under the age of 15 years; housewife (only categorized 1976–1990).

Disposable household income: ten groups of 10 percentiles each (deciles) of the 5-year disposable household income distribution.

Birthweight and maternal risk of breast cancer

To investigate the effect of birthweight relative to gestational age in pregnancies of different duration, we combined data on gestational age and birthweight from the Danish Medical Birth Registry, compiled from 1978 onwards. We defined a birth as small for gestational age (SGA) if the birthweight was below the 10th percentile of births at the given gestational week in the corresponding 5-year period. We then stratified by weight category (SGA vs. 10–100th percentile of birthweights at the same week) and assessed risk of breast cancer after a pregnancy of a given duration, grouped into the following lengths of pregnancy: week 20–33, week 34–36, week 37, 38, 39, 40, 41, and week 42 or longer. To estimate the RR of breast cancer by both relative birthweight and gestational period, we extended the model so that the pregnancy effect also varied by relative birthweight.
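The SGA definition can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout (`year`, `week`, `weight` in grams) is hypothetical, the 5-year periods are assumed to start in 1978, and the 10th percentile is computed with the "inclusive" interpolation method of the standard library. The percentile method actually used in the analysis is not specified in the text.

```python
from collections import defaultdict
from statistics import quantiles


def five_year_period(year, origin=1978):
    """First year of the 5-year period containing `year` (assumed origin 1978)."""
    return origin + 5 * ((year - origin) // 5)


def sga_flags(births):
    """Flag each birth as SGA (True), not SGA (False), or undetermined (None).

    A birth is SGA if its weight is below the 10th percentile of births in
    the same gestational week and the same 5-year period.
    """
    groups = defaultdict(list)
    for b in births:
        groups[(five_year_period(b["year"]), b["week"])].append(b["weight"])

    # The 10th percentile is the first of the nine decile cut points.
    cutoffs = {
        key: quantiles(weights, n=10, method="inclusive")[0]
        for key, weights in groups.items()
        if len(weights) >= 2  # quantiles() needs at least two observations
    }

    flags = []
    for b in births:
        cutoff = cutoffs.get((five_year_period(b["year"]), b["week"]))
        flags.append(None if cutoff is None else b["weight"] < cutoff)
    return flags
```

Grouping by period as well as by week means that a secular shift in birthweights does not change the share of births classified as SGA within a period.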

Threshold model analysis of pregnancy duration and risk of breast cancer

Our estimates of the effect of an early-age pregnancy stratified by the duration of pregnancy (Fig. 2c) suggest that pregnancies lasting at least 34 gestational weeks are necessary to obtain a long-term reduced risk of breast cancer. To substantiate this conclusion, we compared the observed pattern in Fig. 2c with week-specific threshold models, in which breast cancer risk reduction is achieved only by pregnancies of a specific minimal duration or longer. The threshold model with the least difference in fit from the observed pattern in Fig. 2c is interpreted as providing the best estimate of the critical length of pregnancy necessary for the long-term breast cancer risk reduction.

The model used in Supplementary Figure 6 is in the following termed $M_{\mathrm{fig2C}}$. In this model, the long-term effect of each early-age pregnancy with duration $w$ is modeled as $\beta_w$, with $w$ denoting the gestational week categories described in the paper. We compared $M_{\mathrm{fig2C}}$ with simple week-specific threshold models, $M_{\mathrm{threshold}}(w_0)$, in which only pregnancies reaching a certain threshold of duration $w_0$ are associated with a decreased risk of breast cancer. In that model, the long-term effect of each early-age pregnancy according to the duration of the pregnancy $w$ is modeled as $\beta \cdot I(w \ge w_0)$, i.e., by one parameter. All other parameters are the same in the two models, which therefore differ by a total of 14 parameters.
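The two models described above can be written side by side as follows. This is a notational sketch only: the symbol $\eta$ for the long-term, early-age pregnancy term on the log-RR scale is an assumption introduced here, and the count of 15 assumes the regular week categories listed earlier (20–27, 28–29, 30, …, 41, 42–45), which is consistent with the stated difference of 14 parameters.

```latex
\begin{align*}
M_{\mathrm{fig2C}}:\quad
  & \eta(w) = \beta_w
  && \text{(one parameter per week category, 15 in total)} \\
M_{\mathrm{threshold}}(w_0):\quad
  & \eta(w) = \beta \cdot I(w \ge w_0)
  && \text{(one parameter)} \\
\text{difference in parameters:}\quad
  & 15 - 1 = 14
\end{align*}
```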

We furthermore compared models that allowed for differences in the pregnancy effect according to parity (primiparity, multiparity) and country (Denmark, Norway). All models were compared by the deviance (i.e., the difference in −2 × log-likelihood between two models).
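The deviance-based choice of threshold can be sketched in Python as below. The log-likelihood values in the example are invented solely to show the mechanics and are not results from the study; the function names are hypothetical.

```python
def threshold_effect(week, w0, beta):
    """Long-term early-age pregnancy term under a threshold model: beta * I(week >= w0)."""
    return beta if week >= w0 else 0.0


def deviance(loglik_restricted, loglik_full):
    """Difference in -2 * log-likelihood between a restricted and a fuller model."""
    return -2.0 * (loglik_restricted - loglik_full)


def best_threshold(loglik_full, loglik_by_threshold):
    """Threshold week whose model deviates least from the week-specific model."""
    return min(
        loglik_by_threshold,
        key=lambda w0: deviance(loglik_by_threshold[w0], loglik_full),
    )


# Invented log-likelihoods, for illustration only (not study results).
example_full = -1000.0
example_thresholds = {33: -1012.0, 34: -1004.5, 35: -1009.0}
example_best = best_threshold(example_full, example_thresholds)
```

In this invented example the threshold at week 34 has the smallest deviance (9.0) and would be selected.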