Menstrual cycle data collection

Physiological data, including daily BBT (sublingual measurement), cycle-by-cycle dates of menstruation and urinary LH test results, were collected prospectively from users of the Natural Cycles app. Participant characteristics including age and BMI were determined through mandatory in-app questions that must be completed during the sign-up process. Users are advised to measure their temperature as soon as they wake up on 5 out of 7 days per week. They are asked to report whether a temperature measurement may be deviating for reasons such as disrupted sleep or alcohol consumption the night before. The algorithm also marks a temperature as deviating if its value is outside the range 35.0–37.5 °C.
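As an illustration only, the two ways a reading can be marked as deviating (a user report or a value outside 35.0–37.5 °C) could be combined as in the following minimal sketch. The function name `is_deviating`, the constant names and the inclusive treatment of the range limits are assumptions made for this example and are not taken from the app.

```python
# Range quoted in the text; treating the limits as inclusive is an assumption.
BBT_MIN_C = 35.0
BBT_MAX_C = 37.5


def is_deviating(temp_c: float, user_flagged: bool = False) -> bool:
    """Return True if a reading would be treated as deviating.

    A reading deviates if the user flagged it (e.g. disrupted sleep or
    alcohol the night before) or if it lies outside the plausible range.
    """
    out_of_range = not (BBT_MIN_C <= temp_c <= BBT_MAX_C)
    return user_flagged or out_of_range
```

For example, `is_deviating(36.4)` is false, whereas `is_deviating(37.8)` and `is_deviating(36.4, user_flagged=True)` are both true.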

All users in the study had consented at registration to the use of their data for the purposes of scientific research and could remove their consent at any time. This study was a subanalysis of data collected as part of a wider study protocol that was reviewed and approved by the regional ethics committee (EPN, Stockholm, diary number 2017/563-31).

Identification of ovulation day

A surge in LH is responsible for triggering follicle rupture.2 The start of the surge is approximately 28–48 h before follicle rupture and peak LH levels are reached 12 h before follicle rupture.42 After follicle rupture the corpus luteum forms, marking the start of the luteal phase, and secretes progesterone for the duration of the luteal phase in order to prime the endometrium for embryo implantation.2 Elevated levels of LH are detectable in blood and urine samples. At the onset of menses, marking the start of the follicular phase, the corpus luteum collapses and progesterone levels fall back to a low level until the next preovulatory increase. Progesterone has a thermogenic effect so its levels can be tracked by measuring BBT. BBT is at a relatively constant low level during the follicular phase, reaching its lowest level (the nadir) prior to ovulation,43 and then displays a distinct rise of 0.2–0.3 °C following ovulation.44 The higher level of BBT is sustained during the luteal phase before falling back to the lower level at the start of the next cycle.44,45

The algorithm within the app detects ovulation retrospectively based on BBT measurements, menstrual cycle parameters and additionally on positive urinary LH tests. The algorithm can identify the BBT rise associated with ovulation in the presence of measurement errors, missing data and BBT rise occurring over a variable length of time.20 The risk of misidentification is reduced by excluding deviating temperatures. In order to determine that ovulation has occurred, as a minimum requirement the rolling average BBT (average of valid (nondeviating) temperatures over the last three calendar days) must be higher than both the woman’s follicular phase average and her cover line (the average temperature across all data entries) and consistent with her luteal phase average. Figure 6 illustrates a typical biphasic temperature graph in an ongoing ovulatory cycle from a user in ‘Prevent’ mode. The horizontal grey line is the cover line. Comparisons are made using standard statistical techniques taking into account sample size and standard deviation. If ovulation is not detected in this initial test then more tests are performed with a rolling average over an increasing number of days up to 1 week. Ovulation detection is less likely if there are valid temperature measurements on fewer than about 50% of cycle days.
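The minimum requirement described above can be sketched as follows. This is a simplified illustration, not the app's algorithm: it replaces the statistical comparisons (which account for sample size and standard deviation) with plain inequalities, omits the consistency check against the luteal phase average, and takes the follicular phase average and cover line as inputs. The function names and the use of `None` for missing or deviating readings are assumptions.

```python
from statistics import mean
from typing import Optional, Sequence


def rolling_mean(temps: Sequence[Optional[float]], day: int, window: int) -> Optional[float]:
    """Mean of valid readings in the `window` calendar days ending on `day`.

    `temps` holds one entry per cycle day (0-based); None marks a missing
    or deviating reading.
    """
    start = max(0, day - window + 1)
    valid = [t for t in temps[start:day + 1] if t is not None]
    return mean(valid) if valid else None


def rise_detected(temps: Sequence[Optional[float]], day: int,
                  follicular_avg: float, cover_line: float,
                  min_window: int = 3, max_window: int = 7) -> Optional[int]:
    """Return the smallest window (3 to 7 days) whose rolling mean exceeds
    both the follicular phase average and the cover line, or None."""
    for window in range(min_window, max_window + 1):
        avg = rolling_mean(temps, day, window)
        if avg is not None and avg > follicular_avg and avg > cover_line:
            return window
    return None
```

With readings of roughly 36.2–36.3 °C followed by three days at 36.6–36.7 °C, a follicular average of 36.25 °C and a cover line of 36.43 °C, the three-day window already passes on the third high day, whereas no window passes before the rise.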

If a positive LH test has been recorded, fewer high temperatures are required in order to detect ovulation since the LH test provides extra confidence that ovulation has occurred. The app recommends which days to take an LH test, taking into account the uncertainty of the ovulation day so as to minimise the number of LH tests used while ensuring that the user will not miss her surge. If the user is in Prevent mode, the algorithm only recommends checking for LH if the user has had at least three cycles off hormonal contraception and the total ovulation uncertainty is less than ±10 days. For users in Plan mode the app always recommends which days to check for LH, since Plan users are in general keener to find the surge even if it requires a large number of LH tests. The app will, however, only recommend starting LH checks from 10 days prior to the earliest recorded ovulation day, even if the total uncertainty is larger.

As the LH surge typically lasts for several days,42 the probability of missing the surge when testing only every other day is relatively small. The app therefore recommends testing only every other day until close to the expected ovulation day. If one positive LH test has been entered, but no positive or negative LH test entry exists for the day immediately before, then the user is encouraged to test the following day to establish whether the positive test corresponds to the first or second day of the surge. If no such test is entered, the app assumes that the first positive LH test marks the first day of the surge.

Cycles in which ovulation has been detected are hereafter referred to as ovulatory cycles. If ovulation has been detected in the current cycle then the algorithm selects the most suitable candidate day to call the First High Point (FHP) using a system of measurements based on comparisons of each temperature to the phase averages. This is the day on which the temperatures immediately before and after are most consistent with the follicular and luteal phase averages respectively. On average the FHP temperature is just below the cover line. In a previous study the FHP was 1.9 ± 1.4 days after the estimated start of the LH surge,20 similar to a comprehensive study of different markers of ovulation by Ecochard et al.45 where the FHP was most often 2 days after the LH peak. An evaluation of the timing of the FHP and the LH peak relative to the data of Ecochard et al. (2001) is available in the Supplementary materials.

In a clinical study43 the FHP was observed in most cycles during the 12 h following ovulation. Because users of the app only measure temperature every 24 h, the FHP is expected to be detectable by the algorithm the day after ovulation. This means that ovulation itself is estimated to occur on the day of the last low temperature before the rise as suggested by Hilgers and Bailey46 and Mouzon et al.43 We define the EDO as the day before the FHP, the day of the last low temperature. According to convention the follicular phase ends on the EDO and the luteal phase starts the day after the EDO.

Another marker besides the BBT shift that has been used in clinical settings to estimate the day of ovulation is the day of luteal transition (DLT) defined as the ratio of oestrogen to progesterone falling below a critical threshold.28,47 The DLT ovulation detection algorithm has been designed to coincide with the peak of the LH surge.47 Although DLT is not intended for home use we mention it here because a study using it will be used as a source of reference data for validating the results of this study.

Inclusion/exclusion criteria

Women using the app were included if they had registered between 1st September 2016 and 1st February 2019, had given their consent for the use of their data in research, were aged 18–45 at registration, had a BMI between 15 and 50 and had not used hormonal contraception within the 12 months prior to registration. Users who stated at registration that they had PCOS, hypothyroidism or endometriosis, or who had menopausal symptoms, were excluded. Users were also required to have logged at least ten nondeviating temperatures.

Cycles were included if they were recorded by a user with at least six complete cycles (with or without detected ovulation), the cycle length was between 10 and 90 days and nondeviating temperatures had been recorded on at least 50% of cycle days. ‘Nondeviating temperatures’ are defined as temperature measurements that the user has not marked as abnormal (e.g., due to unexplained fever or high alcohol intake) when entering them into the app. Cycles were excluded if a pregnancy was reported by the user or if the cycle was otherwise flagged as possibly pregnant by the algorithm owing to sustained high temperatures and a luteal phase significantly longer than the user’s average. Figure 7 summarises the number of users and cycles at each step of the selection process.
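The cycle-level criteria above can be expressed as a simple filter. The sketch below is illustrative only: the `Cycle` record, its field names and the function `cycle_included` are hypothetical, and the limits of 10 and 90 days are assumed to be inclusive.

```python
from dataclasses import dataclass

MIN_COMPLETE_CYCLES = 6  # per-user requirement quoted in the text


@dataclass
class Cycle:
    length_days: int                 # cycle length in days
    valid_temp_days: int             # days with a nondeviating temperature
    pregnancy_reported: bool = False
    pregnancy_flagged: bool = False  # flagged as possibly pregnant


def cycle_included(cycle: Cycle, user_complete_cycles: int) -> bool:
    """Apply the cycle-level inclusion and exclusion criteria."""
    if user_complete_cycles < MIN_COMPLETE_CYCLES:
        return False
    if not 10 <= cycle.length_days <= 90:
        return False
    # Nondeviating temperatures on at least 50% of cycle days.
    if 2 * cycle.valid_temp_days < cycle.length_days:
        return False
    return not (cycle.pregnancy_reported or cycle.pregnancy_flagged)
```

A 28-day cycle with 14 valid temperature days from a user with eight complete cycles would pass this filter; the same cycle with 13 valid days would not.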

Study design

The ‘normal’ menstrual cycle is conventionally classified as 21–35 days in length, frequent menstrual bleeding (polymenorrheic cycles) as cycles under 21 days (very short cycles) and infrequent menstrual bleeding (oligomenorrheic cycles) as cycles over 35 days (very long cycles).48 In this study bleed length was defined as the number of consecutive days on which bleeding (not spotting) was recorded. Spotting is defined as very light bleeding (a few drops of blood) or brown/pink fluids. Users are instructed not to log very light bleeding just before the period as bleeding but to wait until the flow increases. Follicular phase length was defined as the number of days from the first day of recorded menstruation to the EDO (inclusive). Luteal phase length was defined as the number of days from the day after the EDO to the day before the next day of recorded menstruation. The per-user cycle length variation was defined as one standard deviation of a user’s cycle lengths.
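To make the day counting concrete, the following sketch derives the EDO and both phase lengths from the cycle length and the FHP, with cycle day 1 taken as the first day of recorded menstruation. The function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) for the per-user variation is an assumption, since the text does not state whether the sample or population form was used.

```python
from statistics import stdev
from typing import Sequence, Tuple


def phase_lengths(cycle_length: int, fhp_day: int) -> Tuple[int, int, int]:
    """Return (EDO, follicular length, luteal length) in days.

    Days are numbered from 1 (first day of menstruation). The EDO is the
    day before the FHP; the follicular phase runs from day 1 to the EDO
    inclusive, and the luteal phase from the day after the EDO to the
    last day of the cycle.
    """
    edo = fhp_day - 1
    follicular = edo
    luteal = cycle_length - edo
    return edo, follicular, luteal


def cycle_length_variation(cycle_lengths: Sequence[int]) -> float:
    """Per-user cycle length variation as one (sample) standard deviation."""
    return stdev(cycle_lengths)
```

For a 29-day cycle with the FHP on day 18, this gives an EDO of day 17, a follicular phase of 17 days and a luteal phase of 12 days, which together sum to the cycle length.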

We calculated mean cycle length, duration of bleeding (bleed length), follicular phase length and luteal phase length in ovulatory cycles. The following cohort splits by cycle length were defined: very short cycles (15–20 days), short cycles (21–24 days), medium cycles (25–30 days), long cycles (31–35 days) and very long cycles (36–50 days). We calculated the same statistics as well as per-user cycle length variation for cohorts of ovulatory cycles by user age at registration (18–24, 25–29, 30–34, 35–39 and 40–45 years) and BMI (15–18.5, 18.5–25, 25–30, 30–35 and 35–50). We also calculated the mean proportion of ovulatory cycles as a fraction of all cycles recorded by the user in each of the age and BMI cohorts.
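The cycle length cohorts listed above amount to a lookup over inclusive day ranges, sketched below. The table and function names are hypothetical; cycles shorter than 15 days or longer than 50 days fall outside all five cohorts and return `None`.

```python
from typing import Optional

# (label, first day, last day), limits inclusive, as listed in the text.
CYCLE_LENGTH_COHORTS = [
    ("very short", 15, 20),
    ("short", 21, 24),
    ("medium", 25, 30),
    ("long", 31, 35),
    ("very long", 36, 50),
]


def cycle_length_cohort(length_days: int) -> Optional[str]:
    """Return the cohort label for a cycle length, or None if out of range."""
    for label, low, high in CYCLE_LENGTH_COHORTS:
        if low <= length_days <= high:
            return label
    return None
```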

Owing to the very large sample sizes in this study, P values were not calculated since they can be very small even if differences between cohorts are of no clinical significance.49 Instead, the effect size between two cohorts was estimated as a mean difference with a 95% confidence interval calculated from 200 bootstrap samples, each the size of the cohort and drawn at random with replacement.50 Mean differences are also given as a percentage of the mean in the combined cohorts. Where linear regression is used, we quote the coefficient of the slope with a 95% confidence interval and the R² value.
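A bootstrap estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: the confidence interval is taken from the 2.5th and 97.5th percentiles of the resampled differences (the percentile method, which the text does not specify), and the function name, parameters and fixed seed are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_mean_diff(a: Sequence[float], b: Sequence[float],
                        n_boot: int = 200, seed: int = 0
                        ) -> Tuple[float, Tuple[float, float], float]:
    """Return (mean difference, (CI low, CI high), difference as % of pooled mean)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Cohort-sized resamples drawn at random with replacement.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    # Percentile interval (assumed method): 2.5th and 97.5th percentiles.
    low = diffs[round(0.025 * n_boot)]
    high = diffs[round(0.975 * n_boot) - 1]
    point = mean(a) - mean(b)
    percent = 100 * point / mean(list(a) + list(b))
    return point, (low, high), percent
```

Calling `bootstrap_mean_diff(lengths_a, lengths_b)` on two lists of cycle lengths returns the observed mean difference, its bootstrap interval and the difference expressed as a percentage of the mean of the pooled cohorts.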

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.