The NutriNet-Santé study is an ongoing web based cohort launched in 2009 in France with the objective of studying the associations between nutrition and health as well as the determinants of dietary behaviours and nutritional status. Details about this cohort have been described previously. 44 Briefly, participants aged 18 years or older with access to the internet have been continuously recruited among the general population since May 2009 using multimedia campaigns. Questionnaires are completed online using a dedicated website ( www.etude-nutrinet-sante.fr ). Participants are followed using an online platform linked to their email address. Electronic informed consent is obtained from each participant.

Participants were also invited to complete a series of three non-consecutive validated web based 24 hour dietary records at baseline and every six months (to vary the season of completion), randomly assigned over a two week period (two weekdays and one weekend day). 49 50 51 To be included in the nutrition component of the NutriNet-Santé cohort, it was mandatory to have two dietary records during the overall baseline period. In this prospective analysis, we averaged the mean dietary intakes from the 24 hour dietary records available during the first two years of each participant’s follow-up (≤15 records) and considered these as baseline usual dietary intakes. The web based self administered 24 hour dietary records have been tested and validated against both an interview by a trained dietitian 49 and blood and urinary biomarkers. 50 51 Participants used the dedicated web interface to record all foods and beverages consumed during a 24 hour period for each of the three main meals (breakfast, lunch, and dinner) and any other eating occasion. We used previously validated photographs or usual containers to estimate portion sizes. 52 Dietary underreporting was identified with the method proposed by Black, using the basal metabolic rate and Goldberg cut-off, in order to screen participants with abnormally low energy intakes, and energy under-reporters (20.0% of the cohort) were excluded 53 (see supplementary appendix 1 for details about energy underreporting in the cohort). We calculated mean daily intakes of alcohol, micronutrients, macronutrients, and energy using the NutriNet-Santé food composition database, which contains more than 3300 different items. 54 Amounts consumed from composite dishes were estimated using French recipes validated by nutrition professionals. Sodium intake was assessed through a specific module included in the 24 hour records, taking into account native sodium in foods, salt added during cooking, and salt added on the plate. 
This method has been validated against sodium urinary excretion biomarkers. 51
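The averaging step described above can be illustrated with a minimal sketch. It assumes each participant's records are available as (date, intake) pairs; the function names, the data layout, and the handling of 29 February are illustrative assumptions, not the study's actual code (the analyses were run in SAS).

```python
from datetime import date
from statistics import mean


def add_years(d, years):
    """Shift a date by whole years (assumption: 29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def baseline_usual_intake(records, inclusion, window_years=2, min_records=2):
    """Average one nutrient over the 24 hour records of the first two years.

    records: list of (record_date, daily_intake) tuples for one participant.
    Returns None when fewer than `min_records` records fall inside the window.
    """
    window_end = add_years(inclusion, window_years)
    kept = [intake for day, intake in records if inclusion <= day <= window_end]
    if len(kept) < min_records:
        return None
    return mean(kept)


# Hypothetical participant: the 2013 record falls outside the two year window.
example = [
    (date(2010, 1, 10), 2000.0),
    (date(2010, 7, 3), 2200.0),
    (date(2013, 1, 1), 3000.0),
]
usual_energy = baseline_usual_intake(example, inclusion=date(2010, 1, 1))
```

The sketch does not cover the exclusion of energy under-reporters, which relies on cut-off values not given in this section.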

Three trained dietitians categorised the food and beverage items of the NutriNet-Santé composition table into one of the four food groups in NOVA, based on the extent and purpose of industrial food processing. 24 55 56 A committee of specialists in nutritional epidemiology (three dietitians and five researchers) then reviewed the classification. When uncertainty existed about a food or beverage item, researchers reached a consensus based on the percentage of homemade and artisanal foods versus industrial brands of processed and ultra-processed foods reported by the participants. This study primarily focused on the NOVA group of ultra-processed foods. This group includes mass produced packaged breads and buns, sweet or savoury packaged snacks, industrialised confectionery and desserts, sodas and sweetened beverages, meatballs, poultry and fish nuggets, and other reconstituted meat products transformed with the addition of preservatives other than salt (eg, nitrites), instant noodles and soups, frozen or shelf stable ready meals, and other food products made mostly or entirely from sugar, oils, and fats, and other substances not commonly used in culinary preparations, such as hydrogenated oils, modified starches, and protein isolates. Industrial processes notably include hydrogenation, hydrolysis, extrusion, moulding, reshaping, and pre-processing by frying. Flavouring agents, colours, emulsifiers, humectants, non-sugar sweeteners, and other cosmetic additives are often added to these products to imitate sensorial properties of unprocessed or minimally processed foods and their culinary preparations, or to disguise undesirable qualities of the final product. 
In the ultra-processed group we also included foods and beverages that did not fit into any of the three other NOVA groups: unprocessed or minimally processed foods (fresh, dried, ground, chilled, frozen, pasteurised, or fermented staple foods such as fruit, vegetables, pulses, rice, pasta, eggs, meat, fish, or milk), processed culinary ingredients (salt, vegetable oils, butter, sugar, and other substances extracted from foods and used in kitchens to transform unprocessed or minimally processed foods into culinary preparations), and processed foods (canned vegetables with added salt, sugar-coated dried fruit, meat products only preserved by salting, cheeses and freshly made unpackaged breads, and other products manufactured with the addition of salt, sugar, or other substances of the “processed culinary ingredients” group). As previously described, 57 we used standardised recipes to identify and disaggregate homemade and artisanal food preparations, and we applied the NOVA classification to the ingredients. Supplementary appendix 2 presents the details about the NOVA classification along with some examples.

Participants were asked to report major health events through the yearly health questionnaire, a check-up questionnaire every three months, or at any time through a specific interface on the study website. We then invited participants to provide their medical records (eg, diagnoses, hospital admissions, radiological reports, electrocardiograms) and, if necessary, the study doctors contacted the participants’ doctors or medical facilities (clinic, hospital, or laboratory) to collect additional information. A committee of study doctors then reviewed the medical data to validate any major health events. Participants’ families or doctors were contacted when there had been no response to the study website for more than one year. This process constituted the main source of case ascertainment in the cohort. Our research team was authorised by the Council of State (No 2013-175) to link data from our general population based cohorts to medico-administrative databases of national health insurance (SNIIRAM). Thus, for participants who provided their social security number (n=50 240), we linked their data to medico-administrative databases of SNIIRAM, limiting potential bias from those who had not reported their CVD to the study investigators. A low proportion of participants (1.7%) emigrated and were not covered by SNIIRAM. Lastly, to identify deaths and potentially missed CVD cases for deceased participants we linked data to CépiDC, the French national cause specific mortality registry, which includes dates and causes of death. This registry is accessible to all French citizens, without specific authorisation or identification number. We classified CVD cases using ICD-CM codes (international classification of diseases-clinical modification, 10th revision). 
The present study focused on first incident cases of stroke (I64), transient ischaemic attack (G45.8 and G45.9), myocardial infarction (I21), acute coronary syndrome (I20.0 and I21.4), and angioplasty (Z95.8) occurring between inclusion and January 2018.
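The outcome definitions above amount to a lookup from diagnosis code to event type and disease group. The sketch below uses only the codes listed in the text. The fallback from a full code to its three character category (so that, for example, I21.0 maps to myocardial infarction) is an assumption made for illustration, as are the function and variable names.

```python
# Codes as listed in the text, grouped by the two outcome families analysed.
EVENTS = {
    "cerebrovascular disease": {
        "I64": "stroke",
        "G45.8": "transient ischaemic attack",
        "G45.9": "transient ischaemic attack",
    },
    "coronary heart disease": {
        "I21": "myocardial infarction",
        "I20.0": "acute coronary syndrome",
        "I21.4": "acute coronary syndrome",
        "Z95.8": "angioplasty",
    },
}


def classify(code):
    """Return (disease_group, event) for a diagnosis code, or None if not studied.

    An exact match is tried first, so I21.4 is read as acute coronary syndrome
    before the broader I21 category (assumed fallback) is considered.
    """
    code = code.strip().upper()
    category = code.split(".")[0]
    for candidate in (code, category):
        for group, table in EVENTS.items():
            if candidate in table:
                return group, table[candidate]
    return None
```

For instance, `classify("I21.4")` yields the acute coronary syndrome entry, whereas `classify("I20.1")` returns `None` because only I20.0 is listed.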

Statistical analysis

Up to 11 January 2018, 105 159 participants who had no CVD at baseline and who provided at least two valid 24 hour dietary records during their first two years of follow-up were included (fig 1). For each participant, we calculated the proportion (%) of ultra-processed foods in the total weight of food and beverages consumed (g/day). We determined this by creating a weight ratio rather than an energy ratio to account for processed food that does not provide energy (eg, artificially sweetened beverages) and non-nutritional factors related to food processing (eg, neoformed contaminants, additives, and alterations to the structure of raw foods). A sensitivity analysis was also performed by weighting the ultra-processed variable by energy (% kcal/day) instead of weight. For all covariates except physical activity, 5% or less of values were missing and were imputed to the modal value (for categorical variables) or median (for continuous variables). For physical activity, the proportion of missing values was higher (14%) because we needed answers to all the questions in the International Physical Activity Questionnaire to calculate the score. To avoid massive imputation for a non-negligible number of participants or exclusion of those with missing data and risk of selection bias, we included a missing class in the models for this variable (main analysis). However, we also tested complete case analysis and multiple imputation in sensitivity analyses: multiple imputation for missing data was performed using the MICE method 58 by fully conditional specification (20 imputed datasets) for the outcome 59 and for several covariates: level of education (5.0% missing data), physical activity level (13.9% missing data), and body mass index (0.6% missing data). Results were combined across imputations based on Rubin’s combination rules 60 61 using the SAS PROC MIANALYZE procedure. 62
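To illustrate why the weight ratio and the energy ratio can differ, the following minimal sketch computes both for a hypothetical day of intake. The item list, field names, and function name are assumptions made for the example only.

```python
def upf_share(items, basis="weight"):
    """Percentage of ultra-processed foods (NOVA group 4) in a day's intake.

    items: list of dicts with keys "grams", "kcal", and "nova" (1 to 4).
    basis: "weight" (g/day, main analysis) or "energy" (kcal/day, sensitivity analysis).
    Returns None when the denominator is zero.
    """
    if basis not in ("weight", "energy"):
        raise ValueError("basis must be 'weight' or 'energy'")
    key = "grams" if basis == "weight" else "kcal"
    total = sum(item[key] for item in items)
    if total == 0:
        return None
    ultra_processed = sum(item[key] for item in items if item["nova"] == 4)
    return 100.0 * ultra_processed / total


# Hypothetical intake: an artificially sweetened soda adds weight but no energy.
day = [
    {"name": "diet soda", "grams": 330, "kcal": 0, "nova": 4},
    {"name": "apple", "grams": 150, "kcal": 80, "nova": 1},
    {"name": "fresh bread", "grams": 120, "kcal": 300, "nova": 3},
]
share_by_weight = upf_share(day, basis="weight")
share_by_energy = upf_share(day, basis="energy")
```

In this example the soda accounts for more than half of the weight consumed but none of the energy, which is the situation the weight ratio is intended to capture.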

Fig 1 Flowchart for study sample, NutriNet-Santé cohort, France, 2009-18

To examine differences in participants’ baseline characteristics between quarters of the percentage of ultra-processed food in the diet, defined with sex specific cut-offs (computed with the SAS PROC RANK procedure and a BY SEX statement), we used analysis of variance (ANOVA) or χ2 tests, as appropriate. We chose sex specific cut-offs because women generally have a healthier diet and consume smaller amounts of food than men, and this allowed us to ensure equivalent sex ratios between quarters. To provide some information on the nutritional quality of ultra-processed foods, we calculated the proportion of these foods in each category of the Nutri-score. This score is calculated based on a modified version of the Food Standards Agency Nutrient Profiling system, and it has been endorsed by the French, Spanish, and Belgian ministries of health as the official nutrient profiling system in these countries (see supplementary appendix 3 for details about its calculation).
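The idea of sex specific quarters can be sketched as follows. This is not a reimplementation of PROC RANK: the quartile cut-offs come from Python's `statistics.quantiles`, and the rule that a value equal to a cut-off falls in the lower quarter is an assumption of the example.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles


def sex_specific_quarters(participants):
    """Assign each participant to a quarter (1 to 4) using cut-offs computed within sex.

    participants: list of (participant_id, sex, upf_percent) tuples.
    Returns a dict {participant_id: quarter}.
    """
    values_by_sex = defaultdict(list)
    for _, sex, upf in participants:
        values_by_sex[sex].append(upf)
    # Three cut-offs per sex split that sex's distribution into four groups.
    cutoffs = {sex: quantiles(vals, n=4) for sex, vals in values_by_sex.items()}
    return {
        pid: bisect_left(cutoffs[sex], upf) + 1
        for pid, sex, upf in participants
    }


# Hypothetical data: the same percentage lands in different quarters by sex.
women = [(f"W{i}", "F", v) for i, v in enumerate([5, 10, 15, 20, 25, 30, 35, 40])]
men = [(f"M{i}", "M", v) for i, v in enumerate([10, 20, 30, 40, 50, 60, 70, 80])]
quarters = sex_specific_quarters(women + men)
```

With these hypothetical values, a woman and a man who both have 40% of ultra-processed food in their diet are placed in different quarters, and each quarter contains the same number of women and men.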

We used Cox proportional hazards models with age as the primary timescale to evaluate the association between the proportion of ultra-processed foods in the diet (coded as a continuous variable or as quarters with sex specific cut-offs) and incidence of overall CVD, cerebrovascular diseases (stroke and transient ischaemic attack), and coronary heart diseases (myocardial infarction, acute coronary syndrome, and angioplasty). In these models, we censored CVDs other than the one studied at the date of diagnosis (ie, they were considered as non-cases for the disease of interest and contributed person years until the date of diagnosis of CVD). We generated log-log (survival) versus log-time plots to confirm risk proportionality assumptions (see supplementary appendix 4). Hazard ratios and 95% confidence intervals were computed. In continuous models, hazard ratios corresponded to the ratio of instantaneous risks for an absolute increment of 10 in the percentage of ultra-processed foods in the diet (ie, a 0.1 absolute increase in the proportion of ultra-processed foods in the diet). In models based on quarters of the percentage of ultra-processed food in the diet, we obtained P values for linear trends by coding quarters of ultra-processed food as an ordinal variable (1, 2, 3, or 4). We verified the assumption of linearity between consumption of ultra-processed food and risk of CVD using restricted cubic spline functions with the SAS macro written by Desquilbet and Mariotti. 63 Participants contributed person time until the date of CVD diagnosis, date of last completed questionnaire, date of death, or 11 January 2018, whichever occurred first.
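Two elements of this set-up lend themselves to a short sketch: deriving entry and exit ages for a model that uses age as the timescale, and rescaling a Cox coefficient to a 10 point increment. The function names, the 365.25 day year, and the coefficient value used below are illustrative assumptions; no model is fitted here.

```python
from datetime import date
from math import exp

STUDY_END = date(2018, 1, 11)


def follow_up(birth, inclusion, last_questionnaire, cvd_date=None, death_date=None):
    """Return (entry_age, exit_age, is_case) with age in years as the timescale.

    Follow-up stops at the earliest of CVD diagnosis, last completed
    questionnaire, death, or the end of the study period.
    """
    candidates = [cvd_date, last_questionnaire, death_date, STUDY_END]
    exit_date = min(d for d in candidates if d is not None)
    is_case = cvd_date is not None and cvd_date == exit_date

    def age_at(d):
        return (d - birth).days / 365.25

    return age_at(inclusion), age_at(exit_date), is_case


def hazard_ratio_per_increment(beta, increment=10.0):
    """Hazard ratio for an `increment` point rise, given a coefficient per 1 point."""
    return exp(beta * increment)


# Hypothetical participant who develops CVD before the last questionnaire.
entry_age, exit_age, is_case = follow_up(
    birth=date(1960, 1, 1),
    inclusion=date(2010, 1, 1),
    last_questionnaire=date(2016, 6, 1),
    cvd_date=date(2015, 3, 1),
)
# Hypothetical coefficient of 0.01 per percentage point, for illustration only.
example_hr = hazard_ratio_per_increment(0.01)
```

Because the hazard ratio is the exponential of the coefficient multiplied by the increment, reporting it per 10 percentage points is equivalent to reporting it per 0.1 increase in the proportion.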

Models were adjusted for age (timescale) and sex (model 0), in addition to body mass index (BMI, continuous), physical activity (high, moderate, low, calculated according to International Physical Activity Questionnaire recommendations 48), smoking status (never, former, and current smokers), number of 24 hour dietary records (continuous), alcohol intake (g/day, continuous), energy intake (kcal/day, continuous), family history of CVD (yes or no), and educational level (less than high school degree, <2 years after high school degree, ≥2 years after high school degree) (model 1). To test for the potential influence of the nutritional quality of the diet in the association between intake of ultra-processed food and risk of CVD, we additionally adjusted this model for saturated fatty acids and sodium and sugar intakes (model 2), or for a healthy dietary pattern derived from principal component analysis (model 3) (see supplementary appendix 5 for details), or for intakes of sugary products, red and processed meat, salty snacks, beverages, and fats and sauces (model 4). We also tested a model without adjustment for BMI (model 5) to account for the potential mediating role of BMI in the association. In model 6, we performed further adjustments (based on model 1) for baseline prevalent type 2 diabetes, dyslipidaemia, hypertension, and hypertriglyceridaemia (yes or no) as well as treatments for these conditions (yes or no).
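The nesting of the adjustment sets can be summarised as a small configuration, shown here as a sketch. All variable names are hypothetical labels chosen for the example; age does not appear because it is the timescale rather than a covariate.

```python
# Model 1 covariates (age is handled as the timescale, not listed here).
MODEL_1 = [
    "sex", "bmi", "physical_activity", "smoking_status", "n_dietary_records",
    "alcohol_g_day", "energy_kcal_day", "family_history_cvd", "education",
]

CONDITIONS = ["type2_diabetes", "dyslipidaemia", "hypertension", "hypertriglyceridaemia"]

MODELS = {
    0: ["sex"],
    1: MODEL_1,
    2: MODEL_1 + ["saturated_fat", "sodium", "sugar"],
    3: MODEL_1 + ["healthy_pattern_score"],
    4: MODEL_1 + ["sugary_products", "red_processed_meat", "salty_snacks",
                  "beverages", "fats_sauces"],
    5: [c for c in MODEL_1 if c != "bmi"],  # BMI removed as a potential mediator
    6: MODEL_1 + CONDITIONS + [f"{c}_treatment" for c in CONDITIONS],
}


def added_to_model_1(model_number):
    """List the covariates a model adds on top of model 1."""
    return [c for c in MODELS[model_number] if c not in MODEL_1]
```

Written this way, models 2, 3, 4, and 6 each extend model 1 independently rather than building on one another, which matches the "or" between the alternative adjustments in the text.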

We also investigated the association between consumption of ultra-processed food and overall risk of CVD separately in strata of the population: men and women, younger adults (<45 years) and older adults (≥45 years), participants with a high lipid intake (more than the median) and those with a lower lipid intake, participants with a BMI less than 25 and those with a BMI of 25 or more, participants following a healthy dietary pattern and those following a less healthy one (discriminated by the median of the healthy dietary pattern obtained by the principal component analysis), and participants who tended to be sedentary (the low class of International Physical Activity Questionnaire) and those who tended to be more physically active.

Sensitivity analyses were performed based on model 1 by excluding CVD cases diagnosed during the first two, three, four, and five years of each participant’s follow-up to avoid reverse causality bias, by removing the adjustment for BMI and energy intake, and by testing further adjustments for a Western dietary pattern (continuous), number of smoked cigarettes in pack years (continuous), overall consumption of fruit and vegetables (continuous), dietary fibre intake (continuous), region of residence (Ile-de-France (Paris area) and east, centre east, west, north, southwest, Mediterranean region, or French overseas territories and departments), and season of inclusion in the cohort (spring, summer, autumn, or winter). Models were also tested after restriction of the study population to the participants with six or fewer, or more than six, 24 hour dietary records during the first two years of follow-up. We tested the associations between the quantity (g/day) (rather than the proportion) of intake of ultra-processed food and risk of CVD, as well as the associations between the quantity (g/day) of each ultra-processed food group and risk of CVD. We similarly tested the associations between the quantity (g/day) of non-ultra-processed foods in each group and risk of CVD to check that the associations were not driven by the consumption of specific food groups by themselves. A supplementary analysis was also performed by focusing on participants for whom the proportion of ultra-processed foods in the diet varied by less than |0.1| (that is, the absolute (non-negative) value of the difference) between the beginning and end of their follow-up. In the main model we included transient ischaemic attack (corresponding to a brief episode of neurological dysfunction, which has the same underlying mechanism as ischaemic stroke), but we performed a sensitivity analysis by excluding this CVD event. 
In this study we included angina pectoris events as acute coronary syndrome (ICD code I20), but not stable anginas (considered as soft events occurring only during effort or intense physical activity, which usually do not require hospital admission and might have other causes than coronary obstruction, such as anaemia, abnormal heart rhythms, and heart failure). However, we also tested sensitivity analyses including stable angina events.
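Two of the sensitivity analyses described above restrict the study population, and the restrictions can be sketched as simple filters. The record layout, field names, and the treatment of the boundary (a case diagnosed exactly at the threshold is kept) are assumptions made for this example.

```python
def exclude_early_cases(cohort, years):
    """Drop CVD cases diagnosed within the first `years` of follow-up.

    Non-cases (years_to_cvd is None) are always kept.
    """
    return [
        p for p in cohort
        if p["years_to_cvd"] is None or p["years_to_cvd"] >= years
    ]


def stable_diet(cohort, tolerance=0.1):
    """Keep participants whose ultra-processed proportion changed by less than `tolerance`."""
    return [
        p for p in cohort
        if abs(p["upf_end"] - p["upf_start"]) < tolerance
    ]


# Hypothetical participants; proportions are expressed on a 0 to 1 scale.
cohort = [
    {"id": "A", "years_to_cvd": None, "upf_start": 0.20, "upf_end": 0.25},
    {"id": "B", "years_to_cvd": 1.5, "upf_start": 0.30, "upf_end": 0.45},
    {"id": "C", "years_to_cvd": 3.2, "upf_start": 0.10, "upf_end": 0.12},
    {"id": "D", "years_to_cvd": 6.0, "upf_start": 0.40, "upf_end": 0.15},
]
after_two_years = [p["id"] for p in exclude_early_cases(cohort, 2)]
stable_ids = [p["id"] for p in stable_diet(cohort)]
```

Repeating `exclude_early_cases` with thresholds of two, three, four, and five years gives the successive subsets used to assess reverse causality.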

Finally, we performed secondary analyses to test the associations between the proportion of unprocessed or minimally processed foods in the diet (continuous) and risk of CVD, using multivariable Cox models adjusted for model 1 covariates.

All tests were two sided, and we considered P<0.05 to be statistically significant. SAS version 9.4 (SAS Institute) was used for the analyses.