Conclusions In this large prospective study, a 10% increase in the proportion of ultra-processed foods in the diet was associated with a significant increase of greater than 10% in risks of overall and breast cancer. Further studies are needed to better understand the relative effect of the various dimensions of processing (nutritional composition, food additives, contact materials, and neoformed contaminants) in these associations.

Results Ultra-processed food intake was associated with higher overall cancer risk (n=2228 cases; hazard ratio for a 10% increment in the proportion of ultra-processed food in the diet 1.12 (95% confidence interval 1.06 to 1.18); P for trend<0.001) and breast cancer risk (n=739 cases; hazard ratio 1.11 (1.02 to 1.22); P for trend=0.02). These results remained statistically significant after adjustment for several markers of the nutritional quality of the diet (lipid, sodium, and carbohydrate intakes and/or a Western pattern derived by principal component analysis).

Setting and participants 104 980 participants aged at least 18 years (median age 42.8 years) from the French NutriNet-Santé cohort (2009-17). Dietary intakes were collected using repeated 24 hour dietary records, designed to register participants’ usual consumption for 3300 different food items. These were categorised according to their degree of processing by the NOVA classification.

To our knowledge, this prospective study was the first to evaluate the association between the consumption of ultra-processed food products and the incidence of cancer, based on a large cohort study with detailed and up-to-date assessment of dietary intake.

Studying the potential health effects of ultra-processed foods is a very recent field of research, made possible by the development of the NOVA classification, which groups products according to their degree of processing. 9 Epidemiological evidence linking ultra-processed food intake to disease risk nevertheless remains very scarce and is mostly based on cross sectional and ecological studies. 25 26 27 The few available studies found that ultra-processed food intake was associated with a higher incidence of dyslipidaemia in Brazilian children and with higher risks of overweight, obesity, and hypertension in a prospective cohort of Spanish university students. 28 29 30

This dietary trend may be concerning and deserves investigation, because several characteristics of ultra-processed foods could be involved in causing disease, particularly cancer. Firstly, ultra-processed foods often have a higher content of total fat, saturated fat, added sugar, and salt, along with a lower fibre and vitamin density. 10 11 12 13 14 15 16 17 19 Beyond nutritional composition, neoformed contaminants, some of which have carcinogenic properties (such as acrylamide, heterocyclic amines, and polycyclic aromatic hydrocarbons), are present in heat treated processed food products, for example as a result of the Maillard reaction. 20 Secondly, the packaging of ultra-processed foods may contain materials in contact with food, such as bisphenol A, for which carcinogenic and endocrine disrupting properties have been postulated. 21 Finally, ultra-processed foods contain authorised, 22 but controversial, food additives such as sodium nitrite in processed meat or titanium dioxide (TiO2, a white food pigment), for which carcinogenicity has been suggested in animal or cellular models. 23 24

At the same time, diets in many countries have shifted over the past decades towards a dramatic increase in the consumption of ultra-processed foods. 4 5 6 7 8 Having undergone multiple physical, biological, and/or chemical processes, these food products are designed to be microbiologically safe, convenient, highly palatable, and affordable. 9 10 Several surveys (in Europe, the US, Canada, New Zealand, and Brazil) assessing individual food intake, household food expenditure, or supermarket sales have suggested that ultra-processed food products contribute between 25% and 50% of total daily energy intake. 10 11 12 13 14 15 16 17 18

Cancer represents a major worldwide burden, with 14.1 million new cases diagnosed in 2012. 1 According to the World Cancer Research Fund/American Institute for Cancer Research, about a third of the most common cancers in developed countries could be avoided by changing lifestyle and dietary habits. 2 Achieving a balanced and diversified diet (along with avoiding tobacco use and reducing alcohol intake) should therefore be considered one of the most important modifiable factors in the primary prevention of cancer. 3

The research question addressed in this article reflects a strong concern of the participants in the NutriNet-Santé cohort and of the general public. The results of this study will be disseminated to NutriNet-Santé participants through the cohort website, public seminars, and a press release.

Secondary analyses tested the associations between the proportion in the diet of each of the three other NOVA categories of food processing (continuous) and risk of cancer, using multivariate Cox models adjusted for model 1 covariates. All tests were two sided, with P<0.05 considered to be statistically significant. We used SAS version 9.4 for the analyses.

We did sensitivity analyses based on model 1 by excluding cases of cancer diagnosed during the first two years of each participant’s follow-up to avoid reverse causality bias, testing sex specific fifths of the proportion of ultra-processed foods in the diet instead of sex specific quarters, and testing further adjustments for prevalent depression at baseline (yes/no), dietary supplement use at baseline (yes/no), healthy dietary pattern (continuous, details in appendix 3), number of cigarettes smoked in pack years (continuous), overall fruit and vegetable consumption (continuous), and season of inclusion in the cohort (spring/summer/autumn/winter). We also investigated the association between ultra-processed food and overall cancer risk separately in different strata of the population: men, women, younger adults (under 40 years), older adults (40 years or over), smokers, non-smokers, participants with a high level of physical activity, and those with a low to moderate level of physical activity. We also tested models after restriction of the study population to the participants with at least six 24 hour dietary records during the first two years of follow-up. Similarly, we tested models including all participants with at least one 24 hour dietary record during the first two years of follow-up. We also tested associations between the quantity (g/d) of each ultra-processed food group and risk of cancer.
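The first sensitivity analysis above (dropping cancers diagnosed in the first two years of follow-up) can be sketched as a simple lag-exclusion filter. This is a minimal illustration, assuming each participant record carries an inclusion date and an optional diagnosis date; the `Participant` class and the function names are hypothetical, and the 365.25-day year is a simplification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout, for illustration only.
    inclusion: date
    diagnosis: Optional[date]  # None for participants without incident cancer


def diagnosed_early(p: Participant, years: float = 2.0) -> bool:
    """True if the participant's cancer was diagnosed within `years` of inclusion."""
    if p.diagnosis is None:
        return False
    return (p.diagnosis - p.inclusion).days < years * 365.25


def exclude_early_cases(cohort: list[Participant], years: float = 2.0) -> list[Participant]:
    """Lag exclusion: drop early-diagnosed cases to limit reverse causality."""
    return [p for p in cohort if not diagnosed_early(p, years)]


cohort = [
    Participant(date(2010, 1, 1), None),              # non-case, kept
    Participant(date(2010, 1, 1), date(2011, 6, 1)),  # diagnosed after ~1.4 years, dropped
    Participant(date(2010, 1, 1), date(2013, 1, 1)),  # diagnosed after 3 years, kept
]
print(len(exclude_early_cases(cohort)))
```

The Cox models are then refitted on the filtered cohort; whether participants with short follow-up but no cancer were also removed is not specified here, so the sketch keeps them.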

Models were adjusted for age (timescale), sex, body mass index (kg/m2, continuous), height (cm, continuous), physical activity (high, moderate, low, calculated according to IPAQ recommendations 35), smoking status (never or former smokers, current smokers), number of 24 hour dietary records (continuous), alcohol intake (g/d, continuous), energy intake without alcohol (kcal/d, continuous), family history of cancer (yes/no), and educational level (less than high school degree, less than two years after high school degree, two or more years after high school degree). For breast cancer analyses, we additionally adjusted for the number of biological children (continuous), menopausal status at baseline (menopausal/perimenopausal/non-menopausal), hormonal treatment for menopause at baseline (for postmenopausal analyses, yes/no), and oral contraception use at baseline (for premenopausal analyses, yes/no). This constituted model 1, the main model. To test whether the nutritional quality of the diet influenced the relation between ultra-processed food intake and cancer risk, model 1 was additionally adjusted for lipid, sodium, and carbohydrate intakes (model 2), for a Western dietary pattern derived from principal component analysis (model 3) (details in appendix 3), or for all these nutritional factors together (model 4). In addition, we did mediation analyses using the method proposed by Lange et al to estimate the direct and indirect effects of the exposure on the outcome through the following nutritional mediators: intakes of sodium, total lipids, saturated, monounsaturated, and polyunsaturated fatty acids, and carbohydrates, and a Western-type dietary pattern. 45 The methods are described in appendix 4.
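In a natural-effect decomposition of the kind used in Lange et al's approach, the total-effect hazard ratio factors into a natural direct and a natural indirect hazard ratio, so the share of the effect passing through a mediator can be expressed on the log-hazard scale. The sketch below assumes both hazard ratios have already been estimated; it does not implement the weighting and model-fitting steps described in appendix 4, and the function names and input values are illustrative only.

```python
import math


def total_effect_hr(hr_direct: float, hr_indirect: float) -> float:
    """On the hazard-ratio scale, natural direct and indirect effects multiply."""
    return hr_direct * hr_indirect


def proportion_mediated(hr_direct: float, hr_indirect: float) -> float:
    """Share of the total log hazard ratio attributable to the indirect path."""
    log_total = math.log(hr_direct) + math.log(hr_indirect)
    if log_total == 0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return math.log(hr_indirect) / log_total


# Hypothetical inputs (not estimates from this study).
print(total_effect_hr(1.118, 1.002))
print(proportion_mediated(1.118, 1.002))
```

An indirect hazard ratio close to 1 yields a proportion mediated close to 0, which is the pattern a "no mediation" finding corresponds to.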

Up to 1 January 2017, we included 104 980 participants without cancer at baseline who provided at least two valid 24 hour dietary records during their first two years of follow-up. The flowchart is in appendix 2. For each participant, we calculated the proportion (percentage g/day) of ultra-processed foods in the total diet. We used a weight ratio rather than an energy ratio to take into account processed foods that provide no energy (in particular, artificially sweetened drinks) and non-nutritional factors related to food processing (for example, neoformed contaminants, food additives, and alterations to the structure of raw foods). For all covariates except physical activity, fewer than 5% of values were missing; these were imputed to the modal value (for categorical variables) or to the median (for continuous variables). Corresponding values are provided in the footnote to table 1. The proportion of missing values was higher for physical activity (14%), because answers to all IPAQ questions were needed to calculate the score. To avoid either imputing values for a non-negligible number of participants or excluding them (with the associated risk of selection bias), we included a missing-data class for this variable in the models. We examined differences in participants’ baseline characteristics between sex specific quarters of the proportion of ultra-processed food in the diet by using analysis of variance or χ2 tests, as appropriate. We used Cox proportional hazards models with age as the primary timescale to evaluate the association between the proportion of ultra-processed foods in the diet (coded as a continuous variable or as sex specific quarters) and the incidence of overall, breast, prostate, and colorectal cancer.
In these models, cancers at sites other than the one studied were censored at the date of diagnosis (that is, these participants were considered non-cases for the cancer of interest and contributed person years until the date of diagnosis of their cancer). We estimated hazard ratios and 95% confidence intervals with the lowest quarter as the reference category. We generated log-log (survival) versus log-time plots to check the proportional hazards assumption. We tested for linear trend by using the ordinal score of sex specific quarters of ultra-processed food intake. Participants contributed person time until the date of diagnosis of cancer, the date of the last completed questionnaire, the date of death, or 1 January 2017, whichever occurred first. Breast cancer analyses were additionally stratified by menopausal status: women contributed person time to the “premenopause model” until their age at menopause and to the “postmenopause model” from their age at menopause. We determined age at menopause from the yearly health status questionnaires completed during follow-up.

We obtained medical records for more than 90% of cancer cases. Because self reports were highly valid (95% of self reported cancers for which a medical record was obtained were confirmed by our physicians), we included as cases all participants who self reported an incident cancer, unless a pathology report showed otherwise, in which case they were classified as non-cases.

Participants self declared health events through the yearly health status questionnaire, through a specific check-up questionnaire for health events (every three months), or at any time through a specific interface on the study website. For each incident cancer declared, a physician from the study team contacted the participant and asked for any relevant medical records. If necessary, the study physicians contacted the patient’s physician and/or hospitals to collect additional information. An expert committee of physicians then reviewed all medical data. Our research team was the first in France to obtain authorisation, by decree of the Council of State (No 2013-175), to link data from our cohorts to the medico-administrative databases of the national health insurance system (SNIIRAM databases). We therefore supplemented declared health events with information from these databases, thereby limiting potential bias due to participants with cancer who might not report their disease to the study investigators. Lastly, we used an additional linkage to the French national cause specific mortality registry (CépiDC) to detect deaths and potentially missed cases of cancer among deceased participants. We classified cancer cases by using the International Classification of Diseases, 10th revision (ICD-10). In this study, we considered all first primary cancers diagnosed between the inclusion date and 1 January 2017 to be cases, except basal cell skin carcinoma, which we did not consider as cancer.
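The case definition in the two preceding paragraphs (self reports accepted unless refuted by pathology, first primary cancer within the study window, basal cell skin carcinoma excluded) can be expressed as a small selection rule. This is a sketch under stated assumptions: the `CancerReport` fields and the function name are hypothetical, and treating both window boundaries as inclusive is an assumption the text does not settle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CancerReport:
    # Hypothetical fields, for illustration only.
    diagnosis: date
    icd10: str                  # ICD-10 code, e.g. "C50" for breast
    basal_cell: bool            # basal cell skin carcinoma is not counted as cancer
    refuted_by_pathology: bool  # pathology report showed the participant was not a case


def first_incident_case(reports: list[CancerReport], inclusion: date,
                        end: date = date(2017, 1, 1)) -> Optional[CancerReport]:
    """Earliest eligible primary cancer between inclusion and end of follow-up.

    Self reports count as cases unless refuted by a pathology report.
    Window boundaries are treated as inclusive (an assumption).
    """
    eligible = [
        r for r in reports
        if inclusion <= r.diagnosis <= end
        and not r.basal_cell
        and not r.refuted_by_pathology
    ]
    return min(eligible, key=lambda r: r.diagnosis, default=None)


reports = [
    CancerReport(date(2011, 1, 1), "C18", False, True),   # refuted by pathology
    CancerReport(date(2012, 5, 1), "C44", True, False),   # basal cell carcinoma
    CancerReport(date(2013, 2, 1), "C50", False, False),  # first eligible primary
]
print(first_incident_case(reports, inclusion=date(2010, 1, 1)))
```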

The ultra-processed food group is defined in contrast to the other NOVA groups: “unprocessed or minimally processed foods” (fresh, dried, ground, chilled, frozen, pasteurised, or fermented staple foods such as fruits, vegetables, pulses, rice, pasta, eggs, meat, fish, or milk), “processed culinary ingredients” (salt, vegetable oils, butter, sugar, and other substances extracted from foods and used in kitchens to transform unprocessed or minimally processed foods into culinary preparations), and “processed foods” (canned vegetables with added salt, sugar coated dried fruits, meat products preserved only by salting, cheeses, freshly made unpackaged breads, and other products manufactured with the addition of salt, sugar, or other substances from the “processed culinary ingredients” group). As previously described, 44 we identified homemade and artisanal food preparations, decomposed them using standardised recipes, and applied the NOVA classification to their ingredients. Further details and examples are given in appendix 1.
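The recipe decomposition described above, followed by the weight ratio used as the exposure (percentage of total g/day from ultra-processed foods), can be sketched as follows. The item-to-NOVA mapping and the recipe are hypothetical examples, not entries from the NutriNet-Santé composition table, and the sketch ignores weight changes during cooking.

```python
# Hypothetical item-to-NOVA mapping (1 = unprocessed/minimally processed,
# 2 = processed culinary ingredients, 3 = processed foods, 4 = ultra-processed).
NOVA = {
    "apple": 1, "flour": 1, "egg": 1, "milk": 1,
    "sugar": 2, "butter": 2,
    "cheese": 3,
    "soda": 4, "packaged biscuit": 4,
}

# Homemade preparations are decomposed into ingredients (g per 100 g of dish).
RECIPES = {
    "homemade apple cake": {"apple": 35.0, "flour": 25.0, "egg": 15.0,
                            "sugar": 15.0, "butter": 10.0},
}


def grams_by_nova(consumption: dict[str, float]) -> dict[int, float]:
    """Split consumption (g/day) into NOVA groups, decomposing recipes first."""
    totals = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    for item, grams in consumption.items():
        if item in RECIPES:
            recipe = RECIPES[item]
            recipe_weight = sum(recipe.values())
            for ingredient, amount in recipe.items():
                totals[NOVA[ingredient]] += grams * amount / recipe_weight
        else:
            totals[NOVA[item]] += grams
    return totals


def upf_share(totals: dict[int, float]) -> float:
    """Ultra-processed share of total weight (%); energy-free items still count."""
    return 100.0 * totals[4] / sum(totals.values())


day = {"homemade apple cake": 200.0, "soda": 330.0, "cheese": 30.0}
totals = grams_by_nova(day)
print(totals, round(upf_share(totals), 1))
```

Because the share is computed by weight, an energy-free artificially sweetened drink would contribute to group 4 exactly as the sugared soda does here, which an energy ratio would miss.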

We categorised all food and drink items of the NutriNet-Santé composition table into one of the four food groups in NOVA, a food classification system based on the extent and purpose of industrial food processing. 9 42 43 This study primarily focused on the “ultra-processed foods” NOVA group. This group includes mass produced packaged breads and buns; sweet or savoury packaged snacks; industrialised confectionery and desserts; sodas and sweetened drinks; meat balls, poultry and fish nuggets, and other reconstituted meat products transformed with addition of preservatives other than salt (for example, nitrites); instant noodles and soups; frozen or shelf stable ready meals; and other food products made mostly or entirely from sugar, oils and fats, and other substances not commonly used in culinary preparations such as hydrogenated oils, modified starches, and protein isolates. Industrial processes notably include hydrogenation, hydrolysis, extruding, moulding, reshaping, and pre-processing by frying. Flavouring agents, colours, emulsifiers, humectants, non-sugar sweeteners, and other cosmetic additives are often added to these products to imitate sensorial properties of unprocessed or minimally processed foods and their culinary preparations or to disguise undesirable qualities of the final product.

Participants were invited to complete a series of three non-consecutive, validated, web based 24 hour dietary records every six months (to vary the season of completion), randomly assigned over a two week period (two weekdays and one weekend day). 36 37 38 Only two dietary records were mandatory for inclusion in the nutrition component of the NutriNet-Santé cohort, and participants were not excluded if they did not complete all optional questionnaires. We averaged dietary intakes from all the 24 hour dietary records available during the first two years of each participant’s follow-up and considered these as baseline usual dietary intakes in this prospective analysis. The NutriNet-Santé web based, self administered 24 hour dietary records have been tested and validated against an interview by a trained dietitian and against blood and urinary biomarkers. 36 37 Participants used the dedicated web interface to declare all foods and drinks consumed during a 24 hour period for each of the three main meals (breakfast, lunch, dinner) and any other eating occasion. Portion sizes were estimated using previously validated photographs or usual containers. 39 We identified dietary under-reporting with the method proposed by Black, based on basal metabolic rate and the Goldberg cut-off, and excluded under-reporters of energy intake. 40 We calculated mean daily alcohol, micronutrient, macronutrient, and energy intakes by using the NutriNet-Santé food composition database, which contains more than 3300 different items. 41 We estimated amounts consumed from composite dishes by using French recipes validated by nutrition professionals. Sodium intake was assessed via a specific module included in the 24 hour records, taking into account native sodium in foods, salt added during cooking, and salt added at the table. This module has been validated against urinary sodium excretion biomarkers. 37
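The under-reporting step can be illustrated with the individual-level form of the Goldberg cut-off as revised by Black, in which the lower limit for reported energy intake divided by BMR shrinks towards the physical activity level (PAL) as more recording days are available. This is a sketch, not the study's implementation: the coefficient-of-variation defaults are commonly cited literature values assumed here, the PAL and BMR inputs are taken as already estimated, and the function names are hypothetical.

```python
import math


def goldberg_cutoff(pal: float, n_days: int, cv_wei: float = 23.0,
                    cv_wb: float = 8.5, cv_tp: float = 15.0,
                    sd_min: float = -2.0) -> float:
    """Lower cut-off for reported energy intake / BMR for one individual.

    cv_wei: within-subject day-to-day variation in energy intake (%)
    cv_wb:  variation in measured or predicted BMR (%)
    cv_tp:  between-subject variation in physical activity level (%)
    sd_min: -2 corresponds roughly to a lower 95% confidence limit.
    All defaults are assumed literature values.
    """
    s = math.sqrt(cv_wei ** 2 / n_days + cv_wb ** 2 + cv_tp ** 2)
    # Single individual (n = 1), so there is no further division by sqrt(n).
    return pal * math.exp(sd_min * s / 100)


def is_under_reporter(energy_kcal: float, bmr_kcal: float, pal: float, n_days: int) -> bool:
    """Flag a participant whose mean reported intake / BMR falls below the cut-off."""
    return energy_kcal / bmr_kcal < goldberg_cutoff(pal, n_days)


# Hypothetical participant: 3 recording days, PAL 1.55, BMR 1400 kcal/d.
print(goldberg_cutoff(1.55, 3), is_under_reporter(1200, 1400, 1.55, 3))
```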

At inclusion, participants completed a set of five questionnaires related to sociodemographic and lifestyle characteristics (for example, date of birth, sex, occupation, educational level, smoking status, number of children), 32 anthropometry (height, weight), dietary intakes (see below), 33 34 physical activity (validated seven day International Physical Activity Questionnaire (IPAQ)), 35 and health status (personal and family history of diseases, drug use including use of hormonal treatment for menopause and oral contraceptives, and menopausal status).

The NutriNet-Santé study is an ongoing web based cohort launched in 2009 in France with the objective of studying the associations between nutrition and health, as well as the determinants of dietary behaviours and nutritional status. This cohort has been previously described in detail. 31 Briefly, participants aged over 18 years with access to the internet have been continuously recruited from the general population since May 2009 by means of large-scale multimedia campaigns. All questionnaires are completed online using a dedicated website ( www.etude-nutrinet-sante.fr ). Participants are followed using an online platform connected to their email address. They can change their email address, phone number, or postal address at any time on the NutriNet-Santé website. Newsletters and alerts about new questionnaires are sent by email. If an email is returned as undelivered, participants are contacted by telephone and then by regular mail. The NutriNet-Santé study is conducted according to the Declaration of Helsinki guidelines, and electronic informed consent is obtained from each participant.

Results

A total of 104 980 participants (22 821 (21.7%) men and 82 159 (78.3%) women) were included in the study. The mean age of participants was 42.8 (SD 14.8, range 18.0-72.8) years. The mean number of dietary records per participant over their first two years of follow-up was 5.4 (SD 2.9); the minimum was two records, which applied to only 7.2% (7558/104 980) of participants. Since the launch of the study at the end of May 2009, half of the records were completed between June and November and the other half between December and May. Table 1 shows the main baseline characteristics of participants according to quarters of the proportion of ultra-processed foods in the diet. Compared with participants in the lowest quarter, those in the highest quarter of ultra-processed food intake tended to be younger, more likely to be current smokers, less educated, less likely to have a family history of cancer, and less physically active. They also had higher intakes of energy, lipids, carbohydrates, and sodium, along with lower alcohol intake. Although the cohort included more women than men, the contribution of ultra-processed foods to the overall diet was very similar between men and women (18.74% for men and 18.71% for women; P=0.7). The distribution of the proportion of ultra-processed food in the diet in the study population is shown in appendix 5. The main food groups contributing to ultra-processed food intake were sugary products (26%) and drinks (20%), followed by starchy foods and breakfast cereals (16%) and ultra-processed fruits and vegetables (15%) (fig 1).

Fig 1 Relative contribution of each food group to ultra-processed food consumption in diet

During follow-up (426 362 person years, median follow-up time five years), 2228 first incident cases of cancer were diagnosed and validated, among which were 739 breast cancers (264 premenopausal, 475 postmenopausal), 281 prostate cancers, and 153 colorectal cancers. Among these 2228 cases, 108 (4.8%) were identified during mortality follow-up with the national CépiDC database. The dropout rate in the NutriNet-Santé cohort was 6.7%. Table 2 shows associations between the proportion of ultra-processed foods in the diet and risks of overall, breast, prostate, and colorectal cancer. Figure 2 shows the corresponding cumulative incidence curves. In model 1, ultra-processed food intake was associated with increased risks of overall cancer (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.12 (95% confidence interval 1.06 to 1.18), P<0.001) and breast cancer (1.11 (1.02 to 1.22), P=0.02). The latter association was more specifically observed for postmenopausal breast cancer (P=0.04) but not for premenopausal breast cancer (P=0.2). 
The association with overall cancer risk was statistically significant in all strata of the population investigated, after adjustment for model 1 covariates: in men (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.12 (1.02 to 1.24), P=0.02, 663 cases and 22 158 non-cases), in women (1.13 (1.06 to 1.20), P<0.001, 1565 cases and 80 594 non-cases), in younger adults (<40 years old, 1.21 (1.09 to 1.35), P<0.001, 287 cases and 48 627 non-cases), in older adults (≥40 years old, 1.09 (1.03 to 1.16), P=0.03, 1941 cases and 54 485 non-cases), in smokers (including adjustment for pack years of cigarettes smoked, 1.18 (1.04 to 1.33), P=0.01, 255 cases and 15 355 non-cases), in non-smokers (1.11 (1.05 to 1.17), P<0.001, 1943 cases and 85 219 non-cases), in participants with low to moderate levels of physical activity (1.07 (1.00 to 1.15), P=0.04, 1216 cases and 59 546 non-cases), and in those with a high level of physical activity (1.19 (1.09 to 1.30), P<0.001, 744 cases and 28 859 non-cases).
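The hazard ratios above are expressed per 10 point increment, that is, exp(10β) where β is the Cox coefficient per percentage point. Assuming Wald-type confidence intervals that are symmetric on the log scale, the standard error and an approximate P value can be recovered from any reported hazard ratio and interval, and the ratio can be rescaled to another increment. This is a reader-side consistency check with hypothetical helper names, not the procedure used to produce the published P values, so small rounding discrepancies are expected.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) implied by a 95% CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def two_sided_p(hr: float, lower: float, upper: float) -> float:
    """Approximate Wald P value recovered from a reported HR and 95% CI."""
    z = math.log(hr) / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


def rescale_hr(hr: float, from_units: float, to_units: float) -> float:
    """HR for a different increment of a continuous exposure: exp(beta * to_units)."""
    beta_per_unit = math.log(hr) / from_units
    return math.exp(beta_per_unit * to_units)


# Overall and breast cancer, model 1, per 10 point increment (values from the text).
print(two_sided_p(1.12, 1.06, 1.18))  # consistent with P<0.001
print(two_sided_p(1.11, 1.02, 1.22))  # consistent with P=0.02
print(rescale_hr(1.12, 10, 1))        # HR per 1 percentage point increment
```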

Table 2 Associations between ultra-processed food intake and risk of overall, prostate, colorectal, and breast cancer, from multivariable Cox proportional hazard models*, NutriNet-Santé cohort, France, 2009-17 (n=104 980)

Fig 2 Cumulative cancer incidence (overall cancer risk) according to quarters of proportion of ultra-processed food in diet

More specifically, ultra-processed fats and sauces (P=0.002), sugary products (P=0.03), and drinks (P=0.005) were associated with an increased risk of overall cancer, and ultra-processed sugary products were associated with an increased risk of breast cancer (P=0.006) (appendix 6).

Further adjustment for several indicators of the nutritional quality of the diet (lipid, sodium, and carbohydrate intakes, model 2; Western pattern, model 3; or both, model 4) did not modify these findings. The Pearson correlation coefficient between the proportion of ultra-processed food in the diet and the Western dietary pattern was low (0.06). Consistently, mediation analyses using the method proposed by Lange et al showed no statistically significant mediation of the relation between ultra-processed food and cancer risk by any of the nutritional factors tested. 45 The mediated effects ranged between 0% and 2%, with all P>0.05 (appendix 4).

No association was statistically significant for prostate and colorectal cancers. However, we observed a borderline non-significant trend of increased risk of colorectal cancer associated with ultra-processed food intake (hazard ratio for quarter 4 versus quarter 1: 1.23 (1.08 to 1.40), P for trend=0.07) in model 4.

Sensitivity analyses (adjusted for model 1 covariates, data not tabulated) excluding cancer cases diagnosed during the first two years of follow-up provided similar results (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.10 (1.03 to 1.17), P=0.005 for overall cancer risk, 1367 cases and 102 502 non-cases included; 1.15 (1.03 to 1.29), P=0.02 for breast cancer risk, 441 cases and 80 940 non-cases included). Similarly, results were unchanged when we excluded non-validated cancer cases (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.11 (1.05 to 1.17), P<0.001 for overall cancer risk, 1967 cases and 102 752 non-cases included; 1.12 (1.02 to 1.23), P=0.02 for breast cancer risk, 677 cases and 81 274 non-cases included).

We obtained similar results when we included only participants with at least six 24 hour records (overall cancer risk: hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.13 (1.06 to 1.21), P<0.001, 1494 cases and 47 920 non-cases included) and when we re-included participants with only one 24 hour record (overall cancer risk: 1.11 (1.06 to 1.16), P<0.001, 2383 cases and 122 196 non-cases included).

Findings were also similar when we coded the proportion of ultra-processed food in the diet as sex specific fifths instead of quarters (overall cancer risk: hazard ratio for highest versus lowest fifth 1.25 (1.08 to 1.47), P for trend<0.001; breast cancer risk: 1.25 (0.96 to 1.63), P for trend=0.03).
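Sex specific quarters and fifths differ only in the number of cut-points, which are computed separately within men and within women. A minimal sketch, assuming Python's `statistics.quantiles` with its default "exclusive" method (which may differ slightly from the cut-points SAS would produce); `sex_specific_categories` is a hypothetical helper, and values equal to a cut-point are placed in the lower category.

```python
from statistics import quantiles


def sex_specific_categories(values: list[float], sexes: list[str],
                            n_groups: int = 4) -> list[int]:
    """Category 1..n_groups per participant, with cut-points computed within
    each sex (quarters by default, fifths with n_groups=5)."""
    cutpoints = {
        sex: quantiles([v for v, s in zip(values, sexes) if s == sex], n=n_groups)
        for sex in set(sexes)
    }
    # Category = 1 + number of sex specific cut-points the value exceeds.
    return [1 + sum(v > c for c in cutpoints[s]) for v, s in zip(values, sexes)]


# Hypothetical % g/day of ultra-processed food for 8 men and 8 women.
values = [10, 20, 30, 40, 50, 60, 70, 80, 5, 15, 25, 35, 45, 55, 65, 75]
sexes = ["M"] * 8 + ["F"] * 8
print(sex_specific_categories(values, sexes))      # quarters
print(sex_specific_categories(values, sexes, 5))   # fifths
```

The resulting ordinal category (1, 2, 3, ...) can also be entered as a continuous term to test for linear trend, as described in the statistical methods.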

Further adjustment for the following variables, in addition to model 1 covariates, did not modify the results: dietary supplement use at baseline (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.12 (1.06 to 1.17), P<0.001 for overall cancer; 1.11 (1.02 to 1.22), P=0.02 for breast cancer), prevalent depression at baseline (1.11 (1.06 to 1.17), P<0.001 for overall cancer; 1.11 (1.01 to 1.22), P=0.02 for breast cancer), healthy dietary pattern (1.11 (1.05 to 1.17), P<0.001 for overall cancer; 1.10 (1.00 to 1.21), P=0.04 for breast cancer), overall fruit and vegetable consumption in g/d (1.10 (1.04 to 1.16), P<0.001 for overall cancer; 1.11 (1.01 to 1.22), P=0.03 for breast cancer), number of smoked cigarettes in pack years (1.13 (1.07 to 1.19), P<0.001 for overall cancer; 1.13 (1.03 to 1.24), P=0.009 for breast cancer), and season of inclusion in the cohort (1.12 (1.06 to 1.18), P<0.001 for overall cancer; 1.12 (1.02 to 1.22), P=0.02 for breast cancer).

We also tested other methods for handling missing data, such as multiple imputation and complete case analysis (that is, exclusion of participants with missing data for at least one covariate). 46 The results were very similar for the multiple imputation analysis (hazard ratio for a 10 point increment in the proportion of ultra-processed foods in the diet 1.11 (1.06 to 1.17), P<0.001, 2228 cases and 102 752 non-cases for overall cancer; 1.11 (1.01 to 1.21), P=0.02, 739 cases and 81 420 non-cases for breast cancer) and for the complete case analysis (1.11 (1.05 to 1.18), P<0.001, 1813 cases and 82 824 non-cases for overall cancer; 1.14 (1.03 to 1.26), P=0.01, 579 cases and 64 642 non-cases for breast cancer).
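In a multiple imputation analysis, the Cox model is fitted once per imputed dataset and the log hazard ratios are then combined with Rubin's rules: the pooled estimate is the mean across imputations, and its variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The number of imputations and the software used are not stated here, so the function names and inputs below are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(log_hrs: list[float], ses: list[float]) -> tuple[float, float]:
    """Combine per-imputation log hazard ratios with Rubin's rules.

    Returns the pooled log HR and its total standard error.
    """
    m = len(log_hrs)
    if m < 2 or m != len(ses):
        raise ValueError("need at least two imputations with matching SEs")
    q_bar = mean(log_hrs)                   # pooled point estimate
    within = mean(se ** 2 for se in ses)    # average within-imputation variance
    between = variance(log_hrs)             # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


def hr_with_ci(log_hr: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Hazard ratio and 95% CI from a pooled log HR and its standard error."""
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Hypothetical estimates from five imputed datasets.
q, se = pool_rubin([0.105, 0.110, 0.102, 0.108, 0.106], [0.027] * 5)
print(hr_with_ci(q, se))
```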

As a secondary analysis, we also tested associations between the proportions of the three other NOVA groups and cancer risk. We found no significant associations between the proportions of “processed culinary ingredients” or “processed foods” and risk of cancer at any site (all P>0.05). Consistent with our main findings, however, the proportion of “unprocessed or minimally processed foods” was associated with lower risks of overall and breast cancer (hazard ratio for a 10 point increment in the proportion of unprocessed foods in the diet 0.91 (0.87 to 0.95), P<0.001, 2228 cases and 102 752 non-cases for overall cancer; 0.42 (0.19 to 0.91), P=0.03, 739 cases and 81 420 non-cases for breast cancer), in multivariable analyses adjusted for model 1 covariates.