The SUN project is a prospective, dynamic, multipurpose cohort of Spanish university graduates. Its design, objectives, and methods have been described previously. 21 Briefly, recruitment started in December 1999 and, because the project was designed as a dynamic cohort, it remains permanently open. Participants are followed up every two years, with information gathered through postal or web-based questionnaires. To ensure a minimum follow-up of two years, we considered only participants recruited before March 2014 (n=22 279). We excluded 165 participants whose total daily energy intake was below the first centile or above the 99th centile, and 2215 participants were lost to follow-up (retention rate: 90%). Data from 19 899 participants were available for analyses.
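The participant flow above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the choice of denominator for the retention rate (participants remaining after the energy intake exclusion) is an assumption, chosen because it reproduces the reported 90%.

```python
# Participant flow, using the counts reported in the text.
recruited = 22279          # recruited before March 2014
excluded_energy = 165      # total energy intake outside the allowed centiles
lost_to_follow_up = 2215   # no follow-up information available

eligible = recruited - excluded_energy
analysed = eligible - lost_to_follow_up

# Assumption: retention is computed among participants who remained
# eligible after the energy intake exclusion.
retention = analysed / eligible

print(f"Available for analysis: {analysed}")
print(f"Retention rate: {retention:.0%}")
```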

Box 1: Food items in the food frequency questionnaire classified as ultra-processed (NOVA group 4)

Petit suisse; custard; flan; pudding; ice cream; ham; processed meat (chorizo, salami, mortadella, sausage, hamburger, morcilla); pate; foie-gras; spicy sausage/meatballs; potato chips; breakfast cereals; pizza, including pre-prepared pies; margarine; cookies; chocolate cookies; muffins; doughnuts; croissant or other non-handmade pastries; cakes; churros; chocolates and candies; nougat; marzipan; carbonated drinks; artificially sugared beverages; fruit drinks; milkshakes; instant soups and creams; croquettes; mayonnaise; and alcoholic drinks produced by fermentation followed by distillation such as whisky, gin, and rum.

To estimate the frequency of consumption of ultra-processed food we summed the amount consumed (servings per day) of each food item classified in the fourth category of the NOVA system (a total of 34 items). We then divided the sample into quarters according to total consumption of ultra-processed foods (total servings per day). Box 1 shows the classification of the foods in the food frequency questionnaire according to NOVA. The food frequency questionnaire is a validated tool that can be used to assess total energy intake; macronutrient and fibre intake; alcohol intake; and consumption of fruit, vegetables, fast food, fried food, processed meat, unprocessed meat, and sugar sweetened beverages. 22 23 24 Adherence to a Mediterranean diet was evaluated using the score proposed by Trichopoulou and colleagues. 25
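The exposure construction described above (summing servings of NOVA group 4 items, then splitting the sample into quarters) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the item names in the usage example, and the rank-based handling of ties are all assumptions.

```python
def total_upf_servings(servings_per_day, upf_items):
    """Sum daily servings over the items classified as ultra-processed."""
    return sum(
        amount for item, amount in servings_per_day.items() if item in upf_items
    )


def assign_quarters(values):
    """Assign each value to a quarter (1 = lowest, 4 = highest) by rank.

    Assumption: ties are broken by input order, so each quarter holds
    as close to 25% of the sample as possible.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    quarters = [0] * n
    for rank, idx in enumerate(order):
        quarters[idx] = rank * 4 // n + 1
    return quarters


# Hypothetical participant: only pizza and cookies count as ultra-processed.
example = {"pizza": 0.2, "apple": 1.0, "cookies": 0.5}
print(total_upf_servings(example, {"pizza", "cookies"}))
```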

We categorised all food and beverage items of the food frequency questionnaire into one of the four NOVA food groups—a classification system based on the extent and purpose of industrial food processing. 9 The first group includes unprocessed or minimally processed foods, which are fresh or processed in ways that do not add substances such as salt, sugar, oils, or fats, and infrequently contain additives. The processes aim to extend life, allow storage for long use, and facilitate or enable different methods to be used for preparation (freezing, drying, and pasteurisation). Examples in this group include fruit and vegetables, grains (cereals), flours, nuts and seeds, fresh and pasteurised milk, natural yogurt with no added sugar or artificial sweeteners, meat and fish, tea, coffee, spices, and herbs. The second group contains processed culinary ingredients. These are substances obtained from foods of the first group or from nature and might contain additives to preserve the original properties (ie, salt, sugar, honey, vegetable oils, butter, lard, and vinegar). The third group comprises processed foods, to which substances such as salt, sugar, or oil have been added and methods such as smoking, curing, or fermentation have been used. Examples include canned or bottled vegetables and legumes, fruit in syrup, canned fish, cheeses, freshly made bread, and salted or sugared nuts and seeds. The fourth group comprises ultra-processed foods and drink products that are made predominantly or entirely from industrial substances and contain little or no whole foods. These products are ready to eat, drink, or heat—that is, carbonated drinks, sausages, biscuits (cookies), candy (confectionery), fruit yogurts, instant packaged soups and noodles, sweet or savoury packaged snacks, and sugared milk and fruit drinks. We focused on this last NOVA group.

Type of diet consumed was assessed at baseline with a 136 item semiquantitative food frequency questionnaire previously validated and repeatedly re-evaluated in Spain. 22 23 24 We measured frequencies of consumption in nine categories (ranging from never or almost never to more than six servings daily), and the food frequency questionnaire included a typical portion size for each item. To estimate daily consumption for each food item, we multiplied the portion size by the frequency of consumption.
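The calculation of daily consumption can be illustrated with a short sketch. The text states only that there were nine frequency categories, from never or almost never to more than six servings daily; the intermediate category labels and the servings-per-day value assigned to each category below are assumptions made for illustration.

```python
# Assumed mapping from frequency category to servings per day.
# Only the two extreme categories are named in the text; the rest
# are hypothetical labels with midpoint values.
FREQUENCY_PER_DAY = {
    "never or almost never": 0.0,
    "1-3 per month": 2 / 30,
    "1 per week": 1 / 7,
    "2-4 per week": 3 / 7,
    "5-6 per week": 5.5 / 7,
    "1 per day": 1.0,
    "2-3 per day": 2.5,
    "4-6 per day": 5.0,
    "more than 6 per day": 6.0,  # assumption: lower bound of the open category
}


def daily_consumption(portion_size, frequency_category):
    """Daily consumption = typical portion size x frequency of consumption."""
    return portion_size * FREQUENCY_PER_DAY[frequency_category]


# Hypothetical item with a 200 g portion, eaten two to three times a day.
print(daily_consumption(200, "2-3 per day"))
```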

Follow-up for each participant was calculated from the date when the baseline questionnaire was returned to the date of death or the date when the last follow-up questionnaire was returned, whichever came first. In only 22 out of 335 deaths (6.5%) the cause of death was unknown.

The primary outcome was all cause mortality. More than 85% of deaths were identified through reports from next of kin, work associates, and postal authorities. With permission of the next of kin, we reviewed the medical records to confirm the deaths. To confirm the remainder of the deaths, we checked the Spanish National Death Index and the National Statistics Institute at least once a year. Given the continuous contact with participants in the cohort and the comprehensive and mandatory nature of the Spanish National Death Index, the use of these combined sources of information can be assumed to have 100% positive predictive value for fatal events.

From the baseline questionnaire we also collected information on sex, age, marital status, educational level, smoking, physical activity, television viewing, napping, diet and dietary habits, and snacking. A validated 17 item questionnaire was used to evaluate physical activity. 26 We also collected data on self reported anthropometric characteristics at baseline. A validation study with a subsample of the cohort showed sufficient validity for use in epidemiological studies. 27 To detect underweight, overweight, and obesity we calculated the body mass index (BMI) as body weight (kg) divided by height squared (m²).
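The BMI calculation is weight in kilograms divided by the square of height in metres. The sketch below illustrates it; the category cut-offs (18.5, 25, and 30) are the conventional World Health Organization thresholds and are an assumption here, because the text does not state which cut-offs were applied.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify BMI using conventional cut-offs (assumed, not from the text)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obesity"


# Hypothetical participant: 70 kg and 1.75 m.
print(round(bmi(70, 1.75), 1), bmi_category(bmi(70, 1.75)))
```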

Statistical analysis

We used inverse probability weighting 28 to adjust the means or proportions of baseline variables for age and sex according to quarters of consumption of ultra-processed foods. Consumption of ultra-processed food was adjusted for total energy intake using the residuals method and subsequently categorised into quarters: low consumption (first quarter), low-medium consumption (second quarter), medium-high consumption (third quarter), and high consumption (fourth quarter). No data were missing for this variable of interest.
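In the residuals method, the exposure is regressed on total energy intake and the residual is used as the energy adjusted value. A minimal sketch follows, assuming a simple linear regression and the common convention of adding back the intake predicted at the mean energy intake so that the adjusted values stay on the original scale. The function name and the example data are hypothetical.

```python
from statistics import mean


def energy_adjust(intake, energy):
    """Energy adjusted intake by the residuals method.

    Fits intake = a + b * energy by ordinary least squares, then returns
    each residual plus the intake predicted at the mean energy intake
    (assumption: this constant is added back, as is conventional).
    """
    if len(intake) != len(energy) or len(intake) < 2:
        raise ValueError("need two equally sized samples with at least 2 values")
    mean_x, mean_y = mean(energy), mean(intake)
    sxx = sum((x - mean_x) ** 2 for x in energy)
    if sxx == 0:
        raise ValueError("energy intake must vary across participants")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(energy, intake))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    expected_at_mean = intercept + slope * mean_x
    return [
        y - (intercept + slope * x) + expected_at_mean
        for x, y in zip(energy, intake)
    ]


# Hypothetical data: servings/day of ultra-processed food and kcal/day.
print(energy_adjust([2.0, 2.0, 3.0, 3.0], [1500, 2000, 2500, 3000]))
```

The adjusted values are uncorrelated with total energy intake by construction, so the quarters formed from them rank participants by consumption relative to what their energy intake would predict.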

To assess the association between energy adjusted quarters of ultra-processed food consumption at baseline and all cause mortality, we fitted Cox regression models with age as the underlying time variable (birth date as origin), and date of death or date when the last follow-up questionnaire was completed for survivors as exit time. We estimated hazard ratios for the second to fourth quarters along with 95% confidence intervals, with the lowest quarter as the reference category. To minimise the potential effect of a variation in diet during follow-up, we fitted Cox proportional hazard models with repeated dietary measurements using the updated data on food consumption after 10 years of follow-up.
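Using age as the underlying time variable means that each participant enters the risk set at their age when the baseline questionnaire was returned and leaves it at their age at death or at the last follow-up questionnaire. The sketch below builds such a record. It illustrates the data layout only, not the model fitting; the names and the 365.25 day year are assumptions.

```python
from datetime import date
from typing import NamedTuple, Optional

DAYS_PER_YEAR = 365.25  # assumption: average year length for age in years


class SurvivalRecord(NamedTuple):
    entry_age: float  # age at return of the baseline questionnaire
    exit_age: float   # age at death, or at last questionnaire for survivors
    died: bool        # event indicator for all cause mortality


def survival_record(
    birth: date,
    baseline: date,
    last_questionnaire: date,
    death: Optional[date] = None,
) -> SurvivalRecord:
    """Build a delayed-entry record with birth date as the time origin."""
    exit_date = death if death is not None else last_questionnaire
    if exit_date < baseline:
        raise ValueError("exit date precedes the baseline questionnaire")
    entry_age = (baseline - birth).days / DAYS_PER_YEAR
    exit_age = (exit_date - birth).days / DAYS_PER_YEAR
    return SurvivalRecord(entry_age, exit_age, death is not None)


# Hypothetical participant who died 10 years after baseline.
print(survival_record(date(1960, 1, 1), date(2000, 1, 1),
                      date(2008, 1, 1), death=date(2010, 1, 1)))
```

Records of this form can be passed to any Cox regression routine that supports left truncation (delayed entry).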

We adjusted the Cox regression models for several potential confounders defined a priori. As recommended, we identified potential confounders based on existing literature, rather than deferring to statistical criteria. 29 30

Potential confounders included as covariates in multivariable models were age; sex; marital status, married (yes or no); baseline body mass index (linear and quadratic term); total energy intake (kcal/day, continuous); smoking status (never, current, former smoker); family history of cardiovascular disease (dichotomous); alcohol consumption (g/day, continuous); cardiovascular disease, cancer, or diabetes at baseline (yes or no); hypertension at baseline (yes or no); self reported hypercholesterolaemia at baseline (yes or no); depression at baseline (yes or no); educational level (non-graduate, graduate, postgraduate, doctorate); snacking (yes or no); following a special diet at baseline (yes or no); physical activity (quarters); and lifelong cumulative exposure to smoking (pack years of smoking, continuous). Results were stratified by recruitment period (1999-2000, 2001, 2002-03, 2004, 2005-07, 2008-14), deciles of age, time spent watching television (dichotomous, cut-off: ≥3 h/day), and four categories of a sedentary index defined as the number of hours spent daily watching television, using a computer, and driving. When participants had missing values on snacking or following a special diet, we considered them as doing neither, and we also used multiple imputation for missing values in those variables.

In addition to standard adjustment for confounders, we alternatively adjusted the models using propensity scores.

Although we adjusted for a wide range of confounders, we cannot rule out residual confounding. Consumption of ultra-processed food is a behaviour that might be closely linked to other aspects of an unhealthy lifestyle. To assess this in detail, we calculated the E value proposed by VanderWeele. 31 32 This value represents the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the exposure and the outcome, conditional on the measured covariates, to fully explain away a specific association.
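For a risk ratio RR greater than 1, the E value is RR + sqrt(RR × (RR − 1)); for a protective estimate, the reciprocal of RR is used first. The sketch below implements this formula. Applying it directly to a hazard ratio assumes that the outcome is rare enough for the hazard ratio to approximate the risk ratio, and the input used in the example is hypothetical, not a result from this study.

```python
import math


def e_value(risk_ratio):
    """E value for a point estimate on the risk ratio scale.

    Assumption: a hazard ratio can be supplied directly when the outcome
    is rare, because it then approximates the risk ratio.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


# Hypothetical estimate, for illustration only.
print(round(e_value(2.0), 2))
```

An E value of 1 means that no unmeasured confounding is needed to explain the estimate, which is the case only when the estimate itself is 1.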

To investigate linear trends across the quarters of consumption of ultra-processed foods we assigned the median value to each category and considered the variable as being continuous. We verified the proportionality of hazards with a test based on Schoenfeld residuals; the non-significant result (P=0.11) suggested that the proportionality assumption had been met.
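The trend variable described above replaces each participant's quarter with the median consumption observed in that quarter, and the resulting variable is entered in the model as a single continuous term. A minimal sketch, with hypothetical names and data:

```python
from statistics import median


def trend_scores(values, quarters):
    """Replace each observation by the median of its quarter."""
    by_quarter = {}
    for value, quarter in zip(values, quarters):
        by_quarter.setdefault(quarter, []).append(value)
    medians = {q: median(vs) for q, vs in by_quarter.items()}
    return [medians[q] for q in quarters]


# Hypothetical servings per day and their quarter of consumption.
servings = [1, 2, 3, 4, 5, 6, 7, 8]
quarter = [1, 1, 2, 2, 3, 3, 4, 4]
print(trend_scores(servings, quarter))
```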

To assess the contribution of each food group to the total consumption of ultra-processed foods, we calculated the ratio between the servings of each food group divided by the total servings of ultra-processed foods multiplied by 100.
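The contribution of each food group is its share of total ultra-processed servings, expressed as a percentage. A short sketch follows; the food group names and serving counts are hypothetical.

```python
def contribution_percent(group_servings):
    """Percentage of total ultra-processed servings from each food group."""
    total = sum(group_servings.values())
    if total == 0:
        return {group: 0.0 for group in group_servings}
    return {
        group: 100 * servings / total
        for group, servings in group_servings.items()
    }


# Hypothetical servings per day by food group.
print(contribution_percent({"sugary drinks": 1.0, "processed meat": 3.0}))
```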

We used Kaplan-Meier curves, with inverse probability weighting to adjust for confounding, to describe all cause mortality according to baseline quarters of ultra-processed food consumption. To simplify the graph, we merged the first and second quarters (low and low-medium consumption) into one group and the third and fourth quarters (medium-high and high consumption) into another group. This grouping lowers random variability and provides more stable estimates.

Based on our experience and on several simulations, we used restricted cubic splines to estimate the potential non-parametric, non-linear association between consumption of ultra-processed food and all cause mortality. We tested for non-linearity with a likelihood ratio test comparing the model that contained only the linear term with the model that contained both the linear and the cubic spline terms. The two models can also be compared using Akaike's information criterion or the bayesian information criterion. Both criteria penalise the likelihood for model complexity, and the model with the lowest value on either criterion is preferred.
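The model comparison quantities mentioned above have simple closed forms in terms of the maximised log likelihood. The sketch below computes the likelihood ratio statistic, Akaike's information criterion, and the bayesian information criterion. The log likelihoods, parameter counts, and sample size in the example are hypothetical, and the sample size used for the bayesian criterion in a Cox model (participants or events) is a choice the text does not specify.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for nested models.

    Under the null hypothesis it follows a chi-square distribution with
    degrees of freedom equal to the number of extra (spline) terms.
    """
    return 2 * (loglik_full - loglik_reduced)


def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * loglik


# Hypothetical fits: linear term only v linear plus two spline terms.
linear = {"loglik": -105.0, "k": 1}
spline = {"loglik": -100.0, "k": 3}
print(lr_statistic(linear["loglik"], spline["loglik"]))
print(aic(linear["loglik"], linear["k"]), aic(spline["loglik"], spline["k"]))
```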

Additionally, we conducted subgroup analyses by rerunning all the models under different a priori assumptions: including only men, only women, only participants aged 50 or older at recruitment, and only participants aged 50 or younger at recruitment; truncating the follow-up at three years; starting follow-up at three years after the baseline questionnaire; excluding, in turn, participants with a BMI of less than 25 and those with a BMI of 25 or more; including only never smokers; and excluding never smokers.

Sensitivity analyses were also conducted by rerunning the models under different a priori assumptions: using the 5th and 95th centiles as limits for allowable total energy intake; using energy limits previously proposed by Willett 33; excluding participants with prevalent cardiovascular disease or cancer; excluding participants with hypertension at baseline; excluding participants with depression at baseline; excluding participants following special diets at baseline; and excluding deaths from injuries, deaths from cancer, and deaths from cardiovascular disease. We additionally adjusted for weight gain of 3 kg or more in the year before inclusion in the cohort, coffee consumption, a quadratic term of alcohol intake, consumption of all fried foods, following a Mediterranean diet, 25 and intake of saturated and trans fatty acids, added sugars, and sodium.

We considered P values of less than 0.05 to be statistically significant, and these were corrected for multiple comparisons using the Simes method. 34 Analyses were performed using Stata version 15.0 (StataCorp, College Station, TX).
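One common reading of a Simes correction is the step-up procedure based on Simes' inequality, which yields the same adjusted P values as the Benjamini-Hochberg method. The sketch below implements that reading; whether this exact variant was applied in the analysis is an assumption, and the P values in the example are hypothetical.

```python
def simes_adjust(pvalues):
    """Step-up adjusted P values based on Simes' inequality.

    With m tests and ordered P values p(1) <= ... <= p(m), the adjusted
    value for rank i is the minimum of m * p(j) / j over all j >= i.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values from four tests.
print(simes_adjust([0.01, 0.04, 0.03, 0.20]))
```

A test is then declared significant when its adjusted P value is below 0.05.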