Study Design and Oversight

The China Kadoorie Biobank Study is a nationwide, prospective cohort study involving 10 diverse localities (regions) in China, which is jointly coordinated by the University of Oxford and the Chinese Academy of Medical Sciences. The study design and methods have been reported previously.9,10 Approval of the study was obtained from ethics committees or institutional review boards at the University of Oxford, the Chinese Center for Disease Control and Prevention (China CDC), the Chinese Academy of Medical Sciences, and all participating regions. The funders had no role in study design, data collection and analysis, preparation of the manuscript, or the decision to submit it for publication.

Baseline Survey

We selected 10 regional study sites (5 urban and 5 rural) in order to cover a wide range of risk exposures and disease patterns, taking into account the accuracy and completeness of death and disease registries for each region and the local capability to gather the necessary study data. Between June 2004 and July 2008, all nondisabled, permanent residents of each region who were 35 to 74 years of age were invited to participate in the study. Of the total of approximately 1.8 million eligible adults in these regions, almost 1 in 3 (33% in rural areas and 27% in urban areas) responded. Overall, 512,891 persons were recruited, including a few who were just outside the targeted age range, and all provided written informed consent.

At the local study clinics, trained health workers administered a laptop-based questionnaire on sociodemographic characteristics, smoking and alcohol consumption, diet, physical activity, and medical history; measured height, weight, waist circumference, and blood pressure; and performed spot random blood glucose testing using the SureStep Plus System (Johnson & Johnson). Blood pressure was measured at least twice with the use of an automated digital blood-pressure monitor (model UA-779, A&D Medical) after at least 5 minutes of rest in a seated position; the mean of two satisfactory measurements was used for analyses.11,12

Dietary data covered 12 major food groups: rice, wheat products, other staple foods, meat, poultry, fish, eggs, dairy products, fresh vegetables, preserved vegetables, fresh fruit, and soybean products. Respondents were asked about the frequency of habitual consumption during the previous 12 months and chose among five categories of frequency (daily, 4 to 6 days per week, 1 to 3 days per week, monthly, or never or rarely [the reference category]). In a subsample of 926 participants, the survey questionnaire was repeated within a year after the baseline assessment (mean interval, 5.4 months) in order to assess the reproducibility of the responses.

Resurveys

After completion of the baseline survey, we randomly selected 5 to 6% of the original participants for two resurveys, using procedures similar to those at baseline. The first resurvey took place from July through October 2008, with 19,788 participants resurveyed, and the second was conducted from August 2013 through September 2014, with just over 25,000 participants resurveyed. In addition to questions about the frequency of consumption, the second resurvey questionnaire asked about the amount consumed for each food group; these amounts were used as a proxy to estimate the average consumption at baseline in each of the five frequency categories (see the Supplementary Appendix, available with the full text of this article at NEJM.org).
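The proxy estimate described above can be pictured as a per-category average. The sketch below is only an illustration under stated assumptions: it takes the mean amount reported at the second resurvey within each frequency category and treats that mean as the typical consumption for the category. The exact procedure used in the study is given in the Supplementary Appendix and may differ; the function name, the record layout, and the example amounts are hypothetical and are not study data.

```python
from collections import defaultdict


def mean_amount_by_frequency(records):
    """Average the reported amount within each frequency category.

    `records` is an iterable of (frequency_category, amount) pairs, one per
    resurveyed participant. Both the layout and the unit of `amount` are
    assumptions made for this illustration.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for category, amount in records:
        totals[category][0] += amount
        totals[category][1] += 1
    return {category: s / n for category, (s, n) in totals.items()}


# Hypothetical resurvey records (made-up amounts, not study data).
example_records = [
    ("daily", 100.0),
    ("daily", 200.0),
    ("monthly", 10.0),
]
proxy = mean_amount_by_frequency(example_records)
```

Each baseline participant could then be assigned the proxy value for his or her own frequency category, which is how a categorical frequency question can be turned into an approximate amount.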

Follow-up and Outcome Measures

The vital status of each participant was determined periodically through the Disease Surveillance Points (DSP) system of the China CDC.13 The DSP vital-status data sets were checked annually against local residential records and health insurance records and were confirmed with street committees or village administrators. In addition, information about major diseases and episodes of hospitalization was collected through linkages with disease registries (for cancer, cardiovascular disease, and diabetes) and national health insurance claims databases. The Chinese National Health Insurance scheme provides electronic linkage to all hospitalization data.

Fatal and nonfatal events were documented according to the International Classification of Diseases, 10th Revision (ICD-10), by coders who were unaware of the baseline characteristics of the study participants.10 The four main outcome measures were cardiovascular death (ICD-10 codes I00 to I25, I27 to I88, and I95 to I99) and the incidence of major coronary events (fatal ischemic heart disease [codes I20 to I25] plus nonfatal myocardial infarction [code I21]), hemorrhagic stroke (code I61), and ischemic stroke (code I63). Other ischemic heart disease (i.e., ischemic heart disease not meeting the criteria for a major coronary event) and other cerebrovascular diseases (ICD-10 codes I60, I62, and I64 to I69) were analyzed separately. For analyses of incident disease, only the first cardiovascular event was counted.
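The code ranges listed above can be written out as a small lookup. The following is a minimal sketch, not the study's coding software: the function names, the string labels, and the `fatal` flag are assumptions introduced for illustration, and only the three-character ICD-10 category (for example, I21 from I21.9) is inspected.

```python
import re


def icd10_number(code):
    """Return the two-digit number of an ICD-10 'I' code, or None otherwise."""
    match = re.match(r"^I(\d{2})", code.strip().upper())
    return int(match.group(1)) if match else None


def is_cardiovascular_death_code(code):
    """True for I00 to I25, I27 to I88, and I95 to I99 (cause-of-death codes)."""
    n = icd10_number(code)
    if n is None:
        return False
    return 0 <= n <= 25 or 27 <= n <= 88 or 95 <= n <= 99


def classify_event(code, fatal):
    """Assign one event to an outcome group (labels are hypothetical)."""
    n = icd10_number(code)
    if n is None:
        return "not a circulatory code"
    # Major coronary event: fatal I20 to I25, or nonfatal I21.
    if (fatal and 20 <= n <= 25) or (not fatal and n == 21):
        return "major coronary event"
    # Remaining ischemic heart disease codes.
    if 20 <= n <= 25:
        return "other ischemic heart disease"
    if n == 61:
        return "hemorrhagic stroke"
    if n == 63:
        return "ischemic stroke"
    if n in (60, 62) or 64 <= n <= 69:
        return "other cerebrovascular disease"
    return "other circulatory code"
```

For example, `classify_event("I21.9", fatal=False)` falls in the major coronary event group, whereas `classify_event("I25", fatal=False)` is counted as other ischemic heart disease. Selecting only the first cardiovascular event per participant would be a separate step that this sketch does not cover.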

Statistical Analysis

For the current study, we excluded persons who had a history of cardiovascular disease (23,132 persons) or antihypertensive treatment (48,174 persons) at baseline. Baseline characteristics of the remaining 451,665 participants were described as means and standard deviations or percentages in each category of fruit consumption, with adjustment for age, sex, and region as appropriate, by means of either multiple linear regression (for continuous outcomes) or logistic regression (for binary outcomes). The marginal mean values and 95% confidence intervals for body-mass index (BMI), waist circumference, blood pressure, and blood glucose according to frequency of fruit consumption were estimated separately for men and women with the use of multiple linear regression models adjusted for baseline covariates.

Cox regression was used to calculate hazard ratios and 95% confidence intervals for the association between fruit consumption and each outcome, with adjustment for baseline covariates and with stratification according to age at risk (in 5-year intervals), sex, and region. For analyses involving more than two exposure categories, the floating-absolute-risk method was used; it provides the variance of the logarithm of the hazard ratio for each category, including the reference category, so that a confidence interval can be computed for every category and the different exposure categories can be compared directly.14 Conventional (unfloated) analyses were also performed so that the results of the two methods could be compared. We calculated the hazard ratio for cardiovascular death with nondaily fruit consumption as compared with daily consumption in order to estimate the population-attributable fraction, using the formula Pe(HR−1)÷[Pe(HR−1)+1],15 where Pe is the prevalence of nondaily consumption of fresh fruit in the Chinese population, and HR is the hazard ratio. Additional information about the statistical analyses is provided in the Supplementary Appendix.
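Two of the calculations in this paragraph are simple enough to write out. The sketch below shows how a 95% confidence interval follows from a hazard ratio and the variance of its logarithm (using the usual normal approximation with a multiplier of 1.96), and how the population-attributable fraction follows from the formula Pe(HR−1)÷[Pe(HR−1)+1]. The function names are hypothetical, and the input values in the example are arbitrary illustrations, not study results.

```python
import math


def ci_from_log_variance(hazard_ratio, log_variance, z=1.96):
    """95% CI for a hazard ratio, given the variance of log(HR).

    Assumes a normal approximation on the log scale; z=1.96 gives a 95% CI.
    """
    half_width = z * math.sqrt(log_variance)
    log_hr = math.log(hazard_ratio)
    return math.exp(log_hr - half_width), math.exp(log_hr + half_width)


def population_attributable_fraction(prevalence, hazard_ratio):
    """PAF = Pe(HR - 1) / [Pe(HR - 1) + 1], with Pe given as a proportion."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if hazard_ratio <= 0.0:
        raise ValueError("hazard ratio must be positive")
    excess = prevalence * (hazard_ratio - 1.0)
    return excess / (excess + 1.0)


# Arbitrary illustrative inputs (not values from the study).
lower, upper = ci_from_log_variance(2.0, 0.01)
paf = population_attributable_fraction(0.5, 2.0)  # 0.5 / 1.5, i.e. one third
```

Because the interval is built on the log scale, it is symmetric around the hazard ratio only in multiplicative terms: the product of the two limits equals the square of the hazard ratio.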