Animal husbandry

FVB Hi-MYC mice (strain number 01XK8), expressing the human c-MYC transgene in prostatic epithelium, were obtained from the National Cancer Institute Mouse Repository at Frederick National Laboratory for Cancer Research21. Upon weaning (3 weeks), male mice heterozygous for the transgene (MYC), together with their wild type littermates (WT), were fed a purified control diet (CTD; Harlan Laboratories, TD.130838) consisting of 10% fat, or a high-fat diet (HFD; Harlan Laboratories, TD.06414) consisting of 60% fat (Supplementary Table 1) until 12, 24 or 36 weeks of age; ingredients were adjusted on a kcal basis (Supplementary Table 6). For dietary intervention experiments, mice assigned to an HFD were switched to a CTD at 10 weeks of age for the following 2 weeks until the experimental endpoint. Litters were randomly assigned to each diet. Group allocation was performed in a non-blinded fashion. Food was changed on a weekly basis, and mice were weighed every three weeks, starting at weaning. Animals were kept on a 12-h light/12-h dark cycle, and allowed free access to food and water at the Dana-Farber Cancer Institute (DFCI) Animal Resources Facility. The animal protocol was reviewed and approved by the DFCI Institutional Animal Care and Use Committee (IACUC), and was in accordance with the Animal Welfare Act. The sample size estimate for mouse analyses was based on published literature.

Tissue collection

At defined time points, mice were weighed and euthanized by CO2, followed by cervical dislocation; blood was collected by cardiac puncture, and serum was collected using serum-separating tubes (#41.1378.005, Sarstedt), aliquoted, and stored at −80 °C. Urogenital apparatus and liver tissues were fixed in 10% buffered formalin and processed for paraffin embedding. Alternatively, mouse prostate lobes (anterior prostate, AP; dorsolateral prostate, DLP; ventral prostate, VP) were immediately dissected, weighed and flash-frozen in liquid nitrogen. Serum and tissues were consistently collected during the same periods to minimise inter-sample and circadian rhythm variability.

Histopathologic and immunohistochemical analyses

Formalin-fixed, paraffin-embedded mouse urogenital apparatus and liver tissues were sectioned (5 μm) and stained with hematoxylin and eosin (H&E). Histopathological slides were analysed by expert murine uropathologists, who were blinded to the experimental conditions. Hepatic steatosis was also assessed for liver tissues (M.L.). The presence and extent of PIN in 12-week-old mice (AP, DLP, VP) were estimated for each mouse by evaluating the percentage of the gland affected for each prostate lobe, and are reported in Supplementary Data 1 (M.L.). For Ki-67 staining, slides were baked for 60 min in an oven set to 60 °C. They were then loaded into the Bond III staining platform with appropriate labels. Slides were antigen retrieved in Bond Epitope Retrieval 2 for 20 min, and incubated with rabbit monoclonal anti-Ki-67 antibody (#VP-RM04 (clone SP6), Vector Laboratories) at a dilution of 1:250 for 30 min at room temperature. Primary antibody was detected using the Bond Polymer Refine Detection kit. Slides were developed in 3,3′-diaminobenzidine (DAB), dehydrated, and coverslipped. The percentage of Ki-67 positive cells was evaluated by counting the number of cells expressing nuclear Ki-67 as a fraction of the total number of cells per high power field. Whenever possible, up to 10 high power fields for each VP lobe were counted and averaged, and the average was reported as each sample’s score (F.G. and M.F.). Sample size for histological evaluation was estimated based on previous literature data, using the same model10. For Ki-67 analysis, we performed a sample size calculation using the software G*power version 3.1, extrapolating the effect size (d = around 0.87) from the data of Kobayashi et al.11 in MYC mice fed with HFD. Based on this assumption, we calculated that at least 22 mice/group should be used to detect a significant difference in Ki-67 positivity using a two-sided t-test for change in mean between two independent groups, with an alpha-error of 0.05 and a priori power of 0.8.
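The sample size calculation above can be approximated by hand. The following is a minimal sketch, not the G*Power algorithm: it uses the normal approximation for a two-sided, two-sample t-test with Guenther's small-sample correction, and the function name n_per_group is illustrative. Under these assumptions it reproduces the reported figure of 22 mice per group for d = 0.87.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample t-test.

    Normal approximation plus Guenther's correction (z_alpha^2 / 4);
    this is an approximation, not the exact noncentral-t computation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)


print(n_per_group(0.87))  # effect size taken from the text
```

Because the effect size is itself described as approximate, the result should be read as a lower bound on the planned group size rather than an exact requirement.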

Insulin ELISA

Serum insulin levels were measured using an insulin-1 ELISA kit from Sigma-Aldrich (#RAB0817). Briefly, samples were diluted 1:3 or 1:5 in diluent buffer C (provided in the kit) and the assay was performed according to the manufacturer’s instructions. Each sample was measured twice (technical duplicate). Outliers (identified using the ROUT method, Q = 0.1%), and samples in which insulin levels were under the detection limit of the assay, were removed from the analysis. Statistical analysis and graphical representation were performed with use of GraphPad Prism version 7.0.

Metabolic profiling

For metabolic profiling of serum and prostatic tissues (VP), we used the platform from Metabolon Inc. (Durham, NC, USA). The mouse sample size needed to ensure adequate power for metabolomics analysis was based on previous literature data using a similar model26. Information regarding sample preparation, quality assurance (QA) and control (QC), and metabolite quantification was provided by the company as follows:

Sample preparation: Biological samples were stored at −80 °C and then thawed on ice just prior to extraction. Tissue samples were weighed at Metabolon on a 4-position analytical scale (1/10th mg) and then soaked overnight in 80% methanol/20% deionized water with recovery standards at a 60 μL:1 mg ratio. The methanol contained four recovery standards (DL-2-fluorophenylglycine, tridecanoic acid, d6-cholesterol and 4-chlorophenylalanine) to allow confirmation of extraction efficiency. For serum, 100 μL sample volume was extracted with 500 μL of methanol containing recovery standards. All extracts were divided into four fractions: one for ultra-performance liquid chromatography tandem mass-spectrometry (UPLC-MS/MS) with positive ion mode electrospray ionisation (IMEI); one for UPLC-MS/MS with negative IMEI; one for the liquid chromatography (LC) polar platform; and a final fraction reserved as a backup. Aliquots were dried. The first aliquot was then reconstituted in 80 μL of 6.5 mM ammonium bicarbonate in water (pH 8) for the negative ion analysis; the second aliquot was reconstituted in 80 μL of 0.1% formic acid in water (pH ~3.5) for the positive ion method; and the third aliquot was reconstituted in 80 μL of hydrophilic interaction liquid chromatography (HILIC) solvent (15% H2O: 5% MeOH: 80% ACN) with 10 mM ammonium formate (pH ~10) for the HILIC method.

QA/QC: Several types of controls were analysed together with the experimental samples: (1) a pooled matrix sample specific for each sample type (i.e. prostate and serum) was generated by combining 20 μl of each experimental sample and injecting the pooled sample six times for each data set to serve as a technical replicate to assess process variability; (2) five water aliquots were extracted and analysed to serve as process blanks for artifact determination; (3) a cocktail of internal standards, carefully chosen to not interfere with the measurement of endogenous compounds, was spiked into every analysed sample to monitor instrument performance and serve as retention markers for chromatographic alignment. The list of internal standards is provided in Supplementary Table 7. Instrument variability was evaluated during the entire procedure. Experimental samples were randomised across the platform run.

UPLC Method: Separations were performed using a Waters Acquity UPLC (Waters, Milford, MA). Reverse-phase (RP) positive ion method analysis used mobile phase consisting of 0.1% formic acid in water (A) and 0.1% formic acid in methanol (B). Reverse-phase negative ion analysis used mobile phase consisting of 6.5 mM ammonium bicarbonate in water, pH 8 (A) and 6.5 mM ammonium bicarbonate in 95% methanol/5% water (B). The sample injection volume was 5 μL and a 2x needle loop overfill was used. Separations utilised separate acid and base-dedicated 2.1 mm × 100 mm Waters BEH C18 1.7 μm columns held at 40 °C. HILIC used mobile phase consisting of 10 mM ammonium formate in 15% water, 5% methanol, 80% acetonitrile (effective pH 10.16 with NH4OH) (A) and 10 mM ammonium formate in 50% water, 50% acetonitrile (effective pH 10.60 with NH4OH) (B). The sample injection volume was identical to RP method. The stationary phase consisted of a 2.1 mm × 150 mm Waters BEH Amide 1.7 μm column held at 40 °C. The gradient profiles for RP and HILIC methods can be found in Supplementary Table 8.

High Resolution Accurate Mass (HRAM) method: A ThermoFisher Scientific (Waltham, MA) Q-Exactive was the HRAM instrument used52. Detailed source and MS settings can be found in Supplementary Table 9 (conditions are also described in supplementary information from Evans et al.)53. The scan range was 80–1000 m/z with a scan speed of ~9 scans per second (alternating between MS and MS/MS scans), and the resolution was set to 35,000 (measured at 200 m/z). Mass calibration was performed as needed to maintain <5 ppm mass error for all standards monitored.

Biological sample analysis: Metabolon has developed a chemocentric approach that was used in peak detection and integration, and is described in detail elsewhere54,55,56. This in-house peak detection and integration software was used, the data output of which was a list of m/z ratios, retention indices (RI) and area under the curve (AUC) values. User specified criteria for peak detection included thresholds for signal to noise ratio, area and width. Relative standard deviations (RSDs) of peak area were determined for each internal and recovery standard to confirm extraction efficiency, instrument performance, column integrity, chromatography and mass calibration. The biological data sets, including QC samples, were chromatographically aligned based on a retention index that utilised internal standards assigned a fixed RI value. The RI of the experimental peak was determined by assuming a linear fit between flanking RI markers whose RI values are set. Peaks were matched against an in-house library of authentic standards and routinely detected unknown compounds specific to the respective method. The library consisted of 3200 endogenous and exogenous metabolites for which super and subpathway designations were provided. Identifications were based on retention index values, experimental precursor mass match to the library authentic standard within 10 ppm, and quality of MS/MS match. MS/MS forward and reverse match scores were based on a comparison of the ions present in the experimental spectrum to the ions present in the library spectrum. A forward score of 100 would mean all the ions present in the experimental spectrum were present in the library at the correct ratios. Any deviations in ion ratios or additional experimental ions not present in the library reduced the forward score, thus the forward score is a good indication of the purity of the compound being detected. 
Co-elution with another molecule of the same mass adds ions to the experimental spectrum and reduces the forward score. Similarly, a reverse score of 100 indicated that all ions present in the library were present in the experimental spectrum at the correct ratios; deviations in ion ratios, or ions in the library not present in the experimental spectrum, reduced the reverse score. Identification was automatically approved if all the above criteria were met and the MS/MS forward and reverse scores were above 80. Compounds that met the above criteria but had low MS/MS scores, below 35 for both forward and reverse, were automatically rejected. Compounds with intermediate MS/MS forward and reverse scores, 36–79, were marked for manual review. If an MS/MS spectrum was not obtained for a given ion, the identification was based on retention and parent mass alone and marked for analyst review. In this case, identification can still be confirmed if it has historical precedent in the specific matrix. Further details can be found in Evans et al.56.
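Two steps of the identification workflow described above lend themselves to a short illustration: the retention index obtained by a linear fit between flanking markers, and the triage of identifications by MS/MS forward and reverse scores. The sketch below is not Metabolon's software; the function names are hypothetical, and the handling of boundary or mixed scores (for example exactly 35 or 80, or one high and one low score) is an assumption, since the text does not specify it.

```python
def retention_index(rt, lower_marker, upper_marker):
    """Interpolate an RI linearly between two flanking markers.

    Each marker is a (retention_time, fixed_RI) pair.
    """
    (rt_lo, ri_lo), (rt_hi, ri_hi) = lower_marker, upper_marker
    if rt_lo == rt_hi or not rt_lo <= rt <= rt_hi:
        raise ValueError("peak must lie between two distinct flanking markers")
    fraction = (rt - rt_lo) / (rt_hi - rt_lo)
    return ri_lo + fraction * (ri_hi - ri_lo)


def triage(forward, reverse):
    """Classify an identification from its MS/MS match scores (0-100).

    None means no MS/MS spectrum was obtained for the ion.
    """
    if forward is None or reverse is None:
        return "analyst review"   # retention and parent mass only
    if forward > 80 and reverse > 80:
        return "approved"
    if forward < 35 and reverse < 35:
        return "rejected"
    # Assumption: everything else, including boundary and mixed cases,
    # is sent to manual review.
    return "manual review"
```

Sending every unspecified case to manual review is the conservative choice here, since it never approves or rejects a compound automatically on criteria the text does not state.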

Metabolite quantification and data normalisation: Peaks were quantified using area-under-the-curve. Data was normalised, to correct variations that resulted from differences in the inter-day tuning of the instruments. Essentially, each compound was corrected in run-day blocks, by registering the medians to equal one, and normalising each data point proportionately. Each biochemical in OrigScale data was then rescaled, to set the median equal to 1. Compounds in which more than 50% of values were missing were not included in the statistical analyses. Scaled data are provided in Supplementary Data 2 and 15. Raw and OrigScale data for VP are provided in Supplementary Data 16 and 17. Raw serum data are provided in Supplementary Data 18. These tables include RI, accurate mass values, mean differences in the detected metabolite, and conversion to parts per million (PPM). Metabolomic data were log-transformed (applying the natural logarithm to the data plus one) before data analysis.
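The normalisation steps above can be sketched for a single compound as follows. This is an illustrative reading of the description, not the vendor's code: missing values are represented as None, peak areas are assumed to be positive, and the function names are hypothetical.

```python
from math import log
from statistics import median


def median_scale(values):
    """Divide observed values by their median so the median becomes 1."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    return [None if v is None else v / med for v in values]


def normalise_by_run_day(values, run_days):
    """Median-scale one compound separately within each run-day block."""
    out = [None] * len(values)
    for day in set(run_days):
        idx = [i for i, d in enumerate(run_days) if d == day]
        block = [values[i] for i in idx]
        if all(v is None for v in block):
            continue  # nothing observed on this day
        for i, v in zip(idx, median_scale(block)):
            out[i] = v
    return out


def keep_compound(values, max_missing=0.5):
    """Drop compounds in which more than 50% of values are missing."""
    return sum(v is None for v in values) / len(values) <= max_missing


def log_transform(values):
    """Natural logarithm of the data plus one, as stated in the text."""
    return [None if v is None else log(v + 1.0) for v in values]
```

For example, peak areas of 10, 20 and 30 on one run day and 100, 200 and 300 on another both normalise to 0.5, 1.0 and 1.5, which removes the day-to-day difference in instrument response while keeping the within-day ratios.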

Data analysis: Principal Component Analysis (PCA) using R software was used to visualise the metabolomic data. Before PCA, data were imputed using a k-nearest neighbour (kNN) algorithm57 (with k = 5); they were then mean-centered and scaled to unit variance. Two-way ANOVA was used to compare the diets (irrespective of genotypes) or genotypes (irrespective of diets), and a t test was used for two-group comparisons (Supplementary Data 2 and Supplementary Data 15). Differences were considered significant at P < 0.05 and, to account for multiple testing, an FDR58 of <0.15. Qlucore Omics Explorer (http://www.qlucore.com; version 3.1) was used for heatmap representation and unsupervised clustering of metabolites that were significantly altered by HFD in a WT or a MYC context, or by MYC overexpression irrespective of the diet. Metabolites were grouped into 8 different classes (lipids, amino acids, nucleotides, peptides, carbohydrates, cofactors and vitamins, energy, or xenobiotics), according to Metabolon’s classification. Biochemical annotations were assigned by PhD-level biochemists at Metabolon, integrating information from literature and public databases (e.g. HMDB). Metabolite Set Enrichment Analysis (MSEA) was performed using a hand-curated metabolite set (Supplementary Data 19) and run using the Gene Set Enrichment Analysis platform (GSEA; Broad Institute)33 using 1000 permutations. Metabolite sets including fewer than three metabolites were excluded from the analysis. Metabolite sets were considered significantly enriched at P < 0.05 and FDR < 0.15.
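The dual significance criterion (P < 0.05 and FDR < 0.15) can be illustrated with a short sketch. The FDR procedure is cited but not named in the text, so the Benjamini-Hochberg step-up adjustment used below is an assumption, and the function names are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, p_cut=0.05, fdr_cut=0.15):
    """Flag tests that pass both the nominal P and the FDR thresholds."""
    fdr = benjamini_hochberg(pvalues)
    return [p < p_cut and q < fdr_cut for p, q in zip(pvalues, fdr)]
```

Applying both thresholds means a metabolite with a small nominal P can still be excluded when many tests are run, and one with an acceptable FDR is excluded if its nominal P is 0.05 or greater.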

Global chromatin profiling

The global chromatin profiling assay was performed as described in Creech et al.59, with the following modifications:

Cell lysis, tissue lysis, and histone extraction: Flash-frozen tissue samples, 10–40 mg in mass, were thawed on ice and resuspended in 200 μL ice-cold PBS. Samples were homogenised for about 2 min using a motorised pestle (VWR, 47747–370), and were spun down at 4 °C, at 1500 g for 5 min. Supernatant was removed and 0.5 mL ice-cold nucleus buffer was added to the resultant pellet. Nuclei were centrifuged at 4 °C, at 10,000 g for 1 min and supernatant was removed. The nucleus isolation procedure was repeated twice, removing supernatant each time. Histones were extracted from the remaining pellet, with 400 μL 0.4 N H2SO4 at room temperature for 16 h, while shaking; at this point, histone isolation proceeded using the same protocol as described59. In addition to the flash-frozen tissue, histones were extracted from one 25 million cell pellet each of Arg-15N6,13C4 SILAC-labeled HeLa, K562, and 293T (as in Jaffe et al.29), following the protocol described by Creech et al.59.

Histone derivatization: The sample set used SILAC standardisation, with histones extracted from HeLa, K562 and 293T cell lines, as described above. In this workflow, input amount was reduced to 10 μg per sample (5 μg sample and 5 μg SILAC heavy standards), based on the protocol. Samples were adjusted to 100 mM sodium phosphate, pH 8.0, by adding 3 μL 500 mM sodium phosphate, pH 8.0; the total volume of the sample was brought up to 15 μL with HPLC-grade water. Phosphate-buffered samples were reacted with 60 μL of 400 mM NHS propionate in anhydrous methanol at room temperature, with shaking. Three hundred microliters of 0.1% trifluoroacetic acid (TFA) was added, to bring samples to a volumetric concentration of 20% organic solvent. Samples were desalted on a 96-well Oasis HLB 5 mg/cc plate (Waters, 186000309). Activation, equilibration, and wash volumes were 200 μL for each step, and sample elution volume was 100 μL. For the trypsin digestion, 1 μg trypsin was used in 10 μL of 50 mM ammonium bicarbonate, pH 8.0, while all other conditions were as described59. After digestion and lyophilization, new N-termini were derivatized, by resuspending peptides in 40 μL of 400 mM NHS propionate/anhydrous methanol, and adjusting to 18 mM sodium phosphate, pH 8.0, with 10 μL 100 mM sodium phosphate, pH 8.0. The reaction was quenched with 10 μL 15% hydroxylamine solution and incubated for 30 min at room temperature with shaking. Samples were brought up to a total volume of 260 μL with HPLC-grade water, frozen, and lyophilised via vacuum concentrator. Samples were resuspended in 200 μL 0.1% TFA, and desalted on a SepPak tC18 96-well μElution plate (Waters, 186002318). All activation and wash volumes were 200 μL. Elution volume was 100 μL. Desalted peptides were lyophilised via vacuum concentrator, and were brought up to a volume of 10 μL with 3% acetonitrile (ACN)/5% formic acid (FA). 
Samples were further diluted 1:10 with 3% ACN/5% FA, before introducing them into the mass spectrometer.

LC-MS/MS assay parameters: The gradient was modified so that peptides were separated at a flow rate of 200 nL/minute, with a 60 min linear gradient from 97% solvent A (3% ACN/ 0.1% FA) to 33% solvent B (90% ACN/ 0.1% FA). This gradient was followed by a 15 min linear gradient, from 33% solvent B to 65% solvent B. This gradient was followed by a 5 min linear gradient from 65% solvent B to 90% solvent B, at which point the 90% solvent B was held for an additional 5 min. Including sample loading and column equilibration times, each sample took 120 min to completion, 90 min of which was taken up by active data acquisition.
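The gradient described above is piecewise linear and can be written as a lookup of solvent B composition over time. The sketch below assumes that time zero is the start of the first gradient segment and that 97% solvent A corresponds to 3% solvent B; loading and equilibration are not modelled, and the names are illustrative.

```python
# (time in min, % solvent B) breakpoints taken from the text
GRADIENT = [
    (0.0, 3.0),    # 97% solvent A, assumed equal to 3% solvent B
    (60.0, 33.0),  # end of the 60 min linear gradient
    (75.0, 65.0),  # end of the 15 min linear gradient
    (80.0, 90.0),  # end of the 5 min linear gradient
    (85.0, 90.0),  # end of the 5 min hold at 90% B
]


def percent_b(t):
    """Return % solvent B at time t (min) by linear interpolation."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (t - t0) / (t1 - t0) * (b1 - b0)
```

The four segments sum to 85 min; the remaining time in the 120 min run corresponds to the sample loading and column equilibration mentioned in the text.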

Scheduling for H3, H4, H2A, H2A.Z and H2B targets: To determine each peptide’s retention time, we employed a scheduling sample, comprising three samples in a 1:1:1 ratio instead of a synthetic peptide mix. Most method parameters were the same as in Creech et al.59, except that peptides were scheduled within a 23-min window, based on hypothesised elution time; also, the total run time for each scan was 0–90 min. The isolation width for MS1 and MS2 scans was narrowed to 1.7 m/z with a 0.3 m/z offset. These data were acquired on a Q-Exactive Plus (Thermo Scientific) mass spectrometer. A list of peptides targeted in addition to published histone marks in Creech et al.59 is presented in Supplementary Data 20.

Scheduled data acquisition: After determining retention times, 1 μL of sample was injected onto the same column that was utilised for scheduling, using the same gradients with previously described modifications. MS1 and MS2 scans used the same parameters as described in Creech et al.59, with the same scan run time and isolation width modifications as described above. The inclusion list was turned on for each MS2 scan, and included heavy as well as light versions of each peptide to be observed, its charge state, new acquisition windows based on the scheduling runs, and optimal collision energies.

Heatmap generation: GENE-E (http://www.broadinstitute.org/cancer/software/GENE-E/) was used for heatmap representation as well as statistical analysis of the data, using the comparative marker selection suite60. Differences were considered significant if the p-value was <0.05, and FDR was <0.1. Unsupervised clustering of histone marks (one minus Pearson correlation) was done on normalised values, based on the median level of each mark in the three WT prostate lobes (VP, DLP and AP).

ChIP-sequencing

ChIP-sequencing was performed as described in Ku et al.61, with the following modifications. Fresh-frozen VP tissues from 12-week-old mice were pulverised (Cryoprep Impactor, Covaris), resuspended in PBS + 1% formaldehyde, and incubated at room temperature for 20 min. Fixation was stopped by the addition of 0.125 M glycine (final concentration) for 15 min at room temperature, followed by washing in ice-cold PBS + EDTA-free protease inhibitor cocktail (PIC; #04693132001, Roche). Multiple biological replicates were combined for each condition in two distinct pools (replicates). Chromatin was isolated by the addition of lysis buffer (0.1% SDS, 1% Triton X-100, 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 0.1% NaDOC, 0.13 M NaCl, 1X PIC) + sonication buffer (0.25% sarkosyl, 1 mM DTT) to the samples, which were maintained on ice for 30 min. Lysates were sonicated (E210 Focused-ultrasonicator, Covaris) and the DNA was sheared to an average length of ~200–500 bp. Genomic DNA (input) was isolated by treating sheared chromatin samples with RNase (30 min at 37 °C), proteinase K (30 min at 55 °C), de-crosslinking buffer (1% SDS, 100 mM NaHCO3 (final concentration), 6–16 h at 65 °C), followed by purification (#28008, Qiagen). DNA was quantified on a NanoDrop spectrophotometer, using the Quant-iT High-Sensitivity dsDNA Assay Kit (#Q33120, Thermo Fisher Scientific). On ice, ChIP-validated H4K20me1 (2 μg, #ab9051, Abcam) or PHF8 (5 μg, #A301–772A, Bethyl Laboratories) antibodies62 were conjugated to a mix of washed Dynabeads protein A and G (Thermo Fisher Scientific), and incubated on a rotator (overnight at 4 °C) with 1.5 μg (H4K20me1) or 5 μg (PHF8) of chromatin. ChIP’ed complexes were washed, sequentially treated with RNase (30 min at 37 °C), proteinase K (30 min at 55 °C), de-crosslinking buffer (1% SDS, 100 mM NaHCO3 (final concentration), 6–16 h at 65 °C), and purified (#28008, Qiagen). 
The concentration and size distribution of the immunoprecipitated DNA were measured using the Bioanalyzer High Sensitivity DNA kit (#5067–4626, Agilent). Dana-Farber Cancer Institute Molecular Biology Core Facilities prepared libraries from 2 ng of DNA, using the ThruPLEX DNA-seq kit (#R400427, Rubicon Genomics), according to the manufacturer’s protocol. Finished libraries were quantified by the Qubit dsDNA High-Sensitivity Assay Kit (#32854, Thermo Fisher Scientific), by an Agilent TapeStation 2200 system using D1000 ScreenTape (#5067–5582, Agilent), and by RT-qPCR using the KAPA library quantification kit (#KK4835, Kapa Biosystems), according to the manufacturers’ protocols. Uniquely indexed ChIP-seq libraries were pooled in equimolar ratios, and sequenced to a target depth of 40 M reads on an Illumina NextSeq500 run, with single-end 75 bp reads. Bowtie2 (version 2.2.1) was used to align the ChIP-seq datasets to build version NCBI37/mm9 of the mouse genome63. Alignments were performed using default parameters that preserved reads mapping uniquely to the genome without mismatches.

H4K20me1

H4K20me1 read density between transcriptional start site (TSS) and transcriptional end site (TES) was averaged for each gene, and reported against the CTD_WT (reference) for the HFD_WT, CTD_MYC or HFD_MYC conditions. Waterfall plots of rank-ordered log2-fold changes were used to visualise H4K20me1 dynamic changes. Genes with a loss (<1.15 fold-change) or a gain (>1.15 fold-change) of the H4K20me1 mark between TSS and TES relative to the CTD_WT (reference) were identified for the HFD_WT, CTD_MYC and HFD_MYC conditions, and were associated with their corresponding transcript abundance. Venn diagrams were generated using the ‘VennDiagram’ R package (version 1.6.9).
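The gene-level comparison against the CTD_WT reference can be sketched as below. This is a hedged illustration only: the text gives 1.15 as the fold-change threshold for both directions, and the sketch assumes that a loss means a decrease of more than 1.15-fold (ratio below 1/1.15). Gene identifiers, function names and the requirement for strictly positive densities are also assumptions.

```python
from math import log2


def classify_change(density, reference, fold=1.15):
    """Label a gene by its H4K20me1 density relative to the reference."""
    ratio = density / reference
    if ratio > fold:
        return "gain"
    if ratio < 1.0 / fold:   # assumed symmetric threshold for a loss
        return "loss"
    return "unchanged"


def waterfall(densities, reference):
    """Rank genes by log2 fold change versus the reference, highest first.

    Both arguments map gene identifiers to mean TSS-to-TES read density.
    """
    changes = {
        gene: log2(densities[gene] / reference[gene])
        for gene in densities
        if gene in reference
    }
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)
```

The ranked list returned by waterfall is what a waterfall plot would draw, one bar per gene in rank order.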

RNA-sequencing

Fresh VP tissues from 12-week-old mice were dissociated to form a single cell suspension. RNA from a similar number of cells was extracted using the miRNeasy Micro Kit (#217084, Qiagen) coupled with on-column DNAse treatment (#79254, Qiagen). RNA sample concentration was measured and subjected to quality evaluation, using a Bioanalyzer RNA 6000 Nano kit (#5067–1511, Agilent). The Dana-Farber Cancer Institute Molecular Biology Core Facilities prepared libraries from 500 ng of purified total RNA, using TruSeq Stranded mRNA sample preparation kits (#RS-122–2101, Illumina) according to the manufacturer’s protocol; submitted the finished libraries to quality control analyses as described in the ChIP-seq Methods section, pooled uniquely indexed RNA-seq libraries in equimolar ratios, and sequenced these to a target depth of 40M reads on an Illumina NextSeq500 run with single-end 75 bp reads. Fastq files were aligned to the mm9 genome using tophat with default parameters (version 2.0.11). Transcript abundances were calculated using the cuffquant module of Cufflinks (version 2.2.0). FPKM values were calculated and normalised using the cuffnorm module of Cufflinks (version 2.2.0). Paired t-test was calculated using the t.test function in R (version 3.3.2).

Murine gene set enrichment analysis and MYC signature

Gene expression values from biological triplicates were input for Gene Set Enrichment Analysis (GSEA)33 using the Hallmark (H, v5.01; Supplementary Data 7) or the Chemical and Genetic Perturbations (C2.cgp, v5.1; Supplementary Data 8) Molecular Signature Databases (MSigDB) with 10,000 permutations. The Normalised Enrichment Score (NES)—associated with gene sets that were significantly enriched or depleted (p < 0.05 and FDR < 0.1)—was used for heatmap generation, using a custom-made R script. A murine prostatic MYC signature was obtained by combining leading edge genes from all MYC-related gene sets that were significantly enriched (P < 0.05 and FDR < 0.1) in the H and C2.cgp MSigDB (Supplementary Data 9). Aggregate read density profiles of PHF8 and H4K20me1, and their quantification around MYC signature genes, were generated using deepTools64. Mapped regions were visualised using the Integrated Genomics Viewer (IGV, version 2.3.68)65.

Protein analysis

Fresh-frozen VP tissues from 12-week-old mice were pulverised (Cryoprep Pulverizer, Covaris) and lysed on ice in RIPA buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% NP-40) with the addition of phosphatase and protease inhibitor cocktail tablets (Complete Mini, EDTA-free, Roche). MYC-CaP cells (kindly provided by Dr. Charles Sawyers, Memorial Sloan Kettering Cancer Center, New York, NY)66 were rinsed on ice with PBS and lysed as for the mouse prostates. Cells were authenticated via STR profiling (DDC Medical, 16 January 2015). Cells tested negative for mycoplasma contamination using the MycoAlert™ Mycoplasma Detection Kit (Lonza). Equal amounts of protein (15–20 μg; Bradford protein assay, Bio-Rad) were resolved on precast 4–12 or 4–20% Tris-glycine SDS-polyacrylamide gels (Invitrogen), and transferred to nitrocellulose blotting membranes (Amersham), following standard procedures. Membranes were probed with the following antibodies according to the manufacturer’s instructions: rabbit monoclonal [Y69] anti-c-MYC (#ab32072, Abcam), or rabbit polyclonal anti-β-Actin (#4967, Cell Signaling Technology). Densitometry analyses were made with ImageJ (U.S. NIH, Bethesda, MD; http://imagej.nih.gov/ij/). Results were normalised to β-actin and expressed as arbitrary units.

Epidemiological studies

Study population: We tested our hypothesis among prostate cancer patients who were enrolled in two prospective studies: the Physicians’ Health Study (PHS) and the Health Professionals Follow-up Study (HPFS). PHS I and II began in 1982 and 1997, respectively, as randomised trials of aspirin (PHS I) and dietary supplements (PHS II), and enrolled 29,067 male U.S. physicians for the primary prevention of cardiovascular disease and cancer67,68,69,70. The HPFS was initiated in 1986, when 51,529 U.S. men, 40–75 years of age and working in health professions, completed a biennial questionnaire mailed to them71. In both studies, participants were followed by means of regular questionnaires, and self-reported data on diet, lifestyle behaviours, medical history, and disease outcomes were collected. We confirmed the incidence of prostate cancer cases in this population by reviewing medical records and pathology reports. Following the confirmation of diagnosis, we retrieved archival formalin-fixed paraffin-embedded (FFPE) prostate tissue specimens, collected during radical prostatectomy or transurethral resection of the prostate. Pathologists undertook a standardised histopathologic review, including Gleason grading72, and standardised clinical data were abstracted from medical records. Deaths were ascertained via mail, telephone, and through periodic systematic searches of the National Death Index. Lethal prostate cancer was defined as the occurrence of distant metastases, or death due to prostate cancer. Men were followed through March 2011 for PHS and through December 2011 for HPFS. We obtained written informed consent from all participants, and the study was approved by institutional review boards at the Harvard T.H. Chan School of Public Health and Partners Health Care.

Whole-transcriptome expression profiling: In the current study, we undertook gene expression profiling of archival tumour tissue among 402 men with prostate cancer in the cohorts using an extreme case control design. Cases were men with lethal prostate cancer (developed metastatic disease or died from prostate cancer) and controls were men with indolent cancer (those who survived at least 8 years after prostate cancer diagnosis, without any evidence of metastases). In total, there were 113 lethal cases and 289 indolent cases. We also included adjacent normal tissue for a subset of these tumour tissues (n = 200). Gene expression profiling of archival FFPE tissue was performed as described73. Briefly, two to three 0.6-mm cores were sampled from regions of high-density tumour, and from adjacent normal prostate tissue. RNA was extracted with the Agencourt FormaPure kit (Beckman Coulter), with use of the Biomek FXP automated platform. Whole-transcriptome amplification was performed using WT-Ovation FFPE System V2 (NuGEN) and the amplified cDNA was hybridised to a GeneChip Human Gene 1.0 ST microarray (Affymetrix). For the expression profiles generated, we regressed out technical variables and then shifted the residuals to derive the original mean expression values, and normalised these using the robust multi-array average method74,75. NetAffx annotations were used to map gene names to Affymetrix transcript cluster IDs, as implemented in the Bioconductor annotation package pd.hugene.1.0.st.v1; this resulted in 20,254 unique gene names.

Diet assessment: Self-administered semi-quantitative food frequency questionnaires (FFQs) were collected every four years from 1986 for the HPFS, and were administered once between 1999 and 2002 for the PHS. The FFQs asked men to report their usual intake of approximately 130 foods and beverages during the previous year, and also their fried food consumption, the type of cooking fat they used, and whether they consumed the visible fat on meat. Fat intake levels were estimated by multiplying the frequency of intake by the amount of fat in the specific portion of each food (based on nutrient composition data from the US Department of Agriculture, supplemented with food manufacturer data), and were summed across all foods. The FFQ was validated among 127 men in the HPFS. The correlations between the FFQ and four prospectively collected one-week weighed diet records were 0.67 for total fat, and 0.75 for saturated fat76. Because the FFQ was mainly administered after the diagnosis of prostate cancer for the PHS participants, we estimated post-diagnostic fat intakes in both HPFS and PHS, to maintain a larger sample size and harmonize the two cohorts. In HPFS, we calculated cumulative average post-diagnostic intake from the FFQ preceding diagnosis until the end of follow-up77. Fat intake (g/d) was multiplied by 9 kcal/g and divided by total calories per day to calculate the percent of daily calories from each fat of interest.
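The energy-density conversion and the cumulative averaging described above amount to the following arithmetic. This is a minimal sketch with illustrative function names; it assumes that the cumulative average is the running mean of intakes across successive questionnaires.

```python
KCAL_PER_G_FAT = 9.0  # energy density of fat used in the text


def percent_energy_from_fat(fat_g_per_day, total_kcal_per_day):
    """Percent of daily calories contributed by a given fat."""
    return fat_g_per_day * KCAL_PER_G_FAT / total_kcal_per_day * 100.0


def cumulative_average(intakes):
    """Running mean of intake across successive questionnaires."""
    averages, total = [], 0.0
    for count, intake in enumerate(intakes, start=1):
        total += intake
        averages.append(total / count)
    return averages
```

For instance, 70 g/d of fat in a 2100 kcal/d diet gives 630 kcal from fat, or 30% of daily calories.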

Statistical analysis: Fat intake after diagnosis was estimated in 4577 men enrolled in the HPFS and in 926 men from the PHS, all of whom had non-metastatic prostate cancer. Cohort-specific quintiles were determined based on the fat intake distribution of each cohort, with the highest quintile denoted as the high-fat group and the lower four quintiles grouped as the low-fat group (Supplementary Data 21). The categorised fat intake groups were then integrated with gene expression data in tumour or in adjacent normal tissues. After merging, the data set comprised 319 tumour tissues from patients (213 from the HPFS and 106 from the PHS) with complete fat intake estimates (animal fat: high-fat group n = 65 vs. low-fat group n = 254; saturated fat: high-fat group n = 62 vs. low-fat group n = 257; monounsaturated fat: high-fat group n = 66 vs. low-fat group n = 253; polyunsaturated fat: high-fat group n = 55 vs. low-fat group n = 264) and a total of 157 adjacent normal tissues (animal fat: high-fat group n = 33 vs. low-fat group n = 124; saturated fat: high-fat group n = 29 vs. low-fat group n = 128; monounsaturated fat: high-fat group n = 33 vs. low-fat group n = 124; polyunsaturated fat: high-fat group n = 24 vs. low-fat group n = 133).
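
The cohort-specific dichotomisation can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the 80th-percentile cut-off uses a simple nearest-rank rule, values strictly above the cut-off are labelled high, and the intake values are invented.

```python
def top_quintile_cutoff(intakes):
    """Nearest-rank 80th percentile of one cohort's intake distribution."""
    ranked = sorted(intakes)
    k = (4 * len(ranked) + 4) // 5  # ceil(0.8 * n) using integer arithmetic
    return ranked[k - 1]

def classify_by_cohort(records):
    """records: list of (cohort, intake) pairs.
    Returns 'high' or 'low' per record, using each cohort's own cut-off."""
    by_cohort = {}
    for cohort, intake in records:
        by_cohort.setdefault(cohort, []).append(intake)
    cutoffs = {c: top_quintile_cutoff(v) for c, v in by_cohort.items()}
    return ["high" if intake > cutoffs[cohort] else "low"
            for cohort, intake in records]

# Hypothetical intakes (percent of energy) for two cohorts of ten men each
records = [("HPFS", float(v)) for v in range(10, 20)] + \
          [("PHS", float(v)) for v in range(30, 40)]
labels = classify_by_cohort(records)
```

Because the cut-offs are computed within each cohort, the lowest intake in the second hypothetical cohort is labelled low even though it exceeds every value in the first.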

Gene set enrichment analysis: Gene expression profiles of tumour and adjacent normal prostate tissues were input for GSEA33, using the Hallmark (H, v4.0) MSigDB collection with 10,000 phenotype-based permutations, to identify predefined sets of functionally related genes correlated with specific fat intakes (Supplementary Data 10, 12–14). Gene sets with P < 0.05 and FDR < 0.1 were considered for subsequent analyses. Animal fat and saturated fat intake-dependent MYC signatures were obtained by combining either the leading edge or the non-leading edge genes from the MYC_targets_V1 gene set from the H MSigDB in tumour tissues (Supplementary Data 11), to create a metagene score as previously described78. This was computed for each sample by averaging the normalised (mean-centred and variance-scaled) expression values of all member genes. An additional signature was derived from 113 randomly selected genes from the MYC_targets_V1 gene set (Supplementary Data 11). Odds ratios and 95% confidence intervals were obtained by logistic regression for the association between the metagene score and lethal prostate cancer. The score was modelled as categorical (tertiles). We tested for linear trend across score categories by modelling the tertiles as a continuous variable. All models were adjusted for age and year at diagnosis. We further adjusted for Gleason grade, to test whether the score is an independent predictor of lethal prostate cancer, and for BMI at diagnosis, to differentiate the effect from that of overweight/obesity. To assess whether the association between the score and lethal prostate cancer was modified by saturated fat intake, we included an interaction term (saturated fat intake × MYC score) in the multivariable model and obtained P for interaction using a Wald test. All analyses were conducted using SAS version 9.3 and R version 3.1.0.
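
The metagene score and its tertile coding can be illustrated with a short sketch. It is not the SAS or R code used in the study. The gene names and expression values are hypothetical, the population standard deviation is assumed for the variance scaling, and tertiles are assigned by rank, which is one of several reasonable conventions.

```python
from statistics import fmean, pstdev

def metagene_scores(expr_by_gene):
    """expr_by_gene maps gene -> list of expression values (one per sample).
    Each gene is mean-centred and variance-scaled across samples, then the
    scaled values are averaged across genes to give one score per sample."""
    z_rows = []
    for values in expr_by_gene.values():
        mu, sd = fmean(values), pstdev(values)
        if sd == 0:
            continue  # constant gene cannot be scaled; skipped (assumption)
        z_rows.append([(v - mu) / sd for v in values])
    return [fmean(column) for column in zip(*z_rows)]

def tertile_groups(scores):
    """Assign each sample to tertile 1 (lowest) to 3 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = 1 + (3 * rank) // n
    return groups

# Hypothetical expression of two signature genes in six samples
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
}
scores = metagene_scores(expr)
groups = tertile_groups(scores)
```

The integer codes 1 to 3 can enter a regression model either as a categorical factor or, for the trend test, as a single continuous term.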

Validation cohorts: To investigate the power of SFI-induced and non-SFI-induced MYC signatures to predict metastatic disease, we utilised genome-wide expression profiles of 751 patients with metastatic outcome follow-up from the Decipher Genomic Resource Information Database (GRID; NCT02609269). These patients were pooled from four studies of either case-cohort or cohort design. Patients for these studies came from four institutes: Thomas Jefferson University (TJU; n = 139)79, Johns Hopkins Medical Institutions-I (JHMI-I; n = 260)80, Mayo Clinic (n = 235)81, and Cedars-Sinai (n = 117)82. A total of 120 non-randomly selected patients from case-cohort studies were removed before pooling the studies to avoid bias in estimating the hazard ratio. Thus, 631 patients were eligible for analysis, 70 of whom developed metastasis. Median follow-up time for censored patients was 8 years and the median age at radical prostatectomy was 61 years.

The fat-induced MYC signature (113 genes) and non-fat-induced MYC signature (87 genes) were used to calculate pathway expression scores for each patient, as the mean of z-score-scaled gene expression values. Based on the tertiles of these scores, patients were divided into three groups with T1 being the lowest and T3 the highest. Kaplan–Meier curves and Cox proportional hazard regression were used to evaluate the metastatic prognosis. To test associations between signatures and BMI, we extracted BMI data from 494 patients pooled from three cohorts (TJU, n = 139; JHMI-I only, n = 144; JHMI-II83 only, n = 95; JHMI-I/II, n = 116). Pearson’s correlation was used to measure the association between MYC signature scores and BMI. JHMI-II was excluded from the survival analysis because only patients who developed biochemical recurrence were selected for this study; pooling the JHMI-II cohort with the others, which lack this inclusion criterion, would therefore have been statistically inappropriate, as it would inflate the event rate. We also conducted univariate and multivariate analyses to associate the SFI-induced MYC signature with clinical outcome after adjusting for other clinicopathologic variables, including pre-operative prostate-specific antigen (PSA) levels, seminal vesicle invasion, surgical margins, extracapsular extension, lymph node invasion, Gleason grade or the Cell Cycle Progression score. These analyses used the pooled cohort of 631 patients with metastatic outcome follow-up from the Decipher GRID, whose genome-wide expression profiles were deidentified and aggregated from routine clinical use of the Decipher prostate cancer classifier test (Decipher Biosciences Laboratory, San Diego, CA).
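
For readers unfamiliar with the Kaplan–Meier estimator, the sketch below shows the product-limit calculation for one patient group, for example one score tertile. It is an illustration only, with hypothetical follow-up times, and does not reproduce the software or options used in the study. The function name kaplan_meier is an assumption.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of metastasis-free survival.
    times:  follow-up time per patient
    events: True if metastasis was observed at that time, False if censored
    Returns [(time, survival probability)] at each distinct event time."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(pairs) and pairs[i][0] == t:
            n_events += 1 if pairs[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve

# Hypothetical group of four patients; the second is censored at year 2
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [True, False, True, True])
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate drops only at observed event times.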

Adequacy of statistical analyses

All the statistical tests were justified as appropriate. Assumption criteria were met and analysis of variance was performed. When variance was not equal, Welch’s t-test (unequal variance t-test) was applied. Data are reported including estimation of variation within each group. Two-sided tests were used. Measurements were taken from distinct samples.
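
As a brief illustration of the unequal-variance test, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom for two independent groups. The function name and the measurements are hypothetical. A two-sided P value would additionally require the Student's t distribution, which is not part of the Python standard library and is therefore omitted here.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n)
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (fmean(group_a) - fmean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical measurements from two groups with unequal variance
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

Unlike the pooled-variance t-test, the degrees of freedom are generally not an integer and shrink as the two group variances diverge.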

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.