Patient samples

Eleven MDS patients (7 males and 4 females) were selected based on having a long disease course (2.5–11 years of follow-up, median 7) and many sampling moments (5–31, median 7) (Table 1). Two categories of patients were analysed: patients who received supportive treatment only (n=6) and patients who were treated with lenalidomide (n=5). Two patients of the latter group also received 5-azacitidine. BM and PB from these patients were obtained at multiple time points. The study was conducted in accordance with the Declaration of Helsinki and institutional guidelines and regulations from the Radboudumc Nijmegen (IRB number: CMO 2013/064), and included informed consent by all patients. The patient characteristics are listed in Table 1. Morphology of BM cells was examined using standard May-Grünwald-Giemsa stainings.

DNA isolation and amplification

DNA was isolated from PB or BM of MDS patients using the NucleoSpin Blood QuickPure kit (Macherey Nagel, Düren, Germany) according to the manufacturer’s protocol. In addition, BM and PB mononuclear cells (MNCs) and PB granulocytes were obtained after Ficoll-1077 density gradient separation. BM or PB cells were slowly added on top of a layer with Ficoll-Paque PLUS (density 1.077) (GE Healthcare, Chicago, IL, USA). After centrifugation at 700 g for 20 min, MNCs were present on top of the Ficoll layer and granulocytes (and red bloods) underneath. These two cell fractions were collected separately, after which DNA was isolated. When the extraction yield was insufficient (<5 μg) as measured with the Qubit fluorometer Quant-iT dsDNA BR Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), 80 ng of DNA was amplified using the Qiagen REPLI-g kit (Qiagen, Venlo, The Netherlands) in 4 parallel reactions (20 ng per reaction), according to the manufacturer’s protocol.

Karyotype analysis

Bone marrow samples were cultured for 24–48 h in RPMI-1640 medium (Life Technologies, Carlsbad, CA, USA) supplemented with 10% fetal calf serum and antibiotics. After hypotonic treatment with 0.075 M KCl and fixation in methanol/acetic acid (3:1), microscopic slide preparations were prepared. Chromosomes were G-banded using trypsin (Life Technologies) and Giemsa and at least 20 metaphases were analysed in case of a normal karyotype, and at least 10 in case of an abnormal karyotype. Karyotypes were described according to the standardized ISCN 2013 nomenclature system34.

Fluorescence in situ hybridization

Standard cytogenetic cell preparations were used for FISH. FISH was performed using commercially available probe kits for LSI EGR1/D5S23D5S721, LSI IGH/MYC/CEP 8 and D13s319/13q34 FISH, according to the manufacturer’s specifications (Abbott Molecular, Des Plaines, IL, USA). Fluorescent signals of at least 200 interphase nuclei were scored and interpreted by two independent investigators. The cutoff values for both gains and losses were determined by statistical evaluation of FISH results from control tissue. For each probe the mean+3 s.d. of false positive nuclei was taken as the cutoff level.

T-cell culture

Pure T cells were obtained from each patient by in vitro expansion of T cells from PB (or BM). Monocytes were first depleted by adherence to tissue culture flasks. The remaining cells were cultured for 14 to 21 days in IMDM medium (Life Technologies) supplemented with 10% human serum (PAA Laboratories GmbH, Pasching, Austria), interleukin-2 (100 IU ml−1) and CD3/CD28-coated Dynabeads (Thermo Fisher). The purity of the T cells was measured by flow cytometric analysis using the CD3 surface marker. When the purity of the T cells exceeded 95%, DNA was isolated using the NucleoSpin Blood QuickPure kit.

Mesenchymal stromal cell culture

MSC lines were generated from five subjects. Bone marrow MNCs were obtained by Ficoll-1077 density gradient separation. BM-MNCs were seeded at a density of 8 to 23 × 104 cells cm−2 in α-MEM medium (Sigma-Aldrich, St Louis, MO, USA) supplemented with heparin (3.5 IU ml−1) and 5% platelet lysate. Platelet lysate was prepared by freeze-thawing of platelets (>0.8 × 109 platelets per ml), followed by centrifugation at 4,700 g and collection of the supernatant. At 7 days after seeding, the culture medium was refreshed. Subsequently, cells were passed when 80% confluency was reached. After 7 days of culture, all floating and dead cells were washed away and a layer with MSCs remained. MSCs were cultured for up to 5 passages.

CFU-GEMM culture and sequencing of single colonies

PB-MNCs or BM-MNCs were seeded in methylcellulose media (10,000–25,000 cells per ml for BM and 100,000–200,000 cells per ml for PB) containing stem cell factor, interleukin-3, granulocyte–macrophage colony-stimulating factor and erythropoietin (H4434; Stem Cell Technologies, Vancouver, Canada) and incubated for 14 days at 37 °C with 5% CO 2 . Individual colonies were collected on day 14 and washed with phosphate-buffered saline in a 96-well plate. Cells were lysed by adding 30 μl lysis buffer (TE-buffer+0.5% Igepal-CA630+0.6 μl proteinase K (10 mg ml−1)) followed by incubating at 56 °C for 120 min and at 90 °C for 30 min. Subsequently, 1 μl of the lysate was used for each PCR reaction. Targeted amplicon-based deep sequencing was performed as described below. To exclude the possibility of reporting the results of mixed colonies, only colonies in which mutations were detected with a VAF of >40% were reported as positive.

Sorting of myeloid progenitors

1 ml viably frozen bone marrow MNCs were thawed in the presence of 100 μl DNAse I (2 mg μl−1) and incubated for 10 min in a solution of 1.6 ml fetal calf serum, 10 μl heparin (5,000 U ml−1) and 100 μl MgSO 4 (0.22 μM). Subsequently, the myeloid progenitor cells were sorted according to a protocol adapted from Pang et al.35 The cells were washed and stained with CD34-APC (Beckman Coulter, Brea, CA, USA), CD38-PE-Cy7 (BioLegend, San Diego, CA, USA), CD123-PE (BioLegend) and CD45RA-PB (BioLegend) monoclonal antibodies. Cells were analysed and sorted using a FACS Aria SORP flow cytometer and DIVA software (Becton Dickinson, Franklin Lakes, NJ, USA). Viable cells were selected based on forward scatter and side scatter profiles, and doublets were discriminated using forward scatter area versus width and side scatter area versus width. The HSC population was defined as CD34+CD38−. Within the CD34+CD38+ fraction, the common myeloid progenitor cells (CD123+CD45RA−), the granulocyte-macrophage progenitor cells (CD123+CD45RA+) and the megakaryocyte-erythroid progenitor cells (CD123−CD45RA−) were selected. DNA isolation from these cell fractions, followed by DNA amplification, was carried out using the Qiagen REPLI-g single cell kit (Qiagen) according to the manufacturer’s protocol.

Whole-exome sequencing

WES to an average depth of 110 × was performed on sequential BM-MNC (n=43) and PB-MNC samples (n=2) taken at regular time intervals (2 to 8 samples per patient). For all patients, DNA isolated from cultured T cells was used as a constitutive reference to exclude germline variants. Mutations significantly higher in the tumour cells than the T cells were listed as high confidence mutations and taken along in our analysis. In both, UPN02 and UPN03 one mutation was clearly affecting the T cells (VAF 19% and 24% respectively, see Supplementary Data 1), but in both cases the VAF was significantly higher in the tumour sample. Furthermore, for five patients DNA was available from cultured MSCs and used as additional germline control to ensure that no variants acquired in multipotent HSCs (and therefore also affecting T cells36) were incorrectly marked as germline variants and excluded. No MDS-associated mutations were found in the T cells of these five patients (Supplementary Table 4), indicating that the T cells were not part of the malignant clone.

Exome capture was performed using SureSelect Human All Exon V5 (Agilent Technologies, Santa Clara, CA, USA). Enriched exome fragments were then subjected to massively parallel sequencing using the HiSeq 2500 platform (Illumina, San Diego, CA, USA). Sequence alignment and mutation calling were performed using our in-house pipelines, as previously described37, with minor modifications. Candidate mutations with (1) Fisher’s exact P≤0.001 and (2) a VAF in tumour samples ≥0.07 (to reduce false positive mutation calls) were selected. These variants were further filtered by excluding (1) synonymous SNVs, (2) SNVs in genes whose structure is not correctly annotated (complete open reading frame information is not available) and (3) SNVs listed as SNPs in the 1000 Genomes Project database (Nov 2010 release), dbSNP131 or our in-house SNP database. High-density SNP arrays were performed on DNA extracted from BM cells at several time points, allowing to correct VAFs for local copy number variations.

Targeted deep sequencing using gene panels

For one patient we analysed 2 samples collected after allogeneic stem cell transplantation using SureSelect (Agilent)-based targeted-capture sequencing for 72 known MDS driver genes (Supplementary Table 2). Mutation calling was performed as previously described3, with minor modifications. Germline SNVs were removed using WES data of paired germline control samples. Finally, we selected only mutations considered to be definitely oncogenic2. In addition, we used a myeloid gene panel (Trusight, Illumina) (Supplementary Table 1) to screen for driver mutations in unrelated clones.

Targeted amplicon-based deep sequencing

The candidate somatic variants detected by WES were validated and quantified by amplicon-based deep sequencing on an Ion Torrent Personal Genome Machine (Thermo Fisher Scientific) at high depth (aim 10,000 × coverage). Using this approach, mutational burdens were measured in all available PB and BM samples for each patient (Supplementary Data 3). Fragments with lengths of ∼200 base pairs were amplified in two consecutive PCR reactions, PCR1 and PCR2, both of which were performed using Q5 Hot Start High-Fidelity Master Mix (New England Biolabs, Ipswich, MA, USA) according to the manufacturer’s protocol. In PCR1, the target fragments were amplified and tagged with common sequence (CS)-tags (designed by Fluidigm, South San Francisco, CA, USA). For this purpose, sequence-specific primers were designed to obtain PCR fragments of ∼200 base pairs. CS-tags were attached to these primers (see Supplementary Fig. 17 for primer strategy and Supplementary Tables 5 and 6 and Supplementary Data 4 for primer sequences). Depending on the primer pair, the best of three optimized touchdown PCR protocols was used (see Supplementary Table 7). In PCR2, primers containing a CS-tag, a barcode and an adapter (see Supplementary Fig. 17 for primer strategy and Supplementary Tables 5 and 6, and Supplementary Data 4 for primer sequences), were used to label the PCR fragments with a sample-specific Ion Xpress barcode (designed by Thermo Fisher Scientific) and add the adapters required for emulsion PCR. The second PCR was performed twice, once with the A adapter attached to the forward primer and the truncated P1 (trP1) adapter to the reverse primer (PCR2-A) and vice versa (PCR2-B), making bidirectional sequencing possible. For the PCR protocol for PCR2 see Supplementary Table 8. Subsequently, PCR products were pooled and purified with Agencourt AMPure XP beads (Beckman-Coulter, Fullerton, CA, USA) to eliminate primer dimers. After purification, the purity of the pool (based on expected fragment size) was measured on the Agilent 2200 TapeStation (Agilent Technologies) using the high-sensitivity D1000 ScreenTape assay (Agilent). The purified pool was diluted to 3 pg μl−1 and loaded onto the Ion OneTouch system (Thermo Fisher Scientific) for emulsion PCR using the Ion PGM Template OT2 200 kit (Thermo Fisher Scientific), followed by an enrichment for loaded Ion Sphere Particles (ISPs). The quality of the enriched ISPs was checked with the Ion Sphere Quality Control Kit (Thermo Fisher Scientific) on the Qubit Fluorometer (Thermo Fisher Scientific). Subsequently, the ISPs were loaded onto an Ion 314, 316 or 318 v2 Chip (Thermo Fisher Scientific) and sequenced using the Ion PGM Sequencing kit v2 (Thermo Fisher Scientific) on the Ion Torrent Personal Genome Machine system (Thermo Fisher Scientific). All steps were performed according to the manufacturer’s protocols. The sequencing data were mapped to the GRCh37 (hg19) reference genome build and variants were called with the SeqNext module of the Sequence Pilot software, version 4.2.2 (JSI Medical Systems, Ettenheim, Germany). Besides the automatic calling of variants, all locations wherein variants were detected by WES were manually inspected. A mutation was marked as validated by targeted deep sequencing when detected in the tumour sample (which was also used for WES) with a higher VAF than in the germline sample (at least 5% difference). The median validation rate per patient was 66.7%. Most mutations that could not be validated were mutations detected by WES in an amplified DNA sample (mainly insertions or deletions of a C or G), or mutations in genes that have a highly identical family member (likely incorrect mapping of WES reads). To determine an optimal cutoff VAF to discriminate true mutations from sequencing noise, we determined the sensitivity and specificity of Ion Torrent targeted deep sequencing. When we analysed the presence of 8 different mutations in 10 healthy donors, a VAF cutoff of 0.2% resulted in a specificity of 100% (Supplementary Table 9). In addition, we made a dilution series of 3 different SNPs and observed that a VAF of 0.1% could still accurately be detected (Supplementary Table 10). Based on this, we used a cutoff of 0.2%, which means 20/10,000 reads should harbour the mutation. In addition, the mutated base had to be the second highest base at the investigated position. This ensures that also in a more difficult sequence context the mutation exceeds the sequencing noise. In addition, a FLT3-ITD mutation was detected using fragment length analysis.

Microarray-based genomic profiling (SNP array)

Microarray-based genomic profiling was carried out using the CytoScan HD array platform (Affymetrix, Inc., Santa Clara, CA, USA). Hybridizations were performed according to the manufacturer’s protocols. The data were analysed using the Chromosome Analysis Suite software package (Affymetrix), using the annotations of reference genome build GRCh37 (hg19). For a comprehensive analysis of the microarray-based genomic profiling data, we used a previously developed filtering pipeline. The interpretation was performed using criteria adapted from Simons et al.38 and Schoumans et al.39 First, all aberrations affecting segments larger than 5 Mb (resolution of conventional karyotyping), regardless of gene content, were denoted as true aberrations. In addition, all aberrations affecting segments smaller than 5 Mb that coincided with known cancer genes (http://cancer.sanger.ac.uk/cancergenome/projects/census/, date of accession November 2012) were included. Since paired control DNA was not used, alterations that coincided with established normal genomic variants were excluded. For this approach, we used the publicly available ‘Database of Genomic Variants’ (http://projects.tcag.ca/variation) and, in addition, in-house databases of copy number alterations (CNAs) detected in ∼1,000 healthy individuals studied with the CytoScan HD platform. Regions of copy-neutral loss of heterozygosity, also known as acquired uniparental disomy, were only considered if they were >10 Mb in size and if they extended towards the telomeres of the involved chromosomes, as reported by Heinrichs et al.40 Finally, focal CNAs in the immunoglobulin and T-cell receptor genes were excluded from this study, as these CNAs generally represent the rearranged T-cell receptor and immunoglobulin genes present in the PB lymphocytes of the normal reference samples. All the data were also visually inspected to define alterations present in smaller proportions of cells and to eliminate alterations reported in regions with low probe density. Only aberrations fulfilling the above criteria were included in the genomic profiles and were described according to the standardized ISCN 2013 nomenclature system34.

Reconstructing clonal composition and evolution patterns

Various software tools were tested to analyse clonal composition and evolution. However, different programs yielded different results, and close manual inspection showed imperfections in the patterns generated by all tested programs. Therefore, we constructed the clonal evolution patterns based on VAFs of all detected mutations at all time points, and included information from karyotyping, FISH and SNP arrays. For clonal reconstruction, all variants detected with a VAF of ≥0.2% were considered. Mutations were clustered based on the VAFs (corrected for ploidy) from all sequenced samples (PB and BM) at all different time points. The sequential order of mutational events and the most probable clonal evolution pattern were derived from these mutation clusters and their behaviour in time.

In UPN05, the clonal evolution pattern was calculated for the mononuclear myeloid cell fraction, rather than for the total BM-MNC fraction, as this patient developed bone marrow fibrosis and PB lymphocytosis, resulting in noncomparable sampling before and during treatment with romiplostim. In all other patients, lymphocyte counts were stable over time.

Data availability

Sequencing data (fastq files) of all 11 patients have been deposited into the NCBI Sequence Read Archive under accession number SRP094064. All other remaining data are available within the Article and Supplementary Files, or available from the authors on request.