When directly comparing the cfDNA size profiles of cancer patient cfDNA extracts obtained by SSP-S and DSP-S, we found a marked discrepancy. The population corresponding to the size of DNA wrapped around a mononucleosome (120–220-bp range, peaking at 167 bp) was observed with both methods. Such a profile is analogous to those described in several reports analyzing cfDNA11,12,27 and in the plasma of pregnant women or organ transplant recipients,6 suggesting DNA fragmentation during apoptosis.9,17 However, SSP-S revealed a substantial cfDNA fragment population ranging from 30 to 130 nt, especially from 30 to 80 nt, which was not detectable using the DSP library. cfDNA appeared more accessible for sequencing following SSP-S, as previously reported by our group27 and then by Burnham et al.28 It should be noted that the degree of depletion of short DNA molecules by double-stranded library preparation can differ across studies, possibly because of differences in adapter clean-up steps. Fragments shorter than 100 nt are abundant in cancer patient-derived plasma, but conventional DSP-S methods appear insensitive to ultra-short cfDNA, emphasizing the need to use SSP-S for optimally examining cfDNA profiles. The SSP library method was recently described and used to generate high-resolution genomes from ancient DNA in paleontological samples.25,26 This method uses a single-strand DNA ligase and a 5′-phosphorylated, biotinylated adapter oligonucleotide to capture single-stranded DNA molecules and bind them to beads without prior end repair.25 A primer annealed to the ligated adapter is then extended to generate dsDNA, which subsequently receives a second adapter via blunt-end ligation. Finally, the library strands are released by heat denaturation, and the adapter sequences are completed through an amplification reaction.
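To make the SSP-S versus DSP-S comparison concrete, the sketch below tallies what fraction of fragments falls into the size ranges discussed above (30–80 nt, 30–130 nt, and the 120–220 mononucleosomal range) from a list of per-read fragment lengths. The read lengths are invented toy values, not data from this study, and the helper name `size_fractions` is hypothetical; in practice the lengths would come from the aligned reads of each library.

```python
from typing import Dict, List, Tuple

# Size windows (inclusive, in nt or bp) taken from the text; labels are illustrative.
WINDOWS: Dict[str, Tuple[int, int]] = {
    "ultra-short (30-80)": (30, 80),
    "short (30-130)": (30, 130),
    "mononucleosome (120-220)": (120, 220),
}


def size_fractions(lengths: List[int]) -> Dict[str, float]:
    """Fraction of fragments whose length lies inside each window.

    The windows overlap, so the fractions do not sum to 1.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("no fragments")
    return {
        name: sum(lo <= n <= hi for n in lengths) / total
        for name, (lo, hi) in WINDOWS.items()
    }


# Hypothetical read lengths, for illustration only (not study data).
ssp_like = [35, 42, 51, 60, 72, 95, 118, 145, 166, 167]
dsp_like = [110, 132, 150, 160, 166, 167, 168, 175, 190, 320]

for label, reads in (("SSP-S", ssp_like), ("DSP-S", dsp_like)):
    print(label, size_fractions(reads))
```

With these toy inputs, half of the SSP-like reads fall in the 30–80 window while none of the DSP-like reads do, mirroring the qualitative pattern described in the text.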

SSP-S offers several advantages over DSP-S for the detection of cfDNA. Of the factors below, the two most important seem to be the different ligases used by each protocol and the denaturing step in the single-strand preparation. First, CircLigase, used in the SSP, is more efficient on short fragments than on longer ones, and is almost certainly more efficient than T4 DNA ligase for short cfDNA fragments. Second, damaged fragments, for example those with nicks or abasic sites, are likely to be lost during DSP but retained during SSP; SSP is therefore likely to capture damaged molecules in samples that contain DNA damage, such as cfDNA. DNA molecules with single-strand breaks on one or both strands may be present in cfDNA (Fig. 3). Whereas such molecules are lost in the DSP procedure, in SSP they break down into several single-stranded fragments during heat denaturation, and each fragment has an independent chance of being recovered in the library.26 Third, because a biotinylated adapter is ligated to the cfDNA at the first step, all subsequent SSP reaction steps are performed while the DNA is tightly bound to streptavidin-coated beads.26 This avoids the loss of molecules during the DNA purification steps using silica spin columns or carboxylated beads that are integral parts of DSP methods.
Fourth, DSP requires multiple bead-based, size-selective steps to eliminate unwanted adapter-dimer products, and these steps also eliminate shorter fragments, whereas SSP does not require such size selection.28 Consequently, SSP libraries may contain a larger fraction of short molecules than libraries produced by the double-strand method, as demonstrated by Bennett et al.25 They observed that SSP recovered a higher proportion of mapped reads in almost every size bin, although this advantage may decrease with increasing fragment length, a point that remains controversial in the literature.25 Hence, we cannot fully rule out that SSP-S enriches short over longer fragments and thereby biases the representation of the natural size distribution.
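The mechanisms listed above can be illustrated with a toy recovery model. A duplex is described by its length and the nick positions on each strand; under SSP, heat denaturation releases every strand piece as an independent template, whereas the DSP-like rule assumes that any nick makes the duplex unusable and that a bead clean-up removes everything below a hard cut-off. Both rules, the 30-nt and 100-bp thresholds, and the function names are simplifying assumptions for illustration, not measured properties of either protocol.

```python
from typing import List, Tuple

# A duplex: (length_bp, nicks_on_top_strand, nicks_on_bottom_strand).
# Nick positions are counted from each strand's own 5' end; a nick at p
# splits a strand into pieces of p and (length - p) nt.
Duplex = Tuple[int, List[int], List[int]]


def strand_pieces(length: int, nicks: List[int]) -> List[int]:
    """Lengths of the single-stranded pieces one strand falls into."""
    cuts = sorted(set(p for p in nicks if 0 < p < length))
    bounds = [0] + cuts + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def ssp_templates(molecules: List[Duplex], min_nt: int = 30) -> List[int]:
    """Strand pieces released by heat denaturation that are long enough to be
    recovered (min_nt is an assumed lower bound). Each duplex yields pieces
    from both strands."""
    out: List[int] = []
    for length, top, bottom in molecules:
        for piece in strand_pieces(length, top) + strand_pieces(length, bottom):
            if piece >= min_nt:
                out.append(piece)
    return out


def dsp_templates(molecules: List[Duplex], min_bp: int = 100) -> List[int]:
    """Duplexes usable under the assumed DSP-like rules: any nick makes the
    molecule unusable, and a hard min_bp cut-off stands in for the gradual
    size selection of bead clean-ups."""
    return [length for length, top, bottom in molecules
            if not top and not bottom and length >= min_bp]


toy: List[Duplex] = [
    (167, [], []),       # intact mononucleosomal duplex
    (167, [60], [100]),  # nicked on both strands
    (70, [], []),        # short intact duplex
]
print("SSP:", ssp_templates(toy))
print("DSP:", dsp_templates(toy))
```

In this toy set, the nicked 167-bp duplex and the 70-bp duplex contribute templates only under the SSP rules, which is the qualitative asymmetry the paragraph describes.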

Fig. 3 Schematic diagrams of circulating DNA fragmentation from nucleosomes. a Two hypotheses are presented: DNA wrapped around a histone octamer or bound to a transcription factor (TF). Different types of dsDNA fragments are schematically represented and may exhibit nicks. DNA double-strand or single-strand breaks may occur inside or outside both types of DNA/protein complexes, inside or outside cells. Following DNA denaturation (such as during PCR or SSP preparation), the resulting single strands may vary in size from a few nucleotides to a few hundred nucleotides. The lengths given are indicative. They are based on a nucleosome consisting of an octamer of core histone proteins wrapped ~1.65 times by 147 bp of DNA, and on the presence of linker DNA, the non-nucleosomal DNA connecting two or more nucleosomes in an array, whose length ranges between 20 and 90 bp and varies among species and tissues. ssDNA fragments produced by SSP or Q-PCR are subsequently replicated and then sequenced or quantified by SSP-S or Q-PCR, respectively. b cfDNA structures and fragmentation in relation to the size profile determined by SSP-S, illustrated with cfDNA extracted from patient IC17. Three fractions of the size profile can be approximately distinguished in light of our observations and other works:12,13,16,27 blue curve, DNA fragments originating from cfDNA packed within mononucleosomes without any intranucleosomal nicks, revealed by both SSP-S and DSP-S; green curve, DNA fragments originating from cfDNA packed within mononucleosomes exhibiting nuclease nicks, or within TFs without any nicks, observed by both SSP-S and DSP-S; and black curve, DNA fragments originating from cfDNA packed within mononucleosomes with more nicks, or within TFs with nicks, which are only observed by SSP-S.

Our blinded study using a Q-PCR method18,20 on the same DNA extracts used for SSP-S showed a fractional size distribution strikingly similar to that obtained using SSP-S. We previously showed with the Q-PCR method30 that cfDNA fragments are significantly shorter than assumed by the conventional paradigm, in which the smallest size is that of the DNA sequence wrapped around a single histone octamer (147–200 bp, 180 bp mean).18,20 Here, direct comparison of SSP-S and Q-PCR analyses showed a high proportion of cfDNA fragments shorter than 145 nt (~145 bp corresponds to the DNA of a mononucleosome, 167 bp, minus ~20 bp of linker DNA). Q-PCR is a robust and validated technique, but a few reports have questioned its efficiency and variability in quantifying short DNA fragments25,31 (SI-12). In this study, we demonstrated that our Q-PCR-based determination of the fractional size profile does not artificially enrich short over longer fragments.
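The arithmetic behind a Q-PCR fractional size profile can be sketched as follows, assuming the same locus is quantified with amplicons of increasing length: the copy number obtained with a longer amplicon, relative to the shortest one, approximates the proportion of detectable copies long enough to contain that amplicon, and differences between successive ratios give the share falling in each size interval. The amplicon lengths, copy numbers, and function name are hypothetical. Fragments shorter than the shortest amplicon are invisible to this calculation, and converting the ratios into exact length fractions would additionally require a model of where the amplicon sits within each fragment.

```python
from typing import Dict, List, Optional, Tuple

# (lower_bp, upper_bp or None for "no upper bound", fraction of detectable copies)
Interval = Tuple[int, Optional[int], float]


def fractional_profile(copies: Dict[int, float]) -> List[Interval]:
    """Convert copy numbers per amplicon length into interval fractions.

    `copies` maps amplicon length (bp) to the copy number measured for one
    locus. The shortest amplicon is taken as 100% of detectable copies.
    Measurement noise can make an interval slightly negative; this sketch
    does not correct for that.
    """
    lengths = sorted(copies)
    if len(lengths) < 2:
        raise ValueError("need at least two amplicon lengths")
    ref = copies[lengths[0]]
    ratios = [copies[n] / ref for n in lengths]
    profile: List[Interval] = []
    for i in range(len(lengths) - 1):
        profile.append((lengths[i], lengths[i + 1], ratios[i] - ratios[i + 1]))
    profile.append((lengths[-1], None, ratios[-1]))
    return profile


# Hypothetical copy numbers (not study data).
measured = {60: 1000.0, 101: 600.0, 145: 350.0, 300: 20.0}
for lo, hi, frac in fractional_profile(measured):
    print(lo, hi, round(frac, 3))
```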

Naked DNA is very rapidly degraded in the blood circulation: the half-life of intact DNA without a double-strand break has been estimated to be less than a minute.32 Consequently, only cfDNA protected by stable structures can be detected in the bloodstream. Nuclear cfDNA fragmentation reflects the organization of chromatin along the genome, which packs and protects DNA, with the mononucleosome as the smallest unit. At least two key DNA/protein complexes that protect DNA from blood nucleases may be considered: DNA wrapped around a histone octamer, and DNA bound to a transcription factor (TF). Since linker DNA between nucleosomes is vulnerable to digestion, lengths corresponding to one nucleosomal subunit appear to be the most prevalent and conserved, with di- and tri-nucleosomal lengths present in much lower proportions.16 Stable nucleosome-associated structures may be 192 bp (mononucleosome plus linker), 165 bp (trimmed mononucleosome), or 147 bp (core particle: the nucleosome excluding the ~20 bp of DNA bound to the peripheral histone H1; Fig. 3). Trimmed mononucleosome cfDNA-associated structures (165 bp) appear to be preferentially protected, as shown by their prevalence in the cfDNA size profile. As already observed by Chan et al.,19 our data indicate that the fraction of cfDNA fragments longer than the DNA wrapped in a mononucleosome is very minor, as determined by Q-PCR (<1.8%) as well as by SSP-S or DSP-S (<6%). Whereas size distribution analysis by Q-PCR has no comparable upper size limit (it covers DNA over ~40 bp), size profile analysis by DSP-S or SSP-S, as performed here, is limited to fragments under ~1000 bp and thus precludes examination of higher molecular weight cfDNA. Nevertheless, cfDNA quantitation was similar with Q-PCR and SSP-S, suggesting that high molecular weight (over 350 bp) cfDNA is a minor component (~2%) in terms of genome equivalent copy number in cancer patients (Fig. 3).
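The half-life estimate cited above can be made quantitative under the simplifying assumption of first-order decay, in which the intact fraction after time t is 0.5^(t / t½). Taking t½ = 1 min, the upper bound quoted, only about 0.1% of naked molecules would remain intact after 10 min. The function names are illustrative, and real clearance kinetics in blood are more complex than a single exponential.

```python
import math


def surviving_fraction(t_min: float, half_life_min: float = 1.0) -> float:
    """Intact fraction after t_min minutes, assuming first-order (exponential) decay."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(fraction: float, half_life_min: float = 1.0) -> float:
    """Minutes until the intact fraction falls to `fraction` (0 < fraction <= 1)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return half_life_min * math.log(fraction) / math.log(0.5)


print(surviving_fraction(10))  # 0.0009765625
print(round(time_to_fraction(0.01), 2))
```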
Altogether, these results suggest that DNA circulating within di- or oligonucleosomes is a minor component and that little high molecular weight DNA circulates in cancer patients' blood. Note that this holds when stringent pre-analytical protocols are used. We may postulate that a significant amount of high molecular weight DNA in a cancer patient cfDNA extract could indicate contamination with genomic DNA from lysed blood cells, and could serve as a pre-analytical quality-control parameter for cfDNA extracts.
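As a sketch of how this quality-control idea could be operationalized, the function below computes the fraction of fragments above 350 bp (the high molecular weight cut-off used above) and flags an extract when that fraction exceeds a threshold. The 5% threshold, the function name, and the toy extracts are hypothetical placeholders, not validated values. Note that this counts fragments rather than genome equivalents, and that sequencing-based sizing as performed here only covers fragments up to ~1000 bp, so longer genomic DNA would be missed.

```python
from typing import Iterable, Tuple


def hmw_qc(lengths: Iterable[int], cutoff_bp: int = 350,
           max_fraction: float = 0.05) -> Tuple[float, bool]:
    """Return (fraction of fragments longer than cutoff_bp, passes_qc).

    max_fraction is a hypothetical acceptance threshold, not a validated one.
    """
    values = list(lengths)
    if not values:
        raise ValueError("no fragments")
    frac = sum(n > cutoff_bp for n in values) / len(values)
    return frac, frac <= max_fraction


clean = [167] * 98 + [400] * 2          # toy extract: 2% of fragments above 350 bp
contaminated = [167] * 80 + [900] * 20  # toy extract with a large HMW tail
print(hmw_qc(clean), hmw_qc(contaminated))
```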

The presence of cfDNA fragments shorter than 100 bp may be explained by various hypotheses. First, stretches of linker DNA between two nucleosomes (whose length ranges between 20 and 90 bp and varies among species and tissues; Fig. 3) that are protected by bound TFs may be released into the blood circulation as short double-stranded fragments once the DNA at both extremities has been degraded. Second, and more likely, DNA double-strand or single-strand breaks may occur inside or outside both types of DNA/protein complexes, inside or outside cells. Following DNA denaturation during PCR or SSP, the resulting single strands may vary in size. Since we clearly detected short double-stranded DNA and sequenced short fragments by SSP, it is reasonable to assume that the sources of the detected short ssDNA fragments include both short double-stranded cfDNA (<145 bp) and nicked double-stranded cfDNA of larger size. The ~10-nt periodicity within the 41–166-nt range observed using SSP-S demonstrates nucleosome-derived degradation, since this pattern has been attributed to internal nucleosome cleavage at accessible nucleotides that lie farther from the surface of the histone core at each helical turn as DNA wraps around the core.15,33 The periodicity observed by DSP-S below 145 bp, down to 81 bp, might reveal the presence in blood of short double-stranded DNA associated with nucleosomes. Read counts cannot provide a true estimate of the percentage of nicked intranucleosomal cfDNA. However, since both SSP-S and DSP-S show the same peak at 166 bp, we can consider that, at this size, a fraction of cfDNA molecules are free of nicks on at least one strand, as illustrated in Fig. 3. DSP-S analysis showed that cfDNAs are principally associated with histones and that the shortest dsDNA fragment length is approximately 80 bp.
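A ~10-nt periodicity in a size histogram can be checked with a periodogram restricted to the 41–166-nt window mentioned above. The sketch below removes a least-squares linear trend and then scans candidate periods between 5 and 20 nt, returning the one with the largest spectral power. The histogram is synthetic (a linear trend plus a 10-nt cosine), and the function names, scan range, and step are assumptions; real profiles would usually need more careful detrending than a straight line.

```python
import math
from typing import List, Sequence


def detrend(y: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line from y."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (x - x_mean)) for x, v in enumerate(y)]


def dominant_period(counts: Sequence[float], min_p: float = 5.0,
                    max_p: float = 20.0, step: float = 0.1) -> float:
    """Period (in nt) with the largest periodogram power within [min_p, max_p]."""
    y = detrend(counts)
    best_p, best_power = min_p, -1.0
    for k in range(int(round((max_p - min_p) / step)) + 1):
        p = min_p + k * step
        w = 2 * math.pi / p
        re = sum(v * math.cos(w * i) for i, v in enumerate(y))
        im = sum(v * math.sin(w * i) for i, v in enumerate(y))
        power = re * re + im * im
        if power > best_power:
            best_p, best_power = p, power
    return best_p


# Synthetic histogram over 41-166 nt: linear trend plus a 10-nt oscillation.
sizes = list(range(41, 167))
synthetic = [100 + 0.5 * (s - 41) + 20 * math.cos(2 * math.pi * s / 10) for s in sizes]
print(round(dominant_period(synthetic), 1))
```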
SSP-S analysis, on the other hand, showed that the detected single-strand cfDNA fragments shorter than the DNA of a mononucleosome core (<145 nt) were initially mostly associated with histones. This suggests that short histone-free fragments constitute at most a minor fraction. Since histone association implies dsDNA secondary structure, our data suggest that there is negligible single-stranded DNA circulating in blood.

Our previous observations by atomic force microscopy (AFM) support the existence of short double-stranded cfDNA structures, as a significant proportion of cfDNA from cancer patients ranged between 100 and 145 bp.8 In addition, previous studies of DNase I cleavage patterns identified two dominant classes of fragments: longer fragments associated with cleavage between nucleosomes, and shorter fragments associated with cleavage adjacent to TF-binding sites.34 By generating maps of genome-wide in vivo nucleosome occupancy, we previously found that short cfDNA fragments (35–80 bp) harbor footprints of TFs.27 Although additional observations are needed to estimate the proportion or significance of TF-associated cfDNA, closer scrutiny of these short cfDNA fragments might provide new diagnostic potential based on TF presence. Note that gel electrophoresis assays for cfDNA sizing, which detect only dsDNA, have never shown cfDNA fragment lengths peaking below 180 bp.1
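Occupancy maps of this kind are commonly built from fragment endpoints. One published scheme is a windowed protection score (WPS): for each position, the number of fragments that completely span a window centered on it minus the number of fragments with an endpoint inside that window, so protected sites score high and cleavage sites score low. The sketch below computes a WPS-style score restricted to short fragments (35–80 bp) with a small window. The 16-bp window, the strict spanning rule, the toy fragments, and the function name are assumptions for illustration and do not reproduce any specific pipeline.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end): 0-based, end exclusive


def wps(fragments: List[Fragment], region_len: int, window: int = 16,
        min_len: int = 35, max_len: int = 80) -> List[int]:
    """WPS-style protection score at each position 0..region_len-1.

    Only fragments with min_len <= length <= max_len are used. A fragment
    "spans" the window [w_start, w_end) if it starts before it and ends
    after it, so spanning fragments never have an endpoint inside it.
    """
    half = window // 2
    frags = [(s, e) for s, e in fragments if min_len <= e - s <= max_len]
    scores: List[int] = []
    for pos in range(region_len):
        w_start, w_end = pos - half, pos + half
        spanning = sum(s < w_start and e > w_end for s, e in frags)
        ends_inside = sum(
            (w_start <= s < w_end) or (w_start <= e - 1 < w_end) for s, e in frags
        )
        scores.append(spanning - ends_inside)
    return scores


# Toy data: five short fragments protecting [30, 80) plus one long fragment
# that the short-fragment filter ignores.
toy_frags = [(30, 80)] * 5 + [(0, 150)]
scores = wps(toy_frags, region_len=100)
print(scores[55], scores[30], scores[5])
```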

All of these factors contribute to the significant discrepancy found in the size profiles, and indicate that cfDNA structures are highly diverse, ranging from tightly packed long dsDNA, mononucleosomes or oligonucleosomes, heminucleosomes, short TF-bound dsDNA, long and short microparticle-associated DNA, and short lipoproteonucleic complexes, to associations with cells or cell parts. These structures would all be subject to endonuclease and exonuclease degradation as soon as they are released from cells into the blood circulation. Our data show that nucleosomal structures are among the least degradable cfDNA structures, followed, to a lesser extent, by TF-associated cfDNA. Apoptosis might appear to be the main source of cfDNA; however, short nucleosomal structures could also result from progressive nuclease degradation of larger cfDNA originating from necrosis, phagocytosis, microparticle-contained DNA, or active release from lymphocytes.1 cfDNA structural diversity therefore results from different biological phenomena: various cellular release mechanisms, dynamic degradation of nucleic acids or proteins in the circulation, and potential association with blood constituents.

We recently demonstrated that deep sequencing of cfDNA to map genome-wide in vivo nucleosome occupancy may reveal its tissues of origin.27 The diversity of its structures and mechanisms of origin shows that cfDNA is a complex entity, and sizing after cfDNA extraction cannot fully characterize these structures. Nevertheless, we may hypothesize that the level of fragmentation varies with cfDNA origin (mitochondrial, nuclear, tumor or healthy cells, lymphocytes, tumor microenvironment cells, metastatic cells, etc.) and that sizing information may be key to accurately detecting and quantifying cfDNA. In light of this assumption, optimal detection and discrimination of cell-free DNA collected from other body constituents may rely on sizes specific to their origin.

This study shows that many more cfDNA copies may be readily recovered by selecting or targeting short single-strand fragments, thereby providing higher sensitivity when detecting genetic or epigenetic alterations in cancer patient plasma. Based on our initial observation of the cfDNA size distribution18 and the resulting need to target short DNA sequences (50–80 bp) for optimal detection by Q-PCR, this strategy was first applied to an allele-specific Q-PCR method with a blocker (IntPlex), which demonstrated very high sensitivity,35,36 and subsequently taken into account when setting up other PCR-based methods, such as single-locus Q-PCR,22 BEAMing,21 or dPCR,23 or sequencing and selective amplification.37,38 Since cfDNA fragments, and especially mutant cfDNA in cancer patients, may be poorly represented in blood, optimal recovery of cfDNA is required for its analysis. Several reports have shown that SSP-S appears better suited than conventional DSP-S for obtaining an optimal analytical signal. In contrast, Moser et al. did not observe a preferential enrichment of circulating DNA.39 Thus, it remains debatable whether SSP will definitively improve quantification performance, and whether a shift towards SSP-S for analyzing cfDNA in clinical practice is warranted.
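Why shorter targets recover more copies can be shown with a simple coverage model (an assumption made here, not the method used in the cited studies): if fragment start positions are uniformly distributed relative to a target, a fragment of length F fully contains an L-bp amplicon target in F − L + 1 of its placements (none when F < L), out of F placements that cover any given single base of the locus. Summing over a fragment length distribution gives the fraction of per-locus copies amplifiable with that amplicon. The fragment lengths and function name below are hypothetical.

```python
from typing import Iterable


def amplifiable_fraction(fragment_lengths: Iterable[int], amplicon_len: int) -> float:
    """Fraction of per-locus copies that fully contain an amplicon_len-bp target.

    Assumes uniformly distributed fragment start positions: a fragment of
    length F contributes max(0, F - L + 1) full-coverage placements out of
    the F placements that cover a given single base.
    """
    if amplicon_len < 1:
        raise ValueError("amplicon length must be >= 1")
    lengths = list(fragment_lengths)
    covering_base = sum(lengths)
    if covering_base == 0:
        raise ValueError("no fragments")
    covering_target = sum(max(0, f - amplicon_len + 1) for f in lengths)
    return covering_target / covering_base


toy_lengths = [60, 100, 145, 167, 167, 167]  # hypothetical fragment lengths
for L in (50, 80, 145, 300):
    print(L, round(amplifiable_fraction(toy_lengths, L), 3))
```

Under this model, the amplifiable fraction falls steadily as the amplicon lengthens and reaches zero once the amplicon exceeds the longest fragment, which is consistent with the rationale for the 50–80-bp targets described above.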

As well as improving cfDNA recovery for optimal detection, knowledge of fragment size may also enable subject stratification. Using a Q-PCR-based method, we reported that total cfDNA18 and mutant cfDNA20 of cancer patients are more fragmented than the cfDNA of healthy individuals and wild-type cfDNA, respectively. The presence of more fragmented DNA molecules in cancer patients was further documented in another study using sequencing technology based on double-stranded library preparation.20

The limitations of this study are detailed in the supplementary notes. Briefly, we did not examine mitochondria-derived cfDNA or cancer stages other than stage IV, and the extraction procedure may introduce bias. Moreover, the main potential theoretical factors that might contribute to the difference between % SSP-S and % Q-PCR lie in the analytical size windows of the methods used here: our ultra-deep-sequencing method spans ~30 to ~1000 bp, and our Q-PCR method covers fragments over 60 bp. Consequently, our comparison should be restricted to the 60 to ~1000 bp fragment size range. In addition, the study could not determine whether fragment size profiles in cfDNA are associated with tissue types and cancer types, as previously reported.27,40,41,42 Furthermore, this study focused only on cancer patient plasma, and its observations should not automatically be applied to plasma from healthy individuals. Although previous ultra-deep-sequencing analyses showed roughly similar size distribution patterns in healthy and cancer plasma, other works reported various distinguishing characteristics.11,18,20,41,42 We cannot rule out that in-depth scrutiny of size profiles may allow clear-cut discrimination between healthy individuals and cancer patients.
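Because the two methods cover different size windows, their percentages are only comparable after both profiles are restricted to the shared range (here 60 to ~1000 bp) and renormalized. A minimal sketch, assuming each profile is given as counts or fractions per size bin; the bin edges, values, and function name are hypothetical, and bins that straddle a window boundary are simply dropped rather than split.

```python
from typing import Dict, Tuple

Bin = Tuple[int, int]  # [lo, hi) in bp


def restrict_and_normalize(profile: Dict[Bin, float], lo: int = 60,
                           hi: int = 1000) -> Dict[Bin, float]:
    """Keep bins lying fully inside [lo, hi) and rescale them to sum to 1."""
    kept = {b: c for b, c in profile.items() if b[0] >= lo and b[1] <= hi}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no counts in the shared window")
    return {b: c / total for b, c in kept.items()}


# Hypothetical profiles: sequencing reaches down to 30 bp, Q-PCR starts at 60 bp.
ssp = {(30, 60): 250.0, (60, 145): 300.0, (145, 1000): 450.0}
qpcr = {(60, 145): 0.45, (145, 1000): 0.55}
print(restrict_and_normalize(ssp))
print(restrict_and_normalize(qpcr))
```

In this toy case, the sequencing profile's 60–145-bp share rises from 30% to 40% once the sub-60-bp bin, which Q-PCR cannot see, is excluded.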

In conclusion, this study confirms the crucial importance of examining the structural features of any analyte circulating in blood, in particular with regard to its association with hetero-compounds. We compared DSP, SSP, and Q-PCR analyses in a blinded study to update and integrate previous knowledge of cfDNA size profiles. The fragment length distribution of cfDNA extracted from the plasma of cancer patients was very similar with the SSP-S and Q-PCR methods, which both rely on the analysis of single-stranded DNA as the initial matrix. Both approaches were clearly effective in optimally measuring cfDNA copy number, because a substantial fraction of the cfDNA found by these methods consisted of short fragments that are not readily detectable by standard DSP protocols. We also observed that most of the detectable cfDNA in blood, as well as most of the shortest cfDNA fragments (down to ~40 nt), carry the footprint of a nucleosome, which appears to be the most stabilizing structure for DNA in the circulation. We conclude that cellular DNA, initially packaged in chromatin, is released into the extracellular compartment by different biological phenomena in various structures, which undergo degradation down to nucleosomes or, to a lesser extent, TF-associated subcomplexes, through continuous, dynamic internucleosomal and intranucleosomal nuclease activity. Thus, detectable cfDNA is mostly composed of a complex mixture of DNA that is highly degraded with regard to its primary, secondary, or tertiary structures. As sensitivity is clearly a limitation of cfDNA applications, delineating the structural features of cfDNA may help in adapting optimal analytical approaches to study cancer progression or tumor biology.