Significance Traditional microscopy is based on the propagation of interactions between light and small-scale objects up to larger scales. Such information may be encoded in DNA and transmitted with next-gen sequencing to be later reconstructed and visualized computationally. We provide a mathematical framework and computational proof of concept for a form of DNA-sequencing–based microscopy that may be used to construct whole images without the use of optics. Such an approach can be automated in a parallel and multiplexable way that current optical and scanning-based techniques are unable to achieve.

Abstract We describe a method whereby microscale spatial information such as the relative positions of biomolecules on a surface can be transferred to a sequence-based format and reconstructed into images without conventional optics. Barcoded DNA “polymerase colony” (polony) amplification techniques enable one to distinguish specific locations of a surface by their sequence. Image formation is based on pairwise fusion of uniquely tagged and spatially adjacent polonies. The network of polonies connected by shared borders forms a graph whose topology can be reconstructed from pairs of barcodes fused during a polony cross-linking phase, the sequences of which are determined by recovery from the surface and next-generation (next-gen) sequencing. We developed a mathematical and computational framework for this principle called polony adjacency reconstruction for spatial inference and topology and show that Euclidean spatial data may be stored and transmitted in the form of graph topology. Images are formed by transferring molecular information from a surface of interest, which we demonstrated in silico by reconstructing images formed from stochastic transfer of hypothetical molecular markers. The theory developed here could serve as a basis for an automated, multiplexable, and potentially superresolution imaging method based purely on molecular information.

Microscopic imaging has traditionally relied on optics to amplify signals derived from initially confined spatial regions. Exceptions include atomic force microscopy, which forms images by using a probe to interact with the sample. DNA has a high information density, with storage levels of 5.5 petabits per cubic millimeter achieved (1), making it an attractive medium for encoding spatial information at microscales. In this paper, we present a theoretical foundation for a spatial information encoding approach that uses DNA sequencing and graph theory and could be used to generate whole images.

DNA-driven reactions can be coupled to optically acquired spatial information, as in the proximity ligation assay (PLA) (2) and DNA-PAINT (3), where molecular interactions mediated by DNA are detected using fluorescence. There is also a family of techniques for connecting spatial locations with single-cell RNA sequencing data. These use a priori knowledge of spatial marker genes to assign unknown genes to approximate locations, the a priori data being in most cases obtained by microscopy, such as with in situ hybridization, or modeling of spatial expression patterns to retrieve locations of associated genes (4–9). Alternatively, direct microscopy-based in situ sequencing methods achieve precise context-sensitive spatial transcriptomic information without needing to scramble spatial data by dissociation prior to sequencing (10, 11).

A major challenge is to encode spatial information so that it survives the scrambling that occurs during isolation and recovery from in situ contexts and can then be read out by sequencing. A few techniques achieve this by encoding spatial information directly into a molecular format, e.g., in the form of DNA read during sequencing along with transcriptomic data. These methods are based on artificial generation of an addressable surface using printing or lithography (12–14).

Herein, we describe a computational framework for a method called polymerase colony (polony) adjacency reconstruction for spatial inference and topology (PARSIFT). Its purpose is to encode images, for example of the positions of specific molecules relative to others on a 2D plane, directly into a DNA-based format without transduction of information through any other medium and without a priori surface addressing. PARSIFT uses the connectivity of vertices in a graph of paired DNA sequences to infer Euclidean spatial adjacency, and next-generation (next-gen) sequencing to recover that information a posteriori.
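One simple way to see how graph connectivity can stand in for spatial separation is to count the number of edges on the shortest path between two barcodes. The sketch below is illustrative only: the function name `hop_distances` and the labels `P0` to `P4` are hypothetical, the labels stand in for barcode sequences, and hop counts are only a crude proxy for distance, not the reconstruction procedure itself.

```python
from collections import deque

def hop_distances(pairs):
    """All-pairs shortest-path hop counts in the graph defined by barcode pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {}
    for source in graph:
        # Breadth-first search from each vertex in turn.
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        dist[source] = seen
    return dist

# Hypothetical sequenced pairs from five polonies lying in a row
# (P0-P1-P2-P3-P4), listed in scrambled order as sequencing would return them.
pairs = [("P2", "P3"), ("P0", "P1"), ("P3", "P4"), ("P1", "P2")]
d = hop_distances(pairs)
print(d["P0"]["P4"])  # 4
```

Polonies at opposite ends of the row are four hops apart even though the pairs were supplied in arbitrary order, which is the sense in which topology retains information about relative position.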

Encoding of topological data in DNA sequence format is possible by using DNA barcodes (unique molecular identifiers), i.e., randomized stretches of bases within a sequence of synthetic DNA. Barcodes associated with spatial patches can establish an identity for those locations, each patch distinguishable from another by sequence. A DNA barcode with 10 bases has over 1 million possible sequences (4^10 = 1,048,576), and longer barcodes can be used to create effectively unique labels in a system. The basic unit of topological data is an edge, i.e., an association between 2 adjacent patches formed by physically linking their barcodes. Topological mapping with barcoding has been used to infer neural connectomes, by building a network from cells sharing common barcodes left by cell-traversing viruses (15), as well as features of DNA origami (16).
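The barcode arithmetic above can be checked with a short calculation. The sketch below assumes that barcodes are drawn independently and uniformly from the 4 bases, which real synthesis only approximates, and uses the standard birthday-problem approximation to estimate how likely it is that two polonies share a barcode. The function names and the figure of 10,000 polonies are illustrative choices.

```python
import math

def barcode_space(length, alphabet=4):
    """Number of distinct sequences for a randomized barcode of the given length."""
    return alphabet ** length

def collision_probability(n_labels, length):
    """Birthday-problem approximation of the chance that at least two of
    n_labels uniformly random barcodes are identical."""
    space = barcode_space(length)
    # 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_labels * (n_labels - 1) / (2 * space))

print(barcode_space(10))  # 1048576
for length in (10, 20, 30):
    print(length, collision_probability(10_000, length))
```

Under these assumptions, 10-base barcodes would almost certainly collide among 10,000 polonies, whereas 20- and 30-base barcodes make a collision very unlikely, which is the sense in which longer barcodes become effectively unique.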

We can barcode surface patches using polony generation methods like bridge amplification (17), a 2-primer rolling-circle amplification (18), template walking amplification (19), or packing of barcoded beads (20). Unique “seed” strands are captured by primer strands on the surface (Fig. 1A) and locally amplified in the immediate vicinity where they landed. This generates numerous distinct patches, or “polonies,” of amplified DNA (Fig. 1B). Within each, all DNA is derived from a single seed molecule. Any of the above techniques could be applied to our method, although we focus herein on the polony-amplification by surface-primers approach.

Fig. 1. Encoding and recovering metrics through polony adjacency. (A) Seed molecules with unique barcode sequences land randomly on a surface of primers. (B) Local amplification of seed molecules produces sequence-distinct polonies. (C) Saturation of polonies occurs when polonies are blocked from further growth by encountering adjacent polonies, forming a tessellated surface. (D) Random cross-linking of adjacent strands leads to pairwise association of nearby barcodes. (E) Recovery and sequencing of barcode pairs enable reconstruction of a network with similar relative positions of polonies to those on the original surface.

By growing polonies on a surface of primers to saturation (Fig. 1C), i.e., when growing polonies encounter the boundaries of other adjacent polonies, a tessellation of neighboring polonies forms. Each polony has a limited number of immediately adjacent neighboring polonies with their own respective barcodes. Although each patch is associated with a unique sequence according to its parent seed molecule, isolation of this DNA and subsequent sequencing would scramble information about the polony’s position and its neighboring polonies. Thus the critical step is to cross-link strands (SI Appendix, Fig. S1) from each polony to strands from adjacent polonies (Fig. 1D) in a way that enables both barcodes to be sequenced together in a single read. Recovery of the strands, i.e., stripping them from the surface followed by next-gen sequencing (by any means including nonoptical approaches such as Oxford Nanopore), thus preserves topological association between neighboring polonies as pairs of barcodes—a complete set of which constitutes the whole topological network of adjacent polonies (Fig. 1E). For random seed distributions we show that topological information alone, constrained by being a 2D planar network with known boundary geometry, retains significant spatial metrics of the original distribution. By generating such a mappable surface, we propose that localization of molecules bound to the surface can be done by covalent association with polonies, enabling inference of molecular spatial distributions and construction of images with polonies as pixels.
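The encoding and recovery steps described above can be mimicked with a toy simulation. This is a minimal sketch under strong simplifying assumptions, not the authors' pipeline: the surface is a small square lattice, every site joins the polony of its nearest seed (an idealized picture of growth to saturation), each pair of touching polonies yields exactly one barcode pair, and no reads are lost or erroneous. All function names and parameters are hypothetical.

```python
import random
from collections import deque

def simulate_reads(n_seeds=30, grid=60, barcode_len=12, rng_seed=0):
    """Return (seeds, reads): seed positions keyed by barcode, and the
    scrambled list of barcode pairs from adjacent polonies."""
    rng = random.Random(rng_seed)
    sites = rng.sample([(x, y) for x in range(grid) for y in range(grid)], n_seeds)
    barcodes = set()
    while len(barcodes) < n_seeds:  # redraw until all barcodes are unique
        barcodes.add("".join(rng.choice("ACGT") for _ in range(barcode_len)))
    seeds = dict(zip(sorted(barcodes), sites))

    def owner(x, y):
        # Saturated growth idealized as nearest-seed assignment.
        return min(seeds, key=lambda b: (seeds[b][0] - x) ** 2 + (seeds[b][1] - y) ** 2)

    label = [[owner(x, y) for y in range(grid)] for x in range(grid)]
    pairs = set()
    for x in range(grid):
        for y in range(grid):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < grid and ny < grid and label[x][y] != label[nx][ny]:
                    pairs.add(tuple(sorted((label[x][y], label[nx][ny]))))
    reads = sorted(pairs)
    rng.shuffle(reads)  # sequencing returns pairs with no positional order
    return seeds, reads

def build_graph(reads):
    """Reassemble the adjacency network from unordered barcode pairs."""
    graph = {}
    for a, b in reads:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def is_connected(graph):
    """Check that every barcode is reachable from an arbitrary start barcode."""
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(graph)

seeds, reads = simulate_reads()
graph = build_graph(reads)
print(len(graph), "polonies,", len(reads), "adjacent pairs, connected:", is_connected(graph))
```

The positions in `seeds` are never passed to `build_graph`; only the shuffled pairs are, so the network is recovered from sequence pairs alone. In this idealized setting the graph is always connected, whereas missing or noisy reads in an experiment could leave disjoint subgraphs.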

Conclusion The 3 reconstruction methods (Tutte embedding, spring relaxation, and topological distance matrix) succeed in producing approximations of the original seed distributions that can be used to generate images. Tutte embedding exhibited the best estimated algorithmic complexity (based on run-time scaling with λ; SI Appendix, Fig. S14), making it the fastest technique, an advantage that becomes significant for large reconstruction problems (λ > 10,000 polonies per unit area). Tutte embedding and spring relaxation had the lowest distortion levels, with Tutte embedding exhibiting slightly better scaling of D_f and lev_{G,G′} with λ. Tutte embedding was sensitive to catastrophic failure at low ρ, with singly connected edges crashing the reconstruction, and all 4 approaches were sensitive to disjoint subgraphs, making noisy and unconnected graph data a likely challenge for experimental scenarios.

SI Appendix, Fig. S13 and section I discuss our attempts to move toward an algorithm that optimally exploits the available information, and future research should seek to establish a provably maximum-entropy reconstruction that is efficient and deterministic. Along these lines, information such as the number of self-pairing events could be used to weight edges according to estimated polony size and to better control point placement. Alternatively, low-information content self-pairing events could be prohibited through a bipartite network approach whereby only pairings between A-type and B-type polonies would be allowed (SI Appendix, Fig. S15). The bridge amplification approach to polony generation leaves the possibility of doing this with 2 species of independent primers on the surface and 2 interpenetrating/overlapping and independently saturated polony surfaces. Another possible approach is series growth of polonies. In the basic concept presented in previous sections, a primer of uniform sequence is assumed; however, a saturated layer of polonies could itself serve as primers for a subsequent polony generation step, so that every second-layer polony would overlap multiple first-layer polonies. This would result in efficient pairing of barcodes without the need for subsequent cross-linking steps.

At the time of publication, we are aware of concurrent works whose contributions are complementary to ours on development of DNA-sequencing–based microscopy (25, 26). The former work experimentally demonstrates DNA microscopy with images of mRNA in cells using locally confined cDNA amplifications and polymerase extension-based fusion of barcodes to connect spatial patches. Their approach differs from ours in that fusion events are used as a direct distance metric, whereas our data instead rely on topology as a proxy for Euclidean metrics. The latter work uses series proximity ligation to associate planar spatial patches and form a network, using a spring relaxation approach for reconstruction.

PARSIFT is a concept for microscopic image reconstruction using spatial information encoded in DNA base format. We showed an in silico proof of concept by constructing a pipeline that takes decoupled edge data, generated from simulated polony distributions, reassembles them into a topological network, and embeds that network in a Euclidean plane, recovering spatial characteristics of the original seed distribution. We saw that global distortions are low enough to resolve whole images. We hold that this framework and pipeline for reconstruction could be exploited for image acquisition of micro- and nanoscale surfaces with molecular libraries of potentially very high multiplicity and with throughput automated in a way that would not be possible with most optical approaches.
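As an illustration of the Tutte embedding named among the reconstruction methods, the sketch below pins boundary vertices at known positions and places every other vertex at the average of its neighbors. It is a minimal sketch, not the implementation used in this work: the averaging is done by simple Gauss-Seidel iteration rather than by solving the linear system directly, the 5 × 5 lattice is a hypothetical stand-in for a polony network, and the boundary positions are assumed to be known.

```python
def tutte_embedding(graph, pinned, iterations=200):
    """Barycentric embedding: free vertices are repeatedly moved to the mean
    position of their neighbours while pinned vertices stay fixed.
    graph: dict vertex -> list of neighbours; pinned: dict vertex -> (x, y)."""
    pos = {v: pinned.get(v, (0.5, 0.5)) for v in graph}
    free = [v for v in graph if v not in pinned]
    for _ in range(iterations):
        for v in free:
            nbrs = graph[v]
            pos[v] = (sum(pos[u][0] for u in nbrs) / len(nbrs),
                      sum(pos[u][1] for u in nbrs) / len(nbrs))
    return pos

# Toy network (assumption): a 5 x 5 lattice with 4-neighbour connectivity.
n = 5
graph = {}
for i in range(n):
    for j in range(n):
        graph[(i, j)] = [(i + di, j + dj)
                         for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= i + di < n and 0 <= j + dj < n]

# Boundary vertices are pinned on the edge of the unit square.
pinned = {(i, j): (i / (n - 1), j / (n - 1))
          for (i, j) in graph if i in (0, n - 1) or j in (0, n - 1)}

pos = tutte_embedding(graph, pinned)
print(pos[(1, 3)])  # close to (0.25, 0.75)
```

For this regular lattice the interior vertices settle onto their true grid positions, because each lattice point is exactly the average of its 4 neighbors. A polony network is irregular, so the embedding would only approximate the original seed positions.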

Supporting Information (SI Appendix) The code is available at https://github.com/Intertangler/parsift.

Acknowledgments We thank Ferenc Fördős for insightful discussions. This work was supported by Åke Wiberg Stiftelsen Grant for medical research M17-0214 (to I.T.H.), the Knut and Alice Wallenberg project Grant 2017.0114, the Knut and Alice Wallenberg Academy Fellow Grant KAW2014.0241 (to B.H.), and Academy of Finland Grant 311639 (to P.O.).

Footnotes Author contributions: I.T.H., G.B., and B.H. designed research; I.T.H., Y.Y., P.O., and B.H. performed research; I.T.H., Y.Y., P.O., and B.H. analyzed data; and I.T.H., Y.Y., P.O., and B.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. A.D.E. is a guest editor invited by the Editorial Board.

Data deposition: Code related to this work is available on GitHub at https://github.com/Intertangler/parsift.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1821178116/-/DCSupplemental.