Computer imaging techniques are commonly used to preserve and share readable manuscripts, but capturing writing locked away in ancient, deteriorated documents poses an entirely different challenge. Our software pipeline, referred to as “virtual unwrapping,” allows textual artifacts to be read completely and noninvasively. The systematic digital analysis of the extremely fragile En-Gedi scroll (the oldest Pentateuchal scroll in Hebrew outside of the Dead Sea Scrolls) reveals the writing hidden on its untouchable, disintegrating sheets. Our approach for recovering substantial ink-based text from a damaged object yields readable columns of high enough quality to support serious critical textual analysis. Hence, this work creates a new pathway for subsequent textual discoveries buried within the confines of damaged materials.

INTRODUCTION

In 1970, archeologists made a dramatic discovery at En-Gedi, the site of a large, ancient Jewish community dating from the late eighth century BCE until its destruction by fire circa 600 CE. Excavations uncovered the synagogue’s Holy Ark, inside of which were multiple charred lumps of what appeared to be animal skin (parchment) scroll fragments (1, 2). The Israel Antiquities Authority (IAA) faithfully preserved the scroll fragments, although in the 40 years following the discovery, no one produced a means to overcome the irreversible damage they had suffered in situ. Each fragment’s main structure, completely burned and crushed, had turned into chunks of charcoal that continued to disintegrate every time they were touched. Without a viable restoration and conservation protocol, physical intervention was unthinkable. Like many badly damaged materials in archives around the world, the En-Gedi scroll was shelved, leaving its potentially valuable contents hidden and effectively locked away by its own damaged condition (Fig. 1).

Fig. 1 The charred scroll from En-Gedi. Image courtesy of the Leon Levy Dead Sea Scrolls Digital Library, IAA. Photo: S. Halevi.

The implementation and application of our computational framework allow the identification and scholarly textual analysis of the ink-based writing within such unopened, profoundly damaged objects. Our systematic approach essentially unlocks the En-Gedi scroll and, for the first time, enables a total visual exploration of its interior layers, leading directly to the discovery of its text. By virtually unwrapping the scroll, we have revealed it to be the earliest copy of a Pentateuchal book ever found in a Holy Ark. Furthermore, this work establishes a restoration pathway for damaged textual material by showing that text extraction is possible while avoiding the need for injurious physical handling. The restored En-Gedi scroll represents a significant leap forward in the field of manuscript recovery, conservation, and analysis.

Virtual unwrapping

Our generalized computational framework for virtual unwrapping applies to a wide range of damaged, text-based materials. Virtual unwrapping is the composite result of segmentation, texturing, and flattening: a sequence of transformations beginning with the voxels of a three-dimensional (3D) unstructured volumetric scan of a damaged manuscript and ending with a set of 2D images that reveal the writing embedded in the scan (3–6). The required transformations are initially unknown and must be solved by choosing a model and applying a series of constraints about the known and observable structure of the object. Figure 2 shows the final result for the scroll from En-Gedi. This resultant image, which we term the “master view,” is a visualization of the entire surface extracted from within the En-Gedi scroll.

Fig. 2 Completed virtual unwrapping for the En-Gedi scroll.

The first stage, segmentation, is the identification of a geometric model of structures of interest within the scan volume. This process digitally recreates the “pages” that hold potential writing. We use a triangulated surface mesh for this model, which can readily support many operations that are algorithmically convenient: ray intersection, shape dynamics, texturing, and rendering. A surface mesh can vary in resolution as needed and forms a piecewise approximation of arbitrary surfaces on which there may be writing. The volumetric scan defines a world coordinate frame for the mesh model; thus, segmentation is the process of aligning a mesh with structures of interest within the volume.

The second stage, texturing, is the formation of intensity values on the geometric model based on its position within the scan volume. This is where we see letters and words for the first time on the recreated page. The triangulated surface mesh offers a direct approach to the texturing problem that is similar to solid texturing (7, 8): Each point on the surface of the mesh is given an intensity value based on its location in the 3D volume. Many approaches exist for assigning intensities from the volume to the triangles of the segmented mesh, some of which help to overcome noise in the volumetric imaging and incorrect localization in segmentation.

The third stage, flattening, is necessary because the geometric model may be difficult to visualize as an image. Specifically, if text is being extracted, it will be challenging to read on a 3D surface shaped like the cylindrical wraps of scrolled material. This stage solves for a transformation that maps the geometric model (and associated intensities from the texturing step) to a plane, which is then directly viewable as a 2D image for the purpose of visualization.

In practice, this framework is best applied in a piecewise fashion to accurately localize a scroll’s highly irregular geometry. Also, the methodology required to map each of these steps from the original volume to flattened images involves a series of algorithmic decisions and approximations. Because textual identification is the primary goal of our virtual unwrapping framework, we tolerate mathematical and geometric error along the way to ensure that we extract the best possible images of text. Hence, the final merging and visualization step is significant not only for composing small sections into a single master view but also for checking the correctness and relative alignments of individual regions. Therefore, it is crucial to preserve the complete transformation pipeline that maps voxels in the scan volume to final pixels in the unwrapped master view so that any claim of extracted text can be independently verified.
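The requirement that every master-view pixel remain traceable to the scan can be illustrated with a minimal sketch. It is not the pipeline used in this work: the `PixelRecord` type, the `unwrap_with_provenance` function, and the nearest-voxel lookup are hypothetical names and simplifications introduced here only to show one way of storing the voxel-to-pixel mapping alongside the image.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]            # (x, y, z) in the scan's world frame
Voxel = Tuple[int, int, int]                   # integer (x, y, z) voxel index
Pixel = Tuple[int, int]                        # (u, v) in the flattened image
Volume = Sequence[Sequence[Sequence[float]]]   # indexed as volume[z][y][x]


@dataclass(frozen=True)
class PixelRecord:
    """Hypothetical audit record for one pixel of the master view."""
    surface_point: Point3   # where the segmented surface sits in the volume
    voxel: Voxel            # voxel whose value was read
    intensity: float        # value written to the image


def nearest_voxel(p: Point3) -> Voxel:
    """Round a world-space point to the nearest voxel index (half rounds up)."""
    x, y, z = (int(math.floor(c + 0.5)) for c in p)
    return (x, y, z)


def unwrap_with_provenance(
    volume: Volume,
    surface: Sequence[Point3],
    flat_uv: Sequence[Pixel],
) -> Dict[Pixel, PixelRecord]:
    """Texture each surface point and remember where each pixel came from.

    `surface` stands in for the output of segmentation and `flat_uv` for the
    output of flattening; both lists are assumed to be in the same order.
    """
    records: Dict[Pixel, PixelRecord] = {}
    for point, pixel in zip(surface, flat_uv):
        x, y, z = nearest_voxel(point)
        records[pixel] = PixelRecord(point, (x, y, z), volume[z][y][x])
    return records
```

With such a record, a reader who doubts a letter in the master view could look up the pixels that form it and inspect the corresponding voxels in the original scan.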

The volumetric scan

The unwrapping process begins by acquiring a noninvasive digitization that gives some representation of the internal structure and contents of an object in situ (9–11). There are a number of choices for noninvasive, penetrative, and volumetric scanning, and our framework places no limits on the modality of the scan. As enhancements in volumetric scanning methodology [for example, phase-contrast microtomography (6, 12)] occur, we can take advantage of the ensuing potential for improved images. Whatever the scanning method, it must be appropriate to the scale and to the material and physical properties of the object. Because of the particularities of the En-Gedi scroll, we used x-ray–based micro–computed tomography (micro-CT).

The En-Gedi scroll’s damage creates a scanning challenge: How does one determine the correct scan protocol before knowing how ink will appear or even if the sample contains ink at all? It is the scan and subsequent pipeline that reveal the writing. After several calibration scans, a protocol was selected that produced a visible range of intensity variation on the rolled material. The spatial resolution was adjusted with respect to the sample size to capture enough detail through the thickness of each material layer to reveal ink if present and detectable.

The chemical composition of the ink within the En-Gedi scroll remains unknown because there are no exposed areas suitable for analysis. However, the ink appears denser than the surrounding materials in the micro-CT scan, implying that it likely contains metal, such as iron or lead.

Any analysis necessitates physical handling of the friable material, and so, even noninvasive methods must be approached with great care. Although low-power x-rays themselves pose no significant danger to inanimate materials, the required transport and handling of the scroll make physical conservation and preservation an ever-present concern. However, once acquired, the volumetric scan data become the basis for all further computations, and the physical object can be returned to the safety of its protective archive.

Segmentation

Segmentation, which is the construction of a geometric model localizing the shape and position of the substrate surface within the scan on which text is presumed to appear, is challenging for several reasons. First, the surface as presented in the scanned volume is not developable, that is, not isometric to a plane (13–15). Although an isometry could be useful as a constraint in some cases, the skin forming the layers of the En-Gedi scroll has not simply been folded or rolled. Damage to the scroll has deformed the shape of the skin material, which is apparent in the 3D scanned volume, making such a constraint unworkable. Second, the density response of animal skin in the volume is noisy and difficult to localize with methods such as marching cubes (16). Third, layers of the skin that are close together create ambiguities that are difficult to resolve from purely local, shape-based operators. Figure 3 shows four distinct instances where segmentation proves challenging because of the damage and unpredictable variation in the appearance of the surface material in the scan volume.

Fig. 3 Segmentation challenges in the En-Gedi scroll, based on examples in the slice view. Double/split layering and challenging cell structure (left), ambiguous layers with unknown material (middle left), high-density “bubbling” on the secondary layer (middle right), and gap in the primary layer (right).

Our segmentation algorithm applied to the En-Gedi scroll builds a triangulated surface mesh that localizes a coherent section of the animal skin within a defined subvolume through a novel region-growing technique (Fig. 4). The basis for the algorithm is a local estimate of the differential geometry of the animal skin surface using a second-order symmetric tensor and associated feature saliency measures (17). An initial set of seed points propagates through the volume as a connected chain, directed by the local symmetric tensor and constrained by a set of spring forces. The movement of this particle chain through the volume traces out a surface over time. Figure 5 shows how crucially dependent the final result is on an accurate localization of the skin. When the segmented geometry drifts from the skin surface (Fig. 5A), the surface features disappear. When the skin is accurately localized (Fig. 5B), the surface detail, including cracks and ink evidence, becomes visible.

Fig. 4 A portion of the segmented surface and how it intersects the volume.

Fig. 5 The importance of accurate surface localization on the final generated texture. (A) Texture generated when the surface is only partially localized. (B) Texture generated when the surface is accurately localized.

The user can tune the various parameters of this algorithm locally or globally based on the data set and at any time during the segmentation process. This allows for the continued propagation of the chain without the loss of previously segmented surface information. The segmentation algorithm terminates either at the user’s request or when a specified number of slices have been traversed by all of the particles in the chain.

The global structure of the entire surface is a piecewise composition of many smaller surface sections. Although it is certainly possible to generate a global structure through a single segmentation step, approaching the problem in a piecewise manner allows more accurate localization of individual sections, some of which are very challenging to extract. Although the segmented surface is not constrained to a planar isometry at the segmentation step, the model implicitly favors an approximation of an isometry. Furthermore, the model imposes a point-ordering constraint that prevents sharp discontinuities and self-intersections. The segmented surface, which has been regularized, smoothed, and resampled, becomes the basis for the texturing phase to visualize the surface with the intensities it inherits from its position in the volume.
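The chain propagation described in this section can be sketched in a few lines. The sketch below is a simplified illustration under stated assumptions, not the algorithm applied to the En-Gedi scroll: the `drift` callback is a hypothetical stand-in for the direction derived from the local symmetric tensor, the springs link only adjacent particles in the chain, and all names and parameter values are introduced here for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec2 = Tuple[float, float]


def spring_displacements(
    chain: Sequence[Vec2], rest_length: float, stiffness: float
) -> List[Vec2]:
    """Displacement on each particle from springs linking chain neighbours."""
    out = [[0.0, 0.0] for _ in chain]
    for i in range(len(chain) - 1):
        dx = chain[i + 1][0] - chain[i][0]
        dy = chain[i + 1][1] - chain[i][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident particles: direction undefined, skip
        pull = stiffness * (dist - rest_length) / dist
        # A stretched spring pulls both ends together; a compressed one pushes.
        out[i][0] += pull * dx
        out[i][1] += pull * dy
        out[i + 1][0] -= pull * dx
        out[i + 1][1] -= pull * dy
    return [(fx, fy) for fx, fy in out]


def propagate_chain(
    seeds: Sequence[Vec2],
    drift: Callable[[float, float, int], Vec2],
    n_slices: int,
    rest_length: float,
    stiffness: float = 0.25,
) -> List[List[Vec2]]:
    """Advance a chain slice by slice; the rows returned trace out a surface.

    `drift(x, y, z)` is an assumed callback giving the in-slice offset a
    particle should follow when stepping to slice z.
    """
    chain = list(seeds)
    surface = [chain]
    for z in range(1, n_slices + 1):
        moved = []
        for x, y in chain:
            ox, oy = drift(x, y, z)
            moved.append((x + ox, y + oy))
        forces = spring_displacements(moved, rest_length, stiffness)
        chain = [(x + fx, y + fy) for (x, y), (fx, fy) in zip(moved, forces)]
        surface.append(chain)
    return surface
```

Each row of the returned list holds the chain's positions in one slice, so consecutive rows form a grid that could be triangulated into a surface mesh. Because the particles keep their order along the chain, the grid also loosely reflects the point-ordering constraint mentioned above.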

Texturing

Once the layers of the scroll have been identified and modeled, the next step is to render readable textures on those layers. Texturing is the assignment of an intensity, or “brightness,” value derived from the volume to each point on a segmented surface. The interpretation of intensity values in the original volumetric scan is maintained through the texturing phase. In the case of micro-CT, intensities are related to density: Brighter values are regions of denser material, and darker values are less dense (18). A coating of ink made from iron gall, for example, would appear bright, indicating a higher density in micro-CT.

Our texturing method is similar to the computer graphics approach of “solid texturing,” a procedure that evaluates a function defined over ℝ³ for each point to be textured on the embedded surface (7, 8). In our case, the function over ℝ³ is simply a lookup to reference the value (possibly interpolated) at that precise location in the volume scan. In an ideal case, where both the scanned volume and localized surface mesh are perfect, a direct mapping of each surface point to its 3D volume position would generate the best possible texture. In practice, however, errors in surface segmentation combined with artifacts in the scan create the need for a filtering approach that can overcome these sources of noise.

Therefore, we implement a neighborhood-based directional filtering method, which gives parametric control over the texturing. The texture intensity is calculated from a filter applied to the set of voxels within each surface point’s local neighborhood. The parameters (Fig. 6) include use of the point’s surface normal direction (directional or omnidirectional), the shape and extent of the local texturing neighborhood, and the type of filter applied to the neighborhood. The directional parameter is particularly important when attempting to recover text from dual-sided materials, such as books. In such cases, a single segmented surface can be used to generate both the recto and verso sides of the page. Figure 7 shows how this texturing method improves ink response in the resulting texture when the segmentation does not perfectly intersect the ink position on the substrate in the volumetric scan.

Fig. 6 The geometric parameters for directional texturing.

Fig. 7 The effect of directional texturing to improve ink response. (Left) Intersection of the mesh with the volume. (Right) Directional texturing with a neighborhood radius of 7 voxels.
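The neighborhood-based directional filter can be illustrated with a short sketch. It makes several assumptions that are not taken from this work: the neighborhood is a straight line of samples along the surface normal, voxels are read by nearest-neighbor lookup rather than interpolation, and the function name, the `mode` values, and the default maximum filter are hypothetical choices.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]            # (x, y, z) in voxel units
Volume = Sequence[Sequence[Sequence[float]]]   # indexed as volume[z][y][x]


def sample_nearest(volume: Volume, p: Point3) -> Optional[float]:
    """Nearest-voxel lookup; returns None outside the scanned volume."""
    x, y, z = (int(math.floor(c + 0.5)) for c in p)
    if 0 <= z < len(volume) and 0 <= y < len(volume[0]) and 0 <= x < len(volume[0][0]):
        return volume[z][y][x]
    return None


def directional_texture(
    volume: Volume,
    point: Point3,
    normal: Point3,
    radius: int,
    mode: str = "both",
    reducer: Callable[[List[float]], float] = max,
) -> float:
    """Filter the voxels found along the surface normal near `point`.

    mode="positive" or "negative" looks to one side of the surface only
    (for example, one face of a two-sided page); "both" looks to both sides.
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal must be nonzero")
    unit = tuple(c / length for c in normal)
    if mode == "both":
        steps = range(-radius, radius + 1)
    elif mode == "positive":
        steps = range(0, radius + 1)
    elif mode == "negative":
        steps = range(-radius, 1)
    else:
        raise ValueError("mode must be 'both', 'positive', or 'negative'")
    samples: List[float] = []
    for s in steps:
        value = sample_nearest(volume, tuple(p + s * d for p, d in zip(point, unit)))
        if value is not None:
            samples.append(value)
    return reducer(samples) if samples else 0.0
```

A maximum filter favors the brightest (densest) voxel in the neighborhood, which can recover ink that lies a voxel or two off the segmented surface. Passing a mean or median as `reducer` would trade some of that response for noise suppression.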

Flattening

Region-growing in an unstructured volume generates surfaces that are nonplanar. In a scan of rolled-up material, most surface fragments contain high-curvature areas. These surfaces must be flattened to easily view the textures that have been applied to them. The process of flattening is the computation of a 3D to 2D parameterization for a given mesh (6, 19, 20). One straightforward assumption is that a localized surface cannot self-intersect and represents a coherent piece of substrate that was at one time approximately isometric to a plane. If the writing occurred on a planar surface before it was rolled up, and if the rolling itself induced no elastic deformations in the surface, then damage is the only thing that may have interrupted the isometric nature of the rolling.

We approach parameterization through a physics-based material model (4, 21, 22). In this model, the mesh is represented as a mass-spring system, where each vertex of the mesh is given a mass and the connections between vertices are treated as springs with associated stiffness coefficients. The mesh is relaxed to a plane through a balanced selection of appropriate forces and parameters. This process mimics the material properties of isometric deformation, which is analogous to the physical act of unwrapping.

A major advantage of a simulation-based approach is the wide range of configurations that are possible under the framework. Parameters and forces can be applied per vertex or per spring. This precise control allows for modeling of not only the geometric properties of a surface but also the physical properties of that surface. For example, materials with higher physical elasticity can be represented as such within the same simulation.

Although this work relies on computing parameterizations solely through this simulation-based method, a hybrid approach that begins with existing parameterization methods [for example, least-squares conformal mapping (LSCM) (23) and angle-based flattening (ABF) (24)] followed by a physics-based model is also workable. The purely geometric approaches of LSCM and ABF produce excellent parameterizations but have no natural way to capture additional constraints arising from the mesh as a physical object. If the physical state of the mesh is tracked during parameterization via LSCM or ABF, a secondary correction step using the simulation method could then be applied to account for the mesh’s physical properties.
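The idea of relaxing a mesh to a plane with springs can be shown in a minimal sketch. It is a simplification under stated assumptions rather than the simulation used in this work: masses and inertia are ignored (each step is a plain gradient-descent update), a single stiffness is shared by all springs, and the function name, step size, and iteration count are illustrative choices. Spring rest lengths are taken from the 3D edge lengths, so a layout in which every spring is at rest is an isometric flattening of the mesh edges.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Edge = Tuple[int, int]


def flatten_mass_spring(
    vertices3d: Sequence[Point3],
    edges: Sequence[Edge],
    initial2d: Sequence[Point2],
    stiffness: float = 1.0,
    step: float = 0.1,
    iterations: int = 2000,
) -> List[Point2]:
    """Relax a 2D layout until its edge lengths match the 3D mesh.

    `initial2d` is an assumed starting guess (for example, a rough
    projection); it must not place connected vertices at the same spot.
    """
    # Rest length of each spring is the edge length measured on the 3D mesh.
    rest: Dict[Edge, float] = {
        (a, b): math.dist(vertices3d[a], vertices3d[b]) for a, b in edges
    }
    pos = [list(p) for p in initial2d]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in pos]
        for (a, b), length in rest.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # direction undefined for coincident vertices
            pull = stiffness * (dist - length) / dist
            force[a][0] += pull * dx
            force[a][1] += pull * dy
            force[b][0] -= pull * dx
            force[b][1] -= pull * dy
        for p, f in zip(pos, force):
            p[0] += step * f[0]
            p[1] += step * f[1]
    return [(p[0], p[1]) for p in pos]
```

For a strip bent through a right angle, starting from a partially unrolled guess, the relaxation should stretch the layout until the strip regains its full length. A per-spring stiffness table would be the natural extension for modeling regions with different elasticity.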