Imaging process

The experimental set-up is shown in Fig. 1. Our scene consists of a 40-cm high and 25-cm wide wall referred to as the diffuser wall. An ultrafast laser and a streak camera are both directed at this wall. As a time reference, we also direct an attenuated portion of the laser beam into the field of view of the streak camera (Fig. 2). The target object (the mannequin in Fig. 1) is hidden in the scene, so that direct light paths between the object and the laser or the camera are blocked. Our goal is to produce 3D range data of the target object.

Figure 1: Experimental set-up. (a) The capture process: we capture a series of images by sequentially illuminating a single spot on the wall with a pulsed laser and recording an image of the dashed line segment on the wall with a streak camera. The laser pulse travels a distance r₁ to strike the wall at a point L; some of the diffusely scattered light strikes the hidden object (for example at s, after travelling a distance r₂), returns to the wall (for example at w, after travelling over r₃) and is collected by the camera after travelling the final distance r₄ from w to the camera centre of projection. The position of the laser beam on the wall is changed by a set of galvanometer-actuated mirrors. (b) An example of sequentially collected streak images. Intensities are normalized against a calibration signal. Red corresponds to the maximum intensity, blue to the minimum. (c) The 2D projected view of the 3D shape of the hidden object, as recovered by the reconstruction algorithm. Here the same colour map encodes backprojected filtered intensities, or confidence values of finding an object surface at the corresponding voxel.

Figure 2: Streak image with calibration spot. The calibration spot in a streak image (highlighted with an arrow). The calibration spot is created by an attenuated beam split off the laser beam that strikes the wall in the field of view of the camera. It allows monitoring of the long-term stability of the system and calibration for drifts in timing synchronization.

The streak camera records a streak image with one spatial dimension and one temporal dimension. We focus the camera on the dashed line segment on the diffuser wall shown in Fig. 1a. We arrange the scanning laser to hit spots on the wall above or below this line segment so that single-bounce light does not enter the camera. Though the target object is occluded, light from the laser beam is diffusely reflected by the wall, reaches the target object, is reflected by multiple surface patches, and returns to the diffuser wall, where it is reflected again and captured by the camera. In a traditional camera, this image would contain little or no information about the occluded target object (Supplementary Figs S5 and S6; Supplementary Methods).

In our experimental set-up, the laser emits 50-fs long pulses. The camera digitizes information in time intervals of 2 ps. We assume the geometry of the directly visible part of the set-up is known. Hence, the only unknown distances in the path of the laser pulses are those from the diffuser wall to the different points on the occluded target object and back (paths r₂ and r₃ in Fig. 1). The 3D geometry of the occluded target is thus encoded in the streak images acquired by the camera and decoded using our reconstruction algorithm. The recorded streak images lack correspondence information, that is, we do not know which pulses received by the camera came from which surface point on the target object. Hence, a straightforward triangulation or trilateration to determine the hidden geometry is not possible.

Consider a simple scenario with a small hidden patch, as illustrated in Fig. 3a. It provides intuition on how the geometry and location of the target object are encoded in the streak images. The reflected spherical pulse propagating from the hidden patch arrives at the points on the diffuser wall with different time delays and creates a hyperbolic curve in the space-time streak image. We scan and successively change the position of the laser spot on the diffuser wall; the shape and position of the recorded hyperbolic curve vary accordingly. Each pixel in a streak image corresponds to a finite area on the wall and a 2-ps time interval, a discretized space-time bin. However, the effective time resolution of the system is 15 ps owing to the finite temporal point-spread function of the camera. A detailed description of image formation is included in the Supplementary Methods.
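Concretely, in the notation of Fig. 1 (with c the speed of light, and treating the patch as a single point s), the pixel viewing wall point w records the pulse at

t(w) = (r₁ + r₂ + |s − w| + r₄(w)) / c.

With the laser spot and s fixed, sweeping w along the imaged line segment varies only the last two terms, which traces out the hyperbola-like curve seen in the streak image.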

Figure 3: Reconstruction algorithm. An illustrative example of geometric reconstruction using streak camera images. (a) Data capture. The object to be recovered is a 2 cm × 2 cm white square patch beyond the line of sight (that is, hidden). The patch is mounted in the scene and data is collected for different laser positions. The captured streak images corresponding to three different laser positions are displayed in the top row. The shapes and timings of the recorded responses vary with laser position and encode the position and shape of the hidden patch. (b) Contributing voxels in Cartesian space. To recover the hidden position, consider the possible contributing locations. The locations in Cartesian space that could have contributed intensity to the streak image pixels p, q, r are the ellipses p′, q′, r′ (ellipsoids in 3D). For illustration, these three ellipse sections are also shown in (a), bottom left, in Cartesian coordinates. If there is a single world point contributing intensity to all three pixels, the corresponding ellipses intersect, as is the case here. The white bar corresponds to 2 cm in all sub-figures. (c) Backprojection and heatmap. We use a backprojection algorithm that finds overlaid ellipses corresponding to all pixels. Here we show the summation of elliptical curves from all pixels in the first streak image. (d) Backprojection using all pixels in a set of 59 streak images. (e) Filtering. After filtering with a second derivative, the patch location and 2-cm lateral size are recovered.

The inverse process to recover the position of the small hidden patch from the streak images is illustrated in Fig. 3b–e. Consider three pixels p, q and r in the streak image at which non-zero light intensity is measured (Fig. 3a). The possible locations in the world that could have contributed to a given pixel lie on an ellipsoid in Cartesian space (see also Supplementary Fig. S1). The foci of this ellipsoid are the laser spot on the diffuser wall and the point on the wall observed by that pixel. For illustration, we draw only a 2D slice of the ellipsoid, that is, an ellipse, in Fig. 3b. The individual ellipses from each of the three pixels p, q and r intersect at a single point. In the absence of noise, the intersection of three ellipses uniquely determines the location of the hidden surface patch that contributed intensity to the three camera pixels. In practice, we lack correspondence, that is, we do not know whether or not light detected at two pixels came from the same 3D surface point.
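Written out in the notation of Fig. 1, a pixel that observes wall point w and records light at time t after the laser fires constrains any contributing hidden point s by

r₂ + r₃ = |s − L| + |s − w| = c·t − r₁ − r₄,

and a constant sum of distances to the two fixed points L and w is precisely the definition of an ellipsoid with foci L and w.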

Therefore, we discretize the Cartesian space into voxels and compute the likelihood of each voxel lying on a hidden surface. For each voxel, we determine all streak image pixels that could potentially have received contributions from this voxel based on the time of flight r₁+r₂+r₃+r₄ and sum up the measured intensity values in these pixels. In effect, we let each pixel vote for all points on the corresponding ellipsoid. The signal energy contributed by each pixel is amplified by a factor that compensates for the distance attenuation. If the distance attenuation were not accounted for, scene points far away from the wall would be attenuated by a factor of (r₂r₃)² and would be lost during the reconstruction. Therefore, we amplify the contribution of each pixel to a particular voxel by a factor of (r₂r₃)^α before backprojection. Reconstruction quality depends only weakly on the value of α. We experimented with various values of α and found that α=1 is a good choice for reduced computation time. This process of computing likelihood by summing up weighted intensities is called backprojection (ref. 12). We call the resulting 3D scalar function on voxels a heatmap.
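For concreteness, the following NumPy sketch implements this voting scheme under simplified assumptions (illustrative code, not the authors' implementation; the array layout, the flat voxel list and the 2-ps bin width are ours):

```python
import numpy as np

C = 3e8  # speed of light (m/s)

def backproject(streak, laser_spots, wall_points, laser_origin, cam_centre,
                voxels, bin_width=2e-12, alpha=1.0):
    """streak: (n_laser, n_wall, n_time) intensities; voxels: (n_voxel, 3)."""
    heat = np.zeros(len(voxels))
    n_time = streak.shape[2]
    for i, L in enumerate(laser_spots):
        r1 = np.linalg.norm(L - laser_origin)        # laser to wall spot
        r2 = np.linalg.norm(voxels - L, axis=1)      # wall spot to each voxel
        for j, w in enumerate(wall_points):
            r3 = np.linalg.norm(voxels - w, axis=1)  # each voxel back to the wall
            r4 = np.linalg.norm(cam_centre - w)      # wall to camera
            # Time bin in which each voxel would appear at this pixel column.
            t_bin = np.round((r1 + r2 + r3 + r4) / (C * bin_width)).astype(int)
            ok = (t_bin >= 0) & (t_bin < n_time)
            # Each pixel votes for its ellipsoid; the (r2*r3)**alpha factor
            # compensates distance attenuation (the text above uses alpha = 1).
            heat[ok] += (r2[ok] * r3[ok]) ** alpha * streak[i, j, t_bin[ok]]
    return heat
```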

The summation of weighted intensities from all pixels in a single streak image creates an approximate heatmap for the target patch (Fig. 3c). Repeating the process for many laser positions on the diffuser wall, and using pixels from the corresponding streak images, provides a better approximation (Fig. 3d and Supplementary Fig. S2). In practice, we use ∼60 laser positions. Traditional backprojection requires a high-pass filtering step. We use the second derivative of the data along the z direction of the voxel grid, approximately perpendicular to the wall, as an effective filter and recover the hidden surface patch in Fig. 3e. Because the values at the voxels in the heatmap are the result of summing a large number of streak image pixels, the heatmap contains low noise and the noise amplification associated with a second-derivative filter is acceptable.
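Assuming the heatmap is sampled on a regular grid with depth on the first axis (our convention, not necessarily the authors'), the filter amounts to a central second difference along z:

```python
import numpy as np

def second_derivative_filter(heatmap, dz=1.0):
    """Discrete second derivative along z (axis 0, facing away from the wall)."""
    filtered = np.zeros_like(heatmap)
    # Central second difference: the high-pass step of filtered backprojection,
    # applied only along depth.
    filtered[1:-1] = (heatmap[2:] - 2.0 * heatmap[1:-1] + heatmap[:-2]) / dz**2
    return filtered
```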

Algorithm

The first step of our imaging algorithm is data acquisition. We sequentially illuminate a single spot on the diffuser wall with a pulsed laser and record an image of the line segment of the wall with a streak camera. Then, we estimate an oriented bounding box for the working volume to set up a voxel grid in Cartesian space (see Methods). In the backprojection, for each voxel, we record the summation of weighted intensities of all streak image pixels that could potentially have received contributions from this voxel based on the time of flight. We store the resulting 3D heatmap of voxels. The backprojection is followed by filtering. We compute a second derivative of the heatmap along the direction of the voxel grid facing away from the wall. In an optional post-processing step, we compute a confidence value for each voxel by computing local contrast with respect to the voxel neighbourhood in the filtered heatmap. To compute contrast, we divide each voxel heatmap value by the maximum in the local neighbourhood. For better visualization, we apply a soft threshold on the voxel confidence values.
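A minimal sketch of this post-processing step (the neighbourhood size, the sigmoid soft threshold and its parameters are our illustrative choices):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def confidence_map(filtered, size=5, sharpness=10.0, level=0.5):
    """Local-contrast confidence followed by a soft threshold."""
    pos = np.clip(filtered, 0.0, None)             # keep positive responses only
    local_max = maximum_filter(pos, size=size)     # neighbourhood maximum
    contrast = pos / np.maximum(local_max, 1e-12)  # value relative to neighbours
    # Smooth sigmoid instead of a hard cut-off, for visualization.
    return 1.0 / (1.0 + np.exp(-sharpness * (contrast - level)))
```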

In the second step, we estimate the oriented bounding box of the object by running the above algorithm at a low spatial target resolution on down-sampled input data. Details of the reconstruction process and the algorithm can be found in the Methods as well as in the Supplementary Methods.

Reconstructions

We show results of the 3D reconstruction for multipart objects in Figs 4 and 5. The mannequin in Fig. 4 contains nonplanar surfaces with variations in depth and occlusions. We accurately recover all major geometrical features of the object. Figure 4i shows the reconstruction of the same object in slightly different poses to demonstrate the reproducibility and stability of the method as well as the consistency in the captured data. The sporadic inaccuracies in the reconstruction are consistent across poses and are confined to the same 3D locations. The stop-motion animation in Supplementary Movie 1 shows the local nature of the missing or phantom voxels more clearly; another stop-motion reconstruction is shown in Supplementary Fig. S4. Hence, the persistent inaccuracies are not due to signal noise or random measurement errors. This is promising, as the voxel confidence errors are primarily due to limitations in the reconstruction algorithm and instrument calibration, limitations that could be overcome with more sophisticated algorithms and calibration procedures. Our current method is limited to diffuse reflection from near-Lambertian opaque surfaces. Parts of the object that are occluded from the diffuser wall or facing away from it are not reconstructed.

Figure 4: Complex object reconstruction. (a) Photo of the object. The mannequin is ∼20 cm tall and is placed about 25 cm from the diffuser wall. (b) Nine of the sixty raw streak images. (c) Heatmap. Visualization of the heatmap after backprojection, showing the maximum value along the z direction for each x–y coordinate in Cartesian space. The hidden shape is barely discernible. (d) Filtering. The second derivative of the heatmap along depth (z), projected on the x–y plane, reveals the hidden shape contour. (e) Depth map. Colour-encoded depth (distance from the diffuser wall) shows the left leg and right arm closer in depth than the torso and the other leg and arm. (f) Confidence map. A rendered point cloud of confidence values after the soft threshold. Images (g,h) show the object from different viewpoints after application of a volumetric blurring filter. (i) Stop-motion animation frames from multiple poses, demonstrating reproducibility. See Supplementary Movie 1 for an animation. Shadows and the ground plane in images (f–i) have been added to aid visualization.

Figure 5: Depth in reconstructions. Demonstration of the depth and lateral resolution. (a) The hidden objects to be recovered are three letters, I, T, I, at varying depths. The 'I' is 1.5 cm wide and all letters are 8.2 cm high. (b) Nine of the 60 images collected by the streak camera. (c) Projection of the heatmap created by the backprojection algorithm on the x–y plane. (d) Filtering after computing the second derivative along depth (z). The colour in these images represents the confidence of finding an object at the pixel position. (e) A rendering of the reconstructed 3D shape. Depth is colour coded and semi-transparent planes are inserted to indicate the ground truth. The depth axis is scaled to aid visualization of the depth resolution.

Figure 5 shows a reconstruction of multiple planar objects at different, unknown depths. The object planes and boundaries are reproduced accurately, demonstrating the achievable depth and lateral resolution. For a further investigation of resolution, see Supplementary Figs S7 and S8. The reconstruction is affected by several factors, such as calibration, sensor noise, scene size and time resolution. Below, we consider them individually.

The sources of calibration errors are lens distortions in the streak camera that warp the collected streak image, measurement inaccuracies in the visible geometry, and measurement inaccuracies in the centre of projection of the camera and the origin of the laser. For larger scenes, the impact of such static calibration errors would be reduced.

The sensor introduces intensity noise and timing uncertainty, that is, jitter. The reconstruction of 3D shapes depends more on the accuracy of the time of arrival than on the signal-to-noise ratio (SNR) of the received intensity. Jitter correction, as described in the Methods, is essential, but does not remove all uncertainties. Improving the SNR is nonetheless desirable because it yields faster capture times. As in many commercial systems, for example LIDAR, the SNR could be significantly improved by using an amplified laser with more energetic pulses, a repetition rate in the kilohertz range, and a triggered camera. The overall light power would not change, but fewer measurements for light collection could significantly reduce signal-independent noise such as background and shot noise.

We could increase the scale of the system for larger distances and bigger target objects. By using a longer pulse, with proportionally reduced target resolution and increased aperture size, one could build systems without any change in the ratio of received to emitted energy, that is, the link budget. When the distance r₂ between the diffuser wall and the hidden object (Fig. 1) is increased without increasing the size of the object, the signal strength drops dramatically (∝ 1/(r₂r₃)²) and the size of the hidden scene is therefore limited. A configuration where laser and camera are very far from the rest of the scene is, however, plausible: the loss in received energy can be reduced in two ways. The laser beam can be kept collimated over relatively long distances, and the aperture size of the camera can be increased to counterbalance a larger distance between camera and diffuser wall.
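As a rough scaling sketch (our simplification, not an analysis from the paper), if each diffuse bounce is treated as spreading light over a hemisphere, the received energy scales approximately as

E ∝ A_cam / (r₂² r₃² r₄²),

where A_cam is the camera aperture area. Nothing in the geometry compensates the r₂ and r₃ factors, which is why the hidden-scene size is limited, whereas a collimated beam removes any dependence on the laser stand-off distance and a larger A_cam offsets growth in r₄.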

The timing resolution, along with the spatial diversity of the spots illuminated by the laser and viewed by the camera, affects the resolution of the 3D reconstructions. Additional factors include the position of the voxel in Cartesian space and the overall scene complexity. The performance evaluation subsection of the Supplementary Methods describes depth and lateral resolution. In our system, translation along the direction perpendicular to the diffuser wall can be resolved with a resolution of 400 μm, better than the full-width at half-maximum time resolution of the imaging system would suggest. Lateral resolution in a plane parallel to the wall is lower and is limited to 0.5–1 cm, depending on proximity to the wall.