An introduction to non-line-of-sight imaging, and how to create images of hidden objects around a corner in under 10 minutes!

You might be thinking that recording strange patterns of light doesn’t seem very useful, but that is precisely the kind of data being collected by a NLOS imager. While images of light wave patterns is nice and all, that doesn’t really live up to the hype of capturing images of hidden objects in a scene. Read on to see how 3D images can be formed from such apparently useless data.

NLOS imagers need such small exposure times because they are actually recording the light patterns that form on a surface within the scene. When the laser illuminates the scene, its light travels throughout the scene and creates patterns of illumination on the observed surface. If you have ever observed patterns of light on the bottom of a swimming pool or sandy lake bottom, it is a similar effect.

The camera has lenses and a shutter to collect light only during some particular period of time. Most cameras you have used capture light over \(1/60^{th}\), or perhaps even \(1/300^{th}\), of a second duration, but NLOS cameras need to capture light more precisely than that. They collect light over picosecond durations, that’s \(1/1000000000000^{th}\) of a second!

Then a "flash" is used to illuminate the scene. In the case of NLOS imaging, that flash is a high power laser. That light bounces around the scene, reflects off the observed surface, and eventually enters the camera. The vertical-temporal image below shows the laser "flash" bouncing around the scene and returning to the imaging surface as various wavefronts.

A diagram of an example scene with an NLOS camera outside a room and hidden objects within the room.

The process of capturing NLOS image data is fairly similar to taking regular image data, and while there are numerous variations of the approach, they all share the same components. First, there is an opening to the scene, like a doorway or window into a room, through which the camera can observe a surface within the scene, like a wall, floor, or ceiling.

Imagine being able to take a three dimensional image of a room without having to enter it. That seems like a rather unbelievable accomplishment, but it is slowing becoming a reality with the technologies being developed in computational imaging, particularly non-line-of-sight imaging. Even though compact, commercialized forms of the technology are still a ways off in the future, it is already possible to build a "camera" to collect the data with today’s technology.

Comparing the methods, you can see why we selected the fast f-k migration algorithm for implementation. It is accurate, requires less software development effort, and can run quickly on a desktop.

There are many ways to form a 3D image from a sequence of light wave patterns, so we list some of the approaches below along with key highlights of each method:

A Julia implementation of f-k migration

We implemented in Julia the fast f-k migration algorithm presented at SIGGRAPH 2019 in Wave-based non-line-of-sight imaging using fast f−k migration [Lindell et al. 2019] to reconstruct 3D images from the NLOS image data. The algorithm involves four rather simple steps:

Calibrate the collected data. Transform the data. Using Stolt interpolation, resample along the temporal frequency dimension. Inverse transform the data to reconstruct the 3D scene.

We also need some NLOS imagery to try out our implementation of the algorithm, so we obtained datasets collected and published by the Stanford Computational Imaging Lab. It is the same data used in [Lindell et al. 2019] as well.

Imagery captured by an NLOS imager after calibration. Your browser does not support the video tag.

Calibration adjusts each pixel for the time it takes the light to travel from the observed surface to the camera. At the picosecond time scales captured in the NLOS image data, the travel time of light is non-negligible, so even if a light wave simultaneously illuminates the entire imaging surface, the camera would actually observe what appears to be a wave of light washing across the surface. Therefore, calibration is a very important step to processing the data.

The video on the right shows NLOS imagery and how the light wave patterns progress over time. The curved "black curtain" moving from right to left at the end of the video is a result of calibration and the different distances light had to travel for each pixel.

The three dimensions of the measured imagery, \(\tau(x,y,t)\), are the horizontal and vertical pixel counts of the imaging surface, \(X\) and \(Y\) respectively, and number of temporal samples, \(T\). To calibrate, each pixel’s vector of \(T\) samples must be shifted by its calibration time.

(X, Y, T) = size(tau) for x in 1:X, y in 1:Y delay = floor(Int, calib[x, y] / 32) tau[x, y, :] .= circshift(tau[x, y, :], -delay) end

Now the first time sample of each pixel corresponds to the exact same point in time at the imaging surface rather than at the camera. In seismic or acoustic applications of f-k migration, it is possible to acquire the image as phase and amplitude data, and in fact the algorithm assumes that data exists. However, in our application the wavelengths are too short to measure phase and amplitude effectively, so \(\tau(x,y,t)\) is an image of light intensity rather than amplitude and phase.

\[\tau(x,y,t)=\left|\Psi(x,y,t)\right|^{2}\]

We must, therefore, convert to an amplitude image, \(\Psi(x,y,t)\), using the inverse of the relationship above as our first transform. Next, we pad the amplitude image with zeros, apply a Fourier transform, and shift the frequency data to be centered on the zero-frequency.

Psi = zeros(eltype(tau), map(s -> 2*s, size(tau))) for t in 1:T Psi[1:X, 1:Y, t] .= sqrt.(tau[1:X, 1:Y, t]) end PhiBar = fftshift(fft(Psi))

With our imaging data in the frequency domain, \(\bar{\Phi}(x,y,f)\), we are ready to apply Stolt interpolation. You can think of Stolt interpolation as curving the data in the way that wavefronts are curved as they reflect from a surface. Before continuing though, we wish to point out that Stolt interpolation has a significant drawback: it is based on the assumption that waves travel at the same velocity throughout the entire scene. This is a limitation to consider when modeling acoustic wave propagation where density can significantly affect velocity, but it is an acceptable assumption for our application of the technique to visible light propagation.

\[\Phi(k_{x},k_{y},k_{z})=\frac{\frac{c}{2}\left|k_{z}\right|}{\sqrt{k_{x}^{2}+k_{y}^{2}+k_{z}^{2}}}\bar{\Phi}(k_{x},k_{y},\frac{c}{2}\sqrt{k_{x}^{2}+k_{y}^{2}+k_{z}^{2}})\]

We implement the interpolation function, \(\Phi(x,y,f)\), in the following snippet of Julia code. Being careful to convert indices to a common unit for proper application of Stolt migration and avoid divide-by-0 situations are both key to getting the correct results. As you can see, we opted to create a couple of helper functions rather than stuff all the code into a large loop block.

function stoltLinearInterpZ(A, x, y, z) zLow = floor(Int, z) zHigh = ceil(Int, z) low = 1 <= zLow <= size(A, 3) ? A[x, y, zLow] : 0 high = 1 <= zHigh <= size(A, 3) ? A[x, y, zHigh] : 0 return low*(zHigh-z) + high*(z-zLow) end function stoltInterp(A, x, y, z; physX, physY, resT) i = 2*(x-1)/size(A, 1) - 1 j = 2*(y-1)/size(A, 2) - 1 k = 2*(z-1)/size(A, 3) - 1 i *= (size(A, 1)/2*c*resT) / (4*(physX/2)) j *= (size(A, 2)/2*c*resT) / (4*(physY/2)) l = sqrt(i^2 + j^2 + k^2) f = (l + 1)*size(A, 3)/2 + 1 S = max(c/2 * k/max(l, eps()), 0) return k > 0 ? S*stoltLinearInterpZ(A, x, y, f) : 0 end Phi = zero(PhiBar) for K_x in 1:size(Phi, 1), K_y in 1:size(Phi, 2), K_z in 1:size(Phi, 3) Phi[K_x, K_y, K_z] = stoltInterp(PhiBar, K_x, K_y, K_z, physX = 2, physY = 2, resT = 32e-12) end

Finally, we bring the data back from the frequency domain with an inverse shift and inverse Fourier transform. The last detail is to crop the volume to remove the padding and apply the intensity-amplitude relationship to reconstruct the 3D image.

Psi = ifft(ifftshift(Phi)) tau = abs2.(Psi[1:X, 1:Y, 1:T])

It is rather impressive how few lines of code it takes to reconstruct a full 3D image from the data collected, isn’t it? So let’s take a look at the results of running the code against the "teaser" dataset to produce an image.

A reconstructed image (left) of the "teaser" scene, and for comparison, an actual image (right) of the scene.

Honestly, the quality is probably on par with a 40 year old digital camera. On the other hand, it hopefully means that in about 40 years we will have GoPro’s with this capability strapped to our helmets showing us what’s around the corner before we even get there!