Tomographic near-eye displays for virtual reality

Figure 1 illustrates how tomographic displays reconstruct 3D volumetric objects. For intuitive illustration, the focus-tunable optics are represented by a single focus-tunable lens. In a single cycle, the focus-tunable lens modulates its focal length so that the images of the pixels scan along a specific depth range. At the same time, the FSAB determines the depth of each pixel by illuminating it at the appropriate moment. The FSAB projects a binary image sequence onto the display panel, synchronized with the focus-tunable lens. The binary image sequence is derived from the depth information of the 3D volumetric objects. In summary, while the display panel plays its usual role of reproducing a 2D image that includes color and gradation, the focus-tunable lens and the FSAB give the 2D image depth information.
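The following minimal Python sketch illustrates how a binary image sequence could be derived from a depth map. It assumes that the focal planes are uniformly spaced in diopters and that each pixel is lit in exactly one time slot per cycle; the function name `binary_backlight_sequence` and the sample depth values are hypothetical and do not represent the prototype's actual rendering code.

```python
import numpy as np

def binary_backlight_sequence(depth_map, near_d, far_d, num_planes):
    """Return a (num_planes, H, W) boolean stack.

    Frame k lights the pixels whose depth (in diopters) is closest to
    focal plane k, so each pixel is lit in exactly one frame per cycle.
    """
    # Assumption: planes are uniformly spaced in diopters, swept near to far.
    plane_depths = np.linspace(near_d, far_d, num_planes)
    # Index of the nearest focal plane for every pixel.
    nearest = np.abs(depth_map[..., None] - plane_depths).argmin(axis=-1)
    # One-hot encoding along the time axis.
    return nearest[None, :, :] == np.arange(num_planes)[:, None, None]

# Hypothetical 2 x 2 depth map in diopters.
depth_map = np.array([[5.5, 3.0], [0.0, 1.0]])
frames = binary_backlight_sequence(depth_map, near_d=5.5, far_d=0.0, num_planes=80)
```

Each slice `frames[k]` is the binary image that the backlight would show while the lens places the panel image at plane `k`.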

Fig. 1 Schematic diagram of the principle of tomographic displays. As shown in the figure, the FSAB and focus-tunable lens (FTL) are synchronized to provide users with depth information. a When the image of the display panel (DP) is formed at the depth of \(z_1\), the FSAB selectively illuminates the blue dice of the display panel. b When the focus-tunable lens forms the display panel image at the depth of \(z_2\), the FSAB selectively illuminates the red dice of the display panel. As a result, the blue and red dice appear to float at depths of \(z_1\) and \(z_2\), respectively. Note that each depth can take on a negative value, which means that the image can be created on the left side of the focus-tunable lens.

The display module, which consists of the FSAB and the display panel, plays the key role in generating a large number of focal planes at 60 Hz. Because the FSAB handles only binary images, its frame rate can be increased drastically. Binary images can be updated at an extremely high frame rate that state-of-the-art display panels (<240 Hz) cannot achieve. On the other hand, the display panel carries the color and gradation information that the FSAB cannot handle. In addition, the effective resolution of a synthesized image is determined by the display panel, which means that the FSAB does not necessarily need a high resolution. In summary, the FSAB and the display panel play complementary roles in 3D displays: resolution, color, and gradation are determined by the display panel, while the high frame rate is supported by the FSAB.

We implement a benchtop prototype for tomographic near-eye displays, which can be applied to virtual reality to provide an immersive and comfortable experience. The prototype is divided into four parts: the DMD projection system for the FSAB, the liquid crystal display (LCD) module, the focus-tunable lens, and the eyepiece. The DMD projection system consists of an LED light source with a collimating lens, a total internal reflection (TIR) prism, a DMD, and magnifying relay optics. The binary image on the DMD is magnified and projected onto the LCD module; this projected image serves as the FSAB. Note that the DMD can update the binary images more than 100 times within 1/60 s. The eyepiece secures enough eye relief [22] (50 mm) for the observer while maintaining the optimal field of view that can be achieved with the focus-tunable lens.

Figure 2 demonstrates the display results of the prototype. We employ 3D content [29] with significant variation in depth. 2D projected images and depth maps are used to generate the focal plane images. As the figure shows, the depth information of the 3D content is well reconstructed by the tomographic near-eye display, which supports the original resolution of the display panel with full color expression. Eighty focal plane images are floated between 5.5D and 0.0D, so that adjacent planes are separated by 0.07D. This separation is narrow enough to provide users with quasi-continuous focus cues [30]. We observe clear focus cues and blur effects in the reconstructed 3D content. Note that motion parallax within the exit pupil is also achieved, as shown in the figure (see Supplementary Movie 1). The prototype provides a diagonal field of view of 30° within an exit pupil of 7.5 mm. These specifications are verified by simulation with the optical design tool Zemax and by experiment, as described in Supplementary Note 6.
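The plane spacing quoted above follows from simple arithmetic, which the short Python sketch below reproduces. It assumes uniform dioptric spacing and exactly one binary backlight frame per focal plane with no black frames; the variable names are illustrative only.

```python
num_planes = 80            # focal planes per cycle (from the text)
near_d, far_d = 5.5, 0.0   # depth range in diopters (from the text)
panel_rate_hz = 60         # display panel refresh rate (from the text)

# 80 planes span the range with 79 equal gaps.
spacing_d = (near_d - far_d) / (num_planes - 1)

# Assumption: one binary frame per focal plane, no black frames.
binary_rate_hz = num_planes * panel_rate_hz

print(f"{spacing_d:.3f} D per plane, {binary_rate_hz} binary frames/s")
```

Under these assumptions the backlight must deliver 4800 binary frames per second, which is below the more than 100 updates per 1/60 s that the DMD is stated to support.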

Fig. 2 Experimental results of tomographic near-eye displays. a 2D projected images and the corresponding depth maps. The 3D content is taken from the work of Butler et al. [29]. b Experimental results shown as ten photographs captured by a CCD camera at different focal depths. As the photographs show, tomographic near-eye displays support quasi-continuous focus cues (white arrows) while preserving high resolution and contrast. c, d A brief experiment showing the motion parallax provided by tomographic near-eye displays. Additional results are available in Supplementary Note 6 and Supplementary Movie 1.

Along with the promising display performance demonstrated in the experiment, tomographic near-eye displays have two more advantages. First, they can insert black frames without decreasing the number of focal planes, because our prototype does not need to push the number of focal planes to the limit of the DMD system (~280 planes [25]). Black frames help mitigate undesired artifacts when a video is played. Without black frames, users may observe irregular striped patterns caused by the simultaneous observation of focal plane image stacks from adjacent frames. Second, we can use an LED array instead of the DMD to implement wearable prototypes. The LED array supports a much lower resolution (8 × 8) and frame rate (<1 kHz) than the DMD. In tomographic displays, however, the additional display panel compensates for these limitations of the LED array by supporting a much higher resolution (491 dpi) as well as 24-bit color depth. Supplementary Note 1 presents detailed demonstrations of the necessity of black frames, and Supplementary Note 6 demonstrates wearable tomographic near-eye displays using an LED array.

Occlusion blending to alleviate depth discontinuity artifact

Although tomographic displays have various advantages, as demonstrated in the previous section, it may be premature to consider them the most promising system for virtual reality. Because focal plane images are merged via addition, tomographic displays could be vulnerable to depth discontinuity artifacts at occlusion boundaries, as demonstrated in Supplementary Note 3. Without an adequate solution, the synthesized focal plane images appear artificial when the depth discontinuities are significant. Previous studies on multi-plane displays have verified that linear [8] or optimal blending [15] can alleviate depth discontinuities. Unfortunately, tomographic displays cannot apply those blending methods directly, because all focal plane images are correlated with each other: the FSAB divides a single, constant RGB image into multiple focal plane images, so no focal plane image can be determined independently. Therefore, an alternative blending method is needed to minimize depth discontinuity artifacts.

In this study, occlusion blending is introduced for tomographic displays; this approach adopts and combines the ideas of light field synthesis [31] and optimal blending [15]. Although it demands substantial computation power, which hinders real-time operation, this method can significantly reduce the artificial appearance caused by large depth discontinuities. To find optimal focal plane images that satisfy the unique constraint of tomographic displays, we must solve a binary least-squares problem, which is nondeterministic polynomial-time hard (NP-hard). Here, we solve a relaxation of the NP-hard problem to verify the ability of tomographic displays to minimize depth discontinuity artifacts. Figure 3 demonstrates the display results of tomographic displays when applying the optimal solution for the binary image sequence and the RGB display image.
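To make the idea of relaxing a binary least-squares problem concrete, the following Python sketch solves a generic instance: the binary constraint is relaxed to the interval [0, 1], the relaxed problem is solved by projected gradient descent, and the result is rounded. This is only an illustration of the general technique. The matrix `A`, the target `y`, and the function name are hypothetical, and the actual occlusion-blending formulation, which also optimizes the RGB display image, is not reproduced here.

```python
import numpy as np

def relaxed_binary_least_squares(A, y, num_iters=500):
    """Minimize ||A b - y||^2 over b in [0, 1]^n, then round to {0, 1}."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the
    # gradient of 0.5 * ||A b - y||^2.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    b = np.full(A.shape[1], 0.5)
    for _ in range(num_iters):
        grad = A.T @ (A @ b - y)
        # Gradient step followed by projection onto the box [0, 1]^n.
        b = np.clip(b - step * grad, 0.0, 1.0)
    return b, (b >= 0.5).astype(int)

# Hypothetical toy instance: recover a known binary vector.
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 10))
b_true = rng.integers(0, 2, size=10)
y = A @ b_true
b_relaxed, b_binary = relaxed_binary_least_squares(A, y)
```

In this noise-free, overdetermined toy case the relaxed minimizer coincides with the binary vector; in general, rounding a relaxed solution gives only an approximation to the NP-hard optimum.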

Fig. 3 Simulation and experimental results demonstrating occlusion blending. A volumetric scene (Source image courtesy: “Interior Scene”, www.cgtrader.com) extends along the depth range between 0.0D and 4.0D. As shown in the figure, occlusion blending enables tomographic displays to represent volumetric scenes without noticeable artifacts even at the occlusion boundary (red arrows). Additional experimental results are available in Supplementary Note 2.

Evaluation of display capability

To assess tomographic displays, we define two evaluation criteria: upper-bound amplitude and bit depth. The upper-bound amplitude is the Fourier coefficient of synthesized retinal images, and the bit depth denotes the degree of freedom to modulate pixel brightness. These two criteria provide insights into the contrast, resolution limit, and bit depth of tomographic displays. Figure 4 demonstrates the analysis of the upper-bound amplitude and bit depth supported by tomographic displays. Other state-of-the-art prototypes [24, 25] are assessed for comparison with tomographic displays. Among the candidates, tomographic displays have the most promising potential for the representation of high-frequency information as well as high-dynamic range images. A detailed description of the upper-bound amplitude and bit depth is presented in Supplementary Note 1.

Fig. 4 Analysis of upper-bound amplitude and bit depth. a The upper-bound amplitude is the normalized Fourier coefficient of synthesized retinal images, and b the bit depth denotes the degree of freedom in the brightness modulation. Each criterion is plotted according to the spatial frequency of retinal images and the focal depth of observers. The simulation result compares the three prototypes of Rathinavel et al. [25], Chang et al. [24], and this work. Rathinavel et al.’s prototype [25] supports a limited depth of field and bit depth at a high spatial frequency, and Chang et al.’s prototype [24] lacks the bit depth for a full color representation. On the other hand, tomographic displays show reliable performance, regardless of the spatial frequency and focal depth. The exact values at the red points are given for a precise comparison among those candidates.

For a more quantitative evaluation of tomographic displays, we also conducted retinal image simulations to analyze how focal plane images are synthesized. We compared tomographic displays with 80-plane displays to investigate the drawbacks caused by the slow frame rate of the display panel (60 Hz) in tomographic displays. In contrast to tomographic displays, each focal plane image of 80-plane displays can be determined independently according to the blending method, such as linear blending [8] or optimal blending [15, 31]. In this simulation, we assumed that all systems had a resolution limit of 20 cpd, where the horizontal field of view was set to 10°. We employed several visual metrics, including the peak signal-to-noise ratio (PSNR), the image quality factor (Q), and HDR-VDP-2 [32], which estimates the probability that users can detect artifacts. Figure 5 demonstrates the simulation results, which verify the validity of using the display panel to increase the number of focal planes and the bit depth simultaneously. As shown in the figure, tomographic displays show display performance comparable to that of the 80-plane displays, where each focal plane is determined independently.
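For reference, the Python sketch below shows the commonly used form of linear blending for a multi-plane display, in which the intensity of a pixel is split between the two focal planes that bracket its depth in proportion to the dioptric distance. It assumes uniformly spaced planes; the function name and sample values are hypothetical. Such per-plane weights can be chosen independently in an 80-plane display, which is exactly what the tomographic display cannot do.

```python
import numpy as np

def linear_blending_weights(depth_d, plane_depths_d):
    """Tent-function weights of shape (..., num_planes).

    Assumes uniformly spaced planes (in diopters). For depths inside the
    plane range, the weights over planes sum to one.
    """
    depth_d = np.asarray(depth_d, dtype=float)
    planes = np.asarray(plane_depths_d, dtype=float)
    spacing = abs(planes[1] - planes[0])
    # Weight falls linearly from 1 at the plane to 0 at its neighbours.
    weights = 1.0 - np.abs(depth_d[..., None] - planes) / spacing
    return np.clip(weights, 0.0, 1.0)

planes = np.linspace(0.0, 4.0, 5)   # hypothetical planes at 0, 1, 2, 3, 4 D
weights = linear_blending_weights(np.array([1.25]), planes)
```

Here a pixel at 1.25 D receives a weight of 0.75 on the 1 D plane and 0.25 on the 2 D plane.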

Fig. 5 Quantitative evaluation of tomographic displays. A volumetric scene (Source image courtesy: “SimplePoly Urban”, www.cgtrader.com) extends along the depth range between 1.0D and 3.4D. a, b Tomographic displays show display performance similar to that of 80-plane displays in terms of average PSNR (Avg. PSNR) and average Q (Avg. Q). The Avg. PSNR is derived from the weighted sum of errors between the ground truth and synthesized retinal images. The weight is estimated by the reciprocal of the optical blur kernel size. The Avg. Q is the mean of the Q values of all focal stack images. c A probability map of detection for visual differences between the reconstructed scenes and the ground truth. Each pixel value indicates the weighted average probability over all the focal depths of observers. The average probability (Avg. P) is the mean value of all pixels. In Supplementary Note 4, we present focal plane images for each display system and additional comparison results from other related systems.

Illumination strategy for real-time operation

Despite the merits of occlusion blending, such a blending method could be impractical in real environments because of its large computational demands. For applications that require real-time operation, a computationally efficient blending method can be preferable to an accurate representation of occlusion boundaries. In this case, we may render binary backlight images directly from the depth information of 3D scenes. The rendering rule for the binary backlight images is determined by the illumination strategy of the display pixels. With this rendering methodology, it is feasible to operate tomographic near-eye displays in real time.

In this study, we optimize the illumination strategy to ensure adequate display performance in terms of brightness, contrast, resolution, and accuracy of focus cues. In the optimization, we consider various requisites of display systems for a comfortable and immersive experience. First, we impose a lower bound on the illumination time to provide users with adequate brightness of the synthesized images. The lower bound is treated as a constraint in the optimization. Second, we model the offset luminance as a leakage of the backlight source caused by the multiple scattering of light through the backlight diffuser. When the offset luminance is not 0, each pixel of the focal plane is illuminated with a constant brightness, even if the corresponding backlight pixel is turned off. Third, the cost function for the optimization is derived in the frequency domain to reflect human visual characteristics. Human vision is most sensitive to spatial frequencies from 4 to 8 cpd [33].

Figure 6 illustrates two illumination strategies: the primitive and optimal approaches. In the primitive strategy, each pixel is illuminated for the minimum time when its image is formed at the desired depth. On the other hand, the optimal strategy employs a specific backlight operation that minimizes the cost function described above. If there is no offset luminance (\(c = 0\)) and no lower bound of brightness (\(A_{{\mathrm{low}}} = 0\)), the primitive and optimal strategies are identical. When the lower bound of brightness is set to \(A_{{\mathrm{low}}} = 0.625m\), the primitive solution is to illuminate the pixel around the desired depth. However, the optimal solution has several lobes in the backlight operation to exploit the higher-order intensity distribution. When the offset luminance of the display system is \(c = 0.025\), the display system should have enough brightness to surpass the offset luminance. In this condition, the optimal solution may have a longer illumination time than the lower bound, as shown in the figure. Compared with the primitive strategy, the optimal strategy enables tomographic displays to have a higher contrast with a sharper peak, so that users can accommodate to the desired depth.
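The primitive strategy can be sketched in a few lines of Python, under stated assumptions: the focal sweep is discretized into `num_slots` time slots, the brightness lower bound is expressed as a minimum number of lit slots (`min_slots`), and the offset luminance is modeled as a constant level `offset` that leaks through whenever the backlight pixel is off. All names and the leakage model are hypothetical simplifications, and the optimal strategy, which requires the frequency-domain cost function, is not shown.

```python
import numpy as np

def primitive_backlight(num_slots, target_slot, min_slots=1):
    """Binary schedule: the shortest contiguous window of `min_slots`
    time slots centred on the slot where the pixel image is formed at
    its desired depth."""
    min_slots = max(1, min(min_slots, num_slots))
    start = target_slot - (min_slots - 1) // 2
    # Shift the window if it would extend outside the sweep cycle.
    start = max(0, min(start, num_slots - min_slots))
    schedule = np.zeros(num_slots, dtype=bool)
    schedule[start:start + min_slots] = True
    return schedule

def effective_illumination(schedule, offset=0.0):
    """Assumed leakage model: 'off' slots still pass a constant offset."""
    return np.where(schedule, 1.0, offset)

schedule = primitive_backlight(num_slots=80, target_slot=40, min_slots=5)
profile = effective_illumination(schedule, offset=0.025)
```

With `min_slots=1` and `offset=0.0` the schedule reduces to a single lit slot at the desired depth, which corresponds to the case where no lower bound or offset luminance is present.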

Fig. 6 Description of optimal illumination strategy. We simulate different conditions according to the degree of offset luminance and desired brightness: (\(c = 0\), \(A_{{\mathrm{low}}} = 0\)), (\(c = 0\), \(A_{{\mathrm{low}}} = 0.625m\)), (\(c = 0.025\), \(A_{{\mathrm{low}}} = 0.025m\)), and (\(c = 0.025\), \(A_{{\mathrm{low}}} = 0.625m\)). a We show the backlight operations according to the strategies and circumstances. b We demonstrate normalized contrast maps that are achieved by applying a corresponding backlight operation. Normalized contrast maps indicate the relative intensity of retinal images according to the spatial frequency and focal depths. c We provide contrast errors determined by the difference between the target and the reconstructed contrast maps. The differences between primitive and optimal strategies are highlighted by dotted or solid arrows and circles. An optimal illumination strategy allows more definite and precise contrast curves. Note that our prototype supposes the third circumstance: (\(c = 0.025\), \(A_{{\mathrm{low}}} = 0.025m\)). Additional results are available in Supplementary Note 6.

Advanced applications of tomographic displays

By virtue of their remarkable capability to modulate the depth of imaged pixels, tomographic displays could have various advanced optical applications. For instance, tomographic displays can correct optical aberrations, such as the field curvature that is usually observed in near-eye display systems [34]. A high-dynamic range (HDR) display [27] is also a feasible application, because the intensity of the backlight can be spatially modulated. We can render HDR focal plane images by modifying the illumination time according to the degree of brightness. In summary, tomographic displays have several advanced applications that provide a more immersive experience.

Figure 7 shows the simulation results that validate the proposed advanced applications of tomographic displays. First, in the HDR application, the illumination time of each pixel varies according to the desired intensity. The illumination time ranges between 0.5× and 1.5× the optimal illumination time. Second, in the aberration-correction application, the depth map of a 3D scene is pre-compensated to alleviate the optical aberration (i.e., field curvature) of the display system. The pre-compensation is determined by the degree of the optical aberration. Note that we use Seidel coefficients for the simulation of the field curvature.
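Both applications amount to simple per-pixel adjustments, which the following Python sketch illustrates under loudly stated assumptions: the HDR mapping from normalized intensity to illumination-time multiplier is taken to be linear, and the field curvature is modeled as a focal error that grows quadratically with the normalized field radius, as a single Seidel-type term would. The function names, the parameter `curvature_d`, and the sample values are hypothetical and are not taken from the simulation described above.

```python
import numpy as np

def hdr_time_scale(intensity, lo=0.5, hi=1.5):
    """Map normalized intensity in [0, 1] to an illumination-time
    multiplier in [lo, hi]. The linear mapping is an assumption."""
    return lo + (hi - lo) * np.clip(intensity, 0.0, 1.0)

def precompensate_depth(depth_map_d, curvature_d):
    """Subtract an assumed quadratic field-curvature error.

    `curvature_d` is the hypothetical focal error (in diopters) at the
    image corner; the error is zero at the image centre. The map must be
    larger than 1 x 1 pixel.
    """
    h, w = depth_map_d.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Squared radial field coordinate, normalized to 1 at the corners.
    r2 = ((y - cy) ** 2 + (x - cx) ** 2) / (cy ** 2 + cx ** 2)
    return depth_map_d - curvature_d * r2

scales = hdr_time_scale(np.array([0.0, 0.5, 1.0]))
flat_scene = np.full((3, 3), 2.0)   # hypothetical flat scene at 2.0 D
compensated = precompensate_depth(flat_scene, curvature_d=0.3)
```

The pre-compensated depth map would then be used in place of the original one when the binary backlight images are rendered.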