Light throughput

The light throughput of our system is limited mainly by the display device and by diffraction. First, in an amplitude display, each pixel modulates the light intensity by blocking light transmission. For example, in our experiment, because of the polarizer in front of the SLM, only light of one polarization direction can pass. As in conventional amplitude displays, the light transmission at each pixel is determined by the voltage (or hologram gray-scale) response of the SLM. Therefore, the light throughput varies pixelwise according to the displayed content. Additionally, for most liquid-crystal-based passive displays, the light experiences an additional loss due to the gap between pixels (characterized by the fill factor) in either the transmissive or the reflective configuration. For example, the fill factor is 95.7% for the SLM used in our experiments. Although the light efficiency of amplitude displays is generally lower than that of their phase counterparts, amplitude displays such as LCD and DLP are more accessible for consumer applications because of their low cost.
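As a rough illustration, the cumulative effect of these losses can be estimated by multiplying the individual factors. In the sketch below, the 95.7% fill factor is from our setup, while the polarizer loss (50% for an unpolarized source) and the mean pixel transmission are assumed example values:

```python
# Rough light-throughput budget for an amplitude SLM (illustrative only).
fill_factor = 0.957    # fill factor of our SLM (from the text)
polarizer = 0.5        # unpolarized source -> one polarization (assumption)
mean_pixel_T = 0.4     # content-dependent mean gray-level transmission (assumption)

throughput = polarizer * fill_factor * mean_pixel_T
print(f"estimated display throughput: {throughput:.1%}")   # -> 19.1%
```

The mean pixel transmission in particular varies with the displayed content, so this figure is only an order-of-magnitude estimate.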

The second major light throughput loss is attributable to diffraction. Due to the pixelated structure of amplitude displays, the light emitted from the display panel diffracts into multiple orders, each associated with a duplicated image. The common practice is to use only the zero-order diffraction (also known as the SLM bandwidth) because it carries the maximum energy (the zero-order diffraction efficiency is 78% for our SLM). This efficiency can be improved by using an SLM with a smaller pixel pitch.
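The zero-order share of the diffracted power can be illustrated with a simple one-dimensional binary-aperture model of the pixel grid. This idealized sketch ignores phase nonuniformity and other device losses, so it gives only an upper bound rather than reproducing the measured 78%; the linear fill value below is an assumption derived from the 95.7% area fill factor.

```python
import numpy as np

# 1D binary model of a pixelated aperture: each period of the pixel grid
# is open over a fraction f_lin of its width. The power diffracted into
# each order is |c_k|^2, where c_k are the Fourier coefficients of the
# periodic transmission function.
f_lin = 0.978                                   # linear fill (sqrt of 95.7% area fill, assumed)
n = 4096                                        # samples per period
t = (np.arange(n) / n < f_lin).astype(float)    # binary transmission over one period

c = np.fft.fft(t) / n                           # Fourier coefficients
power = np.abs(c) ** 2                          # power per diffraction order
eta0 = power[0] / power.sum()                   # zero-order share of transmitted power
print(f"zero-order fraction (ideal model): {eta0:.1%}")
```

For this loss-free amplitude model the zero-order fraction equals the linear fill itself, which is why real devices fall well below it.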

Reconstruction efficiency

The functionality of our method hinges on our ability to control both the phase and the amplitude of the light wavefront to produce multiple converging beams carrying the image information. Encoding a complex wavefront into an AO-CGH reduces the reconstruction efficiency because the AO-CGH contains the target complex wavefront as well as the DC and conjugate terms. The reconstruction efficiency can be estimated numerically by simulating the wavefront propagation from the encoded AO-CGH to the retina plane via pupil filtering. Also, to avoid crosstalk between the signals and the DC term, the incident beam on the hologram must have a slightly converging wavefront to increase the diffraction angle of the SLM.

To calculate the reconstruction efficiency, we first set all pixel amplitude values of the AO-CGH to unity and calculated the total power, P_0, of the reconstructed image with no pupil filtering. Next, we computed the power P_1 of the reconstructed image with an encoded AO-CGH and pupil filtering. We define the reconstruction efficiency as P_1/P_0, and this ratio largely depends on the image content. For example, the calculated reconstruction efficiency is ~0.2% for the “letters” image (Fig. 4a), ~0.1% for the “logo” image (Fig. 4b), and ~0.7% for the “grid” image (Fig. 4c).
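The P_1/P_0 estimate described above can be sketched as follows. The single-FFT propagation model, the random stand-in hologram, and the centered circular pupil mask are simplifying assumptions for illustration, not the paper's exact simulation:

```python
import numpy as np

def reconstruction_efficiency(hologram, pupil_mask):
    """Estimate P_1/P_0: P_0 is the total reconstructed power of an all-ones
    hologram with no pupil filtering; P_1 is the power passed by the pupil
    mask for the encoded hologram. Propagation is modeled by a single FFT."""
    p0 = np.sum(np.abs(np.fft.fft2(np.ones_like(hologram))) ** 2)
    spectrum = np.fft.fftshift(np.fft.fft2(hologram))
    p1 = np.sum(np.abs(pupil_mask * spectrum) ** 2)
    return p1 / p0

# Toy example: random amplitude hologram and a circular pupil filter.
n = 256
holo = np.random.default_rng(0).random((n, n))
fy, fx = np.mgrid[-n // 2: n // 2, -n // 2: n // 2]
pupil = (fx ** 2 + fy ** 2 < (n // 8) ** 2).astype(float)
eta = reconstruction_efficiency(holo, pupil)
```

With a real encoded AO-CGH the off-axis pupil positions and the DC/conjugate terms make this ratio far smaller, as the percentages above indicate.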

Eyebox size and field of view

We define the eyebox size as the area within which the Maxwellian-view image can be seen by the eye. In our proof-of-concept experiments, we form a 3 × 3 pupil array with a 1 mm distance between adjacent pupils; the eyebox size is therefore 3 mm × 3 mm. In general, when the eye pupil is larger than the pupil spacing in the array, aliasing appears in the observed image: duplicated Maxwellian-view images from two or more pupils overlap. To avoid image aliasing, the pupil spacing must be greater than the physical eye pupil size, which varies from 1.5 mm to 8 mm depending on the lighting condition. One possible solution is to update the AO-CGH by adjusting the plane carrier wave in Eq. (3) according to the eye pupil position and size detected by a pupil-tracking device. The 1 mm pupil spacing in our experiments is limited by the small diffraction angle of the SLM, and this value can be increased with a smaller SLM pixel size.
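The effect of adjusting the plane carrier wave can be illustrated with a toy model: multiplying the hologram by a linear phase ramp shifts the reconstructed focal spot, which is how a pupil could be steered toward the tracked eye position. The single-FFT propagation model and the 10-bin shift below are illustrative assumptions, not the paper's Eq. (3) itself:

```python
import numpy as np

# Toy model of pupil steering via the plane-carrier term: a linear phase
# ramp exp(i*2*pi*k0*m/N) applied across the hologram shifts the focal
# spot by k0 frequency bins under a single-FFT propagation model.
N, k0 = 128, 10                                  # grid size and shift (assumed)
m = np.arange(N)
ramp = np.exp(2j * np.pi * k0 * m / N)           # carrier phase ramp along x
field = np.outer(np.ones(N), ramp)               # uniform along y, ramp along x

spot = np.abs(np.fft.fft2(field)) ** 2           # intensity at the pupil plane
peak = np.unravel_index(np.argmax(spot), spot.shape)
print(peak)   # the focal spot lands k0 bins off-axis: (0, 10)
```

In practice the ramp slope would be chosen from the tracked pupil position, and the achievable shift is bounded by the SLM diffraction angle discussed above.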

In Fig. 7(a), we developed a theoretical framework to calculate the eyebox size. For simplicity, we used a one-dimensional model. Herein we denote the resolution and pixel pitch of the display as N and dx, respectively. The effective width of the display (AO-CGH) is L = Ndx, which is also the width of the DC term (L_DC = L = Ndx). Provided that the desired signals (i.e., the pupil array), the DC term, and the conjugate term together occupy the full bandwidth of the zero-order diffraction (L_b = λd_2/dx under the paraxial approximation [31]), to separate the off-axis signals from the DC term, the width of the signal area L_s (i.e., the eyebox) must be no greater than L_b/2 − L_DC/2, i.e.,

$$2{L}_{s}\le \frac{\lambda {d}_{2}}{dx}-Ndx.$$ (6)

Figure 7 Calculation of (a) eyebox and (b) field of view.

Equation (6) implies that, to increase the eyebox area, we can increase the distance d_2, decrease the resolution N, or reduce the pixel size dx. However, for near-eye displays, a small d_2 and a large N are desired because they yield a compact form factor and a high resolution, respectively. Therefore, the practical approach is to use a small pixel size. For example, to achieve a 3 mm pupil spacing in a 3 × 3 pupil array, i.e., L_s = 3 mm × 3 = 9 mm, the required dx is 3.7 μm in our current setup.
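Solving Eq. (6) at equality for dx gives a quadratic, N·dx² + 2L_s·dx − λd_2 = 0, whose positive root is the required pixel pitch. A minimal sketch, in which the wavelength and the distance d_2 are assumed example values rather than the paper's exact parameters:

```python
import math

# Eq. (6) at equality: 2*Ls = lam*d2/dx - N*dx
# => N*dx**2 + 2*Ls*dx - lam*d2 = 0, solved for the positive root.
lam = 532e-9      # wavelength in meters (assumed)
d2 = 0.15         # hologram-to-pupil distance in meters (assumed)
N = 1024          # display resolution (from the text)
Ls = 9e-3         # target eyebox: 3 mm spacing x 3 pupils (from the text)

dx = (-Ls + math.sqrt(Ls ** 2 + N * lam * d2)) / N
print(f"required pixel pitch: {dx * 1e6:.1f} um")
```

With these assumed values the required pitch comes out at a few micrometers, consistent in magnitude with the 3.7 μm quoted for our setup.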

We calculated the field of view (FOV) of our system based on geometrical optics. As shown in Fig. 7(b), for each focal spot in the effective eyebox, the chief rays emitted from the virtual image (of size L_v) converge at the eye pupil via the display. The angle θ between the chief rays associated with the top and bottom of the virtual image defines the FOV. We assume this angle is approximately the same for all Maxwellian views seen from different pupils, and we calculate it as θ ≈ L_v/(d_1 + d_2).

The FOV depends on the virtual image dimension L_v, and it reaches its maximum when the chief rays associated with the top and bottom pupil locations intercept the display screen edges, as marked in Fig. 7(b). The maximum L_v-max and the corresponding FOV θ_max can be derived from the trapezoidal geometry as:

$${L}_{v-\max }=\frac{L({d}_{1}+{d}_{2})-{L}_{s}{d}_{1}}{{d}_{2}},\,\,{\theta }_{\max }\approx \frac{{L}_{v-\max }}{({d}_{1}+{d}_{2})}=\frac{L}{{d}_{2}}-\frac{{L}_{s}{d}_{1}}{{d}_{2}({d}_{1}+{d}_{2})}.$$ (7)

In our proof-of-concept experiments, the size of the virtual image is L_v = 24 mm. The FOV is calculated as θ ≈ L_v/(d_1 + d_2) = 24 mm/500 mm ≈ 2.75°, close to its maximum θ_max ≈ 2.8° given by Eq. (7).
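This figure follows directly from the small-angle relation; a quick check using the values from the text:

```python
import math

# FOV from Fig. 7(b): theta ~ Lv / (d1 + d2), values from the text.
Lv = 24e-3          # virtual image size (m)
d_total = 500e-3    # d1 + d2 (m)

theta_deg = math.degrees(Lv / d_total)
print(f"FOV ~ {theta_deg:.2f} deg")   # -> FOV ~ 2.75 deg
```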

Equation (7) indicates that, given distances d_1 and d_2, there is a trade-off between the FOV θ_max and the eyebox L_s: increasing the FOV unfavorably reduces the eyebox. To maintain the desired eyebox, we can alternatively reduce the distance d_2. However, to display the corresponding hologram, the required pixel pitch becomes much smaller. For example, to increase the FOV by a factor of two, the required pixel pitch is 3.2 μm, compared with 9.2 μm in the current setup. Alternatively, rather than using plane-wave illumination, we can shine a convergent wavefront onto the SLM to increase the FOV at the expense of an additional lens [32].

Resolution and color reproduction

The resolution of the AO-CGH in our experiment is 1024 × 1024, the same as that of the virtual target image. However, during image reconstruction through diffraction, high-frequency information is lost due to pupil filtering. Also, multiplexing duplicated perspective views into a single hologram reduces the information content of each Maxwellian-view image. To evaluate these relations quantitatively, we numerically reconstructed the Maxwellian-view image from one pupil view for different pupil-array cases and calculated the root-mean-square error (RMSE) values (all calculated intensities are normalized to [0, 1]) between each simulated reconstruction and the original target image. Figure 8 shows the simulation results for the two test images, with the RMSE values marked in each image. Yellow dashed circles in each image column indicate the selected pupil positions of the numerical reconstructions for the 1 × 1, 2 × 2, 3 × 3, and 4 × 4 pupil-array cases. The reconstructions, as well as the enlarged details, show that the quality and resolution of the image from a single Maxwellian view degrade as the number of pupils increases, which is quantitatively confirmed by the RMSE values in each simulation result.
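The RMSE metric used above can be sketched as follows; the min-max normalization to [0, 1] matches the text's description, while the helper name and the toy arrays are our own:

```python
import numpy as np

def rmse(reconstruction, target):
    """Root-mean-square error between two intensity images, each
    normalized to [0, 1] by min-max scaling (as described in the text)."""
    def norm(img):
        img = np.asarray(img, dtype=float)
        return (img - img.min()) / (img.max() - img.min())
    a, b = norm(reconstruction), norm(target)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Identical images give zero error; dissimilar images give a larger value.
target = np.arange(16.0).reshape(4, 4)
print(rmse(target, target))          # -> 0.0
```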

Figure 8 Evaluation of image quality under different pupil numbers.

Although beyond the scope of the current work, our method has an advantage in reproducing colors. Conventional HOE-based Maxwellian displays suffer from chromatic aberrations because the recorded interference pattern is wavelength dependent, causing both on- and off-axis misalignment of the RGB channels in color mixing. Our method alleviates this problem because the modulation of the light beams is achieved by a single AO-CGH. To reproduce colors, we can load three independent AO-CGHs into the RGB channels of the display and show them simultaneously. The RGB light emitted from these holograms then propagates independently and merges at the retina, creating a color representation.