Simple, reliable 2.5D photography

Note: You may also enjoy the Geiger-Mueller mood lamp project.

1. Intro and project goals

Alas, 3D scene reconstruction from a series of freehand images is still an extremely hard problem in computing; while promising results in controlled settings are presented every now and then, the output is very far from being even vaguely acceptable for general-purpose photography: missing or incorrect data, stitching errors, low voxel / polygon accuracy, and severe texturing issues seem to be nearly impossible to fully work around with today's technology. Virtually all usable 3D scans of real-world objects are accomplished through different means: using fairly simple triangulating 3D laser scanners to capture a 3D model by observing how an off-axis laser beam "flows" around the analyzed shape, and then - optionally - projecting textures generated from aligned 2D images. This sort of full 3D acquisition is cumbersome and time-consuming enough to be out of the question for casual applications, however.

As a CS buff, I am deeply impressed by the 3D scene reconstruction research; but as a long-time photographer, I couldn't help but wonder if major advances in photo processing could be made with much simpler tools. In particular, I was intrigued by 2.5D imaging (to borrow the term from the world of CNC machining): regular 2D pictures augmented with precise, per-pixel distance information, but no data about undercuts, backfaces, and so forth. This data alone would enable you to take a photo with a relatively small aperture, and then:

Automatically split objects into layers: accurate distance and distance continuity information could be used to very accurately isolate elements of the composition, and allow them to be selectively edited or even replaced - eliminating one of the most difficult and time-consuming steps in photo processing today.

Set focus and compute bokeh on software level: since per-pixel distance information is available, it is possible to apply (and then tweak) selective, highly realistic blur as a function of distance, to achieve the desired aesthetic effect after all the retouching and composition changes are done. Creative bokeh effects would also be easy to achieve. (A comparable goal is also being pursued with specialized sensor and aperture designs.)

Apply volumetric effects: a number of other depth-based 2.5D filters could conceivably be created, most notably advanced, dimensional lighting and fog.

Still use existing 2D software: since all mainstream 2D photo manipulation programs feature extensive support for layers and grayscale masks, their existing features could be easily leveraged to intuitively work with 2.5D photography, without the need to develop proprietary editing frameworks.

There are several ways to create 2.5D depth maps, and I toyed, for example, with the concept of varying focal length and then measuring pixel contrast as a function of this setting (bad news: it yields vaguely usable distance information for well-defined edges only, and is horribly noisy everywhere else). Another approach is simply to dumb down a 3D scanner: attach a pattern-projecting diode laser module to a servo, and install it on the camera's hot shoe. The laser could then rapidly sweep the scene - and by virtue of being slightly off the optical axis, distance information could be easily triangulated by measuring pattern displacement. To explain a bit better, consider this simple case:

Naturally, sweeping the scene with a single dot would be painful - hundreds of thousands of readings would need to be taken to collect data with reasonable resolution; sweeping the target with a horizontal line provides unambiguous data for an entire row of pixels instead. It is worth noting that this principle could be taken even further - several projects have resorted to projecting a series of lines for 3D imaging purposes (also known as structured light scanning) - but this approach creates ambiguities in non-contiguous regions, and these are very difficult to reliably resolve in software.
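To make the single-beam case concrete, here is a minimal sketch of the triangulation math, assuming a simplified geometry: the laser sits a fixed baseline above the optical axis and is tilted down by a known angle. All parameter names are mine, not from the original setup:

```python
import math

def spot_distance(baseline_m, laser_angle_rad, pixel_row, image_height,
                  sensor_height_mm, focal_length_mm):
    """Triangulate the distance to a laser spot seen in the camera frame.

    The laser is mounted baseline_m metres above the optical axis, tilted
    down by laser_angle_rad; the camera sees the spot at pixel_row
    (0 = top of frame).  Simplified pinhole model, no lens distortion.
    """
    # Vertical offset of the spot from the image centre, in mm on the sensor.
    offset_mm = (image_height / 2 - pixel_row) / image_height * sensor_height_mm
    # Angle of the camera ray above the optical axis.
    camera_angle = math.atan(offset_mm / focal_length_mm)
    # Beam height above the axis at distance z:  baseline - z*tan(laser_angle)
    # Camera ray height at distance z:           z*tan(camera_angle)
    # Setting them equal and solving for z:
    return baseline_m / (math.tan(camera_angle) + math.tan(laser_angle_rad))
```

Note the useful property: the higher the spot appears in the frame, the closer the surface that intercepted the beam.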

When I first thought of building such a device in 2005 (and experimented a bit), the biggest problem was that a very significant number of stills would still need to be taken to achieve a reasonable vertical resolution, taking several minutes - and several hundred shutter cycles - per photo. Lowering Y resolution would be possible with simple, well-defined subjects - but all in all, this still seemed problematic. Using a video camera for high-fps acquisition seemed like a possibility - but individual frames from a reasonably priced digicam were simply unusable for any type of photo work due to limited resolution.

Thankfully, in recent years, we witnessed the arrival of digital SLR cameras with unprecedented video capture capabilities, producing pixel-perfect, non-interlaced captures at 30 fps with minimal compression artifacts, acceptable resolutions, amazing low-light performance, and full manual controls. With a Canon EOS 5D Mark II in hand, I decided to give the project another try.

2. Scanner design

WARNING: point beams from lasers over about 10 mW can cause eye damage or ruin camera sensors; and starting around 100 mW, they can also set many things on fire. Line-generating optics greatly decrease power per surface area, so they aren't nearly as risky as long as they are attached securely - but still, exercise caution.

NOTE: Following various minor but well-publicized incidents during sporting events or riots, several countries - most notably Australia - rolled out knee-jerk regulations banning the sale of virtually all useful lasers to the general public. These regulations are generally poorly enforced, especially for Internet purchases - but it is still your responsibility to stay compliant.

The optics should be picked to match the angle of view of your lens: a line around 100° is best for wide-angle (16-35 mm or so), 50° for standard portrait lens (50-105 mm), and under 20° for telephoto (135 mm and beyond). Nothing bad will happen if the line is much wider than the frame, to be sure - other than the need for a more powerful laser to achieve the same level of illumination.
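As a sanity check on these figures, the angle of view of a rectilinear lens follows directly from the sensor dimension and focal length. The helper below is my own, assuming a full-frame 36 mm sensor width:

```python
import math

def angle_of_view_deg(focal_length_mm, sensor_dim_mm=36.0):
    """Angle of view of a rectilinear lens across one sensor dimension."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

# A 16 mm full-frame lens covers roughly 97 degrees horizontally, so a
# ~100 degree line generator is a comfortable match; a 50 mm lens covers
# only about 40 degrees, and a 135 mm lens about 15 degrees.
```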

With the right laser in hand, you need to find a suitable power supply, and mount it on a high ratio geared motor (or a really slow servo), so that it can smoothly sweep the entire scene - covering both a distant object near the top of the frame, and the closest you plan to photograph near the bottom. The RPM needs to be selected to provide a sufficient resolution: around 150-300 discrete readings in the Y axis are probably optimal, and at 30 fps, this means about 5-10 seconds of screen time. For wide angle, about 0.5 - 1.5 RPM is needed; for normal portrait lens, 0.25 - 0.5 RPM may be more suitable; and for telephoto, plan for 0.15 - 0.2 RPM.
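The RPM figures above follow from simple arithmetic; a sketch, with my own parameter names:

```python
def sweep_rpm(sweep_angle_deg, readings, fps=30):
    """Motor speed needed to spread `readings` video frames evenly over a
    sweep of `sweep_angle_deg` degrees when capturing at `fps` frames/s."""
    sweep_time_s = readings / fps             # e.g. 300 / 30 = 10 seconds
    deg_per_s = sweep_angle_deg / sweep_time_s
    return deg_per_s / 6.0                    # 1 RPM = 360 deg / 60 s = 6 deg/s

# Sweeping the ~74 degree vertical field of a 16 mm full-frame lens in
# 300 readings at 30 fps works out to about 1.2 RPM.
```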

Note that you can't simply grab a nominally faster, low-cost servo, and pulse it to achieve lower RPM: most of them have a very limited angular resolution, and can't be positioned accurately enough. You can build a simple transmission to lower the RPM, though, or lower the driving voltage a bit. In any case, the key to success is smooth, seamless rotation of the laser.

If you are not using a servo, you should also equip the motor with a home position sensor - a microswitch or an optointerrupter - to be able to start from a known location. The whole setup needs to be hooked up to your computer or an autonomous microcontroller (possibly through an H-bridge driver and a power Darlington to turn the laser on and off), and programmed to execute the following algorithm:

1. If the limiting switch is currently closed, go to step 3.

2. Rotate up until the switch is triggered. Abort if the position is not reached in the expected time. Momentarily reverse the motor to avoid cramming the switch too hard.

3. Wait for keypress.

4. Turn on the laser.

5. Rotate down for the predefined time needed to sweep the scene. Abort if the limiting switch is not depressed in time. Optionally, make a short stop in the middle to key in a reference reading.

6. Turn off the laser.

7. Go to step 1.
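As an illustration only - not the author's firmware - the control loop above could be sketched in Python, with hypothetical hardware callbacks (`motor`, `laser`, `switch_closed`, `wait_keypress` stand in for whatever your driver actually exposes):

```python
import time

class ScanAborted(RuntimeError):
    """Raised when the home switch is not reached in the expected time."""

def scan_cycle(motor, laser, switch_closed, wait_keypress,
               sweep_time_s, home_timeout_s=15.0):
    """One pass of the scanner control loop; call repeatedly to keep scanning.

    motor(d) runs the motor with d = +1 (up), -1 (down) or 0 (stop);
    laser(on) toggles the line laser; switch_closed() reads the home
    switch; wait_keypress() blocks until the operator presses the button.
    The abort check on the downward sweep is omitted for brevity.
    """
    if not switch_closed():                   # home, unless already there
        motor(+1)
        deadline = time.monotonic() + home_timeout_s
        while not switch_closed():
            if time.monotonic() > deadline:
                motor(0)
                raise ScanAborted("home position not reached in time")
        motor(0)
    motor(-1)                                 # back off the switch briefly
    time.sleep(0.05)
    motor(0)
    wait_keypress()                           # wait for the operator
    laser(True)
    motor(-1)                                 # sweep the scene downward
    time.sleep(sweep_time_s)
    motor(0)
    laser(False)
```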

The operator needs to follow this procedure:

1. Set up the camera on a tripod, power up the scanner.

2. Capture the actual image to be augmented with depth information.

3. Adjust exposure to a predefined level suitable for capturing the projected line (overexposure reduces accuracy, underexposure limits scan range).

4. Start recording video, capturing at least one reference frame with no laser beam present.

5. Press the button to start the scan.

6. Stop recording once the laser shuts off.

The following photos document the mechanical design of my scanner (see my CNC machining guide for a better explanation of what's going on).

1) Low-level CAD work to design a 2500:1 gearbox for the laser:

2) A rendered CAD model of the entire gearbox assembly:

3) Positive molds for the assembly being machined on a CNC mill in RenShape 460:

4) Finished positive mold:

5) Negative molds cast in hard platinum cure silicone (ShinEtsu KE1310ST):

6) Final plastic parts cast in silicone molds using Innovative Polymers IE-3075:

7) Assembled gearbox, laser with line-generating optics visible:

8) Another take of the finished device:

3. Depth map software

The full depth extraction algorithm is meant to work as follows:

1. Subtract the reference frame.

2. Extract the color channel matching the laser wavelength.

3. For each column: normalize brightness; find the beam start and end locations, rejecting the column if the values are abnormal; then triangulate the distance of the start and end locations based on the focal length and the approximated laser angle at this time code.

4. If necessary, interpolate between data points (snapping to detected edges when interpolating non-contiguous regions).

5. Create layers by isolating non-contiguous regions in the depth map.
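The per-column beam detection could be sketched in NumPy along these lines; the thresholds and the beam-width rejection criterion below are arbitrary placeholders of mine, not values from the original software:

```python
import numpy as np

def beam_rows(frame, reference, channel=0, threshold=0.25):
    """Locate the laser line in every column of one video frame.

    frame and reference are HxWx3 float arrays in [0, 1]; channel selects
    the colour plane matching the laser (0 = red).  Returns one row index
    per column (centre of the beam), with NaN for rejected columns.
    """
    # Subtract the reference frame and keep the matching colour channel.
    diff = np.clip(frame[..., channel] - reference[..., channel], 0.0, 1.0)
    peak = diff.max(axis=0)
    # Normalize each column by its brightest pixel (guard against /0).
    norm = diff / np.where(peak > 0, peak, 1.0)
    rows = np.full(diff.shape[1], np.nan)
    for x in range(diff.shape[1]):
        if peak[x] < threshold:                  # beam absent or too dim
            continue
        lit = np.flatnonzero(norm[:, x] > 0.5)   # beam start..end extent
        start, end = lit[0], lit[-1]
        if end - start > diff.shape[0] // 10:    # abnormally wide: reject
            continue
        rows[x] = (start + end) / 2.0
    return rows
```

The returned row indices would then feed the triangulation step, together with the laser angle interpolated from the frame's time code.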

For my initial tests, however, I settled for something far more primitive - simply locating, for each pixel, the frame in which its value is the highest, and correlating this with the scanner's angular position to compute unfiltered distance data. Still, the first scan isn't exactly horrible. Here's a reference "daylight" capture of the test scene (including a somewhat challenging, reflective marble surface):

And here's the raw, unedited 2.5D scan data:

4. Questions, comments...