HDR+ Burst Processing Pipeline

Visual Computing Systems Final Project

Tim Brooks and Suhaas Reddy

Summary

For our final Visual Computing Systems project, we implemented a burst photography pipeline based on Google's HDR+. The technique combines multiple underexposed raw frames as a means of noise removal, then applies tone mapping to maintain local contrast while brightening shadows. Underexposing the initial captures allows for more robust alignment and merging, in addition to less motion blur and fewer blown highlights. Our code is available at github.com/timothybrooks/hdr-plus.

Our implementation applies the same high-level pipeline to raw images from a Canon 5D Mark III DSLR, with modified algorithms. While Google's HDR+ pipeline has proven a useful technique in mobile photography due to its ease (no user input) and robustness (used as a default setting), the main question we explored is whether this technique is potentially beneficial in a broader photography setting. The extreme low-light scene below of a car driving by in the snow is processed solely using our implementation of the HDR+ pipeline.

Example Output Image

The pipeline is broken into three main phases: the first aligns many raw, underexposed frames; the second merges the frames as a means of noise removal; and the third finishes the image by applying tone mapping and standard image processing operations. We will briefly explain our approach to each phase, examine sample results, and discuss what the results suggest about the potential use of HDR+ in a broader photography context.

Background

When a camera captures information off an image sensor, a series of steps, referred to as a pipeline, must be applied to the data before it is recognizable as a photograph. While it is common for cameras to apply an image processing pipeline immediately, producing a JPEG output image, some cameras allow capture in a "raw" format, deferring processing to a software application such as Adobe Photoshop or Lightroom. The pipeline we describe processes images from start to finish: from raw data to a recognizable image. The closeup of a toy below compares raw sensor data with its corresponding output image.

Raw Sensor Data | source: graphics.cs.cmu.edu
Output Image | source: graphics.cs.cmu.edu

Aligning

We align raw frames hierarchically via a Gaussian pyramid, moving from coarse to fine alignments. Each level of the pyramid is downsampled by a factor of 4. Using 16 x 16 tiles and a search region of 4 pixels, we find the tile offset that minimizes the sum of L1 distances, as sketched below. While Google uses slightly different downsample and search region sizes and a mix of L2 and L1 distances, the pixel-level alignment is otherwise identical. Google supplements the hierarchical alignment described above with a sub-pixel alignment method that fits a bivariate polynomial to the pixel-level alignment, allowing for alignment at a finer granularity. By aligning the frames prior to merging, we are able to combine them more robustly, without significant blurring or ghosting due to motion.

Six-frame Average
Six-frame Align and Merge
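For concreteness, here is a minimal numpy sketch of the coarse-to-fine tile search. It is illustrative rather than our exact implementation: it assumes single-channel (e.g., averaged-to-gray) frames, the box-filter downsample stands in for a proper Gaussian blur, and the function names and pyramid depth are our own choices.

```python
import numpy as np

def downsample4(img):
    """4x downsample via box filtering (a stand-in for a Gaussian blur)."""
    h, w = img.shape[0] // 4 * 4, img.shape[1] // 4 * 4
    return img[:h, :w].reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def align_level(ref, alt, prev, tile=16, search=4):
    """For each 16x16 reference tile, search +/-`search` pixels around the
    coarser level's guess for the offset minimizing the L1 distance."""
    th, tw = ref.shape[0] // tile, ref.shape[1] // tile
    offsets = np.zeros((th, tw, 2), dtype=np.int64)
    for ty in range(th):
        for tx in range(tw):
            y0, x0 = ty * tile, tx * tile
            ref_tile = ref[y0:y0 + tile, x0:x0 + tile]
            # Scale the coarser level's offset up by the downsample factor.
            gy, gx = prev[min(ty // 4, prev.shape[0] - 1),
                          min(tx // 4, prev.shape[1] - 1)] * 4
            best, best_off = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + gy + dy, x0 + gx + dx
                    if y < 0 or x < 0 or y + tile > alt.shape[0] or x + tile > alt.shape[1]:
                        continue
                    d = np.abs(ref_tile - alt[y:y + tile, x:x + tile]).sum()
                    if d < best:
                        best, best_off = d, (gy + dy, gx + dx)
            offsets[ty, tx] = best_off
    return offsets

def align(ref, alt, levels=4):
    """Coarse-to-fine alignment over a 4x pyramid. Returns per-tile
    (dy, dx) offsets into the alternate frame at full resolution."""
    ref_pyr, alt_pyr = [ref], [alt]
    for _ in range(levels - 1):
        ref_pyr.append(downsample4(ref_pyr[-1]))
        alt_pyr.append(downsample4(alt_pyr[-1]))
    offsets = np.zeros((1, 1, 2), dtype=np.int64)  # coarsest guess: zero motion
    for r, a in zip(reversed(ref_pyr), reversed(alt_pyr)):
        offsets = align_level(r, a, offsets)
    return offsets
```

Because each level only refines the guess inherited from the level above, the effective search radius at full resolution is far larger than 4 pixels, at a small fraction of the cost of a brute-force search.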

Merging

Merging decreases noise in a reference frame by including similar information from aligned tiles of other frames. Google applies a variant of the Wiener filter to each tile across the temporal dimension. Our algorithm is significantly simpler, using the same intuition behind non-local means, which weights pixels in relation to the similarity of patches. We combine each tile based on the relative pairwise similarity between a reference tile and the other tiles in the temporal stack. As the close crops below show, merging significantly improves image quality by removing noise, without the blurring side effect of most single-frame denoising algorithms. Merging also improves color accuracy, particularly in darker regions of the image.

Single Frame
Six-frame Align and Merge
Single Frame Closeup
Six-frame Align and Merge Closeup

To avoid artifacts at the boundaries of tiles, we overlap tiles by one half in each dimension and blend the overlapping tiles using a modified raised cosine window. This form of spatial merging exactly matches Google's implementation, and is sketched below along with the temporal merge.

Raised Cosine Window on Checkerboard
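A minimal numpy sketch of both ideas follows, assuming the frames have already been aligned and resampled onto the reference grid (in practice each frame's tiles are fetched at their per-tile offsets). The similarity-to-weight mapping and the `strength` constant are illustrative choices, not exact constants from either implementation.

```python
import numpy as np

def cosine_window(tile=16):
    """1D raised cosine, lifted to 2D. Half-overlapped copies of this
    window sum to exactly one, so blending introduces no tile seams."""
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * (np.arange(tile) + 0.5) / tile)
    return np.outer(w, w)

def merge_tiles(stack, strength=4.0):
    """Weighted average over a temporal stack of tiles; stack[0] is the
    reference. Weights fall off with dissimilarity to the reference, in
    the spirit of non-local means. `strength` is an illustrative knob."""
    stack = np.array(stack)
    dist = np.abs(stack - stack[0]).mean(axis=(1, 2))  # mean abs. difference
    scale = dist[1:].mean() + 1e-8                     # typical noise-driven distance
    weights = np.exp(-strength * (dist / scale) ** 2)  # reference gets weight 1
    weights /= weights.sum()
    return np.tensordot(weights, stack, axes=1)

def merge_frames(frames, tile=16):
    """Merge frames (assumed pre-aligned here) by temporally merging each
    half-overlapped tile and blending spatially with the cosine window.
    Image borders receive partial weight; real code pads or renormalizes."""
    h, w = frames[0].shape
    out = np.zeros((h, w))
    win = cosine_window(tile)
    for y in range(0, h - tile + 1, tile // 2):
        for x in range(0, w - tile + 1, tile // 2):
            stack = [f[y:y + tile, x:x + tile] for f in frames]
            out[y:y + tile, x:x + tile] += win * merge_tiles(stack)
    return out
```

The key property of the raised cosine window is that half-overlapped copies sum to one, so every interior pixel receives a total blending weight of exactly one and tile boundaries disappear, as the checkerboard figure above illustrates.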

Finishing

We implemented black-level subtraction, white balancing, demosaicking with simplified gradient correction, bilinear chroma denoising, sRGB color correction, tone mapping, gamma correction, global contrast adjustment and unsharp mask sharpening. These steps include all of those in the Google pipeline except chromatic aberration correction, dehazing and dithering.

To apply tone mapping, we simulate a more brightly exposed image and weight the pixels of the brighter and darker images according to a normal distribution; here the normal distribution represents the ideal pixel value distribution of a well-exposed image. The weights are then applied to the two images using a Laplacian pyramid, which prevents hard edges and haloing around transitions between dark and bright portions of the scene. Differing slightly from Google's approach, we apply this algorithm iteratively on high-contrast scenes in order to increase the amount of compression while minimizing blending artifacts.

No Tone Mapping
Tone Mapping
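A minimal numpy sketch of this weighted blend, in the spirit of exposure fusion, is below. The synthetic gain, the Gaussian parameters (mu, sigma), the pyramid depth, and the box-filter pyramids are illustrative assumptions rather than our exact settings, and the iterative variant we apply to high-contrast scenes is omitted.

```python
import numpy as np

def gauss_down(img):
    """2x downsample with a box filter (a stand-in for a Gaussian blur).
    Assumes dimensions divisible by 2**levels."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbor 2x upsample, cropped to `shape`."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def exposure_weights(img, mu=0.5, sigma=0.2):
    """Gaussian preference for mid-tone values in a well-exposed image."""
    return np.exp(-0.5 * ((img - mu) / sigma) ** 2)

def tone_map(dark, gain=4.0, levels=4):
    """Fuse the dark frame with a synthetically brightened copy using
    per-pixel exposure weights blended through Laplacian pyramids.
    Assumes `dark` is a grayscale image in [0, 1]."""
    bright = np.clip(dark * gain, 0.0, 1.0)
    images = [dark, bright]
    weights = [exposure_weights(im) for im in images]
    wsum = weights[0] + weights[1] + 1e-8
    weights = [w / wsum for w in weights]  # normalize per pixel

    fused_pyr = None
    for im, w in zip(images, weights):
        # Gaussian pyramid of the weights, Laplacian pyramid of the image.
        gp_im, gp_w = [im], [w]
        for _ in range(levels):
            gp_im.append(gauss_down(gp_im[-1]))
            gp_w.append(gauss_down(gp_w[-1]))
        lp = [gp_im[i] - upsample(gp_im[i + 1], gp_im[i].shape)
              for i in range(levels)] + [gp_im[-1]]
        contrib = [l * g for l, g in zip(lp, gp_w)]
        fused_pyr = contrib if fused_pyr is None else [a + b for a, b in zip(fused_pyr, contrib)]

    # Collapse the fused Laplacian pyramid back to an image.
    out = fused_pyr[-1]
    for lvl in reversed(fused_pyr[:-1]):
        out = upsample(out, lvl.shape) + lvl
    return np.clip(out, 0.0, 1.0)
```

Blending the weights at every pyramid scale, rather than once at full resolution, is what prevents the hard edges and halos that a naive per-pixel blend would create at dark-to-bright transitions.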

Performance

Our pipeline takes ~3.4 seconds to merge eight raw images on a quad-core 2.2 GHz Intel i7. This excludes an I/O overhead of ~1 second per raw image. Our runtimes vary slightly with image complexity (align and merge), amount of dynamic range (tone mapping iterations) and level of noise (chroma denoising strength). The runtimes meet our ~4 second usability goal. A direct comparison to Google's pipeline performance is not particularly telling, as the algorithms and underlying hardware vary greatly.

Ghosting Artifacts

In rare cases, our merging exhibited ghosting artifacts. These artifacts occurred in very highly underexposed raw images, and are likely due to a failure to differentiate between noise and misalignment in our simplified merge method. Note that our testing is far from extensive: we only ran the pipeline on a batch of ~35 bursts that we captured ourselves.

Highly Underexposed Raw
Resulting Ghosting

Image Quality

Below, our pipeline output is compared with a single underexposed frame processed through dcraw, a bare-bones raw processing pipeline, and a single underexposed frame processed with automatic Adobe Lightroom settings.

Our Pipeline
Dcraw
Lightroom
Our Pipeline Closeup 1
Dcraw Closeup 1
Lightroom Closeup 1
Our Pipeline Closeup 2
Dcraw Closeup 2
Lightroom Closeup 2
Our Pipeline Closeup 3
Dcraw Closeup 3
Lightroom Closeup 3

The dynamic range compression of our pipeline produces favorable global lighting, though this matters less for scenes without a high dynamic range. Our detail is clearly superior to the basic processing of dcraw. Adobe Lightroom's noise removal outperforms ours in extremely noisy, low-light cases, but at the cost of lower saturation in shadowed regions and generally less color detail. Since our noise reduction is primarily accomplished by merging frames, we can apply spatial chroma denoising less aggressively, producing higher quality color information in nearly all of our results.