The early code was not without issues. Initially, Media Proxy leaked 16 bytes on each request. A leak this small takes a long time to manifest, especially when testing at a small scale. Compounding the issue, Media Proxy keeps large static pixel buffers around for resizing. It uses two of these buffers per CPU, so on a 32-core host its initial memory usage is several GB. During testing, it took hours for Media Proxy to exhaust all of the system memory and be restarted. That was long enough that it was hard to tell whether we actually had a memory leak or whether normal runtime usage was simply pushing us over the limit.

Eventually, we concluded that there must indeed be some kind of memory leak. We weren’t sure whether the leak was present in Go or C++, and reviewing the code failed to turn up the source of the leak. Fortunately, Xcode ships with an outstanding memory profiler — the Leaks tool in Instruments. This tool revealed the size of the leak and approximately where it was occurring. This was enough of a hint that further review allowed us to identify and fix the leak.

We encountered another showstopper bug in Media Proxy. Sometimes it would respond with strangely corrupted images where half of the image would be correct and the other half would appear “glitched”. We initially suspected that we might have been decoding partially retrieved images or somehow calling OpenCV incorrectly. This bug occurred infrequently and was hard to diagnose.

To track this down, we built a high-throughput request simulator. The image URLs it requested pointed at an HTTP server inside the simulator itself, so the simulator acted as both the requesting client and the hosting server. The simulator randomly delayed its responses in order to provoke the image corruption in Media Proxy. With a reliable reproduction, we were able to isolate components in Media Proxy until we discovered a race condition on the output buffer that contained the resized image. We had been writing one image to this buffer and then writing another to the same buffer before the first had finished being written back to the network. The glitched images we had seen were actually two JPEGs written on top of one another.

An actual glitched JPEG rendered by Media Proxy

Another way to discover bugs in a complex system is fuzzing, a technique that generates random inputs and feeds them into the system. This can cause the system to exhibit strange behavior or crash, and since our system needs to be resilient against all inputs, we decided to use this technique while testing. AFL is an exceptionally good fuzzer, so we picked it and ran it against Lilliput, which revealed several crashes due to uninitialized variables.

After fixing the above bugs, we were confident enough to ship Media Proxy to production and were happy to find that our work had paid off. Media Proxy needed 60% fewer server instances to handle as many requests as Image Proxy while completing requests with much less variance in latency. Profiling shows that more than 90% of CPU time in this new service is spent performing image decompression, resizing, and compression. These libraries are already highly optimized, suggesting further gains would not be easily achievable. Additionally, the service creates almost no garbage at runtime.

Today, Media Proxy operates with a median per-image resize of 25ms and a median total response latency of 85ms. It resizes more than 150 million images every day. Media Proxy runs on an autoscaled GCE group of n1-standard-16 host type, peaking at 12 instances on a typical day.

Putting the Media in Media Proxy

After we had static images working, we wanted to support animated GIF resizing as well, which OpenCV would not handle for us. We decided to add another Cgo wrapper on top of giflib to Lilliput so that it could resize full GIFs, as well as output the first frame as PNG.

Resizing GIFs turned out to be somewhat challenging as the GIF standard specifies per-frame palettes of 256 colors, but the resizer operates in RGB space. We decided to preserve each frame’s palette rather than attempting to recompute new palettes. In order to convert RGB back into palette indices, we gave Lilliput a simple lookup table that crushes some of the RGB bits and uses the result as a key into a palette index table. This performs well and preserves the original colors, though it does mean that Lilliput can only create a GIF from a source GIF.

We also patched giflib so that it would be easier to decode just a single frame at a time. This allows us to decode one frame, resize it, and then encode and compress it before moving on to the next, reducing the memory footprint of the GIF resizer. This does add some complexity to Lilliput as it must preserve some GIF state from frame to frame, but having more predictable memory usage in Media Proxy seems like a clear advantage.

Lilliput’s giflib wrapper fixed a number of issues we had previously seen in Image Proxy’s GIF resizing as giflib gave us full control of the image resizing process. A significant number of our Nitro users had uploaded animated GIF avatars which would have glitches or transparency errors when resized by the Image Proxy but which worked perfectly through Media Proxy. In general, we found that image resizers had problems with some aspects of the GIF format and produced visual glitches for frames with transparency or partial frames. Creating our own wrapper allows us to address these issues as we encounter them.

Finally, we gave Lilliput a Cgo wrapper on libavcodec so that it could freeze the first frame from MP4 and WEBM videos. This functionality will allow Media Proxy to give previews of user-posted videos so that users can decide from the preview whether they want to play the video. Freezing the first frame of videos was one of the remaining blockers for us to add an in-client video player for videos in message attachments and links.

More Open Source

Now that we’re satisfied with Media Proxy, we’re releasing Lilliput under the MIT license. We hope that this package will be useful for anybody who needs a performant image resizing service, and that this post will help others build new Go packages.

We are hiring, so come join us if this type of stuff tickles your fancy.