Google recently released Guetzli, a JPEG encoder that produces files up to 35% smaller with fewer artifacts. As the creator of an open source image hosting platform, I wanted to test its performance in the real world.

Guetzli in action. Left: Original, Center: libjpeg, Right: Guetzli. You can see that Guetzli produces fewer artifacts

Note: Google says the ideal way to use Guetzli is on uncompressed and very large images. On image hosting sites, users usually upload smaller images, so the results will probably not reach 35%, but let's find out.

Let's convert all images on PictShare

PictShare uses a smart query system: by changing the URL of an image, it is automatically resized, filtered, etc., and PictShare creates a copy for each modified image. I will include all the commands I used so other PictShare admins can test this for themselves.

# Get all JPEGs and save the list in a file
cd /path/to/your/pictshare/upload
find . -mindepth 2 | grep '\.jpg$' > jpglist.txt

Let's just see how many images we have and how large they are now

cat jpglist.txt | wc -l
# 10569
du -ch `cat jpglist.txt` | tail -1 | cut -f 1
# 3.7G

So we have 10569 JPEG images, 3.7G in total

Let's convert them and see what happens

mkdir /tmp/guetzli
tmp="/tmp/guetzli"
while read path
do
    b=$(basename "$path")
    guetzli "$path" "$tmp/$b"
done < jpglist.txt

What they don't tell you: Guetzli is slow

For now, Guetzli uses only one CPU thread and is very slow. Even small 800x600 images take about 10-20 seconds on an i7 5930K @ 4GHz. It also uses up to 6 gigabytes of RAM per image.

Almost 6G RAM on Windows

I split the conversion script into 9 parts (so the system still had some threads left) and let my spare server run the conversion exclusively.
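Instead of splitting the list into nine files by hand, the same effect can be had with `xargs -P`, which keeps a fixed number of worker processes busy. A minimal sketch; the sample files are made up, and `cp` stands in for the `guetzli` binary so it runs anywhere:

```shell
# Build a tiny stand-in for jpglist.txt (on a real host, feed the actual list)
mkdir -p work/in work/out
printf 'fake-jpeg-a' > work/in/a.jpg
printf 'fake-jpeg-b' > work/in/b.jpg
ls work/in/*.jpg > samplelist.txt

enc=cp   # replace with: enc=guetzli

# -P 9 keeps up to nine encoder processes running at once, one file each
xargs -P 9 -I{} sh -c "$enc '{}' \"work/out/\$(basename '{}')\"" < samplelist.txt
```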

The Results

Converting 10569 images with Guetzli took my Xeon Server with two physical CPUs and 16 threads 50 hours.

With the following script I tested whether all images were valid. Interestingly, 72 of the Guetzli images (less than 1 percent) were not. Most of them were either not created at all or had 0 bytes because Guetzli crashed. I re-ran all the missing images and it crashed again... odd.

orig_failed=0
converted_failed=0
while read path
do
    orig="toconvert/$path"
    out="converted/$path"
    if [[ ! $(file -b "$orig") =~ JPEG ]]; then
        orig_failed=$((orig_failed + 1))
        echo "[$orig_failed] $orig is not a valid JPEG"
    fi
    if [[ ! $(file -b "$out") =~ JPEG ]]; then
        converted_failed=$((converted_failed + 1))
        echo "[$converted_failed] $out is not a valid JPEG"
    fi
done < jpglist.txt
echo ""
echo "Results:"
echo "$orig_failed of the original images are not valid JPEG images"
echo "$converted_failed of the converted images are not valid JPEG images"
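Since the failed files were either missing or 0 bytes, a retry pass can filter on exactly that condition instead of re-converting everything. A sketch using the same toconvert/converted layout, with a made-up sample tree and `cp` standing in for the `guetzli` binary so it runs anywhere:

```shell
# Sample tree: one file already converted, one missing its output
mkdir -p toconvert/x converted/x
printf 'jpeg-data' > toconvert/x/ok.jpg
printf 'jpeg-data' > toconvert/x/failed.jpg
printf 'jpeg-data' > converted/x/ok.jpg
printf 'x/ok.jpg\nx/failed.jpg\n' > retrylist.txt

enc=cp   # replace with: enc=guetzli

while read path
do
    # -s is true only for files that exist AND are non-empty
    if [ ! -s "converted/$path" ]; then
        "$enc" "toconvert/$path" "converted/$path"
    fi
done < retrylist.txt
```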

The original 10569 images had a size of 3.7G (3915306882 bytes), an average of 361 KiB per image.

The Guetzli images have a size of 1.8G (1908475014 bytes), an average of 176.34 KiB per image.

Screenshot of the end result

The Guetzli images were, in total, a whopping 51% smaller

Examples

Some images were even more impressive than that.

The original version of this image went from 2.3M to 644K, making it 71% smaller.

This was not an isolated case. I used the following script to analyze the images, and the results were amazing.

In the end, 18% (1940 of 10569) of the converted images were 60% (or more) smaller than the original files. I'm beginning to think this might have something to do with the PHP GD library, which creates the JPEGs on PictShare.

# Note: you'll need "identify" first.
# Install it with: apt-get install imagemagick

# Truncate the CSV once, before the loop
# (doing this inside the loop would wipe it on every iteration)
echo -n > conversionlist.csv

while read path
do
    orig="toconvert/$path"
    conv="converted/$path"
    if [ -f "$orig" ] && [ -f "$conv" ]; then
        orig_size=$( du "$orig" | cut -f1 )
        conv_size=$( du "$conv" | cut -f1 )
        percent=$( echo "scale=2;100 - (($conv_size / $orig_size) * 100)" | bc -l )
        percent=${percent%.*}
        dimensions=$( identify -format '%wx%h' "$conv" )
        if [[ $percent -gt 60 ]]; then
            # only print if the new image is at least 60% smaller than the original
            printf "$path is $percent%% smaller\t$conv_size instead of $orig_size\tDimensions: $dimensions\n"
            echo "$path;$percent;$dimensions" >> conversionlist.csv
        fi
    fi
done < jpglist.txt

Conclusion

Since not all images could be converted, these numbers have an error margin of about 1%.

The computation time is massive, but if the conversion is done in the background it might not be a problem. I have no idea why Guetzli works better than Google promised even though most images were not high quality, but the results speak for themselves.

If you are an image hoster and have spare CPU cycles (lots of them), Guetzli can save you good money on storage, and it seems that if your images were created with PHP GD, they will get much smaller.