Transformations have to be encoded in ascending order of transformation identifier. All transformations are optional.

Transformations serve two purposes: to modify the pixel data (in a reversible way) so that it compresses better, and to keep track of the range of actually occurring pixel values, in order to narrow it down.

Initially, pixel values are assumed to be in the range 0..2^(bit_depth)-1; this range can be modified by transformations. We’ll use range(channel).min and range(channel).max to denote the global minimum and maximum value of a particular channel.

We also use a potentially more accurate (narrower) conditional range crange(channel,values) to denote the range of a pixel value in channel channel, given that the pixel values in previously encoded channels are values. Initially, the conditional ranges are simply equal to the global range, but transformations might change that.

Finally, we define a function snap(channel,values,x) which, given a pixel value x for channel channel and pixel values values in previously encoded channels, returns a 'valid' value as close as possible to x. Usually, snap(channel,values,x) simply clamps x to the conditional range crange(channel,values), but the ColorBuckets transformation changes that behavior.
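The default behavior of crange and snap, before any transformation changes it, can be sketched as follows. This is only an illustration: the Range type and the function names default_crange and default_snap are hypothetical and do not come from the specification.

```python
from typing import NamedTuple, Sequence


class Range(NamedTuple):
    """Inclusive range of pixel values (hypothetical helper type)."""
    lo: int
    hi: int


def default_crange(ranges: Sequence[Range], channel: int, values: Sequence[int]) -> Range:
    # Initially the conditional range ignores the previously encoded
    # values and simply equals the global range of the channel.
    return ranges[channel]


def default_snap(ranges: Sequence[Range], channel: int, values: Sequence[int], x: int) -> int:
    # Default snap: clamp x to the conditional range.
    r = default_crange(ranges, channel, values)
    return max(r.lo, min(r.hi, x))


# Example: four 8-bit channels, each with range 0..255.
ranges = [Range(0, 255)] * 4
clamped_high = default_snap(ranges, 1, (2,), 300)
clamped_low = default_snap(ranges, 1, (2,), -5)
unchanged = default_snap(ranges, 1, (2,), 100)
```

Transformations such as ColorBuckets would replace both functions with versions that actually depend on values.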

As a typical example, consider 8-bit RGBA to which the YCoCg transformation gets applied. In this example, the conditional ranges also change: e.g. crange(1,2) (the range for Co given that Y=2) happens to be -7..7.

In the following descriptions of transformations, we use orig_range, orig_crange, orig_snap to denote the original ranges and snap function (the initial ones, or the ones resulting from the previous transformation in the chain). We use new_range, new_crange, new_snap to denote the updated ranges and snap function.

In part 4 we will use range, crange and snap to denote the final ranges and snap functions, i.e. after applying all transformations.

We will now describe the transformations and their encoding in more detail.

The ChannelCompact transformation looks at each channel independently, and reduces its range by eliminating values that do not actually occur in the image.

To be able to reconstruct the original values, the mapping from the reduced range to the original range is encoded. Near-zero symbol coding is used, with a single context which we’ll call A.

The information is encoded as follows:

The effect of this transformation is as follows:

To reverse the transformation (after decoding the pixel values):

Transformation identifier 2 is not used. It is reserved for future extensions that support transformations to other color spaces like YCbCr.

The YCoCg transformation converts the colorspace from RGB to YCoCg. No information has to be encoded for this (besides the identifier of the transformation).

The transformation only affects the first three channels (0,1,2). It is not allowed to be used if nb_channels = 1.

Luma (Y) corresponds to roughly 50% green, 25% red, 25% blue. Chroma orange (Co) is positive for colors near orange (red, orange, yellow), and negative for colors near blue. Chroma green (Cg) is positive for colors near green and negative for colors near purple.

Define origmax4 to be equal to max(orig_range(0).max, orig_range(1).max, orig_range(2).max)/4+1 and newmax to be equal to 4 * origmax4 - 1. In the most common case where the three channels have the range 0..255, this evaluates to origmax4 = 64 and newmax = 255.

Unlike the RGB color space, not every coordinate in the YCoCg color space corresponds to an actual color. In particular, the range for Co and Cg is much smaller for near-black and near-white colors than for intermediate luma values.

The conditional range function crange is updated to reflect this. It is updated as follows:

The PermutePlanes transformation reorders (permutes) the channels; optionally it also subtracts the values of the new channel 0 from the values of channels 1 and 2. This transformation is useful if for some reason the YCoCg transformation is not used: it can e.g. be used to transform RGB to G (R-G) (B-G).

This transformation is not allowed to be used in conjunction with the YCoCg transformation; it is also not allowed to be used if nb_channels = 1. Also, if alpha_zero is true, then channel 3 (Alpha) is not allowed to be permuted to a different channel number.

There are two main reasons to do a channel reordering: better compression (the order matters for compression since the values of previously encoded channels are used in the MANIAC properties, see below), and better progressive previews (e.g. Green is perceptually more important than Red and Blue, so it makes sense to encode it first). Additionally, subtracting channel 0 from the other channels is a simple form of channel decorrelation; usually not as good as the YCoCg transformation though.

We denote the permutation used by PermutePlanes with p, where p(nc)=oc means that the new channel number nc corresponds to the old channel number oc.

Without subtraction, the forward transformation looks as follows:

With subtraction, the forward transformation looks as follows:

The reverse transformation can easily be derived from this: given input values (in_0, in_1, in_2, in_3), the output values are given by out_{p(c)} = in_c if there is no Subtract or c is 0 or 3, and by out_{p(c)} = in_c + in_0 if there is Subtract and c is 1 or 2.

To encode the parameters of this transformation, near-zero symbol coding is used, with a single context which we will call A.

The decoder has to check that p actually describes a permutation, i.e. it is a bijection (no two input channels map to the same output channel).
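The reverse PermutePlanes rule and the permutation check required of the decoder can be illustrated with a short Python sketch. The function names is_permutation and reverse_permute_planes are hypothetical, and representing p as a list where p[nc] = oc is an assumption made for this example.

```python
from typing import List, Sequence


def is_permutation(p: Sequence[int]) -> bool:
    # p is a bijection on 0..len(p)-1 exactly when it contains
    # each channel number once.
    return sorted(p) == list(range(len(p)))


def reverse_permute_planes(pixel: Sequence[int], p: Sequence[int], subtract: bool) -> List[int]:
    # pixel holds the decoded values (in_0, in_1, ...); p[nc] = oc maps
    # a new channel number to the old channel number.
    if len(p) != len(pixel) or not is_permutation(p):
        raise ValueError("p does not describe a permutation")
    out = [0] * len(pixel)
    for c, value in enumerate(pixel):
        # With Subtract, channels 1 and 2 had channel 0 subtracted from
        # them, so add it back; channels 0 and 3 are copied unchanged.
        if subtract and c in (1, 2):
            value += pixel[0]
        out[p[c]] = value
    return out


# Example: RGB stored as G, R-G, B-G, i.e. p = [1, 0, 2] with Subtract.
# The original pixel (R, G, B) = (10, 50, 30) is stored as (50, -40, -20).
restored = reverse_permute_planes([50, -40, -20], [1, 0, 2], subtract=True)
```

In this example restored is the original (R, G, B) triple again.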

Transformation 4: Bounds

Transformation 5: PaletteAlpha

Transformation 6: Palette

Transformation 7: ColorBuckets

The ColorBuckets transformation is an alternative to the Palette transformations; it is useful for sparse-color images, especially if the number of colors is relatively small but still too large for effective palette encoding.

Unlike the Palette transformations, ColorBuckets does not modify the actual pixel values. As a result, the reverse transformation is trivial: nothing has to be done. However, the transformation does change the crange and the snap functions. Reducing the range of valid pixel values (sometimes drastically) improves compression.

A 'Color Bucket' is a (possibly empty) set of pixel values. For channel 0, there is a single Color Bucket b_0. For channel 1, there is one Color Bucket for each pixel value in orig_range(0); we’ll denote these Color Buckets with b_1(v_0). For channel 2, there is one Color Bucket for each combination of values (v_0, Q(v_1)) where v_0 is in orig_range(0), v_1 is in orig_range(1), and the quantization function Q maps x to (x - orig_range(1).min) / 4. Finally, for channel 3, there is a single Color Bucket b_3.
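To make the indexing of the channel 2 buckets concrete, here is a small sketch of the quantization function Q. It assumes that the division by 4 is an integer division that rounds down; the function names and the example range -255..255 for channel 1 are illustrative only.

```python
from typing import Tuple


def quantize_v1(v1: int, range1_min: int) -> int:
    # Q(x) = (x - orig_range(1).min) / 4, assuming integer division.
    return (v1 - range1_min) // 4


def bucket2_key(v0: int, v1: int, range1_min: int) -> Tuple[int, int]:
    # Channel 2 buckets are indexed by the pair (v_0, Q(v_1)), so four
    # neighbouring channel 1 values share one bucket.
    return (v0, quantize_v1(v1, range1_min))


# Example with orig_range(1) = -255..255 (an assumed range).
first = quantize_v1(-255, -255)
last = quantize_v1(255, -255)
key = bucket2_key(20, -251, -255)
```

With that range, Q takes the values 0..127, so there are 128 channel 2 buckets per value of v_0.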

The new ranges are identical to the original ranges: new_range(c) = orig_range(c).

The new conditional ranges are given by the minimum and maximum of the corresponding Color Buckets:

new_crange(0) = min(b_0) .. max(b_0)

new_crange(1,v_0) = min(b_1(v_0)) .. max(b_1(v_0))

new_crange(2,v_0,v_1) = min(b_2(v_0,Q(v_1))) .. max(b_2(v_0,Q(v_1)))

new_crange(3) = min(b_3) .. max(b_3)

The new snap function returns the value in the corresponding Color Bucket that is closest to the input value; if there are two such values, it returns the lowest one. For example, if Color Bucket b_1(20) is the set {-1,3,4,6,8}, then new_snap(1,20,x) returns -1 for x=-100, 4 for x=5, 6 for x=6, and 8 for x=100.
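The snap rule for a single Color Bucket, including the tie-break towards the lowest value, can be sketched in a few lines. The names bucket_snap and bucket_crange are hypothetical, and looking up the right bucket for a given channel and set of previous values is left out.

```python
from typing import Set, Tuple


def bucket_snap(bucket: Set[int], x: int) -> int:
    # Pick the bucket value closest to x; on a tie the second key
    # component makes the lower value win.
    if not bucket:
        raise ValueError("cannot snap to an empty Color Bucket")
    return min(bucket, key=lambda v: (abs(v - x), v))


def bucket_crange(bucket: Set[int]) -> Tuple[int, int]:
    # The conditional range is the minimum and maximum of the bucket.
    return (min(bucket), max(bucket))


# The example bucket b_1(20) from the text.
b = {-1, 3, 4, 6, 8}
snapped = [bucket_snap(b, x) for x in (-100, 5, 6, 100)]
```

For x=5 both 4 and 6 are at distance 1, so the lower value 4 is returned, matching the example in the text.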

The ColorBuckets transformation is not allowed in the following circumstances:

Palette or PaletteAlpha is used, or in general, both channel 0 and 2 contain only zeroes: orig_range(0) = orig_range(2) = 0 .. 0

The image is a grayscale image

Channel 1 is trivial: orig_range(1) is a singleton

Channel 0, 1 or 2 requires more than 10 bits: orig_range(c).max - orig_range(c).min > 1023

To encode the Color Buckets, (generalized) near-zero symbol coding is used with 6 different contexts, which we will call A,B,C,D,E, and F. The encoding is quite complicated.

To decode, all Color Buckets are initialized to empty sets.

First, b_0 is encoded:

| Type | Description | Condition | Effect |
|---|---|---|---|
| nz_int_A(0,1) = 1 | Boolean: nonempty | | |
| gnz_int_B(orig_range(0).min, orig_range(0).max) | min | | |
| gnz_int_C(min, orig_range(0).max) | max | | if max - min < 2, then b_0 = { min, max } |
| nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_0 = min .. max |
| gnz_int_E(2, min(255, max - min)) | n = size of b_0 | discrete | b_0[0] = min, b_0[n-1] = max |
| gnz_int_F(b_0[i-1]+1, max + 1 + i - n) | b_0[i] | discrete, repeat: i from 1 to n-2 | |
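The decoding steps for b_0 can be followed with a small sketch. It does not implement near-zero symbol coding: read(context, lo, hi) is a hypothetical callback standing in for nz_int / gnz_int with the given context letter, and make_reader is a test stub that replays a prepared list of symbols.

```python
from typing import Callable, Iterable, List

Reader = Callable[[str, int, int], int]


def make_reader(symbols: Iterable[int]) -> Reader:
    # Stub decoder: returns prepared symbols and checks the bounds.
    it = iter(symbols)

    def read(context: str, lo: int, hi: int) -> int:
        value = next(it)
        assert lo <= value <= hi, f"symbol out of range in context {context}"
        return value

    return read


def decode_b0(read: Reader, range_min: int, range_max: int) -> List[int]:
    # Returns b_0 as a sorted list of pixel values.
    if read("A", 0, 1) != 1:
        raise ValueError("b_0 has to be nonempty")
    bmin = read("B", range_min, range_max)
    bmax = read("C", bmin, range_max)
    if bmax - bmin < 2:
        return sorted({bmin, bmax})           # one or two values
    if read("D", 0, 1) == 0:
        return list(range(bmin, bmax + 1))    # continuous: min .. max
    n = read("E", 2, min(255, bmax - bmin))   # discrete: n values
    bucket = [bmin]
    for i in range(1, n - 1):
        # The upper bound leaves room for the remaining values up to max.
        bucket.append(read("F", bucket[i - 1] + 1, bmax + 1 + i - n))
    bucket.append(bmax)
    return bucket


discrete_bucket = decode_b0(make_reader([1, 3, 10, 1, 4, 5, 7]), 0, 255)
continuous_bucket = decode_b0(make_reader([1, 3, 10, 0]), 0, 255)
```

In the first call the symbols are nonempty=1, min=3, max=10, discrete=1, n=4, followed by the two interior values 5 and 7.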

Next, for all values v_0 in b_0, Color Bucket b_1(v_0) is encoded:

| Type | Description | Condition | Effect |
|---|---|---|---|
| nz_int_A(0,1) | Boolean: nonempty | | if false, b_1(v_0) is the empty set |
| gnz_int_B(orig_crange(1,v_0).min, orig_crange(1,v_0).max) | min | nonempty | |
| gnz_int_C(min, orig_crange(1,v_0).max) | max | nonempty | if max - min < 2, then b_1(v_0) = { min, max } |
| nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_1(v_0) = min .. max |
| gnz_int_E(2, min(510, max - min)) | n = size of b_1(v_0) | discrete | b_1(v_0)[0] = min, b_1(v_0)[n-1] = max |
| gnz_int_F(b_1(v_0)[i-1]+1, max + 1 + i - n) | b_1(v_0)[i] | discrete, repeat: i from 1 to n-2 | |

Next, for all values v_0 in b_0, for all values qv_1 from 0 to (orig_range(1).max - orig_range(1).min) / 4, Color Bucket b_2(v_0,qv_1) is encoded if for some k in 0..3, it is the case that v_1 = qv_1 * 4 + orig_range(1).min + k is in the set b_1(v_0):

| Type | Description | Condition | Effect |
|---|---|---|---|
| nz_int_A(0,1) | Boolean: nonempty | | if false, remove v_1 from b_1(v_0) |
| gnz_int_B(min_{k=0..3}(orig_crange(2,v_0,v_1 + k).min), max_{k=0..3}(orig_crange(2,v_0,v_1 + k).max)) | min | nonempty | |
| gnz_int_C(min, max_{k=0..3}(orig_crange(2,v_0,v_1 + k).max)) | max | nonempty | if max - min < 2, then b_2(v_0,qv_1) = { min, max } |
| nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_2(v_0,qv_1) = min .. max |
| gnz_int_E(2, min(5, max - min)) | n = size of b_2(v_0,qv_1) | discrete | b_2(v_0,qv_1)[0] = min, b_2(v_0,qv_1)[n-1] = max |
| gnz_int_F(b_2(v_0,qv_1)[i-1]+1, max + 1 + i - n) | b_2(v_0,qv_1)[i] | discrete, repeat: i from 1 to n-2 | |

Finally, if there is an Alpha channel (i.e. nb_channels > 3), then b_3 is encoded in exactly the same way as b_0.

Transformation 8: reserved (unused)

Transformation 9: reserved (unused)