We have reached an exciting milestone: we now have a working high-quality universal encoder that supports both ASTC and BC7 for RGB/RGBA textures. It's currently a bit slow and doesn't support RDO yet, but it works. Quality is extremely high (BC7 grade, with no block artifacts), and the encoder's behavior is stable across a wide range of RGB/RGBA inputs, including XYZ normal maps.

Example results (PSNR, dB):

UASTC->BC7: 44.41

Original->Near-optimal BC1: 36.96 (stb_dxt STB_DXT_HIGHQUAL)

UASTC->Near-optimal BC1: 36.20

We're not storing these modes in the standard ASTC block format (although we could), because standard ASTC blocks contain many fields we don't need, and that space can be repurposed. Instead, we use a simple 128-bit/block, BC7-like format that holds the UASTC mode, endpoints, weights, partition index, and component rotation. In the worst case the packed UASTC data takes 112-113 bits, leaving about 15-16 bits for other uses.





We have an interesting plan for supporting ETC1/2 at high quality (far better than ETC1S) with fast transcoding: we can use the 15-16 bits left over in our custom block format to store ETC1/2 hints. These hints greatly accelerate real-time, high-quality ETC1/2 compression (by ~30x for ETC1 compared to a brute-force encoder). The UASTC compressor will re-encode the final UASTC block to ETC1/2 and then determine the set of ETC1/2 hints that results in the lowest ETC1/2 error.
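The actual contents of the ETC1/2 hints aren't described here, so the following is only a hypothetical sketch of the "search at encode time, reuse at transcode time" idea. Purely for illustration, it assumes a hint is the 3-bit ETC1 intensity-table index for one 8-pixel subblock, with the subblock's base color already chosen. The encoder tries all 8 standard ETC1 modifier tables and records the best one; the transcoder then only evaluates the hinted table. All function names are illustrative.

```python
# Hypothetical sketch: assumes a hint = ETC1 intensity-table index for one subblock.
# Standard ETC1 intensity modifier tables (small, large); each gives +-small, +-large.
ETC1_MODIFIERS = [(2, 8), (5, 17), (9, 29), (13, 42),
                  (18, 60), (24, 80), (33, 106), (47, 183)]


def clamp255(v):
    return max(0, min(255, v))


def encode_subblock(pixels, base, table):
    """Pick the best modifier per pixel for one table; return (selectors, squared error).

    Selectors are positions in (-large, -small, +small, +large), not ETC1's bit codes.
    """
    small, large = ETC1_MODIFIERS[table]
    modifiers = (-large, -small, small, large)
    selectors, total = [], 0
    for p in pixels:
        errs = [sum((clamp255(base[c] + m) - p[c]) ** 2 for c in range(3))
                for m in modifiers]
        best = min(range(4), key=errs.__getitem__)
        selectors.append(best)
        total += errs[best]
    return selectors, total


def find_table_hint(pixels, base):
    """Encoder side (slow): try all 8 tables and keep the lowest-error one as the hint."""
    return min(range(8), key=lambda t: encode_subblock(pixels, base, t)[1])


def transcode_subblock(pixels, base, hint):
    """Transcoder side (fast): only the hinted table is evaluated."""
    return encode_subblock(pixels, base, hint)[0]
```

For example, a subblock whose pixels sit at +-18 and +-60 around a base of (128, 128, 128) gets a hint of 4, and the transcoder then does one table's worth of work instead of eight.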





The next major step for us is to sit down and implement ETC1/2 to make sure this plan works well on a wide range of inputs.



As this is a universal GPU texture compression system, it will support ALL LDR GPU texture formats, just as Basis Universal does. Here's the plan for the other formats:





ETC2 R11 and RG11 might be able to reuse the ETC1/2 hints.





We have already prototyped BC1 and found a way to make it very fast for 1-subset UASTC blocks. For the other, relatively rare 2/3-subset UASTC cases we'll need to use PCA+least squares. Real-time BC3-5 encoding is fast.
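PCA followed by least squares is a standard way to fit a single BC1 color line to a block whose pixels came from several UASTC subsets. Below is a minimal sketch, assuming float RGB input; it ignores RGB565 endpoint quantization and BC1's 3-color mode, and `fit_bc1_line` and `BC1_CODE` are illustrative names, not Basis Universal APIs (nor the prototype's 1-subset fast path).

```python
import math

# BC1 selector codes 0,1,2,3 mean c0, c1, (2*c0+c1)/3, (c0+2*c1)/3.
# Here k = 0..3 is the position along the line, so BC1_CODE[k] is the stored code.
BC1_CODE = [0, 2, 3, 1]


def fit_bc1_line(pixels, iterations=8):
    """Fit one BC1 color line to RGB pixels: PCA for the axis, then least squares.

    Returns (e0, e1, ks): float endpoints clamped to [0, 255] and per-pixel
    positions k in 0..3 (weight k/3 toward e1).
    """
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels)
            for j in range(3)] for i in range(3)]

    # Seed power iteration with the covariance column of the highest-variance channel.
    seed = max(range(3), key=lambda i: cov[i][i])
    if cov[seed][seed] < 1e-9:
        return mean[:], mean[:], [0] * n  # solid-color block
    axis = [cov[r][seed] for r in range(3)]
    for _ in range(iterations):
        norm = math.sqrt(sum(x * x for x in axis))
        axis = [x / norm for x in axis]
        axis = [sum(cov[r][c] * axis[c] for c in range(3)) for r in range(3)]
    norm = math.sqrt(sum(x * x for x in axis))
    axis = [x / norm for x in axis]

    # Project onto the principal axis; the extremes give the initial endpoints.
    proj = [sum((p[c] - mean[c]) * axis[c] for c in range(3)) for p in pixels]
    tmin, tmax = min(proj), max(proj)
    span = tmax - tmin
    if span < 1e-9:
        return mean[:], mean[:], [0] * n
    e0 = [mean[c] + tmin * axis[c] for c in range(3)]
    e1 = [mean[c] + tmax * axis[c] for c in range(3)]
    ks = [min(3, max(0, round(3 * (t - tmin) / span))) for t in proj]

    # Least squares: with weights w = k/3 fixed, solve the 2x2 normal equations
    # [a b; b d] [e0; e1] = [r0; r1] independently for each channel.
    w = [k / 3 for k in ks]
    a = sum((1 - x) ** 2 for x in w)
    b = sum((1 - x) * x for x in w)
    d = sum(x * x for x in w)
    det = a * d - b * b
    if abs(det) > 1e-9:
        for c in range(3):
            r0 = sum((1 - x) * p[c] for x, p in zip(w, pixels))
            r1 = sum(x * p[c] for x, p in zip(w, pixels))
            e0[c] = (d * r0 - b * r1) / det
            e1[c] = (a * r1 - b * r0) / det

    def clamp(v):
        return min(255.0, max(0.0, v))

    return [clamp(v) for v in e0], [clamp(v) for v in e1], ks
```

A production encoder would quantize the endpoints to 565, reassign selectors, and iterate; this sketch stops after one least-squares pass.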





PVRTC1 and the other niche/obsolete formats (like PVRTC2, ATC, etc.) will use solutions already implemented in Basis Universal.





UASTC mode constraints/notes:

1. All blocks are always LDR and 4x4 pixels, and all UASTC modes use weight ranges with a whole number of bits per weight (no trits or quints), for compatibility with BC7.

2. UASTC only uses Color Endpoint Mode (CEM) 8 or 12 (RGB/RGBA Direct) to simplify the encoder and transcoder. The other CEMs don't help enough to justify the added complexity.



3. CEM 8 and 12 support Blue Contraction, which is never utilized in UASTC. Instead, we swap the subset's endpoints if the MSB of the last weight index is 1 (exactly like BC7). This guarantees the last weight index has an MSB of 0, so we don't need to store it in the packed block format.



The UASTC->ASTC transcoder needs to check the dequantized endpoints to see if blue contraction would kick in. If so, it'll need to invert the weight indices and swap the subset's endpoints.

4. The 2- and 3-subset modes are restricted to the 2/3-subset partition patterns that ASTC and BC7 have in common, which we've documented on our blog and on Twitter. That's a total of 60 patterns (30+11+19).

5. Mode 7 uses a 3-subset BC7 mode, but only a 2-subset ASTC mode. Two of the BC7 subset endpoints are set to equal colors to simplify the 3-subset partition pattern into a 2-subset pattern that's compatible with ASTC. This gives us 19 more useful partitions.

6. Opaque encodings get transcoded to BC7 modes 1,2,3,5,6. Alpha encodings transcode to BC7 modes 5,6,7. BC7 modes 0 and 4 are unused.



7. When the number of weight bits differs between the BC7 and ASTC encodings, we choose the closest BC7 weight (just a lookup into a small static 4- or 8-entry table). Note that BC7 and ASTC use the same 2-bit and 3-bit weight tables. Some ASTC 4-bit table entries differ by ±1 from BC7's, but the encoder can work around this.



8. BC7 and ASTC interpolate endpoints in a similar way, except that ASTC scales the endpoints up to 16 bits before interpolation and then uses only the top 8 bits of the result. This is a surprisingly minor difference, which a good encoder can work around by choosing, from the hundreds or thousands of possible UASTC configurations (partition patterns, endpoints, etc.), the one with the lowest overall BC7 error.



9. Strong encoders can compute both ASTC and transcoded BC7 error to choose UASTC encodings that result in minimal BC7 error. (This isn't necessary, it just helps a little.)



10. A driver could easily transcode UASTC texture data to ASTC or BC7 completely transparently to the user. The blocks are completely independent and the transcode step can be done 4-8 blocks at a time with SIMD operations.
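Note 3 describes two related endpoint-swap rules: the encoder swaps endpoints so one weight's MSB is always 0, and the UASTC->ASTC transcoder swaps them again if ASTC would otherwise apply blue contraction (for CEM 8/12, ASTC does this when the RGB sum of the second dequantized endpoint is smaller than that of the first). Both rely on swapping the endpoints and inverting the weights, which decodes to the same colors because the bit-only ASTC and BC7 weight tables are symmetric. A minimal sketch with hypothetical helper names; `anchor` defaults to the last weight, as stated in note 3.

```python
def normalize_anchor_msb(e0, e1, weights, weight_bits, anchor=-1):
    """Encoder side: make weights[anchor] have MSB 0 so that bit need not be stored.

    Swapping the endpoints and replacing each weight w with max_w - w decodes
    to the same colors because the weight tables are symmetric.
    """
    max_w = (1 << weight_bits) - 1
    if weights[anchor] >> (weight_bits - 1):
        return e1, e0, [max_w - w for w in weights]
    return e0, e1, list(weights)


def avoid_blue_contraction(e0, e1, weights, weight_levels):
    """Transcoder side (UASTC -> ASTC, CEM 8/12).

    e0/e1 are the dequantized 0..255 endpoints and `weights` are this subset's
    weight indices. ASTC applies blue contraction (and swaps the endpoints) when
    sum(e1.rgb) < sum(e0.rgb), so in that case store the endpoints swapped and
    invert the weights; the quantized endpoint values are swapped the same way.
    """
    if sum(e1[:3]) < sum(e0[:3]):
        return e1, e0, [weight_levels - 1 - w for w in weights]
    return e0, e1, list(weights)
```

For example, 2-bit weights `[3, 0, 1, 2]` end in `2` (binary 10), so `normalize_anchor_msb` swaps the endpoints and returns weights `[0, 3, 2, 1]`, whose last entry has an MSB of 0.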
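Note 7's weight-table claims can be checked directly. For bit-only ranges, ASTC unquantizes a weight by replicating its bits out to 6 bits and adding 1 if the result exceeds 32; BC7 uses fixed tables. The sketch below compares the two and builds the small "closest BC7 weight" tables needed when, for example, a 3-bit (8-entry) or 2-bit (4-entry) UASTC weight is sent to 4-bit BC7 mode 6. Function and table names are illustrative.

```python
# BC7 weight tables from the BC7 specification (2-, 3- and 4-bit indices).
BC7_WEIGHTS = {
    2: [0, 21, 43, 64],
    3: [0, 9, 18, 27, 37, 46, 55, 64],
    4: [0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64],
}


def astc_weight(index, bits):
    """Unquantize an ASTC weight from a bit-only range (no trits/quints):
    replicate the bits out to 6 bits, then add 1 if the result is above 32."""
    v, filled = 0, 0
    while filled < 6:
        v = (v << bits) | index
        filled += bits
    v >>= filled - 6
    return v + 1 if v > 32 else v


def astc_table(bits):
    return [astc_weight(i, bits) for i in range(1 << bits)]


def closest_bc7_index(weight, bc7_bits):
    """Index of the BC7 weight closest to an unquantized (0..64) ASTC weight."""
    table = BC7_WEIGHTS[bc7_bits]
    return min(range(len(table)), key=lambda i: abs(table[i] - weight))


# Translation tables for lower-precision UASTC weights sent to 4-bit BC7 weights.
ASTC3_TO_BC7_4 = [closest_bc7_index(w, 4) for w in astc_table(3)]
ASTC2_TO_BC7_4 = [closest_bc7_index(w, 4) for w in astc_table(2)]
```

Running this confirms that the 2-bit and 3-bit tables match exactly, that the 4-bit tables differ by at most 1, and that the translation tables come out as `[0, 2, 4, 6, 9, 11, 13, 15]` and `[0, 5, 10, 15]`.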
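To see why the interpolation difference in note 8 is minor, the sketch below models both decoders for the same 8-bit endpoints and 0..64 weight. BC7 rounds `((64 - w) * e0 + w * e1 + 32) >> 6`; the ASTC model follows the description in note 8, assuming linear (non-sRGB) LDR decoding where endpoints are expanded to 16 bits by byte replication and the top 8 bits of the result are kept. Function names are illustrative.

```python
def bc7_interp(e0, e1, w):
    """BC7: interpolate 8-bit endpoints with a 0..64 weight, rounding to nearest."""
    return ((64 - w) * e0 + w * e1 + 32) >> 6


def astc_interp(e0, e1, w):
    """ASTC LDR (assumed linear, not sRGB): expand endpoints to 16 bits by byte
    replication, interpolate, then keep the top 8 bits."""
    c0, c1 = (e0 << 8) | e0, (e1 << 8) | e1
    return ((c0 * (64 - w) + c1 * w + 32) >> 6) >> 8


def max_difference(weights):
    """Largest |BC7 - ASTC| over every pair of 8-bit endpoints and the given weights."""
    return max(abs(bc7_interp(e0, e1, w) - astc_interp(e0, e1, w))
               for e0 in range(256) for e1 in range(256) for w in weights)


BC7_4BIT_WEIGHTS = [0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64]
```

In this model BC7 rounds to nearest while ASTC effectively truncates with a small bias, so the two outputs never differ by more than 1; `max_difference(BC7_4BIT_WEIGHTS)` returns 1 (for example, endpoints 0 and 1 with weight 34).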

UASTC modes:

Format is:

UASTC Mode #, Dual Plane Flag, Texel Weights BISE Range Index (# quant levels), # Subsets, Endpoint BISE Range Index (# quant levels), BC7 Target Mode



Opaque (CEM 8):

0. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 19 (192) MODE6 RGB

1. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE3

2. DualPlane: 0, WeightRange: 5 (8), Subsets: 2, EndpointRange: 8 (16) MODE1

3. DualPlane: 0, WeightRange: 2 (4), Subsets: 3, EndpointRange: 7 (12) MODE2

4. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE3

5. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 20 (256) MODE6 RGB

6. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 18 (160) MODE5 RGB

7. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE2

Solid:

8. Void-Extent: Solid Color RGBA (MODE5 or MODE6)



Alpha (CEM 12):

9. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 8 (16) MODE7

10. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 13 (48) MODE6

11. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 13 (48) MODE5

12. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 19 (192) MODE6

13. DualPlane: 1, WeightRange: 0 (2), Subsets: 1, EndpointRange: 20 (256) MODE5

14. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE6
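The BISE range indices in the table determine how many bits each mode's endpoints and weights would take in standard ASTC: a range of 2^b, 3*2^b or 5*2^b levels costs b plain bits per value, plus trits packed 5 per 8 bits or quints packed 3 per 7 bits. As a rough sanity check on the bit budget, the sketch below counts that payload per mode. It assumes CEM 8 stores 6 endpoint values per subset, CEM 12 stores 8, and dual-plane modes store 32 weights. It ignores mode, partition, and rotation fields and the implicit MSB from note 3, and UASTC's own BC7-like packing may differ; all names are illustrative.

```python
def bise_split(levels):
    """Return (bits, kind) with levels == 2**bits, 3 * 2**bits or 5 * 2**bits."""
    for mult, kind in ((1, None), (3, "trit"), (5, "quint")):
        if levels % mult == 0:
            base = levels // mult
            if (base & (base - 1)) == 0:
                return base.bit_length() - 1, kind
    raise ValueError(f"{levels} is not a valid BISE range")


def bise_bits(levels, count):
    """Bits needed to BISE-encode `count` values with `levels` quantization levels."""
    bits, kind = bise_split(levels)
    total = count * bits
    if kind == "trit":
        total += (8 * count + 4) // 5   # ceil(8N/5): 5 trits share 8 bits
    elif kind == "quint":
        total += (7 * count + 2) // 3   # ceil(7N/3): 3 quints share 7 bits
    return total


# mode: (dual_plane, weight_levels, subsets, endpoint_levels, cem); void-extent mode 8 omitted.
UASTC_MODES = {
    0: (0, 16, 1, 192, 8), 1: (0, 4, 1, 256, 8), 2: (0, 8, 2, 16, 8),
    3: (0, 4, 3, 12, 8), 4: (0, 4, 2, 40, 8), 5: (0, 8, 1, 256, 8),
    6: (1, 4, 1, 160, 8), 7: (0, 4, 2, 40, 8),
    9: (0, 4, 2, 16, 12), 10: (0, 16, 1, 48, 12), 11: (1, 4, 1, 48, 12),
    12: (0, 8, 1, 192, 12), 13: (1, 2, 1, 256, 12), 14: (0, 4, 1, 256, 12),
}


def astc_payload_bits(mode):
    """BISE bits for one mode's endpoints plus weights (no header fields)."""
    dual, weight_levels, subsets, endpoint_levels, cem = UASTC_MODES[mode]
    endpoint_values = subsets * (6 if cem == 8 else 8)
    weight_count = 16 * (2 if dual else 1)
    return bise_bits(endpoint_levels, endpoint_values) + bise_bits(weight_levels, weight_count)
```

Under these assumptions mode 0 is the largest at 110 bits (46 endpoint bits plus 64 weight bits), and every other mode needs between 80 and 109.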

These modes are easily converted directly to a BC7 encoding with no pixel-wise recompression and with low quality loss (around 0.75 dB on average). To convert to BC7, you scale the endpoints, compute the optimal p-bits to represent the ASTC endpoints (if the target mode has any; this is simple), and then either clone the ASTC indices or translate them with a tiny table. Transcoding to BC7 is very simple and doesn't require the large precomputed tables that Basis Universal's ETC1S solution needs.
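The p-bit step is small for modes that target BC7 mode 6 (UASTC modes 0, 5, 10, 12 and 14, once their endpoints are dequantized to 8 bits). Mode 6 stores each RGBA endpoint as four 7-bit values plus one p-bit shared by those channels, decoded as `(c << 1) | p`. The sketch below tries both p-bit values and keeps the lower-error one; `mode6_endpoint` is a hypothetical helper, and other BC7 modes (such as mode 1, with 6-bit values and a p-bit shared per subset) need a slightly different expansion.

```python
def mode6_endpoint(rgba):
    """Choose the p-bit and 7-bit values so (c << 1) | p best matches an 8-bit endpoint.

    Returns (squared_error, p_bit, seven_bit_values); ties keep p = 0.
    """
    best = None
    for p in (0, 1):
        # Round (x - p) / 2 to the nearest integer and clamp to the 7-bit range.
        values = [max(0, min(127, (x - p + 1) // 2)) for x in rgba]
        err = sum((((c << 1) | p) - x) ** 2 for c, x in zip(values, rgba))
        if best is None or err < best[0]:
            best = (err, p, values)
    return best
```

Endpoints whose channels all share the same parity are represented exactly; mixed parity costs at most 1 per channel, for example `mode6_endpoint((10, 21, 30, 41))` has a squared error of 2 with either p-bit.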