Up to 70% reduction in overall memory usage with 8-bit 4:2:0 and frame threading enabled.

This should cover the most impactful allocations.
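
As a rough illustration of where savings of this magnitude can come from, here is a minimal sketch, assuming a frame-sized buffer that was previously dimensioned for the worst case (16-bit samples, 4:4:4) and is now dimensioned for the actual format. The helper names `frame_buf_size` and `saving_percent` are hypothetical and are not taken from this patch; the real allocations are more varied, which is why the overall figure differs from this idealized one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not code from this patch: bytes needed for one
 * buffer holding a luma plane plus two chroma planes.
 * ss_hor / ss_ver are 1 if chroma is subsampled in that direction, else 0.
 * hbd is non-zero for high bitdepth (2 bytes per sample). */
static size_t frame_buf_size(size_t w, size_t h, int ss_hor, int ss_ver, int hbd)
{
    assert(ss_hor == 0 || ss_hor == 1);
    assert(ss_ver == 0 || ss_ver == 1);

    const size_t bytes_per_sample = hbd ? 2 : 1;
    /* Round chroma dimensions up so odd luma sizes are still covered. */
    const size_t cw = (w + ss_hor) >> ss_hor;
    const size_t ch = (h + ss_ver) >> ss_ver;

    return (w * h + 2 * cw * ch) * bytes_per_sample;
}

/* Integer percentage saved when going from `worst` bytes to `actual` bytes. */
static unsigned saving_percent(size_t worst, size_t actual)
{
    assert(worst > 0 && actual <= worst);
    return (unsigned)(100 - actual * 100 / worst);
}
```

For a 1920x1080 frame, the worst-case sizing needs 1920 * 1080 * 3 * 2 = 12441600 bytes, while 8-bit 4:2:0 needs 1920 * 1080 * 1.5 = 3110400 bytes, so a buffer sized this way shrinks by 75%. Buffers that do not scale with bitdepth or subsampling dilute that, which is consistent with a somewhat lower overall number.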

There are perhaps a few more subsampling-related improvements that could be looked into, and some palette code could be templated to use pixel instead of uint16_t, but I think we can leave those for some other time.

Testing (especially obscure corner cases like changing bitdepth/resolution/subsampling) welcome.