This post is about how use least squares optimization to slightly adjust the texel values in a texture in order to minimize seams across edges in UV-space.

In other words, to turn this:



into this:



using least-squares optimization.

What are we trying to fix here?

The problem we have is that most meshes can’t be perfectly unfolded into a 2D image plane for texturing. To adress this, we cut the mesh up into flat-ish regions called “charts” and pack them into a texture. This means some edges must be split up - in 3D these edges are one and the same, but in 2D they belong to different charts. This in turn means that when the GPU filters the texture near such a “seam” edge, the filtered values on either side (which lie in two different charts) may have very different values. This is what produces the harsh filtering artefacts seen in the first image above. Here’s an example of how the charts in the above mesh are laid out in 2D space.



The fix is to figure out what values to put in the “gutter region” (i.e. the texels just outside chart edges), in order to achieve smooth filtering across the seam.

What are we not trying to fix

It’s worth taking a moment here to point out that we are assuming the texture data on either side of a seam edge roughly match up. If you want to fix up seams where the two sides are wildly different, you’re going to need some kind of texture synthesis. This is not what we’re doing. We’re merely taking textures that are supposed to match up perfectly along seams, but don’t due to parameterization issues, and tweaking them to fix things up.

The basic idea

The idea here is that we build up a massive equation system where all of the texels that are within a bilinear neighborhood of these seam edges are unknown variables that we’re trying to find. Then we generate a ton of sample locations along each seam, and create an equation for each setting the bilinear sample on one side of the seam to equal the bilinear sample of the other side.

This is an over-determined system and there isn’t an exact solution (in most cases), but we can use linear least squares to find a solution that minimizes the sum of the square errors for each of these equations.

This isn’t a new idea. I know plenty of studios who are doing this for their textures (for example naughty dog), but there doesn’t seem to be any sample code for this anywhere on the internet. Hopefully this will help more people use this technique.

Prepping the data

The first thing we’re going to need to do is load up the texture and mesh data. The sample code uses a very cheesy OBJ parser (which is unlikely to work for your meshes - it does the bare minimum to load a sample mesh I found online) and stb libraries for loading images. You may want to plug in your own asset loading routines here.

Next, we’ll need to figure out which texels in the image are actually inside the triangles, and which are in the “gutter region”. To do this, we rasterize all the triangels in the mesh into UV space, and store the result into a “coverage buffer”, which just sets values to 1 if they are inside the triangle, and leaves them at 0 if not. We use a simple method to do this, where we loop over a bounding box for the triangle, and check each texel coordinate if it’s inside the UV coordinates for the triangle:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Vec2 UVToScreen ( Vec2 in , int W , int H ) { in . v = 1.0f - in . v ; in . u *= W ; in . v *= H ; return in - Vec2 { 0.5f , 0.5f }; } bool isInside ( int x , int y , Vec2 ea , Vec2 eb ) { // Check if (x,y) is "inside" the edge (ea,eb) return ( float ) x * ( eb . v - ea . v ) - ( float ) y * ( eb . u - ea . u ) - ea . u * eb . v + ea . v * eb . u >= 0 ; } void RasterizeFace ( Vec2 uv0 , Vec2 uv1 , Vec2 uv2 , array2d < uint8_t >& coverageBuf ) { uv0 = UVToScreen ( uv0 , coverageBuf . width , coverageBuf . height ); uv1 = UVToScreen ( uv1 , coverageBuf . width , coverageBuf . height ); uv2 = UVToScreen ( uv2 , coverageBuf . width , coverageBuf . height ); // Axis aligned bounds for the triangle int minx = ( int ) min3 ( uv0 . u , uv1 . u , uv2 . u ); int maxx = ( int ) max3 ( uv0 . u , uv1 . u , uv2 . u ) + 1 ; int miny = ( int ) min3 ( uv0 . v , uv1 . v , uv2 . v ); int maxy = ( int ) max3 ( uv0 . v , uv1 . v , uv2 . v ) + 1 ; // The three edges we will test Vec2 e0a = uv0 , e0b = uv1 ; Vec2 e1a = uv1 , e1b = uv2 ; Vec2 e2a = uv2 , e2b = uv0 ; // Now just loop over a screen aligned bounding box around the triangle, // and test each pixel against all three edges for ( int y = miny ; y <= maxy ; ++ y ) { for ( int x = minx ; x <= maxx ; ++ x ) { if ( isInside ( x , y , e0a , e0b ) & isInside ( x , y , e1a , e1b ) & isInside ( x , y , e2a , e2b )) { int xout = WrapCoordinate ( x , coverageBuf . width ); int yout = WrapCoordinate ( y , coverageBuf . height ); coverageBuf ( xout , yout ) = 1 ; } } } }

Once we’ve done this for all faces in the mesh, we have a quick way of knowing whether a pixel is inside or outside the charts (indeed, this is how I generated the B&W coverage mask above).

The next thing we need to figure out is which of our edges are actual seams (i.e. two triangle edges that have the same 3D geometric positions, but different UVs).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 void FindSeamEdges ( const Mesh & mesh , std :: vector < SeamEdge >& seamEdges , int W , int H ) { using namespace std ; numSamplePoints = 0 ; struct tupleHasher // Hash pairs of vectors { hash < float > h ; size_t operator ()( const tuple < Vec3 , Vec3 > x ) const { // TODO: implement better hash function :) Vec3 a = std :: get < 0 > ( x ), b = std :: get < 1 > ( x ); return h ( a . x ) + 1733 * h ( a . y ) + 43 * h ( a . z ) + 3257 * h ( b . x ) + 1091 * h ( b . y ) + 557 * h ( b . z ); } }; unordered_map < tuple < Vec3 , Vec3 > , tuple < Vec2 , Vec2 > , tupleHasher > edgeMap ; for ( const auto & f : mesh . faces ) { // Make a list of edges (with UVs), to help with looping over them. tuple < int , int , int , int > edges [] = { make_tuple ( f . v0 , f . v1 , f . uv0 , f . uv1 ), make_tuple ( f . v1 , f . v2 , f . uv1 , f . uv2 ), make_tuple ( f . v2 , f . v0 , f . uv2 , f . uv0 ) }; for ( const auto & e : edges ) { // Grab the actual verts and UVs. The mesh may not have been // "welded" properly, so we have to check the actual coordinates. // and not just the indices. Vec3 v0 = mesh . verts [ get < 0 > ( e )], v1 = mesh . verts [ get < 1 > ( e )]; Vec2 uv0 = mesh . uvs [ get < 2 > ( e )], uv1 = mesh . uvs [ get < 3 > ( e )]; // Try to find the "opposite" edge (note: reversing the order of // verts here) auto otherEdge = edgeMap . find ( make_tuple ( v1 , v0 )); if ( otherEdge == edgeMap . end ()) { // First time we're seeing this edge, so add it to the map edgeMap [ make_tuple ( v0 , v1 )] = make_tuple ( uv0 , uv1 ); } else { // This edge has already been added once, so we have enough information // to see if it's a normal edge, or a "seam edge". Vec2 otheruv0 = get < 0 > ( otherEdge -> second ); Vec2 otheruv1 = get < 1 > ( otherEdge -> second ); if ( otheruv0 != uv1 || otheruv1 != uv0 ) { // UVs don't match, so we have a seam SeamEdge s = SeamEdge { HalfEdge { uv0 , uv1 }, HalfEdge { otheruv1 , otheruv0 } }; seamEdges . push_back ( s ); } // No longer need this edge, remove it to keep storage low edgeMap . erase ( otherEdge ); } } } }

This code is relatively straightforward. First we do the song and dance STL requires of us in order to use a hash table for our edges (which we represent as pairs of vectors). Then we simply find edges that have identical 3D vertex positions but differing UV coordinates, and add them to our seamEdges vector.

So, now we know where are seams are, and what pixels inside our triangles. The next thing we need is to figure out what texels we need to optimize. These are the texels that lie “on” the seam edges (in truth, the texels that are within the bilinear neighborhood of the seam edges). To do that we loop through our seams, generate the sample points we will later use to optimize our pixels, and the first time we end up using a particular texel we store it in a vector. The pixels that are in this vector are the ones we will end up tweaking in the rest of the program and later writing back out to the texture. We also build a 2D map that will make it easy to go from a 2D coordinate to the particular “pixel index” in this vector.

Finally, while we’re looping through these pixels anyway we initialize the color data for the pixels we’re trying to optimize. For covered pixels, we just convert to YCoCg in order to reduce the correlation between channels (this lets us optimize the color channels independently). For uncovered pixels we compute an average of the covered pixels in its neighborhood. Having an initial guess at the color value for our gutter pixels will help later, when we’re going to use an iterative solver to refine these guesses.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 struct PixelInfo { int x , y ; bool isCovered ; Vec3 colorYCoCg ; }; void ComputePixelInfo ( const std :: vector < SeamEdge >& seamEdges , const array2d < uint8_t >& coverageBuf , const array2d < RGB >& image , std :: vector < PixelInfo >& pixelInfo , array2d < int >& pixelToPixelInfoMap ) { // Find the pixels we will optimize for. Use a 2D map so that we can find a // unique set of pixels that we need to optimize for, and a quick way to find // the index of it given an (x,y) position. int W = coverageBuf . width ; int H = coverageBuf . height ; for ( const auto & s : seamEdges ) { // TODO: this is overkill.. Could do conservative line rasterization instead of brute // force sampling 3x per pixel and take the union of the 2x2 sampling neighborhoods. int numSamples = s . numSamples ( W , H ); for ( const auto & e : s . edges ) { Vec2 e0 = UVToScreen ( e . a , W , H ); Vec2 e1 = UVToScreen ( e . b , W , H ); Vec2 dt = ( e1 - e0 ) / ( float )( numSamples - 1 ); Vec2 samplePoint = e0 ; for ( int i = 0 ; i < numSamples ; ++ i , samplePoint += dt ) { // Go through the four bilinear sample taps int xs [] = { ( int ) samplePoint . u , xs [ 0 ] + 1 , xs [ 0 ] + 1 , xs [ 0 ] }; int ys [] = { ( int ) samplePoint . v , ys [ 0 ] , ys [ 0 ] + 1 , ys [ 0 ] + 1 }; for ( int tap = 0 ; tap < 4 ; ++ tap ) { int x = WrapCoordinate ( xs [ tap ], W ); int y = WrapCoordinate ( ys [ tap ], H ); // If we haven't already seen this pixel, make sure we take this // pixel into account when optimizing if ( pixelToPixelInfoMap ( x , y ) == - 1 ) { bool isCovered = !! coverageBuf ( x , y ); Vec3 colorYCoCg ; if ( isCovered ) { colorYCoCg = RGBToYCoCg ( image ( x , y )); } else { // Do dilation... colorYCoCg = RGBToYCoCg ( DilatePixel ( x , y , image , coverageBuf )); } pixelInfo . push_back ( PixelInfo { x , y , isCovered , colorYCoCg }); pixelToPixelInfoMap ( x , y ) = ( int ) pixelInfo . size () - 1 ; } } } } } }

Alright, now we have all the information we need to work with, and can finally get down to the meat of the problem.

Optimizing pixel values

To figure out the final pixel values, we build up a least squares problem of the form:

$$ Ax = b $$

Where $$ x $$ is a vector containing all the pixel values we’re trying to find (for one channel at a time). The $$ A $$ matrix will contain one row per sample generated above with columns set to the sample weights of the two bilinearly filtered samples on either side of the seam (the weights for one side of the seam will be negated), and the RHS $$ b $$ set to 0. This means that we’re trying to find an $$ x $$ such that the bilinear samples on either side of a seam are identical.

Of course, so far there’s a trivial soluiton - set $$ x=0 $$! To make sure we get a more useful solution, we also add in rows that tries to keep the pixel values equal to their initial values. Note: we do also add terms even for gutter pixels to try to keep them close to their initial guess (weighted by a small value). This might seem strange since we only really care about the pixels the artist drew, not the stuff outside. The reason we do this is that if we don’t, the gutter pixel values may end up being pushed to extreme values (e.g. very large values, or even negative), which could overlflow fixed point color formats. By adding some terms to keep the solution close to “reasonable” values, we typically don’t have to worry about any crazy values that can’t fit within e.g. 8 bits. This also helps block compression.

Now, solving this equation exactly isn’t possible (in general). We have too many equations for our unknowns, and they are potentially contradictory. What we do instead is use linear least squares to find a good compromise value (one which minimizes the square error for each term). There are many ways of solving linear least squares. For example, you could do SVD or QR decomposition, or you could solve the normal equations (by inverting the Gramian matrix or use gaussian elimination). These are all pretty complicated to implement would most likely be extremely slow for the huge matrix we have. There are two reasons to expect that an iterative algorithm will work much better for us:

We can use domain knowledge to produce a pretty good guess for our pixel values already. This allows us to start our iterative solver near the solution.

We only really need our solution to look okay, not be the strictly most optimal solution. So we can bail out of the algorithm pretty early.

So we’ll use an iterative method instead. The simplest iterative solver is gradient descent - however we won’t use that because it tends to be pretty darn slow and it’s hard to tune the learning rate. What we will instead do is compute the normal matrix (see Wikipedia entry linked previously) which gives us a symmetric matrix which can then be solved using The Conjugate Gradient method. I won’t dwell on the details, you can see the wikipedia entry to learn more, but basically we multiply both sides of the equation above by $$ A^t $$ to get this:

$$ A^tAx = A^tb $$

Then we iteratively improve our estimate of $$ x $$ to minimize residual of this equation. A key benefit of doing things this way is that we never actually have to form $$ A $$! We can directly construct $$ A^tA $$ and $$ A^tb $$ with far less storage required. Once we have that, we can implement the Conjugate Gradient method as described in the Wikipedia article above:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 void ConjugateGradientOptimize ( VectorX & out , SparseMat & A , const VectorX & guess , const VectorX & b , int numIterations , float tolerance ) { size_t n = guess . size ; VectorX p ( n ), r ( n ), Ap ( n ), tmp ( n ); out = guess ; VectorX & x = out ; // r = b - A*x; SparseMat :: Mul ( tmp , A , x ); VectorX :: Sub ( r , b , tmp ); p = r ; float rsq = VectorX :: Dot ( r , r ); for ( int i = 0 ; i < numIterations ; ++ i ) { SparseMat :: Mul ( Ap , A , p ); float alpha = rsq / VectorX :: Dot ( p , Ap ); VectorX :: MulAdd ( x , p , alpha , x ); // x = x + alpha*p VectorX :: MulAdd ( r , Ap , - alpha , r ); // r = r - alpha*Ap float rsqnew = VectorX :: Dot ( r , r ); if ( fabs ( rsqnew - rsq ) < tolerance * n ) break ; float beta = rsqnew / rsq ; VectorX :: MulAdd ( p , p , beta , r ); // p = r + beta*p rsq = rsqnew ; } }

That wasn’t so bad! It’s basically a 1:1 port of the pseudo code in the wikipedia article, and in return we can efficiently optimize huge least squares problems without having to link in a bunch of complicated linear algebra library (or worse, implement complicated linear algebra ourselves). We just pass in $$ A^tA $$ as as the matrix, and $$ A^tb $$ as the RHS and out comes our solution $$ x $$.

Now, I’ve glossed over something here. In order for the above code to work we’d need some support classes for matrices and vectors. In this case, we only need a really simple dense vector ( VectorX ), and a relatively simple sparse matrix ( SparseMat ). I won’t list the vector implementation, since I think it’s pretty straightforward. For the matrix, a key insight is that this matrix will be mostly full of zeros. It would be wasteful to store all those zeros, and even more wasteful to perform a bunch of multiplications against zeroes. Thus, we store the matrix data in a sparse matrix.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 struct SparseMat { SparseMat ( int numRows , int numCols ) : rows ( new Row [ numRows ]), numRows ( numRows ), numCols ( numCols ) {} float & operator ()( size_t row , size_t column ) { assert ( row < numRows ); assert ( row >= 0 ); assert ( column < numCols ); assert ( column >= 0 ); return rows [( int ) row ][( int ) column ]; } static void Mul ( VectorX & out , const SparseMat & A , const VectorX & x ) { assert ( out . size == A . numRows ); assert ( x . size == A . numCols ); concurrency :: parallel_for < size_t > ( 0 , A . numRows , [ & ]( auto r ) { out [ r ] = Dot ( x , A . rows [ r ]); }, concurrency :: simple_partitioner ( 100 )); } private : struct Row { template < typename T > struct AlignedDeleter { void operator ()( T * ptr ) { _aligned_free ( ptr ); } }; size_t n = 0 ; int capacity = 0 ; std :: unique_ptr < float [], AlignedDeleter < float >> coeffs ; std :: unique_ptr < int [], AlignedDeleter < int >> indices ; float & operator []( int column ) { // Find the element size_t index = findClosestIndex ( column ); if ( n == 0 || indices [ index ] != column ) // Add new element { if ( n == capacity ) { grow (); } // Put the new element in the right place, and shift // existing elements down by one. float prevCoeff = 0 ; int prevIndex = column ; ++ n ; for ( size_t i = index ; i < n ; ++ i ) { float tmpCoeff = coeffs [ i ]; int tmpIndex = indices [ i ]; coeffs [ i ] = prevCoeff ; indices [ i ] = prevIndex ; prevCoeff = tmpCoeff ; prevIndex = tmpIndex ; } } return coeffs [ index ]; } void grow () { capacity = capacity == 0 ? 16 : capacity + capacity / 2 ; float * newCoeffs = ( float * ) _aligned_malloc ( sizeof ( float ) * capacity , 32 ); int * newIndices = ( int * ) _aligned_malloc ( sizeof ( int ) * capacity , 32 ); // Copy existing data over memcpy ( newCoeffs , coeffs . get (), n * sizeof ( float )); memcpy ( newIndices , indices . get (), n * sizeof ( int )); coeffs . reset ( newCoeffs ); indices . reset ( newIndices ); } size_t findClosestIndex ( int columnIndex ) { for ( int i = 0 ; i < n ; ++ i ) { if ( indices [ i ] >= columnIndex ) return i ; } return n ; } }; std :: unique_ptr < Row [] > rows ; int numRows , numCols ; static float Dot ( const VectorX & x , const Row & row ) { float sum = 0.0f ; for ( size_t i = 0 ; i < row . n ; ++ i ) { sum += x [ row . indices [ i ]] * row . coeffs [ i ]; } return sum ; } };

There’s some additional complexity here that strictly speaking I could remove. I’ve allocated the memory using aligned allocations because I had some hope of being able to turn the sparse dot product into an AVX optimized kernel. Unfortunately that didn’t work out so far (it was slower).

Anyway, the way this works is that we store a dense array of Row s (in this case, all rows have at least one element, so no need to be sparse here). Each Row is in turn sparsely stored as a list of indices and coefficients. These are stored in separate arrays for efficiency, but conceptually it’s a single array of (index, coefficient) pairs. When we try to lookup a row element, we simply do a linear search in the index array for the closest index. If the index was found, we just return a reference to the corresponding coefficient, and if not we shuffle all the elements down to make room and insert a fresh 0 first. That’s really all you need to know! The matrix multiplication is pretty straightforward (it’s no accident that we stored the matrix row-major!), and the VectorX is implemented exactly as you’d think it would be.

Now all that’s left is to map the solution back to the image and save it out and we’re done. Performance varies by texture size, how many seams you have, etc. but it shouldn’t take more than a few seconds even for large textures.

Conclusion

There you have it! An algorithm you can use to improve seam filtering for basically any texture, without needing to include any expensive or complicated libraries in your project. As a side effect, you also get a linear least squares solver that you could use for other things.

There’s really three main things I’d like people to take away from this:

Least squares optimization is a pretty damn useful tool. It’s really not that hard to implement. Even without big math libraries like Eigen, Ceres, or Alglib you could get a pretty good solution with just a little bit of code on your end. So if you don’t need fancy stuff, you should seriously consider the costs of pulling in an external library. And as a bonus, the vehicle I chose for making the above two points is a pretty damn handy trick that you probably should be doing if you ever have texture seams to deal with.

For the rough and dirty code to this, click here. I haven’t bothered uploading the whole Visual Studio solution or anything, but it should be relatively straightforward to figure it out (it’s all in one CPP file, and you just need some stb libs).