How does Teradata handle Skew in Joins?

The skewed workload is one of the two most important problems we have to solve in Performance Tuning.

Teradata skewed tables lead to skewed workload

This is especially important for join steps, as large amounts of data may be copied between the AMPs.

The optimizer has techniques to execute joins separately for skewed primary index values and non-skewed primary index values. It then copies the skewed values to all AMPs and redistributes the unskewed values.

This technique is called Partial Redistribution and Partial Duplication or PRPD for short. You can read more about it here:

This article shows you how the optimizer adapts the join strategy (i.e. join method and join geography) at skew if suitable statistics are available. Our example join is a merge join.

In our example we use two tables, both have the column “PK” as the primary index.

However, the left table is joined using column “col6”, which is not the Primary Index.

Therefore, at some point in the join plan, the rowhash for column “col6” must be calculated for this table. This is a prerequisite for being able to perform a merge join.

We have collected statistics on the join columns to provide the optimizer with information about skew. Skewed columns are displayed in the statistics histograms as “biased values” and are taken into account when creating the join plan.

The left table contains 6 million rows, the right table 500,000 rows.

We execute the following query:

SELECT * FROM TEST_DB.left_table main INNER JOIN TEST_DB.TABLE2 right_table ON main.COL6 = right_table.PK WHERE right_table.COL4 = 'A';

First, we filled the tables with data in such a way that both tables are evenly distributed over all AMPs.



Here is the execution plan that the optimizer has created. I marked the keywords for you:

4) We do an all-AMPs RETRIEVE step from TEST_DB.main by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by the hash code of (TEST_DB.main.COL6) to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 6,708,141 rows (7,184,419,011 bytes). The estimated time for this step is 1 minute and 46 seconds. 5) We do an all-AMPs JOIN step from TEST_DB.right_table by way of a RowHash match scan with a condition of ("TEST_DB.right_table.COL4 = 'A '"), which is joined to Spool 2 (Last Use) by way of a RowHash match scan. TEST_DB.right_table and Spool 2 are joined using a merge join, with a join condition of ("COL6 = TEST_DB.right_table.PK"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 6,708,141 rows (28,422,393,417 bytes). The estimated time for this step is 31.20 seconds.

The optimizer distributes the left table (6 million rows) after the rowhash of the column “col6” to all AMPs.

Then a merge join is executed between the two tables. As you can see, the small table is directly joined, no spool file is required.

Deliberately Skewing the Data

To demonstrate how the optimizer adapts to skewed data, in a second step we changed the values of the column “col6” of the right table so that 30% of the rows are on only one AMP.

Of course, after skewed the data of the right table, we updated the statistics on the join columns.

As expected the optimizer does a good job: It duplicates the right table to avoid a sketched join. Here is the new execution plan:

4) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from TEST_DB.right_table by way of an all-rows scan with a condition of ( "TEST_DB.right_table.COL4 = 'A '") into Spool 2 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool Two by the hash code of (TEST_DB.right_table.PK). The size of Spool 2 is estimated with high confidence to be 3,600,000 rows (3,855,600,000 bytes). The estimated time for this step is 34.56 seconds. 2) We do an all-AMPs RETRIEVE step from TEST_DB.main by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 3 by the hash code of ( TEST_DB.main.COL6). The size of Spool 3 is estimated with high confidence to be 6,708,141 rows (7,184,419,011 bytes). The estimated time for this step is 45.73 seconds. 5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan. Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("COL6 = PK"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 3,354,070,500 rows (14,211,196,708,500 bytes). The estimated time for this step is 3 hours and 26 minutes.

In the statistic histograms, the most common values are kept separately as “biased values” and are used by the optimizer for join planning.

The decision of the optimizer can also be to join the table separately for skewed and non-skewed values, and finally to combine the overall result.

Whatever join strategy the optimizer uses:

Without collected statistics, there is a high risk that the wrong strategy was chosen.

Therefore you should never forget to collect statistics, especially on the skewed columns. Here random AMP sampling or using sample statistics is definitely not enough.