I have BQ table that is partitioned by insert time. I'm trying to remove duplicates from the table. These are true duplicates: for 2 duplicate rows, all columns are equal - of course having a unique key might have been helpful :-(

At first I tried a SELECT query to enumerate duplicates and remove them:

SELECT * EXCEPT(row_number) FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY id_column) row_number FROM `mytable`) WHERE row_number = 1

This results in unique rows but creates a new table that doesn't include the partition data - so not good.

I've seen this answer here which states the only way to retain partitions is to go over them one-by-one with the above query and save to a specific target table partition.

What I'd really want to do is use a DML DELETE to remove the duplicate rows in place. I tried something similar to what this answer suggested:

DELETE FROM `mytable` AS d WHERE (SELECT ROW_NUMBER() OVER (PARTITION BY id_column) FROM `mytable ` AS d2 WHERE d.id = d2.id) > 1;

But the accepted answer doesn't work and results in a BQ error:

Error: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN

Would be great if anyone could offer a simpler (DML or otherwise) way to deal with this so I won't be required to loop over all partitions individually.