Some time ago Joshua Tolley described how to reduce bloat from tables without locking (well, some locks are there, but very short, and not really intrusive).

Side note: Joshua: big thanks, great idea.

Based on his idea and some our research, i wrote a tool which does just this – reduces bloat in table.

This tool is available to download from OmniTI SVN repo.

How does it work?

First I'll create small table:

$ CREATE TABLE test_table AS SELECT i AS id , repeat ( 'value : ' || i , CAST ( random ( ) * 500 + 500 AS int4 ) ) AS textual , now ( ) - '10 years' :: INTERVAL * random ( ) AS when_tsz , random ( ) < 0.5 AS some_flag , random ( ) * 1000000 AS some_float FROM generate_series ( 1 , 1000000 ) i; SELECT 1000000

as you can see, it has 1M rows, and it's size is;

$ SELECT pg_relation_size ( 'test_table' ) , pg_total_relation_size ( 'test_table' ) ; pg_relation_size | pg_total_relation_size ------------------+------------------------ 207290368 | 207298560 ( 1 ROW )

since it was created by insert, it has zero bloat.

So, let's make it have 50% of table bloat:

$ DELETE FROM test_table WHERE 0 = id % 2 ; DELETE 500000

Now, for demonstration purposes, i will create 2 indexes:

$ CREATE INDEX whatever1 ON test_table ( when_tsz ) ; CREATE INDEX $ ALTER TABLE test_table ADD PRIMARY KEY ( id ) ; NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "test_table_pkey" FOR TABLE "test_table" ALTER TABLE

please note that these indexes do not have any bloat – they were created after I deleted rows from table.

Sizes of the indexes:

$ select pg_relation_size( ‘whatever1' ), pg_relation_size( ‘test_table_pkey' );

pg_relation_size | pg_relation_size

——————+——————

11255808 | 11255808

(1 row)

Now. I can run my tool.

It requires that I will tell it how many pages to try to free.

It's important, as each page freed means bloat of indexes, so it's safer (although slower) to start with some small numbers. Let's say 20.

So, I run the script:

2010 - 10 - 17 15 :08: 38 : ./ compact_table Settings: - CLEAN_PAGES : 20 - EXTENDED_DEBUG : 0 - FINAL_VACUUM : 1 - INITIAL_VACUUM : 1 - PGDATABASE : - PGHOST : - PGPORT : - PGUSER : - PSQL : psql - TABLE_NAME : test_table - TABLE_SCHEMA : public - VERBOSE : 1 2010 - 10 - 17 15 :08: 38 : At most , we can have 315 tuples per page . 2010 - 10 - 17 15 :08: 38 : Updates will be done ON COLUMN : textual 2010 - 10 - 17 15 :08: 38 : Entering main loop . 2010 - 10 - 17 15 :08: 38 : Initial vacuuming 2010 - 10 - 17 15 :08: 53 : CURRENT TABLE SIZE : 25304 pages . 2010 - 10 - 17 15 :08: 53 : Working ON page 25303 ( 1 OF 20 ) 2010 - 10 - 17 15 :08: 54 : Working ON page 25302 ( 2 OF 20 ) 2010 - 10 - 17 15 :08: 56 : Working ON page 25301 ( 3 OF 20 ) 2010 - 10 - 17 15 :08: 58 : Working ON page 25300 ( 4 OF 20 ) 2010 - 10 - 17 15 :08: 59 : Working ON page 25299 ( 5 OF 20 ) 2010 - 10 - 17 15 :09:01 : Working ON page 25298 ( 6 OF 20 ) 2010 - 10 - 17 15 :09:03 : Working ON page 25297 ( 7 OF 20 ) 2010 - 10 - 17 15 :09:05 : Working ON page 25296 ( 8 OF 20 ) 2010 - 10 - 17 15 :09:06 : Working ON page 25295 ( 9 OF 20 ) 2010 - 10 - 17 15 :09:08 : Working ON page 25294 ( 10 OF 20 ) 2010 - 10 - 17 15 :09: 10 : Working ON page 25293 ( 11 OF 20 ) 2010 - 10 - 17 15 :09: 12 : Working ON page 25292 ( 12 OF 20 ) 2010 - 10 - 17 15 :09: 14 : Working ON page 25291 ( 13 OF 20 ) 2010 - 10 - 17 15 :09: 16 : Working ON page 25290 ( 14 OF 20 ) 2010 - 10 - 17 15 :09: 18 : Working ON page 25289 ( 15 OF 20 ) 2010 - 10 - 17 15 :09: 20 : Working ON page 25288 ( 16 OF 20 ) 2010 - 10 - 17 15 :09: 22 : Working ON page 25287 ( 17 OF 20 ) 2010 - 10 - 17 15 :09: 23 : Working ON page 25286 ( 18 OF 20 ) 2010 - 10 - 17 15 :09: 25 : Working ON page 25285 ( 19 OF 20 ) 2010 - 10 - 17 15 :09: 27 : Working ON page 25284 ( 20 OF 20 ) 2010 - 10 - 17 15 :09: 29 : Final vacuuming 2010 - 10 - 17 15 :09: 31 : Final TABLE SIZE : 25304 pages . SELECT 'Size of index (public.whatever1) before reindex:' , pg_relation_size ( 'public.whatever1' ) ; CREATE INDEX CONCURRENTLY whatever1_new ON test_table USING btree ( when_tsz ) ; DROP INDEX public . whatever1; ALTER INDEX public . whatever1_new RENAME TO whatever1; SELECT 'Size of index (public.whatever1) after reindex:' , pg_relation_size ( 'public.whatever1' ) ; SELECT 'Size of index (public.test_table_pkey) before reindex:' , pg_relation_size ( 'public.test_table_pkey' ) ; CREATE UNIQUE INDEX CONCURRENTLY test_table_pkey_new ON test_table USING btree ( id ) ; DROP INDEX public . test_table_pkey; ALTER INDEX public . test_table_pkey_new RENAME TO test_table_pkey; SELECT 'Size of index (public.test_table_pkey) after reindex:' , pg_relation_size ( 'public.test_table_pkey' ) ; 2010 - 10 - 17 15 :09: 31 : ALL done .

That's a lot of output. Let's see what it showed.

First it showed settings to be used to connect to database, and how to cleanup.

Then, it calculated that with this PostgreSQL, there can be at most 315 tuples per page. This is important, because we need to use “=" operator for tid scans, and not (like in Joshua blog) “>=" operator, because this forces seq scan, which we don't want.

Afterwards it ran initial vacuum. It's imporant, as vacuum has to mark free places in table data before we can move records there. Usually you don't need it (that's why it's disabled by default), but in my case – I just created the table, so autovacuum didn't yet had the chance to scan it.

Afterwards, it calculated current table size (25304 pages), and it worked on last 20 pages.

After finishing last (of 20) pages, it ran final vacuum. This vacuum is much more important, as this is when the relation truncate actually happens, so it's on by default.

Post vacuum, script checks current size of table. As you can see, in this case – size of relation did not change. I'm not sure why, but I guess it has something to do with vacuums, because when ran next time, i got this output (irrelevant parts skipped):

=$ ./compact_table -v -t test_table -k 20 ... 2010-10-17 15:14:41 : Current table size: 25304 pages. 2010-10-17 15:14:41 : Working on page 25303 (1 of 20) ... 2010-10-17 15:15:10 : Working on page 25284 (20 of 20) 2010-10-17 15:15:12 : Final vacuuming 2010-10-17 15:15:12 : Final table size: 25284 pages.

Which shows that table size decreased. After this 2nd run, all future will truncate table without problems.

And finally, compact_table script outputs small SQL script (starting with SELECT ‘Size of index…) which can be used to reindex all indexes on the compacted table, if you think you've compacted it already enough.

There is small problem with indexes, though. If given index is base for constraint (primary key, or unique constraint) – it cannot be dropped. In such case, you should drop the constraint first, but afterwards – it will not be possible to re-add the constraint (there is work in progress for 9.1 to allow adding constraints based on existing indexes, but it's not done yet).

Generally, the best way to use the tool, is to run it many times with small -k (10? 20? 100? depends on how big is the table), and only after freeing significant space, do the reindexation.

You can also check index sizes before compacting, and watch them grow, and decide when to stop compacting, and reindex indexes.

Have fun, and hope you'll find it useful.