

Counting Rows Accurate or Approximate Row Counts 01-Oct-2006 It is often the case that you want to know how many rows are in a table. If you find yourself in this situation, you should also ask how accurate that count should be. If you are doing accounting, you want it to be exactly accurate. If you are decorating a web page with counts, perhaps it would be OK to be off by a few. COUNT(*) The traditional method of counting rows in a table is to do a select count(*) from the table. COUNT(*) is notoriously slow, especially on large tables, because it checks each row. webstat=# select count(*) from rawlog; count --------- 2058704 (1 row) Time: 7202.873 ms You should avoid count(*) queries if possible but it is the definitive answer for accuracy. Reltuples One alternative for count(*) is to select the estimated number of tuples from the pg_class table. This value is updated with each vacuum of the table. If your count of the number of tuples can be off by the number of tuples you add or delete between vacuums, this is the best choice. Don't use this method for accounting purposes, though. The more often vacuum passes over your table the more accurate the number is. This number is stored in pg_class.reltuples. To see all of the row count estimates for all the tables in the public schema of your database use the following: SELECT relname, reltuples FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid) WHERE relkind = 'r' AND n.nspname = 'public'; The kind of relation is a table relation ('r') and the namespace name is 'public'. Obviously, to just see one named table, use: SELECT reltuples FROM pg_class r WHERE relkind = 'r' AND relname = 'mytable'; where mytable is the name of the table in question. Simple Count Triggers If you must have an accurate count and using count(*) is prohibitive, then you may want to consider spreading the cost of maintaining the count through triggers. This technique involves creating an INSERT TRIGGER that increases the count and a DELETE TRIGGER which decrements the count. The count can be stored in a separate table. Create a table called, for instance, row_counts. row_counts would consist of the table name (relname) and the row count for that table. First you would create table, then the triggers and then initialize the row_counts table. CREATE TABLE row_counts ( relname text PRIMARY KEY, reltuples numeric); Being properly lazy, I've written one trigger function which handles both the insert and delete cases on the table. This is easy to do using the TG_OP constant which says what the operation is and the TG_RELNAME constant which contains the table name. These are trigger specific contants. And thank goodness, once more, for dollar quoting. CREATE OR REPLACE FUNCTION count_trig() RETURNS TRIGGER AS $$ DECLARE BEGIN IF TG_OP = 'INSERT' THEN EXECUTE 'UPDATE row_counts set reltuples=reltuples +1 where relname = ''' || TG_RELNAME || ''''; RETURN NEW; ELSIF TG_OP = 'DELETE' THEN EXECUTE 'UPDATE row_counts set reltuples=reltuples -1 where relname = ''' || TG_RELNAME || ''''; RETURN OLD; END IF; END; $$ LANGUAGE 'plpgsql'; Still being lazy I wrote a function to add the triggers to all of my tables in the public schema. You may or may not want to do this. If you do, I suggest writing a corresponding drop trigger function (which is left as an exercise for the reader :). CREATE OR REPLACE FUNCTION add_count_trigs() RETURNS void AS $$ DECLARE rec RECORD; q text; BEGIN FOR rec IN SELECT relname FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid) WHERE relkind = 'r' AND n.nspname = 'public' LOOP q := 'CREATE TRIGGER ' || rec.relname || '_count BEFORE INSERT OR DELETE ON ' ; q := q || rec.relname || ' FOR EACH ROW EXECUTE PROCEDURE count_trig()'; EXECUTE q; END LOOP; RETURN; END; $$ LANGUAGE 'plpgsql'; Initialize the row_counts table immediately after a vacuum using: insert into row_counts select relname, reltuples from pg_class; This creates a small window of error, however. Any transactions completed between the vacuum and the creation of the row_counts table and triggers would be missed. To be absolutely sure of the counts, you must also stop activity on the server. Vacuum the tables explicitly if you can. I found a case where the count was off by one on a quiescent database after a normal vacuum. It corrected itself after an explicit "vacuum mytable". Even though you should run vacuum for this case explicitly on each table, if for any reason you are not sure of when your vacuum has run, you can write a function to do the count(*) initialization instead of using vacuum's counts to start off with. This function is slower and can still be off by some if your database is active during this time. CREATE OR REPLACE FUNCTION init_row_counts() RETURNS void AS $$ DECLARE rec RECORD; crec RECORD; BEGIN FOR rec IN SELECT relname FROM pg_class r JOIN pg_namespace n ON (relnamespace = n.oid) WHERE relkind = 'r' AND n.nspname = 'public' LOOP FOR crec IN EXECUTE 'SELECT count(*) as rows from '|| rec.relname LOOP -- nothing here, move along END LOOP; INSERT INTO row_counts values (rec.relname, crec.rows) ; END LOOP; RETURN; END; $$ LANGUAGE 'plpgsql'; This function selects the table names from the public schema from the pg_class table. For each table it finds, it executes a count(*) and stores the result in the row_counts table. The FOR crec IN EXECUTE...LOOP is required in order to get the result from an EXECUTE statement. This is the common workaround for that lack of functionality. Putting it all together, this is the order of events: Create row counts table

Create trigger function

Stop server activity if possible

Vacuum tables -- cannot be done in a transaction

In a transaction Add triggers Initialize row counts

From then on, you should be able to see your current row counts by selecting the table name at any time from the row_counts table. Contributors: elein at varlena.com