Date Mon 16 Nov 2015 Tags PostgreSQL / monitoring / debugging Lang ru

People having experience with commercial RDBMS are used to have the ability to answer the question “What a particular session is doing right now?” Or even “What was that session waiting 5 minutes ago?” For a long time PostgreSQL did not have such diagnostic tools and DBAs used to get out with different ways of sophistication. I gave a talk on pgday.ru (in Russian) about how we do it. This talk was collaborative with Ildus Kurbangaliev from PostgrePro. And Ildus was just speaking about tool that allows to answer questions above.

Strictly speaking it is not the first try to implement what people used to call wait [events] interface, but all previous attempts were not brought to some reasonable state and died as proof of concept patches. But pg_stat_wait is currently available as a set of patches to current stable 9.4 branch and currently developing 9.6 (actual versions should be looked at pgsql-hackers@).

After quite long testing and fixing bugs we even deployed them to production.

Installation

Before it all becomes part of core PostgreSQL you need to recompile postgres. I think description of rebuilding as ./configure && make && sudo make install is meaningless — much better to look into documentation.

After it you should add pg_stat_wait to shared_preload_libraries . Additionally, you can add following options to postgresql.conf :

waits_monitoring = on - enabling functionality on,

- enabling functionality on, pg_stat_wait.history = on - storing history of wait events,

- storing history of wait events, pg_stat_wait.history_size = 1000000 - number of last events to keep in history,

- number of last events to keep in history, pg_stat_wait.history_period = 1000 - how often should wait events be stored in history (ms).

After that you should restart PostgreSQL and make CREATE EXTENSION pg_stat_wait . After that everything will start working.

Capabilities

What exactly will start to work? First you may look at what is inside the extension:

rpopdb01g / postgres M # \dxS+ pg_stat_wait Objects in extension "pg_stat_wait" Object Description --------------------------------------------------------- function pg_is_in_trace ( integer ) function pg_start_trace ( integer , cstring ) function pg_stat_wait_get_current ( integer ) function pg_stat_wait_get_history () function pg_stat_wait_get_profile ( integer , boolean ) function pg_stat_wait_make_test_lwlock ( integer , integer ) function pg_stat_wait_reset_profile () function pg_stop_trace ( integer ) function pg_wait_class_list () function pg_wait_event_list () view pg_stat_wait_current view pg_stat_wait_history view pg_stat_wait_profile view pg_wait_class view pg_wait_event view pg_wait_events ( 16 rows ) rpopdb01g / postgres M #

Let’s see what wait events pg_stat_wait is able to monitor:

rpopdb01g / postgres M # SELECT version(); version --------------------------------------------------------------------------------------------------------------- PostgreSQL 9.4.5 on x86_64 - unknown - linux - gnu , compiled by gcc ( GCC ) 4.4.7 20120313 ( Red Hat 4.4.7 - 11 ), 64 - bit ( 1 row ) rpopdb01g / postgres M # SELECT class_name, count(event_name) FROM pg_wait_events GROUP BY 1 ORDER BY 2 DESC ; class_name | count ------------+------- LWLocks | 52 Storage | 9 Locks | 9 Network | 3 Latch | 1 CPU | 1 ( 6 rows ) rpopdb01g / postgres M #

You can see that waits monitoring for 9.4 knows about 52 LWLocks and for disk, for example, it can track next things:

rpopdb01g / postgres M # SELECT * FROM pg_wait_events WHERE class_id = 3; class_id | class_name | event_id | event_name ----------+------------+----------+------------ 3 | Storage | 0 | SMGR_READ 3 | Storage | 1 | SMGR_WRITE 3 | Storage | 2 | SMGR_FSYNC 3 | Storage | 3 | XLOG_READ 3 | Storage | 4 | XLOG_WRITE 3 | Storage | 5 | XLOG_FSYNC 3 | Storage | 6 | SLRU_READ 3 | Storage | 7 | SLRU_WRITE 3 | Storage | 8 | SLRU_FSYNC ( 9 rows ) rpopdb01g / postgres M #

Under “can track” the following is meant:

What and how long a particular process is waiting right now?

How many times a particular process hung in waiting of every event type and how much time did it spend waiting?

What was a particular process waiting some time ago?

For answering these questions there are pg_stat_wait_current , pg_stat_wait_profile , pg_stat_wait_history respectively. Best seen on the examples.

pg_stat_wait_current

rpopdb01g / postgres M # SELECT pid, class_name, event_name, wait_time FROM pg_stat_wait_current WHERE class_id NOT IN ( 4 , 5 ) ORDER BY wait_time DESC ; pid | class_name | event_name | wait_time -------+------------+---------------+----------- 23510 | LWLocks | BufferLWLocks | 17184 23537 | LWLocks | BufferLWLocks | 9367 23628 | LWLocks | BufferLWLocks | 9366 23502 | LWLocks | BufferLWLocks | 3215 23504 | LWLocks | BufferLWLocks | 2846 23533 | LWLocks | BufferLWLocks | 2788 23514 | LWLocks | BufferLWLocks | 2658 23517 | LWLocks | BufferLWLocks | 2658 23532 | LWLocks | BufferLWLocks | 2641 23527 | LWLocks | BufferLWLocks | 2507 23952 | Storage | SMGR_READ | 2502 23518 | Storage | XLOG_FSYNC | 1576 23524 | LWLocks | WALWriteLock | 1027 ( 13 rows ) rpopdb01g / postgres M #

We remove waits of classes ‘Network’ and ‘Latch’ because their waiting time is usually several orders of magnitude longer than waits of other classes. And listed above columns are not all columns that exist in the view:

smcdb01d / postgres M # SELECT * FROM pg_stat_wait_current WHERE class_id IN ( 2 , 3 ) LIMIT 2 ; - [ RECORD 1 ] ----------------------------- pid | 12107 sample_ts | 2015 - 11 - 16 10 : 36 : 59.598562 + 03 class_id | 2 class_name | Locks event_id | 4 event_name | Transaction wait_time | 24334 p1 | 5 p2 | 255593733 p3 | 0 p4 | 0 p5 | 0 - [ RECORD 2 ] ----------------------------- pid | 1266 sample_ts | 2015 - 11 - 16 10 : 36 : 59.598562 + 03 class_id | 3 class_name | Storage event_id | 0 event_name | SMGR_READ wait_time | 1710 p1 | 1663 p2 | 16400 p3 | 20508 p4 | 0 p5 | 220036 smcdb01d / postgres M #

Parameters p1 - p5 are text fields. For example, for heavy-weight locks they give approximately same information that you can see in pg_locks view and for disk I/O waits you can understand from which DB, relation and block we were waiting while reading.

pg_stat_wait_profile

For example, you can see how much time DB spent in each class of waits:

rpopdb01g / postgres M # SELECT class_name, sum(wait_time) AS wait_time, sum ( wait_count ) AS wait_count FROM pg_stat_wait_profile GROUP BY class_name ORDER BY wait_time DESC ; class_name | wait_time | wait_count ------------+--------------+------------ Network | 144196945815 | 11877848 Latch | 90164921148 | 3521073 LWLocks | 2648490737 | 10501900 Storage | 977430136 | 36444251 CPU | 68890774 | 365699457 Locks | 74 | 1 ( 6 rows ) rpopdb01g / postgres M #

Or which LWLocks are the hottest in the system:

rpopdb01g / postgres M # SELECT event_name, sum(wait_time) AS wait_time, sum ( wait_count ) AS wait_count FROM pg_stat_wait_profile WHERE class_id = 1 AND wait_time != 0 AND wait_count != 0 GROUP BY event_name ORDER BY wait_time DESC ; event_name | wait_time | wait_count ----------------------+------------+------------ LockMgrLWLocks | 1873294341 | 3870685 WALWriteLock | 1039279117 | 859101 BufferLWLocks | 299153931 | 7356555 BufFreelistLock | 7466923 | 75484 ProcArrayLock | 2321769 | 34355 CLogControlLock | 778148 | 21286 WALInsertLocks | 456224 | 7451 BufferMgrLocks | 107374 | 8447 XidGenLock | 84914 | 2506 UserDefinedLocks | 1875 | 7 CLogBufferLocks | 868 | 80 SInvalWriteLock | 11 | 3 CheckpointerCommLock | 1 | 1 ( 13 rows ) Time: 29.388 ms rpopdb01g / postgres M #

These two examples show that waiting time does not always correlate with wait events count. That’s why sampling without accounting waiting time can give not right the whole picture.

pg_stat_wait_history

This view allows to see what a particular process was waiting for in the past. Storage depth and sampling interval can be configured as shown above.

xivadb01e / postgres M # SELECT sample_ts, class_name, event_name, wait_time FROM pg_stat_wait_history WHERE pid = 29585 ORDER BY sample_ts DESC LIMIT 10 ; sample_ts | class_name | event_name | wait_time -------------------------------+------------+-----------------+----------- 2015 - 11 - 16 10 : 56 : 28.544052 + 03 | LWLocks | BufferMgrLocks | 983997 2015 - 11 - 16 10 : 56 : 27.542938 + 03 | LWLocks | CLogControlLock | 655975 2015 - 11 - 16 10 : 56 : 26.850302 + 03 | LWLocks | WALInsertLocks | 979516 2015 - 11 - 16 10 : 56 : 25.849207 + 03 | LWLocks | WALInsertLocks | 207418 2015 - 11 - 16 10 : 56 : 24.848059 + 03 | LWLocks | WALInsertLocks | 923916 2015 - 11 - 16 10 : 56 : 23.846909 + 03 | LWLocks | WALInsertLocks | 753185 2015 - 11 - 16 10 : 56 : 22.845808 + 03 | LWLocks | WALInsertLocks | 877707 2015 - 11 - 16 10 : 56 : 21.844718 + 03 | LWLocks | WALInsertLocks | 778897 2015 - 11 - 16 10 : 56 : 20.843562 + 03 | LWLocks | CLogControlLock | 991267 2015 - 11 - 16 10 : 56 : 19.842464 + 03 | LWLocks | CLogControlLock | 1001059 ( 10 rows ) xivadb01e / postgres M #

Session tracing

All described above views are designed to be always turned on, their performance overhead is minimal. But there are cases when sampling once in pg_stat_wait.history_period is not enough and you need to see all waits of a particular process. In that case you should use functions for tracing, for example:

rpopdb01g / postgres M # SELECT pg_backend_pid(); pg_backend_pid ---------------- 5399 ( 1 row ) rpopdb01g /postgres M # SELECT pg_start_trace(5399, '/ tmp / 5399 . trace ' ); INFO: 00000 : Trace was started to: /tmp/ 5399 . trace LOCATION: StartWait , wait . c:259 pg_start_trace ---------------- ( 1 row ) rpopdb01g / postgres M # SELECT pg_is_in_trace(5399); pg_is_in_trace ---------------- t ( 1 row ) -- some activity rpopdb01g / postgres M # SELECT pg_stop_trace(5399); INFO: 00000 : Trace was stopped LOCATION: StartWait , wait . c:265 pg_stop_trace --------------- ( 1 row ) rpopdb01g / postgres M #

A simple text file would be created where there would be two lines for each wait event, for example:

start 2015 - 11 - 16 11 : 17 : 26.831686 + 03 CPU MemAllocation 0 0 0 0 0 stop 2015 - 11 - 16 11 : 17 : 26.831695 + 03 CPU start 2015 - 11 - 16 11 : 17 : 26.831705 + 03 LWLocks BufferLWLocks 122 1 0 0 0 stop 2015 - 11 - 16 11 : 17 : 26.831715 + 03 LWLocks start 2015 - 11 - 16 11 : 17 : 26.831738 + 03 Network WRITE 0 0 0 0 0 stop 2015 - 11 - 16 11 : 17 : 26.831749 + 03 Network start 2015 - 11 - 16 11 : 17 : 26.831795 + 03 Network READ 0 0 0 0 0 stop 2015 - 11 - 16 11 : 17 : 26.831808 + 03 Network start 2015 - 11 - 16 11 : 17 : 26.831825 + 03 Storage SMGR_READ 1663 13003 12763 0 13 stop 2015 - 11 - 16 11 : 17 : 26.831844 + 03 Storage

Instead of conclusion

Wait interface is the long-awaited feature in PostgreSQL which allows significantly improve the understanding of what is happening inside the database. Right now this functionality is kicked into core PostgreSQL so that starting from 9.6 you would not need to recompile postgres.

Just in case, shortly before Ildus submitted his implementation on pgsql-hackers@ Robert Haas proposed the same idea and lots of people supported this idea. To become it true a couple of preparatory patches have already been commited, for example Refactoring of LWLock tranches.

I really hope that it will become part of PostgreSQL in 9.6.