This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.

Online status:

Unread Postgres emails:

Email graphs: incoming, outgoing, unread, commits (details)

Monday, December 31, 2012

I mentioned previously that I have been working on improving pg_upgrade performance for Postgres 9.3. A few days ago I committed a new patch to allow multiple cpus to dump/restore databases in parallel. Since dump/restore of database schema takes the majority of execution time when using --link mode, this provides a significant speedup:

-- seconds -- tables git w/patch 1000 22.29 18.30 2000 30.75 19.67 4000 46.33 22.31 8000 81.09 29.27 16000 145.43 40.12 32000 309.39 64.85 64000 754.62 108.76

While users with a single large database will see no benefit from using the new --jobs parameter, those with many databases will see significant improvement. This is an example of the type of parallelism I am hoping we can add to Postgres in the coming years.

Friday, December 28, 2012

After my previous blog post about nulls, you might have decided to avoid null completely by never assigning them to your database columns. While this will reduce the number of nulls in your database, it will not eliminate them because various sql commands generate nulls as well. With the new realization that null are unavoidable and you are going to have to understand them, let's look at how nulls get generated. The first method is explicit null generation:

SELECT NULL; ?column? ---------- \pset null (null) SELECT NULL; ?column? ---------- (null)

(I will use \pset null in all future queries so nulls generate a visible value.) Above I am explicitly asking for a null value. This value can be assigned into a database field:

CREATE TABLE nulltest (x INTEGER, y INTEGER); INSERT INTO nulltest VALUES (1, NULL); SELECT * FROM nulltest; x | y ---+-------- 1 | (null)

For our first example of implicit null assignment, the query below does not specify a value for the y column with has no default value, so an implicit null is assigned:

INSERT INTO nulltest (x) VALUES (2); SELECT * FROM nulltest; x | y ---+-------- 1 | (null) 2 | (null)

Of course, there are many other ways nulls can be generated — outer joins can generate nulls for columns of non-joined tables. Aggregates and window functions can also generate null values, which might be part of an insert into … select which gets inserted into database columns.

All storage of nulls is easily prevented by using a not null specification when creating each column:

CREATE TABLE nulltest2 (x INTEGER NOT NULL, y INTEGER NOT NULL); INSERT INTO nulltest2 VALUES (3, NULL); ERROR: null value in column "y" violates not-null constraint DETAIL: Failing row contains (3, null). INSERT INTO nulltest2 (x) VALUES (4); ERROR: null value in column "y" violates not-null constraint DETAIL: Failing row contains (4, null).

So, even if you never type "null", you can get nulls into your database. The use of not null when creating columns is recommended, especially for numeric columns that should contain only non-null values. Ideally you could have not null be the default for all columns and you would specify null for columns that can contain nulls, but that is not supported. If a column is specified as the primary key, the column is also internally not null.

Wednesday, December 26, 2012

Sql nulls were designed to simplify modeling the real world in relational databases, but the debate about their usefulness has never been resolved. Specifically, nulls were designed to represent:

unknown values inapplicable values empty placeholders

For example, a character field employee.spouse could be null to represent the spouse's name as unknown (#1) or the employee as unmarried (no spouse) (#2). Alternatively a query could have generated a spouse column with no value (#3), e.g., via an outer join. Without nulls, fields needing such values have to resort to special values like zero-length strings or "n/a" for strings, "0" or "-99" for numerics, or "1901-01-1" for dates. Obviously, as complex as null handling is, using special values with little documentation and wide variability between applications is even worse.

Using nulls in a database is like using flat-nose pliers to do repairs. Flat-nose pliers are notorious for chewing up nuts and mangling everything it touches. If your first tool of choice is flat-nose pliers, something is wrong. However, there are a small percentage of jobs where flat-nose pliers are ideal, so the trick is knowing when flat-nose pliers make sense, and when they don't. Nulls are similar — as I stated above, there are valuable uses for nulls, but if their use is not controlled, they can cause havoc in relational systems, and these "surprises" are often the basis of people trying to avoid nulls completely.

In summary, here is some text from Joe Celko's SQL for Smarties: Advanced SQL Programming:

In their book A Guide to Sybase and SQL Server, David McGoveran and C. J. Date said: "It is this writer's opinion than NULLs, at least as currently defined and implemented in SQL, are far more trouble than they are worth and should be avoided; they display very strange and inconsistent behavior and can be a rich source of error and confusion. (Please note that these comments and criticisms apply to any system that supports SQL-style NULLs, not just to SQL Server specifically.)" … In the rest of this book, I will be urging you not to use them, which may seem contradictory, but it is not. Think of a NULL as a drug; use it properly and it works for you, but abuse it and it can ruin everything. Your best policy is to avoid NULLs when you can and use them properly when you have to.

In future blog entries, I will cover the places nulls often appear, and many of the surprises to avoid.

Tuesday, December 18, 2012

The B-52 Stratofortress was designed in the years after World War II to serve as a heavy bomber for the Cold War and took its maiden flight in 1952. The Cold War came and went, but the B-52 is still in active military service — 94 of the 102 H-model B-52 aircraft built in the 1960's remain on duty.

This year marks the sixtieth year of B-52 operation — that is a laudable goal for any piece of technology, but even more so considering the demanding reliability and performance requirements for military hardware. The plan is for the aircraft to remain in service through 2044. This 2002 article has some relevant quotes:

In an era when a 9-month-old laptop already feels retro, when people who keep their cars for six years are considered quaint, the fact that the most powerful military in the world relies on a fleet of 40-year-old bombers [now 50] is pretty astonishing.

And the B-52 is not some creaky relic that the military keeps around for air shows. The planes have seen more combat in this decade than in the previous three decades combined.

If the B-52 does remain in service for 80 years, it will be like using a weapon from the Civil War to win World War II.

"Right now, there are three generations of pilots who have flown that plane -- grandfather, father, and son -- in the same family. If it lasts until 2040 [now 2044], five generations will have flown the same plane."

So, how does the B-52 relate to Postgres? Ignoring the B-52's destructive purpose, there are some similarities:

Both were developed by the U.S. Defense Department (Postgres was initially funded by darpa

Both were developed decades ago and have remained in use far longer than expected

Both are easily enhanced with new technology, which has increased their longevity

Both operate in demanding environments where reliability and performance are critical

Both are inexpensive, compared to the alternatives

Both are covered on Slashdot

The B-52, being a machine, eventually will wear out and need to be replaced. Postgres, being software, never wears out — it could go on forever, but right now, let's just plan for plan for 2044, and we can reevaluate then.

Update: electronics upgrade 2014-05-27

Update: B-52 longevity 2015-12-18

Friday, December 14, 2012

Having just reported a rare but serious bug in pg_upgrade ( ), I wanted to address the meticulous effort the Postgres community takes in developing software. While most end-users just see the software releases and the release notes, there is obviously a huge amount of effort that goes into to creating those releases. Postgres backend developers are acutely aware of that effort, as every patch they submit receives detailed scrutiny to detect flaws or potential improvements.

Databases, like operating systems, are held to a high standard for reliability because failure often causes downtime for organizations. Also, databases lock user data into their software beyond many operating system use cases. Again, another reason that database server work requires serious focus.

However, the focus on reliability can't be borne alone by the database server developers. Users must also provide a reliable platform to run database software. Postgres can't maintain high reliability if it is dependent on an unreliable platform. Here are some of the things users can do to provide a reliable platform for Postgres:

Choose an operating system whose focus is on reliability, rather than desktop user experience or supporting the latest hardware or software (discussion thread)

Choose hardware designed for 24-hour operation in a demanding environment, not desktop hardware

Use hardware that reports errors, like ecc smart

Make sure your storage system performs reliable writes

Have on-site and off-site backups for cases when disasters happen

Educate administrative staff so mistakes don't cause downtime

The Postgres team does its best to provide a highly-reliable database. We hate to hear about database problems caused by running Postgres on unreliable platforms. Though we are not perfect, we do our best to be as reliable as possible — please do your part to see that the platform you use is also reliable.

With many companies committing to Postgres in the past year, Postgres has earned a reputation for high reliability. With our open source development model and the development team growing, Postgres has the potential to set a new standard for database reliability, and with everyone's help, we can make that happen.

Friday, December 14, 2012

Postgres, since version 8.2, has supported the ability to create indexes without blocking writes (insert, update, or delete) on the table being indexed. This is done by adding the keyword concurrently to the create index command. (Reads are never blocked by create index.)

Implementing this feature was very complex, and it has been recently discovered that all versions of pg_upgrade have a bug related to processing such indexes. The problem revolves around the rare case when create index concurrently fails. When this happens, an invalid index remains in the system. The invalid index is incomplete and is not used by queries. The pg_upgrade problem is that pg_dump dumps it as a valid index. While a normal restore of pg_dump output would create a valid index, pg_upgrade moves the old index file into place without recreating it — hence, an invalid index file is upgraded as a valid index. In this case, the index could return invalid results or crash the server. The steps necessary to trigger this bug are:

Use of create index concurrently

Failure of create index concurrently

Not dropping or recreating the index after failure

Use of pg_upgrade without the patch applied on December 11, 2012. The patch uses this query to throw an error if invalid indexes are found before the upgrade begins ( pg_upgrade --check also performs this check):

SELECT n.nspname, c.relname FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n, pg_catalog.pg_index i WHERE (i.indisvalid = false OR i.indisready = false) AND i.indexrelid = c.oid AND c.relnamespace = n.oid AND n.nspname != 'pg_catalog' AND n.nspname != 'information_schema' AND n.nspname != 'pg_toast'

This bug remained unreported for so long because concurrent index creation is not common, and leaving invalid indexes in place is also rare.

The next set of minor Postgres releases will throw an error if invalid indexes exist in the old cluster — until then, users can run the above query manually in all databases to check for invalid indexes before using pg_upgrade. For those who have already used pg_upgrade, you can check if invalid indexes were improperly upgraded by running the query in the old cluster. Unfortunately, if the old cluster is not accessible, there is no known way to check for the existence of such indexes in the new cluster — reindex can be used to recreate any suspect indexes.

Friday, December 7, 2012

I already covered how to set Postgres configuration variables at various levels, and the power of reloading the postgresql.conf configuration file.

While setting configuration variables at the SQL level is very easy to program, modifying the postgresql.conf file programatically can be more complex. (The postgresql.conf file exists in the cluster data directory.) Some administrators use sed or perl to modify the file directly, but a simpler solution is to use the postgresql.conf include directive:

include 'filename'

By adding this to the bottom of the postgresql.conf file, any settings in filename replaces values set earlier, such as in postgresql.conf.

Once this is set up, programs can write into the include file to set any variables they wish. After the write, run pg_ctl restart for variables requiring a restart (labeled with "change requires restart" in postgresql.conf), or pg_ctl reload for other variables. Truncating the include file removes its settings.

The Postgres development team is discussing adding an SQL interface for changing postgresql.conf values, and it will use a similar mechanism.

As an example of why you might want to programatically control postgresql.conf, you might want modify autovacuum's settings to increase its activity during quiet periods, and revert settings during normal load.

Thursday, December 6, 2012

Having returned to blogging after a two-month break, I wanted to point out two new things on my blog site. First, months ago, I started categorizing my blog posts — the categories appear as small images next to the blog titles.

This has allowed me to automatically create a category index of all my posts. (The index is accessible by clicking on View Index at the top of my blog detail page, and then choosing View Category Index.) For example, the page collects all my tips together for easy review. (See "Tip" at the bottom of the page.) I wanted to mention this because those tips have been collected from discussions in many of my training classes, and might be helpful for new users. My blog posts about conferences, pg_upgrade, and performance are similarly organized.

Second, my old commenting service shut down, so I have migrated to a new blog commenting service. I was able to import all 362 comments into the new service, so nothing was lost.

Wednesday, December 5, 2012

I began working on pg_upgrade (previously called pg_migrator) in 2009 after my employer EnterpriseDB changed the license of their binary migration tool to BSD and instructed me to work on improving it. My initial goal was to produce a reliable and popular method for binary upgrades (avoiding data dump/restore). That goal was achieved in 2011, and since then I have been focusing on usability and performance enhancements.

One repeated request has been for better performance, especially for databases with many tables. You might wonder, "Isn't pg_upgrade already faster than dumping/reloading the database?" Yes, it is, but that doesn't mean it can't be made even faster.

After receiving scattered reports of link-mode upgrades taking 45 minutes or more, I dug in and made some major performance improvements in Postgres 9.2. However, that wasn't sufficient, and I got slowness reports from users using pg_upgrade 9.2. Therefore, I have been back on pg_upgrade performance duty this past month, and I got good results!

#tables ---------- Magnetic Disk ---------- -- SSD -- 9.0 9.1 9.2 9.3 9.3 1 19.87 18.46 14.66 13.91 11.13 1000 187.05 175.72 100.87 25.63 20.38 2000 377.48 340.35 190.76 76.78 27.05 4000 923.04 752.78 370.06 164.39 47.40 8000 2293.15 1670.37 1088.70 226.19 74.25

Pg_Upgrade performance for Postgres 9.0 and 9.1 are similar because there was little focus on performance during those releases. Pg_Upgrade 9.2 shows a doubling in performance, and 9.3 will have another 2-4x improvement. Notice that SSDs still provide a major performance boost.

There are still a few more pg_upgrade performance improvements that might make it into the final 9.3 release.

Tuesday, December 4, 2012

I attended two interesting conferences in October. The first was the Highload ++ conference in Moscow. The two-day conference had one full day of Postgres talks, thanks to dedicated work by Nikolay Samokhvalov. The conference has grown significantly in recent years, and I am hopeful Postgres will continue to play a significant role.

The second was the PostgreSQL Conference in Prague, which had record attendance of 291. The conference had many highlights for me — it opened with a keynote by Joe Celko.

Jonathan S. Katz's presentation, Marketing PostgreSQL (slides), inspired me because he accurately portrayed the methodical, results-driven approach organizers used to grow the New York PostgreSQL User Group. Rather than do what was easy, they analyzed what efforts would lead to significant user group growth, and pursued them consistently.

Antonin Houska's talk, pg_xnode - Alternative implementation of XML (slides), showed a method for storing XML data in binary format for easy indexing. This interested me because it addressed a general problem of how to efficiently store structured data types. Right now XML and JSON data types are stored as text strings, and this limits performance for some use cases.

The talk, Migrating Oracle queries to PostgreSQL (slides), got my attention because, while Oracle migration is very popular, I rarely see presentations on the topic. The presentation Full-text search in PostgreSQL in milliseconds (slides) promises dramatic full-text performance improvements.

Of course, my wife attending both conferences with me was the greatest highlight.

Tuesday, October 2, 2012

During my attendance at the surge conference last week, some people asked me about the new JSON support in Postgres 9.2. As I started to explain it, I realized that the description was quite complicated, and I agreed to post a blog entry about it — so here it is. (Linux Weekly News also has an article that summarizes the 9.2 JSON features.)

The JSON features added in Postgres 9.2 were not major:

Create a JSON data type that stores JSON as a text string, and checks that the string is valid JSON.

Add two support functions that allow arrays and row values to be converted to JSON (examples).

Seems pretty simple, and not a great leap forward. However, as with many things in Postgres, it is the coupling of these features with Postgres extensions and externally-developed features that really makes a must-have feature-set.

First, there is the problem that the JSON value is stored as text, limiting the ability to quickly find matching values. (This is the same way Postgres stores XML.) This was done to avoid the overhead of converting JSON to/from an indexed storage format. There are several solutions to this problem:

Postgres's full text search capability allows JSON tokens to be indexed and quickly retrieved. There is no record of which tokens are keys and which are values, so the output must still be filtered using Javascript, but it certainly allows fast indexing.

Postgres's hstore module allows storage of key/value pairs, so non-nested JSON structures can be converted to hstore pairs.

Added to that, there is the stored procedure language PLV8 (summary), which allows Javascript programs to be run inside the database. This Stackoverflow post goes into details about indexing possibilities, particularly expression indexes based on PLV8 function calls. You can actually write stored procedure functions in any language that understands JSON; this example is in PL/Perl.

As you can see, a validated JSON data type allows tools to then know the value is JSON, and operate on it in powerful ways.

Wednesday, September 19, 2012

Last week I gave a presentation about Postgres 9.2 features. Unfortunately, my presentation lacked details about the new range types feature, mostly because I have been unable to focus on the feature to fully understand it. People who have studied it have sung its praises, and that was good enough for me. This is similar to the guarded excitement about integrated full text search and writable common table expressions (CTE), whose impact wasn't fully understood at release time.

Fortunately, yesterday Jonathan Katz gave a great presentation explaining the new range type features; the features include:

A new data type that stores the start and stop values of a range in a single column

Operators that allow simple comparison of ranges, including overlaps, contains, and intersection; this avoids complex range comparison logic in the application

Index support, which greatly improves range query performance

With these features, Jonathan was able to clearly show the usefulness of range types for an event planning application. Jeff Davis, the author of range types, also gave a tutorial showing people how to use range types for the temporal recording of company asset usage. Both presentations were recorded so hopefully the videos will be online soon.

While range types are mostly considered for temporal data, even measurement data can benefit. Most measurements are not exact, e.g., 72°F degrees is not really 72°F, but rather 72°F plus-or-minus some amount, depending on the accuracy of the measurement device. While many applications handle this by querying for a range of values, e.g., 71.5° to 72.5°, assuming 0.5° precision, this does not work well if the column contains values of varying precision. With range types, these measurement ranges are easily stored in the database and queried.

Speaking of full text search, Dan Scott gave a great introductory talk about how to use full text search. Again, some Postgres features are so complex that watching someone explain them is often the most efficient way of getting started with new features.

Wednesday, September 12, 2012

Last night, Josh Berkus did a great job of presenting the Postgres 9.2 features to the SFPUG (slides, source). The 75-minute video is now online. I also have a video of 9.2 features (registration required) that is more focussed on the 9.2 release notes.

I am planning to use Josh's slides at tonight's inaugural Philadelphia Postgres Users Group meeting.

Tuesday, September 4, 2012

In 1997, as part of learning the Postgres backend source code, I created an image of the backend source code flow and added hot-links to the image so users could click on a section and get detailed information, including a link to the actual source code. This was on the Postgres website for many years, but because of website changes and the switch to git, it was removed.

After recent requests, I have redone the flowchart and moved it out of our source tree; it now exists on the main Postgres website, linked to from the Developers/Coding section. The detail text has been moved to a Postgres wiki page, to allow for easier community improvements.

So, if you were ever curious about how Postgres works, head on over to that web page and start clicking.

Wednesday, August 29, 2012

The Postgres conference offerings in the coming months are looking bright:

Update: My apologies for omitting Paris, which I have just added. 2012-09-02

Wednesday, August 22, 2012

I previously explained the ability to set Postgres configuration variables at different levels. In this blog entry, I would like to explain how changes at the top level, postgresql.conf, propagate to running sessions.

The postgresql.conf file is usually stored at the top of the pgdata directory, though it can be relocated. The most simplistic way to modify the file is to open it with a text editor. (Tools like pgAdmin allow file modifications via a gui.)

Once the file has been modified, you must signal that the configuration file should be reloaded and your modifications applied. There are three methods to signal this:

send a sighup signal the postmaster process, or sighup only individual backends

signal the postmaster process, or only individual backends run pg_ctl reload from the command-line

call the sql function pg_reload_conf()

Several settings cannot be changed in this way — they are flagged with the comment "change requires restart". A warning message will appear in the server log if you attempt to change one of these parameters:

LOG: received SIGHUP, reloading configuration files LOG: parameter "shared_buffers" cannot be changed without restarting the server LOG: configuration file "/u/pgsql/data/postgresql.conf" contains errors; unaffected changes were applied

For settings that can be changed via reload, it is obvious that new sessions will get updated values, but what about existing sessions? Existing sessions get those changes too, as soon as they complete their transactions:

BEGIN WORK; SHOW work_mem; work_mem ---------- 1MB -- work_mem is changed in postgresql.conf, and pg_ctl reload is run COMMIT; SHOW work_mem; work_mem ---------- 2MB

As you can see, with no client action, work_mem has changed from 1MB to 2MB. This allows you to change things like log_statement and have the changes take effect without waiting for all sessions to restart. While this propagation to running sessions works for postgresql.conf changes, it does not work for alter role/database-level changes — they require a client reconnection to take effect.

I hope this blog post explains how to make effective configuration changes to Postgres.

Tuesday, August 21, 2012

My conference schedule has solidified and I am presenting at events in Philadelphia, Chicago, Moscow, and Prague during the next two months.

Monday, August 13, 2012

All Postgres servers support tcp/ip connections, including localhost connections that allow clients to connect to servers on the same machine. Unix-like operating systems also support local or Unix-domain socket connections. These connections do not use the tcp/ip stack but rather a more efficient stack for local connections. (I previously showed that Unix-domain socket communication is measurably faster.)

Unix-domain socket connections require a socket file in the local file system. These are not normal files but more like entry points to listening servers. By default, Postgres places these socket files in the /tmp directory:

srwxrwxrwx 1 postgres postgres 0 Jul 30 20:27 .s.PGSQL.5432=

Notice the file permissions section starts with an 's', and the file size is zero — these are always true for socket files. Clients can connect to this socket file and then communicate with the Postgres server listening on port 5432. (5432 is the unique number used to identify the server on the local machine, and is the same for tcp/ip and Unix-domain socket connections.)

The default directory for the Unix-domain socket is /tmp, though there has always been concern about this socket location. Postgres does not require root privileges, so by default it must locate the socket file in a directory writable by normal users, and /tmp is the logical directory to use. Some installers that have elevated permissions place the socket file in secure directory, like /var/run — this requires modification of the unix_socket_directory setting, and the socket directory must be specified by clients making Unix-domain socket connections. Packagers who change the default socket directory modify the server and client defaults so no special configuration is required by users.

The Unix-domain socket location is one of those settings that the community would like to make more secure, but can't without requiring elevated permissions for installations. It is only possible for packagers or end-users, who potentially have access to elevated permissions, to change this default.

Wednesday, August 8, 2012

Hard-coding database connection parameters in application code has many downsides:

changes require application modifications

changes are hard to deploy and customize

central connection parameter management is difficult

Libpq does support the setting of connection parameters via environment variables, and this often avoids many of the down-sides of hard-coding database connection parameters. (I already covered the importance of libpq as the common Postgres connection library used by all client interfaces except jdbc, Npgsql, and a few less-common drivers.)

However, there is another libpq feature that makes connection parameter sharing even easier: pg_service.conf. This file allows you to name a group of connection parameters and reference the parameters by specifying the name when connecting. By placing this file in a network storage device, you can easily centrally-control application connections. Change the file, and every new database connection sees the changes. While you can store passwords in pg_service.conf, everyone who can access the file can see those passwords, so you would probably be better off using libpq's password file.

Monday, August 6, 2012

You might already be aware that Postgres updates the process title of all its running processes. For example, this is a Debian Linux ps display for an idle Postgres server:

postgres 2544 2543 0 10:47 ? 00:00:00 /u/pgsql/bin/postmaster -i postgres 2546 2544 0 10:47 ? 00:00:00 postgres: checkpointer process postgres 2547 2544 0 10:47 ? 00:00:00 postgres: writer process postgres 2548 2544 0 10:47 ? 00:00:00 postgres: wal writer process postgres 2558 2544 0 10:47 ? 00:00:01 postgres: autovacuum launcher process postgres 2575 2544 0 10:47 ? 00:00:02 postgres: stats collector process

You can see the postmaster, and its helper processes. "postgres:" indicates the name of the executable. You can also see session information:

postgres 6564 2544 0 18:27 ? 00:00:00 postgres: postgres test [local] idle postgres 6860 2544 9 18:28 ? 00:00:00 postgres: demo template1 [local] SELECT

The first session shows an idle user postgres connected to database test; the second shows user demo doing a select in the template1 database. These updates are controlled by update_process_title, which is enabled by default.

Another features is that tools like top can also display this status information, though such display is often disabled by default. For top, using the -c flag or typing c will display the process status — this is often helpful in obtaining information about a running session, perhaps one that is consuming much cpu or memory.

While the statistics views give in-depth reporting of database activity at the sql level (particularly pg_stat_activity), monitoring Postgres at the command line is also useful. It allows database information to be integrated with operating system monitoring tools to offer superior analysis about how the database is interacting with system resources.

Friday, August 3, 2012

During a recent conference, it was pointed out to me that there are contradictory recommendations about the use of caching on storage devices. For magnetic disk drives, it is pretty clear — if the cache is volatile (which it usually is on magnetic disk drives), switch the drive to write-through mode so all writes go to durable storage, i.e., the magnetic platters. The Postgres manual has a detailed description of how to do this.

For solid-state drives (ssds), things are more complicated. If the cache is volatile, you should switch it to write-through mode. but be aware that this will slow writes and decrease the life-time of the drive, but it is the only durable solution. If the cache is non-volatile, you definitely don't want to do this as this will give you all the negative aspects listed above, and not improve durability, which is already guaranteed because the cache is non-volatile.

So, turn the drive write cache? Turn it off? Hopefully this helps explain which one is appropriate.

Wednesday, August 1, 2012

Have you ever run benchmarks that topped out at 250 Transactions per Second (tps)? What about 120 or 90? If so, you might have been stumped about why you consistently hit these numbers and could never go higher.

These numbers are common rotational latency numbers for hard disks. Specifically, 250, 120, and 90 represent rotational latencies for 15k, 7.2k, and 5.4k rpm drives, respectively. Basically, if you have no durable cache between the cpu and your magnetic disk drives, this is the fastest you can durably perform write transactions. Of course, you can reduce durability requirements, and sometimes writes can be combined into a single disk write, but in general, disk drive rotational latency can be a significant limiting factor for any production server that lacks a durable cache layer, e.g., bbu (Battery-Backed Unit)>, non-volatile drive cache. Having a durable cache layer can improve write transaction performance 10 or 100-times.

Monday, July 30, 2012

How available do you want your services? A lot? Downtime is never desirable, so you want zero downtime, right? OK, that's impossible, so what is reasonable, and how much are you willing to pay for it?

Uptime is often measured in 9's, meaning how many nines in a percentage of uptime is acceptable? 99%, 99.9%, 99.99%, or even 99.95% if you want to be more precise. Of course, this is the desirable uptime — there will be years you are 100% available, then years you will fall far short of your goal.

Let's look at some uptime percentages:

Uptime Outage Duration per Year 99% 3.65 days 99.5% 1.83 days 99.9% 8.76 hours 99.95% 4.38 hours 99.99% 52.56 minutes 99.999% 5.26 minutes 99.9999% 31.50 seconds 99.99999% 3.15 seconds

If you have many read-only servers, some of those higher-nine numbers are possible, e.g., if 10% of your users are on a server that is down for 50 minutes, that might be only 5 minutes of downtime. If you have 10,000 servers, several of them could be down regularly while still maintaining high availability, assuming something doesn't happen that makes them all unavailable, e.g., network connectivity, power.

For database systems, some of those higher-nine numbers are pretty hard to reach. Unless you shard your data, writes are going to be centered on only a few servers, and keeping a few servers running constantly is a challenge, considering all the failures that can happen. Add to these failures the downtime associated with database server maintenance, like schema changes, software deployment, and upgrades, and things really get complicated. Postgres can help in some of these areas:

Transactional DDL helps with schema changes

Pg_upgrade facilitates rapid upgrades (and speed improvements are coming in Postgres 9.2 and later)

However, while Postgres has high reliability, high availability still requires serious planning. For example, pg_upgrade is still too slow for some users. There is also no easy way to upgrade the primary and have hot standby servers continue to function, because they are required to run the same major version as the primary. (The community is currently discussing possible solutions to reuse the majority of the standby data directory.) The overhead of periodic vacuum free operations also sometimes affects downtime by dramatically slowing servers.

Postgres will continue to improve its high-availability options as we gain more users with these requirements.

Thursday, July 26, 2012

Having just explained the value of using swap space, I would like to comment on memory overcommit. Last month, Tom Lane posted a very clear explanation of how copy-on-write and fork interact to require either excess swap space or unrecoverable process failure during high memory usage. This article explains historically how operating systems have dealt with this trade-off. (The Postgres documentation explains how to control the Linux out-of-memory (OOM) killer.)

In summary, this is another reason to have swap space, and to monitor it, because it supplies early warning that memory is in high demand. If you have no swap space, you can't monitor its usage, and you lose valuable information about the health of your system.

Wednesday, July 25, 2012

I have occasionally heard students in my training classes mention that they configure their servers with no swap space. The students have two reasons for this:

I have enough memory so I shouldn't be swapping

I don't want to swap because it is slow

Both of these reasons are inaccurate. Let's look at the first one, "I have enough memory so I shouldn't be swapping". While it is generally true that large memory systems never fully use their ram, it isn't optimal to avoid swap. Applications often have initialization data that can be written to swap so the memory can be used for other purposes. Kernel kernel hacker Andrew Morton once stated:

My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.

Without swap space, moving unused data to swap isn't an option. Also, if the system over-commits memory and must swap, the kernel will kill processes or panic if no swap is available — obviously not a desired outcome.

The second reason, "I don't want to swap because it is slow", is also wrong. Copying pages to swap is done by the kernel, and does not slow down applications. What does slow down applications is copying pages from swap, and if the kernel correctly guessed that the data is unreferenced, this is not a problem. By monitoring swap activity, both in and out, you will be alerted about high memory pressure. This article has extensive details about how swapping works, and how to monitor it, though free on Linux shows how much swap is in use, and vmstat has an all-important swap-in column (si).

So, how how much storage should you allocate for swap space? This web page has a complex formula for computing swap space, but any amount is probably sufficient, as long as you monitor it and increase it if necessary.

Tuesday, July 24, 2012

Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:

SELECT ( SELECT string_agg(x, '') FROM ( SELECT chr(ascii('a') + floor(random() * 26)::integer) FROM generate_series(1, 40 + b * 0) ) AS y(x) ) FROM generate_series(1,5) as a(b);

This query generates the same data, but uses all non-control-character ascii characters:

SELECT ( SELECT string_agg(x, '') FROM ( SELECT chr(ascii(' ') + floor(random() * 94)::integer) FROM generate_series(1, 40 + b * 0) ) AS y(x) ) FROM generate_series(1,5) as a(b);

These random strings can be easily fed into a table:

CREATE TABLE dummy_data AS SELECT ( SELECT string_agg(x, '') FROM ( SELECT chr(ascii(' ') + floor(random() * 94)::integer) FROM generate_series(1, 40 + b * 0) ) AS y(x) ) FROM generate_series(1,5) as a(b); SELECT * FROM dummy_data; string_agg ------------------------------------------ :xD'|lg:vIKTKK$xWcoH34%3kZk1_6sz1:#/)kJ! 7vVbDZ\{@NETbKhP'UagkWc|qg.zUY|lRTPv1.sR ];(AfqqnHUX6.=5"'D}\}{KQR{Id,?8iZA-C4 4] Vltd,+g4OfqNd>"8;L}Gk73HY@lmA"MwnC^{NG1 0$NtBQ/^}.(kE;5 [$n|&>vuaVr2}%2/#a&L45-4

Monday, July 23, 2012

Most people know Heroku as a Platform as a Service (PaaS) provider — they supply a git server, website for your application, and an optional Postgres database (details). What is less well known is that Heroku also has a pure-database offering, more like Database as a Service (DBaaS). This is similar to creating your own Postgres cluster on Amazon Web Services (aws), except Heroku manages the database, and shares the costs. What is really interesting is that if you create an account, you can create free databases (of limited size) that will be always available (screenshot). (FYI, you can easily get stuck in the PaaS section of the Heroku website trying to return to the DBaaS section.)

In contrast, EnterpriseDB (my employer) also offers a cloud database product, but it is closer to a database cluster as a service (rather than just a database) because you control the entire database cluster, including cluster configuration, scaling options, and have full ssh access to the operating system (details). Unfortunately, their free trial lasts only 24 hours, probably because of the cost associated with Amazon instances.

Once you have created a free Heroku database, you can click on the database name to see database details, including instructions on how to connect to the database, like from psql (screenshot, screenshot; database subsequently destroyed). If you do a psql \l, you can see databases owned by other users:

resource50538=> \l List of databases Name | Owner | Encoding | Collate | Ctype | … ----------------+----------------+----------+-------------+-------------+- d105mgto5c11c7 | ddsdlejhgczbfc | UTF8 | en_US.UTF-8 | en_US.UTF-8 | d109cm8tvhs75q | tceospapvukfej | UTF8 | en_US.UTF-8 | en_US.UTF-8 | d12g5f0mud66ks | udetzhqknbnecf | UTF8 | en_US.UTF-8 | en_US.UTF-8 | d13bv78ucq0etq | amybaebhflzzsn | UTF8 | en_US.UTF-8 | en_US.UTF-8 | d13jdd2fqc9db0 | cmxkbwunzvuazo | UTF8 | en_US.UTF-8 | en_US.UTF-8 | …

Because Postgres uses global objects to record user and database names, Heroku has anonymized these identifiers.

So, if you want a free remote database for testing that is always available, try Heroku Postgres. Of course, you will still need a client, like psql, to connect. Heroku does have something called data clips which allows you to send live query output to users via a url.

Friday, July 20, 2012

I just finished attending oscon this week, and Matthew Soldo of Heroku gave a talk titled, PostgreSQL's place in the New World Order. (What's not to like about that title? )

The talk was interesting in its description of Postgres in the new world order of NoSQL and Cloud. He felt that Postgres's features like hstore and JSON (coming in Postgres 9.2) make it uniquely positioned to handle many NoSQL workloads. He felt Postgres's open source nature allowed more confident use of Postgres in public clouds without vendor lock-in.

Overall, it was an inspiring talk that highlighted Postgres's ability to adapt to new environments.

Thursday, July 19, 2012

I have received three phone calls in the last three days from job recruiters looking for Postgres employees. I usually get one call a month, which I think means the Postgres job market is heating up.

To help employers find Postgres talent, I have written a wiki page explaining how to post to the Postgres jobs group email list. If you are contacted by a recruiter, please mention this web page to them. I have added a suggestion to the bottom of my résumé page — hopefully that helps. If you are looking for Postgres employment, please subscribe to that email list.

Monday, July 9, 2012

Suppose a manufacturer produces a product and makes a profit, yet manufacturing generates greater social harm than the value of the product. How does the manufacturer stay in business? Because the manufacturing costs are distributed among many people and are not all paid by the company. This is called an external cost or externality and is commonly seen with pollution costs or natural resource depletion. It can also apply to database index creation.

Huh? Database index creation? Yes. Consider this case: you are troubleshooting a query; it is slow. It is slow because there is no index matching the query qualification, so you add one. The query is now much faster, so you are done. But are you? Your query is faster, but what about other queries? What about insert and some update operations? They now have to update that index you created — that has to take some time. How much? It is hard to measure, because it is distributed among many queries, and is probably small compared to the speed-up you saw in the query that prompted the index creation. But a lot of small slowdowns can easily surpass the speed-up you saw in that one query.

How can your measure all those small slowdowns? I don't know, but do know they exist, so be careful adding an index that has limited use — you might find that externalities make your system slower.

Friday, July 6, 2012

My blog post I Don't Need Backups, I Use RAID1! generated many comments at a blog aggregation site. The first comment has stuck with me — titled Dud, Flood, & Bud, it explains the three general areas of possible failure:

Hardware failure (Dud)

Complete hardware destruction (Flood)

User or administrator mistakes (Bud)

With these three words, it is very easy to remember why streaming replication is insufficient, and why continuous archiving or periodic logical backups are necessary, as well as off-site storage.

Thursday, July 5, 2012

Database application programming is never easy. Combine that with the use of server-side functions, and things can get quite complicated. Why use server-side functions? As Joe Conway said, "Sometimes it is useful to bring the data to the code, and other times it is useful to bring the code to the data."

Postgres minimizes the complexity of creating server-side functions by supporting server-side languages that are often used client-side: PL/Perl, PL/Python, PL/Ruby, PL/Java, PL/R, PL/TCL, C. In fact, there are 18 support server-side language. (You could argue that PL/pgSQL matches sql, especially Common Table Expressions, (ctes).)

This can allow near-seamless transfer of code to and from server-side functions, which allows seamless movement of code to wherever it can most easily access data. This is another Postgres database feature unavailable in many other data systems.

Tuesday, July 3, 2012

I mentioned in March that Joe Celko, "Mr. SQL", attended our Austin PGDay event. While he did not speak at that event, I am excited to learn he will be presenting a full-day SQL class at PostgreSQL Conference Europe in Prague. It isn't everyday you get to hear Joe Celko, and a full-day class is alone enough to justify the trip for serious SQL users.

My great disappointment is that I am also scheduled to give a class that day and cannot attend Joe's training. Devrim Gündüz is co-presenting with me … perhaps no one would notice if …

Monday, July 2, 2012

The recent PGCon conference reminded me of two things that make open source development unique. First, open source developers are really the face of the software — they give the presentations, write the blogs, and are the people users look to for answers and software direction. Closed-source developers are normally hidden within companies — they are trotted out occasionally, but marketing and sales really lead the show.

The second difference is operational — most closed-source developers have very little interaction with users — they must get user information though a filter of layered support, and because of this, their ability to improve the software and fix bugs is severely limited. Open source developers not only have direct contact with users, but they have a large team of other open source developers to help them solve problems, and can often have users test fixes and new features rapidly, sometimes the same day. This is in contrast to closed-source developers who often have to wait months or years for their fixes and features to reach a user.

This last aspect is how open source software, while vastly underfunded compared to closed-source software, can excel, because, while money can help produce good software, clear communication channels between developers and users and a unified team can easily outperform more rigid development environments.

Friday, June 29, 2012

After a two-year break, I am again teaching a course at Drexel University. (This blog entry has details about the previous course offering.) Robert Haas and Michael Glaesemann are again helping as guest lecturers — many thanks to them.

Thursday, June 28, 2012

For the past five years, PGCon has hosted an invitation-only developer meeting before the official conference. I found it interesting to look at the previous group photos and meeting notes:

Images: 2012, 2011, 2010, 2009, 2008 (1, 2, 3, 4)

Agendas: 2012, 2011, 2010, 2009, 2008

Images from 2012 and 2010 have all people identified, though when you click on the image to enlarge it, you lose the identifications. Year 2011 is an extra-wide version. In 2008, it seems we didn't take a group picture, so I have chosen four images that show most attendees.

Wednesday, June 27, 2012

If you are a Postgres speaker and are traveling somewhere on vacation, have you ever considered making a Postgres presentation at your vacation location? You might say, "I take vacations to get away from Postgres" — don't let me ever hear such crazy talk!

For example, six weeks before I left for a mission trip to the Dominican Republic, I sent an email to the advocacy list asking if anyone in the Dominican Republic would like to meet. (The mission trip blog has some humorous mentions of me.) I got a private reply that our public relations contact in that country would like a presentation or two. I ended up speaking to 85 students at a university in the Dominican city of Santiago.

What are the advantages of taking vacation time to present a talk about Postgres? Well, many of my most memorable travel experiences have been while as a guest of Postgres community members — a dacha in Russia, a cruise in Brazil, an elephant orphanage in Sri Lanka. These are not typical tourist sites, but are easily accessible with local friends. Even ordinary locations, like an office building or restaurant in Tokyo, is special because you are living as a Japanese — not something most non-Japanese experience, even as tourists.

Anyway, this isn't for everyone — there certainly are downsides, particularly if you are traveling with a spouse or family, but it might be worth considering — it could be the highlight of your vacation.

Tuesday, June 19, 2012

If you are near Philadelphia, you are invited to attend the 2012 Postgres Pool Party at my home:

When: Saturday, August 4, 2012, 1pm to 6pm

Saturday, August 4, 2012, 1pm to 6pm Where: my home in Newtown Square, Pennsylvania (directions)

my home in Newtown Square, Pennsylvania (directions) What: barbecue, swimming, and most importantly, good conversation

All Postgres users, developers, and groupies are invited, including their families. Please RSVP via email by July 30.

Friday, June 15, 2012

With the list of Postgres committers recently updated, I wanted to mention a perplexing question I occasionally get from companies, "How can I become a committer?". This is from companies that are just starting their contributions to Postgres and have never submitted a patch. It is a perplexing questions because, for Postgres, committers are people who have been around for years and have submitted many patches — obviously there are mismatched expectations here.

The cause is that these new companies are equating Postgres with open source communities with hundreds of committers, where you have to be a committer to get anything significant done. The Postgres work-flow (diagram) forces all work through community email lists, and a small group of committers apply patches agreed-upon by the group. So, for Postgres, committing is a mechanical process, not a control process, i.e., contributor ≠ committer.

Companies looking to measure their success in the Postgres community should not focus on committer count, but on the number of release note items and blog entries mentioning their work. I hope this helps companies adjust easier to the Postgres way of doing things.

Wednesday, June 13, 2012

Object Identifiers (oids) were added to Postgres as a way to uniquely identify database objects, e.g., rows, tables, functions, etc. It is part of Postgres's object-relational heritage.

Because oids where assigned to every data row by default, and were only four-bytes in size, they were increasingly seen as unnecessary. In Postgres 7.2 (2002), they were made optional, and in Postgres 8.1 (2005), after much warning, oids were no longer assigned to user tables by default. They are still used by system tables, and can still be added to user tables using the with oids clause during create table. Server parameter default_with_oids controls the default mode for table creation (defaults to "false").

Oids as still used extensively for system table rows, and are used to join system tables, e.g.:

SELECT oid, relname FROM pg_class ORDER BY 1 LIMIT 1; oid | relname -----+----------------------------------- 112 | pg_foreign_data_wrapper_oid_index

Only system tables that need oids have them, e.g., pg_class has an oid column, but pg_attribute does not.

Tuesday, June 12, 2012

While the English language is somewhat fluid in its use of single and double quotation marks, sql is very rigid: single quotes are used to delimit text strings, and double quotes can be used to delimit identifiers. What are sql identifiers? They identify an sql object, e.g., if you create a table, the table name and column names are identifiers.

Double quoting identifiers is usually optional, so either of these is valid:

CREATE TABLE my_table (a INTEGER); CREATE TABLE "my_table" ("a" INTEGER);

However, double quoting of identifiers can effect how objects are defined, e.g., these are not the same:

CREATE TABLE My_table (A INTEGER); CREATE TABLE "My_table" ("A" INTEGER);

Why are these different? Because Postgres automatically lowercases identifiers (contrary to the sql standard that requires auto-uppercasing). This means that the first creates my_table with column a, while the second creates My_table with column A. This becomes an issue when you need to access the table:

UPDATE My_table SET A = 3; -- sets my_table.a = 3 UPDATE "My_table" SET "A" = 3; -- sets My_table.A = 3

Basically, once you use double quotes at object creation time (e.g., create table, create function, create schema) and use a character that is affected by double quotes (uppercase, punctuation, spaces), you must use double quotes when accessing the object. If you create a table named "Oh wow, this is super-cool!", you are going to need to double quote that table name each time you use it.

Also be aware that many tools (e.g., pgAdmin) auto-double-quote supplied identifiers, so be careful when using any double-quote-affected characters with these tools. In addition. Postgres's auto-lowercasing rather than auto-uppercasing, as the standard requires, can cause additional quoting needs when porting applications.

Sunday, June 10, 2012

Having just attended the Southeast LinuxFest, I was reminded of the elements that make a great conference site:

Hotel connected to the conference venue

Affordable hotel

Conference venue walking distance to restaurants and entertainment

The Southeast LinuxFest location met all of these criteria, but that is rare — only one-third of conferences I attend meet these criteria, and when these criteria are not met, you feel it.

When the primary hotel is not connected to the conference, attendees have to take everything with them to the conference because they can't easily return to their rooms, and once they return to their rooms at night, it is often hard to get them to go out. Expensive hotels often drastically reduce attendee count because many attendees must pay for conference expenses with personal funds. Conferences not near food and attractions often lack the fun evening outings that attendees remember long after the conference ends.

Abstractly, you would think that conference technical content alone determines a conference's success, but it is often the nebulous "hallway track" that is most memorable because of the ability to interact with many people in situations similar to your own or who possess expert knowledge. The above conference elements are key in making for meaningful "hallway track" interactions.

So, if you are a conference organizer and find a site that meets all these criteria, grab it — there are not many, but they do exist, and having all these elements is half the battle of creating a successful conference.

Friday, June 8, 2012

I have accepted two speaking engagements in the Dominican Republic in mid-June; my website has dates and cities. I don't have any web links for the events, so if you want to attend, email me and I will get you in touch with the organizer.

Friday, June 8, 2012

I previously covered timing of queries from an external perspective. However, it is also possible to time queries internally. Each data manipulation language (dml) command (select, insert, update, delete) goes through three stages:

parser planner executor

You can actually time how long each stage takes using these server settings:

log_parser_stats = on log_planner_stats = on log_executor_stats = on

The following command outputs all three:

$ PGOPTIONS="-c log_parser_stats=on -c log_planner_stats=on \ -c log_executor_stats=on -c client_min_messages=log" psql <<END SELECT 1; END LOG: PARSER STATISTICS DETAIL: ! system usage stats: ! 0.000055 elapsed 0.000000 user 0.000000 system sec ! [0.000000 user 0.004000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/51 [0/762] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [2/0] voluntary/involuntary context switches LOG: PARSE ANALYSIS STATISTICS DETAIL: ! system usage stats: ! 0.000030 elapsed 0.000000 user 0.000000 system sec ! [0.000000 user 0.004000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/19 [0/795] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [2/0] voluntary/involuntary context switches LOG: REWRITER STATISTICS DETAIL: ! system usage stats: ! 0.000006 elapsed 0.000000 user 0.000000 system sec ! [0.000000 user 0.004000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/2 [0/797] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [2/0] voluntary/involuntary context switches LOG: PLANNER STATISTICS DETAIL: ! system usage stats: ! 0.000045 elapsed 0.000000 user 0.000000 system sec ! [0.000000 user 0.004000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/28 [0/825] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [2/0] voluntary/involuntary context switches LOG: EXECUTOR STATISTICS DETAIL: ! system usage stats: ! 0.000016 elapsed 0.000000 user 0.000000 system sec ! [0.000000 user 0.004000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/7 [0/858] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [2/0] voluntary/involuntary context switches ?column? ---------- 1

What is all this output? Well, the headings give us a clue:

LOG: PARSER STATISTICS LOG: PARSE ANALYSIS STATISTICS LOG: REWRITER STATISTICS LOG: PLANNER STATISTICS LOG: EXECUTOR STATISTICS

The first three are output by log_parser_stats, while the rest are specific to the planner and executor. These values are from a kernel function called getrusage. It returns verbose output about the amount of cpu, i/o, page faults, and context switches made during each internal stage of query execution. You can get cumulative rusage statistics using log_statement_stats = on.

The numbers can be overwhelming, but it does allow users to find out how long each part of a query took (see "elapsed"), and how much parsing and planning (optimization) time contributed to the total query duration.

Ideally the majority of time should be spent in executor. If you add the parser and planner times, it gives you an idea of the reduction in query time possible by using prepared statements, assuming you run the statement more than once per session. The Genetic Query Optimizer (geqo) is designed to short-circuit planning when planning overhead might significantly increase query duration. Explain allows you to see the plan (generated by the planner) that would be run by the executor.

Hopefully this helps you understand how to analyze a query by looking at the internal processes that contribute to total query duration.

Wednesday, June 6, 2012

There are actually several ways you can time a query, and each has value depending on what you want to measure. You can time a query at the server (from the time it arrives until the time it completes) by setting the server parameter log_min_duration_statement = 0. However, that does not measure the network overhead, which can be done using \timing in psql. Then there is the connection and application overhead that is measured by time at the command-line. This command measures all three aspects of a query:

$ time PGOPTIONS="-c log_min_duration_statement=0 \ -c client_min_messages=log" psql <<END \timing SELECT 1; END Timing is on. LOG: duration: 0.318 ms statement: select 1; ?column? ---------- 1 Time: 0.435 ms real 0m0.006s user 0m0.004s sys 0m0.004s

Notice the server-timed duration is 0.318, but with network overhead it is 0.435, an increase of 36% in total duration. Of course, this is a simple query, but it does show that network overhead is not trivial. Including application/connection overhead shows 6ms, overwhelming the previous measurements. That test was with Unix domain sockets — here are the same measurements for tcp/ip sockets:

$ time PGOPTIONS="-c log_min_duration_statement=0 \ -c client_min_messages=log" psql -h localhost <<END \timing SELECT 1; END Timing is on. LOG: duration: 0.314 ms statement: select 1; ?column? ---------- 1 Time: 0.467 ms real 0m0.006s user 0m0.004s sys 0m0.000s

The server timing is similar (as expected), but the included network time (\timing) is 7% higher. Enabling ssl shows even more significant network overhead:

$ time PGOPTIONS="-c log_min_duration_statement=0 \ -c client_min_messages=log" psql "sslmode=require host=localhost" <<END \timing SELECT 1; END Timing is on. LOG: duration: 0.312 ms statement: select 1; ?column? ---------- 1 Time: 0.732 ms real 0m0.024s user 0m0.012s sys 0m0.004s

Same server timing, but the network overhead has increased 57%. If we look at just the network times (minus the server query time), we get:

Communication method time Change from previous Unix-domain socket 0.117 n/a tcp/ip, no ssl 0.153 +31% tcp/ip, ssl 0.420 +175%

With ssl enabled, total application execution time grew from 6ms to 24ms, a four-fold increase. This is expected, given ssl's high overhead for secure key negotiation.

As you can see, network overhead, and certainly application and connection overhead, are significant factors in determining query duration. When debugging slow queries, knowing what you are timing greatly increases the odds of success, and I hope this blog post helped.

Monday, June 4, 2012

There are 500k to 1-million words in the English language, so you would think that Postgres would be able to find a unique word for every aspect of the database, but unfortunately, that is not true. There are two cases where Postgres uses a single word to mean two different things, sometimes leading to confusion.

The first word is "cluster". One meaning represents the data directory and postmaster that controls it, e.g., a Postgres instance. One install of Postgres binaries can be used to create several Postgres clusters on the same machine. The second usage of "cluster" is as an SQL command called cluster, which reorders a table to match an index (to improve performance). The good news it that the meaning of cluster is usually clear from the context.

The same is not true of our second case, "schema". Traditionally, when someone talks about a "database schema", they are referencing table or object structure defined using a data-definition language (ddl) command, e.g., create table. The second use of "schema" is to define a namespace to contain objects, e.g., create schema. (I have already covered the value of using such schemas.)

The double-meaning for the word "schema" is often more problematic than for "cluster". For example, the pg_dump manual page uses both meanings of the word "schema":

-N schema --exclude-schema=schema Do not dump any schemas matching the schema pattern. -s --schema-only Dump only the object definitions (schema), not data.

Tell me that is not confusing! One hint that we are talking about two different meanings for "schema" here is that the first one defaults to -N, as a reference to "namespace". Internally, Postgres calls schema containers "namespaces", e.g., system table pg_namespace holds the list of schema containers.

So, if you are having a conversation, and you hear the words "cluster" or "schema", don't be surprised if you get a little confused — hopefully things soon get clarified and the conversation can continue. (Two more words for the Postgres drinking game? )

Friday, June 1, 2012

Acid (atomicity, consistency, isolation, durability) is a fundamental aspect of relational databases. It isn't easy to implement acid, and some database systems take shortcuts in their implementation, particularly for Data Definition Language (ddl) commands. (Oracle notoriously commits multi-statement transactions when a ddl command is issued; this wiki page outlines the support for transactional ddl among database systems.)

Postgres rarely takes shortcuts, and didn't take shortcuts in implementing transactional ddl. Just like dml can be wrapped in a multi-statement transaction, so can ddl. Why is this useful? Well, ddl operations are rarely isolated to a single object — they often are part of a larger roll-out that affects many objects. By using transactional ddl, you allow the ddl to either be applied atomically, or rolled back, which greatly simplifies deployments. Transactional ddl can also be used to replace objects in-place, by performing the create, drop, and rename in the same transaction.

This presentation I attended at ConFoo praised Postgres's support for transaction ddl — there is no wonder it was called Painless, Version-Controlled Database Refactoring. Hopefully you perform "painless" deployments too by using transactional ddl.

Thursday, May 31, 2012

I often ask people about their backup configuration because I want to make sure my own server backup procedures are sufficient. I remember asking someone a few years ago about their personal backup methods, and they said they used raid1 (mirroring), and didn't need backups. That seemed insufficient, and when I asked about recovery of deleted or improperly-modified files, I wasn't given a satisfactory answer. The bottom line is that while raid1 protects against the failure of a drive, it does not address other causes of data loss. Here is a table I created to illustrate the data loss and recovery options I use for my home server:

Deletion of file, discovered immediately Restore from daily or two-hourly backup Deletion of file, discovered < 30 days Restore from monthly backup Deletion of file, discovered >= 30, < 90 days Restore from quarterly backup Deletion of file, discovered >= 90 days Unrecoverable Drive failure Restore from previous day Server destroyed Restore from previous month Server and data safe destroyed Restore from quarterly backup

How likely are these failure cases? This report (article) shows the statistics for data retention failure:

Failure cause Pct. Hardware failure 40% Human error 29% Software corruption 13% Theft 9% Computer virus 6% Hardware destruction 3%

As you can see, there are many failure cases which raid1 cannot recover. Don't believe it? Watch this video about a Pixar film almost lost due to file deletion (and backup failure).

How does this relate to Postgres? Well, if you are only using replication, you are affectively doing raid1 at the server level -- changes to one server are "mirrored" to the other server, and instead of recovery from disk failure, you can recover from complete server failure. We already saw how limited raid1 is in covering data retention failures, and those same limitations apply to most replication systems. Yes, there can be a slight replication delay, but the delay is usually unpredictable. (The community has discussed the idea of adding a feature to specify a minimum streaming replication delay.)

In summary, replication is not enough — it is necessary to use some other method to allow for recovery to some point in the past — before the deletion, unintended modification, or corruption happened. Most users use continuous archiving or logical backups for this purpose. This is why combination of streaming replication and continuous archiving is such a potent combination — it allows the standbys to disconnect and reconnect/catch-up easily, and it allows for point-in-time recovery to address the data retention failures not covered by streaming replication alone.

How far back in time do you need to keep backups? Well, an ideal setup has many recent backups, then decreasing retention of backups, perhaps perpetually retaining backups at some predefined interval, e.g., quarterly.

Hopefully this blog entry has helped stress the importance of addressing all data retention failures, and hopefully you will never have to say that your backup plan is only raid1/replication.

Update: Additional comments 2012-06-09

Wednesday, May 23, 2012

During Robert Haas's talk (slides) last week at PGCon, I realized that three of our major performance features for Postgres 9.2 involved representing data in a more compact form:

Allow queries to retrieve data only from indexes, avoiding heap access

Allow hint bits to be set sooner

Move the frequently accessed members of the PGPROC shared memory array to a separate section, for performance

The first item reduces disk reads, the second reduces memory reads, and the third reduces CPU cache line accesses.

Tuesday, May 22, 2012

With pg_upgrade gaining wide acceptance and the Firefox browser implementing a continuous upgrade model, it might be time to consider auto-upgrades for Postgres. Consider this possible future output:

$ psql test psql (12.3.7) Postgres is up to date Type "help" for help. test=>

If continuous upgrades can be done for a web browser, why not for a database? Well, OK, there are some good reasons a database shouldn't do this, but it certainly is possible to do upgrades more cleanly than we do now. Postgres packagers are already starting to implement upgrades using pg_upgrade. Some developers might prefer auto-upgrades so they are always developing against the most recent Postgres version.

Automatic upgrades are something we probably never would enable unconditionally, but we might someday make it optional.

Monday, May 21, 2012

Pg_upgrade (documentation, presentation) was created in 1998 to allow in-place major version upgrades of Postgres. It was written as a shell script, as were many of our command-line administration tools at the time. (Postgres didn't support Windows natively until 2005). In those early years, Postgres changed its on-disk format frequently in major releases, causing pg_upgrade to be only occasionally useful for major version upgrades (6.5, 7.1, disabled in 2002).

Postgres feature additions soon made the shell script method unworkable, and binary-upgrades were unsupported for many years. In 2006, EnterpriseDB took on the job of rewriting pg_upgrade in C, a task that had been discussed many years on the mailing lists — this email thread, titled "Upgrading rant", give you an idea of the tone of previous discussions.

In 2009, EnterpriseDB realized that a successful binary-upgrade utility had to be community-developed, so they changed the license from GPL to BSD and encouraged me to work with the community to improve the tool. In 2010, pg_upgrade was added to the official Postgres 9.0 release. Postgres 9.1 saw only performance improvements. Postgres 9.2 will improve the user interface and error reporting. It is significant that pg_upgrade has not required large changes to support new major releases, only minor improvements.

Pg_upgrade's goal of allowing in-place upgrades is a bold one, but history shows that it has achieved that goal. Pg_upgrade has had its share of bugs along the way, but their frequency is diminishing as more and more people use pg_upgrade successfully. The community is increasingly supportive of providing in-place upgrade capabilities for users, so hopefully pg_upgrade will remain useful for many years to come.

Friday, May 18, 2012

PGCon 2012 is now over. Surprisingly, all the talks that excited me were from the second/last day. My morning started with WAL Internals Of PostgreSQL (slides) by Amit Kapila. I rarely see this topic covered, and his comparisons to Oracle's transaction logging were fresh. Also in the morning, Hooks in PostgreSQL by Guillaume Lelarge: I also rarely see this covered. It highlighted the C-language plug-in capabilities of Postgres.

My afternoon highlight was Big Bad "Upgraded" PostgreSQL by Robert Treat. After the occasional maligning of pg_upgrade (usually in jest), it was great to hear a big pg_upgrade success story. (This OmniTI blog entry has the details. This talk was also presented at PG Corridor Days DC.)

Thursday, May 17, 2012

As part of yesterday's PGCon Developer Meeting, I hosted a discussion about adding resource parallelism to Postgres, which I blogged about previously. Josh Berkus has kindly summarized the results of that discussion. I am hoping to help track and motivate progress in this area in the coming months; it is a multi-year project to complete.

Wednesday, May 16, 2012

You might have heard of the template1 database before, or seen it in the output of pg_dumpall:

REVOKE ALL ON DATABASE template1 FROM PUBLIC; …

It is one of those databases, like template0, that sits in the background and appears in psql \l output. While you can go for years never knowing about template1, it does have a useful purpose.

As its name suggests, it is the default "template" used for creating new databases. This means, if you modify template1, every new database created after the modification will contain the changes you made. So, if you need a table, schema, function, server-side language, or extension in every database, add it to template1 and you will have it in every newly-created database. The Postgres documentation covers this in detail.

I mentioned template1 is the default template database, but you can use other databases as templates for newly-created databases. Create database has a template option that allows you to specify a different database to copy from. For example, to create a copy of the production database called demo, have everyone disconnect from the production database, and use production as the template to create demo. The demo database can then be modified without affecting production. Create database does a file-level copy, so all the I/O is sequential and much faster than a dump/restore of the database.

As a side node, template0's purpose is often mysterious too, particularly because you can't even connect to it:

$ psql template0 psql: FATAL: database "template0" is not currently accepting connections

There must be something special in there because it's inaccessible! Well, sorry to kill the mystery, but template0 is used solely by pg_dumpall to properly dump out the customizations made to template1. If template0 did not exist, it would be impossible for pg_dumpall to determine the customizations made to template1 and dump those out appropriately.

Thursday, May 10, 2012

I have completed the Postgres 9.2 release notes I started seven days ago. Writing the release notes is always a taxing experience. I have to basically dedicate an entire week to the process of digesting 1100 commit messages to produce 3000 lines of SGML text. Once I am done though, it is rewarding to see the finished product. It is like working on a painting all year, and spending a frustrating week framing it and hanging it on the all — once it is hung, you stand and admire it, and forget much of the sweat it took to produce. I am sure many community members feel the same.

Curious how the 9.2 release item count compares to previous major releases? Here are the results:

Release Items 9.2 241 9.1 203 9.0 237 8.4 314 8.3 214 8.2 215 8.1 174 8.0 230 7.4 263

Of course, this is just the first draft of the release notes; the 9.2 count will change regularly until the final release.

Monday, May 7, 2012

My children have traveled to many conferences with me, and have heard many webcasts and training calls at home. I guess, after hearing about Postgres so often, they pick up certain phrases of interest, and the big one for them is "shared buffers". Anytime someone uses those words, they start howling and make up some funny sentence using the words.

There are a variety of beer drinking games that trigger drinking when a word is said — perhaps "shared buffers" is enough on its own to make a drinking game.

Friday, May 4, 2012

Having reported the methods for finding the size of the kernel cache on Linux, I wish to highlight the importance of the postgresql.conf setting effective_cache_size.

Unlike other memory settings that control how memory is allocated, effective_cache_size tells the optimizer how much cache is present, both in the kernel and shared buffers. This is important for determining how expensive large index scans will be. The optimizer knows the size of shared_buffers, but not the kernel cache size, which affects the probability of expensive disk access.

The kernel cache size changes frequently, so run free during a period of normal system load and add the size of shared_buffers that you think is not already in the kernel buffers — use that value to set effective_cache_size. The value doesn't have to be perfect, but just a rough estimate.

Thursday, May 3, 2012

As in previous years, I have started working on major release notes, this time for Postgres 9.2. I start with this command:

$ src/tools/git_changelog --since '2011-06-11 00:00:00 GMT' --master-only \ > --oldest-first --details-after

which generates 11k lines of output, and eventually reduce that to 3k lines of SGML-marked-up release notes. I have outlined the steps in the past, though git has improved the speed of researching items.

Wednesday, May 2, 2012

Measuring Linux free memory and kernel cache size can be somewhat tricky. You might look at /proc/meminfo for the answer (commas added):

# cat /proc/meminfo MemTotal: 24,736,604 kB MemFree: 3,805,392 kB Buffers: 743,016 kB Cached: 18,188,208 kB …

Unfortunately, this brings up two more questions: Why is "MemFree" so small on this idle system, and what is the difference between "Buffers" and "Cached"? Another way of displaying this information is using the Linux free command:

# free -m total used free shared buffers cached Mem: 24156 20351 3805 0 743 18188 -/+ buffers/cache: 1419 22737 Swap: 6219 0 6219

Addressing the first question, we see the same 3.8GB for free memory displayed on the first line under "free". This display is so often misinterpreted that it necessitated the creation of an explanatory website.

The Mem line output by free shows memory from the kernel perspective. The free value of 3.8GB represents kernel memory not allocated for any purpose, i.e., free, from the kernel's perspective. The bulk of the memory, 18GB, is shown as used for caching ("cached"). The second line, confusingly labeled "-/+ buffers/cache", represents memory from the process perspective. It is labeled that way because the "buffers" and "cached" have been removed from the "used" column and added to the "free" column, i.e., -/+. From the process perspective that memory is immediately available because it represents read cache that can be discarded as soon as memory is needed by processes. This labeled output summarizes the columns (details):

# free -m total used free shared buffers cached Mem: A+B+C+D B+C+D A B C -/+ buffers/cache: D A+B+C

(The "Swap" line has been removed from this and later free outputs.)

"A" represents kernel free memory, and "D" represents memory used by processes. You can see that "B+C" is added to the used column in line 1, and added to the "free" column in line 2. "total" represents all memory, less memory used for basic kernel operation. In fact, "total" always equals "used" + "free" for each line:

# free -m total used free shared buffers cached Mem: A+B,C+D A B -/+ buffers/cache: C D

Here is actual free output:

# free -m total used free shared buffers cached Mem: 24156 20351 3805 0 743 18188 -/+ buffers/cache: 1419 22737

Notice that the italicized values on line 1 equal the bold value on line 2. Similarly, these italicized value equal the bold value:

# free -m total used free shared buffers cached Mem: 24156 20351 3805 0 743 18188 -/+ buffers/cache: 1419 22737

While free is confusing, its layout does suggest the meaning of various columns. Open source has often debated what "free software" means, and it seem the Linux kernel also has multiple definitions of the word "free".

The second question regards the difference between "buffers" and "cache". (Josh Berkus recently blogged about a kernel bug that prevented most available ram from being used as cache.) The definitive answer for modern kernels comes from this source code comment:

Buffers: Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so) Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached

The buffers and caches can be cleared using these commands (at least in Linux 2.6.16 and later kernels):

To free pagecache: echo 1 > /proc/sys/vm/drop_caches To free dentries and inodes: echo 2 > /proc/sys/vm/drop_caches To free pagecache, dentries and inodes: echo 3 > /proc/sys/vm/drop_caches

With the last command run, the free command shows almost nothing cached, and the first two entries in the "free" column are almost identical, as expected:

# free -m total used free shared buffers cached Mem: 24156 899 23257 0 0 91 -/+ buffers/cache: 807 23349 Swap: 6219 1 6218

Hopefully this explanation helps people understand the various perspectives of "free" reported by the Linux kernel and an easy way to find the size of the Linux kernel cache.

Monday, April 30, 2012

There are several levels of caching used in a typical server — here they are, in order of increasing distance from the cpu:

1. Cpu cache 2. Random-Access Memory ( ram ) 3. Storage controller cache 4. Storage device (disk) cache

All of these are a fixed size and set at hardware installation time, i.e., you can't move cache from one level to another. For Postgres, there is flexibility in how #2, random-access memory, is allocated, and this provides a never-ending opportunity for administrators to optimize their systems. The three ram allocation possibilities are:

The last item, kernel cache, isn't really an allocation but rather is based on the ram remaining from the previous two allocations. Some database systems avoid the kernel cache by directly writing to storage (direct i/o or raw disk access). While Postgres does allow direct i/o for writing of the Write-Ahead Log (wal) (via wal_sync_method), it does not support direct i/o or raw device control of data buffers. The reason for this is that Postgres tries to be operating system-neutral, and allows the kernel to do read-ahead and write combining. This is often more efficient, particularly if the storage subsystem has both database and non-database activity. In such cases, only the kernel knows about all the storage activity, and can better optimize the workload. This also helps Postgres run well in virtualized environments. The big downside of using the kernel cache is that it often contains data already in the shared buffer cache.

Future blog entries will explore the various configuration possibilities of allocating random-access memory (ram).

Saturday, April 28, 2012

During the next two months I will be attending events in the following cities: New York City, Ottawa, Charlotte (North Carolina), and Boston — the details are on my website. I will also be doing training in many of these locations. And I spoke in Philadelphia this week — I guess this is what the New Postgres Era looks like.

Friday, April 27, 2012

I mentioned that Postgres supports multiple clusters, databases, and schemas, but an open question is why use multiple schemas, rather than placing everything in the "public" schema? By default, search_path places everything in the public schema (assuming a schema matching the current user name does not exist):

SHOW search_path; search_path ---------------- "$user",public

Of course, search_path can be set at various levels, e.g., user, database, session.

So, why use more than the public schema?

Easier object management: Rather than having thousands of tables/objects in a single schema, arrange them in separate schemas, either by user or purpose. FYI, almost every object exists in a schema, not just tables and indexes, e.g., functions exist in schemas. (Technically, only languages and global objects are schemaless.)

Allow multiple objects with the same name to exist in the same database: For example, schemas allow for two applications in the same database to both have a "parts" table, e.g., vendor_app.parts, and production.parts.

Permission Control: Schemas have both create and usage (search) permissions that apply to all objects in the schema. It is possible to create a schema that only certain roles can use, and to remove creation permission from the "public" schema for read-only users.

When Informix introduced schemas in the mid-1990's, I didn't understand schemas and why all the tables now were prefixed with "public", so I never used this useful feature. Hopefully this blog post helps explain the usefulness of schemas and why ignoring schemas, as I did in the past, is not always a good idea.

Wednesday, April 25, 2012

I recently mentioned the use of connection poolers to reduce the overhead of server-side language initialization, but I believe the topic warrants fuller coverage.

Aside from the ability to reduce language initialization overhead, the two major advantages of connection pooling are:

Reducing session startup time by preallocating and reusing database sessions

Reducing session management overhead by reducing the number of idle sessions

While these advantages tend to get lumped together in people's minds, they actually address two different issues. The first, reducing session startup time, reduces the delay in executing the first query of a session, and reduces process creation overhead on the server. It is particularly useful for databases with many short-lived sessions, and for operating systems, i.e., Windows, that do not use fork (details about fork).

The second advantage, reducing session management overhead, improves overall performance by reducing the server overhead involved in managing many sessions, i.e., if many of those sessions are idle, it is possible to speed up the active sessions by reducing the total number of database sessions connected to the server. This is obviously advantageous when there are many sessions, but most are inactive, i.e., the database is more efficient executing 20 active sessions than running 20 active sessions and managing 400 inactive sessions.

Postgres has two popular connection poolers, pgpool-II and PgBouncer. The names are a little confusing, because pgpool-II does a lot more than pooling. (Josh Berkus covered this issue recently.) Connection poolers are also built into many application-building tools, like PHP and Hibernate. Whichever one you use, they all provide the benefits listed above.

One tricky limitation is that database sessions can only be pooled per-database, meaning that if you connect to many different databases in a unpredictable pattern, a connection pooler will be less useful.

Postgres does not provide a built-in connection pooler. and that is by design — external connection poolers can be placed closer to the client, or on inactive servers, and they can be used to redirect client connections to new servers after a fail-over — a built-in connection pooler would have none of these advantages.

Monday, April 23, 2012

I previously explained that Postgres allows multiple databases per cluster. The outstanding question might be, when should I use multiple clusters, multiple databases, or multiple schemas? The following table outlines the advantages of the various container types:

Feature Cluster Database Schema Isolated Server Start/Stop ✓ Connection Control ✓ ✓ Private System Tables ✓ ✓ Private Plug-Ins ✓ ✓ Isolated Administration ✓ Shared Administration ✓ ✓ Isolated Resource Usage ✓ Shared Resource Usage(1) ✓ ✓ Data Isolation(2) ✓ ✓ Cross-Container Queries ✓

(1) A large number of data containers increases the usefulness of resource sharing, e.g., shared_buffers. Resource sharing includes log shipping and streaming replication sharing.

(2) User and database names, being global objects, are visible in all databases. It is impossible to query across databases, except via an external database session, e.g., dblink. Schema permissions allow data access control, but pg_class still shows all tables defined in the database.

Hopefully this chart helps users choose the proper container for their data needs.

Wednesday, April 18, 2012

Postgres has three backup methods: logical, file system, and continuous archiving. The last two rely on copying the file system (with the possible assistance of the write-ahead log). The first method, logical backups, is performed via pg_dump or pg_dumpall, and has two pitfalls I would like to mention.

First, many people think that running pg_dumpall and running pg_dump on each database individually are equivalent — they are not. While pg_dumpall does backup each database individually, it also dumps out global objects that exist outside any specific database, e.g., roles, tablespaces. If you prefer to use pg_dump to dump each database individually, be sure to run pg_dumpall -globals-only as well. Failure to restore global objects before per-database objects causes all restored data to be owned by the restore user and be in a single tablespace.

The second pitfall is to believe that a logical dump of the database has captured all cluster modifications. While all the database objects have been dumped, configuration file changes are not included in a logical dump and must be backed up separately, e.g., postgresql.conf, pg_hba.conf. This is why many wise administrators and packaging systems move these configuration files out of the the data directory and into a directory that is regularly backed up by the operating system, e.g., /etc. With the configuration files moved, the data directory can be skipped entirely during operating system-level backups.

So, if you perform logical backups, make sure you backup your global objects and configuration files as part of your backup process.

Monday, April 16, 2012

The Postgres system is setup differently than some other database systems, and while long-time Postgres users consider its layout natural, people coming from other database are often initially confused. Therefore, it is helpful to explicitly mention some of those differences.

First, Postgres allows multiple databases in a single cluster (which is controlled by a single postmaster). This often confuses Oracle users because in Oracle you can only have one database in a cluster. This also confuses MySQL users because they have multiple "databases", but these databases share a single system catalog and allow cross-"database" queries. Effectively, they are "schemas", rather than databases. Microsoft SQL and DB2 have a similar facility to Postgres, with multiple databases in a cluster, and multiple schemas in a database.

As part of this multi-database, multi-schema capability, there are some things that don't make sense to store in per-database tables, e.g., roles, tablespaces. These are called global objects and a system catalog query can show them to us:

SELECT relname FROM pg_class JOIN pg_tablespace ON (reltablespace = pg_tablespace.oid) WHERE relkind = 'r' AND -- only tables spcname = 'pg_global' ORDER BY 1; relname -------------------- pg_auth_members pg_authid pg_database pg_db_role_setting pg_pltemplate pg_shdepend pg_shdescription pg_shseclabel pg_tablespace

Another way to see global objects is with pg_dumpall --globals-only.

Looking at the list, it seems natural that they are not per-database — you certainly wouldn't want the list of available databases stored in a single database. Role (user and group) information is also cluster-wide. Tablespaces are something that exists as part of the file system, so it makes sense for them to be global too.

I hope this helps users who might have been confused by the way Postgres does things.

Friday, April 13, 2012

I have already blogged about toast storage (and TOAST queries), but I would like to highlight one aspect that is easily overlooked by application programmers. There is an age-old debate over whether using select * is good programming practice — many feel that wildcarding all columns of a table makes applications more brittle when columns are added or removed, while others feel that specifying all columns in an application is more error-prone. If a table has only a few columns, specifying them is pretty easy, but when the number of columns is large, specification of column names can be cumbersome. Of course, having wide rows in tables has its own issues, but I don't want to get into that here — what I do want to cover is its affect on toast'ed values.

As I mentioned in my previous blog post, long values are not stored in the main row, but in toast tables that have additional access overhead. This is great when you only need the short values from a row, which is typical, but what happens when you use select *? Well, if you actually need all the column values, using select * or specifying the column name explicitly makes no difference, but many applications use select * when they only need some of the column values, and this is where toast becomes an issue. If you don't need to see some columns, and those columns might be long and hence toast'ed, it is wise to avoid selecting them in a query — this will avoid toast table access and speed up queries significantly.

With toast, some values are more expensive to access than others, so avoid unnecessarily accessing columns that might contain toast'ed values. (Observation from Stephen Frost.)

Wednesday, April 11, 2012

You might have seen shared_preload_libraries in the postgresql.conf configuration file, but its usefulness might have been unclear — I would like to explain its purpose here.

Unlike traditional database systems, Postgres is an object-relation database system, meaning it relies heavily on plug-in objects to enable flexible behavior, and many of these objects are supplied as shared object libraries (or dynamically-loadable libraries (DDL) on Windows). Examples of external objects include the PL/pgSQL server-side language, the pgcrypto cryptographic library, and the PostGIS geographic information system. These are all implemented as shared libraries that are dynamically loaded into the database server when accessed. The process of loading a library into a running executable is called dynamic loading and is the way most modern operating systems access libraries (the non-dynamic method is called "static linking").

However, this dynamic loading is not free — it takes time to load a dynamic library the first time it is accessed. The library must be mapped into the process address space, and its symbol table read, and this must happen before any functions in the dynamic library can be executed. This is where shared_preload_libraries is useful — on operating systems where child processes are forked rather than created from an executable (all but Windows), shared_preload_libraries loads the shared library into the parent of all child processes (the postmaster). This causes all child processes, i.e., new sessions, to inherit an address space that already contains the needed shared library, speeding up calls to any functions in that library. (Fork is illustrated in my Inside PostgreSQL Shared Memory presentation, slide 12.)

In one test, I found that setting shared_preload_libraries = '$libdir/plpgsql' in postgresql.conf caused a simple plpgsql function:

CREATE FUNCTION preload_test() RETURNS void AS E'BEGIN

END;' LANGUAGE plpgsql;

executed with these commands 10,000 times in the same psql session:

\connect test SELECT preload_test();

reduced execution time significantly. Of course, this is an atypical test and most workloads will not see as significant a gain. Here is a summary of my various tests:

----- S e c o n d s ----- % Net Change Library No Preload With Preload % Change from chr('a') ----------------------------------------------------------------------- chr('a') 28.0 28.2 +1 n/a pgcrypto(1) 32.8 30.9 -6 -40 plpgsql 34.7 30.5 -12 -63 plperl(2) 295.0 240.3 -19 -20 1 query: armor('a') 2 function body: 'return;'

PL/Perl, being a more complex library, showed a more significant improvement, and its high overhead suggests the advantage of using pooled connections so the Perl interpreter doesn't have to be started for every new connection. Notice that loading unneeded libraries can have a slight overhead, so use shared_preload_libraries discriminately.

In conclusion, testing shows that shared_preload_libraries can improve system performance (except on Windows) if your server heavily uses shared libraries.

Tuesday, April 10, 2012

There are three ways to install Postgres, and they are all listed on the Download menu on the Postgres website. While the web authors did their best to explain the differences between the various installation methods, I thought it would be helpful to more fully explain their advantaged and disadvantages. The three methods are:

Graphical Installers (also called "one-click" or "click-through" installers) Advantages: Provide an easy-to-install Postgres environment with access to many add-on modules via Stack Builder. This is ideal for first-time Postgres users. It is also possible to use the installer in non-interactive mode. You can even unzip the file and run the binaries manually. Disadvantages: As mentioned on the Postgres web site, the one click installers do not integrate with platform-specific packaging systems.

(also called "one-click" or "click-through" installers) Platform-Specific Packages Advantages: These are better integrated with other software installed on your operating system. This is ideal for production servers that rely on operating-system-supplied tools. Disadvantages: Requires work to identify which other packages are needed for a complete solution. Obtaining newer major versions of Postgres on older operating systems might also require work.

Source Code: Advantages: Allows selection of specific configure and compile options for Postgres binaries, and allows the addition of patches to enhance or fix Postgres. This is ideal for experienced users who can benefit from additional control of Postgres. Disadvantages: Requires compilation experience and managing Postgres tool integration, and requires user and server start/stop configuration.



Monday, April 9, 2012

Having attende