This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.

Online status:

Unread Postgres emails:

Email graphs: incoming, outgoing, unread, commits (details)

Tuesday, December 6, 2011

Our web site lists many Postgres user groups, but perhaps only a dozen are active, having monthly meetings. Why is that? What makes a Postgres user group successful? How can you help create or revitalize a user group?

The answers aren't a great mystery, but it does require hard work. I think a lot of people who start Postgres user groups don't realize the work necessary to make it successful. They probably think, "Oh, I will announce a user group meeting, and people will show up, and it will be fun." Well, the "fun" part might be right, but growing a user group long-term requires lots of work — let me explain.

The first meeting is usually deceptively easy — you just announce it and curious Postgres users attend. But what next? You can't just keep calling meetings hoping people will show up. You need to give people a tangible reason to attend future meetings. That's where the hard work comes in:

Communication: Clear an consistent communicate are a requirement for any successful user group. Many USA-based user groups use Meetup, because it provides an easy, single point of contact for the group. The New York City Postgres User Group's (NYPUG) web site is a good example.

Clear an consistent communicate are a requirement for any successful user group. Many USA-based user groups use Meetup, because it provides an easy, single point of contact for the group. The New York City Postgres User Group's (NYPUG) web site is a good example. Consistency: When you set the expectation of monthly user group meetings, people then plan their schedules around those meetings. If leaders get busy and can't host expected meetings, people start to wonder if the user group is useful for them, and attendance lags. Most successful groups need at least two leaders to avoid this problem, so one can take over if the other is busy.

When you set the expectation of monthly user group meetings, people then plan their schedules around those meetings. If leaders get busy and can't host expected meetings, people start to wonder if the user group is useful for them, and attendance lags. Most successful groups need at least two leaders to avoid this problem, so one can take over if the other is busy. Speakers: Providing meaningful content is an important part of growing attendance. In the first few meetings, it is often necessary to bring in well-known, outside speakers to generate excitement. After the group grows, the members themselves can often be effective meeting speakers.

Providing meaningful content is an important part of growing attendance. In the first few meetings, it is often necessary to bring in well-known, outside speakers to generate excitement. After the group grows, the members themselves can often be effective meeting speakers. Food: Let's face it — no one wants to grab a warmed-over meal from a convenience store and eat it in the car in the way to the meeting. Provide food and advertise it so people know they can come right from work and have a nice time socializing and eating before the meeting starts.

Let's face it — no one wants to grab a warmed-over meal from a convenience store and eat it in the car in the way to the meeting. Provide food and advertise it so people know they can come right from work and have a nice time socializing and eating before the meeting starts. Funding: Funding for communication, food, and speaker travel costs is often available from region-specific Postgres organizations, like the United States PostgreSQL Association.

Funding for communication, food, and speaker travel costs is often available from region-specific Postgres organizations, like the United States PostgreSQL Association. Determination: As the group is getting started, there will be meetings that are poorly attended, which can lead to discouragement. Leaders should avoid scaling back and just continue going — it takes time to build momentum.

The bottom line is this — if leaders consider the user group important, then others will as well. If leaders behave in a way that indicates it isn't a priority, the rest of the group concludes the same, and the group fades away.

Monday, December 5, 2011

Postgres uses a process-based architecture. Rather than using threads, necessary information is shared among sessions via shared memory (diagram, slide 12). This allows for:

Simplified backend code

Efficient scheduling of multiple sessions across multiple CPUs by the operating system

Improved reliability

Unfortunately, this process architecture makes it difficult for a single session to utilize multiple CPUs and I/O channels. (Multiple sessions use multiple CPUs just fine.) This is becoming more of a limitation because of recent computer architectural improvements:

More CPU cores rather than faster CPUs

Increased memory capacity

Solid-state drives with reduced I/O latency

With the traditional database bottlenecks of I/O and memory receding (presentation), a single CPU will become more of a limiting factor in the coming years, and is already a limitation for some workloads.

Postgres has already made some parallelism inroads. Our pg_restore utility can use multiple parallel restore processes. We also support building the source code in parallel with gmake.

While these are useful first steps, there many more possible areas of parallelism:

Parallel I/O requests (already partly done using effective_io_concurrency)

Parallel dumping of data (already on our TODO list)

Parallel sorting of batches (though recent sort developments for Postgres 9.2 should make single-process sorting much faster)

Parallel execution

That last item needs some explanation. Databases are inherently stage-based, moving queries from parsing, rewriting, planning/optimizing, and execution. There isn't a whole lot of parallelism possible in those early stages, and it is difficult to pass partial results from one stage to the next, so it is hard to implement parallelism there. It is possible, but it requires rearchitecting major parts of the system, and the overhead of the new infrastructure often hurts reliability and diminishes the hoped-for performance benefits.

I think back to changes made in Informix 6 in 1994. Rearchitected to be massively parallel, Informix 6 was widely rejected and many users rolled back to process-based Informix 5. Informix 7, also designed for parallelism, took years to stabilize, and by that point the company had lost much of its momentum and credibility. Of course, we don't want that to happen to Postgres.

The Postgres approach has always been conservative, looking for ways to increase performance without sacrificing reliability or portability. Users often arrive with invasive performance patches, but our developers are able to test patches to determine which parts are contributing the majority of the performance improvements, and restructure the patches to minimize code disruption. There is a good example of this from an email posted today.

We will soon reach a point where more parallelism will be a requirement, particularly to speed up single large queries — we might be there already. But parallelism will not be done in a big-bang manner, but rather incrementally, where it can produce the greatest impact. Our first step will probably be to create a session-helper architecture, with multiple processes waiting for requests for parallel operations. With that in place, we can start to look at where parallelism is needed.

For a really interesting use of database parallelism, look at Tim Child's PostgreSQL OpenCL Procedural Language. It shows a server-side language designed to allow parallel operations on graphics CPUs (GPU). That is just one possible example of where the multi-year effort of parallelism might take us.

Friday, December 2, 2011

Having attended 138 Postgres-related events in the past twelve years (chart), I have done quite a bit of travel (map). Of course, many business travelers travel a lot more than I do, but the variety, distance, and duration of my trips is somewhat unusual.

I am often asked at conferences about how I manage my heavy travel schedule. There are a few tips that keep me sane and motivated to continue traveling, so I thought I would share them here. I especially hope this is useful for the many volunteers that give of their time to travel as speakers to distant Postgres conferences:

Control: For me, most travel is voluntary, meaning I can decide to accept or reject a conference invitation. Psychologically, that is an important aspect. If I have booked too much travel, I have only myself to blame.

For me, most travel is voluntary, meaning I can decide to accept or reject a conference invitation. Psychologically, that is an important aspect. If I have booked too much travel, I have only myself to blame. Pace: A related issue is pacing. You might remember I recently declined some events to recover from a heavy travel schedule. Pushing travel past the comfort point is an easy way to start hating it and therefore burn out.

A related issue is pacing. You might remember I recently declined some events to recover from a heavy travel schedule. Pushing travel past the comfort point is an easy way to start hating it and therefore burn out. Variety: It is hard to get excited about going to the same distant city repeatedly. Variety is a great way to keep travel exciting.

It is hard to get excited about going to the same distant city repeatedly. Variety is a great way to keep travel exciting. Vacation Day: I always try to book an extra day in the travel city for sight-seeing. I often don't remember many details of the conference, but I do remember those sight-seeing days, and that keeps me going and anticipating the excitement of travel. Also, my travels usually allow me to experience the culture as a native, rather than as a tourist, which is also enlightening.

I always try to book an extra day in the travel city for sight-seeing. I often don't remember many details of the conference, but I do remember those sight-seeing days, and that keeps me going and anticipating the excitement of travel. Also, my travels usually allow me to experience the culture as a native, rather than as a tourist, which is also enlightening. Sleep: Sleep is an often overlooked aspect of travel. A long flight where you can't sleep is a frustrating flight. A flight where you are asleep by wheels-up and are awoken by landing instructions is a breeze. (A window seat, blow-up travel pillow, ear plugs, eye shades, and travel socks can make all the difference.) Also, sleeping early when on location is important, especially the night before your presentation. You need to be sharp; you spent hours traveling — don't blow it by not pacing yourself and getting insufficient sleep. Sleep also helps prevent you from getting sick, which is more likely and more inconvenient when traveling.

Sleep is an often overlooked aspect of travel. A long flight where you can't sleep is a frustrating flight. A flight where you are asleep by wheels-up and are awoken by landing instructions is a breeze. (A window seat, blow-up travel pillow, ear plugs, eye shades, and travel socks can make all the difference.) Also, sleeping early when on location is important, especially the night before your presentation. You need to be sharp; you spent hours traveling — don't blow it by not pacing yourself and getting insufficient sleep. Sleep also helps prevent you from getting sick, which is more likely and more inconvenient when traveling. Working From Home: Because I work from home when not traveling, I have lots of travel energy. If I had to go to an office every day, I would have a much harder time. Heck, when I have not traveled for a few months, I emit a slight whimper when driving past the airport. (Weird, huh?) I remember waiting for a short flight and hearing a Madrid boarding announcement and thinking, "Hey, I could go over there and be in Madrid by morning". (Yeah, odd, I know. I guess I was made for this.)

Because I work from home when not traveling, I have lots of travel energy. If I had to go to an office every day, I would have a much harder time. Heck, when I have not traveled for a few months, I emit a slight whimper when driving past the airport. (Weird, huh?) I remember waiting for a short flight and hearing a Madrid boarding announcement and thinking, "Hey, I could go over there and be in Madrid by morning". (Yeah, odd, I know. I guess I was made for this.) Maximize: Travel is a big time-waster, so make the most of it. I don't worry about community emails and minor things while I am away but focus on making the most of the event by spending time with folks and being available for unscheduled things, like speaking at local companies that use Postgres. (This usually happens after the conference is over when people realize I am around for an extra day.) Combining trips to visit several cities is also always a win — who wants to fly back the next month?

Travel is a big time-waster, so make the most of it. I don't worry about community emails and minor things while I am away but focus on making the most of the event by spending time with folks and being available for unscheduled things, like speaking at local companies that use Postgres. (This usually happens after the conference is over when people realize I am around for an extra day.) Combining trips to visit several cities is also always a win — who wants to fly back the next month? Family: My work is not in a vacuum, and my family is certainly affected by my travel. I try to schedule things ahead of time so they have few surprises, and I often take one of my children with me on trips. That's a win in any family's book, and it cuts down on travel resentment. In fact, this is the first year I took all four of my children on at least one trip. (It is easier because they are home-schooled.)

Anyway, these are my tips. I didn't start out knowing most of this, so I thought I would help people starting out with travel. In fact, just yesterday I was advising someone who is considering travel to Japan to speak at a Postgres conference.

Done right, Postgres travel can be a meaningful part of your life and you can continue it indefinitely. Done wrong, you will soon hate it and look for any excuse to avoid it. I hope that doesn't happen to anyone.

Thursday, December 1, 2011

You might have heard the term "GUC" before, especially on the Postgres email lists. It is documented as an acronym meaning "Grand Unified Configuration", but the section it references doesn't mention the term "GUC" at all. Pretty obscure, huh?

Our Grand Unified Configuration (GUC) system was created in 2000 as a way to control Postgres at various levels. You are probably most familiar with controlling Postgres via the postgresql.conf file or using the SQL SET command, but it is much more powerful than that.

In fact, people often get into the habit of changing variables at certain levels, and never realize that these variables can be set at more global and more local levels. Of course, some settings, like shared_buffers can only be set in postgresql.conf. (These entries are usually marked with the file comment "(change requires restart)".) But other settings, like work_mem and search_path, can be set at many levels. This email gives a very concise description, which I have paraphrased:

globally in postgresql.conf per user and/or per-database via ALTER ROLE/DATABASE … [IN DATABASE ...] per session with SET per function via CREATE FUNCTION … SET per subtransaction with SET LOCAL

Of course, our documentation has an even more detailed explanation.

As I have said many times before, our GUC system seems limited at first, but once you realize that you can combine settings and set them at various levels, it becomes a powerful tool that can meet almost any need.

Wednesday, November 30, 2011

Having given many talks about Postgres administration over the years, I have recently felt convicted that I have not given sufficient coverage to external monitoring tools. I have always discussed how to get information out of Postgres, but not how to efficiently process that information. I have updated my administration presentation, but I would like to go into more detail here to atone for my previous lack of coverage.

Alerting: For alerting, it is tough to beat check_postgres and tail_n_mail. Check_postgres runs checks on many aspects of Postgres, reporting the results via text output, Nagios, or MTRG. Tail_n_mail scans for important messages in the server logs and emails them.

For alerting, it is tough to beat check_postgres and tail_n_mail. Check_postgres runs checks on many aspects of Postgres, reporting the results via text output, Nagios, or MTRG. Tail_n_mail scans for important messages in the server logs and emails them. Analysis: There are a lot of analysis tools out there. Nagios is great for distributing alerts, while Munin and Zabbix are good for graphing. In fact, some people use Nagios for alerting and Munin for graphing. Cacti and MRTG are also popular. This blog entry from 2009 compares several popular analysis tools.

There are a lot of analysis tools out there. Nagios is great for distributing alerts, while Munin and Zabbix are good for graphing. In fact, some people use Nagios for alerting and Munin for graphing. Cacti and MRTG are also popular. This blog entry from 2009 compares several popular analysis tools. Queries: pgFouine ("fouine" is French for weasel) is an often-overlooked tool for analyzing SQL query traffic to find the slowest, most frequent, and queries that took the most cumulative execution time (sample report). It analyzes the queries by reading the server logs.

pgFouine ("fouine" is French for weasel) is an often-overlooked tool for analyzing SQL query traffic to find the slowest, most frequent, and queries that took the most cumulative execution time (sample report). It analyzes the queries by reading the server logs. Commercial: For commercially-developed tools, there is Circonus (open source version, Reconnoiter), Postgres Enterprise Manager (strong Postgres integration), and Hyperic (both open and closed source versions).

Postgres is good at generating voluminous output suitable for monitoring. External monitoring tools help to make that information useful for administrators.

Tuesday, November 29, 2011

Recent Postgres major releases have been packed with features (9.1, 9.0), but you might have noticed a lack of major performance improvements in those releases. I was starting to think that we had basically addressed all the performance improvements we could.

Well, I was sure wrong. Postgres 9.2 is going to have a boat-load of performance enhancements. The inclusion in Postgres 9.2 of index-only scans and count(*) optimizations clearly is something that is going to help a lot of work-loads.

Large server workloads (+32 cores) is another area that is being addressed. This has been an area where we lagged behind commercial databases, and I was starting to think that to catch up with them, we were going to need to do a lot of very ugly, platform-specific hacks. Well, that turns out to be false also. Changing the layout of the of the per-session shared memory storage has yielded a 40% performance improvement on 32-core tests. Lock scalability for a large number of clients was also improved by 5 times!

And there is more work being done, but not yet committed to Postgres 9.2. Inlining some of the sort comparisons has lead to a 60% sorting improvement. Improvement in MVCC snapshot overhead has lead to a 20% improvement for large workloads.

These are not the 1–2% wins we normally chase down, but major benefits. Of course, they don't apply to all workloads, but the benefits are unmistakable. It is hard to guess everything that will be in Postgres 9.2, but it is clear that those looking for a faster Postgres will not be disappointed.

Update: I have updated the text above to include the big lock scalability item.

Saturday, November 12, 2011

After attending Postgres events in Chicago, New York City, Baltimore, Amsterdam, and Brazil during the past two months, I returned home this week to recover. Even though I travel frequently, this pace exceeded even my ability to adjust. I will be skipping upcoming Postgres events in Boston and Los Angeles (for my own sanity ) and am staying closer to home during the next few months.

I will be teaching a one-day training class in Philadelphia next week, and hope to do other classes in New York City, Boston, and Washington, DC in the coming months. (That shouldn't be too hard.) I also plan on making some modest code contributions for Postgres 9.2.

In February, I will be speaking at a Montreal developers conference.

Friday, September 16, 2011

Having just presented at PG Open, I have now placed my Unlocking the Postgres Lock Manager slides online.

Sunday, September 11, 2011

With PG Open starting next week, I begin an eight-week period that has me traveling for half that time, visiting Chicago, New York City, Baltimore, Amsterdam, and São Paulo (schedule). My sons are traveling with me on two of these trips, which makes the travel burden easier on my family.

Monday, September 5, 2011

I have posted an announcement about a pg_upgrade bug fix when upgrading from Postgres 8.3.

Friday, August 19, 2011

Four months ago, Greg Smith had an excellent blog entry describing his admiration for the Intel Solid-State Drive 320 Series because of their performance, low cost per gigabyte, and capacitors that guarantee safe writes.

Unfortunately, a significant number of drive users reported problems with the SSDs. (Amazon reviews) After power failure, the drive would sometimes show a data capacity of only 8MB.

Fortunately, Intel has released a fix, so the drives can be safely used again.

Sunday, August 14, 2011

Having received no RSVPs for my party next week, I am cancelling it. Maybe I should think of an event for the wintertime.

Monday, August 8, 2011

Final part of my Email Snippets series: While the Postgres email lists are certainly interesting and sometimes entertaining, users often share links to other sites which are also entertaining:

And, to conclude, here is a portrayal of the Postgres community at its best:

This newsgroup I always brag about to my other geek friends. 'Those people on the postgres news group are the steadiest, most helpful, most logical people I know on the Internet. The don't jump on fancy new trends nor hide technology in market speak. Their work revolves around the best tested, theory based practices. They just make the tech world a better place with some of the most reliable, best supported OSS out there.' And you guys and gals ARE that description. (email)

Friday, August 5, 2011

Part 5 of my Email Snippets series: Postgres is a community-managed open source project, and as part of that management, humor often pokes through:

Honesty > >> There is some amount of user code (I'm not sure precisely how > >> much) that runs after shared memory is mapped and before the > >> deadman switch is engaged. > > > > Er … what would you define as "user code"? > > Our code, as opposed to the failure-inducing boatload of crap > injected by the operating system. Don't hold back. Tell us how you really feel ;) (email)

Search Saying, in effect, "search the archives with our not-super-great search technology using keywords you didn't think of," comes off as pretty dismissive if not downright hostile. (email)

Wishing That doesn't preclude also providing some more-invasive tools that people can use when they do get into that situation; but let's not let I-want-a-magic-pony syndrome prevent us from doing anything at all. (email)

Wishing II Sure. I would also like a pony. (email)

Hard So, yeah, I found the "this is hard" part. (email)

Success > Tom, you are a genius! No, seriously, I mean it, this is awesome, it > worked! YES! You totally saved my day! Thank you! Finally! I'm so > happy! Hey, guys, I think it worked...! (email)

Discipline > P.S. Yes, I know I have other review work to do as well. Starting on the > rest of that tomorrow. cracks whip Man, this thing doesn't work at all. (email)

Performance We need to revisit that for 9.2 and un-reject it. It's nice that PostgreSQL can run on my thermostat, but it isn't nice that that's the only place where it delivers the expected level of performance. (email)

Marketing … there's an interesting angle to complete with NoSQL...wait, I'm feeling some more ad copy: PostgreSQL 9.1 lets you pick exactly the level of commit guarantee your data requires. Whether you want unlogged tables optimized only for speed, or you need durable synchronous replication to multiple servers, you're covered — all in one database. Need to take a shower to wash the stench of marketing off now. (email)

Email formatting A: Maybe because some people are too annoyed by top-posting. Q: Why do I not get an answer to my question(s)? A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? ( email

Tuesday, August 2, 2011

Part 4 of my Email Snippets series: Postgres has a well-deserved reputation for reliability, as shown in our email postings:

60+ Ships > Of course, none of that helps with your current issue; I would just > hate to see you pop up here in a few months with bigger problems > than this if it can be avoided. We had bigger problems than this and still survived, don't worry ;) PostgreSQL (be it 7.4.2, 7.4.19 or 8.3.3) in our 60+ vessels, is the last thing we have worried about over the last 10 years. Generally PostgreSQL lasts even after hardware/OS have given up. (email)

500 Machines I had exactly the same experience 3 years ago. Complete power failure (the stand-by generator took fire) in one small datacenter (around 500 machines). We had Oracle, SQL Server, DB2, MySQL, Progress, and of course PostgreSQL. The only database engine that restarted with no operation required was PostgreSQL. There were very minimal problems with Oracle (typing recover on some instances), but we had quite a few problems with the other engines. (email)

Thousands I manage thousands of databases (PostgreSQL, SQL Server, and MySQL), and this past weekend we had a massive power surge that knocked out two APC cabinets. Quite a few machines rebooted (and management is taking a new look at the request for newer power cabinets heh). Talking theory is one thing, predicting results is another...and yet the only thing that counts is "what happens when 'worst-case-scenario' becomes reality?" Long story short, every single PostgreSQL machine survived the failure with zero data corruption. I had a few issues with SQL Server machines, and virtually every MySQL machine has required data cleanup and table scans and tweaks to get it back to "production" status. I was really impressed...you guys do amazing work. Thank you. (email)

Proven The only machine to survive was the corporate intranet running pgsql on twin 15k SCSI drives with a proven reliable battery backed controller on it. It was mine. (email)

Thursday, July 28, 2011

Part 3 of my Email Snippets series: The email lists are occasionally introspective, which is reflected in the following philosophical email postings:

Adventure > If you guys care about money and time why would you spend the best years of > your life basically copying commercial products for free? Because for a > person with higher than average IQ far less than one percent of any program > is creative and needs some thinking and the bulk of it is just a million > stupid details. > … The work on PostgreSQL is adventure, and very good experience, very good school for me. It's job only for people who like programming, who like hacking, it isn't job for people, who go to office on 8 hours. Next I use PostgreSQL for my job — and hacking on PostgreSQL put me a perfect knowledge, perfect contacts to developers, and I can work together with best programmers on planet. and I can create some good things. Probably if I work on commercial projects I can have a better money — but life is only one, and money is important, but not on top for me — life have to be adventure! (email)

Compensation Here you're assuming that open source code development on large projects like PostgreSQL is done in people's spare time. In reality, 80–95% of such development is done by people who are paid by their workplace to do so. In the case of PostgreSQL developers, this pay is at least comfortable, so your assumption that this is done uncompensated, in terms of money, is simply wrong. For those who do development and are not directly compensated by their employer for doing so, there are other monetary rewards, such as being able to put such projects on r?sum?s/CVs, which in turn results in better job prospects, consulting fees for specialized knowledge, etc., etc. "and time why would you spend the best years of your life" That time's compensated, in many different ways, as illustrated above. Perhaps your life is in such desperate straits that you can devote time to nothing but acquiring money. If this is true, I feel very sorry for you. I feel even sorrier for you if you are not in such desperate straits, but you are nevertheless devoting every waking hour to the pursuit of money. It's a sad and lonely way to waste your precious days of life. (email)

Dark Side >> In Oracle, deferrable primary keys are enforced by non-unique indexes. >> That seems logical, > > … maybe to an Oracle guy … > I humbly admit being one. I am getting used to the life without the dark side of the force, however. I saw the light, I am saved. When the rapture comes, I will not be left behind. However, I still have to maintain a rather big 4-way Oracle RAC configuration and some satellite Oracle databases. (email)

Workflow This is one of the reasons I love open source in general, and PostgreSQL in particular: Tom has the bandwidth to notice these kinds of workarounds being discussed on support lists, and turn them immediately into improvements in the planner. Partly because (I assume, based on the commit message) Andres's parenthetical comment red-flagged it for him, since he knew he could trust Andres's opinion that there was probably a planner improvement hiding here. Amazing! (email)

Paranoia Being paranoid is a good thing. It's what DBAs are paid for. (email)

Wednesday, July 27, 2011

Our fall (northern hemisphere) lineup is stronger than every before. We have:

That is the strongest fall Postgres-specific conference lineup that we have ever had.

Wednesday, July 27, 2011

EnterpriseDB is looking for beta testers for its new Postgres Enterprise Manager. The tool allows for monitoring, tuning, and capacity planning of multiple Postgres clusters from a single console. Download of the beta is free, but requires email registration.

Tuesday, July 26, 2011

Part 2 of my Email Snippets series: In helping Postgres users, we see many SQL queries. Some queries are well crafted, and others just need improved formatting to help clarify their intent or the cause of the problem. However, a small subset of queries elicit strong reactions, and I have referenced some of those reactions below:

Ill I would hate to see the rest of the design. I am ill from this tiny fragment. (email)

Second reaction to the same idea [ blanch... ] That's not the worst SQL code I've ever seen, but it might be in the top ten. (email)

Spinning For the record, the table we're having trouble inserting into is ~100 rows with ~50 indexes on it. E.F Codd is spinning in his grave. (email)

Read my mind > Is there anyway to load this data without specifying a column list? Nope, sorry, COPY lacks the "READ MY MIND" option. (email)

Torture And yes we do data analysis that tortures SQL, but SQL allows us to do so many things quickly and less painfully. Better to torture the machines [than] torture ourselves. (email)

Tables? > Ultimately we need to think of a reporting mechanism that's a bit > smarter than "rewrite the whole file for any update" … Well, we have these things called "tables". Any chance of using those? (email)

Explosion > Good luck finding out how to interpret the dots, but it's specified > somewhere. My head just exploded. (email)

Indexes Indexes aren't a magical performance fairy dust. (email)

Monday, July 25, 2011

Today at OSCon Data Postgres user Ted Dziuba gave a good presentation (summary) about the performance dynamics of storage. It was similar to my Database Hardware Selection Guidelines presentation, but with more practical and specific storage details.

Monday, July 25, 2011

The new version of Mac OS X server (code-named Lion) has replaced the default MySQL install with Postgres (article):

If you're looking for the configuration for MySQL, you won't find it, either in the GUI or in the command line. That's because Apple has replaced it with PostgreSQL, another open source database. On one hand, this is an improvement, because PostgreSQL is considered to be more powerful than MySQL. But whereas Snow Leopard's Server Admin tool had GUI settings for MySQL, PostgreSQL is command line only in Lion Server.

Sunday, July 24, 2011

Reading over a hundred community emails a day, I see various interesting jokes, phrases, and observations who's value extends beyond the original emails in which they appeared. Fortunately, I have saved some of these. With the northern hemisphere summer upon us, I though it would be a good time to post some of these to my blog in the next several weeks. I am not going to show the entire email messages nor the author, but will link to the original email so you can see the email author and message context.

To start, I will mention some self-deprecating comments:

Programming I just wrote my first patch, and I need to know whether I missed something or not. I haven't used C for a really long time, so sickbags on standby, and if you notice something really stupid don't hesitate to call me an asshole (according to Simon Phipps that proves we are a healthy open community). (email)

Reference to Homer's Odyssey Ooops, no DATE_IS_NOBEGIN/DATE_IS_NOEND in 8.3 or 8.2 … I heard the siren call of git cherry-pick, but should have lashed myself to the mast. (email)

Testing? > I kinda doubt that a function failing in that way would pass any > testing. What is this "testing" thing of which you speak? (email)

Tuesday, July 19, 2011

I created a 2-minute welcome video for a Postgres conference that just finished in China (image). (They wisely added Chinese subtitles.) If your conference would like a welcome video, I would be glad to make one for you, and I am sure other community members would be willing to do the same.

Monday, July 18, 2011

If you are looking for the Postgres booth at the O'Reilly Open Source Convention (oscon) next week — don't bother — there isn't one. Postgres has had a booth at oscon for years, but this year we were offered a booth too late for our community to properly staff it. We will not have a booth at the simultaneous OSCon Data conference either, again for the same reason. Kind of odd to have an open source database convention without a Postgres booth, but that's what's happening.

We will have a PG Day event the day before the convention, and will have a few talks and a tutorial during the convention.

Wednesday, July 13, 2011

Yesterday I presented a new talk, Explaining the Postgres Query Optimizer, to the Boston PostgreSQL Users Group, so I have now placed the slides online. I am working on three other new talks that I will post online once I present them at an event.

Wednesday, July 6, 2011

Two days ago I gave my PostgreSQL Adoption and Trends presentation at SERPRO in Porto Alegre, Brazil. They have fortunately placed a video of the 2.5 hour presentation online. The first half of the video is me giving the presentation with consecutive translation into Portuguese. The second half is a question and answer session, mostly in English, which covers many questions about the future growth of Postgres and its features.

Monday, June 27, 2011

If you are near Philadelphia, you are invited to attend the 2011 Postgres Pool Party at my home:

When: Saturday, August 20, 2011, 12pm to 4pm

Saturday, August 20, 2011, 12pm to 4pm Where: my home in Newtown Square, Pennsylvania (directions)

my home in Newtown Square, Pennsylvania (directions) What: barbecue, swimming, and most importantly, good conversation

All Postgres users, developers, and groupies are invited, including their families. Please RSVP via email.

Thursday, June 16, 2011

There is a perpetual debate among database designers between surrogate keys (generated by the database) and natural keys (supplied by the user). There was a discussion recently on the Postgres general email list (April thread, May thread) on this topic. Having attended many conference presentations on this topic, I thought I knew everything about it. However, this thread taught me a few things. What first got my attention was this comment by Jeff Davis:

Once you really try to define "natural" and "surrogate" keys, I think a lot of the arguments disappear.... In particular, I think you are falsely assuming that a natural key must be generated from an outside source (or some source outside of your control), and is therefore not reliably unique. You can generate your own keys, and if you hand them out to customers and include them on paperwork, they are now a part of the reality that your database models -- and therefore become natural keys.

This email thread doesn't solve the surrogate vs. natural key debate, but it does add some nuance to the debate that I never realized. What struck me was the realization that the surrogate vs. natural key debate essentially hinges around how a database design communicates to the outside work, and how it adjusts to changes in the outside world.

Another interesting insight was the requirement that databases using surrogate keys must use a uniqueness constraint to guarantee that data linked by surrogate keys can be uniquely matched with items in the outside world, or the schema risks being meaningless. So, basically, an effective schema must map to unique items in the real world, and must efficiently adjust to changes in the outside world too. No wonder this is a perpetual debate.

Wednesday, June 15, 2011

Pg_Upgrade often gets slighted by major Postgres developers as a "hack". In fact, during the last PGCon, pg_upgrade earned a derogatory nine bottles of beer during the Drinkers guide to PostgreSQL lighting talk. However, this is actually not surprising because the philosophy behind pg_upgrade is slightly different than most other Postgres utilities. Most utilities aim for high reliability, clean code, and a uniform interface to the underlying libraries and tools it uses. Pg_Upgrade also aims for high reliability and clean code, but its interface requirements are different. Rather than aiming for a uniform interface, it aims for a stable interface, one that is unlikely to change between major Postgres versions. This often requires pg_upgrade to do unusual operations. These are often referred to as "hacks" because the operations use tools and libraries in ways that were never intended.

The proper solution would be to better integrate pg_upgrade into Postgres by creating interfaces for pg_upgrade at the proper abstraction level. However, this is problematic because tighter integration between pg_upgrade and the Postgres internals makes pg_upgrade more sensitive to changes in major Postgres versions. Also, many of these interface changes could not be back-patched (for reliability reasons), meaning pg_upgrade couldn't rely on the proper interface existing in older versions of Postgres. For these reasons, pg_upgrade relies almost exclusively on the public interface of Postgres, which rarely changes between major releases, and exists in all versions of Postgres. This means pg_upgrade will always appear as a "hack" — and that is a good thing.

Wednesday, June 15, 2011

Having watched someone yesterday on IRC successfully use pg_upgrade to upgrade a multi-terabyte database, I thought it would be a good time to update everyone on the changes to pg_upgrade in Postgres 9.1 (now in beta):

Performance has been improved.

The source code has been cleaned up.

The user interface is unchanged.

Such minor changes are a good thing — it means that though Postgres 9.1 has many new features, pg_upgrade required no adjustments for those features, which has always been the goal.

Tuesday, June 14, 2011

The Postgres community is in beta testing for release 9.1 — the final 9.1 release is still a few months away. This process is familiar to many long-time Postgres observers. I have written about our predictable release process before, and there is even a chart showing our release history. Our release process has been done this way for so long that many people forget why it is important.

This blog entry by Andy Lester about Perl 6 cements in my mind why our release process is so critical to our continued success. It was written 3.5 years ago, when Perl 6 was already 7.5 years in development. It makes a powerful argument that the delayed release of Perl 6 ("by Christmas") has hurt Perl adoption in several significant ways. Frankly, if Perl weren't so strong, it could have killed the project, and has certainly dampened Perl adoption and fueled the momentum of competing scripting languages. Andy's blog entry states the damage even clearer:

I don't question the desirability of Perl 6 either. I can see how, when it's finally finished, it will be an improvement over any language available. However, from a Communications standpoint, it's obvious that there are significant problems in communicating about perl to the world at large. Perl 6 has been a Public Relations disaster, one that has made it harder to attract developers, other contributors, users and companies. Again, from a Communications/PR standpoint, our goal should be to stop shooting ourselves. And that means taking the public focus off Perl 6 as much as possible.

So, why is the Postgres release process so important? Why are timely, reliable, and feature-rich releases so important? Because what we are doing is radical ("An open source enterprise-class database?") and our release professionalism helps to encourage confidence in those moving to Postgres — and we do a fine job of it.

Monday, June 6, 2011

This Wednesday, June 8, I am doing a webcast of my MVCC Unmasked presentation (registration required). I have two more webcasts coming up and will announce them once they appear on the EnterpriseDB web site.

Wednesday, May 18, 2011

At the developer meeting at PGCon today, a Salvation Army group was booked in the adjacent meeting room. They started their meeting by singing songs, and that brought up the question of whether Postgres should have a song. Postgres already has logos, conferences, web sites, and t-shirts, so maybe it is time for a Postgres song. On the other hand, most companies don't have songs, though countries do — it would be nice to sing at conferences.

Friday, May 6, 2011

Yesterday InformationWeek had a good article about the upcoming Postgres 9.1 features. It focused on Nearest Neighbor search, Security Enhanced (SE) Linux integration, and true Serializable isolation enhancements. It quoted Josh Berkus and me.

Thursday, April 21, 2011

Postgres, as a community-driven open source project, lacks much of the hierarchical structure usually associated with successful organizations. The production of high-quality Postgres software year after year certainly proves that hierarchical structure isn't required, but how does the lack of an hierarchical structure affect long-term planning?

Well, surprisingly, we seem to be successful on that front too. A survey commissioned by Sun Microsystems in 2007 indicated five limitations to Postgres adoption (email):

1. Easy Install 2. Simple, low-overhead replication 3. Upgrade-in-place 4. Administration & monitoring 5. Driver quality/maintenance

(This list was officially recorded at a 2009 Postgres developer meeting.)

And how is Postgres today? Josh Berkus stated it well in the above email, "It's really nice the number of the above issues we've knocked out since 2007", and I replied, "Wow, that is amazing, like we actually have a plan.".

The email thread this spawned it quite interesting. The first part shows confusion over where the list is recorded (we probably could have used more organization there) and complaints about our poor wiki search capabilities (again, I guess we need help). The second part critically examines how well we have removed these limitations, and wonders about the next five limitations.

So, those five items seem to have been largely addressed by mostly independent contributors who worked to solve problems that both Postgres and their organizations and customers shared. Sounds like a win to me, even if it is surprising.

Wednesday, April 20, 2011

My previous blog post was in need of serious data analysis, so I wrote a query that uses a WITH clause, LAG window function, and UNION clause to generate this output:

version | reldate | months | lines | relnotes | change | % change ----------+------------+--------+---------+----------+--------+---------- 1.0 | 1995-09-05 | 18 | 172470 | | -78402 | -31 1.01 | 1996-02-23 | 6 | 179463 | | 6993 | 4 1.09 | 1996-11-04 | 8 | 178976 | | -487 | 0 4.2 | 1994-03-17 | | 250872 | | | 6.0 | 1997-01-29 | 3 | 189399 | | 10423 | 5 6.1 | 1997-06-08 | 4 | 200709 | | 11310 | 5 6.2 | 1997-10-02 | 4 | 225848 | | 25139 | 12 6.3 | 1998-03-01 | 5 | 260809 | | 34961 | 15 6.4 | 1998-10-30 | 8 | 297918 | | 37109 | 14 6.5 | 1999-06-09 | 7 | 331278 | | 33360 | 11 7.0 | 2000-05-08 | 11 | 383270 | | 51992 | 15 7.1 | 2001-04-13 | 11 | 410500 | | 27230 | 7 7.2 | 2002-02-04 | 10 | 394274 | 256 | -16226 | -3 7.3 | 2002-11-27 | 10 | 453282 | 311 | 59008 | 14 7.4 | 2003-11-17 | 12 | 508523 | 269 | 55241 | 12 8.0 | 2005-01-19 | 14 | 654437 | 236 | 145914 | 28 8.1 | 2005-11-08 | 10 | 630422 | 184 | -24015 | -3 8.2 | 2006-12-05 | 13 | 684646 | 236 | 54224 | 8 8.3 | 2008-02-04 | 14 | 762697 | 243 | 78051 | 11 8.4 | 2009-07-01 | 17 | 939098 | 336 | 176401 | 23 9.0 | 2010-09-20 | 15 | 999862 | 258 | 60764 | 6 9.1 | 2011-12-30 | 15 | 1069547 | 193 | 69685 | 6 Averages | | 10 | | | | 7.57

(I was also able to add data about some of the earlier releases.) You can see the few cases where the code size decreased, and the two large increases for 8.0 and 8.4. I assume the increase in 8.0 was due to point-in-time recovery, savepoints, and the Windows port. I guess the 8.4 increase was due to the addition of the WITH clause and window functions, which I used for this analysis.

Wednesday, April 20, 2011

I just realized we will exceed one-million lines of C source code in Postgres 9.1 (currently 1,069,547; computed using codelines). It has been a while since I reported on our C source code line count, so here is an updated list:

Date | Release | Lines of code --------------+----------+---------------- 1994-03-17 | 4.2 | 250,872 1995-09-05 | 1.0 | 172,470 1996-02-23 | 1.01 | 179,463 1996-11-04 | 1.09 | 178,976 1997-01-29 | 6.0 | 189,399 1997-06-08 | 6.1 | 200,709 1997-10-02 | 6.2 | 225,848 1998-03-01 | 6.3 | 260,809 1998-10-30 | 6.4 | 297,918 1999-06-09 | 6.5 | 331,278 2000-05-08 | 7.0 | 383,270 2001-04-13 | 7.1 | 410,500 2002-02-04 | 7.2 | 394,274 2002-11-27 | 7.3 | 453,282 2003-11-17 | 7.4 | 508,523 2005-01-19 | 8.0 | 654,437 2005-11-08 | 8.1 | 630,422 2006-12-05 | 8.2 | 684,646 2008-02-04 | 8.3 | 762,697 2009-07-01 | 8.4 | 939,098 2010-09-20 | 9.0 | 999,862 2011-??-?? | 9.1 | 1,069,547

It is also good to see the consistency of yearly major releases.

Wednesday, April 20, 2011

In the past few weeks I have received several compliments at conferences about the clarity of our source code. If you have never looked at it, you might want to (sample).

There are two reasons for this clarity: first, the community realized early that source code clarity was one of the best ways to encourage new developers to get involved, and to avoid mistakes. Second, the Postgres source code is automatically reformatted by pgindent before every major release to guarantee source code consistency.

It's radical to reformat source code on a regular basis, but it has helped to reenforce the high standards we have for everything that is done by the Postgres community.

Tuesday, April 19, 2011

I reported a pg_upgrade bug two weeks ago, and yesterday Postgres 9.0.4 was released which fixes this bug. As outlined in the wiki, this release only corrects new uses of pg_upgrade; users who ran pg_upgrade previous to Postgres 9.0.4 must run the fix script listed in the wiki.

I want to thank Andrew Gierth for the initial research on this, and thank the Postgres community for pushing out a release with this fix so quickly. The good news is that only three users reported experiencing this problem, and all their databases were properly corrected. I am hopeful that we caught this bug early and that pg_upgrade can continue to be a relied-upon method for rapid major Postgres upgrades.

Wednesday, April 13, 2011

I was asked for a list of major Postgres 9.1 features , so here is my list:

Unlogged tables to better handle some NoSQL workloads

Synchronous replication for greater reliability

SQL/MED (Management of External Data) (flat files, other databases)

Per-column collation support

Security Label, including SE-Linux integration

True serializable isolation with predicate locking (already had snapshot isolation)

Non-SELECT statement recursive queries

Nearest-Neighbor (order-by-operator) Indexes

Extensions (plugin) commands

PL/Python overhaul (PL/Perl was done in 9.0)

Wednesday, April 13, 2011

A video of my keynote address yesterday at O'Reilly's MySQL Conference is now online. This twelve-minute presentation was to a primarily MySQL audience so I covered some of the history of our project, ways in which we are unique, and features of the Postgres 9.0 and upcoming 9.1 releases.

Update: A common criticism of my keynote address was that I was too apologetic and/or humble, but that was by design. When speaking to an audience that might not be interested in my topic, I often low-ball the presentation because it is an effective way of getting listeners to think about my ideas, and to lower the natural barrier people have to hearing information that conflicts with their long-held beliefs. A humble presentation makes people realize I am not selling anything, and that my information is quite genuine and merits serious consideration. 2011-04-15

Monday, April 11, 2011

I am attending O'Reilly's MySQL Conference this week. I had to be convinced by Postgres community members to attend this conference, and I had flashbacks to my teenage years:

Mom: Greg (a family friend) is having a party on Friday. I think you should go. Me: I don't want to go. Mom: You will have a good time. Me: Mom, I won't know anyone there. Mom: Billy will be there. You like talking to Billy. Me: Mom, Billy is kind of weird. Mom: Still, I think you should go.

Seems I wasn't very outgoing as a child, and I guess I'm still not. Well, PG East had MongoDB talks so why not Postgres talks at MySQLCon. I will get to study MySQL developers in their native environment, or at least all bunched together.

I am speaking three times tomorrow (Tuesday), and then don't have anything scheduled until Friday when I am doing EnterpriseDB training, because, hey, I'm here, so why not. EnterpriseDB and the Postgres community both have booths at this conference, and they both want me to help with booth duty, plus I want to attend the Postgres talks, so it doesn't seem I am going to have much downtime at this conference.

So, I guess I will have people to talk to, and I will have a good time. Thanks Mom.

Friday, April 8, 2011

Today we released a fix for a bug that affects all systems upgraded by current versions of pg_upgrade and pg_migrator. This wiki page contains the announcement and steps necessary to avoid query errors caused by the bug.

I hope we have caught this bug early — there are only three known cases of query errors caused by this bug, and I believe those three users have been able to correct their systems thanks to help from the IRC channel. If you have used pg_upgrade or pg_migrator in the past, you should review the wiki and take the steps outlined to correct your systems. I apologize for the problems this has caused and I hope this is a rare mistake that will not be repeated.

Tuesday, April 5, 2011

I am attending or presenting at events every month for the next eight months, through November, in Santa Clara (CA), Philadelphia, Ottawa, Mexico, Oregon, Colombia, Baltimore, and Brazil. June is empty and two unannounced conferences are planned for October. If you are near any of these places, you should consider attending; details are on my conferences page.

Update: PostgreSQL Conference Europe has been announced for October. 2011-04-12

Update: I will now be attending FISLI in Brazil in June. 2011-04-19

Sunday, March 20, 2011

I have completed drafting the Postgres 9.1 release notes. They will be updated and improved in the coming months as we approach the final release. Release note generation was easier this time (4.5 days) because git was quicker at returning commit details.

Tuesday, March 15, 2011

The final 9.1 commit-fest was closed last week so the development team is now marching toward the final Postgres 9.1 release. There is still months of cleanup and testing ahead but all our energies are now focused in that direction. I have started on the release notes and others are pouring over features, fixes, and documentation to get ready for our first beta, which will probably be in April. You can get a sneak-peak by checking the Alpha 4 release notes.

Monday, March 14, 2011

I am excited about next week's PgEast conference at the Hotel Pennsylvania, but probably not for the reason you think. Thirty-five years ago, as a teenager, I attended a Star Trek convention at the hotel (then called the Statler Hilton). I remember clearly riding in an hotel elevator with James Doohan (Scottie), and when someone in the back mumbled "Bridge", we all cracked up. I am excited to be returning to the hotel for another convention. For reasons I will not mention (link, link), most attendees will be sleeping at a different hotel, The New Yorker Hotel.

Sunday, February 27, 2011

I am attending the SCALE conference this week in Los Angeles. The conference has been an eye-opener regarding the increased buzz about Postgres. Our dedicated PGDay on Friday was a huge success with many talks standing-room only and overflowing into the hallway. Our biggest problem was the need for a larger room, which we will get next year. That's a good problem to have!

Joe Conway was in our booth all day Saturday and he saw a big difference — last year there was much concern about Oracle buying Sun/MySQL — this year he engaged people walking by and they said they were using Postgres, loved it, and it was running fine so they had no questions — again, a good problem to have.

I remember the years we knew we were producing a superior open source solution but were often overlooked. That era is over and we are reaping the rewards for many years of hard work.

Tomorrow (Monday) I am teaching an EnterpriseDB class in the same hotel as the conference. I figured as long as I am here I might as well do some training. Looking ahead, I have big conferences for the next three months (New York City, Santa Clara, Ottawa) — June is wide open right now. Wow, another good problem to have!

Friday, January 7, 2011

I am excited to be giving presentations about Postgres every month for the next five months, with New York City starting it off this Monday (event list). Interestingly, all the events are in North America: New York City, Los Angeles, Santa Clara, Ottawa, and Philadelphia (attending only).