This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.

Online status:

Unread Postgres emails:

Email graphs: incoming, outgoing, unread, commits (details)

Thursday, December 16, 2010

I just returned from conferences in Germany and Italy which, as usual, allowed me to easily learn about new technologies. I have linked to these excellent talks in hopes others can benefit:

Devrim's talk about Red Hat Cluster software was very interesting. Postgres relies on OS-specific tools for auto-failover and, though I have recommended Red Hat Cluster several times, I did not know much about it before Devrim explained it.

Similarly, David Fetter has been talking about Writeable Common Table Expressions (CTE) for several years, but I only understood their value after seeing his talk.

Another eye-opener for me was Jean-Paul Argudo's talk about Pgpool-II and its many features.

The surprise talk was Gianni Ciolli's talk (image, details) about writing Postgres code to play chess.

Tuesday, November 16, 2010

There are two Postgres gems of wisdom I was reminded of today. The first was something Joe Conway said at PgWest 2010 this year during his tutorial, Building an Open Geospatial Analysis Technology Stack. He said, paraphrased, "Sometimes it is useful to bring the data to the code, and other times it is useful to bring the code to the data". He was discussing the use of the PL/R server-side language, but it applies in many use cases.

The second gem is a video of Josh Berkus's 2009 talk, 10 Ways to Wreck Your Database. If you want to relish the word "anti-performance", this talk is for you. You will find yourself thinking, "Am I supposed to do that? Oh, yeah, that is what I am not supposed to do". I had the joy of seeing this talk live, but I enjoyed a second viewing today.

Friday, November 5, 2010

The most creative talk I saw at PgWest 2010 this year was one by Nathan Boley titled Improving Planner Prediction Accuracy with Custom Selectivity Functions.

First, some background — recent versions of Postgres record the 100 most common values (MCV) of each column for use by the optimizer. Queries that use values that are not in that 100 MCV must estimate the value's frequency. The current Postgres code assumes that non-MCV values are evenly distributed.

Nathan's idea was to use the existing MCV values to estimate the frequency of non-MCV values. For example, if the MCV values show a normal distribution, it is very possible estimate the frequency of non-MCV values by determining where the non-MCV value appears on the normal distribution curve, e.g., if it is near the mean, it has high frequency. A cruder example would be to just assume that non-MCV values near MCV values are more frequent than those farther away.

Such a computation does not need any additional statistics to be collected. However, only value distributions that have an inherent ordering would benefit, e.g., measurements and postal codes would benefit, but state codes, because they are arbitrarily assigned, would not. I am not sure how we can detect inherent ordering.

Thursday, November 4, 2010

I gave my MVCC Unmasked presentation yesterday at PgWest 2010 so the talk is now online. The talk was well-received, with many audience questions. The slides are heavily annotated, which should make them easier to read for people who didn't attend the talk. A video was made and I will post here when that is available.

I learned a few things writing this talk:

Query tricks to make MVCC visible

Cleanup would be needed even without MVCC (deletes, aborted transactions)

Single-page cleanup happens even in non-update (non-HOT) cases

One other thing I learned is that the order features are added to Postgres often skews how we view these features — that new features often make more sense thought of in a logical way, rather than considered in the order they were added to Postgres. I have tried to do that with this talk, and it gave me a new appreciation for Postgres's technological achievements.

Saturday, October 30, 2010

I just returned from five days in Moscow. HighLoad++ conference organizers invited Simon Riggs, Robert Treat, and myself to speak at a three-hour Postgres mini-conference as part of their main conference. Robert did a great job explaining the challenges of managing high-volume systems, and Simon presented the history of Postgres performance enhancements in a way I had never seen before. Conference attendance was good, so hopefully we helped make the conference a success.

I was surprised that they flew in three Postgres people (and paid for their expenses) for a non-Postgres-specific conference. This might have been a first, but probably not the last, and indicates Postgres's growing popularity. For example, O'Reilly's MySQL Conference is looking to create a Postgres track for their conference (new submission deadline: November 3).

I had a chance to have dinner with several Moscow Postgres people during the trip, which is always enjoyable. Anyway, I am home for a three days, and then off to PgWest 2010.

Thursday, October 28, 2010

You have probably seen many posts about the switch of Postgres to use Git. I thought it would be helpful to explain why the conversion was done.

Postgres has used CVS as its source code management (SCM) system since Internet development started in 1996. CVS was never perfect — one of the major limitations was that changes to multiple files by a single CVS commit were not treated as a unit. We found ways to work around that limitation. Subversion was designed as a replacement for CVS, and fixed the multi-file commit problem. However, the community had already worked around that problem and did not see switching to Subversion, and the disruption it would caused, as wise.

Five years ago the Linux development team started work on a new source code management called Git that would correct and improve upon many of the source code management systems that came before it. Particularly for Postgres, Git allowed for easier hosting of alternative source trees by developers, and easier code sharing. For these reasons, and because many developers were already using Git successfully in their personal development, the community decided at PGCon 2010 to convert to Git, and did the conversion in September.

While commiting with Git is similar to CVS, Git feels like CVS broken into 1000 pieces, meaning it offers a lot more flexibility in how source code is managed. Our developers seem pleased with the result.

Update: This article explains the history of source code management systems.

Thursday, October 28, 2010

I have written two new presentations: the first is MVCC Unmasked (description). This talk is unusual because it shows Postgres Multiversion Concurrency Control (MVCC) internals using an unmodified server. I will post the PDF next week once I deliver the talk at PgWest 2010. I will also be presenting this at PGDay Europe 2010. I already mentioned my second new talk, Database Hardware Selection Guidelines, in a previous blog entry.

Thursday, October 21, 2010

I was just looking at our 9.0 libpq documentation and was struck by how professional and clean it looks since Thom Brown overhauled it last month. Even my longstanding complaint that fixed-width fonts in Firefox were too large compared to the surrounding text was fixed last week.

Thursday, October 21, 2010

This creative video, Mysql Is a Database (warning: strong language), is a spoof of the popular iPhone4 vs HTC Evo video.

Update: I have been told this video is a spoof of the MongoDB is Web Scale, which is itself a spoof of iPhone4 vs HTC Evo.

Thursday, October 21, 2010

Greg Smith has created a new informative wiki page about reliable I/O writing, which is critical for guaranteeing durable transactions. The wiki page contains links to several excellent documents by Greg, including a chapter from his new book. It also contains a link to a new graphic-heavy presentation I wrote called Database Hardware Selection Guidelines.

Friday, October 8, 2010

Last week I attended a conference in Baltimore, and I am currently in Bolivia doing EnterpriseDB training. In the next four weeks I will attend conferences in Boston, Moscow, and San Francisco; my web site has all the details. (I was busy this time last year too.)

Interestingly, I was able to do most of the Bolivian training in Spanish with the help of Oxiel Contreras, a former online student from Bolivia. The odd part is that I was a poor high school Spanish student 30 years ago, and haven't studied Spanish since. Seems hearing so many foreign languages in recent years has affected my brain.

Tuesday, September 21, 2010

I usually read the Slashdot comments about Postgres releases because it gives me a good barometer of how Postgres is perceived. I have been doing this for 10+ years, and started at the time when MySQL was the open source database darling, and Postgres was something "you might want to look at".

For Postgres 9.0, the comments were very positive. The comments include these glowing quotes:

The Postgres documentation was mentioned several times:

Even the appearance improvements for the web documentation got a mention.

Finally, a discussion about whether conditional triggers added in Postgres 9.0 is a useful feature yielded this humorous news report:

Two DBA's shot each other dead yesterday morning after they fell out over the maintainability of column-based conditional triggers. A police officer at the scene remarked: "If only they had been using MySQL with the default ISAM tables which support no such functionality, then none of this would have happened."

Monday, September 20, 2010

With the big release of Postgres 9.0, I want to recommend a new Postgres article by Alan Shimel. First, the title, "Mirror, Mirror On The Wall, Which Is The Best Open Source DB Of Them All?" combines the question of which database is best, the new Postgres replication feature (mirroring), and a fair tale quote to create a memorable title. Second, the article has great content with emphasis on the benefits of our community and license, and includes the quote "They think … the move to PostgreSQL will soon turn into a stampede." It also includes a mention of in-place upgrades (pg_upgrade). It even includes audio of the interview upon which the article is based. What's not to love?

Tuesday, September 7, 2010

There was a rather philosophical discussion on the email lists last week about why people contribute to Postgres and open source that is definitely worth reading. I believe the most profound comment was this:

The work on PostgreSQL is adventure, and very good experience, very good school for me. It's job only for people who like programming, who like hacking, it isn't job for people, who go to office on 8 hours. Next I use PostgreSQL for my job — and hacking on PostgreSQL put me a perfect knowledge, perfect contacts to developers, and I can work together with best programmers on planet. and I can create some good things. Probably if I work on commercial projects I can have a better money — but life is only one, and money is important, but not on top for me — life have to be adventure!

Saturday, August 14, 2010

I have been invited to attend a Postgres security meeting at OmniTI in Maryland on September 8. The meeting will include other local Postgres community members, government security experts, and contractors.

The goal of the meeting is for everyone to meet in person and to plan for necessary Postgres security additions and secure installation guidelines. While this will be US-government specific, I expect all work products to be publicly released and available to benefit the larger Postgres global community.

Monday, August 9, 2010

'depesz' already wrote a clear blog entry explaining the effect of enabling standard_conforming_strings in Postgres 9.1. My blog entry explains why it will be changed in Postgres 9.1.

To understand standard_conforming_strings, you have to know some Postgres history. Postgres is written in the C language, and some C syntax was added to Postgres for convenience. For example, Postgres supports the C syntax for not-equal, != , as well as the SQL-standard non-equal syntax, <>. This is a good addition of C syntax because it does not affect SQL-standard-compliant behavior.

Another C syntax addition that was added to Postgres when it was developed at Berkely is the interpretation of C backslash escapes in strings, e.g., in 'abc

def',

embeds a newline into the string. While this is convenient, it unfortunately breaks SQL-standard-specified behavior. In the early years, this incompatibility was minor compared to other compatibilities, but as Postgres attained greater standards compliance, this incompatibility started to matter to users porting applications from other databases to Postgres.

This issue was addressed in 2005 with Postgres 8.1 by adding three new capabilities:

a new string designation prefix, E, which specifies escape string processing, e.g., E'abc

def'

a new parameter standard_conforming_strings, which, when enabled, provides SQL-standard-compliant string behavior in non-E strings by not interpreting backslash escapes (default: off)

a new parameter escape_string_warning, which, when enabled, issues a warning if a backslash is used in a non-E string, and standard_conforming_strings is off (default: on)

These changes were made to encourage users to use the new escape string designation (E'') for all strings needing backslash escape processing.

The plan was that once all users were using the new escape string designation for queries, standard_conforming_strings could be turned on so normal strings could stop interpreting backslash escapes. Of course, it is very hard to know the extent that people have migrated their applications, because, of course, they can just ignore the warnings. Aside from returning incorrect results, improper processing of escapes might lead to security vulnerabilities, such as SQL injection attacks. Therefore, it took five years for the development community to be comfortable that the vast majority of users had migrated strings needing escape processing to the E'' syntax. We also needed to be sure that all backend code and supplied tools handled strings properly.

So Postgres 9.1 will finally have SQL-standard-compliant string processing enabled by default, though such behavior has been available since Postgres 8.1 if enabled by the user. This brings Postgres closer the the SQL standard. The next community challenge is to be sure all external libraries properly handle escaping strings before Postgres 9.1 is released, so we have set up a wiki to record which interfaces are ready.

Sunday, August 8, 2010

I am speaking at the New York PostgreSQL Meetup Group this Tuesday about the new features in Postgres 9.0. (It will be similar to a webcast I did a few months ago.)

I have a few interesting conferences coming up: surge (Baltimore), OpenSQL Camp (Massachusetts), PgWest 2010 (San Francisco), and perhaps PGDay Europe 2010 (Germany).

Update: We had 13 people attend, which is a good turnout. Our next meeting is scheduled for September.

Wednesday, August 4, 2010

Postgres implements Multiversion Concurrency Control (MVCC) using several normally-invisible fields, notably xmin and xmax. The xmin column records the transaction id that created the row, and xmax records the transaction id that expired the row, either through an UPDATE or DELETE.

I often demonstrate MVCC by showing the xmin and xmax columns:

SELECT xmin, xmax FROM mytable; xmin | xmax ------+------ 664 | 0

Unfortunately it is hard to see a non-zero xmax column because by definition a non-zero xmax means the row is expired (or will be). I only recently realized that I can show a non-zero xmax column by deleting a row in another transaction and keeping the transaction open:

BEGIN WORK; DELETE FROM mytable;

and then querying the table from another session:

SELECT xmin, xmax FROM mytable; xmin | xmax ------+------ 664 | 665

The 665 indicates the row will become invisible if the multi-statement transaction completes. If it aborts, the system will realize 665 aborted and will ignore xmax.

Wednesday, August 4, 2010

As a followup to my previous blog post, this humorous Slashdot comment under the title "Reliability" warrants coverage:

> our MySQL server > 3 x 150GB Raptors > 100 Adaptec SATA RAID controller > RAID 5 Now you have four problems. Could you Do It Wrong in any more ways?

This is in response to someone who didn't want to use SSDs for reliability reasons.

Wednesday, August 4, 2010

The database is usually a critical piece of infrastructure in an organization; when the database is down, many things stop working, so database reliability is often a serious concern. While the reliability of database software is important, for Postgres it is often the infrastructure that Postgres depends on that causes outages, not Postgres itself. We see this regularly on the Postgres email lists.

To get started, a fundamental assumption has to be discarded — that computers are abstract machines and always do what they are told. While we often treat hardware as abstract devices, in reality they are physical, and are susceptible to failure just like any physical entity.

So, what things can go wrong? First, consider disk drives. This Slashdot comment explains how non-atomic disk drive really are:

Disks have a lot, and I mean a LOT of ECC on them. It is not a situation of "I need to write a 1 so I'll place one at this location on the drive." They use a complex encoding scheme so that bit errors on the disk don't yield data errors to the user. … Then there's the fact that bits aren't even stored as bits really. … They are written using flux reversals, but the level is not carefully controlled, it can't be. So when you read the data the drive actually looks at an analogue wave.

Yikes — that certainly makes me feel less confident about disk storage, and its abstract, predictable behavior. Many people are concerned about disk drive, so many people use inexpensive RAID controllers and believe their reliability issues are solved. Then there is the problem of disks not storing data permanently before acknowledging the write (writeback caching); this is covered extensively in the Postgres manuals.

RAM is also not an abstraction — it can fail too. This article does a great job of diagnosing an executable that suddenly stopped working due to a RAM error.

Finally, the motherboard can fail too. Last month there were extensive reports about Dell distributing faulty motherboards, and covering up the fact, with failure rates up to 97%!

All this information underscores how fragile hardware can be, and highlights that hardware reliability is a significant aspect of overall database reliability.

Update: Dell has settled the court case over covering up the shipment of defective motherboards. 2010-09-26

Update: More details on the Dell cover up. 2010-11-24

Tuesday, August 3, 2010

I wrote an article about the Postgres development process for the Japan PostgreSQL Users Group (JPUG) and they have now published it in English and Japanese. It highlights many of the processes that make Postgres open source development different, frustrating, and effective, with section titles like "No Single Authority", "No Bug Tracking System", and "No Development Roadmap".

Tuesday, August 3, 2010

While Steve Jobs tried to blame everyone but Apple for poor iPhone 4 reception, this article calls this long-term good news for the iPhone and smartphones in general because smartphone designers will increase their focus on call quality. This mirrors the competing interests that are often involved in technological progress:

"Almost everything about technology, to me, is a response to a negative. Something isn't working right. Something doesn't look right. Something doesn't perform right," Petroski told me. "And you try to remove that negative quality and that change is presumably an improvement. But then you release the new and improved product and people find faults with it. Because nothing is perfect. You always have competing constraints."

Postgres certainly has always dealt with this issue, and fortunately we don't have to release millions of devices to find out we made a mistake. We often find out very early, often hours after it is committed, so we usually avoid these pitfalls.

Update: Interestingly, this article argues that most phone conversation is inferior to electronic communication. 2010-08-04

Tuesday, August 3, 2010

Object-Relational Mappers are often a hot topic of discussion among database geeks. At oscon's PGDay this year, Christophe Pettus presented the clearest explanation I have ever heard of the complexities of ORMs and the basically unattainable expectations users have for them. The basic problem is that it is easy to map simple ORM operations into SQL, but complex cases and mass changes require access to SQL for performance. Christophe explained why ORMs increase programmer productivity, but also why ORMs will never be able to do everything SQL can, so organizations should be prepared to add SQL to their ORM systems where appropriate.

A video recording was made of this talk and hopefully it will be posted soon.

Friday, July 30, 2010

Adding features to Postgres is often a frustrating experience for all involved. Some features are clear-cut, but many are subject to interpretation and involve much email discussion. The email discussion is often draining, but this comment from a Slashdot thread about Linux desktop innovation is illuminating. The comment is about human interface engineereers asking for tiny application changes to improve usability, and while the application developer was originally "distracted trying to decide a fitting way to end the e-mail authors life", he ultimately agreed that the changes improved the user experience. I think Postgres developers often feel the same way.

Monday, July 26, 2010

I mentioned in March that user videos would be created at PG East. Those videos have now been released. Hopefully they will be useful in promoting Postgres.

Sunday, July 25, 2010

I have decide to attend the surge conference in Baltimore in September. It is being hosted by Postgres-friendly OmniTI, who just did several Postgres-related presentations at oscon (Robert Treat, Theo Schlossnagle). As a database internals guy, my focus is so much on the software that I often know little about how Postgres is deployed at high-volume sites; I hope to learn about that at the conference. With so much Postgres activity in the United States now, I am doing little travel to international conferences (schedule).

Sunday, July 25, 2010

I had a great time again at oscon, and feel more energized than ever about Postgres. Because I attend oscon every year, it is a great event to gauge changes in Postgres adoption and mindshare. Years ago, Postgres visibility at oscon was minimal, but now Postgres is a major player at the conference. For example, our booth this year was no longer in the open source section (A.K.A. the open source ghetto), but in the section with commercial vendors. We had abundant booth staffing, thanks to Gabrielle, and artistically designed staff t-shirts (image, image, image), thanks to Josh Berkus. I think we were moved to the commercial section because our booth staff and activity often overwhelm smaller booths.

The conference had major tutorials (1, 2) and sessions about Postgres, and Postgres as often mentioned in non-database-specific sessions, including the closing session. I heard one non-database session polled the audience about how many attendees had lost data on specific databases — obviously we did very well in that poll.

So, in summary, great conference, great presentations, and Postgres looked great too!

Saturday, July 24, 2010

There is one week until the Postgres Pool Party at my home. No RSVP necessary.

Friday, July 16, 2010

An email list posting this week revealed that Postgres will be used on the International Space Station by the end of the year, specifically "to store the data on orbit and then replicate that db on the ground". Postgres has always been heavily used by NASA and other government agencies, but this is a uniquely "high"-profile use of Postgres.

Tuesday, June 29, 2010

Some Postgres users have created an detailed outline of Postgres 9.0 features with practical examples of almost every feature. If you are curious about what is coming in 9.0 or are confused by the 9.0 release notes, this is a must read.

Wednesday, June 23, 2010

I am teaching another Drexel University database class this summer called Database Theory. This class builds on the Database Systems class I taught last summer, and again heavily uses Postgres. The class covers many of the practical challenges of database management, like replication, internals, security, and managing large systems. As you can see from the syllabus, I was able to enlist the help of local open source contributors as guest lecturers. The first class is tomorrow.

Wednesday, June 16, 2010

If you are near Philadelphia, you are invited to attend the 2010 Postgres Pool Party at my home:

When: Saturday, July 31, 2010, 2pm to 7pm

Saturday, July 31, 2010, 2pm to 7pm Where: my home in Newtown Square, Pennsylvania (directions)

my home in Newtown Square, Pennsylvania (directions) What: barbecue, swimming, and most importantly, good conversation

All Postgres users, developers, and groupies are invited, including their families. No need to RSVP — just come.

Wednesday, June 16, 2010

I have simplified my presentation, The Magic of Hot Streaming Replication, to show a hot/streaming setup without the need to archive WAL files. This greatly simplifies the setup, and is possible thanks to wal_keep_segments, which allows the retention of a sufficient number of WAL files in /pg_xlog for the slave without the requirement of configuring continuous archiving.

When I originally researched hot standby and streaming replication, I focused on it as an extension of continuous archiving, but I now realize that continuous archiving is unnecessary and overly-complicates the setup if all you need is a read-only, standby server, and don't need the point-in-time recovery features of continuous archiving.

Monday, May 31, 2010

This email thread discusses the advantages (ease of management) and disadvantages (uneven performance, poor reliability) of running Postgres on Amazon's Elastic Compute Cloud (EC2) using Elastic Block Stores (EBS). It includes the idea of using multiple EBS volumes in RAID10, and using multiple EC2 database clusters for redundancy. One quote is telling, "It's possible to run a larger database on EC2, but it takes a lot of work, careful planning and a thick skin." A similar thread from a year ago had a similar cost/benefit analysis.

Wednesday, May 12, 2010

While conventional economics theorizes that increased rewards always leads to better output, this video suggests that many activities, particularly creative ones, produce worse output with increased rewards. The video shows various cases where this can happen, and even interviews open source contributors as a tangible example of people working without financial reward.

This video answers a question I have been asked many times, "Why do people contribute to open source?", and it answers it better than I ever have. Basically, once people have adequate compensation, people are often motivated more by creative challenges than by money. A corollary of this is that financial reward can distort creative output. This might explain why the Postgres user and developer experience feels different from proprietary software — because there are few financial distortions to interfere with the creative work of designing and using Postgres.

Update: Another video about financial incentives

Wednesday, May 12, 2010

After community discussion, pg_migrator has been renamed to "pg_upgrade" and will be an additional supplied module (/contrib) in the main server distribution of Postgres 9.0 (documentation). (It will appear in Postgres 9.0 beta 2.) Pg_Upgrade supports upgrades from Postgres 8.3.X and 8.4.X to Postgres 9.0. As you can see from the documentation, many of the restrictions of upgrading from 8.3 to 8.4 are gone, even for upgrades from Postgres 8.3 directly to 9.0.

This signals a more serious committment from the community to try to provide non-dump/restore major upgrades. The pg_migrator pgFoundry project will continue, but will not be enhanced to support new major versions of Postgres — it will remain only to support upgrades to Postgres 8.4.

Monday, April 26, 2010

It is rare for me to hear about a new replication option, but I heard about one involving pgpool-II last week at the New York PostgreSQL Meetup Group. pgpool-II is not only a connection pooler but supports load balancing by sending data-modification queries to every host, while it load-balances read-only queries across all hosts.

One major problem with this usage is that non-deterministic functions, e.g., random(), CURRENT_TIMESTAMP, can produce different results when sent to multiple hosts. pgpool-II 2.3 and later handles some of these cases, but not all of them. Fully-correct operation requires either developers or administrators to handle unsupported non-deterministic functions. This new replication option involves using pgpool-II in 'master_slave_mode', which causes all data modification queries to be sent to the master, and the master replicates those changes (without non-deterministic side-effects) to the other slave nodes in the pgpool cluster. This diagram (page 19) illustrates this setup.

This new configuration allows pgpool-II to be used in more complex environments with reduced developer and administrator maintenance. Postgres 9.0's hot standby feature makes this even easier to use.

Update: pgpool 3.0 will improve routing of write queries to the master.

Monday, April 26, 2010

With the final Postgres 9.0 commitfest completed two months ago, you might be wondering what is delaying 9.0 beta 1. Along with the normal cleanup of open items, most of the time has been spent finalizing the behavior of our two big 9.0 features, streaming replication and hot standby (which I mentioned previously; presentation available).

Someone asked me to explain what that testing process looks like, and the most graphic description I could think of was a massive automobile accident (image; those fire trucks are our developers analyzing the failure ). It is as ugly as an automobile accident, e.g.:

server doesn't start if I stop it while doing a backup with a slave connected

slave cannot connect if configuration options are not set correctly, but poor feedback is given

slave shows massive overhead when applying streaming logs

configuration variables are unsettable and confusing to a normal user

Postgres has a reputation for adding features that are reliable, even if those features are complex. That doesn't happen by accident — all the work happening now is to make sure these features are as robust and as easy to use as possible when 9.0 is released. I would love to avoid this messy period, but there is no alternatives if Postgres 9.0 is to be as good as all previous Postgres releases. The good news is we are getting close to being ready for beta.

Monday, April 19, 2010

Initially the lure of Virtual Machines (VM) was to allow multiple operating systems to run simultaneously on a laptop. This is particularly useful if you need to run many applications that are only supported certain operating systems, e.g., MS Office. VMs have also always been popular for testing and development. (This video presents a simplified explaination of virtualization.)

However, recently VM usage has grown and are now regularly used on production servers. This recent community email post highlights the advantages of using VMs for production deployments:

fewer production servers (e.g., only two servers can host all VMs)

simpler deployments (administrators can easily copy VMs to production servers)

more reliable deployments (VMs provide a self-contained environment, with fewer potential problems during deployment)

better scalability (VMs can be given more server resources as necessary)

easier testing (VMs can be copied and tested independently of the deployment server)

improved reliability (VMs can be easily started on standby servers once they become production servers)

VMs seem to have the greatest benefit for production use when the deployments or workload are dynamic, or many operating systems are involved. However, there are still legitimate concerns over VM use in production. The VM does add overhead compared to raw-iron deployments (5-10%). Another concern is that the fsync flush to disk properly happens through the VM layer (email).

Postgres has some advantages when used with VMs:

no licensing issues

all data is stored in the file system (no raw storage)

sessions can make use of added CPUs without reconfiguration

additional memory can be automatically used by the kernel buffer cache

shared_buffers and workmem can be changed, but require modification of postgresql.conf

Postgres security is powerful and flexible

VMs that allow adding memory and CPU without restart (e.g., vSphere, Xen) can take advantage of many of these features dynamically. In fact, the only configuration setting that can't be modified without database restart to take advantage of "hot" (non-restart) resource changes is shared_buffers.

We clearly are going to be seeing more installations of Postgres on VM and cloud architectures in the future. If you are interested in how cloud computing relates to virtualization, this video explains it well.

Tuesday, April 13, 2010

Solid-state drives (SSD) are getting larger and more popular. Initially, SSDs appeared to be ideal for databases because they potentially allow fast I/O with non-volatile storage — unfortunately, neither of these is completely true. Let me explain:

Fast I/O: While SSDs offer random I/O speeds far in excess of traditional hard drives (because there are no moving platters or heads), the sequential I/O speed of SSDs is only marginally better than mechanical drives. Database activity that causes random I/O, like index scans that do not fit in RAM, will benefit from SSDs' superior random I/O speeds, but sequential scans only marginally benefit by using SSDs. (Greg Smith explains the limited use case for SSDs in this email.) Postgres 9.0 will allow random page cost to be set per tablespace so administrators can indicate that random I/O has the same speed as sequential I/O for SSD-based tablespaces.

Non-Volatile Storage: Because SSDs offer permanent storage, it is often believed that they are an ideal place to store the Postgres Write-Ahead Logs (WAL) which are flushed to disk on every transaction commit. However, SSDs typically write data in 256 kilobyte chunks, meaning the small write operations that occur at every commit are not ideal for SSD drives, and might not even be flushed to permanent storage immediately. (Many SSD vendors have been vague about this behavior.) (FYI, Fusion-io drives are getting good reviews.) This paper explains the internal workings of SSD drives, and the article summary contains this warning:

We find that SSD performance and lifetime is highly workload-sensitive, and that complex systems problems that normally appear higher in the storage stack, or even in distributed systems, are relevant to device firmware.

In summary, SSD drives are not the panacea we hoped, or at least, not yet. A battery-backed disk drive controller is still the ideal solution for high performance at a reasonable cost. This Postgres email thread from November and December covers many of these details.

Saturday, April 10, 2010

Having just blogged about NoSQL, I think the primary lesson the Postgres community can learn from NoSQL is that not everyone needs all our features, and if it can be easily accomplished, we should allow users to trade features for some of the NoSQL desired behaviors. We already do that in some cases:

You can improve speed by delaying or eliminating durability (synchronous_commit, fsync)

You can reduce consistency overhead by not defining constraints

You can use prepared queries to avoid parser and optimizer overhead

Arrays can often be used to avoid join overhead

You can store non-structured data with hstore

You can serve stale data using multi-master asynchronous replication (Bucardo)

However, there are a few things that will be difficult to allow:

Non-SQL access to data

Reduced overhead by eliminating atomicity or isolation

I can't think of any new behaviors we could optionally allow for potential NoSQL users, but we should be looking to provide them where possible.

Update: One thing on our TODO list that might help potential NoSQL users is the implementation of a native JSON data type. JSON is used as a storage format for many NoSQL databases.

Update: This article is a review of several NoSQL and Web Service databases.

Update: Another article about NoSQL 2010-10-21

Saturday, April 10, 2010

There is lots of buzz about NoSQL databases these days, and thanks to my attendance at Emerging Technologies for the Enterprise Conference, I have learned more about them. I heard a talk about Cassandra, and another about MongoDB.

NoSQL databases differ in several ways from relational databases:

crude or no query language (clients do much of the processing normally done by SQL)

no joins (data must be joined client-side)

sacrifice ACID and transaction properties for speed, fault tolerance, or easy addition and removal of nodes

Few are excited about losing these valuable relational database features, but if your query response and infrastructure requirements can't be met by relational databases, and you can accept these limitations (think social media sites or search engines), NoSQL makes perfect sense.

Thursday, April 8, 2010

Multiversion Concurrency Control (MVCC) is the method used by Postgres to provide high read/write concurrency. You might not have heard of it, but if you have heard the Postgres locking behavior described as "readers don't block writers, and writers don't block readers", MVCC is what allows that behavior. Postgres was an early implementor of MVCC, but now most relations systems use it.

I attended the Emerging Technologies for the Enterprise Conference today and heard an interesting talk related to MVCC. It applied the ideas of MVCC (which allows for high concurrency) to the problem of parallel programming, which is becoming more necessary as massively multi-core systems become more common.

Specifically, the talk was about Clojure, by its author Rich Hickey. Clojure uses MVCC for the access and modification of program data, allowing parallel programming with no locking necessary in the client code; Closure does all the MVCC, and is ACID-compliant, except without durability ("D").

Traditional multi-core-enabled programming involves threads and explicit locking, just like pre-MVCC database systems, but Clojure requires none of that. It is as though a threaded program used Postgres to read and write its variables so it didn't need to do any locking, except the MVCC layer is moved into the client language. This is certainly a radical approach, but so was MVCC when it was initially implemented in database systems.

Tuesday, March 30, 2010

I am speaking at New York PostgreSQL Meetup Group on April 20th about Postgres replication solutions. This will be a much larger group than previous New York City Postgres meetings; 13 people have already registered.

Tuesday, March 30, 2010

My slides from PG East are now online, and I will be giving a webcast of that presentation in late April. The presentation covers the new Postgres 9.0 features of hot standby and streaming replication: how they work, how to configure them, and their current limitations. The presentation also includes a live demonstration of setting up these features.

Tuesday, March 30, 2010

PG East was an unqualified success, as others have already blogged about. This conference might signal a change in the style for dedicated Postgres conferences. Postgres-specific conferences started only a few years ago, mostly at universities. I was one of the early proponents of having conferences at universities after seeing the success of fosdem. I felt the low admission cost made possible with inexpensive university facilities was critical in attracting the mostly-volunteer Postgres developers to the conference.

I am ready to rethink that suggestion having attended PG East. The conference cost only USD $125, and many things were definitely easier having the event in a hotel:

Informal conversations

Attracting business users to the show

Group excursions

Laptop and bag storage

The ability to facilitate informal conversations is an important part of any event, and one of the things that excite attendees and keeps them coming back.

I thought the convenience of having everyone in a single hotel would be prohibitively expensive, but PG East proved that wasn't true. We did have significant business sponsorship of the event, but I found the sponsorship to be tasteful and it did not detract from the event. I wonder if other Postgres conferences will take this approach too.

Saturday, March 20, 2010

I have completed the Postgres 9.0 release notes and you can view them online. There will, of course, be many adjustments to the release notes before 9.0 final.

Monday, March 15, 2010

The upcoming PG East Conference is in my home town, Philadelphia. The conference promises to be a new generation of Postgres conferences, and I want to highlight some of the changes that attendees can expect.

First, as I mentioned before, the hotel venue for this conference is much nicer than typical Postgres conferences. Also, it is located in an area of Philadelphia that is packed with great restaurants and shops.

The conference opens on Thursday, March 25, and the day includes two simultaneous talks every hour, including talks about migrating from MySQL, the use of Postgres at myYearbook.com, and hardware benchmarking. I bet the MySQL talk will be packed. There are many more business-oriented and end-user-focused talks than previous Postgres conferences.

Friday starts with a keynote by fellow Armenian Ed Boyajian, who is giving a non-technical talk challenging the community to expand its reach. The day also includes talks by a Forrester consultant (which I already mentioned), by the Federal Aviation Administration (FAA) about how they use PostGIS to map airports, and, a sure favorite, a talk about the exciting new features in Postgres 9.0. A free party is planned for Friday night.

Saturday contains three or four simultaneous talks every hour, with many deeply technical talks. The big challenge will be picking which talks to attend. Sunday offers three different five-hour training sessions. I am hoping to offer tours of Philadelphia for visitors on Monday.

Why Attend: The conferences has created a web site about why you should attend, including pictures. Practically, while the talks are exciting, the real value of this conference is for you to see a new dimension of Postgres. By spending time with many of the Postgres community developers and other Postgres users, you can get a better understanding of what makes Postgres different, not only from proprietary databases, but also from other open source databases. (PGCon in Ottawa in May has even more developers.) Postgres developers might not look the most polished, but they have a passion for databases that sets them apart, and that will make you more confident in your choice to use Postgres. As I have heard many times before, you can't believe it until you've seen it.

Monday, March 15, 2010

I have started preparing the release notes for Postgres 9.0. I went into great detail last year about how the release notes are created, so I will not bore you with the mind-numbing details this time. I should be ready with the first draft in a week or two.

Update: I have completed stage #3. 2010-03-16 03:33:20 GMT

Update2: I am over 50% done stage #4. 2010-03-17 22:00:48 GMT

Update3: I am done stage #4. 2010-03-18 21:50:36 GMT

Tuesday, March 9, 2010

EnterpriseDB believes in the marketing potential of videos. You might remember them producing a video of me explaining pg_migrator last year.

This year, they will be interviewing PG East attendees to create a Postgres user testimonial video. The community has not used videos extensively for marketing, partly because, while it is easy to create a video, it is difficult to create a good video. (Consider the quality of the average YouTube video.) Anyway, my guess is that the video will be something like this customer video. The video will be shared with the community to help promote Postgres worldwide.

Saturday, March 6, 2010

Many people are waiting for Postgres 9.0, so here is a status report about the release. Two weeks ago we completed the last 9.0 CommitFest and released 9.0 Alpha 4. We are now working on a 9.0 open items list. There are relatively few open items compared to previous major releases because there are now more developers focused on closing open items.

Once we deal with all the items on that list, and any new items that appear, we can release the first 9.0 beta. Technically, we could release a beta before dealing with all open items as long as we are sure that none of the existing open items will require cluster reinitialization (initdb).

Friday, February 26, 2010

A month ago I mentioned a surprising number of Postgres activities in Philadelphia. One month later, those events are past but now there are more.

First, PG East is shaping up to be a big conference:

The conference agenda looks very full, with both developer and business-oriented talks. The business-oriented focus is rather new for Postgres conferences, and I think signals more mainstream adoption of Postgres.

It is at the Warwick Hotel. This lobby photo should give you a good idea of how nice the hotel is — we will simply have to adjust to having a Postgres conference in a fancy hotel.

Noel Yuhanna of Forrester Research will be speaking. He is the person who authored last year's Forrester Wave report that found Ingres and MySQL as the leading open source databases (news report). You can judge for yourself how much "research" went into that report, but we will be nice to him — a bodyguard will be unnecessary.

Registrations are now being accepted. There is a useful "Reasons to Attend" page that will help people who are undecided.

Second, it looks like I will be co-teaching a database class at Drexel University again this summer, but this time, a new, more advanced class that will highlight Postgres technology. Drexel wants to expand their database offerings and train skilled Postgres engineers. Postgres certainly offers students a unique opportunity to understand database technology.

Saturday, February 20, 2010

With so many ways of electronically communicating in the Postgres community, why blog? I have thought about that question and want to list the various electronically communication methods and their advantages:

Email: Email is great for task-oriented communication. It reaches a large audience and allows for deliberate, consensus-oriented solutions.

Instant Messaging/IRC: This allows for quick feedback for dynamic tasks. I often use it to get a quick opinions before using other communication methods, or for dealing with complex problems that can't be easily phrased in an email.

Blog: I find this best for status reports and thought pieces. It reaches a large audience that is looking for that kind of information.

Wiki: The wiki is great for information that is both dynamic and needs to be retained, e.g., team projects, driver lists.

Documentation: This is for permanent information that should be seen by all Postgres users.

As you can see, this thought piece is best presented as a blog post.

Saturday, February 20, 2010

A few weeks ago I learned there was serious confusion over the client-side Python drivers available for Postgres. This chart shows many Python drivers, but the recommended libpq-based driver, Psycopg, was listed on the chart as GPL licensed. This would require programs that use the driver to also be GPL licensed. However, the license is not actually GPL but a complicated hybrid license.

Fortunately, the Psycopg author has agreed to relicense his Python driver under the LGPL in the next release, and there is hope that Postgres developers will get involved to help add any missing features to the driver.

Friday, February 19, 2010

Postgres does not require copyright assignment from contributors because it is BSD licensed. Unlike the GPL, there is no enforcement component in the BSD license. When code and documentation are contributed to the project, the authors give the project the right to distribute their contributions under the BSD license. However, technically, authors also retain full rights to their contributions. Of course, there is little practical difference between retaining full rights and obtaining contributions under the BSD license.

Anyway, while all work submitted to Postgres is covered under the BSD license, the existence of individual copyright lines in our source code, e.g., © Joe Coder, makes company lawyers nervous, so we have a policy of avoiding such content. Unfortunately, this policy has been active only during the past five years and was not always universally enforced. For this reason, we occasionally have to ask authors for permission to remove their copyright lines, or in drastic cases, rewrite trivial code segments when the author cannot be contacted. We perform such cleanups periodically as the personal copyrights are brought to our attention, e.g., 2006, 2006, 2007, 2009, 2010.

We believe we have finally removed all personal copyrights from our source code. Postgres 9.0 will be the first release of Postgres that has no individual copyright statements. Now, the existence of personal copyright statements is insignificant, but their complete removal will probably allow some corporate lawyers to sleep more soundly.

Thursday, February 18, 2010

Leonard Shapiro of Portland State University has created some excellent Postgres university course material.

His first course, Introduction to Databases, teaches databases from a Postgres perspective. His higher-level course, Relational DMBS/Database Internals, covers many aspects of Postgres in greater detail than is covered in the Postgres manuals.

You might remember I taught a university class last year. I hope someday to able to teach a higher-level course that makes use of Professor Shapiro's material, with his permission, of course.

Thursday, February 18, 2010

I am not sure people are aware that you can find the source code location of any generated server log message. When log_error_verbosity is set to 'verbose', additional details are output that can help diagnose errors. Specifically, I have added this to the Postgres 9.0 documentation:

VERBOSE output includes the SQLSTATE error code and the source code file name, function name, and line number that generated the error.

Here is some sample output:

ERROR: 42P01: relation "lkjasdf" does not exist at character 15 LOCATION: parserOpenTable, parse_relation.c:857 STATEMENT: select * from lkjasdf;

Such information can help when reporting errors to the community, particularly if the error message is not in English.

Thursday, February 4, 2010

Many computer programmers are introverts because they have chosen the solitary job of creating programs that accomplish specific tasks. Open source developers are probably even more prone to be introverts because the (boring?) meetings, office chit-chat, and lunch outings are mostly gone, with the only communication being via email, instant messaging, twitter, and blogs. Open source social interaction is certainly more controlled than traditional communication.

This excellent article describes introverts as generally misunderstood:

Extroverts are easy for introverts to understand, because extroverts spend so much of their time working out who they are in voluble, and frequently inescapable, interaction with other people. They are as inscrutable as puppy dogs. But the street does not run both ways. Extroverts have little or no grasp of introversion. They assume that company, especially their own, is always welcome. They cannot imagine why someone would need to be alone; indeed, they often take umbrage at the suggestion.

The Postgres community is probably filled with introverts, and that's to be expected, considering that our work does not have the kind of regular personal interaction that would keep extroverts happy. It is no surprise that introverts thrive in our community.

This also makes conferences even more valuable — it is a chance to be with other people who care about the same things we care about, and to tell some jokes. Of course, the conference talks are always good, but what stays with me after conferences is not the knowledge I have gained, but the people I have met and learned more about.

Update: A good video presentation about introverts

Friday, January 29, 2010

On Monday Josh Berkus requested suggestions for a new Postgres slogan. Our current slogan, "The world's most advanced open source database", was chosen in the early years of the project to distinguish ourselves from other open source databases, and it is easy to see why we should consider a change at this time. The suggestions fit into several categories, some serious, some humorous. Continued discussion is taking place on the advocacy (pgsql-advocacy@postgresql.org) email list.

General

PostgreSQL: The Elephant Never Forgets

PostgreSQL: Enjoy Your Database Again

PostgreSQL: The Professional Database Solution

PostgreSQL: Meet the Future

PostgreSQL: It Simply Works

Reliability

PostgreSQL: Reclaim Your Database

PostgreSQL: Because Your Data Is Worth It

PostgreSQL: Setting the Standard For Following the Standard

PostgreSQL: It's ACID!

"Your"

PostgreSQL: YourSQL

PostgreSQL: Your Next Database

PostgreSQL: Your Open Source Database

PostgreSQL: Your Data is Important

PostgreSQL: Your Data: Any time, Any Place

Open Source

PostgreSQL: Free Forever

PostgreSQL: Powerful Freedom

PostgreSQL: Liberate Your Data!

PostgreSQL: The 'Open' Open Source Database

PostgreSQL: Open-Source Database, Open-Ended Possibilities

Humorous

PostgreSQL: The Open Source Elephant Memory

PostgreSQL: Even Better With Bacon!

PostgreSQL: Less Complex Than Oracle (But That Doesn't Mean Much)

PostgreSQL: Duh

PostgreSQL: We Won't Be Bought Out

PostgreSQL: Licensed To kill

Our Name

PostgreSQL: Unpronounceably awesome!

PostgreSQL: The most powerful software you can't pronounce

PostgreSQL: It's OK to call it Postgres, just not Postgre

Postgres: Who stole my QL?

Update: Here is a more complete list.

Thursday, January 28, 2010

I was approached in the fall about speaking at various Philadelphia computer user group meetings. Now that I am home I have been able to speak at several, and it is clear there is greatly increased interest in Postgres here in the Philadelphia area, and I bet in other cities as well. For example, in January/February, I am speaking at:

Philadelphia area Linux User Group (PLUG) – North

Philadelphia area Linux User Group (PLUG) – West

The Mathlab at Drexel University

A Drexel University class

I am also speaking in Albany (full event list), and of course the big event for March is the PG East 2010 conference, also in Philadelphia. This year the conference has been extended for four days, and one of those days is made up of tutorials, which is great for new Postgres users. I am giving one on Postgres administration, and there are two others being given at the same time. I think it is likely there will be a huge increase in the number of conference attendees compared to last year.

For those of you in other cities, you might want to contact your local computer user groups to see if they would be interested in hearing a talk about Postgres. There is a lot of renewed interest in what our community is doing.

Thursday, January 28, 2010

I have always thought there were three possible threats to Postgres:

Control of the Internet domain name

Patent attacks

Hiring of volunteers to work on unrelated projects

The first issue is that our web domains are owned by a single individual, though others are able to modify the domain. The second involves patent attacks, which seems to be a risk for all software. The third item is a more general version of the skill siphoning / brain drain email thread I blogged about last year, but with a sinister goal of hiring away developers to work on anything but Postgres.

As Postgres becomes more widely adopted, these attacks become more likely. Are there other threats I have not listed? Are there more steps we can take to protect ourselves?

Thursday, January 28, 2010

Last week EnterpriseDB put out a press release with an inaccurate title. Instead of being titled, "EnterpriseDB Responds to EU Decision to Approve Oracle's Acquisition of MySQL", it was unfortunately titled "Postgres Community Responds ...". This has the potential to confuse journalists into thinking that Postgres development is not community controlled, and that's the last thing we want to happen at this point, with MySQL's problem in this area starkly highlighted. This dovetails well with the blog entry I posted a few weeks ago titled, "Don't Confuse Companies with Community".

Thursday, January 28, 2010

Based on recent discussions, I am wondering if Postgres is poised for a significant adoption increase in the next year. This is based on a few events:

The upcoming Postgres 9.0 release, which I just blogged about

The unstable economy, which is causing many companies to rethink their software spending practices

The uncertainty surrounding other databases, both proprietary and open source

The last time Postgres appeared poised for another adoption increase was in 2005, with the release of PostgreSQL 8.0, when the native Windows port was introduced. I would say that release marked the point where Postgres became a respected open source database; prior to that we had a reputation of being slow and hard to manage. We had made changes for years before that, but the 8.0 release helped solidify our reputation.

The 9.0 release has the potential for Postgres to move from a respected open source database to a respected mainstream database, because, frankly, how many other databases are attracting new users at the rate we are? — very few.

Thursday, January 28, 2010

Last week Dave Page announced that the next major release of Postgres will be numbered 9.0, rather than 8.5 as previous expected. This change was made because the next major PostgreSQL release will include built-in log-streaming replication, and that has always been considered a feature worthy of increasing the first version number.

While Postgres has had log-based replication support for a few releases, the 9.0 release will allow logs to be streamed to the slave using network transfer, which greatly increases the frequency at which logs can reasonably be sent. It also allows the slave server to accept read-only queries (hot standby), again a major feature addition.

Thursday, January 21, 2010

It seems the European Commission studying the Oracle purchase of Sun/MySQL has finally realized Postgres is a viable open source alternative to MySQL and has approved the purchase:

The Commission's investigation showed that another open source database, PostgreSQL, is considered by many database users to be a credible alternative to MySQL and could be expected to replace to some extent the competitive force currently exerted by MySQL on the database market.

This article, titled in part, "If MySQL fails, there's always PostgreSQL", puts it even more succinctly. I know many MySQL people worked very hard campaigning against approval, and I am sure they are disappointed by the outcome. Postgres users should thank them because even though they lost, their work has increased the visibility of all open source databases, including PostgreSQL.

Thursday, January 21, 2010

I mentioned two years ago that I was going to read through the Postgres documentation to see if I could improve it. Now, two years later, I have read through only 25% of the manual. (I will be making a commit to the "Administration" book in the next week.)

My new plan is to solicit volunteers to complete the proofreading of the entire manual. I have created a wiki describing the process and how to get involved. Amazingly, someone took responsibility for a book even before I announced the existence of the wiki. I have also emailed a few people who have offered proofreading assistance in the past.

Monday, January 4, 2010

There have recently been several excellent blog posts by Postgres community members regarding the campaign by MySQL developers to block the purchase of Sun Microsystems by Oracle. Unfortunately, I think a fatal mistake was made by the MySQL developers years ago when they equated the success of MySQL AB, the company, with the success of MySQL, the software. They are not the same, but, tragically, it is too late to undo that mistake.

I think the majority of Postgres developers understand the distinction between the success of their companies and the success of the PostgreSQL project. Most of us agree that the success of the PostgreSQL project is a purpose far greater than our individual or company successes. While the Postgres project will undoubtedly make mistakes in the future, confusing company success with community success will probably not be one of them.