This blog is about my work on the Postgres open source database, and is published on Planet PostgreSQL. PgLife allows monitoring of all Postgres community activity.


Sunday, December 1, 2019

I am attending re:Invent this week. Thanks to the kind sponsorship of AWS, we are having a community Q&A and dinner on Thursday, December 5 in Las Vegas. The Meetup page is now online if you want to register.

Friday, September 27, 2019

For the past 16 months, there has been discussion about whether and how to implement Transparent Data Encryption (TDE) in Postgres. Many other relational databases support TDE, and some security standards require it. However, it is also debatable how much security value TDE provides.

The 400-email TDE thread became difficult for people to follow, partly because full understanding required knowledge of both Postgres internals and security details. A group of people who wanted to move forward began attending a Zoom call, hosted by Ahsan Hadi. The voice format allowed for more rapid exchange of ideas, and the ability to quickly fill knowledge gaps. It was eventually decided that all-cluster encryption was the easiest to implement in the first version. Later releases will build on this.

Fundamentally, TDE must meet three criteria — it must be secure, obviously, but it also must be done in a way that has minimal impact on the rest of the Postgres code. This has value for two reasons — first, only a small number of users will use TDE, so the less code that is added, the less testing is required. Second, the less code that is added, the less likely TDE will break because of future Postgres changes. Finally, TDE should meet regulatory requirements. This diagram by Peter Smith illustrates the constraints.

There is an active TODO list to coordinate development. There is hope this can be completed in Postgres 13.

Saturday, August 31, 2019

Nine months ago, I started development of a key management extension for pgcrypto. The tool is called pgcryptokey and is now ready for beta testing.

It uses two levels of encryption, with an access password required to use the cryptographic keys. It supports setting and changing the access password, multiple cryptographic keys, key rotation, data reencryption, and key destruction. It also passes the access password from client to server without it appearing in clear text in SQL queries, and supports boot-time setting. The extension leverages pgcrypto and Postgres custom server variables.

Friday, June 28, 2019

Having returned from last week's Ibiza conference, I have a new understanding of the event's goals. I know there was some uncertainty about the event, for several reasons:

Having a conference at a resort is a new thing for our community. We started years ago with conferences in universities, and steadily grew to hotel-based conferences in minor and then major cities.

Ibiza has a reputation in some countries as a party destination. The wildest thing I saw was yelping dogs being walked along the beach.

The beach mention often confused people. This was part of an effort to raise the importance of the hallway track, rather than it being just scheduled holes between technical talks. I didn't realize it was possible to eat lunch and swim in the ocean during a 90-minute break, but I did it!

There is historical abuse of resort-based conferences as paid or tax-free vacations. This was certainly not the case for Ibiza, but it is an additional hurdle.

I returned from the conference with a warm feeling for the venue, the people I met, and the things I learned, as did my wife and daughter. While resort conferences are not for everybody, they are popular in other industries, and there is certainly a need for this type of event. The next scheduled "beach" conference is in Bali, and I plan to attend.

Thursday, June 27, 2019

Having delivered my new talk, The Democratization of Databases, at Postgres Ibiza and Postgres Vision, I am now posting my slides online. It covers the history of various governing structures and why democracy provides superior results. It has been well received.

Sunday, June 23, 2019

Now that I have given a presentation about Postgres 12 features in Ibiza, I have made my slides available online.

Thursday, June 6, 2019

I did a webinar two weeks ago titled, "Exploring Postgres Tips and Tricks." The slides are now online, as well as a video recording. I wasn't happy with the transition I used from the PDF to the blog entries, but now know how to improve that next time.

I think I might do more of these by expanding on some of the topics I covered, like psql and monitoring. Also, a new video is available of the sharding presentation I mentioned previously.

Friday, May 31, 2019

I presented my sharding talk today at PGCon in Ottawa. The slides have been updated to more clearly show what has been accomplished toward the goal of built-in sharding, and what remains to be done. The talk was well attended. I also attended a breakfast meeting this morning about sharding.

Update: A video is also available. 2019-06-06

Sunday, May 12, 2019

I have completed the draft version of the Postgres 12 release notes. Consisting of 186 items, this release makes big advances in partitioning, query optimization, and index performance. Many long-awaited features, like REINDEX CONCURRENTLY, multivariate most-frequent-value statistics, and common table expression inlining, are included in this release.

The release notes will be continually updated until the final release, which is expected in September or October of this year.

Friday, March 8, 2019

There was a time when every piece of software had to be purchased: operating systems, compilers, middleware, text editors. Those days are mostly gone, though there are a few holdouts (e.g., MS Windows, vertical applications). What happened is that open source software has come to dominate most uses, and software selection is rarely based on cost requirements.

One of the final holdouts for purchased software is databases. You might think that is because database software is complex, but so is the other software mentioned. The big difference is that while non-database software processes or stores user data in a simple or standard way, databases lock user data inside the database. This data locking is a requirement for fast, reliable, and concurrent data access, but it does place the database on a different monetary plane.

In any organization, it is really their data that is valuable, and because the database is so tightly coupled to that valuable data, database software becomes something that is worth significant investment. This explains why databases have resisted the open source commoditization that has happened to so much other purchased software. (Custom database applications tied to complex business logic have also slowed migration.)

The rise of Postgres in the past few years shows that the days of expensive database software are numbered. However, once the move to open source becomes effectively complete, there will still be significant revenue opportunities. Few people purchase compiler support, and many don't even purchase operating system support, but database support, because of its tight integration with user data, might never disappear, though it could be subsumed into other costs like cloud computing. It will be the care and feeding of user data that people will pay for, rather than the database software itself, because it pays to protect things of value.

Thursday, March 7, 2019

Having explained that lock-in is not a binary option, what are the Postgres tool support options available, at a high level?

Develop in-house database tools and support them yourself

Use open source tools and support them yourself

Use open source tools with vendor support (hopefully the vendor supports your chosen tools)

Use closed-source tools with vendor support

Of course, you can mix and match these options, i.e., use a support vendor for the open source tools they support, use other open source tools they don't support, and use some tools you develop in-house, e.g.:

open source Postgres database (vendor support)

pgBackRest for backup (vendor support)

Patroni for failover (community support channels)

In-house developed tools (self support)

I went over these options in more detail in this presentation. This diversity of options is rarely available for closed-source, single-vendor database solutions.

Update: This blog entry has much more detail about lock-in. 2019-09-26

Wednesday, March 6, 2019

Postgres has had streaming (binary) replication for a long time, and logical (row shipping) replication since Postgres 10. Implementing these was a lot of work, and they work well.

However, the simplest way to do replication is often considered to be replaying SQL queries on standbys. The primary was modified by SQL queries, so isn't the simplest way to replicate replaying that SQL? A novice would think so, and many database server developers initially try replication by replaying SQL. It seems simple because SQL queries are more concise than per-row changes. Imagine a DELETE that affects one million rows being shipped to a standby as a single SQL query. The conciseness and simplicity of SQL replication looks promising.

However, if you try implementing replication via SQL, you will realize that SQL runs in a multi-user environment. SQL commands do not contain enough information to replay queries exactly the same way on standbys as on the primary. Concurrent DML, volatile functions, sequence assignment, locks, and cursor contents can all cause inconsistencies. Developers have tried patching over these issues, but eventually the fundamental limitations of this approach become clear. I doubt Postgres will ever implement SQL-level replication for this reason.
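
As a concrete illustration (a hypothetical table, not from the thread), a volatile function makes the same SQL statement produce different results on each server:

-- Hypothetical example: replaying this statement on a standby
-- produces different data than on the primary, because random()
-- is volatile and is re-evaluated independently on each server.
CREATE TABLE accounts (id integer, token double precision);
INSERT INTO accounts SELECT i, 0 FROM generate_series(1, 3) AS i;

-- On the primary:
UPDATE accounts SET token = random();

-- Shipping the same UPDATE to a standby re-runs random() there,
-- yielding different token values -- the servers now disagree.
-- Row-level (logical) replication avoids this by shipping the
-- actual new row values instead of the statement.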

Tuesday, March 5, 2019

One of the major attractions of Postgres is the ability to stop using database software controlled by a single vendor. Single-vendor control means a single entity controls the software, tools, training, and support. There are sometimes options from other vendors, but they are usually hampered because the source code is closed.

Postgres is a strong move away from that, but is it always a complete disconnection from lock-in? Well, it can be — you could:

Download the Postgres source code and compile it yourself

Evaluate, integrate, and test the tools you need to use Postgres in your organization

Create a training program for your employees

Develop a Postgres internals team to support Postgres and your tools

This is not possible for proprietary software, but because Postgres is open source, it is certainly possible.

However, once you have completed these steps, what do you have? No lock-in? Well, no vendor/external lock-in, but you do have internal lock-in. Your staff is doing a lot of work, and any change in direction is going to be difficult. This also might be an expensive option due to staff costs. By choosing an external vendor, you can reduce costs, though you increase your external lock-in. (I covered this concept recently.)

So, lock-in isn't always a bad thing if it reduces costs or increases efficiency or flexibility. Importantly, you have the ability to switch vendors when advantageous, and since vendors know they can be easily replaced, they are less likely to be exploitative. Frankly, the problem to avoid is not lock-in as much as being a hostage of a single vendor.

Monday, March 4, 2019

Postgres has long lived in the shadow of proprietary and other open source databases. We kind of got used to that, though we had early support from Fujitsu and NTT. In recent years, Postgres has become more noticed, and the big companies promoting Postgres have become somewhat of a flood:

Amazon (RDS)

Google

IBM (cloud and non-cloud)

Microsoft

Alibaba

Even with IBM having DB2 and Microsoft having SQL Server, they still support Postgres.

It is odd having multi-billion-dollar companies asking how they can help the Postgres community, but I guess we will have to get used to it. These companies support the community to varying degrees, but we certainly appreciate all the help we receive. Just having these companies list us as supported is helpful.

Monday, February 25, 2019

As an actively-developed open source project with a long history, Postgres often has to make decisions on how to integrate new features into the existing code base. In some cases, these new features potentially break backward compatibility, i.e., API breakage. This breakage can be caused by:

Fixing an incorrect result or behavior

Adding a feature that prevents the previous API from working

Replacing an existing feature with an improved one that requires API breakage

In these cases, the Postgres project has several options:

1. Add the new feature and retain the old API forever
2. Add the new feature and retain the old API until all supported Postgres versions have the new interface (five years)
3. Add the new feature and remove the old API
4. Reject the new feature because it would cause API breakage

You might think that #1 is always the best option. However, if you have ever used older software that has several ways to do the same thing, with no logic behind it except the order features were added, you know that choosing #1 has costs. While #1 allows existing users to continue using Postgres unchanged, new users have to navigate the complex API required to maintain backward compatibility. There are some cases where the breakage would be so significant that #1 (or #4) is the only option. However, choosing #2 or #3 allows future users to interact with Postgres using a clean API.

Backward-compatibility breakage can happen at several levels:

1. Client interface, e.g., libpq
2. SQL
3. Administrative interface
4. System catalogs
5. Source code

The farther down the list, the more likely the Postgres community will decide to break backward compatibility because of the reduced impact of the breakage.

This email thread discusses the problems caused by a system catalog change, and the positives and negatives of backward-compatibility breakage.

Friday, February 22, 2019

I did a webcast earlier this week about the many options available to people choosing Postgres — many more options than are typically available for proprietary databases. I want to share the slides, which cover why open source has more options, how to choose a vendor that helps you be more productive, and specific tool options for extensions, deployment, and monitoring.

Wednesday, February 20, 2019

Postgres supports two types of server-side languages, trusted and untrusted. Trusted languages are available for all users because they have safe sandboxes that limit user access. Untrusted languages are only available to superusers because they lack sandboxes.

Some languages have only trusted versions, e.g., PL/pgSQL. Others have only untrusted ones, e.g., PL/Python. Other languages like Perl have both.

Why would you want to have both trusted and untrusted languages available? Well, trusted languages like PL/Perl limit access to only safe resources, while untrusted languages like PL/PerlU allow access to file system and network resources that would be unsafe for non-superusers, i.e., it would effectively give them the same power as superusers. This is why only superusers can use untrusted languages.
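
For example (a sketch assuming the plperl and plperlu extensions are installed; the function names are hypothetical), the two variants differ in what the function body is allowed to do:

-- Trusted PL/Perl: any user can create this; file and network
-- access is blocked by the sandbox.
CREATE EXTENSION IF NOT EXISTS plperl;
CREATE FUNCTION add_one(i integer) RETURNS integer AS $$
    return $_[0] + 1;
$$ LANGUAGE plperl;

-- Untrusted PL/PerlU: superuser only, because it can read
-- arbitrary files on the server.
CREATE EXTENSION IF NOT EXISTS plperlu;
CREATE FUNCTION read_first_line(path text) RETURNS text AS $$
    open(my $fh, '<', $_[0]) or return undef;
    my $line = <$fh>;
    close($fh);
    return $line;
$$ LANGUAGE plperlu;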

When choosing server-side languages, the availability of trusted and untrusted options should be considered.

Monday, February 18, 2019

SQL is a declarative language, meaning you specify what you want, rather than how to generate what you want. This leads to a natural language syntax, like the SELECT command. However, once you dig into the behavior of SELECT, it becomes clear that it is necessary to understand the order in which SELECT clauses are executed to take full advantage of the command.

I was going to write up a list of the clause execution ordering, but found this webpage that does a better job of describing it than I could. The ordering bounces from the middle clause (FROM) to the bottom to the top, and then the bottom again. It is hard to remember the ordering, but memorizing it does help in constructing complex SELECT queries.
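
As a rough summary of that logical ordering (the employees table below is hypothetical, purely for illustration):

-- Logical evaluation order of a SELECT (approximate; the planner
-- may physically execute it differently, but results must match):
--   1. FROM / JOIN     -- build the working row set
--   2. WHERE           -- filter rows
--   3. GROUP BY        -- form groups
--   4. HAVING          -- filter groups
--   5. SELECT list     -- compute output expressions
--   6. DISTINCT        -- remove duplicate output rows
--   7. ORDER BY        -- sort the result
--   8. LIMIT / OFFSET  -- trim the result
SELECT   dept, count(*) AS cnt      -- step 5
FROM     employees                  -- step 1
WHERE    active                     -- step 2
GROUP BY dept                       -- step 3
HAVING   count(*) > 5               -- step 4
ORDER BY cnt DESC                   -- step 7
LIMIT    10;                        -- step 8

This explains, for example, why a column alias like cnt can be referenced in ORDER BY but not in WHERE: the SELECT list is evaluated after WHERE but before ORDER BY.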

Friday, February 15, 2019

This email thread from 2017 asks the question of whether there is an imperative language that generates declarative output that can be converted into an imperative program and executed. Specifically, is there an imperative syntax that can output SQL (a declarative language) which can be executed internally (imperatively) by Postgres?

The real jewel in this email thread is from Peter Geoghegan, who has some interesting comments. First, he explains why developers would want an imperative language interface, even if it has to be converted to declarative:

Some developers don't like SQL because they don't have a good intuition for how the relational model works. While SQL does have some cruft — incidental complexity that's a legacy of the past — any language that corrected SQL's shortcomings wouldn't be all that different to SQL, and so wouldn't help with this general problem. (Quel wasn't all that different from SQL at the time.) However, the OP [original poster] seemed to be describing something that maps imperative code to a declarative SQL query or something equivalent, which isn't quite the same thing. The declarative nature of SQL feels restrictive or at least unfamiliar to many programmers. What is often somehow missed is that it's restrictive in a way that's consistent with how the relational model is supposed to work. It seems hard to some programmers because you have to formulate your query in terms of an outcome, not in terms of a series of steps that can be added to iteratively, [emphasis added] as a snippet of code is written. It's very different to something like bash, because it requires a little bit of up-front, deliberate mental effort. And, because performance with many different possible outputs matters rather a lot.

Second, he explains why SQL is one of the few successful declarative languages:

To state the very obvious: If you assume for the sake of discussion that the programmer of a hypothetical imperative query language is infinitely capable and dedicated, and so is at least as capable as any possible query optimizer, the optimizer still comes out ahead, because it is capable of producing a different, better query plan as the underlying data changes. Of course, it's also true that it's very hard to beat the query optimizer under ideal conditions.

Basically, imperative languages cannot adjust to changing data sizes and value frequencies, while declarative SQL can.

Another interesting observation from Chris Travers is that it is simpler to convert from declarative to imperative than the other direction. The thread contains many other interesting observations about why SQL became so popular, and why it is unlikely that other languages will replace it anytime soon.

Wednesday, February 13, 2019

You might not be aware that you can store a virtual row, called a composite value, inside a database field. Composite values have their own column names and data types. This is useful if you want to group multiple statically-defined columns inside a single column. (The JSON data types are ideal for dynamically-defined columns.)

This email thread explains how to define and use them; I have a presentation that mentions them, and the Postgres manual has a section about them.
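
A minimal sketch (the address type and customers table are hypothetical):

-- A composite type is a named row structure.
CREATE TYPE address AS (street text, city text, zip text);

-- It can be used as the type of a single table column.
CREATE TABLE customers (id integer, addr address);

INSERT INTO customers VALUES (1, ROW('1 Main St', 'Springfield', '12345'));

-- Parentheses around the column are required to access a field.
SELECT (addr).city FROM customers;
--     city
-- -------------
--  Springfield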

Monday, February 11, 2019

I saw AT TIME ZONE used in a query, and found it confusing. I read the Postgres documentation and was still confused, so I played with some queries and finally figured it out. I then updated the Postgres documentation to explain it better, and here is what I found.

First, AT TIME ZONE has two capabilities. It allows time zones to be added to date/time values that lack them (timestamp without time zone, ::timestamp), and allows timestamp with time zone values (::timestamptz) to be shifted to non-local time zones and the time zone designation removed. In summary, it allows:

1. timestamp without time zone ⇾ timestamp with time zone (add time zone)
2. timestamp with time zone ⇾ timestamp without time zone (shift time zone)

It is kind of odd for AT TIME ZONE to be used for both purposes, but the SQL standard requires this.

First, let's see #1, AT TIME ZONE adding time zone designations:

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'America/Chicago';
        timezone
------------------------
 2018-09-02 08:09:19-04

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'America/Los_Angeles';
        timezone
------------------------
 2018-09-02 10:09:19-04

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'Asia/Tokyo';
        timezone
------------------------
 2018-09-01 18:09:19-04

What is basically happening above is that the date and time are interpreted as being in the specified time zone (e.g., America/Chicago), a timestamp with time zone value is created, and the value displayed in the default time zone (-04).

It doesn't matter if a time zone designation is specified in the ::timestamp string — only the date and time are used. This is because casting a value to timestamp without time zone ignores any specified time zone:

SELECT '2018-09-02 07:09:19'::timestamp;
        timezone
------------------------
 2018-09-02 07:09:19-04

SELECT '2018-09-02 07:09:19-10'::timestamp;
        timezone
------------------------
 2018-09-02 07:09:19-04

SELECT '2018-09-02 07:09:19-12'::timestamp;
        timezone
------------------------
 2018-09-02 07:09:19-04

This behavior is also shown in AT TIME ZONE:

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'America/Chicago';
        timezone
------------------------
 2018-09-02 08:09:19-04

SELECT '2018-09-02 07:09:19-10'::timestamp AT TIME ZONE 'America/Chicago';
        timezone
------------------------
 2018-09-02 08:09:19-04

SELECT '2018-09-02 07:09:19-12'::timestamp AT TIME ZONE 'America/Chicago';
        timezone
------------------------
 2018-09-02 08:09:19-04

The second use of AT TIME ZONE (#2) is for removing time zone designations by shifting the timestamp with time zone value to a different time zone and removing the time zone designation:

SELECT '2018-09-02 07:09:19-04'::timestamptz AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 06:09:19

SELECT '2018-09-02 07:09:19-04'::timestamptz AT TIME ZONE 'America/Los_Angeles';
      timezone
---------------------
 2018-09-02 04:09:19

SELECT '2018-09-02 07:09:19-04'::timestamptz AT TIME ZONE 'Asia/Tokyo';
      timezone
---------------------
 2018-09-02 20:09:19

In these cases, because the inputs are timestamp with time zone, time zone designations in the strings are significant:

SELECT '2018-09-02 07:09:19-04'::timestamptz AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 06:09:19

SELECT '2018-09-02 07:09:19-05'::timestamptz AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 07:09:19

SELECT '2018-09-02 07:09:19-06'::timestamptz AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 08:09:19

The time zone is not being added to the date and time. Rather, the full date/time/time zone value is shifted to the desired time zone (America/Chicago), and the time zone designation removed (timestamp without time zone). This is useful because normally you would need to change your TimeZone setting to see values in other time zones.

Without the cast, AT TIME ZONE inputs are assumed to be timestamp with time zone, and the local time zone is assumed if not specified:

SELECT '2018-09-02 07:09:19' AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 06:09:19

SELECT '2018-09-02 07:09:19-10' AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 12:09:19

Again notice the missing time zone designations in the results.

The most interesting queries are these two, though they return the same output as input:

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'America/Chicago' AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-02 07:09:19

SELECT '2018-09-02 07:09:19-04'::timestamptz AT TIME ZONE 'America/Chicago' AT TIME ZONE 'America/Chicago';
        timezone
------------------------
 2018-09-02 07:09:19-04

As you can see, the two AT TIME ZONE calls cancel each other out. The first creates a timestamp with time zone in the America/Chicago time zone using the supplied date and time, and then shifts the value to that same time zone, removing the time zone designation. The second creates a timestamp without time zone value in the same time zone, then creates a timestamp with time zone value using the date and time in the default time zone (TimeZone).

Using different time zones for the two calls yields useful results:

SELECT '2018-09-02 07:09:19'::timestamp AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
      timezone
---------------------
 2018-09-01 17:09:19

This gives the America/Chicago time for the supplied Asia/Tokyo time — quite useful.

I have updated the Postgres documentation to be clearer about AT TIME ZONE. Hopefully, that change and this blog post make the feature less confusing, or more so.

Friday, February 8, 2019

I worked with two companies this week to help them build open-source Postgres teams. Hopefully we will start seeing their activity in the community soon.

One tool I used to familiarize them with the Postgres community was PgLife. Written by me in 2013, PgLife presents a live dashboard of all current Postgres activity, including user, developer, and external topics. It is not just a dashboard; you can drill down into details too. All the titles on the left are clickable, as are the detail items. The plus sign after each Postgres version shows the source code changes since its release. Twitter and Slack references have recently been added.

I last mentioned PgLife here six years ago, so I thought I would mention it again. FYI, this is my 542nd blog entry. If you missed any of them, see my category index at the top of this page.

Wednesday, February 6, 2019

Thanks to a comment on my previous blog post by Kaarel, the ability to simply display the Postgres permission letters is not quite as dire as I showed. There is a function, aclexplode(), which expands the access control list (ACL) syntax used by Postgres into a table with full text descriptions. This function exists in all supported versions of Postgres. However, it was only recently documented in this commit based on this email thread, and will appear in the Postgres 12 documentation.

Since aclexplode() exists (undocumented) in all supported versions of Postgres, it can be used to provide more verbose output of the pg_class.relacl permission letters. Here it is used with the test table created in the previous blog entry:

SELECT relacl FROM pg_class WHERE relname = 'test';
                         relacl
--------------------------------------------------------
 {postgres=arwdDxt/postgres,bob=r/postgres,=r/postgres}

SELECT a.*
FROM pg_class, aclexplode(relacl) AS a
WHERE relname = 'test'
ORDER BY 1, 2;
 grantor | grantee | privilege_type | is_grantable
---------+---------+----------------+--------------
      10 |       0 | SELECT         | f
      10 |      10 | SELECT         | f
      10 |      10 | UPDATE         | f
      10 |      10 | DELETE         | f
      10 |      10 | INSERT         | f
      10 |      10 | REFERENCES     | f
      10 |      10 | TRIGGER        | f
      10 |      10 | TRUNCATE       | f
      10 |   16388 | SELECT         | f

Some columns are hard to understand. The first column is the role id of the grantor of the permission, which is 10 (postgres) for all entries. The second column is the role id who was assigned the permission. In this case, zero (0) represents public.

We can use array_agg to group permissions for each grantor/grantee combination into a single line:

SELECT a.grantor, a.grantee, array_agg(privilege_type)
FROM pg_class, aclexplode(relacl) AS a
WHERE relname = 'test'
GROUP BY 1, 2
ORDER BY 1, 2;
 grantor | grantee |                         array_agg
---------+---------+-----------------------------------------------------------
      10 |       0 | {SELECT}
      10 |      10 | {SELECT,UPDATE,DELETE,INSERT,REFERENCES,TRIGGER,TRUNCATE}
      10 |   16388 | {SELECT}

By casting the role id to text using regrole, the output is clearer (dash (-) represents public):

SELECT a.grantor::regrole, a.grantee::regrole, array_agg(privilege_type)
FROM pg_class, aclexplode(relacl) AS a
WHERE relname = 'test'
GROUP BY 1, 2
ORDER BY 1, 2;
 grantor  | grantee  |                         array_agg
----------+----------+-----------------------------------------------------------
 postgres | -        | {SELECT}
 postgres | postgres | {SELECT,UPDATE,DELETE,INSERT,REFERENCES,TRIGGER,TRUNCATE}
 postgres | bob      | {SELECT}

This is certainly better than the letter spaghetti that I showed in my previous blog post. By adding relname to the target list and removing the where clause, you can display the permissions of all database tables. By using the pg_proc table instead of pg_class, you can display verbose function permissions. This method can be used for any system table that has a column of type aclitem[].
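
For example, the same technique applied to pg_proc displays verbose function permissions (a sketch; output will vary by installation, and proacl is NULL for functions with default permissions, so those are filtered out):

SELECT p.proname, a.grantor::regrole, a.grantee::regrole,
       array_agg(a.privilege_type)
FROM pg_proc p, aclexplode(p.proacl) AS a
WHERE p.proacl IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;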

Monday, February 4, 2019

If you have looked at Postgres object permissions in the past, I bet you were confused. I get confused, and I have been at this for a long time.

The way permissions are stored in Postgres is patterned after the long directory listing of Unix-like operating systems, e.g., ls -l. Just like directory listings, the Postgres system stores permissions using single-letter indicators. r is used for read (SELECT) in both systems, while w is used for write permission in ls, and UPDATE in Postgres. The other nine letters used by Postgres don't correspond to any directory listing permission letters, e.g., d is DELETE permission. The full list of Postgres permission letters is in the GRANT documentation page; the other letters are:

D -- TRUNCATE
x -- REFERENCES
t -- TRIGGER
X -- EXECUTE
U -- USAGE
C -- CREATE
c -- CONNECT
T -- TEMPORARY

Let's look at how these letters are stored in the Postgres system catalogs by using psql's \dp (or alias \z):

CREATE TABLE test (x INTEGER);

\dp
                          Access privileges
 Schema | Name | Type  | Access privileges | Column privileges | Policies
--------+------+-------+-------------------+-------------------+----------
 public | test | table |                   |                   |

CREATE USER bob;

GRANT SELECT ON TABLE test to bob;

\dp
                              Access privileges
 Schema | Name | Type  |     Access privileges     | Column privileges | Policies
--------+------+-------+---------------------------+-------------------+----------
 public | test | table | postgres=arwdDxt/postgres+|                   |
        |      |       | bob=r/postgres            |                   |

The output of the first \dp shows no permissions, indicating that the owner is the only role who has access to this object. As soon as permissions are added for anyone else, the owner (postgres) permissions are explicitly listed with the new permissions — in this case, for bob. The /postgres at the end indicates the role who assigned the permissions.

Giving public permissions shows a line similar to bob, but, to indicate public, there is no role name before the equals sign:

GRANT SELECT ON TABLE test to PUBLIC;

\dp
                              Access privileges
 Schema | Name | Type  |     Access privileges     | Column privileges | Policies
--------+------+-------+---------------------------+-------------------+----------
 public | test | table | postgres=arwdDxt/postgres+|                   |
        |      |       | bob=r/postgres           +|                   |
        |      |       | =r/postgres               |                   |

While this method of storing permissions is certainly compact, it is not obvious. I don't remember anyone complaining about our compact permissions display, and I am not sure what I would suggest to improve it, but it certainly takes study to become proficient at interpreting it.

Wednesday, January 30, 2019

This interesting email thread explores the question of how much you can prevent or detect unauthorized database superuser activity. The main conclusions from the thread are:

It is impossible to restrict database administrator access without hindering their ability to perform their jobs

Monitoring superuser activity is the most reasonable way to detect and hopefully discourage unauthorized activity

Monitoring includes:

Assign a separate account to each administrator for auditing purposes; do not use generic/shared accounts

Use an auditing tool to record database activity, e.g., pgAudit

Use syslog to send database logs to a computer not under the database administrators' control

Record all shell command activity in a similar way



There is also a helpful summary email.

Monday, January 28, 2019

This wide-ranging email thread covers many of the challenges of adding encryption to Postgres. There is discussion of:

The need to understand the threats you are protecting against, "For anyone to offer a proper solution, you need to say what purpose your encryption will serve."

The need for layers of protection

The questionable usefulness of storage encryption, "Thus, unless you move your DB server on a regular basis, I can't see the usefulness of whole database encryption (WDE) on a static machine."

The value of encrypting network storage, "Having the 'disk' associated with a specific server encrypted can provide some level of protection from another machine which also has access to the underlying infrastructure from being able to access that data."

Credit Card industry requirements, "Non-Public Information (NPI) data should not be logged nor stored on a physical device in non-encrypted mode."

The limits of per-column encryption, "It is extremely unlikely you just want all the data in the database encrypted." (These five emails from another thread, 1, 2, 3, 4, 5, also discuss this topic.)

The many other database products that support built-in column-level encryption

As you can see, the discussion was all over the map. The Postgres project probably needs to do a better job communicating about these options and their benefits.

Friday, January 25, 2019

One frequent complaint about connection poolers is the limited number of authentication methods they support. While some of this is caused by the large amount of work required to support all 14 Postgres authentication methods, the bigger reason is that only a few authentication methods allow for the clean passing of authentication credentials through an intermediate server.

Specifically, all the password-based authentication methods (scram-sha-256, md5, password) can easily pass credentials from the client through the pooler to the database server. (This is not possible using scram with channel binding.) Many of the other authentication methods, e.g. cert, are designed to prevent man-in-the-middle attacks and therefore actively thwart passing through of credentials. For these, effectively, you have to set up two sets of credentials for each user — one for client to pooler, and another from pooler to database server, and keep them synchronized.

A pooler built into Postgres would have fewer authentication pass-through problems, though internal poolers have some down sides too, as I already stated.

Wednesday, January 23, 2019

I have already talked about external password security. What I would like to talk about now is keeping an external-password data store synchronized with Postgres.

Synchronizing the password is not the problem (the password is stored only in the external password store), but what about the existence of the user itself? If you create a user in ldap or pam, you would like that user to also be created in Postgres. Another synchronization problem is role membership: if you add or remove someone from a role in ldap, it would be nice if the user's Postgres role membership were also updated.

ldap2pg can do this in batch mode. It compares ldap and Postgres and modifies Postgres users and role memberships to match ldap. This email thread talks about a custom solution that instantly creates users in Postgres when they are created in ldap, rather than waiting for a periodic run of ldap2pg.
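At its core, the batch comparison such a tool performs is a set difference. Here is a minimal sketch of that step (my illustration, not ldap2pg's actual code; real tools must also handle role memberships, quoting, and dependent objects):

```python
def sync_statements(ldap_users, pg_users):
    """Generate SQL statements to make the Postgres user list match ldap.

    Users present in ldap but not Postgres are created;
    users present in Postgres but not ldap are dropped.
    """
    to_create = sorted(set(ldap_users) - set(pg_users))
    to_drop = sorted(set(pg_users) - set(ldap_users))
    stmts = [f'CREATE USER "{u}";' for u in to_create]
    stmts += [f'DROP USER "{u}";' for u in to_drop]
    return stmts
```

For example, if ldap lists alice and bob while Postgres has bob and carol, the sketch emits a CREATE USER for alice and a DROP USER for carol.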

Monday, January 21, 2019

As I already mentioned, passwords were traditionally used to prove identity electronically, but they are showing their weakness as computing power has increased and attack vectors have expanded. Basically, user passwords have several restrictions:

must be simple enough to remember

must be short enough to type repeatedly

must be complex enough to not be easily guessed

must be long enough to not be easily cracked (discovered by repeated password attempts) or the number of password attempts must be limited

As you can see, the simple/short and complex/long requirements are at odds, so there is always a tension between them. Users often choose simple or short passwords, and administrators often add password length and complexity requirements to counteract that, though there is a limit to the length and complexity that users will accept. Administrators can also add delays or a lockout after unsuccessful authentication attempts to reduce the cracking risk. Logging of authentication failures can sometimes help too.
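As a toy illustration of the lockout idea, here is a sketch of the policy an external authentication service might enforce. The class name and threshold are illustrative, and a real service would also expire old failures and persist state:

```python
class LoginGuard:
    """Lock an account after too many consecutive failed logins."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = {}  # user -> consecutive failure count

    def record_failure(self, user):
        self.failures[user] = self.failures.get(user, 0) + 1

    def record_success(self, user):
        # A successful login resets the consecutive-failure count.
        self.failures.pop(user, None)

    def is_locked(self, user):
        return self.failures.get(user, 0) >= self.max_failures
```

With max_failures=3, three bad attempts lock the account until an administrator intervenes or a successful login (after an unlock) clears the counter.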

While Postgres records failed login attempts in the server logs, it doesn't provide any of the other administrative tools for password control. Administrators are expected to use an external authentication service like ldap or pam, which have password management features.

For a truly sobering view of users' motivation to improve security, read this 2010 paper mentioned on our email lists. These sentences sum it up:

Since victimization is rare, and imposes a one-time cost, while security advice applies to everyone and is an ongoing cost, the burden ends up being larger than that caused by the ill it addresses. Security is not something users are offered and turn down. What they are offered and do turn down is crushingly complex security advice that promises little and delivers less. In the model we set forward it is not users who need to be better educated on the risks of various attacks, but the security community. Security advice simply offers a bad cost-benefit tradeoff to users.

Combine this with the limited value of password policies, the difficulty of determining whether a site is using a valid tls/ssl certificate, and the questionable value of certificate errors, and it makes you want to rely as little as possible on users for security.

Wednesday, January 16, 2019

I mentioned previously that it is possible to implement certificate authentication on removable media, e.g., a usb memory stick. This blog post shows how it is done. First, root and server certificates and key files must be created:

$ cd $PGDATA

# create root certificate and key file
$ openssl req -new -nodes -text -out root.csr -keyout root.key -subj "/CN=root.momjian.us"
$ chmod og-rwx root.key
$ openssl x509 -req -in root.csr -text -days 3650 -extfile /etc/ssl/openssl.cnf -extensions v3_ca -signkey root.key -out root.crt

# create server certificate and key file
$ openssl req -new -nodes -text -out server.csr -keyout server.key -subj "/CN=momjian.us"
$ chmod og-rwx server.key
$ openssl x509 -req -in server.csr -text -days 365 -CA root.crt -CAkey root.key -CAcreateserial -out server.crt

Then ssl must be enabled, cert authentication specified, and the server restarted:

# configure server for SSL and client certificate authentication
$ psql -c 'ALTER SYSTEM SET ssl = ON;' test
$ psql -c "ALTER SYSTEM SET ssl_ca_file = 'root.crt';" test

# configure pg_hba.conf for 'cert'
$ sed 's/host/#host/' pg_hba.conf > /tmp/$$ && cat /tmp/$$ > pg_hba.conf && rm /tmp/$$
$ echo 'hostssl all all 127.0.0.1/32 cert' >> pg_hba.conf

# restart server
$ pg_ctl stop
$ pg_ctl -l server.log start

Finally, the client must have a copy of the root certificate and a client certificate must be created:

# copy root certificate to the client
$ mkdir ~/.postgresql 2> /dev/null
$ cd ~/.postgresql
$ cp $PGDATA/root.crt .

# create client certificate and key file
# Use of -nodes would prevent a required password
$ openssl req -new -text -out postgresql.csr -keyout postgresql.key -subj "/CN=postgres"
$ chmod og-rwx postgresql.key
$ openssl x509 -req -in postgresql.csr -text -days 365 -CA $PGDATA/root.crt -CAkey $PGDATA/root.key -CAcreateserial -out postgresql.crt

And now the test:

$ psql -c 'SELECT version();' 'host=momjian.us dbname=test sslmode=verify-full'
Enter PEM pass phrase:
                                              version
---------------------------------------------------------------------------------------------------
 PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 64-bit

With this all in place, we can implement removable certificate authentication. Insert a usb memory stick and give it a name, which causes the usb file system to appear at a predictable file path each time it is inserted. (The method varies among operating systems.) Now, move the ~/.postgresql directory to the usb memory stick and create a symbolic link to it from the local file system:

$ mv ~/.postgresql '/media/bruce/Bruce M USB/.postgresql'
$ ln -s '/media/bruce/Bruce M USB/.postgresql' ~/.postgresql

With the usb memory stick inserted, psql runs normally, but when it is ejected, the symbolic link points to nothing and an error is generated:

$ psql -c 'SELECT version();' 'host=momjian.us dbname=test sslmode=verify-full'
psql: root certificate file "/var/lib/postgresql/.postgresql/root.crt" does not exist
Either provide the file or change sslmode to disable server certificate verification.

In conclusion, cert authentication with a password-enabled private key is already two-factor authentication — the private key file ("Something you have"), and its password ("Something you know"). By storing the private key file on a usb memory stick, the "Something you have" becomes independent of the computer used to access the database. As I mentioned before, piv devices have even more security advantages.

Monday, January 14, 2019

Traditionally, passwords were used to prove identity electronically. As computing power has increased and attack vectors expanded, passwords are proving insufficient. Multi-factor authentication uses more than one authentication factor to strengthen authentication checking. The three factors are:

Something you know, e.g., password, pin

Something you have, e.g., cell phone, cryptographic hardware

Something you are, e.g., fingerprint, iris pattern, voice

Postgres supports the first option, "Something you know," natively using local and external passwords. It supports the second option, "Something you have," using cert authentication. If the private key is secured with a password, that adds a second required factor for authentication. Cert only supports private keys stored in the file system, like a local file system or a removable usb memory stick.

One enhanced authentication method allows access to private keys stored on piv devices, like the YubiKey. There are two advantages of using a piv device compared to cert:

Requires a pin, like a private-key password, but locks the device after three incorrect pin entries (file-system-stored private keys protected with passwords can be offline brute-force attacked)

While the private key can be used to decrypt and sign data, it cannot be copied from the piv device, unlike one stored in a file system

Unfortunately, libpq does not support piv access directly, though it can be accessed using external authentication methods like pam. Google Authenticator and FreeOTP can also be used to add a second factor of authentication to pam and other external authentication methods.
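Tools like Google Authenticator and FreeOTP implement totp (rfc 6238), which is simple enough to sketch with only the standard library. The function name is mine, and a real deployment should use a maintained library rather than this sketch:

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, digits=6, step=30):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant).

    secret_b32 is the shared secret in base32, as QR-code
    provisioning apps typically present it.
    """
    key = base64.b32decode(secret_b32)
    now = time.time() if for_time is None else for_time
    counter = int(now // step)
    msg = struct.pack(">Q", counter)           # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                 # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the rfc 6238 test secret (the ASCII bytes "12345678901234567890") and time 59, this produces the published 8-digit vector 94287082.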

The third type of authentication factor, "Something you are," also requires an external authentication method. It is unclear if Postgres should support more authentication methods directly or improve documentation about how to integrate them with existing external authentication methods.

Wednesday, January 2, 2019

Postgres supports fourteen authentication methods — that might seem like a lot, but Postgres is used in many environments, and it has to support whatever methods are being used in those environments. The fourteen methods can seem confusing, but they are easier to understand in categories:

absolute: trust, reject
always allow or reject the connection

local password: scram-sha-256, md5, password
compare a user-supplied password with something stored in the database

external password: ldap, pam, radius, bsd
compare a user-supplied password with something stored in an external service

trusted network: peer, ident
rely on the network connection to authenticate

trusted tokens: gss, sspi
rely on possession of a token issued by a trusted authentication server

certificate authority: cert
uses access to the private key of a certificate signed by a trusted certificate authority

So, there is one absolute and five conditional classes of authentication.
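For illustration, a pg_hba.conf might draw one method from several of these classes; the addresses and ldap options below are hypothetical:

```
# TYPE   DATABASE  USER  ADDRESS        METHOD
local    all       all                  peer
host     all       all   127.0.0.1/32   scram-sha-256
host     all       all   10.0.0.0/8     ldap ldapserver=ldap.example.com ldapbasedn="dc=example,dc=com"
hostssl  all       all   0.0.0.0/0      cert
```

The first matching line wins, so more specific addresses should appear before broader ones.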