June 24, 2015 / by Jörg Rathlev / Software engineer / @

PostgreSQL offers a JSON column type, but how do you migrate existing data into a JSON-based structure?

We recently had a case where we wanted to migrate some data in an existing table from a fixed set of columns into a more flexible JSON column.

Consider the following example:

CREATE TABLE example ( id serial , name text ); INSERT INTO example ( name ) VALUES ( 'test' ), ( 'foo' ), ( 'bar' );

This table is now modified by adding an additional column of type JSON. The goal is to migrate the data from the name column into a field in the JSON object, ideally with a different field name.

Adding the new column is simple:

ALTER TABLE example ADD COLUMN target JSON ;

However, how can you migrate the existing data into the new JSON column?

What doesn’t work?

Unfortunately, creating arbitrary JSON structures with the JSON functions offered by PostgreSQL is not always straightforward. The main problem is that the row constructor loses the column names, which become the field names of the JSON object created with row_to_json :

UPDATE example SET target = row_to_json ( row ( name ));

Note how this creates JSON objects where the field name is simply “f1”:

id | name | target ----+------+--------------- 1 | test | {"f1":"test"} 2 | foo | {"f1":"foo"} 3 | bar | {"f1":"bar"}

If you’re simply selecting data, you can use a common table expression to first select the columns that you need, and then use row_to_json on its rows, as shown below. However, this does not work for an update statement, at least not if none of the columns have a unique constraint.

WITH ex ( json_field_name ) AS ( SELECT name AS json_field_name FROM example ) SELECT row_to_json ( ex . * ) FROM ex ;

Solution

For PostgreSQL versions before 9.4, the only way to create a JSON structure with arbitrary field names seems to be to construct the JSON structure as a string and then cast it to JSON (if you’re aware of a better solution, please let me know!):

UPDATE example SET target = ( '{"json_field_name":' || to_json ( name ) || '}' ):: json ;

id | name | target ----+------+---------------------------- 1 | test | {"json_field_name":"test"} 2 | foo | {"json_field_name":"foo"} 3 | bar | {"json_field_name":"bar"}

The to_json function converts the value of the source column to a JSON value and ensures that, for example, any quote characters occuring in strings will be properly escaped in the resulting JSON object.

If you’re using PostgreSQL 9.4, and only if all your values are strings, you can also use the new json_object function:

UPDATE example SET target = json_object ( array [ 'json_field_name' , name ]);

The result is (in this case) the same. However, this does not work if you want to migrate a column of, for example, integers because json_object requires an array of strings as its input.

Conclusion

PostgreSQL’s JSON support is pretty good (and getting better with each release), but constructing JSON objects with arbitrary structure and field names is not as simple as it could be, especially not within the constraints of an update statement. A simple workaround is to create the JSON object as a string and then explicitly cast it to JSON.