Using PostgreSQL and hstore with Rails By Nando Vieira February 13, 2015 Read in 4 minutes

Rails 4+ introduced new features all over the place, but PostgreSQL support was one that has benefited the most, with many new column types.

One of these column types is called hstore, a column based on a key-value data structure. This feature is extremely useful when you have a data structure that can vary, but that needs to be queried eventually.

In Rails you can use serialized attributes, but this approach has a problem: you can’t query the stored value. This is not the case with hstore, since you can use functions and operators to query values and keys.

The downside of hstore is that all values are stored as strings. So if value type is important, you’ll have to coerce it to the proper type before using it.

About PostgreSQL support

hstore is supported by PostgreSQL 9.2+. If you’re using Ubuntu Trusty Tahr 14.04 LTS, you’re good to go. Older versions must use the PostgreSQL’s official Debian repository.

Using hstore

To enable hstore, you must issue the CREATE EXTENSION command.

CREATE EXTENSION IF NOT EXISTS hstore ;

If you’re running Rails, you can use enable_extension . Let’s add a column called settings to the users table.

class AddPreferencesOnUsers < ActiveRecord :: Migration def change enable_extension "hstore" add_column :users , :preferences , :hstore end end

If you’re going to query this column, you must add an index. There are two types you can use: GiST and GIN.

GIN indexes are three types faster to search, but they take more time to index. They also take more disk space. Use it when you have more than 100K unique terms.

GiST indexes are slower than GIN indexes, but they’re faster to update. Use it when you have up to 100K unique terms.

You mileage may very, so make sure you benchmark everything before changing your database setup.

You can define the index on your migration file with the :using option.

class AddPreferencesToUsers < ActiveRecord :: Migration def change enable_extension 'hstore' add_column :users , :preferences , :hstore add_index :users , :preferences , using: :gin end end

Now you have to identify this column on your model with ActiveRecord::Store::ClassMethods#store_accessor .

class User < ActiveRecord :: Base store_accessor :preferences end user = User . new user . preferences = { github: "fnando" , twitter: "fnando" } user . save! user . reload user . preferences [ :twitter ] #=> fnando

As I said before, hstore will store all values as string. And if you provide a different type, it’ll be coerced before storing it to the database.

user . preferences [ :newsletter ] = false user . save! user . reload user . preferences [ :newsletter ]. class #=> String

This can be a problem, but we have an easy solution for this.

Using the Virtus gem

Virtus is a gem that can define atributes with type coercion. So let’s create a class called UserPreferences , that will represent our user’s preferences.

class UserPreferences include Virtus . model attribute :github , String attribute :twitter , String attribute :newsletter , Boolean end

Virtus will do all the boring work, including the UserPreferences#initialize method with mass-assignment support, which also performs type coercion.

preferences = UserPreferences . new ( user . preferences ) preferences . newsletter . class #=> FalseClass

To avoid having to explicitly call UserPreferences.new , we can define a custom serializer that will transparently convert the hstore value into a UserPreferences instance.

Instead of using ActiveRecord::Base.store_accessor we’ll use ActiveRecord::Base.serialize , defining UserPreferences as our serializer.

class User < ActiveRecord :: Base serialize :preferences , UserPreferences end

A serializer is any object that implements the methods dump and load .

dump will be called whenever the object is going to be persisted on the database.

will be called whenever the object is going to be persisted on the database. load will receive the hstore value (a Hash) and must return the coerced value.

class UserPreferences include Virtus . model attribute :github , String attribute :twitter , String attribute :newsletter , Boolean def self . dump ( preferences ) preferences . to_hash end def self . load ( preferences ) new ( preferences ) end end

That was easy! Now let’s see if it’s working properly.

user = User . first user . preferences . class #=> UserPreferences user . preferences . newsletter = true user . save! user . reload user . preferences . newsletter . class #=> TrueClass

What about querying the hstore column? That’s our next topic.

Querying hstore columns

Now we’ll explore some basic operators that you can use to query the hstore column. There are other operators you can use, so make sure you read the documentation.

To test indexes without having too many rows, disable sequential scan before running EXPLAIN .

SET enable_seqscan = FALSE ;

Find all users that have the github key.

User . where ( "preferences ? :key" , key: "github" )

Find all users without the github key.

User . where ( "not preferences ? :key" , key: "github" )

Find an user with a specific Github account

User . where ( "preferences @> hstore(:key, :value)" , key: "github" , value: "fnando" ). first

Find all users that have a Twitter account with the substring %nan%

User . where ( "preferences -> :key LIKE :value" , key: "twitter" , value: "%nan%" )

Find all users that opted-in for newsletter

User . where ( "preferences -> newsletter = :value" , value: 'true' )

Only the operators @> , ? , ?& and ?| will use the index. Avoid performing queries like the two last examples, since it can be really slow depending on the number of records you have. But if you have to query a hstore field, make sure you create an index like the following:

CREATE INDEX preferences_newsletter_key_on_users_index ON users (( preferences -> 'newsletter' ));

What about JSON?

PostgreSQL also has JSON column types. If you’re already using the new PostgreSQL 9.4, create a JSONB column and you’ll benefit from GIN/GiST indexes, pretty much like hstore. That’s not the case for JSON columns, which is available on older versions.

Querying a JSON column will always perform a sequential scan, unless you create an expression index; otherwise PostgreSQL will fetch record by record until the condition is met, which can be a really slow operation.

Wrapping up

PostgreSQL’s hstore is a nice alternative for embedding documents without losing the benefits of a relational database.

The fact that all values are stored as strings can be a problem if you need to perform other operations, because you have to cast values beforehand on both database and application layers. We’ll see how to avoid this problem by using JSONB on the next article.