Ruby PostgreSQL has_one view

Sometimes, aggregating data can become overly complex in a normal ActiveRecord model. Because Rails works well with SQL views, we can create associations to SQL views that aggregate data for us, simplifying our models and potentially speeding up queries.

Modeling an inbox

I've got an inbox. A cat inbox. For real.

There are many possible implementations for modeling an inbox. I've gone with a relatively simple approach. Two users participate in a conversation, sending messages back and forth to each other. The Conversation model has a subject, but the body of the initial message is part of the Message object.

# app/models/conversation.rb class Conversation < ActiveRecord :: Base # fields: to_id, from_id, subject belongs_to :to , class_name: "User" belongs_to :from , class_name: "User" has_many :messages , dependent: :destroy , inverse_of: :conversation end # app/models/message.rb class Message < ActiveRecord :: Base # fields: user_id, conversation_id, body belongs_to :conversation , inverse_of: :messages belongs_to :user end

After the initial message, the two participants on the conversation send messages back and forth. A user may have any number of conversations with other users. As such, the main inbox view must list the conversations a user is a participant on, as well as some summary information about that conversation.

For our purposes, we've decided on an HTML table view with the following columns:

From - Who the original message was sent from

To - The original recipient of the message

Message - the Subject of the conversation, as well as the first line of the most recent message

Last post - The date/time of the most recent message

Replies - The number of replies on the conversation (excluding the first message)

Although the subject is part of the conversation itself, everything else comes from its various associations. This is the view, which reveals the expected interface each conversation object should have:

%table #inbox %thead %tr %th From %th To %th Message %th Last post %th Replies %tbody - conversations . each do | conversation | %tr %td = conversation . from_name %td = conversation . to_name %td %p %strong = conversation . subject = conversation . most_recent_message_body %td = time_ago_in_words ( conversation . most_recent_message_sent_at ) ago %td = conversation . reply_count

Let's explore a typical way to model this in our model directly.

A typical Ruby implementation

# app/models/conversation.rb class Converation < ActiveRecord :: Base # associations, etc... def most_recent_message_body most_recent_message . body if most_recent_message end def most_recent_message_sent_at most_recent_message . created_at if most_recent_message end def reply_count messages . size - 1 end def to_name to . name end def from_name from . name end private def most_recent_message @most_recent_message ||= messages . by_date . first end end # app/models/message.rb class Message < ActiveRecord :: Base # associations, etc... def self . by_date order ( "created_at DESC" ) end end

This approach is fairly straightforward. We obtain the most_recent_message_body and most_recent_message_sent_at from the most recent message, which is trivial after we've ordered the messages association by date. The to_name and from_name methods are delegated to their respective associations. And reply_count is simple the total number of messages, minus one (the initial message doesn't count as a "reply").

This approach offers a number of advantages. For one, it is familiar. I believe most Rails developers would be able to understand exactly what's going on above. It also locates all of the domain logic within the Conversation model, making it easy to find.

Having everything in the Conversation model is actually a blessing and a curse. Although everything is easy to find, the model is also quickly becoming bloated. It may not seem like much right now, but as more information is added to the inbox, it will become unruly.

The other problem with the above is the multitude of N+1 queries that it has introduced. With only 3 conversations in play, loading the inbox outputs a log like this:

Started GET "/" for 127.0.0.1 at 2013-02-11 09:49:02 -0600 Connecting to database specified by database.yml Processing by InboxesController#show as HTML User Load (12.8ms) SELECT "users".* FROM "users" LIMIT 1 Conversation Load (0.6ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id)) User Load (18.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 2 LIMIT 1 User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1 Message Load (12.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1 (0.6ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 7 User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = 3 LIMIT 1 CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1 Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1 (0.4ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 8 CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1 CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 2 LIMIT 1 Message Load (0.5ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1 (0.4ms) SELECT COUNT(*) FROM "messages" WHERE "messages"."conversation_id" = 9 Rendered inboxes/show.html.haml within layouts/application (683.9ms) Completed 200 OK in 691ms (Views: 272.5ms | ActiveRecord: 418.1ms)

Eager-loading

We can definitely cut down on the N+1 query problem by introducing eager loading. In our controller, the conversations exposure is currently defined thusly:

# app/controllers/inboxes_controller.rb class InboxesController < ApplicationController expose ( :user ) { User . first } expose ( :conversations ) { user . conversations } end

Let's change that to eagerly load its associations:

expose ( :conversations ) { user . conversations . includes ( :messages , :to , :from ) }

With eager-loading in place, the log now looks slightly more reasonable:

Started GET "/" for 127.0.0.1 at 2013-02-11 09:55:24 -0600 Processing by InboxesController#show as HTML User Load (0.3ms) SELECT "users".* FROM "users" LIMIT 1 Conversation Load (0.3ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id)) Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" IN (7, 8, 9) User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2) User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (2, 3, 1) Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1 Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1 Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1 Rendered inboxes/show.html.haml within layouts/application (9.5ms) Completed 200 OK in 13ms (Views: 10.0ms | ActiveRecord: 2.4ms)

There are more optimizations we could make here in Ruby land. But data transformation and aggregation is something that databases are good at. We can use a native feature of SQL to aggregate information for us: views.

SQL views

A SQL view is essentially a virtual table. It can be queried just like a normal table, but does not physically store anything itself. Instead, a view has a query definition that it uses to represent its data.

In our case, SQL views can allow us to treat a complex SQL query as a table, abstracting away the complexity into view itself. SQL views are also read-only, and therefore are usually only used for querying, but not updating data directly.

ActiveRecord plays nicely with SQL views out of the box. It considers a SQL view a normal table, and all associations and querying methods work like they would with a normal table, with one exception: the records are read-only.

Add a migration for the view

Let's create a view to handle the to_name and from_name methods on conversation. We can do this in a normal migration, but it needs to be created with raw SQL:

class CreateConversationSummaries < ActiveRecord :: Migration def up execute <<- SQL CREATE VIEW conversation_summaries AS SELECT ... SQL end def down execute 'DROP VIEW conversation_summaries' end end

This is the basic syntax for adding a view with ActiveRecord migrations. Our view needs to incorporate to_name and from_name , so let's add those fields:

CREATE VIEW conversation_summaries AS SELECT c . id , f . name as from_name , t . name as to_name FROM conversations c inner join users t on t . id = c . to_id inner join users f on f . id = c . from_id

After we migrate our database, we can use our database console to verify that we see what we expect:

mailbox_development=# select * from conversation_summaries; id | from_name | to_name ----+-----------------+----------------- 7 | Felionel Richie | Cat Stevens 8 | Nelly Purrtado | Cat Stevens 9 | Cat Stevens | Felionel Richie (3 rows)

Cool. The id corresponds to the conversation, and to_name and from_name columns come from the users table, but it's all displayed to us as one table.

ActiveRecord associations for views

Now that our view exists, we can integrate it into our application:

class Conversation < ActiveRecord :: Base class Summary < ActiveRecord :: Base self . table_name = "conversation_summaries" self . primary_key = "id" belongs_to :conversation , foreign_key: "id" end has_one :summary , foreign_key: "id" end

Let's break down what's going on here.

I've chosen to nest the Summary model within the Conversation namespace, mostly to call out the fact that we're doing something non-standard. Also, the Summary class only makes sense in the context of a Conversation. For that reason, we need to manually set the name of the table.

We must also choose a primary key, because Rails cannot infer it for SQL views. The association itself should be familiar. It works like a normal has_one / belongs_to relationship, except that we override the foreign key.

Now that the relationships are set up, let's actually take advantage of our new view by changing the implementation of the to_name and from_name methods.

class Conversation < ActiveRecord :: Base # ... def to_name # Used to be to.name summary . to_name end def from_name # Used to be from.name summary . from_name end end

One the biggest benefits about this approach is that we can eager-load a view assocation. We no longer need the to or from associations eager-loaded, since we are no longer using any attributes from them in the view. Let's update our controller's exposure to only eager-load the necessary parts:

expose ( :conversations ) { user . conversations . includes ( :summary , :messages ) }

And when we visit the inbox again, the log looks like this:

Started GET "/" for 127.0.0.1 at 2013-02-11 14:26:12 -0600 Processing by InboxesController#show as HTML User Load (0.5ms) SELECT "users".* FROM "users" LIMIT 1 Conversation Load (0.4ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id)) Conversation::Summary Load (0.6ms) SELECT "conversation_summaries".* FROM "conversation_summaries" WHERE "conversation_summaries"."id" IN (7, 8, 9) Message Load (0.3ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" IN (7, 8, 9) Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 7 ORDER BY created_at DESC LIMIT 1 Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 8 ORDER BY created_at DESC LIMIT 1 Message Load (0.4ms) SELECT "messages".* FROM "messages" WHERE "messages"."conversation_id" = 9 ORDER BY created_at DESC LIMIT 1 Rendered inboxes/show.html.haml within layouts/application (10.4ms) Completed 200 OK in 13ms (Views: 9.5ms | ActiveRecord: 2.9ms)

That's definitely an improvement, albeit a small one. We've pushed data from the user model into our SQL view, but we don't need to stop there.

Push it down!

Let's update our view migration to include more aggregated information about each conversation.

class CreateConversationSummaries < ActiveRecord :: Migration def up execute <<- SQL CREATE VIEW conversation_summaries AS SELECT c.id, f.name as from_name, t.name as to_name, m.body as most_recent_message_body, m.created_at as most_recent_message_sent_at, (select count(*) from messages m2 where m2.conversation_id = c.id) - 1 as reply_count FROM conversations c inner join users t on t.id = c.to_id inner join users f on f.id = c.from_id left outer join ( select distinct on(conversation_id) conversation_id, body, created_at from messages m1 order by conversation_id, created_at desc ) m ON m.conversation_id = c.id SQL end def down execute 'DROP VIEW conversation_summaries' end end

After running rake db:migrate:redo , we can verify that everything is still working as expect in the database console:

mailbox_development=# select * from conversation_summaries; id | from_name | to_name | most_recent_message_body | most_recent_message_sent_at | reply_count ----+-----------------+-----------------+----------------------------------------+-----------------------------+------------- 7 | Felionel Richie | Cat Stevens | Say you. Say meow. | 2013-02-08 02:45:27.07712 | 2 8 | Nelly Purrtado | Cat Stevens | Except that I'm a cat | 2013-02-05 16:45:27.088292 | 0 9 | Cat Stevens | Felionel Richie | I'm sorry that you're feeling that way | 2013-01-30 16:45:27.092443 | 1 (3 rows)

That's a lot of SQL! But actually, all I've added are one join to a subquery, and a subselect. Let's review both of these changes.

There are many ways to grab the most recent message for a conversation in SQL, including using window functions. The method I've opted for here is a subquery in the table expression. The subquery alone would return rows for only the most recent messages for each conversation:

conversation_id | body | created_at ----------------+----------------------------------------+---------------------------- 7 | Say you. Say meow. | 2013-02-08 02:45:27.07712 8 | Except that I'm a cat | 2013-02-05 16:45:27.088292 9 | I'm sorry that you're feeling that way | 2013-01-30 16:45:27.092443

By joining with only the most recent message per conversation, we avoid duplicate rows and only get the body and created_at columns from the most recent message. Then, joining against this subquery, we can add the body and created_at to the list of projections, naming them most_recent_message_body and most_recent_message_sent_at , respectively.

The other thing we've added to the view this iteration is the reply_count column, which is a subselect to get the count. We also subtract 1, just as before.

Let's take a look at our Conversation model now:

# before class Conversation < ActiveRecord :: Base belongs_to :to , class_name: "User" belongs_to :from , class_name: "User" has_many :messages , dependent: :destroy , inverse_of: :conversation def most_recent_message_body most_recent_message . body if most_recent_message end def most_recent_message_sent_at most_recent_message . created_at if most_recent_message end def reply_count [ messages . size - 1 , 0 ]. max end def to_name to . name end def from_name from . name end private def most_recent_message @most_recent_message ||= messages . by_date . first end end # after class Conversation < ActiveRecord :: Base class Summary < ActiveRecord :: Base self . table_name = "conversation_summaries" self . primary_key = "id" belongs_to :conversation , foreign_key: "id" end belongs_to :to , class_name: "User" belongs_to :from , class_name: "User" has_many :messages , dependent: :destroy , inverse_of: :conversation has_one :summary , foreign_key: "id" delegate :most_recent_message_sent_at , :most_recent_message_body , :reply_count , :to_name , :from_name , to: :summary end

With much of our data transformation and aggregation in our SQL view, our model has become trivially simple. It literally only contains assocations and delegation now. We update our exposure to only eager-load the conversation summary:

expose ( :conversations ) { user . conversations . includes ( :summary ) }

Now, reloading the page yields the following log output:

Started GET "/" for 127.0.0.1 at 2013-02-11 15:37:49 -0600 Processing by InboxesController#show as HTML User Load (1.0ms) SELECT "users".* FROM "users" LIMIT 1 Conversation Load (0.2ms) SELECT "conversations".* FROM "conversations" WHERE (1 IN (from_id, to_id)) Conversation::Summary Load (0.8ms) SELECT "conversation_summaries".* FROM "conversation_summaries" WHERE "conversation_summaries"."id" IN (7, 8, 9) Rendered inboxes/show.html.haml within layouts/application (5.5ms) Completed 200 OK in 8ms (Views: 6.0ms | ActiveRecord: 2.0ms)

Now we see some real improvement. All N+1 queries are gone, replaced instead with the eager-loading of the the Conversation::Summary model.

Real World Benefits

I used this technique in a real-world application. It helped abstract some of the mundane details of the inbox and allowed us to think about each conversation at a higher level with a summary.

In fact the app included even more business rules than I've included here. Each conversation had to include a read/unread status that updated with each sent message. Although it was easily implemented in pure Ruby, it cluttered the model and created yet more N+1 queries in the app view.

The inbox also had to be sorted by the most recent message date, so that the conversation with the most recent activity would appear first in the list. This kind of sorting without SQL is both cumbersome and inefficient in Ruby; you have to load all messages for each conversation. With the SQL view, it was as simple as changing the scope from user.conversations.include(:summary) to user.conversations.include(:summary).order("conversation_summaries.most _recent_message_sent_at DESC") .

Conclusion

Any time that we push stuff into the database, we make a tradeoff. In this case, when we move data transformation into the SQL view, we sacrifice the co-location of the conversation model and the definition of its summary. With the summary definition located in the database, there's one extra layer of indirection.

The other tradeoff is that any time we'd like to make a non-trivial change the view, we actually have to create an entirely new view, replacing the old one. If for example, we knew that our inbox was likely to change or add fields, the SQL view approach might be too brittle.

On the other hand, we effectively removed N+1 queries from our application and simplified our model considerably. By abstracting the conversation's summary into a model backed by a SQL view, we're able to think of the Summary as an object in its own right. This provides a cognitive simplification, but also yields performance gains as the dataset grows.

It may not be right for every situation, but knowing and understanding how we can use SQL views in our Rails applications adds another tool to our toolbelt.

Example app

As before, while writing this post, I created a sample Rails app to iterate quickly. I used TDD to write the pure-ruby approach, and reused the specs while I "refactored" the implementation to the subsequent approaches. Of particular note is the history of the Conversation model, which mirrors the code above.