There are different reasons why you might consider having multiple databases in your Ruby on Rails application. In my specific case scenario, I needed to store large quantities of data representing user behavior: clicks, pages visited, historical changes, and so on.

This kind of databases generally are not mission critical, and grow much faster (and larger) than most databases. Their requirements are often different: for instance, they need more storage space, are more tolerant in the face of hardware or software failures, and are write-intensive. For these reasons, sometimes it is interesting to separate them from your application’s primary database. Often, non-RDBMS databases are chosen for these kind of tasks, something which is however beyond the scope of this article.

I googled and read many different solutions, however I couldn’t find one that was able to fully cover how to:

Have different and isolated migrations and schemas for every database.

Use rails generators to create new migrations for every database, independently.

Offer database-specific rake tasks for the most common database operations (i.e. like the ones available for the primary database).

Integrate with RSpec’s default spec task.

task. Work with Database Cleaner.

Work on Heroku.

This is my take on how to solve all of these – and have a fully working multiple database solution for your Rails application.

Create the custom database files

For the purpose of this tutorial, we’re going to set up a second database called Stats. To do so, we’re going to duplicate how Rails handles the primary database, and stick to conventions.

First of all, create the file config/database_stats.yml and populate it as you do with the primary database’s config file. Your file will look something like this:

config/database_stats.yml development: adapter: postgresql encoding: utf8 host: localhost pool: 10 database: myapp_stats_development username: postgres password: test: adapter: postgresql encoding: utf8 host: localhost pool: 10 database: myapp_stats_test username: postgres password: production: adapter: postgresql encoding: utf8 url: <%= ENV["DATABASE_STATS_URL"] %> pool: <%= ENV["DB_POOL"] || 5 %> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 development : adapter : postgresql encoding : utf8 host : localhost pool : 10 database : myapp_stats_development username : postgres password : test : adapter : postgresql encoding : utf8 host : localhost pool : 10 database : myapp_stats_test username : postgres password : production : adapter : postgresql encoding : utf8 url : < % = ENV ["DATABASE_STATS_URL"] % > pool : < % = ENV ["DB_POOL"] | | 5 % >

Note that I’ve given specific names to the databases, trying to follow as closely as possible Rails’ naming conventions. Also, I’ve set the database production url to an environment variable DATABASE_STATS_URL. This will allow us to easily set this variable to a secondary database when deploying to Heroku.

We’re now going to create a directory that will hold the schema and all the migrations of the Stats database, so that it will have its own files clearly isolated from the primary database. We are basically going to duplicate Rails’ primary database db directory.

Create the directory db_stats in the Rails root and ensure to copy the structure and files of the primary database db directory within it. You will have something like:

-- db |-- migrate schema.rb seeds.rb -- db_stats |-- migrate schema.rb seeds.rb 1 2 3 4 5 6 7 8 -- db |-- migrate schema.rb seeds.rb -- db_stats |-- migrate schema.rb seeds.rb

The created files schema.rb and seeds.rb, together with the migrate directory, should just be empty.

Add Rake tasks

To handle the Stats database, and allow for its creation, migrations, schema dumping and other functionalities we’re going to need custom Rake tasks. These tasks will provide us with the same functionalities that Rails provides us for the primary database.

Create a new file lib/tasks/db_stats.rake, and paste the following:

lib/tasks/db_stats.rake task spec: ["stats:db:test:prepare"] namespace :stats do namespace :db do |ns| task :drop do Rake::Task["db:drop"].invoke end task :create do Rake::Task["db:create"].invoke end task :setup do Rake::Task["db:setup"].invoke end task :migrate do Rake::Task["db:migrate"].invoke end task :rollback do Rake::Task["db:rollback"].invoke end task :seed do Rake::Task["db:seed"].invoke end task :version do Rake::Task["db:version"].invoke end namespace :schema do task :load do Rake::Task["db:schema:load"].invoke end task :dump do Rake::Task["db:schema:dump"].invoke end end namespace :test do task :prepare do Rake::Task["db:test:prepare"].invoke end end # append and prepend proper tasks to all the tasks defined here above ns.tasks.each do |task| task.enhance ["stats:set_custom_config"] do Rake::Task["stats:revert_to_original_config"].invoke end end end task :set_custom_config do # save current vars @original_config = { env_schema: ENV['SCHEMA'], config: Rails.application.config.dup } # set config variables for custom database ENV['SCHEMA'] = "db_stats/schema.rb" Rails.application.config.paths['db'] = ["db_stats"] Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"] Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"] Rails.application.config.paths['config/database'] = ["config/database_stats.yml"] end task :revert_to_original_config do # reset config variables to original values ENV['SCHEMA'] = @original_config[:env_schema] Rails.application.config = @original_config[:config] end end 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 task spec : [ "stats:db:test:prepare" ] namespace : stats do namespace : db do | ns | task : drop do Rake :: Task [ "db:drop" ] . invoke end task : create do Rake :: Task [ "db:create" ] . invoke end task : setup do Rake :: Task [ "db:setup" ] . invoke end task : migrate do Rake :: Task [ "db:migrate" ] . invoke end task : rollback do Rake :: Task [ "db:rollback" ] . invoke end task : seed do Rake :: Task [ "db:seed" ] . invoke end task : version do Rake :: Task [ "db:version" ] . invoke end namespace : schema do task : load do Rake :: Task [ "db:schema:load" ] . invoke end task : dump do Rake :: Task [ "db:schema:dump" ] . invoke end end namespace : test do task : prepare do Rake :: Task [ "db:test:prepare" ] . invoke end end # append and prepend proper tasks to all the tasks defined here above ns . tasks . each do | task | task . enhance [ "stats:set_custom_config" ] do Rake :: Task [ "stats:revert_to_original_config" ] . invoke end end end task : set_custom _ config do # save current vars @original_config = { env_schema : ENV [ 'SCHEMA' ] , config : Rails . application . config . dup } # set config variables for custom database ENV [ 'SCHEMA' ] = "db_stats/schema.rb" Rails . application . config . paths [ 'db' ] = [ "db_stats" ] Rails . application . config . paths [ 'db/migrate' ] = [ "db_stats/migrate" ] Rails . application . config . paths [ 'db/seeds' ] = [ "db_stats/seeds.rb" ] Rails . application . config . paths [ 'config/database' ] = [ "config/database_stats.yml" ] end task : revert_to_original _ config do # reset config variables to original values ENV [ 'SCHEMA' ] = @original_config [ : env_schema ] Rails . application . config = @original_config [ : config ] end end

This needs a little explanation: let’s break up this file in its main sections. First of all, we simply provide “proxies” to standard Rails database tasks, in a newly created Rake namespace stats:db:

task :drop do Rake::Task["db:drop"].invoke end task :create do Rake::Task["db:create"].invoke end task :setup do Rake::Task["db:setup"].invoke end task :migrate do Rake::Task["db:migrate"].invoke end [...] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 task : drop do Rake :: Task [ "db:drop" ] . invoke end task : create do Rake :: Task [ "db:create" ] . invoke end task : setup do Rake :: Task [ "db:setup" ] . invoke end task : migrate do Rake :: Task [ "db:migrate" ] . invoke end [ . . . ]

Then, we loop all of these tasks, and ensure the task stats:set_custom_config is run before and the task stats:revert_to_original_config after every one of the “proxy” tasks:

# append and prepend proper tasks to all tasks defined in stats:db namespace ns.tasks.each do |task| task.enhance ["stats:set_custom_config"] do Rake::Task["stats:revert_to_original_config"].invoke end end 1 2 3 4 5 6 # append and prepend proper tasks to all tasks defined in stats:db namespace ns . tasks . each do | task | task . enhance [ "stats:set_custom_config" ] do Rake :: Task [ "stats:revert_to_original_config" ] . invoke end end

We have to do this since, unfortunately, Rails support for multiple databases isn’t that great, hence we need to provide minor hacks to make everything work. For this reason we have to set specific environment and configuration variables to custom values which match our Stats database before we run the “proxy” tasks, and then ensure that the original values are set back once those tasks have been run. The following two tasks do just that:

task :set_custom_config do # save current vars @original_config = { env_schema: ENV['SCHEMA'], config: Rails.application.config.dup } # set config variables for custom database ENV['SCHEMA'] = "db_stats/schema.rb" Rails.application.config.paths['db'] = ["db_stats"] Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"] Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"] Rails.application.config.paths['config/database'] = ["config/database_stats.yml"] end task :revert_to_original_config do # reset config variables to original values ENV['SCHEMA'] = @original_config[:env_schema] Rails.application.config = @original_config[:config] end 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 task : set_custom _ config do # save current vars @original_config = { env_schema : ENV [ 'SCHEMA' ] , config : Rails . application . config . dup } # set config variables for custom database ENV [ 'SCHEMA' ] = "db_stats/schema.rb" Rails . application . config . paths [ 'db' ] = [ "db_stats" ] Rails . application . config . paths [ 'db/migrate' ] = [ "db_stats/migrate" ] Rails . application . config . paths [ 'db/seeds' ] = [ "db_stats/seeds.rb" ] Rails . application . config . paths [ 'config/database' ] = [ "config/database_stats.yml" ] end task : revert_to_original _ config do # reset config variables to original values ENV [ 'SCHEMA' ] = @original_config [ : env_schema ] Rails . application . config = @original_config [ : config ] end

Notice how the lines 9-13 set values to the files and directories we have created in the previous steps.

Finally, if you’re using RSpec you can add one dependency to the spec task, to ensure that the Stats database is automatically prepared when tests are run:

task spec: ["stats:db:test:prepare"] 1 task spec : [ "stats:db:test:prepare" ]

Once all of this is set up, we can create the Stats database and run its first migration:

$ rake stats:db:create $ rake stats:db:migrate 1 2 $ rake stats : db : create $ rake stats : db : migrate

This will generate the Stats database schema file in db_stats/schema.rb.

Add a custom generator

Unfortunately, we cannot simply use Rails’ generator ActiveRecord::Generators::MigrationGenerator because it hardcodes the parent directory of the migration (notice the path hardcoded to the directory db/migrate in line 4 here below):

active_record/migration/migration_generator.rb def create_migration_file set_local_assigns! validate_file_name! migration_template @migration_template, "db/migrate/#{file_name}.rb" end 1 2 3 4 5 def create_migration _ file set_local_assigns ! validate_file_name ! migration _ template @migration_template , "db/migrate/#{file_name}.rb" end

Therefore, we need to have a custom generator to create migrations for the Stats database. However, we can still inherit from it and monkey patch this specific function. Create the following generator in lib/generators/stats_migration_generator.rb:

lib/generators/stats_migration_generator.rb require 'rails/generators/active_record/migration/migration_generator' class StatsMigrationGenerator < ActiveRecord::Generators::MigrationGenerator source_root File.join(File.dirname(ActiveRecord::Generators::MigrationGenerator.instance_method(:create_migration_file).source_location.first), "templates") def create_migration_file set_local_assigns! validate_file_name! migration_template @migration_template, "db_stats/migrate/#{file_name}.rb" end end 1 2 3 4 5 6 7 8 9 10 11 require 'rails/generators/active_record/migration/migration_generator' class StatsMigrationGenerator < ActiveRecord :: Generators :: MigrationGenerator source _ root File . join ( File . dirname ( ActiveRecord :: Generators :: MigrationGenerator . instance_method ( : create_migration_file ) . source_location . first ) , "templates" ) def create_migration _ file set_local_assigns ! validate_file_name ! migration _ template @migration_template , "db_stats/migrate/#{file_name}.rb" end end

In line 9 we set the directory base to the Stats database directory. Also, in line 4 we initialize the templates directory and point it at the original one used by the generator we’re inheriting from.

With all of this in place, we can now generate migrations for the Stats database:

$ rails g stats_migration create_clicks create db_stats/migrate/20151201191642_create_clicks.rb 1 2 $ rails g stats_migration create_clicks create db_stats / migrate / 20151201191642_create_clicks.rb

You’ll notice that the migration file gets created in the Stats database migrate directory db_stats/migrate. You can edit this file and then run your migrations with the Rake task that we’ve set up in the previous steps, just as you normally would do with your primary database:

$ rake stats:db:migrate 1 $ rake stats : db : migrate

Finalize connection and models

We’re almost done. Add a new initializer file config/initializers/db_stats.rb and paste the following:

config/initializers/db_stats.rb # save stats database settings in global var DB_STATS = YAML::load(ERB.new(File.read(Rails.root.join("config","database_stats.yml"))).result)[Rails.env] 1 2 # save stats database settings in global var DB_STATS = YAML :: load ( ERB . new ( File . read ( Rails . root . join ( "config" , "database_stats.yml" ) ) ) . result ) [ Rails . env ]

Notice that we reference the Stats database configuration file that we created in the first step here above. By doing this, we initialise a global variable DB_STATS that holds the current environment’s configuration of the Stats database.

Finally, we can set our models’ connection to this configuration. For example, let’s say that we have a Click model that corresponds to the migration here above. All you have to do is add one extra line that specifies which connection to use:

click.rb class Click < ActiveRecord::Base establish_connection DB_STATS end 1 2 3 4 class Click < ActiveRecord :: Base establish _ connection DB _ STATS end

It’s that easy. Your model will now use the database Stats.

If you have multiple models that need to connect to the Stats database, however, you will need to add an extra step. If you were to have another model establishing its own connection to the Stats database, it would have its own connection pool and you might risk getting out of available connections to your Stats database. Therefore, if you have multiple models it is recommended to inherit from a single model, so that all the models connecting to the Stats database will share the same connection pool.

To do so, create the base model that connects to the Stats database:

click.rb class StatsBase < ActiveRecord::Base establish_connection DB_STATS self.abstract_class = true end 1 2 3 4 class StatsBase < ActiveRecord :: Base establish _ connection DB _ STATS self . abstract_class = true end

You can now inherit in all your models:

click.rb class Click < StatsBase end class View < StatsBase end 1 2 3 4 5 class Click < StatsBase end class View < StatsBase end

Heroku

As already anticipated, the last step that you need to make this work on Heroku is to set the environment variable DATABASE_STATS_URL to the database you want to use as Stats. For example, if you created a second database called HEROKU_POSTGRESQL_TEAL_URL all you have to do is to set this database’s value using the Heroku toolbelt:

$ heroku config:set DATABASE_STATS_URL=postgres://gsdfjrthjsnaew:gry6OJF6drDjththjkSDngldsf@ec2-116-22-114-221.compute-1.amazonaws.com:5432/hmsrthj24dfgks 1 $ heroku config:set DATABASE_STATS_URL=postgres://gsdfjrthjsnaew:gry6OJF6drDjththjkSDngldsf@ec2-116-22-114-221.compute-1.amazonaws.com:5432/hmsrthj24dfgks

And you’re ready to go.

Bonus: DatabaseCleaner

If you’re using the DatabaseCleaner gem, you can set it to clean the models that use the Stats database too. For example, your spec/rails_helper.rb may look something like this:

spec/rails_helper.rb ENV["RAILS_ENV"] ||= 'test' require 'spec_helper' require File.expand_path("../../config/environment", __FILE__) require 'rspec/rails' Dir[Rails.root.join("spec/support/**/*.rb")].each { |f| require f } ActiveRecord::Migration.maintain_test_schema! RSpec.configure do |config| config.use_transactional_fixtures = false config.infer_spec_type_from_file_location! config.before(:suite) do DatabaseCleaner.clean_with(:truncation) DatabaseCleaner[:active_record, { model: Click }].clean_with(:truncation) end config.before(:each) do |example| unit_test = ![:feature, :request].include?(example.metadata[:type]) strategy = unit_test ? :transaction : :truncation DatabaseCleaner.strategy = strategy DatabaseCleaner[:active_record, { model: Click }].strategy = strategy DatabaseCleaner.start DatabaseCleaner[:active_record, { model: Click }].start end config.after(:each) do DatabaseCleaner.clean DatabaseCleaner[:active_record, { model: Click }].clean end end 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 ENV [ "RAILS_ENV" ] || = 'test' require 'spec_helper' require File . expand_path ( "../../config/environment" , __FILE__ ) require 'rspec/rails' Dir [ Rails . root . join ( "spec/support/**/*.rb" ) ] . each { | f | require f } ActiveRecord :: Migration . maintain_test_schema ! RSpec . configure do | config | config . use_transactional_fixtures = false config . infer_spec_type_from_file_location ! config . before ( : suite ) do DatabaseCleaner . clean_with ( : truncation ) DatabaseCleaner [ : active_record , { model : Click } ] . clean_with ( : truncation ) end config . before ( : each ) do | example | unit_test = ! [ : feature , : request ] . include ? ( example . metadata [ : type ] ) strategy = unit _ test ? : transaction : : truncation DatabaseCleaner . strategy = strategy DatabaseCleaner [ : active_record , { model : Click } ] . strategy = strategy DatabaseCleaner . start DatabaseCleaner [ : active_record , { model : Click } ] . start end config . after ( : each ) do DatabaseCleaner . clean DatabaseCleaner [ : active_record , { model : Click } ] . clean end end

According to DatabaseCleaner README, it should be possible to set a connection option instead of the model one. Unfortunately, my attempts at this have been unsuccessful. If anyone knows how to do this and avoid specifying a DatabaseCleaner strategy for every model, please let me know.

I hope you’ve enjoyed reading this, and that my ramblings can be helpful to someone going down this same path. As usual, any suggestions on how to improve any of this are warmly welcome.

Happy multiple db’ing! :)