We have a client that uses multi-tenant database where each database holds data for each of their customers. Whenever a new customer is added, a service dynamically creates a new database. In order to seed this new database we were tasked to implement a feature to copy data from existing “demo” database.

The “demo” database is actually a live client where sales team does demo. This ensures that the data that is copied is fresh and not stale.

We implemented a solution where we simply listed all the tables in namespace and used activerecord-import to copy the table data. We used activerecord-import gem to keep code agnostic of underlying database as we used different databases in development from production. Production is “SQL Server” and development database is “PostgreSQL”. Why this project ended up having different database in development and in production is worthy of a separate blog.

When we started using the above mentioned strategy then we quickly ran into a problem. Inserts for some tables were failing.

insert or update on table "dependent_table" violates foreign key constraint "fk_rails" Detail: Key (column)=(1) is not present in table "main_table".

The issue was we had foreign key constraints on some tables and “dependent” table was being processed before the “main” table.

So initially we thought of simply hard coding the sequence in which to process the tables. It means if any new table is added then we will have to update the service to include the newly added table. So we needed a way to identify the foreign key dependencies and determine the sequence to copy the tables at runtime. To resolve this issue, we thought of using Topological Sorting.

Topological Sorting

To get started we need the list of dependencies of “main” and “dependent” tables. In Postgresql, this sql query fetches the table dependencies.

SELECT tc . table_name AS dependent_table , ccu . table_name AS main_table FROM information_schema . table_constraints AS tc JOIN information_schema . key_column_usage AS kcu ON tc . constraint_name = kcu . constraint_name AND tc . table_schema = kcu . table_schema JOIN information_schema . constraint_column_usage AS ccu ON ccu . constraint_name = tc . constraint_name AND ccu . table_schema = tc . table_schema WHERE constraint_type = 'FOREIGN KEY' and ( tc . table_name like 'namespace_%' or ccu . table_name like 'namespace_%' ); => dependent_table | main_table ----------------------------------- dependent_table1 | main_table1 dependent_table2 | main_table2

The above query fetches all the dependencies for only the tables have namespace or the tables we are interested in. The output of above query was [[dependent_table1, main_table1], [dependent_table2, main_table2]] .

Ruby has a TSort module that for implementing topological sorts. So we needed to run the topological sort on the dependencies. So we inserted the dependencies into a hash and included the TSort functionality into the hash. Following is the way to include the TSort module into hash by subclassing the Hash .

require "tsort" class TsortableHash < Hash include TSort alias tsort_each_node each_key def tsort_each_child ( node , & block ) fetch ( node ). each ( & block ) end end # Borrowed from https://www.viget.com/articles/dependency-sorting-in-ruby-with-tsort/

Then we simply added all the tables to dependency hash, as below

tables_to_sort = [ "dependent_table1" , "dependent_table2" , "main_table1" ] dependency_graph = tables_to_sort . inject ( TsortableHash . new ) { | hash , table | hash [ table ] = []; hash } table_dependency_map = fetch_table_dependencies_from_database => [[ "dependent_table1" , "main_table1" ], [ "dependent_table2" , "main_table2" ]] # Add missing tables to dependency graph table_dependency_map . flatten . each { | table | dependency_graph [ table ] ||= [] } table_dependency_map . each { | constraint | dependency_graph [ constraint [ 0 ]] << constraint [ 1 ] } dependency_graph . tsort => [ "main_table1" , "dependent_table1" , "main_table2" , "dependent_table2" ]

The output above, is the dependency resolved sequence of tables.