a search for ‘unused database’ brings up this awesome wikimedia image for some reason.

Detecting DB Column Usage

I am always interested in ways to automate and cleanup old unused code. Removing the clutter helps focus in on the things that matter and make more clear patterns emerge. Continuing on some work to cleanup unused tables in a database, I really wanted a way to find unused columns / fields. I came up with two easy ways to try to automate detecting this in a Rails app. The first is to query the DB’s paper_trail history, which is easy if you are already using paper_trail. The second I will cover in another post soon.

Column Usage Output

What does column usage look like, well for each table it is a list of fields and the last date it was modified. The simple output should be pretty clear.

____________________________________________________________ field usage for Customer ************************************************************ barcode: 2017-05-20 email: No updates in 3 months fixed_location_id: No updates in 3 months agent_id: No updates in 3 months account_id: 2017-05-20 house_latitude: No updates in 3 months house_longitude: No updates in 3 months house_location_accuracy: No updates in 3 months ... notes: 2017-05-20 last_paid_at: 2017-05-20 ____________________________________________________________ field usage for Lead ************************************************************ fixed_location_id: No updates in 3 months custom_location_name: No updates in 3 months agent_id: 2017-05-20 captured_by_id: 2017-05-20 converted_at: 2017-05-20 service_level_id: No updates in 3 months source: 2017-05-20 latitude: 2017-05-20 longitude: 2017-05-20 status: 2017-05-20 assigned_at: No updates in 3 months ...

Obviously, a developer will need to interpret this, some tables aren’t often updated it doesn’t mean they have no purpose. For example we have some configuration tables that are only modified when we integrate new partners into our system. So they will show up in my list with “No updates in 3 months” but that can be quickly filtered out.

Detect Columns via Paper_Trail

If you use the gem paper_trail to track the history of changes on some of your models, you are already recording when any column was last changed. All one has to do is come up with the queries to pull the data out. Doing this by hand would be a huge pain, but luckily it is easy to hook into our code and automate the process. We just find all models that support paper_trail, then iterate through the fields.

Show Me the Code

The below functions and tasks can be added to any rake task file.

def find_ar_models Rails.application.eager_load! array = ActiveRecord::Base.descendants.collect { |x| x.to_s if x.table_exists? }.compact - ['ActiveRecord::SchemaMigration'] array.reject { |x| x.split('::').last.split('_').first == "HABTM" } end desc 'List when column was last updated' task :list_recent_columns => :environment do ignore_models = %w(RailsSettings::Settings Devise::Oauth::Client) models = find_ar_models models = models.reject{|model| ignore_models.include?(model.to_s) } ignored_columns = %(id created_at updated_at) results = {} models.each do |model| model_results = {} model = Kernel.const_get model next unless model.paper_trail.enabled? puts "_"*60 puts "field usage for #{model}" puts "*"*60 model.attribute_names.each do |attr| next if ignored_columns.include?(attr) result = PaperTrail::Version.where(item_type: model.to_s) .where("created_at >= ?",[3.months.ago]) .where("object_changes ILIKE ?",["%#{attr}%"]) .order("created_at DESC") .limit(1).first model_results[attr] = result&.created_at&.to_date || 'No updates in 3 months' puts "#{attr}: #{model_results[attr]}" end results[model.to_s] = model_results end end

Quick Note on Performance

Make sure you have indexes on created_at and item_type or this will be very slow. If you don’t find another way to limit. For example we don’t have an index on created_at so I did a query to find the max_id of 3 months ago and use that as the limit opposed to created_at which makes the queries run without any perf issues.

Please enable JavaScript to view the comments powered by Disqus.

Disqus