Data migrations:



Oh I can do this in an hour. Two tops.



Two days later: (╯°□°）╯︵ ┻━┻ — Swizec Teller (@Swizec) August 20, 2015

On Thursday we pushed to production. Three months worth of code went from no users, to thousands of users. Or hundreds. Depends how you count.

My point is, our code went through the ultimate integration test - the real world.

The whole thing was much less intense than expected. We pushed in the middle of the night, ran our migrations, waited 36 minutes for them to run, ate some pizza while waiting for the fires to start, and went home when they didn't. An uneventful deploy is the best kind of deploy.

Well okay, I say no fires started, but that's not quite true. The whole engineering team has spent the last couple of days fixing known bugs that slipped through, squashing unknown bugs that users on strange computers in strange lands discovered by clicking something they shouldn't, manually tweaking the production database to get people out of unforeseen corner cases, and implementing some features we forgot we needed.

Such is life without QA. You forget things.

As David Heinemeier Hansson once said: it's not worth fixing, if a user doesn't complain about it.

Right?

I kid, we knew hiccups would happen. Hair-on-fire mode lasting just three days after a major deploy is good. Very good.

That moment when workers run while you're importing data and duplicate ids and your db doesn't get primary keys and rails gets confused. — Swizec Teller (@Swizec) August 21, 2015

But the week before was the most frustrating week of my life. I was this close to tearing my hair out, punching my computer in the face, and taking up goat farming. This close. does the this close finger gesture

I was doing a test run of the database migrations we accumulated in those three months. Shouldn't take more than two hours, right? Wrong. It took a week.

Some of our database migrations are also data migrations. Mistake no. 1.

None of our new migrations that we've had for three months have ever been run against the full database. Mistake no. 2.

Our whole database is under 500,000 lines. The entire data dump falls well under 200 megabytes. Our database is tiny. What could possibly go wrong?

Oh so very much could.

Most migrations ran in a fraction of a second. Many took a second or two. A few beasts took ten minutes.

Ten whole minutes for a single migration on a tiny database. The mind recoils in horror, unable to understand what could be taking so long.

When we wrote a migration, we tested it on our local machines. 2.3 gigahertz computers with six or eight CPU cores, 16 gigabytes of memory, a solid state drive, and a database that could fit on a floppy drive.

"It was fast when I tried it". Of course it was. Your computer is a monster running on bare metal. Heroku dynos are paltry "computers" running on many layers of virtualization. They're meant to zerg rush a problem, not handle a big thing on their own.

You run into problems when your migrations look like this:

def change add_column :promotional_tokens, :type, :string Token::PromotionalToken.reset_column_information Token::PromotionalToken.all.find_each do |token| if token.promotion_type.eql? 'referral' # remove if user is not a student user = User.find(token.user_id) if user.is? :student #refactor it to student profile token.update!(type: 'Token::ReferralToken' ) ref_token = Token::ReferralToken.find(token.id) user.profile.update!(referral_token: ref_token) else #remove it as it is now redundant token.destroy end end end end

This migration adds a type column to promotional_tokens , then goes through existing tokens - of which each user has one - and updates them. Students get a new token, everybody else has theirs removed. Because that's what business wants.

There are 90,000 users in the system.

Let's play spot the problem.

Using .all that loads the whole table into memory We run a DB query for every token to fetch the user Then we run a DB query to update the token Then we run a DB query to get the token again, but with the correct object type Then we run a DB query to move the token reference to user.profile If the user isn't a student, we run a destroy query instead

There's only 200 users who aren't students. We can ignore those.

For every user we have, this migration hits the database four times. 360,000 database queries. And just to make it more fun, each is made using ActiveRecord instead of raw SQL.

It took 10 minutes to run through the whole dataset on my local machine. It flat out crashed Heroku's one-off dyno. Out of memory.

And if it didn't crash, who knows. Our database runs on a separate server. Maybe in the same datacenter, maybe not. Heroku knows, we don't.

That makes migrations even slower.

That moment when your laptop is so much stronger than Heroku dynos that you decide to migrate data locally. — Swizec Teller (@Swizec) August 21, 2015

I'm not kidding. I legit considered snapshotting the production database, running migrations locally, then swapping production with the updated snapshot. That could screw things up, especially permissions, but at least I could run it.

We salvaged this migration after many struggles. Like this:

def change add_column :promotional_tokens, :type, :string Token::PromotionalToken.reset_column_information User.with_role(:tutor).includes(:promotional_tokens).find_each do |tutor| tutor.promotional_tokens.destroy_all end Token::PromotionalToken.update_all(type: 'Token::ReferralToken') User.with_role(:student).includes(:referral_token, :profile).find_each do |student| tok = student.referral_token student.profile.update_attribute(:referral_token, tok) end end

Doesn't look like much, but an order of magnitude less queries hit the database. The magic of scoping and .includes .

200 queries run to delete non-student tokens. Nothing you can do about that. A single query updates the type of all tokens still in the system. And a single query per token, moves the token reference to student.profile .

Some 90,003 queries in total. Takes a minute or two.

In school they teach you that O(4n+1) is the same as O(n+3). It's just linear complexity, who cares. Reality cares. 4 times 90,000 is a lot more than 90,000. Especially when you're talking to a database across TCP/IP.

All in all it took myself and the genius who sits across from me an entire week to get our migrations running in under 20 minutes on a MacBook Pro. We removed a lot of .all s, reduced many factors of linear complexity, and in one epic instance even turned linear complexity into constant complexity. That was cool.

We solved the rest by using --size=px . That ran our migrations with Heroku's beefiest dyno. 2.5 gigs of memory, dedicated machine, and 12x of computing. Whatever that means.

It took 36 minutes to run our migrations on production.

It took 36 seconds to test them in our dev environment.

Lesson learned.

Did you enjoy this article? 👎 👍

Published on September 2nd, 2015 in Active record pattern, CPU time, Database, Heroku, Rails, Technical

Learned something new?

Want to become a high value JavaScript expert? Here's how it works 👇 Leave your email and I'll send you an Interactive Modern JavaScript Cheatsheet 📖right away. After that you'll get thoughtfully written emails every week about React, JavaScript, and your career. Lessons learned over my 20 years in the industry working with companies ranging from tiny startups to Fortune5 behemoths. Start with an interactive cheatsheet 📖 Then get thoughtful letters 💌 on mindsets, tactics, and technical skills for your career. "Man, love your simple writing! Yours is the only email I open from marketers and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. 👌" ~ Ashish Kumar Your Name Your Email Your Address Subscribe & Become an expert 💌 Join over 10,000 engineers just like you already improving their JS careers with my letters, workshops, courses, and talks. ✌️

Have a burning question that you think I can answer? I don't have all of the answers, but I have some! Hit me up on twitter or book a 30min ama for in-depth help.

Ready to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz components your whole team can understand with React for Data Visualization

Curious about Serverless and the modern backend? Check out Serverless Handbook, modern backend for the frontend engineer.

Ready to learn how it all fits together and build a modern webapp from scratch? Learn how to launch a webapp and make your first 💰 on the side with ServerlessReact.Dev

Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com

By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️