Understanding, not avoiding, the cutting edge

You mentioned MySQL. Facebook and other companies get this reputation of being the champions of scalable MySQL, but Etsy is not a small service. Has it been a chore to scale your database layer as the company grows?

There’s a couple of ways to look at it. If you think of the evolution of a technical infrastructure of a growing web property, then there are these identifiable episodes in the evolution. You’re either making a relational database work in a distributed way, or you’re not. To be fair, at a high level, we don’t take a much different approach to federating data across many MySQL servers than does Facebook.

That’s more about the architectural pattern than it is about any of the tech. … For example, instead of having all of Etsy in one database server, usually the next thing people do is, “All right, let’s take our favorites, listings and user profiles. Then we’ll get one database for user profiles, one database for favorites, and one database for listings.”
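As a rough illustration of the functional partitioning described above, here is a minimal sketch in which each kind of data gets its own database server. The host names and the `database_for` helper are hypothetical and are not taken from Etsy's actual setup.

```python
# Hypothetical host names; one database server per functional area.
FUNCTIONAL_DATABASES = {
    "user_profiles": "profiles-db.internal",
    "favorites": "favorites-db.internal",
    "listings": "listings-db.internal",
}


def database_for(kind: str) -> str:
    """Return the server assumed to own a given kind of data."""
    try:
        return FUNCTIONAL_DATABASES[kind]
    except KeyError:
        raise ValueError(f"no database assigned for {kind!r}") from None
```

The limit of this layout is that each kind of data still has to fit on a single server, which is the "expiration date" the next answer refers to.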

That functional partitioning also has an expiration date on it. Then you have to take the leap to make it so that we’re going to store the majority of Etsy across many, many machines and make it so we can balance data between them.

None of that is really MySQL-specific. There are some tips and tricks, certainly. You’re going to want backups to work the way you expect them to. You have to do extra work in the application, like finding the database server that has the data you’re looking for.
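The "extra work in the application" can be pictured as a lookup step that runs before any query is issued. The sketch below is one possible shape for it, assuming a simple in-memory directory. The `ShardDirectory` class, the shard names and the modulo default placement are all hypothetical and do not describe Etsy's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ShardDirectory:
    """Hypothetical lookup table mapping each user to the shard holding their data."""

    shards: list[str]
    assignments: dict[int, str] = field(default_factory=dict)

    def shard_for(self, user_id: int) -> str:
        # New users get a default placement; known users keep their recorded shard.
        if user_id not in self.assignments:
            self.assignments[user_id] = self.shards[user_id % len(self.shards)]
        return self.assignments[user_id]

    def move(self, user_id: int, target: str) -> None:
        # Rebalancing: once the rows are copied (not shown), repoint the directory.
        if target not in self.shards:
            raise ValueError(f"unknown shard: {target}")
        self.assignments[user_id] = target


directory = ShardDirectory(shards=["db-a", "db-b", "db-c"])
print(directory.shard_for(7))  # default placement: 7 % 3 == 1, so "db-b"
directory.move(7, "db-c")
print(directory.shard_for(7))  # now "db-c"
```

Keeping an explicit directory, rather than computing the shard purely from the ID, is what would let data be moved between machines without changing IDs. Nothing in this sketch depends on MySQL.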

But, again, that’s really just an architectural pattern. At that point you’re using the database as a reasonably done data store, which is what you want because you want it to be really good at being stable. You want it to really be good at being reliable. All the other levers you’ll be putting in your application anyway.

It could be any other database really. There’s nothing special about MySQL.

You mentioned you’re using this set of well-known tools that can handle a wide variety of stuff. But what are examples of what you might call new or cutting-edge or next-generation stuff that Etsy’s using?

I’ll be even more descriptive about it. I would say that we want to prefer a small number of well-known tools. …

If I find myself trying to unscrew a screw with the end of a hammer, then it’s probably time for me to think, “Well, this effort is not going to be worth it. I’m going to need a screwdriver.” Having said that, it also doesn’t mean that I’ve got 1,000 hammers — one for marble, one for balsa wood, one for plaster.

It’s not like an edict that says, “These are the blessed tools and everything else is forbidden.”

Instead, what we do process-wise, or largely culturally, is identify use cases that are departures from the norm. An engineer says, “Here’s this problem. I don’t think I can solve it with PHP, MySQL and Linux … or Hadoop or Lucene or whatever.

“Here’s what I tried. I tried to use those things, and here’s where they fell down and I don’t think they’re good. I really don’t want to use anything new, at least without any good reason.

“So, everybody, my peers in engineering, does anybody else have any good ideas? I think I’ve landed on this new piece of software. I just want to make sure before I keep going with this that everybody knows that this is a thing that we’re all willing to get good at.”

Redis — and this was a number of years ago — was one of those departures. Elasticsearch has been one of those departures. Sharded Solr is one. About half of our search is in Solr, half of it is in Elasticsearch. There are various storage engines that are part of MySQL that were departures.

The thing is, when you pull something shiny and new off the shelf, there can be operational overhead. If it breaks and you’re the only one who knows how it works, then it probably wasn’t a great technical choice. It can be a really good technical choice if you’re planning for an optimal future. We don’t want to plan for an optimal future.

We want to plan for a world where stuff breaks all the time. And we want to make it so that when things break they matter a lot less, that they’re not critical. That they break and we can fix them and we can adapt and be resilient.

One of the ways that we do that is taking a critical-thinking look at the choices that we make. We don’t want to have choices made by the very well meaning, well intentioned, but very enthusiastic engineer who didn’t think everything through. No single engineer is going to think of all the contingencies. That’s why we want to take a much more diverse look.

Then when we say, “Alright, this is the thing. Redis — we’re going to use it. Here’s where we’re going to use it. Here’s where we’re going to get good.” Then we’re actually going to get good at it, which means that we’ve got a lot more confidence.

Another new tool Etsy uses is HipHop Virtual Machine, a framework created by Facebook in order to boost PHP performance. Source: Code as Craft

The one thing I keep hearing is that when it comes to hiring, people like to know they’ll get to work on new things. Does it affect who Etsy can hire if prospects don’t think, “Yes, I’ll be developing in Golang in the next three months!”?

Sort of. I’d put it this way: I personally would take the same approach if I were hiring carpenters to build a house. I want the carpenters to be psyched to get on the job because of what the design of the house is and the challenges. We’ve got to build this museum on the edge of the cliff.

I would rather have those carpenters because they’re really passionate about solving hard problems, given the choice between them and those candidates who say, “I don’t care what I build. I just need to use the laser nail gun. I don’t care if it’s an outhouse. I don’t care if it’s a barn.”

Those engineers will have a lot more cognitive space. They’ll also have a lot more focus of attention on solving the problems, not on a particular chit. The song matters more than the guitar.

But there’s nothing that says you’re not going to work with tools that are going to be great for solving particular problems. In fact, we write a lot of our own tools because we can’t find the tools that really fit our use case.

As it turns out, there’s a lot of really hard problems here — incredibly hard engineering problems that actually don’t have anything to do with the tools. They’re just hard problems.

The more well-known one that we’ve been talking about recently is recommendations. We’re not a regular e-commerce site. We’ve got millions and millions and millions of unique things as opposed to a very small number of unique categories.

It’s like one long tail.

It’s all tail, basically. We could say that. Our data science and engineering teams … they don’t want to spend more time messing with their tools than they need to because they want to solve the problems. How do you suggest something for somebody to buy when there’s only one of those things in the world? That sort of thing.