A year ago, we became the first major tech company to adopt high-density SMR (Shingled Magnetic Recording) technology for our storage drives. At the time, we faced a challenge: while SMR offers major cost savings over conventional PMR (Perpendicular Magnetic Recording) drives, the technology is slower to write than conventional drives. We set out on a journey to reap the cost-saving benefit of SMR without giving up on performance. One year later, here’s the story of how we achieved just that.

The Best Surprise Is No Surprise

When the first production machines started arriving in September, it was only natural to have a little apprehension and think, “Does this actually work? What if it breaks?!” It was raw, completely new technology, and it was terrifying. And then ... nothing happened. That is, everything simply worked. Not that this was much of a surprise. The lead-up to this event was the classic Dropbox story: sweat the details, build confidence by relentlessly bashing out the bugs, and then roll the project out on a large scale. For the past three years, the Dropbox teams involved in the changeover to SMR have been doing a lot of innovation, and more testing than any of us care to remember. The hardware team worked very closely with the Magic Pocket software team, envisioning each possible edge case and running through every imaginable scenario. That diligent work helped ensure that the migration to SMR went off without a hitch.

To prepare for running SMR, we had to make substantial changes to the software layer. We optimized the software by improving the throughput of the networking stack to the disk. We also added an SSD cache: since you can only write to the SMR disks in fixed-size zones, and you can only write to them sequentially, we knew we needed an area to stage the writes. Support for this SSD staging layer was added specifically for our transition to SMR, but it has helped latency in other cases as well.

Fortunately, we worked on a significant portion of the software changes required for sequential writing ahead of time, testing them on our existing fleet of PMR disks. Before we even began to build the new architecture, we made sure it would support both PMR and SMR. This meant that the whole stack was thoroughly tested on the hundreds of thousands of installed disks before we started bringing the SMR machines online, which removed a considerable amount of risk from the equation. All we had to do once we received the new machines was change the actual disk we were writing to.

In the end, one aspect of our original design helped smooth the transition to SMR. The generic Dropbox infrastructure handles data in immutable blocks up to 4 MB in size, which was convenient for SMR: because blocks are never modified in place, what would otherwise be random writes can be laid down sequentially as new blocks. And the size of the write zones we’ve set up in Magic Pocket, with 1 GB extents of data, fits perfectly with the 256 MB zones used to split up SMR drives.
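To make the zone arithmetic concrete, here is a minimal sketch of how immutable blocks could be appended into sequential-only zones. It is an illustration, not Magic Pocket's implementation: the `Zone` and `Extent` classes, the first-fit placement, and the reading of "1 GB" and "256 MB" as binary units are all assumptions made for the example. Only the sizes themselves come from the text above.

```python
from dataclasses import dataclass, field

MiB = 1024 * 1024
ZONE_SIZE = 256 * MiB     # zone size quoted above (binary units assumed)
EXTENT_SIZE = 1024 * MiB  # 1 GB extent quoted above (binary units assumed)
MAX_BLOCK = 4 * MiB       # largest immutable block quoted above


@dataclass
class Zone:
    """Hypothetical model of a sequential-only zone with a write pointer."""
    size: int = ZONE_SIZE
    write_pointer: int = 0

    def free(self) -> int:
        return self.size - self.write_pointer

    def append(self, length: int) -> int:
        """Reserve `length` bytes at the write pointer and return their offset."""
        if length > self.free():
            raise ValueError("zone full")
        offset = self.write_pointer
        self.write_pointer += length  # the pointer only ever moves forward
        return offset


@dataclass
class Extent:
    """Hypothetical extent made of whole zones, with an index of block locations."""
    zones: list = field(
        default_factory=lambda: [Zone() for _ in range(EXTENT_SIZE // ZONE_SIZE)]
    )
    index: dict = field(default_factory=dict)  # block id -> (zone, offset, length)

    def put(self, block_id: str, length: int) -> tuple:
        """Place a new immutable block in the first zone with enough room."""
        if not 0 < length <= MAX_BLOCK:
            raise ValueError("block must be between 1 byte and 4 MiB")
        for number, zone in enumerate(self.zones):
            if zone.free() >= length:
                location = (number, zone.append(length), length)
                self.index[block_id] = location
                return location
        raise ValueError("extent full")


if __name__ == "__main__":
    extent = Extent()
    print(len(extent.zones))            # one extent spans exactly four zones
    print(extent.put("a", 4 * MiB))     # first block lands at zone 0, offset 0
```

Because a block is never rewritten after it is placed, every write in this model is an append, which is the only kind of write a sequential zone accepts.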

Hardware

Initially, SMR was a proof-of-concept case: can we actually make it function the way we want? From a hardware point of view, turning to SMR would help us build data storage density quicker than with PMR. What we found was that the use-case for SMR matched up very well with the way we’ve already architected Magic Pocket. But in comparison to the control we have over our own software stack for SMR, the hardware team had a massive hill to climb in terms of learning everything that went on in the background. One of the biggest challenges in enabling SMR for Dropbox was that it is a new technology in the data-center context. It was the healthy working relationship between the hardware team and the Magic Pocket team that allowed the project to be as successful as it turned out to be. Dropbox is not the only large tech company that’s working on calibrating and fine-tuning their software for SMR, but the use-case is so natural for us that we’ve been motivated to move quickly. Still, being first had its challenges—not least being the sheer amount of data we already manage. Our hardware team had very limited support when it came to preparing for SMR. The vendors selling the drives didn’t have the chassis configuration that we have—our current test cluster is about six racks, and there are 48 systems, or close to 5,000 drives. So when we iterated through our revisions, we were able to obtain a far better signal, which led to a stronger test process. And that helped put us at the bleeding edge of the technology: few companies have really invested in SMR, so we often ended up doing a lot of the testing for our vendors, which kept us a step ahead.

Increasing Our Density

One of our goals when embarking on the SMR initiative last year was to have 25 percent of our data storage capacity on SMR drives in 2019. Since September, all new drives for our storage servers have been SMR. Meanwhile, we’ve been able to continuously increase the density of our disk capacity faster than the growth of the data itself. At this rate, close to 40 percent of Dropbox’s data will be on SMR by the end of 2019, surpassing our original goal.

Cost Savings

Much like our data storage goals, the actual cost savings of switching to SMR have met our expectations. We’re able to store roughly 10 to 20 percent more data on an SMR drive than on a PMR drive of the same capacity at little to no cost difference. But we also found that moving to the high-capacity SMR drives we’re using now has resulted in savings of more than 20 percent overall compared to the last-generation storage design. And we’re also realizing savings in part due to other new lower-cost hardware. Meanwhile, our efforts to work with multiple vendors for the SMR drives will further benefit the entire Dropbox supply chain and Dropbox’s future storage cost structure.

Energy Savings

The transition to SMR has also made Dropbox a much more efficient energy consumer. SMR drives have a lower power footprint, so we’re realizing savings by using the new 14-terabyte drives compared to the previous 8-terabyte drives. In essence, we are working with much denser racks, but our power draw has increased only marginally. And we have been able to increase the number of storage disks from 60 to 100 on a single machine while maintaining the same CPU and memory. Thanks to these efficiencies, we expect to realize even further energy savings as we eventually move to 18, 21 and 24-terabyte drives.
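As a rough back-of-the-envelope check on those density figures, the sketch below multiplies disks per machine by drive size. Pairing the 60-disk machines with 8-terabyte drives and the 100-disk machines with 14-terabyte drives is an assumption, as is keeping 100 disks per machine for the larger future drives. The numbers are raw capacity only and ignore replication and formatting overhead.

```python
def raw_capacity_tb(disks_per_machine: int, drive_tb: int) -> int:
    """Raw (unreplicated) capacity of one storage machine, in terabytes."""
    return disks_per_machine * drive_tb


previous = raw_capacity_tb(60, 8)    # assumed pairing: 60 disks x 8 TB
current = raw_capacity_tb(100, 14)   # assumed pairing: 100 disks x 14 TB
growth = current / previous

print(f"{previous} TB -> {current} TB per machine ({growth:.2f}x)")
# prints: 480 TB -> 1400 TB per machine (2.92x)

# Hypothetical projection, assuming the machine still holds 100 disks.
future = {size: raw_capacity_tb(100, size) for size in (18, 21, 24)}
print(future)
# prints: {18: 1800, 21: 2100, 24: 2400}
```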

Open Source

The library we use to write to the SMR drives is the open-source libzbc, and through the process of working with it and running into the occasional issue, we’ve made 13 contributions to the library. What’s more, we developed our own testing tool called SMRtest, which generates synthetic workloads to verify writes and reads and to benchmark read/write throughput on SMR drives. We’ve already shared this tool with our ecosystem partners, suppliers, and vendors, to ensure they have what they need to enable SMR. And we’ll release SMRtest as open-source software in the coming months to benefit the wider community.

Specific Challenges

On the software side of things, we opted for more capacity, performance and flexibility by writing directly to the disks without a filesystem. Some operating systems are adding support for that, but when we were working with it, it wasn’t an option. So in order to talk to the disk, we used libzbc, which is essentially a wrapper to send commands directly to the disk without going through the Linux block device stack. But during testing, we ran into the issue of the disk simply failing, over and over. It turned out the failures were due to a hardcoded loop: since we weren’t going through the Linux kernel, whose code includes retry logic, we had to implement our own retry logic around accessing the disk.

Firmware was another issue when it came to getting the SMR drive technology to work on the existing platform, largely because the components came from various vendors. We work with multiple hard-drive vendors, as well as various kinds of intermediary technologies, such as the host bus adaptor, to connect multiple drives to a system. Each one of these vendors, as well as the server chassis itself, operated with its own firmware. There were a lot of moving pieces, so the first initiative on the hardware side was to get our various partners and vendors to talk to each other. We then worked with each individual vendor to identify and resolve any issues early on, and all of the vendors have come forward and engaged with us. We are convinced that this effort will pay dividends in the long term. Opting to be multi-source across all the components, for example, insulates us against any single points of failure or too much reliance on a single supplier from a supply chain perspective.
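The retry logic around direct disk access mentioned in this section is not described in detail, so the following is only a generic sketch of the idea. The `TransientDiskError` exception, the `with_retries` helper, the attempt count, and the exponential backoff are all hypothetical choices for illustration. It does not call libzbc; `operation` stands in for whatever function issues the command to the disk.

```python
import time


class TransientDiskError(Exception):
    """Hypothetical error type for a disk command that may succeed if retried."""


def with_retries(operation, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Waits base_delay, then 2x, 4x, ... between tries (exponential backoff).
    The last failure is re-raised so the caller can mark the disk as bad.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDiskError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_write():
        # Stand-in for a direct disk command that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientDiskError("command timed out")
        return "written"

    print(with_retries(flaky_write))  # succeeds on the third try
```

Bounding the number of attempts matters as much as retrying: a disk that keeps failing should surface an error rather than stall the writer indefinitely.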

Cold Storage and SMR

One of the latest developments at Dropbox is the incorporation of a new cold storage tier, designed for less frequently accessed data. Depending on the replication scheme, we’ve managed to cut down on disk usage by 25 to 33 percent with no noticeable change to the end-user experience. Similarly, our cold storage system uses our current mix of SMR and PMR drives, which translates to additional cost-savings without any difference in performance. If you want to learn more about how we set up the cold tier, read Preslav Le’s recent blog post.

What the Future Holds