I’m not keeping this up to date and all sorts of awesome things happen on Azure - they are now allowing some of the stuff this talks about and have made some real advancements in performance so - THIS IS OLD, don’t rely on it, hugs and kisses x x x

When it comes to storage there are two things that matter: latency and throughput. Sure, size matters, but not much - if we want more we can just add more (well, an L32s supports up to 64 data disks, so using 4TB disks you can get up to 256TB per VM).

Throughput is determined by the virtual machine size; the heavy hitters such as the GS5 and the M128s (not all variants) can deliver up to 3,600 MB/sec by splitting the IOs over disks that use caching and disks that don’t. VMs that support premium disks are given two throughput limits: cached and uncached. For example, the M128 has a “Max cached” throughput of 250,000 IOPS (IOs per second) and 1,600 MBps. (As an aside, one IO in Azure is 256KB, so to hit the full 1,600 MBps you would only need 6,400 IOPS - I am not sure how the hell you would ever get to 250K IOPS.)
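To make that back-of-the-envelope maths concrete, here is a small sketch of the arithmetic (plain Python, using the 256KB IO unit and the M128 figures above; the 8KB case is my own illustration of how small IOs could plausibly get you near the IOPS cap):

```python
# Azure counts one IO as 256KB, so the IOPS needed to saturate a
# bandwidth cap depends entirely on the IO size you actually issue.

def iops_for_throughput(mbps, io_size_kb=256):
    """IOs per second needed to reach a given MB/sec at a given IO size."""
    return int(mbps * 1024 / io_size_kb)

# The M128's cached limit of 1,600 MBps, issued as full 256KB IOs:
print(iops_for_throughput(1600))      # 6400 IOPS
# The same bandwidth issued as 8KB IOs (a more OLTP-like pattern):
print(iops_for_throughput(1600, 8))   # 204800 IOPS - close to the 250K cap
```

So the 250K IOPS figure only starts to make sense if you assume much smaller IOs than the 256KB accounting unit.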

This cached throughput means that if you have disks attached with read or read/write caching enabled, those disks are throttled at 1,600 MBps in total. In reality you could see less than this, for example if you issue IO requests whose sizes aren’t a multiple of 256KB.

As well as the cached throughput, the M128 has a “Max uncached disk throughput” of a more reasonable 80,000 IOPS, and 2,000 MBps.

Now the eagle-eyed amongst you will have noticed that there aren’t any disks you can attach to a VM that offer anywhere near 2,000 MBps of throughput. The fastest disks are the P40 and P50, which support up to 250 MBps and 7,500 IOPS each. To get to the VM throttles you need to add multiple disks and either stripe them or get your application to use many disks concurrently. There is one slight variation when you use caching for a disk: anything served from the cache does not count towards the throughput throttles, so you can achieve more than 3,600 MBps. The way I see it is that a VM in Azure sits on a physical server, and that server has a local SSD that is used for the temp ephemeral drive of every VM in Azure, but it is also used as a local cache if you ask for it. The VM throttle limits are applied on the network between the VM and the storage, so anything served from the cache never hits that throttle point.

If you hit a throttle it kills your performance, so onto the main point of this article: how fast are disks in Azure? I thought it would be interesting to do some performance testing and compare a premium managed SSD, a managed HDD and the local SSD, all with and without caching enabled. The ephemeral drive has caching on and it can’t be disabled, so I couldn’t test with it off, but in theory it should be serving from the same place as the cache - a local SSD on the host machine (it is probably the same SSD).

For the test I used diskspd from Microsoft on a D2 v3. I attached a single SSD and HDD and had caching disabled for the first tests, then enabled caching, rebooted and finished off getting the results. I ran two tests (read and write) per configuration:

HDD - Read and Write

SSD - Read and Write

Local SSD - Read and Write

HDD - Read and Write with hardware caching enabled

SSD - Read and Write with hardware caching enabled

The throttle limits for the D2 v3 are 32 MB/sec (cached) and 48 MB/sec (uncached), and the disks I added were rated at 50 MB/sec and 60 MB/sec, so I had to keep the throughput below 32 MB/sec for the cached test and 48 MB/sec for the uncached test.
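In other words, the effective ceiling for each test is whichever is lower, the disk’s own rating or the VM-level throttle - a trivial sketch using the figures above:

```python
# The effective limit is the lower of the disk's rating and the VM throttle.
def effective_cap(disk_mbps, vm_throttle_mbps):
    """MB/sec ceiling a single test can actually reach."""
    return min(disk_mbps, vm_throttle_mbps)

print(effective_cap(50, 32))  # cached test: 32 MB/sec - the VM throttle wins
print(effective_cap(60, 48))  # uncached test: 48 MB/sec - the VM throttle wins
```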

Here were the results:

Caching on?               Type       IO Type  Average Latency (ms)  Min (ms)  Max (ms)
No                        HDD        Write                  18.056     7.278   464.461
No                        HDD        Read                   11.230     5.359   300.107
No                        SSD        Write                   5.051     3.842    46.526
No                        SSD        Read                    4.448     3.143    24.085
No (Disabled in diskspd)  Local SSD  Write                   1.178     0.197    34.424
No (Disabled in diskspd)  Local SSD  Read                    1.058     0.348    27.598
Yes                       Local SSD  Write                   0.628     0.196    11.875
Yes                       HDD        Write                   0.471     0.167    12.577
Yes                       HDD        Read                    0.534     0.146    23.913
Yes                       SSD        Write                   0.456     0.160    10.403
Yes                       SSD        Read                    0.501     0.167    16.210
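To put the effect of caching in perspective, here is a quick calculation of the speedups from a few of the average latencies in the table (my own derived ratios, not measurements):

```python
# Average latencies (ms) taken from the results table, keyed by
# (caching enabled, disk type, operation).
results = {
    ("no",  "hdd", "read"):  11.230,
    ("yes", "hdd", "read"):   0.534,
    ("no",  "ssd", "write"):  5.051,
    ("yes", "ssd", "write"):  0.456,
}

def speedup(disk, op):
    """How many times faster the cached average latency is."""
    return results[("no", disk, op)] / results[("yes", disk, op)]

print(f"HDD reads:  {speedup('hdd', 'read'):.0f}x faster with caching")   # ~21x
print(f"SSD writes: {speedup('ssd', 'write'):.0f}x faster with caching")  # ~11x
```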

The results with pictures:

The interesting thing here is the two peaks where it suddenly got very slow - I suppose if you have physical spinning disks you sometimes need to move an arm :)

Again the variation was pretty large, but the numbers are pretty good for the cheaper HDD drives - an average of 11 milliseconds latency. It wouldn’t be any good for SQL Server or similar, but for something that isn’t IO bound it’s not too bad.

Moving onto the SSD we have much more consistent results:

Just over 5 millisecond latency.

Just under 5 millisecond latency!

Just over 1 millisecond latency!!

Just over 1 millisecond latency!!!

Now I changed the diskspd test so that it didn’t disable hardware caching, and the local SSD went down to just over half a millisecond average latency:

The HDD write with caching on went down to similar speeds, an average latency of 0.471 milliseconds:

Reading from the cached HDD:

Finally the SSD with caching on:

Write:

Read:

I think we can sum that up with the phrase “Zoom Zoom”!

The interesting thing for me is that if we have IO patterns that can take advantage of the local cache, or even just the local temp drive, then we can get some pretty fast access times. If we don’t need that much space we can use the local temp SSD, and if we need persistence we can use HDD disks with caching enabled.

Where we need more throughput we can split disks into cached and uncached, and at least with premium managed disks we know the disk times are going to be consistent. With HDDs, when they don’t hit the cache or don’t have caching enabled, response times are variable (probably because of an actual mechanical seek), so if you need guaranteed, predictable response times then HDDs are probably a bad choice.

This post was picked up on hacker news and there were a couple of interesting points:

https://news.ycombinator.com/item?id=17492217

What are the sustained latencies over a period of time?

What are the latencies in different regions, and are these times consistent throughout a region / globally?

I am running a test for 24 hours right now, so will write about that once it has finished. I will also perform some of these tests in different regions and on multiple VMs within a region - probably not all the tests, but the uncached test for 1 minute, maybe a few times.

I am really enjoying the feedback from reddit and, in this case, hacker news!