

IO amplification refers to the broad set of circumstances where one IO operation triggers other, unintentional IO operations. Though it may appear that only one IO operation occurred, in reality, the file system had to perform multiple IO operations to successfully service the initial IO. This phenomenon can be especially costly when considering the various optimizations that the file system can no longer make:



When performing a write, the file system could perform this write in memory and flush this write to physical storage when appropriate. This helps dramatically accelerate write operations by avoiding accessing slow, non-volatile media before completing every write.

Certain writes, however, could force the file system to perform additional IO operations, such as reading in data that is already written to a storage device. Reading data from a storage device significantly delays the completion of the original write, as the file system must wait until the appropriate data is retrieved from storage before making the write.











In general, if the cluster size exceeds the size of the IO, certain workflows can trigger unintended IOs to occur. Consider the following scenarios where a ReFS volume is formatted with 64K clusters:



Consider a tiered volume . If a 4K write is made to a range currently in the capacity tier, ReFS must read the entire cluster from the capacity tier into the performance tier before making the write . Because the cluster size is the smallest granularity that the file system can use, ReFS must read the entire cluster, which includes an unmodified 60K region, to be able to complete the 4K write.

If a cluster is shared by multiple regions after a block cloning operation occurs, ReFS must copy the entire cluster to maintain isolation between the two regions. So if a 4K write is made to this shared cluster, ReFS must copy the unmodified 60K cluster before making the write.

Consider a deployment that enables integrity streams . A sub-cluster granularity write will cause the entire cluster to be re-allocated and re-written, and the new checksum must be computed. This represents additional IO that ReFS must perform before completing the new write, which introduces a larger latency factor to the IO operation.







By choosing 4K clusters instead of 64K clusters, one can reduce the number of IOs that occur that are smaller than the cluster size, preventing costly IO amplifications from occurring as frequently.





4K clusters limit the maximum volume and file size to be 16TB



64K cluster sizes can offer increased volume and file capacity, which is relevant if you’re are hosting a large deployment on your NTFS volume, such as hosting VHDs or a SQL deployment.







NTFS has a fragmentation limit, and larger cluster sizes can help reduce the likelihood of reaching this limit



Because NTFS is backward compatible, it must use internal structures that weren’t optimized for modern storage demands. Thus, the metadata in NTFS prevents any file from having more than ~1.5 million extents.



One can, however, use the “format /L” option to increase the fragmentation limit to ~6 million. Read more here .





64K cluster deployments are less susceptible to this fragmentation limit, so 64K clusters are a better option if the NTFS fragmentation limit is an issue. (Data deduplication, sparse files, and SQL deployments can cause a high degree of fragmentation.)



Unfortunately, NTFS compression only works with 4K clusters, so using 64K clusters isn’t suitable when using NTFS compression. Consider increasing the fragmentation limit instead, as described in the previous bullets.













Microsoft’s file systems organize storage devices based on cluster size. Also known as the allocation unit size, cluster size represents the smallest amount of disk space that can be allocated to hold a file. Because ReFS and NTFS don’t reference files at a byte granularity, the cluster size is the smallest unit of size that each file system can reference when accessing storage. Both ReFS and NTFS support multiple cluster sizes, as different sized clusters can offer different performance benefits, depending on the deployment.In the past couple weeks, we’ve seen some confusion regarding the recommended cluster sizes for ReFS and NTFS, so this blog will hopefully disambiguate previous recommendations while helping to provide the reasoning behind why some cluster sizes are recommended for certain scenarios.Before jumping into cluster size recommendations, it’ll be important to understand what IO amplification is and why minimizing IO amplification is important when choosing cluster sizes:ReFS offers both 4K and 64K clusters. 4K is the default cluster size for ReFS, andbecause it helps reduce costly IO amplification:Additionally, 4K cluster sizes offer greater compatibility with Hyper-V IO granularity, so we strongly recommend using 4K cluster sizes with Hyper-V on ReFS. 64K clusters are applicable when working with large, sequential IO, but otherwise, 4K should be the default cluster size.NTFS offers cluster sizes from 512 to 64K, but in general, we recommend a 4K cluster size on NTFS, as 4K clusters help minimize wasted space when storing small files. We also strongly discourage the usage of cluster sizes smaller than 4K. There are two cases, however, where 64K clusters could be appropriate:While a 4K cluster size is the default setting for NTFS, there are many scenarios where 64K cluster sizes make sense, such as: Hyper-V, SQL, deduplication, or when most of the files on a volume are large.