While working with deduplication on volumes of around 30TB, we noticed that the various job types were not executing as we expected. As a Microsoft MVP, I’m very fortunate to have direct access to people with deep knowledge of the technology. A lot of the credit for this post goes to Will Gries and Ran Kalach from Microsoft, who were kind enough to answer my questions about what was going on under the hood. Here’s a summary of the things I learned in the process of understanding what was going on.

Before we dive in any further, it’s important to understand the various deduplication job types as they have different resource requirements.

Job Types (source)

Optimization: This job performs both deduplication and compression of files according to the data deduplication policy for the volume. After initial optimization of a file, if that file is then modified and again meets the data deduplication policy threshold for optimization, the file will be optimized again.

Scrubbing: This job processes data corruptions found during data integrity validation, performs possible corruption repair, and generates a scrubbing report.

GarbageCollection: This job processes previously deleted or logically overwritten optimized content to create usable volume free space. When an optimized file is deleted or overwritten by new data, the old data in the chunk store is not deleted right away. By default, garbage collection is scheduled to run weekly. We recommend running garbage collection only after large deletions have occurred.

Unoptimization: This job undoes deduplication on all of the optimized files on the volume. At the end of a successful unoptimization job, all of the data deduplication metadata is deleted from the volume.
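
All four job types can also be kicked off by hand with Start-DedupJob rather than waiting for the schedule. Here is a minimal sketch; the CSV path is just an example taken from this environment:

```powershell
# Run an optimization pass against a single volume and wait for it to finish
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-001" -Type Optimization -Wait

# Reclaim space after large deletions, then validate/repair the chunk store
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-001" -Type GarbageCollection
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-001" -Type Scrubbing

# Undo deduplication entirely for the volume (make sure there is room to rehydrate the data first)
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-001" -Type Unoptimization
```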



Another operation that happens in the background that you need to be aware of is the reconciliation process. This happens when the hash index doesn’t fit entirely in memory. I don’t have the details at this point as to exactly what it is doing, but I suspect it tries to restore index coherency across the multiple index partitions that were processed in memory successively during the optimization/deduplication process.

Server Memory Sizing vs Job Memory Requirements



To understand the memory requirements of the deduplication jobs running on your system, I recommend you have a look at the events with ID 10240 in the Data Deduplication/Diagnostic Windows event log. Here is what it looks like for an Optimization job:

Optimization job memory requirements.

Volume C:\ClusterStorage\POOL-003-DAT-001 (\\?\Volume{<volume GUID>})

Minimum memory: 6738MB

Maximum memory: 112064MB

Minimum disk: 1024MB
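
To pull these requirements for all your volumes without clicking through Event Viewer, you can query the log from PowerShell. A quick sketch, assuming the channel name below matches the Data Deduplication/Diagnostic log mentioned above on your build:

```powershell
# Grab the most recent "job memory requirements" events (ID 10240) from the Diagnostic channel
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Deduplication/Diagnostic'
    Id      = 10240
} -MaxEvents 20 | Select-Object TimeCreated, Message | Format-List
```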

Here are a few key things to consider about the job memory requirement and the host RAM sizing:

- Memory requirements scale almost linearly with the total size of the data to dedup
  - The more data to dedup, the more entries in the hash index to keep track of
- You need to meet at least the minimum memory requirement for the job to run for a volume

- The more memory on the host running deduplication, the better the performance, because:
  - You can run more jobs in parallel
  - Each job will run faster because:
    - You can fit more of the hash index in memory
    - The more of the index you fit in memory, the less reconciliation work will have to be performed
    - If the index fits completely in memory, the reconciliation process is not required

- If you use throughput scheduling (which is usually recommended):
  - The deduplication engine will allocate 50% of the host’s memory by default, but this is configurable
  - If you have multiple volumes to optimize, it will try to run them all in parallel
  - It will try to allocate as much memory as possible for each job to accelerate them
  - If not enough memory is available, other optimization jobs will be queued

- If you start optimization jobs manually:
  - The job is not aware of other jobs that might be started later; it will try to allocate as much memory as possible, potentially leaving future jobs on hold because not enough memory is available to run them
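
If you do start jobs manually and want to keep a single job from grabbing everything, Start-DedupJob exposes a -Memory parameter that caps the job at a percentage of system memory. A hedged sketch; the 25% split below is only an illustration, not a recommendation:

```powershell
# Start optimization on two volumes manually, each capped at roughly 25% of system memory,
# so that together they stay close to what the throughput scheduler would have used
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-001" -Type Optimization -Memory 25
Start-DedupJob -Volume "C:\ClusterStorage\POOL-003-DAT-002" -Type Optimization -Memory 25
```

If I recall correctly, the scheduled (throughput) jobs expose the same knob through New-DedupSchedule and Set-DedupSchedule, which is how you would change the 50% default.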



Job Parallelism

I’ve touched on memory sizing a bit in the previous section, but here’s a recap with some additional information:

- You can run multiple jobs in parallel

- The dedup throughput scheduling engine can manage the complexity around the memory allocation for each of the volumes for you

- You need to have enough memory to at least meet the minimum memory requirement of each volume that you want to run in parallel
  - If all the memory has been allocated and you try to start a new job, it will be queued until resources become available
  - The deduplication engine tries to stick to the memory quota determined when the job was started

- Each job is currently single-threaded in Windows Server 2012 R2
- Windows Server 2016 (currently in TP4) supports multiple threads per job, meaning multiple threads/cores can process a single volume; this greatly improves the throughput of optimization jobs

- If you have multiple volumes residing on the same physical disks, it would be best to run only one job at a time for those specific disks to minimize disk thrashing
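
To see the queueing behaviour described above in action, you can start a job for every dedup-enabled volume and then watch which ones actually run. A minimal sketch:

```powershell
# Kick off an optimization job for every volume that has deduplication enabled
Get-DedupVolume | ForEach-Object {
    Start-DedupJob -Volume $_.Volume -Type Optimization
}

# Jobs that could not get their minimum memory will wait in the queue until resources free up
Get-DedupJob | Format-Table Volume, Type, State, Progress -AutoSize
```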

To put things into perspective, let’s look at some real world data:

| Volume | Min RAM (MB) | Max RAM (MB) | Min Disk (MB) | Volume Size (TB) | Unoptimized Data Size (TB) |
|---|---|---|---|---|---|
| POOL-003-DAT-001 | 6 738 | 112 064 | 1 024 | 30 | 32.81 |
| POOL-003-DAT-002 | 7 137 | 118 900 | 1 024 | 30 | 35.63 |
| POOL-003-DAT-004 | 7 273 | 121 210 | 1 024 | 30 | 35.28 |
| POOL-003-DAT-006 | 4 089 | 67 628 | 1 024 | 2 | 18.53 |

- To run optimization in parallel on all volumes, I need at least 25.2GB of RAM (the sum of the minimum memory requirements)
- To avoid reconciliation while running those jobs in parallel, I would need a whopping 419.8GB of RAM (the sum of the maximum memory requirements)
  - This might not be too bad if you have multiple nodes in your cluster, with each of them running a job
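
The math is simply the sum of the Min RAM and Max RAM columns above; here is a small sketch to reproduce it, with the values copied from the table and converted at 1000MB per GB as in the figures above:

```powershell
# Per-volume memory requirements reported by the 10240 events (MB)
$volumes = @(
    [pscustomobject]@{ Name = 'POOL-003-DAT-001'; MinMB = 6738; MaxMB = 112064 }
    [pscustomobject]@{ Name = 'POOL-003-DAT-002'; MinMB = 7137; MaxMB = 118900 }
    [pscustomobject]@{ Name = 'POOL-003-DAT-004'; MinMB = 7273; MaxMB = 121210 }
    [pscustomobject]@{ Name = 'POOL-003-DAT-006'; MinMB = 4089; MaxMB = 67628 }
)

$minTotal = ($volumes | Measure-Object -Property MinMB -Sum).Sum   # 25237 MB
$maxTotal = ($volumes | Measure-Object -Property MaxMB -Sum).Sum   # 419802 MB

"Minimum to run all jobs in parallel : {0:N1} GB" -f ($minTotal / 1000)
"Needed to avoid reconciliation      : {0:N1} GB" -f ($maxTotal / 1000)
```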



Monitoring Jobs

To keep an eye on the deduplication jobs, here are the methods I have found so far:

- Get-DedupJob and Get-DedupStatus will give you the state of the jobs as they are running

- Perfmon will give you information about the throughput of the jobs currently running
  - Look at the typical physical disk counters: Disk Read Bytes/sec, Disk Write Bytes/sec, Avg. Disk sec/Transfer
  - You can get an idea of the savings ratio by looking at how much data is being read versus how much is being written per interval

- The Data Deduplication/Operational event log, Event ID 6153, which will give you the following pieces of information once the job has completed:
  - Job Elapsed Time
  - Job Throughput (MB/second)

- Windows Resource Monitor (if not on Server Core or Nano Server)
  - Filter the list of processes on fsdmhost.exe and look at the I/O it’s doing on the files under the Disk tab
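
Most of this can be scripted as well. Here is a rough sketch combining the PowerShell-friendly methods; the Operational channel name is my assumption for the log referenced above, and the counter paths are the standard PhysicalDisk set:

```powershell
# Current job state and progress
Get-DedupJob | Format-Table Volume, Type, State, Progress -AutoSize

# Overall savings per volume
Get-DedupStatus | Format-Table Volume, SavedSpace, OptimizedFilesCount -AutoSize

# Point-in-time read/write throughput on the physical disks backing the volumes
Get-Counter -Counter @(
    '\PhysicalDisk(*)\Disk Read Bytes/sec'
    '\PhysicalDisk(*)\Disk Write Bytes/sec'
    '\PhysicalDisk(*)\Avg. Disk sec/Transfer'
) -SampleInterval 5 -MaxSamples 3

# Completed-job summaries (elapsed time, throughput) from Event ID 6153
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Deduplication/Operational'
    Id      = 6153
} -MaxEvents 10 | Select-Object TimeCreated, Message
```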



Important Updates

I recommend that you have the following updates installed if you are running deduplication on your system as of 2016-04-18:

Final Thoughts

Deduplication is definitely a feature that can save you quite a bit of money. While it might not fit every workload, it has its uses and benefits. In one particular cluster we use to store DPM backup data, we were able to save more than 27TB (and still counting, as the jobs are still running). Windows Server 2016 will bring much improved performance, and who knows what the future holds; maybe dedup support for ReFS?

I will try to keep this post updated as I find out more about operating deduplication.

Other Resources