Overview

One question that comes up again and again in the Storage Spaces Direct Slack group is how best to patch S2D clusters.

For this blog series, I’m going to break down the patching best practices into 2 separate scenarios:

- Offline Patching
- Using Cluster Aware Updating

Offline Patching

Offline patching is a pretty common scenario when patching S2D clusters. In my mind it is used for 2 reasons: catching up on multiple months of patching where there are known issues, and planned patching in a small window with an outage.

The process is pretty straightforward: shut everything in the cluster down, patch the hosts, and start it all back up again. But this means your highly available platform isn’t that available, so why do it?

Pros

The main advantages of offline patching are that there is no risk of VMs experiencing unexpected reboots, because they’re already powered off, and, for the same reason, no risk to the storage volumes provided by Storage Spaces Direct.

You can also patch and reboot all nodes simultaneously, because there are no storage jobs that need to run when all of the CSVs are offline.

Cons

The obvious disadvantage, of course, is that you need to arrange a business outage to shut everything down for patching. However, this outage only needs to be 1-2 hours.

Steps

1. Plan your maintenance window.
2. Shut down all VMs on the cluster.
3. Take the virtual disks offline.
   - Use Failover Cluster Manager to take the Cluster Shared Volumes offline under ‘Storage > Disks’
   - Or use PowerShell to offline all disks with Get-ClusterSharedVolume -Cluster S2D-Cluster | Stop-ClusterResource
4. Take the Cluster Pool offline.
   - Use Failover Cluster Manager to take the Cluster Pool offline under ‘Storage > Pools’
   - Or use PowerShell to offline the pool with Get-ClusterResource -Cluster S2D-Cluster | ?{$_.ResourceType -eq "Storage Pool"} | Stop-ClusterResource
5. Stop the cluster.
   - Run the Stop-Cluster -Cluster S2D-Cluster command
   - Or use Failover Cluster Manager to stop the cluster.
6. Disable the cluster service on each node. This prevents the cluster service from starting up while the node is being patched.
   - Set the Cluster Service to Disabled in services.msc
   - Or use Get-Service clussvc -ComputerName Server01 | Set-Service -StartupType Disabled
7. Apply the Windows Server Cumulative Update and any required Servicing Stack Updates to all nodes. (You can update all nodes at the same time; there is no need to wait, since the cluster is down.)
8. Restart the nodes, and ensure everything looks good.
9. Set the cluster service back to Automatic on each node.
   - Set the Cluster Service to Automatic in services.msc
   - Or use Get-Service clussvc -ComputerName 'Server01' | Set-Service -StartupType Automatic
10. Start the cluster.
   - Run Start-Cluster -Name S2D-Cluster
11. Bring the Cluster Pool back online.
   - Use Failover Cluster Manager to bring the Cluster Pool online under ‘Storage > Pools’
   - Or use PowerShell to online the pool with Get-ClusterResource -Cluster S2D-Cluster | ?{$_.ResourceType -eq "Storage Pool"} | Start-ClusterResource
12. Bring the virtual disks back online.
   - Use Failover Cluster Manager to bring the Cluster Shared Volumes online under ‘Storage > Disks’
   - Or use PowerShell to online all disks with Get-ClusterSharedVolume -Cluster S2D-Cluster | Start-ClusterResource
13. Monitor the status of the virtual disks by running the Get-Volume and Get-VirtualDisk cmdlets.
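For the final monitoring step, rather than re-running Get-VirtualDisk by hand, a small polling loop can do the watching for you. The following is only a rough sketch, not part of my S2D-Maintenance functions: the function name Wait-VirtualDiskHealthy, its parameters, and the default timings are all made up for illustration, and it assumes that a HealthStatus of 'Healthy' on every virtual disk is a good enough signal that things are back to normal.

```powershell
# Hypothetical helper (illustration only): polls until every virtual disk
# reports HealthStatus 'Healthy', or until the timeout expires.
function Wait-VirtualDiskHealthy {
    param(
        # Script block that returns the virtual disks to check. It is a
        # parameter so the query can be pointed at a specific cluster.
        [scriptblock]$GetDisk = { Get-VirtualDisk },
        [int]$TimeoutSeconds = 1800,   # assumed default, adjust to taste
        [int]$PollSeconds = 30         # assumed default, adjust to taste
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        $disks     = @(& $GetDisk)
        $unhealthy = @($disks | Where-Object { $_.HealthStatus -ne 'Healthy' })

        # Require at least one disk, so an offline pool is not reported as healthy
        if ($disks.Count -gt 0 -and $unhealthy.Count -eq 0) {
            return $true
        }

        $names = ($unhealthy | ForEach-Object { $_.FriendlyName }) -join ', '
        Write-Host "Still waiting on virtual disks: $names"
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)

    return $false
}
```

On a cluster node this could be called as Wait-VirtualDiskHealthy, or pointed at the cluster from a management machine with Wait-VirtualDiskHealthy -GetDisk { Get-VirtualDisk -CimSession S2D-Cluster }. A return value of $false just means the timeout was reached, so you would still want to look at Get-VirtualDisk and Get-StorageJob yourself before starting VMs.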

Simplifying the process

Seeing as this is a 13 step process, that’s 13 chances for human error to occur. To help remove human error, and because I love automating things with PowerShell, I’ve created some scripts to reduce the number of steps to just 7.

Now stopping the cluster once your VMs are offline is a single command - Stop-S2DCluster.

Stop-S2DCluster will check all your volumes are healthy, and all VMs are shut down before taking any action. It will then stop all CSVs, and the Storage Pool, before stopping the cluster and setting all the cluster services to disabled on the hosts.
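To give a feel for the kind of pre-flight check described above, here is a minimal sketch. It is not the actual code inside Stop-S2DCluster: the function name Test-ReadyForOfflinePatching and the shape of its output are invented for illustration, and it assumes that "shut down" means a VM State of 'Off' and "healthy" means a virtual disk HealthStatus of 'Healthy'.

```powershell
# Hypothetical pre-flight check (illustration only, not the Stop-S2DCluster source).
# Takes VM and virtual disk objects and reports whether it looks safe to proceed.
function Test-ReadyForOfflinePatching {
    param(
        [object[]]$VM = @(),           # e.g. output of Get-VM
        [object[]]$VirtualDisk = @()   # e.g. output of Get-VirtualDisk
    )

    # Anything not fully powered off counts as still running
    $runningVMs = @($VM | Where-Object { $_.State -ne 'Off' })

    # Any disk that is not Healthy blocks the shutdown
    $unhealthyDisks = @($VirtualDisk | Where-Object { $_.HealthStatus -ne 'Healthy' })

    [pscustomobject]@{
        Ready          = ($runningVMs.Count -eq 0 -and $unhealthyDisks.Count -eq 0)
        RunningVMs     = @($runningVMs | ForEach-Object { $_.Name })
        UnhealthyDisks = @($unhealthyDisks | ForEach-Object { $_.FriendlyName })
    }
}

# Example usage against a real cluster (commented out, needs the Hyper-V,
# FailoverClusters and Storage modules plus access to the cluster):
# $nodes = (Get-ClusterNode -Cluster S2D-Cluster).Name
# Test-ReadyForOfflinePatching -VM (Get-VM -ComputerName $nodes) `
#     -VirtualDisk (Get-VirtualDisk -CimSession S2D-Cluster)
```

Returning an object rather than just $true or $false means that, when the check fails, you can see straight away which VMs or disks are holding things up.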

Starting things up after you’ve patched all the hosts is just as easy with Start-S2DCluster.

Unlike Stop-S2DCluster, Start-S2DCluster needs to be run against a cluster node, rather than the cluster, as it will start the cluster service on that node first, and then automatically discover all the other nodes in the cluster. It’ll set all the cluster services back to automatic and start them. After the nodes have joined the cluster, it will bring the storage pool and CSVs back online.

    # Arrange a maintenance window for the outage
    ...

    # Stop the VMs
    Get-VM -ComputerName (Get-ClusterNode -Cluster S2D-Cluster).Name | Stop-VM

    # Shutdown the S2D Cluster
    Stop-S2DCluster -Name S2D-Cluster

    # Patch Hosts and Reboot
    ...

    # Start the S2D Cluster
    Start-S2DCluster -ComputerName S2DHost01

    # Start the VMs
    Get-VM -ComputerName (Get-ClusterNode -Cluster S2D-Cluster).Name | Start-VM

These PowerShell commands are part of my S2D-Maintenance functions, and the latest version can be downloaded from my GitHub repo.

    # Download location (the C:\Scripts folder must already exist)
    $FileLocation = "C:\Scripts\S2D-Maintenance.ps1"

    # Download link
    $URL = "https://github.com/comnam90/bcthomas.com-scripts/raw/master/Powershell/Functions/S2D-Maintenance.ps1"

    # Download the file (force TLS 1.2, which GitHub requires)
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
    Invoke-WebRequest -Uri $URL -UseBasicParsing -OutFile $FileLocation

    # Import Functions for use (dot-source the downloaded script)
    . $FileLocation

Wrapping up

So in Part 1, we’ve gone over the process for performing offline maintenance on a Storage Spaces Direct or Azure Stack HCI cluster, and automated a number of the steps to simplify the process.

Offline maintenance is always advised when your cluster is 6 months or more behind on patching, as it reduces both the risk of hitting known bugs and the window required to catch up on patches.

Next time we’ll cover off using Cluster Aware Updating to make sure you don’t fall behind in patch level in the first place.
