Server Group Patching: Busted (A.K.A. We are the QA Team)

Released initially in Technical Preview 1511 and most recently as a pre-release feature in Current Branch 1606, server group patching promises to automate most cluster patching scenarios. It coordinates patching across an arbitrary collection of servers by percentage, by a specific number of servers, or in a specific order. You can also run scripts before and after the patching process. The product team has said that server group patching is 100% functional and is only being held back while they argue internally about the user interface.
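To make the three coordination options concrete, here is a small Python sketch that turns a server list into patching waves. It is a hypothetical model, not Configuration Manager code: `plan_batches` and the server names are made up, and the percentage rounding rule (round down, but always allow at least one server) is an assumption. The real client coordinates through a lock, so a slot frees up as soon as any server finishes rather than after a whole wave.

```python
import math
from typing import List, Optional, Sequence


def plan_batches(
    servers: Sequence[str],
    percentage: Optional[float] = None,
    count: Optional[int] = None,
) -> List[List[str]]:
    """Split servers into sequential patching waves (hypothetical model).

    percentage: at most this share of the group patches at once.
    count:      at most this many servers patch at once.
    neither:    patch strictly one server at a time, in the order given.
    """
    if percentage is not None and count is not None:
        raise ValueError("choose percentage or count, not both")
    if percentage is not None:
        if not 0 < percentage <= 100:
            raise ValueError("percentage must be in (0, 100]")
        # Assumption: round down, but never allow fewer than one server.
        size = max(1, math.floor(len(servers) * percentage / 100))
    elif count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        size = count
    else:
        # Specific order: one server at a time.
        size = 1
    # Keep the caller's ordering within and across waves.
    return [list(servers[i:i + size]) for i in range(0, len(servers), size)]
```

For example, `plan_batches(["EX01", "EX02", "EX03"])` returns `[["EX01"], ["EX02"], ["EX03"]]`, which corresponds to the simplest scenario described later in this post: three servers patched in order.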

My team decided to try this out on our Exchange servers, which require both a deterministic patching order and the migration of workloads around the cluster. We quickly settled on three groups and the patching order of each, then wrote scripts to move the active database node around the cluster. Setting up a server group for patching is extremely easy and is well documented in Service a Server Group. With everything in place, I eagerly waited to see how it went. It didn't go well. The first server patched just fine, but the subsequent servers did nothing. I dug into the logs, and in UpdatesDeploymentAgent.log I could see the first server wait for the lock (Lock State:0), acquire it (LockState:1), install updates, and release it (LockState:2). I tried a number of things, but nothing I did would get the remaining servers to patch.
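The log sequence above is easiest to reason about as a small state machine. The Python sketch below models what a healthy, strictly ordered group should log: every server waits, then each in turn acquires and releases the lock before the next one acquires it. Only the numeric values 0, 1, and 2 come from UpdatesDeploymentAgent.log; the enum names, `simulate_group`, and the shared-lock model are assumptions for illustration, not Configuration Manager's implementation.

```python
from enum import IntEnum
from typing import List, Sequence, Tuple


class LockState(IntEnum):
    # Numeric values mirror those seen in UpdatesDeploymentAgent.log;
    # the names are hypothetical labels.
    WAITING = 0
    ACQUIRED = 1
    RELEASED = 2


def simulate_group(servers: Sequence[str], slots: int = 1) -> List[Tuple[str, LockState]]:
    """Return the lock-state transitions expected for an ordered server group.

    Hypothetical model: one shared lock with `slots` holders; servers take
    the lock strictly in list order and release it after installing updates.
    """
    events: List[Tuple[str, LockState]] = []
    waiting = list(servers)
    # Every server starts out waiting for the lock.
    for name in waiting:
        events.append((name, LockState.WAITING))
    holders: List[str] = []
    while waiting or holders:
        # Hand free slots to the next servers in order.
        while waiting and len(holders) < slots:
            name = waiting.pop(0)
            holders.append(name)
            events.append((name, LockState.ACQUIRED))
        # Each holder installs its updates (not modeled) and releases the lock.
        for name in holders:
            events.append((name, LockState.RELEASED))
        holders.clear()
    return events
```

In the failure described here, the log stopped after the first server's release; in this model, that release is exactly the point where the second server should log state 1.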

So I opened a ticket with Premier and sent over a bunch of logs. After about a month of back-and-forth, I was escalated to a Senior Support Engineer. He tried to replicate the issue with the simplest test possible: three servers patched in order. He hit exactly the same problem I had. In fact, he found not just one bug but four separate bugs:

What is the moral of this story? Configuration Manager users are the product's best QA team. Despite this feature having been available in its current form for over a year, it would appear that no one internally had actually tested whether it works. It took a Premier case and a month of back-and-forth before I was escalated to someone who actually tried it. This is what Agile looks like: new features and new bugs that, in theory, get squashed quickly based on user feedback. You may not like this new reality, but this example and others like it prove that this is the new world order. However, for this to work, users need to provide high-quality feedback, which usually means opening a case with Premier support. Complaining on TechNet, Reddit, or Twitter doesn't really count. Those of us lucky enough to have the resources need to step up and make it happen.