This time I was asked to help identify a problem that blocked all the DR tests at a large enterprise. Let me start with a description of the environment.

ENVIRONMENT

The customer has a production Hyper-V farm (multiple Hyper-V clusters) in the primary datacenter. This farm uses multiple LUNs (Logical Unit Numbers) on a storage array, presented to the hosts via FC (Fibre Channel). The storage replicates the LUNs asynchronously to the DR (Disaster Recovery) datacenter.

During the year, the customer runs multiple simulations in the DR datacenter to validate that the DR works correctly. In the last simulation, however, the Orchestrator server in the DR datacenter stopped executing the runbooks that automate the import of the VMs from the replicated LUNs, with the following error message:

"Add-ClusterVirtualMachineRole : The File exists."

TROUBLESHOOTING

Inside the Orchestrator runbooks we identified the piece of code that generated the exception, and we re-ran the cmdlet against a single VM.

At first we thought the VM might already be present in the Hyper-V cluster, but then why were we getting the message "The File exists"? So we re-ran the command with the -Verbose parameter and received additional information.

From this output, it seems that the cmdlet Add-ClusterVirtualMachineRole (which adds the VM to the Hyper-V cluster) fails while trying to get a temporary file:

VERBOSE: Connecting to cluster on local computer HV1-2016.
VERBOSE: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.Path.InternalGetTempFileName(Boolean checkHost)

So we downloaded Process Monitor, ran Add-ClusterVirtualMachineRole again for a single VM, and looked at the file system events it generated.

During the execution of the cmdlet, we saw PowerShell try to create multiple TMP files with names like "tmp2C0E.tmp", and every attempt failed with a NAME COLLISION result.
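To see why those collisions matter, here is a minimal Python model of the naming scheme. It assumes the Win32 GetTempFileName convention that the .NET method is built on: a "tmp" prefix, a unique number of which only the lower 16 bits are used, written in hexadecimal, and a .tmp extension. The helper name temp_file_name and the exact formatting (uppercase hex, no zero padding) are illustrative assumptions chosen to match the tmp2C0E.tmp name seen in Process Monitor.

```python
# Hypothetical model of the tmpXXXX.tmp naming scheme (not the real .NET code).
PREFIX = "tmp"
EXTENSION = ".tmp"
MAX_UNIQUE = 0xFFFF  # only the lower 16 bits of the unique value are used


def temp_file_name(unique: int) -> str:
    """Return a name such as tmp2C0E.tmp for a unique value in 1..0xFFFF."""
    if not 1 <= unique <= MAX_UNIQUE:
        raise ValueError(f"unique value out of range: {unique}")
    return f"{PREFIX}{unique:X}{EXTENSION}"


# Every non-zero 16-bit value yields a different name, and there are no others.
ALL_NAMES = {temp_file_name(u) for u in range(1, MAX_UNIQUE + 1)}

if __name__ == "__main__":
    print(temp_file_name(0x2C0E))  # tmp2C0E.tmp
    print(len(ALL_NAMES))          # 65535
```

With a fixed folder and prefix there are only 65535 possible names, so every temporary file that is never deleted permanently uses up one of them.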

So we started investigating the method used by this cmdlet:

System.IO.Path.InternalGetTempFileName

SOLUTION

Reading the documentation of Path.GetTempFileName, the public method that calls InternalGetTempFileName, we identified the root cause:

This method creates a temporary file with a .TMP file extension. The temporary file is created within the user's temporary folder, which is the path returned by the GetTempPath method. The GetTempFileName method will raise an IOException if it is used to create more than 65535 files without deleting previous temporary files. The GetTempFileName method will raise an IOException if no unique temporary file name is available. To resolve this error, delete all unneeded temporary files. (Official reference: the .NET documentation for System.IO.Path.GetTempFileName.)
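The documented behavior can be modelled in a few lines of Python. This is a hypothetical sketch, not the Win32 implementation: create_temp_file starts from a time-based value (as the Win32 function does when it is asked to generate the unique number itself), tries tmp<hex>.tmp, moves to the next value on each collision, and raises once every value has been tried. The max_unique parameter is an illustrative addition so the example can exhaust a tiny namespace instead of creating 65535 files.

```python
import os
import tempfile
import time


def create_temp_file(directory: str, max_unique: int = 0xFFFF) -> str:
    """Hypothetical model of the temp-file probe loop; returns the new path."""
    # Time-based starting value in 1..max_unique (0 is never used).
    seed = int(time.time() * 1000) % max_unique + 1
    for offset in range(max_unique):
        # Step to the next value on each collision, wrapping back to 1.
        unique = (seed - 1 + offset) % max_unique + 1
        path = os.path.join(directory, f"tmp{unique:X}.tmp")
        try:
            # Mode "x" creates the file only if it does not exist yet;
            # FileExistsError here plays the role of NAME COLLISION.
            with open(path, "x"):
                return path
        except FileExistsError:
            continue
    # Every name is taken: the analogue of the IOException "The file exists".
    raise FileExistsError(f"no unique temporary file name left in {directory}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Shrink the namespace to 16 names so it is exhausted quickly.
        for _ in range(16):
            create_temp_file(scratch, max_unique=16)
        try:
            create_temp_file(scratch, max_unique=16)
        except FileExistsError as err:
            print("Exhausted:", err)
```

Each call succeeds as long as at least one name is free, however many collisions it has to skip; only when the last name is used does the next call fail outright.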

Because the customer had run multiple tests on the DR environment during the year, the user account that starts the Orchestrator runbooks had accumulated so many temporary files named "tmpXXXX.tmp" in its temp folder (more than the 65535 the method can handle) that no unique name was left.
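A quick way to check whether an account is close to this limit is to count the matching files in its temp folder. The Python sketch below does that, assuming the name pattern is "tmp" followed by one to four hexadecimal digits and ".tmp". tempfile.gettempdir() returns the temp folder of the account running the script, so it has to run under the same account that starts the runbooks to inspect the right folder; the function name count_temp_files is hypothetical.

```python
import os
import re
import tempfile

# Assumed name pattern: "tmp", one to four hex digits, ".tmp" (any case).
TEMP_NAME = re.compile(r"^tmp[0-9a-f]{1,4}\.tmp$", re.IGNORECASE)
LIMIT = 65535  # number of distinct names available with the "tmp" prefix


def count_temp_files(directory: str) -> int:
    """Count regular files in directory named like tmp2C0E.tmp."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries
                   if e.is_file() and TEMP_NAME.match(e.name))


if __name__ == "__main__":
    folder = tempfile.gettempdir()  # temp folder of the account running this
    count = count_temp_files(folder)
    print(f"{count} of {LIMIT} possible tmpXXXX.tmp names in use in {folder}")
```

A count that reaches the limit means every name is taken, and the next attempt to create a temporary file under that account fails.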

The final solution was to add a few lines of code at the beginning of the runbook on the Orchestrator server to clean up the temp folder of the Orchestrator user.
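The cleanup lines themselves are not shown in this post. As a sketch of the same idea, the hypothetical clean_temp_folder function below deletes tmpXXXX.tmp files from a folder, defaulting to the temp folder of the account that runs it. The name pattern ("tmp", one to four hex digits, ".tmp") and the one-hour minimum age are assumptions; the age check is there so that temporary files belonging to a job that is still running are left in place.

```python
import os
import re
import tempfile
import time
from typing import Optional

# Assumed name pattern: "tmp", one to four hex digits, ".tmp" (any case).
TEMP_NAME = re.compile(r"^tmp[0-9a-f]{1,4}\.tmp$", re.IGNORECASE)


def clean_temp_folder(directory: Optional[str] = None,
                      min_age_seconds: float = 3600) -> int:
    """Delete tmpXXXX.tmp files older than min_age_seconds; return the count.

    Defaults to the temp folder of the account running the code.
    """
    folder = directory or tempfile.gettempdir()
    cutoff = time.time() - min_age_seconds
    # Collect candidates first so the folder is not modified while listing it.
    with os.scandir(folder) as entries:
        candidates = [e for e in entries
                      if e.is_file() and TEMP_NAME.match(e.name)]
    removed = 0
    for entry in candidates:
        try:
            if entry.stat().st_mtime < cutoff:
                os.remove(entry.path)
                removed += 1
        except OSError:
            # Locked by another process or already gone: leave it and move on.
            continue
    return removed
```

Called as clean_temp_folder() at the start of a job, it removes stale files before any cmdlet needs a new temporary name, and files with other names are never touched.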

NICE TO KNOW

I was able to reproduce the customer's issue on a single Hyper-V cluster in my lab by filling the user's temp folder with the New-TemporaryFile cmdlet, which uses the same method to create its TMP files.