Multithreading Powershell Scripts

In your scripting journey there will come a time when you have a script that simply runs too long. Perhaps you want to gather information hourly and the script takes two hours to run. Maybe you’re a consultant and need a discovery script to run as fast as possible so you can get out of there. Whatever the reason, at some point you’ll consider multithreading. Powershell has this capability baked right in using Powershell Jobs, but .NET has a way too, and initial testing shows it might be faster! Read on to see what I mean.

Update – 6/3/2015

This has turned out to be one of my more popular posts, so I feel the need to do a little update. STOP reading this! Boe Prox has released a full module that allows you to use Runspaces simply and easily. I’ve implemented it in some of my newer scripts and I can’t rave enough about how well it works. Check out the GitHub project here, and the blog he did here.

A Note about Jobs

Jobs are really the “Powershell” way of doing multithreading, at least until Workflow, which was only introduced with Powershell 3.0, picks up more steam. There are a couple of ways to run jobs: one is the Start-Job cmdlet and the other is to look for cmdlets with the -AsJob parameter. Both will put a scriptblock–or cmdlet function–into a background job and return the console to you immediately. One major problem with Jobs is that there’s no easy way to throttle them. Set up a loop with 1000 elements in it, each submitting a job into the background, and you’ll end up with 1000 jobs running on your computer and everything pretty much grinding to a halt. So in order to throttle this down you have to control the flow of background jobs. This can be done using the Get-Job cmdlet like so:

Do { Start-Sleep -Seconds 1 } Until (@(Get-Job).Count -le 5)

You have to force Get-Job into an array–by surrounding it with @( )–because if there are no background jobs running it will return a $null, which of course doesn’t have a count property. This works very effectively but you essentially have to write it twice, because you need to throttle the submission of jobs and then, after you’re done, you need to monitor the jobs until they’ve all finished. You then use Receive-Job to get any information returned by the job. After you’ve retrieved the information you have to dispose of the job with Remove-Job to clean up memory.
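Put together, the two waits and the cleanup might look like the sketch below. The limit of 5, the ten work items, the doubling scriptblock and the $JobList variable are all assumptions made for illustration. The sketch also counts only jobs that are still running, which is a slight variation on the Get-Job line above.

```powershell
# Minimal sketch: throttle Start-Job submissions, then collect and clean up.
$MaxJobs   = 5        # assumed throttle limit
$WorkItems = 1..10    # stand-in for your real list of things to process
$JobList   = @()      # keep our own list so unrelated jobs are left alone

ForEach ($Item in $WorkItems) {
    # First wait: hold off while $MaxJobs of our jobs are still running
    While (@($JobList | Where-Object { $_.State -eq 'Running' }).Count -ge $MaxJobs) {
        Start-Sleep -Milliseconds 500
    }
    $JobList += Start-Job -ScriptBlock { Param ($Number) $Number * 2 } -ArgumentList $Item
}

# Second wait: block until every submitted job has finished
$JobList | Wait-Job | Out-Null

# Receive-Job pulls the output back; Remove-Job frees the job's resources
$JobResults = $JobList | Receive-Job
$JobList | Remove-Job
```

Tracking the jobs in your own variable instead of calling Get-Job with no arguments keeps the loop from counting jobs that something else in the session started.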

The interesting thing here is that there turns out to be a lot of overhead with jobs, especially in creating them and retrieving their output. If you create a lot of jobs–like DFS Monitor with History does–you could be leaving a lot of performance on the table.

Runspaces

Runspaces are not necessarily a Powershell function, really more of a .NET one. Luckily, since Powershell is a .NET language we have full access to it. I wish I could say I discovered them and wrote the upcoming code myself, but I didn’t. I found two great sources:

First is Boe Prox, who wrote this blog post Using Background Runspaces Instead of PSJobs For Better Performance that really turned me on to the possibilities. I had seen some other posts from him about Runspaces but never looked into it, but with this post you can see the overhead created with Jobs and avoided with Runspaces. Great stuff.

Next shoulder I’m standing on is Jon Boulineau who wrote a very interesting Powershell module to submit and use Runspaces: psasync Module: Multithreaded PowerShell. Another great read and a blog you should definitely follow. Honestly, if you read no further and just used Jon’s module you’d be in great shape, and probably better than the code I’ll be showing you!

So why am I writing more on this subject? As good as the above posts were, they left some information out and I ended up spending a lot of time distilling what was there so that I understood it. I’ve always said I learn with my fingers and this was a great case of having to write it myself, in my own way, in order to understand it. I’m going to try to save you that and explain what the heck is going on, which is pretty straightforward. This is no deep dive, either. Don’t expect to come away knowing everything there is to know about Runspaces. What I do hope to achieve is you understanding Runspaces well enough to put the code in your own scripts and execute background jobs successfully.

Setting Up Runspaces

The first thing we need to do is set up a Runspace pool. This is where you set aside memory and resources for our background jobs, or pipes/pipelines. There’s really not a lot to these, except for one of the best things about Runspaces and that’s the automatic throttling.

    $MaxThreads = 5
    $RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
    $RunspacePool.Open()

Now we have $RunspacePool with your pool definition, and we’ve set it for a maximum threads of 5. You can change this value to whatever you want but keep in mind there will be a point of diminishing returns. Have a PC with a single CPU, single core and no Hyper-threading (there are still a few of those out there, aren’t there?) and you probably don’t want to push that thread count too high. Got dual processors with 6-cores each? Yeah, go for it!

The beauty here is that you don’t have to worry about submitting too many jobs; the Runspace pool will manage that for you. Remember our loop above with 1000 elements? Go ahead and submit them all and only up to $MaxThreads will run at a time.

Now we need a script to run in the background, and a variable to hold the Runspace handle which we need to use to track the background job. Last we’ll need something to hold the variable reference to the Runspace itself. I’ve seen a couple of different ways to do all of this, from a hashtable (which I’m not the biggest fan of) to a customized object. I like to keep things simple so I’m just going to use my favorite object type, the PSObject.

    $ScriptBlock = {
        Param ( [int]$RunNumber )
        $RanNumber = Get-Random -Minimum 1 -Maximum 10
        Start-Sleep -Seconds $RanNumber
        $RunResult = New-Object PSObject -Property @{
            RunNumber = $RunNumber
            Sleep     = $RanNumber
        }
        Return $RunResult
    }
    $Jobs = @()

Notice the Param section? Runspaces are like Powershell Jobs in that they are completely separated from the script and you have to pass arguments down to them. Now the meat:

    $Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($argument1)
    $Job.RunspacePool = $RunspacePool
    $Jobs += New-Object PSObject -Property @{
        Pipe   = $Job
        Result = $Job.BeginInvoke()
    }

First define $Job as a Powershell object, then use the AddScript() method to add our scriptblock to the object. Then another method, AddArgument() to put our variable into there. Need to submit multiple arguments? Just keep adding .AddArgument() to your line, or reference the job variable and add more like this: $Job.AddArgument($variable).
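As a small sketch of passing more than one value: the scriptblock, the server name and the port below are made up for illustration, and Invoke() is used instead of BeginInvoke() only to keep the example short and synchronous. AddArgument() binds by position, while AddParameter() binds by name, which can be easier to read once there are several values.

```powershell
# Hypothetical two-parameter scriptblock, for illustration only
$TwoArgBlock = {
    Param ( [string]$ComputerName, [int]$Port )
    "${ComputerName}:$Port"
}

# Positional: arguments are bound in the order they are added
$Positional = [powershell]::Create().AddScript($TwoArgBlock).AddArgument('server01').AddArgument(443)
$PositionalResult = $Positional.Invoke()
$Positional.Dispose()

# Named: order no longer matters because each value names its parameter
$Named = [powershell]::Create().AddScript($TwoArgBlock)
$Named.AddParameter('Port', 443).AddParameter('ComputerName', 'server01') | Out-Null
$NamedResult = $Named.Invoke()
$Named.Dispose()
```

Both calls should produce the same string; the named form just makes it harder to swap two values by accident.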

After that we use the RunspacePool property to add our Runspace definition to the job. Last line is using the PSObject to store the relevant information. I use the Pipe property to track the job itself, and the Result property to store the Job handle information. You use the BeginInvoke() method for that information, and this will start the background job assuming the number of threads allowed in the Runspace pool isn’t full.
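To watch the throttling happen, here is a minimal sketch under some assumptions: a pool capped at 2 threads, six jobs that each just sleep for one second, and $Demo* variable names chosen so they don’t collide with the ones above. If the pool really is holding jobs back, the whole batch should take roughly three seconds rather than one.

```powershell
# Assumed for the demo: a pool limited to 2 concurrent threads
$DemoPool = [RunspaceFactory]::CreateRunspacePool(1, 2)
$DemoPool.Open()

$Stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

# Submit all six jobs at once; BeginInvoke() returns immediately each time
$DemoJobs = ForEach ($Number in 1..6) {
    $DemoPipe = [powershell]::Create().AddScript({ Start-Sleep -Seconds 1 })
    $DemoPipe.RunspacePool = $DemoPool
    New-Object PSObject -Property @{
        Pipe   = $DemoPipe
        Result = $DemoPipe.BeginInvoke()
    }
}

# EndInvoke() blocks until that particular job has finished
ForEach ($DemoJob in $DemoJobs) {
    $null = $DemoJob.Pipe.EndInvoke($DemoJob.Result)
    $DemoJob.Pipe.Dispose()
}

$Stopwatch.Stop()
$DemoPool.Close()
$DemoPool.Dispose()

Write-Host ("Six one-second jobs took {0:N1} seconds" -f $Stopwatch.Elapsed.TotalSeconds)
```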

Watch It Go By

So we’ve defined a Runspace pool, we’ve defined our script in a scriptblock and we’ve submitted our job into the background. Now what? We need a mechanism to monitor the jobs running and see when they’re completed and there’s a pretty easy way to do that by watching the IsCompleted property in the Runspace handle.

    Write-Host "Waiting.." -NoNewline
    Do {
        Write-Host "." -NoNewline
        Start-Sleep -Seconds 1
    } While ( $Jobs.Result.IsCompleted -contains $false )
    Write-Host "All jobs completed!"

We stored the Runspace handle in the Result property of our $Jobs object, so we need to monitor that. One way you could do that is to loop through the entire array of objects stored in $Jobs, or we can use the -contains operator, which will go through the array for us. Because of that we can set up a simple Do loop to monitor that IsCompleted property until all of the jobs report back as $true. I like to give a little feedback while it’s checking too.

Now that all of the background jobs are done we need to get back the information they’ve collected. That’s why we kept the Job information in the Pipe property of our $Jobs object. It’s all in there, we just have to get it out.

    $Results = @()
    ForEach ($Job in $Jobs) {
        $Results += $Job.Pipe.EndInvoke($Job.Result)
    }

We set up a loop to go through all of the elements in the $Jobs array–of objects–and use the EndInvoke() method to pull the data out of the Runspace and store it in another variable, $Results.
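The walkthrough above stops at collecting results. As a hedged extension that is not part of the original script, the sketch below also checks each pipe’s error stream and disposes of the pipes and the pool once they are no longer needed. The three-job setup, the deliberately failing run number 2 and the $Problems variable are assumptions for illustration.

```powershell
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, 2)
$RunspacePool.Open()

# Three small jobs; run number 2 writes a non-terminating error on purpose
$Jobs = @()
ForEach ($Number in 1..3) {
    $Pipe = [powershell]::Create().AddScript({
        Param ( [int]$RunNumber )
        If ($RunNumber -eq 2) { Write-Error "Run $RunNumber failed" }
        $RunNumber
    }).AddArgument($Number)
    $Pipe.RunspacePool = $RunspacePool
    $Jobs += New-Object PSObject -Property @{
        Pipe   = $Pipe
        Result = $Pipe.BeginInvoke()
    }
}

$Results  = @()
$Problems = @()
ForEach ($Job in $Jobs) {
    # EndInvoke() waits for the job and hands back its output
    $Results += $Job.Pipe.EndInvoke($Job.Result)
    # Non-terminating errors end up in the pipe's error stream
    ForEach ($ErrorRecord in $Job.Pipe.Streams.Error) { $Problems += $ErrorRecord }
    # Release the pipe once its output and errors have been read
    $Job.Pipe.Dispose()
}

# Close and dispose of the pool when every job has been collected
$RunspacePool.Close()
$RunspacePool.Dispose()
```

Errors written with Write-Error inside a Runspace don’t show up on your console by themselves, so reading Streams.Error before disposing is one way to avoid losing them silently.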

And that’s it. You’ve gone through the entire cycle of creating and running background jobs in Runspaces, the Surly way.

What about DFS Monitor?

Interesting you should bring that up. I, of course, immediately went to the DFS Monitor to see if Runspaces would shave any time off of it and it really didn’t! Hopefully you read Prox’s blog Using Background Runspaces Instead of PSJobs For Better Performance above and you know that overall it should improve your multi-threading performance, but I actually saw several seconds added on to my DFS Monitor performance! Now there are a lot of things involved with that, including how busy the server I’m querying is at the moment, which can affect performance, and I didn’t have time to run some extensive tests. I didn’t have time because I ran into a really bad problem!

Hashtables and Me

In DFS Monitor I use a hashtable to return multiple points of data from the background job back to the primary script and this works just fine when using a PS background job but did not work at all with a Runspace job! The hashtable came back as a weird PSCustomObject that I had to use specialized dot notation to get to the information ($Result.Item.Status kinda stuff).
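One thing worth checking, offered as a possible explanation and not verified against DFS Monitor itself: EndInvoke() returns a collection of output objects, not the single object the scriptblock emitted, so a returned hashtable sits inside that collection as its first element. The sketch below uses a made-up hashtable with Status and Server keys to show indexing into the collection first.

```powershell
# Hypothetical scriptblock that returns a hashtable, as DFS Monitor is described as doing
$Pipe = [powershell]::Create().AddScript({
    @{ Status = 'OK'; Server = 'server01' }
})

# EndInvoke() returns a collection of output objects, even for one item
$Output = $Pipe.EndInvoke($Pipe.BeginInvoke())
$Pipe.Dispose()

Write-Host "EndInvoke returned a $($Output.GetType().Name) with $($Output.Count) item(s)"

# Index into the collection first, then read the hashtable keys
$First = $Output[0]
Write-Host "Status: $($First['Status'])"
```

If that is what is happening, looping over the returned collection (or taking element 0 when you know there is only one) before reading keys may avoid the odd property chain.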

I’ll have to do some testing and whatnot to figure out what’s going on. Since DFS Monitor was one of my first Powershell scripts it could very well be that I am not creating the hashtable correctly and while a background job allows these rule breaks .NET Runspaces don’t. Or it could be something else entirely. I’ll be doing some testing over the next couple of weeks to try to find out what’s happening and I’ll report my results back once I have them.

Test Code

If you’re interested in trying out my test code, here it is.

    cls
    $Throttle = 5 #threads

    $ScriptBlock = {
        Param ( [int]$RunNumber )
        $RanNumber = Get-Random -Minimum 1 -Maximum 10
        Start-Sleep -Seconds $RanNumber
        $RunResult = New-Object PSObject -Property @{
            RunNumber = $RunNumber
            Sleep     = $RanNumber
        }
        Return $RunResult
    }

    $RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
    $RunspacePool.Open()
    $Jobs = @()

    1..20 | ForEach-Object {
        #Start-Sleep -Seconds 1
        $Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($_)
        $Job.RunspacePool = $RunspacePool
        $Jobs += New-Object PSObject -Property @{
            RunNum = $_
            Pipe   = $Job
            Result = $Job.BeginInvoke()
        }
    }

    Write-Host "Waiting.." -NoNewline
    Do {
        Write-Host "." -NoNewline
        Start-Sleep -Seconds 1
    } While ( $Jobs.Result.IsCompleted -contains $false )
    Write-Host "All jobs completed!"

    $Results = @()
    ForEach ($Job in $Jobs) {
        $Results += $Job.Pipe.EndInvoke($Job.Result)
    }

    $Results | Out-GridView

Enjoy!

Follow-up: Made another post about multi-threading the “Powershell” way, using Jobs.