PowerShell How-To

Filtering Command Output in PowerShell

In which Adam demonstrates the truth in the old PowerShell adage: "The more you can limit the number of objects returned to the pipeline, the faster your code will run."

Lots of commands will return objects that aren't always exactly what you'd like. Get-ChildItem can return a list of files on a storage volume but it's not realistic to enumerate the entire volume just to find one file. For that matter, you wouldn't just type Get-Vm and go through the hundreds of virtual machines you've got just to see what's happening on a single one. We need to limit that output somehow.

To do that, we have a few options in PowerShell. There's a saying in the PowerShell world to "filter left." This means it's a best practice to limit the number of objects returned from commands as close to the source of the output as possible. Generally, the earlier you limit the number of objects returned to the pipeline, the faster your code will run.

There are many ways to put this into practice, but a common choice is between the Filter parameter found on Get-ChildItem, the Get-Ad* commands and many others, and the more generic Where-Object command. Each will do the job of filtering output, but the difference in performance and memory consumption can be great.

For example, PowerShell has a concept of providers. Each provider that supports filtering has its own built-in filtering system, which PowerShell exposes via the Filter parameter. It's generally better to use the Filter parameter than Where-Object because the Filter parameter passes instructions to .NET to limit output at the provider level, rather than pulling all of those objects out and then filtering them at the pipeline level. The less work you leave for the pipeline, the faster your code will run.
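To see which providers in a session advertise filtering support, a quick check along these lines can help. This is a minimal sketch, not part of the original walkthrough: it assumes that a provider reporting the Filter capability flag is one whose Filter parameter is handled by the provider itself, and the variable names are my own.

```powershell
# The capability flag that a provider reports when it implements its own filtering.
$filterFlag = [System.Management.Automation.Provider.ProviderCapabilities]::Filter

# Keep only the providers whose Capabilities include that flag.
$filteringProviders = Get-PSProvider |
    Where-Object { $_.Capabilities.HasFlag($filterFlag) }

# Show the provider names alongside their full capability lists.
$filteringProviders | Select-Object -Property Name, Capabilities
```

The FileSystem provider should appear in that list, which is why Get-ChildItem can hand a Filter string off to the file system instead of filtering in the pipeline.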

To demonstrate this concept, let's look at a couple of different ways of filtering files in a folder -- although this technique could apply across a number of different scenarios.

I have a folder with 10,000 text files in it, with each file name incrementing by one: 1.txt, 2.txt, 3.txt, et cetera.
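If you'd like to build a similar folder to follow along, something like the sketch below should do it. The New-NumberedTextFile function and the scratch folder under the temp directory are hypothetical names chosen for this example; the article's own folder is C:\testing.

```powershell
# Hypothetical helper: creates empty files named 1.txt, 2.txt, ... up to <Count>.txt in <Path>.
function New-NumberedTextFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 10000
    )

    # Create the folder if it does not exist yet.
    $null = New-Item -Path $Path -ItemType Directory -Force

    foreach ($i in 1..$Count) {
        # -Force overwrites any file left over from an earlier run.
        $null = New-Item -Path (Join-Path $Path "$i.txt") -ItemType File -Force
    }
}

# Assumed location: a scratch folder under the temp directory rather than C:\testing.
$testPath = Join-Path ([System.IO.Path]::GetTempPath()) 'filter-demo'
New-NumberedTextFile -Path $testPath -Count 10000

# Confirm how many text files ended up in the folder.
(Get-ChildItem -Path $testPath -Filter '*.txt').Count
```

Creating the files one at a time with New-Item is not fast, but it only has to run once.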

PS> (Get-ChildItem -Path C:\testing\).Count
10000

Let's say I want to find all of the files that have a "1" in the name. We need to filter the total results somehow. One way to do that is by using the Filter parameter on Get-ChildItem. This allows the user to specify what kind of files should be returned at the file system level. When using Filter, you must adhere to a specific syntax. For this example, to find all files with a "1" in them, I can do something like this:

PS> Get-ChildItem -Path C:\testing\ -Filter '*1*.txt'

On my computer, this takes about 143 milliseconds.

PS> Measure-Command { Get-ChildItem -Path C:\testing\ -Filter '*1*.txt' }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 142
Ticks             : 1429123
TotalDays         : 1.6540775462963E-06
TotalHours        : 3.96978611111111E-05
TotalMinutes      : 0.00238187166666667
TotalSeconds      : 0.1429123
TotalMilliseconds : 142.9123

Let's now use the more generic Where-Object command, which forces Get-ChildItem to enumerate all of the files on the file system, pass them to PowerShell and only then, at the pipeline, filter the results. Notice the time difference.

PS> Measure-Command { Get-ChildItem -Path C:\testing\ | Where-Object { $_.Name -like '*1*.txt' } }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 619
Ticks             : 6197156
TotalDays         : 7.17263425925926E-06
TotalHours        : 0.000172143222222222
TotalMinutes      : 0.0103285933333333
TotalSeconds      : 0.6197156
TotalMilliseconds : 619.7156

For the exact same result, we've increased the time more than 4x! I think I'll stick with using Filter.

We can even use the where() method, which is faster than Where-Object, and it's still substantially slower than Filter.

PS> Measure-Command { (Get-ChildItem -Path C:\testing\).Where({ $_.Name -like '*1*.txt' }) }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 429
Ticks             : 4296713
TotalDays         : 4.9730474537037E-06
TotalHours        : 0.000119353138888889
TotalMinutes      : 0.00716118833333333
TotalSeconds      : 0.4296713
TotalMilliseconds : 429.6713
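A single Measure-Command run can be noisy, so it may be worth averaging a few runs and confirming that all three approaches return the same number of files. The sketch below is one way to do that under a few assumptions of my own: a scratch folder under the temp directory, a smaller set of 2,000 files so it runs quickly, and five runs per approach. The timings you see will differ from the ones above.

```powershell
# Scratch folder with a smaller file set (assumed: 2,000 files) so the sketch runs quickly.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'filter-benchmark'
$null = New-Item -Path $path -ItemType Directory -Force
foreach ($i in 1..2000) {
    $null = New-Item -Path (Join-Path $path "$i.txt") -ItemType File -Force
}

# The three approaches from the article, keyed by a label.
$approaches = [ordered]@{
    'Filter parameter' = { Get-ChildItem -Path $path -Filter '*1*.txt' }
    'Where-Object'     = { Get-ChildItem -Path $path | Where-Object { $_.Name -like '*1*.txt' } }
    'where() method'   = { (Get-ChildItem -Path $path).Where({ $_.Name -like '*1*.txt' }) }
}

$runs = 5
$results = foreach ($label in $approaches.Keys) {
    $block = $approaches[$label]

    # Time the same script block several times and keep each duration in milliseconds.
    $timings = foreach ($n in 1..$runs) {
        (Measure-Command -Expression $block).TotalMilliseconds
    }

    [pscustomobject]@{
        Approach  = $label
        Matches   = @(& $block).Count   # number of files this approach returns
        AverageMs = [math]::Round(($timings | Measure-Object -Average).Average, 1)
    }
}

$results | Format-Table -AutoSize
```

The Matches column is there as a sanity check: if the three numbers differ, the approaches are not filtering the same way and the timings are not comparable.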

Use Filter whenever possible. If the command doesn't have a Filter parameter, look through the command's parameters to see whether it has another kind of filtering mechanism. The Where-Object command and where() method are universal and can be applied to any object being returned by any command, but that universality comes at a performance cost.
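One rough way to look through a command's parameters is to ask Get-Command for them and pick out the names that sound like filters. In this sketch, Get-FilterLikeParameter is a hypothetical helper and the name pattern is only a heuristic I've assumed; plenty of commands filter through parameters with other names, so treat it as a starting point and read the command's help as well.

```powershell
# Hypothetical helper: lists parameters of a command whose names suggest filtering.
function Get-FilterLikeParameter {
    param(
        [Parameter(Mandatory)] [string] $CommandName
    )

    # Heuristic name pattern assumed for this sketch; it is not an official list.
    $pattern = 'Filter|Include|Exclude'

    (Get-Command -Name $CommandName).Parameters.Keys |
        Where-Object { $_ -match $pattern } |
        Sort-Object
}

# Check a couple of commands for filter-style parameters.
$childItemParams = Get-FilterLikeParameter -CommandName 'Get-ChildItem'
$contentParams   = Get-FilterLikeParameter -CommandName 'Get-Content'

$childItemParams
$contentParams
```

Yes, the helper itself uses Where-Object, but it's sifting through a few dozen parameter names rather than thousands of files, so the cost doesn't matter here.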