Regex is often considered something of a black art, and not without reason. It is, arguably, the antithesis of PowerShell in terms of syntax. It's terse, unforgiving, and difficult to get meaningful debug data out of. However, sometimes you just have to parse text, and there is often no better tool than some well-applied regex.

Text Parsing is Messy; Objects Are Not

There's no way around it, really. At some point when parsing text, the code gets messy. Personally, I like to constrain the awful bits to regex, and make the most of it. Its terseness becomes an advantage here, as it contains the mess in one small spot, rather than resulting in large blocks of crude, messy parsing code.

There are a lot of ways to cram otherwise messy text into an object in PowerShell. You can manually parse with or without regex, extracting data one painful piece at a time to build your object. You can use ConvertFrom-String (available in Windows PowerShell 5.1, but not in PowerShell 6 and later) or ConvertFrom-StringData (a personal favourite of mine).
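As a quick illustration of the ConvertFrom-StringData route, here is a minimal sketch. The sample text and its key names are made up for the example. The cmdlet expects one key=value pair per line and returns a hashtable, which can then be cast to [PSCustomObject].

```powershell
# Hypothetical key=value text, one pair per line (not real command output)
$Text = @'
Protocol=TCP
LocalPort=2002
State=ESTABLISHED
'@

# ConvertFrom-StringData turns the pairs into a hashtable of strings
$Data = ConvertFrom-StringData -StringData $Text

# Casting the hashtable gives an object with one property per key
$Object = [PSCustomObject]$Data
$Object.State
```

This only helps when the text is already in key=value form, which netstat output is not.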

But PowerShell's built-in regex support, which comes straight from the .NET libraries, is perhaps the simplest and most effective way to go.

The Setup

Let's say we're trying to parse the output from the Windows netstat command, which looks like this:

    PS ~\> netstat

    Active Connections

      Proto  Local Address          Foreign Address        State
      TCP    127.0.0.1:2002         WS-JOEL:51464          ESTABLISHED
      TCP    127.0.0.1:5354         WS-JOEL:49695          ESTABLISHED
      TCP    127.0.0.1:5354         WS-JOEL:49696          ESTABLISHED
      TCP    127.0.0.1:27015        WS-JOEL:51470          ESTABLISHED

We could parse this with a whole bunch of $string -split '\s+' operations and a while loop, but cramming all that into a custom object would leave us with messy code that is difficult to read and review.
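To make the comparison concrete, here is a rough sketch of what the manual approach might look like for a single line. The sample line and the variable names are illustrative assumptions, not code from a real script.

```powershell
# One sample line in netstat's layout (columns padded with spaces)
$Line = '  TCP    127.0.0.1:2002         WS-JOEL:51464          ESTABLISHED'

# Split into columns, then split each address:port pair again
$Fields = $Line.Trim() -split '\s+'
$Local  = $Fields[1] -split ':'
$Remote = $Fields[2] -split ':'

# Assemble the pieces by position
$Connection = [PSCustomObject]@{
    Protocol      = $Fields[0]
    LocalAddress  = $Local[0]
    LocalPort     = [int]$Local[1]
    RemoteAddress = $Remote[0]
    RemotePort    = [int]$Remote[1]
    State         = $Fields[3]
}
$Connection
```

Note that nothing here checks whether the line is actually a connection entry. Header lines and blank lines would need extra guard code, and that is where the mess tends to grow.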

Parsing with Regex

The ultimate goal here is to end up with an array of custom objects that we can emit to the pipeline, and to remove any unusable munged data. A basic regex pattern capable of parsing the netstat output would look something like this:

    (\w+)\s+(([0-9]+\.){3}[0-9]+):([0-9]+)\s+([\w\d_-]+):([0-9]+)\s+(\w+)

Okay, wow, what a mess. We could stop this pattern here and deem it "good enough," but it would likely need at least five lines of comments to properly document what that pattern is doing, so that it's recognisable at a glance. Let's improve this with a few named match groups:

Named Matches

    $MatchPattern = @(
        '(?<Protocol>\w+)'
        '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
        '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
        '(?<State>\w+)'
    ) -join '\s+'

Notice that, due to the length of the pattern, I have split it into an array of pieces and joined them programmatically into a single match string. The -join operator also supplies the \s+ (one or more whitespace characters) that sits between the pieces and is a necessary part of the pattern. This step is optional, but it lends us some extra readability in the match pattern.
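Another way to get similar readability, shown here only as an alternative sketch and not the approach used in the rest of this post, is the .NET regex engine's (?x) inline option. It tells the engine to ignore unescaped whitespace in the pattern and to treat # as the start of a comment, so the pattern can be documented in place. The variable names and the sample line are assumptions made for the example.

```powershell
# (?x) = ignore unescaped whitespace in the pattern and allow # comments
$CommentedPattern = @'
(?x)
(?<Protocol>\w+)                          \s+  # protocol name
(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+) :       # IPv4 address, then a colon
(?<LocalPort>[0-9]+)                      \s+
(?<RemoteAddress>[\w\d_-]+) :                  # host name, then a colon
(?<RemotePort>[0-9]+)                     \s+
(?<State>\w+)                                  # connection state
'@

$Sample = ' TCP 192.168.22.144:51546 vs-in-f188:5228 ESTABLISHED'
$Sample -match $CommentedPattern
```

The trade-off is that any literal space in the pattern now has to be written as \s or escaped, so this style suits patterns like this one that never match a bare space character.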

Try Before You Buy

It's always a good idea to check your pattern against the string you want to match, to see what happens.

    # String copied from NETSTAT output
    $String = ' TCP 192.168.22.144:51546 vs-in-f188:5228 ESTABLISHED'
    $String -match $MatchPattern

    # Output
    True

Okay, great! Now let's check the $Matches variable. This is automatically populated when doing a -match operation on a single string.

    $Matches

    # Output
    Name                           Value
    ----                           -----
    Protocol                       TCP
    RemotePort                     5228
    LocalPort                      51546
    State                          ESTABLISHED
    LocalAddress                   192.168.22.144
    RemoteAddress                  vs-in-f188
    0                              TCP 192.168.22.144:51546 vs-in-f188:5228 ESTABLISHED

Interesting. You can see that all our requested match groups are there, plus one extra, which is the full matched string. We're halfway there.
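Individual groups can be pulled out of $Matches by name. The snippet below is a small sketch that repeats the pattern and sample string so it can run on its own. The variables $Protocol, $LocalPort and $FullMatch are names chosen for the example.

```powershell
$MatchPattern = @(
    '(?<Protocol>\w+)'
    '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
    '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
    '(?<State>\w+)'
) -join '\s+'

$String = ' TCP 192.168.22.144:51546 vs-in-f188:5228 ESTABLISHED'

if ($String -match $MatchPattern) {
    # Named groups can be read by key or with property-style access
    $Protocol  = $Matches['Protocol']
    # Captured values are strings, so cast when a number is needed
    $LocalPort = [int]$Matches.LocalPort
    # Key 0 holds the full text that the pattern matched
    $FullMatch = $Matches[0]
}
```

Because $Matches is only refreshed by a successful -match, it is safest to read it inside the if block, as done here.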

Let's Get Down to Business

$Matches is a [hashtable], and in PowerShell we can convert this directly to [PSCustomObject]. However, in a case like this we're not particularly interested in the full string that gets matched, since that's basically just our original data. Instead, we'd much rather trim out the extra values and just convert the result.

Making use of output from netstat itself, this is one possible method of making it happen:

    $Pattern = @(
        '(?<Protocol>\w+)'
        '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
        '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
        '(?<State>\w+)'
    ) -join '\s+'

    $Connections = netstat | ForEach-Object {
        if ($_ -match $Pattern) {
            $Matches.Remove(0)
            [PSCustomObject]$Matches
        }
    } | Select-Object -First 5

    $Connections | Format-Table

    # Output
    RemotePort LocalPort State       LocalAddress RemoteAddress Protocol
    ---------- --------- -----       ------------ ------------- --------
    51464      2002      ESTABLISHED 127.0.0.1    WS-JOEL       TCP
    49695      5354      ESTABLISHED 127.0.0.1    WS-JOEL       TCP
    49696      5354      ESTABLISHED 127.0.0.1    WS-JOEL       TCP
    51470      27015     ESTABLISHED 127.0.0.1    WS-JOEL       TCP
    5354       49695     ESTABLISHED 127.0.0.1    WS-JOEL       TCP

Format-Table is used here purely for display, as custom objects with more than four properties are shown in list format by default.
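One detail worth keeping in mind when working with these objects: every value captured by -match is a string, so LocalPort and RemotePort sort and compare as text unless they are converted. Here is a minimal sketch, using hard-coded sample lines in place of live netstat output so the result is repeatable. $SampleLines and $Sorted are names invented for the example.

```powershell
$Pattern = @(
    '(?<Protocol>\w+)'
    '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
    '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
    '(?<State>\w+)'
) -join '\s+'

# Stand-in for netstat output, deliberately out of port order
$SampleLines = @(
    '  TCP    127.0.0.1:5354         WS-JOEL:49695          ESTABLISHED'
    '  TCP    127.0.0.1:27015        WS-JOEL:51470          ESTABLISHED'
    '  TCP    127.0.0.1:2002         WS-JOEL:51464          ESTABLISHED'
)

$Connections = $SampleLines | ForEach-Object {
    if ($_ -match $Pattern) {
        $Matches.Remove(0)
        [PSCustomObject]$Matches
    }
}

# Sort numerically by casting the captured text inside a script block
$Sorted = $Connections | Sort-Object -Property { [int]$_.LocalPort }
$Sorted.LocalPort
```

Sorting on the raw LocalPort strings would instead place 27015 before 5354, since the comparison is made character by character.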

Caveats

As always, each approach has its share of potentially undesirable results. Most immediately obvious is that the order of the properties is not preserved, because $Matches is a hashtable. If we want to define a specific display order, we have two fairly simple options.

One is to insert a PSTypeName property and add a formatting hint for that type name in order to specify the order the properties are displayed in. More on that can be found in this post by Kevin Marquette.

The other option is to define a class with these properties and cast the hashtable to that class type instead of [PSCustomObject].

In both cases you can add other formatting hints, such as properties to avoid displaying by default.
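For the first option, one possible sketch uses Update-TypeData with its DefaultDisplayPropertySet parameter to register the display order for a custom type name, and then adds a PSTypeName key to the hashtable before casting. The type name Demo.NetstatConnection and the sample line are made up for this example, and this is only one of several ways to attach formatting hints.

```powershell
# 'Demo.NetstatConnection' is a made-up type name for this example
$TypeName = 'Demo.NetstatConnection'

# Register the default display properties, in the order we want them shown
$TypeData = @{
    TypeName                  = $TypeName
    DefaultDisplayPropertySet = 'Protocol', 'LocalAddress', 'LocalPort',
                                'RemoteAddress', 'RemotePort', 'State'
    Force                     = $true
}
Update-TypeData @TypeData

$Pattern = @(
    '(?<Protocol>\w+)'
    '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
    '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
    '(?<State>\w+)'
) -join '\s+'

# Stand-in for a single line of netstat output
$Line = '  TCP    127.0.0.1:2002         WS-JOEL:51464          ESTABLISHED'

if ($Line -match $Pattern) {
    $Matches.Remove(0)
    # The PSTypeName key sets the object's type name when the hashtable is cast
    $Matches['PSTypeName'] = $TypeName
    $Connection = [PSCustomObject]$Matches
}
$Connection
```

With six properties the default view is still a list, but the properties should now appear in the registered order rather than in hashtable order.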

Using Classes

    class Connection {
        [datetime] $Timestamp
        [string] $Protocol
        [string] $LocalAddress
        [int] $LocalPort
        [string] $RemoteAddress
        [int] $RemotePort
        [string] $State

        Connection() {
            $this.Timestamp = Get-Date
        }
    }

    $Pattern = @(
        '(?<Protocol>\w+)'
        '(?<LocalAddress>(?:[0-9]+\.){3}[0-9]+):(?<LocalPort>[0-9]+)'
        '(?<RemoteAddress>[\w\d_-]+):(?<RemotePort>[0-9]+)'
        '(?<State>\w+)'
    ) -join '\s+'

    $Connections = netstat | ForEach-Object {
        if ($_ -match $Pattern) {
            $Matches.Remove(0)
            [Connection]$Matches
        }
    } | Select-Object -First 5

    $Connections | Format-Table

As you can see, it's relatively similar to working with a [PSCustomObject]. In essence, as long as the properties you're trying to set are publicly settable, and the object type you're casting to has a default public constructor (i.e., one that takes no parameters), you can cast a hashtable to it.

If there is an appropriate .NET type that fits the bill, you can even cast to that, should you see the need to do so.
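As a small sketch of that idea, the example below casts a match result to System.UriBuilder, which has a public parameterless constructor and settable Host and Port properties. The choice of UriBuilder, the pattern and the sample text are assumptions made purely for illustration. The group names are picked to line up with the property names on the target type.

```powershell
# Group names are chosen to match UriBuilder's Host and Port properties
$EndpointPattern = '(?<Host>[\w-]+):(?<Port>[0-9]+)'

if ('WS-JOEL:51464' -match $EndpointPattern) {
    # Drop the full-match entry, which has no matching property on the type
    $Matches.Remove(0)
    # Each remaining key is assigned to the property of the same name
    $Builder = [System.UriBuilder]$Matches
}
$Builder.Host
$Builder.Port
```

Every key left in the hashtable needs a corresponding settable property on the target type, which is why the entry for key 0 is removed first.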

Not all data is sufficiently consistent for regex to yield meaningful results, but most data sources can be regexed. However, in a lot of cases there are quicker and more effective ways to get the data in PowerShell.

Thanks for reading!