Rambling

We occasionally get requests we can’t refuse. Do I have any interest in working with spreadsheets? Not particularly. But if my boss, or my bosses boss asks me to merge two spreadsheets based on a common column, I don’t want to say “sorry, no can do.”

I also don’t want to get stuck being the spreadsheet guy. So I wrote a quick PowerShell function that merges two sets of data, found a more flexible but slow alternative from Lucio Silveira, and settled on extending a nice modification from Dave Wyatt.

The result is yet another Join-Object.

Overview

Let’s look at a silly concocted example. I have a spreadsheet of managers, and a spreadsheet of departments. For some inexplicable reason, I need to match up departments to their manager’s birthday.

If you’ve spent any time with T-SQL, you’ve probably seen the handy Venn diagrams (source) out there, and the discussions on why Venn diagrams don’t match join statements alone.

Let’s take a look at how we might join the manager and department data.

Left joins

A left join is fairly self explanatory - we include all rows from the left data set (managers), and if anything matched on the right side (departments), we include that data:

Inner joins

An inner join is the intersection of both data sets. We return only data where a row on the left matched up with a row on the right:

Full joins

In some cases you want all the data, regardless of whether it is in either set:

Let’s use PowerShell to join this data up!

Join-Object

Join-Object can be found in my hodge podge repository of PowerShell functions. Here’s a quick rundown on the parameters:

Left - a collection of objects for the left side - you can pipe in data for this

Right - a collection of objects for the right side

LeftJoinProperty - The property on the Left collection whose value must match RightJoinProperty in the Right collection

RightJoinProperty - The property on the Right collection whose value must match LeftJoinProperty in the Left collection

LeftProperties (optional) - If specified, limit properties we keep from the left to this set

RightProperties (optional) - If specified, limit properties we keep from the right to this set

Prefix and Suffix (optional) - If specified, add a prefix or suffix to all Right collection property names. Quickly avoid collisions.

Type (optional) - What type of join: AllInLeft - Left join AllInRight - Right join OnlyInBoth - Inner join AllInBoth - Full join



That was painful! Let’s look at this in practice, where it’s a bit easier to see what’s going on.

Demo data

I’ll use PSExcel to pull in data from a spreadsheet. Keep in mind there are other options, like the fantastic ImportExcel module from Doug Finke, which PSExcel borrowed from.

$L = Import-XLSX -Path C:\temp\JoinTest.xlsx -Sheet 1

$R = Import-XLSX -Path C:\temp\JoinTest.xlsx -Sheet 2

We have the data, how do we join it together?

Inner join

Join-Object -Left $L -Right $R -LeftJoinProperty Name -RightJoinProperty Manager -Type OnlyIfInBoth -RightProperties Department

We can tell Join-Object that we only want rows where $L.Name is equal to $R.Manager (inner join), and to only return the Department property from the $R collection

Left join

Here’s where things get more interesting. Let’s run Lucio’s code first:

What happened! We only got back a name and birthday, where is the department that I wanted?

It turns out that PowerShell will see the first object in the pipeline, which only has a name and birthday property, and display output with only those two properties.

We could manually select Name, Birthday, and Department, or use Format-List to see the other properties, but I like abstraction, so in the extended Join-Object, we select the full set of properties for output.

Join-Object -Left $L -Right $R -LeftJoinProperty Name -RightJoinProperty Manager -Type AllInLeft -RightProperties Department

This time, we get the expected output, without worrying about missing columns:

Full join

There’s another issue you can run into. What if you have two properties with the same name, with differing values and meaning? In our example, we have a set of managers, where name refers to the manager’s name. We also have a name property on the department set, that refers to the department name.

Let’s look at the practical implication if we don’t account for this:

Interesting, the left values for name were overwritten by the right values. How can we fix this? The simplest solution is to add a prefix or suffix to all properties from the right collection:

Join-Object -Left $L -Right $R -LeftJoinProperty Name -RightJoinProperty Manager -Type AllInBoth -Prefix r_

Now we get all the properties, and nothing is overwritten:

Maybe you are more familiar with calculated properties. The RightProperties parameter can take individual properties, calculated properties, or a mix:

Join-Object -Left $L -Right $R -LeftJoinProperty Name -RightJoinProperty Manager -Type AllInBoth -RightProperties @ { N = "DeptName" ; expression = { $_ .Name } }

Rather than a generic prefix or suffix, we can use calculated properties to rename these conflicts, or manipulate their values:

Let’s take a peak at performance!

Performance

Lucio’s script offers some pretty cool flexibility, allowing you to specify a custom ‘Where’ scriptblock rather than assuming we want data where one value is equal to another. Unfortunately, Add-Member and invoking those scriptblocks takes a bit more time.

As you can see, this flexibility comes at a pretty steep cost, even when comparing two relatively small data sets:

Version Seconds Lucio’s 145.0 Dave’s ~1.0 Warren’s ~1.2

Practical example: Active Directory Input

The request:

“Hey Warren, we need to match up SSNs to Active Directory users, and check if they are enabled or not. I’ll e-mail you an unencrypted CSV with all the SSNs from gmail, what could go wrong?”

The code:

# Import some SSNs. $SSNs = Import-CSV -Path D:\SSNs.csv #Get AD users, and match up by a common value, samaccountname in this case: Get-ADUser -Filter "samaccountname -like 'wframe*'" | Join-Object -LeftJoinProperty samaccountname -Right $SSNs ` -RightJoinProperty samaccountname -RightProperties ssn ` -LeftProperties samaccountname, enabled, objectclass

The result:

Join-Worksheet

It’s a bit simpler to just use Join-Object, but you can find a crude Join-Worksheet function in PSExcel:

Closing

That’s about it! If you like this sort of thing and haven’t worked with SQL yet, read up on T-SQL. You can use simple PowerShell functions like Invoke-Sqlcmd2 or the PSSQLite module to work with MSSQL and SQLite, respectively, or use the many other tools out there.

If you have any suggestions or run into any issues, a pull request would be welcome : )

Disclaimer: title image source