Paul Revere was the inspiration for getting me more into PowerShell.

In my day job as an infosec blogger for a data security company, I have the chance to work with incredibly knowledgeable security pros and peek into some of what they’re doing. A few years back I had to learn PowerShell just to keep up with what I was hearing and seeing. Yeah, I was a math major, and a sys-level programmer once upon a time, so I wasn’t starting from scratch. But it was still a challenge to work with this strange object-oriented scripting language that seemed to be massive overkill for some of the tasks I had in mind.

But then I kind of got it. I was also looking into the horrid Windows Event log, and hearing from experts how difficult it was to correlate events and get meaningful info out of it. And after one too many afternoons gazing at the Event log, I gave up trying to understand let alone explain it to others. Hold that thought.

I also had a side hobby a few years back writing about certain off-the-beaten path ideas in tech policy, regulations, strange startups, and other matters that struck me as interesting for a modest blog I started. All with liberal dashes of humor.

Does anyone remember sociologist Kieran Healy’s post on using metadata to find Paul Revere? Yeah, it inspired me to take the next step with Healy’s metadata idea, and I then wrote about how a simple probability model involving Markov chains can reveal more information about Revere’s ring of friends. Hold that thought as well.

I would have given up dealing with Windows log data had I not learned about … Sysmon, which thankfully pulls together a lot of practical information into a usable log entry, particularly for the process creation event (event ID 1). Finally I could see command lines, parent process IDs, etc. Nice!

Sure, you have to be careful about filtering out certain processes; otherwise the log becomes, ahem, unwieldy. Thankfully PowerShell cmdlets let you pull this event log data into a script (covered partially in this Reddit post).
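To make that a little more concrete, here is a minimal sketch of one way to get at the fields of a Sysmon process creation event. It is not the code from the Reddit post or from my repository. The helper name ConvertFrom-SysmonXml is made up for illustration, and the sketch assumes Sysmon is writing to its usual Microsoft-Windows-Sysmon/Operational log.

```powershell
# Hypothetical helper (the name is made up for this sketch): flatten the
# <EventData> section of one event's XML into a hashtable of field name -> value.
function ConvertFrom-SysmonXml {
    param([Parameter(Mandatory)][string]$EventXml)

    $fields = @{}
    foreach ($d in ([xml]$EventXml).Event.EventData.Data) {
        # Each <Data Name="...">value</Data> element becomes one entry
        $fields[$d.Name] = $d.'#text'
    }
    $fields
}

# Usage on a machine where Sysmon is installed (assumes the standard log name
# and an elevated session); left commented out so the sketch loads anywhere:
#
# Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Sysmon/Operational'; Id = 1 } -MaxEvents 50 |
#     ForEach-Object { ConvertFrom-SysmonXml $_.ToXml() } |
#     ForEach-Object { '{0} launched {1}' -f $_.ParentImage, $_.Image }
```

Going through the event XML rather than the formatted message text keeps the field names (Image, CommandLine, ParentImage and so on) intact, which is handy when the next step is building a graph of which app launched which.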

In exploring PowerShell, I found terrific resources on GitHub and the PowerShell Gallery, where I was finally able to explore libraries implementing algorithms and tools that I thought were only available in “real” programming languages.

Thankfully my managers and editors at the company where I work gave me enough room to let me delve into my “random” ideas. So I wanted to show that after using PowerShell to parse the Sysmon data, I could do some fairly sophisticated threat analysis. Seemed like a tall order. But not … once you get to know PowerShell!

I was helped along the way by Doug Finke’s wonderful PowerShell library of standard comp sci algorithms for graphs, queues, etc. What I wanted to do was go beyond finding related nodes in an association graph (which I did with my Paul Revere idea) to the world of malware threats.

My algorithm of choice was the Random Walk With Restart — used by the way in lots of recommendation systems — but kind of turned around. I was looking for nodes that were least similar to a starting point! In other words, maybe I could detect threats from the Sysmon log by working out what would be the most unlikely apps to be launched by a user. And that might point to an account that was taken over by a hacker.

The key part is to work out the Markov transition matrix. Not too difficult with Finke’s PS algorithms. For my project, I parsed the Sysmon data and put it into a graph structure using this bit of PS code. The question is how to calculate the transition probability to each neighbor. The answer: divide each edge’s weight by the total weight of all the edges leaving that vertex, so every row of the matrix adds up to 1. Well … here’s the PS code:

    $T = @{} #transition matrix
    for($ix=0; $ix -lt $g.vertices.count; $ix++) {
        $start = $g.vertices[$ix]

        #lets build a row
        $row= @(0)*$g.vertices.count #initialize to 0

        $w=0
        foreach($e in $start.getEdges()) {
            $w+=$e.weight
        }

        if ($w -eq 0) {
            $row[$ix] =1 #lets keep it looping until it's restarted :-)
        }
        else {
            #now create transition probability
            foreach($e in $start.getEdges()) {
                $ev = $e.endVertex
                $p = $e.weight
                $jx = v-index $ev.value.Key #convert app to index
                $row[$jx]= $p/$w
            }
        }
        $T[$ix] = $row
    }
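Once there is a matrix in that shape, the walk itself is only a few more lines. What follows is a rough sketch of a Random Walk With Restart, not the actual random-rater code from my repository. The function name, the 0.15 restart probability, the iteration count, and the toy three-node matrix are all assumptions picked for illustration.

```powershell
# Toy 3-node transition matrix (made-up numbers), in the same shape as $T:
# a hashtable keyed by row index, where each row sums to 1.
$T = @{
    0 = @(0.0, 0.5, 0.5)
    1 = @(1.0, 0.0, 0.0)
    2 = @(0.0, 0.0, 1.0)   # no outgoing edges, so it loops on itself
}

function Invoke-RandomWalkWithRestart {   # hypothetical name
    param(
        [hashtable]$T,             # row-stochastic transition matrix
        [int]$Start,               # index of the node the walk restarts from
        [double]$Restart = 0.15,   # assumed restart probability
        [int]$Iterations = 100     # assumed to be enough to settle down
    )

    $n = $T.Count
    $p = @(0.0) * $n
    $p[$Start] = 1.0               # the walker begins at the start node

    for ($t = 0; $t -lt $Iterations; $t++) {
        $next = @(0.0) * $n
        $next[$Start] = $Restart   # probability mass that jumps back to start
        for ($i = 0; $i -lt $n; $i++) {
            for ($j = 0; $j -lt $n; $j++) {
                # the rest of the mass moves along the edges of the graph
                $next[$j] += (1 - $Restart) * $p[$i] * $T[$i][$j]
            }
        }
        $p = $next
    }
    $p                             # one visiting probability per node
}

$scores = Invoke-RandomWalkWithRestart -T $T -Start 0

# A recommender would sort from high to low. Here the order is turned around:
# node indexes listed from least visited to most visited.
$leastLikely = 0..($scores.Count - 1) | Sort-Object { $scores[$_] }
```

The nodes at the front of $leastLikely are the ones the walker rarely reaches from the starting point, which is the "least similar" idea described earlier.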

I had to step back after getting this far in my project, which would have been darn near impossible for a soloist not that long ago. I mean, taking real-world event data from a Windows system and being able to analyze it without involving other programmers or stats folks?

The PS code that implements my “reverse” rating system, called “random-rater”, can all be found in my GitHub repository, along with the supporting code.

And if you want more insight into what I was doing, spend a little time with this blog post. It connects Sysmon, PowerShell, and probability models. Who would have thunk they were all deeply related? :-)