Please believe me that I was meaning to write this post for my old blog, somewhere in March. It is a total coincidence that I work for Google GCP / Chronicle now that happens to be really good at this very task…

So you recall my recent post about TI matching to security telemetry like logs in near real-time? I did say that most threat intelligence (TI, also called “threat data” in this post) comes from past observations of badness. In fact, the whole model of value for threat intelligence is that even though such badness is past for the initial observer, it is a likely future for the intel recipient / consumer.

Hence it is more useful to match TI to past observations in your environment — such as your logs and other telemetry collected over time and not only after the threat data arrives at your door. Naturally, it is even better to match TI to both current/future and past telemetry.

For example, if the initial badness (say, an activity by a particular threat actor) was observed in June 2019, resulting intel was delivered to a client in August 2019 and this is when matching vs their logs started, how do you detect a compromise by the same actor that happened in July 2019? Exactly — with TI retro-matching! The word “retro” here does not mean the 1990s, BTW :-)

See my crude visual that proves, hands-down, that I should stick to writing text:

In this post I wanted to talk specifically about matching TI to past telemetry. To remind, “past” here is defined as “events that occurred before this threat data landed in your lap.”

Now, in real life, most organizations do real-time matching (i.e., they match logs as they arrive against the TI they already have) and do not do any historical matching. Most vendor products, from firewalls to antivirus to “advanced” security tools, make real-time matching as easy as flipping one switch or setting a checkbox to ON.

This starts the process of checking the incoming logs against the existing repository of threat data. You can immediately see that hilarity ensues when you try to match 1,000,000 indicators (why the hell you have so many seemingly relevant threat data points is left for the reader to ponder) to 100,000 events per second.
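To make the real-time case concrete, here is a minimal Python sketch of that “one switch” in spirit: each arriving event is checked against an in-memory indicator set. The field names and sample indicators are made up for illustration; this is not any vendor’s actual API.

```python
# Illustrative indicator set; real pools can hold millions of entries.
ti_indicators = {
    "203.0.113.7",          # example C2 IP (documentation address range)
    "evil-domain.example",  # example malicious domain
}

def match_event(event: dict) -> bool:
    """Return True if any observable in the event hits the TI set."""
    observables = (event.get("src_ip"), event.get("dst_ip"), event.get("domain"))
    # Set membership is O(1) per observable, so even a million-indicator
    # pool is cheap to check per event; the hard part at scale is keeping
    # the set fresh and deduplicated, not the lookup itself.
    return any(o in ti_indicators for o in observables if o is not None)
```

Note that a plain hash-set lookup only covers exact-match indicators (IPs, domains, hashes); richer detection content needs a real rules engine.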

Now, logs arrive as the activities are observed, and threat data feeds arrive on a schedule set by the TI provider(s) or your TIP configuration. So there is no universal law that says matching needs to only look forward — and in fact, that is counter to a large part of TI’s value discussed above.

The historical matching problem is both easier and harder. It is easier since you face little pressure to match in the now. And, obviously, it is harder since you need to match against a larger pool of data, and you need to do it repeatedly as new threat data arrives.
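A crude sketch of the retro-matching loop, in Python: every time a fresh batch of indicators lands, the stored events are re-queried over a lookback window. The “event store” here is just a list, and all field names are invented for illustration; in a real deployment this would be a query against a log search backend or data warehouse.

```python
from datetime import datetime, timedelta

def retro_match(new_indicators: set, historical_events: list,
                lookback_days: int = 365) -> list:
    """Match freshly arrived TI against events already collected.

    This runs every time new threat data arrives (or at least daily),
    over the full lookback window -- hence the repeated, larger-scale
    work compared to real-time matching.
    """
    cutoff = datetime.now() - timedelta(days=lookback_days)
    hits = []
    for event in historical_events:
        if event["timestamp"] < cutoff:
            continue  # older than the retention/lookback window
        if event["observable"] in new_indicators:
            hits.append(event)  # a *past* event surfaced by *new* intel
    return hits
```

The per-batch cost is driven by the size of the historical pool, which is why this tends to want a separate system from the real-time path.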

However, the question many have is: is this actually that important? Let me convince you that you want to explore threat intel retro-matching:

Much wider net cast for signs of badness in your environment

Increased likelihood of actually seeing observables in their original context and relevant time windows

More value gained from threat data feeds you receive

Improved chance to uncover advanced threats as better intelligence on their behavior becomes available

Ok, and why should some take a pass?

Very obviously, historical matches are not real-time detections, since the threat was logged a while ago, not moments ago. If you are uncomfortable with that, well, take a pass.

You probably need a separate system with different performance and scalability requirements; if you cannot afford one, you should not attempt it

If using poor quality intel, the number of false matches may overwhelm you

You have no process (and no desire to create one) for dealing with persistent compromise at all; for you, a historical compromise is just not actionable.

Operationally, how often do you need to match TI beyond merely the logs that arrive after the threat signal? Personally, I want it done every time new intelligence arrives or soon thereafter. Practically, even a daily scan won’t be a bad thing since you may discover evidence of an intrusion that happened weeks ago.

How far back to dig? I want to go back a year, just for the heck of it (and because a year is a good retention number). It is very hard to justify, say, 30 days vs 90 days. On the other hand, you want to try to match this to “average dwell time” (nowadays this number hangs in the 100–200 day range, depending on who you ask). Hence, you probably won’t go wrong with a year. However, keep in mind that environments do change, and sometimes investigating an incident that happened a year ago brings up many different challenges (this discussion is left for future posts, if you care).

Now, this again raises the same old question: are TI matches finished/cooked detections or merely hunting clues? Personally, I’d treat all historical TI matches as hints or hunting clues, not as cooked detections or wake-me-at-3AM alerts. After a match, you still need to do the work to unravel the entire incident scope…
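One way to operationalize that stance, sketched in Python (the severity labels and routing logic are purely illustrative assumptions, not a prescription):

```python
def triage_ti_match(match: dict) -> str:
    """Route a TI match: real-time hits may page someone,
    while historical hits become hunting leads to be scoped
    by an analyst before anyone is woken up at 3AM."""
    if match.get("match_type") == "realtime":
        return "alert"       # fresh activity: possibly alert-worthy
    return "hunt_queue"      # historical hit: investigate scope first
```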
