The goal of risk quantification is to express cybersecurity risks using numbers, instead of qualitative labels.

By doing this, the hope is that you can more easily prioritize between different security initiatives, and track changes in your risk posture at a more granular level.

How is risk calculated?

Say you’d like to quantify the risks for a company.

I believe that you’ll want to identify the assets you care about, enumerate the risks to these assets, and then for each risk, assess its frequency and magnitude. See this talk to learn more.

(None of these four steps are trivial, and this post is dedicated only to assessing frequency. If you’d like to learn about the other three steps, you can email me and I’ll try writing up a post.)

After all:

an event’s risk = its frequency * its magnitude
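Spelled out in code (a sketch; the function name and the numbers are mine, not from any library):

```python
# Risk as expected loss: frequency (events per year) times magnitude (dollars).
def annualized_risk(frequency_per_year, magnitude_dollars):
    return frequency_per_year * magnitude_dollars

# A breach expected once every five years (frequency 0.2), costing $500,000:
annualized_risk(0.2, 500_000)  # -> 100000.0 dollars of expected loss per year
```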

Frequency

Frequency is the expected number of times an event will occur in a given time frame. It’s an expected count: a non-negative number that needn’t be a whole number.

I used to think that frequency is the probability of an event occurring in a time frame — a number between 0 and 1, in other words. This isn’t right.

One trick I use is mentally converting a 0.2 frequency (given the time frame is a year), for example, into thinking “the event should happen once every five years”.

Magnitude

Magnitude is the loss if the event occurs, in dollars.

To compute the magnitude of a data breach, I’d research past breaches and come up with all the likely losses we’d face, such as:

cost of credit monitoring for victims

legal fees and settlements

fines from regulators

stock price dropping

losing sales

brand damage

loss of productivity

Then, I’d convert each of these losses into dollars. Some, like fines and legal fees, are already in dollars. For others, like brand damage, I’d need to study how to quantify them.

riskquant lets you choose a distribution for your magnitude (either lognormal or PERT). You must provide low loss and high loss parameters for these distributions. Then, riskquant will sample a value from these distributions to be your magnitude.

I think the idea here is that magnitude is uncertain. We don’t know if the best case outcome will occur or the worst case, so instead of trying to compute the exact magnitude, we model our understanding of the magnitude with a probability distribution. And then we sample.
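Here’s a minimal sketch of that sampling idea using only the standard library. I’m assuming low loss and high loss are treated as the 5th and 95th percentiles of the lognormal; riskquant’s actual parameterization may differ:

```python
import math
import random

# Fit a lognormal so low_loss and high_loss land at the 5th and 95th
# percentiles (this percentile choice is an assumption on my part).
def sample_lognormal_magnitude(low_loss, high_loss):
    z95 = 1.6448536269514722  # 95th-percentile z-score of the standard normal
    mu = (math.log(low_loss) + math.log(high_loss)) / 2
    sigma = (math.log(high_loss) - math.log(low_loss)) / (2 * z95)
    return random.lognormvariate(mu, sigma)

# Sampled magnitudes mostly fall between the two bounds, with a long right tail:
sample_lognormal_magnitude(50_000, 2_000_000)
```

By construction, roughly 90% of samples fall between the two bounds; the rest model better- and worse-than-expected outcomes.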

Before you forecast

To make this concrete, here are a few events you may want to forecast:

there’s a data breach at our company

a C-level exec is extorted for money

an employee leaks internal emails to the public

an engineer accidentally pushes a secret to a public GitHub repo

If you’re like me, you’d feel very uncomfortable if asked to come up with a probability that any of these events happen. I’d rather mark the probability as “low”, “medium”, or “high” and move on.

That’s why I wrote this post. As you read it, think about which technique(s) you’d use to estimate the probability of these four events.

You don’t need to be absolutely correct

The way I see it, it doesn’t matter how high or how low your risk score is for any given event.

Most likely, you’re not planning to publish your risk scores or compare them with risk scores from other organizations.

What’s important is that your risk scores make sense relative to each other. In other words, it’s fine if you are more optimistic, or more pessimistic, than others, when assessing frequencies, as long as you stay consistent.

Add a time range

Any event that can occur, will occur, as time approaches infinity. So first of all, we need to specify a time range for each event. Let’s change our first event to:

our company encounters a data breach in Q4

Make the scenario as specific as possible

It’s unclear what a “data breach” is. Is data that’s accidentally leaked a breach? If non-sensitive logs are exposed, is that a breach?

This is better:

an outsider steals some or all of our customer data stored in our Postgres database for product X, in Q4

If a “data breach” means any one of several assets being breached, combine the per-asset estimates. Frequencies add directly; probabilities combine as one minus the product of each asset’s chance of not being breached (the simple sum overcounts, though it’s a fine approximation when each probability is small).
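Here’s a sketch of combining per-asset probabilities for an “any asset breached” scenario. The numbers are invented, and independence between assets is assumed:

```python
import math

def any_breach_probability(per_asset_probs):
    # Complement of "no asset is breached", assuming independent assets.
    return 1 - math.prod(1 - p for p in per_asset_probs)

per_asset = [0.05, 0.02, 0.01]      # chance each asset is breached in Q4
any_breach_probability(per_asset)   # -> ~0.0783
sum(per_asset)                      # the simple sum slightly overcounts: 0.08
```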

Frequency has two components

Frequency = frequency of attempts * chance of success

In order for your Postgres database to be breached, you need an attacker to attempt to breach it, and for that attacker to succeed.

Whether the attacker succeeds is in your control: your defenses dictate the chance of success, so you can estimate it.

Whether the attacker makes the attempt, on the other hand, is not in your control. To estimate the chance of an attempt, you could:

Base it on the reward if the attacker succeeds. Your credit card data should have a much higher chance of attempt than a Jira instance

Or, just ignore this factor, and say frequency is proportional to chance of success
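The decomposition above, as a sketch (I’m reading the attempt component as expected attempts per year; the numbers are illustrative):

```python
# Expected yearly attempts times the chance each attempt succeeds gives the
# expected number of successful breaches per year.
def breach_frequency(attempts_per_year, success_chance):
    return attempts_per_year * success_chance

# e.g. 4 serious attempts a year, each with a 5% chance of getting through:
breach_frequency(4, 0.05)  # -> 0.2, i.e. once every five years
```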

Now, let’s go through some of the techniques for quantifying frequency.

1) Examine the data

Say you’re trying to forecast the frequency of a malware infection. All you need to do is query your EDR tool for the number of malware infections you’ve gotten in some recent time range.

Given you have a large enough sample size, you can compute frequency based on that.
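As a sketch, with a rough Poisson interval to show how shaky the estimate gets when counts are small (the numbers are made up; in practice the counts would come from your EDR tool):

```python
import math

def observed_frequency(event_count, observation_years):
    rate = event_count / observation_years
    # Rough 95% interval for a Poisson rate; wide when event_count is small.
    half_width = 1.96 * math.sqrt(event_count) / observation_years
    return rate, (max(0.0, rate - half_width), rate + half_width)

# 9 malware infections observed over the past 3 years:
observed_frequency(9, 3)  # -> rate 3.0/year, interval roughly (1.04, 4.96)
```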

Unfortunately, you probably don’t have this data on many of the uncommon attacks you want to forecast. And even if you do, your sample size may be small. Other orgs may have this data, but they usually don’t share it.

A great product idea

Someone should build an anonymized database, where I can ask it something like “give me a list of reported C-level extortion attempts in the last 24 months, at these 100 companies, which are all in the same location and industry as me, and about the same size” — and get real results.

In a blog post, Ryan McGeehan praises the field of nuclear science for writing postmortems with detailed root cause analysis, and publishing them publicly. We don’t do this in security today.

(The challenge is not the tech; it’s that companies don’t want to disclose this data.)

2) Testing

Say you want to forecast the frequency of a laptop being infected with ransomware, in particular, and you have no past data on this.

So, you could have your red team try to infect several laptops with the ransomware, and see how many times the red team succeeds. Atlassian does this.

You might be wondering “what TTPs (like delivery mechanism) should our red team use in executing the attack?” I don’t have a good answer for that yet, unfortunately.
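The arithmetic for turning trials into an estimate is simple; a sketch with made-up trial counts:

```python
# Per-attempt success chance estimated from red-team trials.
def success_chance(successes, attempts):
    return successes / attempts

# The red team got ransomware onto 3 of the 20 laptops they targeted:
success_chance(3, 20)  # -> 0.15
```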

3) Just guess the order of magnitude

Instead of guessing exactly how frequently an event occurs, ask: does it happen every month? every year? every 10 years? every 100 years?

(This is a lot easier to answer!)
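Each answer maps straight to a yearly frequency (a sketch; the buckets are the ones above):

```python
# Order-of-magnitude answers, converted to expected events per year.
FREQUENCY_PER_YEAR = {
    "every month": 12.0,
    "every year": 1.0,
    "every 10 years": 0.1,
    "every 100 years": 0.01,
}
```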

4) Standardized tables

Actuaries use life tables to estimate the chance of a person dying at or before a certain age.

This doesn’t exist yet, but we, collectively, could create similar tables for common security risks. I’d like to be able to look up “what’s the probability of bad event X, given that I’m a SOC certified tech company, with under 20 employees?”

5) Use a trained panel and measure their Brier scores

There are three parts to this:

(a) Train a group of people in forecasting. According to the book Superforecasting, prediction is in fact a skill that can be learned, and those who learn it are significantly better forecasters than untrained people, regardless of what needs to be predicted.

(b) Put these people together in a panel, and ask them to work together to predict the event. Everyone still produces their own prediction; the point of working together is that everyone is made aware of everything they should consider while making their decision.

(c) If you can get the “correct answer” for the frequency through empirical observation, do it. Then, for each panelist, compute a score that measures the panelist’s accuracy; Brier scores are popular for this. You want panelists’ Brier scores to decrease over time.
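For the scoring step, a Brier score is just the mean squared error between probabilistic forecasts and binary outcomes, where 0 is perfect and lower is better. A sketch with invented numbers:

```python
# forecasts: probabilities the panelist gave; outcomes: 1 if the event
# happened, 0 if it didn't.
def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A panelist said 0.9, 0.2, 0.5; the first two events happened, the third didn't:
brier_score([0.9, 0.2, 0.5], [1, 1, 0])  # -> ~0.3
```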

Check out Ryan McGeehan’s blog posts to learn more about this.

A question I asked one well-known person in this field: how do you measure Brier scores if you can’t get the “correct answer”? How would you get the “correct” frequency for a rare event like a major data breach?

His reply was that you can’t, but going through the exercise, which gives everyone the same understanding of the risk in question, is worth it.

6) Attack trees

Attack trees map all the combinations of steps that can be used in a particular attack.

If decomposing a risk this way is practical, and you can measure the frequency of each step, try this approach.
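Here’s a sketch of how per-step probabilities propagate through a tree: OR nodes succeed if any child succeeds, AND nodes only if every child does. The tree and the numbers are invented, and steps are assumed independent:

```python
import math

def or_node(child_probs):
    # Any child succeeding is enough.
    return 1 - math.prod(1 - p for p in child_probs)

def and_node(child_probs):
    # Every child must succeed.
    return math.prod(child_probs)

# "Steal DB data" = (phish an admin OR exploit the web app) AND exfiltrate:
p_initial_access = or_node([0.10, 0.05])
p_attack = and_node([p_initial_access, 0.8])  # -> ~0.116
```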

7) Attack graphs

An attack graph has a node for each asset an attacker can compromise, and an edge between nodes if the attacker can move between the two corresponding assets.

As an example, an attacker can compromise a web app from the Internet by exploiting a vulnerability, then pivot to the database server (eg, by finding our database credentials in Bash history).

Given an attack graph, you can run simulations. You’d start an agent from one of many different entry nodes (eg, phishing, credential stuffing, etc) and calculate the probability that the agent gets to the asset we’re concerned about (likely the database server, if the risk is a data breach).
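A minimal Monte Carlo sketch of such a simulation. The graph, edge probabilities, and node names are all invented for illustration:

```python
import random

# node -> [(next_node, chance the attacker traverses this edge)]
EDGES = {
    "internet": [("web_app", 0.3)],
    "phishing": [("workstation", 0.4)],
    "web_app": [("db_server", 0.2)],
    "workstation": [("db_server", 0.1)],
}

def reaches_target(start, target):
    # One simulated attack: keep moving while an edge "fires", else give up.
    node = start
    while node != target:
        moved = False
        for nxt, p in EDGES.get(node, []):
            if random.random() < p:
                node, moved = nxt, True
                break
        if not moved:
            return False
    return True

random.seed(1)
trials = 10_000

def breach_chance(entry):
    return sum(reaches_target(entry, "db_server") for _ in range(trials)) / trials

p_via_web = breach_chance("internet")    # expected around 0.3 * 0.2 = 0.06
p_via_phish = breach_chance("phishing")  # expected around 0.4 * 0.1 = 0.04
```

In practice you’d run the simulation from every entry node and weight each by how likely that entry point is to be tried.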