Stages

The Funnel of Fidelity depicts how different analytical procedures are applied to manage millions of contextual events and to focus limited investigative resources on the events or situations that are most likely to be malicious. The funnel consists of five stages: collection, detection, triage, investigation, and remediation. Each stage takes an input that was generated in the previous stage, performs some sort of filtering or noise reduction, and produces an output for the following stage. Ideally, each stage allows deeper or more manual analysis to be applied to the events in question because non-relevant events have already been filtered out.
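As a rough illustration of this narrowing, the following Python sketch models each stage as a filter and counts how many items survive it. The stage names come from the funnel, but the event fields (`suspicious`, `unexplained`) and the toy data are assumptions made up for this example; a real stage involves far more than a boolean test.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Item = Dict[str, object]
# A stage is a name plus a predicate deciding whether an item moves on.
Stage = Tuple[str, Callable[[Item], bool]]


def run_funnel(items: Iterable[Item], stages: List[Stage]) -> Dict[str, int]:
    """Pass items through each stage in order and record how many remain."""
    remaining = list(items)
    counts = {"input": len(remaining)}
    for name, keep in stages:
        remaining = [item for item in remaining if keep(item)]
        counts[name] = len(remaining)
    return counts


# Toy data: 1,000 events with hypothetical fields.
events = [
    {"id": i, "suspicious": i % 100 == 0, "unexplained": i % 500 == 0}
    for i in range(1000)
]
stages = [
    ("detection", lambda e: e["suspicious"]),  # events -> alerts
    ("triage", lambda e: e["unexplained"]),    # alerts -> leads
]
counts = run_funnel(events, stages)
print(counts)
```

Each count is smaller than the one before it, which is what makes the more expensive analysis in later stages affordable.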

The remainder of this post describes the stages in depth to give an idea of what each stage involves and how the stages interact with each other. For the sake of clarity, the inputs and outputs of the stages are named specifically (e.g., events, alerts, leads, and incidents). Additionally, each stage description includes the stage’s responsible role, input, and output.

The roles specified below represent conceptual responsibilities within an organization. Organizations will fill these roles in different ways depending on headcount and organizational constraints. For instance, there may not be a separate individual who is explicitly responsible for each of the described roles. We commonly see individuals who wear numerous hats (e.g., a Tier 3 SOC Analyst who is responsible for Detection Engineering, Alert Triage, and Investigation).

Collection

Role: Data Engineer

Input: Data Sensors

Output: Events

Telemetry is the building block of security monitoring because it provides context about activity occurring throughout the environment. Without telemetry, it is nearly impossible to detect malicious activity. Mature organizations should strive to centralize as much telemetry as possible to enable enterprise detection activities. For example, Windows event logs are generated and stored locally, but they must be centralized to support enterprise detection efforts. The collection stage gathers events from data sensors (Windows event logs, commercial EDR solutions, proxy logs, NetFlow, etc.) and makes them available to the detection stage.

Detection and response teams commonly point to the collection stage as the problem, assuming that their difficulties stem from a lack of robust telemetry that clogs the funnel early on. In reality, most organizations have a decent baseline of collection that can be improved but already provides enough telemetry to get started. It is uncommon for us to work with a client that is leveraging its collected telemetry to the fullest extent.

Detection

Role: Detection Engineer

Input: Events

Output: Alerts

With events collected centrally, detection engineers define detection logic to identify events that are security relevant. The goal of the detection stage is to reduce the millions or billions of events from the collection stage to hundreds or even thousands of alerts, which are then analyzed during the triage stage.

These detections are created in an interactive process often referred to as threat hunting or detection engineering, but they should be implemented in production through an automated process in which detection logic is applied to events to generate alerts. In practice, we generally see the detection stage convert millions of collected events into hundreds of alerts, which are passed on to the triage stage.
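A minimal sketch of that automated step might look like the following. The `Detection` and `Alert` structures, the event field names (`process_name`, `command_line`), and the sample rule are all assumptions chosen for illustration; they are not tied to any particular SIEM or EDR product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]


@dataclass
class Detection:
    """A named piece of detection logic (hypothetical structure)."""
    name: str
    matches: Callable[[Event], bool]


@dataclass
class Alert:
    """An event that matched a detection, ready for the triage stage."""
    detection: str
    event: Event


def detect(events: Iterable[Event], detections: List[Detection]) -> Iterator[Alert]:
    """Apply every detection to every event and yield one alert per match."""
    for event in events:
        for detection in detections:
            if detection.matches(event):
                yield Alert(detection.name, event)


# Illustrative rule only: flags PowerShell launched with an encoded command.
encoded_powershell = Detection(
    name="Encoded PowerShell command",
    matches=lambda e: e.get("process_name", "").lower() == "powershell.exe"
    and "-enc" in e.get("command_line", "").lower(),
)

sample_events = [
    {"process_name": "powershell.exe", "command_line": "powershell.exe -enc SQBFAFgA"},
    {"process_name": "powershell.exe", "command_line": "powershell.exe Get-Date"},
    {"process_name": "notepad.exe", "command_line": "notepad.exe notes.txt"},
]

alerts = list(detect(sample_events, [encoded_powershell]))
print(len(alerts), "alert(s) from", len(sample_events), "events")
```

Keeping the logic in a named object separate from the loop that applies it is one way to let rules be developed interactively and then run unchanged in production.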

Triage

Role: SOC Analyst (Tier 1 or 2)

Input: Alerts

Output: Leads

Alerts are the result of detection logic, but it is reasonable to expect some number of false positives. The triage stage is where SOC analysts work to categorize alerts as known bad (malicious), known good (benign), or unknown activity. Malicious activity is immediately identified as an incident and moved to the remediation stage, benign activity is removed from the pipeline as a false positive, and unknown activity is identified as a lead and sent to the investigation stage because it requires additional scrutiny.
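The routing described above can be sketched in a few lines of Python. The `Verdict` enum and the queue names are hypothetical, and the "closed" queue for benign alerts is an assumption about how false positives leave the pipeline; the analyst's judgment that produces each verdict is the hard part and is not modeled here.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Verdict(Enum):
    MALICIOUS = "known bad"
    BENIGN = "known good"
    UNKNOWN = "unknown"


def triage(verdicts: List[Tuple[str, Verdict]]) -> Dict[str, List[str]]:
    """Route each triaged alert ID to the queue for its next stage."""
    queues: Dict[str, List[str]] = {"incidents": [], "leads": [], "closed": []}
    for alert_id, verdict in verdicts:
        if verdict is Verdict.MALICIOUS:
            queues["incidents"].append(alert_id)  # straight to remediation
        elif verdict is Verdict.UNKNOWN:
            queues["leads"].append(alert_id)      # needs investigation
        else:
            queues["closed"].append(alert_id)     # false positive, filtered out
    return queues


queues = triage([
    ("A-1", Verdict.MALICIOUS),
    ("A-2", Verdict.BENIGN),
    ("A-3", Verdict.UNKNOWN),
])
print(queues)
```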

The triage stage is where we see many organizations struggle. The struggle typically shows up as malicious activity being marked as a false positive because of alert fatigue. It is very common for organizations to delegate triage responsibilities to Tier 1 SOC analysts without checks and balances to ensure success. A large contributing factor to alert fatigue is the often unclear nature of the alerts themselves. SOC analysts have a hard time understanding the goal of the detection logic that produced an alert, the context of the attack that is being detected, and the steps they should take to properly triage the alert. These issues can be mitigated by using a detection documentation standard like Palantir’s Alerting and Detection Strategy Framework.
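One lightweight way to enforce such documentation is to refuse to ship a detection until its write-up is complete. The sketch below is an assumption-laden example: the `AlertDoc` fields mirror the three gaps named in the paragraph above (goal, attack context, triage steps) and are not the official sections of Palantir’s framework.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AlertDoc:
    """Hypothetical documentation record attached to a detection."""
    detection_name: str
    goal: str = ""            # what the detection logic is trying to catch
    attack_context: str = ""  # background on the attack being detected
    triage_steps: str = ""    # what the analyst should do with the alert


def missing_sections(doc: AlertDoc) -> List[str]:
    """Return the names of any sections that are still empty."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


draft = AlertDoc(
    detection_name="Example detection",
    goal="Describe what the detection is meant to catch.",
)
print(missing_sections(draft))
```

A check like this could run before a detection is promoted to production so that analysts never receive an alert without its supporting context.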

Investigation

Role: SOC Analyst (Tier 2 or 3) or Forensic Analyst

Input: Leads

Output: Incidents

The triage stage works to remove false positives from the pipeline and results in a manageable number of leads (likely in the single or double digits). A lead is an activity that cannot yet be identified as malicious or benign and thus requires additional investigation. The investigation stage is used to collect additional context that may not be available during the detection or triage stages. This may involve more manual, less scalable analysis, such as file system analysis, memory forensics, or binary analysis, to help identify the true source of the activity. This additional scrutiny is possible because of the noise reduction that occurred during the previous stages.

Remediation

Role: Incident Responder

Input: Incidents

Output: N/A

Once an incident is declared, remediation activities must occur. This is the phase where incident responders work to identify the scope of the incident and remove the infection from the network. Many organizations work with third parties to accomplish remediation activities and ensure that they are completed in a timely manner. It is important to practice remediation in non-emergency situations to ensure the plan is sufficient and any issues are worked out.

What is Detection?

The concept of detection tends to be very nuanced in many organizations. For this reason we must distinguish between micro detection (the process of writing logic to alert on a potentially malicious event) and macro detection (the process of taking a true positive event from alert all the way to remediation). To truly consider an attack as detected, in the macro sense, the attack must result in remediation activity of some kind. Anything less is considered passive detection, which in the grand scheme of detection and response doesn’t matter. Below, I will explore two example cases where I’ve seen confusion regarding the concept of detection.

Example 1

Red team assessments often include a debrief of the attack path for the defenders. Commonly, during the debrief, someone on the detection and response team will learn about an attack that was carried out by the red team and will begin reviewing events in their SIEM. This exercise frequently concludes with the defender saying something along the lines of, “Yeah, we saw that… here is the event.” This exercise is valuable, but what the defender is actually saying is, “Yes, we collected information relevant to that attack, but we have not yet created the detection logic to detect that activity.”

Example 2

We’ve seen numerous organizations that have detection logic built for a specific technique that we used during a red team exercise, for example Kerberoasting, but for some reason the SOC never detected the activity. We eventually find that an alert fired, but it was marked as a false positive. Unfortunately the analyst responsible for this ticket didn’t know enough about Kerberoasting to differentiate between benign and malicious service ticket requests. The activity may have alerted, but in reality there was no detection that occurred in the macro sense.

How can I use the Funnel?

A huge benefit of the Funnel of Fidelity concept is that we can diagnose the stage at which the breakdown is occurring. In example 1, there appears to be sufficient collection, but a robust detection is missing. We could work with this customer to identify strategies for engineering robust detections using the data they are already collecting in their environment. In example 2, we see that an alert is produced, but the triage process is failing. To address this, we could focus on building alert documentation to grow organizational knowledge and remove guesswork from the alert triage process. Neither of these examples should be seen as a failure of the detection and response program. Instead, they should be viewed as opportunities for process improvement (which is a great topic for future posts).
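This kind of diagnosis can be expressed as a simple walk down the funnel. The sketch below is a simplification under stated assumptions: the `first_breakdown` helper and the boolean "did this stage succeed" input are hypothetical, and the linear walk ignores the shortcut from triage straight to remediation.

```python
from typing import Dict, Optional

STAGES = ["collection", "detection", "triage", "investigation", "remediation"]


def first_breakdown(succeeded: Dict[str, bool]) -> Optional[str]:
    """Return the first stage that did not succeed, or None if all did."""
    for stage in STAGES:
        if not succeeded.get(stage, False):
            return stage
    return None


# Example 1: the event was collected, but no detection logic existed.
example_1 = {"collection": True}
# Example 2: an alert fired, but it was marked as a false positive in triage.
example_2 = {"collection": True, "detection": True, "triage": False}

print(first_breakdown(example_1))
print(first_breakdown(example_2))
```

Only when the walk reaches the end without finding a gap has the attack been detected in the macro sense.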