As a way to create alerts in real-time

Let’s imagine you are using SIGMA to create alerts while processing a log line and transforming it into a context-aware document. You could have the following rule to alert as soon as failed logins are seen from more than 3 different user accounts on a single source system within 24 hours:

title: Multiple Failed Logins with Different Accounts from Single Source System
description: Detects suspicious failed logins with different user accounts from a single source system
logsource:
    product: linux
    service: auth
detection:
    selection:
        pam_message: "authentication failure"
        pam_user: '*'
        pam_rhost: '*'
    timeframe: 24h
    condition: selection | count(pam_user) by pam_rhost > 3
falsepositives:
    - Terminal servers
    - Jump servers
    - Workstations with frequently changing users
level: medium

For the previous example, most tools would use something like Elasticsearch to perform the aggregations needed to trigger the alert (for example, using sigma2elastalert). While this is quite convenient, it means that the time to respond largely depends on how quickly you can deliver data to Elasticsearch and have it indexed properly for a scheduled job to run the query. On top of that, the load on the cluster will be much higher due to the number of aggregations that constantly need to be executed.

Using features in your documents, the component that parses and creates the processed documents can perform the same checks: compute a feature over a sliding window of the selected time frame and alert if the condition matches. For the example provided, this would be a unique-count feature of authentication failures, incremented for each distinct failed user on a single source host.
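As a rough sketch of what that enrichment step could look like, the snippet below keeps an in-memory sliding window of distinct failed users per source host and stamps the count onto each document as it is produced. All class, field, and feature names here (UniqueCountFeature, failed_users_24h, etc.) are illustrative assumptions, not part of SIGMA or any specific tool.

```python
from collections import defaultdict, deque

# Hypothetical sketch of a sliding-window unique-count feature, maintained by
# the component that parses log lines into documents. Names are illustrative.
class UniqueCountFeature:
    def __init__(self, window_seconds):
        self.window = window_seconds
        # Per key (e.g. pam_rhost): deque of (timestamp, value) events.
        self.events = defaultdict(deque)

    def add(self, key, value, ts):
        """Record an event and return the current distinct-value count for key."""
        q = self.events[key]
        q.append((ts, value))
        # Drop events that fell out of the sliding window.
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        return len({v for _, v in q})

# Usage: enrich each failed-login document with the feature and alert inline,
# instead of shipping the event off for a later aggregation query.
feature = UniqueCountFeature(window_seconds=24 * 3600)
doc = {"pam_rhost": "10.0.0.5", "pam_user": "alice", "ts": 1_700_000_000}
doc["failed_users_24h"] = feature.add(doc["pam_rhost"], doc["pam_user"], doc["ts"])
if doc["failed_users_24h"] > 3:
    print("alert:", doc)
```

This keeps the alerting decision on the hot path of ingestion, so the time to alert is bounded by parsing latency rather than indexing and query scheduling.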

As a way to perform simple historical queries

Another benefit of this approach is that if you store this data in long-term storage such as BigQuery, you can rerun the same query over large periods of time when evaluating a new alert, since the check is a simple comparison against the features already present in the stored documents. This is possible because any rule is a combination of document tags and their values: instead of performing aggregations over a year’s worth of data, the query simply compares each document against an expected set of tags and their values to trigger an alert.

Here is an example of a query in BigQuery that performs a simple check on any event that had stored, as a feature in the document, the number of failed logins for the user in a sliding window of one hour. This makes it easy to check, over a full year, how many alerts would have triggered with a different threshold.
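To illustrate why such a backtest is cheap, here is a small simulation of the same comparison over stored documents, assuming each document already carries a pre-computed feature. The feature name (failed_logins_1h) and the sample data are hypothetical; in BigQuery the equivalent would be a plain WHERE clause on the feature column rather than an aggregation.

```python
# Hypothetical sketch: because each stored document already carries the
# pre-computed feature, a historical backtest is a row-by-row filter, not an
# aggregation. Field names and sample data are illustrative only.
def backtest(documents, feature, threshold):
    """Count how many stored documents would have alerted at a given threshold."""
    return sum(1 for d in documents if d.get(feature, 0) > threshold)

# A year's worth of documents reduces to simple per-row comparisons:
docs = [
    {"user": "alice", "failed_logins_1h": 2},
    {"user": "bob", "failed_logins_1h": 7},
    {"user": "carol", "failed_logins_1h": 12},
]
print(backtest(docs, "failed_logins_1h", 5))   # alerts at threshold 5
print(backtest(docs, "failed_logins_1h", 10))  # alerts at threshold 10
```

Re-running the check with a different threshold only changes the comparison, which is what makes tuning a new alert against historical data inexpensive.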

Types of features

Now that I have hopefully shown the value of these features, there are some ways to categorize them, each with technical implications for their possible implementations.

By their time constraints

Fixed window features

Example of fixed windows for particular keys (features in this case).

These features are simple to calculate, but they usually provide less value because, depending on the window size, they may not reflect an accurate picture of the context. Examples are windows that count the unique or total number of logs related to a particular property within a fixed period of time, such as one particular day or hour. For example, if you state that an IP address should not perform more than 5 failed logins in one hour, and one does 3 failed logins at 13:59:58 and 4 at 14:00:05, then you would not alert: the counts are 3 for the 13:00 window and 4 for the 14:00 window, even though 7 failures happened within seconds of each other.
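The miss described above can be reproduced in a few lines. This is a minimal sketch with illustrative names: failures are bucketed by the hour they fall into, so the two bursts straddling 14:00 never exceed the per-hour threshold.

```python
from collections import Counter

# Minimal sketch of the fixed-window pitfall: events are bucketed by which
# fixed window (here, which hour) they fall into. Names are illustrative.
def fixed_window_counts(timestamps, window_seconds=3600):
    """Count events per fixed window (bucket = timestamp // window)."""
    return Counter(ts // window_seconds for ts in timestamps)

# Seconds since midnight: 3 failures at 13:59:58 and 4 at 14:00:05.
failures = [13 * 3600 + 59 * 60 + 58] * 3 + [14 * 3600 + 5] * 4
counts = fixed_window_counts(failures)
print(max(counts.values()) > 5)  # False: no single hourly bucket exceeds 5
```

The 7 failures split 3/4 across the 13:00 and 14:00 buckets, so a threshold of 5 per hour never fires.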

Sliding window features

Example of sliding windows for particular keys (features in this case).

Sliding window features are more complicated to calculate, since they need to provide counts for any period of time relative to the time frame they intend to represent. In the previous example, an alert should trigger around 14:00:05, since within a sliding window of one hour we reached 7 failed logins for a particular IP address. This complexity is justified by the power it brings: being able to state, for any log line, the properties present during a particular period of time preceding it.
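For contrast with the fixed-window version, here is a minimal sliding-window sketch (again with illustrative names) that evaluates the trailing one-hour count on every event, so the same scenario does trigger at 14:00:05.

```python
from collections import deque

# Minimal sketch of a sliding-window check: keep timestamps of recent failures
# and evaluate the trailing-window count on every event. Names are illustrative.
def sliding_alerts(timestamps, window_seconds=3600, threshold=5):
    """Yield timestamps at which the count within the trailing window exceeds threshold."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Evict events older than the trailing window.
        while recent and recent[0] <= ts - window_seconds:
            recent.popleft()
        if len(recent) > threshold:
            yield ts

# Same scenario as before: 3 failures at 13:59:58 and 4 at 14:00:05.
failures = [13 * 3600 + 59 * 60 + 58] * 3 + [14 * 3600 + 5] * 4
print(list(sliding_alerts(failures)))  # alerts fire at 14:00:05 (7 in the window)
```

Unlike the fixed-window bucketing, the trailing window sees all 7 failures at 14:00:05, which is exactly the behavior the original rule intends.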