It is Monday and you want to start your week by learning about a new adversarial technique and building detections around it. What is the first thing you do besides reading about the technique? If you do not have a strategy in place, you might end up asking yourself some of the following questions:

Do I test the technique right away? If so, how do I prepare for the test? Do I build my own test or use someone else's? Do I even need to use a command and control framework (e.g. Empire, Caldera)? How do I even use a command and control framework? How many technique variants do I test?

Do I do it in production? Am I even authorized to execute random tests in production? Do I do it in a lab environment? What do I need in a lab? Do I have the right event log auditing enabled?

How do I collect the data generated? Do I filter anything from the event logs to reduce the noise? Do I collect logs from one endpoint only? How do I share the data collected with other team members?

Do you notice that most of the questions relate to how to produce the data rather than how to start understanding and analyzing it to build a detection 🏹? You might be going through this without noticing.

Sharing My Story

This is something that I have experienced since the beginning of my career. I used to, and still on some occasions, spend a lot of time trying to get a simulated test to work with the right setup. Every time I see one of my co-workers working on innovative offensive research, I try to simulate it, produce data to analyze, and start sharing some detection ideas (e.g. Hunting in Active Directory: Unconstrained Delegation & Forests Trusts).

However, if someone else wanted to simulate the test as well, that person would sometimes have to repeat some of the same steps I took in order to get the technique to work and produce similar data. Also, remember that not everyone has a lab environment available to share, or standard configurations that produce the same data.

Once again, the time spent building a detection goes more into producing the data than into understanding and analyzing it. And this is without considering the time it could take to build your own environment and troubleshoot why it is not working as expected, even when you automate the whole build and can deploy it in a cloud infrastructure 🤔.

The Beginning

Almost two years ago, I shared this with my brother Jose Luis Rodriguez, and I noticed that it was also happening to him. Therefore, we decided to come up with a way to share with each other the data we were producing after simulating an adversarial technique. This was helpful because one of us could always jump straight to the analysis of the data and focus more on the development of a detection. This idea went beyond just sharing or using the same SIEM after simulating an attack technique. We wanted to make sure that we also had the data offline, in a format that would be easy to import into other analytic platforms and analyze even while on the road (consulting life!) without access to the Internet.

Initial Goals

After a few conversations, I started to document a few initial goals to not only share data with each other, but also with the community:

An easy and flexible way to export and import data after a simulation test.

An easy way to capture data from a specific time window.

A capability to collect data not just from the known targeted endpoints, but also from other systems that might add context for the development of the detection. For example, a lateral movement technique in a Windows environment that leverages Kerberos for authentication involves a source host, a destination host, and a domain controller.

A standardized format for structuring the collected data without applying any transformation to it. A format that is simple and practical to consume. I did not want to share files that would require additional parsing steps before users could run basic data analysis techniques.

A standard and universal way to categorize the data collected (ATT&CK).

How Did I Start?

I decided to take a look at the components of a project that I had developed named HELK to get some ideas. The project allows me to ingest and analyze event logs, so I thought it was a good idea to use some of its components.

Identify Source

I wanted to first find the best place to export data from. If we look at the image below of the current HELK architecture, we have data being published to Kafka for temporary storage, consumed by Logstash for data transformation jobs, and sent to Elasticsearch for long term storage.

I decided to go with Kafka, because the data is published and stored without any transformations (i.e. renaming of fields), and it is already built to allow several applications to consume and publish streams of data at scale.

What is Kafka?

Apache Kafka is a community distributed publish-subscribe event streaming platform designed to be fast, scalable, fault-tolerant, and durable. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log.

There are several ways to consume and publish data to Kafka, and it all depends on the use case. For example, if you want to consume data from a Kafka broker and easily execute SQL-like queries on top of the data streams, you can do it with KSQL. I wrote about it here. If you simply want to consume and publish data to Kafka topics, then Kafkacat is your friend.

What is Kafkacat?

kafkacat is a generic non-JVM producer and consumer for Apache Kafka. It is a command line utility that you can use to test and debug Apache Kafka deployments. You can use kafkacat to produce, consume, and list topic and partition information for Kafka. Described as “netcat for Kafka”, it is a swiss-army knife for inspecting and creating data in Kafka.

Kafkacat: Consumer Mode

In consumer mode, Kafkacat reads messages from a topic and prints them to standard output (stdout). You can also redirect the output to a file (e.g. a JSON file).

Example: Consuming data from the end of the topic onward, so only new records are captured. I could run the following command right before starting a simulation test.

$ kafkacat -b <Kafka-IP>:9092 -t <kafka-Topic> -C -o end > file.json

-b : Kafka broker

-t : Topic to consume from

-C : Consumer Mode

-o : Offset to start consuming from

Kafkacat: Producer Mode

In producer mode, Kafkacat reads messages from standard input (stdin). You can also feed kafkacat data from a file. This means that we can send data back to any other Kafka broker. Do you remember TCPReplay, a tool used to replay previously captured network traffic? Well, think about the same concept, but for security event logs 😉

Example: Producing data from a file. I could run the following command when I want to replay the data generated after a simulation.

$ kafkacat -b <Kafka-IP>:9092 -t <kafka-Topic> -P -l file.json

-b : Kafka broker

-t : Topic to produce to

-P : Producer Mode

-l : Send messages from a file

That’s it! Now, all we need to do is integrate the Kafkacat concept into an environment that is already using Kafka as part of its data pipeline. Remember that you do not need Kafkacat to start analyzing the data. You can use Python libraries like Pandas to ingest that JSON file for further analysis. I already blogged about it here 😉.
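As a minimal sketch of that Pandas approach: Kafkacat writes one JSON record per line, so the exported file can be read as JSON Lines. The file name and event fields below are made up for illustration.

```python
import json
import pandas as pd

# In practice this would be the file produced by `kafkacat ... > file.json`;
# here we fabricate two sample events so the sketch is self-contained.
sample_events = [
    {"EventID": 4688, "Hostname": "HOST1", "Image": "C:\\Windows\\System32\\cmd.exe"},
    {"EventID": 3, "Hostname": "HOST1", "Image": "C:\\Windows\\System32\\lsass.exe"},
]
with open("file.json", "w") as f:
    for event in sample_events:
        f.write(json.dumps(event) + "\n")

# Kafkacat wrote one JSON event per line, so read it as JSON Lines
df = pd.read_json("file.json", lines=True)
print(df.shape)  # (2, 3)
```

From here you can immediately start slicing the events by host, event ID, or any other field, without Kafka being involved at all.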

Wait, What?

I believe it is better to show you how all this works with a basic example 🍻

Share Your First Dataset!

Environment

Before you even execute a simulation test, you have to make sure you have a well-documented environment to play with. I do not like to play with data that I do not know anything about (users, host names, audit policies, etc.). A basic simulation environment setup might look like this:

Since I use HELK in my testing environment, I already have a Kafka broker available. Therefore, I can use Kafkacat to consume data during the simulation test, and push the data being collected to a JSON file.

Scenario: Over-pass-the-hash without touching LSASS

Let’s say you want to simulate an adversary using a stolen rc4_hmac hash (NTLM) from a domain-joined user and turning it into a fully-fledged ticket-granting ticket (TGT), without patching LSASS with the stolen hash/key to kick off the normal Kerberos authentication process. You can learn more about this technique through Rubeus, a C# toolset for raw Kerberos interaction and abuse developed by my co-worker Will Schroeder.

Simulate Technique and Consume Data

Following the basic Kafkacat commands that I showed you above, and reading about the ticket request commands of Rubeus here, I put together this video to show you how to consume/export data during a simulation test.

“But, you are fingerprinting Empire!”

If you want to focus on specific indicators of compromise from specific command and control frameworks, then yes, the dataset is fingerprinting Empire. However, I find it more valuable to build detections around the behavior of the technique, and not the specific command line arguments or user agents used by a specific command and control framework.

For example, one of the main reasons to use Rubeus for over-pass-the-hash is to avoid the known LSASS manipulation approach. However, an unintended consequence is that Kerberos traffic to port 88 should normally originate only from lsass.exe; sending raw traffic of this type from another process is more valuable for my detection strategy. The dataset provided represents that behavior and more.
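That behavior-based analytic could be sketched over Sysmon Event ID 3 (network connection) records like this. The records and their values below are fabricated for illustration; the field names follow Sysmon's schema, but check your own pipeline's field naming before reusing the filter.

```python
import pandas as pd

# Hypothetical sample of Sysmon Event ID 3 (network connection) events
events = pd.DataFrame([
    {"EventID": 3, "Image": "C:\\Windows\\System32\\lsass.exe", "DestinationPort": 88},
    {"EventID": 3, "Image": "C:\\Tools\\Rubeus.exe", "DestinationPort": 88},
    {"EventID": 3, "Image": "C:\\Windows\\System32\\svchost.exe", "DestinationPort": 443},
])

# Behavior-based analytic: Kerberos traffic (port 88) from any process
# other than lsass.exe is suspicious, regardless of the C2 framework used.
suspicious = events[
    (events["EventID"] == 3)
    & (events["DestinationPort"] == 88)
    & (~events["Image"].str.lower().str.endswith("lsass.exe"))
]
print(suspicious["Image"].tolist())  # ['C:\\Tools\\Rubeus.exe']
```

Note that the filter keys on the process making the connection, not on any Empire- or Rubeus-specific string, which is exactly why it survives a change of tooling.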

Let’s Covenant it!

If I have not convinced you yet, here is another video where I use Covenant to simulate the same technique. I recorded the data produced after executing Rubeus with the asktgt ptt arguments via Kafkacat and got similar results.

Share a Dataset and Import It to an Analytic Platform

We now have two sample datasets for the Rubeus asktgt ptt module. You can send those datasets to anyone in the community. You can also import them into a HELK instance or analyze them directly with other tools (e.g. Jupyter Notebooks). I put together this video to show you how you can import the dataset to HELK.

That’s it! Now you know how to export the security events you produce while simulating an adversarial technique, and how to import them into another analytics platform if necessary.

How Can We Help the Community?

Once Jose Luis Rodriguez and I figured out we could start doing this for every adversarial technique we were testing, we decided to create a GitHub repo and start sharing all the datasets we were producing in our own environment with everyone in the community 💙. We named this project Mordor!

Enter Mordor

The Mordor project provides pre-recorded security events generated by simulated adversarial techniques, in the form of JavaScript Object Notation (JSON) files for easy consumption. The pre-recorded data is categorized by platforms, adversary groups, tactics, and techniques as defined by the MITRE ATT&CK framework. The pre-recorded data represents not only specific known malicious events but also the additional context/events that occur around them. This is done on purpose so that you can test creative correlations across diverse data sources, enhancing your detection strategy and potentially reducing the number of false positives in your own environment.
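A quick way to see that context is to summarize a dataset by host and channel. The events below are fabricated in a Mordor-like shape (field names and hosts are assumptions, not taken from a real dataset):

```python
import pandas as pd

# Hypothetical events collected from several hosts involved in one simulation
events = pd.DataFrame([
    {"Hostname": "WORKSTATION5", "Channel": "Microsoft-Windows-Sysmon/Operational", "EventID": 3},
    {"Hostname": "WORKSTATION5", "Channel": "Security", "EventID": 4688},
    {"Hostname": "DC1", "Channel": "Security", "EventID": 4768},
])

# Counting events per host/channel shows the dataset carries context beyond
# the endpoint where the technique executed (e.g. the domain controller's
# 4768 Kerberos TGT request), enabling correlations across data sources.
summary = events.groupby(["Hostname", "Channel"]).size()
print(summary)
```

Seeing the domain controller appear alongside the workstation in the same dataset is what makes correlations like "TGT requested right after a suspicious process connected to port 88" possible.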

The name Mordor comes from the awesome book/film series “The Lord of the Rings”, and it was a place where the evil forces of Sauron lived. This repository is where data generated by known “malicious” adversarial activity lives, hence the name of the project 😜

Mordor Goals

Provide free portable datasets to expedite the development of analytics.

Facilitate adversarial techniques simulation and output consumption.

Allow security analysts to test their skills with real known bad data.

Make the validation stage of data analytics more efficient.

Enable data scientists to have semi-labeled data for initial research.

Contribute to the ATT&CK framework Data Sources section.

What Can I Do With Mordor Datasets?

Map Adversary Groups

You can simulate an entire Adversary Group Playbook and collect all the data it produces at once. You can use the ATT&CK Groups section or the Unit42 Playbooks viewer to decide what techniques to execute. I decided to do that with the help of the ATT&CK Eval Round 1 — APT3 (Second Scenario). I put together a Playbook to use with the Empire C2 Framework, following the ATT&CK team's example for Metasploit and Cobalt Strike. It was a lot of fun, but it took time to get the environment ready and every single command to work. Good thing it is already available as the first large dataset in the project. You can download it and start playing with it! 😄🍻

Map Threat Hunter Playbooks

You can use every single dataset pushed to the repo, or make your own, and map them to a threat hunter playbook or any documentation framework you use to share your research. I used to add the specific commands or scripts used to simulate an adversary technique to my playbooks. I now map the data produced to a threat hunter playbook in the validation section. I save a lot of time whenever anyone asks me how I validate the data analytics I create: I can simply run my analytics against the datasets 🤜 🤛

Example Playbook: Domain DPAPI Backup Key Extraction

Train Others With Them!

This is for offensive and defensive operators. Whether you are training a red teamer or a blue teamer, you can use Mordor datasets to show them what adversaries might look like while executing certain techniques. It will expedite their learning experience from a data-mapping-to-adversary-actions perspective. For example, you can map Mordor datasets to several rows in this document that I put together as part of my project OSSEM.

Example: Threat Hunting with Jupyter Notebooks — Part 4: SQL JOIN via Apache SparkSQL 🔗

Test Your Pipeline!

One feature that I am working on is a way to translate a Mordor dataset to anyone's environment context. It would allow you to pick the computer names, user names, IP addresses, and domain names that you would like to see in a Mordor dataset. That way, if you want to replay an attack so that it looks like it happened in your environment, you will be able to do it right before you import the dataset. You can add that to your validation testing strategy. You will be able to make a Mordor dataset look like it happened in environments where you are not authorized to execute anything, but where you do have control to update audit policies. You will be able to prove why you need those audit policies enabled.

How Can I Replicate Your Environment?

That’s a great question! One of the main things that I wanted to accomplish with this project was to provide a well-documented environment, so that you know exactly how it is configured and can understand each dataset from an environment perspective. I also thought it would be useful for you to understand not just the audit policies applied to it, but also the specific user roles, security groups, and custom configurations defined via group policy objects. In addition, I understand that you might want to play with it and contribute to our project. Therefore, thanks to the amazing work of my co-worker Jonathan Johnson, Mordor will be available in a cloud provider soon, right after this post. He has put a lot of work into replicating and improving everything I did, and is very excited to share it with the community.

The main goal of the project is to store and share pre-recorded datasets that you can download and replay right away, but if you want to share your own datasets with the community, also following our standard environment, then Mordor in the cloud is perfect for you! 💙 Once you record your dataset, just send us a PR and we will add it for you!

I leave you with a great quote from my co-worker Dwight Hohnstein ❤️