Who’s a good boy?

Update April 15, 2019: Added gists to make code easier to read and improved the script with JSON output.

BloodHound is a fantastic tool for analyzing Active Directory, but that’s really just the beginning. The BloodHound database is a resource that enables users to execute all sorts of queries — the only limit is your creativity. In this case study, the SpecterOps team used BloodHound to assist with an analysis of password hashes from two different domains.

The Setup

First, collect two lists of typical NTLM hashes, formatted like this:

DOMAIN\USERNAME:::32-CHAR-NTLM-HASH:::

For this case study, SpecterOps collected two lists of NTLM hashes from two different domains, but the same analysis works with a single list if the goal is to find how many accounts in one domain share a password.

It would be simple to compare these lists using basic *nix commands to split the strings and check if a hash in list one appears in list two; however, that doesn’t account for disabled accounts. Some organizations set a default password for newly created or disabled accounts, which will make any script looking for password reuse go berserk.

Enter BloodHound

BloodHound enhances our analysis by keeping track of several useful pieces of information for every account: what level of privilege the user has, what AD security groups it effectively belongs to, and whether the account is even enabled in the first place. If the script detects password reuse, and checks in with BloodHound before logging it, there is an opportunity to make one very smart analysis script. This can be accomplished using the Python 3.7 neo4j library and Bolt driver. The provided snippets will assume you have a BloodHound database and Neo4j is running.

Note that for this to be as accurate as possible, the hash and BloodHound collections should be performed at the same time. Otherwise, password resets may occur between collections and the pwdlastset values might not line up with the hashes. It is a small thing (the hashes are much more important) but could cause some confusion.

Connecting to BloodHound

The following function will set up a connection to a local Neo4j graph database using the provided username and password and then return the Bolt driver object. It’s really just one line of code but it’s always a good idea to include some error handling and user feedback.
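As a rough sketch of what such a function might look like (not the original gist), the example below wraps GraphDatabase.driver() from the neo4j Python package in some basic error handling. The default URI, the parameter names, and the driver_factory argument are assumptions added for illustration; driver_factory only exists so the function can be exercised without a live database.

```python
def setup_database_conn(username, password, uri="bolt://localhost:7687",
                        driver_factory=None):
    """Return a Bolt driver for a local Neo4j database, or None on failure.

    driver_factory is a hypothetical hook for testing; by default the
    function uses neo4j.GraphDatabase.driver.
    """
    if driver_factory is None:
        # Imported here so the module still loads if neo4j is not installed.
        from neo4j import GraphDatabase
        driver_factory = GraphDatabase.driver
    try:
        driver = driver_factory(uri, auth=(username, password))
    except Exception as error:  # e.g. bad credentials or Neo4j not running
        print(f"[!] Could not connect to Neo4j at {uri}: {error}")
        return None
    print(f"[+] Connected to Neo4j at {uri}")
    return driver
```

A call such as setup_database_conn("neo4j", "your-password") would then hand back the driver object, or None if something went wrong. Depending on the driver version, a connection problem may only surface when the first query runs, so a None check here is not a complete health check.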

Now the setup_database_conn() function can be called at the start of the script to prepare for executing queries.

Executing Queries

The next function executes the query. In this case the query is a simple MATCH query that filters out any users BloodHound has marked as disabled. There are a couple of things that are different from what is normally done for a Cypher query. Take a look and see if you notice anything unusual.
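Here is a minimal sketch of a query function along these lines. It is not the original gist: the parameterized $username, the column aliases, and the choice to return the enabled flag rather than filter on it inside the query are assumptions. It also uses toUpper(), the current name of Cypher's upper-casing function, and copies the records into a list so every match survives after the session closes.

```python
# Cypher sketch: match a bare username against BloodHound's USERNAME@DOMAIN
# names. The property names (name, domain, enabled, pwdlastset) are the ones
# the article mentions; the aliases are illustrative.
USER_QUERY = """
MATCH (u:User)
WHERE u.name =~ (toUpper($username) + '@.*')
RETURN u.name AS name, u.domain AS domain,
       u.enabled AS enabled, u.pwdlastset AS pwdlastset
"""


def execute_query(driver, username):
    """Run USER_QUERY for one username and return all matching records."""
    with driver.session() as session:
        # Consume the result inside the session so nothing is lost on close.
        return list(session.run(USER_QUERY, username=username))
```

Passing the username as a query parameter instead of formatting it into the string avoids quoting problems with unusual account names. The caller still has to loop over the returned list, because the same username can exist in more than one domain.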

First, the query uses the WHERE clause to filter on the user’s name property. This could be done in the MATCH using MATCH (u:User {name:'FOO@BAR.COM'}) but there are two things to be mindful of when constructing the query. The first thing is BloodHound stores all user objects as USERNAME@DOMAIN.COM. A search for just a username will return nothing, so the WHERE looks for a name that matches ( =~ ) the regular expression username + @.* . Cypher’s =~ operator uses Java regular expression syntax, where .* matches any run of characters, and the @ ensures the query doesn’t accidentally match multiple usernames.

The other thing is usernames in the dumps may be a mix of upper and lowercase. BloodHound stores all names in uppercase and Cypher string comparisons are case sensitive. This is easy to fix by wrapping the name the query looks for in the UPPER() function (named toUpper() in current Neo4j releases, which dropped the older UPPER() spelling). This way any name provided to the query will be converted to all uppercase.

An example might help make sense of these considerations.

Let’s say there are two usernames in the dump: CHRISM and CHRISM_da

A query for MATCH (u:User) WHERE u.name =~ 'CHRISM.*' RETURN u would return both usernames because CHRISM.* matches CHRISM_da. This is fixed by adding the @ to restrict the wildcard to only the domain part of the username. MATCH (u:User) WHERE u.name =~ 'CHRISM@.*' RETURN u will only return CHRISM@DOMAIN.COM from BloodHound.

The other username has lowercase characters. The above query won’t return any results for CHRISM_da because it doesn’t match the version with all uppercase characters in BloodHound. To account for any instances of lowercase characters the query can include UPPER() without causing any harm to usernames that are already uppercase.
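The same behavior can be reproduced without a database. The sketch below uses Python's re.fullmatch() as a stand-in for Cypher's =~ (both require the whole string to match); the two stored names and the helper function are made up for illustration.

```python
import re

# Hypothetical names as BloodHound would store them (always uppercase).
BLOODHOUND_NAMES = ["CHRISM@DOMAIN.COM", "CHRISM_DA@DOMAIN.COM"]


def matching_names(username, suffix, uppercase=True):
    """Return the stored names that fully match username + suffix."""
    if uppercase:
        username = username.upper()
    pattern = username + suffix
    return [name for name in BLOODHOUND_NAMES if re.fullmatch(pattern, name)]


print(matching_names("CHRISM", ".*"))    # ['CHRISM@DOMAIN.COM', 'CHRISM_DA@DOMAIN.COM']
print(matching_names("CHRISM", "@.*"))   # ['CHRISM@DOMAIN.COM']
print(matching_names("CHRISM_da", "@.*", uppercase=False))  # []
print(matching_names("CHRISM_da", "@.*"))  # ['CHRISM_DA@DOMAIN.COM']
```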

Finally, the function returns the query’s results which will contain the account’s pwdlastset timestamp, domain, and enabled status (a boolean value). The Bolt driver will return a BoltStatementResult object that must be enumerated, even if there is just one match. The enumeration could be done inside this function, but that would limit the results to just the first match. Your dataset might have identical usernames in different domains.

Parsing the Hashes

The Neo4j and BloodHound portion is ready to go, so now the script needs to parse the hashes. The following snippet can be used multiple times to parse as many files as are needed. This snippet opens the file for reading and creates a dictionary to hold the hashes. Then it loops through the lines of the file and filters out machine accounts (account names ending with $) and any accounts with the NTLM hash 31d6cfe0d16ae931b73c59d7e0c089c0 (empty / no password).
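A parser in this spirit could look like the sketch below. It is not the original snippet: the function names are illustrative, and keying the dictionary by the bare username (without the DOMAIN\ prefix) is an assumption that works when each file covers one domain. The parser takes the first colon-separated field as the account and the last non-empty field as the NTLM hash, which fits the format shown earlier.

```python
EMPTY_NTLM = "31d6cfe0d16ae931b73c59d7e0c089c0"  # hash of an empty password


def parse_hashes(lines):
    """Build a {username: ntlm_hash} dictionary from hash dump lines."""
    hashes = {}
    for line in lines:
        # Drop the empty fields produced by the ':::' separators.
        fields = [field for field in line.strip().split(":") if field]
        if len(fields) < 2:
            continue  # blank or malformed line
        account, ntlm = fields[0], fields[-1].lower()
        username = account.split("\\")[-1]  # strip the DOMAIN\ prefix
        if username.endswith("$") or ntlm == EMPTY_NTLM:
            continue  # skip machine accounts and empty passwords
        hashes[username] = ntlm
    return hashes


def parse_hash_file(path):
    """Open a hash dump and parse it with parse_hashes()."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_hashes(handle)
```

Calling parse_hash_file() once per dump gives one dictionary per domain, ready for the comparison step.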

The same sort of thing can be done with a Hashcat potfile. Then the dictionaries can be compared later to check if the hash has been cracked.
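For the potfile, a comparable sketch is shown below. It assumes the usual hash:plaintext layout of a Hashcat potfile and splits on the first colon only, since a cracked password can itself contain colons. The function name is illustrative.

```python
def parse_potfile(lines):
    """Build a {ntlm_hash: plaintext} dictionary from potfile lines."""
    cracked = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Split once: everything after the first ':' is the password.
        ntlm, separator, plaintext = line.partition(":")
        if not separator:
            continue  # not a hash:plaintext line
        cracked[ntlm.lower()] = plaintext
    return cracked
```

Looking up a hash from the dump dictionaries in this one (for example with cracked.get(ntlm)) then shows whether that password has been recovered. Hashcat may write some plaintexts in a $HEX[...] form, which this sketch leaves untouched.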

There is one last function that may be helpful. A final report probably should not have the full NTLM hash so the script can sanitize the hashes for reporting purposes. This function takes a 32-character NTLM hash and replaces all but the first and last four characters with asterisks. This is adapted from Carrie Roberts’ DPAT.py script (https://github.com/clr2of8/DPAT/blob/master/dpat.py#L40).
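A small sketch of that masking step follows. It is written from the description above rather than copied from DPAT, and the decision to return anything that is not 32 characters unchanged is an assumption.

```python
def sanitize(ntlm_hash):
    """Mask a 32-character NTLM hash, keeping the first and last four characters."""
    if len(ntlm_hash) != 32:
        return ntlm_hash  # assumption: leave anything unexpected untouched
    return ntlm_hash[:4] + "*" * 24 + ntlm_hash[-4:]


print(sanitize("0123456789abcdef0123456789abcdef"))
# 0123************************cdef
```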

The Final Loop

All that is left to do is compare the dictionaries of usernames and NTLM hash values. This snippet checks if a hash from the first list exists in the second list. If there is a match, it loops through the second list looking for all matching hashes. This is the first place this snippet can be customized. If you only want to know whether a user in Domain A has the same password as any account in Domain B, the comparisons can stop here. The loop through the second list catches instances where the user in Domain A has the same password as two or more users in Domain B.

The final output is a dictionary that can be parsed and used as JSON for additional reporting.

The important piece is the execute_query() function calls. These check in with BloodHound to determine if the provided username is enabled. This way disabled accounts can be ignored. Again, the script can be customized based on your needs. Running this function just once ensures the account in Domain A is enabled but does not check if the account in Domain B is enabled. If you only want to know if two enabled accounts share a password then the query can be run for both accounts. If you would like to know if an engineer in Domain A is reusing their password for a bunch of test accounts that may or may not be enabled in Domain B then the query can be run just the one time.
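The comparison can be sketched as follows. This is an illustration, not the original script: is_enabled is a hypothetical callable standing in for the BloodHound lookup (in practice it would wrap the query function and inspect the enabled flag), and check_both is an assumed switch for also checking the Domain B accounts.

```python
import json


def find_shared_passwords(hashes_a, hashes_b, is_enabled, check_both=False):
    """Report enabled Domain A users whose hash also appears in Domain B."""
    hashes_in_b = set(hashes_b.values())
    report = {}
    for user_a, ntlm in hashes_a.items():
        if ntlm not in hashes_in_b:
            continue  # no reuse for this account
        if not is_enabled(user_a):
            continue  # ignore disabled Domain A accounts
        # Collect every Domain B account sharing the hash.
        matches = [
            user_b for user_b, other in hashes_b.items()
            if other == ntlm and (not check_both or is_enabled(user_b))
        ]
        if matches:
            report[user_a] = {"shared_with": matches}
    return report


# Tiny made-up dataset to show the shape of the output.
domain_a = {"alice": "aa" * 16, "bob": "bb" * 16}
domain_b = {"alice_test": "aa" * 16, "svc_backup": "aa" * 16}
enabled_accounts = {"alice", "svc_backup"}
report = find_shared_passwords(domain_a, domain_b, enabled_accounts.__contains__)
print(json.dumps(report, indent=2))
```

With check_both left at False, alice is reported as sharing a password with both alice_test and svc_backup. With check_both=True, only the enabled svc_backup would remain.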

Furthermore, the example query only uses a few BloodHound attributes. The query can be updated to return more information about the user, such as their display name and email address. The script could also include additional queries, like checking if accounts are in a high value group (e.g. Domain Admins).
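One such additional query might look like the sketch below. It assumes the BloodHound schema exposes group membership as MemberOf relationships and flags high value groups with a highvalue property on Group nodes; check both against your own database before relying on it. The function name and the session argument (an open driver session) are illustrative.

```python
# Cypher sketch: follow nested group membership from a user to any group
# flagged as high value. MemberOf and highvalue are assumptions about the
# BloodHound schema in use.
HIGH_VALUE_QUERY = """
MATCH (u:User)-[:MemberOf*1..]->(g:Group)
WHERE u.name =~ (toUpper($username) + '@.*') AND g.highvalue = true
RETURN DISTINCT g.name AS group_name
"""


def high_value_groups(session, username):
    """Return the names of high value groups the user effectively belongs to."""
    result = session.run(HIGH_VALUE_QUERY, username=username)
    return [record["group_name"] for record in result]
```

A non-empty list for an account that also shares a password across domains is a good candidate for the top of the report.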

Wrap Up

If you needed more of a reason to leverage BloodHound for defense, maybe these automation possibilities will offer some encouragement! A few Cypher queries or lines of Python code can answer questions about Active Directory that are not easy to answer from AD itself. This case study shows how the data can also be leveraged for other projects that don’t directly involve AD.

Additionally, this Python approach offers the benefit of allowing you to run more advanced queries. There are some queries that simply will not complete when run via Neo4j’s web console but will complete when run using a Bolt driver and your favorite scripting language.

Andy Robbins (@_wald0) has a pair of articles from 2018 that offer a good primer on using BloodHound data to perform deeper analysis of Active Directory resilience and generating statistics:

Andy (@_wald0) and Rohan Vazarkar (@cptjesus) also recently presented an excellent talk on this topic at TROOPERSCon 2019:

Happy hunting! Aroooo!