When dealing with tasks that have rules and wordlists, Hashcat internally distributes the wordlist across the shaders on the GPUs but gives all the rules to each shader. This means every shader holds a partial chunk of the words in the dictionary and applies all the rules to those words while cracking. The same applies when a task is split into chunks based on Hashcat’s keyspace calculation: the wordlist gets split into parts, but every shader still applies all the rules to the words it is given. When distributing Hashcat with Hashtopolis, this situation can get worse, especially with small wordlists, because the wordlist is generally divided into chunks that may be too small to fill the GPUs’ shaders entirely. The rule splitting feature in Hashtopolis aims to solve that problem.

As an example, we take a simple task with a slow algorithm (Hashcat mode 14800 in our case; splitting would not be required for a small wordlist combined with a fast algorithm), using a relatively small wordlist and some rules:

#HL# -a 0 top10000.txt -r best64.rule

The Hashcat keyspace of this task is 10,000 (the number of candidates in the wordlist). If Hashtopolis now benchmarks this task using the --progress-only flag, we receive a result that is problematic to use for splitting:

10000:7387876.83ms

That means that ideally Hashcat would like to get the full keyspace at once (to use the GPU fully, or at least as much of it as possible), and completing it would take roughly 7,388 seconds. But if we cut the task into pieces of approximately 600 seconds (the Hashtopolis default), we would get chunks of length 812 (600/7388*10000). Such a chunk contains only 812 words from the dictionary but still all the rules of the task, so Hashcat cannot fully use the GPU’s potential with such a chunk.
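To make the arithmetic explicit, here is a small Python sketch (not Hashtopolis code; the variable names are ours) that derives the chunk size from such a --progress-only benchmark and the configured chunk duration:

benchmark = "10000:7387876.83ms"              # keyspace:runtime as reported by the agent
keyspace, runtime_ms = benchmark.rstrip("ms").split(":")
runtime_s = float(runtime_ms) / 1000          # ~7388 s for the full keyspace
chunk_time = 600                              # default chunk duration in seconds
chunk_size = int(chunk_time / runtime_s * int(keyspace))
print(chunk_size)                             # 812 words, each still combined with every rule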

If Hashtopolis detects such a benchmark, where splitting the keyspace into parts would not work well and rules are available, it automatically performs rule splitting. A supertask is created out of the original task, containing multiple subtasks; the number of subtasks depends on how many files Hashtopolis splits the rule file into. Each subtask gets the entire wordlist, and the rule file is split instead. The objective is to fully load the shaders with words and thus make better use of the GPU’s performance.
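As an illustration of the splitting step only (this is not the actual Hashtopolis implementation, which is written in PHP; the function name and output file naming are made up), a rule file could be cut into a given number of parts like this:

def split_rule_file(path, parts):
    # Split by lines; empty lines and comment lines simply end up in some part.
    with open(path) as f:
        lines = f.read().splitlines()
    per_part = -(-len(lines) // parts)        # ceiling division
    for i in range(parts):
        with open(f"{path}.part{i}", "w") as out:
            out.write("\n".join(lines[i * per_part:(i + 1) * per_part]) + "\n")

split_rule_file("best64.rule", 34)            # 34 files with 3 lines each (102 lines in total)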

The number of rules in each subtask’s rule file is determined as follows. Based on the given benchmark and the number of rules available, Hashtopolis decides how many pieces the task should be split into; in some cases this can mean that only a single rule is used per subtask. In the example above, Hashtopolis created 34 parts out of best64.rule (which consists of 102 lines), so every subtask theoretically got 3 rules (in some cases 2, because of empty lines and comment lines). The benchmark of a subtask now gives a much better result: Hashcat still wants the full keyspace at once, but since the number of rules each shader has to apply is reduced, the subtask can be completed much more efficiently within our configured chunk length:

10000:190955.31ms

This benchmark is now much easier to distribute: the whole subtask completes in roughly 191 seconds, less than a single 600-second chunk, so the subtask can be handed to an agent that will be fully utilized. This is because we split the number of rules each agent receives instead of splitting the wordlist.
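Applying the same chunk calculation as above to the subtask benchmark (again only a sketch) shows why the subtask no longer needs to be cut up:

keyspace, runtime_s = 10000, 190.96           # from the subtask benchmark 10000:190955.31ms
chunk_size = int(600 / runtime_s * keyspace)
print(chunk_size)                             # ~31400 candidates would fit into a 600 s chunk,
                                              # more than the full keyspace, so the subtask
                                              # runs as one full-keyspace chunk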

The conversion of the single task into a supertask containing many subtasks introduces some time overhead, among other things (e.g. additional benchmarks). To estimate these “overhead costs”, we compared the total time of the supertask to the time required to run the complete task as a single chunk, and to the time required to complete the original task distributed without rule splitting (i.e. as it would have been done with the previous version of Hashtopolis). We had to stop this last test because it was, as expected, really slow; its time given in the table below is a projection based on the speed observed during execution. The times and speeds of all tests are presented in the table below.

The results show that, in our specific example, the time overhead is about an additional 4% compared to running the task as a single chunk, while the run is roughly 30 times faster than without the rule splitting feature. This overhead is definitely worth the price for more complex tasks, since a single chunk is not an option for a larger task in a multi-GPU configuration.

The rule splitting feature is currently available in the development branch of Hashtopolis and will be included in the next release after v0.6.0.

https://github.com/s3inlc/hashtopolis