YARA has been a popular tool and language in the malware hunting community for quite some time, and it is still getting a lot of attention. Individuals and organizations collect rulesets that they process, scan with, export to VirusTotal Hunting, and so on. Others keep their intelligence elsewhere and generate YARA rulesets from it, which are then processed as usual. Doing all of this manually would be a burden on malware analysts, and that’s where automation steps in. But for automation you need tools: tools that help you process YARA rules.

That’s the reason why yaramod was created.

What is yaramod?

Yaramod is a library for parsing, creating and formatting YARA rulesets. It gives you a YARA ruleset in the form of an internal representation, which you can analyze or modify and then turn back into a YARA ruleset. It does not depend on libyara because it uses its own parser (written in pog). In this blog post, I will showcase what yaramod is capable of and how you can utilize it yourself.

Analyze rules

Let’s say you want to do the simplest thing imaginable: print out all the rules that you have in your YARA file, together with the location of each rule. The very first thing you might think of is regular expressions. It’s not that hard to come up with the regular expression rule\s+[a-zA-Z_]\w*. BUT, what if the rule is commented out? You might or might not want it. What if someone was so evil and put a block comment between rule and its name? What if you care about included files? The more you think about it, the more edge cases keep popping up (like what if someone put rule <name> inside meta?). It’s simply better if you don’t have to care.
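To make those edge cases concrete, here is a small sketch that uses only the Python standard library. It runs the regular expression from above over a made-up ruleset; the sample text and all names in it are invented for this illustration.

```python
import re

# Hypothetical ruleset text, invented for this illustration.
SAMPLE = '''\
rule real_rule {
    meta:
        description = "replaces rule old_rule"
    condition:
        true
}

// rule commented_out { condition: false }

rule /* evil comment */ hidden_rule {
    condition:
        true
}
'''

# The naive pattern from the text, with a capture group around the rule name.
NAIVE_RULE = re.compile(r'rule\s+([a-zA-Z_]\w*)')

found = NAIVE_RULE.findall(SAMPLE)
print(found)
```

The pattern reports old_rule (which only appears inside a meta string) and commented_out (which is disabled), and it misses hidden_rule entirely because of the block comment. A real parser does not have these problems.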

import yaramod

ymod = yaramod.Yaramod()
yfile = ymod.parse_file('ruleset.yar')
for rule in yfile.rules:
    print(rule.name)
    print(f'{rule.location.file_path}:{rule.location.line_number}')

In the same way as rules, you can also iterate over strings and inspect their contents. For example, counting the number of strings of each type is as easy as this.

string_count = {'plain': 0, 'hex': 0, 'regexp': 0}

def inc_string_count(type):
    string_count[type] += 1

for rule in yfile.rules:
    for string in rule.strings:
        if string.is_plain:
            inc_string_count('plain')
        elif string.is_hex:
            inc_string_count('hex')
        elif string.is_regexp:
            inc_string_count('regexp')

Working with conditions is much harder. As you know, a condition is an expression that is evaluated when the rule is matched against the input. The condition can be as simple as a single expression, or it can be complex and consist of several expressions joined with logical ands, ors, nots, and so on. In yaramod, we represent the condition in the form of an Abstract Syntax Tree (AST). It is a tree in which each node represents a specific syntactic construct in your condition. For example, let’s take the condition $str1 or (#str2 > 2). When represented as an AST, this condition would look like this.

AST of an expression $str1 or (#str2 > 2)
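The same tree can also be written down as nested objects. The following is only a sketch with hypothetical node classes (they are not yaramod’s own types) to show how each syntactic construct becomes a node and how the tree can be turned back into text.

```python
from dataclasses import dataclass

# Hypothetical node types for illustration only; these are not yaramod classes.

@dataclass
class StringRef:            # $str1
    name: str

    def text(self):
        return f'${self.name}'

@dataclass
class StringCount:          # #str2
    name: str

    def text(self):
        return f'#{self.name}'

@dataclass
class IntLiteral:           # 2
    value: int

    def text(self):
        return str(self.value)

@dataclass
class GreaterThan:          # left > right
    left: object
    right: object

    def text(self):
        return f'{self.left.text()} > {self.right.text()}'

@dataclass
class Parentheses:          # ( inner )
    inner: object

    def text(self):
        return f'({self.inner.text()})'

@dataclass
class Or:                   # left or right
    left: object
    right: object

    def text(self):
        return f'{self.left.text()} or {self.right.text()}'

# The root is the "or" node; its right branch holds the parenthesized comparison.
ast = Or(
    StringRef('str1'),
    Parentheses(GreaterThan(StringCount('str2'), IntLiteral(2))),
)
print(ast.text())
```

Rendering the root node reproduces the original condition, which is the same round trip yaramod performs between a ruleset and its internal representation.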

So in order to inspect and analyze conditions, you have to perform some kind of traversal through the AST. Yaramod doesn’t leave you on your own here and tries to ease this process by providing helper types which assist you in the traversal using a visitor-like interface.

One of the most performance-heavy things you can use in YARA is regular expressions. They have their own bytecode, and each time we want to match a regular expression we are basically running an interpreter that executes instructions, so it is often recommended to stay away from them. But you can’t avoid them completely, and you’ll run into multiple cases where regular expressions are needed. With that in mind, let’s build a tool using yaramod that keeps track of how many regular expressions each rule in our ruleset uses.

import json
import yaramod


class RegexpFinder(yaramod.ObservingVisitor):
    def find(self, yara_file: yaramod.YaraFile):
        result = {}
        for rule in yara_file.rules:
            self.counter = 0
            self.observe(rule.condition)
            result[rule.name] = self.counter
        return result

    def visit_RegexpExpression(self, expr: yaramod.Expression):
        self.counter += 1


ymod = yaramod.Yaramod()
yfile = ymod.parse_file('ruleset.yar')
regexp_finder = RegexpFinder()
rule_regexp_table = regexp_finder.find(yfile)
print(json.dumps(rule_regexp_table, indent=True, sort_keys=True))

And that’s it. We are using ObservingVisitor, which is a helper class that traverses the whole AST by default, so we only need to provide visit methods for the parts of the condition we are interested in. We could have used Visitor directly, but then we would need to write our own traversal for all the and expressions, or expressions, function calls and many others. It is usual to base code that revolves around traversing the whole AST on ObservingVisitor. We’ve then provided a visit method for RegexpExpression, which is the type of expression used for regular expressions inside conditions. This method is called whenever the traversal runs into a RegexpExpression node. The output of this script will look like this:

{
 "no_regexps": 0,
 "regexp_overload": 6
}

for a ruleset like this one:

import "cuckoo"
import "pe"

rule regexp_overload {
    condition:
        cuckoo.sync.mutex(/^EvilMutex$/i) and
        pe.imports(/kernel32\.dll/i, /(Read|Write)ProcessMemory/) and
        for any i in (0 .. pe.number_of_sections - 1) : (
            pe.sections[i].name matches /^UPX\d+/
        ) and
        (
            cuckoo.filesystem.file_access(/C:\\Windows/) or
            cuckoo.filesystem.file_access(/C:\\Users/)
        )
}

rule no_regexps {
    condition:
        pe.imports("kernel32.dll", "WriteProcessMemory")
}

There are many different expression types which you can run into. A list of all of them can be found in our documentation.

VirusTotal ruleset

If you use your rulesets on VirusTotal Hunting, there is a high chance that you are using at least one of the external variables provided by VirusTotal during YARA runtime. By default, you will be able to parse those without any problems as you can see here.

ymod = yaramod.Yaramod()
yfile = ymod.parse_string(r'''
rule virus_total_rule {
    condition:
        new_file and positives > 10 and signatures matches /Abc.*/
}
''')
print(yfile.text)

rule virus_total_rule
{
    condition:
        new_file and positives > 10 and signatures matches /Abc.*/
}

Modifying rules

In the same way you observe rulesets, you can also modify them during the traversal. You just need to use ModifyingVisitor. The only difference between the observing visitor and the modifying one is that ModifyingVisitor expects return values from the visit methods. There are 3 possible return values:

None, which means you would like to keep this AST node intact.

Another instance of the Expression type, which means that you would like to replace this AST node with the one you return.

The special value yaramod.VisitAction.Delete, which means that you want to delete this AST node. Deleting a node has a subsequent impact on other AST nodes which are in a relationship with this node (parents, siblings, …). The consequences differ from type to type, but usually the delete request will be propagated to the parents, which means the whole branch of the AST will be removed.
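The three return values can be illustrated with a toy traversal written in plain Python. Everything here is hypothetical: the node classes, the DELETE sentinel (a stand-in for yaramod.VisitAction.Delete) and the rule that an or node with one deleted operand collapses into the remaining operand are assumptions made for this sketch, not a description of yaramod’s internals.

```python
from dataclasses import dataclass

DELETE = object()  # stand-in for yaramod.VisitAction.Delete in this toy example

@dataclass
class Leaf:
    text: str

@dataclass
class Not:
    operand: object

@dataclass
class Or:
    left: object
    right: object

def render(node):
    """Turn the toy tree back into condition text."""
    if isinstance(node, Leaf):
        return node.text
    if isinstance(node, Not):
        return f'not {render(node.operand)}'
    return f'{render(node.left)} or {render(node.right)}'

def modify(node, visit_leaf):
    """Return the new subtree, or DELETE if the whole subtree should go away."""
    if isinstance(node, Leaf):
        result = visit_leaf(node)
        # None keeps the node; anything else is a replacement node or DELETE.
        return node if result is None else result
    if isinstance(node, Not):
        operand = modify(node.operand, visit_leaf)
        # A "not" without its operand makes no sense, so the delete propagates.
        return DELETE if operand is DELETE else Not(operand)
    left = modify(node.left, visit_leaf)
    right = modify(node.right, visit_leaf)
    if left is DELETE and right is DELETE:
        return DELETE
    if left is DELETE:
        return right
    if right is DELETE:
        return left
    return Or(left, right)

def visit_leaf(leaf):
    if leaf.text == '$old':
        return Leaf('$new')   # replace this node
    if leaf.text == '$unused':
        return DELETE         # delete this node
    return None               # keep this node intact

tree = Or(Leaf('$keep'), Or(Leaf('$old'), Not(Leaf('$unused'))))
new_tree = modify(tree, visit_leaf)
print(render(new_tree))
```

Here $keep is left alone, $old is replaced, and deleting $unused also removes the not above it and collapses the inner or, which is the kind of propagation to parents described above.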

Let’s say that we want to simplify conditions a little bit. According to the YARA documentation, there are multiple overloads of the pe.imports function, some of which accept regular expressions and others strings. The overloads with strings have less impact on performance because they just do a simple strcasecmp. Let’s write a script which will simplify those calls.

First, we need to recognize which regular expressions can be simplified. We can’t, for example, easily convert the regular expression a.c into anything string-like because strings have no wildcards. The same goes for many other operations offered by regular expressions. Since it is not the point of this blog post to build a function that recognizes these regular expressions properly, we will use something rather simple: it iterates over the regexp and, whenever it runs into \ starting an escape sequence, it skips the character immediately following it. Otherwise it checks whether the current character is one of a few special characters. It is not ideal but sufficient for this example.

def simple_regexp(text):
    idx = 0
    while idx < len(text):
        c = text[idx]
        if c == '\\':
            idx += 1
        elif c in {'(', ')', '|', '[', ']', '*', '+', '?', '.', '{', '}'}:
            return False
        idx += 1
    return True

The next step is to create a function that transforms the regular expression into a string, because we can’t have the same escape sequences in both of them. For example, \. in a regular expression denotes an actual . character, but in strings we can’t use it. Again, let’s come up with something simple.

def to_string(text):
    if text.startswith('^'):
        text = text[1:]
    if text.endswith('$'):
        text = text[:-1]
    idx = 0
    result = ''
    while idx < len(text):
        c = text[idx]
        if c == '\\':
            idx += 1
            continue
        result += c
        idx += 1
    return result

At the start, it just removes anchors for the beginning and end of line. Then it looks for escape sequences and copies only the character which is following the \ in the escape sequence. Now that we have these functions, let’s put it all together using ModifyingVisitor .
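As a quick sanity check, the two helpers can be run against the regular expressions from the earlier ruleset. The helpers are copied here only so that the snippet runs on its own; the sample list and the results dictionary are additions for this illustration.

```python
def simple_regexp(text):
    """True if no special regexp character appears outside an escape sequence."""
    idx = 0
    while idx < len(text):
        c = text[idx]
        if c == '\\':
            idx += 1  # skip the escaped character
        elif c in '()|[]*+?.{}':
            return False
        idx += 1
    return True

def to_string(text):
    """Drop the ^ and $ anchors and the backslashes of escape sequences."""
    if text.startswith('^'):
        text = text[1:]
    if text.endswith('$'):
        text = text[:-1]
    idx = 0
    result = ''
    while idx < len(text):
        c = text[idx]
        if c == '\\':
            idx += 1
            continue
        result += c
        idx += 1
    return result

# Regexp bodies (without the slashes and modifiers) taken from the ruleset above.
samples = [r'^EvilMutex$', r'kernel32\.dll', r'(Read|Write)ProcessMemory', r'^UPX\d+']

# None marks regexps that have to stay regular expressions.
results = {
    regexp: to_string(regexp) if simple_regexp(regexp) else None
    for regexp in samples
}
for regexp, converted in results.items():
    print(regexp, '->', converted)
```

Only the first two samples are convertible. Note that a class escape without a quantifier, such as UPX\d, would pass the check and come out as the plain text UPXd, which is one reason this approach is good enough for an example but not for production use.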

class RegexpCaseInsesitiveAdder(yaramod.ModifyingVisitor):
    def add(self, yara_file: yaramod.YaraFile):
        for rule in yara_file.rules:
            self.inside_pe_imports = False
            rule.condition = self.modify(rule.condition)

    def visit_FunctionCallExpression(self, expr: yaramod.Expression):
        if expr.function.text == 'pe.imports':
            self.inside_pe_imports = True
            new_args = []
            for arg in expr.arguments:
                new_arg = arg.accept(self)
                new_args.append(new_arg if new_arg is not None else arg)
            expr.arguments = new_args
            self.inside_pe_imports = False

    def visit_RegexpExpression(self, expr: yaramod.Expression):
        regexp_text = expr.regexp_string.unit.text
        if self.inside_pe_imports and simple_regexp(regexp_text):
            return yaramod.string_val(to_string(regexp_text)).get()

We visit each FunctionCallExpression and detect when we enter a pe.imports function call. Once that happens, we recursively visit each function argument using its accept method. This invokes the visit of RegexpExpression, where we detect simple regular expressions using our simple_regexp function and, if the regexp is simple, convert it to a string using to_string. Since there can be regular expressions in other functions, we need inside_pe_imports to distinguish those calls. If there were functions in YARA which return regular expressions, we might need a stack for storing this state, as there could be nested function calls in the arguments of pe.imports. This isn’t our case, so we went with the simple solution.

Generate rules

Condition

Yaramod is also capable of generating YARA rulesets out of data that you might have in a database, a text file, etc. It relies heavily on the builder design pattern and operator overloading to make generation as easy as possible without having to write much code, and it aims for a declarative style. Let’s imagine that we pulled a list of strings out of a database and would like to create a YARA rule out of them. It’s always best to build the rule bottom-up: from the condition, to the rule, to the whole YARA file. So let’s start with a condition that matches all the strings.

>>> condition = yaramod.of(yaramod.all(), yaramod.them()).get()
>>> print(condition.text)
all of them

We can also request a specific number of matches if we don’t want all of them at once.

>>> condition = yaramod.of(yaramod.int_val(5), yaramod.them()).get()
>>> print(condition.text)
5 of them

Let’s modify the condition so that each string must hit at least 2 times.

>>> condition = yaramod.for_loop(
...     yaramod.all(),
...     yaramod.them(),
...     yaramod.match_count('#') >= yaramod.int_val(2)
... ).get()
>>> print(condition.text)
for all of them : ( # >= 2 )

There are many more things you can express using the declarative style of yaramod. See our documentation for a list of all functions that are available for creating expressions.
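If you are curious how operator overloading makes this declarative style possible, the following toy sketch shows the idea in plain Python. The Expr class is hypothetical, and the int_val and match_count functions here are local stand-ins that merely borrow the names used above; none of this is yaramod’s actual implementation.

```python
class Expr:
    """Hypothetical expression wrapper; not a yaramod class."""

    def __init__(self, text):
        self.text = text

    def _binary(self, op, other):
        # Each operator returns a NEW expression whose text combines both sides.
        return Expr(f'{self.text} {op} {other.text}')

    def __ge__(self, other):
        return self._binary('>=', other)

    def __gt__(self, other):
        return self._binary('>', other)

    def __add__(self, other):
        return self._binary('+', other)


# Toy stand-ins that only borrow the names of the yaramod functions.
def int_val(value):
    return Expr(str(value))

def match_count(name):
    return Expr(name)


cond = match_count('#s1') >= int_val(2)
total = (match_count('#a') + match_count('#b')) > int_val(3)
print(cond.text)
print(total.text)
```

Because every operator returns another expression object, ordinary Python expressions compose into arbitrarily large conditions, and the final text is produced only when you ask for it.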

Rule

Now that we have a condition, we can move on to generating rules. Rules use the same declarative style to describe how they should look. The following piece of code will generate a rule named my_rule with plain strings named $s<index>, each with both ascii and wide modifiers, and the condition we’ve created in the previous step.

strings = ['abc', 'def', 'ghi']  # e.g. pulled from a database

rule_builder = yaramod.YaraRuleBuilder()
rule_builder.with_name('my_rule')
for idx, string in enumerate(strings):
    rule_builder = rule_builder.with_plain_string('$s{:02}'.format(idx), string).ascii().wide()
rule = rule_builder.with_condition(condition).get()
print(rule.text)

The generated rule looks exactly like this:

rule my_rule
{
    strings:
        $s00 = "abc" ascii wide
        $s01 = "def" ascii wide
        $s02 = "ghi" ascii wide
    condition:
        for all of them : ( # >= 2 )
}

To make it more feature-rich, we can add a few pieces of meta information.

rule_builder \
    .with_string_meta('author', 'Marek Milkovic') \
    .with_int_meta('version', 1)

Which will make our rule look like this:

rule my_rule
{
    meta:
        author = "Marek Milkovic"
        version = 1
    strings:
        $s00 = "abc" ascii wide
        $s01 = "def" ascii wide
        $s02 = "ghi" ascii wide
    condition:
        for all of them : ( # >= 2 )
}

YARA file

To put it all together, we need to create a YARA file, which can contain multiple rules at the same time and also specifies the required imports.

file_builder = yaramod.YaraFileBuilder()
file = file_builder.with_rule(rule).get()
print(file.text)

Formatting

A very recent addition to yaramod’s features is automatic formatting. The purpose of this feature is to make it easier to write YARA rules in a consistent format: placement of { and }, ( and ), alignment of comments, normalization of whitespace, indentation levels, etc. The big advantage of autoformatting is that it keeps all the comments you have in the YARA file (parsers usually discard comments). The formatting is hardcoded for now, but we would definitely like users to be able to specify their own configuration, for example the indentation character being used. In order to use autoformatting, instead of accessing the text attribute of the YARA file (which just turns the internal representation into YARA), use text_formatted (which uses the stream of all tokens seen on the input and takes care of formatting them). Let’s say this is our YARA file on the input:

// Imports
import "cuckoo"
import "pe"

// Rules
rule my_rule {
    meta:
        author="Marek Milkovic"
        version=1
    strings:
        $s01 = "Hello" ascii // Comment 1
        $s02 = "World" wide ascii // Comment 2
    condition:
        all of them
}

And if we parse this and ask for text_formatted :

// Imports
import "cuckoo"
import "pe"

// Rules
rule my_rule
{
    meta:
        author = "Marek Milkovic"
        version = 1
    strings:
        $s01 = "Hello" ascii      // Comment 1
        $s02 = "World" wide ascii // Comment 2
    condition:
        all of them
}

As you can see, it has fixed some of the things I have mentioned, but it still tries not to be too intrusive. We are trying to fix only what is really necessary and keep the rest of your formatting. If the feedback is positive, we can push for being more and more intrusive.

Formatting doesn’t work well with modification of rulesets yet, but it is something we would like to work on. Just be careful when you use the two at the same time.

Conclusion

We’ve shown how yaramod can be used to analyze, modify or even generate new YARA rulesets. We’ve shown some of its useful features and that it is a viable alternative to plyara while providing even more features like parsing of conditions, modification of rulesets, generation of completely new ones, autoformatting and more. Hopefully the examples and our extensive documentation will help you build your own tools. Don’t forget to visit our GitHub repository and leave an issue if you need any help (or if you just want to tell us that you use yaramod).