"MALWARE.EXE" -> os.system("/usr/bin/md5sum " + _)

"MALWARE.EXE" -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]

sys.argv[1] -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]

% /usr/local/bin/pythonect md5_and_sha1_sums /bin/ls
92385e9b8864032488e253ebde0534c3  /bin/ls
8800fee57584ed1c44b638225c2f1eec818a27c2  /bin/ls

glob.glob('*.EXE') -> [os.system("/usr/bin/md5sum " + _), os.system("/usr/bin/sha1sum " + _)]
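
For readers who want to compare the fan-out above with ordinary Python, here is a minimal sketch of the same idea, assuming Python 3. It uses hashlib rather than the external md5sum and sha1sum programs so that it is self-contained, and a thread pool rather than strictly one thread per file. The helper names digest_file and digest_all are hypothetical, and Pythonect's own scheduling may well differ.

```python
import glob
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_file(path):
    """Return (path, md5 hex, sha1 hex) for one file, read in binary mode."""
    with open(path, "rb") as handle:
        data = handle.read()
    return path, hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def digest_all(pattern="*.EXE"):
    """Hash every file matching the glob pattern using worker threads."""
    paths = sorted(glob.glob(pattern))
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input paths
        return list(pool.map(digest_file, paths))
```

Calling digest_all() in a directory of samples returns one tuple per matching file, which can then be printed or passed on to further processing.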

"MALWARE.EXE" -> subprocess.check_output(['/usr/bin/md5sum', _]) -> print

"MALWARE.EXE" -> open(_, 'r').read() -> hashlib.md5 -> _.hexdigest() -> print

"MALWARE.EXE" \ -> open(_, 'r').read() \ -> [re.finditer("\xcc", _), re.finditer("\xcd\x03", _)] \ -> print "Found INT3 between Offset #%d and #%d" % _.span(0)

import math

def entropy(data):
    entropy = 0
    if data:
        for x in range(2**8):
            p_x = float(data.count(chr(x))) / len(data)
            if p_x > 0:
                entropy += - p_x * math.log(p_x, 2)
    return entropy
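
The function above computes H = -sum(p_x * log2(p_x)) over the 256 possible byte values, where p_x is the fraction of bytes equal to x. The result ranges from 0 (all bytes identical) to 8 bits per byte (all values equally likely). It appears to target Python 2 strings, since it counts chr(x). A minimal sketch of the same calculation for Python 3 bytes objects follows; the name entropy_bytes is hypothetical.

```python
import math
from collections import Counter


def entropy_bytes(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255 in Python 3
    # Only byte values that actually occur contribute to the sum.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

High values (close to 8) often hint at compressed or encrypted content, which is why entropy is a popular first check on a suspicious section.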

"MALWARE.EXE" -> open(_, 'r').read() -> entropy.entropy -> print

"MALWARE.EXE" -> subprocess.check_output(['/usr/bin/objcopy', '-O', 'binary', '-j', '.text', _, '/dev/stdout']) -> entropy.entropy -> print

About 5 months ago I released the first version of Pythonect, a new, experimental, general-purpose high-level dataflow programming language based on Python and written in Python. It aims to combine the intuitive feel of shell scripting (and all of its perks, like implicit parallelism) with the flexibility and agility of Python. Crazy? Most definitely. And yet, strangely enough, it works!

Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination. As such, it is most suitable for creating applications that are themselves focused on the "flow" of data. Perhaps the most readily available example of a dataflow-oriented application comes from the realm of real-time signal processing, e.g. a video signal processor that starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

As with video, malware analysis can be expressed as a network of different components, such as disassemblers, regular expressions, and debuggers, that are connected by a number of communication channels. The benefits, and perhaps the greatest incentives, of expressing malware analysis this way are scalability and parallelism. The different components in the network can be rearranged to create entirely unique dataflows without requiring the relationships to be hardcoded. Also, the design and concept of components make it easier to run on distributed systems and parallel processors.

In this tutorial I will show you how to automate static malware analysis using Pythonect.
The examples will be simple enough that you can extend them if you want to. Before you read this tutorial you should have at least a basic knowledge of x86 Assembly, Python, and Pythonect (I recommend reading the Pythonect Tutorial: Learn By Example).

There isn't exactly a "Hello, world" program in the malware analysis realm, so I will start with my equivalent of "Hello, world": an example program that computes an MD5 digest of a file (the first listing at the top of this post). That program uses the md5sum program of GNU coreutils to compute and print MALWARE.EXE's MD5 digest.

Let's extend it to compute MALWARE.EXE's SHA1 digest as well (the second listing). The new program uses the md5sum and sha1sum programs of GNU coreutils to compute and print MALWARE.EXE's MD5 and SHA1 digests.

Let's keep improving it (the third listing). Now the program reads the malware filename from a command-line argument. To run the script, save it (e.g. as md5_and_sha1_sums) and run the Pythonect interpreter as shown in the shell session that follows the third listing.

Often, the goal is to handle the large volume of malware samples collected each day, so let's change the program to work on all the executables (i.e. *.EXE) in the current directory (the glob.glob listing). Of course, it can be further fine-tuned or customized at will. It's also worth mentioning that this program is multi-threaded: each file starts a new thread.

So far, I have used Python's os.system function in all of the example programs. os.system is handy when it comes to writing small scripts: it executes a command in a subshell and returns its exit status. But since there is little interest in passing the exit status to another component, a different command-executing function is needed when building a more advanced script (the subprocess.check_output listing). Much like the original example program, that program uses the md5sum program of GNU coreutils to compute MALWARE.EXE's MD5 digest, but prints the result using Pythonect's print function.

Moving on. The Python Standard Library is a rich set of libraries (modules and packages) for tackling just about every programming task.
For example, the hashlib listing is an alternative to the original example program: it uses Python's hashlib module to compute MALWARE.EXE's MD5 digest and Pythonect's print to display it.

What else? The re.finditer listing searches for all occurrences of the INT 3 instruction in the MALWARE.EXE file and prints the start and end offsets of each match.

Now, for the times when the Python Standard Library doesn't have what you are looking for, you can always implement your own in Python. The entropy function listed above is an implementation of Shannon's entropy equation in Python. To use it, simply save it (e.g. as entropy.py) and reference it in a program, as the entropy.entropy listing does. That program uses entropy() of entropy.py to measure and print MALWARE.EXE's entropy.

To conclude this tutorial, let's tweak it one more time (the last listing). Now the program uses entropy() of entropy.py to measure and print the entropy of MALWARE.EXE's .text section (extracted using objcopy of GNU binutils).

Pythonect is still under heavy development; there's a ton of unimplemented features and even more bugs. It's not ready for production yet, but you can still start to play with it and have plenty of fun!

That's all for now.