Mapping your enemy Botnet with Netzob

Have you ever spent nights staring at binary or hexadecimal data flows extracted from a USB channel? Do you remember searching for patterns and similarities in this fuc*g mess of zeros and ones grabbed from a binary configuration file? How long did it take you to find a 16-bit decimal size field the last time you reversed an IPC communication protocol? You are not alone: Rob Savoye (@ FOSDEM-08) and Drew Fisher (@ 28C3), among others, have already reported the main difficulties of reverse engineering (RE) operations. Both of them called for the creation of a tool that would help experts in their work.

After two years of intensive research, we are pleased to present our results: a tool that facilitates the analysis of binary flows, finds relations between segments of data, deduces data types and formats, infers the state machine, and does a few other things, including fuzzing and simulating implementations of undocumented protocols.
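To make "finding relations between segments of data" concrete, here is a minimal sketch of one such relation: a field whose value equals the number of bytes that follow it. This is not Netzob's code or API. The function name `find_size_fields`, the candidate widths, and the sample messages are all illustrative assumptions.

```python
def find_size_fields(messages, widths=(1, 2)):
    """Brute-force search for a field that encodes the length of the bytes after it.

    Hypothetical helper for illustration only; not part of Netzob.
    Returns a list of (offset, width, byteorder, relation) tuples that hold
    for every message in the sample.
    """
    candidates = []
    shortest = min(len(m) for m in messages)
    for width in widths:
        # Byte order is meaningless for a single byte, so test it only once.
        byteorders = ("big",) if width == 1 else ("big", "little")
        for offset in range(shortest - width + 1):
            for byteorder in byteorders:
                values = [
                    int.from_bytes(m[offset:offset + width], byteorder)
                    for m in messages
                ]
                # Relation tested: value == number of bytes after the field.
                if all(v == len(m) - offset - width
                       for v, m in zip(values, messages)):
                    candidates.append((offset, width, byteorder, "payload"))
    return candidates


# Made-up sample: 1-byte type, 2-byte big-endian payload length, payload.
samples = [
    b"\x01\x00\x05hello",
    b"\x01\x00\x0bhello world",
    b"\x01\x00\x02hi",
]
candidates = find_size_fields(samples)
```

On this sample, the 16-bit big-endian field at offset 1 is reported, but so is the single byte at offset 2, because every payload is shorter than 256 bytes. A larger and more varied capture is needed to tell the two hypotheses apart, which is part of what makes manual RE tedious.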

Released under GPLv3, Netzob is (to our knowledge) the most advanced tool available to help reversers and security evaluators/auditors in their work on undocumented protocols.

There are many reasons why an advanced IT user would engage in RE operations. For example, some want to understand how their favorite game saves their credentials, while others want to make their USB device work on operating systems that do not natively support it. In addition to these common uses, security auditors (and evaluators) often rely on RE in their work, either for vulnerability assessment of implementations or for analyzing malicious traffic and malware. This presentation will discuss the use of RE by security auditors and evaluators in the context of malware analysis, with botnet C&C as a specific use case.

We will present Netzob, an Open Source tool, and show how it helps to semi-automatically reverse undocumented communication protocols (USB, Network, IPC, ...). It leverages bioinformatics, automata theory and data dependency algorithms to infer both the message format and the state machine of a protocol. Most of these algorithms were re-implemented from scratch, which allowed us to tailor them to our needs. We will explain these algorithms step by step and detail how they are used for RE purposes.
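As an illustration of how a bioinformatics algorithm can expose a message format, the sketch below aligns two messages with Needleman-Wunsch global alignment and keeps the bytes they share as static fields. Which alignment algorithm Netzob uses is not stated here, so this choice is an assumption. The scoring values, function names and sample messages are also illustrative, and none of this is Netzob's API.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two byte strings; None marks a gap in the returned lists."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]


def infer_template(a, b):
    """Static positions keep their byte value; variable positions become None."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    return [x if x == y else None for x, y in zip(aligned_a, aligned_b)]


def render(template):
    """Show static bytes as hex and variable positions as '??'."""
    return " ".join("??" if x is None else f"{x:02x}" for x in template)


# Made-up messages from a text-like protocol.
template = infer_template(b"LOGIN alice\n", b"LOGIN bob\n")
```

Here the shared prefix `LOGIN ` and the trailing newline come out as static fields, and the user name in between comes out as a variable field. Real captures contain many messages, so a pairwise alignment like this one would only be a building block.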

We will also present the methodology used to generate contextualized communications from the inferred specifications. The provided simulation module thus allows the creation of realistic servers and clients in a controllable manner.
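The idea of driving a simulated endpoint from an inferred state machine can be sketched as a small Mealy machine, where each received symbol triggers a transition and an emitted answer. The states, symbols and the `SimulatedServer` class below are a hypothetical C&C grammar invented for illustration. They are not taken from a real botnet or from Netzob's simulation module.

```python
from dataclasses import dataclass

# Hypothetical inferred grammar:
# (current state, received symbol) -> (next state, emitted symbol)
TRANSITIONS = {
    ("INIT", "HELLO"): ("REGISTERED", "WELCOME"),
    ("REGISTERED", "GET_CMD"): ("REGISTERED", "CMD_SLEEP"),
    ("REGISTERED", "BYE"): ("INIT", "ACK"),
}


@dataclass
class SimulatedServer:
    """Replays an inferred Mealy machine, one received symbol at a time."""
    transitions: dict
    state: str = "INIT"

    def receive(self, symbol):
        key = (self.state, symbol)
        if key not in self.transitions:
            # Symbol not allowed in this state: answer ERROR, keep the state.
            return "ERROR"
        self.state, output = self.transitions[key]
        return output


server = SimulatedServer(TRANSITIONS)
replies = [server.receive(s) for s in ["GET_CMD", "HELLO", "GET_CMD", "BYE"]]
```

In a full simulator, each abstract symbol would be turned back into concrete bytes using the inferred message format, which is what makes the generated traffic contextualized rather than a blind replay.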