BitBlaze: Binary Analysis for Computer Security

Binary analysis is imperative for protecting COTS (commercial off-the-shelf) programs and for analyzing and defending against the myriad of malicious code, where source code is unavailable and the binary may even be obfuscated. Binary analysis also provides the ground truth about program behavior, since computers execute binaries (executables), not source code. However, binary analysis is challenging due to the lack of higher-level semantics. Many higher-level techniques are inadequate for analyzing even benign binaries, let alone potentially malicious ones. Thus, we need tools and techniques that work at the binary level and can be used to analyze both COTS software and malicious binaries.

The BitBlaze project aims to design and develop a powerful binary analysis platform and to employ the platform in order to (1) analyze and develop novel COTS protection and diagnostic mechanisms and (2) analyze, understand, and develop defenses against malicious code. The BitBlaze project also strives to open new application areas of binary analysis, providing sound and effective solutions to applications beyond software security and malicious code defense, such as protocol reverse engineering and fingerprint generation.

The BitBlaze Binary Analysis Platform

The underlying BitBlaze Binary Analysis Platform features a novel fusion of static and dynamic analysis techniques, dynamic symbolic execution, and whole-system emulation and binary instrumentation. The BitBlaze platform has different components for each task: Vine, TEMU, and Rudder. The three components in tandem provide the power for effective analysis of real-world binary programs for various applications.

Vine, the static analysis component. Open source release available now. Vine provides an intermediate language for assembly (ILA) and an infrastructure for analyzing programs written in this language. ILA is a full language in which programs can be written, type-checked, and then compiled down to assembly. We also provide analyses on the ILA, such as abstract interpretation, dependency analysis, and logical analysis via interfaces with theorem provers.

TEMU, the dynamic analysis component. Open source release available now. TEMU provides a dynamic analysis environment through whole-system emulation and dynamic binary instrumentation. TEMU is OS-aware (i.e., it understands OS-level semantics) and enables various fine-grained dynamic analyses to be built on top of it, such as dynamic taint analysis and fine-grained behavioral analysis.

Rudder, the component for online dynamic symbolic execution. Rudder is an engine for online dynamic symbolic execution on binaries. At a high level, given a specified set of input sources of interest, Rudder can automatically explore different execution paths in a program determined by the input sources. It automatically builds logical formulas representing the constraints that the chosen input must satisfy to take the followed paths.
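
To make the idea of exploring execution paths and collecting constraints on the input more concrete, here is a minimal Python sketch. It is not Rudder's implementation: the program under test, the `branch` helper, and the brute-force `solve` function (a stand-in for a real decision procedure over a one-byte input) are all hypothetical and exist only for illustration.

```python
def branch(trace, name, pred, x):
    """Evaluate a branch predicate on the concrete input and log the decision."""
    taken = pred(x)
    trace.append((name, pred, taken))
    return taken


def toy_program(x, trace):
    # Hypothetical program under analysis: one input byte, nested branches.
    if branch(trace, "x > 100", lambda v: v > 100, x):
        if branch(trace, "x % 7 == 0", lambda v: v % 7 == 0, x):
            return "deep"
        return "high"
    return "low"


def solve(constraints, domain=range(256)):
    """Stand-in for a decision procedure: brute-force the tiny input domain."""
    for candidate in domain:
        if all(pred(candidate) == want for _, pred, want in constraints):
            return candidate
    return None


def explore(program, seed):
    """Explore paths by negating one recorded branch decision at a time."""
    worklist, seen, results = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        trace = []
        outcome = program(x, trace)
        path = tuple((name, taken) for name, _, taken in trace)
        if path in seen:
            continue
        seen.add(path)
        results[path] = (x, outcome)
        # Flip each branch in turn, keeping the constraints before it unchanged.
        for i, (name, pred, taken) in enumerate(trace):
            flipped = trace[:i] + [(name, pred, not taken)]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return results


paths = explore(toy_program, seed=0)
for path, (x, outcome) in sorted(paths.items()):
    print(path, "-> input", x, "reaches", outcome)
```

Each key of `paths` is the list of branch decisions along one path, which plays the role of the logical formula constraining the input. A real engine would hand such formulas to a theorem prover instead of enumerating inputs.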

Release Information: We are now making some key parts of the BitBlaze Binary Analysis Platform available under open-source licenses. See a separate page for more information.

In conjunction with our BlackHat 2010 presentation, we have also made a demonstration binary release of some tools for trace-based crash analysis.

BitBlaze in Action: Security Applications

In particular, we show below three classes of security applications: (1) vulnerability detection, diagnosis, and defense; (2) automatic in-depth malware analysis and defense; (3) automatic model extraction and analysis.

Vulnerability Detection, Diagnosis, and Defense

Hybrid Information- and Control-Flow Graph (HI-CFG)

Many security analysis tasks require understanding the high-level structure of a binary program in terms of both its control-flow and the data it operates on. To facilitate the automatic reverse engineering of such structure, we have introduced a new program representation, a hybrid information- and control-flow graph (HI-CFG). Our research explores algorithms to infer a HI-CFG from an instruction-level trace, without requiring source-level information or static analysis.
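
As a rough illustration of inferring such a graph from a trace, the Python sketch below builds a small hybrid graph from a hand-written event list. The trace format, the choice of functions and buffers as nodes, and the edge names are assumptions made for this sketch, not necessarily the representation used by the BitBlaze tools. The information-flow rule is deliberately coarse: a flow is assumed from one buffer to another whenever a single function reads the first and writes the second.

```python
# Hypothetical trace format (an assumption for this sketch): each event is
# (kind, function, operand), where kind is "call", "read", or "write".
TRACE = [
    ("call", "main", "parse_header"),
    ("write", "parse_header", "hdr_buf"),
    ("call", "main", "decode_body"),
    ("read", "decode_body", "hdr_buf"),
    ("write", "decode_body", "out_buf"),
]


def build_hybrid_graph(trace):
    """Collect control-flow, producer/consumer, and information-flow edges."""
    control = set()   # caller -> callee
    produces = set()  # function -> buffer it writes
    consumes = set()  # buffer -> function that reads it
    for kind, func, operand in trace:
        if kind == "call":
            control.add((func, operand))
        elif kind == "write":
            produces.add((func, operand))
        elif kind == "read":
            consumes.add((operand, func))
    # Coarse approximation: data flows src -> dst if one function reads src
    # and writes dst.
    info_flow = {
        (src, dst)
        for src, reader in consumes
        for writer, dst in produces
        if reader == writer and src != dst
    }
    return {
        "control": control,
        "produces": produces,
        "consumes": consumes,
        "info_flow": info_flow,
    }


graph = build_hybrid_graph(TRACE)
for kind, edges in graph.items():
    print(kind, sorted(edges))
```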



Identifying Causal Execution Differences for Security Applications

A security analyst often needs to understand two runs of the same program that exhibit a difference in program state or output. This is important, for example, for vulnerability analysis, as well as for analyzing a malware program that features different behaviors when run in different environments. Differential Slicing is an automatic slicing technique for the analysis of such execution differences. The causal difference graph it outputs captures the input differences that triggered the observed difference and the causal path of differences that led from those input differences to the observed difference.

Automatic Defense System against Zero-day Exploits and Worms

Worms such as CodeRed and SQL Slammer can compromise millions of hosts within hours or even minutes and have caused billions of dollars in estimated damage. How can we design and develop effective defense mechanisms against such fast, large-scale worm attacks? Sting is an automatic worm defense system which proposes a suite of novel techniques to automatically detect new exploits, perform in-depth diagnosis, and generate effective antibodies (vulnerability signatures and hardened binaries) to protect vulnerable hosts and networks from further attacks.

Automatic Patch-based Exploit Generation

Security patches are supposed to fix vulnerabilities in programs. But what are the security implications of a security patch? In this work, we propose new techniques and demonstrate that one could automatically generate exploits from the patched binary and the original vulnerable program binary, sometimes within minutes.

Loop-extended Symbolic Execution: Buffer Overflow Diagnosis and Discovery

Loop-extended symbolic execution (or LESE) is a new technique that generalizes the results of previous dynamic symbolic execution techniques, broadening them to cover the effects of loops. LESE is a key enabler for powerful automated discovery of security vulnerabilities, especially buffer overflows, whose discovery is highly inefficient with pure symbolic/concrete execution. It also enables deeper diagnosis of known vulnerabilities, which allows automated signature generation tools to reason about variable-length input or repeated elements in the input.

Measuring Quantitative Influence

Dynamic taint analysis is a fundamental tool for detecting overwrite attacks, but it is limited to an all-or-nothing distinction as to whether values are under the control of an attacker, and suffers from both false-positive and false-negative errors. We propose quantitative influence to more precisely characterize the degree of control an attacker has over a value. Quantitative influence is a specialization of the concept of channel capacity from information theory, and we show that it can be computed precisely using a decision procedure. Quantitative influence accurately distinguishes real attacks from false positives among warnings generated by a dynamic taint analysis tool on vulnerable binary servers.

Statically-Directed Dynamic Automated Test Generation

Static analysis, dynamic analysis, and symbolic execution have complementary strengths for exploring the space of program executions, but on its own each has significant limitations. How can we combine them to leverage the best features of all three? Our work on statically-directed dynamic automated test generation explores a three-stage process. It first performs dynamic analysis to build a control-flow model, then performs static analysis to search for potential vulnerabilities, and finally uses dynamic symbolic execution to prove that warnings are true positives by finding concrete test cases for them. In an evaluation on a suite of buffer-overflow benchmarks extracted from real applications, the results of the first two phases allowed symbolic execution to trigger vulnerabilities it otherwise could not, including all but one of the benchmarks.

Automatic Malware Analysis and Defense

Detection and Analysis of Privacy-Breaching Malware

A myriad of malware, such as keyloggers, Browser Helper Object (BHO) based spyware, rootkits, and backdoors, accesses and leaks users' sensitive information and breaches users' privacy. Can we have a unified approach to identify such privacy-breaching malware despite its widely varied appearance? Panorama proposes a unified approach to detect privacy-breaching malware using whole-system dynamic taint analysis.

Hidden Code Extraction from Packed Executables

Code packing is one technique commonly used to hinder malware code analysis through reverse engineering. Even though this problem has been previously researched, the existing solutions are either unable to handle novel samples or vulnerable to various evasion techniques. Renovo proposes a fully dynamic approach for hidden code extraction, capturing the intrinsic nature of hidden code execution.

Detection and Analysis of Malware Hooking Behaviors

One important malware attack vector is its hooking mechanism. Malicious programs implant hooks for many different purposes. Spyware may implant hooks to get notified of the arrival of new sensitive data. Rootkits may implant hooks to intercept and tamper with critical system information to conceal their presence in the system. A stealth backdoor may also place hooks on the network stack to establish a stealthy communication channel with remote attackers. HookFinder proposes fine-grained impact analysis to automatically detect and analyze malware's hooking behaviors. Since this technique captures the intrinsic nature of hooking behaviors, it is well suited for identifying new hooking mechanisms.

Automatic Malware Dissection and Trigger-based Behavior Analysis

Malware often has embedded behavior which is only exhibited when certain conditions are met. Such trigger-based behavior includes time bombs, logic bombs, and botnet programs that react to commands. Static analysis of malware often provides little utility due to code packing and obfuscation. Vanilla dynamic analysis can provide only a limited view, since the trigger conditions are usually not met. How can we design automatic analysis methods to uncover the trigger conditions and trigger-based behavior hidden in malware? BitScope enables automatic exploration of program execution paths in malware to uncover trigger conditions (such as the time used in time bombs and commands in botnet programs) and trigger-based behavior, using dynamic symbolic execution. BitScope also provides in-depth analysis of the input/output behavior of the malware.

Automatic Model Extraction and Analysis

Extracting security-related models from browsers for analysis and vulnerability discovery

In this work, we show how to use string-enhanced white-box exploration techniques to automatically extract security-related models from browsers and to automatically discover cross-site scripting (XSS) vulnerabilities by comparing the extracted models with websites' filters.

Deviation Detection in Binaries

Many network protocols and services have several different implementations. Automatically identifying deviations in different implementations of the same protocol/service can enable the detection of potential implementation errors without a protocol specification, and can enable automatic generation of fingerprints to identify an implementation remotely. How can we automatically identify such deviations in binaries implementing the same specification? Deviation Detection automatically identifies deviations in different binaries to detect implementation errors and generate fingerprints. It is achieved by building symbolic formulas that characterize how each binary processes an input.

Protocol Reverse Engineering and Application Dialogue Replay

Many network protocols are proprietary or have no well-documented specification. However, many security applications require protocol reverse engineering and application dialogue (network trace) replay. Dispatcher, Polyglot, and Replayer automatically extract information about network protocols and enable application dialogue replay using binary analysis.



The FPGate project received the Special Recognition Award in the Microsoft BlueHat Prize Contest in 2012.

FPGate stops attacks targeting function pointers by limiting indirect transfers to only those targets that are legal in the original program. When deployed together with other existing lightweight protections, FPGate can provide a level of protection comparable to CFI (Control Flow Integrity), stopping almost all control-flow hijacking attacks, including ROP. FPGate has two main advantages compared with previous solutions. First, it can interoperate well with existing non-hardened libraries, so it can be deployed progressively. Second, we developed a method to recognize all sources and targets automatically in modern security-sensitive binary executables, so FPGate can be applied directly to these binary files. The performance overhead of FPGate is only 0.36% on average, measured using SPECint 2006. FPGate is a joint work of Lenx with Chao Zhang, Zhaofeng Chen, Lei Duan from Peking University, and Laszlo Szekeres, Stephen McCamant, Dawn Song from UC Berkeley.
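
The core idea of limiting indirect transfers to legal targets can be illustrated with a small Python sketch. This is only a source-level analogy: FPGate itself works on binary executables, and the names `make_gate`, `gated_call`, and `IllegalTransfer` are hypothetical and chosen for this example.

```python
class IllegalTransfer(Exception):
    """Raised when an indirect call targets a function outside the legal set."""


def make_gate(legal_targets):
    """Return a checked-call helper that only dispatches to `legal_targets`."""
    legal = frozenset(legal_targets)

    def gated_call(target, *args):
        # Check the target before transferring control to it.
        if target not in legal:
            raise IllegalTransfer(getattr(target, "__name__", repr(target)))
        return target(*args)

    return gated_call


def greet(name):
    return "hello " + name


def shell(cmd):
    # Stands in for code an attacker would like to reach.
    return "ran " + cmd


gated_call = make_gate([greet])
print(gated_call(greet, "world"))  # legal target: the call goes through
try:
    gated_call(shell, "id")        # simulated corrupted pointer: blocked
except IllegalTransfer as err:
    print("blocked:", err)
```

In this analogy the set passed to `make_gate` corresponds to the targets that are legal in the original program, and every indirect call is routed through the check.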

Vulnerabilities Discovered

CVE-2011-0904 Out-of-bounds Memory Access in Gnome VNC Vino Server

CVE-2011-0905 Out-of-bounds Memory Access in Gnome VNC Vino Server

CVE-2008-3465 (MS08-071) Heap-based buffer overflow in gdi32.dll

OSVDB-66497 Out-of-bounds memory access in Cutwail bot

OSVDB-66498 Null pointer dereference in Gheg bot

OSVDB-66499 Null pointer dereference in Zbot trojan

OSVDB-66500 Infinite loop in Zbot trojan

OSVDB-66501 Stack-based and heap-based buffer overflow in Zbot trojan

News Coverage

The BitBlaze project is looking for developers to help extend and enhance our state-of-the-art framework for binary analysis in security applications. In particular, we're looking for developers/researchers with skills and experience including computer security, languages and compilers, assembly language, low-level operating system work, and decision procedures. We have openings for interns (for the summer or another similar period), staff scientists/staff programmers, postdocs, and open-source contributors. If interested, send a CV/resume and interest description to bitblaze.jobs at gmail.com.

For general questions regarding the BitBlaze project, please send email to bitblaze at gmail.com.

To receive announcements about code releases and other BitBlaze-related updates, please subscribe to the BitBlaze Announcement List.