Starting out with joern

Recently, I started playing with joern after watching Fabian Yamaguchi's excellent 31c3 talk. This post is intended as a resource for people who want to learn more about it. Put simply, joern is a C (and limited C++) parser that stores its output in a graph database. Specifically, it combines abstract syntax trees with control-flow and data-flow graphs into a single 'code property graph.' With this data, it's possible to formulate graph searches that query for vulnerabilities across many common bug classes present in C programs. Fabian chose the graph traversal language gremlin and implemented a number of useful program analysis primitives as gremlin steps (analogous to SQL prepared statements), which let you reason about higher-level program constructs instead of raw graph structures.
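
To make the idea of a code property graph a bit more concrete, here is a toy sketch in plain Python. It is not joern's API, schema, or storage format: the PropertyGraph class, the edge labels AST, CFG, and DATA_FLOW, the node properties, and the memcpy_fed_by_strlen query are all names I made up for illustration. The point is only that once syntax, control flow, and data flow live in one graph, a single traversal can mix all three kinds of edges.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph (hypothetical; not joern's schema)."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(label, target id)]
        self.in_edges = defaultdict(list)   # node id -> [(label, source id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))
        self.in_edges[dst].append((label, src))

    def out(self, node_id, label):
        """Targets of outgoing edges with the given label."""
        return [dst for lbl, dst in self.out_edges[node_id] if lbl == label]

    def inc(self, node_id, label):
        """Sources of incoming edges with the given label."""
        return [src for lbl, src in self.in_edges[node_id] if lbl == label]

    def descendants(self, node_id, label):
        """All nodes reachable from node_id using only edges with this label."""
        seen, stack = set(), [node_id]
        while stack:
            for nxt in self.out(stack.pop(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def calls(g, stmt, name):
    """True if the statement's syntax subtree contains a call to `name`."""
    return any(g.nodes[n].get("callee") == name
               for n in g.descendants(stmt, "AST"))


def memcpy_fed_by_strlen(g):
    """Statements calling memcpy that receive data from a strlen statement."""
    hits = []
    for stmt, props in g.nodes.items():
        if props.get("type") != "Statement" or not calls(g, stmt, "memcpy"):
            continue
        for src in g.inc(stmt, "DATA_FLOW"):
            if calls(g, src, "strlen"):
                hits.append((g.nodes[src]["code"], props["code"]))
    return hits


# Hand-built graph for three C statements (a real tool would parse the code).
g = PropertyGraph()
g.add_node(1, type="Statement", code="int n = strlen(src);")
g.add_node(2, type="CallExpression", code="strlen(src)", callee="strlen")
g.add_node(3, type="Statement", code="memcpy(buf, src, n);")
g.add_node(4, type="CallExpression", code="memcpy(buf, src, n)", callee="memcpy")
g.add_node(5, type="Statement", code="memcpy(buf, src, 8);")
g.add_node(6, type="CallExpression", code="memcpy(buf, src, 8)", callee="memcpy")
g.add_edge(1, "AST", 2)        # syntax: statement contains the strlen call
g.add_edge(3, "AST", 4)
g.add_edge(5, "AST", 6)
g.add_edge(1, "CFG", 3)        # control flow: statement order
g.add_edge(3, "CFG", 5)
g.add_edge(1, "DATA_FLOW", 3)  # data flow: the value of n reaches memcpy

print(memcpy_fed_by_strlen(g))
```

The query only reports the first memcpy, because the second one has no data-flow edge coming from the strlen statement. In joern itself this kind of logic would be written as a gremlin traversal over the database rather than as Python loops.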

To get started, I would recommend watching Fabian's talk. Additionally, I found the following resources helpful in learning about gremlin and joern:

On the Nature of Pipes: A simple introduction to the high-level ideas of gremlin.

GremlinDocs: API documentation for built-in gremlin traversals.

python-joern: The joern-specific traversals for reasoning about code property graphs.

Modeling and Discovering Vulnerabilities with Code Property Graphs: Paper presenting joern by Fabian Yamaguchi, et al. Feel free to skip the mathematical formalism--for those familiar with basic static program analysis, the meat is in sections 1, 5, and 6.

When I began using joern, I wrote queries against dummy code to experiment. Once I began to look at real code bases, I still found it useful to test against isolated examples, so I wrote a simple wrapper to 'unit test' joern traversals. Since there are few public joern queries, I've put this wrapper together with a few queries into the joern-traversals repository to serve as examples for others. There aren't many queries yet, so I encourage you to contribute any interesting ones you might write (especially if they've found real bugs!). Currently it includes a progression of queries from an experiment in which I evolved a very simple query into a fairly complex one that looks for infinite loop DoS conditions in wireshark. I think it's a compelling example because it illustrates joern's utility in searching for bugs with complex control- and data-flow semantics (and finding them!).

Lastly, I've added a small joern-console script to joern-tools that you might find helpful in writing traversals.