This is Part 2 in a short series of long articles looking at automating the process of finding CWE-369: Divide By Zero bugs in compiled binaries with Binary Ninja. If you haven’t read Part 1 please consider giving it a read before diving into this one.

In this article, we’re going to address the issue of modeling imported library functions, specifically well-known functions of standard C libraries. We’ll see why this is an issue and how we can include simple approximations to our model to cover many of them. This will help us address common false negatives plaguing our analyzer at the conclusion of Part 1.

While we’re at it, we’re going to take a look at a fascinating false-negative that occurs when we run our updated analyzer against an ARM binary with a divide by zero bug, find out why it occurs, and fix it. Here we go!

Another Contrived Example (x86_64)

In Part 1, we modeled a case where a constant zero could be linked directly to the denominator of a division operation. But what if the denominator can’t be linked to a constant? What if we trace it back to the return value of a function call instead? How do we know for sure the function can return a zero value?

In this example, we’ll take a look at modeling denominators stemming from function calls and model a well-known function, atoi. We’ll also see how targeting another architecture (ARM) leads to a false negative, and how we can expand our model to deal with the interesting challenge ARM presents when dealing with division operations. But first, let’s check out atoi.

The atoi(const char *nptr) function converts the initial portion of the string pointed to by nptr to an int.

A contrived, yet more complex CWE-369 example

Take a moment to examine the C program above. Do you see the issue? On line 13 there’s a division operation where the variable data is used as the denominator. This variable is initialized to 1 on line 8 but gets overwritten with the return value of atoi on line 12.

The atoi function is a standard C library function that converts an ASCII string to an integer. The string being converted is the first command line argument to this application. The issue here is that a user could enter “0” as the first argument resulting in the variable data becoming zero, leading to a divide by zero on line 13. Let’s see what this looks like in MLIL SSA.

The return value of atoi leading to the denominator of a division operation.

Because this function graph is rather large, I’ve isolated the screen capture to the important basic block only. If you take a look at index 15 you’ll see rax_3#7 will contain the return value of atoi, which is masked from 64- to 32-bits at index 16 and stored in var_10_1#2 which appears in the denominator of the division operation at index 20.

Let’s see what happens when we run our analyzer from Part 1 on this program as compiled for x86-64.

Analyzing file: cwe369B_x64

Function: main >> CWE-369 analyzer found 0 issues.

As you can see, our analyzer found nothing. That’s because our naive model doesn’t take into account return values from function calls, nor does it attempt to handle interprocedural data flow. So how can we address this issue?

Well, we could follow the atoi function and analyze it — assuming we had the library available to analyze. This could lead to a huge can of worms we aren’t yet prepared to deal with. So instead, let’s take a much easier approach by creating a lookup table of well-known standard library function names and their possible return values we care about. It’ll look something like this:

RETURN_MAP = {
    'atoi': 0,
}

All we have to do now is add a lookup to our model whenever a function call is made to see if we know what the potential return values could be.

In the case of atoi, the return value is a 32-bit signed integer, so it can take any value in the range [−2147483648, 2147483647], and that range includes zero. Of course, we don’t want every one of these values to appear in our data graph; that would be a bit excessive. Instead, we can either limit the model to the one value we care about (zero), or record the return type as a 32-bit signed integer and check that our values of interest fall within that type’s range.
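To make the second option concrete, here is a minimal sketch of a range-based lookup. The names RETURN_RANGES and may_return are hypothetical and are not part of the analyzer in this series, and the strlen entry assumes a 64-bit size_t.

```python
# Hypothetical alternative to a single-value lookup: describe each modeled
# function by the inclusive range of its return type, then ask whether a
# value of interest can fall inside that range.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

# name -> (lowest possible return value, highest possible return value)
RETURN_RANGES = {
    "atoi": (INT32_MIN, INT32_MAX),  # int
    "strlen": (0, 2 ** 64 - 1),      # size_t on a 64-bit target (assumption)
}


def may_return(func_name, value):
    """Return True if the modeled function could return `value`."""
    if func_name not in RETURN_RANGES:
        return False  # unknown function: this sketch makes no claim
    low, high = RETURN_RANGES[func_name]
    return low <= value <= high


print(may_return("atoi", 0))        # True
print(may_return("atoi", 2 ** 31))  # False
```

This keeps the data graph small, because the range is stored once per function and only queried for the values an analyzer cares about.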

We’ll choose the absolute simplest option, which is just correlating atoi to 0 since that’s all we care about in this analyzer. Here’s the code we’re adding to our analyzer:

# MLIL_CALL_SSA
if instr.operation in [binaryninja.MediumLevelILOperation.MLIL_CALL_SSA]:
    func_name = get_function_name_from_address(bv, instr)
    if func_name in RETURN_MAP:
        value = RETURN_MAP[func_name]
        for var_written in instr.vars_written:
            vw_str = "{}#{}".format(var_written.var, int(var_written.version))
            graph.add_edge(str(value), vw_str)

This code adds an edge in our data graph between a 0 and the variable holding the return value of atoi. The zero comes from our RETURN_MAP lookup table (a Python dictionary). There’s a helper function here called get_function_name_from_address that I’ve omitted for brevity. Here’s a Gist of the newly updated analyzer (check out line 18 for the helper function and line 47 for our new code additions to handle call return values).
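If you don’t have the Gist handy, a helper along these lines would do the job. This is a sketch under assumptions, not the helper from the Gist: it assumes the call target is a constant pointer (so instr.dest.constant holds the callee address) and relies on the Binary Ninja calls bv.get_symbol_at and bv.get_function_at. The stand-in objects and the address 0x560 are made up so the snippet runs without Binary Ninja installed.

```python
from types import SimpleNamespace


def get_function_name_from_address(bv, instr):
    """Best-effort name of the function called by an MLIL call instruction.

    Assumes a direct call, where `instr.dest.constant` is the callee
    address. Indirect calls (no constant target) return None.
    """
    addr = getattr(instr.dest, "constant", None)
    if addr is None:
        return None
    symbol = bv.get_symbol_at(addr)      # covers imported functions
    if symbol is not None:
        return symbol.name
    func = bv.get_function_at(addr)      # functions defined in the binary
    return func.name if func is not None else None


# Stand-in objects (hypothetical) that mimic just enough of the API.
fake_bv = SimpleNamespace(
    get_symbol_at=lambda addr: SimpleNamespace(name="atoi") if addr == 0x560 else None,
    get_function_at=lambda addr: None,
)
direct_call = SimpleNamespace(dest=SimpleNamespace(constant=0x560))
indirect_call = SimpleNamespace(dest=SimpleNamespace())

print(get_function_name_from_address(fake_bv, direct_call))    # atoi
print(get_function_name_from_address(fake_bv, indirect_call))  # None
```

Returning None for anything that can’t be resolved keeps the lookup safe, since None is never a key in RETURN_MAP.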

Now let’s run this updated model against this program (cwe369B_x64) and see what happens.

Analyzing file: cwe369B_x64

Function: main [ALERT 1]: Possible divide by zero detected.

Function: main

Index: 20

Address: 0x6c9

Operation: MLIL_SET_VAR_SSA

Instruction: temp0_1#1 = divs.dp.d(temp4_1#1:temp5_1#1, var_10_1#2)

Variable: var_10_1#2

Chain: ['var_10_1#2', 'rax_3#7', '0']

Not too bad huh? While it isn’t perfect, this simple enhancement to our model addressed an important class of false-negatives in our analyzer from Part 1.

So all we have to do is add all well-known C library calls to our lookup table and we’re good, right? Unfortunately, it’s not always that easy. Let’s take a look at an interesting false negative that shows up when we run our analyzer against an ARM version of our example C program.

Handling ARM Division Wrappers

Let’s run our analyzer against our contrived example compiled for ARM (cwe369B_ARM32).

Analyzing file: cwe369B_ARM32

Function: main >> CWE-369 analyzer found 0 issues.

So why on earth does our model work on an x86-64 target but fail on an ARM target? Let’s take a look at the MLIL SSA form of cwe369B_ARM32.

cwe369B_ARM32 in MLIL SSA form showing an interesting function call to handle division

Do you see that at index 16? Our division operation is now a call to a function named __aeabi_idiv, with the first argument being the numerator (0x64, or 100) and the second argument being the denominator (r1#2).

If you’re interested you can read about this phenomenon at the ARM Information Center. What’s going on here is GCC wrapped the division operation with a call to this function to mimic ARM’s RealView compiler process for integer division. If we run this program using qemu, we’ll see that a 0 passed as an argument does, in fact, cause early termination as we expect.

qemu-arm ./cwe369B_ARM32 0

qemu: uncaught target signal 8 (Floating point exception) - core dumped

Floating point exception

So how do we deal with this? Well, we now have a new divide by zero sink we’re interested in tracking. Actually, we have two: __aeabi_idiv (for signed division) and __aeabi_uidiv (for unsigned division).

We’re going to implement these using another lookup table called DIVISION_WRAPPERS. The key is the name of the function wrapper and the value is the zero-indexed argument representing the denominator variable.

DIVISION_WRAPPERS = {
    # name: denominator argument (zero-indexed)
    "__aeabi_idiv": 1,
    "__aeabi_uidiv": 1,
}
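Here’s a rough sketch of how a call instruction could be checked against this table to pull out the denominator. The function wrapper_denominator is a hypothetical name, not the code from the updated analyzer. It assumes the call exposes its arguments as instr.params and that an SSA variable argument carries the variable in .src with .var and .version attributes, as MLIL SSA variable expressions do in Binary Ninja. The stand-in objects let it run on its own.

```python
from types import SimpleNamespace

DIVISION_WRAPPERS = {"__aeabi_idiv": 1, "__aeabi_uidiv": 1}


def wrapper_denominator(func_name, instr):
    """Return 'name#version' for the denominator of a wrapped division.

    Returns None when the call is not a known wrapper, has too few
    arguments, or the denominator is not a plain SSA variable.
    """
    index = DIVISION_WRAPPERS.get(func_name)
    if index is None or index >= len(instr.params):
        return None
    ssa_var = getattr(instr.params[index], "src", None)
    if not (hasattr(ssa_var, "var") and hasattr(ssa_var, "version")):
        return None  # e.g. a constant or a more complex expression
    return "{}#{}".format(ssa_var.var, int(ssa_var.version))


# Stand-in for the call at index 16: 0x104f4(0x64, r1#2)
call = SimpleNamespace(params=[
    SimpleNamespace(constant=0x64),
    SimpleNamespace(src=SimpleNamespace(var="r1", version=2)),
])

print(wrapper_denominator("__aeabi_idiv", call))  # r1#2
print(wrapper_denominator("printf", call))        # None
```

The returned name can then be treated exactly like the denominator of a native division instruction: a sink to trace back through the data graph toward a 0.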

We also need to alter our find_all_paths_from function to handle nested generators which will cause enumeration to hang if not handled properly. Putting it all together, we get the following updated analyzer. Please take a minute to scan over the code.
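To give a feel for the generator issue, here is one way a recursive path enumerator can be written so that it yields finished paths instead of nested generator objects, and so that cycles can’t keep it running forever. This is a sketch, not the find_all_paths_from from the updated analyzer: it assumes a plain dictionary of adjacency lists rather than the analyzer’s graph object, and the edges are copied by hand from the ARM chain reported in this article.

```python
def find_all_paths_from(graph, start, path=None):
    """Yield every maximal simple path that begins at `start`.

    `graph` maps a node to the list of nodes it points at. A node already
    on the current path is never revisited, so cycles cannot cause
    endless recursion.
    """
    path = (path or []) + [start]
    successors = [n for n in graph.get(start, []) if n not in path]
    if not successors:
        yield path  # dead end: the path is complete
        return
    for node in successors:
        # `yield from` flattens the recursive generator, so callers get
        # lists of nodes rather than nested generator objects.
        yield from find_all_paths_from(graph, node, path)


# Edges run from a value to the variable it flows into, matching
# graph.add_edge(str(value), vw_str).
graph = {
    "0": ["r0_1#2"],
    "r0_1#2": ["var_10_1#1"],
    "var_10_1#1": ["r1#2"],
}

sink = "r1#2"
chains = [p[::-1] for p in find_all_paths_from(graph, "0") if p[-1] == sink]
print(chains)  # [['r1#2', 'var_10_1#1', 'r0_1#2', '0']]
```

Reversing each path gives the same denominator-first ordering as the Chain line in the analyzer’s output.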

Now let’s run this updated model against this program (cwe369B_ARM32) and see what happens.

Analyzing file: cwe369B_ARM32

Function: main [ALERT 1]: Possible divide by zero detected.

Function: main

Index: 16

Address: 0x104ac

Operation: MLIL_CALL_SSA

Instruction: r0_2#3, mem#3 = 0x104f4(0x64, r1#2) @ mem#2

Variable: r1#2

Chain: ['r1#2', 'var_10_1#1', 'r0_1#2', '0']

>> CWE-369 analyzer found 1 issues.

Ahh, that’s better! We now have an analyzer that takes ARM’s funny wrapped division sinks into account. Pretty cool huh? Oh, here’s the MLIL SSA again so you can follow the chain without scrolling up and down.

Moving Forward

At this point, we have an analyzer capable of finding a few simple manifestations of CWE-369 but we’re not done yet.

We still need to consider reachability due to path constraints, operations occurring in loops, and interprocedural data flow. We’ll address these details in Part 3, Part 4, and Part 5 respectively. Hang in there, we’ll get this wrapped up before you know it!

Again, thank you so much for reading. It means a lot to me that you took the time to read this. I feel like we’re really starting to bond, don’t you? Keep your head up, you’re worth it!

Series Links

Please use this handy set of links to navigate this series on finding CWE-369: Divide By Zero bugs with Binary Ninja.

Part 1: Mapping constant 0 values to denominators

Part 2: Zero values stemming from well-known functions, new sinks

Part 3: Considering reachability by evaluating path constraints

Part 4: Tracking operations occurring in loops

Part 5: Interprocedural data flow