Using it all

To use our plugins you need to install mainline radare2 and radare2-extras, preferably the latest versions from git.

Simple examples

Let’s first take a look at a very basic Solidity contract and compile it to bytecode:

$ cat ./example1.sol
pragma solidity ^0.4.0;

contract Example1 {
    uint a = 0;

    function setA(uint b) {
        a = b + 0x42;
    }
}

$ solc ./example1.sol --bin-runtime -o ./out/
$ ls ./out/
Example1.bin-runtime

Calling solc with the --bin-runtime flag produces the code of the contract as it appears once deployed on the blockchain. Had we chosen the --bin option instead, this same code would be prefixed with the deployment code that places the contract on the blockchain; for now we don’t want to bother with that. solc writes its output as a hexadecimal string, so let’s use the rax2 utility that comes with r2 to convert it to raw binary:

$ rax2 -s < ./out/Example1.bin-runtime > ./out/Example1.bin-runtime.bin
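If rax2 is not at hand, the same conversion can be approximated in a few lines of Python. This is only a sketch: the helper name hex_to_raw is ours, and it assumes the file holds a single hex string, optionally prefixed with 0x and followed by whitespace.

```python
def hex_to_raw(hex_text: str) -> bytes:
    """Turn a hex string (as written by solc) into raw bytes.

    Assumption: the input is one contiguous hex string; an optional
    "0x" prefix and surrounding whitespace are tolerated.
    """
    cleaned = hex_text.strip()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return bytes.fromhex(cleaned)


# Example input: the bytes of PUSH1 0x60, PUSH1 0x40, MSTORE in hex form.
raw = hex_to_raw("6060604052\n")
print(raw.hex())
```

To convert a real file, read ./out/Example1.bin-runtime as text, pass it to hex_to_raw, and write the result in binary mode.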

And now we can open it with r2 and analyze it:

Great. We are starting to see some EVM bytecode in the r2 framework. Let’s try to understand step-by-step what this code actually does, how it is executed, what are its inputs and so on.

Understanding the contract’s entry code

Every time we call a contract with some input data, its execution starts from the very beginning, address 0x0. At the start of execution the memory and the stack are empty (storage, in contrast, persists between calls). The first two instructions push two values on the stack, 0x60 and then 0x40, which become the operands of the next instruction. At this point the stack contains

0: 0x40

1: 0x60

The MSTORE instruction saves a word to memory. It pops the first value off the stack as the dst addr, where the word is stored, and the next value as the value to store. In our case it stores a 32-byte word with value 0x60 in memory at address 0x40, so after execution of this instruction the memory will contain

0x0: 0000 0000 0000 0000 0000 0000 0000 0000

0x10: 0000 0000 0000 0000 0000 0000 0000 0000

0x20: 0000 0000 0000 0000 0000 0000 0000 0000

0x30: 0000 0000 0000 0000 0000 0000 0000 0000

0x40: 0000 0000 0000 0000 0000 0000 0000 0000

0x50: 0000 0000 0000 0000 0000 0000 0000 0060

And the stack will be empty.
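The effect of these three instructions can be reproduced with a tiny model of the stack and memory. This is a minimal sketch, not a real EVM: the stack is a Python list with its top at the end, memory is a bytearray that grows on demand, and gas accounting is ignored. The helper name mstore is ours.

```python
def mstore(memory: bytearray, stack: list) -> None:
    """Pop an address and a value, store the value as a 32-byte big-endian word."""
    offset = stack.pop()  # top of the stack: destination address
    value = stack.pop()   # next item: the word to store
    end = offset + 32
    if len(memory) < end:
        # Memory is zero-initialised and expands when written past its end.
        memory.extend(b"\x00" * (end - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


memory = bytearray()
stack = []
stack.append(0x60)  # PUSH1 0x60
stack.append(0x40)  # PUSH1 0x40, now on top of the stack
mstore(memory, stack)

# Dump memory 16 bytes per row, similar to the listing above.
for row in range(0, len(memory), 16):
    print(f"0x{row:02x}: {memory[row:row + 16].hex()}")
```

Because the word is stored big-endian, the single non-zero byte 0x60 lands at address 0x5f, the last byte of the 32-byte word that starts at 0x40.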

The next instruction pushes 0x4 on the stack, and the one after it is CALLDATASIZE. It pushes the size of the input data that our contract has been called with. So the stack becomes

0: $size_of_input_data

1: 0x4

We will talk about input data a bit later; right now let’s move on to the next instruction, which is LT. It pops the two values on top of the stack, compares them, and pushes the result of the comparison back to the stack:

1 if stack[0] < stack[1], 0 otherwise.

So, clearly we are comparing the size of the input data here with 0x4 .

Next, we push the constant 0x3f on the stack and execute the JUMPI instruction. JUMPI is a conditional jump that first pops the dst addr from the stack and then pops the condition; the jump is taken only if the condition is non-zero. So, as you can see, if the size of the input data is less than 0x4, code execution will jump to 0x3f. Ok, nice, we are done with reading our first basic block of Ethereum bytecode!
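The control flow of this first basic block can be sketched as follows. It is a simplified model under our own assumptions: the stack is a Python list with its top at the end, the function name entry_block is ours, and None stands for "fall through to the next instruction".

```python
def entry_block(calldata: bytes):
    """Return the jump target taken after the first basic block, or None."""
    stack = []
    stack.append(0x4)            # PUSH1 0x4
    stack.append(len(calldata))  # CALLDATASIZE, now on top of the stack

    # LT pops the top item (a) and the one below it (b) and pushes a < b.
    a, b = stack.pop(), stack.pop()
    stack.append(1 if a < b else 0)

    stack.append(0x3f)           # PUSH1 0x3f
    # JUMPI pops the destination first, then the condition.
    dest, cond = stack.pop(), stack.pop()
    return dest if cond != 0 else None


for size in (0, 3, 4, 36):
    print(size, entry_block(bytes(size)))
```

Inputs shorter than four bytes take the jump to 0x3f, and everything else falls through to the dispatcher.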

Now let’s quickly take a look at what happens at address 0x3f for the case when our input length is less than 0x4 .

The first instruction is a JUMPDEST; as noted earlier, this is just a nop marking a valid dst for a jump instruction. The next instruction pushes 0x0 on the stack and the one after it duplicates it. Finally, the REVERT instruction terminates the execution, undoing any state changes, refunding the remaining unused gas to the caller, and returning the data from the memory region described by its arguments on the stack. In this case both of its arguments are 0x0, so it returns nothing.

Ok, so if the length of input data is less than 0x4 , we revert the execution returning nothing.

The function call dispatcher

So, let’s follow the other branch. 0x0 is pushed to the stack, and the CALLDATALOAD instruction is executed. It pops an offset from the top of the stack, in our case 0x0, and pushes the 32-byte word of the input found at that offset, so here we get the first 32-byte word of the input. The next instruction is PUSH29, which pushes a 29-byte constant to the stack. In our case its operand is incorrectly decoded as 0x0, because the r2 framework cannot handle such large numbers. However, using the hexdump, we can get the whole number:

Here, we print 30 bytes at the beginning of our PUSH29 instruction. The opcode itself is 0x7c , followed by the operand 0x0100000000000000000000000000000000000000000000000000000000 .
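Decoding such an operand by hand is easy to script. The sketch below relies on the fact that the PUSH1 to PUSH32 opcodes occupy 0x60 to 0x7f, so the operand length is the opcode minus 0x5f. The function name decode_push is ours, and the input bytes are rebuilt from the values quoted above rather than read from the binary.

```python
def decode_push(code: bytes, pc: int = 0):
    """Decode a PUSHn instruction at pc; return (n, operand as an integer)."""
    opcode = code[pc]
    if not 0x60 <= opcode <= 0x7f:
        raise ValueError(f"opcode 0x{opcode:02x} is not a PUSH instruction")
    n = opcode - 0x5f                   # PUSH1 is 0x60, PUSH32 is 0x7f
    operand = code[pc + 1:pc + 1 + n]   # the n bytes following the opcode
    return n, int.from_bytes(operand, "big")


# Opcode 0x7c followed by 0x01 and 28 zero bytes, 30 bytes in total.
instruction = bytes([0x7c, 0x01]) + bytes(28)
n, value = decode_push(instruction)
print(n, hex(value))
```

Python integers have arbitrary precision, so the 29-byte constant is kept whole.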

Then we SWAP these values on top of the stack and DIV the first 32-byte word of the input by this constant 0x01...0. The constant equals 2^224, so this division simply shifts the word right by 28 bytes, moving its four leftmost bytes all the way to the right. I.e. 0xdeadbeef42424242...42 becomes 0xdeadbeef.

Next 0xffffffff is pushed to the stack and AND -ed with the result of the previous right shift operation. Then, the top of the stack is duplicated with the DUP1 command and some constant 0xee919d50 is pushed to the stack.
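The arithmetic of the CALLDATALOAD, DIV and AND steps can be checked with plain integers. This is a sketch with helper names of our own; calldataload pads with zero bytes when the input is shorter than the 32 bytes requested, which is how the instruction behaves.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read a 32-byte big-endian word at offset, zero-padded on the right."""
    chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(chunk, "big")


def selector_from_word(word: int) -> int:
    """Apply the DIV by 2**224 and the AND with 0xffffffff."""
    shifted = word // (1 << 224)  # DIV by 0x01 followed by 28 zero bytes
    return shifted & 0xffffffff   # AND with 0xffffffff


calldata = bytes.fromhex("deadbeef") + b"\x42" * 28
word = calldataload(calldata, 0)
print(hex(selector_from_word(word)))
```

For a full 32-byte word the AND changes nothing, since the division already leaves only four bytes.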

EQ is called to compare this constant with the result of the previous operation and if they appear to be equal, we jump to addr 0x44 . If not, we continue with this branch at addr 0x3f , which, as we have already seen just reverts the execution.

So we revert if the first four bytes of our input data are not equal to 0xee919d50. This value is actually the first four bytes of the Keccak-256 (sha3) hash of the function signature, i.e. the function name and the types of its parameters, that we’ve defined in our contract:

> web3.sha3('setA(uint256)').substr(0, 10)

"0xee919d50"

So, by now we have figured out what our contract’s ABI looks like: the first four bytes of the input data are the hash of the signature of the function that we are calling. The section of code that compares the hashes known to the contract with the first bytes of the input may be seen as a dispatcher. If the hash is not found, the dispatcher reverts the execution.
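Put together, the dispatcher logic described so far can be summarised in a few lines. This is a behavioural sketch, not a translation of the bytecode: the function name dispatch and the string results are ours, and the selector constant is simply the value quoted above rather than one computed here.

```python
SET_A_SELECTOR = 0xee919d50  # selector of setA(uint256), as quoted in the text


def dispatch(calldata: bytes) -> str:
    """Decide which branch the contract takes for the given input."""
    if len(calldata) < 4:
        return "revert"  # input too short to hold a selector
    selector = int.from_bytes(calldata[:4], "big")
    if selector == SET_A_SELECTOR:
        return "setA"    # jump to the function's code
    return "revert"      # unknown selector


call = bytes.fromhex("ee919d50") + (1).to_bytes(32, "big")
print(dispatch(call))
print(dispatch(bytes.fromhex("deadbeef")))
```

A contract with more public functions would have one comparison per known selector before reaching the final revert.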

Function itself

Ok, in the previous subchapter we stopped at the jump to 0x44. Leaving the analysis of this block to the reader, I will only say that since our Solidity function is not payable, the code has to check that the call carries no ether. This is done with the CALLVALUE instruction, which pushes the amount sent with the call; if it is not zero, we revert.

If this check passes we will go to the function body itself:

Ok, let’s quickly run through this code. The first part of it loads the function argument with the CALLDATALOAD instruction, does some stuff and jumps to 0x64. There we add 0x42 to the value of the input and store the result into the contract’s storage with the SSTORE instruction, thereby updating the a variable. It all ends with an unconditional JUMP to an unknown address. But if we note the push1 0x62 instruction at the beginning of this code and the two seemingly dead instructions at 0x62-0x63, we may guess that this JUMP actually leads to 0x62. The code there does nothing but execute the STOP instruction, which ends the execution of the transaction.
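The observable behaviour of the function, including the non-payable check, can be modelled as below. This is a sketch under stated assumptions: storage is a plain dict, a is taken to live in storage slot 0 because it is the contract's first state variable, the addition wraps modulo 2^256 as EVM arithmetic does, and the name set_a and the use of an exception for revert are ours.

```python
WORD_MODULUS = 2 ** 256  # EVM words are 256 bits wide


def set_a(storage: dict, calldata: bytes, callvalue: int = 0) -> None:
    """Model of setA(uint b): store b + 0x42 into slot 0."""
    if callvalue != 0:
        # The function is not payable, so any attached ether causes a revert.
        raise RuntimeError("revert: non-zero call value")
    # The argument follows the 4-byte selector as a 32-byte big-endian word.
    arg = int.from_bytes(calldata[4:36].ljust(32, b"\x00"), "big")
    storage[0] = (arg + 0x42) % WORD_MODULUS  # ADD, then SSTORE to slot 0


storage = {}
calldata = bytes.fromhex("ee919d50") + (1).to_bytes(32, "big")
set_a(storage, calldata)
print(hex(storage[0]))
```

Calling set_a with a non-zero callvalue raises the exception and leaves storage untouched, mirroring the revert.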

You may find the code and binary data for this post at https://github.com/montekki/r2evm

Ok, that’s it for this simple example. In the next parts we will take a look at more complex examples and at the usage of the debugger. Stay tuned!

Useful links