If you are new to this series, the previous articles:

Solidity Events

A Solidity event looks something like this:

event Deposit(

address indexed _from,

bytes32 indexed _id,

uint _value

);

It has the name Deposit.

It has three parameters, of different types.

Two of these parameters are “indexed”.

One parameter is not “indexed”.

There are two quirky limitations to Solidity events:

There may be at most 3 indexed parameters.

If the type of an indexed parameter can be larger than 32 bytes (e.g. string and bytes), the actual data isn’t stored, but rather the KECCAK256 digest of the data is stored.

Why is this so? And what is the difference between indexed and non-indexed parameters?

EVM Log Primitives

To start to understand these quirks and limitations of Solidity events, let’s look at the log0 , log1 , ..., log4 EVM instructions.

The EVM logging facility uses different terminology than Solidity:

“topics”: There may be up to 4 topics. Each topic is exactly 32 bytes.

“data”: The data is the payload of the event. It may be an arbitrary number of bytes.

How does a Solidity event map to a log primitive?

All the “non-indexed parameters” of an event are stored as the data.

Each of the “indexed parameters” of an event is stored as a 32-byte topic.

Since string and bytes can be longer than 32 bytes, if they are indexed, Solidity stores the KECCAK256 digest instead of the actual data.

Solidity lets you have at most 3 indexed arguments, but EVM lets you have at most 4 topics. It turns out that Solidity consumes one topic for the event’s signature.

The log0 Primitive

The simplest logging primitive is log0 . This creates a log item that only has data, but no topic. The data of logs can be an arbitrary number of bytes.

We can use log0 directly in Solidity. In this example, we'll store a 32-byte number:

pragma solidity ^0.4.18;

contract Logger {

function Logger() public {

log0(0xc0fefe);

}

}

The generated assembly can be divided into two halves. The first half copies the log data ( 0xc0fefe ) from stack into memory. The second half puts arguments on stack for the log0 instruction, telling it where in memory to load the data.

The annotated assembly:

memory: { 0x40 => 0x60 }

tag_1:

// copy data into memory

0xc0fefe

[0xc0fefe]

mload(0x40)

[0x60 0xc0fefe]

swap1

[0xc0fefe 0x60]

dup2

[0x60 0xc0fefe 0x60]

mstore

[0x60]

memory: {

0x40 => 0x60

0x60 => 0xc0fefe

}

// calculate data start position and size

0x20

[0x20 0x60]

add

[0x80]

mload(0x40)

[0x60 0x80]

dup1

[0x60 0x60 0x80]

swap2

[0x80 0x60 0x60]

sub

[0x20 0x60]

swap1

[0x60 0x20]

log0

Just before executing log0 , there are two arguments on the stack: [0x60 0x20] .

start : 0x60 is the position in memory to load the data.

size : 0x20 (or 32) specifies the number of bytes of data to load.
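To double check the stack annotations, here is a rough Python sketch that replays the instruction sequence above on a toy stack machine. It is not how the EVM is implemented: memory is modelled as a dictionary of 32-byte words, `mload(0x40)` is treated as a single step, and the `run` helper and op names are made up for this illustration.

```python
def run(ops, memory):
    """Replay (name, arg) ops. The top of the stack is the LAST list element."""
    stack = []
    for name, arg in ops:
        if name == "push":
            stack.append(arg)
        elif name == "mload":      # shorthand for: push arg, then mload
            stack.append(memory[arg])
        elif name == "mstore":     # pops the offset (top) first, then the value
            offset, value = stack.pop(), stack.pop()
            memory[offset] = value
        elif name == "dup":        # dupN copies the N-th item from the top
            stack.append(stack[-arg])
        elif name == "swap":       # swapN exchanges the top with item N+1
            stack[-1], stack[-1 - arg] = stack[-1 - arg], stack[-1]
        elif name == "add":
            a, b = stack.pop(), stack.pop()
            stack.append(a + b)
        elif name == "sub":        # top minus second-from-top
            a, b = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack

LOG0_SETUP = [
    ("push", 0xc0fefe), ("mload", 0x40), ("swap", 1), ("dup", 2), ("mstore", None),
    ("push", 0x20), ("add", None), ("mload", 0x40), ("dup", 1), ("swap", 2),
    ("sub", None), ("swap", 1),
]

memory = {0x40: 0x60}  # word-granular toy memory: address -> 32-byte word
stack = run(LOG0_SETUP, memory)
start, size = stack[-1], stack[-2]  # log0 pops start first, then size
print(hex(start), hex(size), hex(memory[start]))
```

The sketch ends with 0x60 on top of the stack and 0x20 beneath it, which are the start and size arguments that log0 consumes.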

The go-ethereum implementation for log0 looks something like this:

func log0(pc *uint64, evm *EVM, contract *Contract, memory *Memory, stack *Stack) ([]byte, error) {

mStart, mSize := stack.pop(), stack.pop()

data := memory.Get(mStart.Int64(), mSize.Int64())

evm.StateDB.AddLog(&types.Log{

Address: contract.Address(),

Data: data,

// This is a non-consensus field, but assigned here because

// core/state doesn't know the current block number.

BlockNumber: evm.BlockNumber.Uint64(),

})

evm.interpreter.intPool.put(mStart, mSize)

return nil, nil

}

You can see in this code that log0 pops two arguments from stack, then copies the data from memory. Then it calls StateDB.AddLog to associate the log with the contract.

Logging With Topics

Topics are 32 bytes of arbitrary data. An Ethereum implementation would use these topics to index logs, so that event logs can be queried and filtered efficiently.

This example uses the log2 primitive. The first argument is the data (of any number of bytes), followed by 2 topics (32 bytes each):

// log-2.sol

pragma solidity ^0.4.18;

contract Logger {

function Logger() public {

log2(0xc0fefe, 0xaaaa1111, 0xbbbb2222);

}

}

The assembly is very similar. The only difference is that the two topics ( 0xbbbb2222 , 0xaaaa1111 ) are pushed onto the stack at the very beginning:

tag_1:

// push topics

0xbbbb2222

0xaaaa1111

// copy data into memory

0xc0fefe

mload(0x40)

swap1

dup2

mstore

0x20

add

mload(0x40)

dup1

swap2

sub

swap1

// create log

log2

The data is still 0xc0fefe , copied to memory. Just before executing log2 , the state of the EVM looks like:

stack: [0x60 0x20 0xaaaa1111 0xbbbb2222]

memory: {

0x60: 0xc0fefe

}

log2

The first two arguments specify the memory region to use as log data. The two additional stack arguments are two 32-byte topics.
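As a rough illustration of how a LOGn instruction consumes that state, here is a minimal Python sketch. The `make_log` helper, the dictionary layout of the log entry, and the `"0xLogger"` address are all made up for this example; only the order of the pops (start, size, then the topics) comes from the description above.

```python
def make_log(n_topics):
    """Return a function modelling LOGn: pop start and size, then n topics."""
    def log(stack, memory, address):
        start, size = stack.pop(), stack.pop()
        # every topic is widened to exactly 32 bytes
        topics = [stack.pop().to_bytes(32, "big") for _ in range(n_topics)]
        data = bytes(memory[start:start + size])
        return {"address": address, "topics": topics, "data": data}
    return log

# State just before log2 runs; the top of the stack is the LAST list element.
memory = bytearray(0x80)
memory[0x60:0x80] = (0xc0fefe).to_bytes(32, "big")
stack = [0xbbbb2222, 0xaaaa1111, 0x20, 0x60]

entry = make_log(2)(stack, memory, address="0xLogger")  # placeholder address
print(entry["data"].hex())
print([t.hex() for t in entry["topics"]])
```

The same function with `n_topics=0` behaves like log0: it reads the memory region and leaves the topic list empty.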

All EVM Logging Primitives

The EVM supports 5 logging primitives:

0xa0 LOG0

0xa1 LOG1

0xa2 LOG2

0xa3 LOG3

0xa4 LOG4

They are all the same, except for the number of topics used. The go-ethereum implementation actually generates these instructions using the same code, varying only in size , which specifies the number of topics to pop from stack.

func makeLog(size int) executionFunc {

return func(pc *uint64, evm *EVM, contract *Contract, memory *Memory, stack *Stack) ([]byte, error) {

topics := make([]common.Hash, size)

mStart, mSize := stack.pop(), stack.pop()

for i := 0; i < size; i++ {

topics[i] = common.BigToHash(stack.pop())

}

d := memory.Get(mStart.Int64(), mSize.Int64())

evm.StateDB.AddLog(&types.Log{

Address: contract.Address(),

Topics: topics,

Data: d,

// This is a non-consensus field, but assigned here because

// core/state doesn't know the current block number.

BlockNumber: evm.BlockNumber.Uint64(),

})

evm.interpreter.intPool.put(mStart, mSize)

return nil, nil

}

}

Feel free to poke around the code on sourcegraph:

https://sourcegraph.com/github.com/ethereum/go-ethereum@83d16574444d0b389755c9003e74a90d2ab7ca2e/-/blob/core/vm/instructions.go#L744

Logging Testnet Demo

Let’s try to generate some logs with a deployed contract. The contract logs 5 times, using different data and topics:

pragma solidity ^0.4.18;

contract Logger {

function Logger() public {

log0(0x0);

log1(0x1, 0xa);

log2(0x2, 0xa, 0xb);

log3(0x3, 0xa, 0xb, 0xc);

log4(0x4, 0xa, 0xb, 0xc, 0xd);

}

}

This contract is deployed on the Rinkeby test network. The transaction that created this contract is:

https://rinkeby.etherscan.io/tx/0x0e88c5281bb38290ae2e9cd8588cd979bc92755605021e78550fbc4d130053d1

Click on the “Event Logs” tab, and you should see the raw data for the 5 log items:

The topics are all 32 bytes. The number we logged as data is encoded as a 32-byte number.

Querying For The Logs

Let’s use Ethereum’s JSON RPC to query for these logs. An Ethereum API node would create indices to make it efficient to find logs by matching topics, or to find logs that are generated by a contract address.

We’ll use the hosted RPC nodes provided by the kind folks at infura.io. You can get the API key by registering a free account.

Once you get the key, set the shell variable INFURA_KEY for the following curl examples to work:

INFURA_KEY=my_infura_key_blah_blah_blah

For a simple example, let’s call eth_getLogs to fetch all the logs associated with the contract:



curl "https://rinkeby.infura.io/$INFURA_KEY" \
  -X POST \
  -H "Content-Type: application/json" \
  --data '
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "eth_getLogs",
  "params": [{
    "fromBlock": "0x0",
    "address": "0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0"
  }]
}
'

fromBlock : from which block to start looking for logs. By default it starts looking at the tip of the blockchain. We want all logs, so we start from the first block.

address : logs are indexed by contract addresses, so this is actually quite efficient.
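If you would rather build the request body in code than in a shell string, a minimal Python sketch could look like this. The `get_logs_payload` helper and its parameter names are hypothetical; the JSON fields are the ones used in the curl example. Nothing is sent over the network here.

```python
import json

def get_logs_payload(address, from_block="0x0", to_block=None, topics=None, request_id=1):
    """Build the JSON-RPC body for eth_getLogs, leaving out unset filter fields."""
    log_filter = {"fromBlock": from_block, "address": address}
    if to_block is not None:
        log_filter["toBlock"] = to_block
    if topics is not None:
        log_filter["topics"] = topics
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [log_filter],
    }

payload = get_logs_payload("0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0")
print(json.dumps(payload, indent=2))
```

The resulting string can be passed to curl with `--data`, or posted with any HTTP client.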

The output is the underlying data that etherscan displays for the “Event Logs” tab. See the full output: evmlog.json.

A log item returned by the JSON API looks like this:

{

"address": "0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0",

"topics": [

"0x000000000000000000000000000000000000000000000000000000000000000a"

],

"data": "0x0000000000000000000000000000000000000000000000000000000000000001",

"blockNumber": "0x179097",

"transactionHash": "0x0e88c5281bb38290ae2e9cd8588cd979bc92755605021e78550fbc4d130053d1",

"transactionIndex": "0x1",

"blockHash": "0x541bb92d8de24cad637717cdc43ae5e66d9d6193b9f964fbb6461f6727eb9e57",

"logIndex": "0x2",

"removed": false

}

Next, we can query for logs that match the topic “0xc”:



curl "https://rinkeby.infura.io/$INFURA_KEY" \
  -X POST \
  -H "Content-Type: application/json" \
  --data '
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "eth_getLogs",
  "params": [{
    "fromBlock": "0x179097",
    "toBlock": "0x179097",
    "address": "0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0",
    "topics": [null, null, "0x000000000000000000000000000000000000000000000000000000000000000c"]
  }]
}
'

topics : an array of topics to match. null matches anything. See details.
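To make the matching rule concrete, here is a small Python sketch that applies a topics filter to the five logs of the demo contract locally. It assumes the common JSON-RPC semantics: matching is positional, `None` (JSON null) is a wildcard, a nested list means “any of these”, and a log with fewer topics than the filter has positions does not match. The `topic` and `matches` helpers are made up for this example.

```python
def topic(n):
    """Encode an integer as a 32-byte topic, hex-encoded like the JSON-RPC output."""
    return "0x" + format(n, "064x")

def matches(log_topics, wanted):
    """Position-by-position match; None is a wildcard, a list means 'any of'."""
    if len(wanted) > len(log_topics):
        return False
    for have, want in zip(log_topics, wanted):
        if want is None:
            continue
        options = want if isinstance(want, list) else [want]
        if have not in options:
            return False
    return True

A, B, C, D = topic(0xa), topic(0xb), topic(0xc), topic(0xd)
# topics of the logs emitted by log0 .. log4 in the demo contract
demo_logs = [[], [A], [A, B], [A, B, C], [A, B, C, D]]

hits = [i for i, topics in enumerate(demo_logs) if matches(topics, [None, None, C])]
print(hits)
```

Only the entries produced by log3 and log4 carry “0xc” in the third topic position, so those are the two that survive the filter.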

There should be two matching logs:

{

"address": "0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0",

"topics": [

"0x000000000000000000000000000000000000000000000000000000000000000a",

"0x000000000000000000000000000000000000000000000000000000000000000b",

"0x000000000000000000000000000000000000000000000000000000000000000c"

],

"data": "0x0000000000000000000000000000000000000000000000000000000000000003",

"blockNumber": "0x179097",

"transactionHash": "0x0e88c5281bb38290ae2e9cd8588cd979bc92755605021e78550fbc4d130053d1",

"transactionIndex": "0x1",

"blockHash": "0x541bb92d8de24cad637717cdc43ae5e66d9d6193b9f964fbb6461f6727eb9e57",

"logIndex": "0x4",

"removed": false

},

{

"address": "0x507e86b11541bcb1f3fe200b2f10ed8fd9413bd0",

"topics": [

"0x000000000000000000000000000000000000000000000000000000000000000a",

"0x000000000000000000000000000000000000000000000000000000000000000b",

"0x000000000000000000000000000000000000000000000000000000000000000c",

"0x000000000000000000000000000000000000000000000000000000000000000d"

],

"data": "0x0000000000000000000000000000000000000000000000000000000000000004",

"blockNumber": "0x179097",

"transactionHash": "0x0e88c5281bb38290ae2e9cd8588cd979bc92755605021e78550fbc4d130053d1",

"transactionIndex": "0x1",

"blockHash": "0x541bb92d8de24cad637717cdc43ae5e66d9d6193b9f964fbb6461f6727eb9e57",

"logIndex": "0x5",

"removed": false

}

Logging Gas Costs

The gas costs for the logging primitives depend on how many topics you have and how much data you log:

// Per byte in a LOG operation's data

LogDataGas uint64 = 8

// Per LOG topic

LogTopicGas uint64 = 375

// Per LOG operation.

LogGas uint64 = 375

These constants are defined in protocol_params.

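Don’t forget the memory used. We’ll budget 3 gas per byte, which is a conservative estimate: the EVM actually charges 3 gas per 32-byte word of newly expanded memory (plus a quadratic term that only matters for large memory sizes):

MemoryGas uint64 = 3

Wait what? It costs only 8 gas per byte of log data? That’s 256 gas for 32 bytes, and 96 gas for the memory use. So 352 gas versus 20000 gas for storing the same amount of data in storage, only about 1.8% of the cost!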

But wait, if you are passing in the log data as calldata to a transaction, you’ll need to pay for the transaction data too. The gas costs for calldata are:

TxDataZeroGas uint64 = 4 // zero tx data byte

TxDataNonZeroGas uint64 = 68 // non-zero tx data byte

Assuming all 32 bytes are non-zero, this is still a lot cheaper than storage:

// cost of 32 bytes of log data

32 * 68 = 2176 // tx data cost

32 * 8 = 256 // log data cost

32 * 3 = 96 // memory usage cost

375 // log call cost

----

2903 // total (2176 + 256 + 96 + 375), about 14.5% of the 20000 gas sstore cost for 32 bytes

Most of the gas cost is actually spent on transaction data, not for the log operation itself.
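The arithmetic above is easy to replay for other payload sizes. The Python sketch below uses the gas constants quoted in this article. Two assumptions to keep in mind: the memory term uses the simplified 3 gas per byte from the estimate above, which overstates the real cost because the EVM charges memory per 32-byte word, and every calldata byte is assumed to be non-zero. The function names are made up for the example.

```python
LOG_GAS = 375              # per LOG operation
LOG_TOPIC_GAS = 375        # per topic
LOG_DATA_GAS = 8           # per byte of log data
TX_DATA_NON_ZERO_GAS = 68  # per non-zero calldata byte
MEMORY_GAS_PER_BYTE = 3    # simplified per-byte estimate, not the real per-word rule
SSTORE_GAS = 20000         # storing one fresh 32-byte word in storage

def log_op_gas(data_len, n_topics=0):
    """Gas charged by the LOGn instruction itself."""
    return LOG_GAS + n_topics * LOG_TOPIC_GAS + data_len * LOG_DATA_GAS

def total_log_gas(data_len, n_topics=0):
    """Calldata + memory estimate + LOGn cost for data passed in via a transaction."""
    calldata = data_len * TX_DATA_NON_ZERO_GAS
    memory = data_len * MEMORY_GAS_PER_BYTE
    return calldata + memory + log_op_gas(data_len, n_topics)

total = total_log_gas(32)
print(total, "gas,", round(100 * total / SSTORE_GAS, 1), "% of an sstore")
```

Passing `n_topics=1` adds another 375 gas, which is what a Solidity event with no indexed parameters would pay for its signature topic.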

A log operation is cheap because the log data isn’t really stored in the blockchain. Logs, in principle, can be recalculated on the fly as necessary. Miners, in particular, can simply throw away the log data, because future calculations can’t access past logs anyway.

The network as a whole does not bear the cost of logs. Only the API service nodes need to actually process, store, and index the logs.

So the cost structure of logging is just the minimal cost to prevent log spamming.

Solidity Events

Having seen how the logging primitives work, Solidity events are straightforward.

Let’s look at a Log event type that takes 3 uint256 parameters (non-indexed):

pragma solidity ^0.4.18;

contract Logger {

event Log(uint256 a, uint256 b, uint256 c);

function log(uint256 a, uint256 b, uint256 c) public {

Log(a, b, c);

}

}

Instead of looking at the assembly code, let’s just look at the raw log that’s generated.

Here’s a transaction that invokes log(1, 2, 3) :

https://rinkeby.etherscan.io/tx/0x9d3d394867330ae75d7153def724d062b474b0feb1f824fe1ff79e772393d395

The log data:

The data is the event parameters, ABI encoded:

0000000000000000000000000000000000000000000000000000000000000001

0000000000000000000000000000000000000000000000000000000000000002

0000000000000000000000000000000000000000000000000000000000000003

There is one topic, a mysterious 32 bytes hash:

0x00032a912636b05d31af43f00b91359ddcfddebcffa7c15470a13ba1992e10f0

This is the SHA3 hash of the Event type signature:



# Install pyethereum: https://github.com/ethereum/pyethereum/#installation

>>> from ethereum.utils import sha3

>>> sha3("Log(uint256,uint256,uint256)").hex()

'00032a912636b05d31af43f00b91359ddcfddebcffa7c15470a13ba1992e10f0'

This is quite similar to how ABI-encoding for a method call works.

Because a Solidity event uses one topic for the event signature, there are only 3 topics left for indexed parameters.
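If you don’t have pyethereum at hand, note that Python’s built-in `hashlib.sha3_256` is not a substitute: it implements the later NIST-standardized SHA-3, which pads the input differently from the original Keccak-256 that Ethereum uses, so the digests differ. Below is a compact, unoptimized pure-Python Keccak-256 sketch for experimenting with event signatures. The function names are my own, and this is meant for illustration rather than production use.

```python
MASK64 = (1 << 64) - 1

def rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64

def keccak_f1600(lanes):
    """The Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constants generated by an 8-bit LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak256(data):
    rate = 136  # bytes absorbed per permutation call for a 256-bit digest
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01  # Keccak domain/padding byte (NIST SHA-3 uses 0x06)
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))

def event_topic(signature):
    """Topic 0 of a Solidity event: the Keccak-256 hash of its type signature."""
    return "0x" + keccak256(signature.encode("ascii")).hex()

print(event_topic("Log(uint256,uint256,uint256)"))
```

Running it should print the same 32-byte value as the topic shown in the raw log above. The signature string contains only the event name and the parameter types, with no spaces and no parameter names.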

Solidity Event With Indexed Arguments

Let’s look at an event that has an indexed uint256 parameter:

pragma solidity ^0.4.18;

contract Logger {

event Log(uint256 a, uint256 indexed b, uint256 c);

function log(uint256 a, uint256 b, uint256 c) public {

Log(a, b, c);

}

}

The generated event logs:

There are now two topics:

0x00032a912636b05d31af43f00b91359ddcfddebcffa7c15470a13ba1992e10f0

0x0000000000000000000000000000000000000000000000000000000000000002

The first topic is the event type signature, hashed.

The second topic is the indexed parameter, as is.

The data is the ABI encoded event parameters, excluding the indexed parameters:

0000000000000000000000000000000000000000000000000000000000000001

0000000000000000000000000000000000000000000000000000000000000003
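The split between topics and data can be sketched in a few lines of Python. This is a simplified model that only handles uint256 parameters; the `encode_event` helper is made up for the example, and the signature hash is copied from the log above rather than computed.

```python
SIGNATURE_TOPIC = "00032a912636b05d31af43f00b91359ddcfddebcffa7c15470a13ba1992e10f0"

def word(n):
    """A uint256 as a 32-byte big-endian word, hex-encoded."""
    return format(n, "064x")

def encode_event(signature_topic, params):
    """params is a list of (value, indexed) pairs for uint256 arguments."""
    topics = [signature_topic] + [word(v) for v, indexed in params if indexed]
    if len(topics) > 4:
        raise ValueError("an EVM log can carry at most 4 topics")
    data = "".join(word(v) for v, indexed in params if not indexed)
    return topics, data

# Log(a, b, c) with b indexed, called as log(1, 2, 3)
topics, data = encode_event(SIGNATURE_TOPIC, [(1, False), (2, True), (3, False)])
print(topics)
print(data)
```

The indexed value 2 moves out of the data and becomes the second topic, while 1 and 3 stay in the data in their original order.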

String/Bytes Event Parameter

Let’s now change the event parameters to be strings:

pragma solidity ^0.4.18;

contract Logger {

event Log(string a, string indexed b, string c);

function log(string a, string b, string c) public {

Log(a, b, c);

}

}

Generate the log with log("a", "b", "c") . The transaction is:

https://rinkeby.etherscan.io/tx/0x21221c2924bbf1860db9e098ab98b3fd7a5de24dd68bab1ea9ce19ae9c303b56

There are two topics:

0xb857d3ea78d03217f929ae616bf22aea6a354b78e5027773679b7b4a6f66e86b

0xb5553de315e0edf504d9150af82dafa5c4667fa618ed0a6f19c69b41166c5510

The first topic is again the event signature, hashed.

The second topic is the KECCAK256 digest (sha3 in pyethereum) of the string parameter.

Let’s verify that the hash of “b” is the same as the second topic:

>>> sha3("b").hex()

'b5553de315e0edf504d9150af82dafa5c4667fa618ed0a6f19c69b41166c5510'

The log data is the two non-indexed strings “a” and “c”, ABI-encoded:

0000000000000000000000000000000000000000000000000000000000000040

0000000000000000000000000000000000000000000000000000000000000080

0000000000000000000000000000000000000000000000000000000000000001

6100000000000000000000000000000000000000000000000000000000000000

0000000000000000000000000000000000000000000000000000000000000001

6300000000000000000000000000000000000000000000000000000000000000
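The layout of that data follows the ABI rules for dynamic types: first one 32-byte offset per string, then for each string its length followed by its bytes padded to a multiple of 32. Here is a minimal Python sketch that only handles a flat list of strings; the `abi_encode_strings` helper is made up for the example.

```python
def pad32(raw):
    """Right-pad bytes with zeros to a multiple of 32 bytes."""
    return raw + b"\x00" * (-len(raw) % 32)

def abi_encode_strings(values):
    """ABI-encode a tuple of strings: a head of offsets, then length-prefixed tails."""
    head_size = 32 * len(values)
    head, tail = b"", b""
    for s in values:
        raw = s.encode("utf-8")
        # the offset is measured from the start of the encoding
        head += (head_size + len(tail)).to_bytes(32, "big")
        tail += len(raw).to_bytes(32, "big") + pad32(raw)
    return head + tail

encoded = abi_encode_strings(["a", "c"]).hex()
for i in range(0, len(encoded), 64):
    print(encoded[i:i + 64])
```

The first two words (0x40 and 0x80) are the offsets of “a” and “c”. Each string then takes two words: its length (1) and its single byte (0x61 or 0x63) padded with zeros.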

Unfortunately, the original string for the indexed string parameter is not stored, so there is no way for the DApp client to recover it.

If you REALLY need the original string, just log it twice, both indexed and non-indexed:

event Log(string a, string indexed indexedB, string b);

Log("a", "b", "b");

Query For Logs Efficiently

How can we find all the logs whose first topic matches “0x000…001”? Naively, we can start from the genesis block and re-execute every single transaction, and see if the logs generated match our filtering condition. This is no good.

As it turns out, the block header includes enough information for us to quickly skip over blocks that don’t have the logs we want.

The block header includes information like the parent hash, the uncles hash, the coinbase, and a bloom filter for all the logs generated by the transactions included in this block. It looks like:

type Header struct {

ParentHash common.Hash `json:"parentHash" gencodec:"required"`

UncleHash common.Hash `json:"sha3Uncles" gencodec:"required"`

Coinbase common.Address `json:"miner" gencodec:"required"`

// ...

// The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list

Bloom Bloom `json:"logsBloom" gencodec:"required"`

}

https://sourcegraph.com/github.com/ethereum/go-ethereum@479aa61f11724560c63a7b56084259552892819d/-/blob/core/types/block.go#L70:1

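The bloom filter is a fixed 256-byte data structure. It behaves like a set: you can ask it whether a topic might exist in it. A “no” is definitive, while a “yes” can occasionally be a false positive, which is why a matching block still has to be checked.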

So we can optimize the log query process like this:

def find_logs(chain, topic):

    for block in chain:

        # check bloom filter to filter out a block quickly

        if not block.Bloom.exist(topic):

            continue

        # block might have the log we want, re-execute

        for tx in block.transactions:

            for log in tx.recalculateLogs():

                if log.topic[0].matches(topic):

                    yield log

Aside from topics, the address of the contract that emits the logs is also added to the bloom filter.
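For a feel of how such a filter behaves, here is a toy 2048-bit bloom filter in Python that sets 3 bits per item. It is only loosely modelled on Ethereum’s scheme: SHA-256 from the standard library stands in for Keccak-256, so the bit positions will not match a real block header, and the `Bloom` class and its method names are made up for the example.

```python
import hashlib

class Bloom:
    BITS = 2048  # 256 bytes

    def __init__(self):
        self.bits = 0  # the whole filter as one big integer

    @staticmethod
    def _positions(item):
        # Stand-in hash. Each of the first three byte pairs picks one of 2048 bits.
        digest = hashlib.sha256(item).digest()
        return [int.from_bytes(digest[i:i + 2], "big") % Bloom.BITS for i in (0, 2, 4)]

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True can be a false positive; False is always definitive.
        return all((self.bits >> p) & 1 for p in self._positions(item))

topic_a = (0xa).to_bytes(32, "big")
topic_c = (0xc).to_bytes(32, "big")

bloom = Bloom()
bloom.add(topic_a)
print(bloom.might_contain(topic_a))    # an added item is always reported as present
print(Bloom().might_contain(topic_c))  # an empty filter rejects everything
```

A block whose filter answers “no” for a topic can be skipped without looking at its transactions.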

BloomBitsTrie

The Ethereum mainnet has about 5,000,000 blocks in Jan 2018, and iterating through all blocks can still be quite expensive because you’d need to load the block headers from disk.

The average block header being about 500 bytes, you’d be loading 2.5GB of data in total.

Felföldi Zsolt implemented the BloomBitsTrie in PR #14970 to make logs filtering even faster. The idea is that instead of looking at each block’s bloom filter separately, it’s possible to design a data structure that looks at 32768 blocks all at the same time.

To understand what follows, the least you need to know about a bloom filter is that storing a piece of data “hashes” it to 3 random (but deterministic) bits in the bloom filter and sets them to 1. To check for existence, we check whether those 3 bits are set to 1.

The bloom filter used in Ethereum is 2048 bits.

Suppose the topic “0xa” sets the 16th, 632nd, and 777th bits of a bloom filter to 1. The BloomBits Trie is a 2048 x 32768 bitmap. Indexing into the BloomBits structure gives us three 32768-bit vectors:

BloomBits[15] => 32768 bit vector (4096 byte)

BloomBits[631] => 32768 bit vector (4096 byte)

BloomBits[776] => 32768 bit vector (4096 byte)

These bit vectors tell us which blocks have the 16th, 632nd, and 777th bits of their bloom filters set to 1.
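One way to picture the structure is as a transpose: instead of one bloom filter per block, we keep one row per bloom bit, and each row records that bit for every block in the section. The Python sketch below uses tiny 4-bit filters and two blocks so the result fits on screen. The `transpose` helper is made up for this example and says nothing about how go-ethereum lays the data out on disk.

```python
def transpose(blooms, n_bits=2048):
    """rows[i] is a bit string whose k-th character is bit i of block k's bloom."""
    return ["".join(str((b >> i) & 1) for b in blooms) for i in range(n_bits)]

# Two toy blocks with 4-bit bloom filters instead of 2048-bit ones.
blooms = [0b0101, 0b0011]
rows = transpose(blooms, n_bits=4)
for i, row in enumerate(rows):
    print(i, row)
```

With real data there would be 2048 rows of 32768 bits each, and a lookup for one topic would read just 3 of those rows.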

Let’s look at the first 8 bits of these vectors, which might look like

10110001...

00101101...

10101001...

The 1st block has the 16th and 777th bits set to 1, but not the 632nd bit.

The 3rd block has all three bits set.

The 8th block has all three bits set.

Then we can quickly find the blocks that match all three bits by applying binary-AND to these vectors:

00100001...

The final bit vector tells us exactly which blocks among 32768 match our filtering condition.

To match multiple topics, we just do the same indexing for each topic, and then binary-AND the final bit vectors together.
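The binary-AND step is small enough to try by hand. This Python sketch works on the first 8 bits of the three example vectors, written as strings for readability; the `matching_blocks` helper is made up for the example.

```python
def matching_blocks(bit_vectors):
    """AND the bit vectors together and list the 1-based positions still set."""
    combined = int(bit_vectors[0], 2)
    for vector in bit_vectors[1:]:
        combined &= int(vector, 2)
    width = len(bit_vectors[0])
    bits = format(combined, "0{}b".format(width))
    return bits, [i + 1 for i, bit in enumerate(bits) if bit == "1"]

vectors = [
    "10110001",  # blocks whose bloom filter has the 16th bit set
    "00101101",  # blocks whose bloom filter has the 632nd bit set
    "10101001",  # blocks whose bloom filter has the 777th bit set
]
bits, blocks = matching_blocks(vectors)
print(bits, blocks)
```

Only the 3rd and 8th blocks keep a 1 after the AND. Those are the candidates whose logs actually need to be inspected, and because bloom filters can give false positives, inspecting them is still required.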

See BloomBits Trie for more details on how this works.

Conclusion

In summary, an EVM log may have up to 4 topics, and an arbitrary number of bytes as data. A Solidity event’s non-indexed arguments are ABI-encoded as data, and its indexed arguments are used as log topics.

The gas cost of storing log data is much cheaper than normal storage, so you might consider it as an alternative for your DApp as long as your contract doesn’t need access to the data.

Two alternative design choices for the logging-facilities may be:

Allowing more topics, though more topics would decrease the effectiveness of the bloom filters used to index logs by topics.

Allowing topics to have an arbitrary number of bytes. Why not?

If you like EVM and furry animals, you should follow me on Twitter @hayeah.