You might have been surprised when you saw a weird token transfer coming from your account. Your first reaction was that someone hacked you.

After a few seconds of Googling you realized that it’s just a trick used by the token creators — any contract can emit any event with any arguments. In this case, the token contract emitted a token transfer, in which the “from” address is your account. Etherscan then captured this event and attached to your address.

Identifying ERC20 Transfer Spam with BigQuery

Below is an SQL that returns spammy transfers:

#standardSQL

with transfers_and_authorizing_transactions as (

with external_accounts as (

select distinct(from_address) as address

from `bigquery-public-data.ethereum_blockchain.transactions` as transactions

),

authorizing_transactions as (

select from_address, to_address, min(block_number) as block_number

from `bigquery-public-data.ethereum_blockchain.transactions` as transactions

group by from_address, to_address

)

select token_transfers.transaction_hash as transaction_hash,

token_transfers.log_index as log_index,

authorizing_transactions.block_number as authorizing_block_number

from `bigquery-public-data.ethereum_blockchain.token_transfers` as token_transfers

join external_accounts on external_accounts.address = token_transfers.from_address

left join authorizing_transactions

on authorizing_transactions.from_address = token_transfers.from_address

and authorizing_transactions.to_address = token_transfers.token_address

and authorizing_transactions.block_number <= token_transfers.block_number

where token_transfers.from_address != '0x0000000000000000000000000000000000000000'

)

select transaction_hash, log_index

from transfers_and_authorizing_transactions

where authorizing_block_number is null

Run it in BigQuery

The query is based on the assumption that if there is a token transfer from an external account X, then one of the following conditions must satisfy:

There was a transaction from X calling the approve() method of the token in some past block.

method of the token in some past block. The transaction in which the transfer is contained comes from account X.

More generally, if there is a token transfer from an external account X, then X must have interacted with the token contract before (i.e. sent a transaction to it). In other words, if X never interacted with the token, then the transfer can’t be authorized by X and must be a spam.

This query returns spammy tokens ordered by the number of transfers. The top ones are:

The first 2 are already marked as spam on Etherscan. The others can be marked using the simple heuristic described above.

Follow us on Twitter https://twitter.com/EthereumEtl.

Also read Ethereum in BigQuery: how we built this dataset.