Bitcoin address clustering is a process that attempts to de-anonymize bitcoin users by discovering all addresses generated by a single user through analysis of information derived from the blockchain. Observing the peer-to-peer (P2P) network is another information source that aids in the de-anonymization of bitcoin users. Combining blockchain data with P2P network information can improve the bitcoin address clustering process.

Previous research studies have presented heuristics for bitcoin address clustering and shown that it is possible to link multiple addresses to a single user. Moreover, it has been shown that, in many instances, a user’s bitcoin address can be linked to information from additional sources that helps reveal the user’s identity. In the worst case, this information can be used to correlate all transactions of an identified user.

Before being stored on the blockchain, transactions are broadcast across a decentralized P2P network. By connecting to and monitoring the network, additional information about the sender of a transaction can be obtained. Nevertheless, since bitcoin users may use VPNs, proxy servers, or online wallet services, it is unclear whether information obtained by joining the network and monitoring the normal flow of messages can be used to de-anonymize bitcoin users.

The major challenge is that even with blockchain-based address clustering and network-derived information, there is no guarantee that tracing a bitcoin user back to his/her real-world identity will succeed in every case.

Bitcoin Address Clustering Heuristics:

All confirmed transactions form a graph known as the “transaction graph”, which is built using all confirmed transactions as vertices and adding an edge from every output to the input that spends it. The transaction graph is a directed, acyclic, append-only graph that reflects bitcoin ownership. If you own some bitcoins, then you have the right to spend them. Practically speaking, bitcoin ownership equals possession of the private key that matches the public key that was broadcast along with the transaction output that gave ownership of these coins. Accordingly, to issue a valid transaction, the owner of the coins has to sign the transaction’s spending input using the private key that matches the coins’ public key, which was broadcast along with the preceding transaction output.
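The structure of the transaction graph can be illustrated with a minimal Python sketch. The toy transactions, the dictionary layout, and the function name `build_transaction_graph` are assumptions made for illustration only; they do not correspond to any real Bitcoin library or data format.

```python
from collections import defaultdict

# Hypothetical toy transactions, listed in confirmation order.
# Each input references an earlier output as (funding_txid, output_index).
TXS = [
    {"txid": "t0", "inputs": []},                      # no inputs, like a coinbase
    {"txid": "t1", "inputs": [("t0", 0)]},
    {"txid": "t2", "inputs": [("t0", 1), ("t1", 0)]},
]


def build_transaction_graph(txs):
    """Map each funding txid to a list of (output_index, spending_txid) edges."""
    graph = defaultdict(list)
    confirmed = set()
    for tx in txs:
        for funding_txid, output_index in tx["inputs"]:
            # An input can only spend an output of an earlier transaction,
            # which is why the graph stays acyclic and append-only.
            if funding_txid not in confirmed:
                raise ValueError(f"{tx['txid']} spends unknown output of {funding_txid}")
            graph[funding_txid].append((output_index, tx["txid"]))
        confirmed.add(tx["txid"])
    return dict(graph)


graph = build_transaction_graph(TXS)
print(graph)
```

In this toy data, `t2` spends one output of `t0` and one output of `t1`, so it has two incoming edges.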

Throughout this series of articles, I will present a few heuristics that can be used for bitcoin address clustering.

The bitcoin address clustering procedure computes a partition $\mathcal{O} = \{S_1, S_2, \ldots, S_n\}$ of the set of all addresses $A$, where $S_1, S_2, \ldots, S_n$ are the resulting clusters. To do this, it processes all executed transactions in their original temporal order. For every transaction $t$, a heuristic computes the partition $T_t = \{T_1^t, \ldots, T_m^t\}$ of all input and output addresses of $t$, that is, of $\text{inputs}(t) \cup \text{outputs}(t)$. This transaction partition $T_t$ denotes which of the addresses used in the transaction are attributed to a single user.
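One common way to merge per-transaction partitions into global clusters is a union-find (disjoint-set) structure. The sketch below assumes this approach; the source does not state which data structure is used. The function name `cluster_addresses` and the hand-written example partitions are hypothetical.

```python
def cluster_addresses(partitions):
    """Merge per-transaction partitions into global address clusters.

    `partitions` holds one partition per transaction, in temporal order.
    Each partition is a list of sets of addresses assumed to share an owner.
    """
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for partition in partitions:
        for part in partition:
            members = sorted(part)
            if not members:
                continue
            find(members[0])  # register singleton parts as well
            for other in members[1:]:
                union(members[0], other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hand-written example partitions for three transactions (illustrative only).
example = [
    [{"A", "B"}, {"X"}],
    [{"B", "C"}, {"Y"}],
    [{"D"}, {"A"}],
]
result = cluster_addresses(example)
print(result)
```

Addresses `A`, `B`, and `C` end up in one cluster because `B` appears in a shared part in both the first and second transactions.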

Heuristic 1: (Multi-input transactions):

If a transaction spends coins originating from multiple inputs, the transaction has to be signed using the appropriate private keys that match the public keys of all inputs. If we assume that a transaction was executed by one user, then this user owns all addresses that were included in the inputs of this transaction.

For a given transaction t, the transaction partition denoted by this heuristic is:

$$T_t = \{\, \text{inputs}(t),\ \{o_1(t)\},\ \ldots,\ \{o_{|\text{outputs}(t)|}(t)\} \,\}$$

This is the main heuristic that we will apply first in the clustering procedures presented throughout this series of articles. The heuristic yields false positives only if its assumption is incorrect, that is, when the inputs of a transaction are not all controlled by one user. This can happen when the owner gives an exchange access to the private keys of his/her coins, or when transactions are created jointly by multiple users in a decentralized manner.
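The transaction partition of the multi-input heuristic can be sketched in a few lines of Python. The dictionary layout (`"inputs"` and `"outputs"` as lists of address strings) and the function name `multi_input_partition` are assumptions for illustration. The sketch also skips an output address that already appears among the inputs or earlier outputs, so that the result remains a valid partition; that detail is a choice made here, not something stated above.

```python
def multi_input_partition(tx):
    """Heuristic 1: all input addresses form one part; each output address is its own part."""
    input_part = set(tx["inputs"])
    partition = [input_part] if input_part else []
    seen = set(input_part)
    for address in tx["outputs"]:
        # Skip addresses already placed in a part (assumption made in this sketch).
        if address not in seen:
            partition.append({address})
            seen.add(address)
    return partition


# Hypothetical transaction: two input addresses, change sent back to "A".
tx = {"inputs": ["A", "B"], "outputs": ["C", "A"]}
partition = multi_input_partition(tx)
print(partition)
```

Here `A` and `B` are grouped together because they are spent in the same transaction, while `C` stays in its own part since the heuristic says nothing about who owns the outputs.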

References:

Reid, F., Harrigan, M.: An analysis of anonymity in the bitcoin system. In: Security and privacy in social networks, pp. 197–223. Springer (2013)

Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G.M., Savage, S.: A fistful of bitcoins: characterizing payments among men with no names. In: Proceedings of the 2013 conference on Internet measurement conference. pp. 127–140. ACM (2013)
