Short Introduction to Deanonymization

Most of the current strategies for deanonymizing transactions come from detecting change addresses and detecting public keys spent by the same transaction. When a transaction spends outputs locked with different public keys, most of the time that means that those public keys were all controlled by the same person or entity.
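As a minimal sketch of the multi-input heuristic described above, the following Python groups addresses with a union-find structure. The input format (one list of input addresses per transaction) and the name `cluster_addresses` are assumptions made for illustration, not taken from any particular tool.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that were spent together as inputs of a transaction.

    `transactions` is a hypothetical format: a list where each item is the
    list of input addresses of one transaction.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)
        for other in input_addresses[1:]:
            union(first, other)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# "A" and "B" are spent together, then "B" and "C": all three get linked.
example = [["A", "B"], ["B", "C"], ["D"]]
print(cluster_addresses(example))  # [['A', 'B', 'C'], ['D']]
```

Note how the link is transitive: "A" and "C" never appear in the same transaction, yet they end up in the same cluster through "B".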

People could split their spending across different transactions to avoid having inputs with different public keys in the same transaction. But users can still be distinguished from each other through clustering techniques that take amounts and timing into consideration, or through network analysis that watches for the first IP address to relay a given transaction (although this requires a more complex setup than just looking at the blockchain).

In order to improve things like security or privacy, I feel that things have to be proven to be broken so there’s more incentive to fix them. I’ve built a proof-of-concept web application that allows some blockchain exploration, binding together addresses possibly owned by the same entity. You can find the code for it here: https://github.com/eordano/privacy and a live demo here: https://eordano.github.io/privacy. The usability of the tool is quite awful, and I’ll probably build a better version at some point.

It’s rather easy in general for a human to characterise or guess what’s going on within a transaction. For example, I just started looking at some transactions while writing this post and found some examples:

In 5d09…9995, I bet somebody paid for a small amount using a very large output.

1e4b…feca looks like somebody wanted to try out a P2SH-based wallet, and the wallet uses a strategy to add an output to pay for the fees of the transaction.

This transaction 80ef…65fc looks to me like a payment of 0.1 BTC to somebody, using the same input address as the change address, or maybe saving that amount to cold storage.

66e5…d9a2 suggests that a 0.0017 transaction was typed by a human user and the uneven amount of 0.48938242 went to a change address controlled by the same user.

6a13…f34c0 seems like a transaction splitting a large output into different addresses controlled by the same entity, as the amounts are very similar to each other.

f298…acfd looks like a well-done CoinJoin, which I’ll explain now.

CoinJoin

CoinJoin is a (quite general) scheme for improving anonymity that smashes the “same transaction, same user” heuristic. Greg Maxwell wrote a very good explanation of it, which can be found here: https://bitcointalk.org/index.php?topic=279249.0. What follows is yet another explanation and discussion of this strategy.

The left column lists the inputs of a transaction, and the right column lists the outputs. If you see this transaction on the blockchain, it looks very likely that a set of users mixed their transactions together, making it quite hard to link the inputs to the outputs. If all transactions on Bitcoin looked like this, it would be very difficult to trace money movements and link addresses to identities.

Basically, CoinJoin could be explained as: “Whenever you need to make a transaction, find a set of people who want to send the same amount as you, and mix the transactions together”. If done properly, there’s no risk of losing your coins.

Arbitrary transaction values make it harder for CoinJoin participants to find suitable peers for their transactions. That’s why a common suggestion is to start using power-of-two-valued outputs, which make CoinJoin transactions easier to assemble.
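To illustrate the idea, here is a small sketch that splits an amount into power-of-two denominations. Expressing the denominations in satoshis, and the function name `power_of_two_outputs`, are my assumptions for the example; the suggestion itself does not fix a unit.

```python
def power_of_two_outputs(amount_satoshis):
    """Split an amount into distinct power-of-two denominations.

    This is just the binary expansion of the amount. Denominating in
    satoshis is an assumption made for this sketch.
    """
    if amount_satoshis < 0:
        raise ValueError("amount must be non-negative")
    outputs = []
    denomination = 1
    while amount_satoshis:
        if amount_satoshis & 1:
            outputs.append(denomination)
        amount_satoshis >>= 1
        denomination <<= 1
    return outputs


print(power_of_two_outputs(13))  # [1, 4, 8]
```

In practice the smallest denominations produced this way would be dust, so a real wallet would probably need a minimum denomination and some rounding; the sketch ignores that.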

Power-of-two outputs may also bring a less obvious privacy improvement: they could reduce the chance of leaking which outputs are change and which are payments to another person. For example, if Alice has an output with a value of 23.33432423 BTC and makes a transaction sending 5 BTC to one address and 18.33432423 BTC to another, an analyst could read it as a payment of 5 BTC to a third party (as that amount is more likely to have been chosen by a human than 18.33432423), with the rest still under Alice’s control. A similar analysis could look for round values in a fiat currency at the exchange rate of the time, but this is harder to do accurately.
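The round-amount guess in Alice’s example can be sketched as follows. This is only an illustration of the heuristic under my own assumptions: amounts are passed as decimal strings in BTC, only two-output transactions are considered, and the helper names are hypothetical.

```python
from decimal import Decimal


def significant_digits(amount_btc):
    """Count the significant decimal digits of an amount given as a string."""
    return len(Decimal(amount_btc).normalize().as_tuple().digits)


def guess_change(outputs):
    """Guess which of two outputs is change: the less "round" one.

    `outputs` is a list of (address, amount_string) pairs. Returns the
    address guessed to be change, or None when the heuristic cannot decide.
    """
    if len(outputs) != 2:
        return None
    (addr_a, amount_a), (addr_b, amount_b) = outputs
    digits_a = significant_digits(amount_a)
    digits_b = significant_digits(amount_b)
    if digits_a == digits_b:
        return None
    return addr_a if digits_a > digits_b else addr_b


# Alice's transaction from the example above.
print(guess_change([("payee", "5"), ("alice_change", "18.33432423")]))
# alice_change
```

The guess is only as good as the assumption that humans type round numbers; it says nothing when both amounts look equally round.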

Privacy can be lost by dropping the use of a CoinJoin scheme

So, when two parties want to mix two transactions, they simply place the outputs in random order in a new transaction, leaking very little information about which outputs belong to whom. This still needs some care to avoid leaking information across transactions: for example, if the two green outputs in the picture are later spent together in another transaction, an analysis could correlate the flow of those coins.
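A rough sketch of that merging step, leaving out everything that makes it safe in practice (each participant checking that their own outputs are present, then signing only their own inputs). The dictionary layout and the name `merge_transactions` are assumptions for illustration.

```python
import random


def merge_transactions(transactions, rng=None):
    """Combine several unsigned transactions into one, in random order.

    Each transaction is a hypothetical dict with "inputs" and "outputs"
    lists. Signing is deliberately not modelled here.
    """
    # SystemRandom draws from the OS entropy source, so the ordering is
    # not predictable from a seed.
    rng = rng or random.SystemRandom()
    inputs, outputs = [], []
    for tx in transactions:
        inputs.extend(tx["inputs"])
        outputs.extend(tx["outputs"])
    rng.shuffle(inputs)
    rng.shuffle(outputs)
    return {"inputs": inputs, "outputs": outputs}


alice = {"inputs": ["alice_in"], "outputs": ["alice_pay", "alice_change"]}
bob = {"inputs": ["bob_in"], "outputs": ["bob_pay", "bob_change"]}
joint = merge_transactions([alice, bob])
print(len(joint["inputs"]), len(joint["outputs"]))  # 2 4
```

Shuffling only hides the ordering; if the amounts differ, the outputs can still be matched to inputs by value, which is why equal amounts matter.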

Rendezvous Servers

Participants need to establish a communication channel outside of the Bitcoin network for this to work, typically through a rendezvous server. Such a server could learn the mapping from inputs to outputs, so the scheme is not totally anonymous. Luckily, a technique called secure multiparty computation lets participants shuffle the inputs and outputs without revealing information about the inputs (an implementation of this has already been proposed and discussed), so as long as multiple parties are involved, anonymity is preserved. Deanonymization can still occur if all the other participants collude, or if all the other participants are actually the same entity.

In general, very decentralized CoinJoin schemes need to be secured against denial-of-service attacks, as there must be a first round to assemble the CoinJoin transaction and another round of signing. A party could simply refuse to sign the transaction, forcing the rest of the participants to gather again and build a new transaction.

That kind of attack can be mitigated by encouraging some kind of blacklisting of inputs that dropped out of a CoinJoin, or by requiring fidelity bonds to participate. The latter may leak some identity information if the bonds are logged, but it also means that blockchain analysis alone is not enough: an analyst would need those logs as well.
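The blacklisting idea could look something like the sketch below. The class name, the `max_failures` threshold, and identifying inputs by a "txid:vout" string are all assumptions of mine, not part of any existing CoinJoin implementation.

```python
class DropoutBlacklist:
    """Track inputs that joined a round but never signed (hypothetical)."""

    def __init__(self, max_failures=1):
        self.max_failures = max_failures
        self.failures = {}  # input identifier -> number of unsigned rounds

    def record_round(self, round_inputs, signed_inputs):
        """Record one failed round: penalise every input that did not sign."""
        signed = set(signed_inputs)
        for outpoint in round_inputs:
            if outpoint not in signed:
                self.failures[outpoint] = self.failures.get(outpoint, 0) + 1

    def is_allowed(self, outpoint):
        """An input may join new rounds while it is under the threshold."""
        return self.failures.get(outpoint, 0) < self.max_failures


blacklist = DropoutBlacklist()
blacklist.record_round(["aa:0", "bb:1", "cc:0"], signed_inputs=["aa:0", "cc:0"])
print(blacklist.is_allowed("bb:1"))  # False
print(blacklist.is_allowed("aa:0"))  # True
```

Blacklisting the input rather than the person means an attacker has to burn a fresh output for every round they disrupt, which is the cost this mitigation relies on.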

Fees

Who pays the fee for this transaction? The simplest solution seems to be for one participant to offer to add an input exclusively to pay the fee, maybe in order to rush the transaction. This would be the most efficient case, as adding one input per participant would make for a larger, costlier transaction, and the amounts of such inputs would be very close to dust. Alternatively, a rendezvous server could charge users outside of the transaction and offer to pay these fees, but this would require some trust in the server and could cost some anonymity.
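To put rough numbers on the size difference, the sketch below uses the commonly quoted approximations of about 148 bytes per pay-to-pubkey-hash input, 34 bytes per output and 10 bytes of overhead. These constants are approximations I am assuming for the example, and the scenario (five participants, one input and two outputs each) is made up.

```python
# Approximate sizes for a legacy pay-to-pubkey-hash transaction (assumed).
INPUT_BYTES = 148
OUTPUT_BYTES = 34
OVERHEAD_BYTES = 10


def estimated_size(num_inputs, num_outputs):
    """Rough transaction size in bytes under the constants above."""
    return OVERHEAD_BYTES + num_inputs * INPUT_BYTES + num_outputs * OUTPUT_BYTES


participants = 5
base_inputs = participants       # one spending input each
base_outputs = 2 * participants  # one payment and one change output each

single_payer = estimated_size(base_inputs + 1, base_outputs)
everyone_pays = estimated_size(base_inputs + participants, base_outputs)

print(single_payer)                  # 1238
print(everyone_pays)                 # 1830
print(everyone_pays - single_payer)  # 592
```

Under these assumptions, having every participant add a fee input costs four extra inputs' worth of bytes compared with a single fee payer, and the gap grows linearly with the number of participants.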

Final Thoughts

CoinJoin is not the full solution, at least not at the moment, and it solves a problem that effectively doesn’t exist until more people become aware that blockchain analysis can almost completely, and accurately, deanonymize the transactions they make. But with a good strategy and some extra thought (plus more people participating in CoinJoins), it looks like a very good solution right now, if we could just fix some of the weird usability around it. It’s not a criminal’s weapon; I just don’t think people should get to know all your financial movements because they know a few of your addresses and can make some educated guesses from the blockchain. And if you do need to reveal that information to a third party (say, the IRS), you can always show which addresses you were in control of and whom you received money from.

I feel that Bitcoin is not going to be used by companies if the competition can look around and work out who your suppliers are and how much you are paying them. It’s not going to be used by individuals if people realise that all their transactions can almost certainly be known by anybody they transact with.

Further Reading