A primer on Bitcoin privacy

Bitcoin is neither completely anonymous nor completely transparent. Bitcoin privacy exists in a grey area where the unmasking of a user’s financial activity ultimately depends on the capabilities of the adversary, the sophistication of the user, and the user’s choice of tools. There is no perfect privacy solution for any activity on the Internet, and in many cases privacy-conscious choices come with tradeoffs in both cost and ease of use, so no one-size-fits-all solution exists. Moreover, privacy is never static: it evolves continuously in response to the contest between those who build tools to protect privacy and those who build tools to destroy it.

The Bitcoin protocol itself evolves over time, which can lead to dramatic changes in its privacy properties. Changes to the core protocol are seldom simple choices between privacy and transparency alone; more often they come bundled with changes to the security, scalability, and backward compatibility of the software as well. Historically, the ethos within the Bitcoin community has favored privacy over transparency, though more conservatively than in other cryptocurrencies where privacy is the primary focus.

As a result, activists or journalists who are considering using bitcoin to escape the prying eyes of an authoritarian government or a corporation need to understand what traces they leave when they use it and whether the privacy bitcoin offers is sufficient for their needs. Achieving this understanding, however, requires some effort.

Tracing transactions

When you transact on the Bitcoin network you leave two types of traces. These can be categorized into “what’s on the blockchain” and “what’s not on the blockchain”. The information that is on the blockchain reveals no direct link between your identity and your transactions, but it does reveal information that can link your transactions to each other. What does link your identity to your transactions are the things in the second category: “what’s not on the blockchain”.

What’s not on the blockchain

When you transact on the Bitcoin network, you are sometimes sending money to, or receiving money from, an entity that knows who you are. That entity then has knowledge from outside the blockchain that links your identity to a transaction.

When you combine this with the fact that your transactions can be linked to each other, the result is that motivated entities can sometimes figure out how you’re using your bitcoins, how much you have, and who you’ve been transacting with.
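To make the combination concrete, here is a minimal sketch of one commonly cited linking heuristic, which assumes that all inputs of a transaction are controlled by the same entity. The heuristic is an assumption brought in for illustration, not something the text above specifies, and the address labels and the `known_identities` mapping are hypothetical.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of the same transaction.

    `transactions` is a list of lists; each inner list holds the input
    addresses of one transaction (a simplified, hypothetical data model).
    """
    parent = {}

    def find(address):
        # Union-find lookup with path halving.
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    def union(a, b):
        parent[find(a)] = find(b)

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # register single-input transactions too
        for other in input_addresses[1:]:
            union(first, other)

    clusters = defaultdict(set)
    for address in list(parent):
        clusters[find(address)].add(address)
    return list(clusters.values())


# Hypothetical on-chain data: input addresses of three transactions.
transactions = [["addr_A", "addr_B"], ["addr_B", "addr_C"], ["addr_D"]]
# Hypothetical off-chain knowledge held by, say, an exchange.
known_identities = {"addr_A": "customer #1"}

clusters = cluster_addresses(transactions)
for cluster in clusters:
    owners = {known_identities[a] for a in cluster if a in known_identities}
    print(sorted(cluster), "->", owners or "unknown")
```

A single off-chain link to one address is enough to label the whole cluster. The heuristic can be wrong, so results like these are best read as probabilistic guesses rather than proof.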

There are also countless ways you could be linked to a transaction even without having transacted with an entity that knows who you are, since Bitcoin transactions are typically sent in unencrypted packets over the Internet and the source IP address can be pinpointed through various means. Bitcoin transactions sent via full nodes such as Bitcoin Core require some triangulation or targeted traffic sniffing in order for the source IP address to be estimated, whereas other “light” wallets such as mobile wallets (Mycelium, Blockchain Wallet, Coinbase Wallet) will often broadcast transactions through company-run servers that can see your IP address directly and your full transaction history. The same is true for most hardware wallets (Ledger, Trezor) in their out-of-the-box setups.

Geolocation IP databases can often roughly approximate your physical location using your IP address. You can test it out yourself using this link, then enter the coordinates you get into an interface like Google Maps. More importantly, your IP address reveals your Internet Service Provider (ISP), which in turn knows the real-world identity of the owner of your IP address and often has a legal obligation to store this information for several months.

Even if you are using a public WiFi network to transmit your transactions, you could still accidentally associate your real identity with that IP address through the websites you visit and the background services your device connects to. Your Dropbox application will gladly connect to Dropbox’s servers when you start your laptop, which associates that IP address with your Dropbox account in Dropbox’s server logs. The same thing happens when you browse to a personal account on any website. Even if you don’t visit any personal web accounts, cookies stored on your laptop can reveal who you are to the websites you browse, because they tie you to your previous browsing history. Many websites allow third parties to track users like this for analytics purposes; Google alone is estimated to track users across 80% of sites on the web.

Even if you clear your cookies, website operators can track you across their different sites as long as your browser fingerprint is unique and associate your IP address to your identity that way. And even if you have no services running and avoid browsing altogether, your device’s MAC address could get exposed to the network provider which could be linked to your identity using sophisticated methods. So, even if your IP address doesn’t lead back to you via an ISP record, you might still leave other traces that do when you’re using your personal devices.

The worst case for privacy is, of course, using a third-party service that implements know-your-customer (KYC) practices as your Bitcoin wallet, since such services keep logs of all your transactions alongside your real-world identity.

You could also be linked to a Bitcoin address or transaction just by searching for it using web-based tools, since few people other than you have any reason to look up your transactions on the web. Keep this in mind as we move to the next segment. Another piece of data that isn’t on the blockchain but can easily be logged about your transaction is the approximate time it was broadcast to the network.

The best currently known method to hide your source device and IP address when retrieving information about transactions or when transmitting transactions is to use Tor hidden services. Many wallets, including Bitcoin Core, provide this as a configurable option, while others have it built in. The Tor Browser can similarly be a useful tool for your web-based Bitcoin-related activity: in addition to hiding your IP address, it clears cookies upon each exit, prevents third-party cookies, and is designed to resist most browser fingerprinting techniques.

What’s on the blockchain

A simple way to begin understanding what type of information is revealed by the Bitcoin blockchain is to use a block explorer. For this exercise, we’ll use the open-source explorer blockstream.info.

The most recent block at the time of writing (#563899) in the Bitcoin blockchain contains 2122 transactions. Let’s look at what a randomly chosen transaction reveals.

Transactions contain inputs and outputs and are identified by transaction IDs (seen at the top in the image above). Every transaction your Bitcoin wallet has sent is associated with one such identifier.
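For readers curious where a transaction ID comes from, the sketch below shows the usual derivation: a double SHA-256 hash of the serialized transaction, displayed in reversed byte order. The `raw_tx` bytes are a placeholder rather than a real transaction, and for SegWit transactions the hashed serialization excludes the witness data, which this sketch does not handle.

```python
import hashlib


def txid_from_raw(raw_tx: bytes) -> str:
    """Return the transaction ID for a serialized (non-witness) transaction."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    # Block explorers conventionally display the hash in reversed byte order.
    return digest[::-1].hex()


# Placeholder bytes for illustration only; NOT a valid Bitcoin transaction.
raw_tx = bytes.fromhex("01000000" + "00" * 6)
txid = txid_from_raw(raw_tx)
print(txid)
```

Because the ID is a hash of the transaction's contents, anyone holding the same serialized bytes derives the same identifier, which is why it works as a public reference.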

From a high-level view, what is revealed about this transaction is the following:

The approximate time the transaction was mined (from the block header)

The addresses bitcoins were sent to and the amounts sent (i.e. the “transaction outputs”)

The source of the funds for the transaction (i.e. the inputs)

Let’s look at each of these items individually for the transaction shown above, e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8 .

Time

Transactions are not timestamped, but blocks are. Block timestamps are not necessarily precise, but as long as a majority of miners report time honestly, every block’s timestamp is bound to be accurate to within a few hours. Blocks mined by honest miners will carry accurate timestamps. This doesn’t mean that the block timestamp is accurate to within a few hours of its transactions’ broadcast times, however, since it can sometimes take much longer for a transaction to be included in a block. Some block explorers complement this data by displaying the time they first saw a transaction on the network, which gives a more accurate view of when it was broadcast.
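The "few hours" bound follows from two timestamp rules that nodes are commonly described as enforcing: a block's time must be later than the median of the previous 11 blocks' times, and no more than two hours ahead of the validating node's clock. The sketch below illustrates those two checks under that assumption. The function and constant names are hypothetical, and real node software applies further checks.

```python
MEDIAN_WINDOW = 11                 # number of previous blocks considered
MAX_FUTURE_DRIFT = 2 * 60 * 60     # two hours, in seconds


def timestamp_is_acceptable(block_time, previous_block_times, node_time):
    """Check a block timestamp against the two commonly described rules.

    All arguments are Unix times in seconds; `previous_block_times` is
    ordered oldest to newest.
    """
    recent = sorted(previous_block_times[-MEDIAN_WINDOW:])
    median_time_past = recent[len(recent) // 2]
    return median_time_past < block_time <= node_time + MAX_FUTURE_DRIFT


# Illustrative values: eleven earlier blocks spaced ten minutes apart.
base = 1_550_000_000
previous = [base + 600 * i for i in range(11)]
now = base + 6000

print(timestamp_is_acceptable(now, previous, now))              # within bounds
print(timestamp_is_acceptable(base, previous, now))             # too old
print(timestamp_is_acceptable(now + 3 * 3600, previous, now))   # too far ahead
```

A miner can therefore shift a timestamp somewhat in either direction, but not arbitrarily far, which is why block times are only an approximate clock.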

The approximate time when the transaction above was included in a block can be derived by looking at the block header (in our case it’s block #563899 with the timestamp 2019-02-20, 14:45 UTC).
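As a rough illustration of reading that timestamp, the sketch below unpacks the 80-byte block header layout (version, previous block hash, Merkle root, time, bits, nonce, all little-endian) and converts the time field to UTC. The example header is a placeholder built for this sketch: every field except the time is zeroed, and the seconds value is assumed, so it is not the real header of block #563899.

```python
import struct
from datetime import datetime, timezone

# version, previous block hash, Merkle root, time, bits, nonce
HEADER_FORMAT = "<I32s32sIII"


def header_timestamp(header: bytes) -> datetime:
    """Return the timestamp stored in an 80-byte block header as UTC."""
    if len(header) != struct.calcsize(HEADER_FORMAT):
        raise ValueError("a block header is exactly 80 bytes")
    _version, _prev, _merkle, unix_time, _bits, _nonce = struct.unpack(
        HEADER_FORMAT, header
    )
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


# Placeholder header: only the time field is meaningful here.
example_time = int(datetime(2019, 2, 20, 14, 45, tzinfo=timezone.utc).timestamp())
example_header = struct.pack(
    HEADER_FORMAT, 0, b"\x00" * 32, b"\x00" * 32, example_time, 0, 0
)

print(header_timestamp(example_header).isoformat())
```

The time field is a 32-bit count of seconds since the Unix epoch, so it carries no time zone of its own. Explorers typically render it in UTC or the viewer's local time.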

The addresses bitcoins were sent to and the amounts sent

The receiving addresses in this transaction are:

There is more to an address than meets the eye. It’s easy to think of Bitcoin addresses as “hard-to-read email addresses but for bitcoins”, but an address isn’t always a simple pointer to a certain user’s cryptographic key pair. In reality, addresses are cryptographic descriptors of the spending rules that apply the next time someone wants to move those bitcoins.

For example, if you send bitcoins to 37k7toV1Nv4DfmQbmZ8KuZDQCYK9x5KpzP , the configuration of this address is such that you’re not sending bitcoins to the owner of a particular private key, but rather to a spending rule that releases the coins to anyone who can provide two different strings that have the same SHA-1 hash. Producing such a pair means that the SHA-1 hash function is broken, which it was in 2017, so don’t send anything to that address! Note that since many address formats used today are hashes of the spending rules, we typically can’t tell what those rules are until someone spends bitcoin from that address, as they need to reveal what was hashed in order to do so.
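To see why the rules stay hidden, it can help to decode such an address. The sketch below assumes the standard Base58Check layout (one version byte, a 20-byte hash, and a 4-byte checksum) and shows that an address beginning with “3” carries only a hash of the spending rules, not the rules themselves. The function name and return shape are choices made for this illustration.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def decode_base58check(address: str):
    """Decode a Base58Check string into (version, payload, checksum_ok)."""
    number = 0
    for char in address:
        number = number * 58 + BASE58_ALPHABET.index(char)
    # Each leading '1' character stands for one leading zero byte.
    leading_zero_bytes = len(address) - len(address.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    raw = b"\x00" * leading_zero_bytes + body
    data, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    return data[0], data[1:], checksum == expected


version, script_hash, checksum_ok = decode_base58check(
    "37k7toV1Nv4DfmQbmZ8KuZDQCYK9x5KpzP"
)
# Version byte 0x05 marks a pay-to-script-hash address on mainnet.
print(hex(version), script_hash.hex(), checksum_ok)
```

The 20 bytes recovered here are all the address reveals. The script that hashes to them only becomes public once someone spends from the address.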

In our example transaction, the blockchain reveals that bitcoins have been spent from both addresses, so the spending rules for those addresses are known. 32Z63LVtUERdEEwz275JHt3o4cewPfE8YC was revealed to be a 2-of-2 multisignature address when it was spent from in the transaction f491dfe9867c36e85950116a90a6128060d6070866ad0f3598d70d146750162f . We’ll look at exactly how that information is revealed in the next section.