The Blockchain Graveyard is a list of Bitcoin exchanges that have been hacked. It is growing constantly, not only undermining the general public’s trust in cryptocurrencies, but also ruining companies, customers and investors alike.

The root causes of these hacks are varied and often complex, but most of these breaches could have been prevented, or at least severely limited, with a best-practice security approach.

And when we say best practice, we are referring to what banking institutions, telecoms and governments have relied upon for decades: secure hardware.

Hardware Security Modules

A hardware security module (HSM) is a physical computing device that safeguards and manages cryptographic keys, and provides secure execution of critical code. These modules come in the form of a PCI card, or an external rack-mountable device which can be directly connected to the network. HSMs have built-in anti-tampering technology which wipes secrets in case of a physical breach. They are built around secure cryptoprocessor chips and active physical security measures, such as meshes, to mitigate side-channel attacks or bus probing. These devices are heavily used in the banking industry and in all verticals where critical secrets must be protected.

Bitcoin exchanges and HSMs

The only mission-critical industry which is not using HSMs is… the Bitcoin exchange industry (with the exception of Gemini). For some unknown reason, hot wallet security architectures are based on ad hoc solutions built around off-the-shelf hardware, and thus cannot be certified against Common Criteria or FIPS 140. When you deal with private keys that you cannot revoke, and whose compromise would result in massive losses, you just can’t have them on a regular server architecture.

Hot wallet vs Cold wallet

Most exchanges keep the vast majority (97%+) of their assets in cold storage. The keys are totally offline, out of reach of hackers. This is the best protection you can have. However, to automate payouts and function normally, you need hot wallets. These wallets are controlled through APIs and receive orders to sign outgoing transactions to pay customers wishing to withdraw their funds. Because these wallets must be automated, the keys must be live, and are therefore at risk.

HSM based security architecture for exchanges

In this section, we present the Ledger-recommended HSM-based architecture to secure an exchange’s hot wallet.

Here are the different modules/services in play:

Exchange engine: requests payment orders (customer asks for a withdrawal)

Exchange business logic: API with a view of all customers’ balances, soft/hard withdrawal limits and payment history

Hardware Security Module: PCI card connected to a server in the exchange’s datacenter (example: Safenet ProtectServer HSM)

Ledger Blue: secure device protected by PIN code and kept in a safe. Accessible only by top management (CEO/CTO).

2FA app: external second factor channel, on the user’s phone (containing an asymmetric key)

The HSM itself is built around the following units:

BOLOS core: this is the Ledger OS, safeguarding the root seed from which all keypairs are derived, and exposing an API so that internal business apps (such as a Bitcoin wallet or a matching engine consistency check) can operate. Those apps are tested and signed offline, and cannot be altered while the system is operating live.

Rate limiter: sets hard limits on the velocity of what the HSM is authorized to sign (for instance: 1000 BTC / hour, 15000 BTC / day). This is a very important number: it ultimately determines the maximum loss in case of total system compromise. The only way to modify the rules of the limiter is through an authorization signed by the Ledger Blue device.

2FA channel: each signature request must be validated by this internal plugin. It requires two challenge approvals: one from the exchange business logic (“send me your new business data so I can check if it is consistent with the previous system state”), and one from the user (“do you confirm that you want to do that?”).

Bitcoin wallet app: contains all the logic to build and sign transactions from a UTXO pool (could be replaced by an ETH wallet or any other crypto)
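To make the rate limiter more concrete, here is a minimal sketch of a sliding-window velocity limiter using the example figures above (1000 BTC / hour, 15000 BTC / day). The class name, method names and the choice of a sliding window are assumptions made for illustration; the actual logic running inside the HSM is not described in this article.

```python
from collections import deque

BTC = 100_000_000  # satoshis per bitcoin; amounts are kept as integers


class RateLimiter:
    """Illustrative sliding-window limiter (hypothetical, not Ledger's code)."""

    def __init__(self, limits):
        # limits: list of (window_seconds, max_amount_in_satoshis) pairs
        self.limits = list(limits)
        self.history = deque()  # (timestamp, amount) of approved payouts

    def _spent_since(self, cutoff):
        # Total approved strictly after the cutoff timestamp.
        return sum(amount for ts, amount in self.history if ts > cutoff)

    def allow(self, amount, now):
        """Approve and record the payout only if every window stays in limit."""
        longest = max(window for window, _ in self.limits)
        # Forget payouts that fall outside the longest window.
        while self.history and self.history[0][0] <= now - longest:
            self.history.popleft()
        for window, max_amount in self.limits:
            if self._spent_since(now - window) + amount > max_amount:
                return False
        self.history.append((now, amount))
        return True


limiter = RateLimiter([(3600, 1000 * BTC), (86400, 15000 * BTC)])
print(limiter.allow(600 * BTC, now=0))   # True
print(limiter.allow(500 * BTC, now=60))  # False: would be 1100 BTC in one hour
```

In this sketch a refused request leaves the history untouched, so an attacker gains nothing by flooding the limiter with oversized requests.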

Provisioning the security system

Initialization of the HSM and its modules must be done according to the following process:

HSM is in provisioning mode: it is flashed with the BOLOS core and all its plugins (attestation of firmware is done using the HSM internal logic and previous setup)

Provisioning: a 256-bit master seed is generated by the HSM. It can be split using standard mechanisms such as Shamir’s Secret Sharing and displayed as a set of BIP 39 words to different key officers. A paper backup is made and safeguarded according to best practice by each security officer.

Pairing: a secure key exchange is done with the Ledger Blue. From this moment, only this device will have authority over the rate limiter. Initial business logic data related to the matching engine can also be provisioned at that point.

Production: HSM is moved to the production facility and switched to live mode. From this moment, provisioning mode is disabled and any attempt to physically move or attack the HSM will wipe the seed.
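The seed-splitting step mentioned above can be illustrated with a textbook version of Shamir’s Secret Sharing over a prime field. This is only a sketch of the general technique: the function names, the choice of prime and the 3-of-5 policy are assumptions, the encoding of shares as BIP 39 words is not shown, and nothing here reflects how the HSM actually performs the split.

```python
import secrets

# Mersenne prime 2^521 - 1, comfortably larger than any 256-bit secret.
PRIME = 2**521 - 1


def split(secret, threshold, count):
    """Split `secret` into `count` shares; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= count:
        raise ValueError("need 1 <= threshold <= count")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, count + 1)]


def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


seed = secrets.randbits(256)                # stand-in for the master seed
shares = split(seed, threshold=3, count=5)  # one share per key officer
assert combine(shares[:3]) == seed
```

With a 3-of-5 split, the loss of two paper backups does not lose the seed, and fewer than three officers together learn nothing about it.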

Flow of a payment request

Let’s say that a user wants to withdraw 50 BTC (her entire balance) to a Bitcoin address of her choice. She logs in to the exchange and fills in a withdrawal request form. The following process then occurs:

A payment request of 50 BTC with the payout address and all customer metadata is sent from the exchange engine to the HSM through an API call

The HSM checks with the rate limiter if it can proceed with a 50 BTC payout; it gets a greenlight (or not)

The HSM now requests the 2FA plugin to issue two challenges: one to the exchange business logic, and another one to the user’s app

The business logic receives the withdrawal information, checks that everything matches its database (user is authorized to withdraw, limits are within AML rules, enough funds are available, etc.), and returns the data that was previously certified by the HSM along with the new transfer information

The user receives a push notification in her 2FA app (which she had downloaded before, and paired initially to the HSM through the exchange interface). She sees a 50 BTC request with her Bitcoin payout address. She confirms, and the app signs the challenge with its private key

The 2FA plugin greenlights the payout, which is then forwarded to the Bitcoin wallet. The wallet signs the transaction (using UTXOs from its pool) and gives it to the exchange engine, which broadcasts it and issues internal orders to update its accounting books
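The steps above can be summarized as a short gatekeeping sketch: a payout is signed only if the rate limiter, the business logic challenge and the user challenge all approve, in that order. Every name here (`PaymentRequest`, `process_withdrawal`, the callback parameters) is hypothetical, and the checks are modeled as plain callables rather than real signed challenges.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PaymentRequest:
    user_id: str
    address: str
    amount_btc: int


@dataclass(frozen=True)
class Result:
    signed_tx: Optional[str]    # set only when every check passed
    rejected_by: Optional[str]  # name of the first check that refused


Check = Callable[[PaymentRequest], bool]


def process_withdrawal(
    request: PaymentRequest,
    rate_limiter_ok: Check,
    business_logic_ok: Check,
    user_2fa_ok: Check,
    sign: Callable[[PaymentRequest], str],
) -> Result:
    # Rate limiter must greenlight the amount first.
    if not rate_limiter_ok(request):
        return Result(None, "rate limiter")
    # The 2FA plugin then needs both challenges approved.
    if not business_logic_ok(request):
        return Result(None, "business logic")
    if not user_2fa_ok(request):
        return Result(None, "user 2FA")
    # Only now does the wallet app build and sign the transaction.
    return Result(sign(request), None)


# Usage with stand-in checks (placeholder address, fake signature string).
request = PaymentRequest("alice", "payout-address-placeholder", 50)
approve = lambda r: True
fake_sign = lambda r: f"signed:{r.amount_btc}:{r.address}"
print(process_withdrawal(request, approve, approve, approve, fake_sign))
```

Whether a request that passes the rate limiter but fails a later challenge should still consume rate-limiter budget is a design decision the article does not specify.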

What would be the worst case scenario for a hack?

Let’s assume directly that the attacker gains full control of the exchange’s entire infrastructure (as in an inside job). By injecting a false user pairing, the attacker can easily trick the 2FA user channel (which is more a protection against a local hack targeting the user). Still, injecting false market data into the HSM would require the attacker to proceed carefully: if the HSM fails its periodic consistency checks, it will shut down the signing plugin until reactivated by an administrator. The last line of defense is the rate limiter: the hacker won’t be able to withdraw more than the hard limit set in the HSM (which cannot be changed, the Ledger Blue being out of reach). After a few hours, customers start to complain about empty accounts, and the security team can shut down the HSM as an emergency response. The quicker the “community” detects something is wrong, the quicker the hack can be stopped.

The worst case scenario is a loss of what the rate limiter allows per hour, multiplied by the number of hours the hacker managed to stay undetected.
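As a rough worked example of this bound, the helper below multiplies the hourly limit by the number of hours undetected and, optionally, applies a daily cap as well. The function name, the use of whole hours and the treatment of the daily limit as a 24-hour window are assumptions; the figures are the example limits quoted earlier (1000 BTC / hour, 15000 BTC / day).

```python
def worst_case_loss(hours_undetected, hourly_limit, daily_limit=None):
    """Upper bound on the loss, in the same unit as the limits (e.g. BTC).

    Assumes whole hours and an attacker who always withdraws the maximum
    the limiter allows.
    """
    if hours_undetected < 0:
        raise ValueError("hours_undetected must be non-negative")
    uncapped = hours_undetected * hourly_limit
    if daily_limit is None:
        return uncapped
    # Each full day is capped by the daily limit; the remaining hours are
    # capped by whichever of the two limits is reached first.
    full_days, remainder = divmod(hours_undetected, 24)
    capped = full_days * daily_limit + min(remainder * hourly_limit, daily_limit)
    return min(uncapped, capped)


print(worst_case_loss(5, 1000))          # 5000
print(worst_case_loss(20, 1000, 15000))  # 15000: the daily cap is reached
print(worst_case_loss(30, 1000, 15000))  # 21000
```

With these example limits, an attacker who stays undetected for five hours can take at most 5000 BTC, which is why both the limits and the detection time matter.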

Safenet HSM PCI card in a rackable server (implementation for Ledger’s firmware key management)

The scenario where a hacker manages to magically extract the master seed from the HSM is extremely unlikely. These security modules are carefully tested, and most exploits have been limited to abuse or misunderstandings of the administrative interfaces. Of course, one can always say that nothing is unhackable, and this would be true; but the difficulty of achieving such a feat is a few orders of magnitude higher than “just” taking control of a full IT architecture.

Additionally, we could also factor in situations where the Ledger Blue is in the possession of the hacker, or where the hacker simply gets access to the seed backup. Because of the human tendency to make stupid mistakes, this could in fact be the way it would go… That is why even the best security technology is nothing without common sense and carefully audited internal processes.