[messaging] Best practices (if any) for backing up message key data server-side

Hi all,

We're currently tackling the problem of backing up message keys in Matrix.org's end-to-end encryption architecture. The aim is to give users a way to recover their message history if they only have one client app (aka 'device') which they then lose.

For context: Matrix's E2EE strategy is that each device in a chatroom establishes a 1:1 Double Ratchet with every other device, forming a full mesh (using the Olm ratchet: https://matrix.org/docs/spec/olm.html). Each device then maintains a simpler hash ratchet (Megolm: https://matrix.org/docs/spec/megolm.html) which it uses to encrypt sequences of messages it sends to the other devices in the room via Matrix (HTTPS+JSON). The state of each device's megolm ratchet (its "megolm key") is sent to all the other devices in the room over the secure 1:1 Olm channel, such that they can decrypt the messages and message history as long as they have the necessary megolm session keys. The sessions are regularly re-established to avoid reusing the same key throughout the lifetime of the room (especially as users join/part the room).

So far we let users manually export/import their megolm keys for a given device as a passphrased blob (HMAC'd AES-256-CTR, using a PBKDF2-derived key from the passphrase). We've also just added the ability for users to sync megolm keys on demand between their own trusted devices via so-called "keyshare requests" over the Olm channel. However, this fails for the scenario where the user is logging into a new device but doesn't have any other active devices online (e.g. having lost them, or because they're turned off, etc).

So we've been trying to establish the best approach for *optionally* backing up the keys serverside. The options we've considered so far are:

1. Prompt the user for a passphrase at login (or launch?), which is stored to encrypt the megolm keys and sync them to the server. If the client is missing any megolm keys for whatever reason it can retrieve them from the server.
The disadvantage is the bad UX of needing the user to remember and enter a passphrase whenever they log in (as well as doing a more normal login/password sign-in), and the fact that a passphrase-equivalent needs to hang around on the client.

2. Generate a recovery keypair for the account, and give the private key to the user as a 'recovery code' to keep safe. We sync the public key between the user's verified devices, and they encrypt the megolm keys with the public key and store them on the server. If the user has a disaster and needs to recover the keys, they enter their 'recovery code' and sync the keys back to their client. This has the advantage of not storing this master private key anywhere (other than out-of-band by the user), and only prompting the user when things are going wrong. However, it means the server-side keys can't be used to transparently recover missing keys on an ad hoc basis, and the UX of storing and entering long 'recovery codes' is perhaps questionable.

3. Same as option 1, but we sync the passphrase-equivalent between the user's verified devices over the Olm channel. This means trusted devices magically get access to the history keys stored on the server, but it also means that we are enthusiastically copying an unprotected master key between devices (albeit trusted devices), which feels dangerous. However, we are effectively doing a subset of this today already when we transfer specific megolm keys between devices using keyshare requests.

I've been going around in circles on this, and given the whole idea of "storing private keys serverside" generally rings alarm bells, I thought I'd ask for opinions from the wider community before we screw something up.

Feedback on the overall scheme would be appreciated too: it feels slightly wrong that we're going through all the hassle of Olm and Megolm ratchets only to then go and deliberately store message keys or master recovery keys in order to decrypt history.
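
To make that trade-off concrete, here is a rough sketch of a forward-only hash ratchet. It is a deliberately simplified single-chain HMAC-SHA-256 ratchet, not the actual Megolm construction; the function names, labels and seed are made up for illustration. It only aims to show that whoever holds a session's ratchet state at some index can derive every later message key in that session, but none of the earlier ones.

```python
import hashlib
import hmac


def advance(state: bytes) -> bytes:
    """One ratchet step: a one-way function of the previous state."""
    return hmac.new(state, b"advance", hashlib.sha256).digest()


def message_key(state: bytes) -> bytes:
    """Derive a per-message key from the current state."""
    return hmac.new(state, b"message-key", hashlib.sha256).digest()


def keys_from(state: bytes, start_index: int, count: int) -> dict:
    """Message keys for indices start_index .. start_index + count - 1."""
    keys = {}
    for i in range(start_index, start_index + count):
        keys[i] = message_key(state)
        state = advance(state)
    return keys


# The sender starts a session (hypothetical seed) and ratchets forward.
initial_state = hashlib.sha256(b"hypothetical session seed").digest()
sender_keys = keys_from(initial_state, 0, 6)

# A device, or a backup, that only ever received the state at index 3.
state_at_3 = initial_state
for _ in range(3):
    state_at_3 = advance(state_at_3)
backup_keys = keys_from(state_at_3, 3, 3)

# The backup reproduces keys 3, 4 and 5, and has no route back to 0..2.
print(all(backup_keys[i] == sender_keys[i] for i in backup_keys))
print(sorted(backup_keys))
```

So any stored session state decrypts the remainder of that session's history, which is exactly the property a serverside backup would deliberately preserve.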
(That said, it's worth noting that rooms can theoretically be configured to deliberately discard old session keys if PFS is more important than serverside history).

thoughts welcome!

thanks,

Matthew

--
Matthew Hodgson
Matrix.org