This article was first published on Lawfare.

The most recent purportedly serious proposal by a Western government to force technology companies to provide access to the content of encrypted communications comes from Ian Levy and Crispin Robinson of the Government Communications Headquarters, or GCHQ, the U.K.’s equivalent of the National Security Agency. Cryptography luminaries such as Susan Landau, Matt Green, and Bruce Schneier have published detailed critiques of this proposal. Indeed, others from EFF have written about the proposal—known colloquially as the “ghost”—and explained why, contrary to GCHQ’s claim, the proposal really is an encryption backdoor with all the attendant security risks.

But even putting aside the persuasive arguments that the ghost presents a security risk to average users, in this post, we detail some previously undiscussed reasons that the GCHQ’s proposal is undesirable from both technical and policy perspectives.

For the purposes of this post, we’re taking the GCHQ authors at their word that they’re proposing a ghost that it is “relatively easy for a service provider to [implement by] silently add[ing] a law enforcement participant to a group chat or call.” Further, we assume that they are “not talking about weakening encryption or defeating the end-to-end nature of the service. In a solution like this, we’re normally talking about suppressing a notification on a target’s device, and only on the device of the target and possibly those they communicate with.”

This ghost, as framed by Levy and Robinson, is very probably detectable in operation.

In fact, we think when the ghost feature is active—silently inserting a secret eavesdropping member into an otherwise end-to-end encrypted conversation in the manner described by the GCHQ authors—it could be detected (by the target as well as certain third parties) with at least four different techniques: binary reverse engineering, cryptographic side channels, network-traffic analysis, and crash log analysis. Further, crash log analysis could lead unrelated third parties to find evidence of the ghost in use, and it’s even possible that binary reverse engineering could lead researchers to find ways to disable the ghost capability on the client side.

It should be obvious that none of these possibilities are desirable for law enforcement or society as a whole. And while we’ve theorized some types of mitigations that might make the ghost less detectable by particular techniques, they could also impose considerable costs on the network when deployed at the necessary scale, as well as create new potential security risks or detection methods.

(Note: There’s another pretty glaring problem with the ghost proposal that we’re not going to examine here—it only works with text or asynchronous protocols. It’s not immediately clear to us how it could be adapted to real-time audio or video communications. But that’s a discussion for another day.)

Detecting Ghosts with Binary Reverse-Engineering

Proprietary messaging applications have been regularly reverse-engineered to find security flaws; the ghost feature would have to be implemented in code which could be examined by anyone able to download a copy of the messaging tool. If the ghost is activated by a particular function (that adds a user to a chat while suppressing the usual notification of this event), researchers could use existing reverse engineering techniques to try to identify the function in question. Users could then potentially remove that function from their copies of the application, or set a breakpoint so that they receive a different sort of notification whenever the function is used.

Software debugging tools can assist with this process. Even without extensive reverse engineering, coverage and profiling tools can show the “hot spots” within a program and indicate exactly how frequently various portions of the program’s code were activated (if at all). Such tools then make it possible to observe that a part of a program that was previously inactive has recently been active, even without knowing for sure exactly what that part of the program does.

We tested this possibility using a binary code coverage feature in the DynamoRIO platform (originally developed by MIT and Hewlett-Packard). We found that, for example, the drcov tool can observe that a calculator app calls square-root code when the user presses the square-root button, but that this code remains unused when the user presses other buttons. The tool can readily observe this distinction without access to source code (albeit without determining that the code in question calculates square roots). So if ghost-related code is present in every copy of a certain communication app but remains unused most of the time, it would be feasible to detect the anomaly by comparing coverage records. (However, this does not directly prove that the code has a surveillance function, only that previously or normally unused code has started to be used. To be more confident that it relates to surveillance, a researcher would have to reverse-engineer the code in question.)
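To make this concrete, here is a minimal sketch (in Python) of the comparison step: flagging code regions that appear in a new coverage record but in no baseline record. Real drcov logs are binary files listing basic blocks per module; for illustration we assume each record has already been reduced to a set of (module, offset) entries, and all names and offsets below are invented.

```python
# Sketch: flag basic blocks exercised in a suspect run but in no baseline run.
# Assumes coverage records (e.g., from drcov) have been parsed into sets of
# (module, basic-block offset) tuples; the data here is hypothetical.

def newly_executed(baseline_runs, suspect_run):
    """Return basic blocks present in suspect_run but absent from every baseline."""
    seen = set()
    for run in baseline_runs:
        seen |= run
    return suspect_run - seen

# Hypothetical coverage sets for a messaging app.
normal_1 = {("chatapp", 0x1010), ("chatapp", 0x1040), ("chatapp", 0x1100)}
normal_2 = {("chatapp", 0x1010), ("chatapp", 0x1100)}
ghost_run = {("chatapp", 0x1010), ("chatapp", 0x1100), ("chatapp", 0x9f00)}

suspicious = newly_executed([normal_1, normal_2], ghost_run)
print(sorted(suspicious))  # the block at 0x9f00 was never active before
```

A researcher would still need to reverse-engineer whatever lives at the flagged offsets, but this kind of set difference over many users’ coverage records is enough to notice that dormant code has come alive.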

Reverse engineering could also be used to create variant versions of messaging tools that allow users to continue to verify their communication partners’ keys, simply ignoring any instructions from a messaging service operator to hide the presence of particular keys or devices associated with a conversation. In most ghost proposals, services apparently have to request that users’ devices deliberately hide this information, and then rely on the users’ software to comply with this request. While the official version of a messaging app might indeed be designed to hide the presence of a ghost recipient, the information will still be in the possession of the target user’s device and an alternative or modified app that allowed the ghost to be visible could still be interoperable with the messaging service.
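The following sketch illustrates that dynamic: a roster format in which each device entry carries a hypothetical "hidden" flag that the service asks clients to honor. The roster structure, field names, and device names are all invented for illustration; the point is only that a modified client can simply ignore the flag.

```python
# Sketch: an official client honors a server-supplied "hidden" flag on device
# entries; a modified client ignores it, making any ghost device visible.
# The roster format and all names are hypothetical.

roster = [
    {"device": "alice-phone", "hidden": False},
    {"device": "bob-laptop", "hidden": False},
    {"device": "lawful-intercept-0", "hidden": True},  # the "ghost" device
]

def visible_devices(roster, honor_hide_flag=True):
    if honor_hide_flag:
        return [d["device"] for d in roster if not d["hidden"]]
    return [d["device"] for d in roster]  # modified client: show everything

official = visible_devices(roster)                        # ghost stays invisible
modified = visible_devices(roster, honor_hide_flag=False) # ghost is revealed
print(set(modified) - set(official))                      # {'lawful-intercept-0'}
```

Because the hiding happens client-side, the service has no way to prevent a compatible-but-modified client from rendering everything it receives.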

Detecting Ghosts with Cryptographic Side Channels

Different numbers of recipients of an encrypted message, or of devices participating in an encrypted session, mean different numbers of public key encryption operations, which mean different amounts of computation to perform those operations. When there are more recipients, the computer simply has to do more math, consuming more resources in a way that should be detectable.

In this simplified example, we simulated what would happen if Alyssa Hacker sends a PGP-encrypted email to different sets of recipients who are all using 2048-bit RSA PGP keys. Suppose the government asks the developer of Alyssa’s email software to modify it so that it starts adding ghost Bcc: recipients, without showing them on screen. Since RSA encryption performs a lot of multiplication, we used Intel’s Pin tool to count the number of times that the encryption software would use the IMUL CPU instruction as a result of sending each message. Here are the results:

From: Alyssa P. Hacker <aphacker@mit.edu>
To: John Doe <jdoe@eff.org>
Subject: Hi John
→ 124138 IMULs

From: Alyssa P. Hacker <aphacker@mit.edu>
To: Marin Mersenne <mersenne@eff.org>
Subject: Multiplication is vexation
→ 124163 IMULs

From: Alyssa P. Hacker <aphacker@mit.edu>
To: John Doe <jdoe@eff.org>
Cc: Marin Mersenne <mersenne@eff.org>
Subject: Hi folks
→ 248153 IMULs

From: Alyssa P. Hacker <aphacker@mit.edu>
To: John Doe <jdoe@eff.org>
Cc: Marin Mersenne <mersenne@eff.org>
Bcc: Clyde <ghost@nsa.gov>
Subject: Hi folks
→ 372215 IMULs

From: Alyssa P. Hacker <aphacker@mit.edu>
To: John Doe <jdoe@eff.org>
Cc: Marin Mersenne <mersenne@eff.org>
Bcc: Clyde <ghost@nsa.gov>, Jacob Marley <ghost@gchq.gov.uk>
Subject: Hi folks
→ 496254 IMULs

In this scenario, if Alyssa is logging how much multiplication her computer does, she can easily determine the number of encrypted recipients: here it happens to be the first digit of the number of multiplications (124138, 124163, 248153, 372215, 496254). Her computer has to do the substantial extra mathematical work to encrypt the message to the ghost recipients, even if it doesn’t show their identity in her email software. (To be clear, the tool is determining the amount of multiplication performed by the PGP software, not by the computer as a whole, so there’s no risk of confusion merely because multiple apps are running on the same device.)
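As a rough illustration of the inference step, the Python sketch below uses the IMUL totals above and estimates the recipient count by dividing each observed total by the single-recipient baseline. This is a deliberate simplification of a real side-channel analysis, which would need to calibrate against noise and padding; the baseline is simply taken from the first measured message.

```python
# Sketch: infer the number of RSA encryption recipients from an observed
# multiplication count, using the single-recipient message as a baseline.
# The counts are the IMUL totals reported in the text above.

BASELINE_ONE_RECIPIENT = 124138  # IMULs for a single-recipient message

def estimated_recipients(imul_count):
    """Estimate how many public-key encryptions the observed count implies."""
    return round(imul_count / BASELINE_ONE_RECIPIENT)

observed = [124138, 124163, 248153, 372215, 496254]
print([estimated_recipients(c) for c in observed])  # [1, 1, 2, 3, 4]
```

Even with per-message variation (124138 vs. 124163 for one recipient), the per-recipient cost dwarfs the noise, so the ratio cleanly recovers the recipient count.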

Certainly, detecting the number of cryptographic recipients of an email (and confirming whether it matches the sender’s expectations) is simpler in several ways than detecting the number of devices participating in an encrypted group chat (and confirming whether it matches the participants’ expectations). But changes in the number of devices in a group chat would likely be detectable using this method, and if the group members are able to communicate out of band to confirm that none have changed their configuration, then the additional computation might strongly indicate the activation of the ghost.

This is an example of a side channel. In other settings, side channels may be able to recover the actual keys used by cryptographic software; for example, numerous research papers describe ways that a program running on a device can potentially extract secret encryption keys used by another program on the same device, because of detectable ways that the secret key value changes the device’s behavior. Determining that encryption is happening at all is, comparatively, a dramatically easier task than recovering specific key parameters.

While there are some encryption methods that hide how many entities are authorized to decrypt a particular encrypted message, they would require significant changes to the architecture of person-to-person messaging applications. These changes would be a far cry from the “relatively easy” modification the GCHQ authors envision.

Detecting Ghosts by Examining Network Traffic

In some messaging technologies, communications other than message contents themselves (so-called communications metadata) may be transmitted and received in unencrypted form, or may be encrypted in a form that a user can decrypt. For person-to-person messaging tools, the metadata includes communications related to exchanging cryptographic keys, adding contacts, starting encrypted conversations, and changing group membership, among other things. The communications related to these events may have characteristic sizes and timings (as well as being understandable in their own right, if they’re available unencrypted).

EFF has often pointed out that access to various kinds of metadata can reveal sensitive facts about people’s relationships and interactions, and it’s clear that governments have developed tools to analyze metadata to extract this information. More surprising might be the prospect that analyzing their own metadata could show users when unexpected communications occur. Users could observe a kind of interaction with a messaging service that’s characteristic of adding a device to a conversation (such as a characteristic sequence of encrypted messages of particular sizes), and then determine whether this event was associated with a notification in the user interface. If not, they could infer that the application was hiding information from them. It might be difficult for the service to conceal this information channel, because it has to actually communicate with the user’s device in order to cause it to respond in various ways. In some messaging protocols, the kinds of communication between the device and the service may be distinct enough to identify their general nature and then check whether the app responded to an event in an expected or unexpected way.
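A minimal sketch of this cross-check might look like the following. The per-event size signatures and the observed traffic are entirely invented; a real analysis would have to measure the specific protocol’s characteristic message sizes and timings.

```python
# Sketch: match observed bursts of ciphertext sizes against hypothetical
# per-event size signatures, then flag any recognized event that the app's
# user interface never announced. Signatures and events are invented.

SIGNATURES = {
    (1420, 96, 1420): "device_added_to_group",
    (96,): "typing_indicator",
}

def unannounced_events(observed_bursts, ui_notifications):
    """Return recognized traffic events that produced no UI notification."""
    flagged = []
    for burst in observed_bursts:
        event = SIGNATURES.get(tuple(burst))
        if event and event not in ui_notifications:
            flagged.append(event)
    return flagged

bursts = [[96], [1420, 96, 1420]]        # sniffed ciphertext sizes, two events
shown = {"typing_indicator"}             # UI showed no "member added" notice
print(unannounced_events(bursts, shown)) # ['device_added_to_group']
```

If the "device added" traffic pattern fires but no corresponding notification ever appears, the user has evidence that the app is suppressing something.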

Detecting Ghosts by Reviewing Crash Logs

When computer operating systems or programs (including mobile apps) crash, they are often configured to create a log containing information about what the app was doing, and the state of the computer and its memory leading up to the crash. The purpose of such logs is to permit developers—both of the app and of the operating system—to figure out what went wrong so that the bugs can be identified and fixed. Those logs are often automatically shared with the developers as well.

While we can’t create a proof of concept for detecting the ghost by examining crash logs, since we don’t actually have a ghost function implemented on our messaging clients, we think it’s likely that the crash log of an app with an active ghost will differ in observable ways from a log from the same app without the active ghost, due to the additional functionality being called. An engineer at, say, Facebook, Apple, Samsung, or Huawei who is familiar with reading crash logs may be able to recognize that a particular log looks different or somehow off. (For example, the list of functions called leading up to the crash could differ from a typical list, or data structures related to participants in a chat could appear inconsistent with one another.) Further investigation of what was different about that particular crash could lead the engineer reading the log to figure out that the ghost was active on the user’s device at the time of the crash.
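The first of those checks, comparing the call stack in a new crash log against stacks seen in typical crashes, can be sketched as follows. All function names here are invented; a real triage tool would work over symbolicated stack traces from a crash-reporting pipeline.

```python
# Sketch: flag stack frames in a new crash log that never appear in a corpus
# of typical crash logs for the same app version. All frame names are invented.

def unfamiliar_frames(typical_logs, new_log):
    """Return frames from new_log absent from every typical log."""
    known = {frame for log in typical_logs for frame in log}
    return [frame for frame in new_log if frame not in known]

typical = [
    ["main", "event_loop", "render_chat", "alloc_fail"],
    ["main", "event_loop", "send_message", "alloc_fail"],
]
crash = ["main", "event_loop", "add_participant_silent", "alloc_fail"]

print(unfamiliar_frames(typical, crash))  # ['add_participant_silent']
```

An engineer who sees a never-before-observed frame in the middle of an otherwise routine stack has a natural starting point for asking what that code does and why it was running.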

This method of detecting the ghost is different from the rest in that it could lead not only the target of the surveillance, but also third parties (located in potentially diverse jurisdictions around the world and not bound by any gag order) to discover an active wiretap and potentially alert the target.

Some of the above detection techniques would be more difficult to carry out on particular platforms. For instance, it might be harder to analyze timing, or to reverse-engineer or patch binaries, on Apple's iOS devices because of their locked-down nature. But crash log and network analysis are relatively independent of how accessible the target device is to its user; moreover, a ghost that becomes detectable and circumventable whenever it is used on, say, an Apple laptop or an Android phone seems unlikely to achieve its covert aims.

Conclusion

In the original Ghostbusters movie, Dr. Egon Spengler was able to invent a “P.K.E. meter” that could reveal supernatural energies even when the ghosts or entities responsible for those energies weren’t in view. And in all sorts of ghost stories, people can perceive effects of a ghost’s presence (a chill in the air, a breeze, strange sounds) even when they can’t see the ghost directly. In other words, fictional ghosts often make their presence known, even while remaining invisible.

Government ghost listener proposals for eavesdropping are eerily similar. The user interface of an application may not change—so there’s no effect at first to the naked eye—yet the application’s behavior will almost certainly change in a variety of ways that are detectable with the right tools. Since the messaging applications in question run on the users’ own devices, the users are well-positioned to make the necessary observations about what their devices are doing.