Transcription

1 Vuvuzela: a scalable private messaging system. David Lazar, Jelle van den Hooff, Matei Zaharia, Nickolai Zeldovich

2 Motivation Bob (Oncologist)

3 Encryption Z28gUGF0cmlvdHMhCg c2vhagf3a3mgc3vjawo Bob (Oncologist)

4 Problem: metadata [Diagram of communicating parties: Ex-boyfriend, Pfizer, Lawyer, AA, Bob (Oncologist), Hospital, Lawyer, Snowden, NY Times, Guardian, White House; encrypted messages Z28gUGF0cmlvdHMhCg and c2vhagf3a3mgc3vjawo]

5 Goal: hide metadata [Diagram: Vuvuzela connecting Ex-boyfriend, Pfizer, Lawyer, AA, Bob (Oncologist), Hospital, Lawyer, Snowden, NY Times, Guardian, White House]

6 Goal: hide metadata [Diagram: Vuvuzela connecting Ex-boyfriend, Pfizer, Lawyer, AA, Bob (Oncologist), Hospital, Lawyer, Snowden, NY Times, Guardian, White House]

7 Goal: scalability [Diagram: Vuvuzela connecting Ex-boyfriend, Pfizer, Lawyer, AA, Bob (Oncologist), Hospital, Lawyer, Snowden, NY Times, Guardian, White House]

8 Tor is scalable Bob Tor network

9 Tor is insecure Bob Tor network

10 Tor is insecure [Slide shows the first pages of three papers: "Low-Cost Traffic Analysis of Tor" (Steven J. Murdoch and George Danezis); "Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries" (Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, Paul Syverson); "Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden Services" (Albert Kwon, Mashael AlSabah, David Lazar, Marc Dacier, Srinivas Devadas)] Bob

11 Related work [Chart with axes Scalability and Privacy; systems shown: Tor, Pond, Riposte [Oakland 2015], Dissent [OSDI 2012]]

12 Contribution [Chart with axes Scalability and Privacy; systems shown: Tor, Pond, Vuvuzela, Riposte [Oakland 2015], Dissent [OSDI 2012]]

13 Contribution. Vuvuzela: the first private messaging system that hides metadata from powerful adversaries for millions of users. Vuvuzela scales linearly with the number of users. Differential privacy for millions of messages per user. For one million users: 37 s end-to-end message latency, 60,000 messages / second throughput. Good match for private text-based messaging.

14 Vuvuzela overview. Handful of servers arranged in a chain. Users send/receive messages through the first server. Last server decides who gets what messages and sends them back down the chain. [Diagram: Bob, Charlie, Server 1, Server 2, Server 3]

15 Vuvuzela's two protocols. Dialing protocol: initiate a conversation session between two users. Conversation protocol: exchange messages between two users. [Diagram: Bob, Charlie]

16 Threat model. All but one server are compromised. Adversary is active (can knock users offline, tamper with messages, etc.). All users might be malicious (besides you and your friends). PKI: users know each other's keys. [Diagram: Bob, Charlie]

17 Metadata privacy [Diagram: Scenario 1, Scenario 2, Scenario 3, each showing Bob, Charlie and Vuvuzela]

18 Metadata privacy [Diagram: Scenario 1, Scenario 2, Scenario 3, each showing Bob, Charlie and Vuvuzela; labels: ???, 47D1FC9A, traffic analysis, hacked servers]

19 Approach to scalable privacy. Use efficient cryptography to encrypt as much metadata as possible. Add noise to metadata that we can't encrypt. Use differential privacy to reason about how much privacy the noise gives us.

20 Dead drops prevent users from talking directly. Dead drop: a place to leave a message that another user can pick up. [Diagram: Bob, Charlie]

21 Talking via dead drops. Dead drop: zzp8ns0nrxt3g9efb6c, Message: Hi Bob! How's it going? Dead drop: zzp8ns0nrxt3g9efb6c, Message: (empty). [Diagram: Bob, Charlie]
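
The exchange at a dead drop can be illustrated with a minimal sketch. It assumes each request is a (dead drop ID, message) pair and that two requests naming the same dead drop simply receive each other's message; the function name `exchange` and the data layout are hypothetical and are not taken from the Vuvuzela code.

```python
from collections import defaultdict


def exchange(requests):
    """Swap messages between requests that name the same dead drop.

    requests: list of (dead_drop_id, message) pairs, one per request.
    Returns the replies in the same order as the requests.
    """
    # Group request positions by the dead drop they access.
    by_drop = defaultdict(list)
    for index, (drop_id, _message) in enumerate(requests):
        by_drop[drop_id].append(index)

    # A request that is alone at its dead drop gets an empty reply.
    replies = [b""] * len(requests)
    for indices in by_drop.values():
        if len(indices) == 2:
            first, second = indices
            replies[first] = requests[second][1]
            replies[second] = requests[first][1]
    return replies
```

Replies come back in request order, so whoever forwarded the requests can route each reply to the connection it arrived on.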

22 Conversation protocol, Round 1. Dead drop: zzp8ns0nrxt3g9efb6c, Message: Hi Bob! How's it going? Dead drop: zzp8ns0nrxt3g9efb6c, Message: (empty). [Diagram: Bob, Charlie]

23 Conversation protocol, Round 2. Dead drop: Fsdd5vPMLH3KARqE2a, Message: (empty). Dead drop: Fsdd5vPMLH3KARqE2a, Message: I'm good, thanks! [Diagram: Bob, Charlie]

24 Conversation protocol Bob Charlie Round 3

25 Conversation protocol Bob Charlie Round 4
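
The slides show the dead drop ID changing from round to round but do not say how it is chosen. One way two users could agree on a fresh ID every round without communicating is to derive it from a shared secret and the round number. The sketch below assumes HMAC-SHA256 and a 128-bit ID; both choices, and the name `dead_drop_id`, are illustrative assumptions.

```python
import hashlib
import hmac


def dead_drop_id(shared_secret: bytes, round_number: int) -> str:
    """Derive a per-round dead drop ID from a secret both users hold.

    Assumption: HMAC-SHA256 keyed with the shared secret, applied to the
    round number. Without the secret the result looks random.
    """
    mac = hmac.new(shared_secret, round_number.to_bytes(8, "big"), hashlib.sha256)
    return mac.hexdigest()[:32]  # keep 128 bits, hex-encoded
```

Both users compute the same ID for a given round, and the IDs for different rounds are unrelated to anyone who lacks the secret.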

26 Messages are encrypted Dead drop: Fsdd5vPMLH3KARqE2a Message: WCzdjL5wBNpJUtt9tE7 Bob Dead drop: Fsdd5vPMLH3KARqE2a Message: yjt1qwsvk8qw4up6gej Charlie

27 Idle clients send cover traffic Dead drop: Fsdd5vPMLH3KARqE2a Message: WCzdjL5wBNpJUtt9tE7 Bob Dead drop: Fsdd5vPMLH3KARqE2a Message: yjt1qwsvk8qw4up6gej Charlie

28 Idle clients send cover traffic Dead drop: Fsdd5vPMLH3KARqE2a Message: WCzdjL5wBNpJUtt9tE7 Bob Charlie Dead drop: Fsdd5vPMLH3KARqE2a Message: yjt1qwsvk8qw4up6gej Dead drop: uy06zouttvreru7rch Message: JwXpDGH5reB627KOs0

29 Dead drops give privacy Dead drop: Fsdd5vPMLH3KARqE2a Message: WCzdjL5wBNpJUtt9tE7 Bob Charlie Dead drop: Fsdd5vPMLH3KARqE2a Message: yjt1qwsvk8qw4up6gej Dead drop: uy06zouttvreru7rch Message: JwXpDGH5reB627KOs0

30 Dead drops give privacy Dead drop: Fsdd5vPMLH3KARqE2a Message: WCzdjL5wBNpJUtt9tE7 Bob Charlie Dead drop: Fsdd5vPMLH3KARqE2a Message: yjt1qwsvk8qw4up6gej Dead drop: uy06zouttvreru7rch Message: JwXpDGH5reB627KOs0

31 Mixnet hides origin of messages Bob Charlie

32 Mixnet hides origin of messages A B Bob C Charlie

33 Mixnet hides origin of messages Bob B C A Charlie

34 Mixnet hides origin of messages Bob B C A Charlie
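
The shuffling step of a mix server can be sketched as follows. This toy version only permutes a batch on the way forward and undoes the permutation on the way back; a real mixnet also removes a layer of encryption at every hop, which is omitted here. The class name `MixServer` and its methods are hypothetical.

```python
import random


class MixServer:
    """Toy mix server: shuffles a batch forward, restores the order on the way back."""

    def __init__(self, rng=None):
        # SystemRandom draws from the OS entropy source; a seeded
        # random.Random can be passed in for reproducible experiments.
        self._rng = rng or random.SystemRandom()
        self._permutation = []

    def forward(self, batch):
        # Pick a fresh secret permutation for this round.
        self._permutation = list(range(len(batch)))
        self._rng.shuffle(self._permutation)
        return [batch[i] for i in self._permutation]

    def backward(self, replies):
        # Undo the permutation so each reply returns to the slot it came from.
        restored = [None] * len(replies)
        for position, original_index in enumerate(self._permutation):
            restored[original_index] = replies[position]
        return restored
```

Chaining several such servers means the output order is hidden as long as at least one server keeps its permutation secret.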

35 Are we done yet? [Diagram: Bob, Charlie; dead drop access counts 2 and 1]

36 Are we done yet? Challenge: dead drop counts reveal access patterns. [Diagram: Bob, Charlie; dead drop access counts 2 and 1]
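
To see what an observer of the dead drops learns even after mixing, the sketch below counts how many dead drops were accessed once and how many were accessed twice in a round. The function name `access_histogram` is hypothetical.

```python
from collections import Counter


def access_histogram(dead_drop_ids):
    """Return (singles, doubles) for one round.

    singles: dead drops accessed exactly once.
    doubles: dead drops accessed exactly twice (an exchange took place).
    """
    per_drop = Counter(dead_drop_ids)
    singles = sum(1 for count in per_drop.values() if count == 1)
    doubles = sum(1 for count in per_drop.values() if count == 2)
    return singles, doubles
```

Without noise, the number of doubles changes by exactly one depending on whether a given pair of users is talking, which is the leak the slide points at.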

37 Demo! Let's see why access counts are a problem.

38 Solution: Each server adds noise. Fake exchanges (noise). [Diagram: Bob, Charlie; dead drop access counts 1, 2, 1, 1, 2]

39 What is noise? Fake singles: Dead drop: RY9VjW4XROtTcbnZPaJ, Message: Bzizd2loCIeXdIfHU33mds; Dead drop: t53c81ttfdmbczflq7q, Message: rccnmcttj8c8jmthlxn8; Dead drop: pavnhqmuegsmvxz6y5, Message: IuA94shFx7okpZdBacjBg. Fake doubles: Dead drop: 3nPki8GbZWfXRyw61wk, Messages: ne7yvljleicvcd1cu62 and 4QjdRfoB7GoEEb0vtMjf; Dead drop: kt2jncerb7ieu3m1k5oj, Messages: mb4zgdabtlttm9ruzzv and wynxuyooip9ffjr4lktv38; Dead drop: LWnyE3AB2TTmUcCGL, Messages: k1bvsotvljqtey92vxd1o and mtla2cdkkgzadt0ojm8s.
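
A server could generate this kind of noise roughly as follows: a fake single is one request to a random dead drop, and a fake double is two requests to the same random dead drop. This is a sketch under assumptions: the ID length, the message size and the name `fake_requests` are hypothetical, and random bytes stand in for what would be encrypted messages.

```python
import secrets


def fake_requests(num_singles, num_doubles, message_size=32):
    """Build noise requests as (dead_drop_id, message) pairs."""
    requests = []
    # Fake singles: one access to a fresh random dead drop.
    for _ in range(num_singles):
        requests.append((secrets.token_hex(16), secrets.token_bytes(message_size)))
    # Fake doubles: two accesses to the same fresh random dead drop.
    for _ in range(num_doubles):
        drop_id = secrets.token_hex(16)
        requests.append((drop_id, secrets.token_bytes(message_size)))
        requests.append((drop_id, secrets.token_bytes(message_size)))
    return requests
```

The noise requests are added to the batch before it is shuffled, so later servers cannot tell them apart from real requests.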

40 Demo! Vuvuzela with noise is effective!

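41 Formalizing privacy guarantee: $\Pr[\text{observations} \mid \text{user talked to Bob}] \le e^{\varepsilon} \cdot \Pr[\text{observations} \mid \text{user did not talk to Bob}]$ [Diagram: Bob and Vuvuzela, once for each case]

42 (ε,δ) differential privacy, simplified: $\Pr[\text{observations} \mid \text{user talked to Bob}] \le e^{\varepsilon} \cdot \Pr[\text{observations} \mid \text{user did not talk to Bob}]$ [Diagram: Bob and Vuvuzela, once for each case]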

43 Noise achieves DP. Let d be the number of dead drops with two accesses in a single round. To make d differentially private, we need to make these distributions very close (indistinguishable): $\Pr[d = x \mid \text{user talked to Bob}]$ and $\Pr[d = x \mid \text{user did not talk to Bob}]$. [Plots: probability versus number of dead drops with two messages; x-axis ticks 0, 1 and 250]

44 Generating this distribution: $\Pr[d = x \mid \text{user talked to Bob}]$ and $\Pr[d = x \mid \text{user did not talk to Bob}]$. Constraints: Can't have negative dead drops. Distributions have to be close enough for differential privacy. [Plot: probability versus number of dead drops with two messages; x-axis tick 250]

45 Generating this distribution: $\Pr[d = x \mid \text{user talked to Bob}]$ and $\Pr[d = x \mid \text{user did not talk to Bob}]$. Average noise is hundreds of fake messages. Constraints: Can't have negative dead drops. Distributions have to be close enough for differential privacy. [Plot: probability versus number of dead drops with two messages; x-axis tick 250]
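
The slides do not name the noise distribution. A common choice for making a count differentially private is Laplace noise, and the two constraints above (no negative counts, distributions close together) can be met by shifting the mean well above zero and clamping at zero. The sketch below assumes exactly that; the parameters `mu` (mean) and `b` (scale) and the name `sample_noise_count` are assumptions.

```python
import math
import random


def sample_noise_count(mu, b, rng=random):
    """Sample a non-negative integer noise count.

    Assumption: Laplace(mu, b) noise, rounded up and clamped at zero.
    The difference of two unit exponentials is a standard Laplace sample.
    """
    laplace = mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return max(0, math.ceil(laplace))
```

A larger mean makes clamping rarer, and a larger scale brings the two distributions closer; both cost extra fake messages.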

46 Privacy degrades every round. Each round leaks metadata. We want differential privacy after sending many messages. This means adding more noise to support more messages.

47 Vuvuzela's approach to noise. More noise means privacy for more messages. Add as much noise as possible, while still keeping the system practical. Use differential privacy to compute how much privacy users get, using the composition theorem [Dwork & Roth 2014]. We picked: 300,000 fake singles and 300,000 fake doubles per server per round.
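
How the guarantee degrades over many rounds can be computed with the composition theorems in Dwork & Roth. The sketch below implements basic composition and the advanced composition bound (Theorem 3.20 there) for k rounds that are each (ε, δ)-differentially private. The slides do not give per-round ε and δ, so any values passed in are hypothetical; `delta_slack` is the extra δ' the advanced theorem trades for a smaller ε.

```python
import math


def basic_composition(eps, delta, k):
    """k rounds of (eps, delta)-DP are (k*eps, k*delta)-DP."""
    return k * eps, k * delta


def advanced_composition(eps, delta, k, delta_slack):
    """Advanced composition bound for k rounds of (eps, delta)-DP.

    Returns (eps_total, delta_total) with
    eps_total = sqrt(2k ln(1/delta_slack)) * eps + k * eps * (e^eps - 1)
    delta_total = k * delta + delta_slack
    """
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_slack
```

For small per-round ε the advanced bound grows roughly with the square root of k rather than linearly, which is why more noise per round buys privacy for many more messages.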

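48 Privacy with 300,000 noise: $\Pr[\text{observations} \mid \text{user talked to Bob}] \le e^{\varepsilon} \cdot \Pr[\text{observations} \mid \text{user did not talk to Bob}]$ [Plot: x-axis is the number of messages the user wants to keep private, with ticks up to 1M and 2M]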

49 Eve is very evil. The user sees the previous graph and sends Eve many messages through Vuvuzela. Will the NSA arrest the user for talking to Eve? Probably: using Vuvuzela is already suspicious. Will a fair jury convict the user of talking to Eve? Unlikely: Vuvuzela observations are not damning evidence!

50 The user gets a fair trial. Jury is already 50% certain the user did the crime (NSA is intimidating, other evidence, etc.). Beyond reasonable doubt = 90% certainty.
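
The jury argument can be made concrete with Bayes' rule. If the observations are at most $e^{\varepsilon}$ times more likely when the user talked to Eve than when the user did not, the posterior odds are at most the prior odds times $e^{\varepsilon}$. The sketch below ignores δ for simplicity, and the function names are hypothetical.

```python
import math


def max_posterior(prior, eps):
    """Upper bound on the jury's certainty after seeing the observations.

    Posterior odds <= prior odds * e^eps (delta is ignored here).
    """
    odds = prior / (1 - prior) * math.exp(eps)
    return odds / (1 + odds)


def max_eps_for_threshold(prior, threshold):
    """Largest eps for which the certainty bound stays at the threshold."""
    return math.log((threshold / (1 - threshold)) / (prior / (1 - prior)))
```

With a 50% prior, the bound reaches the 90% threshold only when $e^{\varepsilon} = 9$; with $e^{\varepsilon} = 2$ the jury's certainty is at most 2/3.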

51 The user is innocent for millions of messages. [Plot: jury certainty (%) versus number of messages the user wants to keep private; x-axis ticks up to 1M and 2M]

52 Implementation. 3,000 lines of Go. Untrusted entry server manages user connections. Entry server notifies clients when a new round starts. Available soon on GitHub: github.com/davidlazar/vuvuzela

53 Evaluation Can Vuvuzela servers support a large number of users and messages? Does Vuvuzela provide acceptable performance?

54 Asymptotic performance. Noise is independent of number of users. Performance is linear in number of users: bandwidth, latency, CPU.
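
The linear scaling claim can be written as a back-of-the-envelope load model: the number of requests reaching the dead drop exchange is the number of online users plus a fixed amount of noise. The sketch assumes every online user sends one request per round; the function name and the parameters `noisy_servers` and `noise_requests_per_server` are hypothetical and are not figures from the talk.

```python
def requests_at_exchange(online_users, noisy_servers, noise_requests_per_server):
    """Requests the last server processes in one round.

    Assumption: one request per online user, plus a fixed number of noise
    requests from each server that adds noise.
    """
    return online_users + noisy_servers * noise_requests_per_server
```

Because the noise term does not depend on the user count, each additional user adds exactly one request per round, and the noise is a fixed cost that dominates only when few users are online.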

55 Setup. 36 cores per VM. 10 Gbps links. [Diagram: Client VMs, Entry server, Server 1, Server 2, Server 3]

56 Acceptable end-to-end latency for text messaging. [Plot: end-to-end latency for conversation messages (0 s to 60 s) versus number of online users (up to 2M)]

57 Performance bottlenecks. CPU bound: dominated by mixnet operations. High bandwidth cost: 166 MB/s for servers, 12 KB/s for clients. Can lower bandwidth by increasing latency linearly.

58 Conclusion. Problem: hide metadata in a secure and scalable way. Approach: Encrypt as much metadata as possible. Add noise to obscure remaining metadata. Formalized privacy guarantee with differential privacy. Vuvuzela: scalable private messaging without metadata. Scales linearly with number of users. Privacy for millions of messages per user. 37 s latency. 60,000 messages / second of throughput.

59 What happens after 2M? Privacy for a lifetime of messages is unrealistic under this configuration. Users should change their expectations and expect privacy only for a subset of messages. Example: privacy just for important messages. Example: privacy just for recent messages. The user does not need to specify which subset of messages to keep private. Vuvuzela's guarantee holds for any (small) subset of messages that the adversary cares about.