This is a transcript of Breaking Monero series Episode 09: Poisoned outputs

Published under CC-BY RyoRU

Justin: [00:00:01] Welcome back to “Breaking Monero”. This is our 9th episode, on poisoned outputs, or the EAE attack, or the knacc attack. It’s been given several names, and we are here to discuss this honestly pretty difficult and nuanced topic, so we’ll do our very best. We’re here with me, Justin, and with Brandon and Sarang. It’s great to have everyone on. This video is going to be diagram heavy, so if you’re listening to this as a podcast, we recommend watching it in video form later if you’re confused, because there’s really a lot going on. But first we’re going to have Brandon talk about the general idea of the marked funds that we’re talking about. Then I can jump into the diagrams to explain the situation to all.

Brandon: [00:00:52] All right. So here is a thought experiment for everybody in the Breaking Monero audience.

Brandon: [00:00:57] Let’s say that you are living in an oppressive society and you want to buy some banned books. And let’s say that you have a local banned book dealer who stands on the corner, and you can go give them cash and they will give you your Bible or your “To Kill a Mockingbird” or whatever copy of a banned textbook you want. Now, one question you might have is how this tyrannical government would come to the conclusion that you are a purchaser of these devices. Or rather, how would they go about tracking the person who is supplying all of these banned books, these nasty little devices, to society? Well, one way to do it, and this is good old-fashioned police work going back hundreds of years, is a controlled purchase from one of the street dealers. You go buy a book six or seven times in a row, and then you look for bank deposits at local banks. What will end up happening is the street dealer will give the cash to his boss, or to one of his boss’s minions who will then hand it over to the boss, and eventually it will make its way into a deposit at a bank account. If the dollar bills that were used to purchase the banned books were marked, the bank will alert law enforcement, and they will have found their kingpin boss who is depositing money from these controlled purchases. So the main thing is that somebody is buying a book, and there’s going to be a chain of custody of the money that will eventually end up in some Know Your Customer bank’s hands. Once that money ends up in the bank’s hands, they can start linking real-life identities with those original purchases and find the interior hops of those transactions. And unfortunately this is a problem that Monero faces. I’ll hand this back to Justin.

Justin: [00:03:03] Excellent. So we’re going to take a look at this attack in a little more detail, just to show people what it is. Again, a warning: this is a very heavy series of slides, so brace for impact. I’m going to try and get my cursor up here. As discussed, you have a person, Eve, who is sending funds to a specific address, the Monero address 43.. Eve learns information when the recipient then takes the funds and puts them on an exchange or some other colluding entity. It doesn’t matter what these entities are. All that matters is that Eve is colluding with the exchange, or that party A is colluding with party B. So let’s say Eve sends a transaction to that address. She creates a ring signature, as shown here, and uses her real output here. She then creates a transaction with two outputs. One output goes to that address; we’ll call it output A. The other output, let’s say, goes back to Eve, so Eve doesn’t worry about it; she’s not trying to track that output. Later, let’s say Eve comes back, looks at the blockchain, and notices that there are three transactions that have output A in their ring signatures. These are not associated by time or anything; let’s just say there are three transactions, and they all use this one output. These are all the transactions on the Monero blockchain that use this output, and we’re not necessarily sure that A is truly spent in any of these ring signatures. It could be a decoy in all three. But Eve is still trying to learn information about the situation. So let’s say, for the second transaction here, that someone deposited its output on an exchange. This one yellow output here was sent to an exchange, and Alice was the person who sent the funds to the exchange.
So for this EAE example, Eve is the first E, Alice is in the middle, and the exchange is the last E. What we have to note here is that the exchange, in conjunction with Eve, might suspect that Alice is the holder of this Monero address. Now, this is a heuristic. It’s not very strong, especially if there’s only one instance. We don’t really know what the real case is. It could be that Alice just used this output as a decoy in her ring. If this only happens once, there’s a pretty high probability that it happened by chance and that Alice is not the person who holds this address. But if this happens twice, or three times, or four times, now you’re starting to get a pattern where Eve sends multiple transactions to this address, and then this address has several deposits onto the exchange, all under Alice’s account. Now the exchange and Eve are starting to learn a lot of information, where they can say, “OK, the chance that one person who deposits funds on our exchange has a possible, really recent spend of each of the outputs we sent to this address is very low”. It’s a very, very low probability that this happens by chance. As a result, they’re going to assume that there’s a good enough amount of evidence that Alice is the same person as this address. And that’s not the only circumstance that can happen.
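
To put rough numbers on the “by chance” argument, here is a minimal Python sketch. It assumes decoys are picked uniformly from a pool of eligible outputs, which is a simplification (real wallets weight decoy selection toward recent outputs), and the function names and parameters are made up for illustration.

```python
from math import comb


def chance_hit(pool: int, watched: int, decoys: int) -> float:
    """Probability that a ring built by an unrelated user contains at least
    one watched output, assuming decoys are drawn uniformly without
    replacement from `pool` eligible outputs (a simplifying assumption)."""
    if not 0 <= watched <= pool or not 0 <= decoys <= pool:
        raise ValueError("need 0 <= watched <= pool and 0 <= decoys <= pool")
    return 1 - comb(pool - watched, decoys) / comb(pool, decoys)


def chance_all_hits(pool: int, watched: int, decoys: int, deposits: int) -> float:
    """Probability that `deposits` independent rings from an unrelated user
    each contain a watched output purely by coincidence."""
    return chance_hit(pool, watched, decoys) ** deposits
```

Calling `chance_all_hits` with a fixed pool and an increasing number of deposits shows the coincidence probability shrinking geometrically, which is the pattern described above: one hit proves little, four hits under the same account are hard to explain by chance.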

Justin: [00:07:13] This graph gets very messy, especially as you add other situations. We’re trying to keep it simple for this episode while still discussing some things that can get in the way. So let’s say, for this ring here, that there is another output that was sent to the exchange, as I’m highlighting here. In this case maybe this output was sent to Charlie, who sent the funds to the exchange. The exchange would probably have at least some suspicion of Charlie. But it’s far more likely that Charlie’s case happened by chance than that Alice’s case happened by chance. It’s certainly a limitation of this model, and how these things add up is a very complex system. Ultimately it is a statistical test that asks, “OK, what is the likelihood that this would have happened by chance, and what is the likelihood that this is actually what’s going on?” And you’re going to have Type 1 and Type 2 errors as you conduct these statistical tests. But there are certainly ways for them to be much stronger in some circumstances than in others.
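
The Alice versus Charlie comparison can be sketched as a simple Bayesian update. This is only an illustration: the prior, the per-ring coincidence probability, and the assumption that the real owner’s deposits always show a watched output are invented numbers, not measurements.

```python
def posterior_owner(prior: float, hits: int, p_chance: float, p_owner: float = 1.0) -> float:
    """Posterior probability that a depositor controls the watched address
    after `hits` of their deposit rings contained a watched output.

    p_chance: probability an unrelated user's ring contains a watched output.
    p_owner:  probability the real owner's ring contains one (assumed 1.0).
    """
    owner = prior * p_owner ** hits
    chance = (1 - prior) * p_chance ** hits
    return owner / (owner + chance)


# Hypothetical numbers: a 1-in-1000 prior and a 1% chance of a coincidental hit.
alice = posterior_owner(prior=0.001, hits=4, p_chance=0.01)    # four matching deposits
charlie = posterior_owner(prior=0.001, hits=1, p_chance=0.01)  # one matching deposit
```

Under these made-up figures Charlie stays far more likely to be a coincidence than Alice. Wherever the exchange draws its threshold on this number, a low threshold flags more Charlies (Type 1 errors) and a high one misses more real owners (Type 2 errors).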

Brandon: [00:08:22] Do you mind if I jump in really quick, Justin? I want to connect this graph to the discussion a little bit ago about controlled purchases. One way to visualize this is that Eve is, for example, a detective who’s purchasing banned books online. She has constructed these four purchases from this anonymous online vendor 43WI, and then a week or a couple of months later this exchange notices that Alice makes all four of these deposits. The exchange comes to the conclusion that these four deposits are probably related to those banned-book purchases. The difference is that in Monero we have these ring signatures that point back to several possible previous spenders. In the cash scenario I have this linear chain of money going from one person to another. Here, each output is possibly used in multiple ring signatures and is only probabilistically implicated. So what you end up with is an exchange that is capable of developing this probability profile of Alice.

Justin: [00:09:38] Exactly. So we can even expand this. We’re going to take this example and do a little more with it. Let’s say that instead of sending four different transactions to an exchange, Alice takes even less precaution and decides that she just wants to send one big transaction. So this one transaction has four inputs, each with its own ring.

Justin: [00:10:03] And so you see each ring contains one of A, B, C, and D. This transaction is then deposited on the exchange, again under Alice’s account. Now we can be even more confident, because what is the likelihood that a single transaction would, by chance, contain all four of the outputs being watched, one in each independent ring?

Justin: [00:10:32] It’s very, very low, and as the number of tracked transactions tied to one user grows, it becomes increasingly unlikely. So this is certainly a consideration: if Alice is interacting with an exchange, she should be concerned about revealing that she is related to the entity at this address. It’s highly unlikely that anyone else could have made this transaction; in this case it is very likely that this did not happen by chance. The error rates are very low.

Justin: [00:11:10] Now, both of these were with one layer of separation, where you have one person who sends funds to one intermediary, who then sends them on to an exchange. But this could get even more complex. So I’m going to show an example now where you have more intermediary steps. In this case I’m only showing two, because that’s about what will fit on a slide, and even so it’s simplified; this could be much longer depending on what you’re really assessing. So again, let’s start off with a simple single-transaction case. We have Eve here, who sends one transaction to this entity, and this output is used, let’s say, in three transactions. Of course in all likelihood it would be many more, but for the sake of fitting on the slide let’s say three. But let’s say none of these transactions were deposited on an exchange or on a service that Eve was working with to learn more information. So what Eve might do is say, “OK, even though I didn’t learn anything specifically from this set of transactions, I’m still going to watch the new outputs that they generate”.

Justin: [00:12:26] So instead of watching just these three (I mean, I’m still going to watch these three), let’s say I’m also going to watch these. Well, now each of these outputs independently has a set of, let’s again say, three other transactions where it may be spent. So let’s say this output was used in these transactions, this output was used in these transactions, and so on and so forth. And let’s say in this second layer there is one transaction here that deposited funds to the exchange.

Justin: [00:12:58] Let’s say Alice deposited funds on the exchange. What do Eve and the exchange really know? How good is their test at determining whether or not Alice is really the person that controls this address? This is similar to the very first situation we went over, except that in this case there is more ambiguity. You have even less certainty, because instead of saying “OK, there are maybe three transactions that I’m worried about”, there are now a lot more transactions to worry about. The graph gets a lot larger. But Eve can try to learn more information by sending more transactions. So let’s say that Eve sends another transaction, with output B. This again doesn’t come with any initial red flags in the first transactions that immediately use B. But let’s say that in the second set of transactions there is another deposit to the exchange, and Alice also made this deposit.

Justin: [00:14:15] Well, now the exchange and Eve start to get a little more information about who actually deposited these funds, because now you’re starting to have enough strength in a statistical model to say, “It’s unlikely that Alice would have made both of these deposits if she did not receive outputs A and B”. So it’s really important to show how this grows over time. It gets more complicated as you have more layers, and to simplify it I have a nice stupid example here where we have a transaction tree, and it purposely looks very much like a pyramid. For each tree here, this is the depth of an output as it goes on. So you have your initial transaction here that generates outputs, which generate more outputs, and so on and so forth.

Justin: [00:15:14] And this tree gets bigger and bigger over time. So if you have two parties involved on both sides of the transaction, again let’s say that Eve here sends funds to Alice, who then sends funds to an exchange really early on. Well, there’s not a very deep transaction graph, so Eve and the exchange could do a statistical test that has a pretty high degree of confidence. But if there’s only one test and it’s all the way down here, there’s a ton of other entropy that could occur, and the statistical test is far weaker. But if a user does this twice, then sure, it might be deep down in the transaction trees, but there are several trees that they can collect information from, and the test starts to become stronger again. So in summary, and this can be applied to really any application of these statistical tests: the more points of information the observer has, that is, the more trees that Eve and the exchange or any other observer are able to construct, and the further up the tree the outputs are (or the ornaments, maybe, in this picture example), the better the statistical test. As you have more instances of information about a user, and as there’s less ambiguity in those sets of information, the statistical tests get stronger. So now you should at least understand the very basics of what an EAE or poisoned output set of attacks is. It’s the case where observers try to construct as many.. they try to be associated with as many of these trees as possible, and to learn as much observable information from them as possible, so that they can really get actionable information. And with that, I am done ranting.
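
The pyramid picture can be put into numbers with a small sketch. It assumes, as on the slides, that every output shows up in three later rings and that every transaction creates two outputs; both are illustrative defaults, and the function name is made up. It also treats every branch as distinct, so it is an upper bound.

```python
def candidate_transactions(depth: int, rings_per_output: int = 3, outputs_per_tx: int = 2) -> int:
    """Upper bound on the transactions Eve has to watch at a given depth
    below one marked output (depth 1 = transactions referencing it directly)."""
    if depth < 1:
        raise ValueError("depth starts at 1")
    # Each watched transaction creates `outputs_per_tx` outputs, and each of
    # those appears in `rings_per_output` later transactions.
    branching = rings_per_output * outputs_per_tx
    return rings_per_output * branching ** (depth - 1)


for depth in range(1, 5):
    print(depth, candidate_transactions(depth))
```

With the default values the loop prints 3, 18, 108, and 648 candidates for depths 1 to 4. A single deposit found deep in one tree is therefore weak evidence, while deposits found in several independent trees narrow things down again.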

Brandon: [00:17:19] Oh, I guess we’re not going to be able to go back to the diagram while I’m talking. So one interesting thing, as Justin says, is what happens if you go to the bottom of the tree. The intuition is that you’re going to be more likely to be hidden by obscurity, security by obscurity, as you get lower in the tree, because you have more hops as you go down into the tree. You have more and more options.

Brandon: [00:18:22] But one of the interesting things about these trees is that if you go really, really deep, they get scraggly and unhealthy, and they don’t have a nice full base like a good tree does. They get less dense as you go further down. So even though it seems like the deeper you go the safer you are, if you go too deep you may also be creating problems for yourself. That’s just one of the interesting properties of these tree structures. I just wanted to throw that out there.

Justin: [00:18:52] All right. Thanks, Surae. So Sarang, can you talk a little bit about how these attacks get harder? What are some of the implications, and, in a little more detail, how are the statistical probabilities for these tests determined?

Sarang: [00:19:12] Well, it’s determined by a few different things, right? You talked about two different aspects. You talked about the number of trees, which in part just depends on how the adversary, Eve, is doing controlled purchases. That may be out of your control, depending on your circumstances. But one parameter that we do in fact control, and that we talk about pretty frequently, is ring size. Ring size of course is the number of different outputs that you’re throwing around in each ring, which affects in some ways the level of plausible deniability you have that you’re the spender in a particular transaction. So working your way down this tree and changing the ring size can in some ways affect the width of these different trees. In particular, every single time you go one layer deeper in the tree, a bigger ring size will in general (depending on transaction volume and things like that) affect how wide the tree is, and therefore how many different paths there are that could lead an adversary toward a statistical model of what’s going on. So, all other things being equal, an increase in the ring size can in fact make this a bit harder for the adversary, depending on how it’s done. Then there’s the number of times that you send funds to another entity. The idea of what’s called “churn”, which is just sending the funds to yourself, basically brings you deeper and deeper down into this tree, which of course may lead you to think, “OK, in that case why don’t I just send funds to myself a bunch of different times before I eventually go to the exchange? Then I’m far enough down the tree, and we said that that’s good”.
But of course the way that you actually go about doing those transactions, the timing that you use for example, could also in effect leak information if it’s not done safely. Timing data may be available to others looking at the chain, or to your ISP, who knows when you’re online, and different types of metadata like that can affect the way that churn should be done. That’s why, when people ask “How often should I churn, and how should I do it?”, it’s kind of a tricky question, because it depends a lot on your threat model and how you go about timing.

Sarang: [00:21:14] So those are two interesting parameters that affect the way that these trees are structured.
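
One way to see how ring size widens the tree: every input references `ring_size` outputs, so on average each output on the chain is referenced `ring_size * inputs / outputs` times, counting both real spends and decoy uses. The sketch below only does that counting. The function name is hypothetical, and the result is an average that says nothing about how the references are spread across outputs.

```python
def average_ring_appearances(ring_size: int, total_inputs: int, total_outputs: int) -> float:
    """Average number of rings in which an output appears (as the real spend
    or as a decoy), from counting ring-member slots and dividing by outputs."""
    if total_outputs <= 0:
        raise ValueError("total_outputs must be positive")
    return ring_size * total_inputs / total_outputs


# Hypothetical chain totals, chosen only to show the trend with ring size.
for ring_size in (5, 11, 16):
    print(ring_size, average_ring_appearances(ring_size, 1_000_000, 1_250_000))
```

A larger ring size raises the average number of transactions that reference any given output, which is the “width” an observer has to follow at each layer.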

Justin: [00:21:25] So how do you avoid this type of problem now? Do larger ring sizes alone help with this, to the point where users don’t need to care, or even with large ring sizes do users still need to care?

Sarang: [00:21:44] Well, I guess you can take this to an extreme, right? Let’s suppose that the ring size got as big as it could possibly get. That is, suppose that every possible output was a possible decoy and that every possible output would be used as a possible decoy. Then you run into the large anonymity set situation that Zerocoin or Zcash type protocols and assets have, and in that circumstance you’re pretty much as good as you can get. Everything is equally probable as a spender, and trying to build this tree would be absolutely incomprehensible. So to some extent it does become really good. But ring size, in our particular case, is a parameter that does affect things like transaction time and transaction size. So it definitely needs to be balanced against what we believe the possible benefit could be. There’s definitely not a one-size-fits-all answer for what the ideal ring size should be.

Sarang: [00:22:37] It’s tough. It’s tough and tricky.

Justin: [00:22:41] And we also want to make clear that the examples I showed were not exhaustive, right? There are a ton of other nuances we could have thrown in there that would make the statistical tests far more or far less accurate. In practice, imagine if they start bringing in IP address metadata, where they say, “Well, these transactions were also sent from this IP”; it’s going to get far more involved. Or what if there are many people making deposits on the exchange, so there are a ton of different users with similar behavior? Then the test is going to become weaker. There’s not a really simple answer we can give you for how it all works, because observers can come up with whatever statistical tests they like, and they can pull in any other outside information in conjunction with this. So it’s all fine and good for us to look at the blockchain and say, “Oh, well, absent external information, this...”, but again, that’s not really the case. And of course even in the attack that we described, Eve’s involvement, as the initial person sending the poisoned outputs that they’re tracking, is external information by itself. So we’re still testing an external information parameter.

Brandon: [00:24:04] So yeah, that’s actually a really good point. One of the main things is that these Eve characters and exchanges know that they’re sending these transactions to this one suspicious address in the first place, and that’s not information that hits the blockchain. They just know it; that’s external. In addition to that, to elaborate on Sarang’s point about ring sizes being sort of counterintuitive, and elaborating on your point that if you pick your own metric or heuristic then you’re going to draw a different conclusion: you can look at a variety of anonymity metrics and apply them to Monero. What’s interesting is that some of them have counterintuitive results. For example, increasing the ring size once it’s bigger than, like, 5 or 6 (I think the threshold is eight or nine) seems to give you reduced returns according to certain anonymity metrics. But then, according to other anonymity metrics, the blockchain getting larger is alone enough to reduce the anonymity metric.

Brandon: [00:25:01] And so balancing the ring size, the heaviness of the blockchain, and all of these properties in order to guarantee people “you have a negligible probability of your transactions being traced” is not really functionally possible currently, especially because you can draw different conclusions using different metrics.
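
Brandon does not say which anonymity metrics he has in mind. One common choice in the anonymity literature is an entropy-based effective anonymity set size, sketched here as an assumption rather than as the metric used in Monero research. The function name and the example weights are made up.

```python
from math import log2


def effective_anonymity_set(weights: list[float]) -> float:
    """Effective anonymity set size 2**H, where H is the Shannon entropy (in
    bits) of the observer's belief about which ring member is the real spend.
    `weights` are unnormalised likelihoods, one per ring member."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 2 ** entropy


uniform_ring = effective_anonymity_set([1.0] * 11)          # no extra information
skewed_ring = effective_anonymity_set([0.9] + [0.01] * 10)  # one member heavily suspected
```

With no outside information an 11-member ring scores about 11. If heuristics like the ones in this episode put 90% of the weight on one member, the same ring scores under 2, even though its nominal size has not changed. Other metrics can rank the same situations differently, which is the point being made above.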

Justin: [00:25:34] All right. Sarang, do you have big takeaways for people who are watching this episode? We discussed what this is and the fact that it’s nuanced, but what are the takeaways for people that are watching?

Sarang: [00:25:42] So some of the big takeaways are that a lot of this, as with so many other things, really depends on your threat model. Like we said, the many different forms of this attack assume that you have cooperating entities on both sides, right? You have some kind of KYC/AML exchange that knows who you are when you’re doing deposits. You might have another entity on the other side, colluding with that exchange, who’s maybe making controlled purchases. So if you are concerned about that, part of it has to do with knowing the entities with whom you’re interacting, if possible. Obviously in some situations and threat models you may not have much control over what those entities are and what they’re doing. So basically, if we can reduce the number of Eves, the exchanges and the Eves and whatever word we want to use for those, then ideally the more of those we get rid of, the better. But again, that’s not necessarily always possible. Some users in certain circumstances may have a lot of choices over how they’re doing deposits or from whom they’re accepting funds, but others might not.

Justin: [00:26:52] So what about plausible deniability? Can you speak about the idea of plausible deniability? You’ve already discussed this in previous episodes, but with this sort of statistical test, do people still have the plausible deniability provided by ring signatures?

Brandon: [00:27:32] So going back to what Sarang said a moment ago about a threat model: if your threat model is that your tyrannical nation is going to shoot you if there is a probability of more than 50 percent that they can link you with this transaction on the Monero blockchain, I would say maybe don’t use blockchains at all, because you’re going to be in a similar probability category as if you use Zcash or any other blockchain. If your threat model is that you’re going to be punished just for probabilistically even using the technology, then it’s not really going to help you out.

Brandon: [00:28:00] On the other hand, if your threat model is one of plausible deniability, like Justin asks, and you are living in a nation with a rule of law where you would be prosecuted in a court, and you could demonstrate, “There are ten thousand additional transaction histories, none of which include my alleged transactions; they just pulled this random transaction history out of the space of all transaction histories and they’re accusing me of it”.

Brandon: [00:28:25] If you can make that plausible deniability argument, then Monero is fantastic. And honestly, if you can’t make that argument, then even cash is dangerous. Like physical cash. So in terms of plausible deniability, I would always look at cryptocurrency technology as a plausible deniability technology, not as a way of hiding your transactions forever. On a certain level, you know, we’re in an arms race with the people who want to learn your financial information, and we’re trying to protect you guys as fast as possible or as much as possible, but, like, black holes leak information about their contents. The idea that we would be able to design a blockchain that will protect people forever from the most extreme threat model, where the government is going to nab you if there is a 15 percent chance that you are guilty... then, you know, we have a completely different situation on our hands than finance.

Justin: [00:29:28] Thanks Brandon.

Justin: [00:29:30] So Sarang, last question: is moving past ring signatures still something that is necessary long term, or is it reasonable for us to make the ring size like 100 or whatever?

Sarang: [00:29:42] So, again, increasing the ring size such that we have a ton of different path possibilities, and this big tree that makes it very, very improbable that an adversary who’s not doing a ton of controlled purchases would be able to do that: is it possible? Yes. In theory we can increase the ring size to be basically whatever we want, depending on how you do the selection of decoys and things like that. But of course that increases the size of transactions, which has scaling issues. It also increases the verification time of transactions, which is a scaling issue too. So to some extent this is always a balancing game: to what benefit is it? You have to increase the ring size fairly substantially to get a negligible probability of this ever being an issue, so that’s unfortunate. And of course that’s why the research community right now is really big on taking rings out of the picture entirely and going to more complete anonymity sets. Something like Zcash, and the protocols that it is based on, for example, when used correctly are able to offer excellent anonymity sets, complete anonymity sets. But of course we know that that has tradeoffs. It has tradeoffs in the whole trusted setup aspect, and we know that Zcash has had issues with that in the past. So there are always tradeoffs to this. Whether or not we’ll be able to find something relatively soon that’s able to do away with all those tradeoffs is uncertain. A lot of people are looking into it. And as I’ve always said, I personally look forward to the day when we don’t have to deal with these anonymity set problems anymore. So we can continue trying to iterate as best we can until the entire ecosystem gets there.

Justin: [00:31:15] Thank you so much, Sarang and Surae, for joining me today for this frankly very difficult episode. Poisoned outputs, EAE attack, knacc attack, whatever you want to call it: there are tons of different names, and there are tons of different circumstances where it could be applied. So I think this serves as a good episode for understanding how people should look at how their transaction tree is growing as they use Monero, and keep that in mind. Beyond that, make sure to draw comparisons between this episode and other episodes about what other information you’re giving an observer by how you’re sending transactions. All right. That’s all from us today.