Breaking Monero series Episode 05: Input selection algorithm

This is a transcript of Breaking Monero series Episode 05: Input selection algorithm

Published under CC-BY RyoRU

Justin: [00:00:00] Welcome back to “Breaking Monero”. Today we are talking about Moneros ring input selection algorithm. We’ve spoken in the past about ring signatures and decoys in other previous episodes and Sarang specifically has talked about how the decoys are selected but he used a very vague phrase at least called “randomly selected” right. And in this episode we’re gonna get far more nuance from or specific about what we mean by this. This mysterious phrase and also the phrase itself isn’t very accurate either and so it’s important to add some additional clarification here on what’s actually happening. And there’s a lot more to random than behind the scenes so I’m going to start with the screen share showing an example of some of the Moneros input selection algorithms over the past. So on the top here you can see an example of a completely random distribution algorithm. So on the left you have old outputs that were generated at the very very beginning of Moneros history on the right you have new outputs that are generated very very recently especially within the past few days or so.

Justin: [00:01:08] Let’s say that the green circle is the actual output that was spent. This is the real money that’s sent and the blue ones are the decoys that are selected. Now a completely random distribution method might sound great to begin with because you know any input could be selected for any reason but this leads to a lot of unintended consequences as a result. So you can make pretty strong heuristics that say people are far more likely to spend new money than old money. So as a result the latest inputs (the green one highlighted here) is most likely to be the real one. And well you don’t necessarily have the ground truth to prove that this is true. It could be tested as very reliable over time you could make a potentially very strong heuristic there and you can see in the example on the screen on the first line there that is the case because that’s often the case. Right? So Monero starts to improve upon this iterate upon this and you can see on the second line there there’s an example of a “recent zone” selection algorithm. So you have again the whole history of Moneros outputs but you have a short recent zone period where you’re more likely to select other decoys from this specific period. So Moneros code might specify for instance that the recent zone needs to be about one point eight days and that you should select about half of the decoys from this set of recent zones. You can see on this example here that about half the decoys are selected from this recent zone and then for the rest the tail going back to previous time like the very beginning of Monero history. You still have the ability to select these outputs but they’re less common than new outputs. And this helps address the specific heuristic we’re speaking about where the latest output is the most in the ring is usually the true one because now you have more latest decoys included in this ring. So therefore you have a more plausible selected outputs in this case. And the recent zone was nice and simple it was a really easy way to implement this sort of feature and it would definitely was an improvement over the existing completely random system Monero began with, but it’s not ideal. And so Monero has moved to what more resembles the bottom line there which is a “matching distribution” one that is based on empirically observed distributions based on what we might narrow and outside reaches researchers found with bitcoin and Monero. It’s a mathematical model so you can see that in this case the newest outputs are even more likely to be selected for instance. So hopefully this diagram helps show how it’s not just about how many inputs there are in a transaction it’s also about how you select them and there’s a lot of implications on how these are actually selected, but it’s more than just timing as I show you — timing is just one part of how this is done and for that to this end Sarang is going to speak a little bit more specifically about other factors involved in the selection algorithm.

Sarang: [00:04:18] Yeah absolutely. I mean and it’s worth noting that you know technically the way that we still do it and the way that we’ve always done it did have elements of randomness to it.

Sarang: [00:04:25] I mean “random” is kind of a it’s a it’s often a really poor word right. So for example even though typically the outputs that you’ll choose kind of follow that particular mathematical model there is an element of randomness involved. So you know two people can definitely choose very very different rings but you know on average they’ll follow a pattern that looks approximately like the pattern that you’ve showed. So there’s always still randomness into it. But like you said timing heuristics or guesses that an adversary might make make based just on the age of the output like you said are only one heuristic. They’re a pretty big heuristic because in general for a lot of old transactions you can guess what you thought the newest one was in and you know you might be right or you could prove it implicitly but timing is just one part. And then we’ve iterated since then to kind of mitigate against other smaller heuristic that we’re not timing based. So one example is something called “coinbase outputs”. If you’re not familiar with the term basically every Monero block of transactions that is generated has a special output in it that generates new money as part of the protocol. That’s kind of what helps to reward miners for doing work in part. And those coinbase up I like to think of as you know newly generated money. So in general do people spend newly generated money or coinbase outputs as the true spender. Probably not necessarily as often as non new money or non coinbase outputs. So for example if I happen to choose a ring that contained, I don’t know, 10 coinbase outputs and then my true output which was not a coinbase output an adversary might look at that and think “I would say it’s much more likely that this person didn’t spend a coinbase output because that’s all very new money”. So that could be a heuristic that they might use they might think coinbase outputs are probably decoys. Well in that case that would kind of imply that we should select fewer coinbase outputs as part of our rings. How many is too many or too few. I mean that’s not a very well-defined problem with a very well defined solution. But, as we’ve iterated our selection algorithm to make it better against this “guess newest” heuristic involving output age we probably introduced more coinbase outputs than some people would have liked. So we made a slight modification to the algorithm where instead of just choosing a block and then yanking a decoy out of it which tends to give us more coinbase outputs than fewer. Instead we actually look at a very small window around that particular block. So effectively increasing the size of the bin from which we get to choose our outputs. And what that ends up meaning statistically — is that we end up choosing fewer coinbase outputs which kind of mitigates against this much smaller heuristic and that of course is not the only heuristic type you might come up with either. So coinbase outputs are one thing an adversary might use to look at it to try to make guesses timing which we’ve worked on of course and have talked about might be one that the adversary might use those other ones. For example if I have a transaction that has two different inputs each of those has a separate ring. Maybe the adversary is able to look at the different decoys and outputs that are in those rings and maybe the adversarial find that there’s a transaction. Way back when in the blockchain that generated two different outputs maybe one of those outputs appears in one of the inputs to my new transaction and the other output appears as an input to the other one. Again it’s just a guess because it may have happened by chance but probably not. The adversary might try to conclude that the outputs that were generated in a previous transaction are now being spent by me and might make some conclusion based on that. Again, without external information a heuristic is not a proof, but it gives the adversary something that they might try to guess.

Sarang: [00:07:47] So in general this is very very complex. I would say right now it is pretty impossible to get rid of all possible heuristics, so we can always make our selection algorithms better. And as Justin pointed out and as I’ve kind of hinted at we have done this over time — we iterate to get better and better. I don’t want to think about this as something you Justin brought up in fact kind of like a plugging of a hole analogy if you want to give that you kind of liked.

Justin: [00:08:11] So one example like we we know of a specific heuristic, for instance, that the guess newest heuristic might be an example. So. We can iterate Monero selection algorithm to help counter this sort of here. Second the actual effectiveness of it but in doing so… We’re still… Choosing some other way to select outputs that people can develop heuristics for. So, there’s no limit to the number of heuristics that people can come up with. They can continue making complicated heuristics over time pretty much no matter what we do. So we’re always plugging these holes that we’re aware of and the biggest holes we know of, but we might indirectly be making smaller holes we might make holes that we’re not necessarily aware of, because maybe the heuristics haven’t been conceived yet especially by participants in the Monero community. So this is definitely something that will need continuous improvements. It needs continuous iteration in order to make it better to keep patching these holes. You can’t just pave a road and never expect that have potholes. You have to suffer through it like the Minnesotan winters like I have here you go and have to keep patching these potholes that keep appearing right. They keep coming out. And when you think you’re done some trucks are going to drive over it and come up with and make a new one right. So these sort of circumstances keep happening.

Justin: [00:09:30] (Apologies that terrible analogy I’m trying to place around here). But I we try to address the big ones first. The example that the “guess new” was but there might have been some consequences as an example by changing the selection algorithms to follow a more mathematical distribution. We actually selected more of those coinbase outputs and we needed to go back and sort of refine how we did this, was every improve with every iteration still an improvement overall? From what we can see now it should be yes — we addressed the existing heuristics and done more good than harm. But there’s always going to be some sort of tradeoff that we’re going to keep playing with. So to speak.

Sarang: [00:10:10] Yeah absolutely. I mean we we definitely receive reports all the time about people who come up with you know a particular small pattern that they might see among certain transactions or outputs based on the way that we make our selections and some of those are very very small doesn’t mean that you know it’s optimal that we keep them in there.

Sarang: [00:10:26] But you know to some extent it’s not necessarily always obvious how to make an absolute good change to counter some of those small heuristics without inadvertently kind of ruining the work that we’ve done for some of the much bigger heuristics. So if we see a small year pointed out and we don’t change our algorithm because of it you know it doesn’t mean that we are not concerned about those your heuristics but it means that we always have to balance the good that we’d be doing by making such a change with the inadvertent harm that might be caused by it. So it’s always an arms race right. Financial privacy as a whole as you’re probably learning from this video series is an arms race — analysis only gets better over time and that’s great. Just because we receive small reports sometimes and reports other times about different heuristics that come up doesn’t mean we don’t want to see them. I love seeing those reports. I love learning more about what other people are doing with this analysis but it just reminds me that it is an arms race analysis, gets gets better we get better because of that that might invite more analysis which just has kind of this spiraling effect toward us getting better.

Justin: [00:11:27] Yes speak. You speak about spiraling effects towards getting better. One of the great things about ring signatures is that the selection algorithm and the ring size sort of go hand in hand in sort of a positive or negative feedback loop. Suppose you’re purposely making things worse. But as Monero increases its ring size it sort of decreases the severe negativeness of perhaps a bad selection algorithm or some limitations of selection algorithm. If you just keep picking more and more inputs for example right more and more decoys — the shortcomings of a specific algorithm might decrease will or should ideally decrease. Meanwhile if you improve your selection algorithm you make better use of the decoys that you have. So these two components are really critically important and having strong privacy Monero making the most out of its ring signatures. Because if you have a large ring size with the terrible selection algorithm then you’re not going to have great privacy because you’re able to develop really strong heuristics potentially. And likewise if you have a really even if you’ve had some mechanism of having a perfect algorithm under every circumstance, but you had a really small ring size for example, then you’re also not great either. So these things really critically work hand-in-hand at providing really the privacy that ring signatures offer — it’s more than just what they say they provide out of the box which is one out in Monero cases now one out of eleven, are possibly spent. It’s all the additional metadata timing analysis coinbase metadata that is associated with this. And so the selection algorithm needs to adjust over time to compensate for this. Otherwise we could just pick the first 10 outputs that were ever generated on Moneros blockchain and be like “oh it’s either the latest or the first 10”. And that’s obviously not very convincing. So you know it’s input selection algorithm is very important Monero ring signatures. Do you have any last closing thoughts to leave the viewer with just that?

Sarang: [00:13:32] Our goal still remains to provide the best plausible deniability possible with ring signatures and over time we continue to learn about better and better ways to do that. And so we keep iterating and iterating iterating to get better and we’re absolutely not perfect now, but we continue to try to get better and better.

Justin: [00:13:51] All right. Thank you Sarang. Thank you everyone for watching this episode of “Breaking Monero”. We will get you in the next one.