Breaking Monero Episode 12: Input-Output Metadata

This is a transcript of Breaking Monero Episode 12: Input-Output Metadata.

Published under CC-BY RyoRU

Justin [00:00:00] Hello and welcome back to another episode of Breaking Monero. Today Sarang and I will be talking about the input-output structure of Monero transactions. When you send a Monero transaction, you reveal to the rest of the network how many inputs and how many outputs it has. You can think of outputs like dollar bills; some people call them notes, since they’re just containers of money. I sometimes call them bags of gold, because each one holds a varying amount of gold that you dump into other bags. So that’s what an output is. But when you send Monero transactions, you may reveal a good amount of information through this metadata. So today we’re going to talk about some considerations for sending transactions, to help make sure that this metadata alone can’t severely undermine the privacy of your transactions. But first, Sarang is going to talk about what the number of inputs and outputs reveals for the different example transaction types you can make in Monero, and also whether this is unique to Monero or not.

Sarang [00:01:04] So, like you were saying, the idea is that a transaction has one or more inputs and one or more outputs. Inputs are spent in their entirety and generate outputs that can go to different destinations or to the same destination. Of course, this is not unique to Monero. This structure of consuming inputs and generating outputs is the way Bitcoin and its family of similar assets work. Monero uses it, and even other privacy-focused assets like Zcash now use a similar structure, applied in a somewhat different way. The number of inputs and outputs you have depends in part on what you’re trying to do. As you’ll see later, most commonly you consume one, maybe two inputs, depending on the total hidden value they contain. Typically you’ll have one destination output and one other output where the change, the remaining balance, is sent back to you; that change output is also protected by Monero’s stealth addresses. And each input you spend is still obscured by its own separate ring of decoys in a ring signature, which you’ve talked about. There are some big exceptions to this. One big example is pool payouts. Pool payouts often send Monero to a bunch of different recipients in the pool, so they often have many different outputs. Because of the way Bulletproofs work, Monero currently has a limit of 16 outputs, for example, but you’ll often see transactions at that limit, and if you do, you might be able to surmise that it’s a pool payout.
Another thing you might try to do is what’s called a churn. In a churn, you basically send funds to yourself, which may be an attempt to obscure the fact that funds you’ve received are then going to be used for some later purpose, although it’s a bit tricky to do this correctly. Often these consume a single input and then ideally generate two outputs, one to yourself and one to change. It’s worth noting, and we’re going to talk a little about this later too, that by default Monero will always create at least two outputs. That’s partly because most transactions use a destination and change anyway, and partly because we don’t want to reveal information by sending a single output: under certain circumstances that could reveal information about the output, and it could also signal to an adversary that a churn, where you’re just sending to yourself and nowhere else, might be in progress.
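To make the two-output minimum concrete, here is a minimal Python sketch, assuming a simplified model where outputs are just integer amounts sent to placeholder addresses. `Output` and `build_outputs` are hypothetical names, not the Monero wallet’s code, and the real wallet’s logic may differ in its details. It shows how a churn that spends the whole input value still ends up with two outputs, one of them zero.

```python
from dataclasses import dataclass
from typing import List

MIN_OUTPUTS = 2  # the default-wallet minimum described in this episode


@dataclass
class Output:
    address: str  # hypothetical placeholder for a (stealth) destination
    amount: int   # atomic units


def build_outputs(destinations: List[Output], total_in: int, fee: int,
                  change_address: str) -> List[Output]:
    """Hypothetical helper: add a change output and pad to the two-output minimum."""
    spent = sum(o.amount for o in destinations) + fee
    if spent > total_in:
        raise ValueError("inputs do not cover destinations plus fee")
    outputs = list(destinations)
    # Add the change output even when the change is zero, so a transaction that
    # spends the whole input value looks like one that returns change.
    outputs.append(Output(change_address, total_in - spent))
    # Extra padding for the edge case of no destinations at all.
    while len(outputs) < MIN_OUTPUTS:
        outputs.append(Output(change_address, 0))
    return outputs


# A churn: one input of 1000 sent back to ourselves, minus a fee of 10.
churn = build_outputs([Output("me", 990)], total_in=1000, fee=10, change_address="me")
print([o.amount for o in churn])  # [990, 0]
```

Because RingCT hides amounts on chain, an observer sees two outputs in both cases and cannot tell a zero-amount change output from a real one.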

Justin [00:03:27] Yeah. So, adding on to the one-output considerations: as you mentioned, the Monero software will not allow users to send transactions with only one output, for several reasons. First, if a user were doing a churn-type transaction, sending funds to themselves, it would often look like one input and one output, with the user sending the entire value of the input to themselves. And that’s ultimately the issue with one-output transactions: you know that the entire value of all the inputs, whether it’s one input or five inputs or ten inputs or however many, is sent to one output and therefore to one recipient, so you know that the entire value of all these inputs is spent. By requiring at least two outputs, you might still spend the entire value of all the inputs to one individual, but the transaction fits in with all the other transactions that have change. So even if you don’t have change, with a two-output minimum, even if one output has a zero amount, you’re able to hide among the normal transactions that do have change. That helps hide things like churn transactions, and it also helps make transactions look more homogeneous in general. So with Monero, the wallet minimum of two outputs is really necessary to prevent people from knowing that the entire value of all the inputs is used up in the transaction, rather than some of it being returned as change, and that’s a really important distinction. We also want to stress that even if certain cryptocurrency implementations have really strong privacy otherwise, this is generally a piece of metadata that is revealed even in those otherwise really good implementations.
So with Monero, suppose we had perfect ring signature privacy, say enormously large ring sizes: you would still know the number of inputs and the number of outputs in the transaction. So you still might infer that a transaction is likely a pool payout or an exchange batch withdrawal, because it pays 16 people and it’s not typical user behavior to pay sixteen people at a time. These sorts of things matter for many implementations, not just Monero but also other privacy implementations like Zcash that might otherwise have really good privacy properties; this metadata is usually leaked to others. So Sarang, is there anything else you want to cover on this point before I move on to showing some data on how people have spent Monero in terms of the number of inputs and outputs?

Sarang [00:06:18] Yeah. You said that one of the things we’d ideally like to do is make all transactions look as similar as possible from a metadata perspective. So if every single transaction were two-input, two-output or something like that, you might say, well, that’s great, all the transactions look pretty similar in that respect, so privacy must be improved, and from a certain perspective that is true. The way I like to look at this is to compare it with how Monero used to allow basically any ring size you wanted, down to some minimum.

Sarang [00:06:51] If you wanted a very large ring size, you could do that. But even after we chose to standardize on a fixed ring size, which is 11 right now, so that every ring for any input that’s spent must contain eleven total ring members, standardizing that part of transactions did not automatically solve all of the privacy woes folks may have had. It’s one aspect of transaction metadata. Fixing it is not necessarily sufficient to solve all the privacy-based use cases folks have, and this is another example of that. So we’d like to homogenize things as much as possible, but we know that it’s not necessarily enough. It’s also worth noting, with one-output transactions, that mathematically, if we did allow those, it might be possible for me to send Justin a transaction with one output, and then for Justin to send me a transaction back with one output, and under certain circumstances like that, additional data could actually be leaked. So besides the fact that it would reveal that the entire value being spent was sent to Justin, there are other concerns too. That’s why we like to require at least two outputs, either within the wallet software or maybe eventually as a consensus rule.
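As a rough illustration of the difference between a fixed parameter like ring size and the input and output counts, here is a hypothetical shape check in Python. It is not monerod’s validation code; the constants come from this episode (ring size 11, a two-output minimum, the 16-output Bulletproofs limit), and `check_tx_shape` is a made-up name.

```python
from typing import List

RING_SIZE = 11    # fixed ring size at the time of recording
MIN_OUTPUTS = 2   # two-output minimum (a wallet rule at the time, proposed for consensus)
MAX_OUTPUTS = 16  # output limit mentioned for Bulletproofs


def check_tx_shape(ring_sizes: List[int], n_outputs: int) -> List[str]:
    """Return human-readable rule violations; an empty list means the shape passes."""
    errors = []
    if not ring_sizes:
        errors.append("transaction has no inputs")
    for i, size in enumerate(ring_sizes):
        if size != RING_SIZE:
            errors.append(f"input {i}: ring size {size}, expected {RING_SIZE}")
    if n_outputs < MIN_OUTPUTS:
        errors.append(f"{n_outputs} output(s), minimum is {MIN_OUTPUTS}")
    if n_outputs > MAX_OUTPUTS:
        errors.append(f"{n_outputs} outputs, maximum is {MAX_OUTPUTS}")
    return errors


print(check_tx_shape([11, 11], 2))  # []
print(check_tx_shape([11], 1))      # ['1 output(s), minimum is 2']
```

A rule like this can pin ring size to a single value, but it can only bound the number of inputs and outputs; within those bounds the counts remain visible metadata.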

Justin [00:08:08] Yeah, that’s a really good point. The number of inputs and number of outputs is an important consideration, but it certainly is not the only one. You still need to worry about things like the general strength of your ring signature, timing analysis, and everything else we’ve talked about in previous episodes. Those don’t go away just because you’re following these best practices; this is one additional set of best practices to add to the rest of the pile. One other thing I want to add quickly before moving on: people will often try to look at several transactions in a row that match a specific payout structure. Mining pools are a somewhat mediocre example, because, as you mentioned in a previous episode, they often make their data public.

Justin [00:08:56] But let’s suppose we’re talking about a private mining pool that keeps things in-house and doesn’t share this information with others. These private mining pools will typically make 16-output transactions continuously. So it’s often the case that a mining pool will pay out 15 people, send one output back to itself as change, and then use that output in another 16-output transaction in the future. So you can look at the outputs generated by the pool and say: OK, I know the change to the pool will most likely be spent in another 16-output transaction, so I’m going to specifically look for future transactions that match this metadata, that have 16 outputs. Doing so can reduce the anonymity set further. This is an example of a heuristic you could use to learn more information about users. The same could apply to churning. Let’s say that for some reason you suspect a user will churn eight times. You’d need some way to come up with that information in the first place, of course, but let’s say that due to some outside knowledge you know they will churn eight times. You might then look for a path of eight one-input, two-output transactions in a row. There might not be thousands of paths that look like that; there might be only a dozen or so. So now your anonymity set is smaller, because someone has knowledge of how you’re trying to transact, and this metadata can be used to learn more information. So it’s not just a per-transaction issue. If someone has a good understanding of your general spending behavior, they can use this metadata to facilitate other investigations. That’s one of the more interesting things about metadata analysis: in Monero, if you have predictable, repeated behavior, it’s generally a problem.
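The shape-filtering heuristic described above can be sketched in a few lines of Python. This is a hypothetical, simplified model: `TxShape` and `candidates_by_shape` are made-up names, the transactions are toy data, and a real analysis would also need ring membership, timing, and fees. It only uses metadata that is public in every transaction: the input and output counts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TxShape:
    txid: str
    height: int      # block height
    n_inputs: int    # public: number of inputs (rings)
    n_outputs: int   # public: number of outputs


def candidates_by_shape(txs: List[TxShape], after_height: int,
                        n_inputs: Optional[int], n_outputs: int) -> List[TxShape]:
    """Keep later transactions whose public shape matches the expected pattern.
    n_inputs=None matches any input count."""
    return [t for t in txs
            if t.height > after_height
            and (n_inputs is None or t.n_inputs == n_inputs)
            and t.n_outputs == n_outputs]


# Toy data, made up for illustration.
txs = [
    TxShape("a", 101, 1, 2), TxShape("b", 102, 2, 2), TxShape("c", 103, 1, 16),
    TxShape("d", 104, 3, 16), TxShape("e", 99, 1, 16),
]
later = [t for t in txs if t.height > 100]
pool_next = candidates_by_shape(txs, after_height=100, n_inputs=None, n_outputs=16)
churn_hop = candidates_by_shape(txs, after_height=100, n_inputs=1, n_outputs=2)
print(len(later), [t.txid for t in pool_next], [t.txid for t in churn_hop])
# 4 ['c', 'd'] ['a']
```

Applying a filter like this at each hop of a suspected eight-step churn, together with ring membership and timing, would shrink the candidate set at every step, which is the anonymity-set reduction described above.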
And the metadata leaked by the number of inputs and outputs is another thing to consider when trying to avoid this sort of predictable behavior. I’m going to share my screen here and show some information collected from Monero’s transaction history. I have an Excel document pulled up here. This is information prepared by the Noncesense Research Lab, headed up by Isthmus and a few others. They scanned the Monero blockchain for this data, specifically for RingCT transactions. They have information on all outputs, but these are just outputs created in 2017 and later; it does not include Monero’s early outputs. We can see the distribution of inputs and the distribution of outputs, and they also looked at the number of one-output transactions, so we’ll look at all of those very briefly. You can see here that about 49% of Monero transactions have one input and 41% have two inputs. Only about 9% of transactions have any other number. The maximum number of inputs, at least at the time they prepared this data, was 1,187, so, just a fun fact for you, there was at least one transaction with a very large number of inputs. The key takeaway is that if you send a transaction with more than two inputs, you’re at least falling slightly outside the typical zone. If you send a transaction with only one or two inputs, you generally have a pretty big crowd to hide in. Of course, as I mentioned, your other transactions matter too. But in general, for any given transaction, it’s pretty typical to see one or two inputs. If you make a transaction with, let’s say, seventy-seven inputs, there are only one hundred transactions in Monero’s recent history that have this.
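To check how big a crowd a given input count hides in, you can compute its share of a histogram like the one in the spreadsheet. Here is a minimal Python sketch under stated assumptions: the counts are invented for illustration (they are not the Noncesense Research Lab data), and the 5% threshold for calling something typical is an arbitrary, hypothetical choice.

```python
from typing import Dict


def share(histogram: Dict[int, int], count: int) -> float:
    """Fraction of transactions whose input (or output) count equals `count`."""
    total = sum(histogram.values())
    return histogram.get(count, 0) / total if total else 0.0


def is_typical(histogram: Dict[int, int], count: int, threshold: float = 0.05) -> bool:
    """Hypothetical rule of thumb: 'typical' means at least `threshold` of all transactions."""
    return share(histogram, count) >= threshold


# Made-up counts that only echo the rough 49% / 41% split mentioned above.
inputs_hist = {1: 490, 2: 410, 3: 60, 4: 30, 77: 10}
for n in (1, 2, 77):
    print(n, round(share(inputs_hist, n), 2), is_typical(inputs_hist, n))
```

The same functions work for an output-count histogram; with real data, the two-output bucket would dominate even more strongly.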
So it’s much less frequent, and if you go down to where there are only, let’s say, five transactions with one hundred ninety-three inputs, then you might be revealing a bit more than you otherwise would. So that’s one thing to keep in mind: most people spend with only one or two inputs in Monero. If you look at the outputs, it’s even more stark. This is sorted by total transaction count. Ninety-one percent of Monero transactions, again RingCT transactions, have two outputs. So there’s an extraordinary skew here, where two outputs is vastly overrepresented compared to any other number. I also highlighted 16 here. Again, this is the maximum number of outputs you can have with Bulletproofs; any of the higher numbers were sent before Bulletproofs. You can see there are only 13,703 transactions, again as of when this was compiled, that used the maximum, which suggests that if you see a 16-output transaction, there’s a really good chance it’s related to an exchange or pool payout. So that helps you as an investigator learn a little more about the Monero blockchain too. Now, something that’s really annoying is that there are 2,522 transactions with only one output. As I mentioned, the Monero wallet software will not let you send these transactions, so these 2,522 transactions were sent with some custom wallet that someone created, and they were able to mine them themselves or have them confirmed in blocks. Going into the future, Sarang and I recommend that this should be a consensus requirement, so that these are not allowed at all. But as of right now there are 2,522 of them. We can actually look at their distribution; these are all of them here.
These are all the one-output transactions. All of this information is available publicly on their GitHub, and I have a link in the description. We can look at a histogram here. You can see that not many were created before April 2018, but from April through May and into July is when the vast majority of these one-output transactions were created. Around October 2018, fewer were created, so it looks like someone was really trying to test this around that period. There was also a Monero fork around this time. So it’s interesting: if you’re trying to investigate the Monero blockchain to see what people were doing over these periods, these one-output transactions certainly help shed some light. That’s one thing I found pretty interesting in what the Noncesense Research Lab, if I’m pronouncing that correctly, put together. Looking at the distribution of Monero inputs and outputs can be really useful to give you a basic understanding of whether what you’re doing is common among Monero network participants. And this may change over time. For example, you can’t send transactions with more than 16 outputs at this point, so the proportion of those with more than 16 outputs is decreasing over time. But it’s a really good way to see how people generally spend their funds. So Sarang, can you talk a little more about what this metadata means to people, and how concerned they should be about sending transactions with either a normal number of inputs and outputs or a very unusual number?
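A histogram like the one on screen can be produced by bucketing one-output transactions by month. Here is a minimal Python sketch, assuming you already have each transaction’s timestamp and output count; it does not parse the lab’s published files, whose format isn’t described here, and the sample records and helper names are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple


def one_output_by_month(txs: Iterable[Tuple[int, int]]) -> Counter:
    """Count one-output transactions per calendar month (UTC).

    `txs` yields (unix_timestamp, n_outputs) pairs; how you obtain them
    (a node, a block explorer, or a published dataset) is up to you.
    """
    counts: Counter = Counter()
    for timestamp, n_outputs in txs:
        if n_outputs == 1:
            month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
            counts[month] += 1
    return counts


def utc_ts(year: int, month: int, day: int) -> int:
    """Hypothetical helper: Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Toy records, made up for illustration.
sample = [(utc_ts(2018, 4, 15), 1), (utc_ts(2018, 4, 20), 2),
          (utc_ts(2018, 5, 1), 1), (utc_ts(2018, 5, 9), 1)]
for month, n in sorted(one_output_by_month(sample).items()):
    print(month, "#" * n)
```

Bucketing by month in UTC keeps the result independent of the analyst’s local time zone.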

Sarang [00:16:21] Right. So, as we’ve seen, typical transactions almost always have one or two inputs and then the minimum of two outputs. The number of inputs can change, of course, depending on the funds you’re trying to send. In any given transaction, you need enough funds in the combined outputs you’re spending to make the transaction balance. So there might be circumstances where you have a little bit of leftover change that needs to be pulled into a given transaction, which could increase the number of inputs. So if at all possible, use standard wallet software; the wallet will often try to make reasonable, smart decisions about how to set up transactions in this way, and, for example, require two or more outputs to avoid revealing that the entire value of the inputs has been spent. But even if you’re very careful about always having a reasonable number of inputs and outputs so your transaction can hide in the crowd, as we said, certain types of predictable behavior could make you stand out. If an adversary happens to know that the software you’re using, or your own behavior, often leads you to churn a certain number of times, that might imply a certain structure to your transactions that the adversary could look for in the big, broad Monero transaction graph, which basically reduces the size of the crowd you get to hide in. So ideally, spending behavior should be as similar to the crowd as possible and as unpredictable as possible.
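The point about input counts can be illustrated with a toy coin-selection routine that tries to cover the amount with one input, then two, before falling back to more. This is a hypothetical Python sketch, not Monero’s actual selection algorithm; `select_inputs` is a made-up name, and it treats the fee as already included in `target`, whereas in practice the fee grows with the number of inputs.

```python
from itertools import combinations
from typing import List, Optional


def select_inputs(available: List[int], target: int) -> Optional[List[int]]:
    """Pick as few inputs as possible (preferring 1, then 2) whose sum covers target.

    Amounts are in atomic units; `target` is assumed to already include the fee.
    """
    # 1) Smallest single output that covers the target on its own.
    singles = [a for a in available if a >= target]
    if singles:
        return [min(singles)]
    # 2) The pair with the smallest sufficient total (least leftover change).
    pairs = [p for p in combinations(available, 2) if sum(p) >= target]
    if pairs:
        return list(min(pairs, key=sum))
    # 3) Largest-first greedy fallback; this produces a less common input count.
    chosen, total = [], 0
    for a in sorted(available, reverse=True):
        chosen.append(a)
        total += a
        if total >= target:
            return chosen
    return None  # insufficient funds


print(select_inputs([5, 20, 50], 15))   # [20]
print(select_inputs([5, 10, 12], 20))   # [10, 12]
print(select_inputs([3, 3, 3, 3], 10))  # [3, 3, 3, 3]
```

The last call shows how many small leftover outputs force an unusual input count, which is the situation described above where pulling in small change makes a transaction stand out.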

Justin [00:17:50] All right, thanks. Besides those two things, are there any other recommendations you have for users?

Sarang [00:17:58] For a lot of users, and for most threat and use-case models, using standard Monero software as usual is typically a reasonable way to do things. If you’re concerned about targeted analysis directed at you, you might want to be very careful about, as you said, many different types of metadata. That could involve things like sending transactions around the same times, or from the same IP address every time; making sure you’re not using custom wallet software that might make poor decisions; or watching out for very regular, repeated churning behavior that always looks the same.

Justin [00:18:37] All right, thanks. Is there anything else you want to leave the listener with?

Sarang [00:18:42] I guess just that this is one particular kind of metadata that is absolutely not unique to Monero, but it’s something interesting to think about. It’s a type of metadata that is very difficult for us to homogenize. Ring size is very easy to mandate; we just say it’s a system parameter. But the reason most digital assets, Monero and others, don’t have fixed input-output structures is that this particular kind of metadata is very, very difficult to hide in a smart way. And attempts to make transactions more homogeneous could lead to behavior like very regular, repeated, predictable churn, which can itself leave a fingerprint. So for many threat models, standard use of the default Monero wallet client is just fine.

Justin [00:19:33] All right. Thank you so much, Sarang, for joining me today. Thanks everyone for watching this episode of Breaking Monero. Catch you in the next one.