Breaking Monero Episode 10: Public mining pools

This is a transcript of Breaking Monero Episode 10: Public mining pools

Published under CC-BY RyoRU

Justin: [00:00:00] Hello and welcome back to another episode of “Breaking Monero”. Today we are talking about public mining pool data. Public mining pools allow people to mine with the help of other people and share a block reward, but in doing so they often reveal a lot of information about the blocks that they mine and the transactions that they make. Now, this transparency is good for miners. Miners want to know when pools find blocks and when they send out their payments; of course they want to check to make sure the pool isn’t trying to rip them off in some sort of fashion. But this is generally bad from the perspective of the rest of the network, because the pool is giving a high level of visibility to the outputs that it handles, which ultimately could impact the privacy of other members of the network. Individuals can use the information from the blocks that pools mine and the transactions that they send in order to compile a list of outputs that the pool controls. And just like any other large output list that someone publishes, it could cause some damage, and in the case of large public pools this actually starts to become a large enough proportion of the total Monero outputs that we need to pay attention. So we’re back on today. Sarang, can you talk a little bit more about what the public pools make public and why this is important?

Sarang: [00:01:30] Yeah. So I mean, it really depends on what the pool’s doing. Of course, whenever any miner mines a block, whether they are part of a pool or mining solo doesn’t matter; the network doesn’t really care. The block contains a special transaction called the coinbase transaction, which generates the reward for the miner, or for the operator of the mining pool, whoever’s collecting for that group. It includes an output that is specially designated and specially identifiable as being the reward for the miner. They call that a coinbase output. If I’m a solo miner (I’m just mining on my own computer) and I get one of those, I think “hooray, I get to keep it, and later I can spend that in a transaction and do whatever the heck I want with it”. So there aren’t a whole lot of issues with that. If you’re a solo miner and you’re just spending stuff as normal, life’s pretty good. But if you’re part of a mining pool, the block reward goes to one address, because that’s the way the network is set up. And then that pool (the operator, whoever’s running it) will need to pay out shares, like you were saying, of the mining reward to the participating miners according to whatever type of algorithm they use to determine that. And that typically looks like a payout transaction. A payout transaction will often look like, you know, one, maybe a couple more inputs going into the transaction to be spent, and then a whole bunch of outputs going off to destinations that the network can’t identify. Of course these are one-time addresses, but we might be able to infer that they are destined for the individual miners who are participating in the pool. So when we occasionally see these types of transactions, which have maybe one or two inputs and a whole bunch of outputs, then we can probably identify that those outputs might be miner payouts.
Again, most transactions, where I’m just sending funds to make a purchase or something, may have a few different inputs but typically two outputs: the destination, whatever I’m actually spending the money on, and change that would be directed back to me. So these payout transactions definitely stand out on the network. The way the Monero transaction protocol is set up, ideally we can’t tell which inputs are being spent and where the outputs are destined for, but we can identify how many of them there are. I can definitely see in the transaction how many inputs are being spent and how many outputs are being generated. And of course the mining pool, like you’re saying, in the interest of transparency might also publish lists identifying which coinbase outputs are generated as rewards for that pool and which transactions contain payouts. So there is data that is on the chain already, which a clever or even not so clever adversary might try to glean some information from, and the pools themselves might publish some or all of that information for transparency on their end too.
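The input and output counts Sarang describes are visible on chain, so an observer could flag payout-shaped transactions with a very simple rule. The following is only a minimal sketch of that idea: the `TxShape` type, the function name, and the thresholds (at most 2 inputs, at least 10 outputs) are hypothetical values chosen for illustration, not figures from the episode or from any real analysis tool.

```python
from dataclasses import dataclass


@dataclass
class TxShape:
    """Publicly visible shape of a transaction (hypothetical helper type)."""
    num_inputs: int
    num_outputs: int


def looks_like_pool_payout(tx, max_inputs=2, min_outputs=10):
    """Heuristic guess only: few inputs and many outputs suggests a payout.

    The thresholds are illustrative assumptions; a match proves nothing.
    """
    return tx.num_inputs <= max_inputs and tx.num_outputs >= min_outputs


# A typical purchase: a few inputs, two outputs (destination plus change).
ordinary_purchase = TxShape(num_inputs=3, num_outputs=2)
# A payout-shaped transaction: one input, many outputs.
possible_payout = TxShape(num_inputs=1, num_outputs=16)

purchase_flagged = looks_like_pool_payout(ordinary_purchase)
payout_flagged = looks_like_pool_payout(possible_payout)
```

Under these assumed thresholds the ordinary purchase is not flagged and the many-output transaction is. This is a statistical guess about transaction shape, not proof of who sent the transaction.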

Justin: [00:04:07] Excellent, thanks Sarang. So to go over some of the goals that we have from a Monero network perspective. Number one, we want to make sure that the Monero ring size can account for these known outputs, or these heuristically dead outputs. That involves doing analysis to see what the ultimate impact is, what proportion of the outputs these are, and making sure that the ring size can meet these requirements. Because even if we go through a list of other mitigations, there can always be bad actors. We might come up with a list of best practices or a list of requirements for pools to follow, and they might just ignore them and purposely behave maliciously, or any other actor could try this behavior. So we need to make sure that the Monero ring size meets these requirements. Also, we want to keep the requirements on pools light and maintain as much transparency as possible. We know that from a miner’s perspective they want to know when blocks are being mined in order to keep the pools accountable. So the ideal solution of course would be one where pools could still share as much information as possible.

Justin: [00:05:17] Also, we want to keep the impact to the network as light as possible. We could recommend that everyone churns all the time, but that’s not necessarily good for the network’s health. Recommending that every time a pool finds a block they create six transactions, I think that’s not necessarily something that’s good for everyone overall. And then we also want to avoid wallet interaction. If wallets need to be scraping API data and other sorts of information, it gets burdensome for the user and it’s prone to being fed incorrect or malicious information. And ultimately the main goal is to preserve the integrity of the outputs. Pools can send transactions and they can publish information, but we want to make it so that in doing so we don’t mark the outputs that they control as known spent during these time periods. Ideally we want to make it so that if these outputs are included in other ring signatures, we can’t as an outside observer immediately exclude them as known bad.

Sarang: [00:06:23] And I think it’s worth noting too that with all this introduction stuff, it makes it sound like “my goodness, are public pools bad for the network, if there are all these things that we can identify and that can go wrong?”. Absolutely not. A lot of people do not mine on their own because, depending on the computational resources you have, you’ll have a lot of variability in when you’re going to get payouts, and that’s maybe not great for the individual miner. So public pools do reduce the variability for individual miners and in some sense can encourage people to mine who would otherwise not have mined. So there’s definitely a benefit to getting more miners on the network in this kind of organized way. But we do have to be careful that, whatever behavior we’re choosing, we make sure that it’s easy to do the right thing, and that pools that don’t do the right thing, just for economic or incentive-based reasons, aren’t harming the network.

Justin: [00:07:17] Thanks Sarang. So I am currently sharing my screen regarding some of the public mining pool data. I’m going over my DEF CON slides just so you can see them in a little more detail. We can divide the pool outputs into two main categories. We have the coinbase outputs. These are new outputs that they mine out of blocks. And then they have what I’m calling “payment outputs”. These are outputs that individuals send in transactions. And so we have to account for them differently. There are a number of strategies that we can use to approach this information. I’ve listed them under their respective category. There isn’t really a great option for these coinbase outputs. We can say pools should not publish a list, but of course that has its own set of downsides. We can say that they should churn secretly, but that has its own downsides. Ultimately users can mark the coinbase outputs, so they interact with a pool API, ask the pool “Hey, what blocks have you mined?”, and then they do not use those outputs, or they could just avoid coinbase outputs altogether. There’s certainly some debate within the Monero research community about what the best process is there. We’re not going to focus too much on these coinbase outputs, but it’s something that you should be aware of as an individual. Regarding the payments, this is something where pools have an option to make much less impact on the rest of the network, and I’m going to walk through how they can do that.

Justin: [00:08:45] So of course they could again not publish, but miners want to have this information published. You could also mark payment outputs. So pools might just publish a list of all the outputs they control and people would need to not use those. But of course that requires interaction, and it has a high potential for pools to feed malicious information to users. So we have this other solution, called a “modified input selection algorithm”, that I’m going to walk you through. What this simply does is adjust the outputs that pools use to send transactions, in a way that protects the integrity of the outputs that they’re receiving from these payment outputs, as I’m calling them. So let’s look at these really quick.

Justin: [00:09:29] So let’s say that a pool makes a payout transaction. In this case they’re only sending funds to two users, plus one output that is going back to themselves. This is a change output that is going back to the pool. Now the concern is: OK, these two individuals receive their payout, but let’s say the pool creates another transaction here where the only output that they control is obviously spent. By that I mean these other black outputs here are outputs that the pool does not control, and so you know that they’re not spent here. Since you have a list of all of the outputs the pool controls, you know that this is the actual output that was spent by the pool. So as a result you can mark all these other decoys as fake for this transaction. You know that they are decoys, and then you can say “OK, I know now that this output is spent in this specific transaction”. So as a result, if this output appears in any other transactions I know it is fake, and that could have a potential impact on the network if there are enough of those cases. However, suppose instead that the pool decides to select outputs for its ring from the transactions that it sends to the users. So it makes its payout to these users and of course it has this change output again, and it uses all three of these in a ring signature. Sure, you can still mark the black ones here as decoys because there’s no way for the pool to have actually spent those, but it theoretically could have spent any of these three, because you don’t know which of the outputs in this transaction was the change output. So as a result the change output here now looks the same as any other legitimate mining payout. And as a result we preserve the integrity of this output.
This output no longer needs to be marked by wallet clients. You can see that if we compare that to any other miner transaction, where a miner just sends funds that they received as a payout, it would look the same as a transaction that just uses this miner payout or the change output as a decoy. So that’s just walking through how the integrity of the output is preserved there. That’s some really basic information covering how public mining pool data, especially in regards to the payouts, can go through a sort of novel solution where there is no additional harm to the network. We can see that pools can still publish all the information that’s needed. They can just make a really small tweak in order to preserve the integrity of the payout outputs that they make. I thought that was pretty interesting when I had this revelation for these outputs.
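The difference between the two ring constructions Justin walks through can be sketched as a small elimination exercise from the observer's side. This is only an illustration under stated assumptions: the output labels are made up, and `pool_linked` stands for every output of the pool's published payout transaction (the two miner payouts and the change output), which the observer cannot tell apart. It is not the pool software's actual selection code.

```python
def candidate_real_spends(ring, pool_linked_outputs):
    """Ring members that an observer cannot rule out as the pool's real spend.

    Assumes the observer knows the transaction was sent by the pool, so any
    ring member not linked to the pool can be dismissed as a decoy.
    """
    return [member for member in ring if member in pool_linked_outputs]


# Outputs of the pool's earlier payout transaction (hypothetical labels).
pool_linked = {"payout_A", "payout_B", "change_C"}

# Naive selection: the change output is the only pool-linked ring member.
naive_ring = ["unrelated_1", "unrelated_2", "change_C", "unrelated_3"]

# Modified selection: the pool also draws decoys from its own payout outputs.
modified_ring = ["unrelated_1", "payout_A", "payout_B", "change_C"]

naive_candidates = candidate_real_spends(naive_ring, pool_linked)
modified_candidates = candidate_real_spends(modified_ring, pool_linked)
```

With the naive ring a single candidate remains, so the change output is identified as spent. With the modified ring three candidates remain, and the change output is no more suspicious than either miner payout.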

Justin: [00:12:25] So Sarang, can you talk a little bit more in depth about some of the analysis?

Sarang: [00:12:35] Yeah. A lot of the different kinds of analysis that we’ve been talking about in previous episodes, and may be talking about in future episodes, are about the idea of whether or not it’s known or suspected that a given output is spent or unspent. Obviously in assets like Bitcoin you can identify unspent transaction outputs and spent transaction outputs. That’s built into the way the transaction protocol works. But in Monero, ideally we don’t want to know if transaction outputs are unspent or spent, because then we can identify them as decoys or not decoys in particular transactions. So as Justin was mentioning, under certain circumstances, as you’re analyzing different transactions, you may be able to heuristically flag or remove certain ring members as decoys, or just try to see if you can guess which one might be spent. Of course these are heuristics, and absent other information there’s no way to tell for sure. But this basically exacerbates that. If we’re dealing with, for example, a chainsplit, which is something we’ve talked about before, then that, if handled improperly, can allow folks to flag outputs as being decoys or not. Or if there were to be, for example, a chain reaction caused by something else, then we know that can also allow people to flag certain outputs as decoys or not. So this fits into the broad idea that we’ve been hinting at and talking about throughout the whole series, which is that anything that allows us to identify or statistically guess if something is a decoy or not is not good. And it can fit in with these other forms of analysis too. So our goal is to make everything look uniformly the same. Of course this can be very tough to do; a lot of times it causes us to put a lot of other burdens and requirements on the chain.
So for example we could require that all transactions have a certain fixed number of outputs, even if they don’t need them. But that can cause a lot of bloat and work too. So it’s a contentious thing to do. And it is worth noting too that, unlike some of the other types of analysis where you can look at an output and provably know for sure whether it’s spent or not, using set theory or other kinds of graph analysis, the stuff that you’ve been talking about, Justin, is typically something that might be heuristically identifiable, which means that it’s a guess, a statistical guess based on certain things you suspect about transaction patterns or behavior.

Sarang: [00:14:54] So again, absent other information nothing could be proven about that, but it could allow the adversary to make a pretty good guess.

Justin: [00:15:02] So Sarang, what are the key takeaways for people who are watching this episode? Do people need to be concerned about these, and if so, what can they really do to protect themselves?

Sarang: [00:15:12] So in general, like you were saying, the idea of having a ring size that has gotten larger over time is kind of a layered approach, right? Increasing ring size is one thing that can be done to mitigate, but certainly not eliminate, certain forms of analysis that we’ve been talking about, including this one. The idea is that if for whatever reason you choose your ring member decoys poorly and they happen to be caught up in something involving a public pool or a chain split or something, then even if an adversary could flag one as “well, I bet that’s a decoy”, if your ring size is large enough and the effects of these analyses aren’t large enough, you still have the benefit of all the other decoys in your ring. So in general, increasing ring size over time is something that we do (other projects have different philosophies on the same ideas), and it’s something that can be done to mitigate. And of course we have a spent output tool, formerly called the “blackball tool”, but I don’t really like that word. It’s a tool that you can run: you can either download lists of spent outputs at your own risk or run it yourself, to make sure that your wallet avoids selecting outputs that are either provably dead or heuristically dead, pulling in other information about things like public pool outputs. Is this necessary for most people? I would argue probably not. If it’s something that you’re really concerned about based on your own personal risk model, it’s something you could certainly consider, and there are resources elsewhere on how to run it and how to use it safely. But it is also worth noting that this is a particularly tricky one. It falls into the specific question of what we do with coinbase outputs and things that we suspect relate to pools.
Now there are some proposals that say maybe we only deal with coinbase outputs in rings in certain ways, or when we’re picking decoys maybe we avoid coinbase outputs or use them only in particular patterns. And these are tricky because they all come with different tradeoffs and different consequences. And it’s definitely an area of very, very ongoing research on how we handle these outputs, both with default wallet behavior and at a more network-wide required level.
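Sarang's point about ring size as a layered defense can be made concrete by counting how many ring members are left once an observer discards the ones flagged as likely decoys. The sketch below is illustrative only: the ring size of 11 and the three flagged outputs are hypothetical numbers, and it assumes the flags are correct so that the real spend is never among the flagged members.

```python
def effective_ring_size(ring, flagged_as_decoys):
    """Count ring members an observer still cannot rule out.

    Assumes flagged members really are decoys (provably or heuristically
    dead), so the true spend always remains among the unflagged members.
    """
    remaining = [member for member in ring if member not in flagged_as_decoys]
    return len(remaining)


# Hypothetical ring with 11 members, labelled out_0 .. out_10.
ring = [f"out_{i}" for i in range(11)]

# Hypothetical members caught up in a pool list or a chain split.
flagged = {"out_2", "out_5", "out_9"}

remaining_size = effective_ring_size(ring, flagged)
```

In this example 8 of the 11 members remain plausible, so the sender still benefits from the other decoys. A smaller ring with the same three flagged members would leave far less cover.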

Sarang: [00:17:16] We don’t really have a solution right now that I think everyone is completely happy with but it’s something that we’re really really actively working on right now.

Justin: [00:17:24] And then I have one final note too, which is that the public mining pool data over time is a reliable source of information that attackers or observers can use in conjunction with other information they’re trying to find. So if an attacker is trying to do a chainsplit attack, let’s say they only get like 50 percent of the outputs over a certain time period to be observable from this chainsplit attack, but 20 percent of the outputs over that time period were observed as a result of the pool. Well, we certainly shouldn’t test these individually. A smart observer would be looking at the 50 percent and then the extra 20 percent, to do a test on about 70 percent of the outputs. So the reason that this pool information is so important is that it’s a recurring source of information that could help contribute to other forms of attack and make them more powerful, because it’s a constant, reliable source of information that people can use.
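Justin's arithmetic amounts to taking the union of two sets of known outputs. A minimal sketch follows, assuming a hypothetical period with 100 outputs in which the chainsplit and pool sets do not overlap (the "extra 20 percent" in his example); if the sets overlapped, the union would be smaller than the sum.

```python
def combined_known_fraction(total_outputs, *known_sets):
    """Fraction of outputs in a period exposed by any of the sources.

    Uses a set union so outputs exposed by more than one source are only
    counted once.
    """
    combined = set().union(*known_sets)
    return len(combined) / total_outputs


# Hypothetical period with 100 outputs, identified by index.
total = 100
chainsplit_known = set(range(0, 50))  # 50 percent exposed by a chainsplit
pool_known = set(range(50, 70))       # a further 20 percent exposed by pool data

combined_fraction = combined_known_fraction(total, chainsplit_known, pool_known)
```

Here the combined fraction comes to 0.7, matching the 70 percent in the example, which is why a steady feed of pool data makes occasional events like chainsplits more useful to an observer.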

Sarang: [00:18:30] Yeah. That’s a really good point. Of the various chain splits that we might tend to worry about, there are only very, very few that we’ve seen so far. But you’re right. Mining pools operate all the time, coinbase outputs are generated very reliably, and as a result mining payouts also happen fairly reliably. So you might not have to worry too much about chainsplits, and for the most part we tend not to. But you’re right: there’s a lot of data that will continue coming in from public pool outputs. So we’re working very hard to make sure that folks can continue to use the network safely without having to worry about these things.

Justin: [00:19:03] All right. Thanks. Any final comment?

Sarang: [00:19:07] It’s an active area of research, as with a lot of other forms of analysis. It’s something that’s very tricky. It’s something we are aware of, and we do acknowledge that it is a limitation of the way that Monero is structured. Part of it is just the ways that different users’ behavior can affect others, and some of it is how we decide to design our wallets and our transaction protocols. Our goal is to continue to make them better.

Justin: [00:19:30] All right. Thank you Sarang for joining us today. Thank you to the viewers for watching another episode of “Breaking Monero”. Take care.