DeathAndTaxes (Donator, Legendary; Activity: 1218, Merit: 1007; "Gerald Davis")
Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 04:06:47 PM (last edit: June 23, 2014, 05:19:29 PM) | #1



1) Block structure for DHT nodes would be changed such that it contains just the block header and tx hashes (TxId).

2) A new block message would be created which allows relaying this "reduced block" (also useful in reducing the orphan cost of blocks).

3) The full transactions would be stored in the DTT (the Distributed Transaction Table, i.e. the DHT of transactions). Individual nodes would store a subset of the DTT.

4) Current "full nodes" could be converted into DHT nodes by simply maintaining a 100% share of the DHT.
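A minimal sketch of what such a reduced block might hold (field names here are illustrative, not the actual Bitcoin wire format):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BlockHeader:
    # The 80-byte Bitcoin header, simplified to plain fields for illustration
    version: int
    prev_block: str   # hex hash of the previous block
    merkle_root: str  # commits to the full txn set, so TxIds suffice
    timestamp: int
    bits: int
    nonce: int

@dataclass
class ReducedBlock:
    header: BlockHeader
    txids: List[str]  # 32-byte txn hashes only; full bodies live in the DTT

    def size_estimate(self) -> int:
        # 80-byte header plus 32 bytes per TxId
        return 80 + 32 * len(self.txids)
```

Because the merkle root in the header commits to the TxIds, a node holding only this structure can still verify that a txn fetched from the DTT belongs in the block.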



The naive solution would be for individual nodes to drop all txns which aren't in their share and use the DTT as needed. This would be rather inefficient, as the probability of a node needing a particular txn (either internally or because it was requested by a peer) is not uniform across txns. Certain txns would always be maintained in the local cache; those would include:

a) Txns in "recent" blocks. In the event of a reorg, updating the UTXO requires knowing the details of the txns in the block being orphaned.

b) Txns in the UTXO. These are txns which have at least one unspent output. The UTXO is needed to validate new txns and new blocks.

c) Txns in the memory pool. This is the set of txns which are valid and known to the node but not yet included in a block. If block messages include txn hashes only, the memory pool will be needed to validate a block.

d) Txns involving the user's keys. This isn't a requirement, but a user has a vested interest in ensuring copies of his txns are maintained. If nothing else this may be psychologically useful in improving trust in the DHT concept.
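The four cache rules above could be expressed as a retention predicate. A sketch, where `RECENT_DEPTH`, `in_my_dht_share`, and the txn fields are illustrative names rather than anything from an actual implementation:

```python
RECENT_DEPTH = 144  # "one block day"; rule (a)

def keep_locally(tx, chain_height, my_keys, in_my_dht_share):
    """Decide whether a DHT node retains a txn in its local cache."""
    if tx["block_height"] is not None and chain_height - tx["block_height"] < RECENT_DEPTH:
        return True                       # (a) recent block: cheap reorg handling
    if tx["unspent_outputs"] > 0:
        return True                       # (b) part of the UTXO
    if tx["block_height"] is None:
        return True                       # (c) memory pool (unconfirmed)
    if any(k in tx["keys"] for k in my_keys):
        return True                       # (d) the user's own txns
    return in_my_dht_share(tx["txid"])    # otherwise only if it falls in our share
```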



Understand, however, that the integrity of txns comes from their txn hash (TxId), so a "badly optimized" DHT would have suboptimal performance but not reduced security.



Some rough numbers on the current storage requirements:

Full raw blockchain = ~18.9 GB

Undo chain = ~2.5 GB (can this be reduced? seems undo information past say the last 144 blocks would be of little value)

Blockchain Indexes = ~2.0 GB (I haven't decoded these but find it interesting it is so large relative to the blocks)

Chainstate (UTXO) = ~400 MB (compressed format contains unspent outputs & txn metadata only)

Memory Pool = <1MB (unconfirmed txns, 538 at time of post)

Total: ~23.8 GB



The current blockchain stats

Number of blocks: 307,394

Number of transactions: ~41,182,000

Number of unspent txns: 3,347,562 (txns with at least one unspent output)

Number of unspent outputs: 11,648,626



Breaking this down

Size of hash-only blocks: 1,320 MB (includes headers & txn hashes)

Size of the UTXO: 400 MB (unspent outputs only)

Size of the Unspent Txns in entirety: ~1,500 MB



So the local cache storage requirement for a DHT node would be ~1.7 GB (or 2.8 GB if the full txns of UTXO elements are retained). If we assume the average node maintains a 5% DHT share of the remaining txns (the bulk of the block bodies), that would be another ~1 GB. DHT nodes would also keep a local copy of the full memory pool, but that is a rounding error in storage terms.
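The arithmetic behind those estimates, using the figures above:

```python
# Figures from the stats above, in MB
full_chain   = 18_900   # full raw blockchain
hash_only    = 1_320    # headers + txn hashes ("reduced blocks")
utxo         = 400      # compressed UTXO
unspent_full = 1_500    # unspent txns stored in their entirety
share        = 0.05     # assumed average DHT share of the remaining txns

base_cache        = hash_only + utxo           # 1,720 MB, i.e. ~1.7 GB
with_full_unspent = hash_only + unspent_full   # 2,820 MB, i.e. ~2.8 GB
spent_bodies      = full_chain - unspent_full  # bulk of the block bodies
share_cost        = share * spent_bodies       # 870 MB, i.e. the "another 1 GB"
```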



This wouldn't reduce the bootstrap time (or amount of bandwidth used) for bootstrapping full nodes. As a full node you would still need to download 100% of the txns of 100% of the blocks to verify the longest chain is the longest valid chain. However, once bootstrapped, the ongoing resource requirements would be reduced. This would work best if the protocol contained a block message which just sent the block header & txn hashes. Currently when a full node receives and verifies a new block which extends the longest chain, it stores the full block and removes the now-spent txns from the UTXO. DHT nodes would instead record the reduced block (header & hashes only), then save their DHT share of the spent txns and discard the rest. To reduce the overhead from reorgs, and to provide better coverage for syncing nodes needing only the tip of the blockchain, it may make sense for DHT nodes to retain all txns for the most recent x blocks (144? one block-day?).
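A sketch of that block-handling flow (names are illustrative; UTXO bookkeeping and validation are omitted for brevity):

```python
def connect_block(block, state, in_my_dht_share, recent_depth=144):
    """Process a verified new block on a DHT node (sketch)."""
    txids = [tx["txid"] for tx in block["txns"]]
    state["reduced_blocks"].append((block["header"], txids))  # header + hashes only
    state["recent_blocks"].append(block["txns"])              # full bodies near the tip
    if len(state["recent_blocks"]) > recent_depth:
        # A block just left the "recent" window: keep only our DHT share
        # of its txns and discard the rest.
        for tx in state["recent_blocks"].pop(0):
            if in_my_dht_share(tx["txid"]):
                state["local_cache"][tx["txid"]] = tx
```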



If a structure like this were used for all nodes, then "full nodes" would simply be nodes participating in the txn DHT that retain a 100% copy of the full txn set. The best comparison would be to the status of a "seeder" in the BitTorrent protocol. Since all DHT nodes would retain a full copy of the UTXO, these nodes could still support SPV clients. SPV clients could actually support the network by retaining a portion of the txn set. Retaining older txns would be especially beneficial in that it could reduce the load on full nodes when the network bootstraps new full nodes.

Many times in the past people have suggested using a DHT (either by name or by reinventing the concept) to store the entire blockchain, but this breaks the trust-free nature of blockchain verification: to verify blocks a node must have the full block history. With the TxIds from the longest chain, however, you cannot be spoofed if you ask an arbitrary node for the details of a txn. To create a different txn with the same TxId would require a preimage attack on the hash, and that is considered infeasible as long as SHA-256 is cryptographically strong. This would allow txns to be stored in a distributed, trust-free manner using a DHT (Distributed Hash Table). The primary reason would be to reduce the minimum storage requirements of each node and allow nodes to support the network on a partial basis. Currently there is an all-or-nothing requirement: you are either a full node (cost is ~25 GB) or you do not help the network in any way (SPV nodes do not improve the security of the network). Hence a DHT of transactions, a Distributed Transaction Table (DTT). The required changes would not be significant and could be done in a backwards-compatible manner.

DeathAndTaxes
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 04:48:20 PM | #3

Quote from: onemorebtc on June 23, 2014, 04:11:50 PM: i really like that idea.

but how to ensure that really old transactions are available forever?

they are only needed by new upcoming nodes, which should mean most nodes don't even bother storing them anymore



The DHT protocol would assign nodes a subset based on the TxId, and thus old txns would be just as likely to have coverage as newer ones. So the question becomes "how can you be sure the DHT protocol will always retain a copy of any individual txn (regardless of age)?" In reality, as a fallback it would make sense for some nodes to still be "full nodes". They could be transparently compatible with the DHT nodes in that they use the same protocol but their local cache is always 100%.
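One simple illustrative share rule with this property (coverage uniform regardless of txn age) is a TxId prefix mapping. Real DHTs such as Kademlia use XOR distance, but any uniform mapping of TxId to nodes behaves the same way; `in_share` and its parameters are hypothetical:

```python
def in_share(txid_hex: str, node_id: int, num_shares: int = 16) -> bool:
    """A node keeps txns whose first TxId nibble maps to its share.
    Since TxIds are effectively random, each share covers ~1/num_shares
    of all txns, old and new alike."""
    return int(txid_hex[0], 16) % num_shares == node_id % num_shares
```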



There will always be a desire for some users to maintain a full copy of the blockchain. I don't think there will ever be a case where no full copy of the blockchain exists. The issue is that SPV nodes are (to borrow BitTorrent terminology) leechers, and not even leechers which still contribute; they would be download-only leechers. This creates a perverse set of disincentives. As the number of full nodes (as a % of total users) falls, the load on each full node increases, leading to fewer full nodes, which further increases the load on the remaining full nodes. The use of a TxDHT would hopefully increase the number of "almost full" (probably needs a better name) nodes. This provides robust support for SPV nodes.



Speaking of SPV nodes, this type of structure would also allow SPV nodes to support the network on a limited basis. SPV nodes would simply maintain a smaller DHT share, and they would only maintain txns which are "deep enough" in the blockchain (as they can't fully verify blocks). Remember the security of a txn comes from its hash; if that is broken, full nodes are compromised as well. This (with some other protocol enhancements) could even lead to SPV+ type nodes which maintain only the UTXO and can provide even more support to the network.



The raw spent txns make up the majority of the blockchain, and beyond initial bootstrapping even full nodes rarely "use" many of these txns more than once. There is very little need for all nodes to continually maintain a full copy of all txns, BUT it is also important that a node be able to obtain a copy of a given txn if needed. That makes spent txns a perfect use case for a DHT, but it could be expanded to include the UTXO as well. As the UTXO grows, individual outputs could be scored on the probability they will be spent in the near future (based on output value, dust limit, current fees, and age). Older txns would be dropped from the local cache of some DHT nodes. In the event that a 5-year-old, 1-satoshi output is used in a future block, those nodes could trustlessly request a copy of it from the DHT.

DeathAndTaxes
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 05:37:22 PM | #6

Quote from: 12648430 on June 23, 2014, 05:09:59 PM: The tx hash is a large proportion of the size of the tx itself, and many tx must be gathered from far and wide to assemble a single block for initial verification. Why not split up storage by block hashes?



That is an alternative, however a single verified block is of little use. A bootstrapping node will need to verify all txns before it is synced with the network. Optimally the node would download all block headers & txn hashes. It would then request entire sets of txns from DHT peers (i.e. request from a particular DHT peer all txns whose hashes fall in the prefix range 0x00e0 to 0x00ef). This would allow the node to use multiple peers (potentially dozens or hundreds if the bootstrapping peer has sufficient resources) to download the txn set in parallel.
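Splitting the TxId prefix space into per-peer ranges, as in the 0x00e0 to 0x00ef example, might look like this (a sketch, not a protocol definition):

```python
def prefix_ranges(num_peers: int, bits: int = 16):
    """Split the 16-bit TxId prefix space into contiguous ranges,
    one per peer, for parallel bootstrap requests."""
    total = 1 << bits
    step = total // num_peers
    ranges = []
    for i in range(num_peers):
        lo = i * step
        # Last peer absorbs any remainder so the whole space is covered
        hi = total - 1 if i == num_peers - 1 else lo + step - 1
        ranges.append((f"0x{lo:04x}", f"0x{hi:04x}"))
    return ranges
```

A bootstrapping node would issue one "give me all txns with hashes in this range" request per peer and verify each returned txn against the TxIds it already holds.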



I do agree that txn hashes add some overhead. If we look at the full blockchain, the average txn is 461 bytes. This means the txn hash set is ~7% of the size of the full blockchain, i.e. ~7% additional overhead. It isn't necessary for txn hashes and block hashes to have the same length. A hash collision for bitcoin txns is not useful; an attacker would need a preimage attack. This means even 160-bit or 128-bit hashes would provide sufficient security, and that would reduce the overhead to 3% to 4%. I doubt Bitcoin will be changing to smaller txn hashes, but it could be done in a backwards-compatible manner. Still, this is something for altcoins to keep in mind.
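Checking those overhead figures against the 461-byte average txn:

```python
avg_txn_bytes = 461  # average txn size over the full chain, per the post

for hash_bits in (256, 160, 128):
    overhead = (hash_bits // 8) / avg_txn_bytes
    print(f"{hash_bits}-bit TxId: {overhead:.1%} of the raw chain size")
# 256-bit: ~6.9% (the ~7% above); 160-bit: ~4.3%; 128-bit: ~3.5%
```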


onemorebtc (Sr. Member; Activity: 266, Merit: 250)
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 05:49:01 PM | #7

Quote from: DeathAndTaxes on June 23, 2014, 04:48:20 PM
Quote from: onemorebtc on June 23, 2014, 04:11:50 PM: i really like that idea.

but how to ensure that really old transactions are available forever?

they are only needed by new upcoming nodes, which should mean most nodes don't even bother storing them anymore



The DHT protocol would assign nodes a subset based on the TxId and thus old txns would be just as likely to have coverage as newer ones. So the question becomes "how can you be sure the DHT protocol will always retain a copy of any individual txn (regardless of age). In reality as a fallback it would make sense for some nodes to still be "full nodes". They could be transparently compatible with the DHT nodes in that they use the same protocol but their local cache is always 100%.




I don't think you can enforce that on a protocol level. What if many nodes just decide(!) to keep only the UTXO and throw away the rest?

This could lead to archive nodes which would get paid by new companies (miners, "banks"?) trying to enter the market, closing everybody else out.



Quote from: DeathAndTaxes on June 23, 2014, 04:48:20 PM: There will always be a desire for some users to maintain a full copy of the blockchain. I don't think there will ever be a case where no full copy of the blockchain exists. The issue is that SPV nodes are (to borrow BitTorrent terminology) leechers, and not even leechers which still contribute; they would be download-only leechers. This creates a perverse set of disincentives. As the number of full nodes (as a % of total users) falls, the load on each full node increases, leading to fewer full nodes, which further increases the load on the remaining full nodes. The use of a TxDHT would hopefully increase the number of "almost full" (probably needs a better name) nodes. This provides robust support for SPV nodes.





I do understand your point and absolutely feel the same. A blockchain with only TxIds has some charm.



But I don't like phrases like "I don't believe there ever will be no copy of the blockchain".



Quote from: DeathAndTaxes on June 23, 2014, 04:48:20 PM

Speaking of SPV nodes, this type of structure would also allow SPV nodes to support the network on a limited basis. SPV nodes would simply maintain a smaller DHT share and they would only maintain txns which are "deep enough" in the blockchain (as they can't fully verify blocks). Remember the security of a txn comes from its hash and if that is broken well full nodes are compromised as well. This (with some other protocol enhancements) could even lead to SPV+ type nodes which maintain only the UTXO and can provide even more support to the network.



The raw spent txns make up the majority of the blockchain and beyond initial bootstrapping even full nodes rarely "use" many of these txns more than once. There is very little need for all nodes to continually maintain a full copy of all txns BUT it is also important that a node be able to obtain a copy of a given txn if needed. That makes storing spent txns in a DHT a perfect use case for a DHT but it could be expanded to include the UTXO as well. As the UTXO grows, individual outputs could be scored based on the probability it will be used in the near future (based on txn output, dust limit, current fees, and age). Older txns would be dropped from the local cache of some DHT nodes. In the event that 5 year old, 1 satoshi output is used in a future block, those nodes could trustlessly request a copy of it from the DHT.



++1

DeathAndTaxes
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 05:59:18 PM | #8

Quote from: Sukrim on June 23, 2014, 05:22:23 PM: I have a suspicion that transactions are not uniformly queried; especially when new blocks are published there is a good chance the DHT nodes holding these transactions might be DDoSed...

Good points to consider but I believe they can be circumvented.



First, for efficiency reasons it would make sense for all DHT nodes to retain a "full local" cache of recent blocks. How "recent" is recent probably needs some research, but the goal would be to reduce the overhead in the event of reorgs. I think keeping "one block-day" (144 blocks) of the tip of the blockchain would be a good compromise between storage and efficiency. For similar reasons DHT nodes would also retain the full UTXO and the full memory pool. It would not be possible to DDoS the propagation of new blocks (or at least no easier than doing so currently with full nodes), as all DHT nodes would have a copy of the UTXO and memory pool, meaning they would not need to rely on the DHT to validate new blocks.



In theory one could DDoS the bootstrapping of new nodes by blocking a subset of the nodes in the network. One would need to DDoS all nodes which contain that subset, plus all full nodes. Remember, full nodes could operate transparently on the DHT network by simply being nodes whose DHT share is 100%. One way to look at it is using a BitTorrent definition. BitTorrent measures the availability of a torrent by the number of complete copies available (including all partial subsets combined), so 12.8 would mean that, combined, all seeds and peers have 12.8 copies. Bitcoin right now only has "seeds", so the availability is the sum of shares over all nodes: 1.00 × num_full_nodes. Under a DHT system the availability would be average_dht_share × num_dht_nodes. While the frequency at which an individual txn is queried will vary, the point of a DHT is to distribute that load uniformly. A given txn may be queried at a higher frequency, but as an example, txns with a TxId beginning with "e" are probably not queried significantly more or less than txns beginning with "f".
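The BitTorrent-style availability figure is simply the sum of shares across nodes:

```python
def availability(shares):
    """Aggregate number of complete txn-set copies: the sum of each
    node's DHT share (a full node has share 1.0)."""
    return sum(shares)

# Today's model: 10,000 full nodes, i.e. 10,000 "seeds"
full_only = availability([1.0] * 10_000)
# The same aggregate availability from 200,000 nodes each holding a 5% share
dht_mix = availability([0.05] * 200_000)
```

The decentralization argument in the thread is exactly this equality: what matters for data survival is the aggregate share count, not whether each contributor holds 100%.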



Quote: Also this could have privacy implications (DHT nodes knowing who queries for which TXIDs).



For full nodes bootstrapping from the DHT there is no privacy concern. For SPV nodes there are potential privacy concerns however there are already privacy concerns with SPV nodes.


tl121 (Sr. Member; Activity: 278, Merit: 251)
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 06:00:02 PM | #9

Have you analyzed the impact on required bandwidth, including the overhead of the added mechanisms?



I found the cost of storing the complete blockchain to be less than $2.00 in disk storage, and this is a one-time cost that's good for the life of a hard drive. Unfortunately I pay many times this much each month for DSL Internet service that has marginal uplink bandwidth for running a full node. For me, at least, disk space is effectively free compared to bandwidth.





DeathAndTaxes
Re: Using a DHT to reduce the resource requirements of full nodes
June 23, 2014, 06:07:56 PM | #10

Quote from: onemorebtc on June 23, 2014, 05:49:01 PM: I don't think you can enforce that on a protocol level. What if many nodes just decide(!) to keep only the UTXO and throw away the rest?

This could lead to archive nodes which would get paid by new companies (miners, "banks"?) trying to enter the market, closing everybody else out.

That could happen right now. Nothing requires a bitcoin user to run a full node. Exchanges, eWallets, lite clients, and SPV nodes all make it possible for a user to not support the network. Still, forming a cartel on freely available information is unlikely. As long as there is a single copy, there is a free alternative to the cartel.



Say today there are 10,000 full nodes. That means there are 10,000 independent copies of the txn set. Now imagine those 10,000 nodes were replaced with x DHT nodes that combined had 10,000 complete copies of the txn set. From a decentralization standpoint nothing has changed. I would argue that lowering the requirements for helping the network will increase the number of users willing to help.



Today you have two options

a) run a full node (and not a KB less)

OR

b) use an SPV client or other method which doesn't support the network at all



Under a DHT model you would still have the exact same two options plus

c) run a DHT node and maintain an independent copy of some portion (>0% and <100%) of the txn set. Plus a complete copy of the UTXO and a complete copy of the reduced blocks.



Quote from: onemorebtc: But I don't like phrases like "I don't believe there ever will be no copy of the blockchain".

Well, you PERSONALLY can ensure that is always true by always maintaining a full copy. A DHT network doesn't change that. If someone under a DHT model is unwilling to support the network by carrying part of the blockchain, they are probably even less likely to support the network by running a full node. The same risk applies either way, and if you feel it is a credible risk then I wouldn't be holding any bitcoins, as the protocol will fail if there is ever a point in time where no complete (decentralized or otherwise) copy of the blockchain exists.


DeathAndTaxes — Re: Using a DHT to reduce the resource requirements of full nodes. June 23, 2014, 06:33:12 PM

Last edit: June 23, 2014, 07:02:19 PM by DeathAndTaxes #11

DHT nodes would store the block header and the set of txn hashes. As originally proposed this would require a new message for relaying a reduced block. An alternative which may work better would be a new inventory type, merkletree. This would allow DHT nodes to use the existing getheaders message and then request the txn hashes with a subsequent message (getmerkletrees) after the longest chain has been downloaded and verified. Support for this does not need to be limited to DHT nodes; full nodes could be updated to support it as well, and it would be optimal if they did. This would have the added advantage of reducing new-block propagation time.
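A node receiving such a merkletree inventory can verify the txn hashes against the merkle root committed in the already-verified block header. A sketch of the Bitcoin-style computation (pairwise double SHA-256, duplicating the last hash at odd levels; note that real implementations also deal with internal little-endian byte order, which is omitted here):

```python
import hashlib

def dsha256(b: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(txids: list[bytes]) -> bytes:
    """Fold a list of txn hashes up to a single merkle root: pair up
    hashes, duplicating the last entry when a level has an odd count,
    until one hash remains."""
    assert txids, "a block always contains at least a coinbase txn"
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

If the computed root matches the header, the txn hash set is authentic, so a badly behaved DHT peer cannot feed a node a forged reduced block.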



Ultimately the distinction of "full node" could be deprecated. Full nodes and DHT nodes would simply be "nodes" which identify what subset of the txn set they retain. Some nodes would retain less than 100% of the "archive txn set", some would retain 100%, and some (SPV+ nodes) would retain 0% (though they would retain block headers and potentially merkle trees). However, for a bootstrapping peer there is little difference between a single node which has the full txn set and a set of nodes which individually do not but combined do. To avoid overloading a single node, bootstrapping peers would optimally use multiple nodes anyway, even when they have access to a node which has all the necessary information.
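If nodes advertise which share of the txn keyspace they retain, a bootstrapping peer only needs to pick a peer set whose shares jointly cover the whole keyspace. A toy check of that condition, treating each advertised share as a (start, size) fraction of the keyspace (this is a hypothetical illustration; how shares would actually be advertised is not specified in the proposal, and wraparound shares are ignored for simplicity):

```python
def covers_keyspace(shares: list[tuple[float, float]]) -> bool:
    """Return True if a set of (start, size) DHT shares, each a fraction
    of the keyspace [0, 1), jointly cover every point of it."""
    intervals = sorted((start, start + size) for start, size in shares)
    reach = 0.0  # rightmost point covered so far
    for start, end in intervals:
        if start > reach:      # a gap before this share begins
            return False
        reach = max(reach, end)
    return reach >= 1.0
```

A single node with share (0.0, 1.0) — today's full node — trivially covers everything; so do, say, ten nodes each carrying an overlapping 15% slice.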




DeathAndTaxes — Re: Using a DHT to reduce the resource requirements of full nodes. June 25, 2014, 07:18:12 PM

Last edit: June 25, 2014, 07:46:25 PM by DeathAndTaxes #13

Quote from jl2012 on June 25, 2014, 04:43:46 PM: "The major problem of running a full node is bandwidth, not storage. 23.8GB is nothing for modern harddrives, and is affordable even with SSD."



Agreed, and maybe this would be a non-issue if full nodes just had better peer selection and bandwidth throttling features. While a DHT can't reduce the network's total bandwidth requirement (it would actually increase it by some percentage), the bandwidth requirement of each node could be reduced if there are more nodes. Today you have a very binary choice: either you run a full node at a cost of ~25GB of storage plus potentially high bandwidth requirements, or you use a lite client and don't support the network at all. My assumption is that there are some users who would be willing to support the network to a limited degree.



One thing which may make this clearer is an abstract, simplified view of syncing peers. Let's put all nodes into one of two categories: they are either currently up to date, or they are new nodes which need to bootstrap from the genesis block. For a group of "current" peers, the cost per node to remain current does not depend on the number of peers; it depends on the rate of new information (new txns) and the protocol's efficiency in distributing that information. The cost to bootstrap a single new node depends on the number of peers that can assist and on the protocol efficiency.



So let's ignore protocol efficiency for a second and look at the raw data which needs to be propagated. A new 1MB block every 600 seconds is about 13 kbps: excluding protocol overhead, the network is adding 13 kilobits of new information every second. The average node will need to both receive and send that new information, so we are looking at 26 kbps plus protocol overhead. That is just the per-node cost for a group of nodes to remain current. Now say one new node joins the group. The new node will need to receive 20GB of information; to do that in one day it must download about 230 KB/s (roughly 1.9 Mbps). This load is amortized over the number of nodes which can assist in that process. Depending on the ratio of bootstrapping nodes to nodes which can assist, it could potentially be higher than the cost of remaining synced.
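The arithmetic above can be checked directly (assuming 1 MB = 1,000,000 bytes and 20 GB = 20×10⁹ bytes):

```python
# Steady state: one 1 MB block every 600 seconds.
block_bits = 1_000_000 * 8
new_info_bps = block_bits / 600        # ~13,333 bps, i.e. ~13 kbps of new data
relay_bps = 2 * new_info_bps           # receive + send: ~26 kbps per current node

# Bootstrap: a new node pulling 20 GB of history in one day.
bootstrap_Bps = 20e9 / 86_400          # ~231,000 bytes/s, i.e. ~230 KB/s (~1.9 Mbps)

# Amortized over n helper nodes, each helper uploads bootstrap_Bps / n.
per_helper_Bps = bootstrap_Bps / 100   # e.g. 100 helpers -> ~2.3 KB/s each
```

Note the units: the steady-state figures are kilobits per second, while the one-day bootstrap works out to roughly 230 kilobytes per second, which is why bootstrapping load, not steady-state relay, dominates the bandwidth question.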



You are right that if a DHT were used and it didn't increase the number of nodes at all, it would increase rather than decrease the per-node bandwidth cost. However, if the use of a DHT leads to more nodes that can assist in bootstrapping new nodes, then the per-node bandwidth cost would be reduced.



So it boils down to this question:

Would reducing the storage requirements of a node by 80% to 90%, combined with smarter load balancing of peers and bandwidth limiting options, significantly increase the number of nodes? If the answer is no, then nothing is gained. If the answer is yes, then security can be improved and the per-node bandwidth cost can be reduced.
