Pieter Wuille








Activity: 1064

Merit: 1039







Legendary

Ultraprune merged in mainline
October 20, 2012, 10:37:51 PM #1



I've just merged my "ultraprune" branch into mainline (including Mike's LevelDB work). This is a very significant change, and all testing is certainly welcome. As a result of this, many pull requests probably don't apply cleanly anymore. If you need help rebasing them on the new structure, ask me.



The idea behind ultraprune is to use an ultra-pruned copy of the block chain (only unspent transaction outputs, in a custom compact format) for validation, as opposed to a transaction index into the block chain. It still keeps all blocks around for serving them to other nodes, for rescanning, and for reorganisations. As such, it is still a full node. So, despite the name, it does not implement any actual pruning yet, though pruning would be trivial to implement now. This would have profound effects on the network though, so it may still need some discussion first.
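To make the idea concrete, here is a minimal Python sketch of validating against a set of unspent outputs instead of a transaction index. It assumes a heavily simplified, hypothetical data model (`Tx`, `TxOut` and `UtxoSet` are illustrative names, not Bitcoin Core's classes) and skips scripts, signatures and most consensus rules. The point is only that a spent output is deleted outright, so checking an input is a single lookup.

```python
from dataclasses import dataclass

# Hypothetical, simplified model: real transactions also carry scripts,
# signatures, locktimes, etc., none of which are checked here.
Outpoint = tuple[str, int]  # (txid, output index)


@dataclass(frozen=True)
class TxOut:
    value: int            # amount in satoshis
    script_pubkey: bytes  # treated as opaque here


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple[Outpoint, ...]   # outpoints being spent
    outputs: tuple[TxOut, ...]


class UtxoSet:
    """Holds only unspent outputs; a spent output is deleted outright."""

    def __init__(self) -> None:
        self.coins: dict[Outpoint, TxOut] = {}

    def apply(self, tx: Tx) -> None:
        if len(set(tx.inputs)) != len(tx.inputs):
            raise ValueError("transaction spends the same outpoint twice")
        # Validating an input is one lookup: is this output still unspent?
        missing = [op for op in tx.inputs if op not in self.coins]
        if missing:
            raise ValueError(f"missing or already spent inputs: {missing}")
        value_in = sum(self.coins[op].value for op in tx.inputs)
        value_out = sum(out.value for out in tx.outputs)
        # A transaction with no inputs is treated as a coinbase-like issuance.
        if tx.inputs and value_out > value_in:
            raise ValueError("outputs exceed inputs")
        for op in tx.inputs:
            del self.coins[op]
        for i, out in enumerate(tx.outputs):
            self.coins[(tx.txid, i)] = out


if __name__ == "__main__":
    utxos = UtxoSet()
    utxos.apply(Tx("a", (), (TxOut(50, b"alice"),)))  # coinbase-like issuance
    utxos.apply(Tx("b", (("a", 0),), (TxOut(30, b"bob"), TxOut(20, b"alice"))))
    print(sorted(utxos.coins))  # [('b', 0), ('b', 1)]
```

Spending `("a", 0)` a second time would now raise `ValueError`, because the output no longer exists anywhere in the set.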



A small summary of the changes:

* Instead of blk000?.dat, we have blocks/blk000??.dat files of max 128 MiB, pre-allocated in 16 MiB chunks.

* Instead of a Berkeley DB blkindex.dat, we have a LevelDB directory blktree/. This only contains a block index, no transaction index.

* A new LevelDB directory coins/ contains data about the current unspent transaction output set.

* New files blocks/rev000??.dat contain undo data for blocks (necessary for reorganisation).

* More information is kept about blocks and block files, to facilitate pruning in the future and to prepare for a headers-first mode.

* Two new RPC calls are added: gettxout and gettxoutsetinfo.
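The undo files are needed because the coin set alone cannot be rolled back: once a block is connected, the outputs it spent are gone from coins/. Below is a hedged sketch of the idea using a hypothetical in-memory layout (coins as a dict from `(txid, index)` to value, blocks as lists of `(txid, inputs, output values)`), not the actual rev000??.dat format. Connecting a block records what each transaction spent; disconnecting walks the block backwards, deleting created outputs and restoring spent ones.

```python
# Hypothetical in-memory model, not the real coins/ or rev*.dat formats.
Outpoint = tuple[str, int]                           # (txid, output index)
Coins = dict[Outpoint, int]                          # unspent outpoint -> value
Block = list[tuple[str, list[Outpoint], list[int]]]  # (txid, inputs, output values)
Undo = list[list[tuple[Outpoint, int]]]              # per tx: the outputs it spent


def connect_block(coins: Coins, block: Block) -> Undo:
    """Apply a block to the coin set and return the undo data for it.

    Sketch only: it mutates as it goes, so a failure midway leaves the set
    partially updated; a real implementation would validate first.
    """
    undo: Undo = []
    for txid, inputs, outputs in block:
        # pop() raises KeyError if an input is missing or already spent.
        spent = [(op, coins.pop(op)) for op in inputs]
        undo.append(spent)
        for i, value in enumerate(outputs):
            coins[(txid, i)] = value
    return undo


def disconnect_block(coins: Coins, block: Block, undo: Undo) -> None:
    """Reverse connect_block, e.g. during a reorganisation."""
    for (txid, _inputs, outputs), spent in zip(reversed(block), reversed(undo)):
        for i in range(len(outputs)):
            del coins[(txid, i)]   # drop the outputs this tx created
        for op, value in spent:
            coins[op] = value      # restore the outputs this tx spent
```

Walking the transactions in reverse matters when a later transaction in the same block spends an output created earlier in that block.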

The most noticeable change should be performance: LevelDB deals much better with slow I/O than BDB does, and the working set size for validation is an order of magnitude smaller. In the longer run, I think it is an evolution towards separation between validation nodes and archive nodes, which is needed in my opinion.

(copy of mailing list post)

I do Bitcoin stuff.


kokjo






Activity: 1050

Merit: 1000



You are WRONG!







Legendary

Re: Ultraprune merged in mainline
October 21, 2012, 09:53:11 AM #9

testing it now!!

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell

Pieter Wuille








Activity: 1064

Merit: 1039







Legendary

Re: Ultraprune merged in mainline
October 21, 2012, 03:01:46 PM #13



An answer to MysteryMiner, who asked in another thread:

Quote from: MysteryMiner on October 21, 2012, 02:38:34 PM
One more question - will the new database discard spent addresses? Some places say it will, some say it will not. I am confused. What will happen to clients that rely on downloading the complete transaction history and verify all blocks and transactions in them on-the-way, like 0.3.xx does?



The current code does not prune anything - it uses a pruned copy (in addition to the blockchain itself) for validation. Since this copy is much smaller, far less data needs to be accessed during block and transaction validation (it's around 120 MB right now). This makes it faster to validate and to update the database.
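Part of what keeps that copy small is a compact encoding. As one illustration, here is a sketch of a base-128 variable-length integer in the style of the VARINT serialization found in Bitcoin Core's serialize.h; whether the coins/ format of this particular merge used exactly this encoding is an assumption here. Subtracting 1 at each continuation step makes every value's encoding unique.

```python
def write_varint(n: int) -> bytes:
    """MSB-first base-128 encoding with an offset, so each value has one encoding."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = [n & 0x7F]             # final byte: continuation bit clear
    while n > 0x7F:
        n = (n >> 7) - 1         # the -1 removes redundant encodings
        out.append((n & 0x7F) | 0x80)
    return bytes(reversed(out))  # most significant group first


def read_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one VARINT starting at data[pos]; return (value, next position)."""
    n = 0
    while True:
        b = data[pos]
        pos += 1
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:
            n += 1               # undo the -1 applied while encoding
        else:
            return n, pos
```

For example, `write_varint(128)` gives `b'\x80\x00'` and `write_varint(65535)` gives `b'\x82\xfe\x7f'`; any value below 128 takes a single byte instead of a fixed 8.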



Also, Bitcoin at the protocol level does not know anything about addresses or balances - those are client-side things provided by the wallet abstractions. What we're talking about is removing individual transaction outputs that have been spent.
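Since the protocol only knows outputs, a "balance" is something a wallet computes. A minimal sketch, assuming a hypothetical coin set that maps `(txid, index)` to `(value, script)` and a wallet that knows which output scripts it can spend; real wallets also track unconfirmed transactions, which this ignores.

```python
# Hypothetical coin-set layout: (txid, index) -> (value in satoshis, output script).
Coins = dict[tuple[str, int], tuple[int, bytes]]


def wallet_balance(coins: Coins, my_scripts: set[bytes]) -> int:
    """Sum the unspent outputs whose scripts this wallet can spend.

    Nothing at the protocol level stores this number; it is derived locally.
    """
    return sum(value for value, script in coins.values() if script in my_scripts)
```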



At some later point in time we may add actual pruning, by removing blocks (but not their unspent outputs in the pruned copy) that are old enough. This will imply they cannot be served to other nodes, cannot be rescanned, and cannot be reorganised away. Clearly not everyone in the network can do this, as that would mean new nodes cannot bootstrap anymore. This is why I believe in a move towards validation nodes and archive nodes.
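Actual pruning would then touch only raw block data, never the coin set. A sketch under stated assumptions: block files are described by a hypothetical `BlockFile` record holding the highest block height they contain, and `keep_depth` is an illustrative retention parameter, not an existing option. A file becomes deletable once all its blocks are at least `keep_depth` below the tip; anything older can no longer be served, rescanned, or reorganised away.

```python
from dataclasses import dataclass


@dataclass
class BlockFile:
    name: str         # e.g. "blk00000.dat" (illustrative)
    max_height: int   # highest block height stored in this file


def prunable_files(files: list[BlockFile], tip_height: int, keep_depth: int) -> list[str]:
    """Names of block files whose every block is >= keep_depth below the tip.

    The coin set is left untouched: validation of new blocks still works,
    only the raw historical blocks in these files would be deleted.
    """
    cutoff = tip_height - keep_depth
    return [f.name for f in files if f.max_height <= cutoff]
```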



Also, Bitcoin is a zero-trust system (at least full nodes are). This means that no data received from the network is ever taken for granted; all of it needs validation. This implies you can't ever bootstrap a (zero-trust) node without having it validate the entire block chain (although it is not necessary that everyone keeps that data around forever).
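Zero trust means the only thing a new node takes for granted is the genesis block it ships with; everything after it is checked in order. The sketch below uses a toy block format (a block is `(prev_hash, body)` and its hash is the double SHA-256 of their concatenation), which is an assumption and not Bitcoin's 80-byte header, plus a hypothetical `check_body` callback standing in for proof-of-work, transaction and script validation.

```python
import hashlib
from typing import Callable


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as Bitcoin uses for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bootstrap(blocks: list[tuple[bytes, bytes]], genesis_hash: bytes,
              check_body: Callable[[bytes], bool]) -> bytes:
    """Validate a chain from genesis onward and return the tip's hash.

    Toy format (an assumption): a block is (prev_hash, body) and its hash is
    sha256d(prev_hash + body). Only genesis_hash is trusted up front.
    """
    if not blocks:
        raise ValueError("no blocks to validate")
    tip = b""
    for height, (prev_hash, body) in enumerate(blocks):
        block_hash = sha256d(prev_hash + body)
        if height == 0:
            if block_hash != genesis_hash:
                raise ValueError("first block is not the expected genesis block")
        elif prev_hash != tip:
            raise ValueError(f"block {height} does not build on block {height - 1}")
        if not check_body(body):
            raise ValueError(f"block {height} failed validation")
        tip = block_hash
    return tip
```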

casascius

VIP

Legendary






Activity: 1386

Merit: 1064





The Casascius 1oz 10BTC Silver Round (w/ Gold B)







Mike Caldwell

Re: Ultraprune merged in mainline
October 21, 2012, 03:47:24 PM #14
Last edit: October 21, 2012, 04:00:39 PM by casascius







Here's how it ought to work in my mind:

The user ought to have a simple way to decide what he wants to contribute to the network, with the default being something that ensures the user remains a "full citizen node", but perhaps without automatically seeding large amounts of history without the user's consent. I imagine having four or five settings, but a real implementation will probably expand on the idea. (I realize that this is a thread about "ultraprune" and my examples mention "metatree", but please see past that - I am only presenting a 30,000-foot-level view of how I imagine this working.)



What the other settings might be:



MINIMAL:

* Recommended for low-bandwidth or high-cost network connections.

* No incoming connections from peers allowed.

* Downloaded data set consists only of the minimum necessary to determine the latest block.

* Information about balances queried from peers on an as-needed basis

* Lowest possible security. Add trusted peers to the preferred peer list whenever possible.



LOW:

* No incoming connections from peers allowed.

* A pruned dataset is downloaded and maintained.



MEDIUM: (this would be the default setting)

* Incoming connections from peers allowed

* A pruned dataset is downloaded and maintained.

* Peers may download the dataset up to the configured upload limit



MEDIUM-HIGH: see image...



HIGH:

* Incoming connections from peers allowed

* Accepts metatree queries from peers, and seeds historical versions of metatree to assist in recovery/rollback if needed

* Full transaction history is maintained (requires XX GB, which increases over time)

* Allows peers to download the data set up to the configured bandwidth limit.

* Full network citizen/historian which assists in allowing other nodes to recover the entire network history in case recovery is needed

* Recommended setting for mining nodes wherever feasible



Ideally, if all of these modes were implemented, a new installation could start running in the "MINIMAL" mode regardless of user choice so it is instantly usable without a day of downloading, and then slowly upgrade itself to the level of the user's choice as objects are downloaded and verified.
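The proposed settings can be modelled as an ordered enum, with a new installation starting at MINIMAL and moving up one level at a time until it reaches the user's choice. This is a hypothetical sketch: the names, the `ready` hook ("has the data for this level been downloaded and verified?") and the capability table are illustrative, and MEDIUM-HIGH is left out because its description was only in an image.

```python
from enum import IntEnum
from typing import Callable


class Mode(IntEnum):
    """Proposed settings, ordered by how much a node contributes.

    MEDIUM-HIGH is omitted: its details were only given in an image.
    """
    MINIMAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# (accepts incoming connections, data kept locally), transcribed from the list above.
CAPABILITIES = {
    Mode.MINIMAL: (False, "only what is needed to determine the latest block"),
    Mode.LOW: (False, "pruned dataset"),
    Mode.MEDIUM: (True, "pruned dataset"),
    Mode.HIGH: (True, "full transaction history"),
}


def upgrade_path(target: Mode, ready: Callable[[Mode], bool]) -> list[Mode]:
    """Modes a fresh install passes through, starting at MINIMAL.

    ready(mode) is a hypothetical hook: True once the data for that mode has
    been downloaded and verified. The climb stops at the first level that is
    not ready yet, or at the user's target.
    """
    path = [Mode.MINIMAL]
    for mode in Mode:
        if Mode.MINIMAL < mode <= target:
            if not ready(mode):
                break
            path.append(mode)
    return path
```

With `target=Mode.MEDIUM` (the suggested default) and all data ready, the path is MINIMAL, LOW, MEDIUM; if the pruned dataset is still downloading, the node simply stays at the last level it has verified.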
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.

flipperfish






Activity: 350

Merit: 251





Dolphie Selfie







Sr. Member

Re: Ultraprune merged in mainline
October 21, 2012, 04:26:18 PM #19

Quote from: Pieter Wuille on October 21, 2012, 03:01:46 PM
At some later point in time we may add actual pruning, by removing blocks (but not their unspent outputs in the pruned copy) that are old enough. This will imply they cannot be served to other nodes, cannot be rescanned, and cannot be reorganised away. Clearly not everyone in the network can do this, as that would mean new nodes cannot bootstrap anymore. This is why I believe in a move towards validation nodes and archive nodes.



Also, Bitcoin is a zero-trust system (at least full nodes are). This means that no data ever received from the network is ever taken for granted, and needs validation. This implies you can't ever bootstrap a (zero trust) node without having it validate the entire block chain (although it is not necessary that everyone keeps that data around forever).



Maybe it is a good idea to define some of the terms used here (maybe in the wiki?). It can be very confusing to read different terms for the same thing and the same words for different things, especially if you're not deeply involved in the ongoing development. Also, I think "ultraprune" should really be renamed, as it does not prune, but rather lays the foundation for pruning. I would suggest calling it "historic data separation" or "blockchain validation data optimization", as this is what it does.



As far as I have identified these terms from the recent posts about this topic, this is what they mean. Please correct me if I'm wrong:



Pruning: To remove all transactions whose outputs have all been spent.

Full Node: A Bitcoin client that stores only the data needed to validate new transactions within the network. It has seen the complete blockchain history at some earlier time and can thus be sure that its current validation data is correct.

Archiving Node: A Bitcoin client that stores all data from the beginning of the blockchain. It can serve the whole blockchain to other nodes, which is needed for bootstrapping new nodes without trusting anything else.

Light Node: A Bitcoin client that does not store any data and has to trust another Full or Archiving Node.

Zero Trust Node: A Bitcoin client that can validate new transactions within the network without having to trust anything besides the blockchain. Full Nodes and Archiving Nodes are Zero Trust Nodes.
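Under the "Pruning" definition above, a transaction becomes removable once none of its outputs is still unspent. A minimal sketch, assuming hypothetical inputs: a map from txid to output count and the current set of unspent `(txid, index)` outpoints. Note that this is the per-transaction sense of pruning, which differs from the block-level pruning Pieter describes in the quote above.

```python
def prunable_txids(outputs_per_tx: dict[str, int],
                   unspent: set[tuple[str, int]]) -> list[str]:
    """Txids none of whose outputs are still unspent, in sorted order.

    outputs_per_tx maps txid -> number of outputs (a hypothetical input format).
    """
    return sorted(
        txid
        for txid, n_outputs in outputs_per_tx.items()
        if all((txid, i) not in unspent for i in range(n_outputs))
    )
```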