Last of the V8s



Legendary | Activity: 1596 | Merit: 4130 | Be a bank

Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 10:22:38 AM  #1

Thought this was worth preserving - it was thoroughly off topic in another thread and may get deleted. I haven't edited the quote, so there's lots of political stuff which would be off topic here (but it is your board!)

Please see @-ck's thread for some initial commentary on point 13. Otherwise, I invite the dev and tech regulars to comment.



Quote from: Troll Buster on July 06, 2017, 12:53:20 AM

Quote from: DooMAD on July 05, 2017, 07:55:47 PM

The thing to bear in mind is that Core have an exemplary record for testing, bugfixing and just generally having an incredibly stable and reliable codebase. So while people may run SegWit2x code in the interim to make sure it's activated, I envision many of them would switch back to Core the moment Core release compatible code. As such, any loss in Core's dominance would probably only be temporary.



In short, I agree there's probably enough support to activate a 2MB fork, but I disagree that Core will lose any significant market share over the long term, even if the 2MB fork creates the longest chain and earns the Bitcoin mantle.



Nokia was also good at testing and reliability, where are they now?



And Core's code is shit; anyone experienced in writing kernels/drivers, or ultra-low-latency communication/financial/military/security systems, would instantly notice:



1. The general lack of regard for L0/L1/TLB/L2/L3/DRAM latency and data locality.

2. Lack of cache line padding and alignment.

3. Lack of inline assembly in critical loops.

4. Lack of CPU and platform specific speed ups.

5. Inefficient data structures and data flow.

6. Not replacing simple if/else with branchless operations.

7. Not using __builtin_expect() to make branch predictions more accurate.

8. Not breaking bigger loops into smaller loops to make use of L0 cache (Loop tiling).

9. Not coding in a way that deliberately helps the CPU prefetcher cheat time.

10. Unnecessary memory copying.

11. Unnecessary pointer chasing.

12. Using pointers instead of registers in performance sensitive areas.

13. Inefficient data storage (LevelDB? Come on, the best LevelDB devs moved on to RocksDB years ago).

14. Lack of simplicity.

15. Lack of clear separation of concerns.

16. The general pile-togetherness commonly seen in projects involving too many people of different skill levels.
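For points 6 and 7, here is a minimal illustrative sketch of what branchless selection and `__builtin_expect()` hints look like in C++. The names and code are invented for this example, not taken from Core; `branchless_max` assumes an arithmetic right shift on signed values, which mainstream compilers provide:

```cpp
#include <cassert>
#include <cstdint>

// Point 7: branch-prediction hint (GCC/Clang builtin); marks a test as
// almost always false so the compiler lays the cold path out of line.
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Point 6: branchless max -- no conditional jump, so nothing to mispredict.
// Requires that a - b not overflow; assumes arithmetic right shift.
static inline int64_t branchless_max(int64_t a, int64_t b) {
    int64_t diff = a - b;
    int64_t mask = diff >> 63;   // all ones if b > a, else all zeros
    return a - (diff & mask);    // picks b when b > a, else a
}

// Guarded wrapper: falls back to a plain branch in the rare case where
// a - b could overflow, with the guard marked UNLIKELY.
int64_t checked_max(int64_t a, int64_t b) {
    if (UNLIKELY((b < 0 && a > INT64_MAX + b) ||
                 (b > 0 && a < INT64_MIN + b)))
        return a > b ? a : b;
    return branchless_max(a, b);
}
```

Whether this beats a plain `std::max` depends entirely on whether the branch actually mispredicts in practice; for simple selects, modern compilers already emit conditional moves on their own.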



The bottleneck of performance today is memory: the CPU register is 150-400 times faster than main memory, 10x that if you use the newest CPUs and code in a way that makes use of all the execution units in parallel and makes use of SIMD (out-of-order execution window size: 168 in Sandy Bridge, 192 in Haswell, 224 in Skylake).
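Given that memory is the bottleneck, point 8's loop tiling is the textbook countermeasure. A hedged sketch, purely illustrative and not Core code (`BLOCK` is an assumed tile size that would be tuned to the target cache): a tiled matrix transpose keeps each BLOCK x BLOCK tile cache-resident, where the naive loop takes a miss on nearly every column access once `n` is large.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK = 32;  // tile edge; tune per cache size (assumption)

// Transpose an n x n row-major matrix in BLOCK x BLOCK tiles so both the
// source and destination tiles stay resident in cache while being worked.
void transpose_tiled(const std::vector<double>& src,
                     std::vector<double>& dst, std::size_t n) {
    for (std::size_t ib = 0; ib < n; ib += BLOCK)
        for (std::size_t jb = 0; jb < n; jb += BLOCK)
            for (std::size_t i = ib; i < std::min(ib + BLOCK, n); ++i)
                for (std::size_t j = jb; j < std::min(jb + BLOCK, n); ++j)
                    dst[j * n + i] = src[i * n + j];  // dst(j,i) = src(i,j)
}
```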



One simple cache miss and you end up wasting the time of 30-400 CPU instructions. Even moving 1 byte from one core to another takes 40 nanoseconds; that's enough time for 160 instructions on a 4GHz CPU.
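That core-to-core transfer cost is also why point 2's cache-line padding matters: two counters that merely share a 64-byte line ping-pong between cores even when the threads never touch each other's data (false sharing). A sketch assuming 64-byte lines and a typical 64-bit target (illustrative, not Core code):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Unpadded: both counters typically land on one 64-byte cache line, so
// writes from two threads invalidate each other's copy (false sharing).
struct Unpadded {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Padded: each counter owns a full line, so the threads stop contending.
// Assumes 64-byte lines and sizeof(std::atomic<long>) == 8 (LP64 target).
struct alignas(64) PaddedCounter {
    std::atomic<long> v{0};
    char pad[64 - sizeof(std::atomic<long>)];
};

struct Padded {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == 64, "one counter per cache line");
```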



You take one look at Core's code and you know instantly that most of the people who wrote it know only software, not hardware. They know how to write the logic, they know how to allocate and release memory, but they don't understand the hardware they're running the code on; they don't know how electrons are moved from one place to another inside the CPU at the nanometer level. If you don't have instinctive knowledge of hardware, you'll never be able to write great code. Good, maybe, but not great.



Since inception, Core was written by amateurs or semi-professionals and picked up by other amateurs or semi-professionals. It works, and there are small nuggets of good code here and there, contributed by people who knew what they were doing, but overall the code is nowhere near good, not even close; really just a bunch of slow crap code written by people of different skill levels.



There are plenty of gurus out there who can make Core's code run two to four times faster without even trying. But most of them won't bother; if they're going to work for the bankers, they'd expect to get paid handsomely for it.



Quote from: DooMAD on July 05, 2017, 07:55:47 PM

So while people may run SegWit2x code in the interim to make sure it's activated, I envision many of them would switch back to Core the moment Core release compatible code. As such, any loss in Core's dominance would probably only be temporary.



In short, I agree there's probably enough support to activate a 2MB fork, but I disagree that Core will lose any significant market share over the long term, even if the 2MB fork creates the longest chain and earns the Bitcoin mantle.



So even a Core fanboy has to agree that Core must fall in line to stay relevant.



A fanboy can fantasize about everyone flocking back to Core after they lose the first-to-market advantage.



But the key is that even if Core decides to fall in line to stay relevant, they can no longer play god like before.



So what's your point.

https://i.imgur.com/UIm67kh.jpg

gmaxwell

Moderator | Legendary | Activity: 3178 | Merit: 4298

Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 11:28:18 AM

Last edit: July 08, 2017, 12:30:35 AM by gmaxwell #2



What you're seeing here is someone trying to pump his ego by crapping on the work of others and trying to show off to impress you with how uber technical he is-- not the first or the last one of those we'll see.

A quarter of the items in the list, like "Lack of inline assembly in critical loops.", are both untrue and also show up in other abusive folks' lists as things Bitcoin Core is doing and is awful for doing, because it's antithetical to portability, reliability, or the poster's idea of code aesthetics (or because MSVC stopped supporting inline assembly, thus anyone who uses it is a "moron").



Here is the straight dope: If the comments had merit and the author were qualified to apply them-- where is the patch? Oh look at that, no patches.



Many of the people working on the project have long-term experience with low-level programming (for example, I spent many years building multimedia codecs; wladimir does things like video drivers and IIRC used to work in the semiconductor industry), and the codebase reflects many points of optimization with micro-architectural features in mind. But _most_ of the codebase is not a hot path, and _all_ of the codebase must be optimized for reliability and reviewability above pretty much all else.



Some of these pieces of advice are just a bit outdated as well-- it makes little sense to bake in an optimization that a compiler will reliably perform on its own at the expense of code clarity and maintainability; especially in the 99% of code that isn't hot or on a latency critical path. (Examples being loop invariant code motion and use of conditional moves instead of branching).
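Loop-invariant code motion is a concrete case: with any modern optimizing compiler, the two toy functions below (invented for illustration, not Core code) compile to essentially the same machine code at -O2, because the compiler hoists `scale * bias` out of the loop on its own; writing the hoist by hand buys nothing but clutter.

```cpp
#include <cassert>
#include <vector>

// Clear version: scale * bias looks loop-invariant, and the optimizer
// computes it once before the loop without being asked.
double sum_scaled(const std::vector<double>& xs, double scale, double bias) {
    double total = 0.0;
    for (double x : xs)
        total += x * (scale * bias);
    return total;
}

// Hand-hoisted version: same result, same optimized machine code,
// at a small cost in readability.
double sum_scaled_hoisted(const std::vector<double>& xs,
                          double scale, double bias) {
    const double k = scale * bias;
    double total = 0.0;
    for (double x : xs)
        total += x * k;
    return total;
}
```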



Similarly, some are true for generic non-hot-path code: E.g. it's pretty challenging in idiomatic, safe C++ to avoid some amount of superfluous memory copying (especially prior to C++11 which we were only able to upgrade to in the last year due to laggards in the userbase), but in the critical path for validation there is virtually none (though there are an excess of small allocations, help improving that would be very welcome). Though, you're not likely to know that if you're just tossing around insults on the internet instead of starting up a profiler.
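The C++11 point can be made concrete: move semantics let a large buffer change owners for the cost of a pointer swap where C++03 forced a deep copy. A hedged sketch with invented types (not Core's actual data structures):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Build a "block" of n byte buffers. With C++11, push_back(std::move(tx))
// steals tx's heap allocation instead of copying 250 bytes each time.
std::vector<std::vector<unsigned char>> build_block(std::size_t n) {
    std::vector<std::vector<unsigned char>> block;
    block.reserve(n);                           // avoid reallocation copies too
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<unsigned char> tx(250, 0);  // pretend 250-byte transaction
        block.push_back(std::move(tx));         // move, not copy
    }
    return block;                               // moved or elided, not copied
}
```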



And of course, we're all quite busy keeping things running reliably and improving-- and pulling out the big tens-of-percent performance improvements that come from high-level algorithmic improvements. Eking out the last percent in micro-optimizations isn't always something that we have the resources to do even where it makes sense from a maintainability perspective. But, instead, we're off building the fastest ECC validation code that exists out there bar none; because that's simply more important.



Could there be more micro-optimizations? Absolutely. So step on up and get your hands dirty, because there is 10x as much work needed as there are resources. There is almost no funding (unlike the millions poured into BU just to crank out crashware); and we can't have basically any failures-- at least not in the consensus-critical parts. Oh yea, anonymous people will be abusive to you on the internet too. It's great fun.



Quote: Inefficient data storage

Oh please. Cargo cult bullshit at its worst. Do you even know what leveldb is used for in Bitcoin? What reason do you have to believe that $BUZZWORD_PACKAGE_DEJURE is any better for that? Did it occur to you that perhaps people have already benchmarked other options? Rocks has a lot of feature set which is completely irrelevant for our very narrow use of leveldb-- I see in your other posts that you're going on about superior compression in rocksdb. Guess what: we disable compression and rip it out of leveldb, because it HURTS PERFORMANCE and actually makes the database larger-- for our use case. It turns out that cryptographic hashes are not very compressible. (And as CK pointed out, no, the blockchain isn't stored in it-- that would be pretty stupid.)



Pretty sad that you feel qualified to throw out that long list of insults without having much of an idea about the architecture of the software.



Quote: Since inception, Core was written by amateurs or semi-professionals, picked up by other amateurs or semi-professionals

The regular contributors who have written most of the code are the same people pretty much through the entire life of the project; and they're professionals with many years of experience. Perhaps you'd care to share with us your lovely and impressive works?



Quote: run two to four times faster without even trying.

Which wouldn't even hold a candle to the multiple orders of magnitude of speedup we've produced so far cumulatively through the life of the project-- exactly my point about micro-optimizations. Of course, contributions are welcome. But it's a heck of a lot easier to wave your arms and insult people who've produced hundred-fold improvements, because you think a laundry list of magic moves is going to get another couple times (and they might-- but at what cost?).



If you'd like to help out it's open and ready-- though you'll be held to the same high standard of review and validation and not just given a pass because a micro-benchmark got 1% faster-- reliability is the first concern... but 2x-level improvements in latency or throughput critical paths would be very very welcome even if they were a bit painful to review.



If you're not interested or able-- well then maybe you're just another drunken sports fan throwing concessions from the stands, convinced that you could do so much better than the team, though you won't ever take to the field yourself. It doesn't impress; quite the opposite: you're effectively exploiting the fact that we don't self-promote much, so you can get away with slinging some rubbish about how terrible we are just to try to make yourself look impressive. It's a low blow against some very hard working people who owe nothing to you.



If you do a really outstanding job perhaps you'll be able to overcome the embarrassment of:



Quote: 2) Say what you will about Craig, he's still a mathematician, the math checks out.

(Hint: Wright's output is almost all pure gibberish; though perhaps you were too busy having fuck screamed at you to notice little details like his code examples for quadratic signature hashing being code from a testing harness that has nothing to do with validation, his fix being a total no op, his false claims that quadratic sighashing is an implementation issue, false claims about altstack having anything to do with turing completeness, false claims that segwit makes the system quadratically slower, false claim that Bitcoin Core removed opcode, yadda yadda. )

I for one am not impressed. Show us some contributions if you want to show that you know something useful, not hot air.


cr1776



Legendary | Activity: 2730 | Merit: 1105

Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 07:55:40 PM  #5

Quote from: Troll Buster on July 06, 2017, 07:23:41 PM

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

What you're seeing here is someone trying to pump his ego by shitting on other things and show off to impress you with how uber technical he is-- not the first or the last one of those we'll see.



What you're seeing here is someone trying to defend obvious bad design choices.

...

--i instead of ++i

...

Fix your silly shit instead of keep talking about it.




As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs), this kind of language isn't a way to (a) make your point, and (b) get anyone to listen to you with any degree of respect.



In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well-written, non-student compilers will handle cases like that, and there will be no difference in the generated code between things like ++i and i++, except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.
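The "class that obfuscates" caveat is easy to demonstrate: for an `int`, `++i` and `i++` generate identical code when the result is discarded, but a class type's post-increment must materialize a copy of the old state, which the optimizer may or may not remove. A toy counter, hypothetical and for illustration only:

```cpp
#include <cassert>

struct Counter {
    int value = 0;
    int copies = 0;             // counts temporaries made by post-increment
    Counter& operator++() {     // pre-increment: no temporary
        ++value;
        return *this;
    }
    Counter operator++(int) {   // post-increment: copies the old state
        Counter old = *this;
        ++value;
        ++copies;
        return old;
    }
};
```

Here `copies` stays 0 under pre-increment and ticks up once per post-increment; that copy is the whole (and, for built-in loop counters, nonexistent) difference.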




tspacepilot





Legendary | Activity: 1456 | Merit: 1061 | I may write code in exchange for bitcoins.

Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 08:04:49 PM

Last edit: July 06, 2017, 08:17:31 PM by tspacepilot #6



@TrollBuster: You replied with a lot of "translations", but I think gmaxwell put it pretty clearly:



Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Here is the straight dope: If the comments had merit and the author were qualified to apply them-- where is the patch? Oh look at that, no patches.



Some of your "translations" are really questionable:



Quote from: Troll Buster on July 06, 2017, 07:23:41 PM

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Some of these pieces of advice are just a bit outdated as well-- it makes little sense to bake in an optimization that a compiler will reliably perform on its own at the expense of code clarity and maintainability; especially in the 99% of code that isn't hot or on a latency critical path. (Examples being loop invariant code motion and use of conditional moves instead of branching).



Translation: My code is great, everyone else is wrong, nobody else can possibly improve it.


That doesn't seem right. My reading of gmaxwell was a very strongly worded invitation for you to go ahead and improve it.



Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Which wouldn't even hold a candle to the multiple orders of magnitude speedup we've produced so far cumulatively through the life of the project-- exactly my point about micro-optimizations. Of course, contributions are welcome. But it's a heck of a lot easier to wave your arms and insult people who've produced hundred fold improvements, because you think a laundry list of magic moves is going to get another couple times (and they might-- but at what cost?)



If you'd like to help out it's open and ready-- though you'll be held to the same high standard of review and validation and not just given a pass because a micro-benchmark got 1% faster-- reliability is the first concern... but 2x-level improvements in latency or throughput critical paths would be very very welcome even if they were a bit painful to review.



If you're not interested or able-- well then maybe you're just another drunken sports fan throwing concessions from the stand convinced that you could do so much better than the team, though you won't ever take to the field yourself. It doesn't impress, quite the opposite: because you're effectively exploiting the fact that we don't self-promote much, and so you can get away with slinging some rubbish about how terrible we are just to try to make yourself look impressive. It's a low blow against some very hard working people who owe nothing to you.



All this bullshit talk is meaningless when your basic level silly choices are all over the place.


Couldn't you, like, fix a few of the 'basic level silly choices' in order to strengthen your argument?





As far as I can tell you've been invited to offer improvements rather than just insults, but it seems that you chose to reply with further insults.



If, for some reason, you can't provide a patch but can provide some helpful discussion which might lead to improvements then it seems like you might need to alter your approach.



I'm not worshipping at anyone's "church" here, I'm just noticing the dynamic: you've been invited to prove the worth of your assumptions, but your reply doesn't seem to be headed in that direction.

Troll Buster



Newbie | Activity: 42 | Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 08:32:03 PM

Last edit: July 06, 2017, 09:34:44 PM by Troll Buster  #7

Quote from: cr1776 on July 06, 2017, 07:55:40 PM

As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs), this kind of language isn't a way to (a) make your point, and (b) get anyone to listen to you with any degree of respect.



In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well-written, non-student compilers will handle cases like that, and there will be no difference in the generated code between things like ++i and i++, except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.



If greg wants to be treated with respect, he shouldn't begin and end a reply with insults.



This --i and ++i is basic stuff and you want to argue about it? wtf have you been doing for the past 30 years?



And it's not just the speed; it's the smaller byte code, which allows you to pack more code into the tiny L0 instruction cache and reduce cache misses, which still cost you 4 cycles when you re-fetch from L1 to L0.



It also means you can fit more code in that tiny 32KB L1 instruction cache, so your other loops/threads can run faster by not being kicked out of the cache by other code. It also saves power on embedded systems.



This is what I was talking about, the world is flooded with "experts" with "30 years experience" and "50 alphabet soup titles" but still have absolutely no idea wtf actually happens inside a CPU.



Only talentless coders talk about credentials instead of the code.



This is not some super advanced stuff, this is entry level knowledge that's not even up for debate.

The information is everywhere, this took 1 second to find, look:



Quote https://stackoverflow.com/questions/2823043/is-it-faster-to-count-down-than-it-is-to-count-up/2823164#2823164



Which loop has better performance? Increment or decrement?



What your teacher said was some oblique statement without much clarification. It is NOT that decrementing is faster than incrementing, but you can create a much, much faster loop with decrement than with increment.



int i;

for (i = 0; i < 10; i++){

//something here

}



after compilation (without optimisation) compiled version may look like this (VS2015):



-------- C7 45 B0 00 00 00 00  mov dword ptr [i],0
-------- EB 09                 jmp labelB
labelA   8B 45 B0              mov eax,dword ptr [i]
-------- 83 C0 01              add eax,1
-------- 89 45 B0              mov dword ptr [i],eax
labelB   83 7D B0 0A           cmp dword ptr [i],0Ah
-------- 7D 02                 jge out1
-------- EB EF                 jmp labelA



The whole loop is 8 instructions (26 bytes). In it there are actually 6 instructions (17 bytes) with 2 branches. Yes, yes, I know it can be done better (it's just an example).



Now consider this frequent construct which you will often find written by embedded developer:



i = 10;

do{

//something here

} while (--i);



It also iterates 10 times (yes, I know the i value is different compared with the shown for loop, but we care about iteration count here). This may be compiled into this:



00074EBC C7 45 B0 01 00 00 00  mov dword ptr [i],1
00074EC3 8B 45 B0              mov eax,dword ptr [i]
00074EC6 83 E8 01              sub eax,1
00074EC9 89 45 B0              mov dword ptr [i],eax
00074ECC 75 F5                 jne main+0C3h (074EC3h)



That is 5 instructions (18 bytes) and just one branch. There are actually 4 instructions (11 bytes) in the loop.



The best thing is that some CPUs (x86/x64-compatible included) have an instruction that decrements a register, compares the result with zero, and branches if the result is nonzero. Virtually ALL PC CPUs implement this instruction. Using it, the loop is actually just one (yes, one) 2-byte instruction:



00144ECE B9 0A 00 00 00 mov ecx,0Ah

label:

// something here

00144ED3 E2 FE loop label (0144ED3h) // decrement ecx and jump to label if not zero



Do I have to explain which is faster?




Here is more on the L0 and uops instruction cache:



Quote http://www.realworldtech.com/haswell-cpu/2/



Sandy Bridge made tremendous strides in improving the front-end and ensuring the smooth delivery of uops to the rest of the pipeline. The biggest improvement was a uop cache that essentially acts as an L0 instruction cache, but contains fixed length decoded uops. The uop cache is virtually addressed and included in the L1 instruction cache. Hitting in the uop cache has several benefits, including reducing the pipeline length by eliminating power hungry instruction decoding stages and enabling an effective throughput of 32B of instructions per cycle. For newer SIMD instructions, the 16B fetch limit was problematic, so the uop cache synergizes nicely with extensions such as AVX.



The Haswell uop cache is the same size and organization as in Sandy Bridge. The uop cache lines hold up to 6 uops, and the cache is organized into 32 sets of 8 cache lines (i.e., 8 way associative). A 32B window of fetched x86 instructions can map to 3 lines within a single way. Hits in the uop cache can deliver 4 uops/cycle and those 4 uops can correspond to 32B of instructions, whereas the traditional front-end cannot process more than 16B/cycle. For performance, the uop cache can hold microcoded instructions as a pointer to microcode, but partial hits are not supported. As with the instruction cache, the decoded uop cache is shared by the active threads.


Troll Buster (Newbie)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 08:42:19 PM | #8

Quote from: tspacepilot on July 06, 2017, 08:04:49 PM
Couldn't you, like, fix a few of the 'basic level silly choices' in order to strengthen your argument?



As far as I can tell you've been invited to offer improvements rather than just insults, but it seems that you chose to reply with further insults.



If, for some reason, you can't provide a patch but can provide some helpful discussion which might lead to improvements then it seems like you might need to alter your approach.



I'm not worshipping at anyone's "church" here, I'm just noticing the dynamic: you've been invited to prove the worth of your assumptions, but your reply doesn't seem to be headed in that direction.



By "you can't provide a patch" you mean things like the Intel sha256 patch I posted at the end?




cr1776 (Legendary)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 10:11:09 PM | #9



Easy question: where is this "switching to --i instead of ++i" which you are speaking about? Just post a link to it on GitHub.



And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.



Without more detail no one can evaluate whether you are good at coding or just insulting.







Quote from: Troll Buster on July 06, 2017, 08:32:03 PM Quote from: cr1776 on July 06, 2017, 07:55:40 PM

As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs), this kind of language isn't a way to (a) make your point, and (b) get anyone to listen to you with any degree of respect.



In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well written, non-student compilers will handle cases like that and there will be no difference between things like ++i and i++ in the generated code, except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.



If greg wants to be treated with respect, he shouldn't begin and end a reply with insults.



This --i and ++i is basic stuff and you want to argue about it? wtf have you been doing for the past 30 years?



And it's not just the speed, it's the smaller machine code, which allows you to pack more code into the tiny L0 instruction cache and reduce cache misses, which still cost you 4 cycles when you re-fetch from L1 to L0.



It also means you can fit more code in that tiny 32kb L1 instruction cache, so your other loops/threads can run faster by not being kicked out of the cache by other codes. It also saves power on embedded systems.



This is what I was talking about, the world is flooded with "experts" with "30 years experience" and "50 alphabet soup titles" but still have absolutely no idea wtf actually happens inside a CPU.



Only talentless coders talk about credentials instead of the code.



This is not some super advanced stuff, this is entry level knowledge that's not even up for debate.


Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i. I asked for specifics in the code you were referring to - which should be easy to provide - and pointed out that compilers are quite smart about optimizations, but without knowing which code you are referencing it is impossible to review.

Troll Buster (Newbie)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 10:29:01 PM (last edit: July 07, 2017, 08:29:36 AM) | #10

Quote from: cr1776 on July 06, 2017, 10:11:09 PM
Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:



Quote from: cr1776 on July 06, 2017, 07:55:40 PM In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well written, non-student compilers will handle cases like that and there will be no difference between things like ++i and i++ in the generated code, except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.

Stop talking bullshit.



Quote from: cr1776 on July 06, 2017, 10:11:09 PM And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.



Here is a tip, if you don't want to be mocked, next time don't start an argument with:

"As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs)"



You walked in here knowing you had no idea wtf was going on inside a CPU, threw out a bunch of titles, made a bunch of false claims while making demands, and you want to talk about etiquette?



Your code sucks, everyone else is doing better, I showed you the proof, I pointed you in the right direction, take it or leave it.



You're a nothing burger with 50 stickers on it and I simply don't give a shit what you think.


gmaxwell (Moderator, Legendary)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 10:58:09 PM (last edit: July 09, 2017, 06:40:51 AM) | #11

Quote: And many people on the project quit because they didn't like working with you, what's your point?



Name one.



Quote People have been laughing at your choices for years and here you are defending it because you wrote some codec to watch porn with higher fps some years ago.

Says the few days old account...





Quote Quote from: gmaxwell on July 06, 2017, 11:28:18 AM Inefficient data storage Oh please. Cargo cult bullshit at its worst. Do you even know what leveldb is used for in Bitcoin? What reason do you have to believe that $BUZZWORD_PACKAGE_DEJURE is any better for that? Did it occur to you that perhaps people have already benchmarked other options? Rocks has a lot of feature set which is completely irrelevant for our very narrow use of leveldb-- I see in your other posts that you're going on about superior compression in rocksdb: Guess what: we disable compression and rip it out of leveldb, because it HURTS PERFORMANCE for our use case. It turns out that cryptographic hashes are not very compressible.



Everyone knows compression costs performance, it's for space efficiency, wtf are you even on about.



Most people's CPU is running idle most of the time, and SSD is still expensive.



So just use RocksDB, or just toss in an lz4 lib, add an option in the config, and let people with a decent CPU enable compression and save 20+ GB.


Reading failure on your part. The blocks are not in a database. Doing so would be very bad for performance. The chainstate is not meaningfully compressible beyond key sharing (and if it were, who would care, it's 2GBish). The chainstate is small and entirely about performance. In fact we just made it 10% larger or so in order to create a 25%-ish initial sync speedup.



If you care about how much space the blocks are using, turn on pruning and you'll save 140GB. LZ4 is a really inefficient way to compress blocks-- it mostly just exploits repeated pubkeys from address reuse. The compact serialization we have does better (28% reduction), but it's not clear if it's worth the slowdown, especially since you can just prune and save a lot more.



Especially since if what you want is generic compression of block files you can simply use a filesystem that implements it... and it will helpfully compress all your other data, logs, etc.



Quote So what's your excuse for not making use of SSE/AVX/AVX2 and the Intel SHA extension? Aesthetics? Portability? Pfft.

There was an incomplete PR for that; it was something like a 5% performance difference for initial sync at the time, and it would be somewhat more now due to other optimizations. Instead we spent more time eliminating redundant sha256 operations in the codebase, which got a lot more speedup than this final bit of optimization will. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it. It's a perfect project for a new contributor. We also have a new AMD host so that x86_64 sha2 extensions can be tested on it.

cr1776 (Legendary)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 06, 2017, 11:41:07 PM | #13



I am quite aware of what "goes on inside a CPU" and have actually done several CPU designs. Although I think you need to drop the "Buster" since you are just trolling us.







Quote from: Troll Buster on July 06, 2017, 10:29:01 PM Quote from: cr1776 on July 06, 2017, 10:11:09 PM Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:



Quote from: cr1776 on July 06, 2017, 07:55:40 PM Most well written, non-student compilers will handle cases like that and there will be no difference between things like ++i and i++ in the generated code, except perhaps in a class that obfuscates the operation in some extremely obscure way.

Well technically you posted ++i and i++, but this whole time I've been talking about ++i and --i, which is what you were responding to, and you stated that compilers can handle everything; they can't, and that's entry-level knowledge.






How about just posting a link (as I've asked 3 times now) to where you advocate "switching to --i instead of ++i"?

gmaxwell (Moderator, Legendary)
Re: Some 'technical commentary' about Core code esp. hardware utilisation
July 07, 2017, 03:24:09 AM (last edit: July 07, 2017, 07:51:39 PM) | #14

Quote from: Troll Buster on July 06, 2017, 11:35:17 PM
XT team for starters:

Fun fact: Mike Hearn contributed a grand total of something like 6 relatively minor pull requests-- most just changing strings. It's popular disinformation that he was some kind of major contributor to the project. Several of his changes that weren't string changes introduced remote vulnerabilities (but fortunately we caught them with review.)



Quote Right, if the logic doesn't work, just fall back to using registration date and post counts to establish authority.

Yes, I've been using Bitcoin pretty much its entire life and I can easily demonstrate it. My expertise is well established, why is it that you won't show us yours though you claim to be so vastly more skilled than everyone here?



Quote At the time I didn't even know you guys were stupid enough to not compress the 150G of blocks, until someone reminded me in that thread. Seriously what is the point leaving blocks from 2009 uncompressed? SSD is cheap these days but not that cheap.

From 2009? ... you know that the blocks are not accessed at all, except by new peers that read all of them, right? They're not really accessed any less than blocks from 6 months ago. (They're also pretty much completely incompressible with lz4, since unlike modern blocks they're not full of reused addresses.)



As to why? Because a 10% decrease in size isn't all that interesting, especially at the cost of making fetching blocks for bloom-filtered lite nodes much more CPU intensive, as that's already a DoS vector.





[Edit: dooglus points out the very earliest blocks are actually fairly compressible, presumably because they consist of nothing but coinbase transactions, which have a huge wad of zeros in them.]



Quote So after all the talk about your l33t porn codec skills, your solution to save space is to just prune the blocks? LOL. You might as well say "Just run a thin wallet".

Uh, sounds like you're misinformed on this too: Pruning makes absolutely no change in the security, privacy, or behavior of your node other than that you no longer help new nodes do their initial sync/scanning. Outside of those narrow things a pruned node is completely indistinguishable. And instead of only reducing the storage 10%, it reduces it 99%.



Quote Why do you think compression experts around the world invented algorithms like Lz4? Why do you think it's part of ZFS? Because it is fast enough and it works, it is simple proven tech used by millions of low power NAS around the world for years.



Here, there are over 100 compression algorithms, all invented and benchmarked for you.

You'll easily find one that has a size/speed/mem profile that just happens to work great on bitcoin block files and is better than LZ4.

Lz4 is fine stuff, but it isn't the right tool for Bitcoin: almost all the data in Bitcoin is cryptographic hashes, which are entirely incompressible. This is why a simple change to more efficient serialization can get over a 28% reduction while your LZ4 only gets 10%. As far as other things-- no we won't: block data is not like ordinary documents and traditional compressors don't do very much with it.



(And as an aside, every one of the items in your list is exceptionally slow. lol, for example I believe the top item in it takes about 12 hours to decompress its 15MB enwik8 file. heh, way to show off your ninja recommendation skills)



If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.



Quote: I would have made patches a long time ago if the whole project wasn't already rotten to the core.

Can you show us a non-trivial patch you made to any other project anywhere?


Troll Buster



Offline



Activity: 42

Merit: 0







NewbieActivity: 42Merit: 0 Re: Some 'technical commentary' about Core code esp. hardware utilisation July 07, 2017, 04:34:47 AM

Last edit: July 07, 2017, 08:35:02 AM by Troll Buster #15 Quote Fun fact: Mike Hearn contributed a grand total of something like 6 relatively minor pull requests-- most just changing strings. It's popular disinformation that he was some kind of major contributor to the project. Several of his changes that weren't string changes introduced remote vulnerabilities (but fortunately we caught them with review.)



Irrelevant.

You challenged me to find one person who quit because of you. I gave you a whole team.

Here is another team, the Bitcoin Classic team, they left for similar reasons.



Quote Yes, I've been using Bitcoin pretty much its entire life and I can easily demonstrate it. My expertise is well established, why is it that you won't show us yours though you claim to be so vastly more skilled than everyone here?



I didn't claim anything about myself.

Your code sucks; you said nope, it's great. So I showed you where, and I showed you how to improve it.

Then you went all "Says the few-days-old account", "I spent years on a porn codec" ego-authority bullshit.

I don't care what you think about yourself or me.

Stick to the tech or stfu.



Quote From 2009? ... you know that the blocks are not accessed at all, except by new peers that read all of them, right? They're not accessed any less than blocks from 6 months ago. (They're also pretty much completely incompressible with lz4, since unlike modern blocks they're not full of reused addresses.)



As to why? Because a 10% decrease in size isn't all that interesting, especially at the cost of making block fetches for bloom-filtered lite nodes much more CPU intensive, as that's already a DOS vector.



So now you're going to use new peers as an excuse to not compress the blocks?



That is so stupid.



When compression is enabled and a new peer requests an old block, just send him the entire compressed block as-is and let him process it.

It'll actually save bandwidth and download time.



Just add the compression feature and setting.

Some users would like to save 20 GB on their SSD by changing a 0 to a 1; some wouldn't.

Just add the feature and move on, what's so complicated.

Compression is standard stuff, don't argue over stupid shit.
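The "change a 0 to 1" setting described above can be sketched as a block store that compresses on write when the flag is on and can serve either form. This is a hypothetical illustration, not Bitcoin Core's API; `BlockStore` and its methods are invented names, and `zlib` stands in for whatever codec would actually be chosen:

```python
# Hypothetical sketch of an opt-in compressed block store.
# BlockStore, put, get_raw, and get_wire are illustrative names only.
import zlib

class BlockStore:
    def __init__(self, compress: bool = False):
        self.compress = compress   # the user-facing on/off setting
        self._disk = {}            # stand-in for the on-disk block files

    def put(self, height: int, raw_block: bytes) -> None:
        # Compress once at write time; old blocks are written once, read rarely.
        self._disk[height] = zlib.compress(raw_block) if self.compress else raw_block

    def get_raw(self, height: int) -> bytes:
        # Decompress transparently for local callers (validation, rescans).
        data = self._disk[height]
        return zlib.decompress(data) if self.compress else data

    def get_wire(self, height: int) -> bytes:
        # Serving the stored bytes as-is, as proposed, only saves bandwidth if
        # the requesting peer also understands the compressed encoding.
        return self._disk[height]
```

Note the caveat in `get_wire`: sending compressed bytes to an old peer that expects the raw block format would break the protocol, which is where the "just send it as is" idea needs a negotiated peer capability.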



Quote Uh, sounds like you're misinformed on this too: Pruning makes absolutely no change in the security, privacy, or behavior of your node other than that you no longer help new nodes do their initial sync/scanning. Outside of those narrow things a pruned node is completely indistinguishable. And instead of only reducing the storage 10%, it reduces it 99%.



Who said anything about security or privacy?

To suggest pruning over simple compression was silly enough.

One minute you go all "My expertise is well established",

next minute you talk total nonsense.

It's like amateur hour.
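For context on the pruning option being argued about: it is already a one-line setting in Bitcoin Core. A minimal fragment (the `prune` option is real; 550 MiB is the minimum value Core accepts, and the UTXO set is kept in full):

```
# bitcoin.conf
# Prune (delete) old block files, keeping roughly the most recent 550 MiB.
prune=550
```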



Quote Lz4 is fine stuff, but it isn't the right tool for Bitcoin: almost all the data in Bitcoin is cryptographic hashes, which are entirely incompressible. This is why a simple change to more efficient serialization can get over a 28% reduction while your LZ4 only gets 10%. As far as other things go: no, we won't. Block data is not like ordinary documents, and traditional compressors don't do very much with it.



(And as an aside, every one of the items in your list is exceptionally slow. lol, for example I believe the top item takes about 12 hours to decompress its 15MB enwik8 file. heh, way to show off your ninja recommendation skills.)



If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.



Look, here is the bottom line.

Compression is a common feature used everywhere for decades.

It's not some new high-tech secret, so why talk so much bullshit making it sound so complicated?



The point is you're already a few years late.

10%, 20%, 30%, Lz4, not Lz4, who gives a shit; in the end it's a space/time trade-off.

If you can't decide what settings to use, just offer 3 settings, low/medium/high.

If you can't decide which algorithm to use, let users choose 1 out of 3 algorithms; give users the choice.

Compression is simple, libs and examples are everywhere, just figure it out.

Stop giving stupid excuses and stop mumbling irrelevant bullshit.
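The low/medium/high idea above maps directly onto the effort levels most compression libraries already expose. A sketch of the space/time trade-off, using `zlib` levels 1/6/9 as stand-ins for such presets on a synthetic payload:

```python
# Measure compression ratio and time at three effort levels, illustrating
# the low/medium/high presets suggested above. Payload is synthetic.
import time
import zlib

payload = (b"txid:" + b"\xab" * 27) * 40_000   # ~1.3 MB of semi-repetitive data

for name, level in [("low", 1), ("medium", 6), ("high", 9)]:
    t0 = time.perf_counter()
    out = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{name:6s} level={level} "
          f"ratio={len(out) / len(payload):.3f} time={elapsed_ms:.1f} ms")
```

Higher levels never produce larger output than lower ones on data like this, but they cost more CPU, which is exactly the trade-off a user-facing setting would expose.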



Quote Can you show us a non-trivial patch you made to any other project anywhere?



Like they said, "I could tell you but then I'd have to kill you."

Too much hassle.


tspacepilot



Offline



Activity: 1456

Merit: 1061





I may write code in exchange for bitcoins.







LegendaryActivity: 1456Merit: 1061I may write code in exchange for bitcoins. Re: Some 'technical commentary' about Core code esp. hardware utilisation July 07, 2017, 06:49:24 AM #16 Quote from: Troll Buster on July 06, 2017, 11:35:17 PM Quote from: gmaxwell on July 06, 2017, 10:58:09 PM There is a PR for that, it was something like a 5% performance difference for initial sync at the time; it would be somewhat more now due to other optimizations. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it.



I would have made patches a long time ago if the whole project wasn't already rotten to the core.


So here you're just admitting that you're only here to troll?



Quote from: Troll Buster on July 07, 2017, 04:34:47 AM Like they said, "I could tell you but then I'd have to kill you."

Too much hassle.



There was this other thing that they said, something about talk being cheap. Then there was another one I heard once that went something like 'put up or shut up'. Maybe those are relevant here.





At this point it's pretty clear to me that Troll Buster is just here to spew bile. It's really striking how puffed up he is about his skills and badassery and then when someone asks him to point to a project he's worked on or generally to prove his talk with something more than a google search his reply is all 'hey, look over there!'



I'll keep watching this thread because amongst all the chest thumping are some interesting technical details, but I think we can go ahead and recognize that Troll Buster isn't going to be contributing anything more than the chest thumping.

wiffwaff



Offline



Activity: 6

Merit: 0







NewbieActivity: 6Merit: 0 Re: Some 'technical commentary' about Core code esp. hardware utilisation July 07, 2017, 08:49:14 AM #18 Quote from: tspacepilot on July 07, 2017, 06:49:24 AM At this point it's pretty clear to me that Troll Buster is just here to spew bile. It's really striking how puffed up he is about his skills and badassery and then when someone asks him to point to a project he's worked on or generally to prove his talk with something more than a google search his reply is all 'hey, look over there!'



Troll Buster is pointing out poor decisions that can be improved upon, and people here are trying to find something of theirs to bash. This is typical Core tactics, whereby they fail to address the issue being highlighted and instead attempt to launch personal attacks on the person stepping forward.



This is exactly why bitcoin development fragmented under the fifth column attacks that forced out the best and brightest, leaving us with the cesspit we have today.



Go on, fire up some BIP148 hashing power. I double-dare you.