Author Message

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2360 - Posted: 5 Apr 2019, 23:40:01 UTC

So there have been some new developments over the last week. It's both good and bad.



First of all, some history. The reason I waited so long to develop a GPU app is because the calculation was heavily dependent on multi-precision libraries (gmp) and number theoretic libraries (pari/gp). Both of these use dynamically allocated memory which is a big no-no in GPUs. I found a multi-precision library online that I could use by hard coding the precision to the maximum required (about 750 bits), thereby removing the dependence on memory allocations. The next piece of the puzzle was to code up a polynomial discriminant function. After doing this, I could finally compile a kernel for the GPU. That is the history for the current GPU app. It is about 20 to 30 times faster than the current cpu version (depends on WU and cpu/gpu speeds).



But then I got thinking... my GPU polynomial discriminant algorithm is different from the one in the PARI library (theirs works for any degree and mine is specialized to degree 10). So to do a true apples-to-apples comparison, I replaced the PARI algorithm with mine in the cpu version of the code. I was shocked by what I found... the cpu version was now about 10x faster than it used to be. I never thought I was capable of writing an algorithm that would be 10x faster than a well established library function. WTF? Now I'm kicking myself in the butt for not having done this sooner!



This brings mixed emotions. On one side, it is great that I now have a cpu version that is 10x faster. But it also means that my GPU code is total crap. With all the horsepower in a present day GPU I would expect it to be at least 10x faster than the equivalent cpu version. Compared with the new cpu version, the gpu is only 2 to 3 times faster. That is unacceptable.



So the new plan is as follows:

1. Deploy new cpu executables. Since it's 10x faster, I will need to drop the credit by a factor of 10. (Credits/hour will remain the same for the cpu but will obviously drop for the GPU)

2. Develop new and improved GPU kernels.



I don't blame the GPU users for jumping ship at this point. Frankly, the inefficiency of the current GPU app just makes it not worth it (for them or the project).



For what it's worth, I did have OpenCL versions built. The Nvidia version works perfectly. The AMD version is buggy for some reason, as is the Windows version. Since I will be changing the kernels anyway, there is no point in debugging them yet.

nedmanjo


Joined: 10 Sep 17

Posts: 2

Credit: 2,295,462

RAC: 5,746

Message 2361 - Posted: 5 Apr 2019, 23:56:41 UTC - in response to Message 2360.

Last modified: 5 Apr 2019, 23:58:55 UTC

Actually, that's great news! An optimized CPU app and a GPU app as well. Can I infer an AMD app will be available in time as well? That would be great!



By the way, any sort of timeline for deploying the new apps?

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2362 - Posted: 6 Apr 2019, 0:30:47 UTC - in response to Message 2361.

Actually, that's great news! An optimized CPU app and a GPU app as well. Can I infer an AMD app will be available in time as well? That would be great!



By the way, any sort of timeline for deploying the new apps?



I just deployed the new cpu apps. Version 3.00. Feel free to abort any WUs associated with the older versions (2.xx).



Not sure of the best way to transition the credit value. If I change it now, then late returns are penalized. If I wait, then quick turnarounds will be overly rewarded.



And new GPU apps are weeks away.

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2363 - Posted: 6 Apr 2019, 0:52:06 UTC - in response to Message 2362.

I am temporarily going back to credit based on runtime. Once everyone has had a chance to settle in with the super fast cpu app, I will go back to fixed credit per WU. I think this is the fairest way to handle credits during the transition period.

Michael H.W. Weber




Joined: 30 Apr 18

Posts: 11

Credit: 1,578,856

RAC: 5

Message 2364 - Posted: 6 Apr 2019, 7:48:22 UTC - in response to Message 2363.

Last modified: 6 Apr 2019, 7:49:52 UTC

To my knowledge there is not a single project where the credit system has been adapted after deploying an improved client - if I were you, I wouldn't change anything.

The credits between projects are not comparable anyway, and deployment of a new client version affects all project participants in the same way. Moreover, there are several reasons to argue that even within projects, CPU vs. GPU credits, and even credits generated by different types of CPU architectures (ARM / AMD / Intel / ...), pose an issue.

So, please focus on the research results and further (GPU?) client improvements.



Michael.
President of Rechenkraft.net

Julien


Joined: 14 Sep 13

Posts: 2

Credit: 876,769

RAC: 0

Message 2365 - Posted: 6 Apr 2019, 7:48:47 UTC

Hello,



I didn't find the answer in the FAQ, so I'm just asking here:

do you plan to put the code (for CPU, and for Nvidia or AMD GPU) on GitHub/GitLab or similar so people may contribute?

E.g., I use cppcheck (a C/C++ static analyzer) to find some bugs.

Michael H.W. Weber




Joined: 30 Apr 18

Posts: 11

Credit: 1,578,856

RAC: 5

Message 2366 - Posted: 6 Apr 2019, 7:54:57 UTC - in response to Message 2365.

Hello,



I didn't find the answer in the FAQ, so I'm just asking here:

do you plan to put the code (for CPU, and for Nvidia or AMD GPU) on GitHub/GitLab or similar so people may contribute?

E.g., I use cppcheck (a C/C++ static analyzer) to find some bugs.

This indeed is a good idea, and it again highlights a problem with the credits: just check out how many projects already have optimized clients, coded by third-party people, that produce more credits/hour than the project's own software. Following Eric's arguments above, even there credit system adaptations would be required; to my knowledge, again, nowhere is this put into practice...



Michael.
President of Rechenkraft.net

M0CZY




Joined: 7 Dec 18

Posts: 2

Credit: 25,364

RAC: 18

Message 2367 - Posted: 6 Apr 2019, 11:53:44 UTC

I just deployed the new cpu apps. Version 3.00. Feel free to abort any WUs associated with the older versions (2.xx).

My 32-bit Linux machine is still using version 2.12.

Are there plans to release version 3.00 apps for this platform (and 32-bit Windows)?

UBT - Timbo


Joined: 30 Dec 13

Posts: 1

Credit: 2,409,715

RAC: 3,120

Message 2368 - Posted: 6 Apr 2019, 14:05:36 UTC - in response to Message 2360.

Last modified: 6 Apr 2019, 14:06:01 UTC



So the new plan is as follows:

1. Deploy new cpu executables. Since it's 10x faster, I will need to drop the credit by a factor of 10. (Credits/hour will remain the same for the cpu but will obviously drop for the GPU)





Hi



In the past, I think that the NumberFields tasks took my PCs about 4 hours to complete, and from my notes a while back the fixed credits were about 370 per completed task - so that's about 1.5 credits per minute. (I can't see any of my old results on the project, so I'm not 100% sure of this.)



Using the v3.0 CPU app, the two PCs I've run NumberFields on (since last night) are earning around 0.667 and 0.835 credits per minute (respectively).



Am I using the wrong "old" data and making an incorrect assumption that the credits per hour are now less than the v2.x app? If so, then that would be a shame.



On the other hand, I do hope your server(s) can cope with the increased number of tasks being downloaded as well as more frequent uploads being made.



regards

Tim

bcavnaugh




Joined: 4 Aug 14

Posts: 5

Credit: 4,018,952

RAC: 0

Message 2369 - Posted: 6 Apr 2019, 16:18:05 UTC - in response to Message 2368.

Last modified: 6 Apr 2019, 16:20:13 UTC

So the new plan is as follows:

1. Deploy new cpu executables. Since it's 10x faster, I will need to drop the credit by a factor of 10. (Credits/hour will remain the same for the cpu but will obviously drop for the GPU)



Why?

Other projects that add GPU apps give us higher credit for running GPU tasks than CPU tasks.

Richard Haselgrove


Joined: 28 Oct 11

Posts: 148

Credit: 123,097,096

RAC: 71,461

Message 2370 - Posted: 6 Apr 2019, 17:05:00 UTC

I was a bit taken aback to see the much shorter estimated runtime when I first saw my task list this morning, but once I'd focused on the version number and read this thread, all was explained.



As it happens, I'd started a spreadsheet to measure the performance of my Windows machines in BOINC credit terms. I have three identical i5-4690 CPU @ 3.50GHz CPUs running Windows 7/64, but with different software loaded for different purposes: with version 2.12, they were recording 68, 70, 72 credits per hour with minuscule variation (st_dev down to 0.00073).



Under version 3.00 - exactly the same! I don't know how you managed it, but that's the smoothest version upgrade I've ever seen. No problems with runtime estimates and over/under fetching, no interruption to work flow, no messy credit adjustments. The only thing I haven't checked yet is whether the more efficient application increases the power consumption of the CPU, but I'll check that later - I haven't got the watt-meter in circuit at the moment.



I'd say that was a fair result. We are contributing the same hardware and (subject to checking) the same power, and we've done nothing to optimise our systems. You've done the work, and you've got the benefit in the form of a much increased result rate.



Bravo, and well done. :-)

Richard Haselgrove


Joined: 28 Oct 11

Posts: 148

Credit: 123,097,096

RAC: 71,461

Message 2371 - Posted: 6 Apr 2019, 17:26:00 UTC



To reassure people with different recollections, I took version 2.12 credit readings from between 21 March and 25 March, before the first adjustments for the GPU release. There were between 47 and 75 results visible for the three machines. For version 3.00, I had between 27 and 40 results available per machine when I started updating the spreadsheet.

Here are the raw figures, expressed as average credits per hour.

Host    v2.12     v3.00
1288    70.5627   70.9940
1290    68.0019   68.0024
1291    72.1462   72.1432

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2372 - Posted: 6 Apr 2019, 18:41:49 UTC - in response to Message 2365.

Hello,



I didn't find the answer in the FAQ, so I'm just asking here:

do you plan to put the code (for CPU, and for Nvidia or AMD GPU) on GitHub/GitLab or similar so people may contribute?

E.g., I use cppcheck (a C/C++ static analyzer) to find some bugs.



I hadn't thought about that. Up until now, I have just emailed a tarball to anybody who wanted to help develop. I have extensive testing scripts that I run before deploying new executables, and these include running on a private BOINC server. I don't know how all that would work in a GitHub environment, and there would need to be some changes. Do any other projects develop in this way?



I will check out cppcheck when I get a chance. The bugs I was referring to are caused by different OpenCL implementations, because I coded as if it were normal C, but there are special rules that need to be followed for "OpenCL C". Nvidia's implementation seems to follow traditional C, so my code worked well there. As an example, I was passing an array of flags back to the host. I coded these as booleans. Nvidia had no problem, but to get it to work on AMD GPUs I had to change the booleans to chars.

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2373 - Posted: 6 Apr 2019, 19:01:21 UTC - in response to Message 2367.

My 32-bit Linux machine is still using version 2.12.

Are there plans to release version 3.00 apps for this platform (and 32-bit Windows)?



So I just queried the database to get an idea of how many 32-bit users there are. Here are the numbers over the last 5 days:

Total WUs processed: 191,000
32-bit Linux WUs: 141 (= 0.07%)
32-bit Windows WUs: 475 (= 0.25%)



So I am not sure if it's worth the effort to maintain these versions. I am adding it to my list of to-dos, but it will be much lower priority.



How old are these 32-bit computers? Weren't the last 32-bit machines back in the Pentium days? Or are you running a VM on a newer machine?

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2374 - Posted: 6 Apr 2019, 19:09:34 UTC - in response to Message 2368.

In the past, I think that the NumberFields tasks took my PCs about 4 hours to complete, and from my notes a while back the fixed credits were about 370 per completed task - so that's about 1.5 credits per minute. (I can't see any of my old results on the project, so I'm not 100% sure of this.)



Using the v3.0 CPU app, the 2 PCs I've run Numberfields on (since last night) are earning at around 0.667 and 0.835 credits per minute (respectively).



Am I using the wrong "old" data and making an incorrect assumption that the credits per hour are now less than the v2.x app? If so, then that would be a shame.



On the other hand, I do hope your server(s) can cope with the increased number of tasks being downloaded as well as more frequent uploads being made.



regards

Tim



I recall seeing a small credit drop in January after upgrading the server, so there is probably some truth to your memories. It looks like Richard started recording data after that, so he wouldn't show the drop.

Eric Driver



Project developer

Project tester

Project scientist

Project administrator

Joined: 8 Jul 11

Posts: 1045

Credit: 126,146,365

RAC: 111,070

Message 2375 - Posted: 6 Apr 2019, 19:25:51 UTC - in response to Message 2370.

I was a bit taken aback to see the much shorter estimated runtime when I first saw my task list this morning, but once I'd focused on the version number and read this thread, all was explained.



As it happens, I'd started a spreadsheet to measure the performance of my Windows machines in BOINC credit terms. I have three identical i5-4690 CPU @ 3.50GHz CPUs running Windows 7/64, but with different software loaded for different purposes: with version 2.12, they were recording 68, 70, 72 credits per hour with minuscule variation (st_dev down to 0.00073).



Under version 3.00 - exactly the same! I don't know how you managed it, but that's the smoothest version upgrade I've ever seen. No problems with runtime estimates and over/under fetching, no interruption to work flow, no messy credit adjustments. The only thing I haven't checked yet is whether the more efficient application increases the power consumption of the CPU, but I'll check that later - I haven't got the watt-meter in circuit at the moment.



I'd say that was a fair result. We are contributing the same hardware and (subject to checking) the same power, and we've done nothing to optimise our systems. You've done the work, and you've got the benefit in the form of a much increased result rate.



Bravo, and well done. :-)



The reason the credit rates are so similar is the credit_from_runtime option. We may just have to stick with that, at least until the new GPU apps come out.



I will be interested in seeing your power consumption analysis. My CPU monitor shows temps about 10 deg F higher. This may explain why the CPU version is so much more efficient than it used to be. In my version of the algorithm, I use gmp, which is supposed to be highly efficient; the old version, being a PARI function, used PARI's built-in multi-precision, which is probably less efficient. By efficiency, I mean keeping more of the data in on-chip cache instead of RAM. I have heard that RAM access is an order of magnitude slower than cache.

Richard Haselgrove


Joined: 28 Oct 11

Posts: 148

Credit: 123,097,096

RAC: 71,461

Message 2376 - Posted: 6 Apr 2019, 20:05:22 UTC - in response to Message 2375.

I will be interested in seeing your power consumption analysis.

I'll dig them up, but it may take a while. I posted them on a message board, but I think it was SETI - which has crashed hard this weekend. And I tidied away my notes when I had a visitor last month: that's fatal, of course.

Julien


Joined: 14 Sep 13

Posts: 2

Credit: 876,769

RAC: 0

Message 2377 - Posted: 7 Apr 2019, 6:44:15 UTC

Hello again,



Thank you for your feedback. Also, it might be interesting to send your GPU polynomial discriminant algorithm to the PARI authors.

Indeed, it could help them; perhaps they could find some flaws, but they may also have ideas to improve it even more!

[AF>Amis des Lapins] Jean-Luc


Joined: 16 May 12

Posts: 7

Credit: 17,508,915

RAC: 0

Message 2378 - Posted: 7 Apr 2019, 8:53:45 UTC - in response to Message 2377.

Congratulations to Eric Driver for making the search much faster.

This is incredible!



For credits, it's excellent right now.

But when the GPU tasks come out, it will certainly be necessary to give a fixed credit for the tasks.

A good solution might be to average all the credits currently being given for the tasks, and take that average as the fixed credit per task.

This should be very fair...



All that remains now is to hope that it is possible to make the GPU calculations significantly more efficient than the CPU calculations.

However, one thing worries me. If the calculations become 30 or 100 times more efficient, a GPU task will then take about 80 or 25 seconds, which is very short!

This will generate a lot of traffic!