Message boards : Generalized Fermat Prime Search : Invalid GFN WUs when validating ATI against OCLcuda

Author Message

Very rarely I get an invalid Genefer result, but when I do, it's always against two OCLcuda wingmen and it's always GFN17MEGA. Could this be a validation issue?



I used to run much more demanding GFN-WR workunits before (on the same machine) and I never got an invalid result. It seems strange to me that GFN17MEGA is producing errors when GFN-WR didn't.



http://www.primegrid.com/results.php?hostid=293682&offset=0&show_names=0&state=6&appid=

Very rarely I get an invalid Genefer result, but when I do, it's always against two OCLcuda



Since the majority of all GFN tasks are OCL-ATI or OCL-CUDA (it's the same program with different names that's sent to NVIDIA and ATI/AMD GPUs), it would be more surprising if your wingmen for your failures were anything other than an OCL task. You should find that the majority of your successful tasks also have wingmen running the OCL app.



I used to run much more demanding GFN-WR workunits before (on the same machine) and I never got an invalid result. It seems strange to me that GFN17MEGA is producing errors when GFN-WR didn't.



It probably has more to do with luck, or with the calendar, than with the workunits. As a GPU ages it becomes more prone to errors, both because cooling may grow less efficient due to dust or contaminants on the cooling surfaces, and because the electronics wear out: heat has a cumulative effect on the boundaries between the silicon layers.

____________

My lucky number is 75898^524288+1

Hi Vortac,



GFN-WR/22 uses the double-precision 'OCL' transform whereas GFN17-MEGA uses the number-theoretic 'OCL4' transform. Those are completely different code so can explain why you see different behaviour from one sub-project to another.



Not sure yet why you are getting occasional invalid results on GFN17-MEGA, I am just re-running one of those with the CPU code as an additional data-point. It would also be very interesting if other users with AMD cards could try running this test:



./geneferocl -q 43322502^131072+1



and post your residue, to understand if the problem is repeatable (i.e. a software bug that affects only AMD cards) or a transient error that occasionally affects your hardware...
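For anyone wondering what to compare: the residue is the low 64 bits of the probable-primality test result, so two correct runs must print identical hex strings. A toy-sized sketch of the idea (the base-3 Fermat PRP test here is an assumption for illustration; real genefer uses FFT-based arithmetic and its own transform code):

```python
# Toy illustration of a genefer-style 64-bit residue for a Generalized
# Fermat Number N = b^(2^n) + 1.  Plain pow() is only feasible for tiny
# numbers like these; genefer uses large FFT multiplications instead.
def gfn_residue(b, n, base=3):
    N = b ** (2 ** n) + 1
    r = pow(base, N - 1, N)        # Fermat PRP test: r == 1 suggests N is prime
    return r & 0xFFFFFFFFFFFFFFFF  # low 64 bits, printed as RES=... in hex

print(hex(gfn_residue(6, 2)))   # 6^4+1 = 1297 is prime, so this prints 0x1
print(hex(gfn_residue(10, 2)))  # 10^4+1 = 10001 = 73*137, composite
```

Any two correct machines computing the same test must agree on this value, which is why differing residues between wingmen immediately flag a bad result.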



- Iain



____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

That command line is going to fail: ocl4 will error out on a Mac with an AMD card :)



I also find it interesting that the failures occur on OCL4, which appears to be less taxing on a Tahiti GPU than OCL. Running that unit now on one GPU of a 7990. The WR unit running on the other GPU shows a steady 98% load, while the MEGA unit fluctuates around 86%. The GPU running the MEGA unit is 10C cooler, and it usually runs 3-4C hotter.

C:\ProgramData\BOINC\projects\www.primegrid.com>primegrid_genefer_3_3_0_3.12_windows_intelx86__atiGFNWR.exe -q "43322502^131072+1"

geneferocl 3.3.0-1 (Windows/OpenCL/32-bit)

Supported transform implementations: ocl ocl3 ocl4 ocl5

Copyright 2001-2016, Yves Gallot

Copyright 2009, Mark Rodenkirch, David Underbakke

Copyright 2010-2012, Shoichiro Yamada, Ken Brazier

Copyright 2011-2014, Michael Goetz, Ronald Schneider

Copyright 2011-2016, Iain Bethune

Genefer is free source code, under the MIT license.



Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__atiGFNWR.exe -q 43322502^131072+1



Priority change succeeded.



Testing 43322502^131072+1...

Using OCL4 transform



Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (2004.6)' and driver '2004.6 (VM)'.



Starting initialization...

Initialization complete (1.077 seconds).

Estimated time remaining for 43322502^131072+1 is 0:24:36

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:25:36) 12:53:00

Thanks VZ, I'm looking into it.



- Iain

____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

Uh oh.



Obviously, we're taking this very seriously. We shall get to the bottom of it and we'll keep you informed of what we discover as well as what corrective action will be taken.

____________

My lucky number is 75898^524288+1

Testing 43322502^131072+1...

Using OCL4 transform



Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (2004.6)' and driver '2004.6 (VM)'.



Starting initialization...

Initialization complete (1.077 seconds).

Estimated time remaining for 43322502^131072+1 is 0:24:36

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:25:36) 12:53:00

Same residue on NVIDIA:



geneferocl 3.3.1-1 (Windows/OpenCL/32-bit)



Copyright 2001-2016, Yves Gallot

Copyright 2009, Mark Rodenkirch, David Underbakke

Copyright 2010-2012, Shoichiro Yamada, Ken Brazier

Copyright 2011-2014, Michael Goetz, Ronald Schneider

Copyright 2011-2016, Iain Bethune

Genefer is free source code, under the MIT license.



Command line: geneferocl_windows.exe -q 43322502^131072+1



Priority change succeeded.

A benchmark is needed to determine best transform, testing available transform implementations...

OCL transform is past its b limit.

OCL3 transform is past its b limit.

Testing OCL4 transform...

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 680', version 'OpenCL 1.2 CUDA' and driver '364.51'.

OCL5 transform is past its b limit.

Benchmarks completed (5.078 seconds).



Testing 43322502^131072+1...

Using OCL4 transform

Starting initialization...

Initialization complete (1.049 seconds).

Estimated time remaining for 43322502^131072+1 is 0:18:52

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:18:54) 15:27:11

I also got the same residue '71e42cc1499c83c1' on a Tesla k20m using the OCL4 transform and driver 346.46. This card has ECC memory and 'HPC-spec' cooling, so I expect it to be accurate. Still waiting on my CPU run to finish. At the moment it looks like the question is this:



Why did the two wingmen both return the residue '696d5c384b46f4c9'? They were running on:



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '364.72'

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '364.72'.



Looking at all the other tasks which have been invalidated on this host, the wingmen which generated the 'valid' results used the following (all Windows hosts, for what it's worth):



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970M', version 'OpenCL 1.2 CUDA' and driver '364.72'.

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 750', version 'OpenCL 1.2 CUDA' and driver '364.91'.

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980', version 'OpenCL 1.2 CUDA' and driver '364.72'.

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970M', version 'OpenCL 1.2 CUDA' and driver '364.51'.





I also checked out some other WUs involving this host. For example:



http://www.primegrid.com/workunit.php?wuid=477656349



Valid results from:



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 750 Ti', version 'OpenCL 1.2 CUDA' and driver '362.00'.

Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (2004.6)' and driver '2004.6 (VM)'.



Invalid result from:



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970M', version 'OpenCL 1.2 CUDA' and driver '364.72'.



To me this is starting to look more and more like the 364 drivers from Nvidia are causing wrong results, although that's clearly not the case on Yves' hardware...



- Iain

____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

To me this is starting to look more and more like the 364 drivers from Nvidia are causing wrong results, although that's clearly not the case on Yves' hardware...

Just a hypothesis, but I think that 364.x generates wrong results with Maxwell, not with Kepler or Fermi.



Later today I plan on implementing a change on the server that will prevent it from sending out GPU tasks to computers running the 364.xx driver.



More details will be available later.



____________

My lucky number is 75898^524288+1

Is there a concern with respect to completed units that were adjudicated only on machines running the 364 driver?

Yves Gallot

Volunteer developer

Project scientist


Joined: 19 Aug 12

Posts: 578

ID: 164101

Credit: 304,715,793

RAC: 0



Message 94392 - Posted: 21 Apr 2016 | 15:36:56 UTC - in response to Message 94386.

Last modified: 21 Apr 2016 | 15:37:13 UTC

To me this is starting to look more and more like the 364 drivers from Nvidia are causing wrong results, although that's clearly not the case on Yves' hardware...

Just a hypothesis, but I think that 364.x generates wrong results with Maxwell, not with Kepler or Fermi.



I upgraded my computer to driver '364.72': same residue.

I have been running many GFN20 tasks with 364.51: none were invalid and several were double-checked with Tahiti.



Command line: geneferocl_windows.exe -q 43322502^131072+1



Priority change succeeded.

A benchmark is needed to determine best transform, testing available transform implementations...

OCL transform is past its b limit.

OCL3 transform is past its b limit.

Testing OCL4 transform...

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 680', version 'OpenCL 1.2 CUDA' and driver '364.72'.

OCL5 transform is past its b limit.

Benchmarks completed (3.699 seconds).



Testing 43322502^131072+1...

Using OCL4 transform

Starting initialization...

Initialization complete (0.666 seconds).

Estimated time remaining for 43322502^131072+1 is 0:18:51

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:18:40) 17:29:53

This machine has been crunching n=22 units error-free. It is currently running the 364.72 driver. It is Kepler-based, not Maxwell.



http://www.primegrid.com/results.php?hostid=506237&offset=0&show_names=0&state=0&appid=17



Interestingly, when I ran the mega unit, the driver crashed and reloaded about halfway through.



Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q 43322502^131072+1



Priority change succeeded.

A benchmark is needed to determine best transform, testing available transform implementations...

OCL transform is past its b limit.

OCL3 transform is past its b limit.

Testing OCL4 transform...



Running on platform 'NVIDIA CUDA', device 'GeForce GTX TITAN Black', version 'OpenCL 1.2 CUDA' and driver '364.72'.



OCL5 transform is past its b limit.

Benchmarks completed (8.208 seconds).



Testing 43322502^131072+1...

Using OCL4 transform

Starting initialization...

Initialization complete (0.629 seconds).

Estimated time remaining for 43322502^131072+1 is 0:13:41

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:13:41) 12:01:01



C:\ProgramData\BOINC\projects\www.primegrid.com>

Is there a concern with respect to completed units that were adjudicated only on machines running the 364 driver?



Yes. It's still very early and it's entirely possible that we don't yet fully understand the scope of the problem, but based on what I know at this moment, once we've blocked the 364 driver I plan on somehow rechecking any tests where all of the validated results were with 364.

____________

My lucky number is 75898^524288+1

In light of this thread I just rolled back the driver on my laptop, and took the chance to do a comparative test:



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970M', version 'OpenCL 1.2 CUDA' and driver '364.72'.

43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:19:15) 19:21:48



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970M', version 'OpenCL 1.2 CUDA' and driver '362.00'.

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:19:21) 20:12:52

I've yet to complete my round of testing, but so far, here's what I have on my GTX 970:



Driver 362.00. The card was OCed; still, it returned the correct result.

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:12:08) 14:07:27



Driver 364.51. While I was waiting for the next driver to download, I repeated the test 4 times: the first 2 with the card OCed, the others with the factory OC only. And look at these results!

43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:12:35) 15:00:51

43322502^131072+1 is composite. (RES=63f88fc407418f27) (1000960 digits) (err = 0.0000) (time = 0:12:17) 15:13:57

43322502^131072+1 is composite. (RES=63f88fc407418f27) (1000960 digits) (err = 0.0000) (time = 0:13:03) 15:27:49

43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:13:07) 15:41:22



2 different residues! For the same card, same driver and even the same OC settings!



Driver 364.72. Deja vu here: once again, 2 results for the same settings!

43322502^131072+1 is composite. (RES=63f88fc407418f27) (1000960 digits) (err = 0.0000) (time = 0:12:24) 15:55:11

43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:12:23) 16:08:06

43322502^131072+1 is composite. (RES=63f88fc407418f27) (1000960 digits) (err = 0.0000) (time = 0:13:01) 16:22:11

43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:13:07) 16:40:10





All that's left is the 364.96 "hot fix" (aka Doom Beta support driver). Though I have a feeling that one isn't going to be any better...

People are rolling back drivers with partially done work. What kind of effect will that have on the outcome?

People are rolling back drivers with partially done work. What kind of effect will that have on the outcome?

Good question... I'll test it, give me ~10min.



Yves, is it too much work to make a debug version that logs each and every step to the genefer file, so we can see where it's erroring out? We might get some useful info from it...

People are rolling back drivers with partially done work. What kind of effect will that have on the outcome?



43322502^131072+1 is composite. (RES=8731a69057a4d411) (1000960 digits) (err = 0.0000) (time = 0:12:54) 17:24:33



I got about 2/3 of the way through the task on 364.96 before switching to 362.00. And a completely new residue showed up...





Michael, this got me wondering: what will we do about previously "valid" tasks? How will we handle credit? They returned wrong residues, but this really wasn't the users' fault...

People are rolling back drivers with partially done work. What kind of effect will that have on the outcome?



It depends on the exact point where the error is occurring.



We're looking right now at modifying the validator to look at the version string as reported by the app in the stderr file. The string is printed at the beginning of the run.



What we're planning on doing right now is to have the validator require at least one result NOT be from a computer running OCL on an Nvidia 364.xx driver. If all the "valid" results are from 364.xx, we'll bump up nresults and send out an additional task.



My earlier plan to stop the server from sending out tasks to computers running 364.xx isn't panning out at the moment.
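The stderr line the validator can key on is the one quoted throughout this thread. A rough sketch of the extraction (function name hypothetical; the real validator is PrimeGrid/BOINC server code, not Python):

```python
import re

# Each result's stderr contains a line like the ones quoted in this thread:
#   Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970',
#   version 'OpenCL 1.2 CUDA' and driver '364.72'.
DRIVER_RE = re.compile(r"driver '(\d+\.\d+)'")

def is_suspect(stderr_text):
    """True if the result was produced on an NVIDIA OpenCL 364.xx driver."""
    m = DRIVER_RE.search(stderr_text)
    if m is None:
        return False   # CPU app, AMD-style driver string, or old client
    return "NVIDIA" in stderr_text and m.group(1).startswith("364.")
```

On the lines quoted above, the GTX 970 with driver '364.72' would be flagged, while the 362.00 hosts and the AMD Tahiti host (whose driver string '2004.6 (VM)' doesn't match the pattern) would not.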

____________

My lucky number is 75898^524288+1

If all the "valid" results are from 364.xx, we'll bump up nresults and send out an additional task.





If we assume the 364.xx results are wrong, and the third machine comes back with a different residue, would a fourth (and possibly a fifth, if the quorum is 3) unit be required, presumably also from a non-364.xx machine?

If we assume the 364.xx results are wrong, and the third machine comes back with a different residue, would a fourth (and possibly a fifth, if the quorum is 3) unit be required, presumably also from a non-364.xx machine?

Yes, exactly. The quorum doesn't change, we just need one result for any given residue to not be from nVidia OpenCL driver 364.xx.



Edit: The validator change has now been made. We're going to have to discuss among ourselves how to handle already-validated workunits. We're not purging any current GFN workunits and we can get data on already-purged ones from database backups. I've got daily backups for all of 2015-16, but at the moment I don't think the problems started before March 1st of this year. If in doubt we'll retest.
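The rule described above can be sketched as a check over one workunit's results, each tagged with its residue and whether it came from a 364.xx NVIDIA OpenCL host (names and data shape are hypothetical; the real logic lives in the BOINC validator):

```python
from collections import Counter

def quorum_ok(results, quorum=2):
    """results: (residue, on_364xx) pairs for one workunit.  The workunit
    validates only if `quorum` residues agree AND at least one agreeing
    result is NOT from an nVidia OpenCL 364.xx host."""
    counts = Counter(res for res, _ in results)
    residue, n = counts.most_common(1)[0]
    if n < quorum:
        return False   # no agreement yet: the server sends another task
    return any(res == residue and not on_364 for res, on_364 in results)

# Two 364.xx wingmen agreeing is no longer enough on its own:
print(quorum_ok([("696d5c384b46f4c9", True), ("696d5c384b46f4c9", True)]))   # False
print(quorum_ok([("71e42cc1499c83c1", False), ("71e42cc1499c83c1", True)]))  # True
```

Note the quorum itself doesn't change; only the trust requirement on at least one of the matching results.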

Figured I'd give my GTX 760 (362.00) and GT 740 (364.51) both a go.



GTX 760:

C:\ProgramData\BOINC\projects\www.primegrid.com>primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"

geneferocl 3.3.0-1 (Windows/OpenCL/32-bit)

Supported transform implementations: ocl ocl3 ocl4 ocl5

Copyright 2001-2016, Yves Gallot

Copyright 2009, Mark Rodenkirch, David Underbakke

Copyright 2010-2012, Shoichiro Yamada, Ken Brazier

Copyright 2011-2014, Michael Goetz, Ronald Schneider

Copyright 2011-2016, Iain Bethune

Genefer is free source code, under the MIT license.



Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q 43322502^131072+1



Priority change succeeded.

A benchmark is needed to determine best transform, testing available transform implementations...

OCL transform is past its b limit.

OCL3 transform is past its b limit.

Testing OCL4 transform...



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.2 CUDA' and driver '362.00'.



OCL5 transform is past its b limit.

Benchmarks completed (1.317 seconds).



Testing 43322502^131072+1...

Using OCL4 transform

Starting initialization...

Initialization complete (0.759 seconds).

Estimated time remaining for 43322502^131072+1 is 0:22:13

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:22:41) 21:34:21



GT 740:

C:\ProgramData\BOINC\projects\www.primegrid.com>primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFN17MEGA.exe -q "43322502^131072+1"

geneferocl 3.3.0-1 (Windows/OpenCL/32-bit)

Supported transform implementations: ocl ocl3 ocl4 ocl5

Copyright 2001-2016, Yves Gallot

Copyright 2009, Mark Rodenkirch, David Underbakke

Copyright 2010-2012, Shoichiro Yamada, Ken Brazier

Copyright 2011-2014, Michael Goetz, Ronald Schneider

Copyright 2011-2016, Iain Bethune

Genefer is free source code, under the MIT license.



Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFN17MEGA.exe -q 43322502^131072+1



Priority change succeeded.

A benchmark is needed to determine best transform, testing available transform implementations...

OCL transform is past its b limit.

OCL3 transform is past its b limit.

Testing OCL4 transform...



Running on platform 'NVIDIA CUDA', device 'GeForce GT 740', version 'OpenCL 1.2 CUDA' and driver '364.51'.



OCL5 transform is past its b limit.

Benchmarks completed (3.262 seconds).



Testing 43322502^131072+1...

Using OCL4 transform

Starting initialization...

Initialization complete (1.434 seconds).

Estimated time remaining for 43322502^131072+1 is 1:02:26

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 1:01:45) 22:09:34

____________

My Primes

Badge Score: 2*1 + 4*2 + 6*6 + 7*7 + 9*2 + 10*1 = 123



In case there was still any doubt, the correct residue as found by the CPU x87 code:



Testing 43322502^131072+1...

Using x87 (80-bit) transform

The checkpoint doesn't match current test: 43322502^131072+1 != 10000000^8192+1. Current test will be restarted

Starting initialization...

Initialization complete (0.599 seconds).

Estimated time remaining for 43322502^131072+1 is 12:38:49

43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.2500) (time = 11:12:32) 00:04:30

____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

Yves Gallot

Volunteer developer

Project scientist





Message 94455 - Posted: 23 Apr 2016 | 10:43:11 UTC

Last modified: 23 Apr 2016 | 10:49:28 UTC

I'm trying to pin down the bug in the NVIDIA GPU instructions.



The OpenCL source code is compiled (by the driver) into a PTX file (a virtual instruction set for parallel computing). The PTX program is then translated into the target hardware's instruction set.



There is a bug in the OpenCL-to-PTX compiler or in the PTX-to-GPU translator.

Both depend on hardware Compute Capability (https://en.wikipedia.org/wiki/CUDA#Supported_GPUs).

PTX and GPU instructions are different on different hardware.



Correct results were reported on GeForce GTX TITAN Black, Tesla K20m, GeForce GTX 680, ... so CC 3.0 and 3.5 devices are not affected by the bug.

CC 5.2 (5.0?) device results are invalid.



On my computer, the PTX file generated by the 364 driver is identical to the 362 version. But my computer has a CC 3.0 GPU.

The files are in the C:\Users\<login>\AppData\Roaming\NVIDIA\ComputeCache directory. You can clear it and run genefer; 3 files are created, one of which contains the genefer OpenCL source code and PTX instructions.



Are 362 and 364 PTX files identical on a CC 5 device?



If they are, the bug is in the PTX-to-GPU translator.

The translator included in the 36x drivers is the CUDA 8 binary generator. That version is not yet available, so I don't know how to generate the bin file translated by the 36x drivers.

If somebody knows how to generate it, then by comparing the 362 and 364 versions we could easily find the wrong instruction.
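For anyone wanting to compare their own caches as described above, a rough comparison script might look like this (the ".version" heuristic for locating the PTX text inside a cache entry is an assumption about the cache format, not documented behaviour):

```python
from pathlib import Path

def extract_ptx(cache_file):
    """Pull the PTX text out of one ComputeCache entry, assuming the PTX
    section begins at its '.version' directive and is NUL-terminated."""
    data = Path(cache_file).read_bytes()
    start = data.find(b".version")
    if start < 0:
        return None                      # this cache file holds no PTX
    end = data.find(b"\x00", start)
    return data[start:end] if end > 0 else data[start:]

def caches_match(dir_a, dir_b):
    """Compare all PTX found in two saved copies of the ComputeCache
    directory (e.g. one snapshot per driver version)."""
    ptx = lambda d: {extract_ptx(f) for f in Path(d).rglob("*") if f.is_file()}
    return ptx(dir_a) - {None} == ptx(dir_b) - {None}
```

If the PTX matches between a good and a bad driver, the problem is downstream, in the PTX-to-GPU translator.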



The translator included in the 36x drivers is the CUDA 8 binary generator. That version is not yet available, so I don't know how to generate the bin file translated by the 36x drivers.

If somebody knows how to generate it, then by comparing the 362 and 364 versions we could easily find the wrong instruction.

Unfortunately, the bug may also be not in the compiler but in common driver code (how threads are scheduled, how GPU memory is allocated, etc.). It may be just random GPU or CPU memory corruption in an unrelated part of the driver.



Back to the question: you could do it another way. Ask somebody with different driver versions (bad and good) to send you the whole content of the ComputeCache directory. I have a 750 Ti/CC 5.2 and driver 361.*, so I can send you good ones. The PTX intermediate code is easy to extract and compare. Comparing the binary code is trickier but also possible. One of these cache files contains the GPU binary code (the result of PTX compilation); it is an ELF-format program, but surrounded by extra headers. Use a hex editor to cut out the ELF part (starting from the "\x7F" "ELF" signature to the end of the file). Then you can run the disassembler (part of the CUDA tools) on the ELF file to get a readable listing of machine-code GPU instructions and compare them with other versions.
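The hex-editor step can also be scripted; a minimal sketch (file names arbitrary; feed the carved file to the CUDA disassembler yourself afterwards):

```python
from pathlib import Path

def carve_elf(cache_file, out_file):
    """Cut the embedded GPU binary out of a ComputeCache entry by locating
    the ELF magic and keeping everything from there to end-of-file, as the
    hex-editor procedure above describes."""
    data = Path(cache_file).read_bytes()
    off = data.find(b"\x7fELF")
    if off < 0:
        raise ValueError("no ELF image found in %s" % cache_file)
    Path(out_file).write_bytes(data[off:])
    return len(data) - off   # size of the carved image in bytes
```

The resulting file can then be disassembled and diffed between the 362 and 364 snapshots to find the differing machine instructions.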



Unfortunately, the bug may also be not in the compiler but in common driver code (how threads are scheduled, how GPU memory is allocated, etc.). It may be just random GPU or CPU memory corruption in an unrelated part of the driver.

You may be right, but it is not random corruption, because the error is repeatable (the residue is incorrect but is always the same value).



Use a hex editor to cut out the ELF part (starting from the "\x7F" "ELF" signature to the end of the file). Then you can run the disassembler (part of the CUDA tools) on the ELF file to get a readable listing of machine-code GPU instructions and compare them with other versions.



Thanks, I disassembled the machine code on my computer: the 362 and 364 codes are identical. But it is EF_CUDA_SM30 code.



You may be right, but it is not random corruption, because the error is repeatable (the residue is incorrect but is always the same value).

It's not necessarily repeatable on every run, see http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94408.

You may be right, but it is not random corruption, because the error is repeatable (the residue is incorrect but is always the same value).

It's not necessarily repeatable on every run, see http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94408.

Yes, it is almost repeatable :o)

Yves Gallot

Volunteer developer

Project scientist





Message 94477 - Posted: 23 Apr 2016 | 19:26:13 UTC - in response to Message 94456.

Last modified: 23 Apr 2016 | 20:03:52 UTC

I have 750ti/CC 5.2 and driver 361.*,

Your driver's compiler is CUDA 7.5 (PTX ISA 4.3).

Is it really a 361? On my computer, the 361 driver's compiler is CUDA 8 (PTX ISA 5.0).

I have 750ti/CC 5.2 and driver 361.*,

Your driver's compiler is CUDA 7.5 (PTX ISA 4.3).

Is it really a 361? On my computer, the 361 driver's compiler is CUDA 8 (PTX ISA 5.0).

Sorry, I forgot which system has which drivers. This one was on 359.06. I have two more 750 Tis at another location where I'm sure I had 361.* installed, but I can only visit it late on Monday.



Jacob Klein


Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94603 - Posted: 28 Apr 2016 | 12:06:41 UTC

Last modified: 28 Apr 2016 | 12:20:36 UTC

I have a contact at NVIDIA who is looking to actively fix this, but he needs more info.

He is asking this:



I tried adding the project "PrimeGrid" through "BOINC Manager" on a Windows 7 SP1 64-bit / GTX 970 / driver 364.72 test system. However, I only have the "primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe" tool under C:\ProgramData\BOINC\projects\www.primegrid.com\ for now. According to the information posted on the forum thread, it looks like it needs the following command line to reproduce.

Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q 43322502^131072+1



Is it possible to provide us a standalone test case and detailed instructions that run into this issue? It would be helpful for us to investigate this issue further with an exact local reproduction. Thanks.



So... is this the correct answer to his question:



Download the .exe files from here:



http://www.primegrid.com/download/primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe

http://www.primegrid.com/download/primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe



... and then run them via these command lines:

primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"

primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"



... and the result should ALWAYS say:

(RES=71e42cc1499c83c1)



Is this what I should tell him?

I have a contact at NVIDIA who is looking to actively fix this, but he needs more info. [...] Is this what I should tell him?



You could recommend a manual GFN sieve. My n=22 sieve results were invalid when I upgraded to a 364 driver on a 750 Ti. Sieves run very quickly and produce factor files that can be compared for differences.



I have a contact at NVIDIA who is looking to actively fix this, but he needs more info.

He is asking this: Is it possible to provide us a standalone test case and detailed instructions that run into this issue? It would be helpful for us to investigate this issue further with an exact local reproduction. Thanks.

It would be helpful if you explained a bit about how the app works, so that he understands the command line better (and can troubleshoot mistakes). Here's what I'd tell him:



1- Download the app here

2- Taking the file above, the command line is as follows:

geneferocl_windows.exe -q "43322502^131072+1" -d 0 -x ocl4

2a- geneferocl_windows.exe is the app. Both the space after -q and the quotes are important (for the Windows version, at least).

2b- -d 0 selects the GPU, if you have more than one (say, an iGPU + GPU, SLI, etc.). If you only have one, don't bother changing it.

2c- -x ocl4 just selects a fast transform. It isn't needed, but it speeds things up a little since no benchmarks have to run. Again, don't bother changing it.



If the result is correct, you'll see (RES=71e42cc1499c83c1) printed on the screen. Or, if you are using a .bat file like I do, it'll be stored into the genefer.txt file.



If it's wrong, however, something different will show up. In the specific case of a GTX 970, I'd expect either (RES=696d5c384b46f4c9) or (RES=63f88fc407418f27). Which one shows up seems random, but it's consistently one of the two.



Also, if the driver is changed midway, the end result will differ again. If you want to reset the current test and start from scratch, delete the genefer.ckpt file before starting the app again.





Btw, once this is fixed, could you ask your contact to take a look at this other issue with GeneferOCL, please: https://www.youtube.com/watch?v=dBthOfr7im0

What Rafael said. Plus, if he wants it, the source code for Genefer is available (it's open source), and he can speak directly to the admins here if he wants.

____________

My lucky number is 75898^524288+1

Jacob Klein





Message 94612 - Posted: 28 Apr 2016 | 15:19:29 UTC

Last modified: 28 Apr 2016 | 16:12:57 UTC

I passed along info to NVIDIA for the standalone test case using:

primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"



They were able to reproduce it, and a dev is looking at it now.

WOOT!



Edit:



I've linked him also to:

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94606

and

https://www.youtube.com/watch?v=dBthOfr7im0

... and requested a fix for that too.



We'll see what shakes out.



PS: All I did to get the ball rolling was to create a dev account here and report the bug as clearly and simply as I could:

https://developer.nvidia.com

Thanks Jacob, great that Nvidia are taking an interest. If they have any other questions, feel free to put them in touch with me, if that's helpful.



- Iain

____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94629 - Posted: 29 Apr 2016 | 1:38:55 UTC

:) I like fixing things!



Here is the current correspondence I've had with Kevin Kang, who works with NVIDIA, and is contacting me via comments on my https://developer.nvidia.com NVIDIA Bug report. It is BUG ID #1754468. It is titled "OpenCL failures on R364 drivers".



They have steps to repro the issues, so I don't anticipate replying back to them unless they ask for more info.



The Bug Details are below, with the important Comments History correspondence at the bottom.



I'll chime back in on this thread, whenever any info gets updated.



Regards,

Jacob



=========================



Description:

OpenCL has some serious issues with R364. Poem@Home is crashing, PrimeGrid is receiving incorrect results, and OpenCL SDK Examples are failing. Please investigate and fix, using details below!



Duplication Steps:

1) Something in 364.72 drivers is causing POEM@Home to TDR/BSOD, but not in 362.00. Tracking thread here: http://boinc.fzk.de/poem/forum_thread.php?id=1205 ... please fix!

2) PrimeGrid is receiving incorrect results on R364, but correct results on 362.00. Tracking thread here: http://www.primegrid.com/forum_thread.php?id=6775#94408 ... please fix!

3) The following 7 OpenCL SDK Code Samples are failing currently, on my GTX 970, on Windows 10 x64, Builds 10586 and 14316. Confirmed also on Windows 8 x64 and Windows 7 x64. Error details included below.



Please fix these OpenCL issues, and let me know if there's anything more I can provide!

Thanks!



===============

13 - oclSimpleD3D10Texture

!!! Error # -11 at file .\oclSimpleD3D10Texture.cpp, line 396

<kernel>:47:9: error: can't convert between vector values of different size ('__attribute__((address_space(16776963))) float4' and 'double')

pixel *= 255.0;

~~~~~ ^ ~~~~~



14 - oclSimpleD3D9Texture

!!! Error # -9999 (Unspecified Error) at line 345 , in file .\oclSimpleD3D9Texture.cpp !!!



28 - oclHistogram

!!! Error # -5 at line 147 , in file .\src\main.cpp !!!



32 - oclSobelFilter

!!! Error # -11 at file .\oclSobelFilter.cpp, line 303

<kernel>:55:72: error: call to 'mul24' is ambiguous

uc4LocalData[iLocalPixOffset] = uc4Source[iDevGMEMOffset + mul24(get_local_size(1), get_global_size(0))];

^~~~~



33 - oclMedianFilter

!!! Error # -11 at file .\oclMedianFilter.cpp, line 306

<kernel>:51:72: error: call to 'mul24' is ambiguous

uc4LocalData[iLocalPixOffset] = uc4Source[iDevGMEMOffset + mul24(get_local_size(1), get_global_size(0))];

^~~~~



35 - oclRecursiveGaussian

!!! Error # -11 at file .\src\oclRecursiveGaussian.cpp, line 262

<kernel>:56:14: error: call to 'mul24' is ambiguous

xIndex = mul24(get_group_id(1), get_local_size(1)) + get_local_id(0);

^~~~~



38 - oclNbody

!!! Error # -11 at file .\src\oclBodySystemOpenclLaunch.cpp, line 294

<kernel>:86:44: error: call to 'mul24' is ambiguous

accel = bodyBodyInteraction(accel, SX(i++), myPos, softeningSquared);

^~~~~~~

!!! Error # -11 (CL_BUILD_PROGRAM_FAILURE) at line 297 , in file .\src\oclBodySystemOpenclLaunch.cpp !!!



Comments History:

Thu, 2016-04-28 08:23JacobKlein

[CUDA RegDev Program [DevZone]] YouTube video of OpenCL TDR: https://www.youtube.com/watch?v=dBthOfr7im0 Please fix! :)

Thu, 2016-04-28 08:22JacobKlein

[CUDA RegDev Program [DevZone]] I'm glad you could repro it. Here's a link to a clearer description of repro steps, as well as an indication of another possibly-related problem with a YouTube video. Could you take a look to see if it's another problem that can be solved? Thanks. http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94606

Thu, 2016-04-28 07:30Kevin Kang

Hi Jacob, Thanks for the update and providing the standalone test case. I can reproduce the PrimeGrid failure as following when I tested on Windows7sp1_64bit/GTX 970 setup with the v364.72 driver. Our developers will look into this issue further. Thanks. Estimated time remaining for 43322502^131072+1 is 0:13:54 43322502^131072+1 is composite. (RES=63f88fc407418f27) (1000960 digits) (err = 0.0000) (time = 0:13:21) 14:10:48 Thanks, Kevin

Thu, 2016-04-28 05:50JacobKlein

[CUDA RegDev Program [DevZone]] Hi Kevin! Thanks for responding! For the PrimeGrid failure, here's more information for a standalone test: - Download the OpenCL PrimeGrid program here: http://www.primegrid.com/download/primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe - Execute it with the following command line (notice the quotes that are required): primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" - Expected result (may need to run the test multiple times to get an erroneous different result): (RES=71e42cc1499c83c1) For my new GTX 980 Ti that I recently got, I can reliably reproduce the bug on 364.96: 362.00: 43322502^131072+1 is composite. (RES=71e42cc1499c83c1) (1000960 digits) (err = 0.0000) (time = 0:09:20) 08:23:21 364.96: 43322502^131072+1 is composite. (RES=696d5c384b46f4c9) (1000960 digits) (err = 0.0000) (time = 0:09:34) 08:44:56 Note 1: The download location for PrimeGrid apps, in general, is: http://www.primegrid.com/download/ Note 2: They do have an updated app, primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFNWR.exe .... but I have not tested it, and would prefer to ignore it for our testing purposes here. Let me know if you need anything else! Thanks! Jacob Klein

Thu, 2016-04-28 02:33Kevin Kang

Hi Jacob, Thanks for the reporting and sorry for any inconvenience brought by this issue. For issue #1: TDR/BSOD issue when running POEM@Home I can reproduce the following problem(TDR) when running Poem@HOme tasks on a Windows7sp1 64bit/GTX 970/ driver 364.72 test system. Attaching the information from "Problem details" for your reference. And now, we have assigned this issue to the appropriate developer team for further investigation, we'll keep you posted once we have new update. === Problem signature: Problem Event Name: APPCRASH Application Name: poemcl_2.30_windows_intelx86__opencl_nvidia_101 Application Version: 0.0.0.0 Application Timestamp: 00411b86 Fault Module Name: poemcl_2.30_windows_intelx86__opencl_nvidia_101 Fault Module Version: 0.0.0.0 Fault Module Timestamp: 00411b86 Exception Code: 40000015 Exception Offset: 002374e8 OS Version: 6.1.7601.2.1.0.256.4 Locale ID: 1033 Additional Information 1: bee5 Additional Information 2: bee5c893ebd2ad7bd84a7da19377e94d Additional Information 3: 7ee4 Additional Information 4: 7ee4a73a18622fe27074b2886eee4e92 === For issue #2: PrimeGrid is receiving incorrect results; I tried added the project "PrimeGrid" through "BONIC Manager" on a Windows7sp1 64bit/GTX 970/ driver 364.72 test system. However, I only have "primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe" tool under C:\ProgramData\BOINC\projects\www.primegrid.com\ for now. According to the information posted on forum thread, looks like it needs to use the following command line to reproduce. Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q 43322502^131072+1 Is it possible to provide us a standalone test case and the detail instructions that ran into this issue? It would be helpful for us to investigate this issue further with a local reproduction exactly. Thanks. For issue #3: 7 OpenCL SDK Code Samples are failing; Actually, they are known issues in the OpenCL SDK code, and have been fixed in our development builds. 
Unfortunately, the fix of this issue is not available on the OpenCL Samples download page yet We'll keep you posted once they're ready on https://developer.nvidia.com/opencl Thanks, Kevin

Wed, 2016-04-27 21:35JacobKlein

[CUDA RegDev Program [DevZone]] I updated the Description and Duplication Steps on 4/28/2016, to correctly detail the 3 problems and provide thread links. Please fix all of these! Thanks!

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94701 - Posted: 2 May 2016 | 16:04:52 UTC

I have confirmed that today's 365.10 drivers do NOT fix the OpenCL problems -- PrimeGrid miscalculation and Poem@Home TDRs.



I'd recommend users to stick with 362.00, and projects to take action to prevent issuing OpenCL tasks to R364 users.

I have confirmed that today's 365.10 drivers do NOT fix the OpenCL problems -- PrimeGrid miscalculation and Poem@Home TDRs.



I'd recommend users to stick with 362.00, and projects to take action to prevent issuing OpenCL tasks to R364 users.

Right. Just out of curiosity, are the wrong residues the same as 364, or have they changed again?



I think the erroneous 365.10 residue matched my erroneous 364.96 residue. If you have the hardware and the patience, you could test it yourself :)

I think the erroneous 365.10 residue matched my erroneous 364.96 residue. If you have the hardware and the patience, you could test it yourself :)

I would. But my 970 is in another castle.... er, another house, and I'll only be able to get to it late Friday night.

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94771 - Posted: 4 May 2016 | 12:44:17 UTC

I have a small status update, regarding my NVIDIA bug (Bug ID 1754468) for these OpenCL issues:

- Status changed from "Open - pending review" to "Open - in progress"

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94787 - Posted: 5 May 2016 | 12:28:06 UTC

Last modified: 5 May 2016 | 12:42:17 UTC

Iain Bethune / Michael Goetz / Yves Gallot / Rafael / GDB



While NVIDIA fixes the actual problem, they are requesting additional information in order to consider including a quick "PrimeGrid" test case in their checklist for new driver releases.



Could someone please answer these questions, either publicly or privately? If you'd prefer me to forward the actual NVIDIA email to you, so you could reply via email, you can PM me your email address.



Thanks!

Jacob



===============================================================



- Current repro case:

Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"

Expected results: (RES=71e42cc1499c83c1)



===============================================================



NVIDIA's Questions:

- Looks like there are lots of applications(with named “*_windows_intelx86__OCLcuda*”) under http://www.primegrid.com/download/, are these applications using GPU computing(OCL and/or CUDA)?

- Except the “primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe”, do you know if other test apps also have this problem?

- If this issues also persists with other applications, how can I get exact command lines, the expected results to reproduce and verify it further?

* What about the running times of each failures? The “primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" command line takes me 10~20 mins on different Windows test systems.

- Is it possible to provide the latest source of this test application(alone with the compilation commands) that ran into this problem? It would be helpful for us to test it further with our development driver version by using a sample code(or a self-contained test case) instead of using a executable, since it’s hard to debug it further if the executable ran into other issue(i.e. segfault).

- Considering IT policy, could you please help to confirm if there is data transfer while running the test cases? If it has, please help to provide us the detailed information of data gathered/transferred during the tests. Thanks.



===============================================================



* This would benefit them the most, I'd think -- A quicker test case to reproduce the error!



===============================================================

* This would benefit them the most, I'd think -- A quicker test case to reproduce the error!

Has anyone tried the residue test: geneferocl_windows.exe -nvidia -r -x ocl4



Could someone please answer these questions, either publicly or privately? If you'd prefer me to forward the actual NVIDIA email to you, so you could reply via email, you can PM me your email address.



On the forums is actually a good place to answer, because you may need multiple people to answer all the questions.



===============================================================



- Current repro case:

Command line: primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"

Expected results: (RES=71e42cc1499c83c1)



===============================================================



NVIDIA's Questions:

- Looks like there are lots of applications(with named “*_windows_intelx86__OCLcuda*”) under http://www.primegrid.com/download/, are these applications using GPU computing(OCL and/or CUDA)?



All of the executables with names *3.13_windows_intelx86__OCLcuda* are actually the same binary. For arcane BOINC reasons, we need multiple copies of the same binary with different names.



All apps with "OCLcuda" at the end are OpenCL apps. They will run on any OpenCL device. They are not actual CUDA apps, but, again for arcane BOINC reasons, they need to have the string "cuda" in that part of the name.



The "3.13" is the version number, and is the latest version. There are quite a few older versions on the server, but they're not relevant to this bug.



- Except the “primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe”, do you know if other test apps also have this problem?



Since all of the other *3.13_windows_intelx86__OCLcuda* apps are identical binaries, they of course all exhibit the same behavior. I don't really know about the older versions, but I strongly suspect 3.12 is the same, since the differences between 3.12 and 3.13 had to do with Linux build parameters and some behavior of the CPU version of the app. Those are completely different builds, and although we did rebuild the Windows OCL app for the 3.13 release, nothing actually changed. So I expect 3.12 would have the same behavior. About 3.11 or earlier I'm not sure.



- If this issues also persists with other applications, how can I get exact command lines, the expected results to reproduce and verify it further?



It's really just the one application.



* What about the running times of each failures? The “primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" command line takes me 10~20 mins on different Windows test systems.



Run times of "good" and "bad" results are similar.



We should be able to find examples on shorter tests, if that's helpful.



EDIT: Or maybe not. I can't find any examples in the database of GFN15 or GFN16 tasks exhibiting the same error. If it's only manifesting on the larger tests, that in itself might help in determining the cause.



There are GFN17LOW tasks which will be slightly faster (not much, however). "3646782^131072+1" is an example. The correct result is 4bd691363f68a759, and a GTX 660 with driver 364.72 produced 95f73e35a6a973cc. Someone with the correct hardware and driver would need to attempt to reproduce that error, however. The database doesn't have enough information to indicate whether this is an actual example of the driver bug or just a random error.



- Is it possible to provide the latest source of this test application(alone with the compilation commands) that ran into this problem? It would be helpful for us to test it further with our development driver version by using a sample code(or a self-contained test case) instead of using a executable, since it’s hard to debug it further if the executable ran into other issue(i.e. segfault).



Of course. https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/src



- Considering IT policy, could you please help to confirm if there is data transfer while running the test cases? If it has, please help to provide us the detailed information of data gathered/transferred during the tests. Thanks.



I'm not sure what this question means. I think he's asking if there's any external data used by the application. If that's correct, the answer is "no".



The program does write checkpoint files to disk periodically so that it can be restarted. You could compare the checkpoint files to see exactly when the error is happening. The source can be easily modified to write checkpoint files on every iteration (and with different filenames so they don't overwrite themselves), but that would eat up disk space in a hurry.



We can provide the format of the checkpoint file if he needs it. Most of it is just a single really, really long number which is the interim value of the calculation after the latest iteration. The rest of the file is bookkeeping information (program version number, iteration count, cumulative elapsed time, etc.)
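If anyone wants to try the checkpoint-comparison idea, here's a minimal sketch that reports the first byte offset where two checkpoint files diverge. It assumes nothing about the file format beyond it being binary, so it works without knowing the bookkeeping layout:

```python
def first_divergence(path_a, path_b):
    """Return the byte offset where two files first differ, or None if identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a, b = fa.read(4096), fb.read(4096)
            if a != b:
                # Find the exact mismatching byte within this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Chunks agree as far as they overlap; one file is shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference
            offset += len(a)
```

Run it on a genefer.ckpt from a good driver vs. one from a bad driver at the same iteration count, and the offset would hint at which part of the interim value went wrong.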

____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94800 - Posted: 5 May 2016 | 16:53:38 UTC

Last modified: 5 May 2016 | 16:54:58 UTC

Thanks Michael. I have let my NVIDIA contacts know about your answers.



Note: They have determined that the PrimeGrid OpenCL miscalculations have a different root cause than the Poem@Home OpenCL TDR crashes. As such, they've created a new internal Bug for PrimeGrid, which I don't have access to, but they promise to keep me informed on.



Bug ID: 200197534 (New internal Bug report): Will be used to resolve the PrimeGrid OpenCL miscalculations.



Bug ID: 1754468 (My original Bug report): Will now be used to resolve the Poem@Home OpenCL TDR crashes.



Moving forward!

Jacob

Thanks Michael. I have let my NVIDIA contacts know about your answers.



Note: They have determined that the PrimeGrid OpenCL miscalculations have a different root cause than the Poem@Home OpenCL TDR crashes. As such, they've created a new internal Bug for PrimeGrid, which I don't have access to, but they promise to keep me informed on.



Bug ID: 200197534 (New internal Bug report): Will be used to resolve the PrimeGrid OpenCL miscalculations.



Bug ID: 1754468 (My original Bug report): Will now be used to resolve the Poem@Home OpenCL TDR crashes.



Moving forward!

Jacob



Thanks for your help, Jacob!

____________

My lucky number is 75898524288+1

I have confirmed that today's 365.10 drivers do NOT fix the OpenCL problems -- PrimeGrid miscalculation and Poem@Home TDRs.



I'd recommend users to stick with 362.00, and projects to take action to prevent issuing OpenCL tasks to R364 users.

This has been addressed on PrimeGrid, as the use of GeForce drivers after 362.00 does not affect all NVIDIA GPU devices.



I can confirm that this policy is working as intended. I have returned a GFN-18 task using driver 365.10 while my wingman was using 364.72, which generated an 'inconclusive' result until a second wingman completed the task using a much older GeForce driver and validated all three results.

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94826 - Posted: 6 May 2016 | 19:56:17 UTC

Last modified: 6 May 2016 | 19:56:45 UTC

The NVIDIA contacts have additional questions.

Could someone please answer them, so I can relay the answers?



===============================================================

NVIDIA's Questions:



- The "expected result" should be "RES=71e42cc1499c83c1" consistently when running with the command line below, right? Even it has a higher version(>3.13) in the future? .... So it could be generalized to be: primegrid_genefer_3_3_0_<version>_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"



- At one point, I stated "may need to run the test multiple times to get an erroneous different result" to them. Is that true? If so, about how many test cycles should they run this command line, for a good test case when they evaluate a driver version?



- Except the (-q "43322502^131072+1") arguments, are there other options that may also hit this failure? They don’t want to lose coverage in our test plan, any other known failure points?



- Regarding the questions for “data gather/transfer during tests”, they want to know whether the PrimeGrid apps will gather the GPU’s hardware/software information and transfer them to outside servers during the tests.

===============================================================



Can someone answer, please?

- The "expected result" should be "RES=71e42cc1499c83c1" consistently when running with the command line below, right? Even it has a higher version(>3.13) in the future? .... So it could be generalized to be: primegrid_genefer_3_3_0_<version>_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1"



Correct. The result should be the same no matter which version of the app is used.



If the CPU, ATI/AMD OpenCL, or CUDA versions of Genefer were used you would also, always, get the same result (provided the program is operating correctly, of course.)



- At one point, I stated "may need to run the test multiple times to get an erroneous different result" to them. Is that true? If so, about how many test cycles should they run this command line, for a good test case when they evaluate a driver version?



I'm not certain of this answer. Perhaps Iain should give a definitive answer. I **think** that with a faulty driver, when used on hardware that manifests this problem (not all Nvidia GPU families are affected), you should get the error 100% of the time. If you're testing a fix and you start getting the correct result where you were getting the bad result previously, then you've squashed the bug.



Please wait for Iain to respond before relaying this answer to Nvidia.



- Except the (-q "43322502^131072+1") arguments, are there other options that may also hit this failure? They don’t want to lose coverage in our test plan, any other known failure points?



If you're asking whether there's other numbers that exhibit the same error, I have at least 5 more numbers where a computer running the 364 driver produced the wrong result. What I don't know is whether the bad result was due to the driver bug (in which case the bad result will be reproducible), or some other failure, e.g., excessive overclocking (in which case it will not be reproducible.) If you want additional numbers, we should be able to check some of these to see if they're reproducible or not.



If you were asking if there's completely different types of failures, the answer is no. None of the other options will cause an error.



- Regarding the questions for “data gather/transfer during tests”, they want to know whether the PrimeGrid apps will gather the GPU’s hardware/software information and transfer them to outside servers during the tests.



We collect some data, yes. And we'd be happy to share what we have. :)



For the OpenCL apps, we collect the device type and the driver version. It's captured as text:



Running on platform 'NVIDIA CUDA', device 'GeForce GTX 960', version 'OpenCL 1.2 CUDA' and driver '362.00'.

Supported transform implementations: ocl ocl3 ocl4 ocl5





Command line: projects/www.primegrid.com/primegrid_genefer_3_3_1_3.13_windows_intelx86__OCLcudaGFN15.exe -boinc -q 23632578^32768+1 --device 0
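If it helps NVIDIA's automation, that startup line is easy to parse. A quick sketch (the regex is my assumption based on the quoted line, not anything official) that extracts the platform, device, and driver version:

```python
import re

def parse_platform(line):
    """Extract (platform, device, driver) from a genefer startup line, or None."""
    m = re.search(r"platform '([^']+)', device '([^']+)'.*driver '([^']+)'", line)
    return m.groups() if m else None

line = ("Running on platform 'NVIDIA CUDA', device 'GeForce GTX 960', "
        "version 'OpenCL 1.2 CUDA' and driver '362.00'.")
print(parse_platform(line))  # ('NVIDIA CUDA', 'GeForce GTX 960', '362.00')
```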





Don't relay this back to Nvidia until Iain has a chance to eyeball the answers first.

____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94831 - Posted: 6 May 2016 | 22:05:54 UTC

Last modified: 6 May 2016 | 22:07:43 UTC

Michael:

Thanks. If they run a standalone command prompt executable outside of BOINC, does it communicate any of that data over the network, or does it just store as file? Also, can you list off the command lines, and expected results, for the ~5 ones you found suspect?



Iain:

Looking forward to your responses too, please!

Michael:

Thanks. If they run a standalone command prompt executable outside of BOINC, does it communicate any of that data over the network, or does it just store as file? Also, can you list off the command lines, and expected results, for the ~5 ones you found suspect?





No, when run in standalone mode, there's no network communications. Even in BOINC mode, there's no network communications. Everything is always done through files or the command line and stdin/stdout/stderr.





____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94833 - Posted: 6 May 2016 | 23:13:02 UTC

Okay. Now waiting on Iain, before I relay the answers...



If you're asking whether there's other numbers that exhibit the same error, I have at least 5 more numbers where a computer running the 364 driver produced the wrong result. What I don't know is whether the bad result was due to the driver bug (in which case the bad result will be reproducible), or some other failure, e.g., excessive overclocking (in which case it will not be reproducible.) If you want additional numbers, we should be able to check some of these to see if they're reproducible or not.

As you may know, I have a Gtx 970, and I can do tests if anyone wants.



Just give me the numbers you want me to test, and I can give you some results back, provided they can be tested by Sunday morning (which is when I leave my PC alone for the week).

Also, can you list off the command lines, and expected results, for the ~5 ones you found suspect?



Missed that part.



The command line for each of the following is:



genefer_whatever.exe -q "#######"



The quotes are required (under Windows).



####### is in the first column below. The second column is the correct result. The third column is the bad result.



3646782^131072+1 4bd691363f68a759 95f73e35a6a973cc

3674128^131072+1 5e477f1623e46ba5 cefdc8f58b09622a

3681296^131072+1 8655e9221606630b 1a20d5d48e09b5a2

3717024^131072+1 fba4548ad43ae8f5 9b1d30a302090ce0

3793078^131072+1 3ce3a924ec8052ae 66308eb84d746bbb

3880700^131072+1 e318cf569684adcf dc2d5d2c99f4ba4f
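To automate running that table, something like this sketch could work. The executable name is a placeholder, and run_case assumes the RES= output format quoted earlier in the thread:

```python
import re
import subprocess

# (number, correct residue, residue seen with the suspect driver)
CASES = [
    ("3646782^131072+1", "4bd691363f68a759", "95f73e35a6a973cc"),
    ("3674128^131072+1", "5e477f1623e46ba5", "cefdc8f58b09622a"),
    ("3681296^131072+1", "8655e9221606630b", "1a20d5d48e09b5a2"),
    ("3717024^131072+1", "fba4548ad43ae8f5", "9b1d30a302090ce0"),
    ("3793078^131072+1", "3ce3a924ec8052ae", "66308eb84d746bbb"),
    ("3880700^131072+1", "e318cf569684adcf", "dc2d5d2c99f4ba4f"),
]

def classify(residue, good, bad):
    """Label one run: correct, matching the known-bad residue, or something else."""
    if residue == good:
        return "correct"
    if residue == bad:
        return "known bad (driver suspect)"
    return "different error"

def run_case(exe, number):
    """Run genefer on one number and return its residue (exe path is a placeholder)."""
    out = subprocess.run([exe, "-q", number],
                         capture_output=True, text=True).stdout
    m = re.search(r"RES=([0-9a-f]{16})", out)
    return m.group(1) if m else None

# for number, good, bad in CASES:
#     print(number, classify(run_case("genefer_whatever.exe", number), good, bad))
```

A "different error" result would point at hardware (overclock, heat) rather than the driver, since the driver bug appears to produce the same wrong residue consistently.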



These are 17-Low tasks, while the test case you've been using is a 17-mega. I'm not sure if that makes a difference -- 17-low uses the "low-b" variant of the OCL4 transform while 17-mega uses the "high-b" variant of OCL4.



Can someone who has the driver and GPU that manifests this bug check to see whether they can reproduce the erroneous result?



In the database, there's 0 such results on GFN15 and GFN16, both of which use the same transform as GFN17-mega. GFN17-mega has over 2000 such results. There's many, many more 15's and 16's overall, so those numbers are not expected. It's not clear whether the numerous bad results are due to the driver or simply some bad GPUs that happen to be running 364.xx or 365.xx.



The other ranges, all of which use a different transform than 17-mega, have the following numbers of bad "364.xx" results:



GFN15: 0 (same transform as 17-mega)

GFN16: 0 (same transform as 17-mega)

GFN17-Low: 6 (as reported above)

GFN17-mega: 2257 (our test case)

GFN18: 4

GFN19: 13

GFN20: 0

GFN21: 4

GFN22: 2



Does anyone know if the problem only manifests itself on 17-mega (or only with OCL4)?

____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94837 - Posted: 7 May 2016 | 0:29:42 UTC

Last modified: 7 May 2016 | 0:30:38 UTC

I have 2 overclocked-but-stable GTX 980 Ti's, I'm an expert at clean-driver-installing, I know how to repro the error, and I have a 12-pack of Diet Coke.

I'll be working over the next several hours, getting results for these 6 additional test cases, running 362.00 vs 365.10. #NoLifeNeeded :-p

*ALL* six of those examples in my last post were GTX 5xx or GTX 6xx series GPUs. My understanding is that those are not affected by this bug. Either the bug can affect them too, or these are just random failures.



My best guess is that this bug:



* only affects the OCL4 high-b transform.

* doesn't manifest itself on small tests, so we're not seeing errors on the 15s and 16s.

* only affects 9xx series GPUs.



And if that's correct, it might be affecting a lot of the tests, resulting in those 2000+ errors in the 17-megas. But not all of them. I looked at one of the GPUs that has a bunch of those 2000+ errors. It's got 42 bad results. And 200+ good results. So it's not failing on every test, but it is failing on a lot of them.



Here's one of the failures:



Number/Good/Bad:



43370168^131072+1 d3e2790fc7d1cee3 ec183a2f73b189f2



That's on a GTX 960 running 365.10

____________

My lucky number is 75898524288+1

I have 2 overclocked-but-stable GTX 980 Ti's, I'm an expert at clean-driver-installing, I know how to repro the error, and I have a 12-pack of Diet Coke.

I'll be working over the next several hours, getting results for these 6 additional test cases, running 362.00 vs 365.10. #NoLifeNeeded :-p



I used to really love Diet Coke. Still do, I suppose. However...



I stopped drinking soda over 10 years ago. Once upon a time I decided to collect ALL the Pepsi Star Wars themed soda cans. I worked for a company that had free vending machines. :)



I stuck them all on top of the credenza in my office.



A few years later I changed offices, and went to pack the cans for the move. The soda had eaten through the aluminum cans and got all over the top of the credenza.



Anything that will eat through aluminum is not something I really want to drink.



That aside, thanks for the help. :)

____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94840 - Posted: 7 May 2016 | 0:42:05 UTC - in response to Message 94838.

Last modified: 7 May 2016 | 0:47:18 UTC

The extra info about the tasks, is good, I appreciate it. It'd help me if we try to keep our "handful of test cases" together.

So far, I have on my list: the original example, the 6 additional examples, and the 1 additional one from the post just above this. So, I have 8 on my list.

Keep in mind, I'm doing this because I want the problem solved. I'm not even a PrimeGrid cruncher [no offense]!



Note: In addition to my GTX 980 Ti GPUs, I also have, at my disposal, the following GPU types to test with, if we decide we need to: GTX 970, GTX 660 Ti.

The extra info about the tasks, is good, I appreciate it. It'd help me if we try to keep our "handful of test cases" together.

So far, I have on my list: the original example, the 6 additional examples, and the 1 additional one from the post just above this. So, I have 8 on my list.

Keep in mind, I'm doing this because I want the problem solved. I'm not even a PrimeGrid cruncher [no offense]!



Note: In addition to my GTX 980 Ti GPUs, I also have, at my disposal, the following GPU types to test with, if we decide we need to: GTX 970, GTX 660 Ti.



If my suspicions are correct, my original 6 tests are errors unrelated to the driver.

____________

My lucky number is 75898524288+1

Jacob Klein

Send message

Joined: 9 Sep 15

Posts: 46

ID: 418272

Credit: 152

RAC: 0



Message 94842 - Posted: 7 May 2016 | 1:00:20 UTC - in response to Message 94841.

Last modified: 7 May 2016 | 1:01:29 UTC

I'm actually glad this problem surfaced. I just got a beast of a PC, and this is helping me break it in: Dell Alienware Area-51, i7-5960X factory-overclocked, S2716DG 144Hz GSync display, Dell GTX 980 Ti overclocked to +200 stable, EVGA GTX 980 Ti FTW that I can't overclock any farther. What's interesting is that the Dell GPU can be stably overclocked higher than the FTW -- meaning, I think, that I got an excellent-binned chip in it. I primarily do GPUGrid work. Rambling...

Jacob Klein

Message 94843 - Posted: 7 May 2016 | 1:12:59 UTC

Last modified: 7 May 2016 | 1:14:36 UTC

Michael:



Are you able to add any more 17-mega examples that are suspect, that we can use as reference?



Also, you said there were errors for the following --- do we want any examples from them also?

GFN18: 4

GFN19: 13

GFN21: 4

GFN22: 2

Michael:



Are you able to add any more 17-mega examples that are suspect, that we can use as reference?



Also, you said there were errors for the following --- do we want any examples from them also?

GFN18: 4

GFN19: 13

GFN21: 4

GFN22: 2



Those tests get progressively larger. Each time N goes up by one, the number is twice as large and the test needs twice as many iterations, so the resulting tests are 4 times as long. I was looking for shorter tests, not longer, to help with the testing.



Besides, chances are these are just random errors. The 2000+ sticks out like a sore thumb.



I can give you more examples from the pool of 2000, but I'm not sure that any of them would be any different from what we already have, and I've got no criteria with which to select from amongst the 2000. I was looking for a better example to test with that could be done more quickly. More of the same might not be helpful.



____________

My lucky number is 75898524288+1

Jacob Klein

Message 94846 - Posted: 7 May 2016 | 1:35:15 UTC

Fair enough. Makes sense. Thanks for the explanation - hope you didn't mind me asking :)

I'm ... about 15% through my testing. Getting all my 362.00 baselines in place, before I fresh install 365.10 and cringe!

###### is in the first column below. The second column is the correct result. The third column is the bad result.



3646782^131072+1 4bd691363f68a759 95f73e35a6a973cc

3674128^131072+1 5e477f1623e46ba5 cefdc8f58b09622a

3681296^131072+1 8655e9221606630b 1a20d5d48e09b5a2

3717024^131072+1 fba4548ad43ae8f5 9b1d30a302090ce0

3793078^131072+1 3ce3a924ec8052ae 66308eb84d746bbb

3880700^131072+1 e318cf569684adcf dc2d5d2c99f4ba4f
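As an aside, the correct/bad residue pairs in the table above are handy to keep in machine-readable form, so a batch of re-runs can be checked automatically. A minimal sketch (the table data is copied from above; the `classify` helper and its names are mine):

```python
# Known-correct and known-bad 64-bit residues for the six GFN17 candidates
# listed above (columns: candidate, correct RES, bad RES).
RESIDUES = {
    "3646782^131072+1": ("4bd691363f68a759", "95f73e35a6a973cc"),
    "3674128^131072+1": ("5e477f1623e46ba5", "cefdc8f58b09622a"),
    "3681296^131072+1": ("8655e9221606630b", "1a20d5d48e09b5a2"),
    "3717024^131072+1": ("fba4548ad43ae8f5", "9b1d30a302090ce0"),
    "3793078^131072+1": ("3ce3a924ec8052ae", "66308eb84d746bbb"),
    "3880700^131072+1": ("e318cf569684adcf", "dc2d5d2c99f4ba4f"),
}

def classify(candidate, res):
    """Return 'good', 'known-bad', or 'unknown-bad' for a reported residue."""
    good, bad = RESIDUES[candidate]
    if res == good:
        return "good"
    if res == bad:
        return "known-bad"
    return "unknown-bad"
```

A residue matching neither column would suggest a new, previously unseen miscalculation rather than a reproduction of the logged one.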

I'm testing these right now. Doing OCL 1,3,4 and 5 with a GTX 970 (Driver 365.10).



If my suspicions are correct, my original 6 tests are errors unrelated to the driver.

It's a bit early to say for sure, but 2 of the OCL4 tests finished already. And guess what: I got the correct result.



3646782^131072+1 is composite. (RES=4bd691363f68a759) (860083 digits) (err = 0.0000) (time = 0:06:50) 23:35:10

3674128^131072+1 is composite. (RES=5e477f1623e46ba5) (860508 digits) (err = 0.0000) (time = 0:06:57) 23:42:07





Btw, this is a bit unrelated, but can we force OCL4 to use the high / low transform on the combined app?



It will always use the low transform unless it can't, so in a sense it's always "forced" to the low transform. There's no way to force the high transform, at least not at present.

____________

My lucky number is 75898524288+1

Btw, this is a bit unrelated, but can we force OCL4 to use the high / low transform on the combined app?



It will always use the low transform unless it can't, so in a sense it's always "forced" to the low transform. There's no way to force the high transform, at least not at present.

Which is a real shame, as it seems OCL4 Low isn't affected.



Just for the sake of testing, I have the older versions (during early OCL4 development) stored. If I remember right, 978 was the High transform equivalent, no? If so, I guess I'll try that one as well.
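For readers following along, the selection behaviour Michael describes ("always the low transform unless it can't") is just a one-way fallback with no override. A sketch of that rule, under assumed names and an illustrative (not real) b limit:

```python
# Illustrative fallback: use the OCL4 "low" transform whenever b fits its
# limit, otherwise fall back to the "high" transform. The limit below is a
# placeholder for the sketch, not the real OCL4-low bound.
OCL4_LOW_B_LIMIT = 2_000_000  # hypothetical threshold

def select_transform(b):
    """Low transform unless b exceeds its limit; there is no way to force high."""
    return "ocl4-low" if b <= OCL4_LOW_B_LIMIT else "ocl4-high"
```

An override switch would amount to skipping the first branch; as noted above, none exists at present.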

Jacob Klein

Message 94851 - Posted: 7 May 2016 | 3:56:32 UTC

Last modified: 7 May 2016 | 3:57:31 UTC

I'm almost 50% done with testing (ie: completely done with 362.00). Then I can work on the back 50% (365.10). I think I'm finding that one of my new GPUs is overclocked too high, which is why this is beneficial to me. :)

Jacob Klein

Message 94852 - Posted: 7 May 2016 | 6:13:01 UTC

Last modified: 7 May 2016 | 6:32:47 UTC

Maybe 2/3 done. I estimate another 3 hours of extensive testing, for me... :)



So far, preliminary results, on both of my GTX 980 Ti GPUs, are:

- GFN17-Low: 365.10 generates same correct residues as 362.00

- "43322502^131072+1": 365.10 produces the same set of 2 incorrect results that Raf had: RES=696d5c384b46f4c9, RES=63f88fc407418f27 ... and never produces the correct result.

Okay. Now waiting on Iain, before I relay the answers...



As far as I know, everything Mike said is correct: if a driver/card combination exhibits the bug, it should generate the incorrect result every time.



If anyone is able to supply Yves with the binary/ptx code on a buggy card that would also be very helpful, we might be able to figure out a workaround or at least provide Nvidia with a reduced test case.



- Iain

____________

Twitter: IainBethune

Proud member of team "Aggie The Pew". Go Aggie!

3073428256125*2^1290000-1 is Prime!

Jacob Klein

Message 94854 - Posted: 7 May 2016 | 6:54:51 UTC

If you are referring to the files that get generated in:

C:\Users\[username]\AppData\Roaming\NVIDIA\ComputeCache

.... then yeah, I'm on it. I'm going to switch gears, get those files for both 362.00 and 365.10, then share to my OneDrive to be linked here, then resume testing.



Sorry I missed that earlier.

Jacob Klein

Message 94857 - Posted: 7 May 2016 | 11:51:54 UTC - in response to Message 94854.

Last modified: 7 May 2016 | 11:55:59 UTC

Yves / Iain:



I was able to get the ComputeCache files, for several runs of 362.00 and 365.10, on my GTX 980 Ti GPUs.



It seems that the binary files are in fact different between the driver versions, so you will likely find them useful!! :) Also, on some of the runs, I got more than just the 3 files, not sure what the other files are.



Anyway, please go here:

https://onedrive.live.com/redir?resid=11F4002E1134F403!446594&authkey=!APERQZ1O7_RWjXI&ithint=file%2czip

... to get a single .zip file

... "Genefer ComputeCache, 362.00 & 365.10, GTX 980 Ti.zip"

... that contains:

ComputeCache, 362.00, Dell GTX 980 Ti, 3646782^131072+1

ComputeCache, 362.00, Dell GTX 980 Ti, 43322502^131072+1

ComputeCache, 362.00, EVGA GTX 980 Ti FTW, 3646782^131072+1

ComputeCache, 362.00, EVGA GTX 980 Ti FTW, 43322502^131072+1

ComputeCache, 365.10, Dell GTX 980 Ti, 3646782^131072+1

ComputeCache, 365.10, Dell GTX 980 Ti, 43322502^131072+1

ComputeCache, 365.10, EVGA GTX 980 Ti FTW, 3646782^131072+1

ComputeCache, 365.10, EVGA GTX 980 Ti FTW, 43322502^131072+1



Let us know what you find!!



I will now resume the remainder of my 365.10 residue testing.

Jacob Klein

Message 94861 - Posted: 7 May 2016 | 15:18:25 UTC

Last modified: 7 May 2016 | 15:22:04 UTC

Michael:



Woot. I have completed my test run. Results are below. I believe they match up with what you expected them to be, with the exception that I actually got a "different bad" result for my testing of "43370168^131072+1".



After I post these results, I'll put together an email response to send to NVIDIA, and reply back to this thread when that email has been sent.



Regards,

Jacob Klein



Device 0: EVGA GTX 980 Ti FTW

Device 1: Dell GTX 980 Ti

* = MISCALCULATION



43322502^131072+1

362.00, Dev 0, Pass 1: 71e42cc1499c83c1

362.00, Dev 0, Pass 2: 71e42cc1499c83c1

362.00, Dev 0, Pass 3: 71e42cc1499c83c1

362.00, Dev 1, Pass 1: 71e42cc1499c83c1

362.00, Dev 1, Pass 2: 71e42cc1499c83c1

362.00, Dev 1, Pass 3: 71e42cc1499c83c1

365.10, Dev 0, Pass 1: *696d5c384b46f4c9

365.10, Dev 0, Pass 2: *696d5c384b46f4c9

365.10, Dev 0, Pass 3: *696d5c384b46f4c9

365.10, Dev 1, Pass 1: *696d5c384b46f4c9

365.10, Dev 1, Pass 2: *63f88fc407418f27

365.10, Dev 1, Pass 3: *63f88fc407418f27



43370168^131072+1

362.00, Dev 0, Pass 1: d3e2790fc7d1cee3

362.00, Dev 0, Pass 2: d3e2790fc7d1cee3

362.00, Dev 1, Pass 1: d3e2790fc7d1cee3

362.00, Dev 1, Pass 2: d3e2790fc7d1cee3

365.10, Dev 0, Pass 1: *0644bd7a3f6a551b

365.10, Dev 0, Pass 2: *0644bd7a3f6a551b

365.10, Dev 0, Pass 3: *0644bd7a3f6a551b

365.10, Dev 0, Pass 4: *0644bd7a3f6a551b

365.10, Dev 1, Pass 1: *0644bd7a3f6a551b

365.10, Dev 1, Pass 2: *0644bd7a3f6a551b

365.10, Dev 1, Pass 3: *0644bd7a3f6a551b

365.10, Dev 1, Pass 4: *0644bd7a3f6a551b



3646782^131072+1

362.00, Dev 0, Pass 1: 4bd691363f68a759

362.00, Dev 0, Pass 2: 4bd691363f68a759

362.00, Dev 1, Pass 1: 4bd691363f68a759

362.00, Dev 1, Pass 2: 4bd691363f68a759

365.10, Dev 0, Pass 1: 4bd691363f68a759

365.10, Dev 0, Pass 2: 4bd691363f68a759

365.10, Dev 1, Pass 1: 4bd691363f68a759

365.10, Dev 1, Pass 2: 4bd691363f68a759



3674128^131072+1

362.00, Dev 0, Pass 1: 5e477f1623e46ba5

362.00, Dev 0, Pass 2: 5e477f1623e46ba5

362.00, Dev 1, Pass 1: 5e477f1623e46ba5

362.00, Dev 1, Pass 2: 5e477f1623e46ba5

365.10, Dev 0, Pass 1: 5e477f1623e46ba5

365.10, Dev 0, Pass 2: 5e477f1623e46ba5

365.10, Dev 1, Pass 1: 5e477f1623e46ba5

365.10, Dev 1, Pass 2: 5e477f1623e46ba5



3681296^131072+1

362.00, Dev 0, Pass 1: 8655e9221606630b

362.00, Dev 0, Pass 2: 8655e9221606630b

362.00, Dev 1, Pass 1: 8655e9221606630b

362.00, Dev 1, Pass 2: 8655e9221606630b

365.10, Dev 0, Pass 1: 8655e9221606630b

365.10, Dev 0, Pass 2: 8655e9221606630b

365.10, Dev 1, Pass 1: 8655e9221606630b

365.10, Dev 1, Pass 2: 8655e9221606630b



3717024^131072+1

362.00, Dev 0, Pass 1: fba4548ad43ae8f5

362.00, Dev 0, Pass 2: fba4548ad43ae8f5

362.00, Dev 1, Pass 1: fba4548ad43ae8f5

362.00, Dev 1, Pass 2: fba4548ad43ae8f5

365.10, Dev 0, Pass 1: fba4548ad43ae8f5

365.10, Dev 0, Pass 2: fba4548ad43ae8f5

365.10, Dev 1, Pass 1: fba4548ad43ae8f5

365.10, Dev 1, Pass 2: fba4548ad43ae8f5



3793078^131072+1

362.00, Dev 0, Pass 1: 3ce3a924ec8052ae

362.00, Dev 0, Pass 2: 3ce3a924ec8052ae

362.00, Dev 1, Pass 1: 3ce3a924ec8052ae

362.00, Dev 1, Pass 2: 3ce3a924ec8052ae

365.10, Dev 0, Pass 1: 3ce3a924ec8052ae

365.10, Dev 0, Pass 2: 3ce3a924ec8052ae

365.10, Dev 1, Pass 1: 3ce3a924ec8052ae

365.10, Dev 1, Pass 2: 3ce3a924ec8052ae



3880700^131072+1

362.00, Dev 0, Pass 1: e318cf569684adcf

362.00, Dev 0, Pass 2: e318cf569684adcf

362.00, Dev 1, Pass 1: e318cf569684adcf

362.00, Dev 1, Pass 2: e318cf569684adcf

365.10, Dev 0, Pass 1: e318cf569684adcf

365.10, Dev 0, Pass 2: e318cf569684adcf

365.10, Dev 1, Pass 1: e318cf569684adcf

365.10, Dev 1, Pass 2: e318cf569684adcf

Jacob Klein

Message 94862 - Posted: 7 May 2016 | 15:41:44 UTC

Last modified: 7 May 2016 | 15:45:46 UTC

I have sent the following email reply, to my NVIDIA contacts. I will now step away and enjoy my weekend, but will be monitoring this thread, and can do additional testing/research if necessary.



I'm looking forward to anything awesome that you guys can find, from those ComputeCache discrepancies!



Thanks,

Jacob



Hello again Kevin,



I have answers to your most-recent PrimeGrid questions, at the following links:

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94830

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94832

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94853



To summarize:

- The expected RES should be the correct value, even if newer versions of the app are released.

- It looks like, when the error occurs on a given driver, it will always occur, even though the error RES result may differ from run to run. So, running the test plan once on an input, to verify a correct RES result, should be sufficient. Sorry I was mistaken on that earlier.

- For the test plan, in addition to "43322502^131072+1" failing on 365.10 drivers, "43370168^131072+1" is another good example that consistently fails. I propose that your test case be: Run once for each of those numbers, and verify that the results match the correct values of:

43322502^131072+1 -----> (RES=71e42cc1499c83c1)

43370168^131072+1 -----> (RES=d3e2790fc7d1cee3)

- The standalone app writes info to local files, but does not communicate with the network at all. So that should be GREAT for your test plan. Conversely, running a task in BOINC will communicate those files to the project upon task completion, which you'd probably want to avoid.



PS: Your devs may be interested in the ComputeCache differences/discrepancies that I found between 362.00 and 365.10... found here:

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94857

... I believe the PrimeGrid devs will be looking at that, to see if they can devise an even simpler/faster example for your test plan. However, I'd still recommend that the test plan include running once for "43322502^131072+1" and once for "43370168^131072+1".



Hope this helps!



Thanks,

Jacob Klein
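The "run once and verify the RES" step in that test plan is easy to automate around the standalone app: run it, scrape the `RES=` token from its output, and compare against the expected value. A rough sketch (the invocation mirrors the command lines quoted later in this thread; `parse_res` and `check_candidate` are my own names):

```python
import re
import subprocess

# Expected residues for the two proposed test candidates (from above).
EXPECTED = {
    "43322502^131072+1": "71e42cc1499c83c1",
    "43370168^131072+1": "d3e2790fc7d1cee3",
}

def parse_res(output):
    """Extract the 16-hex-digit residue from genefer's '(RES=...)' line,
    or return None if no residue was printed (e.g. a maxErr failure)."""
    m = re.search(r"RES=([0-9a-f]{16})", output)
    return m.group(1) if m else None

def check_candidate(candidate):
    """Run the standalone app once and compare the residue to EXPECTED.

    The executable name/flags are assumptions modeled on the command
    lines quoted elsewhere in this thread."""
    out = subprocess.run(
        ["geneferocl_windows.exe", "-nvidia", "-q", candidate],
        capture_output=True, text=True,
    ).stdout
    return parse_res(out) == EXPECTED[candidate]
```

A MaxErr failure produces no RES= line at all, so `parse_res` returning None should also be treated as a failed run.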

Woot. I have completed my test run. Results are below. I believe they match up with what you expected them to be, with the exception that I actually got a "different bad" result for my testing of "43370168^131072+1".



I'm not terribly surprised. Sometimes GPUs simply malfunction, so the bad result in the DB could be unrelated to the driver. However, you also got TWO bad results for 43322502^131072+1, and that's surprising:



Device 0: EVGA GTX 980 Ti FTW

Device 1: Dell GTX 980 Ti

* = MISCALCULATION



43322502^131072+1

362.00, Dev 0, Pass 1: 71e42cc1499c83c1

362.00, Dev 0, Pass 2: 71e42cc1499c83c1

362.00, Dev 0, Pass 3: 71e42cc1499c83c1

362.00, Dev 1, Pass 1: 71e42cc1499c83c1

362.00, Dev 1, Pass 2: 71e42cc1499c83c1

362.00, Dev 1, Pass 3: 71e42cc1499c83c1

365.10, Dev 0, Pass 1: *696d5c384b46f4c9

365.10, Dev 0, Pass 2: *696d5c384b46f4c9

365.10, Dev 0, Pass 3: *696d5c384b46f4c9

365.10, Dev 1, Pass 1: *696d5c384b46f4c9

365.10, Dev 1, Pass 2: *63f88fc407418f27

365.10, Dev 1, Pass 3: *63f88fc407418f27



The Dell GTX 980 Ti got the expected bad result once -- and another bad result TWICE. That's strange. I would normally expect either completely different bad results every time, or a single bad result. Not two different reproducible bad results. That's harder to explain.



As for the rest of it, that seems to confirm my suspicion that the GFN17-low tests are not affected by the driver bug.



____________

My lucky number is 75898524288+1

Yves / Iain:

I was able to get the ComputeCache files, for several runs of 362.00 and 365.10, on my GTX 980 Ti GPUs.

[...]

Let us know what you find!!



Thanks. Now we know that the bug is not in the OpenCL compiler but in the PTX-to-machine-code translator or in the driver itself.



OpenCL code is translated into "parallel thread execution" (PTX) virtual-machine code, and the PTX is then compiled into machine code. The PTX-to-machine-code translator is shared with the CUDA compiler, so the bug may also occur with CUDA. But as CUDA 8 is not yet available, we cannot reproduce it with CUDA.



On Maxwell (sm52), the 362 and 365 PTX codes are identical, but the 362 and 365 machine codes are completely different. My suspicion is that there is a bug in the new 364/365 Maxwell machine-code compiler/optimiser.



On Kepler (sm30), 362 and 365 PTX and machine codes are identical.



The problem in making progress is that the machine code is 30,000 lines of assembly, and more than 20,000 of them differ between the 362 and 365 versions!
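A quick way to put a number on that kind of divergence is to diff the two disassembly listings with the standard library; a sketch (the inputs are assumed to be plain-text dumps of the two machine-code listings):

```python
import difflib

def count_changed_lines(old_asm, new_asm):
    """Count lines removed from the old listing plus lines added in the new
    one, using a unified diff of the two plain-text dumps."""
    diff = difflib.unified_diff(
        old_asm.splitlines(), new_asm.splitlines(), lineterm=""
    )
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
```

On dumps like the ones described above, this would report on the order of the 20,000+ differing lines Yves mentions.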



It seems that the binary files are in fact different between the driver versions, so you will likely find them useful!! :) Also, on some of the runs, I got more than just the 3 files, not sure what the other files are.

They are ocl3 and ocl5 PTX and machine codes. That's of interest too.



Jacob Klein

Message 94868 - Posted: 7 May 2016 | 18:09:58 UTC - in response to Message 94866.

Last modified: 7 May 2016 | 18:14:06 UTC

... However, you also got TWO bad results for 43322502^131072+1, and that's surprising...



The Dell GTX 980 Ti got the expected bad result once -- and another bad result TWICE. That's strange. I would normally expect either completely different bad results every time, or a single bad result. Not two different reproducible bad results. That's harder to explain.



Michael:

Raf reported the "dual erroneous results" issue earlier,

http://www.primegrid.com/forum_thread.php?id=6775&nowrap=true#94408

... so basically I was able to confirm it, with the same set of 2 erroneous results.



I tried to see if I could get the other high-beam example to give 2 erroneous results -- ran 8 passes -- but only got the 1 erroneous result. Fun. This test run took me like 10-12 hours, of staring at Command Prompt :)



Regards,

Jacob

I don't get what's going on here. I did a lot of testing, but found no pattern at all. Here's what I did:



First, I started by testing the 6 candidates Michael asked. I ran them all with OCL3, 4 and 5 on 365.10. And I got the correct result for every single one of them. As expected those errors were caused by something other than the faulty drivers.



Next, I thought that maybe the bug affects only OCL4. So I got the limits for OCL4 low and tested candidates just above that. I did it so that I could also test those candidates with OCL3 and 5. I also ran everything with 362.00 and 365.10. These were the candidates:

259000000^32+1 - (RES=bcd99b2ae1c6034c)

184000000^64+1 - (RES=8bdc7fdecb45722f)

130000000^128+1 - (RES=97b53d856b6b7900)

92000000^256+1 - (RES=a8830b0cebecd62f)

65000000^512+1 - (RES=d9ba723e2bdb578d)

46000000^1024+1 - (RES=22d39f3bc239cb34)

33000000^2048+1 - (RES=1a91fa6e0f66d9e5)

23000000^4096+1 - (RES=e7cee1916854ae5d)

17000000^8192+1 - (RES=0bf945e98dc4777d)

12000000^16384+1 - (RES=7f6bd0922e6abbfe)



And guess what: for each candidate test case, I got the exact same RES! Buggy driver, safe driver, OCL4 high, OCL3, OCL5 - it didn't matter, I got the same residues for every candidate!



Then I thought to myself that something was wrong. I ran the usual 43322502^131072+1 test 5 times in a row, with 365.10. Still getting the faulty residue, but this time with a catch: I got ONLY RES=696d5c384b46f4c9 as the result, I didn't get the other wrong residue that showed up prevously.



"Alright, back to the drawing board". This time, I went with a different strategy. Instead of going with something just above OCL4-low's limit, I aimed for numbers just below OCL3's and OCL5's limits. I tested 8388606^131072+1 (which is 2 shy of OCL3's limit) and 8076496^131072+1 (which is 2 shy of OCL5's). Again, both with 362.00 and 365.10, and with the transform each limit came from.



Once more, THE EXACT SAME RESULTS. Buggy drivers, other transforms, doesn't matter, I get:

8388606^131072+1 is composite. (RES=5a3463bfc14ecef9)

8076496^131072+1 is composite. (RES=93e48ab46dfe5b37)



This leads me to believe that the bug only shows up with "big numbers" (whatever that means). Much like OCL breaks when using n=24 or higher, or how OCL used to brick if B<n^3 in previous versions, maybe OCL4-High breaks after a certain B.



I'll be doing more testing on that.

UPDATE to the post above, while I do more testing.



It seems I was at least partially right about OCL4 + "high B" + buggy driver = fail. I used the known-to-be-bad b=43322502, but this time with N=32768 and N=65536. And BINGO: different residues!



With 362.00, I get RES=7b5269df77fda9bf and RES=db4779368e7538c9 for n=15 and n=16, respectively.

With 365.10, I did 3 passes and got RES=88a68b762462bf8f and RES=c6acbe7d3509a9b0 for n=15 and n=16, respectively.



In other words, it isn't that n=15 and 16 aren't affected; it's just that the current candidates we are testing aren't quite to the point of failure yet. I obviously didn't have the time to test this, but I bet that every N will be affected once b gets high enough.





So I decided to try and diagnose that magic point of failure.



-I started by getting the highest completed b from n=15 and 16 to do my tests. At the time I looked at the stats page, it said b=24918520 was the highest b for n=15, so I tried that first. It gave 2 different residues with n=17, so the magic b < 24918520.

-Next, b=12340336. And here's a BIG surprise: once I put in "12340336^131072+1" (with OCL4 obviously), I got MaxErr 1.000 > 0.45



That's right, MaxErr. It wasn't 2 different residues, oh no - it was straight up Max Err. Wow.



At the time of writing, b=10200000 gives me MaxErr while b=10100000 passed. I'll try to close that gap as much as I can.
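Closing that gap is essentially a binary search over b, assuming (as the observations so far suggest, though not perfectly) that candidates below the threshold pass and those above fail. A sketch with a stand-in `passes` predicate:

```python
def find_failure_threshold(lo, hi, passes):
    """Binary-search the largest b in [lo, hi] for which passes(b) is True,
    assuming passes(b) is monotone: True at and below the threshold,
    False above it."""
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the search always makes progress
        if passes(mid):
            lo = mid   # mid still passes: the threshold is at or above mid
        else:
            hi = mid - 1  # mid fails: the threshold is below mid
    return lo
```

With a real `passes(b)` that runs genefer once and checks for MaxErr, a 100,000-wide gap narrows to a single b in about 17 runs.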

Rafael, what you're saying makes sense. There's likely a problem with rounding, or significant digits, or something like that, which only happens once b is past a certain threshold.

____________

My lucky number is 75898524288+1

And BINGO: different residues!



With 362.00, I get RES=7b5269df77fda9bf and RES=db4779368e7538c9 for n=15 and n=16, respectively)

With 365.10, I did 3 passes and got RES=88a68b762462bf8f and RES=c6acbe7d3509a9b0 for n=15 and 16 respectively.

I checked these numbers on my GeForce GT 740M (Kepler) with driver 365.10:

RES=7b5269df77fda9bf and RES=db4779368e7538c9.



A candidate with N=32 would be useful for debugging.

With b=43322502, the residues (ocl4/365.10/Kepler and x87 transform) are:

32: RES=379f0838325714eb

64: RES=2229de4c20fa0706

128: RES=36df517cdbf6996b

256: RES=5f1094b349ac012d

512: RES=45c8e1581af84576

1024: RES=9e0d0b96b2e6a32a

2048: RES=7985cfe2366bd551

4096: RES=ad89f4525c85170d

8192: RES=b6013fa4fd5f6d78

16384: RES=76f5aa9fff987d29



Because the b limit is 45766469/400000000 at N=1024, the residues may be correct for N <= 1024 on a Maxwell GPU.

Jacob Klein

Message 94881 - Posted: 8 May 2016 | 15:48:05 UTC

Last modified: 8 May 2016 | 15:51:29 UTC

It took me a bit to figure out what you meant by "n", but I got it now.

Here are my results, on my GTX 980 Ti GPUs. I think you'll find them interesting.

Regards,

Jacob



Expected:

n=5, 43322502^32+1: RES=379f0838325714eb

n=6, 43322502^64+1: RES=2229de4c20fa0706

n=7, 43322502^128+1: RES=36df517cdbf6996b

n=8, 43322502^256+1: RES=5f1094b349ac012d

n=9, 43322502^512+1: RES=45c8e1581af84576

n=10, 43322502^1024+1: RES=9e0d0b96b2e6a32a

n=11, 43322502^2048+1: RES=7985cfe2366bd551

n=12, 43322502^4096+1: RES=ad89f4525c85170d

n=13, 43322502^8192+1: RES=b6013fa4fd5f6d78

n=14, 43322502^16384+1: RES=76f5aa9fff987d29

n=15, 43322502^32768+1: RES=7b5269df77fda9bf

n=16, 43322502^65536+1: RES=db4779368e7538c9



362.00, GTX 980 Ti:

n=5, 43322502^32+1: RES=379f0838325714eb

n=6, 43322502^64+1: RES=2229de4c20fa0706

n=7, 43322502^128+1: RES=36df517cdbf6996b

n=8, 43322502^256+1: RES=5f1094b349ac012d

n=9, 43322502^512+1: RES=45c8e1581af84576

n=10, 43322502^1024+1: RES=9e0d0b96b2e6a32a

n=11, 43322502^2048+1: RES=7985cfe2366bd551

n=12, 43322502^4096+1: RES=ad89f4525c85170d

n=13, 43322502^8192+1: RES=b6013fa4fd5f6d78

n=14, 43322502^16384+1: RES=76f5aa9fff987d29

n=15, 43322502^32768+1: RES=7b5269df77fda9bf

n=16, 43322502^65536+1: RES=db4779368e7538c9



365.10, GTX 980 Ti:

n=5, 43322502^32+1: RES=379f0838325714eb

n=6, 43322502^64+1: RES=2229de4c20fa0706

n=7, 43322502^128+1: RES=36df517cdbf6996b

n=8, 43322502^256+1: RES=5f1094b349ac012d

n=9, 43322502^512+1: RES=45c8e1581af84576

n=10, 43322502^1024+1: RES=9e0d0b96b2e6a32a

n=11, 43322502^2048+1: RES=7985cfe2366bd551

n=12, 43322502^4096+1: RES=ad89f4525c85170d



n=13, 43322502^8192+1: [ran 4 passes per GPU, all the same result]

maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500

Errors occurred for all available transform implementations



n=14, 43322502^16384+1: [ran 4 passes per GPU, all the same result]

(RES=270979b48088bb4b)



n=15, 43322502^32768+1: [ran 4 passes per GPU, all the same result]

(RES=88a68b762462bf8f)



n=16, 43322502^65536+1: [ran 4 passes per GPU, but got 2 different residues]

Some (RES=c6acbe7d3509a9b0)

Some (RES=620939783f226704)

Okay... it seems like OCL4-high on buggy drivers has a b range where some tests will start to MaxErr but some will work as intended, much in the way that the b limit works for OCL1 and CUDA. If one goes too far past it, tasks stop MaxErring but return incorrect residues instead.



As an example, with N=131072, b=10162000 and b=10162012 are able to complete on my GPU, returning RES=6641e4156c195a53 and RES=f5e434d8d2a28685 respectively (which I've checked to be correct using 362.00). However, all the candidates in between those 2, and even a handful I tested just above/below them, ended up in MaxErr. And, as we all know, b=43322502 doesn't MaxErr, but returns an incorrect RES.



I've set up a list of candidates to try and determine the highest B for n=17 such that every task below it succeeds, but tasks above it may fail. Currently, I'm testing b=10161908; the last candidate that passed the test was 1016960, with RES=60d0dbb81daac689 (I didn't verify it yet, though).

It took me a bit to figure out what you meant by "n", but I got it now.

Thanks, Jacob. I have been using N = 2^n for many years and I forget that it is my usage.



365.10, GTX 980 Ti:

n=12, 43322502^4096+1: RES=ad89f4525c85170d

Is that correct for all available transform implementations (ocl3, ocl4, ocl5) ?



n=13, 43322502^8192+1: [ran 4 passes per GPU, all the same result]

maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500

Errors occurred for all available transform implementations

Only ocl4 is available, no?



This test can be reproduced quickly, it will help your NVIDIA contact.





Jacob Klein

Message 94884 - Posted: 8 May 2016 | 17:48:33 UTC - in response to Message 94883.

Last modified: 8 May 2016 | 18:00:13 UTC

365.10, GTX 980 Ti:

n=12, 43322502^4096+1: RES=ad89f4525c85170d



Is that correct for all available transform implementations (ocl3, ocl4, ocl5) ?



n=13, 43322502^8192+1: [ran 4 passes per GPU, all the same result]

maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500

Errors occurred for all available transform implementations



Only ocl4 is available, no?

This test can be reproduced quickly, it will help your NVIDIA contact.





Yves:



I believe that every run I've ever done, has said:

Supported transform implementations: ocl ocl3 ocl4 ocl5



So...

- Are you basically asking me to run the n=12, n=13, and n=14 examples using "-x ocl", "-x ocl3", "-x ocl4", and "-x ocl5" if possible?

- Should I be clearing out the "genefer" files in the folder before running a new command line?

- Should I be clearing out the "ComputeCache" folder before running a new command line?

Yves:



Can you please:

- give me exact command lines to try (ocl3, ocl4, ocl5), or at least one example of each

- indicate if I should be clearing out the "genefer" files in the folder before running a new command line

- indicate if I should be clearing out the "ComputeCache" folder before running a new command line

geneferocl_windows.exe -nvidia -q "43322502^4096+1" -x ocl3

geneferocl_windows.exe -nvidia -q "43322502^4096+1" -x ocl4

geneferocl_windows.exe -nvidia -q "43322502^4096+1" -x ocl5



I remove the genefer.dat and genefer.ckpt files (if they exist).



The "ComputeCache" folder doesn't need to be cleared.



Jacob Klein

Message 94886 - Posted: 8 May 2016 | 18:02:16 UTC

Thanks. I'll get those results within a couple hours from now hopefully.

Jacob Klein

Message 94889 - Posted: 8 May 2016 | 19:17:15 UTC - in response to Message 94886.

Last modified: 8 May 2016 | 19:56:54 UTC

Yves:



My testing results are below.



So... This means that the following 2 tests would be useful for NVIDIA to quickly get a correct RES result, in fixing this bug, right?

n=13, 43322502^8192+1: RES=b6013fa4fd5f6d78 [18 seconds]

n=14, 43322502^16384+1: RES=76f5aa9fff987d29 [33 seconds]



... and I think I'll still recommend that, if this is going to be some sort of automated test, then it wouldn't hurt if their test plan also included:

n=17, 43322502^131072+1: RES=71e42cc1499c83c1 [9.5 minutes]



Edit: I've relayed the 2 quicker examples to NVIDIA, and suggested that they use all 3 in any automated test plan.



Thanks,

Jacob



=====================================================



362.00:

362.00 - 43322502^4096+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^4096+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^4096+1 -d 0 -x ocl3: (RES=ad89f4525c85170d)

362.00 - 43322502^4096+1 -d 1 -x ocl3: (RES=ad89f4525c85170d)

362.00 - 43322502^4096+1 -d 0 -x ocl4: (RES=ad89f4525c85170d)

362.00 - 43322502^4096+1 -d 1 -x ocl4: (RES=ad89f4525c85170d)

362.00 - 43322502^4096+1 -d 0 -x ocl5: (RES=ad89f4525c85170d)

362.00 - 43322502^4096+1 -d 1 -x ocl5: (RES=ad89f4525c85170d)

362.00 - 43322502^8192+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^8192+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^8192+1 -d 0 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^8192+1 -d 1 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^8192+1 -d 0 -x ocl4: (RES=b6013fa4fd5f6d78)

362.00 - 43322502^8192+1 -d 1 -x ocl4: (RES=b6013fa4fd5f6d78)

362.00 - 43322502^8192+1 -d 0 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^8192+1 -d 1 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 0 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 1 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 0 -x ocl4: (RES=76f5aa9fff987d29)

362.00 - 43322502^16384+1 -d 1 -x ocl4: (RES=76f5aa9fff987d29)

362.00 - 43322502^16384+1 -d 0 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

362.00 - 43322502^16384+1 -d 1 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations



365.10:

365.10 - 43322502^4096+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^4096+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^4096+1 -d 0 -x ocl3: (RES=ad89f4525c85170d)

365.10 - 43322502^4096+1 -d 1 -x ocl3: (RES=ad89f4525c85170d)

365.10 - 43322502^4096+1 -d 0 -x ocl4: (RES=ad89f4525c85170d)

365.10 - 43322502^4096+1 -d 1 -x ocl4: (RES=ad89f4525c85170d)

365.10 - 43322502^4096+1 -d 0 -x ocl5: (RES=ad89f4525c85170d)

365.10 - 43322502^4096+1 -d 1 -x ocl5: (RES=ad89f4525c85170d)

365.10 - 43322502^8192+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 0 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 1 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 0 -x ocl4: maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500 Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 1 -x ocl4: maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500 Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 0 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^8192+1 -d 1 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 0 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 1 -x ocl: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 0 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 1 -x ocl3: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 0 -x ocl4: (RES=270979b48088bb4b)

365.10 - 43322502^16384+1 -d 1 -x ocl4: (RES=270979b48088bb4b)

365.10 - 43322502^16384+1 -d 0 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

365.10 - 43322502^16384+1 -d 1 -x ocl5: Cannot test. b is too large. Errors occurred for all available transform implementations

Not only does that look like a much shorter test case for them to debug, but we've also possibly established that b (or some combination of b and N) needs to be above a certain threshold, which in turn may point to an overflow or rounding problem. If that's correct, it may help narrow down the possible causes of the problem.
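One quick way to spot where the transforms disagree in logs like the ones above is to group the outcomes per driver and candidate, so a mismatched residue or a per-transform failure stands out. A minimal sketch (the parsing pattern is inferred from the log format pasted above):

```python
import re
from collections import defaultdict

# Matches lines of the form:
#   365.10 - 43322502^8192+1 -d 0 -x ocl4: (RES=...) | maxErr exceeded ... | Cannot test ...
LINE = re.compile(
    r"^(?P<driver>[\d.]+) - (?P<candidate>\S+) -d (?P<dev>\d) -x (?P<x>\w+): (?P<rest>.*)$"
)

def summarize(log_text):
    # (driver, candidate) -> {(transform, device): residue or failure message}
    results = defaultdict(dict)
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        res = re.search(r"RES=([0-9a-f]+)", m["rest"])
        outcome = res.group(1) if res else "FAIL: " + m["rest"]
        results[(m["driver"], m["candidate"])][(m["x"], m["dev"])] = outcome
    return results

sample = """\
362.00 - 43322502^8192+1 -d 0 -x ocl4: (RES=b6013fa4fd5f6d78)
365.10 - 43322502^8192+1 -d 0 -x ocl4: maxErr exceeded for 43322502^8192+1, 1.0000 > 0.4500 Errors occurred for all available transform implementations
"""
for key, runs in summarize(sample).items():
    print(key, runs)
```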

____________

My lucky number is 75898524288+1

Not only does that look like a much shorter test case for them to debug, but we've also possibly established that b (or some combination of b and N) needs to be above a certain threshold, which in turn may point to an overflow or rounding problem. If that's correct, it may help narrow down the possible causes of the problem.

I'm trying to find that b for N=131072 at the moment.
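Locating that threshold by hand is tedious; if the failures are monotone in b (everything below some boundary passes, everything above fails), a bisection finds it in a few dozen runs. A sketch with a mock predicate; in real use `passes` would shell out to the genefer binary with the flags shown in the logs above and check for a residue versus a maxErr failure (note the bisection is only a first pass, since borderline b values may not behave perfectly monotonically):

```python
def smallest_failing_b(lo, hi, passes):
    """Bisect for the smallest failing b, assuming lo passes, hi fails,
    and the pass/fail boundary between them is monotone."""
    assert passes(lo) and not passes(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            lo = mid
        else:
            hi = mid
    return hi

# Mock predicate standing in for an actual genefer run; the threshold
# value here is invented purely to demonstrate the search.
THRESHOLD = 10162012  # pretend every b above this fails
print(smallest_failing_b(10_000_000, 10_500_000, lambda b: b <= THRESHOLD))
# prints 10162013
```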



It seems to be around the b=10.16M mark. b=10161960, 10162000, and 10162012 all passed ("passed" as in "were able to complete the test and returned correct residues"), b