Posts by rebirther
81) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9772)
Posted 18 Mar 2024 by Profile rebirther
you need to add support for every cc level explicitly.

the whole section would be something like this:

--generate-code arch=compute_50,code=sm_50 --generate-code arch=compute_52,code=sm_52 --generate-code arch=compute_60,code=sm_60 --generate-code arch=compute_62,code=sm_62 --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_75,code=sm_75 --generate-code arch=compute_72,code=sm_72 --generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_89,code=sm_89


plus whatever other flags you have there
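
As a rough sketch of how that flag list could be produced instead of typed by hand, the snippet below builds one --generate-code entry per compute capability. The helper name gencode_flags and the "major.minor" input format are assumptions for illustration; they are not part of mfaktc or its makefile.

```python
# Hypothetical helper (not part of mfaktc): build the nvcc
# --generate-code flags for a list of compute capability levels.
def gencode_flags(cc_levels):
    """cc_levels: iterable of strings such as "6.0" or "8.6"."""
    flags = []
    for cc in cc_levels:
        arch = cc.replace(".", "")  # "6.0" -> "60"
        flags.append(f"--generate-code arch=compute_{arch},code=sm_{arch}")
    return " ".join(flags)


if __name__ == "__main__":
    # Example: the Maxwell and Pascal levels mentioned in this thread.
    print(gencode_flags(["5.0", "6.0", "6.1"]))
```

The resulting string would be pasted into (or appended to) NVCCFLAGS alongside the other flags already there.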


ok, noticed, thx, we need to change that again
82) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9770)
Posted 18 Mar 2024 by Profile rebirther
was v31 compiled with explicit support for CC_6.0 in the NVCCFLAGS section of your makefile or build script?


the argument should be something similar to this:

--generate-code arch=compute_60,code=sm_60


this was what I was suggesting when I mentioned recompiling it to add 6.0 support.


yes it was but with cc5
83) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9767)
Posted 18 Mar 2024 by Profile rebirther
Interesting, try to run ldd ./mfaktc
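
A minimal sketch of what to look for in the ldd output: lines that end in "not found" name the shared libraries the binary cannot resolve. The function names below are made up for illustration, and check_binary assumes a Linux system where ldd is installed.

```python
import subprocess


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # A typical line looks like: "\tlibcudart.so.12 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    # Assumption: ldd is available on PATH (Linux only).
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)
```

Calling check_binary("./mfaktc") in the app directory would return an empty list if every dependency resolves.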
84) Message boards : Number crunching : Bases loaded (Message 9765)
Posted 18 Mar 2024 by Profile rebirther
S323
- n=100-300k
- runtime 8min-1h12min (AVX@3.8GHz)
- 100-150k = 60 credits
- 150-200k = 110 credits
- 200-250k = 160 credits
- 250-300k = 260 credits
- Sierpinski Base
- deadline 3 days
85) Message boards : News : Testing new apps started (Message 9763)
Posted 17 Mar 2024 by Profile rebirther
new cuda120 v31 TF mfaktc linux app is up

changelog:
- reduced the driver requirement from v30
86) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9762)
Posted 17 Mar 2024 by Profile rebirther
That fixed one of my PCs.

Another still errors out on every task
https://srbase.my-firewall.org/sr5/result.php?resultid=145885980


v31 is up, should reduce the driver requirement to 525
87) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9759)
Posted 17 Mar 2024 by Profile rebirther
All errors with new version.
https://srbase.my-firewall.org/sr5/result.php?resultid=145853091

Please undo

Fine with 0.28, errors on 0.29.


ok, this was not as planned, reverted back to 0.28, need to check...


v30 fixed the issue with a wrong and older .so12 file
88) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9758)
Posted 17 Mar 2024 by Profile rebirther
All errors with new version.
https://srbase.my-firewall.org/sr5/result.php?resultid=145853091

Please undo

Fine with 0.28, errors on 0.29.


ok, this was not as planned, reverted back to 0.28, need to check...
89) Message boards : News : base R815 Megaprime / proven (Message 9756)
Posted 16 Mar 2024 by Profile rebirther
magic_sam, a member of the team Gridcoin, found a megaprime for base R815.
The prime 8*815^559138-1 has 1,627,740 digits and entered the TOP5000 in Chris Caldwell's The Largest Known Primes Database. This find also proves the base!
90) Message boards : News : Testing new apps started (Message 9755)
Posted 16 Mar 2024 by Profile rebirther
new cuda120 TF mfaktc app is up

changelog:
- recompiled linux app from cc5-latest
91) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9754)
Posted 16 Mar 2024 by Profile rebirther
new cuda120 TF mfaktc app is up

changelog:
- recompiled from cc5-latest
92) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9753)
Posted 14 Mar 2024 by Profile rebirther
there also appears to be a problem with the cuda100 package.

the mfaktc-linux64-v10.zip contains Windows files (mfaktc.exe and .ini files). it should contain linux binaries and the cuda10 libcudart.so.10 shared object libraries like the cuda11 package does.



You can find the libcudart.so.10 file in the FAQs. If the app was dynamically compiled, you don't need it in the package. The .exe in the Linux package was compiled the same way but runs under Linux. The .ini file is fixed.


when I run ldd mfaktc.exe on the binary it says libcudart.so.10.1 is missing.

in your cuda111 and cuda120 packages you include this file.

I see nothing in the FAQ that references this file. can you link to it? Is your intended workaround for this to create a symlink from libcudart.so.10->libcudart.so.10.1 ?


The file must be somewhere in another thread. Linux is not my thing, so I will ask if we can do something. The package needs a recompile too, it is old stuff.
93) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9751)
Posted 14 Mar 2024 by Profile rebirther
there also appears to be a problem with the cuda100 package.

the mfaktc-linux64-v10.zip contains Windows files (mfaktc.exe and .ini files). it should contain linux binaries and the cuda10 libcudart.so.10 shared object libraries like the cuda11 package does.



You can find the libcudart.so.10 file in the FAQs. If the app was dynamically compiled, you don't need it in the package. The .exe in the Linux package was compiled the same way but runs under Linux. The .ini file is fixed.
94) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9749)
Posted 14 Mar 2024 by Profile rebirther
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support?

otherwise, the only real solution is to transition to Anonymous Platform, and force the cuda100 app to run on everything with an app_info.xml file.


The cuda12 app supports cc6.1 (see the 1070ti). I don't know why the Tesla doesn't run cuda120 WUs. Anonymous platform is not allowed.


i know your app supports 6.1. but CUDA 12 from nvidia still supports down to CC 5.0

I'm saying you should recompile it to add support for 6.0 also. Tesla P100 has CC level 6.0.

6.1 is used on the mainstream pascal GPUs (GTX 10-series, Quadro P-series)


Yes, we will try to recompile, perhaps this will solve a lot of problems.
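
To illustrate why a build without an sm_60 target fails on the P100, here is a simplified sketch of the CUDA binary compatibility rule: compiled GPU code for sm_XY runs only on devices with the same major version and an equal or higher minor version. It assumes the build embeds only compiled code (code=sm_XX) and no PTX for just-in-time compilation. The function name is hypothetical.

```python
# Simplified model of CUDA binary (SASS) compatibility.
# Assumption: the app embeds only code=sm_XX targets and no PTX,
# so the driver cannot JIT-compile for a device that is not covered.
def sass_runs_on(device_cc, built_targets):
    """device_cc and each target are (major, minor) tuples."""
    dev_major, dev_minor = device_cc
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in built_targets
    )


if __name__ == "__main__":
    tesla_p100 = (6, 0)
    gtx_1070_ti = (6, 1)
    print(sass_runs_on(tesla_p100, [(5, 0), (6, 1)]))   # no 6.0 target
    print(sass_runs_on(gtx_1070_ti, [(5, 0), (6, 1)]))  # 6.1 target present
```

Under this model a build targeting 5.0 and 6.1 covers the 1070ti but not the P100, which matches the behaviour described above.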
95) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9747)
Posted 14 Mar 2024 by Profile rebirther
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support?

otherwise, the only real solution is to transition to Anonymous Platform, and force the cuda100 app to run on everything with an app_info.xml file.


The cuda12 app supports cc6.1 (see the 1070ti). I don't know why the Tesla doesn't run cuda120 WUs. Anonymous platform is not allowed.
96) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9745)
Posted 14 Mar 2024 by Profile rebirther
I'm trying to say that the situation is worse now for some reason. With your help, I was able to process tasks for a time (on P100 only computers) using the first app_config you posted, but now that one no longer helps, nor does the second app_config with plan_class help either. I can't process tasks on P100s at all now, whether they are alone in the computer, in pairs, or mixed with another GPU. Nothing works now.

My humble request would be for you to revert whatever changes were made to the server side, so I can see if at least the P100-only computers can process tasks again. That would be a large improvement.


The server plan_class is back to the original. You need the app_config from the first post.
97) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9743)
Posted 14 Mar 2024 by Profile rebirther
Also my P100-only machines are having problems again despite having got them working with the initial app_config you posted.


Yes, you need the app_config, plus a 2nd entry for the other card if you have 2.
98) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9740)
Posted 13 Mar 2024 by Profile rebirther
It could be that since the P100 is the second GPU, its presence isn't being properly detected on the server side. That's just a wild guess based on the fact that the list of hosts always claims the computer has x number of whatever the first GPU is, instead of what's really there.


Yes but with the app_config we can reduce the possibilities.
99) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9738)
Posted 13 Mar 2024 by Profile rebirther
I'm sorry, after getting rid of the app_config and resetting the project, the client is still getting cuda120 tasks which error out immediately on the P100.
Perhaps it's not worth the trouble, not many people have these old compute GPUs.


Yeah I see, we have 2 plan_classes and both are working so we need it all in app_config manually to overwrite the settings.

<app_config>
  <app_version>
    <plan_class>cuda100</plan_class>
    <host_summary_regex>Tesla P100</host_summary_regex>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
  <app_version>
    <plan_class>cuda120</plan_class>
    <host_summary_regex>GeForce GTX 1070 Ti</host_summary_regex>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>
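
Before dropping a file like this into the project directory, it may help to check that it parses as XML and maps each card to the intended plan_class. The sketch below uses only Python's standard library; the function name and the shortened sample (two fields per entry) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample: only the two fields needed for the check.
APP_CONFIG = """<app_config>
  <app_version>
    <plan_class>cuda100</plan_class>
    <host_summary_regex>Tesla P100</host_summary_regex>
  </app_version>
  <app_version>
    <plan_class>cuda120</plan_class>
    <host_summary_regex>GeForce GTX 1070 Ti</host_summary_regex>
  </app_version>
</app_config>"""


def plan_classes_by_gpu(xml_text):
    """Map each host_summary_regex to its plan_class.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the text
    is not well-formed, e.g. when a closing tag is missing its slash.
    """
    root = ET.fromstring(xml_text)
    return {
        version.findtext("host_summary_regex"): version.findtext("plan_class")
        for version in root.findall("app_version")
    }


if __name__ == "__main__":
    print(plan_classes_by_gpu(APP_CONFIG))
```

This only checks the XML syntax and the mapping; it does not tell you whether the BOINC client accepts every field.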
100) Message boards : Number crunching : Nvidia Tesla P100 Problems (Message 9736)
Posted 13 Mar 2024 by Profile rebirther
I have added the cuda100 plan_class for Tesla P100 to the server, better for both sides. You can get rid of the app_config. Let's test it.




Copyright © 2014-2024 BOINC Confederation / rebirther