Nvidia Tesla P100 Problems

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7257
Credit: 42,729,227
RAC: 1
Message 9770 - Posted: 18 Mar 2024, 19:27:09 UTC - in response to Message 9769.

Quoting Message 9769:

> was v31 compiled with explicit support for CC_6.0 in the NVCCFLAGS section of your makefile or build script?
>
> the argument should be something similar to this:
>
> --generate-code arch=compute_60,code=sm_60
>
> this was what I was suggesting when I mentioned recompiling it to add 6.0 support.


yes it was but with cc5

Ian&Steve C.
Joined: 7 May 23
Posts: 7
Credit: 1,086,024
RAC: 2
Message 9771 - Posted: 18 Mar 2024, 19:33:59 UTC - in response to Message 9770.
Last modified: 18 Mar 2024, 19:51:55 UTC

you need to add support for every cc level explicitly.

the whole section would be something like this:

--generate-code arch=compute_50,code=sm_50 --generate-code arch=compute_52,code=sm_52 --generate-code arch=compute_60,code=sm_60 --generate-code arch=compute_61,code=sm_61 --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_75,code=sm_75 --generate-code arch=compute_72,code=sm_72 --generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_89,code=sm_89


plus whatever other flags you have there
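
For anyone maintaining a similar build script, here is a minimal sketch of generating that flag list instead of typing it by hand. The helper name `gencode_flags` and the example capability list are assumptions for illustration, not part of the project's actual makefile; only the `--generate-code arch=compute_XX,code=sm_XX` form comes from the post above.

```python
def gencode_flags(capabilities):
    """Build one nvcc --generate-code flag per compute capability.

    `capabilities` is an iterable of strings such as "6.0" or "60".
    (Hypothetical helper; the flag format is the one quoted in the thread.)
    """
    flags = []
    for cc in capabilities:
        digits = cc.replace(".", "")  # "6.0" -> "60"
        if not digits.isdigit():
            raise ValueError(f"not a compute capability: {cc!r}")
        flags.append(f"--generate-code arch=compute_{digits},code=sm_{digits}")
    return " ".join(flags)


if __name__ == "__main__":
    # Illustrative list only; use the set your CUDA toolkit supports.
    print(gencode_flags(["5.0", "5.2", "6.0", "6.1"]))
```

The output can be pasted into NVCCFLAGS next to whatever other flags are already there.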

rebirther
Message 9772 - Posted: 18 Mar 2024, 19:51:54 UTC - in response to Message 9771.

ok, noticed, thx, we need to change that again

rebirther
Message 9773 - Posted: 18 Mar 2024, 22:20:16 UTC

v32 recompiled with the fix, I hope...

crashtech
Joined: 10 Apr 19
Posts: 28
Credit: 466,952,134
RAC: 130,404
Message 9775 - Posted: 19 Mar 2024, 0:39:35 UTC
Last modified: 19 Mar 2024, 1:16:05 UTC

I wanted to try v32, but even after resetting the project in the client, the server only sent me v28 cuda100 tasks. Oddly enough, the cuda100 tasks used to work on the P100 but now they don't. So I put this as an app_config.xml:

<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda120</plan_class>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>


Now I can run the new v32 and it works fine, but the app_config has to be used at least on this particular computer. I have not tried the rest yet.

EDIT: Actually, I spoke too soon. This app_config does NOT prevent the download of v28 cuda100 tasks. So I'm still not able to run all my GPUs.
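
A hand-edited app_config.xml is easy to break with a stray or unclosed tag, so a quick well-formedness check before restarting the client may save a round trip. The sketch below uses a shortened copy of the file posted above; the function name `list_app_versions` is made up for illustration. It only confirms that the XML parses and lists the app/plan-class pairs it finds. It does not tell you which elements the BOINC client actually honors.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the app_config.xml from the post above.
APP_CONFIG = """<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda120</plan_class>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>"""


def list_app_versions(xml_text):
    """Parse app_config XML and return (app_name, plan_class) pairs.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed, and ValueError if the root element is not <app_config>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [
        (version.findtext("app_name"), version.findtext("plan_class"))
        for version in root.iter("app_version")
    ]


if __name__ == "__main__":
    for app_name, plan_class in list_app_versions(APP_CONFIG):
        print(f"app={app_name} plan_class={plan_class}")
```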

mmonnin
Joined: 1 Feb 17
Posts: 27
Credit: 311,202,073
RAC: 14,155
Message 9776 - Posted: 19 Mar 2024, 0:41:10 UTC

I saw the new version download. Works on both my PCs.

crashtech
Message 9777 - Posted: 19 Mar 2024, 0:47:26 UTC - in response to Message 9776.

And you have P100s?

mmonnin
Message 9778 - Posted: 19 Mar 2024, 0:53:42 UTC

You most likely know I don't.

It was a confirmation post that an update for P100 systems didn't break other systems, like an earlier one did.

crashtech
Message 9779 - Posted: 19 Mar 2024, 1:12:55 UTC - in response to Message 9778.
Last modified: 19 Mar 2024, 1:16:23 UTC

I actually had no idea; your computers are hidden. So everything works now, but the P100s need an app_config, for now anyway.

rebirther
Message 9780 - Posted: 19 Mar 2024, 1:54:18 UTC - in response to Message 9775.

It should work without an app_config file, shouldn't it?

Ian&Steve C.
Message 9781 - Posted: 19 Mar 2024, 11:26:51 UTC - in response to Message 9775.
Last modified: 19 Mar 2024, 11:27:05 UTC

your P100s running the cuda100 app are now failing with this:

./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory


maybe you removed the cuda toolkit?
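
A quick way to check on the affected host whether the dynamic loader can find that library is a small Python probe like the sketch below. The function name `can_load` is made up for illustration. It relies on `ctypes.CDLL`, which raises `OSError` when a shared library cannot be opened. Note that the result depends on the loader search path of the environment you run it in, which may differ from the one the BOINC client uses for the app.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    name = "libcudart.so.10.1"  # the library named in the error above
    status = "loadable" if can_load(name) else "NOT loadable"
    print(f"{name}: {status}")
```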

rebirther
Message 9782 - Posted: 19 Mar 2024, 12:05:32 UTC - in response to Message 9781.
Last modified: 19 Mar 2024, 12:43:13 UTC

https://srbase.my-firewall.org/sr5/download/libcudart.so.10.1

I will update the app to include this file.

Update:
The missing library file is now included in the zip file of the v29 cuda100 app.

crashtech
Message 9784 - Posted: 19 Mar 2024, 17:51:33 UTC - in response to Message 9782.

Thank you for all the work you agreed to do just to get one old model of GPU running, I appreciate it greatly.

I will return to testing TF after the upcoming PrimeGrid challenge is concluded!

crashtech
Message 9856 - Posted: 25 Mar 2024, 2:03:27 UTC

All seems to be working well on P100s. I believe that in honor of this fix, I will put all my GPUs on TF for a while.

Thank you!


Copyright © 2014-2024 BOINC Confederation / rebirther