Nvidia Tesla P100 Problems

Author	Message
rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 9770 - Posted: 18 Mar 2024, 19:27:09 UTC - in response to Message 9769.
> was v31 compiled with explicit support for CC_6.0 in the NVCCFLAGS section of your makefile or build script?
> the argument should be something similar to this:
> --generate-code arch=compute_60,code=sm_60
> this was what I was suggesting when I mentioned recompiling it to add 6.0 support.
Yes, it was, but with cc5.

Ian&Steve C. Joined: 7 May 23 Posts: 19 Credit: 466,716,124 RAC: 4	Message 9771 - Posted: 18 Mar 2024, 19:33:59 UTC - in response to Message 9770. Last modified: 18 Mar 2024, 19:51:55 UTC
You need to add support for every cc level explicitly. The whole section would be something like this:
--generate-code arch=compute_50,code=sm_50 --generate-code arch=compute_52,code=sm_52 --generate-code arch=compute_60,code=sm_60 --generate-code arch=compute_61,code=sm_61 --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_75,code=sm_75 --generate-code arch=compute_72,code=sm_72 --generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_89,code=sm_89
plus whatever other flags you have there.
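
A long flag list like the one above is easy to mistype by hand. The following is only a sketch of how such a list could be generated; the helper name `gencode_flags` and the `CAPABILITIES` list are assumptions made for illustration and are not part of the project's build script.

```python
def gencode_flags(capabilities):
    """Build one --generate-code flag per compute capability such as "6.0"."""
    flags = []
    for cc in capabilities:
        digits = cc.replace(".", "")  # "6.0" -> "60"
        flags.append(f"--generate-code arch=compute_{digits},code=sm_{digits}")
    return " ".join(flags)


# Capability list copied from the post above (assumption: adjust it to the
# GPUs you actually want to target and to what your CUDA toolkit supports).
CAPABILITIES = ["5.0", "5.2", "6.0", "6.1", "7.0", "7.2", "7.5", "8.0", "8.6", "8.9"]

if __name__ == "__main__":
    # Print a line that could be pasted into a makefile.
    print("NVCCFLAGS += " + gencode_flags(CAPABILITIES))
```

Keeping the capabilities in one list means adding a new level later is a one-entry change rather than another hand-edited flag.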

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 9772 - Posted: 18 Mar 2024, 19:51:54 UTC - in response to Message 9771.
> you need to add support for every cc level explicitly. the whole section would be something like this:
> --generate-code arch=compute_50,code=sm_50 --generate-code arch=compute_52,code=sm_52 --generate-code arch=compute_60,code=sm_60 --generate-code arch=compute_62,code=sm_62 --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_75,code=sm_75 --generate-code arch=compute_72,code=sm_72 --generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_89,code=sm_89
> plus whatever other flags you have there
OK, noticed, thanks. We need to change that again.

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 9773 - Posted: 18 Mar 2024, 22:20:16 UTC
v32 recompiled with the fix, I hope...

crashtech Joined: 10 Apr 19 Posts: 29 Credit: 2,414,801,984 RAC: 239,888	Message 9775 - Posted: 19 Mar 2024, 0:39:35 UTC Last modified: 19 Mar 2024, 1:16:05 UTC
I wanted to try v32, but even after resetting the project in the client, the server only sent me v28 cuda100 tasks. Oddly enough, the cuda100 tasks used to work on the P100 but now they don't. So I put this as an app_config.xml:
<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda120</plan_class>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>
Now I can run the new v32 and it works fine, but the app_config has to be used at least on this particular computer. I have not tried the rest yet.
EDIT: Actually, I spoke too soon. This app_config does NOT prevent the download of v28 cuda100 tasks. So I'm still not able to run all my GPUs.
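
Before relying on a hand-written app_config.xml, it can help to confirm that the file is well-formed and names the plan class you intended. The sketch below only parses the XML from the post above; the function name `summarize` is an assumption for illustration, and the code neither contacts the project server nor changes which app version the server sends.

```python
import xml.etree.ElementTree as ET

# The app_config.xml content quoted in the post above.
APP_CONFIG = """\
<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda120</plan_class>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>
"""


def summarize(xml_text):
    """Return one dict of tag -> text for every <app_version> element.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the text is
    not well-formed XML, which catches unbalanced or mistyped tags.
    """
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in version}
        for version in root.findall("app_version")
    ]


if __name__ == "__main__":
    for version in summarize(APP_CONFIG):
        print(version["app_name"], version["plan_class"])
```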

mmonnin Joined: 1 Feb 17 Posts: 40 Credit: 1,067,056,386 RAC: 31,544	Message 9776 - Posted: 19 Mar 2024, 0:41:10 UTC
I saw the new version download. Works on both my PCs.

crashtech Joined: 10 Apr 19 Posts: 29 Credit: 2,414,801,984 RAC: 239,888	Message 9777 - Posted: 19 Mar 2024, 0:47:26 UTC - in response to Message 9776.
And you have P100s?

mmonnin Joined: 1 Feb 17 Posts: 40 Credit: 1,067,056,386 RAC: 31,544	Message 9778 - Posted: 19 Mar 2024, 0:53:42 UTC
You most likely know I don't. It was a confirmation post that an update for P100 systems didn't break other systems, like it did earlier.

crashtech Joined: 10 Apr 19 Posts: 29 Credit: 2,414,801,984 RAC: 239,888	Message 9779 - Posted: 19 Mar 2024, 1:12:55 UTC - in response to Message 9778. Last modified: 19 Mar 2024, 1:16:23 UTC
I actually had no idea; your computers are hidden. ~~So everything works now but the P100s need an app_config, for now anyway.~~

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 9780 - Posted: 19 Mar 2024, 1:54:18 UTC - in response to Message 9775.
It should work without an app_config file, shouldn't it?

Ian&Steve C. Joined: 7 May 23 Posts: 19 Credit: 466,716,124 RAC: 4	Message 9781 - Posted: 19 Mar 2024, 11:26:51 UTC - in response to Message 9775. Last modified: 19 Mar 2024, 11:27:05 UTC
Your P100s running the cuda100 app are now failing with this:
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory
Maybe you removed the CUDA toolkit?
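
The loader error above means the dynamic linker could not find the named shared library. As a rough sketch, a host can be checked for a given library name before running tasks; the helper name `can_load` is an assumption for illustration, and the check uses only Python's standard `ctypes` module.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the error message in the post above.
    name = "libcudart.so.10.1"
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

Note that this checks the system-wide loader search path; a library shipped alongside the app may be found in other ways when the app itself runs.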

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 9782 - Posted: 19 Mar 2024, 12:05:32 UTC - in response to Message 9781. Last modified: 19 Mar 2024, 12:43:13 UTC
https://srbase.my-firewall.org/sr5/download/libcudart.so.10.1
I will update the app to include this file.
Update: The missing library file is now in the zip file of the v29 cuda100 app.

crashtech Joined: 10 Apr 19 Posts: 29 Credit: 2,414,801,984 RAC: 239,888	Message 9784 - Posted: 19 Mar 2024, 17:51:33 UTC - in response to Message 9782.
Thank you for all the work you agreed to do just to get one old model of GPU running, I appreciate it greatly. I will return to testing TF after the upcoming PrimeGrid challenge is concluded!

crashtech Joined: 10 Apr 19 Posts: 29 Credit: 2,414,801,984 RAC: 239,888	Message 9856 - Posted: 25 Mar 2024, 2:03:27 UTC
All seems to be working well on P100s. I believe that in honor of this fix, I will put all my GPUs on TF for a while. Thank you!

Ian&Steve C. Joined: 7 May 23 Posts: 19 Credit: 466,716,124 RAC: 4	Message 10028 - Posted: 15 Jul 2024, 16:28:32 UTC
All cuda120 tasks error out on my V100s, in the same way that crashtech's P100s were erroring.
Did you omit support for CC 7.0 (Titan V and V100 are at this CC level)? As I pointed out before, you need to explicitly add support for every CC level individually.

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 10029 - Posted: 15 Jul 2024, 16:42:17 UTC - in response to Message 10028. Last modified: 15 Jul 2024, 16:44:24 UTC
We can try to use cuda100 WUs with an app_config. That's the oldest app we are running.
<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda100</plan_class>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>

Ian&Steve C. Joined: 7 May 23 Posts: 19 Credit: 466,716,124 RAC: 4	Message 10030 - Posted: 15 Jul 2024, 16:55:21 UTC - in response to Message 10029. Last modified: 15 Jul 2024, 16:55:53 UTC
Tried it. Didn't work. This app_config does not work: it will not limit the project to sending only the cuda100 app.
Same results on a Titan V (same CC 7.0), instant error.
I'm guessing the problem is that the app was compiled without CC 7.0 support.

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 10031 - Posted: 15 Jul 2024, 17:21:15 UTC - in response to Message 10030.
We can only try this:
<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda111</plan_class>
    <min_gpu_ram_mb>384</min_gpu_ram_mb>
    <gpu_ram_used_mb>384</gpu_ram_used_mb>
    <gpu_peak_flops_scale>0.22</gpu_peak_flops_scale>
    <cpu_frac>0.01</cpu_frac>
  </app_version>
</app_config>
Or the app was not compiled with cc7. The P100 was working, right?

Ian&Steve C. Joined: 7 May 23 Posts: 19 Credit: 466,716,124 RAC: 4	Message 10032 - Posted: 15 Jul 2024, 17:31:43 UTC - in response to Message 10031.
This method of using an app_config to try to force which app gets sent does not work. I've never had this work reliably, and it didn't work for crashtech when you suggested it last time. An app_config is designed to apply some configuration to apps that are already downloaded. It has no influence on what the project server actually sends me.
I think your app was not compiled with CC 7.0 support. And if you didn't do it on the latest app, I would guess you didn't do it on the older apps either.
P100 works as far as I know, but that's because you fixed it by recompiling the app to include CC 6.0 support.
When compiling CUDA apps you cannot just put a minimum value. You need to put ALL values that you want to support. To support everything from CC 5.0+ you need to explicitly state every version (5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0):
5.0 - GM107/GM108 Maxwell
5.2 - GM200 Maxwell
6.0 - GP100 Pascal
6.1 - GP102+ Pascal
7.0 - GV100 Volta (Titan V and V100)
7.5 - TU102+ Turing
8.0 - GA100 Ampere
8.6 - GA102+ Ampere
8.9 - AD102+ Ada Lovelace
9.0 - GH100 Hopper
And you're going to have to recompile again in the future to add 10.x something if you want to support the upcoming Blackwell GPUs.
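
The rule described above (every target capability must appear in the build flags) can be checked mechanically. This is a minimal sketch under stated assumptions: the `FLAGS` string is a made-up example, not the project's actual build line, and the helper names `built_targets` and `is_listed` are hypothetical. It follows the thread's rule of exact matching, which is a simplification of how CUDA resolves binaries at run time.

```python
import re


def built_targets(nvcc_flags):
    """Return the set of XX values named in code=sm_XX arguments."""
    return set(re.findall(r"code=sm_(\d+)", nvcc_flags))


def is_listed(nvcc_flags, capability):
    """True if a capability such as "7.0" appears as code=sm_70 in the flags."""
    return capability.replace(".", "") in built_targets(nvcc_flags)


# Hypothetical flag string for illustration only: it lists 5.0 and 6.0
# but leaves out 7.0, the situation suspected in this thread.
FLAGS = (
    "--generate-code arch=compute_50,code=sm_50 "
    "--generate-code arch=compute_60,code=sm_60"
)

if __name__ == "__main__":
    for cc in ["5.0", "6.0", "7.0"]:
        status = "listed" if is_listed(FLAGS, cc) else "MISSING"
        print(f"CC {cc}: {status}")
```

Running such a check against the real flag string before publishing a new app version would flag an omitted level such as 7.0 without needing the affected hardware on hand.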

rebirther Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 2 Jan 13 Posts: 8234 Credit: 113,724,563 RAC: 0	Message 10033 - Posted: 15 Jul 2024, 17:38:04 UTC - in response to Message 10032. Last modified: 15 Jul 2024, 17:38:37 UTC
Yes, this was only a test of the current plan_class with cuda111, since its config differs in some ways. It looks like the app was compiled with cc6 but not cc7. For cc10 we will have a separate app, of course.
