TF on GPUs works sometimes, and sometimes does nothing for hours

Message boards : Number crunching : TF on GPUs works sometimes, and sometimes does nothing for hours

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 8878 - Posted: 20 May 2023, 18:36:27 UTC

I have Tahiti and Fury GPUs. At random, they either finish a task in an hour or never finish it at all: GPU usage stays at 0% and the task just sits there. I'm not sure how long I should leave them. No CPU is being used either. I'm now trying 8 tasks at a time on the GPUs, so they can work on another one when one has dozed off. The cards are NOT incompatible, because I have successfully finished many tasks on all of them.
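Running several tasks at once on one card is commonly done with an app_config.xml in the project's directory, where gpu_usage is the fraction of a GPU each task claims (1/8 = 0.125 allows eight per card). Below is a minimal Python sketch that writes such a file. The app short name "TF" and the cpu_usage default are assumptions for illustration, not values confirmed by the project.

```python
# Minimal sketch: build an app_config.xml that lets BOINC run several tasks
# of one app on each GPU at once. Assumptions: the app's short name is "TF"
# (taken from the <app> tag later in the thread) and the cpu_usage default
# below is a guess to tune, not a project recommendation.
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int,
                     cpu_usage: float = 0.1) -> str:
    """Return app_config.xml text allowing `tasks_per_gpu` tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Each task claims this fraction of a GPU, so 1/8 = 0.125 fits 8 per card.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(1 / tasks_per_gpu)
    # Fraction of a CPU core the client reserves for each GPU task.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("TF", tasks_per_gpu=8))
```

This only changes how many tasks share a card; it does not by itself address the multi-GPU stall discussed in the replies.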

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,620,491
RAC: 37,926
Message 8879 - Posted: 20 May 2023, 18:57:42 UTC - in response to Message 8878.
Last modified: 20 May 2023, 19:00:41 UTC

This is probably a multi-GPU problem: the app doesn't have multi-GPU support. There must be a fix somehow, but I don't know it yet. Excluding a card doesn't help either, because you still have the same multi-GPU setup. It's really a BOINC client issue; it would need to run one GPU WU on each card separately.

Mr P Hucker
Message 8884 - Posted: 21 May 2023, 20:55:08 UTC - in response to Message 8879.

You have a point there: my two triple-GPU machines are the ones failing. The others, which have single GPUs, work fine. I couldn't put my finger on it because I was looking at card models instead.

Is there some way you can prevent these tasks from being sent to multi-GPU setups, or at least warn the user? On those machines they seem to work at first, then mostly fail or sit idle for ages.

By the way, I didn't get an email to inform me of your reply.

rebirther
Message 8885 - Posted: 21 May 2023, 22:28:15 UTC - in response to Message 8884.

Do you have an app_config?

I sent you a PM with a spam filter notice from your provider; you need to change your filter settings there so the emails get through.

Mr P Hucker
Message 8887 - Posted: 23 May 2023, 1:37:08 UTC - in response to Message 8885.

rebirther wrote: "Do you have an app_config?"

Yes, I know how to use those. What are you suggesting I do with it?

rebirther wrote: "I sent you a PM with a SPAM filter notice from your provider, you must set up your config there."

Ah, yes, it does that sometimes; the filter is run by my ISP, but if I turn it off I get 50 spams a day. Could you tell me which address the emails will come from? I can simply add a whitelist exception such as "microsoft.com". For now I'll assume it's my-firewall.org (which, to be fair to my ISP's filter, does look dodgy!)

rebirther
Message 8888 - Posted: 23 May 2023, 6:17:07 UTC - in response to Message 8887.

Nope, the emails come from srbase@outlook.de.

Mr P Hucker
Message 8890 - Posted: 23 May 2023, 11:47:26 UTC - in response to Message 8888.

Thanks. What were you asking earlier about my app config?

rebirther
Message 8891 - Posted: 23 May 2023, 11:51:44 UTC - in response to Message 8890.
Last modified: 23 May 2023, 13:25:22 UTC

Can you post it here?

Alternatively, you can try this in your cc_config.xml:

<cc_config>
  <options>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>ATI</type>
      <device_num>1</device_num>
      <device_num>2</device_num>
      <app>TF</app>
    </exclude_gpu>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

Mr P Hucker
Message 8893 - Posted: 24 May 2023, 10:41:15 UTC
Last modified: 24 May 2023, 10:41:59 UTC

I'm confused now. Earlier you said "Excluding a card doesn't help either because you have the same multi-GPU setup", but now you are suggesting (I think) I can keep SRbase off all but one GPU and it would work?

rebirther
Message 8894 - Posted: 24 May 2023, 14:32:10 UTC - in response to Message 8893.

Excluding the cards is the easier option, if it works: run SRBase only on device 0 (card 1) and disable it on the other two, leaving them for other projects.
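The "only device 0" approach can be sketched as a small Python generator for cc_config.xml that emits one exclude_gpu block for every device except the one kept. This is a sketch under assumptions: devices are numbered 0 to gpu_count-1, the URL passed in must be the one the project is attached with, and writing one block per excluded device follows the version that worked for the poster below rather than any documented requirement.

```python
# Minimal sketch: generate a cc_config.xml that keeps one project's GPU tasks
# on a single card by excluding every other device number.
# Assumptions: devices are numbered 0..gpu_count-1, project_url is exactly the
# URL the project is attached with, and one <exclude_gpu> block per device.
import xml.etree.ElementTree as ET
from typing import Optional


def build_cc_config(project_url: str, gpu_count: int, keep_device: int = 0,
                    gpu_type: Optional[str] = None) -> str:
    """Return cc_config.xml text excluding all GPUs except `keep_device`."""
    if not 0 <= keep_device < gpu_count:
        raise ValueError("keep_device must be a valid device number")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in range(gpu_count):
        if dev == keep_device:
            continue  # this card stays available to the project
        # A separate block for each excluded device, one <device_num> each.
        excl = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(excl, "url").text = project_url
        if gpu_type is not None:
            ET.SubElement(excl, "type").text = gpu_type  # e.g. "ATI"
        ET.SubElement(excl, "device_num").text = str(dev)
    # Let the client use every GPU, not only the most capable one(s).
    ET.SubElement(options, "use_all_gpus").text = "1"
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config("https://srbase.my-firewall.org/sr5/", gpu_count=3))
```

If a cc_config.xml already exists, merge these elements into its options section rather than overwriting it, then have the client re-read its config files (or restart it).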

Mr P Hucker
Message 8897 - Posted: 24 May 2023, 17:15:16 UTC - in response to Message 8894.

Cool! That seems to be working, although I changed it to:

<exclude_gpu>
  <url>https://srbase.my-firewall.org/sr5/</url>
  <device_num>2</device_num>
</exclude_gpu>
<exclude_gpu>
  <url>https://srbase.my-firewall.org/sr5/</url>
  <device_num>1</device_num>
</exclude_gpu>

That's because the project is attached with https, and the client doesn't seem to accept two device numbers in one exclude_gpu instruction.
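Since an http:// URL in cc_config.xml appears not to match a project attached via https://, a quick check can catch that mismatch before restarting the client. The Python sketch below assumes, as the post above suggests, that the client compares the URL literally, so it uses exact string comparison; the function name is hypothetical.

```python
# Sketch: report <exclude_gpu> entries whose <url> differs from the URL the
# project is attached with (e.g. http:// vs https://). Assumption: the client
# matches the URL literally, as the post above suggests, so an exact string
# comparison is used here.
import xml.etree.ElementTree as ET


def mismatched_exclusions(cc_config_xml: str,
                          attached_url: str) -> list[tuple[str, list[int]]]:
    """Return (url, device numbers) for each exclusion that won't match."""
    root = ET.fromstring(cc_config_xml)
    problems = []
    for excl in root.iter("exclude_gpu"):
        url = (excl.findtext("url") or "").strip()
        devices = [int(d.text) for d in excl.findall("device_num")]
        if url != attached_url:
            problems.append((url, devices))
    return problems


if __name__ == "__main__":
    sample = """<cc_config><options>
      <exclude_gpu><url>http://srbase.my-firewall.org/sr5/</url>
        <device_num>1</device_num></exclude_gpu>
      <exclude_gpu><url>https://srbase.my-firewall.org/sr5/</url>
        <device_num>2</device_num></exclude_gpu>
    </options></cc_config>"""
    print(mismatched_exclusions(sample, "https://srbase.my-firewall.org/sr5/"))
    # prints [('http://srbase.my-firewall.org/sr5/', [1])]
```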


Copyright © 2014-2024 BOINC Confederation / rebirther