2 GPUs, 2 tasks on one card, not utilizing gpu 1,

Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1,

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 0
Message 8593 - Posted: 30 Dec 2022, 10:39:22 UTC

Do you mean to exclude the other devices for SRBase to run only on device 0?

For one, you're asking the wrong question.
I don't want to exclude any devices. But you keep telling us to exclude all but one device to get your TF projects to play somewhat nicely.

Forgot to mention: if someone ran the TF projects on an 8-GPU server, I believe the project would either run 8 TF projects on 1 GPU at the same time and leave the rest not running any project, OR run only one TF project and use the other GPUs for other projects. Again, 1 TF out of 8 GPUs. Note: some computers can run up to 8 GPUs (not just servers). Again, only 1 GPU runs TF for anyone with more than 1 GPU in their computer.

I can speak for all of us when I say, "we want to run the same number of TF projects as we have GPUs". E.g., with 2 GPUs we should run 2 TF projects, one for each GPU, or one TF project on ANY GPU and whatever project BOINC wants to run on the other GPU. Oh wait, BOINC does that on every project out there, like PrimeGrid, GPUGRID, Amicable Numbers, Einstein, etc. Even all of SRBase. Well, for some reason NOT for TF projects.

TF is the only project I can see that runs two tasks at the same time on the same GPU and leaves the other GPU doing nothing.
BOINC shows one TF task on device 0 and the second TF task on device 1, but the device 1 label is false: that task is not actually running on device 1.

People are trying to tell you that there is a problem with your TF project assigning tasks to their respective GPU devices. It just baffles all of us: you should just listen to the people reporting a problem with your TF projects, but you're not getting that you need to fix it.


Well, I'm just not going to run any TF projects until you fix it. I'm not even going into the config files just to have it only run 1 TF at a time.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7480
Credit: 43,876,295
RAC: 43,732
Message 8595 - Posted: 30 Dec 2022, 11:12:35 UTC - in response to Message 8593.

I can understand your situation. The issue with BOINC is that it runs all GPU tasks only on device 0. Also, mfaktc/mfakto have no multi-GPU support for a single task; here is the relevant part of the readme:

Q Does mfaktc support multiple GPUs?
A Yes, with the exception that a single instance of mfaktc can only use one
GPU. For each GPU you want to run mfaktc on you need (at least) one
instance of mfaktc. For each instance of mfaktc you can use the
commandline option "-d <GPU number>" to specify which GPU to use for each
specific mfaktc instance.


An app_config.xml could help, but I haven't found a solution yet. That's why we exclude the other devices from SRBase, so that other projects can use them instead of leaving them idle.
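For reference, the exclusion rebirther describes is set in the BOINC client's cc_config.xml (in the BOINC data directory) with an <exclude_gpu> entry. A minimal sketch for a two-GPU host that keeps SRBase on device 0 and leaves device 1 to other projects; the project URL below is an assumption based on the project folder name (srbase.my-firewall.org_sr5), so copy the exact URL BOINC Manager shows for SRBase.

```xml
<cc_config>
  <options>
    <!-- Stop SRBase from using GPU 1; other projects can still use it. -->
    <exclude_gpu>
      <!-- Assumed URL: replace with the SRBase URL shown in BOINC Manager. -->
      <url>https://srbase.my-firewall.org/sr5/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```

If cc_config.xml already exists, merge the <exclude_gpu> block into its existing <options> section, add one block per additional device on hosts with more GPUs, and restart the BOINC client.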

mikey
Joined: 29 Apr 16
Posts: 59
Credit: 1,507,725,859
RAC: 311,180
Message 8621 - Posted: 6 Jan 2023, 2:39:01 UTC - in response to Message 8593.

I think if you changed the word "project" to the word "tasks", no quotes, it would be easier to get what you are trying to say.

What I think you are trying to say is that you want EVERY GPU in the PC to run at least one TF task here at SRBase, just like every other BOINC project does, e.g. MilkyWay, Einstein, etc., when you use the <use_all_gpus> line in the cc_config file.
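The <use_all_gpus> option goes in the <options> section of cc_config.xml. By default BOINC only uses the most capable GPU(s) in a host; setting it to 1 makes all of them available. As the rest of this thread shows, this alone does not make the TF app run on the device BOINC assigns. A minimal fragment:

```xml
<cc_config>
  <options>
    <!-- 1 = use every GPU; 0 (default) = only the most capable ones. -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

Restart the BOINC client after editing the file.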

marmot
Joined: 17 Nov 16
Posts: 97
Credit: 151,328,872
RAC: 567,015
Message 8630 - Posted: 7 Jan 2023, 16:46:17 UTC - in response to Message 8621.

Yes, that's what I understand them to be saying, too.
If you have three NVIDIA GTX cards in a computer, then one TF task should get assigned to each GPU device automatically, as long as <use_all_gpus>1</use_all_gpus> is in the cc_config.

From Rebirther:
Q Does mfaktc support multiple GPUs?
A Yes, with the exception that a single instance of mfaktc can only use one
GPU. For each GPU you want to run mfaktc on you need (at least) one
instance of mfaktc. For each instance of mfaktc you can use the
commandline option "-d <GPU number>" to specify which GPU to use for each
specific mfaktc instance.


So how does the "-d <GPU number>" switch work?
Do we send that switch to the TF instance and how do we do that?
Or is it a switch applied in the batch file starting a BOINC client with <use_all_gpus>0</use_all_gpus> set in cc_config?

This is strange. If you run MilkyWay@home or Einstein@Home, a single BOINC client with <use_all_gpus>1</use_all_gpus> will send an equal number of WUs to each GPU.
SRBase TF should do the same thing automatically, without the user passing "-d <GPU number>" to any process.

I'll restate my opinion: BOINC's management of GPUs is archaic and clunky. We should be able to assign GPU WUs easily from the BOINC Manager GUI without touching config files...
____________
My primes found at SRBase:
40*1017^215605+1 (Top 5000)
18922*111^383954+1 (Top 5000)
19116*24^791057-1 (Top 5000)
4281*880^27069+1

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7480
Credit: 43,876,295
RAC: 43,732
Message 8632 - Posted: 7 Jan 2023, 16:59:53 UTC - in response to Message 8630.

BOINC always sends work to device 0; if you have more GPUs in one host, all tasks will run on that device. I have already asked the BOINC devs to change that. We need it to work the way it does for CPUs: 1 WU per GPU.

crashtech
Joined: 10 Apr 19
Posts: 29
Credit: 828,005,334
RAC: 3,597,880
Message 8634 - Posted: 7 Jan 2023, 21:53:24 UTC

There must be some kind of translation problem here.
I'm going to try to say this clearly:
SRBase is the only project I know of with this problem.
The other projects have solved the problem.
It shouldn't be the user's problem to make SRBase work properly.
This project might benefit from the experience of people at PrimeGrid.
PrimeGrid does not have problems with multi-GPU, nor is any special user configuration required.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7480
Credit: 43,876,295
RAC: 43,732
Message 8635 - Posted: 7 Jan 2023, 22:43:39 UTC - in response to Message 8634.

I have already posted this everywhere. The problem is the app itself. If one of the devs can change the single-instance design to support multiple GPUs, then we are good. I'm still open to any hints or solutions.

Unfortunately gpuowl, which has multi-GPU support, doesn't support TF.

DeleteNull
Volunteer developer
Volunteer tester
Joined: 29 Nov 14
Posts: 83
Credit: 374,931,522
RAC: 4,639
Message 8636 - Posted: 7 Jan 2023, 22:47:46 UTC - in response to Message 8635.

Hello, this is the content of a job file:
../../projects/srbase.my-firewall.org_sr5/job_TF_l64c_00020.xml

<job_desc>
<task>
<application>./mfaktc.exe</application>
<append_cmdline_args/>
</task>
<unzip_input>
<zipfilename>mfaktc-linux64-v6.zip</zipfilename>
</unzip_input>
</job_desc>

If you want a specific device, you have to add a -d <number> parameter (the default is 0).

Usage: ./mfaktc.exe [options]
-h display this help and exit
-d <device number> specify the device number used by this program
-tf <exp> <min> <max> trial factor M<exp> from 2^<min> to 2^<max> and exit
instead of parsing the worktodo file
-st run builtin selftest and exit
-st2 same as -st but extended range for k_min/m_max
-v <number> set verbosity (min = 0, default = 1, more = 2, max/debug = 3)

crashtech
Joined: 10 Apr 19
Posts: 29
Credit: 828,005,334
RAC: 3,597,880
Message 8637 - Posted: 7 Jan 2023, 22:50:09 UTC - in response to Message 8635.

OK, I was seeing "BOINC app" mentioned and thinking that meant that the boinc-client app was being blamed for the problem.

DeleteNull
Volunteer developer
Volunteer tester
Joined: 29 Nov 14
Posts: 83
Credit: 374,931,522
RAC: 4,639
Message 8638 - Posted: 7 Jan 2023, 22:56:47 UTC - in response to Message 8637.

As far as I know, BOINC passes e.g. "-device 1" to the application, so maybe we have to update the code so that it understands -device instead of -d.

marmot
Joined: 17 Nov 16
Posts: 97
Credit: 151,328,872
RAC: 567,015
Message 8643 - Posted: 10 Jan 2023, 13:40:37 UTC - in response to Message 8636.

Can we permanently edit the job file or is it created freshly upon every WU?
Can we edit a job file template that supersedes the default?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7480
Credit: 43,876,295
RAC: 43,732
Message 8644 - Posted: 10 Jan 2023, 14:14:01 UTC - in response to Message 8643.

No, you need an app_config.xml; the job file is signed by the server.
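A minimal app_config.xml sketch of the kind marmot asks about, placed in the project folder (projects/srbase.my-firewall.org_sr5/ in the BOINC data directory). It assumes that arguments from <cmdline> reach mfaktc through the wrapper's <append_cmdline_args/> shown in the job file earlier in this thread; that has not been confirmed here. The app name and plan class are hypothetical placeholders: copy the real values from client_state.xml. Because <cmdline> applies to every task of that app version, this would pin all TF tasks to one device rather than give each GPU its own task, which is the limitation rebirther describes.

```xml
<app_config>
  <app_version>
    <!-- Hypothetical placeholders: copy the real values from client_state.xml. -->
    <app_name>TF_APP_NAME</app_name>
    <plan_class>TF_PLAN_CLASS</plan_class>
    <!-- Passed on the command line of every task of this app version;
         mfaktc selects its GPU with -d (default 0). -->
    <cmdline>-d 1</cmdline>
  </app_version>
</app_config>
```

After saving the file, restart the BOINC client so it is read.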

marmot
Joined: 17 Nov 16
Posts: 97
Credit: 151,328,872
RAC: 567,015
Message 8654 - Posted: 18 Jan 2023, 3:26:28 UTC - in response to Message 8636.

So could you give us an example app_config to force a WU onto dev x with a switch, please?

Jozef J
Joined: 29 Dec 14
Posts: 3
Credit: 146,921,101
RAC: 0
Message 8959 - Posted: 4 Jul 2023, 16:27:21 UTC

So could you give us an example app_config to force a WU onto dev x with a switch, please?

Copyright © 2014-2024 BOINC Confederation / rebirther