2 GPUs, 2 tasks on one card, not utilizing gpu 1,

Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1,

MindCrime
Joined: 29 Mar 15
Posts: 3
Credit: 30,202,081
RAC: 92
Message 7446 - Posted: 20 Mar 2021, 22:20:58 UTC

I have been successfully crunching Milkyway, Einstein, Collatz, etc. on a dual HD 7970 machine.

I just got some TF WUs. BOINC starts 2; it says one is on device 0 and the other on device 1, but only one of the cards has a load on it. The other is idle.

I recently updated my driver, so to confirm that wasn't the cause, I suspended SRBase and ran some Einstein; both cards were utilized.

The 2 WUs completed successfully. The estimated time (~1600 seconds) was significantly longer than the actual run times of 753 and 758 seconds.

As the Einstein task on one card finished, SRBase started 1 WU on that card. That one finished in 250 seconds. I watched the other Einstein task finish (dev 1), and SRBase did not start a WU on that card. While dev 1 sat idle, the SRBase task on dev 0 finished and another one started solo.

I'm sure someone has tackled this before; I just need someone to point me to the fix.
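A possible first diagnostic step (a sketch, not a confirmed fix): BOINC's cc_config.xml has a coproc_debug log flag that logs how the client allocates GPUs, so turning it on should show in the event log which device each TF task was assigned. Whether that explains the idle card here is an assumption. use_all_gpus is included only because, by default, BOINC uses just the most capable GPU(s) when the cards are not identical.

```xml
<cc_config>
  <log_flags>
    <!-- log GPU (coprocessor) allocation decisions in the event log -->
    <coproc_debug>1</coproc_debug>
  </log_flags>
  <options>
    <!-- use every GPU, not only the most capable one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

Log flags should take effect after Options > Read config files; as far as I know, use_all_gpus is only picked up when the client restarts.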

MindCrime
Joined: 29 Mar 15
Posts: 3
Credit: 30,202,081
RAC: 92
Message 7447 - Posted: 20 Mar 2021, 23:20:14 UTC - in response to Message 7446.

Looking at these run times and how many watts these 7970s use, it doesn't feel like TF is a good app for them. So don't invest much time in helping me.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 7448 - Posted: 21 Mar 2021, 6:31:16 UTC - in response to Message 7447.

The apps do not run on multiple GPUs, only on device 0; see the FAQ for more info.

bcavnaugh
Joined: 17 Apr 15
Posts: 9
Credit: 86,684,896
RAC: 0
Message 8005 - Posted: 13 Nov 2021, 19:54:22 UTC
Last modified: 13 Nov 2021, 20:00:59 UTC

This seems to be an issue with NVIDIA graphics cards as well.
While 2 tasks are running, one card shows 2055 MHz and 97% usage, and the other shows 300 MHz and 0% usage.

[2] NVIDIA NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 472.12 OpenCL: 3.0

Went with this for now:
<cc_config>
<options>
<exclude_gpu>
<url>http://srbase.my-firewall.org/sr5/</url>
<type>NVIDIA</type>
<device_num>0</device_num>
<app>TF</app>
</exclude_gpu>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
____________

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8320 - Posted: 18 Jul 2022, 23:56:59 UTC - in response to Message 7448.
Last modified: 19 Jul 2022, 0:31:46 UTC

I have the same problem and got the same answer.

rebirther, we know it's not multi-GPU, and you're not helping with the problem.
We are running 2 tasks that are both SRBase, and BOINC assigns each one to a GPU. SRBase is ignoring BOINC's instruction to run on the second GPU ('device 1'), and possibly 2, 3, etc.
Instead, SRBase runs 2 GPU tasks on the same GPU. That's not multi-GPU.
Running one task across 2 or more GPUs is multi-GPU.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8321 - Posted: 19 Jul 2022, 6:16:58 UTC - in response to Message 8320.

The second card should be running on device 1 for another project; if not, post your app_config here.

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8326 - Posted: 23 Jul 2022, 2:21:38 UTC - in response to Message 8321.

It should run on device 1, but it doesn't. I'm running 2 SRBase TF tasks and only 1 GPU shows as utilized.
<cc_config>
<log_flags>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
<task>1</task>
<app_msg_receive>0</app_msg_receive>
<app_msg_send>0</app_msg_send>
<async_file_debug>0</async_file_debug>
<benchmark_debug>0</benchmark_debug>
<checkpoint_debug>0</checkpoint_debug>
<coproc_debug>0</coproc_debug>
<cpu_sched>0</cpu_sched>
<cpu_sched_debug>0</cpu_sched_debug>
<cpu_sched_status>0</cpu_sched_status>
<dcf_debug>0</dcf_debug>
<disk_usage_debug>0</disk_usage_debug>
<file_xfer_debug>0</file_xfer_debug>
<gui_rpc_debug>0</gui_rpc_debug>
<heartbeat_debug>0</heartbeat_debug>
<http_debug>0</http_debug>
<http_xfer_debug>0</http_xfer_debug>
<idle_detection_debug>0</idle_detection_debug>
<mem_usage_debug>0</mem_usage_debug>
<network_status_debug>0</network_status_debug>
<notice_debug>0</notice_debug>
<poll_debug>0</poll_debug>
<priority_debug>0</priority_debug>
<proxy_debug>0</proxy_debug>
<rr_simulation>0</rr_simulation>
<rrsim_detail>0</rrsim_detail>
<sched_op_debug>0</sched_op_debug>
<scrsave_debug>0</scrsave_debug>
<slot_debug>0</slot_debug>
<state_debug>0</state_debug>
<statefile_debug>0</statefile_debug>
<suspend_debug>0</suspend_debug>
<task_debug>0</task_debug>
<time_debug>0</time_debug>
<trickle_debug>0</trickle_debug>
<unparsed_xml>0</unparsed_xml>
<work_fetch_debug>0</work_fetch_debug>
</log_flags>
<options>
<abort_jobs_on_exit>0</abort_jobs_on_exit>
<allow_gui_rpc_get>0</allow_gui_rpc_get>
<allow_multiple_clients>0</allow_multiple_clients>
<allow_remote_gui_rpc>0</allow_remote_gui_rpc>
<disallow_attach>0</disallow_attach>
<dont_check_file_sizes>0</dont_check_file_sizes>
<dont_contact_ref_site>0</dont_contact_ref_site>
<lower_client_priority>0</lower_client_priority>
<dont_suspend_nci>0</dont_suspend_nci>
<dont_use_vbox>0</dont_use_vbox>
<dont_use_wsl>0</dont_use_wsl>
<exit_after_finish>0</exit_after_finish>
<exit_before_start>0</exit_before_start>
<exit_when_idle>0</exit_when_idle>
<fetch_minimal_work>0</fetch_minimal_work>
<fetch_on_update>0</fetch_on_update>
<force_auth>default</force_auth>
<http_1_0>0</http_1_0>
<http_transfer_timeout>300</http_transfer_timeout>
<http_transfer_timeout_bps>10</http_transfer_timeout_bps>
<max_event_log_lines>2000</max_event_log_lines>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>2</max_file_xfers_per_project>
<max_stderr_file_size>0.000000</max_stderr_file_size>
<max_stdout_file_size>0.000000</max_stdout_file_size>
<max_tasks_reported>0</max_tasks_reported>
<ncpus>-1</ncpus>
<no_alt_platform>0</no_alt_platform>
<no_gpus>0</no_gpus>
<no_info_fetch>0</no_info_fetch>
<no_opencl>0</no_opencl>
<no_priority_change>0</no_priority_change>
<os_random_only>0</os_random_only>
<process_priority>-1</process_priority>
<process_priority_special>-1</process_priority_special>
<proxy_info>
<socks_server_name></socks_server_name>
<socks_server_port>80</socks_server_port>
<http_server_name></http_server_name>
<http_server_port>80</http_server_port>
<socks5_user_name></socks5_user_name>
<socks5_user_passwd></socks5_user_passwd>
<socks5_remote_dns>0</socks5_remote_dns>
<http_user_name></http_user_name>
<http_user_passwd></http_user_passwd>
<no_proxy></no_proxy>
<no_autodetect>0</no_autodetect>
</proxy_info>
<rec_half_life_days>10.000000</rec_half_life_days>
<report_results_immediately>0</report_results_immediately>
<run_apps_manually>0</run_apps_manually>
<save_stats_days>30</save_stats_days>
<skip_cpu_benchmarks>0</skip_cpu_benchmarks>
<simple_gui_only>0</simple_gui_only>
<start_delay>0.000000</start_delay>
<stderr_head>0</stderr_head>
<suppress_net_info>0</suppress_net_info>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>1</use_all_gpus>
<use_certs>1</use_certs>
<use_certs_only>0</use_certs_only>
<vbox_window>1</vbox_window>
</options>
</cc_config>

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8327 - Posted: 23 Jul 2022, 6:43:31 UTC - in response to Message 8326.

This is only the cc_config file; you need an app_config.

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8328 - Posted: 24 Jul 2022, 2:39:22 UTC - in response to Message 8327.

I don't have app_data.xml anywhere in the BOINC data folder.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8329 - Posted: 24 Jul 2022, 7:01:57 UTC - in response to Message 8328.

Forget my last statement; try this:

<cc_config>
<options>
<exclude_gpu>
<url>http://srbase.my-firewall.org/sr5/</url>
<type>ATI</type>
<device_num>1</device_num>
<app>TF</app>
</exclude_gpu>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8333 - Posted: 25 Jul 2022, 5:44:12 UTC - in response to Message 8329.

Exclude SRBase? How does that make the project work on GPU 'device 1'?
This just tells the TF work not to run on 'device 1'.
Plus, that's only for ATI cards.
Is it going to get fixed, or is this just a workaround?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8334 - Posted: 25 Jul 2022, 6:45:27 UTC - in response to Message 8333.

For NVIDIA, change ATI to NVIDIA. Unfortunately, the app doesn't work on device 1.
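Applied to the config above, the NVIDIA version would look like this (a sketch assuming a two-card host). BOINC's cc_config documentation lists NVIDIA, ATI and intel_gpu as the accepted <type> values. Excluding device 1 for TF keeps SRBase TF on device 0 only, while other projects can still use both cards.

```xml
<cc_config>
  <options>
    <exclude_gpu>
      <!-- SRBase master URL, as used earlier in this thread -->
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <!-- keep TF off the second card, so it only runs on device 0 -->
      <device_num>1</device_num>
      <app>TF</app>
    </exclude_gpu>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```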

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8345 - Posted: 3 Aug 2022, 17:38:52 UTC - in response to Message 8334.
Last modified: 3 Aug 2022, 17:52:44 UTC

SRBase TF 0.12 is now running on device 1 for me.
It's running on more than one GPU at a time.👍

crashtech
Joined: 10 Apr 19
Posts: 28
Credit: 466,952,134
RAC: 6,308,462
Message 8554 - Posted: 24 Dec 2022, 23:19:08 UTC

Why can't this app just make use of all GPUs?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8555 - Posted: 25 Dec 2022, 1:02:04 UTC - in response to Message 8554.

In standalone mode you can run 2 instances, but the app is not designed for multi-GPU. It's more of a BOINC issue, since it only runs on device 0.

crashtech
Joined: 10 Apr 19
Posts: 28
Credit: 466,952,134
RAC: 6,308,462
Message 8557 - Posted: 25 Dec 2022, 2:52:15 UTC - in response to Message 8555.

If it's a BOINC issue, virtually every other project seems to have solved it. But perhaps you are getting the amount of results desired, so there is no incentive to get more compute power by fixing this problem.

marmot
Joined: 17 Nov 16
Posts: 97
Credit: 126,410,450
RAC: 21,911
Message 8559 - Posted: 25 Dec 2022, 3:51:05 UTC
Last modified: 25 Dec 2022, 3:59:33 UTC

BOINC's GPU management is archaic and clunky.
It was designed for sharing projects on a single-core CPU.

The devs refuse to add a user-managed GUI work queue for each GPU device or CPU core (or group of cores).
(Folding@home has better work queue management.)

They expect us to solve these issues with multiple installs of BOINC (I use VMs to solve some CPU queue issues, including multiple time-of-day pauses imposed by the electric company's 10x rates for 2 hours twice a day).

You can set up a BOINC install for each GPU and try to exclude the other GPUs in each install (<exclude_gpu>).
This should work fine with 2 GPUs, but 3 or more get trickier.
Using VMs to control the work queue on each GPU requires GPU passthrough and only works readily on Linux with VMware (going by what I've read; no personal experience).
I've read that Hyper-V on Windows could work, but I haven't seen anyone report success.
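A minimal sketch of the cc_config.xml for the second of two BOINC instances, assuming it runs from its own data directory (for example started with the client's --allow_multiple_clients, --dir and --gui_rpc_port options, the last so both Managers can connect). It excludes device 0 for SRBase so this instance only gives SRBase work to device 1; the first instance would mirror it with device_num 1. Leaving out <app> (so every SRBase app is excluded on that device) and <type> (which you would want to add on a host with mixed GPU vendors) are assumptions about the setup.

```xml
<cc_config>
  <!-- hypothetical second instance, running from a separate data directory -->
  <options>
    <!-- permit this client to run alongside the first one -->
    <allow_multiple_clients>1</allow_multiple_clients>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <!-- device 0 belongs to the first instance; this one keeps device 1 -->
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```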

Even when GPU sharing among projects is supposed to be working correctly, I've seen the client properly share WUs among the GPUs, then share them improperly once WUs complete and new ones start. It takes a complete BOINC client shutdown to fix the issue.

From what rebirther says it's not supposed to work, but try this app_config.xml for 2 GPUs:

<app_config>
<project_max_concurrent>X</project_max_concurrent>
<app>
<name>TF</name>
<max_concurrent>4</max_concurrent>
<gpu_versions>
<gpu_usage>0.50</gpu_usage>
<cpu_usage>0.05</cpu_usage>
</gpu_versions>
</app>
</app_config>


This could kick the client into loading 2 TF tasks on each GPU.
X = your number of CPU threads; also try threads + 4 as X.
Make sure you do a complete restart of the client; don't just reread the config files from the Options menu (which won't change the GPU loadout on the fly).

Maybe setting
<max_concurrent>2</max_concurrent> ... <gpu_usage>0.98</gpu_usage>

or
<gpu_usage>0.49</gpu_usage>

will trick BOINC into loading the WU on the other GPU.
The client sometimes acts in ways the documentation claims it won't.
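Spelled out as a full app_config.xml, the first of those variants might look like this; it is an untested guess in the same spirit as the suggestion above. It goes in the SRBase folder under projects/ in the BOINC data directory and needs a client restart. With gpu_usage at 0.98, two TF tasks cannot both fit on one card by BOINC's accounting, so the client should place them on different devices; if the app itself still ignores the assigned device, this will not help. project_max_concurrent is left out here.

```xml
<app_config>
  <app>
    <name>TF</name>
    <!-- at most two TF tasks at once on a two-GPU host -->
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <!-- just under one GPU per task, so two tasks cannot share a card -->
      <gpu_usage>0.98</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```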

My dual-GPU machine is doing Milkyway at the moment; I don't have time to experiment.
____________
My primes found at SRBase:
40*1017^215605+1 (Top 5000)
18922*111^383954+1 (Top 5000)
4281*880^27069+1

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8560 - Posted: 25 Dec 2022, 8:07:41 UTC - in response to Message 8557.

It would not be a problem if they changed their code to support multi-GPU.

Sandman192
Joined: 28 Apr 22
Posts: 13
Credit: 46,166,890
RAC: 645
Message 8589 - Posted: 30 Dec 2022, 7:31:42 UTC - in response to Message 8560.
Last modified: 30 Dec 2022, 7:40:48 UTC

I like how TF, out of all SRBase products (not just this SRBase GPU project that you can find on BOINC), is the ONLY one that has a problem running on more than 1 GPU in the same computer. This is NOT a BOINC problem.

I've seen servers that can handle 8 GPUs, and if you want to run 8 TF tasks, one per GPU (not 1 task across 8 GPUs), that should run with no problems.

This was reported in Mar 2021, and the "administrator/developer" thinks there is NO problem/bug, that we should never run TF on more than 1 GPU for some reason, and that we should just edit a config file to turn OFF all other GPUs for TF.

When does anyone have to go into a config file just to disable GPUs!?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 8592 - Posted: 30 Dec 2022, 8:03:47 UTC - in response to Message 8589.

Do you mean to exclude the other devices for SRBase to run only on device 0?
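If that is the intent, a host with more than two cards needs one <exclude_gpu> entry per extra device, since each entry takes a single <device_num>. A sketch for a hypothetical four-GPU NVIDIA machine that keeps TF on device 0 only:

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <!-- one exclude_gpu block per device TF should stay off -->
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <device_num>1</device_num>
      <app>TF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <device_num>2</device_num>
      <app>TF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <device_num>3</device_num>
      <app>TF</app>
    </exclude_gpu>
  </options>
</cc_config>
```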

Copyright © 2014-2024 BOINC Confederation / rebirther