I have been successfully crunching Milkyway, Einstein, Collatz, etc. on a dual HD 7970 machine.
I just got some TF WUs. BOINC starts 2 and says one is on device 0 and the other on device 1, but only one of the cards has a load on it; the other is idle.
I recently updated my driver, so to confirm it wasn't that, I suspended SRBase and ran some Einstein tasks, and both cards were utilized.
The 2 WUs completed successfully. The estimated time, ~1600 seconds, was significantly longer than the actual run times of 753 seconds and 758 seconds.
As the Einstein task finished on one card, SRBase started 1 WU on that card. That one finished in 250 seconds. I watched the other Einstein task finish (dev 1), and SRBase did not start a WU on that card. While dev 1 was idle, the SRBase task on dev 0 finished and another one started, again solo.
I'm sure someone has tackled this before; I just need someone to point me to the fix. |
|
|
|
Looking at these run times and how many watts these 7970s use, it doesn't feel like TF is a good app for them. So don't invest much time in helping me. |
|
|
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13 Posts: 7480 Credit: 43,876,295 RAC: 43,732 |
Looking at these run times and how many watts these 7970s use, it doesn't feel like TF is a good app for them. So don't invest much time in helping me.
The apps do not support multi-GPU and run only on device 0; see the FAQ for more info. |
|
|
|
It seems this is an issue with NVIDIA graphics cards as well.
While 2 tasks are running, one card shows 2055 MHz and 97% usage, and the other card shows 300 MHz and 0% usage.
2] NVIDIA NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 472.12 OpenCL: 3.0
Went with this for now:
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <device_num>0</device_num>
      <app>TF</app>
    </exclude_gpu>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
____________
Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community. |
|
|
|
The apps do not support multi-GPU and run only on device 0; see the FAQ for more info.
I got the same problem and the same answer.
rebirther, we know it's not multi-GPU, and you're not helping with the problem.
We are running 2 projects that are both SRBase, and BOINC is assigning each to a GPU. SRBase is ignoring BOINC's commands to run the second on 'GPU - Device 1' (and possibly 2, 3, etc.).
What SRBase is doing instead is running 2 GPU projects on the same GPU. That's not multi-GPUing.
Running one project on 2 or more GPUs is multi-GPU. |
|
|
rebirther |
The second card should be running on device 1 for another project. If not, post your app_config here. |
|
|
|
The second card should be running on device 1 for another project. If not, post your app_config here.
It should run on device 1, but it doesn't. I'm running 2 SRBase TF tasks and only 1 GPU shows as utilized.
<cc_config>
<log_flags>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
<task>1</task>
<app_msg_receive>0</app_msg_receive>
<app_msg_send>0</app_msg_send>
<async_file_debug>0</async_file_debug>
<benchmark_debug>0</benchmark_debug>
<checkpoint_debug>0</checkpoint_debug>
<coproc_debug>0</coproc_debug>
<cpu_sched>0</cpu_sched>
<cpu_sched_debug>0</cpu_sched_debug>
<cpu_sched_status>0</cpu_sched_status>
<dcf_debug>0</dcf_debug>
<disk_usage_debug>0</disk_usage_debug>
<file_xfer_debug>0</file_xfer_debug>
<gui_rpc_debug>0</gui_rpc_debug>
<heartbeat_debug>0</heartbeat_debug>
<http_debug>0</http_debug>
<http_xfer_debug>0</http_xfer_debug>
<idle_detection_debug>0</idle_detection_debug>
<mem_usage_debug>0</mem_usage_debug>
<network_status_debug>0</network_status_debug>
<notice_debug>0</notice_debug>
<poll_debug>0</poll_debug>
<priority_debug>0</priority_debug>
<proxy_debug>0</proxy_debug>
<rr_simulation>0</rr_simulation>
<rrsim_detail>0</rrsim_detail>
<sched_op_debug>0</sched_op_debug>
<scrsave_debug>0</scrsave_debug>
<slot_debug>0</slot_debug>
<state_debug>0</state_debug>
<statefile_debug>0</statefile_debug>
<suspend_debug>0</suspend_debug>
<task_debug>0</task_debug>
<time_debug>0</time_debug>
<trickle_debug>0</trickle_debug>
<unparsed_xml>0</unparsed_xml>
<work_fetch_debug>0</work_fetch_debug>
</log_flags>
<options>
<abort_jobs_on_exit>0</abort_jobs_on_exit>
<allow_gui_rpc_get>0</allow_gui_rpc_get>
<allow_multiple_clients>0</allow_multiple_clients>
<allow_remote_gui_rpc>0</allow_remote_gui_rpc>
<disallow_attach>0</disallow_attach>
<dont_check_file_sizes>0</dont_check_file_sizes>
<dont_contact_ref_site>0</dont_contact_ref_site>
<lower_client_priority>0</lower_client_priority>
<dont_suspend_nci>0</dont_suspend_nci>
<dont_use_vbox>0</dont_use_vbox>
<dont_use_wsl>0</dont_use_wsl>
<exit_after_finish>0</exit_after_finish>
<exit_before_start>0</exit_before_start>
<exit_when_idle>0</exit_when_idle>
<fetch_minimal_work>0</fetch_minimal_work>
<fetch_on_update>0</fetch_on_update>
<force_auth>default</force_auth>
<http_1_0>0</http_1_0>
<http_transfer_timeout>300</http_transfer_timeout>
<http_transfer_timeout_bps>10</http_transfer_timeout_bps>
<max_event_log_lines>2000</max_event_log_lines>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>2</max_file_xfers_per_project>
<max_stderr_file_size>0.000000</max_stderr_file_size>
<max_stdout_file_size>0.000000</max_stdout_file_size>
<max_tasks_reported>0</max_tasks_reported>
<ncpus>-1</ncpus>
<no_alt_platform>0</no_alt_platform>
<no_gpus>0</no_gpus>
<no_info_fetch>0</no_info_fetch>
<no_opencl>0</no_opencl>
<no_priority_change>0</no_priority_change>
<os_random_only>0</os_random_only>
<process_priority>-1</process_priority>
<process_priority_special>-1</process_priority_special>
<proxy_info>
<socks_server_name></socks_server_name>
<socks_server_port>80</socks_server_port>
<http_server_name></http_server_name>
<http_server_port>80</http_server_port>
<socks5_user_name></socks5_user_name>
<socks5_user_passwd></socks5_user_passwd>
<socks5_remote_dns>0</socks5_remote_dns>
<http_user_name></http_user_name>
<http_user_passwd></http_user_passwd>
<no_proxy></no_proxy>
<no_autodetect>0</no_autodetect>
</proxy_info>
<rec_half_life_days>10.000000</rec_half_life_days>
<report_results_immediately>0</report_results_immediately>
<run_apps_manually>0</run_apps_manually>
<save_stats_days>30</save_stats_days>
<skip_cpu_benchmarks>0</skip_cpu_benchmarks>
<simple_gui_only>0</simple_gui_only>
<start_delay>0.000000</start_delay>
<stderr_head>0</stderr_head>
<suppress_net_info>0</suppress_net_info>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>1</use_all_gpus>
<use_certs>1</use_certs>
<use_certs_only>0</use_certs_only>
<vbox_window>1</vbox_window>
</options>
</cc_config>
|
|
|
rebirther |
This is only the cc_config file; you need an app_config. |
|
|
|
I don't have app_data.xml anywhere in the BOINC data folder. |
|
|
rebirther |
I don't have app_data.xml anywhere in the BOINC data folder.
Forget my last statement, try this:
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>ATI</type>
      <device_num>1</device_num>
      <app>TF</app>
    </exclude_gpu>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config> |
|
|
|
Exclude SRBase? How does that make the project work on GPU 'device 1'?
This just tells the TF work not to try to run on 'device 1'.
Plus, that's just for ATI cards.
Is it going to get fixed, or is there only a workaround? |
|
|
rebirther |
For NVIDIA cards, change ATI to NVIDIA. Unfortunately, the app doesn't work on device 1. |
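For reference, the NVIDIA variant of the workaround would look roughly like the following. This is a sketch that assumes the same project URL and the app name TF used earlier in the thread; only the `<type>` value changes. It goes in cc_config.xml in the BOINC data directory. Note that it only tells BOINC not to schedule TF on device 1; it does not make the app compute there.

```xml
<cc_config>
  <options>
    <!-- Keep TF work from this project off NVIDIA device 1 -->
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <type>NVIDIA</type>
      <device_num>1</device_num>
      <app>TF</app>
    </exclude_gpu>
    <!-- Other projects may still use every detected GPU -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```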
|
|
|
SRBase TF 0.12 is now running on Device 1 for me.
It's running more than one GPU at a time.👍 |
|
|
|
Why can't this app just make use of all GPUs? |
|
|
rebirther |
Why can't this app just make use of all GPUs?
In standalone mode you can run 2 instances, but the app is not designed for multi-GPU. It's more of a BOINC issue that it's only running on device 0. |
|
|
|
Why can't this app just make use of all GPUs?
In standalone you can run 2 instances but not designed for Multi-GPU. Its more a BOINC issue while its only running on device 0.
If it's a BOINC issue, virtually every other project seems to have solved it. But perhaps you are getting the amount of results desired, so there is no incentive to get more compute power by fixing this problem. |
|
|
|
BOINC's GPU management is archaic and clunky.
It was designed for sharing projects on a single-core CPU.
The devs refuse to add a user-managed GUI work queue for each GPU device or CPU core (or group of cores).
(Folding@home has better work queue management.)
They expect us to solve these issues with multiple installs of BOINC. (I use VMs to solve some CPU queue issues, including multiple time-of-day pauses imposed by the electric company's 10x rates for 2 hours twice a day.)
You can set up a BOINC install for each GPU and attempt to exclude the other GPUs in each install (<exclude_gpu>).
This should work fine on 2 GPUs, but 3 or more will get trickier.
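A minimal sketch of what the cc_config.xml in the second install's data directory might contain, assuming two GPUs and the project URL used elsewhere in this thread. The first install would exclude device 1 instead. `<allow_multiple_clients>` is the option that permits more than one client on the same machine. Since the TF app is reported in this thread to always compute on device 0, this may still not move TF work onto the other card.

```xml
<cc_config>
  <options>
    <!-- Permit this client to run alongside another BOINC client -->
    <allow_multiple_clients>1</allow_multiple_clients>
    <!-- This install should only use device 1, so exclude device 0
         for every app of the project (no <type> or <app> given) -->
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```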
Using VMs to control the work queue on each GPU needs GPU passthrough and only works readily on Linux with VMware (going by what I've read; no personal experience).
I've read that Hyper-V on Windows could work, but I have not seen anyone report success.
Even when GPU sharing among projects is supposed to be working correctly, I've seen the client properly share WUs among the GPUs and then share them improperly once WUs complete and new ones start. It takes a complete BOINC client shutdown to fix the issue.
From what rebirther says, it's not supposed to work, but try this app_config.xml for 2 GPUs:
<app_config>
  <project_max_concurrent>X</project_max_concurrent>
  <app>
    <name>TF</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
This could kick the client into loading 2 TFs on both GPUs.
X = your number of CPU threads; also try threads + 4 as X.
Make sure you do a complete restart of the client; don't just reread the config files from the Options menu (which won't change the GPU loadout on the fly).
Maybe setting
<max_concurrent>2</max_concurrent>
...
<gpu_usage>0.98</gpu_usage>
or
<gpu_usage>0.49</gpu_usage>
will trick BOINC into loading the WU on the other GPU.
The client sometimes acts in ways the documentation claims it won't.
My dual GPU machine is doing Milkyway atm, don't have time to experiment.
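Written out in full, the first alternative mentioned above (2 concurrent tasks, each claiming almost a whole GPU) might look like this. It is an untested sketch: the `<cpu_usage>` value is carried over from the earlier example, and `<project_max_concurrent>` is left out here only for brevity. With a `<gpu_usage>` of 0.98, two tasks cannot share one GPU in BOINC's accounting, so the client should assign one task per card; whether the TF app then actually computes on the second card is a separate question.

```xml
<app_config>
  <app>
    <name>TF</name>
    <!-- At most 2 TF tasks at once -->
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <!-- Each task reserves nearly a full GPU, so only one fits per device -->
      <gpu_usage>0.98</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```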
____________
My primes found at SRBase:
40*1017^215605+1 (Top 5000)
18922*111^383954+1 (Top 5000)
19116*24^791057-1 (Top 5000)
4281*880^27069+1 |
|
|
rebirther |
Why can't this app just make use of all GPUs?
In standalone you can run 2 instances but not designed for Multi-GPU. Its more a BOINC issue while its only running on device 0.
If it's a BOINC issue, virtually every other project seems to have solved it. But perhaps you are getting the amount of results desired, so there is no incentive to get more compute power by fixing this problem.
It would not be a problem if they changed their code to support multi-GPU. |
|
|
|
I like how TF, out of all the GPU projects you can find on BOINC (not just SRBase's own), is the ONLY one that has a problem running on more than 1 GPU in the same computer. This is NOT a BOINC problem.
I've seen servers that can handle 8 GPUs. If you want to run 8 TF projects, one for each GPU (not 1 project across 8 GPUs), that should run with no problems.
This was reported in Mar 2021, and the "administrator/developer" thinks that there is NO problem/bug, that we should never run TF on more than 1 GPU for some reason, and that we should just edit a config file to turn OFF all other GPUs for TF.
Since when does anyone have to go into a config file just to disable GPUs!? |
|
|
rebirther |
Do you mean to exclude the other devices for SRBase to run only on device 0? |
|
|