New TF multiGPU apps deployed (issues fixed)

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9624 - Posted: 4 Feb 2024, 10:49:02 UTC - in response to Message 9623.
Last modified: 4 Feb 2024, 10:50:31 UTC

The 7 minutes I mentioned came from BOINC, which seems to race ahead with the % complete. According to stderr.txt, it will be complete in 15 minutes. We'll see if it validates. Or perhaps you can run that same task on a known good card of your own. I'd hate to think it's giving tasks back which validate but are wrong. Since it's generating no heat and shows 0% usage in MSI Afterburner, it's definitely not doing calculations. And there's no CPU usage for that task either (in BOINC or Windows Task Manager).

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9625 - Posted: 4 Feb 2024, 10:51:39 UTC - in response to Message 9624.

I don't have a good card. A standalone test is the best option to test both cards and track down the issue.

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9626 - Posted: 4 Feb 2024, 10:55:43 UTC - in response to Message 9625.

Can you make them both run the same task?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9627 - Posted: 4 Feb 2024, 11:03:38 UTC - in response to Message 9626.
Last modified: 4 Feb 2024, 11:04:26 UTC

Yes, but it's not recommended. The GPU runs at 99% and the CPU is nearly unused. Only a test can help.

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9628 - Posted: 4 Feb 2024, 11:09:39 UTC - in response to Message 9627.
Last modified: 4 Feb 2024, 11:09:50 UTC

I don't understand what you mean. I wanted to run the same task on both GPUs at once. If the dodgy one gives a different result, there's something up.

This is the finished task, which the server claims passed, but it can't have done if it didn't do calculations: https://srbase.my-firewall.org/sr5/result.php?resultid=141051715

Let me know how to run this test.

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9629 - Posted: 4 Feb 2024, 11:12:40 UTC

This could be a severe problem if there are "valid" tasks coming back which aren't. If this was happening before this update, on any machines with more than one card, can you track down suspect results and re-run them?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9630 - Posted: 4 Feb 2024, 11:20:59 UTC - in response to Message 9629.

The result was good.

no factor for M590297503 from 2^74 to 2^75 [mfakto 0.15pre7-MGW cl_barrett15_82_gs_2]
tf(): total time spent: 1h 6m 5.398s (141.22 GHz-days / day)

ERROR: get_next_assignment(): no valid assignment found in "worktodo.txt"
2024-02-04 11:04:38 (8628): mfakto.exe exited; CPU time 10.062500
2024-02-04 11:04:38 (8628): called boinc_finish(0)
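
The stderr excerpt above can be read mechanically. What follows is a minimal Python sketch, assuming the line formats are exactly those shown in this post (other mfakto builds may print differently). The helper name `parse_stderr` is made up for illustration and is not part of mfakto or BOINC. It pulls out the exponent, the bit range, the wall time and the CPU time, and derives the approximate amount of work from the reported rate. Note that a low CPU time on its own says nothing about whether the GPU did work, since a GPU task normally uses very little CPU.

```python
import re

# Patterns assume the exact line formats quoted in this post (an assumption;
# other mfakto versions may print differently).
RESULT_RE = re.compile(r"no factor for M(\d+) from 2\^(\d+) to 2\^(\d+)")
TIME_RE = re.compile(
    r"total time spent:\s*(?:(\d+)h\s*)?(?:(\d+)m\s*)?([\d.]+)s"
    r"\s*\(([\d.]+) GHz-days / day\)"
)
CPU_RE = re.compile(r"CPU time ([\d.]+)")


def parse_stderr(text):
    """Return a dict of the fields found in a stderr excerpt (hypothetical helper)."""
    info = {}
    m = RESULT_RE.search(text)
    if m:
        info["exponent"] = int(m.group(1))
        info["bit_from"] = int(m.group(2))
        info["bit_to"] = int(m.group(3))
    m = TIME_RE.search(text)
    if m:
        hours = int(m.group(1) or 0)
        minutes = int(m.group(2) or 0)
        info["wall_seconds"] = hours * 3600 + minutes * 60 + float(m.group(3))
        info["ghz_days_per_day"] = float(m.group(4))
    m = CPU_RE.search(text)
    if m:
        info["cpu_seconds"] = float(m.group(1))
    return info


SAMPLE = """no factor for M590297503 from 2^74 to 2^75 [mfakto 0.15pre7-MGW cl_barrett15_82_gs_2]
tf(): total time spent: 1h 6m 5.398s (141.22 GHz-days / day)
2024-02-04 11:04:38 (8628): mfakto.exe exited; CPU time 10.062500
"""

info = parse_stderr(SAMPLE)
# Work done = reported rate * elapsed fraction of a day.
ghz_days = info["ghz_days_per_day"] * info["wall_seconds"] / 86400
print(info)
print(f"approx. GHz-days of work: {ghz_days:.2f}")
```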

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9631 - Posted: 4 Feb 2024, 11:22:12 UTC - in response to Message 9628.

I will create a test later today.

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9632 - Posted: 4 Feb 2024, 11:31:41 UTC - in response to Message 9630.

It can't have been if the card was idle. It's claiming it didn't find a factor, but it could have been sat doing nothing and lying.

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9633 - Posted: 4 Feb 2024, 11:33:28 UTC

I also have the same problem with my 2 Radeon RX 7900 GRE cards, and with my Radeon PRO Duo (on another config).

Mr P Hucker
Joined: 30 Sep 17
Posts: 36
Credit: 16,105,684
RAC: 0
Message 9634 - Posted: 4 Feb 2024, 11:33:49 UTC

I've figured it out.
They're both running on the first card. If I pause the one allegedly on the second card, the memory usage on the first card drops.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9635 - Posted: 4 Feb 2024, 15:04:21 UTC
Last modified: 4 Feb 2024, 15:17:06 UTC

Wrapper test - standalone, Windows (AMD only)

1. Download the zip file.
2. Extract it somewhere outside BOINC.
3. Run each wrapper file wrapper_26018_windows_x86_64.exe at the same time; in this case they are set up for device 0 and device 1.
4. Check the GPU usage on each card.

If you want to rerun a test, you need to copy worktodo.txt back again, because it is deleted once a test is done.
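
Because worktodo.txt is deleted after each run, a re-test needs a fresh copy in each wrapper folder. Here is a minimal Python sketch of that re-copy step. The folder names (`device0`, `device1`) and the backup file name `worktodo.bak` are assumptions for illustration only; the backup is something you would create yourself before the first run, not a file known to be in the zip.

```python
import shutil
from pathlib import Path


def restore_worktodo(test_dir, backup_name="worktodo.bak"):
    """Copy the saved assignment back to worktodo.txt in test_dir.

    backup_name is a hypothetical file you create yourself before the first
    run (for example by copying worktodo.txt); it is not shipped in the zip.
    Returns the path of the restored worktodo.txt.
    """
    test_dir = Path(test_dir)
    backup = test_dir / backup_name
    if not backup.is_file():
        raise FileNotFoundError(f"no backup found at {backup}")
    target = test_dir / "worktodo.txt"
    shutil.copyfile(backup, target)
    return target


if __name__ == "__main__":
    # Hypothetical folder names, one per wrapper copy / device.
    for folder in ("device0", "device1"):
        try:
            print("restored", restore_worktodo(folder))
        except FileNotFoundError as err:
            print("skipped:", err)
```

Run it from the directory that contains the two test folders, then start both wrappers again as in step 3.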

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9636 - Posted: 4 Feb 2024, 15:42:21 UTC

I ran the file twice at the same time, and it behaves the same as under BOINC: GPU0 is at 100% and GPU1 is at 0%.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9637 - Posted: 4 Feb 2024, 15:46:25 UTC - in response to Message 9636.
Last modified: 4 Feb 2024, 15:47:33 UTC

Hmm, is there any output? I have updated the zip file because of a change. What is the content of job.xml?

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9638 - Posted: 4 Feb 2024, 16:41:45 UTC - in response to Message 9637.

<job_desc>
<task>
<application>mfakto.exe</application>
<command_line>-d 1</command_line>
</task>
<unzip_input>
<zipfilename>mfakto-win-v7.zip</zipfilename>
</unzip_input>

</job_desc>
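
One way to check which device a wrapper copy is being asked to use is to read the `<command_line>` element of its job.xml. The sketch below is a minimal Python example using only the standard library. It accepts both flag spellings that come up in this thread (`-d N` and `--device N`) without claiming which one mfakto actually honours, and the function name `device_from_job_xml` is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Accept either spelling discussed in the thread; which one the
# application really accepts is not settled here.
DEVICE_RE = re.compile(r"(?:^|\s)(?:-d|--device)\s+(\d+)(?:\s|$)")


def device_from_job_xml(xml_text):
    """Return the device index named in <command_line>, or None if absent."""
    root = ET.fromstring(xml_text)
    cmd = root.findtext("task/command_line", default="")
    m = DEVICE_RE.search(cmd)
    return int(m.group(1)) if m else None


JOB_XML = """<job_desc>
<task>
<application>mfakto.exe</application>
<command_line>-d 1</command_line>
</task>
<unzip_input>
<zipfilename>mfakto-win-v7.zip</zipfilename>
</unzip_input>
</job_desc>"""

print(device_from_job_xml(JOB_XML))  # prints 1
```

Running this against the job.xml in each wrapper folder should report two different indices. If both report the same number, both copies are being pointed at the same card.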

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9640 - Posted: 4 Feb 2024, 17:04:04 UTC - in response to Message 9638.
Last modified: 4 Feb 2024, 17:04:40 UTC

This was wrong. In the new zip file it was changed from -d 1 to --device 1, and the same for --device 0.

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9642 - Posted: 4 Feb 2024, 18:37:57 UTC - in response to Message 9640.

When I put --device 0 and --device 1 in GPU1 and GPU2 respectively, it doesn't open.

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9643 - Posted: 4 Feb 2024, 18:41:14 UTC

In my opinion, under BOINC, the 2nd GPU is working on 2 WUs. Your software does not seem to assign one WU to each of the 2 GPUs; instead it assigns 2 WUs to one GPU.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7445
Credit: 42,730,867
RAC: 0
Message 9644 - Posted: 4 Feb 2024, 19:07:27 UTC - in response to Message 9642.

What's the content of job.xml now?

TRINITAS
Joined: 19 Jan 24
Posts: 12
Credit: 12,570,000
RAC: 0
Message 9645 - Posted: 4 Feb 2024, 20:20:31 UTC - in response to Message 9644.

<job_desc>
<task>
<application>mfakto.exe</application>
<command_line>--device 0</command_line>
</task>
<unzip_input>
<zipfilename>mfakto-win-v7.zip</zipfilename>
</unzip_input>
</job_desc>

Copyright © 2014-2024 BOINC Confederation / rebirther