Developers: have you given any thought to fixing the TF application so it does not fail when it is not allowed to run non-stop?
Keith Myers
Joined: 15 Jul 24
Posts: 2
Credit: 260,438,060
RAC: 5,140,582
Message 10860 - Posted: 11 Jul 2025, 23:32:53 UTC

Developers: have you given any thought to fixing the TF application so it does not fail when it is not allowed to run non-stop?

The application always fails if the running work unit is stopped and resumed.

Annoying, would like it fixed if possible.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7830
Credit: 44,534,674
RAC: 101
Message 10861 - Posted: 12 Jul 2025, 5:03:26 UTC

I have never seen this issue. Do you have an example? Which hardware, OS, BOINC version?

Keith Myers
Joined: 15 Jul 24
Posts: 2
Credit: 260,438,060
RAC: 5,140,582
Message 10865 - Posted: 15 Jul 2025, 1:08:53 UTC
Last modified: 15 Jul 2025, 1:17:33 UTC

For example, this host was stopped for a reboot to update OS software while TF tasks were in progress. It is just the latest example, and the latest host to have had BOINC interrupted. I normally avoid stopping BOINC for any reason so that tasks finish uninterrupted on all my hosts. Your app is not the only one with this flaw. Gaia@home is another project whose tasks can't be stopped; on restart they show an instant computation error.

This host also runs GPUGrid, and some of their apps have the same issue: it is a given that you will throw away a running computation if it is stopped midstream rather than allowed to complete without interruption.

https://srbase.my-firewall.org/sr5/results.php?hostid=236361&offset=0&show_names=0&state=6&appid=

Stderr output
<core_client_version>8.3.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
2025-07-13 08:25:34 (82403): wrapper (7.24.26018): starting
2025-07-13 08:25:34 (82403): wrapper (7.24.26018): starting
2025-07-13 08:25:34 (82403): wrapper: running ./mfaktc.exe (-d 1)
2025-07-13 08:25:34 (82403): wrapper: created child process 82405
2025-07-13 08:33:13 (82403): ./mfaktc.exe exited; CPU time 1.990786
2025-07-13 08:33:13 (82403): app exit status: 0x1
2025-07-13 08:33:13 (82403): called boinc_finish(195)

</stderr_txt>
]]>
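
For anyone reading the log above: the three numbers in the `process exited` line are one value in three notations (decimal, hex, and the low byte read as a signed 8-bit integer). The wrapper lines show that `mfaktc.exe` itself exited with status 0x1 and the wrapper then called `boinc_finish(195)`. I believe 195 is the value BOINC names `EXIT_CHILD_FAILED`, but treat that name as an assumption and check it against your BOINC source. A minimal Python sketch of the arithmetic, with hypothetical helper names:

```python
import re

def parse_exit_line(line):
    """Pull (decimal, hex-as-int, signed) out of a 'process exited with code' line."""
    m = re.search(
        r"process exited with code (\d+) \((0x[0-9a-fA-F]+), (-?\d+)\)", line
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16), int(m.group(3))

def as_signed_byte(code):
    """Reinterpret an exit code in 0..255 as a signed 8-bit integer."""
    return code - 256 if code >= 128 else code

# The line copied from the stderr output in this thread.
line = "process exited with code 195 (0xc3, -61)"
decimal, from_hex, signed = parse_exit_line(line)

# All three notations agree on the same value.
consistent = (decimal == from_hex) and (as_signed_byte(decimal) == signed)
```

So the -61 is not a separate error; it is 195 shown as a signed byte.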


OS is Ubuntu 24.04.2 LTS
BOINC is version 8.3.0

EPYC 7713 CPU running on an ASRock Rack ROMED8-2T motherboard with 128GB of DDR4-3200 ECC RDIMMs.

Suspect it is due to a common fault with some GPU apps which will fail upon restart when the restart causes the task to run on a different GPU than the task was originally started on. GPUGrid apps being the most common example.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7830
Credit: 44,534,674
RAC: 101
Message 10866 - Posted: 15 Jul 2025, 11:36:23 UTC
Last modified: 15 Jul 2025, 11:41:02 UTC

BOINC cannot leave GPU tasks in memory while suspended. The app itself writes a checkpoint file every minute, so a task can resume as long as one was written. Before you update an OS (in this case Ubuntu), it is always better to stop BOINC first. Ubuntu is always asking to update. This will avoid any issues.
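
To illustrate the checkpoint-and-resume idea in general terms, here is a minimal Python sketch. It is not how mfaktc or the BOINC wrapper actually does it: the JSON file format, the function names, and the toy workload (summing integers) are all assumptions for illustration, and it checkpoints every N work items rather than every minute so the behaviour is deterministic.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename means a reader sees either the old file or the new one,
    never a half-written file, even if the process is killed mid-write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"next": 0, "total": 0}

def run(path, n_items, checkpoint_every, stop_after=None):
    """Sum 0..n_items-1, resuming from the last checkpoint if there is one.

    stop_after simulates an interruption (e.g. a reboot) after that many
    items in this run; the function then returns None without saving.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next"] < n_items:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted: work since the last checkpoint is lost
        state["total"] += state["next"]
        state["next"] += 1
        done_this_run += 1
        if state["next"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

An interrupted run followed by a second call to `run` with the same path picks up from the last saved checkpoint and only redoes the items processed after it. The saved state here holds nothing about which device did the work, which is the property that lets a task resume elsewhere.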


Suspect it is due to a common fault with some GPU apps which will fail upon restart when the restart causes the task to run on a different GPU than the task was originally started on. GPUGrid apps being the most common example.


That makes no difference; you can restart the same task on a different GPU.

