WU keeps running after reaching 100%

Message boards : Number crunching : WU keeps running after reaching 100%

killerrabbit
Joined: 22 Jul 20
Posts: 2
Credit: 71,400
RAC: 0
Message 6653 - Posted: 26 Jul 2020, 14:06:23 UTC

When I run the Sierpinski Base WUs, each WU reaches 100% and then keeps running indefinitely. It never finishes, completes, or uploads.

Is there anything I can do?

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 6654 - Posted: 26 Jul 2020, 14:14:11 UTC - in response to Message 6653.

Check the stderr.txt file in the slot folder where the WU is running. BOINC Manager always shows a wrong runtime until you have returned some WUs.

You can calculate the runtime with this formula:
time per bit * max bit / 1000 / 60 = runtime in minutes
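As a rough sketch, the formula can be written as a small Python helper. It assumes that "time per bit" is given in milliseconds (which is what the division by 1000 suggests) and that "max bit" is the total bit count of the test; the function name and the numbers in the example are made up for illustration.

```python
def estimated_runtime_minutes(time_per_bit_ms: float, max_bit: int) -> float:
    """Estimate the total runtime of a WU in minutes.

    Implements: time per bit * max bit / 1000 / 60 = minutes.
    Assumption: time_per_bit_ms is in milliseconds.
    """
    seconds = time_per_bit_ms * max_bit / 1000  # milliseconds -> seconds
    return seconds / 60                         # seconds -> minutes


if __name__ == "__main__":
    # Made-up example: 0.5 ms per bit, 1,200,000 bits in total
    print(estimated_runtime_minutes(0.5, 1_200_000))  # 10.0
```

Because BOINC Manager's own estimate is unreliable here, comparing its "elapsed" time against this number is one way to tell whether a task is genuinely stuck.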

killerrabbit
Joined: 22 Jul 20
Posts: 2
Credit: 71,400
RAC: 0
Message 6656 - Posted: 26 Jul 2020, 17:30:01 UTC - in response to Message 6654.

Not sure where to find that file or folder.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 6657 - Posted: 26 Jul 2020, 17:59:23 UTC - in response to Message 6656.
Last modified: 26 Jul 2020, 18:36:54 UTC

Check the WU properties in BOINC Manager, and search for stderr.txt on C:.
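If you'd rather not search the whole drive, a small Python sketch like the one below lists the stderr.txt files in the BOINC slot folders. It assumes the default Windows data directory C:\ProgramData\BOINC (Linux installs often use /var/lib/boinc-client instead); pass a different path if your BOINC data directory is elsewhere. The function name find_slot_stderr is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: default BOINC data directory on Windows.
# On Linux this is often /var/lib/boinc-client; adjust as needed.
DEFAULT_BOINC_DATA = Path(r"C:\ProgramData\BOINC")


def find_slot_stderr(boinc_data: Path = DEFAULT_BOINC_DATA) -> List[Path]:
    """Return every stderr.txt that sits directly inside a slot folder
    (slots/0/stderr.txt, slots/1/stderr.txt, ...), sorted by path."""
    slots = boinc_data / "slots"
    if not slots.is_dir():
        return []  # wrong data directory, or BOINC not installed here
    return sorted(p for p in slots.glob("*/stderr.txt") if p.is_file())


if __name__ == "__main__":
    for path in find_slot_stderr():
        print(path)
```

Each slot folder belongs to one running task, so the slot number in the printed path tells you which stderr.txt to open.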

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,275,240,137
RAC: 2,538,193
Message 6896 - Posted: 3 Nov 2020, 10:12:35 UTC

What's the fix to this problem?

I currently have 8 WUs on one host that show 100% completed but the status goes to "Waiting to run".

6 of them are Long work units, so I don't want to lose them :(

If I suspend all other work they stay at 100% but continue counting processing seconds. Nothing else happens... they just carry on counting seconds.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 6897 - Posted: 3 Nov 2020, 11:01:01 UTC - in response to Message 6896.

That's a general problem with BOINC: the runtime estimate changes every time a different base is set up.

You can calculate the correct runtime with a formula, see
http://srbase.my-firewall.org/sr5/forum_thread.php?id=6&postid=698

You can find the values for the formula in stderr.txt.
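Pulling the two values out of stderr.txt can also be scripted. The sketch below assumes LLR-style progress lines of the form "..., bit: N / MAX [x%].  Time per bit: T ms."; that exact format is an assumption, so adjust the regular expression to whatever your stderr.txt actually shows. It applies the same formula (time per bit * max bit / 1000 / 60) to the last progress line it finds. The names PROGRESS_RE and runtime_from_stderr are hypothetical.

```python
import re
from typing import Optional

# Assumed progress-line format (may differ between LLR versions), e.g.:
#   "3*2^1199999-1, bit: 60000 / 1200000 [5.00%].  Time per bit: 0.500 ms."
PROGRESS_RE = re.compile(
    r"bit:\s*(\d+)\s*/\s*(\d+).*?Time per bit:\s*([\d.]+)\s*ms"
)


def runtime_from_stderr(text: str) -> Optional[float]:
    """Estimate the total runtime in minutes from the last progress line,
    or return None if no matching line is found."""
    matches = PROGRESS_RE.findall(text)
    if not matches:
        return None
    _current_bit, max_bit, time_per_bit_ms = matches[-1]
    # time per bit (ms) * max bit / 1000 / 60 = minutes
    return float(time_per_bit_ms) * int(max_bit) / 1000 / 60


if __name__ == "__main__":
    sample = "3*2^1199999-1, bit: 60000 / 1200000 [5.00%].  Time per bit: 0.500 ms.\n"
    print(runtime_from_stderr(sample))  # 10.0
```

Using the last match rather than the first matters because the time per bit can drift during a run, and the most recent value is usually the better estimate.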

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,275,240,137
RAC: 2,538,193
Message 6898 - Posted: 3 Nov 2020, 11:37:54 UTC - in response to Message 6897.
Last modified: 3 Nov 2020, 11:44:26 UTC

But how do I get the task out of the "Waiting to run" state so that it finishes?

Do I halt all other tasks so that they can continue running?

***Edit: Hmmm, just did that and two of the tasks went to "computation error" :(

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,275,240,137
RAC: 2,538,193
Message 6899 - Posted: 3 Nov 2020, 11:56:10 UTC - in response to Message 6898.

All 8 died with:

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Process still present 5 min after writing finish file; aborting</message>
<stderr_txt>



See: Error tasks for computer 208801

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 6900 - Posted: 3 Nov 2020, 15:11:03 UTC - in response to Message 6899.
Last modified: 3 Nov 2020, 15:11:22 UTC

All of these WUs had already finished and were then restarted from scratch. Do you have an antivirus program or firewall that is blocking the output file? Also check in your preferences whether "Leave non-GPU tasks in memory while suspended" is enabled.

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,275,240,137
RAC: 2,538,193
Message 6901 - Posted: 3 Nov 2020, 15:55:59 UTC - in response to Message 6900.
Last modified: 3 Nov 2020, 16:06:27 UTC

Nothing stopping communication.

If you check host 208801, you'll see that this host normally does GPU tasks only. Yesterday there was a problem with the project it normally crunches CPU tasks for, so I started crunching SRBase CPU work units.

It processed lots of them before choking on these 8.

So they were only downloaded yesterday and processed very quickly.

SRBase GPU processing continued as normal, so there was plenty of communication between the server and the host.

The only change made after downloading the CPU units was switching the host location back to "work" to stop any further downloads of CPU work units.

"Leave non-GPU tasks in memory while suspended" was not enabled, I have enabled this now.

Luigi R.
Joined: 17 Jul 18
Posts: 21
Credit: 35,671,759
RAC: 0
Message 7230 - Posted: 11 Jan 2021, 12:34:10 UTC

It's still happening.


@rebirther: maybe you should avoid resending these tasks, if you aren't already doing so. I aborted 45 tasks from my work queue. We already have a residue for each of them, and double-checking is not our purpose, so crunching them again is useless.

I'm linking them here so you can copy the residues.
28686860, 28688798, 28685897, 28689205, 28686861, 28685584, 28687739, 28687993, 28688572, 28685593, 28685826, 28688283, 28687983, 28688262, 28688971, 28688241, 28688902, 28687770, 28688223, 28688294, 28685528, 28688293, 28687125, 28685519, 28688222, 28685893, 28688295, 28688299, 28687831, 28688147, 28688472, 28688340, 28688510, 28688235, 28688548, 28688274, 28688377, 28687850, 28688246, 28687838, 28687968, 28686146, 28687155, 28687078, 28686807

If you can't fix that, I could try to implement something to retrieve the residues from errored tasks of this kind. It would mean scraping thousands of HTML pages, though. Let me know.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 7231 - Posted: 11 Jan 2021, 16:25:59 UTC - in response to Message 7230.

I have never had a never-ending task; maybe something is blocking your results file.

Luigi R.
Joined: 17 Jul 18
Posts: 21
Credit: 35,671,759
RAC: 0
Message 7232 - Posted: 11 Jan 2021, 16:47:16 UTC - in response to Message 7231.

It's not my computer that has never-ending tasks. I received (and aborted) the resends; I was the wingman.
Those resends are useless, imho.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 7233 - Posted: 11 Jan 2021, 18:38:32 UTC - in response to Message 7232.

We do some double-checking, but only for a few WUs. I found another side effect of the crash: the first WU had already finished, yet a second copy was sent out.


Copyright © 2014-2024 BOINC Confederation / rebirther