When I run the Sierpinski Base WUs, once a WU reaches 100% it keeps running indefinitely and never finishes, completes, or uploads.
Is there anything I can do?
____________
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13, Posts: 7479, Credit: 43,651,567, RAC: 40,309
> When I run the Sierpinski Base WUs, once a WU reaches 100% it keeps running indefinitely and never finishes, completes, or uploads.
> Is there anything I can do?
Check the stderr.txt file in the slot folder where the WU is running; the BOINC manager always shows the wrong runtime until you have returned some WUs.
You can calculate the runtime with this formula:
time per bit (ms) × max bit / 1000 / 60 = runtime in minutes
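
For example, a minimal sketch of that calculation in Python; the two input values below are made up and would be read from your own stderr.txt:

# Estimate the real runtime from the values reported in stderr.txt.
# Both inputs are illustrative examples; substitute your own.
time_per_bit_ms = 1.5        # "time per bit" in milliseconds
max_bit = 2_000_000          # highest bit to be tested

runtime_min = time_per_bit_ms * max_bit / 1000 / 60
print(f"Expected runtime: about {runtime_min:.0f} minutes")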
____________
Not sure where to find that file or folder.
____________
rebirther:
> Not sure where to find that file or folder.
Check the WU properties in the BOINC manager, or search for stderr.txt on C:.
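
If it helps, here is a small Python sketch that dumps every slot's stderr.txt; the data directory shown is the Windows default (an assumption; adjust it if BOINC is installed elsewhere):

from pathlib import Path

# Default BOINC data directory on Windows (an assumption; adjust as needed).
data_dir = Path(r"C:\ProgramData\BOINC")

# Each running task lives in its own numbered slot folder.
for stderr in sorted(data_dir.glob("slots/*/stderr.txt")):
    print(f"=== {stderr} ===")
    print(stderr.read_text(errors="replace"))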
____________
What's the fix for this problem?
I currently have 8 WUs on one host that show 100% completed, but their status changes to "Waiting to run".
6 of them are Long work units, so I don't want to lose them :(
If I suspend all other work they stay at 100% but keep counting processing seconds. Nothing else happens; they just carry on counting seconds.
____________
rebirther:
> What's the fix for this problem?
> I currently have 8 WUs on one host that show 100% completed, but their status changes to "Waiting to run".
> 6 of them are Long work units, so I don't want to lose them :(
> If I suspend all other work they stay at 100% but keep counting processing seconds. Nothing else happens; they just carry on counting seconds.
That's a known problem with BOINC: the runtime estimate changes every time a different base is set up.
You can calculate the correct runtime with a formula, see
http://srbase.my-firewall.org/sr5/forum_thread.php?id=6&postid=698
You can find the needed values in the stderr.txt.
____________
But how do I get the tasks out of the "Waiting to run" state so they finish?
Should I suspend all other tasks so that these can continue running?
***Edit: Hmmm, just did that and two of the tasks went to "computation error" :(
____________
All 8 died with:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Process still present 5 min after writing finish file; aborting</message>
<stderr_txt>
See: Error tasks for computer 208801
____________
rebirther:
> All 8 died with:
> <core_client_version>7.16.11</core_client_version>
> <![CDATA[
> <message>
> Process still present 5 min after writing finish file; aborting</message>
> <stderr_txt>
> See: Error tasks for computer 208801
All those WUs were already finished and then restarted from scratch. Do you have an antivirus program or a firewall that might be blocking the output file? Also check in the preferences whether "Leave non-GPU tasks in memory while suspended" is enabled.
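
If you want to check that preference from the data directory rather than the GUI, here is a minimal sketch; it assumes the standard BOINC override file and the usual leave_apps_in_memory element, so verify both against your own installation:

import xml.etree.ElementTree as ET
from pathlib import Path

# Local preferences override file (Windows default path; an assumption).
prefs = Path(r"C:\ProgramData\BOINC\global_prefs_override.xml")

if prefs.exists():
    # "Leave non-GPU tasks in memory" maps to <leave_apps_in_memory> (assumed).
    flag = ET.parse(prefs).getroot().findtext("leave_apps_in_memory", default="0")
    print("Leave tasks in memory:", "enabled" if flag.strip() == "1" else "disabled")
else:
    print("No override file found; the web-based preferences apply.")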
____________
Nothing is stopping communication.
If you check host 208801 you'll see that it normally does GPU tasks only. The project it usually crunches CPU tasks for had a problem yesterday, so I started crunching SRBase CPU work units yesterday.
It processed lots of them before choking on these 8.
So they were only downloaded yesterday and processed very quickly.
SRBase GPU processing continued as normal, so there was plenty of communication between the server and the host.
The only change made after downloading the CPU units was that the host location was changed back to "work" to stop any further downloads of CPU work units.
"Leave non-GPU tasks in memory while suspended" was not enabled; I have enabled it now.
____________
It's still happening.
@rebirther: maybe you should avoid resending these tasks, if you are not already doing so. I aborted 45 tasks from my work queue. We already have a residue for each, and double-checking is not our purpose, so it's useless to crunch them again.
I'm linking them here so you can copy the residues:
28686860, 28688798, 28685897, 28689205, 28686861, 28685584, 28687739, 28687993, 28688572, 28685593, 28685826, 28688283, 28687983, 28688262, 28688971, 28688241, 28688902, 28687770, 28688223, 28688294, 28685528, 28688293, 28687125, 28685519, 28688222, 28685893, 28688295, 28688299, 28687831, 28688147, 28688472, 28688340, 28688510, 28688235, 28688548, 28688274, 28688377, 28687850, 28688246, 28687838, 28687968, 28686146, 28687155, 28687078, 28686807
If you can't fix that, I could try to implement something to retrieve the residues of this kind of errored task. It would mean scraping thousands of HTML pages, though. Let me know.
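
For what it's worth, a rough Python sketch of that scraper; the result.php?resultid=N URL is the standard BOINC server task page and the "RES64" marker is an assumption about how LLR reports the residue in the posted stderr, so both may need adjusting:

import re
import urllib.request

# Standard BOINC task-page URL pattern (an assumption for this project).
BASE = "http://srbase.my-firewall.org/sr5/result.php?resultid={}"
task_ids = [28686860, 28688798, 28685897]  # shortened example list

for tid in task_ids:
    with urllib.request.urlopen(BASE.format(tid), timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Look for an LLR-style residue line, e.g. "RES64: 0123456789ABCDEF".
    match = re.search(r"RES64:\s*([0-9A-Fa-f]{16})", html)
    print(tid, match.group(1) if match else "no residue found")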
____________
rebirther:
> It's still happening.
> @rebirther: maybe you should avoid resending these tasks, if you are not already doing so. I aborted 45 tasks from my work queue. We already have a residue for each, and double-checking is not our purpose, so it's useless to crunch them again.
> [list of 45 task IDs snipped]
> If you can't fix that, I could try to implement something to retrieve the residues of this kind of errored task. It would mean scraping thousands of HTML pages, though. Let me know.
I have never had a never-ending task; maybe something is blocking your results file.
____________
> I have never had a never-ending task; maybe something is blocking your results file.
It's not my computer that has the never-ending tasks. I got (and aborted) the resends; I was the wingman.
Those resends are useless imho.
____________
rebirther:
> I have never had a never-ending task; maybe something is blocking your results file.
> It's not my computer that has the never-ending tasks. I got (and aborted) the resends; I was the wingman.
> Those resends are useless imho.
We do some double-checking, but only for a few WUs. I also found another side effect of the crash: the first WU was already finished, yet a second one was sent out.