Invalid WUs that aren't invalid

Ken_g6
Joined: 1 Jan 15
Posts: 3
Credit: 3,943,093
RAC: 16,327
Message 5259 - Posted: 7 Jun 2019, 5:27:01 UTC
Last modified: 7 Jun 2019, 5:27:30 UTC

Here's a stderr:

01:52:08 (4058): wrapper (7.2.26012): starting
01:52:08 (4058): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
12:12:26 (2180): wrapper (7.2.26012): starting
12:12:26 (2180): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
SIGSEGV: segmentation violation
Stack trace (11 frames):
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu(boinc_catch_signal+0x65)[0x41fa15]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11670)[0x7fbe3ef15670]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x464351]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x45f554]
/lib/x86_64-linux-gnu/libc.so.6(+0x357f0)[0x7fbe3eb727f0]
/lib/x86_64-linux-gnu/libc.so.6(nanosleep+0x2d)[0x7fbe3ec0a2ed]
/lib/x86_64-linux-gnu/libc.so.6(usleep+0x34)[0x7fbe3ec3c334]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x433ca3]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x407ddd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fbe3eb5d3f1]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x404f59]
Exiting...
17:00:48 (3539): wrapper (7.2.26012): starting
17:00:48 (3539): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999 -t4)
20:38:46 (3539): llr64 exited; CPU time 50504.952000
20:38:46 (3539): called boinc_finish


See, there was a SIGSEGV. I think that happened when the WU was suspended. But then the computation was started over. That time it succeeded.

Can you fix this?
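
To make the pattern in that stderr concrete, here is a rough Python sketch that splits a stderr into one chunk per wrapper start and checks whether the last run finished normally. It is only an illustration: the function names `split_runs` and `last_run_clean` are made up for this post, the project's real validator may work quite differently, and a clean final run does not by itself prove the result is correct.

```python
import re

# Matches wrapper start lines such as
# "01:52:08 (4058): wrapper (7.2.26012): starting".
START_RE = re.compile(r"\d{2}:\d{2}:\d{2} \(\d+\): wrapper \([^)]*\): starting")


def split_runs(stderr_text):
    """Split stderr into one chunk per wrapper start (hypothetical helper).

    Any text before the first start line is dropped.
    """
    starts = [m.start() for m in START_RE.finditer(stderr_text)]
    if not starts:
        return [stderr_text] if stderr_text.strip() else []
    bounds = starts + [len(stderr_text)]
    return [stderr_text[a:b] for a, b in zip(bounds, bounds[1:])]


def last_run_clean(stderr_text):
    """True if the final run logged boinc_finish and no SIGSEGV."""
    runs = split_runs(stderr_text)
    if not runs:
        return False
    final = runs[-1]
    return "called boinc_finish" in final and "SIGSEGV" not in final
```

Applied to the stderr above, this would see three runs, with the SIGSEGV confined to the second one and the third ending in `called boinc_finish`.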

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 5260 - Posted: 7 Jun 2019, 5:39:49 UTC - in response to Message 5259.

The error can produce bad results; that's why the validator marked this task as invalid.

Ken_g6
Message 5261 - Posted: 7 Jun 2019, 6:02:44 UTC - in response to Message 5260.
Last modified: 7 Jun 2019, 6:03:53 UTC

Alright, how about fixing it in the wrapper? If the number of SIGSEGVs in stderr.txt is greater than the number of RESTARTs, restart LLR (delete the z-file) and write "RESTARTing" lines until the number of SIGSEGVs in stderr.txt equals the number of RESTARTs. Edit: Or, more simply, just abort the WU when there's a SIGSEGV in stderr.txt.

Why are you using a different wrapper than PrimeGrid, anyway? (Doing the above would be a reason, but not a reason for not including a lot of features from said wrapper.)
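
The counting idea above could be sketched roughly like this in Python. This is not the actual wrapper code (the wrapper is not written in Python): the function names, the `abort_on_crash` switch, and the exact marker strings are assumptions for illustration. The "RESTARTing" line is the hypothetical marker proposed in this post, not something the current wrapper writes.

```python
def pending_restarts(stderr_text,
                     crash_marker="SIGSEGV: segmentation violation",
                     restart_marker="RESTARTing"):
    """Number of crashes not yet matched by a restart line (hypothetical)."""
    crashes = stderr_text.count(crash_marker)
    restarts = stderr_text.count(restart_marker)
    return max(crashes - restarts, 0)


def decide(stderr_text, abort_on_crash=False):
    """Pick an action from the stderr contents.

    Returns "continue" when every crash has a matching restart line,
    otherwise "restart" (delete the checkpoint and start LLR over) or
    "abort" for the simpler variant.
    """
    if pending_restarts(stderr_text) == 0:
        return "continue"
    return "abort" if abort_on_crash else "restart"
```

With one SIGSEGV and no "RESTARTing" line in stderr.txt, `decide` returns "restart"; once a "RESTARTing" line has been written it returns "continue".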

rebirther
Message 5262 - Posted: 7 Jun 2019, 6:07:56 UTC - in response to Message 5261.

> Alright, how about fixing it in the wrapper? If the number of SIGSEGVs in stderr.txt is greater than the number of RESTARTs, restart LLR (delete the z-file) and write "RESTARTing" lines until the number of SIGSEGVs in stderr.txt equals the number of RESTARTs. Edit: Or, more simply, just abort the WU when there's a SIGSEGV in stderr.txt.
>
> Why are you using a different wrapper than PrimeGrid, anyway? (Doing the above would be a reason, but not a reason for not including a lot of features from said wrapper.)


I am using an older wrapper from PrimeGrid. The latest wrapper is hardcoded for PrimeGrid.

crashtech
Joined: 10 Apr 19
Posts: 28
Credit: 466,952,134
RAC: 6,308,462
Message 5264 - Posted: 7 Jun 2019, 14:44:24 UTC

If we have work in our queues, is there a quick way to determine which of them will produce these errors, so they can be aborted before being allowed to waste CPU cycles?

rebirther
Message 5265 - Posted: 7 Jun 2019, 15:14:41 UTC - in response to Message 5264.

> If we have work in our queues, is there a quick way to determine which of them will produce these errors, so they can be aborted before being allowed to waste CPU cycles?


No.

rebirther
Message 5267 - Posted: 7 Jun 2019, 15:48:39 UTC

> SIGSEGV: segmentation violation


OK, I have excluded this error from the validator for now (srbase2 app). I will check the results later.

biodoc
Joined: 6 May 18
Posts: 1
Credit: 320,114,250
RAC: 0
Message 5270 - Posted: 7 Jun 2019, 18:22:31 UTC
Last modified: 7 Jun 2019, 18:22:52 UTC

I have 42 tasks marked as invalid due to SIGSEGV: segmentation violation.

http://srbase.my-firewall.org/sr5/results.php?userid=1833&offset=0&show_names=0&state=5&appid=

I'm new to the project, so I added it on 4 computers; long 0.22 tasks were downloaded and started running in single-thread mode. I added the proper app_config.xml file for running the app in mt mode and then restarted BOINC on all 4 computers. Apparently, tasks labeled "waiting to run" were eventually processed and then marked as invalid. I guess I should have aborted the tasks after restarting BOINC and downloaded new ones.
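
For anyone wondering what such an app_config.xml might contain, here is a small Python sketch that builds one. The element names (`app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`) are standard BOINC client configuration elements, but the app name, the plan class "mt", and the thread count are assumptions for illustration, so check the project's own instructions for the real values. The `-t4` form is taken from the llr64 command line in the stderr at the top of this thread; that it comes from `cmdline` in app_config.xml is also an assumption.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, threads, plan_class="mt"):
    """Return app_config.xml text for a multithreaded app version.

    app_name, threads, and plan_class are placeholders; use the values
    the project documents.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Tell the client how many CPUs each task occupies.
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    # Pass the matching thread count to the application itself.
    ET.SubElement(version, "cmdline").text = f"-t{threads}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "srbase2" is only a guess at the app name.
    print(build_app_config("srbase2", 4))
```

The generated file would go in the project's directory under the BOINC data folder, and the client has to re-read its config files (or be restarted) before the settings apply.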

rebirther
Message 5271 - Posted: 7 Jun 2019, 18:25:39 UTC

I have removed all segmentation violation checks from the validator. All tasks should now be valid.

crashtech
Message 5272 - Posted: 7 Jun 2019, 18:31:08 UTC - in response to Message 5271.

@rebirther, thank you very much for your help!




Copyright © 2014-2024 BOINC Confederation / rebirther