Message boards : Number crunching : Invalid WUs that aren't invalid
Author | Message |
---|---|
Here's a stderr:
01:52:08 (4058): wrapper (7.2.26012): starting
01:52:08 (4058): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
12:12:26 (2180): wrapper (7.2.26012): starting
12:12:26 (2180): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
SIGSEGV: segmentation violation
Stack trace (11 frames):
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu(boinc_catch_signal+0x65)[0x41fa15]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11670)[0x7fbe3ef15670]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x464351]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x45f554]
/lib/x86_64-linux-gnu/libc.so.6(+0x357f0)[0x7fbe3eb727f0]
/lib/x86_64-linux-gnu/libc.so.6(nanosleep+0x2d)[0x7fbe3ec0a2ed]
/lib/x86_64-linux-gnu/libc.so.6(usleep+0x34)[0x7fbe3ec3c334]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x433ca3]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x407ddd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fbe3eb5d3f1]
../../projects/srbase.my-firewall.org_sr5/wrapper_26012-v2_x86_64-pc-linux-gnu[0x404f59]
Exiting...
17:00:48 (3539): wrapper (7.2.26012): starting
17:00:48 (3539): wrapper: running llr64 ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999 -t4)
20:38:46 (3539): llr64 exited; CPU time 50504.952000
20:38:46 (3539): called boinc_finish
See, there was a SIGSEGV. I think that happened when the WU was suspended. But then the computation was started over, and that time it succeeded. Can you fix this? | |
ID: 5259 · Rating: 0 | |
The error can produce bad results; that's why the validator marked this task as invalid. | |
ID: 5260 · Rating: 0 | |
Alright, how about fixing it in the wrapper? If the number of SIGSEGVs in stderr.txt is greater than the number of RESTARTs, restart LLR (delete the z-file) and write "RESTARTing" lines until the number of SIGSEGVs in stderr.txt equals the number of RESTARTs. Edit: Or, more simply, just abort the WU when there's a SIGSEGV in stderr.txt. | |
ID: 5261 · Rating: 0 | |
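The counting rule proposed in the post above could be sketched roughly as follows. This is only an illustration of the idea, not code from the actual wrapper: the function names are hypothetical, the "RESTARTing" marker line is the poster's suggestion rather than something the wrapper currently writes, and matching on line prefixes is an assumption based on the stderr shown earlier in the thread.

```python
def count_markers(stderr_text):
    """Count crash lines and (proposed) restart marker lines in a stderr log.

    Assumes a crash shows up as a line starting with "SIGSEGV" and that the
    wrapper would write a line starting with "RESTART" for each restart.
    """
    sigsegvs = 0
    restarts = 0
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith("SIGSEGV"):
            sigsegvs += 1
        elif line.startswith("RESTART"):
            restarts += 1
    return sigsegvs, restarts


def restarts_needed(stderr_text):
    """Number of crashes that have not yet been followed by a clean restart."""
    sigsegvs, restarts = count_markers(stderr_text)
    return max(0, sigsegvs - restarts)


def should_abort(stderr_text):
    """The simpler alternative: abort the WU if any SIGSEGV was logged."""
    sigsegvs, _ = count_markers(stderr_text)
    return sigsegvs > 0
```

If `restarts_needed` returns a value above zero, the wrapper would delete the z-file, restart LLR from scratch, and append one "RESTARTing" line per outstanding crash. `should_abort` covers the variant from the edit, where a single SIGSEGV is enough to give up on the WU.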
I am using an older wrapper from PrimeGrid. The latest wrapper is hardcoded for PrimeGrid. | |
ID: 5262 · Rating: 0 | |
If we have work in our queues, is there a quick way to determine which of them will produce these errors, so they can be aborted before being allowed to waste CPU cycles? | |
ID: 5264 · Rating: 0 | |
No, there is no quick way to tell in advance which queued tasks will produce these errors. | |
ID: 5265 · Rating: 0 | |
Regarding "SIGSEGV: segmentation violation": OK, I have excluded this error from the validator for now (srbase2 app). I will check the results later. | |
ID: 5267 · Rating: 0 | |
I have 42 tasks marked as invalid due to "SIGSEGV: segmentation violation". | |
ID: 5270 · Rating: 0 | |
I have removed all segmentation violation checks from the validator. All tasks should now be valid. | |
ID: 5271 · Rating: 0 | |
@rebirther, thank you very much for your help! | |
ID: 5272 · Rating: 0 | |