Error on long - but it finished the llr

Message boards : Number crunching : Error on long - but it finished the llr

Buster Gunn
Joined: 2 Dec 14
Posts: 10
Credit: 3,658,264
RAC: 0
Message 612 - Posted: 12 Jan 2015, 10:26:31 UTC

http://srbase.myfirewall.org/sr5/result.php?resultid=3035957

This errored out but why?
____________

Crystal Pellet
Joined: 30 Nov 14
Posts: 9
Credit: 5,351,956
RAC: 78
Message 614 - Posted: 12 Jan 2015, 12:35:18 UTC - in response to Message 612.

BOINC aborted the task because the process was still running 10 seconds after it had created the finish file.

Under normal circumstances, the cleanup after the finish takes a fraction of a second and the process then exits.

Buster Gunn
Joined: 2 Dec 14
Posts: 10
Credit: 3,658,264
RAC: 0
Message 615 - Posted: 12 Jan 2015, 13:25:37 UTC - in response to Message 614.

Seems like BOINC should be a little more patient.
____________

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 637 - Posted: 15 Jan 2015, 1:07:15 UTC

I had a similar problem with:

Work Units:
S185_800-900K_wu_4908_0_0
S185_800-900K_wu_4976_0_0

Both crunched to 100%, but the system is not acknowledging receipt of the work units and BOINC can't send them to SRBase. They have been done for 10+ hours now, and I have tried resending them numerous times over that stretch.
There is no error from my system; it's DOWNCLOCKED, not overclocked.

I've been crunching Primegrid because BOINC isn't getting w/u's from SRBase.

:(

Neo
AtP

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 638 - Posted: 15 Jan 2015, 1:12:00 UTC - in response to Message 637.

I just checked the BOINC event log..

It says (after identifying the w/u) "Transient HTTP error"


Neo
AtP

Coleslaw
Joined: 1 Dec 14
Posts: 8
Credit: 3,134,919
RAC: 244
Message 639 - Posted: 15 Jan 2015, 2:35:37 UTC
Last modified: 15 Jan 2015, 2:38:58 UTC

Neo, a few questions:
1. Are you using a proxy?
2. What security software/firewall are you using? http://boinc.berkeley.edu/dev/forum_thread.php?id=7470&postid=43828#43828
3. Are you having trouble with any other project?
4. Are you using a cc_config with any flags in it? (If using a proxy, I would suggest adding <http_1_0>1</http_1_0> if you haven't already.)
____________

Michael Goetz
Joined: 1 Jan 15
Posts: 18
Credit: 303,916
RAC: 0
Message 640 - Posted: 15 Jan 2015, 9:25:32 UTC - in response to Message 638.

I just checked the BOINC event log..

It says (after identifying the w/u) "Transient HTTP error"


Neo
AtP


This sounds similar to a problem that's been observed at PrimeGrid and is actually a problem that will affect ALL BOINC projects.

Do you use AVG antivirus? If so, you must DISABLE its "Identity theft" protection. This part of AVG intermittently decides that the network communication with a BOINC server is suspicious and blocks it, resulting in transient HTTP errors. You can't exclude BOINC by directory; you have to disable this feature completely.

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 641 - Posted: 15 Jan 2015, 13:13:55 UTC - in response to Message 640.
Last modified: 15 Jan 2015, 13:20:56 UTC

This sounds similar to a problem that's been observed at PrimeGrid and is actually a problem that will affect ALL BOINC projects.

Do you use AVG antivirus? If so, you must DISABLE its "Identity theft" protection. This part of AVG intermittently decides that the network communication with a BOINC server is suspicious and blocks it, resulting in transient HTTP errors. You can't exclude BOINC by directory; you have to disable this feature completely.



Thanks guys for your responses...

I am not using a proxy. I have no problem receiving workunits from SRBase nor Primegrid, and I've crunched a fair number of SRbase workunits without problem, before and after Rebirther started keeping stats.

I'm not using AVG. I use Microsoft Security Essentials.

The only thing I can think of is that I upgraded BOINC to the newest version on the rig that had these two errors. However, I've returned a decent number of w/u's to SRbase after this upgrade... I only upgraded to get a more accurate estimated time of completion..

I do have a modified config file, but the only thing I did was set BOINC to use my two GPUs... nothing else. I got my instructions on how to do that from Gary Craig.

With respect to those two workunits (which were LONG, btw ;) ), BOINC wouldn't let me abort them to clear them out... I had to "reset project" for SRBase to get BOINC to clear them out.

A bigger issue is that I have set my "Resource Share" preferences to 99% for SRBase and 1% for Primegrid. (This was back when Rebirther was having trouble keeping the server full of work, so I figured I would get all I could from SRBase and crunch Primegrid until Rebirther got the server reloaded.) Both of my systems keep crunching Primegrid despite this resource share allocation, and despite the SRBase server having work to send, unless I set "No New Tasks" for Primegrid in BOINC.

My initial thought was that Rebirther's internet connection was just overburdened by sending and receiving a bunch of short workunits... but I tried over and over during a 10 hour span...

Woke up this morning and both of my rigs are crunching Primegrid Mega Proths... I have to go to work soon... when I get home I will force my rigs to grab SRBase w/u's and see if this problem happens again.

Neo
AtP

Coleslaw
Joined: 1 Dec 14
Posts: 8
Credit: 3,134,919
RAC: 244
Message 642 - Posted: 15 Jan 2015, 14:44:40 UTC - in response to Message 641.

If you still have the issue when you get home, could you try adding <http_1_0>1</http_1_0> to your cc_config file and see if it helps? I know it is normally for proxies, but it can't hurt to try. :D Keep us posted, as the more info we can get on this the better for others in the future. You may also want to state exactly which version you upgraded from and which version you upgraded to, including whether they were 32-bit or 64-bit, in case there happens to be a correlation.
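
For anyone unsure where the flag goes, here is a rough sketch of what the file might look like. It assumes the file is cc_config.xml in the BOINC data directory and that the existing GPU setting is <use_all_gpus> (that part is a guess at what is already in the file, so keep whatever you really have there). The <http_1_0> flag belongs inside <options>.

```xml
<!-- cc_config.xml (sketch only; the use_all_gpus line is an assumed existing setting) -->
<cc_config>
  <options>
    <!-- Assumption: the existing "use both GPUs" setting; leave yours as it is -->
    <use_all_gpus>1</use_all_gpus>
    <!-- Use HTTP 1.0 instead of 1.1 for server communication -->
    <http_1_0>1</http_1_0>
  </options>
</cc_config>
```

The client reads this file at startup, so restart BOINC after saving it (or use the Manager's option to re-read the config file, if your version has one).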
____________

Michael Goetz
Joined: 1 Jan 15
Posts: 18
Credit: 303,916
RAC: 0
Message 643 - Posted: 15 Jan 2015, 15:43:51 UTC

If you want SRBase as your main project, and want PrimeGrid as a "backup" project, don't do 99% and 1%.

Do 100% and 0%.

If PG has 1%, then BOINC will use its internal logic to decide whether PG has received its 1% or not, and that calculation is complex, non-intuitive, and unreliable. On your computer, for some arcane reason, BOINC thinks PG hasn't had its 1% yet.

When a project is set to 0%, that's a special setting that means "Only get work from this project when no work is available from any other project." This is precisely what you're trying to do.

FYI, should you happen to use PRPNet, it works exactly the same way. "0%" indicates a backup port that should only be used when the other ports have no work.

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 654 - Posted: 16 Jan 2015, 3:54:10 UTC

MAJOR UPDATE: :)

Ok... got home from work and still no SRBase w/u's downloaded....

So, I reset my modem and router, and bingo! I was able to download SRbase work units...

I have "Dishnet" or "HughesNet" (satellite internet) because I live so far away from civilization...

I hope this information helps others. I don't know for the life of me why that makes a difference, but it apparently did. After rebooting both, I was immediately able to start crunching SRBase w/u's. :) Go me!

Go AtP!

Neo



Copyright © 2014-2018 BOINC Confederation / rebirther