Posts by PDW
1) Message boards : Number crunching : Long 2 are a joke (Message 5178)
Posted 4 May 2019 by Profile PDW
RUNNING WITH 4 CPUs

had to restart BOINC, not just re-read that app file

So if you had got your app_config.xml file right, would everything have worked as expected when you restarted BOINC earlier?
2) Message boards : Number crunching : Long 2 are a joke (Message 5168)
Posted 4 May 2019 by Profile PDW
See this thread for more info about checkpoints: http://srbase.my-firewall.org/sr5/forum_thread.php?id=1129

Can you post the contents of your app_config.xml file please.
[I know you linked to one you found but want to see what you actually used.]
3) Message boards : Number crunching : Long 2 are a joke (Message 5166)
Posted 3 May 2019 by Profile PDW
I had five long2 running for just over 2 days on my i9-7900X with 5 hours remaining. After reading about allowing "mt", I picked up that app_config.xml and put it into my srbase project directory. I then requested a "read" of the XML. OK, it was read, but no extra CPUs were assigned: each task was still using 1 CPU, and I have 20 CPU threads in total.

I then surmised that I had to stop and start BOINC to make it happen. That did not work either. In fact, it made things MUCH WORSE. It seems there is no checkpointing going on. Those 5 tasks that had run for over 2 days started over at "0". They now show 35 minutes elapsed and 20 hours remaining instead of 2+ days elapsed and 5 hours remaining.

As soon as I post this I will abort them and go back to number fields.

Well, that was a waste. Looking at the log file clearly shows that checkpoints were happening; in fact, the tasks were all over 80% done, and the highest was at 83.98% when you aborted it...

Resuming N+1 prime test of 3656*22^4632494-1 at bit 17147137 [83.00%]
Using FMA3 FFT length 2016K, Pass1=448, Pass2=4608, a = 3
3656*22^4632494-1, bit: 17150000 / 20658303 [83.01%]. Time per bit: 222.012 ms.
3656*22^4632494-1, bit: 17200000 / 20658303 [83.25%]. Time per bit: 12.610 ms.
3656*22^4632494-1, bit: 17250000 / 20658303 [83.50%]. Time per bit: 12.601 ms.
3656*22^4632494-1, bit: 17300000 / 20658303 [83.74%]. Time per bit: 12.709 ms.
3656*22^4632494-1, bit: 17350000 / 20658303 [83.98%]. Time per bit: 12.622 ms.
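If you want to check for yourself how far a task got before it was aborted, you can read the progress lines in the log programmatically. This is a minimal Python sketch. It assumes the task's stderr text is available as a string and keeps exactly the "bit: N / M [P%]" and "Resuming ... at bit N [P%]" wording shown in this log. The function names are hypothetical and not part of LLR or BOINC.

```python
import re

# Progress lines look like:
#   "3656*22^4632494-1, bit: 17150000 / 20658303 [83.01%]. Time per bit: 222.012 ms."
PROGRESS = re.compile(r"bit: (\d+) / (\d+) \[(\d+\.\d+)%\]")
# A restart from a checkpoint is logged as "Resuming ... at bit N [P%]".
RESUME = re.compile(r"Resuming .*? at bit (\d+) \[(\d+\.\d+)%\]")


def latest_progress(stderr_text):
    """Return (bit, total_bits, percent) for the furthest progress line, or None."""
    best = None
    for m in PROGRESS.finditer(stderr_text):
        entry = (int(m.group(1)), int(m.group(2)), float(m.group(3)))
        if best is None or entry[0] > best[0]:
            best = entry
    return best


def resumed_at(stderr_text):
    """Return the bit the most recent checkpoint resume started from, or None."""
    matches = RESUME.findall(stderr_text)
    return int(matches[-1][0]) if matches else None


# Usage on an excerpt of the log above.
sample = """Resuming N+1 prime test of 3656*22^4632494-1 at bit 17147137 [83.00%]
3656*22^4632494-1, bit: 17150000 / 20658303 [83.01%]. Time per bit: 222.012 ms.
3656*22^4632494-1, bit: 17350000 / 20658303 [83.98%]. Time per bit: 12.622 ms."""
print(resumed_at(sample))
print(latest_progress(sample))
```

If `resumed_at` returns a value other than None, the task restarted from a checkpoint rather than from 0.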
4) Message boards : Number crunching : Computation Errors (Message 5108)
Posted 20 Apr 2019 by Profile PDW
That's what I had to do; Conan pointed it out here...

http://srbase.my-firewall.org/sr5/forum_thread.php?id=1072&postid=4773#4773
5) Message boards : Number crunching : Computation Errors (Message 5104)
Posted 19 Apr 2019 by Profile PDW
Hi,

Have you tried using an app_config.xml file to limit the number of running tasks to 1 to start with?

It is unlikely that all your 100-processor machines can run up to 25 tasks at a time :)
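You can generate a minimal app_config.xml for this with Python's standard library. This is a sketch that assumes BOINC's `<app>`/`<max_concurrent>` and `<project_max_concurrent>` elements. The app name below is a hypothetical placeholder. Use the real short name from the `<app>` entries in your client_state.xml, then save the output as app_config.xml in the project directory.

```python
import xml.etree.ElementTree as ET


def limit_tasks_config(max_running=1, app_name=None):
    """Build an app_config.xml that caps how many tasks run at once.

    With app_name, the cap applies to that one application (<max_concurrent>);
    without it, the cap applies to the whole project (<project_max_concurrent>).
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")
    root = ET.Element("app_config")
    if app_name:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_running)
    else:
        ET.SubElement(root, "project_max_concurrent").text = str(max_running)
    return ET.tostring(root, encoding="unicode")


# "srbase_app" is a hypothetical placeholder, not the real SRBase app name.
print(limit_tasks_config(1, "srbase_app"))
print(limit_tasks_config(1))
```

Once a single task runs reliably, raise `max_running` one step at a time.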
6) Message boards : Number crunching : My name isn't in TOP 5000 prime Hall of Fame list. (Message 5017)
Posted 7 Mar 2019 by Profile PDW
Did you create an account at http://primes.utm.edu/primes/status.php?
Did you read the FAQ?
Did you report the prime?

I suspect you didn't do that.
7) Message boards : Number crunching : Error while downloading (Message 4934)
Posted 13 Feb 2019 by Profile PDW
Probably this... http://srbase.my-firewall.org/sr5/forum_thread.php?id=877&postid=4924#4924
8) Message boards : Number crunching : Long 2 are a joke (Message 4910)
Posted 2 Feb 2019 by Profile PDW
Mine work using <cmdline>-t4</cmdline>

Wuprop only counts 1 thread though.



That REALLY sucks... I was going to increase the thread count to 6 until I noticed your comment, and now I really want to reduce it back to 1 and abort all long WUs that don't have a 6-day deadline.

One of my main goals is to complete 100,000 hours per work unit at WuProp.

Reducing the hours counted by 1/3 on longs will mean it might take 4 years to reach 100,000 hours, since WUs here are intermittent and I shut down work in the summer to avoid contributing to global warming with A/C, as our community still uses coal- and natural-gas-fired power plants.

What Conan (and others) posted, saying to add the following line, did make all the hours count for WuProp...

<avg_ncpus>4</avg_ncpus>

Just make the number the same as your <cmdline>-t4</cmdline> number.
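Because the two numbers have to agree, it can help to generate both from a single value. The Python sketch below assumes the multi-threaded app version uses plan class "mt" and that `-t` is the thread-count switch, as in the `-t4` example above. The app name is a hypothetical placeholder, so replace it with the name shown in your client_state.xml.

```python
import xml.etree.ElementTree as ET


def mt_app_version_config(app_name, threads, plan_class="mt"):
    """Build an app_config.xml whose <cmdline> -tN and <avg_ncpus> always match."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus: how many CPUs each task counts as using for BOINC's scheduling.
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    # cmdline: passed to the app; -tN is the thread-count option (assumed, as in -t4).
    ET.SubElement(version, "cmdline").text = f"-t{threads}"
    return ET.tostring(root, encoding="unicode")


# "srbase_app" is a hypothetical placeholder, not the real SRBase app name.
print(mt_app_version_config("srbase_app", 4))
```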
9) Message boards : Number crunching : SR-base overheat !? (Message 4890)
Posted 21 Jan 2019 by Profile PDW
SRBase is hot, best run in the cold winter months to keep you warm.

If your cooling is struggling, start with 1 thread for SRBase and increase the number of threads running SRBase until you reach a heat level you are comfortable maintaining 24/7.
10) Message boards : Number crunching : Long 2 are a joke (Message 4766)
Posted 21 Nov 2018 by Profile PDW
Mine work using <cmdline>-t4</cmdline>

Wuprop only counts 1 thread though.
11) Message boards : Number crunching : Long 2 are a joke (Message 4761)
Posted 19 Nov 2018 by Profile PDW
It is in the FAQ thread here http://srbase.my-firewall.org/sr5/forum_thread.php?id=6
12) Message boards : Number crunching : Cross-project ID problem (Message 4301)
Posted 11 Apr 2018 by Profile PDW
Has it always been split, or did it start recently?
If only recently, did you add another project on only one of your PCs?

Perhaps look here https://boinc.berkeley.edu/dev/forum_thread.php?id=8341 to see if anything there is helpful!

This is a BOINC issue; the admin here will not be able to fix it.
13) Message boards : Number crunching : WU reporting time delayed (Message 4300)
Posted 11 Apr 2018 by Profile PDW
What about this one http://srbase.my-firewall.org/sr5/workunit.php?wuid=309015299

It was completed and validated on 10 Apr 2018, 18:01:16 UTC. Wouldn't that normally mean the task sent to my machine would be cancelled?
14) Message boards : Number crunching : Cross-project ID problem (Message 4298)
Posted 11 Apr 2018 by Profile PDW
Have you checked that you are using the same email address for SRBase as for the other projects?
15) Message boards : Number crunching : WU reporting time delayed (Message 4296)
Posted 10 Apr 2018 by Profile PDW
Yes, that's odd and I don't know how this happened. If your result was reported before the deadline then it should be valid, but you were one day too late.

Okay, no worries, lesson learnt, I will abort any tasks that won't complete before the deadline rather than let them run and waste resources.
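You can make the decision about which tasks to abort with a rough estimate from the LLR progress counters. This is a minimal Python sketch. It assumes the time per bit stays roughly constant for the rest of the run and that you read the deadline from the task's details. The function names and the example numbers are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def remaining_time(bit, total_bits, ms_per_bit):
    """Estimate time left, assuming a constant time per bit for the remaining bits."""
    return timedelta(milliseconds=(total_bits - bit) * ms_per_bit)


def should_abort(now, remaining, deadline):
    """True if the task is expected to finish after the server deadline."""
    return now + remaining > deadline


# Hypothetical example values, not taken from a real task.
deadline = datetime(2018, 4, 10, 12, 0, tzinfo=timezone.utc)
now = datetime(2018, 4, 10, 10, 0, tzinfo=timezone.utc)
left = remaining_time(1_000_000, 2_000_000, 10.0)
print(left, should_abort(now, left, deadline))
```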
16) Message boards : Number crunching : WU reporting time delayed (Message 4292)
Posted 10 Apr 2018 by Profile PDW
http://srbase.my-firewall.org/sr5/workunit.php?wuid=309026922

I started this WU yesterday shortly before the server deadline [9 Apr 2018, 15:38:05 UTC]. I checked again on the server after the deadline and the other computer had still not reported a result so I decided to let it run.

This morning I checked and found that mine had reported in but was marked as "Completed, too late to validate". I assume this is because the other computer has now reported its result, yet that result has a report date of 2 Apr 2018, 9:56:40 UTC. A result from a week ago has appeared overnight!

How can this happen ?
17) Message boards : Number crunching : Work available?? (Message 4168)
Posted 1 Feb 2018 by Profile PDW
I blacklisted the host some days ago; I can see all the files which are bad and need to be rerun.

I understand the action and remediation you have taken for that host.
My concern is whether it can happen again with another host?
18) Message boards : Number crunching : Work available?? (Message 4166)
Posted 1 Feb 2018 by Profile PDW
i want more work xD


A new host created some days ago reported a lot of errors (around 100k results are useless), and they need to be rerun soon. The validator can't intercept these bad results due to the OS which is running there. I have now put it on the blacklist.


Stumbled here because of that host, and its user "abcman", and the job logs.

It looks like they created a script to create fake XML result files. The "called boinc_finish" line was missing the return code 0 indicator that a real result has.

Hopefully you can find a way to check results a little better before that user changes their host ID and gets back in for another batch. They earned 350 GRC in 36 hours off of this hole.

So if the script were to add the return code 0 indicator, would they not get spotted at all, or would they eventually get caught in a spot check?
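A check that only looks for the success line in the stderr output is easy to write, and just as easy to satisfy. This Python sketch assumes the BOINC API's usual "called boinc_finish(<status>)" line in stderr, and the function names are hypothetical. A forged file that includes the line passes the check. Catching such results would presumably need something that verifies the computation itself, such as comparing against another host's result for the same workunit.

```python
import re

# The BOINC API normally logs a line like "16:32:10 (12345): called boinc_finish(0)".
FINISH = re.compile(r"called boinc_finish\((-?\d+)\)")


def finish_status(stderr_text):
    """Return the status of the last logged boinc_finish call, or None if absent."""
    statuses = FINISH.findall(stderr_text)
    return int(statuses[-1]) if statuses else None


def looks_successful(stderr_text):
    """Text-only check: True when the last logged boinc_finish status is 0."""
    return finish_status(stderr_text) == 0


genuine = "16:32:10 (12345): called boinc_finish(0)\n"
missing = "<stderr_txt>\n</stderr_txt>\n"        # hypothetical fake without the line
forged = missing + "called boinc_finish(0)\n"     # a script only has to add one line
print(looks_successful(genuine), looks_successful(missing), looks_successful(forged))
```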

Is Gridcoin planning on blacklisting the project until this is fixed, or will it still allow GRC to be paid for not doing the work?






Copyright © 2014-2020 BOINC Confederation / rebirther