Posts by marsinph
1) Message boards : Number crunching : Is SRBase actually utilizing both GPU's? (Message 7209)
Posted 7 Jan 2021 by Profile marsinph
Hello,
I have two GPUs.
WUs run on both: two WUs at the same time, one on device 0 and a second on device 1!
2) Message boards : Number crunching : All my GPU tasks end in error (Message 6859)
Posted 25 Oct 2020 by Profile marsinph
Hello,
Everything was running fine, but for the past few hours all my TF0.12 tasks have ended after a few seconds, always with the same error: ERROR: cudaGetLastError() returned 77: an illegal memory access was encountered

See:
http://srbase.my-firewall.org/sr5/result.php?resultid=22021649

More WUs here:
http://srbase.my-firewall.org/sr5/results.php?hostid=110412&offset=0&show_names=0&state=6&appid=

The host runs only SRBase, nothing else!
A project reset changes nothing!
3) Message boards : Number crunching : Sudden credit loss (Message 5710)
Posted 23 Feb 2020 by Profile marsinph
Hello Reb,
Thank you very much.
I am sure you will do it (when you have time).
You have only two hands, one brain... and 24 hours a day.

My team (BOINC.BE) and I appreciate it very much.

In the past I was (very) disappointed, but that is in the past. I am sure you remember me!?

After all you have done in the last hours, only congratulations and respect.
Do not forget to take a break: rest, sleep a little.
I cannot imagine the state of your brain after these two days.

Best support from Belgium.

For information: I have started running the WUs called "S211_retest_xxx" again. No problem.
I have one strange WU, but I am letting it run to the end: R31_retest_wu_777_0
After 1 day 2 hours, stderr.txt shows 46.27% and about 20 ms per bit
on host 100501 (i7-2600K, 3.6 GHz overclocked to 4.20 GHz).
Not important. I will let it finish, knowing that such a WU will give 11.000 CR.
Ridiculous considering the running time, but perhaps it will help you.
I have unhidden my hosts.
4) Message boards : Number crunching : Sudden credit loss (Message 5700)
Posted 23 Feb 2020 by Profile marsinph
Hello Reb,
Thank you very much, my CR are restored. Well done!

I know you have more urgent things to do.
But the CR of my team (summed over its members) do not match the team stats.
Total team stats: 4.763.557 (23 Feb, 13:50 UTC)
But summed individually: 4.539K (my own CR, user 1791) + 25K (user 2087) + 119K (user 2088) + 306K (user 2039) + 225K (user 2084)
In total it should be 5214K


http://srbase.my-firewall.org/sr5/team_members.php?teamid=167&offset=0&sort_by=expavg_credit

So when you have time, can you check?
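
The mismatch described in this post can be checked with a few lines of arithmetic. The sketch below is only an illustration: the per-user figures are the rounded values quoted in the post (in thousands of credits), and the variable names are made up for the example.

```python
# Per-user credit in thousands (K), as quoted in the post (rounded values).
member_credits_k = {1791: 4539, 2087: 25, 2088: 119, 2039: 306, 2084: 225}

# Team total shown on the team stats page (23 Feb, 13:50 UTC).
team_total_reported = 4_763_557

members_sum = sum(member_credits_k.values()) * 1000
gap = members_sum - team_total_reported

print(f"sum of members: {members_sum}")   # 5214000
print(f"missing from team total: {gap}")  # 450443
```

Because the member figures are rounded to the nearest thousand, the gap is only approximate.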
5) Message boards : Number crunching : Sudden credit loss (Message 5686)
Posted 23 Feb 2020 by Profile marsinph
Like everyone, I lost a lot of CR (869K!)
https://www.boincstats.com/stats/157/user/detail/1791/lastDays

Thank you, Reb, for informing us. I am sure you are doing your best and much more!

I hope you can restore the lost CR, because the loss also changes all the stats and the competitions between teams.

Good luck.
6) Message boards : News : Formula BOINC Sprint 07.06.2019 04:00 (UTC) - 10.06.2019 03:59 (UTC) (Message 5276)
Posted 10 Jun 2019 by Profile marsinph
Hello Rebirther,
I think the low participation is due to the WUs.
Do not forget that at the start there were only "long" and "long2" WUs, and it was almost impossible
to crunch more than one series.
Also, users who do not look in the slot directory but only at BoincManager think that the WUs are "frozen".
There is also a serious bottleneck when using multiple cores, which makes the "time per bit" explode.
These two factors alone can explain why there were so few returning participants.

A big thx to all who have participated. We have made a good process but was behind my expectations.
7) Message boards : News : Formula BOINC Sprint 07.06.2019 04:00 (UTC) - 10.06.2019 03:59 (UTC) (Message 5273)
Posted 7 Jun 2019 by Profile marsinph
Hello Mmonnin,
Thank you.
In fact I read the post you suggest a long time ago.
Reducing cores.
It is a real benefit.
But now the project has again released another series of WUs.

Like me, you pay attention to the Sprint.
I crunch very long WUs here. But in the Sprint it is completely impossible to get an estimate. On the same host, with the same app (Sierpinski Base short 0.22, S163-100-150K),
some WUs run in a few minutes, some take hours!!! I never consider the BOINC time. I only look at stderr.txt in the slot directory and at the "Time per bit",
the only trustworthy reference.
What do I see?
The short WUs have a longer deadline than the very long ones!
As a result, sorry to everyone, all my "long" and "long2" WUs are cancelled.
Average2: deadline 4 days but "time per bit" 20 ms
The new ones: deadline 5 days and time per bit about 2 ms (ten times less)
I repeat: all on the same host.
Where is the project's logic?

In any case, Mmonnin, thank you for your help here and on several other projects.
Best and friendly regards

Philippe
8) Message boards : News : Formula BOINC Sprint 07.06.2019 04:00 (UTC) - 10.06.2019 03:59 (UTC) (Message 5252)
Posted 6 Jun 2019 by Profile marsinph
Hello Rebirther,
Thank you. I think most of us know about the Sprint.
I got some long2 4-4.5M WUs. Estimated running time: 22 hours.
I know that is anything but accurate.
The only accurate figure is in the slot directory: the "time per bit".
After 8 hours running: 5.15% done.
Time per bit about 45 ms!!!
WU name: R22_4-4.5M_wu_4949_4

That is with eight WUs on an i7-2600K with eight cores.
At such speed, those WUs will never finish before the end of the Sprint.
5% = 8 h, so 1% = 1.6 h and 100% = 160 hours = about 6.7 days!!!
The Sprint lasts 3 days.
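
The extrapolation above is a simple proportion. A minimal sketch, assuming progress is linear in time (the varying time per bit in stderr.txt suggests this is only roughly true); the function name is hypothetical:

```python
def extrapolate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation of the total run time from the progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# Figures from the post, with progress rounded down to 5%.
total = extrapolate_total_hours(8.0, 5.0)
print(f"{total:.0f} h = {total / 24:.1f} days")  # 160 h = 6.7 days
```

With the unrounded 5.15% the estimate drops to roughly 155 hours, still far beyond a 3-day sprint.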


Reducing from 8 simultaneous WUs to 4 decreases the time per bit by about 30%.
I now understand why everyone cancels such WUs, also considering that about 25% end in "error while computing".

Reducing to two WUs at a time brings it to about 15 ms. But considering the amount of work
to crunch, they will also be too late for the Sprint. And that is hoping for no "error while computing".

It is the same for long. On exactly the same host: time per bit 37 ms.
They will also run out of time.

And all those WUs are announced by the project with an estimated computation size of 80 GFLOPS (eighty).
To compare, Asteroid: 1,380,023 GFLOPS (seventeen thousand times more), average running time 2h30.

So why do you release only "monster" WUs for a sprint?
I understand you have a lot of old WUs waiting for results.
I understand you want to empty your queue as much as possible.
But then why produce such monster WUs?

It is only my opinion. In normal times (outside the Sprint), OK.
Not during a sprint.

I will let four simultaneous long2 WUs finish on one of my hosts and wait for the result.
I know the credit is very big. But considering the error rate and the delay, why crunch such WUs?

Waiting for the production of normal WUs,
Best regards

08:41:01 (3628): wrapper (7.5.26012): starting
08:41:01 (3628): wrapper: running llr.exe ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
Base factorized as : 2*11
Base prime factor(s) taken : 11
Starting N+1 prime test of 3656*22^4348766-1
Using AVX FFT length 1920K, Pass1=768, Pass2=2560, a = 3

3656*22^4348766-1, bit: 50000 / 19393037 [0.25%]. Time per bit: 39.327 ms.
3656*22^4348766-1, bit: 100000 / 19393037 [0.51%]. Time per bit: 33.596 ms.
3656*22^4348766-1, bit: 150000 / 19393037 [0.77%]. Time per bit: 34.878 ms.
3656*22^4348766-1, bit: 200000 / 19393037 [1.03%]. Time per bit: 35.968 ms.
3656*22^4348766-1, bit: 250000 / 19393037 [1.28%]. Time per bit: 36.125 ms.
3656*22^4348766-1, bit: 300000 / 19393037 [1.54%]. Time per bit: 38.167 ms.
3656*22^4348766-1, bit: 350000 / 19393037 [1.80%]. Time per bit: 42.293 ms.
3656*22^4348766-1, bit: 400000 / 19393037 [2.06%]. Time per bit: 42.970 ms.
3656*22^4348766-1, bit: 450000 / 19393037 [2.32%]. Time per bit: 43.005 ms.
3656*22^4348766-1, bit: 500000 / 19393037 [2.57%]. Time per bit: 42.257 ms.
3656*22^4348766-1, bit: 550000 / 19393037 [2.83%]. Time per bit: 26.694 ms.
3656*22^4348766-1, bit: 600000 / 19393037 [3.09%]. Time per bit: 24.258 ms.
3656*22^4348766-1, bit: 650000 / 19393037 [3.35%]. Time per bit: 24.116 ms.
3656*22^4348766-1, bit: 700000 / 19393037 [3.60%]. Time per bit: 25.888 ms.
3656*22^4348766-1, bit: 750000 / 19393037 [3.86%]. Time per bit: 42.601 ms.
3656*22^4348766-1, bit: 800000 / 19393037 [4.12%]. Time per bit: 42.885 ms.
3656*22^4348766-1, bit: 850000 / 19393037 [4.38%]. Time per bit: 41.926 ms.
3656*22^4348766-1, bit: 900000 / 19393037 [4.64%]. Time per bit: 44.576 ms.
3656*22^4348766-1, bit: 950000 / 19393037 [4.89%]. Time per bit: 46.705 ms.
3656*22^4348766-1, bit: 1000000 / 19393037 [5.15%]. Time per bit: 47.208 ms.
9) Message boards : Number crunching : Estimated running time / size given by project vs reality ! (Message 4942)
Posted 15 Feb 2019 by Profile marsinph
Other possibilities: when overclocking, error correction in RAM slows the process down, so overclocked system RAM or L3 cache could actually slow work down; step your FSB down 1 or 2 points and see if things speed up.

If running WU on 1 or 2 cores then the CPU can use burst mode to higher frequencies to get faster results. When all cores are in use then burst mode can't be used for all available wattage is in use and thermal limits are reached.



Hello Marmot,
I was thinking of something like what you explain.
It is entirely possible. The CPU waits too long for results from the process,
so the L3 cache is full.


About FSB and RAM speed, everything is matched.
A silly counter-example:
RAM at 1600, CPU base clock at 100 and FSB at 1333!!!
A nice way to slow down a machine!
Or using only one big RAM module instead of two: again a nice way to slow things down, because dual channel is inactive.

And so on and so on...

But I will check again very carefully.

With two simultaneous WUs (and I repeat, nothing else is running; when I say nothing, I really mean nothing except Win7),
the time per bit goes to about 2 ms instead of 4.
But the overall running time is not divided by 2, only by about 1.5.

And once again, I repeat that I am comparing exactly the same app on the same host.
10) Message boards : Number crunching : Estimated running time / size given by project vs reality ! (Message 4938)
Posted 14 Feb 2019 by Profile marsinph
It looks like your CPU is overheated and throttled.

(ms / 1000) * iterations total = seconds total

I think my runtime had something with 1,4ms not 4,x



Hello,
It is true that my CPU (i7-2600K) runs a little hot: 70°C.
But you wrote a few weeks ago that 70°C is no problem and that danger starts at 80°C.
On an eight-core CPU, I run only four WUs at the same time.
And no other project, no other software. Only BOINC.

I have tested with only one WU. It goes down to 2.8 ms. Not a test of a few minutes: one hour!
CPU load: 12%, temperature: 35°C


Nothing changes.

I will try to capture the details of the WU before it finishes (R531_250-300K_a_wu_124707)

I also see there is no checkpoint in BoincReport (well, in the slot directory)



Sorry if this is confused, but being 95% blind I need to use aids to "see".
I use another host to write, not the host that crunches.

See also R351_250-300k_a_wu_124754_0
and ...........124755_0
I am letting these two WUs finish alone.

Then I will move on. I will not monopolize one host to run one task every two hours. Sorry.
11) Message boards : Number crunching : Estimated running time / size given by project vs reality ! (Message 4936)
Posted 14 Feb 2019 by Profile marsinph
Hello,
A few questions:


What (or who) estimates the computation size (R351_250_300K_....)?
Windows gives 80 GFLOPS. My host has 3.8 GFLOPS.

According to the post, the estimated running time is 8min-1h20 on AVX at 3.8 GHz.
I run at 4.2 GHz (3.4 GHz nominal, overclocked to 4.2 GHz). So normally I would need less than 1h21!
After 2h20 of running, it is still at 96% in stderr.txt!
It is not frozen either. It increases at each checkpoint.

Also, at download the estimated running time was 00:16:33 (before start).
I do not consider the remaining time given by Windows, which is very inaccurate.
I only check and estimate from stderr.txt.
See the WU:
http://srbase.my-firewall.org/sr5/workunit.php?wuid=358042596

But also http://srbase.my-firewall.org/sr5/show_host_detail.php?hostid=100501
which takes a little less than 4 (four) hours.

What am I doing wrong?
12) Message boards : Number crunching : SR-base overheat !? (Message 4894)
Posted 22 Jan 2019 by Profile marsinph
Hello Luigi,
I have already tried your proposal; it does not help very much!
I let only three WUs run on 8 cores.
Nothing else is running. This host is ONLY for crunching.
With three WUs running, it goes immediately to 70°C!!!
And I repeat, it is not a problem of cooling failure.
13) Message boards : Number crunching : SR-base overheat !? (Message 4893)
Posted 22 Jan 2019 by Profile marsinph
SRBase is hot, best run in the cold winter months to keep you warm.

If your cooling is struggling, start with 1 thread for SRBase and increase the number of threads running SRBase until you get to a heat level you are comfortable to maintain 24/7.



Hello?
I will not move to the Arctic or the Antarctic because SR takes so much!
Joke, of course.
14) Message boards : Number crunching : SR-base overheat !? (Message 4888)
Posted 21 Jan 2019 by Profile marsinph
Hello,
I7-2600K Win7x64 8Cores
CPU TDP 95W
Cooler TDP 250W
Win7 X64
2x8 GB RAM
BAM set to use 75% of CPU time. Also on the project.
The host (100501, visible to the admin) has only OS/Java/Framework, nothing else.
This host is only used to crunch.

When I run SR with other projects, the CPU overheats (72°C!!!)
I was thinking of some incompatibility and/or the wrapper using more than normal.

So I ran SR completely alone, with 8 WUs.
The same, regardless of the kind of WU!!! (short, long, avg, ...)

Does anyone see the same? Or have an idea?
It happens only with SR. Sometimes with NFS when there are 4 projects running.
But never this bad.
15) Message boards : Number crunching : The recent loaded WU (Message 4660)
Posted 9 Oct 2018 by Profile marsinph
Hello,
I followed your calculation as described in the FAQ.
It does not correspond at all with the estimated running time!
host 104769 task S370_200-250k_WU_12161_0
(not yet returned; I guess about 15:15 UTC)



Situation after 38 minutes of running:

15:58:07 (8244): wrapper (7.5.26012): starting
15:58:07 (8244): wrapper: running llr.exe ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
Base factorized as : 2*5*73
Base prime factor(s) taken : 73
Starting N-1 prime test of 85*730^214335+1
Using all-complex AVX FFT length 240K, Pass1=1280, Pass2=192, a = 3

85*730^214335+1, bit: 50000 / 2038699 [2.45%]. Time per bit: 2.969 ms.
85*730^214335+1, bit: 100000 / 2038699 [4.90%]. Time per bit: 2.878 ms.
85*730^214335+1, bit: 150000 / 2038699 [7.35%]. Time per bit: 2.993 ms.
85*730^214335+1, bit: 200000 / 2038699 [9.81%]. Time per bit: 2.895 ms.
85*730^214335+1, bit: 250000 / 2038699 [12.26%]. Time per bit: 2.697 ms.
85*730^214335+1, bit: 300000 / 2038699 [14.71%]. Time per bit: 2.405 ms.
85*730^214335+1, bit: 350000 / 2038699 [17.16%]. Time per bit: 2.154 ms.
85*730^214335+1, bit: 400000 / 2038699 [19.62%]. Time per bit: 1.927 ms.
85*730^214335+1, bit: 450000 / 2038699 [22.07%]. Time per bit: 1.928 ms.
85*730^214335+1, bit: 500000 / 2038699 [24.52%]. Time per bit: 1.892 ms.
85*730^214335+1, bit: 550000 / 2038699 [26.97%]. Time per bit: 1.957 ms.
85*730^214335+1, bit: 600000 / 2038699 [29.43%]. Time per bit: 1.883 ms.
85*730^214335+1, bit: 650000 / 2038699 [31.88%]. Time per bit: 1.891 ms.
85*730^214335+1, bit: 700000 / 2038699 [34.33%]. Time per bit: 1.981 ms.
85*730^214335+1, bit: 750000 / 2038699 [36.78%]. Time per bit: 1.919 ms.
85*730^214335+1, bit: 800000 / 2038699 [39.24%]. Time per bit: 1.950 ms.
85*730^214335+1, bit: 850000 / 2038699 [41.69%]. Time per bit: 1.971 ms.
85*730^214335+1, bit: 900000 / 2038699 [44.14%]. Time per bit: 1.913 ms.
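
The estimate quoted elsewhere in these threads, (ms / 1000) * iterations = seconds, can be applied to the stderr.txt lines directly. A rough sketch, assuming the progress lines always have the format shown above and that the recent time per bit stays representative; the regular expression and the function name are illustrative and not part of LLR or BOINC:

```python
import re

# Matches e.g. "bit: 900000 / 2038699 [44.14%]. Time per bit: 1.913 ms."
LINE_RE = re.compile(r"bit: (\d+) / (\d+) \[[\d.]+%\]\. Time per bit: ([\d.]+) ms")


def remaining_seconds(log_text: str, window: int = 5) -> float:
    """Estimate remaining run time from the last `window` progress lines."""
    rows = [(int(d), int(t), float(ms)) for d, t, ms in LINE_RE.findall(log_text)]
    if not rows:
        raise ValueError("no progress lines found")
    done, total, _ = rows[-1]
    recent = [ms for _, _, ms in rows[-window:]]
    avg_ms = sum(recent) / len(recent)
    # (ms / 1000) * remaining iterations = remaining seconds
    return (total - done) * avg_ms / 1000.0


sample = """\
85*730^214335+1, bit: 850000 / 2038699 [41.69%]. Time per bit: 1.971 ms.
85*730^214335+1, bit: 900000 / 2038699 [44.14%]. Time per bit: 1.913 ms.
"""
print(f"about {remaining_seconds(sample) / 60:.0f} minutes left")
```

Averaging over a few recent lines instead of the whole log is a deliberate choice here, since the time per bit in the log above drops from about 3 ms to about 1.9 ms during the run.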
16) Message boards : Number crunching : The recent loaded WU (Message 4649)
Posted 7 Oct 2018 by Profile marsinph
Hello,
Considering the Admin's post,
the new WUs (S370 and 716) need about 10 minutes to run (on a 3.8 GHz CPU).
On my host 104672, I have WUs that have already been running for 3 hours.
After a restart, they started again from zero because there is no checkpoint.
That is not the problem; we know it.

The problem is that WUs are stuck at 100%!!!
One of the WUs is at 100% after 15 minutes of running, and the others, started at the same time, are at 20%!!!

Who can explain???
Best regards
17) Message boards : Number crunching : Why I not find my team ??? (Message 4287)
Posted 28 Mar 2018 by Profile marsinph
I am the owner of BOINC.BE, with SETI@HOME as main project.
cross project ID e7ff5d12115a943832d094c3a4fd0335
Our team participates in a lot of projects.
We are the first Belgian team, 113th in the world in combined BOINC.
But here, when I search for a team filtered on Belgium, it does not appear.
It is OK on all other projects!
What is happening?






Copyright © 2014-2024 BOINC Confederation / rebirther