Posts by Michael Goetz

1) Message boards : Cafe : Rebirther is currently in hospital. (Message 9915)
Posted 17 Apr 2024 by

Michael Goetz

Wishing you the best. Making a full recovery is the most important thing. All this will still be here when you get better.

Hurry up and get well!

2) Message boards : News : server outage / db crash again - 2nd (Message 7034)
Posted 29 Nov 2020 by

Michael Goetz

The MLC@Home BOINC server is literally running on a Raspberry Pi.

Have you considered setting up a database replication server to protect against problems like this? If he can run a whole server on a Raspberry Pi, you might be able to run a database replica. It might be worth looking into as a low cost reliability upgrade.

3) Message boards : Number crunching : Long 2 are a joke (Message 5170)
Posted 4 May 2019 by

Michael Goetz

Bottom line is:

Ignore the percentage done.

Ignore the time estimates. Since the percent done is wrong, the estimates are useless.

Checkpointing works. Absolutely, Positively. You can trust it.
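Why a wrong percentage ruins the time estimate: if the remaining time is extrapolated from the fraction done, any error in the percentage flows straight into the estimate. The sketch below assumes a purely linear extrapolation. BOINC's real estimator blends in other information, so treat this as a simplified model and not the client's actual code.

```python
from __future__ import annotations


def naive_remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Simplified dynamic estimate (assumption: progress is linear in time).

    total = elapsed / fraction_done, so remaining = total - elapsed.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Hypothetical task: truly 50% done after 24 hours, but reported as only 10% done.
elapsed = 24 * 3600.0
true_estimate = naive_remaining_seconds(elapsed, 0.50)      # about 24 more hours
reported_estimate = naive_remaining_seconds(elapsed, 0.10)  # about 216 more hours
print(true_estimate / 3600, reported_estimate / 3600)
```

With the percentage understated by a factor of five, the "time remaining" comes out roughly nine times too long, even though checkpointing and the actual computation are fine.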

4) Message boards : Number crunching : Long 2 are a joke (Message 5146)
Posted 24 Apr 2019 by

Michael Goetz

Longs really shouldn't be run on a 2-core machine, which would generally be 8-year-old Core 2 Duos, so excluding them would be setting an appropriate boundary.

Most modern Core i3 CPUs are 2c/4t.

All but the highest-end laptops tend to be 2c/4t. Most mobile Core i5 CPUs are 2c/4t, unlike their desktop counterparts. Unless I'm mistaken, many mobile Core i7s are also 2c/4t.

You don't have to go back 8 years to find dual core CPUs in common use.

When I said Core2 Duo, the implication was 2c/2t; to be specific, I meant 2c/2t machines. The benchmarks of 6th- to 9th-gen 2c/4t CPUs are much greater than those of the Core2 Duos, and they can do the work of a long1 or long2, slowly.
BOINC will see 4 threads unless an owner actually turned off hyperthreading.

I wouldn't want to be doing long SRBase work units on even a modern 2c machine with hyperthreading turned off and no available processing power to even use Chromium. LLR is pretty aggressive even when set to idle priority.

BTW, only the 1st-gen i7 600 series were 2c/4t; all later ones have been 4+ real cores. 2nd, 3rd, 4th and 5th gen i3/i5s had quite a lot of low-power 2-core parts sold (desktop i5s were 4-core); 6th gen and later (after Jan 2015) are 2/4, 4 and 6.

We'll just have to agree to disagree. There's a lot in what you said that I completely disagree with.

5) Message boards : Number crunching : Long 2 are a joke (Message 5139)
Posted 22 Apr 2019 by

Michael Goetz

Longs really shouldn't be run on a 2-core machine, which would generally be 8-year-old Core 2 Duos, so excluding them would be setting an appropriate boundary.

Most modern Core i3 CPUs are 2c/4t.

All but the highest-end laptops tend to be 2c/4t. Most mobile Core i5 CPUs are 2c/4t, unlike their desktop counterparts. Unless I'm mistaken, many mobile Core i7s are also 2c/4t.

You don't have to go back 8 years to find dual core CPUs in common use.

6) Message boards : Number crunching : Long 2 are a joke (Message 5135)
Posted 21 Apr 2019 by

Michael Goetz

When I tried it, CPU temperatures dropped dramatically, indicating that only one core was being used.

I noticed this also: there appears to be a 1- to 4-minute delay (depending on the computer's stats) during which long1 and long2 tasks run on a single core while the data set is being initialized. (I haven't observed this behavior on all WUs, but there's no noticeable lag on shorts.)

Just mentioning this so that anyone else who may be new to SRBase multithreading knows about this lag and doesn't conclude they've made a mistake after seeing single-core execution for 2 minutes.

That was because I wasn't doing it correctly. When done right, it does work.

7) Message boards : Number crunching : Long 2 are a joke (Message 5128)
Posted 20 Apr 2019 by

Michael Goetz

1) The discussion in this thread about multi-threading confuses me. Does multithreading even work? If so, then how? At PrimeGrid, our wrapper is modified to specifically pass the "-t#" parameter to LLR, but in doing so it has to convert "-t#" to "-t #". Unless the wrapper here passes the parameter to LLR, and converts it to the correct format, I don't see how you could actually get multithreading to work. When I tried it, CPU temperatures dropped dramatically, indicating that only one core was being used.

It works the same way as it does at PrimeGrid.

With -t8, my i7-5820K needs about 11-12 hours for one task.

I stand corrected. PEBKAC

MT does indeed work.

8) Message boards : Number crunching : Long 2 are a joke (Message 5126)
Posted 20 Apr 2019 by

Michael Goetz

I'm running some long2 and long3 tasks right now, on a non-AVX computer. The long2 tasks will take about 11 days to run. They also were queued up for a few days, so they'll finish about 4 days after the deadline. Starting them late is my fault.

There are 3 different things I'd like to mention, however:

1) The discussion in this thread about multi-threading confuses me. Does multithreading even work? If so, then how? At PrimeGrid, our wrapper is modified to specifically pass the "-t#" parameter to LLR, but in doing so it has to convert "-t#" to "-t #". Unless the wrapper here passes the parameter to LLR, and converts it to the correct format, I don't see how you could actually get multithreading to work. When I tried it, CPU temperatures dropped dramatically, indicating that only one core was being used.

2) Looking in the in-progress stderr.txt file, it's obvious that the BOINC manager is not reporting the correct progress done amount.

3) Maybe the deadline should be long enough to let computers without AVX to finish?
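Point 1 describes a wrapper that has to turn the "-t#" form into "-t #" before handing it to LLR. A minimal sketch of that argument rewrite is below. The function name split_thread_flag and the "input.txt" argument are hypothetical; this is not PrimeGrid's actual wrapper code, just an illustration of the conversion described above.

```python
from __future__ import annotations

import re


def split_thread_flag(args: list[str]) -> list[str]:
    """Hypothetical helper: rewrite a fused "-t8" into the separate "-t", "8" tokens.

    Any argument that is not exactly "-t" followed by digits is passed through unchanged.
    """
    out: list[str] = []
    for arg in args:
        match = re.fullmatch(r"-t(\d+)", arg)
        if match:
            out.extend(["-t", match.group(1)])
        else:
            out.append(arg)
    return out


print(split_thread_flag(["-t4", "input.txt"]))
```

If nothing performs this conversion, the thread count never reaches LLR in the form it expects, which would be consistent with the single-core behavior (and lower CPU temperatures) described above.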

9) Message boards : News : base S409 solved (Message 1654)
Posted 13 Jul 2015 by

Michael Goetz

Congratulations!

10) Message boards : Number crunching : Overall project progress? (Message 853)
Posted 5 Feb 2015 by

Michael Goetz

Hi,

Is there some rough estimate of how much of the whole project we have calculated so far

No.

...and how long it still might run?

The definitive, best, and only possible answer to that question is "Sometime between right now and the end of time." :)

These conjecture projects are of indeterminate length and there's no way to estimate how long they will take. They can *literally* take forever if a conjecture is false.

Just looking at base 2 -- and only base 2 -- you have two projects that have been running a decade or more -- Seventeen or Bust and RieselSieve (now called "The Riesel Problem" on PrimeGrid).

R2 is in the n=7,000,000 vicinity and S2 is at n=29,000,000. A single S2 task at the lower n=27,000,000 range where PrimeGrid is currently searching, on a *VERY* fast computer, takes about 4 days.

That's just base two.

Chances are anyone currently asking the question today won't live long enough to see the completion. And, of course, if any of the 2000+ conjectures comprising this project are false, the project length is effectively infinite and the project will never be able to prove all of the conjectures.

11) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 788)
Posted 28 Jan 2015 by

Michael Goetz

Have you ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, the assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised if there is no difference compared to SSE4, because it is the exact same code that has been running. Who wrote the Microsoft compiler, actually?

We could argue pointlessly about this for eternity, with each of us thinking the other is completely clueless, and nobody's mind is going to be changed.

What's not up for argument is that the programmers involved (most notably George Woltman and Yves Gallot) in optimizing their respective software have been unable to get AMD CPUs to provide AVX-class performance, and it's certainly not for lack of trying. It's in everyone's interest to get more performance out of AMD CPUs.

If you're so certain that the problem is in some compiler error (or conspiracy theory), hey -- it's all open source software. Go grab the source code and fix the problem. You'll be a hero.

Don't have the skills to do that, or are unwilling to put in the time? Well, the people who do have the skills and have been putting in the time and effort (for more than a decade), all say it's impossible because of the way AMD implemented AVX. Do you truly believe you're correct and everyone who has actually been working on this is wrong? And perhaps more importantly, do you think you're going to convince anyone else that you're right?

12) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 782)
Posted 28 Jan 2015 by

Michael Goetz

Nothing wrong with the instruction set of the Athlon 5350...it includes AVX
(but when the application is compiled using an Intel compiler it may not be able to use it because the Intel compiler checks on vendor string instead of capabilities).

Part of that is technically true -- AMD does have AVX (and AVX2) instructions. However, their implementation is substantially inferior to Intel's, and in practice using AVX instructions on AMD doesn't speed up LLR. The important part of LLR is written in assembly language, so the "Intel compiler conspiracy" is simply incorrect. Also, we don't use Intel compilers.

In more detail, the problem with AMD's implementation is that they have a single AVX ALU for each pair of CPU cores. If you're running LLR on all cores, effectively the speed of AVX instructions is therefore cut in half. The effect on LLR (and similar programs) is dramatic, and the result is that primality testing has, for all intents and purposes, become an Intel dominated proposition.

For what it's worth, for the latest CPU version of the Genefer program (it's not used here, but we use it at PrimeGrid), we used the Microsoft Visual Studio compiler suite to produce AMD-specific FMA4 (AVX2) builds hoping to boost the performance on AMD. It didn't help. The AMD-FMA4 version of Genefer runs at about the same speed as the SSE4 version. AVX on AMD CPUs is useless for our purposes, unfortunately.

13) Message boards : Number crunching : Error on long - but it finished the llr (Message 643)
Posted 15 Jan 2015 by

Michael Goetz

If you want SRBase as your main project, and want PrimeGrid as a "backup" project, don't do 99% and 1%.

Do 100% and 0%.

If PG has 1%, then BOINC will use its internal logic to decide whether it's done the 1% or not, and the calculation is complex, non-intuitive, and unreliable. On your computer, for some arcane reason, BOINC thinks PG hasn't done 1% yet.

When a project is set to 0%, that's a special setting that means "Only get work from this project when no work is available from any other project." This is precisely what you're trying to do.

FYI, should you happen to use PRPNet, it works exactly the same way. "0%" indicates a backup port that should only be used when the other ports have no work.
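The 0% rule can be modeled in a few lines. This is a deliberately simplified sketch, not BOINC's actual scheduling code: the Project class, its work_available flag, and pick_projects_to_ask are hypothetical names, and the real client does not know in advance whether a server has work. It only shows the intended behavior: a 0% project is asked for work only when no regular project can supply any.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float  # 0 means "backup only"
    work_available: bool   # assumption: known up front, unlike in the real client


def pick_projects_to_ask(projects: list[Project]) -> list[str]:
    """Simplified 0% rule: fall back to backup projects only when no regular project has work."""
    regular = [p for p in projects if p.resource_share > 0 and p.work_available]
    if regular:
        return [p.name for p in regular]
    return [p.name for p in projects if p.resource_share == 0 and p.work_available]


print(pick_projects_to_ask([Project("SRBase", 100, True), Project("PrimeGrid", 0, True)]))
print(pick_projects_to_ask([Project("SRBase", 100, False), Project("PrimeGrid", 0, True)]))
```

With a 1% share instead of 0%, PrimeGrid would be a regular project competing through the resource-share bookkeeping, which is exactly the complex and unreliable calculation the 0% setting avoids.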

14) Message boards : Number crunching : Error on long - but it finished the llr (Message 640)
Posted 15 Jan 2015 by

Michael Goetz

I just checked the BOINC event log..

It says (after identifying the w/u) "Transient HTTP error"

Neo
AtP

This sounds similar to a problem that's been observed at PrimeGrid and is actually a problem that will affect ALL BOINC projects.

Do you use AVG antivirus? If so, you must DISABLE its "Identity theft" protection. This part of AVG intermittently decides that network communication with a BOINC server is suspicious and blocks it, resulting in transient HTTP errors. You can't work around this with a directory exclusion; you have to completely disable this feature.

15) Message boards : Number crunching : credits (Message 629)
Posted 13 Jan 2015 by

Michael Goetz

Michael Goetz, there is no reason why you shouldn't continue posting. I think even if your comments are addressed to rebirther, others may learn from them.

I think I should clarify what I meant when I said "the project is for the users".

There are many people who have put a lot of effort into searching for prime numbers and, more relevant to SRbase, into working on proving the various Sierpinski and Riesel conjectures. I'm one of those people. So is Rebirther. It's fair to say nearly everyone at the Mersenne forums and those involved in CRUS also fall into that category.

It's THOSE people for whom I see SRbase as being beneficial. It provides them with an easy way to organize their efforts. That's what I meant by "it's for the users."

I apologize for not being clearer or more inclusive.

16) Message boards : Number crunching : credits (Message 611)
Posted 12 Jan 2015 by

Michael Goetz

You might want to try being a bit more supportive of all the work he's put into this site, and make the criticism a bit more constructive. He's doing it for you.

I do not understand Michael.

Indeed, I don't believe you were one of the people my "try to be constructive" comments were aimed at. I didn't want to name individuals, so I apologize if you felt included in that wide net. There have been a few "Do this or I quit!" types of posts, which don't seem to me to be particularly helpful other than to point out that their authors don't like CreditNew.

By the way, I've decided to try not to post here anymore. I'm not sure if I'm helping or hurting Rebirther, so I'm just going to keep my mouth shut from now on. I wish him luck with the site and I wish everyone else luck with knocking off k's from all of the CRUS conjectures.

17) Message boards : Number crunching : credits (Message 596)
Posted 10 Jan 2015 by

Michael Goetz

Rebirther,

What you might want to try is to calculate credit dynamically as CONSTANT * (log10(k)+log10(b)*n)^2. I.e., credit is proportional to the size of the number squared. (Adjust the constant as needed so credit is reasonable.) The credit will scale nicely with the size of the task.

That won't take FFT size variations into account, so it's not perfect, but it's significantly easier to do than a system that looks at FFT sizes, is MUCH better than BOINC's credit (new or old), and saves you from having to manually assign fixed credit to every batch of tasks.
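A minimal sketch of the suggested formula for a candidate k*b^n+/-1. The value of CREDIT_CONSTANT is a hypothetical placeholder to be tuned, and k=10223 in the usage line is just an illustrative value. Note that log10(k) + log10(b)*n is essentially the number of decimal digits of the candidate, so the credit grows with the square of the digit count.

```python
import math

CREDIT_CONSTANT = 1e-9  # hypothetical scale factor; tune so credit looks reasonable


def task_credit(k: int, b: int, n: int, constant: float = CREDIT_CONSTANT) -> float:
    """Credit = CONSTANT * (log10(k) + log10(b) * n) ** 2 for a candidate k*b^n +/- 1."""
    size = math.log10(k) + math.log10(b) * n  # roughly the number of decimal digits
    return constant * size ** 2


# Doubling n (at fixed k and b) roughly quadruples the credit.
print(task_credit(10223, 2, 2_000_000) / task_credit(10223, 2, 1_000_000))
```

Because it depends only on k, b and n, this can be computed when each task is created, with no per-batch manual credit assignment; FFT-size jumps are still ignored, as noted above.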

To everyone who is saying "Use fixed credit or else!" -- fixed credit is not as good as you think it is. There's a lot of variation in task run lengths from similar looking numbers, and with fixed credit you get as much credit for the longer tasks as for the shorter tasks. Look at this another way -- if the other guy is systematically aborting all of his longer running tasks and only running the shorter ones, thus leaving the longer tasks for other people (like yourself), that means he's getting more credit per hour. It's still better than BOINC credit, but not nearly as good as it could be, and is VERY labor intensive for the admin -- prohibitively so. There may come a point where Rebirther could decide it's just too much work (and the crowd is too nasty) for it to be worth his trouble. You might want to try being a bit more supportive of all the work he's put into this site, and make the criticism a bit more constructive. He's doing it for you.

18) Message boards : Number crunching : credits (Message 593)
Posted 9 Jan 2015 by

Michael Goetz

- AVX should not have any advantage in credit rate.

That's understating it a bit. What actually happens is that AVX (and to a larger extent, FMA3) CPUs compute faster than non-AVX CPUs, but have the same BOINC benchmark numbers, if everything else is equal. Therefore, to BOINC, the AVX CPU did less work, and gets less credit for the same task than does the non-AVX CPU.

From a credit-hunter's perspective, this means SRBase will be a good place to put AMD CPUs since they'll get about the same credit per core-hour as AVX CPUs. Under a more fair credit system, AMD CPUs get fewer credits per core-hour.
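The "backwards" effect can be shown with a simplified benchmark-based credit model, in which claimed credit grows with benchmark speed times CPU time. This is an assumption for illustration; it is not the exact BOINC formula, and the 4.0 GFLOPS benchmark and 10 vs. 15 hour run times are hypothetical numbers.

```python
def benchmark_credit(benchmark_gflops: float, cpu_hours: float, scale: float = 1.0) -> float:
    """Simplified model: credit grows with (benchmark speed x CPU time),
    regardless of which instruction set actually did the work."""
    return scale * benchmark_gflops * cpu_hours


# Hypothetical: two CPUs with identical BOINC benchmarks run the same task.
# The AVX build finishes in 10 hours; the non-AVX build needs 15 hours.
avx_credit = benchmark_credit(4.0, 10.0)
non_avx_credit = benchmark_credit(4.0, 15.0)
print(avx_credit, non_avx_credit)  # the faster AVX machine gets less credit
```

Since the benchmark does not see the AVX speedup, the only thing that changes between the two machines is CPU time, so the slower machine is rewarded for being slower.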

People who care more about the conjectures and primes than about credits and badges won't change their habits.

As someone with a lot of experience in this area (I'm one of the people who run PrimeGrid, for those who don't know me), getting the credits right with LLR is a horribly complex problem. The built-in BOINC credit system works BACKWARDS with AVX-enabled apps, rewarding slower non-AVX computers with more credit. That's not ideal, but there's no easy fix. Rebirther's doing the best he can -- and he's got a lot of details to take care of just keeping SRBase running without worrying about credit all the time.

Please keep in mind this is a BETA project -- it's going to be a while until all the kinks are worked out. PrimeGrid was running for 8 years before we figured out a way to make the credit work reasonably well. Internally, PrimeGrid and SRBase are very different, so it's not easy to just take our credit system and migrate it over here. (If it was, Rebirther would have already done it.)

19) Message boards : Number crunching : Raspberry Pi (Message 587)
Posted 9 Jan 2015 by

Michael Goetz

VFP10 is a step in the right direction. When that's available, serious discussion about ARM versions of this software will be reasonable. It will still be slower (and a lot slower), but not nearly as bad as the current differential of several orders of magnitude.

Until then, we wait patiently.

if anyone is going to buy that design and put it into mass production.
Does anyone need a gadget capable of running vector math?

We do. :)

20) Message boards : Number crunching : Raspberry Pi (Message 583)
Posted 9 Jan 2015 by

Michael Goetz

Fair enough: http://www.arm.com/products/processors/technologies/vector-floating-point.php

Thanks.

VFP10 is a step in the right direction. When that's available, serious discussion about ARM versions of this software will be reasonable. It will still be slower (and a lot slower), but not nearly as bad as the current differential of several orders of magnitude.

Until then, we wait patiently.
