Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long
Author | Message |
---|---|
I have stopped the Sierpinski / Riesel Base - long v0.01 WUs because with my Phenom II X6 1100T under 64-bit Ubuntu 14.10 the calculation time was far too long, especially in the last 10%, where progress moved forward only 0.001% every 2-3 seconds. Task 2007676 (workunit 1781846): sent 1 Jan 2015, 13:48:04 UTC; reported 2 Jan 2015, 17:09:06 UTC; completed and validated; run time 31,714.50 s; CPU time 28,442.41 s; credit 500.00; application Sierpinski / Riesel Base - long v0.01. Example with an i7-4771 CPU @ 3.50GHz under Darwin 14.0.0 (S185 series): task 2009484 (workunit 1783653): sent 1 Jan 2015, 19:36:22 UTC; reported 2 Jan 2015, 20:11:52 UTC; completed and validated; run time 14,967.01 s; CPU time 13,735.24 s; credit 650.00; application Sierpinski / Riesel Base - long v0.01. Why this enormous difference??? ____________ "Free to think... think Free" =8?()> | |
ID: 469 · Rating: 0 | |
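To put a number on the gap between the two tasks quoted above, here is a minimal Python sketch. The run times and credits are copied from the post. Treating credit per hour as a rough throughput measure is an assumption: the two work units were granted different credit, so they may not be the same size.

```python
# Figures copied from the two task rows quoted in the post above.
phenom_run_s = 31_714.50   # Phenom II X6 1100T, Ubuntu 14.10
phenom_credit = 500.00
i7_run_s = 14_967.01       # i7-4771, Darwin 14.0.0
i7_credit = 650.00

# Plain wall-clock ratio between the two tasks.
run_time_ratio = phenom_run_s / i7_run_s

# Credit per hour, as a rough (assumed) way to normalise for task size.
phenom_credit_per_h = phenom_credit / (phenom_run_s / 3600)
i7_credit_per_h = i7_credit / (i7_run_s / 3600)

print(f"Run time ratio (Phenom II / i7): {run_time_ratio:.2f}")
print(f"Credit per hour, Phenom II: {phenom_credit_per_h:.1f}")
print(f"Credit per hour, i7-4771:   {i7_credit_per_h:.1f}")
```

Comparing credit per hour instead of raw run time widens the gap somewhat, because the faster machine was also granted more credit for its task.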
The massive difference is due to the AVX instruction set in the Intel CPUs. | |
ID: 472 · Rating: 0 | |
"The massive difference is due to the AVX instruction set in the Intel CPUs." Thanks for the info, Neo. At work I have a Xeon E5-1650; the AVX instructions are listed, so I can crunch the Sierpinski / Riesel Base - long WUs... YIPPEE!!! ____________ "Free to think... think Free" =8?()> | |
ID: 475 · Rating: 0 | |
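For anyone who wants to check whether AVX is "listed" on their own Linux box, here is a minimal sketch that reads the feature flags from `/proc/cpuinfo`. The helper name `cpu_flags` is made up for this example, and the script assumes an x86 Linux system. A listed flag only shows that the CPU supports the instructions; it says nothing about whether a given application actually uses them.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not an x86 Linux system)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    flags = cpu_flags(path.read_text()) if path.exists() else set()
    # Flag names as they appear in /proc/cpuinfo on x86.
    for name in ("sse4_1", "avx", "avx2", "fma", "fma4"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
```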
Want to see a long calculation time? My Athlon 5350 (AM1 socket) | |
ID: 770 · Rating: 0 | |
AMD is good for sieving, not for LLR. | |
ID: 771 · Rating: 0 | |
Nothing wrong with the instruction set of the Athlon 5350...it includes AVX | |
ID: 778 · Rating: 0 | |
"The massive difference is due to the AVX instruction set in the Intel CPUs. You will see the same result if you crunch on PrimeGrid. NeoAtP" The massive difference could very well be due to the use of code that excludes CPUs of makes other than Intel from using the aforementioned AVX instruction set. AVX is included in all current AMD CPUs, with the exception of the FM1 APUs (Llano). All AM3+, FM2, FM2+, AM1 and FT3 CPUs, APUs and SOCs can use AVX. | |
ID: 781 · Rating: 0 | |
"Nothing wrong with the instruction set of the Athlon 5350...it includes AVX" Part of that is technically true -- AMD does have AVX (and AVX2) instructions. However, their implementation is substantially inferior to Intel's, and in practice using AVX instructions on AMD doesn't speed up LLR. The important part of LLR is written in assembly language, so the "Intel compiler conspiracy" is simply incorrect. Also, we don't use Intel compilers. In more detail, the problem with AMD's implementation is that they have a single AVX ALU for each pair of CPU cores. If you're running LLR on all cores, the speed of AVX instructions is therefore effectively cut in half. The effect on LLR (and similar programs) is dramatic, and the result is that primality testing has, for all intents and purposes, become an Intel-dominated proposition. For what it's worth, for the latest CPU version of the Genefer program (it's not used here, but we use it at PrimeGrid), we used the Microsoft Visual Studio compiler suite to produce AMD-specific FMA4 (AVX2) builds, hoping to boost the performance on AMD. It didn't help. The AMD-FMA4 version of Genefer runs at about the same speed as the SSE4 version. AVX on AMD CPUs is useless for our purposes, unfortunately. | |
ID: 782 · Rating: 0 | |
Ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, the Intel compiler-made assembly seeks a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised that there is no difference compared to SSE4, because it is the exact same code that has been running. Who wrote the Microsoft compiler, actually? | |
ID: 784 · Rating: 0 | |
"Ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? the Intel compiler-made assembly seeks a different, inferior path when running on a non-Intel CPU." Oh no, not that ancient story again. :( Even if this were still the case... but of course you seem to be sure that Intel's compiler has been used for the app in use here... | |
ID: 786 · Rating: 0 | |
"Ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, the Intel compiler-made assembly seeks a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised that there is no difference compared to SSE4, because it is the exact same code that has been running. Who wrote the Microsoft compiler, actually?" We could argue pointlessly about this for eternity, with each of us thinking the other is completely clueless, and nobody's mind is going to be changed. What's not up for argument is that the programmers involved in optimizing their respective software (most notably George Woltman and Yves Gallot) have been unable to get AMD CPUs to provide AVX-class performance, and it's certainly not for lack of trying. It's in everyone's interest to get more performance out of AMD CPUs. If you're so certain that the problem is in some compiler error (or conspiracy theory), hey -- it's all open source software. Go grab the source code and fix the problem. You'll be a hero. Don't have the skills to do that, or are unwilling to put in the time? Well, the people who do have the skills and have been putting in the time and effort (for more than a decade) all say it's impossible because of the way AMD implemented AVX. Do you truly believe you're correct and everyone who has actually been working on this is wrong? And perhaps more importantly, do you think you're going to convince anyone else that you're right? | |
ID: 788 · Rating: 0 | |
You point to AMD's way of implementing AVX and cite the architecture of the FX and present Opteron CPUs ("they have a single AVX ALU for each pair of CPU cores"), while I have problems with the performance of my AM1 Athlon, which uses the Jaguar architecture that does *NOT* share its ALUs with other cores. I just want to know whether the present application really uses the AM1 Athlon's architecture to the fullest, and therefore asked whether you have run an AM1 Athlon through the debugger to see if AVX is used or not. | |
ID: 790 · Rating: 0 | |
Perhaps interesting to observe, AMD performance-wise:
[1.] Three systems using FM1 APUs (A8-3820, 3850 and 3870K), K10 based, four discrete cores (no shared resources), not using AVX: 58,000-68,000 sec.
[2.] One system using an FM2 A10-5700 APU, Piledriver based, two Bulldozer modules that each feature two integer units and one floating point unit, using AVX: 85,000 sec.
[3.] One system using an AM1 Athlon 5350 SOC, Jaguar based, four discrete cores (no shared resources), using AVX (amongst others): 150,000 sec.
| |
ID: 791 · Rating: 0 | |
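The three run times listed above are easier to compare when normalised to the fastest group. A minimal sketch follows; using the midpoint of the 58,000-68,000 sec range for the FM1 group is an assumption, and the figures say nothing about clock speed or whether the work units were the same size.

```python
# Run times in seconds as reported in the post above.
# Group 1 was given as a range (58,000-68,000 s); its midpoint is an assumption.
run_times_s = {
    "FM1 APUs (K10, no AVX)": (58_000 + 68_000) / 2,
    "FM2 A10-5700 (Piledriver, AVX)": 85_000,
    "AM1 Athlon 5350 (Jaguar, AVX)": 150_000,
}

fastest = min(run_times_s.values())
# Slowdown factor of each system relative to the fastest one.
slowdown = {name: t / fastest for name, t in run_times_s.items()}

for name, factor in slowdown.items():
    print(f"{name}: {run_times_s[name]:,.0f} s ({factor:.2f}x the fastest)")
```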
"The more modern the AMD architecture (the more, theoretically, complete the possible instruction set), the worse the performance???" Plain and simple? Yes, if you are asking about floating-point performance. Integer speed is another thing entirely. And about "K10-based": those things lack that big L3 cache. Pretty ugly.... | |
ID: 792 · Rating: 0 | |
That's a huge part of it. If you have an AMD CPU that is newer than the Phenom II, it only has half as many FPUs as it has cores. Try running BOINC at 50% CPU usage. That should dramatically help your crunch times. I remember when the "Bulldozer" first came out... about 2010-2011?? I went out and bought the new CPU and a new mobo, only to discover that it was 200% slower than my Phenom II 555.... Neo AtP | |
ID: 795 · Rating: 0 | |
That's not completely true. It is true for those CPUs that are based upon the Bulldozer architecture: the FX series CPUs, the Opterons based upon the same architecture, and the Trinity, Richland and Kaveri APUs. It is not true for Llano (still K10 based) and not true for the AM1 SOCs (Jaguar based). | |
ID: 801 · Rating: 0 | |
i fold. | |
ID: 802 · Rating: 0 | |
"Ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? the Intel compiler-made assembly seeks a different, inferior path when running on a non-Intel CPU." Well, Mr. Frankhagen, just for you: the same motherboard (ASUS AM1I-A), SOC (AMD Athlon 5350) and amount of memory (16GB), a plain install of OS and BOINC, nothing further. Under Windows 10: [benchmark result not preserved]. Under Linux: [benchmark result not preserved]. Look especially at the given integer performance.... How do you explain that, other than with AMD systems being artificially crippled by compilers and/or libraries favouring Intel? An Intel CPU or SOC does not show such dramatic differences; in fact, almost none at all. | |
ID: 1641 · Rating: 0 | |
"I have stopped the Sierpinski / Riesel Base - long v0.01 WUs because with my Phenom II X6 1100T under 64-bit Ubuntu 14.10 the calculation time was far too long, especially in the last 10%, where progress moved forward only 0.001% every 2-3 seconds." Did they run to 100%, or did they finish earlier? I have six long ones on a low-power Xeon (much slower than your box, especially the int speed, and no AVX on chip) and I'm afraid that the last 10% will run much longer than the first 90%. The result with the best progress is at 93.75% after 79 hours; this morning (~10 hours ago) it was at about 91%, so not much is happening per hour :-/ | |
ID: 2186 · Rating: 0 | |
... I can answer it myself now, they end somewhere between 95.5% and 95.8% - I finished all 6 before a second one had been sent out and it gave my RAC temporarily a nice boost :-) | |
ID: 2187 · Rating: 0 | |
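The figures in the last two posts allow a rough estimate of how long such a task still had to run. The sketch below is only that: it assumes the progress rate seen over the last 10 hours stays constant, which the thread itself suggests may not hold near the end. The finish points of 95.5% and 95.8% come from the post above; 100% is included for contrast.

```python
# Figures from the two posts above.
progress_now = 93.75      # percent, after 79 hours
progress_before = 91.0    # percent, about 10 hours earlier ("about 91%")
hours_between = 10.0

# Observed progress rate in percent per hour (assumed constant from here on).
rate = (progress_now - progress_before) / hours_between


def hours_left(finish_percent: float) -> float:
    """Hours until the progress bar reaches finish_percent at the current rate."""
    return (finish_percent - progress_now) / rate


for finish in (95.5, 95.8, 100.0):
    print(f"finish at {finish:5.1f}%: about {hours_left(finish):.1f} h remaining")
```

Under this assumption, a task that ends around 95.5-95.8% has roughly a third as long to run as one that has to reach 100%.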