Too long calculation for the Sierpinski/Riesel Bases - long
Author Message
al@ON
Joined: 29 Nov 14
Posts: 9
Credit: 1,559,430
RAC: 1,559
Message 469 - Posted: 2 Jan 2015, 23:03:36 UTC

I have stopped the Sierpinski / Riesel Base - long v0.01 WUs because on my Phenom II X6 1100T under 64-bit Ubuntu 14.10 the calculation time was far too long, especially for the last 10%, where progress moved forward only 0.001% every 2-3 seconds.

Examples from my computer (S185 series):

Task | Work unit | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
2007676 | 1781846 | 1 Jan 2015, 13:48:04 UTC | 2 Jan 2015, 17:09:06 UTC | Completed and validated | 31,714.50 | 28,442.41 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007445 | 1781615 | 1 Jan 2015, 13:37:52 UTC | 2 Jan 2015, 16:16:13 UTC | Completed and validated | 30,485.67 | 27,661.06 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007256 | 1781426 | 1 Jan 2015, 13:32:03 UTC | 2 Jan 2015, 11:15:05 UTC | Completed and validated | 31,328.21 | 30,482.91 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007270 | 1781440 | 1 Jan 2015, 13:32:03 UTC | 2 Jan 2015, 11:06:17 UTC | Completed and validated | 31,339.86 | 30,486.52 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007377 | 1781547 | 1 Jan 2015, 13:30:20 UTC | 2 Jan 2015, 11:07:40 UTC | Completed and validated | 31,607.84 | 30,760.57 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007200 | 1781370 | 1 Jan 2015, 13:29:58 UTC | 2 Jan 2015, 8:22:44 UTC | Completed and validated | 30,514.18 | 29,060.50 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007171 | 1781341 | 1 Jan 2015, 13:27:51 UTC | 2 Jan 2015, 2:32:53 UTC | Completed and validated | 34,451.03 | 29,415.54 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007191 | 1781361 | 1 Jan 2015, 13:27:51 UTC | 2 Jan 2015, 7:48:07 UTC | Completed and validated | 30,746.50 | 28,890.87 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007286 | 1781456 | 1 Jan 2015, 13:27:51 UTC | 2 Jan 2015, 8:20:30 UTC | Completed and validated | 31,264.18 | 29,677.62 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006837 | 1781007 | 1 Jan 2015, 13:21:00 UTC | 2 Jan 2015, 2:23:54 UTC | Completed and validated | 34,559.88 | 29,455.59 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006676 | 1780846 | 1 Jan 2015, 13:21:00 UTC | 2 Jan 2015, 2:20:52 UTC | Completed and validated | 34,290.83 | 29,146.23 | 500.00 | Sierpinski / Riesel Base - long v0.01
2005775 | 1779947 | 1 Jan 2015, 13:08:41 UTC | 1 Jan 2015, 20:47:38 UTC | Completed and validated | 25,179.09 | 21,174.75 | 450.00 | Sierpinski / Riesel Base - long v0.01
2005776 | 1779948 | 1 Jan 2015, 13:08:41 UTC | 1 Jan 2015, 21:00:19 UTC | Completed and validated | 25,550.78 | 21,490.91 | 450.00 | Sierpinski / Riesel Base - long v0.01
2005111 | 1779286 | 1 Jan 2015, 13:04:04 UTC | 1 Jan 2015, 20:21:48 UTC | Completed and validated | 24,247.67 | 20,379.71 | 450.00 | Sierpinski / Riesel Base - long v0.01


Example with an i7-4771 CPU @ 3.50GHz under Darwin 14.0.0 (S185 series):
Task | Work unit | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
2009484 | 1783653 | 1 Jan 2015, 19:36:22 UTC | 2 Jan 2015, 20:11:52 UTC | Completed and validated | 14,967.01 | 13,735.24 | 650.00 | Sierpinski / Riesel Base - long v0.01
2009286 | 1783455 | 1 Jan 2015, 19:32:54 UTC | 2 Jan 2015, 18:45:03 UTC | Completed and validated | 14,869.09 | 13,642.94 | 650.00 | Sierpinski / Riesel Base - long v0.01
2009350 | 1783519 | 1 Jan 2015, 19:31:28 UTC | 2 Jan 2015, 17:19:49 UTC | Completed and validated | 14,950.56 | 13,725.76 | 650.00 | Sierpinski / Riesel Base - long v0.01
2009314 | 1783483 | 1 Jan 2015, 19:28:27 UTC | 2 Jan 2015, 16:16:55 UTC | Completed and validated | 15,127.67 | 13,898.61 | 650.00 | Sierpinski / Riesel Base - long v0.01
2009021 | 1783190 | 1 Jan 2015, 17:31:08 UTC | 2 Jan 2015, 12:53:45 UTC | Completed and validated | 11,592.49 | 11,533.81 | 650.00 | Sierpinski / Riesel Base - long v0.01
2008543 | 1782713 | 1 Jan 2015, 17:21:43 UTC | 2 Jan 2015, 12:04:44 UTC | Completed and validated | 10,707.96 | 10,654.51 | 600.00 | Sierpinski / Riesel Base - long v0.01
2008205 | 1782375 | 1 Jan 2015, 17:17:04 UTC | 2 Jan 2015, 11:36:44 UTC | Completed and validated | 10,529.84 | 10,470.07 | 600.00 | Sierpinski / Riesel Base - long v0.01
2008759 | 1782929 | 1 Jan 2015, 17:17:04 UTC | 2 Jan 2015, 11:16:32 UTC | Completed and validated | 10,864.17 | 10,808.88 | 600.00 | Sierpinski / Riesel Base - long v0.01
2007225 | 1781395 | 1 Jan 2015, 13:27:38 UTC | 2 Jan 2015, 8:55:20 UTC | Completed and validated | 7,528.49 | 7,524.50 | 500.00 | Sierpinski / Riesel Base - long v0.01
2007051 | 1781221 | 1 Jan 2015, 13:23:30 UTC | 2 Jan 2015, 8:15:29 UTC | Completed and validated | 7,244.56 | 7,241.21 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006846 | 1781016 | 1 Jan 2015, 13:22:26 UTC | 2 Jan 2015, 6:49:48 UTC | Completed and validated | 7,100.81 | 7,098.31 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006886 | 1781056 | 1 Jan 2015, 13:21:52 UTC | 2 Jan 2015, 4:51:28 UTC | Completed and validated | 7,302.43 | 7,300.37 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006680 | 1780850 | 1 Jan 2015, 13:21:52 UTC | 2 Jan 2015, 6:14:40 UTC | Completed and validated | 7,020.58 | 7,018.28 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006958 | 1781128 | 1 Jan 2015, 13:21:52 UTC | 2 Jan 2015, 4:17:38 UTC | Completed and validated | 7,243.75 | 7,242.03 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006819 | 1780989 | 1 Jan 2015, 13:18:53 UTC | 2 Jan 2015, 0:54:35 UTC | Completed and validated | 12,269.36 | 11,971.72 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006617 | 1780787 | 1 Jan 2015, 13:18:53 UTC | 1 Jan 2015, 23:31:21 UTC | Completed and validated | 12,466.02 | 12,108.29 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006393 | 1780563 | 1 Jan 2015, 13:18:53 UTC | 2 Jan 2015, 0:19:01 UTC | Completed and validated | 12,190.62 | 11,878.24 | 500.00 | Sierpinski / Riesel Base - long v0.01
2006698 | 1780868 | 1 Jan 2015, 13:18:53 UTC | 2 Jan 2015, 2:16:57 UTC | Completed and validated | 7,073.02 | 7,069.51 | 500.00 | Sierpinski / Riesel Base - long v0.01


Why such an enormous difference?
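To put a rough number on the gap, here is a minimal Python sketch that averages the run times of the 500-credit tasks listed above for each machine. It assumes that equal credit means roughly equal work, which the listings do not confirm, so the ratio is only indicative.

```python
from statistics import mean

# Run times in seconds, copied from the 500-credit rows listed above.
phenom_ii_run_times = [
    31714.50, 30485.67, 31328.21, 31339.86, 31607.84, 30514.18,
    34451.03, 30746.50, 31264.18, 34559.88, 34290.83,
]
i7_4771_run_times = [
    7528.49, 7244.56, 7100.81, 7302.43, 7020.58,
    7243.75, 12269.36, 12466.02, 12190.62, 7073.02,
]


def slowdown(slow_times, fast_times):
    """Ratio of mean run times (assumes comparable work per task)."""
    return mean(slow_times) / mean(fast_times)


ratio = slowdown(phenom_ii_run_times, i7_4771_run_times)
print(f"Phenom II mean run time: {mean(phenom_ii_run_times):.0f} s")
print(f"i7-4771 mean run time:   {mean(i7_4771_run_times):.0f} s")
print(f"Phenom II takes about {ratio:.1f}x as long per 500-credit task")
```

The i7 figures themselves fall into two groups (about 7,000 s and about 12,000 s), so a single mean hides some spread.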
____________


"Libre de penser... pensez Libre" =8?()>

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 472 - Posted: 3 Jan 2015, 2:42:50 UTC - in response to Message 469.

The massive difference is due to the AVX instruction set in the Intel CPUs.

You will see the same result if you crunch on PrimeGrid.

Neo
AtP

al@ON
Joined: 29 Nov 14
Posts: 9
Credit: 1,559,430
RAC: 1,559
Message 475 - Posted: 3 Jan 2015, 12:36:57 UTC - in response to Message 472.

The massive difference is due to the AVX instruction set in the Intel CPUs.

You will see the same result if you crunch on PrimeGrid.

Neo
AtP


Thanks for the info, Neo.


At work I have a Xeon E5-1650; AVX is listed among its instructions, so I can crunch the Sierpinski / Riesel Base - long WUs there... Yippee!
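For anyone who wants to check the same thing on a Linux box, a minimal Python sketch follows. It assumes an x86 Linux system where `/proc/cpuinfo` contains a `flags` line; the helper names `cpu_flags` and `has_avx` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if 'avx' appears as its own flag (an 'avx2' entry alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX listed:", has_avx(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

A listed flag only says the CPU supports the instructions; it says nothing about how fast they run or whether a given application uses them.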

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 770 - Posted: 27 Jan 2015, 12:19:24 UTC - in response to Message 469.

Want to see a long calculation time? My Athlon 5350 (AM1 socket):

3979288 1186 22 Jan 2015, 16:41:56 UTC 26 Jan 2015, 13:24:00 UTC Completed and validated 167,366.05 159,487.70 1,100.00 Sierpinski / Riesel Base - long v0.01

Most of the time they are not finished before the deadline; the long WUs take more than 40 hours to calculate...

Tarmo Ilves
Joined: 20 Jan 15
Posts: 5
Credit: 1,637,306
RAC: 2,916
Message 771 - Posted: 27 Jan 2015, 12:48:58 UTC

AMD is good for sieving, not for LLR.

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 778 - Posted: 27 Jan 2015, 20:19:44 UTC - in response to Message 771.
Last modified: 27 Jan 2015, 20:23:19 UTC

There is nothing wrong with the instruction set of the Athlon 5350... it includes AVX
(but when an application is compiled with an Intel compiler it may not be able to use it, because the Intel compiler checks the vendor string instead of the capabilities).
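The distinction being drawn here, dispatching on the vendor string versus dispatching on reported capabilities, can be sketched in Python. Both functions are hypothetical illustrations of the two strategies; they are not taken from any compiler or from the application discussed in this thread. `GenuineIntel` and `AuthenticAMD` are the usual CPUID vendor strings.

```python
def pick_path_by_vendor(vendor, flags):
    """Hypothetical dispatcher that only trusts one vendor string."""
    if vendor == "GenuineIntel" and "avx" in flags:
        return "avx"
    return "sse2"


def pick_path_by_capability(vendor, flags):
    """Hypothetical dispatcher that looks only at the reported feature flags."""
    return "avx" if "avx" in flags else "sse2"


# Illustrative CPU description; the flag set is abbreviated.
athlon_5350 = ("AuthenticAMD", {"sse2", "sse4_1", "avx"})

print("vendor check:    ", pick_path_by_vendor(*athlon_5350))
print("capability check:", pick_path_by_capability(*athlon_5350))
```

Whether the application in question does anything like the first variant is exactly what is disputed in the replies.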

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 781 - Posted: 28 Jan 2015, 0:04:39 UTC - in response to Message 472.

The massive difference is due to the AVX instruction set in the Intel CPUs. You will see the same result if you crunch on PrimeGrid. Neo AtP


The massive difference could very well be due to the use of code that excludes CPUs from makers other than Intel from using the aforementioned AVX instruction set. AVX is included in all current AMD CPUs, with the exception of the FM1 APUs (Llano). All AM3+, FM2, FM2+, AM1 and FT3 CPUs, APUs and SOCs can use AVX.

Michael Goetz
Joined: 1 Jan 15
Posts: 18
Credit: 303,916
RAC: 0
Message 782 - Posted: 28 Jan 2015, 0:05:44 UTC - in response to Message 778.

There is nothing wrong with the instruction set of the Athlon 5350... it includes AVX
(but when an application is compiled with an Intel compiler it may not be able to use it, because the Intel compiler checks the vendor string instead of the capabilities).


Part of that is technically true -- AMD does have AVX (and AVX2) instructions. However, their implementation is substantially inferior to Intel's, and in practice using AVX instructions on AMD doesn't speed up LLR. The important part of LLR is written in assembly language, so the "Intel compiler conspiracy" is simply incorrect. Also, we don't use Intel compilers.

In more detail, the problem with AMD's implementation is that they have a single AVX ALU for each pair of CPU cores. If you're running LLR on all cores, the speed of AVX instructions is therefore effectively cut in half. The effect on LLR (and similar programs) is dramatic, and the result is that primality testing has, for all intents and purposes, become an Intel-dominated proposition.
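The halving described above can be written down as a toy model in Python. It is an idealized sketch that assumes the shared AVX unit is the only bottleneck and is split evenly between the busy cores of a pair; the function names are made up for illustration.

```python
def avx_share_per_task(busy_cores_per_unit, cores_per_unit=2):
    """Fraction of one shared AVX unit that each busy core gets (toy model)."""
    if not 1 <= busy_cores_per_unit <= cores_per_unit:
        raise ValueError("busy cores must be between 1 and cores_per_unit")
    return 1.0 / busy_cores_per_unit


def total_avx_throughput(busy_cores_per_unit, cores_per_unit=2):
    """Combined share used by all busy cores on one unit (toy model)."""
    return busy_cores_per_unit * avx_share_per_task(busy_cores_per_unit, cores_per_unit)


for busy in (1, 2):
    print(
        f"{busy} busy core(s) per pair: "
        f"{avx_share_per_task(busy):.2f} of the unit per task, "
        f"{total_avx_throughput(busy):.2f} in total"
    )
```

In this model each task runs at half speed when both cores of a pair are busy, while the total work done per pair stays the same. Real CPUs have other shared resources, so actual numbers will differ.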

For what it's worth, for the latest CPU version of the Genefer program (it's not used here, but we use it at PrimeGrid), we used the Microsoft Visual Studio compiler suite to produce AMD-specific FMA4 (AVX2) builds hoping to boost the performance on AMD. It didn't help. The AMD-FMA4 version of Genefer runs at about the same speed as the SSE4 version. AVX on AMD CPUs is useless for our purposes, unfortunately.

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 784 - Posted: 28 Jan 2015, 8:35:03 UTC - in response to Message 782.
Last modified: 28 Jan 2015, 9:25:43 UTC

Have you ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised if there were no difference compared to SSE4, because it is the exact same code that has been running. Who actually wrote the Microsoft compiler?

frankhagen
Joined: 7 Jun 14
Posts: 36
Credit: 164,036
RAC: 0
Message 786 - Posted: 28 Jan 2015, 14:58:23 UTC - in response to Message 784.
Last modified: 28 Jan 2015, 15:03:54 UTC

Have you ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU.



oh no - not that ancient story again. :(

even if that were still the case - of course you seem to be sure that Intel's compiler was used for the app in use here...

Michael Goetz
Joined: 1 Jan 15
Posts: 18
Credit: 303,916
RAC: 0
Message 788 - Posted: 28 Jan 2015, 17:36:33 UTC - in response to Message 784.

Have you ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised if there were no difference compared to SSE4, because it is the exact same code that has been running. Who actually wrote the Microsoft compiler?


We could argue pointlessly about this for eternity, with each of us thinking the other is completely clueless, and nobody's mind is going to be changed.

What's not up for argument is that the programmers involved (most notably George Woltman and Yves Gallot) in optimizing their respective software have been unable to get AMD CPUs to provide AVX-class performance, and it's certainly not for lack of trying. It's in everyone's interest to get more performance out of AMD CPUs.

If you're so certain that the problem is in some compiler error (or conspiracy theory), hey -- it's all open source software. Go grab the source code and fix the problem. You'll be a hero.

Don't have the skills to do that, or are unwilling to put in the time? Well, the people who do have the skills and have been putting in the time and effort (for more than a decade), all say it's impossible because of the way AMD implemented AVX. Do you truly believe you're correct and everyone who has actually been working on this is wrong? And perhaps more importantly, do you think you're going to convince anyone else that you're right?

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 790 - Posted: 28 Jan 2015, 20:44:01 UTC - in response to Message 788.

You blame AMD's way of implementing AVX and cite the architecture of the FX and current Opteron CPUs ("they have a single AVX ALU for each pair of CPU cores"), while my problem is with the performance of my AM1 Athlon, which uses the Jaguar architecture that does *NOT* share its ALUs with other cores. I just want to know whether the present application really uses the AM1 Athlon's architecture to the fullest, and therefore asked whether you have run an AM1 Athlon through the debugger to see if AVX is used or not.

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 791 - Posted: 28 Jan 2015, 21:22:24 UTC
Last modified: 28 Jan 2015, 22:05:31 UTC

Perhaps interesting to observe, AMD performance-wise:
I've run the Sierpinski/Riesel Base - long so far on five AMD systems, in order of architectural age:


    [1.] Three using FM1 APUs (A8-3820, 3850 and 3870K), K10 based, four discrete cores (no shared resources), not using AVX.
    [2.] One using a FM2 A10-5700 APU, Piledriver based, two Bulldozer modules that feature two integer units and one floating point unit each, using AVX.
    [3.] One using an AM1 Athlon 5350 SOC, Jaguar based, four discrete cores (no shared resources), using AVX (amongst others).


Computing times:


    [1.] 58,000-68,000 sec.
    [2.] 85,000 sec.
    [3.] 150,000 sec.


The more modern the AMD architecture (and the more complete, in theory, its instruction set), the worse the performance???
The even older Phenom II X6 1100T mentioned earlier has the best AMD scores so far... but at least they were pretty consistent scores.
My two Intel systems that ran the Sierpinski/Riesel Base - long WUs showed wildly different computing times, ranging from slightly more than 1,099 sec to more than 109,000 sec for my i7-3770 and from slightly more than 1,600 sec to more than 63,000 sec for my Core2 Q8200?

frankhagen
Joined: 7 Jun 14
Posts: 36
Credit: 164,036
RAC: 0
Message 792 - Posted: 28 Jan 2015, 22:15:05 UTC - in response to Message 791.

The more modern the AMD architecture (and the more complete, in theory, its instruction set), the worse the performance???


plain simple?

yes - if you call for floating-point performance.

integer speed is just another thing.


and about "K10-based" - those things lack that big L3-cache - pretty ugly....

Neo
Joined: 28 Dec 14
Posts: 18
Credit: 299,879
RAC: 0
Message 795 - Posted: 29 Jan 2015, 11:47:26 UTC - in response to Message 792.



plain simple?

yes - if you call for floating-point performance.

integer speed is just another thing.


and about "K10-based" - those things lack that big L3-cache - pretty ugly....



That's a huge part of it. If you have an AMD CPU that is newer than the Phenom II, it only has half as many FPUs as it has cores. Try running BOINC at 50% CPU usage. That should dramatically help your crunch times.

I remember when the "Bulldozer" first came out... about 2010-2011? I went out and bought the new CPU and a new mobo, only to discover that it was 200% slower than my Phenom II 555....

Neo
AtP

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 801 - Posted: 29 Jan 2015, 18:19:54 UTC - in response to Message 795.
Last modified: 29 Jan 2015, 18:21:22 UTC

That's not completely true. It is true for CPUs based on the Bulldozer architecture: the FX series CPUs, the Opterons based on the same architecture, and the Trinity, Richland and Kaveri APUs. It is not true for Llano (still K10 based) and not true for the AM1 SOCs (Jaguar based).
Most amazing in the floating point vs. integer discussion are the BOINC benchmarks for the APUs:

A8-3870K (Llano, 1st generation APU), running Ubuntu 13.10, benchmarks via BOINC Manager:
2461 floating point MIPS (Whetstone) per CPU
15793 integer MIPS (Dhrystone) per CPU
This Llano is made out of four K10 cores, each having both an FPU (Floating Point Unit) and an ALU (Arithmetic Logic Unit).

A10-5700 (Trinity, 2nd generation APU), running Lubuntu 13.10, benchmarks via BOINC Manager:
2450 floating point MIPS (Whetstone) per CPU
9513 integer MIPS (Dhrystone) per CPU
This Trinity is made out of two Piledriver modules, each having two integer cores and a shared floating point unit. For some reason the integer performance of Bulldozer, Piledriver (and now Steamroller too) leaves much to be desired compared to the older K10 integer units... and quite a lot of BOINC projects make heavy use of the integer performance of your CPU core(s).

It almost looks like a Bulldozer module isn't made out of two integer cores and a shared floating point unit, but the other way around: Two floating point units and a shared integer unit!
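The two benchmark listings above can be put side by side with a few lines of Python. This is only a sketch of the arithmetic on the quoted BOINC Manager figures; the dictionary layout and the `relative` helper are made up for the example.

```python
# BOINC Manager benchmark figures quoted above (MIPS per CPU).
benchmarks = {
    "A8-3870K (Llano)": {"whetstone": 2461, "dhrystone": 15793},
    "A10-5700 (Trinity)": {"whetstone": 2450, "dhrystone": 9513},
}


def relative(metric, system, baseline):
    """Score of `system` as a fraction of `baseline` for one benchmark."""
    return benchmarks[system][metric] / benchmarks[baseline][metric]


for metric in ("whetstone", "dhrystone"):
    frac = relative(metric, "A10-5700 (Trinity)", "A8-3870K (Llano)")
    print(f"{metric}: Trinity reaches {frac:.0%} of the Llano score")
```

On these figures the floating point scores are nearly identical, while the Trinity's integer score is roughly 60% of the Llano's. The two machines ran different Linux flavours, so the comparison is not strictly like for like.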

frankhagen
Joined: 7 Jun 14
Posts: 36
Credit: 164,036
RAC: 0
Message 802 - Posted: 29 Jan 2015, 19:53:19 UTC

i fold.

Dirk Broer
Joined: 2 Jan 15
Posts: 14
Credit: 5,778,104
RAC: 834
Message 1641 - Posted: 9 Jul 2015, 23:44:46 UTC - in response to Message 786.
Last modified: 9 Jul 2015, 23:51:14 UTC

Have you ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU.

oh no - not again that ancient story. :(
even if this still would be the case - but of course you seem to be sure that intels compiler has been used for the app in use here...


Well, Mr. Frankhagen, just for you: the same motherboard (ASUS AM1I-A), SOC (AMD Athlon 5350) and amount of memory (16GB), a plain install of the OS and BOINC, nothing further:
Under Windows 10
Under Linux
Look especially at the given integer performance....
How do you explain that, other than with AMD systems being artificially crippled by compilers and/or libraries favouring Intel?
An Intel CPU or SOC does not show such dramatic differences; in fact it shows almost none at all.

Ananas
Joined: 26 Nov 15
Posts: 10
Credit: 370,238
RAC: 0
Message 2186 - Posted: 15 Dec 2015, 18:05:22 UTC - in response to Message 469.
Last modified: 15 Dec 2015, 18:06:41 UTC

I have stopped the Sierpinski / Riesel Base - long v0.01 WUs because on my Phenom II X6 1100T under 64-bit Ubuntu 14.10 the calculation time was far too long, especially for the last 10%, where progress moved forward only 0.001% every 2-3 seconds.
...

Did they run to 100% or have they been finished earlier?

I have six long ones on a low-power Xeon (much slower than your box, especially the integer speed, and no AVX on the chip), and I'm afraid that the last 10% will run much longer than the first 90%.

The result with the best progress is at 93.75% after 79 hours; this morning (~10 hours ago) it was at about 91%, so not much is happening per hour :-/
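A rough estimate of the remaining time can be sketched from those two readings in Python. It assumes progress stays linear at the recent rate and that the task runs all the way to 100%, neither of which is guaranteed; the "about 91%" reading is approximate, and `hours_to_reach` is a made-up helper name.

```python
def hours_to_reach(target_pct, now_pct, earlier_pct, hours_between):
    """Linear extrapolation from two progress readings (assumes a constant rate)."""
    rate = (now_pct - earlier_pct) / hours_between  # percent per hour
    if rate <= 0:
        raise ValueError("progress must have increased between the readings")
    return (target_pct - now_pct) / rate


remaining = hours_to_reach(100.0, 93.75, 91.0, 10.0)
print(f"Recent rate: {(93.75 - 91.0) / 10.0:.3f} % per hour")
print(f"Estimated time to 100%: {remaining:.1f} hours")
```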

Ananas
Joined: 26 Nov 15
Posts: 10
Credit: 370,238
RAC: 0
Message 2187 - Posted: 16 Dec 2015, 18:17:07 UTC - in response to Message 2186.

...
Did they run to 100% or have they been finished earlier?
...

I can answer that myself now: they end somewhere between 95.5% and 95.8%. I finished all 6 before a second one was sent out, and it temporarily gave my RAC a nice boost :-)


Copyright © 2014-2017 BOINC Confederation / rebirther