Posts by Dirk Broer
61) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 801)
Posted 29 Jan 2015 by Dirk Broer
That's not completely true. It is true for CPUs based on the Bulldozer architecture: the FX series CPUs, the Opterons built on the same architecture, and the Trinity, Richland and Kaveri APUs. It is not true for Llano (still K10 based) or for the AM1 SOCs (Jaguar based).
Most amazing in the floating point vs. integer discussion are the BOINC benchmarks for the APUs:

A8-3870K (Llano, 1st generation APU), running Ubuntu 13.10, benchmarks via BOINC Manager:
2461 floating point MIPS (Whetstone) per CPU
15793 integer MIPS (Dhrystone) per CPU
This Llano is made up of four K10 cores, each with both an FPU (Floating Point Unit) and an ALU (Arithmetic Logic Unit).

A10-5700 (Trinity, 2nd generation APU), running Lubuntu 13.10, benchmarks via BOINC Manager:
2450 floating point MIPS (Whetstone) per CPU
9513 integer MIPS (Dhrystone) per CPU
This Trinity is made up of two Piledriver modules, each with two integer cores and a shared floating point unit. For some reason the integer performance of Bulldozer, Piledriver (and now Steamroller too) leaves much to be desired compared to the older K10 integer units... and quite a lot of BOINC projects make heavy use of the integer performance of your CPU core(s).

It almost looks as if a Bulldozer module isn't made up of two integer cores and a shared floating point unit, but the other way around: two floating point units and a shared integer unit!
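To put numbers on that comparison, here is a minimal Python sketch that uses only the four benchmark figures quoted in this post. The dictionary layout and function name are just for illustration.

```python
# BOINC benchmark figures quoted in this post (MIPS per CPU).
benchmarks = {
    "A8-3870K (Llano)": {"whetstone": 2461, "dhrystone": 15793},
    "A10-5700 (Trinity)": {"whetstone": 2450, "dhrystone": 9513},
}


def ratio(metric, newer="A10-5700 (Trinity)", older="A8-3870K (Llano)"):
    """Return the newer APU's score divided by the older APU's score."""
    return benchmarks[newer][metric] / benchmarks[older][metric]


fp_ratio = ratio("whetstone")
int_ratio = ratio("dhrystone")

print(f"Whetstone: Trinity reaches {fp_ratio:.1%} of Llano")
print(f"Dhrystone: Trinity reaches {int_ratio:.1%} of Llano")
```

Per CPU, the floating point score is practically identical, while the integer score of the Trinity is only about 60% of the Llano's.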
62) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 791)
Posted 28 Jan 2015 by Dirk Broer
Perhaps interesting to observe, AMD performance-wise:
I've run the Sierpinski/Riesel Base - long so far on five AMD systems, in order of architectural age:

    [1.] Three using FM1 APUs (A8-3820, 3850 and 3870K), K10 based, four discrete cores (no shared resources), not using AVX.
    [2.] One using an FM2 A10-5700 APU, Piledriver based, two Bulldozer modules that feature two integer units and one floating point unit each, using AVX.
    [3.] One using an AM1 Athlon 5350 SOC, Jaguar based, four discrete cores (no shared resources), using AVX (amongst others).


Computing times:


    [1.] 58,000-68,000 sec.
    [2.] 85,000 sec.
    [3.] 150,000 sec.


The more modern the AMD architecture (and the more complete, in theory, the available instruction set), the worse the performance?
The even older Phenom II X6 1100T mentioned earlier has the best AMD scores so far... but at least the AMD scores were pretty consistent.
My two Intel systems that ran the Sierpinski/Riesel Base - long WUs showed wildly different computing times: from slightly more than 1,099 sec to more than 109,000 sec on my i7-3770, and from slightly more than 1,600 sec to more than 63,000 sec on my Core2 Q8200?

63) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 790)
Posted 28 Jan 2015 by Dirk Broer
You point to AMD's way of implementing it and cite the architecture of the FX and present Opteron CPUs ("they have a single AVX ALU for each pair of CPU cores"), while my problem is with the performance of my AM1 Athlon, whose Jaguar architecture does *NOT* share its ALUs with other cores. I just want to know whether the present application really uses the AM1 Athlon's architecture to the fullest, and therefore asked whether you have run an AM1 Athlon through the debugger to see if AVX is used or not.
64) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 784)
Posted 28 Jan 2015 by Dirk Broer
Ever watched in the debugger whether an AMD CPU actually uses the AVX part of the code? According to Agner Fog, the assembly produced by the Intel compiler takes a different, inferior path when running on a non-Intel CPU. I wouldn't be surprised if there is no difference compared to SSE4, because it is the exact same code that has been running. Who actually wrote the Microsoft compiler?
65) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 781)
Posted 28 Jan 2015 by Dirk Broer
The massive difference is due to the AVX instruction set in the intel cpu's. You will see the same result if you crunch on Primegrid. NeoAtP


The massive difference could very well be due to code that excludes CPUs from makers other than Intel from using the aforementioned AVX instruction set. AVX is included in all current AMD CPUs, with the exception of the FM1 APUs (Llano). All AM3+, FM2, FM2+, AM1 and FT3 CPUs, APUs and SOCs can use AVX.
66) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 778)
Posted 27 Jan 2015 by Dirk Broer
Nothing wrong with the instruction set of the Athlon 5350... it includes AVX
(but when the application is compiled using an Intel compiler it may not be able to use it, because the Intel compiler checks the vendor string instead of the capabilities).
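The difference between the two kinds of check can be sketched in a few lines of Python. This is a toy model only, not the actual dispatcher of any compiler: the function names, the path names and the flag sets are made up for illustration. Only the vendor strings "GenuineIntel" and "AuthenticAMD" are the real CPUID values.

```python
def pick_path_by_vendor(vendor, flags):
    """Dispatch on the vendor string: non-Intel CPUs never get the AVX path."""
    if vendor == "GenuineIntel" and "avx" in flags:
        return "avx"
    return "generic"


def pick_path_by_capability(vendor, flags):
    """Dispatch on what the CPU reports it can do; the vendor is ignored."""
    if "avx" in flags:
        return "avx"
    if "sse4_1" in flags:
        return "sse4"
    return "generic"


# Illustrative (incomplete) feature sets; both CPUs report AVX.
athlon_5350 = ("AuthenticAMD", {"sse2", "sse4_1", "avx"})
i7_3770 = ("GenuineIntel", {"sse2", "sse4_1", "avx"})

for name, cpu in (("Athlon 5350", athlon_5350), ("i7-3770", i7_3770)):
    print(name, pick_path_by_vendor(*cpu), pick_path_by_capability(*cpu))
```

With the vendor check the Athlon ends up on the generic path even though it reports AVX; with the capability check both CPUs take the AVX path.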
67) Message boards : Number crunching : Too long calculation for the Sierpinski/Riesel Bases - long (Message 770)
Posted 27 Jan 2015 by Dirk Broer
Want to see a long calculation time? My Athlon 5350 (AM1 socket):

3979288 1186 22 Jan 2015, 16:41:56 UTC 26 Jan 2015, 13:24:00 UTC Completed and validated 167,366.05 159,487.70 1,100.00 Sierpinski / Riesel Base - long v0.01

Most of the time it is not finished before the deadline; it takes more than 40 hours to calculate the long WUs...
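For reference, a small Python sketch converting the two time figures from the task line above into hours. It assumes the two figures are run time and CPU time in seconds, as in the usual BOINC task listing; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# Figures from the task line above; assumed to be run time and CPU time in seconds.
run_time_s = 167_366.05
cpu_time_s = 159_487.70


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


run_hours = to_hours(run_time_s)
cpu_hours = to_hours(cpu_time_s)

print(f"Run time: {run_hours:.1f} h, CPU time: {cpu_hours:.1f} h")
```

That works out to roughly 46.5 hours of run time and 44.3 hours of CPU time for this one WU.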
68) Message boards : Number crunching : Raspberry Pi (Message 621)
Posted 12 Jan 2015 by Dirk Broer
RaspberryPi has this http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0301h/index.html (section 1.5.9)


And the RaspberryPi has to make do with an ARMv6 CPU. BananaPi and others based upon the Allwinner A20 SOC use an ARMv7, which is more powerful.
69) Message boards : Number crunching : Raspberry Pi (Message 582)
Posted 9 Jan 2015 by Dirk Broer
Fair enough: http://www.arm.com/products/processors/technologies/vector-floating-point.php
70) Message boards : Number crunching : Raspberry Pi (Message 578)
Posted 9 Jan 2015 by Dirk Broer
Did I say "current ARM CPU" anywhere in my message?
71) Message boards : Number crunching : Raspberry Pi (Message 573)
Posted 9 Jan 2015 by Dirk Broer
The LLR application uses "gwnum code". This code is written for INTEL processors. Therefore you cannot compile the LLR application for ARM based OS's ..... until there is "gwnum code" available for ARM processors.

this will hardly ever happen simply because afaik there is no arm-design that is IEEE 754 compatible.
there are commercial libraries for newer arm-design which support this, but since they need to emulate on that limited arm hardware....


"gwnum code" is *NOT* written for Intel CPUs but for CPUs using the x86 and x86-64 architecture, so AMD CPUs and VIA CPUs can be used as well.

And as Arthur C. Clarke once said: "Any sufficiently advanced technology is indistinguishable from magic". The mere fact that NOW there is no arm-design that is IEEE 754 compatible says nothing about the future. We are just waiting for a programmer to enable it, one way or another.




Copyright © 2014-2024 BOINC Confederation / rebirther