FMA3 vs AVX

Message boards : Number crunching : FMA3 vs AVX

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3283 - Posted: 14 Mar 2017, 7:46:15 UTC
Last modified: 14 Mar 2017, 8:13:57 UTC

Hi,

I have some questions about Sierpinski / Riesel Base (short):
I noticed that my i7 6700K is using a zero-padded FMA3 FFT (WU 206102871), while, for example, rebirther's i7 is using a zero-padded AVX FFT (WU 206108473).
Is there a runtime advantage to using FMA3 FFTs instead of AVX FFTs, or is there no difference at all?
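Which FFT type gets used depends first on which SIMD extensions the CPU reports. On Linux you can check this yourself by reading the flags line of /proc/cpuinfo. Below is a minimal sketch of that check. The mapping in best_fft_family() is a hypothetical simplification for illustration only (it assumes the FMA3 path needs both the fma and avx2 flags); it is not gwnum's or LLR's actual selection logic.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. non-x86 or unexpected format)


def best_fft_family(flags: set[str]) -> str:
    """Hypothetical mapping from CPU flags to the widest FFT family the CPU could run."""
    if "fma" in flags and "avx2" in flags:
        return "FMA3"
    if "avx" in flags:
        return "AVX"
    if "sse2" in flags:
        return "SSE2"
    return "x87"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # only exists on Linux
    if path.exists():
        print(best_fft_family(cpu_flags(path.read_text())))
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

On a Skylake i7 6700K this should report FMA3, while an AVX-only CPU like the one rebirther describes later in the thread would report AVX.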

Thalus

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 3284 - Posted: 14 Mar 2017, 17:45:41 UTC - in response to Message 3283.

FMA3 is similar to AVX2 and should be 25-50% faster than AVX, depending on the CPU cache and memory controller.
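What FMA3 adds over plain AVX is a fused multiply-add: a*b + c in one instruction with a single rounding, instead of a separate multiply and add that each round. FFT butterflies contain many such multiply-add pairs, which is where a speedup can come from. Python can't show the instruction count, but it can show the rounding difference. The sketch below emulates the fused result exactly with fractions.Fraction; it only illustrates the semantics and is not how LLR computes anything.

```python
from fractions import Fraction


def mul_add_separate(a: float, b: float, c: float) -> float:
    # Separate multiply then add: the product is rounded to a double, then the sum is rounded again.
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    # Emulated FMA: compute a*b + c exactly, then round once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30        # a*a = 1 + 2**-29 + 2**-60 exactly
c = -(1.0 + 2.0**-29)
print(mul_add_separate(a, a, c))  # the 2**-60 term is lost in the first rounding
print(mul_add_fused(a, a, c))     # the 2**-60 term survives
```

The single rounding is also why FMA-based code can produce slightly different (usually more accurate) intermediate values than the equivalent AVX multiply-then-add sequence.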

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3287 - Posted: 14 Mar 2017, 21:37:30 UTC

I guess the current apps are not optimized for FMA3? The difference is around 5-10% as far as I can see.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 3288 - Posted: 14 Mar 2017, 21:43:50 UTC - in response to Message 3287.
Last modified: 14 Mar 2017, 21:50:51 UTC

I think the bottleneck is your memory controller. Mine has 6 channels. The app is optimized for everything. Or are you using HT?

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3289 - Posted: 15 Mar 2017, 6:22:00 UTC

I do use HT. But if I disable it, my throughput drops to about 50% of what I get now ;-)

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 3290 - Posted: 15 Mar 2017, 17:08:55 UTC - in response to Message 3289.

The new version of LLR is faster than the current one, but it is still in testing. FMA3 should be faster than AVX. Maybe you have a temperature issue that reduces the CPU load.

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3291 - Posted: 15 Mar 2017, 20:38:46 UTC

Hmm... definitely no temperature issue. I'm running every core at 4.2 GHz without throttling, at 1.15 V and around 56-65 °C per core; the CPU itself is at around 58 °C (at least that's what PECI tells me). I compared with other i7 6700K hosts, and my current times, e.g. for Riesel shorts, are nearly the same. But I have no clue why your CPU is faster than mine. Is it at stock speed or overclocked?

Definitely looking forward to LLR 3.8.10 and multicore usage!

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 3292 - Posted: 15 Mar 2017, 21:02:26 UTC - in response to Message 3291.

The CPU is not overclocked, but it was more expensive. It has 12 MB of cache and a 6-channel memory controller, but only AVX, not AVX2. Its memory bandwidth is higher than yours, though.

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3293 - Posted: 15 Mar 2017, 21:19:27 UTC - in response to Message 3292.

Hmm... so I should think about using a RAM disk then...

By the way, hopefully the new 3.8.20 gets optimized a bit more. I did some testing with single-threaded 3.8.18/3.8.20 and multithreaded 3.8.20:
3.8.18:
129897*68^129897+1 - 228.383s

3.8.20:
129897*68^129897+1 - 229.175s

3.8.20, 8 threads (this only uses ~90% of the CPU):
129897*68^129897+1 - 103.033s

Strange results...
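The multithreaded numbers are easier to judge as speedup and parallel efficiency. Here is a back-of-the-envelope sketch using the timings above, taking the 3.8.20 single-thread run as the baseline. The "4 physical cores" line assumes the i7 6700K's 4 cores / 8 threads, and the sketch ignores the ~90% CPU load noted above and anything else running on the host.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run finished than the serial one."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 means perfect scaling)."""
    return speedup(t_serial, t_parallel) / workers


t_single = 229.175  # 3.8.20, single thread, seconds (from the post above)
t_multi = 103.033   # 3.8.20, 8 threads, seconds (from the post above)

print(f"speedup: {speedup(t_single, t_multi):.2f}x")
print(f"efficiency over 8 threads: {efficiency(t_single, t_multi, 8):.0%}")
print(f"efficiency over 4 physical cores: {efficiency(t_single, t_multi, 4):.0%}")
print(f"3.8.20 vs 3.8.18 single-thread: {t_single / 228.383 - 1:+.2%}")
```

Roughly a 2.2x speedup: about 56% efficiency per physical core, or about 28% per hardware thread. The single-thread difference between 3.8.18 and 3.8.20 is well under 1%.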

Wailing Angus Beef
Joined: 4 Dec 14
Posts: 7
Credit: 324,496,280
RAC: 1,542,058
Message 3373 - Posted: 1 Apr 2017, 20:35:04 UTC
Last modified: 1 Apr 2017, 21:06:36 UTC

Do you see all cores clocking at 4.2 GHz when SRBase is running? Some CPUs automatically downclock when running AVX even though temps are fine. I have a Xeon Broadwell-EP ES system that clocks at 2.8 GHz running non-optimized apps but downclocks to 2.3 GHz when running SRBase or other project apps that use AVX/AVX2. Also, the i7-6700K only supports 2 memory channels. Are you using 2x 8GB or 4x 4GB sticks? And what memory speed are you and Reb running?

Reb, what system are you running with 6 memory channels?

Thalus
Joined: 7 Mar 17
Posts: 34
Credit: 2,584,831
RAC: 0
Message 3374 - Posted: 1 Apr 2017, 21:41:12 UTC
Last modified: 1 Apr 2017, 21:42:29 UTC

Stock is 4.2 GHz turbo on one core. I set all cores to 4.2 GHz, so the clock speed is identical for all cores. And yes, I can see all cores at 4.2 GHz.
I use 2x8GB DDR4 @ 3GHz.
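The memory-bandwidth argument can be put in rough numbers. Theoretical peak DRAM bandwidth is channels × transfer rate × 8 bytes per 64-bit channel. This is a minimal sketch. Reading "DDR4 @ 3GHz" as DDR4-3000 (3000 MT/s) is an assumption. Rebirther's memory speed was not posted, so the 6-channel line is purely hypothetical. Real sustained bandwidth is always lower than this peak.

```python
def peak_dram_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    Each DDR4 channel is 64 bits wide, so it moves bus_bytes = 8 bytes per transfer.
    """
    # channels * (MT/s * 10**6) * bytes / 10**9 simplifies to the expression below
    return channels * mega_transfers_per_s * bus_bytes / 1000


# 2 channels, assuming "DDR4 @ 3GHz" means DDR4-3000 (3000 MT/s)
print(peak_dram_bandwidth_gbs(2, 3000))
# Hypothetical: the same DIMM speed spread over 6 channels
print(peak_dram_bandwidth_gbs(6, 3000))
```

At equal DIMM speed, peak bandwidth scales linearly with channel count. That is the gap rebirther is pointing to, although higher-clocked DIMMs on a 2-channel board narrow it somewhat.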


Copyright © 2014-2024 BOINC Confederation / rebirther