mfakto --perftest
Message boards : Number crunching : mfakto --perftest

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7255
Credit: 42,729,227
RAC: 4
Message 6080 - Posted: 10 Apr 2020, 6:22:58 UTC
Last modified: 10 Apr 2020, 6:24:49 UTC

Standalone tests:

add this line to your mfakto.ini file

TestExponents=2000093,39000037,66362159,74000077,78000071,332900047,999900079,2001862367,4201971233


RX5500XT - done
output needed for Vega 56/64, Radeon VII, etc.

run
mfakto --perftest (windows)
./mfakto --perftest (linux)

The mfakto executable may have a different name on your system, so adjust the command accordingly.
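The setup step above could be scripted roughly as follows. This is only a sketch: the helper names `with_test_exponents` and `perftest_command` are made up for illustration, and it assumes mfakto.ini sits in the current directory and uses plain `Key=Value` lines.

```python
from pathlib import Path

# The exact line requested in the post above.
TEST_LINE = ("TestExponents=2000093,39000037,66362159,74000077,78000071,"
             "332900047,999900079,2001862367,4201971233")


def with_test_exponents(ini_text: str) -> str:
    """Return ini_text with the TestExponents line present exactly once.

    Hypothetical helper: any existing uncommented TestExponents= line is
    dropped and the requested line is appended at the end.
    """
    kept = [ln for ln in ini_text.splitlines()
            if not ln.strip().startswith("TestExponents=")]
    kept.append(TEST_LINE)
    return "\n".join(kept) + "\n"


def perftest_command(executable: str = "mfakto", linux: bool = False) -> list[str]:
    """Build the command from the post; pass your own executable name if it differs."""
    name = f"./{executable}" if linux else executable
    return [name, "--perftest"]


if __name__ == "__main__":
    ini = Path("mfakto.ini")  # assumed location: current directory
    if ini.exists():
        ini.write_text(with_test_exponents(ini.read_text()))
    # Only prints the command; run it yourself (or via subprocess.run).
    print(" ".join(perftest_command()))
```

Running the edit twice leaves a single TestExponents line, so it is safe to repeat before each benchmark.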

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6101 - Posted: 10 Apr 2020, 19:19:28 UTC
Last modified: 10 Apr 2020, 19:20:23 UTC

http://www.filedropper.com/mfaktowinbenchmark
Just run the benchmark script - edit it and add -d xx if you need...

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6102 - Posted: 10 Apr 2020, 20:21:17 UTC - in response to Message 6101.

Also, ignore the errors on console and post perftest*.txt

Thanks :)

Gigacruncher [TSBTs Pirate]
Joined: 28 Mar 20
Posts: 48
Credit: 8,419,360
RAC: 0
Message 6103 - Posted: 10 Apr 2020, 20:41:29 UTC - in response to Message 6102.
Last modified: 10 Apr 2020, 20:41:49 UTC

AMD Radeon HD 7600M Series

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (384 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 171.60 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.56 ms
Init_class(sieveprimes= 20000): 4.68 ms
Init_class(sieveprimes= 80000): 18.72 ms
Init_class(sieveprimes= 200000): 49.92 ms
Init_class(sieveprimes= 500000): 134.16 ms
Init_class(sieveprimes=1000000): 282.36 ms

2. CPU-Sieve (output rate M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.

SievePrimes: 254
SieveSizeLimit
36 kiB 336.1

Best SieveSizeLimit for
SievePrimes: 254
at kiB: 36
max M/s: 336.1
Survivors: 36.41%
removal rate 587.0


3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 202.8 ms (4136.4 MB/s) (real)

Standard copy, profiled queue:
800 MB in 202.8 ms (4136.4 MB/s) (real)
800 MB in 196.2 ms (4276.3 MB/s) (profiled data)
8 MB in 1.7 ms (4961.7 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 187.2 ms (4481.1 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (384 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 62.400000 ms (CPU work)
gpusieve_init_exponent: 0.780000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 78.000100 ms (SegSieve)
tf: 396.240700 ms = 333.435147 M/s (raw rate, cl_barrett15_69_gs)

GPU sieve raw rate (input rate M/s)
SievePrimes: 54
GPUSieveSize
0 MBit Error -61 (Invalid buffer size): clCreateBuffer (d_bitarray)
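For anyone reading the last line of this log: in the OpenCL headers, status code -61 is `CL_INVALID_BUFFER_SIZE`, which `clCreateBuffer` returns for a zero-sized (or oversized) buffer request. The "0 MBit" printed just before the error would be consistent with a zero-size request, though that reading is an inference from the log, not something confirmed in the thread. A small sketch for decoding such lines (the `describe` helper and the regex are illustrative, not part of mfakto):

```python
import re

# Small subset of OpenCL status codes, as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -61: "CL_INVALID_BUFFER_SIZE",
}

# Matches the "Error <code> (<text>): <call>" shape seen in the log above.
ERROR_LINE = re.compile(r"Error (-?\d+) \(([^)]*)\): (\w+)")


def describe(line: str):
    """Return (code, symbolic name, failing call), or None if no error is found."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return code, OPENCL_ERRORS.get(code, "unknown"), m.group(3)


if __name__ == "__main__":
    print(describe("0 MBit Error -61 (Invalid buffer size): clCreateBuffer (d_bitarray)"))
```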

mmonnin
Joined: 1 Feb 17
Posts: 27
Credit: 311,202,073
RAC: 84,573
Message 6105 - Posted: 10 Apr 2020, 21:58:12 UTC

Not sure if it worked or not as the driver crashed. Radeon VII

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 283.02 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.30 ms
Init_class(sieveprimes= 20000): 6.00 ms
Init_class(sieveprimes= 80000): 25.10 ms
Init_class(sieveprimes= 200000): 64.80 ms
Init_class(sieveprimes= 500000): 175.31 ms
Init_class(sieveprimes=1000000): 368.82 ms

2. CPU-Sieve (output rate M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.

SievePrimes: 254
SieveSizeLimit
36 kiB 243.8

Best SieveSizeLimit for
SievePrimes: 254
at kiB: 36
max M/s: 243.8
Survivors: 36.41%
removal rate 425.9


3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 452.0 ms (1855.8 MB/s) (real)

Standard copy, profiled queue:
800 MB in 359.0 ms (2336.5 MB/s) (real)
800 MB in 69.3 ms (12107.2 MB/s) (profiled data)
8 MB in 0.7 ms (12584.9 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 372.0 ms (2254.9 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 125.007000 ms (CPU work)
gpusieve_init_exponent: 0.200000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.000100 ms (SegSieve)
tf: 11.000600 ms = 12010.306347 M/s (raw rate, cl_barrett15_69_gs)

GPU sieve raw rate (input rate M/s)
SievePrimes: 54
GPUSieveSize
0 MBit Error -61 (Invalid buffer size): clCreateBuffer (d_bitarray)
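A note on reading the "Memory copy to GPU" figures in these logs: the numbers only line up if "800 MB" means 800 MiB (100 blocks of 8388608 bytes) while "MB/s" means 10^6 bytes per second. That is an assumption inferred from the printed values, not something the program states. A quick check against the Turks log earlier in the thread (202.8 ms and 187.2 ms):

```python
BLOCK_BYTES = 8388608  # from "blocks of 8388608 bytes" in the log


def throughput_mb_per_s(total_mib: int, elapsed_ms: float) -> float:
    """Throughput in 10^6 bytes/s, assuming the size is given in MiB."""
    total_bytes = total_mib * 1024 * 1024
    return total_bytes / (elapsed_ms / 1000.0) / 1e6


if __name__ == "__main__":
    # 800 MiB corresponds to this many 8 MiB blocks.
    print(800 * 1024 * 1024 // BLOCK_BYTES, "blocks")
    # Values from the Turks log: "800 MB in 202.8 ms (4136.4 MB/s)".
    print(round(throughput_mb_per_s(800, 202.8), 1))
    # "800 MB in 187.2 ms (4481.1 MB/s)".
    print(round(throughput_mb_per_s(800, 187.2), 1))
```

The Radeon VII "real" timings fit the same relation to within the rounding of the printed milliseconds.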

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6106 - Posted: 11 Apr 2020, 0:16:05 UTC

Well, apparently "skipped if empty" is skipped only when the whole entry/line is gone... really hoping this one works/doesn't crash the GPU.
http://www.filedropper.com/mfaktokernelbenchmark

Gigacruncher [TSBTs Pirate]
Joined: 28 Mar 20
Posts: 48
Credit: 8,419,360
RAC: 0
Message 6112 - Posted: 11 Apr 2020, 7:27:06 UTC - in response to Message 6106.

Now it is working.

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (384 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 171.60 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.56 ms
Init_class(sieveprimes= 20000): 4.68 ms
Init_class(sieveprimes= 80000): 18.72 ms
Init_class(sieveprimes= 200000): 49.92 ms
Init_class(sieveprimes= 500000): 134.16 ms
Init_class(sieveprimes=1000000): 280.80 ms

2. CPU-Sieve (output rate M/s)

3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 202.8 ms (4136.4 MB/s) (real)

Standard copy, profiled queue:
800 MB in 187.2 ms (4481.1 MB/s) (real)
800 MB in 195.7 ms (4286.5 MB/s) (profiled data)
8 MB in 1.7 ms (4876.5 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 202.8 ms (4136.4 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (384 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 78.000000 ms (CPU work)
gpusieve_init_exponent: 0.000000 ms (CalcModularInverses)
gpusieve_init_class: 1.560000 ms (CalcBitToClear)
gpusieve: 78.000200 ms (SegSieve)
tf: 394.680700 ms = 334.753070 M/s (raw rate, cl_barrett15_69_gs)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (384 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054

5. GPU tf kernels

exponent=78000071 ... calibrating
exponent=78000071, 767M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.001570 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2854.80 ms ==> 282.09M FCs/s ==> 47.51 GHz-days/day
cl_barrett15_69_gs [60-69]: 3010.80 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett15_70_gs [60-69]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_87_gs [65-87]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_77_gs [64-77]: 3073.20 ms ==> 262.04M FCs/s ==> 44.14 GHz-days/day
cl_barrett32_79_gs [64-79]: 3244.81 ms ==> 248.18M FCs/s ==> 41.80 GHz-days/day
cl_barrett15_71_gs [60-70]: 3260.41 ms ==> 247.00M FCs/s ==> 41.60 GHz-days/day
cl_barrett32_88_gs [65-88]: 3338.41 ms ==> 241.22M FCs/s ==> 40.63 GHz-days/day
cl_barrett32_92_gs [65-92]: 3385.21 ms ==> 237.89M FCs/s ==> 40.07 GHz-days/day
cl_barrett15_74_gs [60-74]: 3541.21 ms ==> 227.41M FCs/s ==> 38.30 GHz-days/day
cl_barrett15_73_gs [60-73]: 3572.41 ms ==> 225.42M FCs/s ==> 37.97 GHz-days/day
cl_barrett15_82_gs [60-81]: 4227.61 ms ==> 190.49M FCs/s ==> 32.09 GHz-days/day
cl_barrett15_83_gs [60-82]: 4352.41 ms ==> 185.03M FCs/s ==> 31.17 GHz-days/day
cl_barrett15_88_gs [60-87]: 4633.21 ms ==> 173.81M FCs/s ==> 29.28 GHz-days/day

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 45.053 cl_barrett15_69_gs
64 - 76 47.515 cl_barrett32_76_gs
76 - 87 44.821 cl_barrett32_87_gs
87 - 88 40.632 cl_barrett32_88_gs
88 - 92 40.070 cl_barrett32_92_gs

exponent=95795449 ... calibrating
exponent=95795449, 767M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.001570 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2870.41 ms ==> 280.55M FCs/s ==> 47.26 GHz-days/day
cl_barrett15_70_gs [60-69]: 3010.81 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett15_69_gs [60-69]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_87_gs [65-87]: 3042.01 ms ==> 264.73M FCs/s ==> 44.59 GHz-days/day
cl_barrett32_77_gs [64-77]: 3057.61 ms ==> 263.38M FCs/s ==> 44.36 GHz-days/day
cl_barrett15_71_gs [60-70]: 3260.41 ms ==> 247.00M FCs/s ==> 41.60 GHz-days/day
cl_barrett32_79_gs [64-79]: 3260.41 ms ==> 247.00M FCs/s ==> 41.60 GHz-days/day
cl_barrett32_88_gs [65-88]: 3338.41 ms ==> 241.22M FCs/s ==> 40.63 GHz-days/day
cl_barrett32_92_gs [65-92]: 3369.61 ms ==> 238.99M FCs/s ==> 40.26 GHz-days/day
cl_barrett15_73_gs [60-73]: 3556.81 ms ==> 226.41M FCs/s ==> 38.14 GHz-days/day
cl_barrett15_74_gs [60-74]: 3556.81 ms ==> 226.41M FCs/s ==> 38.14 GHz-days/day
cl_barrett15_82_gs [60-81]: 4212.01 ms ==> 191.19M FCs/s ==> 32.20 GHz-days/day
cl_barrett15_83_gs [60-82]: 4352.41 ms ==> 185.03M FCs/s ==> 31.17 GHz-days/day
cl_barrett15_88_gs [60-87]: 4633.21 ms ==> 173.81M FCs/s ==> 29.28 GHz-days/day

Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 45.053 cl_barrett15_70_gs
64 - 76 47.257 cl_barrett32_76_gs
76 - 87 44.591 cl_barrett32_87_gs
87 - 88 40.632 cl_barrett32_88_gs
88 - 92 40.256 cl_barrett32_92_gs

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (480 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 171.60 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.56 ms
Init_class(sieveprimes= 20000): 4.68 ms
Init_class(sieveprimes= 80000): 18.72 ms
Init_class(sieveprimes= 200000): 51.48 ms
Init_class(sieveprimes= 500000): 132.60 ms
Init_class(sieveprimes=1000000): 279.24 ms

2. CPU-Sieve (output rate M/s)

3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 202.8 ms (4136.4 MB/s) (real)

Standard copy, profiled queue:
800 MB in 202.8 ms (4136.4 MB/s) (real)
800 MB in 199.3 ms (4209.6 MB/s) (profiled data)
8 MB in 1.7 ms (4882.1 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 202.8 ms (4136.4 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (480 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 62.401000 ms (CPU work)
gpusieve_init_exponent: 0.780000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 78.000100 ms (SegSieve)
tf: 393.120700 ms = 336.081453 M/s (raw rate, cl_barrett15_69_gs)


Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name Turks (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (1800.8) (1800.8 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 6 (480 compute elements)
clock rate 600 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054

5. GPU tf kernels

exponent=78000071 ... calibrating
exponent=78000071, 767M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.001570 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2854.80 ms ==> 282.09M FCs/s ==> 47.51 GHz-days/day
cl_barrett15_70_gs [60-69]: 3010.80 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett15_69_gs [60-69]: 3010.80 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett32_87_gs [65-87]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_77_gs [64-77]: 3073.20 ms ==> 262.04M FCs/s ==> 44.14 GHz-days/day
cl_barrett15_71_gs [60-70]: 3244.81 ms ==> 248.18M FCs/s ==> 41.80 GHz-days/day
cl_barrett32_79_gs [64-79]: 3244.81 ms ==> 248.18M FCs/s ==> 41.80 GHz-days/day
cl_barrett32_88_gs [65-88]: 3338.41 ms ==> 241.22M FCs/s ==> 40.63 GHz-days/day
cl_barrett32_92_gs [65-92]: 3385.21 ms ==> 237.89M FCs/s ==> 40.07 GHz-days/day
cl_barrett15_74_gs [60-74]: 3525.61 ms ==> 228.42M FCs/s ==> 38.47 GHz-days/day
cl_barrett15_73_gs [60-73]: 3556.81 ms ==> 226.41M FCs/s ==> 38.14 GHz-days/day
cl_barrett15_82_gs [60-81]: 4212.01 ms ==> 191.19M FCs/s ==> 32.20 GHz-days/day
cl_barrett15_83_gs [60-82]: 4352.41 ms ==> 185.03M FCs/s ==> 31.17 GHz-days/day
cl_barrett15_88_gs [60-87]: 4648.81 ms ==> 173.23M FCs/s ==> 29.18 GHz-days/day

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 45.053 cl_barrett15_70_gs
64 - 76 47.515 cl_barrett32_76_gs
76 - 87 44.821 cl_barrett32_87_gs
87 - 88 40.632 cl_barrett32_88_gs
88 - 92 40.070 cl_barrett32_92_gs

exponent=95795449 ... calibrating
exponent=95795449, 767M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.001570 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2854.80 ms ==> 282.09M FCs/s ==> 47.51 GHz-days/day
cl_barrett15_69_gs [60-69]: 3010.80 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett15_70_gs [60-69]: 3010.81 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett32_87_gs [65-87]: 3042.01 ms ==> 264.73M FCs/s ==> 44.59 GHz-days/day
cl_barrett32_77_gs [64-77]: 3073.21 ms ==> 262.04M FCs/s ==> 44.14 GHz-days/day
cl_barrett15_71_gs [60-70]: 3229.20 ms ==> 249.38M FCs/s ==> 42.01 GHz-days/day
cl_barrett32_79_gs [64-79]: 3260.41 ms ==> 247.00M FCs/s ==> 41.60 GHz-days/day
cl_barrett32_88_gs [65-88]: 3322.81 ms ==> 242.36M FCs/s ==> 40.82 GHz-days/day
cl_barrett32_92_gs [65-92]: 3385.21 ms ==> 237.89M FCs/s ==> 40.07 GHz-days/day
cl_barrett15_74_gs [60-74]: 3525.61 ms ==> 228.42M FCs/s ==> 38.47 GHz-days/day
cl_barrett15_73_gs [60-73]: 3556.81 ms ==> 226.41M FCs/s ==> 38.14 GHz-days/day
cl_barrett15_82_gs [60-81]: 4212.01 ms ==> 191.19M FCs/s ==> 32.20 GHz-days/day
cl_barrett15_83_gs [60-82]: 4352.41 ms ==> 185.03M FCs/s ==> 31.17 GHz-days/day
cl_barrett15_88_gs [60-87]: 4648.81 ms ==> 173.23M FCs/s ==> 29.18 GHz-days/day

Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 45.053 cl_barrett15_69_gs
64 - 76 47.515 cl_barrett32_76_gs
76 - 87 44.591 cl_barrett32_87_gs
87 - 88 40.823 cl_barrett32_88_gs
88 - 92 40.070 cl_barrett32_92_gs
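The "Resulting speed" tables in this post appear to follow a simple rule: for each one-bit step, take the fastest kernel whose [min-max] range covers that step, then merge neighbouring steps that picked the same kernel. This is a guess at the selection rule, not mfakto's actual code, but the sketch below reproduces the first M78000071 table from its per-kernel lines (`parse` and `best_per_range` are illustrative names):

```python
import re

# Per-kernel lines copied from the first M78000071 run (GPUType GCN) above.
LOG = """\
cl_barrett32_76_gs [64-76]: 2854.80 ms ==> 282.09M FCs/s ==> 47.51 GHz-days/day
cl_barrett15_69_gs [60-69]: 3010.80 ms ==> 267.47M FCs/s ==> 45.05 GHz-days/day
cl_barrett15_70_gs [60-69]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_87_gs [65-87]: 3026.41 ms ==> 266.09M FCs/s ==> 44.82 GHz-days/day
cl_barrett32_77_gs [64-77]: 3073.20 ms ==> 262.04M FCs/s ==> 44.14 GHz-days/day
cl_barrett32_79_gs [64-79]: 3244.81 ms ==> 248.18M FCs/s ==> 41.80 GHz-days/day
cl_barrett15_71_gs [60-70]: 3260.41 ms ==> 247.00M FCs/s ==> 41.60 GHz-days/day
cl_barrett32_88_gs [65-88]: 3338.41 ms ==> 241.22M FCs/s ==> 40.63 GHz-days/day
cl_barrett32_92_gs [65-92]: 3385.21 ms ==> 237.89M FCs/s ==> 40.07 GHz-days/day
cl_barrett15_74_gs [60-74]: 3541.21 ms ==> 227.41M FCs/s ==> 38.30 GHz-days/day
cl_barrett15_73_gs [60-73]: 3572.41 ms ==> 225.42M FCs/s ==> 37.97 GHz-days/day
cl_barrett15_82_gs [60-81]: 4227.61 ms ==> 190.49M FCs/s ==> 32.09 GHz-days/day
cl_barrett15_83_gs [60-82]: 4352.41 ms ==> 185.03M FCs/s ==> 31.17 GHz-days/day
cl_barrett15_88_gs [60-87]: 4633.21 ms ==> 173.81M FCs/s ==> 29.28 GHz-days/day
"""

LINE = re.compile(
    r"^(cl_\w+) \[(\d+)-(\d+)\]: [\d.]+ ms ==> [\d.]+M FCs/s ==> ([\d.]+) GHz-days/day$"
)


def parse(log: str):
    """Extract (kernel name, bit_min, bit_max, GHz-days/day) tuples."""
    kernels = []
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if m:
            kernels.append((m.group(1), int(m.group(2)), int(m.group(3)), float(m.group(4))))
    return kernels


def best_per_range(kernels):
    """Assumed rule: fastest covering kernel per bit step, merged into ranges."""
    lo = min(k[1] for k in kernels)
    hi = max(k[2] for k in kernels)
    rows = []
    for bit in range(lo, hi):
        covering = [k for k in kernels if k[1] <= bit and bit + 1 <= k[2]]
        if not covering:
            continue
        name = max(covering, key=lambda k: k[3])[0]
        if rows and rows[-1][2] == name and rows[-1][1] == bit:
            rows[-1] = (rows[-1][0], bit + 1, name)  # extend the previous range
        else:
            rows.append((bit, bit + 1, name))
    return rows


if __name__ == "__main__":
    for bit_min, bit_max, name in best_per_range(parse(LOG)):
        print(f"{bit_min} - {bit_max} {name}")
```

Ties in the 60 - 64 range (cl_barrett15_69_gs vs. cl_barrett15_70_gs at 45.05) explain why the chosen kernel flips between otherwise near-identical runs.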

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6147 - Posted: 13 Apr 2020, 1:47:30 UTC
Last modified: 13 Apr 2020, 2:04:30 UTC

All I need is data from an R VII; once I get that, most GPUs should be running on "good" kernels with no unrecognized warning.
Also... thank you to everyone who has posted their results! :)

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7255
Credit: 42,729,227
RAC: 4
Message 6205 - Posted: 15 Apr 2020, 11:08:32 UTC - in response to Message 6147.

All I need is data from an R VII; once I get that, most GPUs should be running on "good" kernels with no unrecognized warning.
Also... thank you to everyone who has posted their results! :)


We could really use a test from an R7 card. If someone can run it, that would help us build a new, better version.

mmonnin
Joined: 1 Feb 17
Posts: 27
Credit: 311,202,073
RAC: 84,573
Message 6238 - Posted: 16 Apr 2020, 10:27:25 UTC - in response to Message 6205.

I did ^^. The program crashed the driver.

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6244 - Posted: 16 Apr 2020, 13:17:18 UTC - in response to Message 6238.

Did you try the second one I posted later?

mmonnin
Joined: 1 Feb 17
Posts: 27
Credit: 311,202,073
RAC: 84,573
Message 6248 - Posted: 16 Apr 2020, 22:03:29 UTC - in response to Message 6244.

Did you try the second one I posted later?


I didn't realize it was a new version.

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 273.01 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.30 ms
Init_class(sieveprimes= 20000): 5.80 ms
Init_class(sieveprimes= 80000): 25.50 ms
Init_class(sieveprimes= 200000): 67.50 ms
Init_class(sieveprimes= 500000): 180.01 ms
Init_class(sieveprimes=1000000): 377.61 ms

2. CPU-Sieve (output rate M/s)

3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 408.0 ms (2055.9 MB/s) (real)

Standard copy, profiled queue:
800 MB in 354.6 ms (2365.5 MB/s) (real)
800 MB in 69.6 ms (12059.4 MB/s) (profiled data)
8 MB in 0.7 ms (12500.9 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 373.0 ms (2248.8 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 129.007000 ms (CPU work)
gpusieve_init_exponent: 0.200000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.100100 ms (SegSieve)
tf: 11.100600 ms = 11902.111237 M/s (raw rate, cl_barrett15_69_gs)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for GCN

Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054

5. GPU tf kernels

exponent=78000071 ... calibrating
exponent=78000071, 24575M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2060.72 ms ==> 12505.26M FCs/s ==> 2106.38 GHz-days/day
cl_barrett32_77_gs [64-77]: 2252.73 ms ==> 11439.38M FCs/s ==> 1926.85 GHz-days/day
cl_barrett32_87_gs [65-87]: 2354.68 ms ==> 10944.09M FCs/s ==> 1843.42 GHz-days/day
cl_barrett15_69_gs [60-69]: 2404.14 ms ==> 10718.94M FCs/s ==> 1805.50 GHz-days/day
cl_barrett15_70_gs [60-69]: 2406.14 ms ==> 10710.03M FCs/s ==> 1804.00 GHz-days/day
cl_barrett32_79_gs [64-79]: 2429.28 ms ==> 10608.00M FCs/s ==> 1786.81 GHz-days/day
cl_barrett32_88_gs [65-88]: 2561.94 ms ==> 10058.69M FCs/s ==> 1694.28 GHz-days/day
cl_barrett15_71_gs [60-70]: 2568.35 ms ==> 10033.62M FCs/s ==> 1690.06 GHz-days/day
cl_barrett32_92_gs [65-92]: 2732.36 ms ==> 9431.35M FCs/s ==> 1588.61 GHz-days/day
cl_barrett15_73_gs [60-73]: 2860.71 ms ==> 9008.19M FCs/s ==> 1517.34 GHz-days/day
cl_barrett15_74_gs [60-74]: 2969.47 ms ==> 8678.24M FCs/s ==> 1461.76 GHz-days/day
cl_barrett15_82_gs [60-81]: 3229.97 ms ==> 7978.35M FCs/s ==> 1343.87 GHz-days/day
cl_barrett15_83_gs [60-82]: 3492.20 ms ==> 7379.26M FCs/s ==> 1242.96 GHz-days/day
cl_barrett15_88_gs [60-87]: 3765.65 ms ==> 6843.39M FCs/s ==> 1152.70 GHz-days/day

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1805.495 cl_barrett15_69_gs
64 - 76 2106.383 cl_barrett32_76_gs
76 - 77 1926.846 cl_barrett32_77_gs
77 - 87 1843.420 cl_barrett32_87_gs
87 - 88 1694.284 cl_barrett32_88_gs
88 - 92 1588.615 cl_barrett32_92_gs

exponent=95795449 ... calibrating
exponent=95795449, 24575M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2105.12 ms ==> 12241.48M FCs/s ==> 2061.95 GHz-days/day
cl_barrett32_77_gs [64-77]: 2268.13 ms ==> 11361.70M FCs/s ==> 1913.76 GHz-days/day
cl_barrett32_87_gs [65-87]: 2371.49 ms ==> 10866.49M FCs/s ==> 1830.35 GHz-days/day
cl_barrett15_70_gs [60-69]: 2408.68 ms ==> 10698.72M FCs/s ==> 1802.09 GHz-days/day
cl_barrett32_79_gs [64-79]: 2430.14 ms ==> 10604.25M FCs/s ==> 1786.18 GHz-days/day
cl_barrett15_69_gs [60-69]: 2432.14 ms ==> 10595.53M FCs/s ==> 1784.71 GHz-days/day
cl_barrett15_71_gs [60-70]: 2586.24 ms ==> 9964.20M FCs/s ==> 1678.37 GHz-days/day
cl_barrett32_88_gs [65-88]: 2609.06 ms ==> 9877.05M FCs/s ==> 1663.69 GHz-days/day
cl_barrett32_92_gs [65-92]: 2760.92 ms ==> 9333.79M FCs/s ==> 1572.18 GHz-days/day
cl_barrett15_73_gs [60-73]: 2887.16 ms ==> 8925.64M FCs/s ==> 1503.43 GHz-days/day
cl_barrett15_74_gs [60-74]: 2972.86 ms ==> 8668.35M FCs/s ==> 1460.10 GHz-days/day
cl_barrett15_82_gs [60-81]: 3235.13 ms ==> 7965.62M FCs/s ==> 1341.73 GHz-days/day
cl_barrett15_83_gs [60-82]: 3492.58 ms ==> 7378.43M FCs/s ==> 1242.82 GHz-days/day
cl_barrett15_88_gs [60-87]: 3837.41 ms ==> 6715.41M FCs/s ==> 1131.14 GHz-days/day

Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1802.090 cl_barrett15_70_gs
64 - 76 2061.953 cl_barrett32_76_gs
76 - 77 1913.762 cl_barrett32_77_gs
77 - 87 1830.349 cl_barrett32_87_gs
87 - 88 1663.688 cl_barrett32_88_gs
88 - 92 1572.181 cl_barrett32_92_gs

mfakto 0.15pre6-MGW (64bit build)


Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.


Perftest

Generate list of the first 1075766 primes: 270.01 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.40 ms
Init_class(sieveprimes= 20000): 5.70 ms
Init_class(sieveprimes= 80000): 25.30 ms
Init_class(sieveprimes= 200000): 67.80 ms
Init_class(sieveprimes= 500000): 180.66 ms
Init_class(sieveprimes=1000000): 377.75 ms

2. CPU-Sieve (output rate M/s)

3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
800 MB in 406.6 ms (2063.1 MB/s) (real)

Standard copy, profiled queue:
800 MB in 358.8 ms (2338.0 MB/s) (real)
800 MB in 70.2 ms (11952.9 MB/s) (profiled data)
8 MB in 0.7 ms (12512.8 MB/s) (profiled data, peak)

Standard copy, two queues:
800 MB in 375.4 ms (2234.6 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.

4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182

gpusieve_init: 144.008000 ms (CPU work)
gpusieve_init_exponent: 0.200050 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.000000 ms (SegSieve)
tf: 11.000700 ms = 12010.197169 M/s (raw rate, cl_barrett15_69_gs)


Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5

Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054

5. GPU tf kernels

exponent=78000071 ... calibrating
exponent=78000071, 24575M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2056.12 ms ==> 12533.23M FCs/s ==> 2111.09 GHz-days/day
cl_barrett32_77_gs [64-77]: 2233.13 ms ==> 11539.78M FCs/s ==> 1943.76 GHz-days/day
cl_barrett32_87_gs [65-87]: 2355.33 ms ==> 10941.04M FCs/s ==> 1842.91 GHz-days/day
cl_barrett15_69_gs [60-69]: 2395.14 ms ==> 10759.22M FCs/s ==> 1812.28 GHz-days/day
cl_barrett15_70_gs [60-69]: 2396.68 ms ==> 10752.29M FCs/s ==> 1811.11 GHz-days/day
cl_barrett32_79_gs [64-79]: 2428.14 ms ==> 10612.99M FCs/s ==> 1787.65 GHz-days/day
cl_barrett15_71_gs [60-70]: 2555.90 ms ==> 10082.49M FCs/s ==> 1698.29 GHz-days/day
cl_barrett32_88_gs [65-88]: 2571.47 ms ==> 10021.41M FCs/s ==> 1688.00 GHz-days/day
cl_barrett32_92_gs [65-92]: 2710.51 ms ==> 9507.35M FCs/s ==> 1601.42 GHz-days/day
cl_barrett15_73_gs [60-73]: 2863.16 ms ==> 9000.46M FCs/s ==> 1516.04 GHz-days/day
cl_barrett15_74_gs [60-74]: 2963.91 ms ==> 8694.52M FCs/s ==> 1464.50 GHz-days/day
cl_barrett15_82_gs [60-81]: 3189.16 ms ==> 8080.42M FCs/s ==> 1361.06 GHz-days/day
cl_barrett15_83_gs [60-82]: 3433.74 ms ==> 7504.88M FCs/s ==> 1264.12 GHz-days/day
cl_barrett15_88_gs [60-87]: 3784.21 ms ==> 6809.83M FCs/s ==> 1147.05 GHz-days/day

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1812.280 cl_barrett15_69_gs
64 - 76 2111.095 cl_barrett32_76_gs
76 - 77 1943.759 cl_barrett32_77_gs
77 - 87 1842.907 cl_barrett32_87_gs
87 - 88 1688.004 cl_barrett32_88_gs
88 - 92 1601.416 cl_barrett32_92_gs

exponent=95795449 ... calibrating
exponent=95795449, 24575M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2056.66 ms ==> 12529.93M FCs/s ==> 2110.54 GHz-days/day
cl_barrett32_77_gs [64-77]: 2233.33 ms ==> 11538.76M FCs/s ==> 1943.59 GHz-days/day
cl_barrett32_87_gs [65-87]: 2356.13 ms ==> 10937.34M FCs/s ==> 1842.28 GHz-days/day
cl_barrett15_70_gs [60-69]: 2396.14 ms ==> 10754.73M FCs/s ==> 1811.52 GHz-days/day
cl_barrett15_69_gs [60-69]: 2404.33 ms ==> 10718.09M FCs/s ==> 1805.35 GHz-days/day
cl_barrett32_79_gs [64-79]: 2418.94 ms ==> 10653.36M FCs/s ==> 1794.45 GHz-days/day
cl_barrett32_88_gs [65-88]: 2559.15 ms ==> 10069.69M FCs/s ==> 1696.14 GHz-days/day
cl_barrett15_71_gs [60-70]: 2567.15 ms ==> 10038.30M FCs/s ==> 1690.85 GHz-days/day
cl_barrett32_92_gs [65-92]: 2724.45 ms ==> 9458.74M FCs/s ==> 1593.23 GHz-days/day
cl_barrett15_73_gs [60-73]: 2854.12 ms ==> 9029.00M FCs/s ==> 1520.84 GHz-days/day
cl_barrett15_74_gs [60-74]: 2957.17 ms ==> 8714.35M FCs/s ==> 1467.84 GHz-days/day
cl_barrett15_82_gs [60-81]: 3199.15 ms ==> 8055.21M FCs/s ==> 1356.82 GHz-days/day
cl_barrett15_83_gs [60-82]: 3424.26 ms ==> 7525.66M FCs/s ==> 1267.62 GHz-days/day
cl_barrett15_88_gs [60-87]: 3768.41 ms ==> 6838.37M FCs/s ==> 1151.85 GHz-days/day

Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1811.524 cl_barrett15_70_gs
64 - 76 2110.538 cl_barrett32_76_gs
76 - 77 1943.585 cl_barrett32_77_gs
77 - 87 1842.284 cl_barrett32_87_gs
87 - 88 1696.136 cl_barrett32_88_gs
88 - 92 1593.227 cl_barrett32_92_gs
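The GHz-days/day column in these logs can be cross-checked from two other printed values: the "GHz-days (per test)" figure and the kernel's time per test in milliseconds. The relation below is inferred from the numbers in the thread (it matches both the Radeon VII and the Turks logs to within rounding); it is not taken from mfakto's source.

```python
SECONDS_PER_DAY = 86400


def ghz_days_per_day(ghz_days_per_test: float, test_ms: float) -> float:
    """Credit per test, scaled up to how many such tests fit in one day."""
    return ghz_days_per_test * SECONDS_PER_DAY / (test_ms / 1000.0)


if __name__ == "__main__":
    # Radeon VII, cl_barrett32_76_gs: 0.050239 GHz-days per test, 2060.72 ms
    # (the log prints 2106.38 GHz-days/day).
    print(round(ghz_days_per_day(0.050239, 2060.72), 1))
    # Turks, cl_barrett32_76_gs: 0.001570 GHz-days per test, 2854.80 ms
    # (the log prints 47.51 GHz-days/day).
    print(round(ghz_days_per_day(0.001570, 2854.80), 1))
```

Small differences in the last digit are expected, because the per-test GHz-days value is itself printed rounded to six decimals.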

k3ack3r
Volunteer developer
Volunteer tester
Joined: 7 Aug 19
Posts: 19
Credit: 10,641,118
RAC: 0
Message 6250 - Posted: 17 Apr 2020, 0:08:04 UTC - in response to Message 6248.

Thanks! Decent speedups from using the MUL32 kernels.

69-70 15%
71 20%
72-73 28%
74 31%
75 36%
76 36%
77 31%
78 27%
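The thread does not say which kernel pairs (or which card) these percentages compare, so they cannot be reproduced exactly from the logs above. As a rough illustration only, and assuming "MUL32 kernels" refers to the cl_barrett32_* family, a relative speedup between two GHz-days/day figures can be computed like this, using two values from the Radeon VII log for M78000071:

```python
def speedup_percent(new_rate: float, old_rate: float) -> float:
    """Relative gain of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative pairing (an assumption, not the comparison behind the table above):
    # cl_barrett32_76_gs at 2106.38 vs. cl_barrett15_69_gs at 1805.50 GHz-days/day.
    print(f"{speedup_percent(2106.38, 1805.50):.1f}%")
```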



Copyright © 2014-2024 BOINC Confederation / rebirther