Message boards : Number crunching : mfakto --perftest
Author | Message |
---|---|
Standalone tests: | |
ID: 6080 · Rating: 0 | |
http://www.filedropper.com/mfaktowinbenchmark | |
ID: 6101 · Rating: 0 | |
Also, ignore the errors on the console and post the perftest*.txt files. | |
ID: 6102 · Rating: 0 | |
AMD Radeon HD 77600M Series mfakto 0.15pre6-MGW (64bit build) | |
ID: 6103 · Rating: 0 | |
Not sure if it worked or not as the driver crashed.
Radeon VII
mfakto 0.15pre6-MGW (64bit build)
Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for GCN
Compiling kernels.
Perftest
Generate list of the first 1075766 primes: 283.02 ms
1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.30 ms
Init_class(sieveprimes= 20000): 6.00 ms
Init_class(sieveprimes= 80000): 25.10 ms
Init_class(sieveprimes= 200000): 64.80 ms
Init_class(sieveprimes= 500000): 175.31 ms
Init_class(sieveprimes=1000000): 368.82 ms
2. CPU-Sieve (output rate M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.
SievePrimes: 254
SieveSizeLimit
36 kiB 243.8
Best SieveSizeLimit for
SievePrimes: 254
at kiB: 36
max M/s: 243.8
Survivors: 36.41%
removal rate 425.9
3. Memory copy to GPU (blocks of 8388608 bytes)
Standard copy, standard queue:
800 MB in 452.0 ms (1855.8 MB/s) (real)
Standard copy, profiled queue:
800 MB in 359.0 ms (2336.5 MB/s) (real)
800 MB in 69.3 ms (12107.2 MB/s) (profiled data)
8 MB in 0.7 ms (12584.9 MB/s) (profiled data, peak)
Standard copy, two queues:
800 MB in 372.0 ms (2254.9 MB/s) (real)
Reinitializing with gpu_sieving enabled.
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for GCN
Compiling kernels.
4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182
gpusieve_init: 125.007000 ms (CPU work)
gpusieve_init_exponent: 0.200000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.000100 ms (SegSieve)
tf: 11.000600 ms = 12010.306347 M/s (raw rate, cl_barrett15_69_gs)
GPU sieve raw rate (input rate M/s)
SievePrimes: 54
GPUSieveSize
0 MBit Error -61 (Invalid buffer size): clCreateBuffer (d_bitarray)
| |
ID: 6105 · Rating: 0 | |
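For anyone checking the section 3 numbers in the log above: the MB/s figures look consistent with 100 blocks of 8388608 bytes divided by the elapsed time, expressed in units of 10^6 bytes per second. The sketch below assumes that interpretation. The block count and the function name are my own guesses, not taken from the mfakto source.

```python
BLOCK_BYTES = 8388608  # block size printed in the log header
BLOCKS = 100           # assumption: "800 MB" means 100 such blocks


def copy_rate_mb_per_s(elapsed_ms: float) -> float:
    """Throughput in units of 10**6 bytes per second (assumed unit)."""
    total_bytes = BLOCK_BYTES * BLOCKS
    return total_bytes / (elapsed_ms / 1000.0) / 1e6


# Elapsed times taken from the first Radeon VII log
for ms in (452.0, 359.0, 69.3):
    print(f"{ms:6.1f} ms -> {copy_rate_mb_per_s(ms):8.1f} MB/s")
```

The results land within a fraction of a percent of the logged values (1855.8, 2336.5 and 12107.2 MB/s). The small differences are presumably timer rounding in the printed milliseconds.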
Well, apparently "skipped if empty" is skipped only when the whole entry/line is gone... really hoping this one works and doesn't crash the GPU. | |
ID: 6106 · Rating: 0 | |
Now it is working. mfakto 0.15pre6-MGW (64bit build) | |
ID: 6112 · Rating: 0 | |
All I need is data from R VII; once I get that, most GPUs should be running on "good" kernels with no unrecognized warning. | |
ID: 6147 · Rating: 0 | |
"All I need is data from R VII; once I get that, most GPUs should be running on 'good' kernels with no unrecognized warning."
We could really use a test from an R7 card. If someone can run it, this could help with a new, better version. | |
ID: 6205 · Rating: 0 | |
I did ^^. The program crashed the driver. | |
ID: 6238 · Rating: 0 | |
Did you try the second one I posted later? | |
ID: 6244 · Rating: 0 | |
"Did you try the second one I posted later?"
I didn't realize it was a new version.
mfakto 0.15pre6-MGW (64bit build)
Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for GCN
Compiling kernels.
Perftest
Generate list of the first 1075766 primes: 273.01 ms
1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.30 ms
Init_class(sieveprimes= 20000): 5.80 ms
Init_class(sieveprimes= 80000): 25.50 ms
Init_class(sieveprimes= 200000): 67.50 ms
Init_class(sieveprimes= 500000): 180.01 ms
Init_class(sieveprimes=1000000): 377.61 ms
2. CPU-Sieve (output rate M/s)
3. Memory copy to GPU (blocks of 8388608 bytes)
Standard copy, standard queue:
800 MB in 408.0 ms (2055.9 MB/s) (real)
Standard copy, profiled queue:
800 MB in 354.6 ms (2365.5 MB/s) (real)
800 MB in 69.6 ms (12059.4 MB/s) (profiled data)
8 MB in 0.7 ms (12500.9 MB/s) (profiled data, peak)
Standard copy, two queues:
800 MB in 373.0 ms (2248.8 MB/s) (real)
Reinitializing with gpu_sieving enabled.
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for GCN
Compiling kernels.
4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182
gpusieve_init: 129.007000 ms (CPU work)
gpusieve_init_exponent: 0.200000 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.100100 ms (SegSieve)
tf: 11.100600 ms = 11902.111237 M/s (raw rate, cl_barrett15_69_gs)
Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType GCN
SmallExp no
UseBinfile
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for GCN
Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
5. GPU tf kernels
exponent=78000071 ... calibrating
exponent=78000071, 24575M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2060.72 ms ==> 12505.26M FCs/s ==> 2106.38 GHz-days/day
cl_barrett32_77_gs [64-77]: 2252.73 ms ==> 11439.38M FCs/s ==> 1926.85 GHz-days/day
cl_barrett32_87_gs [65-87]: 2354.68 ms ==> 10944.09M FCs/s ==> 1843.42 GHz-days/day
cl_barrett15_69_gs [60-69]: 2404.14 ms ==> 10718.94M FCs/s ==> 1805.50 GHz-days/day
cl_barrett15_70_gs [60-69]: 2406.14 ms ==> 10710.03M FCs/s ==> 1804.00 GHz-days/day
cl_barrett32_79_gs [64-79]: 2429.28 ms ==> 10608.00M FCs/s ==> 1786.81 GHz-days/day
cl_barrett32_88_gs [65-88]: 2561.94 ms ==> 10058.69M FCs/s ==> 1694.28 GHz-days/day
cl_barrett15_71_gs [60-70]: 2568.35 ms ==> 10033.62M FCs/s ==> 1690.06 GHz-days/day
cl_barrett32_92_gs [65-92]: 2732.36 ms ==> 9431.35M FCs/s ==> 1588.61 GHz-days/day
cl_barrett15_73_gs [60-73]: 2860.71 ms ==> 9008.19M FCs/s ==> 1517.34 GHz-days/day
cl_barrett15_74_gs [60-74]: 2969.47 ms ==> 8678.24M FCs/s ==> 1461.76 GHz-days/day
cl_barrett15_82_gs [60-81]: 3229.97 ms ==> 7978.35M FCs/s ==> 1343.87 GHz-days/day
cl_barrett15_83_gs [60-82]: 3492.20 ms ==> 7379.26M FCs/s ==> 1242.96 GHz-days/day
cl_barrett15_88_gs [60-87]: 3765.65 ms ==> 6843.39M FCs/s ==> 1152.70 GHz-days/day
Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1805.495 cl_barrett15_69_gs
64 - 76 2106.383 cl_barrett32_76_gs
76 - 77 1926.846 cl_barrett32_77_gs
77 - 87 1843.420 cl_barrett32_87_gs
87 - 88 1694.284 cl_barrett32_88_gs
88 - 92 1588.615 cl_barrett32_92_gs
exponent=95795449 ... calibrating
exponent=95795449, 24575M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2105.12 ms ==> 12241.48M FCs/s ==> 2061.95 GHz-days/day
cl_barrett32_77_gs [64-77]: 2268.13 ms ==> 11361.70M FCs/s ==> 1913.76 GHz-days/day
cl_barrett32_87_gs [65-87]: 2371.49 ms ==> 10866.49M FCs/s ==> 1830.35 GHz-days/day
cl_barrett15_70_gs [60-69]: 2408.68 ms ==> 10698.72M FCs/s ==> 1802.09 GHz-days/day
cl_barrett32_79_gs [64-79]: 2430.14 ms ==> 10604.25M FCs/s ==> 1786.18 GHz-days/day
cl_barrett15_69_gs [60-69]: 2432.14 ms ==> 10595.53M FCs/s ==> 1784.71 GHz-days/day
cl_barrett15_71_gs [60-70]: 2586.24 ms ==> 9964.20M FCs/s ==> 1678.37 GHz-days/day
cl_barrett32_88_gs [65-88]: 2609.06 ms ==> 9877.05M FCs/s ==> 1663.69 GHz-days/day
cl_barrett32_92_gs [65-92]: 2760.92 ms ==> 9333.79M FCs/s ==> 1572.18 GHz-days/day
cl_barrett15_73_gs [60-73]: 2887.16 ms ==> 8925.64M FCs/s ==> 1503.43 GHz-days/day
cl_barrett15_74_gs [60-74]: 2972.86 ms ==> 8668.35M FCs/s ==> 1460.10 GHz-days/day
cl_barrett15_82_gs [60-81]: 3235.13 ms ==> 7965.62M FCs/s ==> 1341.73 GHz-days/day
cl_barrett15_83_gs [60-82]: 3492.58 ms ==> 7378.43M FCs/s ==> 1242.82 GHz-days/day
cl_barrett15_88_gs [60-87]: 3837.41 ms ==> 6715.41M FCs/s ==> 1131.14 GHz-days/day
Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1802.090 cl_barrett15_70_gs
64 - 76 2061.953 cl_barrett32_76_gs
76 - 77 1913.762 cl_barrett32_77_gs
77 - 87 1830.349 cl_barrett32_87_gs
87 - 88 1663.688 cl_barrett32_88_gs
88 - 92 1572.181 cl_barrett32_92_gs
mfakto 0.15pre6-MGW (64bit build)
Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5
Compiling kernels.
Perftest
Generate list of the first 1075766 primes: 270.01 ms
1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.40 ms
Init_class(sieveprimes= 20000): 5.70 ms
Init_class(sieveprimes= 80000): 25.30 ms
Init_class(sieveprimes= 200000): 67.80 ms
Init_class(sieveprimes= 500000): 180.66 ms
Init_class(sieveprimes=1000000): 377.75 ms
2. CPU-Sieve (output rate M/s)
3. Memory copy to GPU (blocks of 8388608 bytes)
Standard copy, standard queue:
800 MB in 406.6 ms (2063.1 MB/s) (real)
Standard copy, profiled queue:
800 MB in 358.8 ms (2338.0 MB/s) (real)
800 MB in 70.2 ms (11952.9 MB/s) (profiled data)
8 MB in 0.7 ms (12512.8 MB/s) (profiled data, peak)
Standard copy, two queues:
800 MB in 375.4 ms (2234.6 MB/s) (real)
Reinitializing with gpu_sieving enabled.
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5
Compiling kernels.
4. GPU sieve, 10 iterations each
GPUSievePrimes (adjusted) 52534
GPUsieve minimum exponent 646182
gpusieve_init: 144.008000 ms (CPU work)
gpusieve_init_exponent: 0.200050 ms (CalcModularInverses)
gpusieve_init_class: 0.000000 ms (CalcBitToClear)
gpusieve: 1.000000 ms (SegSieve)
tf: 11.000700 ms = 12010.197169 M/s (raw rate, cl_barrett15_69_gs)
Runtime options
Inifile mfaktonodp.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults no
VectorSize 2
GPUType VLIW5
SmallExp no
UseBinfile
Select device - Get device info:
OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.5) (2841.5 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (4800 compute elements)
clock rate 1802 MHz
Automatic parameters
threads per grid 2097152
optimizing kernels for VLIW5
Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
5. GPU tf kernels
exponent=78000071 ... calibrating
exponent=78000071, 24575M FCs each, k=1891972028970, 0.766434 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2056.12 ms ==> 12533.23M FCs/s ==> 2111.09 GHz-days/day
cl_barrett32_77_gs [64-77]: 2233.13 ms ==> 11539.78M FCs/s ==> 1943.76 GHz-days/day
cl_barrett32_87_gs [65-87]: 2355.33 ms ==> 10941.04M FCs/s ==> 1842.91 GHz-days/day
cl_barrett15_69_gs [60-69]: 2395.14 ms ==> 10759.22M FCs/s ==> 1812.28 GHz-days/day
cl_barrett15_70_gs [60-69]: 2396.68 ms ==> 10752.29M FCs/s ==> 1811.11 GHz-days/day
cl_barrett32_79_gs [64-79]: 2428.14 ms ==> 10612.99M FCs/s ==> 1787.65 GHz-days/day
cl_barrett15_71_gs [60-70]: 2555.90 ms ==> 10082.49M FCs/s ==> 1698.29 GHz-days/day
cl_barrett32_88_gs [65-88]: 2571.47 ms ==> 10021.41M FCs/s ==> 1688.00 GHz-days/day
cl_barrett32_92_gs [65-92]: 2710.51 ms ==> 9507.35M FCs/s ==> 1601.42 GHz-days/day
cl_barrett15_73_gs [60-73]: 2863.16 ms ==> 9000.46M FCs/s ==> 1516.04 GHz-days/day
cl_barrett15_74_gs [60-74]: 2963.91 ms ==> 8694.52M FCs/s ==> 1464.50 GHz-days/day
cl_barrett15_82_gs [60-81]: 3189.16 ms ==> 8080.42M FCs/s ==> 1361.06 GHz-days/day
cl_barrett15_83_gs [60-82]: 3433.74 ms ==> 7504.88M FCs/s ==> 1264.12 GHz-days/day
cl_barrett15_88_gs [60-87]: 3784.21 ms ==> 6809.83M FCs/s ==> 1147.05 GHz-days/day
Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1812.280 cl_barrett15_69_gs
64 - 76 2111.095 cl_barrett32_76_gs
76 - 77 1943.759 cl_barrett32_77_gs
77 - 87 1842.907 cl_barrett32_87_gs
87 - 88 1688.004 cl_barrett32_88_gs
88 - 92 1601.416 cl_barrett32_92_gs
exponent=95795449 ... calibrating
exponent=95795449, 24575M FCs each, k=1540511100790, 0.624058 GHz-days (assignment), 0.050239 GHz-days (per test): ..............
cl_barrett32_76_gs [64-76]: 2056.66 ms ==> 12529.93M FCs/s ==> 2110.54 GHz-days/day
cl_barrett32_77_gs [64-77]: 2233.33 ms ==> 11538.76M FCs/s ==> 1943.59 GHz-days/day
cl_barrett32_87_gs [65-87]: 2356.13 ms ==> 10937.34M FCs/s ==> 1842.28 GHz-days/day
cl_barrett15_70_gs [60-69]: 2396.14 ms ==> 10754.73M FCs/s ==> 1811.52 GHz-days/day
cl_barrett15_69_gs [60-69]: 2404.33 ms ==> 10718.09M FCs/s ==> 1805.35 GHz-days/day
cl_barrett32_79_gs [64-79]: 2418.94 ms ==> 10653.36M FCs/s ==> 1794.45 GHz-days/day
cl_barrett32_88_gs [65-88]: 2559.15 ms ==> 10069.69M FCs/s ==> 1696.14 GHz-days/day
cl_barrett15_71_gs [60-70]: 2567.15 ms ==> 10038.30M FCs/s ==> 1690.85 GHz-days/day
cl_barrett32_92_gs [65-92]: 2724.45 ms ==> 9458.74M FCs/s ==> 1593.23 GHz-days/day
cl_barrett15_73_gs [60-73]: 2854.12 ms ==> 9029.00M FCs/s ==> 1520.84 GHz-days/day
cl_barrett15_74_gs [60-74]: 2957.17 ms ==> 8714.35M FCs/s ==> 1467.84 GHz-days/day
cl_barrett15_82_gs [60-81]: 3199.15 ms ==> 8055.21M FCs/s ==> 1356.82 GHz-days/day
cl_barrett15_83_gs [60-82]: 3424.26 ms ==> 7525.66M FCs/s ==> 1267.62 GHz-days/day
cl_barrett15_88_gs [60-87]: 3768.41 ms ==> 6838.37M FCs/s ==> 1151.85 GHz-days/day
Resulting speed for M95795449:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1811.524 cl_barrett15_70_gs
64 - 76 2110.538 cl_barrett32_76_gs
76 - 77 1943.585 cl_barrett32_77_gs
77 - 87 1842.284 cl_barrett32_87_gs
87 - 88 1696.136 cl_barrett32_88_gs
88 - 92 1593.227 cl_barrett32_92_gs
| |
ID: 6248 · Rating: 0 | |
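The "Resulting speed" tables in the logs above look like they come from picking, for each one-bit step, the fastest kernel whose bracketed range covers that step. That is only my reading of the output, not something checked against the mfakto source. The sketch below assumes that rule and applies it to the first M78000071 kernel list (GPUType GCN). The helper names are made up.

```python
import re

# Kernel timings copied from the first M78000071 run (GPUType GCN)
LOG = """\
cl_barrett32_76_gs [64-76]: 2060.72 ms ==> 12505.26M FCs/s ==> 2106.38 GHz-days/day
cl_barrett32_77_gs [64-77]: 2252.73 ms ==> 11439.38M FCs/s ==> 1926.85 GHz-days/day
cl_barrett32_87_gs [65-87]: 2354.68 ms ==> 10944.09M FCs/s ==> 1843.42 GHz-days/day
cl_barrett15_69_gs [60-69]: 2404.14 ms ==> 10718.94M FCs/s ==> 1805.50 GHz-days/day
cl_barrett15_70_gs [60-69]: 2406.14 ms ==> 10710.03M FCs/s ==> 1804.00 GHz-days/day
cl_barrett32_79_gs [64-79]: 2429.28 ms ==> 10608.00M FCs/s ==> 1786.81 GHz-days/day
cl_barrett32_88_gs [65-88]: 2561.94 ms ==> 10058.69M FCs/s ==> 1694.28 GHz-days/day
cl_barrett15_71_gs [60-70]: 2568.35 ms ==> 10033.62M FCs/s ==> 1690.06 GHz-days/day
cl_barrett32_92_gs [65-92]: 2732.36 ms ==> 9431.35M FCs/s ==> 1588.61 GHz-days/day
cl_barrett15_73_gs [60-73]: 2860.71 ms ==> 9008.19M FCs/s ==> 1517.34 GHz-days/day
cl_barrett15_74_gs [60-74]: 2969.47 ms ==> 8678.24M FCs/s ==> 1461.76 GHz-days/day
cl_barrett15_82_gs [60-81]: 3229.97 ms ==> 7978.35M FCs/s ==> 1343.87 GHz-days/day
cl_barrett15_83_gs [60-82]: 3492.20 ms ==> 7379.26M FCs/s ==> 1242.96 GHz-days/day
cl_barrett15_88_gs [60-87]: 3765.65 ms ==> 6843.39M FCs/s ==> 1152.70 GHz-days/day
"""

LINE = re.compile(r"(\w+)\s+\[(\d+)-(\d+)\]:.*==>\s*([\d.]+)\s+GHz-days/day")


def parse(log):
    """Return (name, bit_min, bit_max, ghz_days_per_day) for each kernel line."""
    kernels = []
    for line in log.splitlines():
        m = LINE.search(line)
        if m:
            kernels.append((m.group(1), int(m.group(2)),
                            int(m.group(3)), float(m.group(4))))
    return kernels


def best_per_range(kernels, lo=60, hi=92):
    """Assumed rule: fastest kernel covering each step bit -> bit+1,
    with adjacent steps that share a winner merged into one row."""
    rows = []
    for bit in range(lo, hi):
        cands = [k for k in kernels if k[1] <= bit and bit + 1 <= k[2]]
        if not cands:
            continue
        name = max(cands, key=lambda k: k[3])[0]
        if rows and rows[-1][2] == name and rows[-1][1] == bit:
            rows[-1] = (rows[-1][0], bit + 1, name)
        else:
            rows.append((bit, bit + 1, name))
    return rows


for bit_min, bit_max, name in best_per_range(parse(LOG)):
    print(f"{bit_min} - {bit_max}  {name}")
```

With this input the rows match the logged table for M78000071. The real program may of course apply extra constraints that this simple rule does not capture.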
Thanks! Decent speedups from using the MUL32 kernels. | |
ID: 6250 · Rating: 0 | |
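As a side note on the GHz-days/day column in the logs above: the figures appear to be the per-test credit (0.050239 GHz-days) scaled from the measured milliseconds up to a full day. This sketch assumes that relationship. The function name is hypothetical and the formula is inferred from the printed numbers, not from the mfakto source.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def ghz_days_per_day(credit_per_test: float, elapsed_ms: float) -> float:
    """Scale the credit earned in one timed test up to a full day."""
    return credit_per_test * MS_PER_DAY / elapsed_ms


# Values taken from the M78000071 GCN run
for ms in (2060.72, 3765.65):
    print(f"{ms:8.2f} ms -> {ghz_days_per_day(0.050239, ms):8.2f} GHz-days/day")
```

The results agree with the logged 2106.38 and 1152.70 to within about 0.01. The leftover difference is presumably because the per-test credit is printed rounded to six decimals.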