Message boards : Number crunching : "Must update drivers"
Author | Message |
---|---|
I am no longer receiving any GPU tasks for SRBase. I am getting a message in the debug log | |
ID: 9869 | |
"I am no longer receiving any GPU tasks for SRBase. I am getting a message in the debug log" What CUDA version do you have installed? Compute capability (cc) 6.0+ is supported, so it looks like your card is too old. I see you have some badges. When did you get your last WUs? | |
ID: 9870 | |
"... looks like your card is too old." What is "too old"? Which GPUs are supported? | |
ID: 9871 | |
"... looks like your card is too old." I found some specs via Google that list it as cc5, if this is right. | |
ID: 9873 | |
These are the relevant lines from the BOINC event log. I think you're right, you've "upgraded" the SRBASE software past the capabilities of my video card. | |
ID: 9875 | |
"These are the relevant lines from the BOINC event log. I think you're right, you've 'upgraded' the SRBASE software past the capabilities of my video card." It's a Kepler card with cc3.0. I have changed the plan_class for cuda10 to cc3.0. If it's not working for some cards, I will change it back. | |
ID: 9877 | |
Thank you for that. I had to reset the project to get new work. So far, 3 tasks have completed and validated. | |
ID: 9878 | |
@rebirther, do you know if this was changed back? I'm adding some old K80 cards and get the same error:
2025-03-16T19:04:37.768554+00:00 vm01 boinc[11654]: 16-Mar-2025 19:04:37 [SRBase] NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU
Here are the lines from my log with the CUDA version, etc.:
2025-03-16T18:56:28.815046+00:00 vm01 boinc[11654]: 16-Mar-2025 18:56:28 [---] CUDA: NVIDIA GPU 0: Tesla K80 (driver version 470.99, CUDA version 11.4, compute capability 3.7, 11441MB, 11441MB available, 4111 GFLOPS peak)
2025-03-16T18:56:28.815357+00:00 vm01 boinc[11654]: 16-Mar-2025 18:56:28 [---] CUDA: NVIDIA GPU 1: Tesla K80 (driver version 470.99, CUDA version 11.4, compute capability 3.7, 11441MB, 11441MB available, 4111 GFLOPS peak) | |
ID: 10527 | |
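The event-log lines in the post above report the compute capability the BOINC client detected, and the replies in this thread compare that value against a minimum set in the project's plan_class. A minimal Python sketch of that comparison follows, assuming the log line format shown above. The function names `parse_gpu_line` and `meets_min_cc` are hypothetical, and this does not reproduce the BOINC scheduler's actual logic.

```python
import re

# Matches the "CUDA: NVIDIA GPU ..." line format shown in the log excerpt.
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>\d+\.\d+)"
)


def parse_gpu_line(line):
    """Return the GPU details from one event-log line, or None if absent."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    major, minor = match.group("cc").split(".")
    return {
        "index": int(match.group("index")),
        "name": match.group("name"),
        "driver": match.group("driver"),
        "cuda": match.group("cuda"),
        "cc": (int(major), int(minor)),
    }


def meets_min_cc(gpu, min_cc):
    """Hypothetical check: is the card at or above a minimum (major, minor)?"""
    return gpu["cc"] >= min_cc


line = (
    "16-Mar-2025 18:56:28 [---] CUDA: NVIDIA GPU 0: Tesla K80 "
    "(driver version 470.99, CUDA version 11.4, compute capability 3.7, "
    "11441MB, 11441MB available, 4111 GFLOPS peak)"
)
gpu = parse_gpu_line(line)
print(gpu["name"], gpu["cc"])
print("passes a cc3.0 minimum:", meets_min_cc(gpu, (3, 0)))
print("passes a cc6.0 minimum:", meets_min_cc(gpu, (6, 0)))
```

Comparing the capability as a `(major, minor)` tuple avoids the pitfalls of comparing version strings or floats.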
Have you updated to CUDA 11? Can you use CUDA 10? | |
ID: 10528 | |
So far I haven't had any luck downgrading to cuda 10 using the 450.66 driver (based on https://docs.nvidia.com/deploy/cuda-compatibility/index.html) | |
ID: 10529 | |
According to what I found on Google, the K80 unfortunately does not run with CUDA 11. | |
ID: 10530 | |
Hmm, where do you see that? K80s are compute capability 3.7, so I thought they were supported up to CUDA 11.8, just deprecated:
$ nvidia-smi
Mon Mar 17 04:34:33 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
But I still see the same error in the logs, although it seems like the project downloads the files:
Mar 17 04:29:17 vm01 boinc[46003]: 17-Mar-2025 04:29:17 Initialization completed
Mar 17 04:29:17 vm01 boinc[46003]: 17-Mar-2025 04:29:17 [SRBase] Sending scheduler request: To fetch work.
Mar 17 04:29:17 vm01 boinc[46003]: 17-Mar-2025 04:29:17 [SRBase] Requesting new tasks for NVIDIA GPU
Mar 17 04:29:19 vm01 boinc[46003]: 17-Mar-2025 04:29:19 [SRBase] Scheduler request completed: got 47 new tasks
Mar 17 04:29:19 vm01 boinc[46003]: 17-Mar-2025 04:29:19 [SRBase] NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU
Mar 17 04:29:19 vm01 boinc[46003]: 17-Mar-2025 04:29:19 [SRBase] Project requested delay of 7 seconds
Mar 17 04:29:21 vm01 boinc[46003]: 17-Mar-2025 04:29:21 [SRBase] Started download of wrapper_26018_linux_x86-64
Mar 17 04:29:21 vm01 boinc[46003]: 17-Mar-2025 04:29:21 [SRBase] Started download of mfaktc-linux64-cuda100v1.zip
Mar 17 04:29:24 vm01 boinc[46003]: 17-Mar-2025 04:29:24 [SRBase] Finished download of mfaktc-linux64-cuda100v1.zip
Mar 17 04:29:24 vm01 boinc[46003]: 17-Mar-2025 04:29:24 [SRBase] Started download of job_TF_l64c1_00043.xml
Mar 17 04:29:25 vm01 boinc[46003]: 17-Mar-2025 04:29:25 [SRBase] Finished download of job_TF_l64c1_00043.xml
....
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Started download of worktodo13a382_0077223.txt
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Computation for task TF_75-76_272-277M_wu_78821_0 finished
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Output file TF_75-76_272-277M_wu_78821_0_0 for task TF_75-76_272-277M_wu_78821_0 absent
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Starting task TF_75-76_272-277M_wu_77377_1
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Starting task TF_75-76_272-277M_wu_78854_0
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Scheduler request completed: got 47 new tasks
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU
Mar 17 04:29:32 vm01 boinc[46003]: 17-Mar-2025 04:29:32 [SRBase] Project requested delay of 7 seconds
Mar 17 04:29:33 vm01 boinc[46324]: mv: cannot stat 'slots/2/results.json.txt': No such file or directory
Mar 17 04:29:33 vm01 boinc[46003]: 17-Mar-2025 04:29:33 [SRBase] Computation for task TF_75-76_272-277M_wu_78853_0 finished
Mar 17 04:29:33 vm01 boinc[46003]: 17-Mar-2025 04:29:33 [SRBase] Output file TF_75-76_272-277M_wu_78853_0_0 for task TF_75-76_272-277M_wu_78853_0 absent
On a side note, I'm running Einstein@Home on other K80s with the same 470.256.02 driver without a problem; looking at their projects directory, they run with libcudart.so.10.2. If it's too much effort or doesn't make sense to get K80s working with SRBase, I can just allocate this machine to something else. | |
ID: 10531 | |
Also, it looks like the jobs are failing because they are looking for libcudart.so.10.1 but libcudart.so.10.0 is deployed:
# cat /var/lib/boinc/slots/1/stderr.txt
2025-03-17 04:29:49 (46400): wrapper (7.24.26018): starting
2025-03-17 04:29:49 (46400): wrapper (7.24.26018): starting
2025-03-17 04:29:49 (46400): wrapper: running ./mfaktc.exe (-d 1)
2025-03-17 04:29:49 (46400): wrapper: created child process 46404
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory
2025-03-17 04:29:50 (46400): ./mfaktc.exe exited; CPU time 0.001167
2025-03-17 04:29:50 (46400): app exit status: 0x7f
2025-03-17 04:29:50 (46400): called boinc_finish(195)
# ll /var/lib/boinc/slots/1/
total 3276
drwxrwx--x 2 boinc boinc 4096 Mar 17 04:29 ./
drwxr-xr-x 6 boinc boinc 4096 Mar 17 04:29 ../
-rw-r--r-- 1 boinc boinc 35147 Oct 23 19:09 COPYING
-rw-r--r-- 1 boinc boinc 19480 Oct 23 19:09 Changelog.txt
-rw-r--r-- 1 boinc boinc 14134 Oct 23 19:09 README.txt
-rw-r--r-- 1 boinc boinc 4 Mar 17 04:29 boinc_finish_called
-rw-r--r-- 1 boinc boinc 8192 Mar 17 04:29 boinc_mmap_file
-rw-r--r-- 1 boinc boinc 9046 Mar 17 04:29 init_data.xml
-rw-r--r-- 1 boinc boinc 88 Mar 17 04:29 job.xml
-rw-r--r-- 1 boinc boinc 495736 Dec 29 01:16 libcudart.so.10.0
-rwxrwxrwx 1 boinc boinc 2715824 Sep 4 2019 mfaktc.exe*
-rw-r--r-- 1 boinc boinc 9103 Jan 9 10:01 mfaktc.ini
-rw-r--r-- 1 boinc boinc 554 Mar 17 04:29 stderr.txt
-rw-r--r-- 1 boinc boinc 57 Mar 17 04:29 worktodo.txt
-rw-r--r-- 1 boinc boinc 92 Mar 17 04:29 wrapper_26018_linux_x86-64
| |
ID: 10532 | |
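The diagnosis in the post above (the loader asks for one libcudart version while the slot directory holds another) can be checked mechanically. A minimal Python sketch follows, assuming the glibc loader error wording shown in stderr.txt. The helper names `missing_libraries` and `near_matches` are hypothetical, and the file list is copied from the directory listing above.

```python
import re

# Wording of the dynamic loader error as it appears in the stderr.txt excerpt.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+): "
    r"cannot open shared object file"
)


def missing_libraries(stderr_text):
    """Return the shared library names the loader could not open."""
    return [m.group("lib") for m in MISSING_LIB.finditer(stderr_text)]


def near_matches(lib, present):
    """Return files that are the same library under a different version suffix."""
    stem = lib.split(".so")[0]
    return [
        name for name in present
        if name != lib and name.split(".so")[0] == stem
    ]


stderr_text = (
    "./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: "
    "cannot open shared object file: No such file or directory\n"
)
slot_files = ["libcudart.so.10.0", "mfaktc.exe", "mfaktc.ini", "worktodo.txt"]

for lib in missing_libraries(stderr_text):
    print("missing:", lib, "| present instead:", near_matches(lib, slot_files))
```

A near match with a different version suffix, as here, suggests the wrong runtime was shipped with the app rather than a missing installation.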
https://srbase.my-firewall.org/sr5/download/libcudart.so.10.1 | |
ID: 10533 | |
I redeployed Ubuntu 24 and reinstalled the 470.256.02 driver, but I'm still seeing:
vm01:~$ nvidia-smi
Wed Mar 19 17:25:30 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| |
ID: 10537 | |
I see. We had two different plan_classes for cuda11, one for Windows and one for Linux. I have changed one and set it to a minimum of cc3.0, so it should be working now. Please try again. | |
ID: 10539 | |
Interestingly, it still shows the "Upgrade driver" message, but it proceeds and downloads files. However, it looks like it's getting an error in the tasks:
2025-03-20T03:41:34.322868+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:34 [SRBase] NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU
2025-03-20T03:41:34.322947+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:34 [SRBase] Project requested delay of 7 seconds
2025-03-20T03:41:36.372613+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:36 [SRBase] Started download of worktodo13a383_0026378.txt
2025-03-20T03:41:36.372742+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:36 [SRBase] Started download of worktodo13a383_0026415.txt
2025-03-20T03:41:38.382452+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Finished download of worktodo13a383_0026378.txt (57 bytes)
2025-03-20T03:41:38.382733+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Finished download of worktodo13a383_0026415.txt (57 bytes)
2025-03-20T03:41:38.382798+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Started download of worktodo13a383_0026416.txt
2025-03-20T03:41:38.382917+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Started download of worktodo13a383_0026417.txt
2025-03-20T03:41:38.386071+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Starting task TF_75-76_236-282M_wu_26415_0
2025-03-20T03:41:38.389121+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:38 [SRBase] Starting task TF_75-76_236-282M_wu_26378_0
2025-03-20T03:41:40.422707+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Finished download of worktodo13a383_0026416.txt (57 bytes)
2025-03-20T03:41:40.422892+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Finished download of worktodo13a383_0026417.txt (57 bytes)
2025-03-20T03:41:40.423007+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Started download of worktodo13a383_0026418.txt
2025-03-20T03:41:40.423070+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Started download of worktodo13a383_0026529.txt
2025-03-20T03:41:40.426282+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Starting task TF_75-76_236-282M_wu_26416_0
2025-03-20T03:41:40.429498+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:40 [SRBase] Starting task TF_75-76_236-282M_wu_26417_0
2025-03-20T03:41:42.466372+00:00 vm01 boinc[47982]: mv: cannot stat 'slots/0/results.json.txt': No such file or directory
2025-03-20T03:41:42.469145+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Finished download of worktodo13a383_0026418.txt (57 bytes)
2025-03-20T03:41:42.469362+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Finished download of worktodo13a383_0026529.txt (57 bytes)
2025-03-20T03:41:42.469461+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Started download of worktodo13a383_0026375.txt
2025-03-20T03:41:42.469552+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Started download of worktodo13a383_0026377.txt
2025-03-20T03:41:42.469629+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Computation for task TF_75-76_236-282M_wu_26415_0 finished
2025-03-20T03:41:42.469691+00:00 vm01 boinc[47533]: 20-Mar-2025 03:41:42 [SRBase] Output file TF_75-76_236-282M_wu_26415_0_0 for task TF_75-76_236-282M_wu_26415_0 absent
$ sudo cat /var/lib/boinc/slots/0/stderr.txt
2025-03-20 03:41:50 (48039): wrapper (7.24.26018): starting
2025-03-20 03:41:50 (48039): wrapper (7.24.26018): starting
2025-03-20 03:41:50 (48039): wrapper: running ./mfaktc.exe (-d 0)
2025-03-20 03:41:50 (48039): wrapper: created child process 48044
2025-03-20 03:41:52 (48039): ./mfaktc.exe exited; CPU time 0.697094
2025-03-20 03:41:52 (48039): app exit status: 0x1
2025-03-20 03:41:52 (48039): called boinc_finish(195)
free(): invalid pointer
SIGABRT: abort called
Stack trace (22 frames):
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x41eac)[0x62d17fa41eac]
/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x74590ac45330]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x11c)[0x74590ac9eb2c]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x1e)[0x74590ac4527e]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xdf)[0x74590ac288ff]
/lib/x86_64-linux-gnu/libc.so.6(+0x297b6)[0x74590ac297b6]
/lib/x86_64-linux-gnu/libc.so.6(+0xa8ff5)[0x74590aca8ff5]
/lib/x86_64-linux-gnu/libc.so.6(+0xab38c)[0x74590acab38c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_free+0x7e)[0x74590acaddae]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x67daa)[0x62d17fa67daa]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x87334)[0x62d17fa87334]
/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x74590ac45330]
/lib/x86_64-linux-gnu/libc.so.6(clock_nanosleep+0xbf)[0x74590acecadf]
/lib/x86_64-linux-gnu/libc.so.6(nanosleep+0x17)[0x74590acf9a27]
/lib/x86_64-linux-gnu/libc.so.6(sleep+0x43)[0x74590ad0ec63]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x592d4)[0x62d17fa592d4]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x3b129)[0x62d17fa3b129]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x3b1cd)[0x62d17fa3b1cd]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x1e4df)[0x62d17fa1e4df]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x74590ac2a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x74590ac2a28b]
../../projects/srbase.my-firewall.org_sr5/wrapper_26018_linux_x86-64(+0x1aa2a)[0x62d17fa1aa2a]
Exiting...
| |
ID: 10540 | |
Can you try to unzip the content of mfaktc.zip in a separate folder and run | |
ID: 10541 | |
Sure, here's the output:
$ ./mfaktc.exe -st
mfaktc v0.23.0 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 2047Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 60s
WorkFileAddDelay disabled
Stages enabled
StopAfterFactor bitlevel
PrintMode compact
Logging disabled
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no
CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 11.40
CUDA device info
name Tesla K80
compute capability 3.7
max threads per block 1024
max shared memory per MP 114688 byte
number of multiprocessors 13
CUDA cores per MP 192
CUDA cores - total 2496
clock rate (CUDA cores) 823MHz
memory clock rate: 2505MHz
memory bus width: 384 bit
Automatic parameters
threads per grid 851968
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144
########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Mar 21 16:01 | 3387 0.1% | 0.001 n.a. | n.a. 82485 n.a.%
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device
| |
ID: 10544 | |
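The "no kernel image is available" error in the self-test above occurs when the binary contains no code the device can execute. The general CUDA rule is that machine code (SASS) built for compute capability X.y runs only on devices of capability X.z with z ≥ y, while embedded PTX for X.y can be JIT-compiled for any device of equal or higher capability. A minimal Python sketch of that rule follows. The function name `can_run` and both example builds are hypothetical, since the thread does not state which architectures the mfaktc binary was actually compiled for.

```python
def can_run(device_cc, sass_archs, ptx_archs=()):
    """Apply the CUDA binary/PTX compatibility rule to one device.

    device_cc  -- (major, minor) compute capability of the GPU
    sass_archs -- capabilities the binary holds machine code for
    ptx_archs  -- capabilities the binary holds PTX for
    """
    dev_major, dev_minor = device_cc
    # SASS: same major version, and built for a minor version <= the device's.
    for major, minor in sass_archs:
        if major == dev_major and minor <= dev_minor:
            return True
    # PTX: can be JIT-compiled forward to any equal or newer capability.
    return any(arch <= device_cc for arch in ptx_archs)


k80 = (3, 7)

# Hypothetical build that targets only newer architectures.
newer_only = {"sass_archs": [(5, 0), (6, 1), (7, 5)], "ptx_archs": [(7, 5)]}
# Hypothetical build that also includes code for compute capability 3.5.
with_kepler = {"sass_archs": [(3, 5), (5, 0), (6, 1)], "ptx_archs": []}

print("newer-only build on K80:", can_run(k80, **newer_only))
print("build with cc3.5 on K80:", can_run(k80, **with_kepler))
```

Under this rule, lowering the plan_class minimum only changes which hosts receive work; the app itself still has to contain code for the card's architecture.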
Thanks for the test. It has been confirmed that the app was not compiled correctly for cc3.5; we will try to get another fix for this. | |
ID: 10545 | |