I'll try rolling the driver back to 440 and see what happens.
I doubt hardware or temperatures are the issue, given that the card works fine on other prime-number projects, but it won't hurt to try just the same.
|
|
Went to 455 drivers instead. Still no go.
|
|
Back to the 435 drivers, still no go.
PrimeGrid tasks are still running with no drama (not at the same time, of course).
Running ldd on mfaktc.exe confirms that all required libraries are installed correctly.
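For reference, ldd marks every shared library the dynamic loader cannot resolve with "=> not found". A minimal Python sketch (the helper names are hypothetical; it assumes a Linux host with ldd on the PATH) that scans ldd output for unresolved libraries:

```python
import subprocess


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A failed lookup looks like: "\tlibcudart.so.10.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str = "./mfaktc.exe") -> list[str]:
    """Run ldd on a binary (Linux only) and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libs(result.stdout)


if __name__ == "__main__":
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc4a3f2000)\n"
        "\tlibcudart.so.10.1 => not found\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)\n"
    )
    print(missing_libs(sample))
```

Note that a clean ldd result only shows that linking succeeds; it cannot catch runtime failures such as a kernel launch timing out.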
|
|
Same issue on my machine. mfaktc was happily running (Manjaro Linux), but since I updated the NVIDIA drivers it keeps throwing "ERROR: cudaGetLastError() returned 702: the launch timed out and was terminated".
I noticed the NVIDIA drivers were updated from 450.66 to 450.80, so I rolled back, but no luck...
The only way I can make it work is by disabling the "Interactive" option in the X.org config as described in https://nvidia.custhelp.com/app/answers/detail/a_id/3029/~/using-cuda-and-x but that makes the computer pretty much unusable...
Also worth noting: the older version of mfaktc I was using outside of SRBase (mfaktc-0.21.linux64.cuda65) still works fine, without changing the X.org config.
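The "Interactive" option belongs in the Device section of xorg.conf. With it off, the NVIDIA driver's watchdog no longer terminates long-running GPU programs, which is the condition error 702 ("the launch timed out and was terminated") reports. A minimal Python sketch that adds the option to the first Device section, assuming a conventional xorg.conf layout; the function name is hypothetical, and you would still need to back up the file, write the result yourself and restart X. As noted above, the desktop may become barely usable with the watchdog off.

```python
import re

INTERACTIVE_OFF = '    Option "Interactive" "off"'


def disable_interactive(xorg_conf: str) -> str:
    """Return xorg.conf text with Option "Interactive" "off" in the first Device section.

    Assumes the usual layout: a line 'Section "Device"' ... a line 'EndSection'.
    Any existing Interactive option in that section is replaced.
    """
    out = []
    in_device = False
    done = False
    for line in xorg_conf.splitlines():
        stripped = line.strip()
        if not done and re.fullmatch(r'Section\s+"Device"', stripped):
            in_device = True
        elif in_device and re.match(r'Option\s+"Interactive"', stripped):
            continue  # drop the old setting; the new one is added before EndSection
        elif in_device and stripped == "EndSection":
            out.append(INTERACTIVE_OFF)
            in_device = False
            done = True
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    conf = 'Section "Device"\n    Identifier "Device0"\n    Driver "nvidia"\nEndSection\n'
    print(disable_interactive(conf))
```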
|
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13 Posts: 7419 Credit: 42,730,867 RAC: 0
Thanks for the info. Never touch a running system; it's driver armageddon, as in Win10. Even after the rollback, something of the new settings stays stored somewhere.
|
|
Also worth noting: it looks like from Linux kernel 5.9 onward the nvidia-uvm module won't load due to licensing issues (I think it wants to use other bits of GPL code), and this causes CUDA applications to fail... So stay with 5.8 until NVIDIA fixes their drivers.
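One way to check whether a host is in this situation is to compare the running kernel version with whether nvidia_uvm (the name /proc/modules and lsmod use for nvidia-uvm) is loaded. A minimal Python sketch, assuming Linux; the helper names are hypothetical, and the 5.9 threshold comes from the post above, not from NVIDIA documentation:

```python
import platform
import re
from pathlib import Path


def kernel_version(release: str) -> tuple[int, int]:
    """Parse (major, minor) from a release string such as '5.9.1-arch1-1'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def uvm_loaded(proc_modules: str) -> bool:
    """True if nvidia_uvm appears in /proc/modules text (first field is the module name)."""
    return any(
        line.split()[0] == "nvidia_uvm"
        for line in proc_modules.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        version = kernel_version(platform.release())
        loaded = uvm_loaded(modules.read_text())
        print(f"kernel {version[0]}.{version[1]}, nvidia_uvm loaded: {loaded}")
        # Threshold taken from the forum post: 5.9 reportedly stops nvidia-uvm loading.
        if version >= (5, 9) and not loaded:
            print("nvidia_uvm is not loaded; CUDA applications are likely to fail.")
```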
|
|
|
Not sure if this is related, but I converted a Windows 7 host to Linux Mint and it successfully completed more than 40 GPU tasks on an NVIDIA GTX 960 but then started to fail with computation errors.
The error message is:
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory
Host ID is 211122
Anybody seen this before and know if it's a simple problem to fix?
Rebooting didn't solve it.
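That message comes from the dynamic loader: it could not find libcudart.so.10.1 in any directory it searches. A minimal Python sketch that asks the loader the same question through ctypes (which calls dlopen() on Linux), so a fix can be checked without waiting for a new task; the helper name is hypothetical:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)  # calls dlopen() on Linux
    except OSError:
        return False
    return True


if __name__ == "__main__":
    lib = "libcudart.so.10.1"
    print(lib, "loads OK" if can_load(lib) else "is not found by the loader")
```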
|
|
rebirther
Yes, see the FAQ.
|
|
The FAQ isn't easy to understand :(
The GPU info is:
CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 455.38, CUDA version 11.1, compute capability 5.2, 1993MB, 1952MB available, 2618 GFLOPS peak)
The FAQ says I will be sent Cuda111 and Cuda100 tasks.
How do I stop the Cuda100 tasks from causing computation errors?
The FAQ says to copy a missing file to /usr/lib64, but that file is deleted when you install the CUDA 11 NVIDIA drivers.
|
rebirther
I will try to get the file and upload it locally. Stay tuned.
|
rebirther
http://srbase.my-firewall.org/sr5/download/libcudart.so.10.1
|
|
I tried this last night with a libcudart.so.10.1.243 copied from another host and renamed to libcudart.so.10.1, but it didn't work in /usr/lib64.
So I created the missing directories and copied it to /usr/local/cuda/lib64 (the location on the other host), and that didn't work either.
Now I'm trying again with your file copied to both locations.
Still getting computation errors.
Did you know that installing the latest NVIDIA driver removes the entire cuda directory from /usr/local?
It's not just this file that is missing. Everything is gone.
|
rebirther
You only need the latest CUDA driver plus the copied file. If it's not working, then use only the CUDA 10 driver + toolkit. Do you not have the /usr/lib64 folder?
I cannot change the CUDA restrictions, because then all Turing cards can't get work; that's why you are getting both cuda10 and cuda11 apps.
|
|
The other CUDA files are in /usr/lib (not lib64), in subdirectories by host architecture.
OK, this now works!
The file needs to go in /usr/lib.
Thank you for sending the file link.
|
DeleteNull (Volunteer developer, Volunteer tester)
Joined: 29 Nov 14 Posts: 83 Credit: 370,414,522 RAC: 116
Depending on the flavour of Linux you are using, you have to choose the right directory.
For Ubuntu (and derivatives) it is /usr/lib, and for openSUSE it is /usr/lib64.
But there are some other Linux distros to confuse the user ;)
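The directory rule above can be sketched in Python: read the distribution ID from /etc/os-release, pick the directory, copy the file and refresh the loader cache with ldconfig. This is a minimal sketch, not an official installer; the mapping covers only the distributions named in this thread (the openSUSE IDs assume the standard os-release values), the helper names are hypothetical, and install() must run as root.

```python
import shutil
import subprocess
from pathlib import Path

# Directory per distribution ID, as reported in this thread. Other distros are
# deliberately not guessed: check where their other CUDA libraries live.
LIB_DIR_BY_DISTRO = {
    "ubuntu": "/usr/lib",
    "linuxmint": "/usr/lib",
    "opensuse-leap": "/usr/lib64",
    "opensuse-tumbleweed": "/usr/lib64",
}


def read_os_id(os_release: str) -> str:
    """Return the ID= value from the text of /etc/os-release."""
    for line in os_release.splitlines():
        if line.startswith("ID="):
            return line[len("ID="):].strip().strip('"')
    return ""


def pick_lib_dir(os_id: str) -> str:
    """Map a distribution ID to its library directory, or raise if unknown."""
    if os_id not in LIB_DIR_BY_DISTRO:
        raise ValueError(f"no known library directory for {os_id!r}")
    return LIB_DIR_BY_DISTRO[os_id]


def install(lib_file: str = "libcudart.so.10.1") -> Path:
    """Copy the library into place and refresh the loader cache (run as root)."""
    os_id = read_os_id(Path("/etc/os-release").read_text())
    dest = Path(pick_lib_dir(os_id)) / Path(lib_file).name
    shutil.copy2(lib_file, dest)
    subprocess.run(["ldconfig"], check=True)
    return dest
```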
|
|
A little confusing, but also rewarding to use with old Macs and PCs :)
My next job is to get BOINC to recognise a Thunderbolt eGPU. Ubuntu has no problem utilising the eGPU, but BOINC says "no GPU found" whether it is NVIDIA or ATI :(
I have a question about your 3080.
Have you tried running multiple tasks at the same time?
I would have thought that two, or perhaps even four, at a time would be more efficient, because the time spent ending and starting each task is significant when tasks finish so quickly?
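In BOINC, running several tasks per GPU is usually done with an app_config.xml file that gives each task a fraction of a GPU: N tasks per GPU means gpu_usage = 1/N. A minimal Python sketch that generates such a file; the helper name is hypothetical, and APP_NAME is a placeholder that must be replaced with the app's short name as it appears in client_state.xml.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Build BOINC app_config.xml text that runs `tasks_per_gpu` tasks on each GPU.

    BOINC budgets GPUs in fractions, so N tasks per GPU means gpu_usage = 1/N.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # APP_NAME is a placeholder: use the app's <name> from client_state.xml.
    print(make_app_config("APP_NAME", 2))
```

The file typically goes in the project's folder inside the BOINC data directory and is picked up via Options > Read config files in BOINC Manager. Whether two or four tasks are actually faster depends on the card, so compare throughput before and after.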
|
|
DeleteNull
No, I haven't. Performance is outstanding, so there is no need to do so (for me).
|
|
I didn't manage to get a 3080 Ti running. I had to use driver 465 and re-added libcudart.so.10.1 to /usr/lib/, but got:
process exited with code 195 (0xc3, -61)
rebirther: do we need additional changes to the libcudart file, or to the application so it uses CUDA 11.3, in order to work with the 465 driver?
nvidia-smi: Driver Version: 465.27 CUDA Version: 11.3
host: https://srbase.my-firewall.org/sr5/show_host_detail.php?hostid=211970
BOINC does not fully detect the card, but it runs great on other projects.
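The three numbers in that log line appear to be the same exit status written three ways: decimal, hexadecimal, and the low byte read as a signed 8-bit integer (195 - 256 = -61). A small Python sketch of that arithmetic; the function name is hypothetical, and decoding the number says nothing about why the task failed.

```python
def describe_exit_code(code: int) -> str:
    """Format an exit status as decimal, hex, and the low byte read as signed 8-bit."""
    low = code & 0xFF
    signed = low - 256 if low > 127 else low
    return f"{code} ({hex(code)}, {signed})"


print(describe_exit_code(195))  # 195 (0xc3, -61)
```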
|
rebirther
Which host is affected, and which CUDA toolkit is installed?
|
|
https://srbase.my-firewall.org/sr5/show_host_detail.php?hostid=211970
I installed just the driver, then ocl-icd-libopencl1 and nvidia-opencl-dev.
|
|