Message boards : Number crunching : Nvidia Tesla P100 Problems
Author | Message |
---|---|
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support? | |
ID: 9746 · Rating: 0 | |
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support? The cuda12 app supports CC 6.1, see the 1070 Ti. I don't know why the Tesla doesn't run cuda120 WUs; anonymous platform is not allowed. | |
ID: 9747 · Rating: 0 | |
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support? I know your app supports 6.1, but CUDA 12 from Nvidia still supports down to CC 5.0. I'm saying you should recompile it to add support for 6.0 as well. The Tesla P100 has CC level 6.0; 6.1 is used on the mainstream Pascal GPUs (GTX 10-series, Quadro P-series). | |
ID: 9748 · Rating: 0 | |
cuda 12+ still supports CC 6.0 devices. why not just recompile the cuda120 app with CC 6.0 support? Yes, we will try to recompile, perhaps this will solve a lot of problems. | |
ID: 9749 · Rating: 0 | |
there also appears to be a problem with the cuda100 package. | |
ID: 9750 · Rating: 0 | |
there also appears to be a problem with the cuda100 package. You can find the libcudart.so.10 file in the FAQs. If the app was dynamically compiled, you don't need them in the package. The .exe in the Linux package was also compiled that way, but it runs under Linux. The .ini file is fixed. | |
ID: 9751 · Rating: 0 | |
there also appears to be a problem with the cuda100 package. When I run ldd mfaktc.exe on the binary, it says libcudart.so.10.1 is missing. In your cuda111 and cuda120 packages you include this file. I see nothing in the FAQ that references this file; can you link to it? Is your intended workaround to create a symlink from libcudart.so.10 -> libcudart.so.10.1? | |
ID: 9752 · Rating: 0 | |
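A quick way to confirm which shared libraries a binary cannot resolve is to scan the `ldd` output for `not found` entries. The Python sketch below assumes the usual `name => not found` line format printed by glibc's `ldd`; the function name `missing_libraries` and the sample text are illustrative and are not taken from the project's packages.

```python
def missing_libraries(ldd_output: str) -> list:
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Illustrative sample only; the real output depends on the host.
sample = """\
\tlinux-vdso.so.1 (0x00007ffd1f50d000)
\tlibcudart.so.10.1 => not found
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff29f552000)
"""
print(missing_libraries(sample))
```

An empty list means every dependency resolved, so a failure would have to come from somewhere other than the dynamic loader.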
there also appears to be a problem with the cuda100 package. The file must be somewhere in another thread. Linux is not my thing, so I will ask if we can do something. The package needs a recompile too; it is old stuff. | |
ID: 9753 · Rating: 0 | |
new cuda120 TF mfaktc app is up | |
ID: 9754 · Rating: 0 | |
All errors with new version. | |
ID: 9757 · Rating: 0 | |
All errors with new version. ok, this was not as planned, reverted back to 0.28, need to check... | |
ID: 9758 · Rating: 0 | |
All errors with new version. v30 fixed the issue, which was caused by a wrong, older .so.12 file | |
ID: 9759 · Rating: 0 | |
That fixed one of my PCs. | |
ID: 9760 · Rating: 0 | |
That fixed one of my PCs. Can you update your driver on this host? | |
ID: 9761 · Rating: 0 | |
That fixed one of my PCs. v31 is up, should reduce the driver requirement to 525 | |
ID: 9762 · Rating: 0 | |
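To check a host against the driver requirement of 525 mentioned for v31, the banner line of `nvidia-smi` can be parsed for the driver's major version. This is a minimal Python sketch; the helper names and the sample banner are illustrative, and it assumes the banner contains a `Driver Version: X.Y.Z` field.

```python
import re

MIN_DRIVER_MAJOR = 525  # requirement stated in the thread for v31


def driver_major(nvidia_smi_text: str):
    """Extract the major driver version from nvidia-smi banner text, or None."""
    match = re.search(r"Driver Version:\s*(\d+)", nvidia_smi_text)
    return int(match.group(1)) if match else None


def driver_is_new_enough(nvidia_smi_text: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True when a driver version was found and meets the minimum major version."""
    major = driver_major(nvidia_smi_text)
    return major is not None and major >= minimum


# Illustrative banner in the usual nvidia-smi layout.
banner = "| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |"
print(driver_is_new_enough(banner))
```

Only the major version is compared here, which matches how the requirement is phrased in the thread.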
That fixed one of my PCs. Awesome, thank you. This version works. | |
ID: 9764 · Rating: 0 | |
I'm still getting errors on a dual P100 PC trying to run TF 0.31 (cuda120) tasks:
<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
2024-03-18 09:49:38 (2334): wrapper (7.24.26018): starting
2024-03-18 09:49:38 (2334): wrapper (7.24.26018): starting
2024-03-18 09:49:38 (2334): wrapper: running ./mfaktc (-d 1)
2024-03-18 09:49:38 (2334): wrapper: created child process 2336
2024-03-18 09:49:39 (2334): ./mfaktc exited; CPU time 0.254903
2024-03-18 09:49:39 (2334): app exit status: 0x1
2024-03-18 09:49:39 (2334): called boinc_finish(195)
</stderr_txt>
]]>
$ nvidia-smi
Mon Mar 18 11:02:56 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |    197MiB / 16384MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:09:00.0 Off |                    0 |
| N/A   30C    P0    27W / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1165      G   /usr/lib/xorg/Xorg                112MiB |
|    0   N/A  N/A      1517      G   cinnamon                           28MiB |
|    0   N/A  N/A      2181      G   ...2gtk-4.0/WebKitWebProcess       55MiB |
|    1   N/A  N/A      1165      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
| |
ID: 9766 · Rating: 0 | |
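In a wrapper log like the one above, the two lines worth isolating are the child's own exit status and the code the wrapper passed to `boinc_finish`. Here the log reports that `./mfaktc` itself exited with status 0x1 before the wrapper finished with 195. The Python sketch below extracts both values; the function name is illustrative, and the regular expressions assume the line formats shown in this post.

```python
import re


def summarize_wrapper_log(stderr_text: str) -> dict:
    """Pull the child exit status and the wrapper's boinc_finish code from stderr."""
    status = re.search(r"app exit status:\s*(0x[0-9a-fA-F]+)", stderr_text)
    finish = re.search(r"called boinc_finish\((\d+)\)", stderr_text)
    return {
        "app_exit_status": int(status.group(1), 16) if status else None,
        "boinc_finish_code": int(finish.group(1)) if finish else None,
    }


# Lines copied from the stderr output quoted in this post.
log = """\
2024-03-18 09:49:39 (2334): ./mfaktc exited; CPU time 0.254903
2024-03-18 09:49:39 (2334): app exit status: 0x1
2024-03-18 09:49:39 (2334): called boinc_finish(195)
"""
print(summarize_wrapper_log(log))
```

Since the wrapper captures little of the child's own output here, running `./mfaktc -d 1` by hand in the slot directory would likely show the actual error message.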
Interesting, try to run ldd ./mfaktc | |
ID: 9767 · Rating: 0 | |
Interesting, try to run ldd ./mfaktc
$ ldd ./mfaktc
linux-vdso.so.1 (0x00007ffd1f50d000)
libcudart.so.12 => ./libcudart.so.12 (0x00007ff29f6b8000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff29f552000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff29f370000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff29f355000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff29f163000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff29f960000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff29f15d000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff29f138000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff29f12e000)
| |
ID: 9768 · Rating: 0 | |
Was v31 compiled with explicit support for CC 6.0 in the NVCCFLAGS section of your makefile or build script? For example: --generate-code arch=compute_60,code=sm_60. This is what I was suggesting when I mentioned recompiling it to add 6.0 support. | |
ID: 9769 · Rating: 0 | |
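The reason the explicit 6.0 target matters can be sketched in a few lines of Python. Per Nvidia's compatibility rules, compiled SASS for `sm_XY` runs only on devices with the same major version and an equal or higher minor version, while embedded PTX for `compute_XY` can be JIT-compiled by the driver for any equal or newer device. The target lists below are hypothetical, since the thread does not say what the project's binary actually embeds (something like `cuobjdump --list-elf` on the binary should show it), and the helper names are illustrative.

```python
import re


def parse_arch(tag: str) -> tuple:
    """Turn 'sm_61' or 'compute_60' into (major, minor)."""
    digits = tag.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def can_run(device_cc: tuple, embedded_targets: str) -> bool:
    """Check whether any embedded SASS or PTX target covers the device."""
    for tag in re.findall(r"\b(?:sm|compute)_\d{2,3}\b", embedded_targets):
        major, minor = parse_arch(tag)
        if tag.startswith("sm_"):
            # SASS: same major version, built for an equal or lower minor.
            if major == device_cc[0] and minor <= device_cc[1]:
                return True
        elif (major, minor) <= device_cc:
            # PTX: the driver can JIT-compile for any equal or newer device.
            return True
    return False


P100 = (6, 0)       # Tesla P100
GTX_1070_TI = (6, 1)

# Hypothetical target lists, for illustration only.
print(can_run(P100, "sm_61 sm_75 sm_86"))        # no target covers CC 6.0
print(can_run(P100, "sm_60 sm_61 sm_75"))        # sm_60 covers the P100
print(can_run(GTX_1070_TI, "sm_61 sm_75 sm_86")) # sm_61 covers the 1070 Ti
```

Under these assumptions, a build that starts at `sm_61` would work on a 1070 Ti yet fail on a P100, which is consistent with the behaviour reported in this thread.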