Author |
Message |
|
I have only been looking at mfaktc on linux for nVidia, don't have any working AMD cards. I only have the one dual GPU box that I put together this afternoon to try.
The output in stderr.txt says:
wrapper: running ./mfaktc.exe ( --device 1)
Are you actually passing "-d 1" to the mfaktc program?
This is a second task, the first task going to first GPU says ( --device 0).
In the paused Boinc slot for the second task (that started on first GPU) I can type "sudo ./mfaktc.exe -d 1" and it will run on the second GPU. It needs sudo to create the checkpoint file. Any attempt other than "-d 1" makes it run on the first GPU again.
If the wrapper has managed to work out the correct device number to pass to mfaktc (which it looks like it has on my dual GPU system) then I don't understand why it wouldn't run it on the second GPU ?
Theoretically, both applications can support more than one (different) GPU.
But BOINC enumerates the GPUs as 0, 1, 2, ...
In OpenCL you have platforms, e.g. Intel=0, AMD=1, NVidia=2, and each platform has 1..n GPU devices.
The mapping from 0, 1, 2 to 00, 10, 11 is different for each computer with more than one graphics device.
So currently only one mapping is possible: --device 0 to -d 00. |
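To make the numbering problem concrete, here is a minimal sketch of such a mapping. It assumes the flat BOINC index simply counts through the platforms in order, which, as said above, is not guaranteed on every host. The function names and the `devices_per_platform` list are hypothetical and only for illustration; they are not taken from wrapper.cpp or mfakto.

```python
from typing import List, Tuple


def flat_to_platform_device(flat_index: int, devices_per_platform: List[int]) -> Tuple[int, int]:
    """Map a flat GPU index (0, 1, 2, ...) to a (platform, device) pair.

    devices_per_platform[i] is the number of GPU devices on platform i.
    Assumption: the flat index walks through the platforms in order.
    """
    if flat_index < 0:
        raise IndexError("device index must not be negative")
    remaining = flat_index
    for platform, count in enumerate(devices_per_platform):
        if remaining < count:
            return platform, remaining
        remaining -= count
    raise IndexError(f"no GPU with flat index {flat_index} on this host")


def as_device_argument(flat_index: int, devices_per_platform: List[int]) -> str:
    """Format the pair as two digits, e.g. '10' (only valid for single-digit values)."""
    platform, device = flat_to_platform_device(flat_index, devices_per_platform)
    return f"{platform}{device}"


# Hypothetical host: platform 0 has one GPU, platform 1 has two GPUs.
layout = [1, 2]
for index in range(3):
    print(index, "->", as_device_argument(index, layout))
```

The `layout` list would have to be worked out for each computer, which is exactly the part that cannot be hard-coded.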
|
|
DeleteNull (Volunteer developer, Volunteer tester)
Joined: 29 Nov 14 Posts: 83 Credit: 374,914,522 RAC: 4,661 |
Do you have the complete wrapper.cpp file?
The new wrapper works as expected.
Task starts with 0%, switches to 100% after one minute.
This is because there is no Mxxxxxx.ckp file in the first 5 minutes.
After 5 minutes it switches to the percentage of the last Mxxxxxx.ckp.
The remaining time increases (because the percentage stays at the old level for 5 minutes).
The next update comes after 10 minutes, and then every 5 minutes.
Is this o.k., or shall I implement a smoother calculation of the percentage (a few more steps in the 5 minute interval)? |
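For reference, a smoother calculation could look roughly like the sketch below. It is not the code in wrapper.cpp; it only assumes that the fractions from the last two checkpoints and the checkpoint interval are known, and it extrapolates linearly between checkpoint updates. All names are hypothetical.

```python
def smoothed_fraction(last_ckp_fraction: float,
                      prev_ckp_fraction: float,
                      ckp_interval_s: float,
                      seconds_since_ckp: float) -> float:
    """Estimate the fraction done between two checkpoint updates.

    The rate is taken from the last two checkpoints and applied to the time
    elapsed since the last one. The elapsed time is capped at one interval so
    the estimate cannot run away if a checkpoint is late, and the result is
    capped at 1.0.
    """
    rate = max(last_ckp_fraction - prev_ckp_fraction, 0.0) / ckp_interval_s
    elapsed = min(max(seconds_since_ckp, 0.0), ckp_interval_s)
    return min(last_ckp_fraction + rate * elapsed, 1.0)


# Checkpoints at 10% and 20%, 5 minutes apart, 2.5 minutes after the last one.
print(smoothed_fraction(0.20, 0.10, 300.0, 150.0))
```

With only one checkpoint so far, the previous fraction can be passed as 0.0.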
|
|
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13 Posts: 7479 Credit: 43,688,001 RAC: 42,041 |
No, that's good. |
|
|
DeleteNull (Volunteer developer, Volunteer tester)
Joined: 29 Nov 14 Posts: 83 Credit: 374,914,522 RAC: 4,661 |
o.k., you can find the updated wrapper.cpp here:
https://p-numbers.net/wrapper.cpp
Works for mfakto (mfaktc), but will not work for LLR. |
|
|
|
So there is currently only one mapping --device 0 to -d 00. (possible)
I don't think you are mapping anything!
On mfaktc everything goes to the first device because the program does not recognise the command-line option telling it which GPU device to use, so it defaults to the first one. The wrapper seems to know the right device number, but it is ignored because it is not formatted correctly. |
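What I mean by the format, as a minimal sketch: the only form reported to work by hand in this thread is "-d 1", so the "--device N" pair would need rewriting before mfaktc is started. How the wrapper actually builds its command line is not shown here, so the function below is hypothetical and only illustrates the rewrite.

```python
from typing import List


def translate_device_flag(args: List[str]) -> List[str]:
    """Rewrite every '--device N' pair as '-d N' and leave other arguments alone."""
    translated: List[str] = []
    i = 0
    while i < len(args):
        if args[i] == "--device" and i + 1 < len(args):
            translated += ["-d", args[i + 1]]
            i += 2  # skip the flag and its value
        else:
            translated.append(args[i])
            i += 1
    return translated


print(translate_device_flag(["--device", "1"]))
```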
|
|
DeleteNull (Volunteer developer, Volunteer tester)
Joined: 29 Nov 14 Posts: 83 Credit: 374,914,522 RAC: 4,661 |
I've patched mfakto, not mfaktc.
So the mapping --device 0 to -d 00 is only implemented for AMD, not NVidia.
(and only linux) |
|
|
|
Started running this task yesterday evening on the second device but it was going to device 0 by default even though it said device 1.
Carried on running it overnight outside of Boinc until GPU 0 was available to complete it on device 0 and report it through Boinc.
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
18:53:44 (21480): wrapper (7.2.26012): starting
18:53:44 (21480): wrapper: running ./mfaktc.exe ( --device 1)
19:01:46 (21530): wrapper (7.2.26012): starting
19:01:46 (21530): wrapper: running ./mfaktc.exe ( --device 1)
19:04:28 (21551): wrapper (7.2.26012): starting
19:04:28 (21551): wrapper: running ./mfaktc.exe ( --device 1)
19:23:31 (21671): wrapper (7.2.26012): starting
19:23:31 (21671): wrapper: running ./mfaktc.exe ( --device 1)
20:56:42 (22038): wrapper (7.2.26012): starting
20:56:42 (22038): wrapper: running ./mfaktc.exe ( --device 0)
10:31:40 (126157): wrapper (7.2.26012): starting
10:31:40 (126157): wrapper: running ./mfaktc.exe ( --device 0)
18:56:42 (126157): ./mfaktc.exe exited; CPU time 24.265708
18:56:42 (126157): called boinc_finish
</stderr_txt>
]]>
Do you have any plans to work on getting '--device' changed to '-d' for mfaktc so it can work as it should ?
It is easy enough for me just to run a second client and put a GPU in each but most won't want, or know how, to do that. |
|
|
|
What is your cc_config and app_config setup? I was able to run 2 instances to get both GPUs to work on 1 WU each, but now with the changes (and apparently no one knows what was changed) it will only run on my 1st GPU no matter what I've tried. I'm on Windows 10 with 2x 1660 Ti. |
|
|
|
I've patched mfakto, not mfaktc.
So the mapping --device 0 to -d 00 is only implemented for AMD, not NVidia.
(and only linux)
This is ridiculous. You're screwing us on Windows...please fix/implement so it works properly there too. |
|
|
|
I can run the TF tasks on my Windows PCs with no problems, but on every single one of my Linux PCs every single task errors out. Any ideas on how to fix it?
I just checked one and it says
"Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:35:21 (2977): wrapper (7.2.26012): starting
14:35:21 (2977): wrapper: running ./mfaktc.exe ( --device 0)
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory
14:35:22 (2977): ./mfaktc.exe exited; CPU time 0.000571
14:35:22 (2977): app exit status: 0x7f00
14:35:22 (2977): called boinc_finish"
The pc above is: CPU type AuthenticAMD
AMD Ryzen Threadripper 1920X 12-Core Processor [Family 23 Model 1 Stepping 1]
Number of processors 24
Coprocessors NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 435.21 OpenCL: 1.2
Operating System Linux LinuxMint
Linux Mint 19.3 Tricia [5.0.0-32-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)] |
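As a side note on the "app exit status: 0x7f00" line, here is a small sketch that decodes it. It assumes the wrapper logs the raw wait status in the conventional Linux encoding (exit code in the high byte, terminating signal in the low 7 bits); that is an assumption, not something confirmed by the wrapper source. The function name is hypothetical.

```python
from typing import Tuple


def decode_wait_status(status: int) -> Tuple[str, int]:
    """Split a raw wait status into ('exited', code) or ('signaled', signal).

    Simplified: stopped/continued states are not handled.
    """
    signal_number = status & 0x7F
    if signal_number == 0:
        return "exited", (status >> 8) & 0xFF
    return "signaled", signal_number


print(decode_wait_status(0x7F00))
```

Under that assumption the status is a normal exit with code 127, which is the code conventionally seen when a program cannot be started, and that fits the libcudart.so.10.1 loader error in the same log.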
|
|
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13 Posts: 7479 Credit: 43,688,001 RAC: 42,041 |
You must install cuda libs 10, see FAQ |
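A quick way to check whether the library from the error message can be found is sketched below. It only uses Python's standard `ctypes` module and the library name from the stderr output above; the helper name `can_load` is hypothetical. It tests what the dynamic loader can find for the Python process, so it is a hint, not proof that mfaktc will start.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can find and open the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the mfaktc error message in this thread.
    name = "libcudart.so.10.1"
    print(name, "found" if can_load(name) else "NOT found")
```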
|
|
|
I did the CUDA thing and 1804 is gone and 2004 is in its place, so I tried that and of course it failed as I'm using Linux Mint 19.3. I will work on just the Lib 10 files. |
|
|
rebirther (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 2 Jan 13 Posts: 7479 Credit: 43,688,001 RAC: 42,041 |
I did the CUDA thing and 1804 is gone and 2004 is in its place, so I tried that and of course it failed as I'm using Linux Mint 19.3. I will work on just the Lib 10 files
https://mrprajesh.blogspot.com/2018/11/install-cuda-10-on-linux-mint-19-or.html |
|
|
|
Thank you but it fails at the part where it says wget...cuda-repo-ubuntu...and says no such directory |
|
|
|
I also tried the "runfile(local)" option at this site https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=2004&target_type=runfilelocal and it got to the 98% range and stopped downloading.
I'm guessing either I have the wrong version or don't have enough knowledge, so I will just run it on my Windows PCs and be happy. |
|
|
|
There is no wget command on that web page ! |
|
|
|
It's on this page https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=2004&target_type=runfilelocal |
|
|