No GPU work for some hosts

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6949 - Posted: 16 Nov 2020, 16:40:40 UTC

Yesterday I made some plan_class changes in plan_class_spec.xml.

There was a driver version limit for the different CUDA versions. Some users have now reported their specs to me: their driver versions overlapped with the range for the upcoming new cuda11 apps, so they haven't received any new work. That's bad, because old cards cannot run the new cuda11 apps. I need to find another way to determine which hosts can run which app.

The limitation has been removed, so everything should be fixed now.

If you still have issues, let me know.
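
To illustrate why a driver version limit alone can leave a host without work, here is a minimal Python sketch. It is not the scheduler's actual logic: the `PlanClass` type, the `min_driver`/`max_driver` fields and the 450.0 threshold are made up for illustration and are not the project's real plan_class_spec.xml values. Only the driver version 456.71 comes from a host reported in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PlanClass:
    """Hypothetical, simplified stand-in for one plan class entry."""
    name: str
    min_driver: Optional[float] = None  # assumed lower driver bound
    max_driver: Optional[float] = None  # assumed upper driver bound

    def accepts(self, driver: float) -> bool:
        # A host matches only if its driver lies inside the allowed range.
        if self.min_driver is not None and driver < self.min_driver:
            return False
        if self.max_driver is not None and driver > self.max_driver:
            return False
        return True


# Made-up thresholds: cuda10 capped by driver, cuda11 requires a newer driver.
CUDA10 = PlanClass("cuda10", max_driver=450.0)
CUDA11 = PlanClass("cuda11", min_driver=450.0)


def eligible(driver: float) -> List[str]:
    """Plan classes whose driver range contains this driver version."""
    return [pc.name for pc in (CUDA10, CUDA11) if pc.accepts(driver)]


def sendable(driver: float, released: Set[str]) -> List[str]:
    """Eligible plan classes that actually have an app released."""
    return [name for name in eligible(driver) if name in released]


# A host with driver 456.71 only matches cuda11 under these assumed limits,
# so while only cuda10 apps are released it receives nothing.
print(eligible(456.71))
print(sendable(456.71, {"cuda10"}))
```

The driver version describes the installed software, not the card, which is why a limit of this kind cannot tell an older GPU with a new driver apart from a new GPU.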

xii5ku
Joined: 17 Jun 17
Posts: 1
Credit: 121,801,088
RAC: 0
Message 6950 - Posted: 16 Nov 2020, 18:21:37 UTC - in response to Message 6949.
Last modified: 16 Nov 2020, 18:24:45 UTC

I am now getting repeated download errors:

Mon 16 Nov 2020 05:45:41 PM CET | SRBase | Giving up on download of worktodo13a83_0049707.txt: permanent HTTP error
Mon 16 Nov 2020 06:05:45 PM CET | SRBase | Giving up on download of worktodo13a83_0051780.txt: permanent HTTP error
Mon 16 Nov 2020 06:05:45 PM CET | SRBase | Giving up on download of worktodo13a83_0051971.txt: permanent HTTP error
Mon 16 Nov 2020 06:05:47 PM CET | SRBase | Giving up on download of worktodo13a83_0052011.txt: permanent HTTP error
Mon 16 Nov 2020 06:05:47 PM CET | SRBase | Giving up on download of worktodo13a83_0052088.txt: permanent HTTP error
Mon 16 Nov 2020 06:25:46 PM CET | SRBase | Giving up on download of worktodo13a83_0041722.txt: permanent HTTP error
Mon 16 Nov 2020 06:45:47 PM CET | SRBase | Giving up on download of worktodo13a83_0045442.txt: permanent HTTP error
Mon 16 Nov 2020 07:05:45 PM CET | SRBase | Giving up on download of worktodo13a83_0042128.txt: permanent HTTP error
Mon 16 Nov 2020 07:05:45 PM CET | SRBase | Giving up on download of worktodo13a83_0051883.txt: permanent HTTP error
Mon 16 Nov 2020 07:05:47 PM CET | SRBase | Giving up on download of worktodo13a83_0051952.txt: permanent HTTP error
Mon 16 Nov 2020 07:15:21 PM CET | SRBase | Giving up on download of worktodo13a83_0006927.txt: permanent HTTP error
Mon 16 Nov 2020 07:15:22 PM CET | SRBase | Giving up on download of worktodo13a83_0038077.txt: permanent HTTP error
Mon 16 Nov 2020 07:15:22 PM CET | SRBase | Giving up on download of worktodo13a83_0038451.txt: permanent HTTP error
Mon 16 Nov 2020 07:15:23 PM CET | SRBase | Giving up on download of worktodo13a83_0038610.txt: permanent HTTP error


All of these are prefixed by worktodo13a83_. I did not see any of my recent worktodo13a81_ and worktodo13a82_ downloads fail. However, there were a few worktodo13a83_'s which were successfully downloaded.

(edit)
The log notices are from one host, but I saw the same on others, all behind the same home internet connection.

OTOH, I saw
Mon 16 Nov 2020 07:18:17 PM CET | SRBase | Started download of mfaktc-linux64-cuda11v1.zip

on one host right now. Maybe the prior failures were a byproduct of the application version transition?
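
For anyone who wants to check whether their own failures are confined to a single batch, a small Python sketch along these lines could count the failed downloads per filename prefix. The function name and the regular expression are my own assumptions, written against the log format quoted above; the three sample lines are taken from that log.

```python
import re
from collections import Counter

# Matches e.g. "Giving up on download of worktodo13a83_0049707.txt"
# and captures the batch prefix "worktodo13a83".
FAILED = re.compile(r"Giving up on download of (worktodo\w+?)_\d+\.txt")


def failures_by_batch(log_text: str) -> Counter:
    """Count failed downloads per batch prefix in BOINC event log text."""
    return Counter(match.group(1) for match in FAILED.finditer(log_text))


SAMPLE = """\
Mon 16 Nov 2020 05:45:41 PM CET | SRBase | Giving up on download of worktodo13a83_0049707.txt: permanent HTTP error
Mon 16 Nov 2020 06:05:45 PM CET | SRBase | Giving up on download of worktodo13a83_0051780.txt: permanent HTTP error
Mon 16 Nov 2020 07:18:17 PM CET | SRBase | Started download of mfaktc-linux64-cuda11v1.zip
Mon 16 Nov 2020 07:15:23 PM CET | SRBase | Giving up on download of worktodo13a83_0038610.txt: permanent HTTP error
"""

for batch, count in failures_by_batch(SAMPLE).items():
    print(batch, count)
```

Lines that are not download failures, such as the "Started download" entry, are simply ignored by the pattern.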

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6952 - Posted: 16 Nov 2020, 18:25:15 UTC - in response to Message 6950.

These are some older WUs from after the DB crash. Don't worry about them. The rest should be OK.

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6956 - Posted: 16 Nov 2020, 20:02:19 UTC - in response to Message 6952.

I started getting new tasks on one host that uses a GTX 1080 Ti, but they've now stopped.

Event Log just shows "Project requested delay of 7 seconds" and the same response again after the 7 second delay.

GPU info is: NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 456.71 OpenCL: 1.2

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6957 - Posted: 16 Nov 2020, 20:40:03 UTC - in response to Message 6956.

Can you try again?

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6959 - Posted: 16 Nov 2020, 21:03:25 UTC - in response to Message 6957.
Last modified: 16 Nov 2020, 21:10:54 UTC

Just tried. Neither that host nor another one using a GTX 1660 Super is able to download.

Both receive 0 tasks, delay 7 seconds and then repeat the delay request.

16/11/2020 21:09:02 | SRBase | Started upload of TF_72-73_219-245M_wu_59734_1_0
16/11/2020 21:09:04 | SRBase | Finished upload of TF_72-73_219-245M_wu_59734_1_0
16/11/2020 21:09:07 | SRBase | Sending scheduler request: To report completed tasks.
16/11/2020 21:09:07 | SRBase | Reporting 1 completed tasks
16/11/2020 21:09:07 | SRBase | Requesting new tasks for NVIDIA GPU
16/11/2020 21:09:09 | SRBase | Scheduler request completed: got 0 new tasks
16/11/2020 21:09:09 | SRBase | Project requested delay of 7 seconds
16/11/2020 21:09:19 | SRBase | Sending scheduler request: To fetch work.
16/11/2020 21:09:19 | SRBase | Requesting new tasks for NVIDIA GPU
16/11/2020 21:09:20 | SRBase | Scheduler request completed: got 0 new tasks
16/11/2020 21:09:20 | SRBase | Project requested delay of 7 seconds

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6960 - Posted: 16 Nov 2020, 21:07:49 UTC - in response to Message 6959.
Last modified: 16 Nov 2020, 21:13:49 UTC

What CUDA version do you have (10.0, 10.1, 10.2)? Which host?

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6961 - Posted: 16 Nov 2020, 21:20:01 UTC - in response to Message 6960.

How do you find that on Windows?

Can you do it from the command line?

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6962 - Posted: 16 Nov 2020, 21:21:40 UTC - in response to Message 6960.

Hosts:

ID: 208630 -- 1660 super

ID: 208801 -- 1080ti

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6963 - Posted: 16 Nov 2020, 21:22:21 UTC - in response to Message 6961.

Copy the app and the ini file to another folder and run:

mfaktc.exe --perftest

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6964 - Posted: 16 Nov 2020, 22:11:01 UTC - in response to Message 6963.
Last modified: 16 Nov 2020, 22:12:12 UTC

I'll do that tomorrow, as these hosts are remote.

I've just checked them all, and we are in the same situation we were in last night.

None of the Nvidia GTX GPUs are receiving work units.

The only GPU receiving work is an old Nvidia P600.

Senilix
Joined: 1 Dec 14
Posts: 4
Credit: 50,041,869
RAC: 0
Message 6965 - Posted: 16 Nov 2020, 23:07:49 UTC

My NVIDIA GeForce GTX 1060 can't get any work either. The BOINC client keeps telling me >>No tasks are available for TF<<.

Here's the output of mfaktc.

d:\Backup\Temp1>mfaktc-win-64.exe --perftest
mfaktc v0.21 (64bit built)
Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           300s
  WorkFileAddDelay          disabled
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no
CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.10
CUDA device info
  name                      GeForce GTX 1060 3GB
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 9
  clock rate (CUDA cores)   1784MHz
  memory clock rate:        4004MHz
  memory bus width:         192 bit
Automatic parameters
  threads per grid          589824
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144
running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107
selftest PASSED!
Can't open workfile worktodo.txt
ERROR: get_next_assignment(): can't open "worktodo.txt"
d:\Backup\Temp1>
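
If you only need the CUDA-related values from that output, a short Python sketch like the following could pull them out. The function name and dictionary keys are my own choices, and the patterns assume the field labels appear exactly as in the output above; other mfaktc versions may print them differently.

```python
import re
from typing import Dict, Optional

# Field labels as they appear in the perftest output quoted above.
FIELDS = {
    "compiled_for": r"binary compiled for CUDA\s+([\d.]+)",
    "runtime": r"CUDA runtime version\s+([\d.]+)",
    "driver": r"CUDA driver version\s+([\d.]+)",
    "compute_capability": r"compute capability\s+([\d.]+)",
}


def cuda_info(output: str) -> Dict[str, Optional[str]]:
    """Extract CUDA version fields; a missing field is reported as None."""
    info = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, output)
        info[key] = match.group(1) if match else None
    return info


# Excerpt of the output posted above.
SAMPLE = (
    "CUDA version info binary compiled for CUDA 10.0 "
    "CUDA runtime version 10.0 CUDA driver version 11.10 "
    "CUDA device info name GeForce GTX 1060 3GB compute capability 6.1"
)

print(cuda_info(SAMPLE))
```

The values are kept as strings on purpose, so that a version such as 11.10 is not turned into the number 11.1.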

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6966 - Posted: 17 Nov 2020, 6:52:56 UTC
Last modified: 17 Nov 2020, 6:53:10 UTC

I have removed the max_cuda_version limit for cuda10 and replaced it with a maximum compute capability limit. Please check whether it's working.
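
A compute capability limit describes the card itself rather than the installed driver. As a rough Python sketch of such a check: the cap of 7.5 below is an assumed value for illustration only, not the limit actually configured on the server, and the function names are hypothetical. The value 6.1 is the compute capability reported for the GTX 1060 earlier in this thread.

```python
from typing import Tuple


def parse_cc(text: str) -> Tuple[int, int]:
    """Turn a compute capability string such as "6.1" into (6, 1)."""
    major, minor = text.split(".")
    return int(major), int(minor)


# Assumed cap for illustration only, not the project's configured value.
CUDA10_MAX_CC = (7, 5)


def can_run_cuda10(cc_text: str) -> bool:
    """True if the card's compute capability does not exceed the cap."""
    # Tuples compare major first, then minor.
    return parse_cc(cc_text) <= CUDA10_MAX_CC


print(can_run_cuda10("6.1"))  # GTX 1060 from this thread
print(can_run_cuda10("8.6"))  # a card above the assumed cap
```

Because the check depends only on the card, updating the driver on an older GPU does not change which app it is offered.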

PDW
Joined: 15 Oct 15
Posts: 41
Credit: 1,078,217,894
RAC: 43,599
Message 6967 - Posted: 17 Nov 2020, 8:32:13 UTC - in response to Message 6966.

That worked for me.

IDEA
Joined: 23 Sep 20
Posts: 33
Credit: 4,272,676,567
RAC: 2,531,253
Message 6968 - Posted: 17 Nov 2020, 10:07:19 UTC - in response to Message 6966.

Yes, working with GTX GPUs again.

Excluding the 1660 would be annoying.

Do you consider the 1660 GPU unsuitable for future tasks?

Senilix
Joined: 1 Dec 14
Posts: 4
Credit: 50,041,869
RAC: 0
Message 6969 - Posted: 17 Nov 2020, 10:49:39 UTC - in response to Message 6966.

Yes, it's working now. Excellent job.

rebirther
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 6970 - Posted: 17 Nov 2020, 10:59:50 UTC - in response to Message 6968.

No, it is still running under different apps/CUDA versions.



Copyright © 2014-2024 BOINC Confederation / rebirther