Download problems

Profile DoctorNow
Joined: 28 Nov 14
Posts: 17
Credit: 12,421,084
RAC: 0
Message 7202 - Posted: 5 Jan 2021, 13:13:31 UTC
Last modified: 5 Jan 2021, 13:14:19 UTC

Don't know what's up all of a sudden, but since I ran out of tasks a short while ago and tried to get new ones, I keep getting download errors on the WUs. The manager also reports this with every new try:
05.01.2021 14:07:53 | SRBase | Fetching scheduler list
05.01.2021 14:07:54 | SRBase | Master file download succeeded
So something seems to be off on the server side. Could that be?

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7203 - Posted: 5 Jan 2021, 15:32:10 UTC - in response to Message 7202.
Last modified: 5 Jan 2021, 15:37:11 UTC

It's from the last crash. The input file was already deleted because the WU had been processed before, and after restoring the database some old entries are left over.

Edit:
I have set the max error results to 3, so these WUs should move to deletion faster.

Profile vaughan
Joined: 9 Dec 14
Posts: 83
Credit: 2,138,100,147
RAC: 3,682,080
Message 7213 - Posted: 8 Jan 2021, 4:59:51 UTC

Download fails today for TF tasks.

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7214 - Posted: 8 Jan 2021, 7:55:03 UTC - in response to Message 7213.
Last modified: 8 Jan 2021, 9:39:08 UTC

I have set all WUs to max error results 3; it will take a while to remove the failed WUs. The main issue is the BOINC client, which raises the backoff time to as much as 24h. That must be changed in later versions. You could try writing a script that requests an update of the client every x minutes.
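
The script idea above could be sketched roughly like this in Python. It assumes the standard `boinccmd` tool is on the PATH and uses its `--project URL update` operation. The project URL below is a placeholder, not the real SRBase address, and the function names are made up for illustration.

```python
import subprocess
import time

# Assumption: placeholder URL. Use the project URL shown in your BOINC Manager.
PROJECT_URL = "http://example.invalid/project/"


def build_update_command(project_url, boinccmd="boinccmd"):
    """Return the argv list that asks the local client to contact the project."""
    return [boinccmd, "--project", project_url, "update"]


def request_updates(project_url, interval_minutes, rounds,
                    runner=subprocess.run, sleeper=time.sleep):
    """Run the update command `rounds` times, sleeping between attempts.

    Returns the list of return codes so the caller can spot failed calls.
    """
    codes = []
    for attempt in range(rounds):
        result = runner(build_update_command(project_url), check=False)
        codes.append(result.returncode)
        # No need to sleep after the final attempt.
        if attempt < rounds - 1:
            sleeper(interval_minutes * 60)
    return codes
```

Calling `request_updates(PROJECT_URL, interval_minutes=10, rounds=6)` would send six update requests, ten minutes apart. Keep the interval generous so the script does not hammer the server.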

Update:

Cleaned up the rest; fresh WUs will follow shortly.

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7236 - Posted: 12 Jan 2021, 21:18:09 UTC - in response to Message 7214.

Do we have another bad batch?
Some of my cards have been getting lots of errors (all tasks, in fact) since this morning.

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7237 - Posted: 12 Jan 2021, 21:29:06 UTC - in response to Message 7236.
Last modified: 12 Jan 2021, 21:34:40 UTC

Normally not; it could be some older stuff. You could check the names.

Found this error in logs:

./mfaktc.exe: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./mfaktc.exe)

Check your config. The only things that changed are a driver update and the use of the cuda110 version instead of cuda100.
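
To compare the version named in that error with what a host actually has, something like the following Python sketch may help. The helper names are made up for illustration. `platform.libc_ver()` reports the C library the Python interpreter is linked against, which on a typical Linux install is the system glibc.

```python
import platform
import re


def required_glibc(error_line):
    """Extract the GLIBC version a loader error asks for, e.g. (2, 29)."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", error_line)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def parse_version(text):
    """Turn '2.31' into (2, 31) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:2])


def glibc_is_sufficient(installed, error_line):
    """True if the installed version meets the one named in the error."""
    needed = required_glibc(error_line)
    return needed is None or parse_version(installed) >= needed


ERROR = ("./mfaktc.exe: /lib/x86_64-linux-gnu/libm.so.6: version "
         "`GLIBC_2.29' not found (required by ./mfaktc.exe)")

# On non-glibc systems libc_ver() may return empty strings, so guard for that.
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print("installed:", version,
          "sufficient:", glibc_is_sufficient(version, ERROR))
```

The tuples are compared numerically on purpose: a plain string comparison would rank "2.9" above "2.29".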

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7238 - Posted: 12 Jan 2021, 21:34:50 UTC - in response to Message 7237.

This is one of the wu's:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:32:44 (2074854): wrapper (7.2.26012): starting
15:32:44 (2074854): wrapper: running ./mfaktc.exe ( --device 0)
15:32:45 (2074854): ./mfaktc.exe exited; CPU time 0.004503
15:32:45 (2074854): app exit status: 0x100
15:32:45 (2074854): called boinc_finish

</stderr_txt>
]]>


Not sure if that helps
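
For reading the numbers in that output: 195, 0xc3 and -61 are the same 8-bit value written three ways. The "app exit status: 0x100" line looks like a raw POSIX wait status, which would mean mfaktc.exe itself exited with code 1. That reading is an assumption about how the wrapper prints the status. A small Python sketch of the arithmetic, with made-up helper names:

```python
def exit_code_from_wait_status(status):
    """Return the exit code from a POSIX wait status for a normal exit.

    For a normal exit the low 7 bits are zero and the exit code
    sits in bits 8-15 of the status word.
    """
    if status & 0x7F:
        raise ValueError("status does not describe a normal exit")
    return (status >> 8) & 0xFF


def as_signed_byte(code):
    """Show an 8-bit exit code the way a signed byte would read it."""
    return code - 256 if code > 127 else code


print(exit_code_from_wait_status(0x100))  # exit code of the app
print(hex(195), as_signed_byte(195))      # the wrapper's code, two other ways
```

An app that dies within a few milliseconds with exit code 1 fits a start-up failure such as a missing library rather than a bad work unit.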

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7239 - Posted: 12 Jan 2021, 21:37:27 UTC - in response to Message 7238.

No, see the glibc error in Message 7237. Your Linux has only 2.27 installed.

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7240 - Posted: 12 Jan 2021, 21:39:43 UTC - in response to Message 7239.

Okay
This changed today?
How do I get the needed one?

Thanks for taking the time to help me.

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7241 - Posted: 12 Jan 2021, 21:42:19 UTC - in response to Message 7240.

So this is wrong?

"Operating System Linux Ubuntu
Ubuntu 20.04.1 LTS [5.4.0-51-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]"

Sorry I don't know enough.
I've stopped d/l's until I can get this working again.
Don't want to hammer your server.

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7242 - Posted: 12 Jan 2021, 21:46:02 UTC - in response to Message 7241.

You could try running the ldd command on the mfaktc app to see what's missing.
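
If it helps, the ldd check can be wrapped in a few lines of Python that keep only the unresolved entries. This is a minimal sketch with made-up function names. It assumes `ldd` is installed and that problems show up as lines containing "not found", which is how ldd normally reports missing libraries.

```python
import subprocess


def unresolved_lines(ldd_output):
    """Return the lines of ldd output that report something as 'not found'."""
    return [line.strip() for line in ldd_output.splitlines()
            if "not found" in line]


def check_binary(path):
    """Run ldd on `path` and return its unresolved libraries or versions."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True,
                            check=False)
    # Version errors may be printed on stderr, so scan both streams.
    return unresolved_lines(result.stdout + "\n" + result.stderr)
```

Run `check_binary()` on the mfaktc.exe file inside the project's folder in the BOINC data directory. An empty list means ldd resolved everything it looked for.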

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7243 - Posted: 12 Jan 2021, 22:11:41 UTC - in response to Message 7242.

Don't know why, but a reboot seems to have fixed it.
At least for now.

PecosRiverM
Joined: 25 Jun 18
Posts: 7
Credit: 1,727,647,232
RAC: 766,033
Message 7244 - Posted: 13 Jan 2021, 0:06:17 UTC - in response to Message 7243.

All good so far.
The only difference I'm seeing is that the "New" WUs are "Cuda 100" and the old ones were "Cuda 110".

Sorry about all the errors.

Dr Who Fan
Joined: 30 Nov 14
Posts: 31
Credit: 21,994,505
RAC: 1,398
Message 7260 - Posted: 18 Jan 2021, 7:42:35 UTC

Had a task that has failed to download for 2 of 2 users so far, including me:
Workunit 27257802 ■ name: S520_700-750k_wu_985
Task created 27 Dec 2020, 9:33:52 UTC

Guessing it is one that was corrupted on the last server crash.

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7261 - Posted: 18 Jan 2021, 16:22:15 UTC - in response to Message 7260.

Yep, I guess so. I will change the max error rate.

Profile marmot
Joined: 17 Nov 16
Posts: 97
Credit: 126,410,450
RAC: 19,839
Message 7279 - Posted: 28 Jan 2021, 19:25:01 UTC

Only doing long3 atm.
A few are downloading (11 in progress), but the 55 d/l errors are spread across every machine in the house.

Here's an image: https://ibb.co/7tnxHxt


Is this still more of the DB crash?

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7231
Credit: 42,729,227
RAC: 31
Message 7280 - Posted: 28 Jan 2021, 19:30:47 UTC - in response to Message 7279.
Last modified: 28 Jan 2021, 20:31:05 UTC

I will check...

Update:
The others look OK, but I have lowered max_error_results from 10 to 5 to get rid of the older WUs from the crash.




Copyright © 2014-2024 BOINC Confederation / rebirther