Server crash

Message boards : News : Server crash

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7257
Credit: 42,729,227
RAC: 1
Message 9899 - Posted: 7 Apr 2024, 14:56:42 UTC
Last modified: 7 Apr 2024, 20:54:32 UTC

The server crashed due to a memory leak. The database was corrupted, so I restored it from today's backup. The crash was expected, but I need to investigate what was causing it. I wanted to fix this in 3 days, but the crash came earlier.

rebirther
Message 9901 - Posted: 7 Apr 2024, 16:35:52 UTC - in response to Message 9899.

No log was created. I need a day sometime this week to check the RAM. Two years ago, one RAM module also failed and was replaced by the vendor.

rebirther
Message 9902 - Posted: 7 Apr 2024, 20:57:02 UTC

Ignore the download errors; the results were already done and the input files don't exist anymore.

rebirther
Message 9903 - Posted: 8 Apr 2024, 18:14:22 UTC
Last modified: 8 Apr 2024, 19:20:56 UTC

I will cancel all TF WUs, as a lot of them are still broken in the background, report the rest and reload the open ones.

Update:
All WUs of the following batches were cancelled:

733-789M
789-799M
799-810M
810-820M

Tomorrow all the rest will be cancelled to clean things up.

Speedy51
Joined: 7 Feb 18
Posts: 65
Credit: 196,773,177
RAC: 65,046
Message 9904 - Posted: 8 Apr 2024, 21:33:51 UTC

As I write, out of 160 tasks only 18 ran successfully; the rest were "failed download". I even reset the project before downloading the second lot of 80 tasks.

rebirther
Message 9905 - Posted: 8 Apr 2024, 21:40:08 UTC - in response to Message 9904.

As I write, out of 160 tasks only 18 ran successfully; the rest were "failed download". I even reset the project before downloading the second lot of 80 tasks.


Tomorrow I will reupload, cancel the rest and clean up; then all should be fine again. More work was done than expected, which is why there are so many download errors. I have changed the max error rate from 8 to 5, but other changes don't work.

Speedy51
Message 9906 - Posted: 8 Apr 2024, 21:45:01 UTC - in response to Message 9905.

All good. I will see what I can get processed for you today.

rebirther
Message 9907 - Posted: 9 Apr 2024, 14:16:38 UTC

All TF WUs were deleted. I got the missing data from GIMPS and will now reimport the rest. No new work will be added until I can check the hardware.

prazape
Joined: 6 Mar 24
Posts: 2
Credit: 830,990
RAC: 2,206
Message 9927 - Posted: 27 Apr 2024, 16:40:45 UTC - in response to Message 9907.
Last modified: 27 Apr 2024, 17:06:49 UTC

Hi,
it has been about a week since all WUs finished. Is there any news on checking the server and restarting generation of new ones?

DeleteNull
Volunteer developer
Volunteer tester
Joined: 29 Nov 14
Posts: 83
Credit: 367,636,322
RAC: 84,255
Message 9928 - Posted: 27 Apr 2024, 17:27:21 UTC - in response to Message 9927.

Sadly No.

rebirther is currently in hospital.

Bill F
Joined: 5 Jul 18
Posts: 18
Credit: 30,474,117
RAC: 232
Message 9931 - Posted: 28 Apr 2024, 3:28:50 UTC - in response to Message 9928.

Sadly No.

rebirther is currently in hospital.


Best Wishes for a speedy and complete recovery.

prazape
Message 9933 - Posted: 28 Apr 2024, 16:38:45 UTC - in response to Message 9928.

Sadly No.

rebirther is currently in hospital.


Good luck and a quick recovery to him/her.




Copyright © 2014-2024 BOINC Confederation / rebirther