Server/DB crash / Hardware failure

Message boards : News : Server/DB crash / Hardware failure

Author Message
rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8067 - Posted: 15 Jan 2022, 12:49:03 UTC

It looks like faulty RAM or other hardware crashed the whole server; the downtime was 15 hours. I have moved the VM to my old server. It used to be stable, but there are now some errors in Linux: fix one and another appears. The Windows log is still reporting broken hardware, but I need to find time to possibly reinstall the OS, test the RAM, etc.

You will now get a lot of download errors because the input files were deleted after the WUs had already finished. The server seems to have crashed around 1 hour after my last backup yesterday.

I am thinking about leaving the BOINC VM on the old server: no updates (except the firewall) and less impact from OS crashes, but slower.

Apologies for the outage.

Paul
Joined: 7 Feb 16
Posts: 3
Credit: 1,791,210
RAC: 0
Message 8068 - Posted: 15 Jan 2022, 14:09:00 UTC - in response to Message 8067.

Old tasks were uploaded and reported but do not show on the tasks page.
New tasks were downloaded and are listed as expected.

Paul.

rebirther
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8069 - Posted: 15 Jan 2022, 22:08:59 UTC

The OS has now been completely reinstalled; I still need to check a few things. I can't run the VM on the old Win7 system: it is very slow at uploading files to the DB, making backups, and reading folders with many files. The OS on the other drive was broken.

bluestang
Joined: 6 Jun 19
Posts: 60
Credit: 2,244,690,070
RAC: 1,573,450
Message 8070 - Posted: 16 Jan 2022, 5:03:44 UTC

I'm trying to add a new host and getting a failure error in BOINC. Never even get to the username and password screen.

Is this an issue caused by the recent crash?

rebirther
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8071 - Posted: 16 Jan 2022, 5:13:00 UTC - in response to Message 8070.

No. If you have installed an older version of BOINC, don't forget to update the cert file.

bluestang
Joined: 6 Jun 19
Posts: 60
Credit: 2,244,690,070
RAC: 1,573,450
Message 8072 - Posted: 16 Jan 2022, 5:28:59 UTC - in response to Message 8071.

This is a Windows 10 machine that was on 7.16.11, so I updated to 7.16.20 and that worked for some reason. Thanks.

Paul
Joined: 7 Feb 16
Posts: 3
Credit: 1,791,210
RAC: 0
Message 8073 - Posted: 16 Jan 2022, 8:48:26 UTC - in response to Message 8067.

Are tasks such as the one below, which are not currently listed on the web site, lost, and should I abort them?

Application: Sierpinski / Riesel Base - long 0.22
Name: R2_3-4M_wu_2402
Received: Fri 14 Jan 2022 21:33:33 GMT
Report deadline: Mon 17 Jan 2022 21:32:57 GMT

Paul.

rebirther
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8074 - Posted: 16 Jan 2022, 8:53:42 UTC - in response to Message 8073.

If you received them shortly after the crash, then you can abort them. The one you listed is still in your account.

Paul
Joined: 7 Feb 16
Posts: 3
Credit: 1,791,210
RAC: 0
Message 8075 - Posted: 16 Jan 2022, 12:00:27 UTC - in response to Message 8074.

BOINCTasks shows 3 units returned but only R2_3-4M_wu_2403 is listed.
I have another 12 or so units on that PC that were fetched at the same time, around 21:33 on the 14th.
Lenovo1 SRBase Sierpinski / Riesel Base - long R2_3-4M_wu_2402_0 11:50:45 94.41% Jan 16, 2022, 09:33:22 AM OK
Lenovo1 SRBase Sierpinski / Riesel Base - long R2_3-4M_wu_2403_0 12:03:42 93.08% Jan 15, 2022, 09:41:55 PM OK
Lenovo1 SRBase Sierpinski / Riesel Base - long R2_3-4M_wu_2399_0 11:58:35 93.79% Jan 15, 2022, 12:53:46 PM OK

darineugenius
Joined: 20 Jan 22
Posts: 2
Credit: 0
RAC: 0
Message 8082 - Posted: 20 Jan 2022, 20:29:50 UTC - in response to Message 8074.

I apologize for making an account just to comment, but my World Community Grid will not connect with your project at this time (Jan 20th, 2022, 15:27). I read a comment about updating a "cert" file, so if you think that is my problem, would you mind explaining how I update it on Win10? Or on the World Community Grid site?

darineugenius
Joined: 20 Jan 22
Posts: 2
Credit: 0
RAC: 0
Message 8083 - Posted: 20 Jan 2022, 20:32:29 UTC - in response to Message 8082.
Last modified: 20 Jan 2022, 20:34:33 UTC

I guess I meant to say BOINC Manager, and maybe not World Community Grid. I will check back here later.

rebirther
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8084 - Posted: 20 Jan 2022, 20:33:11 UTC - in response to Message 8082.

Sure, see the FAQ thread; you only need to replace the file.

rebirther
Joined: 2 Jan 13
Posts: 7457
Credit: 42,792,827
RAC: 5,432
Message 8085 - Posted: 23 Jan 2022, 8:44:42 UTC

Testing complete:

1 of 4 RAM modules was broken and needs to be replaced soon. It is still under warranty but lasted only 2 years, and given all these issues it may have been failing even earlier.

The server VM will be moved to the new server shortly. This should take less than 1 hour.


Copyright © 2014-2024 BOINC Confederation / rebirther