server issues / updates

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,651,567
RAC: 40,309
Message 3827 - Posted: 15 Sep 2017, 16:30:09 UTC

A Windows process that failed to close caused a bluescreen on the machine. I used the downtime to install the latest updates. Everything should now be back to normal.

Profile rebirther
Message 3944 - Posted: 29 Oct 2017, 9:19:16 UTC

There was a short power cut due to a big storm here. Everything should be back to normal.

Profile rebirther
Message 4048 - Posted: 12 Dec 2017, 2:42:48 UTC

The database crashed overnight. I am trying to reduce its size by purging some files. Only a restart of the server got everything back to normal. I think we hit the upper limit again due to the large number of WUs.

Profile rebirther
Message 4261 - Posted: 11 Mar 2018, 12:38:33 UTC

My email account was blocked twice this week due to hacking attempts; I have changed the password again. This also affected the notifications from the server, so if you have not received any in the last few days, please check your account.

Profile rebirther
Message 4924 - Posted: 12 Feb 2019, 17:42:34 UTC
Last modified: 12 Feb 2019, 18:39:58 UTC

What a bad day. In the end the VM stopped responding during WU creation. I need to check where it was cut off; restarted...

Around 1000-2000+ input files were empty, which was strange. If you are getting errors, you received some of them. The new batch has an _a suffix.

Update:
I think there was a memory issue (the limit was hit) and nothing worked correctly anymore.
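
As a rough illustration of how empty input files like these could be spotted before a batch is sent out, the sketch below walks a directory and lists every zero-byte file. The function name `find_empty_files` and the idea of a single input directory are assumptions made for the example; this is not the project's actual tooling.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return a sorted list of zero-byte regular files found under directory."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")          # walk the directory recursively
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # "inputs" is a hypothetical directory name used only for illustration.
    for path in find_empty_files("inputs"):
        print(path)
```

Running a check of this kind right after WU creation would show whether a batch has to be regenerated before volunteers start hitting errors.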

Profile rebirther
Message 5056 - Posted: 16 Mar 2019, 21:25:34 UTC

My cable provider had a 5-hour outage. Nothing worked until now. Everything is back to normal.

Profile rebirther
Message 5075 - Posted: 25 Mar 2019, 6:28:41 UTC
Last modified: 25 Mar 2019, 8:13:00 UTC

The project was offline. Two SSDs disappeared, including the project SSD. That's not a good sign. The mainboard no longer recognized either SSD. After a few restarts they are all online again. Their health is still 100%, so it must be something with the mainboard. Let's see for how long.

Update:
It could also be the SATA cable. More observation is needed.

Profile marmot
Joined: 17 Nov 16
Posts: 97
Credit: 148,895,682
RAC: 586,289
Message 5076 - Posted: 25 Mar 2019, 15:05:36 UTC - in response to Message 5075.

Is the RAID controller on a riser card?

Could be a small increase in transfer resistance at the contacts.

One of the Ciscos didn't boot after sitting idle over the summer; I reseated the RAID controller and it's worked fine since.

Profile rebirther
Message 5077 - Posted: 25 Mar 2019, 15:23:49 UTC - in response to Message 5076.


No RAID; all SSDs are connected directly to the SATA ports.

Profile rebirther
Message 5207 - Posted: 19 May 2019, 19:49:38 UTC

Got a freeze with a Memory Management bluescreen. I hope the RAM is still OK. The restart looks fine.

Profile rebirther
Message 5580 - Posted: 13 Dec 2019, 21:33:10 UTC
Last modified: 14 Dec 2019, 8:21:39 UTC

There was an unexpected server crash while running a program in the background. Still investigating.

Update:
Stopped the program for a while. Also found a bad host (the top 1 host) that trashed the results. This host was blacklisted and its credits were deleted. A lot of WUs will need to be rerun later.

Profile rebirther
Message 5629 - Posted: 21 Jan 2020, 20:12:15 UTC
Last modified: 23 Jan 2020, 20:18:03 UTC

There was a DB error with max_connections. I can't explain why yet.

Update:
The team table and some related tables had crashed. MySQL repaired them by itself.

Profile rebirther
Message 7442 - Posted: 17 Mar 2021, 9:53:53 UTC
Last modified: 17 Mar 2021, 10:01:34 UTC

There was a massive slowdown of the host system: the firewall was updated in the background and caused a higher CPU load. After a server restart all seems to be fine again.

Sorry for the short outage.

Profile rebirther
Message 7450 - Posted: 24 Mar 2021, 7:26:20 UTC
Last modified: 26 Apr 2021, 17:49:21 UTC

If you see some download errors: I have been having a lot of provider outages over the last 3 months.

Profile rebirther
Message 7526 - Posted: 26 Apr 2021, 17:50:19 UTC

The SSL certificate was renewed.

Profile rebirther
Message 7535 - Posted: 2 May 2021, 8:53:31 UTC

My cable provider is a big mess today, with many disconnects. The router restarts every time.

Profile rebirther
Message 7543 - Posted: 9 May 2021, 14:01:38 UTC - in response to Message 7535.

Same today.

Profile rebirther
Message 7545 - Posted: 10 May 2021, 5:47:32 UTC
Last modified: 10 May 2021, 15:13:20 UTC

There could be a massive slowdown of the internet connection. Has anyone else seen this when connecting to the project?

I need to restart the VM; something database-related is very slow.

Profile rebirther
Message 7546 - Posted: 10 May 2021, 16:11:42 UTC

Sorry for the timeout. The firewall needed to be reinstalled due to an error; I also updated the graphics driver.

The database is still slow; still investigating...

Greger
Joined: 1 Nov 16
Posts: 11
Credit: 3,408,414,505
RAC: 338,470
Message 7547 - Posted: 10 May 2021, 16:13:35 UTC - in response to Message 7545.

Site and servers were down for a few minutes recently. All back now.

It did say it was in maintenance.



Copyright © 2014-2024 BOINC Confederation / rebirther