webserver down

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7228
Credit: 42,729,227
RAC: 34
Message 2685 - Posted: 8 Jul 2016, 14:48:29 UTC

The high number of new hosts attached today killed the webserver. More than 300 hosts are trying to get new apps over the slow upload connection. The maximum number of connections to the database was also reached. I can't do anything at the moment because the bottleneck is the 2 Mbit upload. I will order a 6 Mbit upload next time, but this will not help much.

There is no problem once all hosts have work, but all of them requesting at once would kill any server.

Sorry for the downtime. I will try tweaking some things, but there is only a small hope of getting it to run better.
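
To put rough numbers on the upload bottleneck, here is a back-of-the-envelope sketch. The app size of 20 MB is purely an assumption for illustration (the thread does not state it), the function name is made up, and protocol overhead is ignored.

```python
def transfer_hours(hosts: int, app_size_mb: float, uplink_mbit: float) -> float:
    """Hours needed to send one copy of the app to every host over a shared uplink."""
    total_megabits = hosts * app_size_mb * 8  # 1 MB = 8 Mbit
    seconds = total_megabits / uplink_mbit    # uplink is in Mbit/s
    return seconds / 3600


APP_SIZE_MB = 20.0  # hypothetical app size, not taken from the thread

for uplink in (2.0, 6.0):
    hours = transfer_hours(300, APP_SIZE_MB, uplink)
    print(f"{uplink:.0f} Mbit/s uplink: {hours:.1f} h to serve 300 hosts")
```

Under these assumptions the line stays saturated for hours in both cases, and regular task traffic has to share the same uplink on top of that.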

Profile rebirther
Message 2686 - Posted: 8 Jul 2016, 16:11:06 UTC

The upload connection is still overloaded. More and more new hosts are attaching from the same user (400+ now). I have decreased max_wus_in_progress to 10 per core and set the RPC time to 20s. Until all the new hosts have got the apps, the situation will not improve.
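
A rough sketch of what a per-core cap means in total, assuming the limit applies as tasks per core on each host. The 4 cores per host is a made-up figure and the function name is hypothetical; only the limit of 10 and the 400 hosts come from the post above.

```python
from typing import Iterable


def max_tasks_in_progress(per_core_limit: int, cores_per_host: Iterable[int]) -> int:
    """Upper bound on tasks in progress when each host may hold per_core_limit tasks per core."""
    return sum(per_core_limit * cores for cores in cores_per_host)


# 400 hosts with an assumed 4 cores each, limit of 10 tasks per core
hosts = [4] * 400
print(max_tasks_in_progress(10, hosts))
```

Lowering the limit shrinks this ceiling proportionally, which is why it is a quick lever when the server is struggling.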

Profile Mankka*
Joined: 8 Feb 15
Posts: 40
Credit: 338,892,941
RAC: 0
Message 2687 - Posted: 8 Jul 2016, 16:26:46 UTC

No worries Reb =)

That increase is huge if it happens in a short amount of time, and as you are also running those very short ones, the traffic must be massive!

But the most important thing is that you react very fast every time and always give us an explanation of what happened, and sometimes we are even able to try to help you out.

THAT'S WHY it's a pleasure to crunch your project (ok, also the primes :)) as it's so well managed!

Mankka*

Profile rebirther
Message 2688 - Posted: 8 Jul 2016, 16:32:54 UTC - in response to Message 2687.

Thank you, Mankka*. A 40 Mbit connection would be nice and would not be affected by the number of hosts. Sending out work (small tasks) causes much less traffic than sending the app itself to new hosts.

Profile rebirther
Message 2689 - Posted: 8 Jul 2016, 17:06:26 UTC

It seems that all the new hosts have got the app now. The server is slowly returning to normal, and the settings are now back to the previous ones.

Profile rebirther
Message 2690 - Posted: 8 Jul 2016, 18:38:47 UTC

Nearly 600 new hosts now. The user is unbelievable. I hope I can add more work soon, once the server is faster, or we will run dry.

Profile Mankka*
Message 2691 - Posted: 8 Jul 2016, 19:04:08 UTC - in response to Message 2690.

You actually answered my next question, as I know you have to do it "manually", and when you upload large numbers of new tasks, the connection is also under heavy load.

But I think it's a happy problem for the project, and if you need some new stuff, don't forget to add it to the ongoing BU campaign ;)

Profile rebirther
Message 2692 - Posted: 8 Jul 2016, 19:16:05 UTC - in response to Message 2691.

The main problem is that most of the hosts are erroring out with disk_limit_exceeded and stressing the network connection, which causes the server to crash, and I cannot bring it back up with just a restart of the VM. I must now increase max_connections and user_connections in the webserver config because of some MySQL errors.

Profile Mankka*
Message 2693 - Posted: 8 Jul 2016, 19:40:00 UTC - in response to Message 2692.

...OK? But as I told you earlier, make them 10 times what is needed, as MySQL seems to sit on "old" connections forever before dropping them, and I haven't figured out how to get around it (kill them fast) :(
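
On idle connections: MySQL keeps a sleeping connection open until the `wait_timeout` server variable expires (eight hours by default), so lowering that value is the usual fix. As a stopgap, stale connections can be picked out of `SHOW PROCESSLIST` output and closed with `KILL`. The sketch below only builds the statements from rows that have already been fetched. The function name, the sample rows and the 300-second threshold are assumptions for illustration, not anything from the project's setup.

```python
from typing import Dict, List


def kill_statements(processlist: List[Dict], max_idle_seconds: int) -> List[str]:
    """Build KILL statements for connections that have been sleeping too long.

    Each row is expected to carry the Id, Command and Time columns
    that SHOW PROCESSLIST returns.
    """
    statements = []
    for row in processlist:
        if row["Command"] == "Sleep" and row["Time"] > max_idle_seconds:
            statements.append(f"KILL {row['Id']};")
    return statements


# Hypothetical rows, shaped like SHOW PROCESSLIST output
sample = [
    {"Id": 101, "Command": "Sleep", "Time": 950},
    {"Id": 102, "Command": "Query", "Time": 2},
    {"Id": 103, "Command": "Sleep", "Time": 15},
]
print(kill_statements(sample, 300))
```

Only connection 101 qualifies here: 102 is running a query and 103 has not been idle long enough.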

Profile rebirther
Message 2694 - Posted: 8 Jul 2016, 19:44:23 UTC - in response to Message 2693.

I have now increased max_connections to 3000 and user_connections to 300, but the warning told me 42000/1200; that's beyond the limit ^^

Profile Mankka*
Message 2695 - Posted: 8 Jul 2016, 20:09:21 UTC
Last modified: 8 Jul 2016, 20:22:25 UTC

I hope you have sent him a PM if his WUs are erroring out for lack of disk space; kinda odd with SRBase? (If I understood you right?)

He should actually be in here, taking part in this discussion!

Profile rebirther
Message 2696 - Posted: 8 Jul 2016, 20:12:18 UTC - in response to Message 2695.

Yes, I have, and no, this error is on the host side.

Profile Mankka*
Message 2697 - Posted: 8 Jul 2016, 20:35:52 UTC
Last modified: 8 Jul 2016, 20:43:13 UTC

...well, at some point (and VERY soon, PLS) you will have to ban him for a while, as it's messing up the whole project (stalled uploads/downloads, report problems, BOINC client backing off for 24 hrs...). It's not nice if he's not serious about crunching the WUs, sorry!

Profile rebirther
Message 2698 - Posted: 8 Jul 2016, 20:46:23 UTC - in response to Message 2697.

It's bad timing with all these short WUs, and I don't want to ban him. I have reduced max_wus_in_progress to 20. The number of new hosts is still climbing. BOINC's back-off time is a bit of a mess and should be changed by the devs soon.

I have been trying to keep the server alive for a few hours now. I need a break. We will see how it goes until tomorrow; I have no idea how many new hosts will be up by then. There must be an end...

Profile Mankka*
Message 2699 - Posted: 8 Jul 2016, 20:51:33 UTC - in response to Message 2698.

I understand you, and I'll babysit a while until I have my caches full ;)

So, grab a cold one. Kippis! (Prost!)

Profile Johnny Rotten
Joined: 9 Apr 15
Posts: 2
Credit: 65,877,207
RAC: 317
Message 2700 - Posted: 9 Jul 2016, 1:12:00 UTC - in response to Message 2698.

I seriously hope Syracuse University has sanctioned 950+ workstations for this project, otherwise somebody is in serious trouble. Just my 2 cents

Profile rebirther
Message 2706 - Posted: 9 Jul 2016, 19:26:16 UTC
Last modified: 9 Jul 2016, 19:26:24 UTC

The server is back to normal. New work is incoming.

Profile rebirther
Message 2716 - Posted: 11 Jul 2016, 8:09:27 UTC

I have now ordered a new internet connection with 400 Mbit download / 25 Mbit upload, but the download side is not relevant here. The change will be made next week. The download speed from the server will then be about 3 MB/s instead of 256 kB/s (for all users). This should reduce the network bottleneck.
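
A quick unit check on those figures: dividing a line rate in Mbit/s by 8 gives MB/s, ignoring protocol overhead. The helper name is made up for illustration, and the 2 Mbit figure is the old upload speed mentioned at the start of the thread.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in Mbit/s to MB/s (8 bits per byte, no overhead)."""
    return mbit_per_s / 8


old_rate = mbit_to_mbyte_per_s(2)   # old upload line
new_rate = mbit_to_mbyte_per_s(25)  # new upload line

print(f"old: {old_rate * 1000:.0f} kB/s, new: {new_rate:.3f} MB/s")
print(f"speed-up: {new_rate / old_rate:.1f}x")
```

So the new line is about 12.5 times faster on the side that matters, since the server's upload is what the hosts see as their download.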

