server overloaded ???

Message boards : Number crunching : server overloaded ???

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1925 - Posted: 7 Oct 2015, 17:48:55 UTC

Since new work was uploaded to the server this weekend, I have been seeing a lot of "Project communication failed" messages on all my hosts. As a result, the hosts are idle more than they are productive when running the small tasks.

Thanks,
Ardo

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 1926 - Posted: 7 Oct 2015, 18:22:15 UTC - in response to Message 1925.

Hmm, I have no problems here.

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1927 - Posted: 7 Oct 2015, 20:10:27 UTC

Interesting. Could the database server be "busy", which in turn makes the web server "busy"?

By the way, I switched the hosts to long tasks and average tasks to keep them happily crunching...

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 1928 - Posted: 7 Oct 2015, 20:43:16 UTC - in response to Message 1927.

Not really, everything looks OK on the server side. Is anyone else seeing this issue?

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1929 - Posted: 8 Oct 2015, 1:22:57 UTC

To gather additional data: I did some tasks from Milkyway@Home and they all went through without any issues. Then I switched back to SRBase and the first update command resulted again in "failing to communicate".

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1943 - Posted: 12 Oct 2015, 21:19:03 UTC

Some further data: I rebooted all the hosts as well as all the networking equipment, up to and including the cable modem, and the issue persists.

Could this be an issue between the Internet and the host you're running this on, e.g. a firewall?

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 1944 - Posted: 13 Oct 2015, 18:10:50 UTC - in response to Message 1943.

No, the ports are all open for BOINC upload/download. Do the uploads go through on a second attempt, or is the situation still unchanged?

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1949 - Posted: 13 Oct 2015, 22:18:00 UTC
Last modified: 13 Oct 2015, 22:19:47 UTC

The situation has not changed: most of the uploads, reports and downloads are OK, but there are quite a few retries, double retries, and even triple retries. Once the uploads succeed, there is the wait for the reports to succeed, and then the downloads go through a similar number of retries.

I also noticed that some of the download issues are of the "project deferred for large number of minutes" category.

The cache is set to 0.1 day. For the (very) short tasks, say less than a minute, that is not sufficient: with the maximum of 7 WUs per core, the buffer does not survive all the retries, so the hosts run idle from time to time. For the longer tasks the cache is sufficient to keep the cores busy continuously.
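
As a rough illustration of why the short tasks run dry, here is a minimal sketch. It assumes a simplified model in which the work buffered per core is the smaller of the cache setting and the per-core task cap; this is not the BOINC client's actual scheduling logic, and the function name and the 60-minute "long" task are hypothetical.

```python
def buffered_minutes(tasks_per_core: int, task_minutes: float, cache_days: float) -> float:
    """Minutes of work one core can hold under the simplified model.

    Assumption: the buffer is limited both by the cache setting (in days)
    and by the cap on work units per core, whichever is smaller.
    """
    cache_minutes = cache_days * 24 * 60
    return min(cache_minutes, tasks_per_core * task_minutes)


# Short tasks: about 1 minute each, cap of 7 per core, cache of 0.1 day.
short_buffer = buffered_minutes(7, 1.0, 0.1)

# Longer tasks: 60 minutes each is an assumed value for comparison.
long_buffer = buffered_minutes(7, 60.0, 0.1)

print(f"short tasks: about {short_buffer:.0f} minutes buffered per core")
print(f"long tasks:  about {long_buffer:.0f} minutes buffered per core")
```

Under these assumptions the per-core cap, not the 0.1 day cache, limits the short tasks to roughly 7 minutes of buffered work, so a few failed retries in a row are enough to leave a core idle.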

I put one of the hosts temporarily on PrimeGrid doing ESP/PSP/SoB sieving and it all went without any issue whatsoever. And in all the years of doing PrimeGrid WUs I never had this issue.

Maybe this project simply does not want me to process that many WUs... ;-)

ardo
Joined: 14 Mar 15
Posts: 11
Credit: 54,761,114
RAC: 0
Message 1961 - Posted: 16 Oct 2015, 19:44:22 UTC

Just FYI: I do not know what happened or what, if anything, got changed or cleared up, but since yesterday the WUs have been flowing as smoothly as they did before I reported the issue in this thread. Life is good again. :-)

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 1962 - Posted: 16 Oct 2015, 19:52:58 UTC - in response to Message 1961.

Good to hear. Nothing changed on the server side.

Ananas
Joined: 26 Nov 15
Posts: 10
Credit: 370,238
RAC: 0
Message 2130 - Posted: 1 Dec 2015, 8:26:15 UTC - in response to Message 1962.

Lots of tiny results seem to have a major impact on the server's connection to the internet. When a lot of the 2-3 minute Riesel results are available, the server sometimes becomes somewhat sluggish.

I guess it is not necessarily the throughput but rather the number of concurrent connections that causes this.
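
A back-of-the-envelope sketch of that guess: the number of file transfers the server has to handle grows as tasks get shorter, even if the total computing power stays the same. The 1000 cores, the 60-minute comparison task and the two transfers per task (one download, one upload) are assumptions for illustration, not figures from this project.

```python
def transfers_per_hour(cores: int, task_minutes: float, transfers_per_task: int = 2) -> float:
    """Estimate file transfers per hour for a pool of cores.

    Assumption: each task needs `transfers_per_task` transfers and every
    core runs tasks back to back with no idle time.
    """
    tasks_per_core_per_hour = 60.0 / task_minutes
    return cores * tasks_per_core_per_hour * transfers_per_task


cores = 1000  # hypothetical pool size

short_rate = transfers_per_hour(cores, 2.5)   # 2-3 minute Riesel results
long_rate = transfers_per_hour(cores, 60.0)   # assumed longer task

print(f"2.5-minute tasks: {short_rate:.0f} transfers per hour")
print(f"60-minute tasks:  {long_rate:.0f} transfers per hour")
print(f"ratio: {short_rate / long_rate:.0f}x")
```

The payload per transfer can be tiny in both cases, so bandwidth may look fine while the connection count is much higher for the short tasks.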

Profile Mankka*
Joined: 8 Feb 15
Posts: 40
Credit: 338,892,941
RAC: 0
Message 2455 - Posted: 2 Apr 2016, 9:40:45 UTC

Just FYI Reb.
I get this when I try to look at my valid tasks:
"Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 64 bytes) in /home/boincadm/projects/sr5/html/inc/db_conn.inc on line 119"

Mankka*

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 2456 - Posted: 2 Apr 2016, 9:50:32 UTC - in response to Message 2455.
Last modified: 2 Apr 2016, 10:04:53 UTC

It's working for me, but the current work creation with 1M WUs could cause some issues with the PHP memory reservation.

Update:
I have changed the memory limit in php.ini to 1GB. Can you check if the error is still present?

Profile Mankka*
Joined: 8 Feb 15
Posts: 40
Credit: 338,892,941
RAC: 0
Message 2457 - Posted: 2 Apr 2016, 12:01:50 UTC - in response to Message 2456.

I still get the same error, but it looks like the change hasn't taken effect (reboot needed?), as the error still reports 536870912 bytes (512 MB, not 1 GB)...
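
For reference, the byte count in the error message can be checked with a quick conversion. This is only a sketch of the arithmetic using binary units (1 MB taken as 1024 * 1024 bytes); the helper name is made up for illustration.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to binary megabytes (1024 * 1024 bytes each)."""
    return n_bytes / (1024 ** 2)


reported_limit = 536870912      # value quoted in the PHP fatal error
expected_limit = 1024 ** 3      # 1 GB in binary units

reported_mib = bytes_to_mib(reported_limit)
expected_mib = bytes_to_mib(expected_limit)

print(f"limit in the error: {reported_mib:.0f} MB")
print(f"limit after change: {expected_mib:.0f} MB")
```

The value in the error is exactly half of the new setting, which is consistent with the old limit still being in effect.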

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 2458 - Posted: 2 Apr 2016, 12:19:37 UTC - in response to Message 2457.
Last modified: 2 Apr 2016, 21:05:35 UTC

Yes, I think so. I need to restart Apache, but that must wait until the WU generation is done.

Profile rebirther
Joined: 2 Jan 13
Posts: 7232
Credit: 42,729,227
RAC: 31
Message 2459 - Posted: 3 Apr 2016, 9:02:40 UTC

Mankka*: I have restarted Apache but the work generation script is not running anymore. I hope the error is gone now.

Profile Mankka*
Joined: 8 Feb 15
Posts: 40
Credit: 338,892,941
RAC: 0
Message 2460 - Posted: 3 Apr 2016, 13:46:59 UTC - in response to Message 2459.

Thanks, Reb!

Everything works like a charm now =)



Copyright © 2014-2024 BOINC Confederation / rebirther