short deadlines cause panic mode

Message boards : Number crunching : short deadlines cause panic mode

zombie67 [MM]
Joined: 4 Dec 14
Posts: 13
Credit: 24,226,718
RAC: 1,948
Message 1720 - Posted: 1 Aug 2015, 2:14:40 UTC

Can anything please be done about the extremely short deadlines? They are causing all tasks to run in panic mode, which messes with multiple GPU machines, causing all but one to sit idle.
____________
Dublin, California
Team: SETI.USA

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 3222
Credit: 22,879,153
RAC: 10
Message 1721 - Posted: 1 Aug 2015, 8:05:42 UTC - in response to Message 1720.

Can anything please be done about the extremely short deadlines? They are causing all tasks to run in panic mode, which messes with multiple GPU machines, causing all but one to sit idle.


Have you set some cores free for the GPU?

zombie67 [MM]
Joined: 4 Dec 14
Posts: 13
Credit: 24,226,718
RAC: 1,948
Message 1727 - Posted: 1 Aug 2015, 15:32:21 UTC
Last modified: 1 Aug 2015, 15:34:21 UTC

Yes, I have the app_config.xml set to reserve a full thread per GPU. The problem is not that the GPUs do not get enough CPU cycles. The problem is with BOINC scheduling. When CPU tasks go into panic mode, they take over all available threads, and only one GPU task will run.
____________
Dublin, California
Team: SETI.USA

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 3222
Credit: 22,879,153
RAC: 10
Message 1728 - Posted: 1 Aug 2015, 15:48:16 UTC - in response to Message 1727.
Last modified: 1 Aug 2015, 15:48:26 UTC

Yes, I have the app_config.xml set to reserve a full thread per GPU. The problem is not that the GPUs do not get enough CPU cycles. The problem is with BOINC scheduling. When CPU tasks go into panic mode, they take over all available threads, and only one GPU task will run.


You should set your CPU cores in BM (BOINC Manager) to 90%, so you always have 1 core free for the GPU.

zombie67 [MM]
Joined: 4 Dec 14
Posts: 13
Credit: 24,226,718
RAC: 1,948
Message 1729 - Posted: 1 Aug 2015, 16:27:02 UTC - in response to Message 1728.
Last modified: 1 Aug 2015, 16:30:50 UTC

That doesn't solve the problem.

But I did find a work-around using the app_config.xml:

<app_config>
<project_max_concurrent>N</project_max_concurrent>
</app_config>

It constrains the number of CPU threads available to the project. Set N to the number of threads minus the number of threads needed for the GPUs.
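The arithmetic for N can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in this post: the function names, the one-thread-per-GPU default, and the 8-thread, 2-GPU example machine are assumptions, not anything provided by BOINC.

```python
def project_max_concurrent(total_threads: int, gpu_count: int,
                           threads_per_gpu: int = 1) -> int:
    """Threads left for the project after reserving threads for GPU tasks.

    Hypothetical helper: follows the 'threads minus GPU threads' rule
    described in the post, and never returns less than 1.
    """
    reserved = gpu_count * threads_per_gpu
    return max(1, total_threads - reserved)


def render_app_config(n: int) -> str:
    """Render the app_config.xml body from the post with N filled in."""
    return (
        "<app_config>\n"
        f"<project_max_concurrent>{n}</project_max_concurrent>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Assumed example machine: 8 CPU threads, 2 GPUs, 1 thread per GPU.
    n = project_max_concurrent(total_threads=8, gpu_count=2)
    print(render_app_config(n), end="")
```

The generated text would go into app_config.xml in the project's directory, after which the client has to re-read its config files or be restarted.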
____________
Dublin, California
Team: SETI.USA

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 3222
Credit: 22,879,153
RAC: 10
Message 1730 - Posted: 1 Aug 2015, 16:49:29 UTC - in response to Message 1729.

That doesn't solve the problem.

But I did find a work-around using the app_config.xml:


<app_config>
<project_max_concurrent>N</project_max_concurrent>
</app_config>

It constrains the number of CPU threads available to the project. Set N to the number of threads minus the number of threads needed for the GPUs.


OK, but it's not working with older clients.

zombie67 [MM]
Joined: 4 Dec 14
Posts: 13
Credit: 24,226,718
RAC: 1,948
Message 1731 - Posted: 1 Aug 2015, 17:24:04 UTC

Agreed. Longer deadlines would solve the problem for everyone. Panic mode causes the BOINC client to do abnormal things.
____________
Dublin, California
Team: SETI.USA

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 3222
Credit: 22,879,153
RAC: 10
Message 1732 - Posted: 1 Aug 2015, 17:26:19 UTC - in response to Message 1731.

Agreed. Longer deadlines would solve the problem for everyone. Panic mode causes the BOINC client to do abnormal things.


Only the small bases have a 1 day runtime, because if I make it longer, a mix of long and short ones will have a longer waiting time and run out of work.

Ananas
Joined: 26 Nov 15
Posts: 10
Credit: 370,238
RAC: 0
Message 2177 - Posted: 14 Dec 2015, 8:36:54 UTC - in response to Message 1732.
Last modified: 14 Dec 2015, 8:49:40 UTC

Something that would help - but I'm not sure if it can be done so easily :

Delay the resend of the long ones after a timeout by 2 days without changing the deadline itself.

This would give the slower hosts a better chance to return them before the server side scheduler sends them out a second time, even if they are somewhat behind the deadline.

According to the server status page, they currently have a worst runtime of 52+ hours, hard to do within the deadline if they are not on top of the stack on client side.

p.s.: if it is hard to do by application ... this behaviour wouldn't hurt for any kind of results, I even think it should be default for BOINC servers. The client still would do its best to stay within the deadline but the server would be more tolerant. I made that proposal a few years ago but the Berkeley guys didn't pick up the idea :-/
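The proposed resend rule can be sketched as follows. This is a minimal illustration under stated assumptions, not BOINC server code: the function names and the `returned` flag are hypothetical, and the 2-day grace period is the value suggested in this post.

```python
from datetime import datetime, timedelta, timezone

# Grace period suggested in the post; not a real BOINC server setting.
RESEND_GRACE = timedelta(days=2)


def earliest_resend(deadline: datetime,
                    grace: timedelta = RESEND_GRACE) -> datetime:
    """Earliest time the server would send the task to another host."""
    return deadline + grace


def should_resend(now: datetime, deadline: datetime, returned: bool,
                  grace: timedelta = RESEND_GRACE) -> bool:
    """Resend only if the result is still missing after deadline + grace.

    The deadline reported to the client is unchanged, so the client still
    schedules against the original deadline.
    """
    return (not returned) and now >= earliest_resend(deadline, grace)


if __name__ == "__main__":
    deadline = datetime(2015, 12, 14, tzinfo=timezone.utc)
    one_day_late = deadline + timedelta(days=1)
    # One day past the deadline falls inside the grace window.
    print(should_resend(one_day_late, deadline, returned=False))
```

A slow host that reports a result inside the grace window is never raced by a second copy, while a host that never reports only delays the resend by the grace period.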

Profile rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 3222
Credit: 22,879,153
RAC: 10
Message 2180 - Posted: 14 Dec 2015, 17:48:52 UTC - in response to Message 2177.

Something that would help - but I'm not sure if it can be done so easily :

Delay the resend of the long ones after a timeout by 2 days without changing the deadline itself.

This would give the slower hosts a better chance to return them before the server side scheduler sends them out a second time, even if they are somewhat behind the deadline.

According to the server status page, they currently have a worst runtime of 52+ hours, hard to do within the deadline if they are not on top of the stack on client side.

p.s.: if it is hard to do by application ... this behaviour wouldn't hurt for any kind of results, I even think it should be default for BOINC servers. The client still would do its best to stay within the deadline but the server would be more tolerant. I made that proposal a few years ago but the Berkeley guys didn't pick up the idea :-/


I have reduced the deadline from 6 to 4 days for the long runners. As long as the first WU has been sent back and the second has not yet been started by another host, the server sends an abort signal.




Copyright © 2014-2017 BOINC Confederation / rebirther