Riesel Base Short running too long

Message boards : Number crunching : Riesel Base Short running too long

Roadranner
Joined: 10 Dec 14
Posts: 7
Credit: 110,895,616
RAC: 537,230
Message 2359 - Posted: 15 Feb 2016, 0:50:57 UTC
Last modified: 15 Feb 2016, 1:02:15 UTC

8 WUs of Riesel Base Short have now been running for 6 hours on machine #977.


Edit:

After deleting the 8 WUs, the machine seems to run fine.

Roadranner
Joined: 10 Dec 14
Posts: 7
Credit: 110,895,616
RAC: 537,230
Message 2372 - Posted: 15 Feb 2016, 13:46:09 UTC

Today I deleted one Riesel Base Short WU that had been running for 12 hours (PC #3216).

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 2374 - Posted: 15 Feb 2016, 15:20:22 UTC - in response to Message 2372.
Last modified: 15 Feb 2016, 15:20:30 UTC

Can you run this WU standalone? If we can reproduce this, I can contact the devs. At the moment everything is running fine on my host.

Roadranner
Joined: 10 Dec 14
Posts: 7
Credit: 110,895,616
RAC: 537,230
Message 2375 - Posted: 15 Feb 2016, 15:38:39 UTC - in response to Message 2374.

Sorry reb, I don't know how to get a deleted WU again.
It will be easier for you to see how the WU runs on another PC.
Furthermore, I try to avoid testing on that PC because it's an external (production) machine.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 2377 - Posted: 15 Feb 2016, 15:42:57 UTC - in response to Message 2375.

If this happens again, let me know the WU ID or post a link to it.

Roadranner
Joined: 10 Dec 14
Posts: 7
Credit: 110,895,616
RAC: 537,230
Message 2378 - Posted: 15 Feb 2016, 15:45:42 UTC - in response to Message 2377.

wu name: R63_23-25k_826.19-904.76M_wu_34456

wu id: #116270456

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 2379 - Posted: 15 Feb 2016, 15:48:56 UTC - in response to Message 2378.
Last modified: 15 Feb 2016, 15:52:03 UTC

Did you also have this problem with the older app? Perhaps update to the latest BOINC client.

Edit:
If you have another WU with the same issue, send me the contents of stderr.txt from the slot directory.
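
For anyone unsure where to find that file: a minimal sketch, assuming the BOINC data directory contains a `slots` folder with one numbered subfolder per running task, each of which may hold a `stderr.txt`. The function name `collect_slot_stderr` and the `max_chars` parameter are made up for illustration; the data directory path is something you pass in yourself.

```python
from pathlib import Path


def collect_slot_stderr(data_dir, max_chars=4000):
    """Return {slot folder name: tail of its stderr.txt} for each slot that has one.

    data_dir is the BOINC data directory (an assumption you supply);
    max_chars limits how much of each log is kept, counted from the end.
    """
    results = {}
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return results  # no slots folder here, nothing to collect
    for slot in sorted(slots.iterdir()):
        log = slot / "stderr.txt"
        if log.is_file():
            # errors="replace" keeps going even if the log has odd bytes
            text = log.read_text(errors="replace")
            results[slot.name] = text[-max_chars:]
    return results
```

On many Linux installs the data directory is `/var/lib/boinc-client`, and on Windows it is commonly `C:\ProgramData\BOINC`, but check your own setup. Calling `collect_slot_stderr("/var/lib/boinc-client")` and printing the values gives text that can be pasted into a reply.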

Roadranner
Joined: 10 Dec 14
Posts: 7
Credit: 110,895,616
RAC: 537,230
Message 2380 - Posted: 15 Feb 2016, 16:08:25 UTC - in response to Message 2379.

No problems with the older app.
The BOINC client version varies from 7.2.42 (Ubuntu) to 7.6.22 (Win) across my PCs.
I'm not updating PC #3216 at the moment, because I don't want to interrupt a long-running Numberfields WU (36 days now) which we were asked to complete.

KWSN Sir Clark
Joined: 22 Apr 16
Posts: 3
Credit: 28,883
RAC: 0
Message 2531 - Posted: 4 May 2016, 22:41:13 UTC

I'm getting some that run about 30 mins and some which run quickly.

The higher the percentage the slower it seems to progress.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 2532 - Posted: 5 May 2016, 1:26:20 UTC - in response to Message 2531.

You are running different apps. The time steps in the log are constant.

KWSN Sir Clark
Joined: 22 Apr 16
Posts: 3
Credit: 28,883
RAC: 0
Message 2533 - Posted: 5 May 2016, 8:18:30 UTC - in response to Message 2532.
Last modified: 5 May 2016, 8:18:55 UTC

I'm wondering why the short ones are taking as long as the non-short ones.

With only a 24 deadline, if you're getting a lot of short ones that take a lot longer than expected, then it's potentially easy to miss the deadline on others.

Not sure what you mean by different apps. It was the short ones that were taking ages.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 2534 - Posted: 5 May 2016, 8:59:28 UTC - in response to Message 2533.

You should not miss the deadline, because the server stores the FLOP counts of the WUs and you will only get as many WUs as you can process in time. Runtime also differs depending on whether your CPU has AVX, AVX2, or neither.

denravonska
Joined: 8 Dec 16
Posts: 4
Credit: 11,488,121
RAC: 0
Message 3132 - Posted: 15 Dec 2016, 6:19:02 UTC

This seems to be a recurring issue for me on the older computers (i686 and CoreDuo). I frequently have to monitor and cancel tasks which run for far too long. I don't recall seeing this on Core2Duo and up.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 3133 - Posted: 15 Dec 2016, 18:42:33 UTC - in response to Message 3132.

Please note that some older CPUs lack AVX, which gives a big runtime advantage when it is available.
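
To check what a given host supports, here is a rough sketch for x86 Linux machines, assuming the CPU features appear on a `flags` line in `/proc/cpuinfo`. The helper names `cpu_flags` and `simd_level` are hypothetical, and this says nothing about which build of the app the project actually sends to a host.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def simd_level(flags):
    """Classify a flag set as 'AVX2', 'AVX', or 'none'."""
    if "avx2" in flags:
        return "AVX2"
    if "avx" in flags:
        return "AVX"
    return "none"
```

On a Linux host, `simd_level(cpu_flags(open("/proc/cpuinfo").read()))` reports the level for that machine. A CPU that reports neither flag falls into the "none" case here.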

denravonska
Joined: 8 Dec 16
Posts: 4
Credit: 11,488,121
RAC: 0
Message 3140 - Posted: 16 Dec 2016, 5:30:55 UTC - in response to Message 3133.
Last modified: 16 Dec 2016, 5:33:38 UTC

Fair enough. Right now I have three tasks estimated to run for a minute or so, but they have been running for 23 hours. "Estimated app speed 1.09 GFLOP/s, estimated task size 80 GFLOPs". The work units are:

Comp #1
S148_900-950k_wu_4159
S148_850-900k_wu_2020

Comp #2
S148_950k-1M_wu_7209

Edit: These are for riesel long, not short.
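
The "minute or so" estimate follows from the two quoted figures, assuming the client simply divides the estimated task size by the estimated app speed (that is a simplification, not necessarily the exact formula the client uses). A small worked sketch with hypothetical function names:

```python
def estimated_runtime_seconds(task_gflops, speed_gflops_per_s):
    """Estimated runtime = estimated task size / estimated app speed."""
    return task_gflops / speed_gflops_per_s


def overrun_factor(actual_seconds, estimated_seconds):
    """How many times longer the task has run than the estimate."""
    return actual_seconds / estimated_seconds


# Figures quoted above: 80 GFLOPs task size, 1.09 GFLOP/s app speed.
estimate = estimated_runtime_seconds(80, 1.09)   # a bit over 73 seconds
actual = 23 * 3600                               # 23 hours in seconds
factor = overrun_factor(actual, estimate)        # over a thousand times the estimate
```

A gap that large suggests the estimated task size or app speed is far off for these work units, rather than the host being a little slow.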

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7479
Credit: 43,686,081
RAC: 42,669
Message 3142 - Posted: 16 Dec 2016, 18:58:30 UTC - in response to Message 3140.

That's normal. The first time you run a new app, the server needs some results to calculate the average runtime. You will see it after a few WUs.

denravonska
Joined: 8 Dec 16
Posts: 4
Credit: 11,488,121
RAC: 0
Message 3144 - Posted: 17 Dec 2016, 5:45:28 UTC - in response to Message 3142.

Aha, then I'll let them run. Thanks :)


Copyright © 2014-2024 BOINC Confederation / rebirther