Long 2 are a joke

Message boards : Number crunching : Long 2 are a joke

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Sep 16
Posts: 7
Credit: 567,001
RAC: 0
Message 4755 - Posted: 19 Nov 2018, 19:49:59 UTC

http://srbase.my-firewall.org/sr5/result.php?resultid=359194127

More than 11 days of calculation and "task declared too late" = 0 credit

WTF !!!

I had 3 others running with the same duration; I just canceled them.

What a poorly developed application.

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 5426
Credit: 23,904,520
RAC: 11,313
Message 4756 - Posted: 19 Nov 2018, 20:17:14 UTC - in response to Message 4755.

It looks like your CPU is running with HT (Hyper-Threading) enabled. The app is highly optimised.

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Sep 16
Posts: 7
Credit: 567,001
RAC: 0
Message 4757 - Posted: 19 Nov 2018, 20:18:49 UTC
Last modified: 19 Nov 2018, 20:19:30 UTC

Yes I have an iMac (late 2009), HT is not an option here (you can't switch it off).

Is it a problem?

I think I remember this is not the first time I have had that issue (a never-ending Long 2 task); I have removed that app in my preferences.


(btw I forgot to say "hello" but I was a bit upset, sorry for that)

[SG]Felix
Joined: 25 Dec 17
Posts: 63
Credit: 12,867,590
RAC: 12,084
Message 4758 - Posted: 19 Nov 2018, 20:24:39 UTC

just use mt, it helps to get them back within the deadline

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Sep 16
Posts: 7
Credit: 567,001
RAC: 0
Message 4759 - Posted: 19 Nov 2018, 20:40:07 UTC

I don't understand "just use mt" ?

[SG]Felix
Joined: 25 Dec 17
Posts: 63
Credit: 12,867,590
RAC: 12,084
Message 4760 - Posted: 19 Nov 2018, 21:00:01 UTC

Use app_config.xml to tell BOINC that LLR should use multiple cores for one task.
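
For anyone unsure what that looks like, here is a minimal sketch of such a file. It goes in the SRBase project folder inside the BOINC data directory, and BOINC has to re-read its config files (or be restarted) before it takes effect. The app name srbase12 and the -t4 flag are borrowed from configs posted later in this thread; the thread count of 4 is an assumption for a 4-core machine, so adjust both to the app you actually receive and to your own CPU.

```xml
<app_config>
  <!-- One <app_version> block per application you want multi-threaded. -->
  <app_version>
    <!-- Assumed app name: it must match the short name of the app on your host. -->
    <app_name>srbase12</app_name>
    <!-- Extra command-line argument handed to the app: run with 4 threads. -->
    <cmdline>-t4</cmdline>
  </app_version>
</app_config>
```

The comments are only there for explanation and can be dropped.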

PDW
Joined: 15 Oct 15
Posts: 41
Credit: 696,427,546
RAC: 474,985
Message 4761 - Posted: 19 Nov 2018, 22:27:41 UTC - in response to Message 4760.

It is in the FAQ thread here http://srbase.my-firewall.org/sr5/forum_thread.php?id=6

Conan
Joined: 7 Dec 14
Posts: 28
Credit: 4,338,656
RAC: 0
Message 4762 - Posted: 20 Nov 2018, 5:46:26 UTC
Last modified: 20 Nov 2018, 6:35:53 UTC

Interesting, I didn't know about the multi-threading on SRBase, I knew about it on Primegrid and have implemented it there.

I have an SRBase Long2 running now for over 10 Days, 4 days past deadline, saying it is at 98.692% and 3.15 hours to go.
Those 3.15 hours will take 1 to 2 days at least to run (they are not very accurate estimates).

The computer is in HT mode (an Intel Xeon) and I won't be changing that.

I will however use that link and get my app_config.xml file created and installed on each computer running this project, in the next few days.

That link is dated to last year, how observant am I?

Thanks for the link and the information.

Conan

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Sep 16
Posts: 7
Credit: 567,001
RAC: 0
Message 4763 - Posted: 20 Nov 2018, 13:46:49 UTC

Regarding the mt: now I remember that, I even think I had implemented it a long time ago. But obviously it got lost in my BOINC setup somehow...

I will look at that solution again, thanks.


@Conan: I'm afraid that the task running on your machine is lost. When mine terminated (normally) it had been running at 99.99% for some time already, but it was several days overdue, and the assimilator or validator (or whatever it is called) had no mercy...

Strangely the other 3 were still running at 99.99% when I decided to kill them. It made no difference for credit; maybe it made a difference for "science", and it was silly to do that after waiting so long, but I was upset.

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Sep 16
Posts: 7
Credit: 567,001
RAC: 0
Message 4764 - Posted: 20 Nov 2018, 21:46:23 UTC

I did the app_config thing, I'll see what happens the next time I get some SRBase in my machine.

Thanks.

Conan
Joined: 7 Dec 14
Posts: 28
Credit: 4,338,656
RAC: 0
Message 4765 - Posted: 21 Nov 2018, 7:15:59 UTC
Last modified: 21 Nov 2018, 7:16:47 UTC

Yes, well, after doing the app_config.xml, I now get nothing but errors.

I will check to see if it is in the wording >-t4< or as I have it >-t 4<, like I have at Primegrid.

Conan

PDW
Joined: 15 Oct 15
Posts: 41
Credit: 696,427,546
RAC: 474,985
Message 4766 - Posted: 21 Nov 2018, 8:10:17 UTC - in response to Message 4765.

Mine work using <cmdline>-t4</cmdline>

WUProp only counts 1 thread though.

Conan
Joined: 7 Dec 14
Posts: 28
Credit: 4,338,656
RAC: 0
Message 4767 - Posted: 21 Nov 2018, 9:04:25 UTC

Thanks PDW,

Adjusted that line, so we will see what happens.

Conan

Conan
Joined: 7 Dec 14
Posts: 28
Credit: 4,338,656
RAC: 0
Message 4768 - Posted: 21 Nov 2018, 10:26:21 UTC

I got lucky with that long running task of mine, it ran for over 972,000 seconds (270 hours or 11.25 Days), and I managed to get it back first, so got the points.

That is what single thread does, here's to MultiThreading doing it a lot quicker.

Have to wait now for BOINC to grab new tasks, could be a few days on that machine, it will have to let other work run first.

Conan

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 5426
Credit: 23,904,520
RAC: 11,313
Message 4770 - Posted: 21 Nov 2018, 11:14:54 UTC - in response to Message 4768.

As long as the second result has not come back, all other WUs are still valid.

Conan
Joined: 7 Dec 14
Posts: 28
Credit: 4,338,656
RAC: 0
Message 4773 - Posted: 22 Nov 2018, 9:56:51 UTC - in response to Message 4767.
Last modified: 22 Nov 2018, 9:58:43 UTC

I have downloaded and am now running a Long2 work unit. After 8 hours it shows over 14% done, which is a lot quicker than before.

@ PDW,
I am showing all 4 threads on a 4-core computer being recorded at WUProp: 8 hours done and showing 31.6 hours, which seems right to me.

I do have an extra line in my app_config.xml file that may be what is different to what you have.
I have
<cmdline>-t4</cmdline>
<avg_ncpus>4</avg_ncpus>

Not sure if this might be how I am getting all threads picked up or not.
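
In context, those two lines would sit together inside an <app_version> block, roughly as sketched below. The app name srbase2 is an assumption (the post does not give it). <avg_ncpus> is the number of CPUs BOINC budgets for the task, which may be why all 4 threads get counted here.

```xml
<app_config>
  <app_version>
    <!-- Assumed app name; replace with the app you are actually running. -->
    <app_name>srbase2</app_name>
    <!-- Tells the app to start 4 threads. -->
    <cmdline>-t4</cmdline>
    <!-- Tells BOINC the task occupies 4 CPUs. -->
    <avg_ncpus>4</avg_ncpus>
  </app_version>
</app_config>
```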

Anyway all seems to be working again,

Thanks
Conan

Partygott
Joined: 14 Dec 16
Posts: 2
Credit: 9,422,932
RAC: 0
Message 4815 - Posted: 16 Dec 2018, 7:52:53 UTC - in response to Message 4773.
Last modified: 16 Dec 2018, 7:54:17 UTC


No, you don't need the extra "avg_ncpus" line, because you don't want to distinguish between threads and CPU cores.
You should use the "<project_max_concurrent>1</project_max_concurrent>" setting, which tells BOINC to run only one WU from this project at a time.
At least to me this is useful at a time when most of the WUs are of the long type.
The app_config in the FAQ becomes useless as soon as the project sends out work for different apps (and this happens very often), because each of the 12 possible apps is then allowed to use the number of cores you set in the config file.
The result is that you still have 4 threads on a 4-core CPU, with the small difference that the OS now switches the cores between the tasks, so you can have a long task at 70% CPU, a long3 task at 20% and 2 short tasks at 5% each, instead of 4x 25%. In the end no time is gained, because you want 100% of your CPU working on one single long task.

So a useful app_config (with proper formatting, unlike in the FAQ) where all CPU cores (4 in this case) concentrate on only one WU should look like this:

<app_config>

<app>
<name>srbase</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase2</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase3</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase4</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase5</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase6</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase7</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase8</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase9</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase10</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase11</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<app>
<name>srbase12</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase</app_name>
<cmdline>-t4</cmdline>
</app_version>

<project_max_concurrent>1</project_max_concurrent>

<report_results_immediately/>

</app_config>

Bill Michael
Joined: 26 Dec 18
Posts: 2
Credit: 4,348,295
RAC: 17,988
Message 4850 - Posted: 1 Jan 2019, 19:35:36 UTC - in response to Message 4815.

You have all your app_versions as "srbase" and not "srbase2" "srbase12" etc...
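
In other words, each <app_version> block has to name the same app as the <app> block it goes with. A sketch of one corrected pair, using srbase2 from the config above; the other eleven would presumably follow the same pattern with their own names:

```xml
<app_config>
  <app>
    <name>srbase2</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app_version>
    <!-- Must match the <name> of the app above, not plain "srbase". -->
    <app_name>srbase2</app_name>
    <cmdline>-t4</cmdline>
  </app_version>
</app_config>
```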

Theadalus
Joined: 1 May 17
Posts: 1
Credit: 21,103,366
RAC: 0
Message 4851 - Posted: 1 Jan 2019, 23:48:00 UTC

Just cancelled a bunch of "long" WUs (I gave it a chance; some WUs had been running for 4+ days and were at 95%). Short WUs are fine, but I'm not going to waste any more resources on this "long" sh*t!

See ya!

Bill Michael
Joined: 26 Dec 18
Posts: 2
Credit: 4,348,295
RAC: 17,988
Message 4854 - Posted: 3 Jan 2019, 16:21:11 UTC

Here is what I've wound up using on my 4790K, and it seems to work quite well - I DID have to add the "avg_ncpus" line in there so BOINC would report the tasks as using 3 CPUs. I also note that CPU usage never gets over about 130% (100% = 1 core) even though 3 threads are specified. Other apps run up to 287%. However, it works, so I'm not going to mess with it.

I've only included srbase12 here - obviously you can replicate this for the other 11...

<app_config>

<app>
<name>srbase12</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>srbase12</app_name>
<cmdline>-t3</cmdline>
<avg_ncpus>3</avg_ncpus>
</app_version>

<project_max_concurrent>1</project_max_concurrent>
<report_results_immediately/>
</app_config>

I had to abort several "long3" WUs as there was no way they were even going to START before the deadline. These deadlines seem very short for such long tasks: I am set for one day's work plus one extra day, yet got about 10 long3s, each around 11 hours of runtime, that were due three or four days later. Even if no other projects were running I couldn't have made it. It's possible that the overload was due to messing around with app_config, so I'm not sweating that yet.
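
Rough numbers, assuming the tasks run one at a time (as with project_max_concurrent set to 1) and that the 11-hour estimate per task holds:

```latex
\[
10 \times 11\,\text{h} = 110\,\text{h} \approx 4.6\ \text{days} > 4\ \text{days}
\]
```

So under those assumptions the last tasks in the queue would miss even a four-day deadline with the machine crunching nothing but SRBase around the clock.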

Copyright © 2014-2021 BOINC Confederation / rebirther